? (?) 00:00.310
About AMI — what products, if any, does AMI plan to produce or make? Is it research, or more than that?
Yann LeCun (Chief AI Scientist) 00:09.350
No, it's more than that, it's actual products, OK? But you know, it has to do with world models and planning, and basically we have the ambition of becoming kind of one of the main suppliers of intelligent systems down the line. We think the
Yann LeCun (Chief AI Scientist) 00:28.990
current architectures that are employed, you know, LLMs or agentic systems that are based on LLMs, work OK for language, but even agentic systems really don't work very well. They require a lot of data to basically clone the behavior of humans, and they're not that
Yann LeCun (Chief AI Scientist) 00:46.750
reliable. So we think the proper way to handle this, and I've been saying this for almost ten years now, is to have world models that are capable of predicting what would be the consequence, or the consequences, of an action or a sequence of actions that an AI system might take, and then
Yann LeCun (Chief AI Scientist) 01:07.510
the system arrives at a sequence of actions or an output by optimization, by figuring out what sequence of actions will optimally accomplish a task you set for it. That's planning, OK? So I think the central part of intelligence is being able to predict the
Yann LeCun (Chief AI Scientist) 01:27.040
consequences of your actions and then use them for planning. That's what I've been working on for many years. We've been making fast progress with, you know, a combination of projects here at NYU and also at Meta, and now it's time to basically make it
Yann LeCun (Chief AI Scientist) 01:46.990
real.
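A minimal sketch of the planning-by-optimization idea described above: a world model rolls out the consequences of a candidate action sequence, and the sequence is optimized to minimize a task cost. The model, cost, and all names here are toy stand-ins for illustration, not any actual AMI or Meta system.

```python
# Sketch: planning by optimizing a sequence of actions through a world model.
# `predict_next_state` and `task_cost` are made-up placeholders for what a
# real system would learn or define.
import torch

def predict_next_state(state, action):
    # Toy "world model": a trained network would go here.
    return state + action

def task_cost(state, goal):
    # How far the final predicted state is from the goal.
    return torch.sum((state - goal) ** 2)

def plan(initial_state, goal, horizon=10, steps=200, lr=0.1):
    # Optimize the whole action sequence by gradient descent through the model.
    actions = torch.zeros(horizon, initial_state.shape[0], requires_grad=True)
    optimizer = torch.optim.Adam([actions], lr=lr)
    for _ in range(steps):
        state = initial_state
        for t in range(horizon):
            state = predict_next_state(state, actions[t])
        loss = task_cost(state, goal)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return actions.detach()

if __name__ == "__main__":
    start, goal = torch.zeros(2), torch.tensor([3.0, -1.0])
    plan_actions = plan(start, goal)
    print(plan_actions.sum(dim=0))  # total displacement should roughly reach the goal
```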
Ravid Shwartz-Ziv (Assistant Professor) 01:49.160
And what do you think are the missing parts? And why do you think it's taking so long? Because you've been talking about it, as you said, for many years already, but it's still not better than LLMs.
Yann LeCun (Chief AI Scientist) 02:02.150
Right, but it's not the same thing as LLMs. It's designed to handle modalities that are high-dimensional, continuous, and noisy, and LLMs completely suck at this — they really do not work. If you try to train an LLM to learn good representations of images or video,
Yann LeCun (Chief AI Scientist) 02:21.560
they're really not that great. You know, generally, vision capabilities for AI systems are trained separately; they're not part of the whole LLM thing. So yeah, if you want to handle data that is high-dimensional, continuous, and noisy, you cannot use generative models. You
Yann LeCun (Chief AI Scientist) 02:46.200
can certainly not use generative models that tokenize your data into discrete symbols, OK? There's just no way, and we have a lot of empirical evidence that this simply doesn't work very well. What does work is learning an abstract representation space that eliminates a lot of
Yann LeCun (Chief AI Scientist) 03:02.670
details about the input, essentially all the details that are not predictable, which includes noise, and making predictions in that representation space. This is the idea of JEPA, joint embedding predictive architectures, which, you know, you are familiar
Ravid Shwartz-Ziv (Assistant Professor) 03:19.430
with, sorry, because you worked on this. Yeah, and Randall was also hosted on the podcast in the past; we probably talked about this at length.
Yann LeCun (Chief AI Scientist) 03:31.310
So there are a lot of ideas around this, and let me tell you my history with it, OK? I have been convinced for a long time, probably the better part of twenty years, that the proper way to build intelligent systems was through some form of unsupervised learning. I started
Yann LeCun (Chief AI Scientist) 03:51.710
working on unsupervised learning as the basis for, you know, making progress in the early to mid two-thousands. Before that, I wasn't so convinced this was the way to go. And basically this was the idea of, you know, training autoencoders to learn representations, right?
Yann LeCun (Chief AI Scientist) 04:12.430
So you have an input, you run it into an encoder, it finds a representation of it, and then you decode, so you guarantee that the representation contains all the information about the input. It turns out that idea is wrong: insisting that the representation contain all the
Yann LeCun (Chief AI Scientist) 04:26.910
information about the input is a bad idea, OK? I didn't know this at the time. So what we worked on — there are several ways of doing this. You know, Geoff Hinton at the time was working on restricted Boltzmann machines, Yoshua Bengio was working on denoising autoencoders, which actually became
Yann LeCun (Chief AI Scientist) 04:43.480
quite successful in different contexts, for NLP among others, and I was working on sparse autoencoders. So basically, you know, if you train an autoencoder, you need to regularize the representation so that the autoencoder does not trivially learn an identity function, and
Yann LeCun (Chief AI Scientist) 04:58.550
this is the information bottleneck — your listeners know about the information bottleneck, right? You need to create an information bottleneck to limit the information content of the representation, and I thought high-dimensional sparse representations were actually a good way to go.
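A toy version of the sparse-autoencoder idea just mentioned: the code is overcomplete but an L1 sparsity penalty plays the role of the information bottleneck, so the network can't simply copy its input. The architecture, dimensions, and data below are illustrative assumptions, not the original models.

```python
# Toy sparse autoencoder: reconstruction loss plus an L1 penalty on the code.
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, input_dim=784, code_dim=2048):
        super().__init__()
        # Overcomplete code: higher-dimensional than the input, but kept sparse.
        self.encoder = nn.Sequential(nn.Linear(input_dim, code_dim), nn.ReLU())
        self.decoder = nn.Linear(code_dim, input_dim)

    def forward(self, x):
        code = self.encoder(x)
        return self.decoder(code), code

def loss_fn(x, recon, code, sparsity_weight=1e-3):
    # Reconstruction error plus a sparsity term that limits how much
    # information the code can carry (most units are pushed toward zero).
    return nn.functional.mse_loss(recon, x) + sparsity_weight * code.abs().mean()

model = SparseAutoencoder()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.rand(32, 784)          # stand-in batch; real data would go here
recon, code = model(x)
loss = loss_fn(x, recon, code)
loss.backward()
opt.step()
```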
Yann LeCun (Chief AI Scientist) 05:15.190
So a bunch of my students did their PhDs on this. Koray Kavukcuoglu, who is now Chief AI Architect at Alphabet and also CTO of Google DeepMind, actually did his thesis on this with me, and, you know, a few other folks — Marc'Aurelio Ranzato and Y-Lan Boureau and a few others. So this was kind of the
Yann LeCun (Chief AI Scientist) 05:37.080
idea. And the reason we worked on this was because we wanted to pre-train very deep neural nets by pre-training them with those stacked autoencoders; we thought that was the way to go. What happened, though, was that we started, you know,
Yann LeCun (Chief AI Scientist) 05:55.070
experimenting with things like normalization and rectification — ReLUs instead of hyperbolic tangents or sigmoids — that ended up basically allowing us to train fairly deep networks completely supervised, with no self-supervised learning. And this was at the same time that datasets
Yann LeCun (Chief AI Scientist) 06:16.790
started to get bigger, and so it turned out, you know, supervised learning worked fine. So the whole idea of self-supervised or unsupervised learning was put aside. And then came ResNets, and that completely solved the problem of training very deep architectures.
Yann LeCun (Chief AI Scientist) 06:32.400
That was in 2015. But then in 2015 I started, you know, thinking again about how we push towards human-level AI, which really was the original objective of FAIR, and my life's mission, and I realized that, you know, all
Yann LeCun (Chief AI Scientist) 06:51.110
the approaches of reinforcement learning and things of that type were basically not scaling. You know, reinforcement learning is incredibly inefficient in terms of samples, and so this is not the way to go. And so the idea of world models — a system that can predict the
Yann LeCun (Chief AI Scientist) 07:09.400
consequences of its actions so it can plan — I started researching and playing with this around 2015, 2016. My keynote at what was still called NIPS at the time, in 2016, was on world models. I was arguing for it; basically the centerpiece of my talk was,
Yann LeCun (Chief AI Scientist) 07:28.590
this is what we should be working on — you know, a world model that is action-conditioned — and a few of our residents were working on this, on video prediction and things like that. We had some papers on video prediction in 2016, and I made the same mistake as before, and the same
Yann LeCun (Chief AI Scientist) 07:47.230
mistake that everybody is making at the moment, which is training a video prediction system to predict at the pixel level, which is really impossible — you can't really represent useful probability distributions on the space of video frames. Those things don't work. I knew for a
Yann LeCun (Chief AI Scientist) 08:07.990
fact that, because the prediction is nondeterministic, we had to have a model with latent variables to represent all the stuff you don't know about the variable you're supposed to predict. So we experimented with this for years.
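One simple way to realize the latent-variable idea just described: the predictor takes a latent z alongside x, and z is chosen at training time (here by a small inner optimization) to absorb whatever about y cannot be predicted from x alone. This is a generic illustration under those assumptions, not the specific FAIR model.

```python
# Sketch: latent-variable predictor for nondeterministic prediction.
import torch
import torch.nn as nn

class LatentPredictor(nn.Module):
    def __init__(self, x_dim=8, z_dim=2, y_dim=8, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(x_dim + z_dim, hidden), nn.ReLU(), nn.Linear(hidden, y_dim)
        )

    def forward(self, x, z):
        # The latent z carries the information about y that x alone lacks.
        return self.net(torch.cat([x, z], dim=-1))

def infer_z(model, x, y, z_dim=2, steps=20, lr=0.1):
    # Inner loop: find the latent that best explains this particular y given x.
    z = torch.zeros(x.shape[0], z_dim, requires_grad=True)
    opt = torch.optim.SGD([z], lr=lr)
    for _ in range(steps):
        loss = nn.functional.mse_loss(model(x, z), y)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return z.detach()

model = LatentPredictor()
outer_opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x, y = torch.randn(16, 8), torch.randn(16, 8)   # stand-in data
z = infer_z(model, x, y)                         # pick z for each sample
outer_opt.zero_grad()                            # clear grads accumulated during latent inference
loss = nn.functional.mse_loss(model(x, z), y)    # then update the predictor itself
loss.backward()
outer_opt.step()
```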
Yann LeCun (Chief AI Scientist) 08:25.190
I had a student here, who is now a scientist at FAIR, Mikael Henaff, who developed a video prediction system with latent variables, and he kind of partially solved the problems we were facing. I mean, today the solution that a lot of people are employing is diffusion models, which are a way to train nondeterministic functions,
Yann LeCun (Chief AI Scientist) 08:44.710
essentially, or energy-based models, which I have been advocating for decades now, and which are also another way of training nondeterministic functions. But in the end I realized that really the way to get around the fact that you can't predict at
Yann LeCun (Chief AI Scientist) 09:03.240
the pixel level is to just not predict at the pixel level: learn another representation and predict at the representation level, eliminating all the details you cannot predict. And I wasn't really thinking about those methods early on, because I thought there was a huge problem of
Yann LeCun (Chief AI Scientist) 09:24.640
preventing collapse. I'm sure Randall talked about this, but you know, when you train — let's say you have an observed variable X and you're trying to predict a variable Y, but you don't want to predict all the details — you run X and Y through two encoders, so now you have both a
Yann LeCun (Chief AI Scientist) 09:40.510
representation for X, S_X, and a representation for Y, S_Y. You can train a predictor to predict the representation of Y from the representation of X, but if you want to train this whole thing end-to-end simultaneously, there is a trivial solution where the system
Yann LeCun (Chief AI Scientist) 09:57.950
ignores the inputs and produces constant representations, and the prediction problem becomes trivial. So if the only criterion to train the system is minimizing the prediction error, it's not going to work; it's going to collapse. I knew about this problem for a very long time,
Yann LeCun (Chief AI Scientist) 10:14.110
because I worked on joint embedding architectures — we used to call them Siamese networks — back in the nineties.
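A compact illustration of the collapse problem described above: two encoders and a predictor trained only to minimize prediction error in representation space can satisfy the objective by producing (nearly) constant representations. The toy data, networks, and sizes below are assumptions made purely for illustration.

```python
# Illustration of representation collapse in a joint-embedding architecture:
# if the only objective is predicting S_Y from S_X, constant encoders are a
# perfect (and useless) solution.
import torch
import torch.nn as nn

enc_x = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 8))
enc_y = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 8))
predictor = nn.Linear(8, 8)

params = list(enc_x.parameters()) + list(enc_y.parameters()) + list(predictor.parameters())
opt = torch.optim.Adam(params, lr=1e-2)

for step in range(2000):
    x = torch.randn(64, 16)
    y = x + 0.1 * torch.randn(64, 16)                    # Y is a noisy view of X
    s_x, s_y = enc_x(x), enc_y(y)
    loss = nn.functional.mse_loss(predictor(s_x), s_y)   # prediction error only
    opt.zero_grad()
    loss.backward()
    opt.step()

# The representations tend toward being constant across inputs: collapse.
print("std of S_Y across a batch:", enc_y(torch.randn(64, 16)).std(dim=0).mean().item())
```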