? (?) 00:00.110
Those are the same, because people have been using that term, Siamese networks, even recently.
Yann LeCun (Chief AI Scientist) 00:05.510
That's right. I mean, the concept is still, you know, up to date, right? So you have an X and a Y, and think of the X as some sort of degraded, transformed, or corrupted version of Y, OK. You run both X and Y through encoders, and you tell the system, look, X and Y really are two views of
Yann LeCun (Chief AI Scientist) 00:24.150
the same thing, so the representations you compute should be the same, right? But if you just train a neural net, you know, two neural nets with shared weights, right, to produce the same representation for slightly different versions of the same object, view, whatever it is, it collapses. It doesn't
Yann LeCun (Chief AI Scientist) 00:44.670
produce anything useful. So you have to find a way to make sure that the system, you know, extracts as much information from the input as possible. And the original idea that we had, you know, it was in a paper from nineteen ninety-three with Siamese nets, was to have a contrastive term,
Yann LeCun (Chief AI Scientist) 01:00.950
right? So you have other pairs of samples that you know are different, and you train the system to produce different representations for them. So you have a cost function that attracts the two representations when you show it two examples that are identical or similar, and repels them when you show it two examples that are dissimilar.
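A minimal sketch of the pairwise contrastive setup being described: one encoder with shared weights applied to both inputs, and a loss whose energy is the distance between the two representations, pulled down for similar pairs and pushed up to a margin for dissimilar ones. This is PyTorch; the module, names, and sizes are illustrative, not the original 1993 or 2005 implementations.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Encoder(nn.Module):
    """A single encoder applied to both inputs, i.e. two branches with shared weights."""
    def __init__(self, dim_in=784, dim_out=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim_in, 256), nn.ReLU(), nn.Linear(256, dim_out))

    def forward(self, x):
        return self.net(x)

def contrastive_loss(z1, z2, same, margin=1.0):
    """Pairwise contrastive loss: the energy is the distance between the two representations.
    same=1 pairs are attracted; same=0 pairs are repelled until the distance exceeds the margin."""
    d = F.pairwise_distance(z1, z2)
    attract = same * d.pow(2)
    repel = (1.0 - same) * F.relu(margin - d).pow(2)
    return (attract + repel).mean()

encoder = Encoder()
x1, x2 = torch.randn(32, 784), torch.randn(32, 784)     # a batch of input pairs (two "views")
same = torch.randint(0, 2, (32,)).float()                # 1 = similar pair, 0 = dissimilar pair
loss = contrastive_loss(encoder(x1), encoder(x2), same)
loss.backward()
```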
Yann LeCun (Chief AI Scientist) 01:16.230
And we came up with this idea because someone came to us and said, like, can you encode signatures, someone, you know, drawing a signature on a tablet, can you encode this in less than eighty bytes? Because if you can encode it in less
Yann LeCun (Chief AI Scientist) 01:33.630
than eighty bytes, we can write it on the magnetic strip of a credit card, so we can do signature verification for credit cards, right? And so I came up with this idea of training a neural net to produce eighty variables that we would quantize, one byte each, and then
Yann LeCun (Chief AI Scientist) 01:54.080
training it to kind of do this thing.
Ravid Shwartz-Ziv (Assistant Professor) 01:57.110
And did they use it? So it worked?
Yann LeCun (Chief AI Scientist) 01:59.110
Really well. And they showed it to their, you know, business people, who said, oh, we're just going to ask people to type PIN codes. So we learned a lesson there, like, about how you can integrate technology, right? And, you know, I knew this thing was kind of fishy in the first place,
Yann LeCun (Chief AI Scientist) 02:17.440
because, like, you know, there were countries in Europe that were using smart cards, right, and it was much better, but they just didn't want to use smart cards for some reason. Anyway, so we had this technology, and in the mid-two-thousands I worked with two of my students to revive
Yann LeCun (Chief AI Scientist) 02:35.150
this idea. We came up with new objective functions to train those. So these are what people now call contrastive methods, or a special case of contrastive methods: you have, like, positive examples and negative examples, and, you know, on positive examples you train the system
Yann LeCun (Chief AI Scientist) 02:48.430
to produce low energy, and for negative samples you train it to produce higher energy, where the energy is the distance between the representations. So we had two papers at CVPR in two thousand five and two thousand six, by Raia Hadsell, who is now the head of the foundations group at DeepMind, the sort of
Yann LeCun (Chief AI Scientist) 03:07.270
FAIR-like division of DeepMind, if you want, and Sumit Chopra, who is actually a faculty member here at NYU now, working on medical imaging. And so this gathered a bit of interest in the community and sort of revived a little bit of work on those ideas, but it still wasn't working very
Yann LeCun (Chief AI Scientist) 03:27.150
well. Those contrastive methods were really producing representations of images, for example, that were relatively low-dimensional. If we measured, like, the eigenvalue spectrum of the covariance matrix of the representations that came out of those things, it would fill up maybe two hundred dimensions, never more, even training on ImageNet and things like that, even with data augmentation.
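A rough sketch of the diagnostic being described here, assuming the embeddings are collected in a single tensor: form the covariance matrix of the representations and count how many eigenvalues carry non-negligible variance. The function name and threshold are illustrative, not the exact protocol used.

```python
import torch

def effective_dimensions(z, rel_threshold=0.01):
    """Count eigenvalues of the representation covariance matrix that are at least
    `rel_threshold` times the largest one (illustrative threshold)."""
    z = z - z.mean(dim=0, keepdim=True)        # center the representations
    cov = (z.T @ z) / (z.shape[0] - 1)         # (D, D) covariance matrix
    eigvals = torch.linalg.eigvalsh(cov)       # real eigenvalues in ascending order
    return int((eigvals >= rel_threshold * eigvals[-1]).sum())

z = torch.randn(10_000, 512)        # stand-in for embeddings of a large image set
print(effective_dimensions(z))      # a collapsed encoder concentrates variance in few directions
```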
Yann LeCun (Chief AI Scientist) 03:41.670
And so that was kind of disappointing. It did work OK, there was a bunch of papers on this and it worked OK, and there was a paper, SimCLR, that
Yann LeCun (Chief AI Scientist) 03:58.110
demonstrated you could get decent performance with contrastive training applied to Siamese nets. But then, about five years ago, one of my postdocs at Meta, Stéphane Deny, tried an idea that at first I didn't think would work, which was to essentially have some
Yann LeCun (Chief AI Scientist) 04:23.150
measure of the quantity of information that comes out of the encoder and then try to maximize that, OK. And the reason I didn't think it would work is because I'd seen a lot of experiments along those lines that Geoff Hinton was doing in the nineteen eighties, trying to maximize
Yann LeCun (Chief AI Scientist) 04:41.270
information. And you can never maximize information, because you never have appropriate measures of information content that are a lower bound. If you want to maximize something, you either want to be able to compute it or you want a lower bound on it, so you can push it up, right? And
Yann LeCun (Chief AI Scientist) 04:59.570
for information content we only have upper bounds. So I always thought this was completely hopeless. And then, you know, Stéphane came up with a technique which was called Barlow Twins. Barlow is a famous theoretical neuroscientist who came up with the idea of information maximization.
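A compact sketch of the Barlow Twins objective (following the published description by Zbontar et al., not the authors' exact code): the cross-correlation matrix between the embeddings of two views is pushed toward the identity, diagonal entries toward one (the views agree) and off-diagonal entries toward zero (redundancy reduction), which serves as a practical proxy for maximizing the information carried by the embedding.

```python
import torch

def barlow_twins_loss(z1, z2, lam=5e-3):
    """Barlow Twins-style loss on two batches of embeddings of shape (N, D)."""
    n, _ = z1.shape
    # Standardize each dimension over the batch, then form the (D, D) cross-correlation matrix.
    z1 = (z1 - z1.mean(dim=0)) / (z1.std(dim=0) + 1e-6)
    z2 = (z2 - z2.mean(dim=0)) / (z2.std(dim=0) + 1e-6)
    c = (z1.T @ z2) / n
    # Diagonal toward 1 (the two views agree), off-diagonal toward 0 (dimensions decorrelated).
    on_diag = (torch.diagonal(c) - 1).pow(2).sum()
    off_diag = (c - torch.diag(torch.diagonal(c))).pow(2).sum()
    return on_diag + lam * off_diag

z1, z2 = torch.randn(256, 128), torch.randn(256, 128)   # embeddings of two augmented views
print(barlow_twins_loss(z1, z2))
```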
Yann LeCun (Chief AI Scientist) 05:20.790
And it kind of worked. It was, wow. So then I said, like, we have to push this, right? So we came up with another method, with a student of mine, Adrien Bardes, co-advised with Jean Ponce, who is affiliated with NYU: a technique called VICReg, variance-invariance-covariance regularization.
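A simplified sketch of the three terms that VICReg's name spells out (per the published paper by Bardes, Ponce, and LeCun; the coefficients and details here are illustrative): an invariance term that matches the two views, a variance term that keeps every embedding dimension's standard deviation above a target so the output cannot collapse, and a covariance term that decorrelates the dimensions.

```python
import torch
import torch.nn.functional as F

def vicreg_loss(z1, z2, sim_w=25.0, var_w=25.0, cov_w=1.0):
    """Simplified VICReg-style loss on two batches of embeddings of shape (N, D)."""
    n, d = z1.shape

    def off_diagonal_sq(m):
        """Sum of squared off-diagonal entries, scaled by the dimension."""
        return (m - torch.diag(torch.diagonal(m))).pow(2).sum() / d

    # Invariance: embeddings of the two views should match.
    inv = F.mse_loss(z1, z2)

    # Variance: keep the std of every dimension above 1 so the output cannot collapse.
    std1 = torch.sqrt(z1.var(dim=0) + 1e-4)
    std2 = torch.sqrt(z2.var(dim=0) + 1e-4)
    var = F.relu(1.0 - std1).mean() + F.relu(1.0 - std2).mean()

    # Covariance: decorrelate the embedding dimensions of each view.
    z1c, z2c = z1 - z1.mean(dim=0), z2 - z2.mean(dim=0)
    cov = off_diagonal_sq((z1c.T @ z1c) / (n - 1)) + off_diagonal_sq((z2c.T @ z2c) / (n - 1))

    return sim_w * inv + var_w * var + cov_w * cov

z1, z2 = torch.randn(256, 128), torch.randn(256, 128)   # embeddings of two augmented views
print(vicreg_loss(z1, z2))
```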
Yann LeCun (Chief AI Scientist) 05:42.240
And that turned out to be simpler and to work even better. And since then we've made progress. And recently, you know, I discussed an idea with Randall Balestriero that he pushed and made practical; it's called SIGReg, and the whole system, he's responsible for the name, is called
Yann LeCun (Chief AI Scientist) 06:00.470
LeJEPA, latent-Euclidean JEPA, right? Yeah. And SIGReg has to do with sort of making sure that the distribution of vectors that come out of the encoder is an isotropic Gaussian, that's the I and the G. So, I mean, there's a lot of things happening in this domain which are really cool.
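As I understand it, the actual SIGReg objective in the LeJEPA paper is built from statistical tests on random one-dimensional projections of the embeddings; the toy sketch below only illustrates the underlying idea, nudging random 1-D projections toward zero mean and unit variance so the embedding distribution looks isotropic, and is not the published method.

```python
import torch

def isotropic_penalty(z, num_projections=64):
    """Toy penalty (not the published SIGReg): project the embeddings onto random unit
    directions and penalize deviations of each projection's mean from 0 and variance from 1.
    It is near zero when the embeddings have zero mean and identity covariance."""
    d = z.shape[1]
    directions = torch.randn(d, num_projections)
    directions = directions / directions.norm(dim=0, keepdim=True)   # unit-norm directions
    p = z @ directions                                               # (N, num_projections)
    return p.mean(dim=0).pow(2).mean() + (p.var(dim=0) - 1.0).pow(2).mean()

z = torch.randn(512, 256)        # stand-in for encoder outputs
print(isotropic_penalty(z))
```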
Yann LeCun (Chief AI Scientist) 06:29.590
I think there's going to be some more progress over the next year or two, and we'll get a lot of experience with this. And I think that's a really good, promising set of techniques to train models that learn abstract representations, which I think is key.