Janna Levin (Professor of Physics and Astronomy) 00:00.200
So, I could play devil's advocate and say, "Well, how do I know that what a human being is doing is that much different, right? We're trained on lots of language. We get some dopamine hit or some reward for having said the right word at the right time and the
Janna Levin (Professor of Physics and Astronomy) 00:16.580
right grammatical structure for the language that we're immersed in.
Janna Levin (Professor of Physics and Astronomy) 00:21.380
And we backpropagate; we try to do a better job the next time. In some sense, how is that different from what a human being is doing? And you were saying maybe it's the sensory experience of being immersed in the world?
Yann LeCun (Chief AI Scientist) 00:35.940
Okay. A typical LLM, as I mentioned, is trained on tens of trillions of words.
Janna Levin (Professor of Physics and Astronomy) 00:47.220
There's only a few hundred thousand words in the language, though. You're just seeing sentences, combinations of them.
Yann LeCun (Chief AI Scientist) 00:51.140
No, it's 30 trillion. 30 trillion words is a typical size for the pre-training set of an LLM. A word is actually represented as a sequence of tokens, but that doesn't really matter. A token is about three bytes, so the total is about 10 to the 14 bytes,
Yann LeCun (Chief AI Scientist) 01:11.020
right? A one followed by 14 zeros, in bytes of training data, to train those LLMs.
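LeCun's figure is easy to check with the round numbers he states in the conversation (30 trillion words, roughly one token per word, about three bytes per token):

```python
# Sanity check on the training-set size, using the figures stated above:
# ~30 trillion words, treated as roughly one 3-byte token each.
words = 30e12
bytes_per_token = 3
total_bytes = words * bytes_per_token
print(f"{total_bytes:.1e} bytes")  # 9.0e+13 bytes, i.e. on the order of 10^14
```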
Yann LeCun (Chief AI Scientist) 01:16.620
And that corresponds to basically all the text that is publicly available on the internet, plus some other stuff, and it would take any of us something like half a million years to read through that material, right? So, it's an enormous amount of
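The half-million-year figure can be reproduced; the reading speed and hours per day below are my assumptions, not numbers stated in the conversation:

```python
# How long would it take one person to read ~30 trillion words?
# Assumed: 250 words per minute, 8 hours of reading per day, every day.
words = 30e12
words_per_minute = 250
hours = words / words_per_minute / 60       # ~2 billion hours of reading
years = hours / (8 * 365)                   # roughly half a million years
print(f"{years:.0f} years")
```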
Yann LeCun (Chief AI Scientist) 01:31.580
textual data. Now,
Yann LeCun (Chief AI Scientist) 01:33.140
compare this with what a child perceives during the first few years of life. Psychologists tell us that a 4-year-old has been awake a total of 16,000 hours. And there's about one byte per second going through each fiber of the optic nerve, and we have 2
Yann LeCun (Chief AI Scientist) 01:56.460
million of them.
Yann LeCun (Chief AI Scientist) 01:57.340
So, it's about 2 megabytes per second getting to the visual cortex. Do the math over 16,000 hours and it's about 10 to the 14 bytes. A 4-year-old has seen as much visual data as the biggest LLM trained on all the text ever produced.
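The optic-nerve arithmetic checks out with the numbers as stated (2 million fibers, about one byte per second per fiber, 16,000 waking hours):

```python
# Visual data reaching a 4-year-old's cortex, using the figures above.
fibers = 2e6                        # optic nerve fibers, as stated
bytes_per_second = fibers * 1       # ~2 megabytes per second
seconds_awake = 16_000 * 3600       # 16,000 hours
total_bytes = bytes_per_second * seconds_awake
print(f"{total_bytes:.2e} bytes")   # 1.15e+14 bytes, again ~10^14
```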
Yann LeCun (Chief AI Scientist) 02:17.820
And so what it tells you is that there is way more information in the real world, but it's also much more complicated. It's noisy, it's high-dimensional, it's continuous, and basically the methods that are employed to train LLMs do not work in the real world.
Yann LeCun (Chief AI Scientist) 02:36.020
That explains why we have LLMs that can pass the bar exam, solve equations, compute integrals like college students, and solve math problems, but we still don't have a domestic robot that can, you know, do the chores in the house. We don't even have level-five self-
Yann LeCun (Chief AI Scientist) 02:52.700
driving cars. I mean, we have them, but we cheat. So,
Yann LeCun (Chief AI Scientist) 02:56.500
I mean, we certainly don't have self-driving cars that can learn to drive in 20 hours of practice like any teenager, right? So, obviously, we're missing something very big to get machines to the level of human or even animal intelligence. Let's not even talk about language.
Yann LeCun (Chief AI Scientist) 03:12.060
Let's talk about how a cat or a dog is intelligent.
Yann LeCun (Chief AI Scientist) 03:16.060
We're not even at that level with AI systems.
Janna Levin (Professor of Physics and Astronomy) 03:20.460
Adam, you think we should ascribe more comprehension to the LLMs at this point already?
Adam Brown (Research Scientist) 03:31.220
I think that's right. I mean, Yann is making excellent points: the LLMs are much less sample-efficient than humans, for example. Humans, or indeed a cat, like the smart cat in your
Adam Brown (Research Scientist) 03:47.580
example, are able to learn from many fewer examples than a large language model can;
Adam Brown (Research Scientist) 03:57.220
it takes way more data to teach a model to the same level of proficiency. And that's true, and that is a thing that is better about the architecture of animal minds compared to these artificial minds that we're building. On the other hand, sample
Adam Brown (Research Scientist) 04:12.540
efficiency isn't everything.
Adam Brown (Research Scientist) 04:15.940
We saw this frequently, in fact, before large language models, when we tried to make artificial minds to do other tasks. Even the famous chess bots we built were built on similar kinds of neural networks. The way
Adam Brown (Research Scientist) 04:32.660
they were trained, AlphaZero and various other ones, was self-play:
Adam Brown (Research Scientist) 04:37.540
the system would play itself at chess a huge number of times. To begin with, it would just be making random moves, and then every time it won or lost a game against itself, it would, you know, reward that neural pathway or punish that neural pathway, and
Adam Brown (Research Scientist) 04:52.340
it would play itself at chess again and again. And when it had played as many games as a human grandmaster has played, it was still making essentially random moves.
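The self-play loop Brown describes can be sketched in miniature. This is an illustrative toy, not AlphaZero: a one-pile Nim game stands in for chess, and a simple value table stands in for the neural network, but the structure is the same, random moves at first, with wins and losses propagated back along the positions played so that "reward that pathway or punish that pathway" becomes a table update:

```python
import random

random.seed(0)

# Toy self-play: one-pile Nim (take 1-3 objects; whoever takes the
# last one wins), with a value table instead of a neural network.
values = {}  # pile size -> estimated value for the player to move

def pick_move(pile):
    moves = [m for m in (1, 2, 3) if m <= pile]
    if random.random() < 0.2:          # explore: occasional random move
        return random.choice(moves)
    # exploit: leave the opponent the worst position we know of
    return max(moves, key=lambda m: -values.get(pile - m, 0.0))

def play_one_game(start=10):
    pile, history = start, []
    while pile > 0:
        history.append(pile)
        pile -= pick_move(pile)
    # the player who just moved took the last object and won;
    # propagate +1/-1 back through the positions, alternating sides
    reward = 1.0
    for state in reversed(history):
        old = values.get(state, 0.0)
        values[state] = old + 0.1 * (reward - old)
        reward = -reward

for _ in range(5000):   # at first the moves are essentially random;
    play_one_game()     # the win/loss signal slowly shapes the table
```

On this tiny game the trial-and-error signal converges quickly; the table tends to learn, for instance, that facing a pile of 4 is a losing position, while facing a pile of 1 is a certain win. The point of the anecdote is that with a chess-sized state space, the same signal leaves the player near-random even after a grandmaster's lifetime of games.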