Ilya Sutskever (Co-founder and Chief Scientist) 00:00.740
I think there are some similarities. The amount of pre-training data is staggering. And somehow a human being, even after 15 years with a tiny fraction of the pre-training data, knows much less. But whatever they do know, they know much more
Ilya Sutskever (Co-founder and Chief Scientist) 00:31.920
deeply somehow. And already at that age, you would not make the kinds of mistakes that our AIs make. There is another thing you might say: could it be something like evolution? And the answer is maybe. But in this case, I think evolution might actually have an
Ilya Sutskever (Co-founder and Chief Scientist) 00:44.640
edge. I remember reading about a case. One way in which neuroscientists can learn about the brain is by studying people with brain damage to different parts of the brain. And
Ilya Sutskever (Co-founder and Chief Scientist) 01:04.840
some people have the strangest symptoms you could imagine. It's really interesting. There is one case that comes to mind that's relevant. I read about a person who had some kind of brain damage, I think a stroke or an accident, that took
Ilya Sutskever (Co-founder and Chief Scientist) 01:22.520
out his emotional processing. He stopped feeling any emotion. He still remained very articulate, he could solve little puzzles, and on tests he seemed to be just fine. But he felt no emotion. He didn't feel sad. He didn't feel anger.
Ilya Sutskever (Co-founder and Chief Scientist) 01:42.640
He didn't feel animated. And he became extremely bad at making any decisions at all. It would take him hours to decide which socks to wear, and he would make very bad financial decisions. What does that say about the role of our built-in
Ilya Sutskever (Co-founder and Chief Scientist) 02:06.120
emotions in making us a viable agent, essentially? And I guess, to connect to your question about pre-training: maybe if you're good enough at getting everything out of pre-training, you could get that as well. But that's the
Ilya Sutskever (Co-founder and Chief Scientist) 02:23.280
kind of thing which seems... well, it may or may not be possible to get that from pre-training.
Dwarkesh Patel (Host) 02:33.000
Pain, what is that? Clearly it's not just directly an emotion. It seems like some almost value-function-like thing which is telling you which decision to make, what the end reward for any decision should be. And you think that doesn't sort of implicitly come from pre-training?
Ilya Sutskever (Co-founder and Chief Scientist) 02:53.280
I think it could. I'm just saying it's not 100% obvious.
Dwarkesh Patel (Host) 02:58.360
But what is that? How do you think about emotions? What is the ML analogy for emotions?
Ilya Sutskever (Co-founder and Chief Scientist) 03:04.240
It should be some kind of a value function thing. But I don't think there is a great analogy, because right now value functions don't play a very prominent role in the things people do.
Dwarkesh Patel (Host) 03:14.440
It might be worth defining for the audience what a value function is, if you want to do that.
Ilya Sutskever (Co-founder and Chief Scientist) 03:18.000
Certainly, I'll be very happy to do that. So when people do reinforcement learning, how is it done right now? How do people train those agents? You have a neural net and you give it a problem, and then you tell the
Ilya Sutskever (Co-founder and Chief Scientist) 03:38.320
model to go solve it. The model takes maybe thousands or hundreds of thousands of actions or thoughts, and then it produces a solution. The solution is graded, and the score is used to provide a training signal for every single action in your trajectory. So
Ilya Sutskever (Co-founder and Chief Scientist) 03:58.240
that means that if you are doing something that goes on for a long time, if you're training on a task that takes a long time to solve, you will do no learning at all until you come up with the proposed solution. That's how reinforcement learning is done naively.
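A minimal sketch of that naive recipe, assuming a generic REINFORCE-style setup: the model produces a long trajectory, only the finished solution is graded, and that single score is pushed back as the training signal for every action taken. The names `policy`, `env`, and `grade_solution` are hypothetical stand-ins, not any particular lab's implementation.

```python
import torch

def naive_rl_update(policy, env, grade_solution, optimizer, max_steps=100_000):
    """One naive RL update: grade only the final solution, credit every action with it."""
    log_probs = []
    state = env.reset()
    for _ in range(max_steps):                   # possibly thousands of actions/thoughts
        dist = policy(state)                     # distribution over next actions
        action = dist.sample()
        log_probs.append(dist.log_prob(action))
        state, done = env.step(action)
        if done:                                 # a proposed solution was produced
            break

    # No learning has happened up to this point.
    reward = grade_solution(state)               # one scalar grade for the whole attempt

    # The same terminal score becomes the signal for every single action in the trajectory.
    loss = -reward * torch.stack(log_probs).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```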
Ilya Sutskever (Co-founder and Chief Scientist) 04:13.600
That's how o1 and R1 are ostensibly done. The value function says something like, "Okay, look, maybe I could sometimes, not always, tell you if you're doing well or badly." The notion of a value function is more useful in some domains than in others. For example, when you play
Ilya Sutskever (Co-founder and Chief Scientist) 04:32.680
chess and you lose a piece, you know you messed up. You don't need to play the whole game to know that what you just did was bad, and therefore whatever preceded it was also bad. So the value function lets you short-circuit the wait until the very end. Let's
Ilya Sutskever (Co-founder and Chief Scientist) 04:53.160
suppose that you are doing some kind of a math thing or a programming thing, and you're trying to explore a particular solution direction. After, let's say, 1,000 steps of thinking, you conclude that this
Ilya Sutskever (Co-founder and Chief Scientist) 05:09.440
direction is unpromising. As soon as you conclude this, you can already get a reward signal a thousand steps earlier, back when you decided to pursue this path: "Oh, next time, I shouldn't pursue this path in a similar situation." Long before you actually came
Ilya Sutskever (Co-founder and Chief Scientist) 05:28.220
up with the proposed solution.
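For contrast with the naive sketch above, here is a minimal sketch of how a value function can short-circuit that wait. It uses a generic one-step temporal-difference (actor-critic) update rather than any specific production recipe: a learned critic scores intermediate states, and a drop in that score (losing a piece, concluding a direction is unpromising) immediately becomes a learning signal for the decision that led there. `policy`, `critic`, and `env` are hypothetical stand-ins.

```python
import torch

def td_update(policy, critic, env, optimizer, state):
    """One step of actor-critic learning: credit the action as soon as the value estimate moves."""
    dist = policy(state)
    action = dist.sample()
    next_state, reward, done = env.step(action)

    with torch.no_grad():
        bootstrap = torch.zeros(()) if done else critic(next_state)   # value of where we ended up

    # TD error: how much better or worse things look than the critic expected.
    # If we just "lost a piece", this goes negative right away.
    td_error = reward + bootstrap - critic(state)

    critic_loss = td_error.pow(2)                               # improve the value estimates
    policy_loss = -td_error.detach() * dist.log_prob(action)    # credit the action immediately

    optimizer.zero_grad()
    (critic_loss + policy_loss).backward()
    optimizer.step()
    return next_state, done
```

The key difference from the naive version is that this update can fire at every step, long before a final graded solution exists.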