Ilya Sutskever – We're moving from the age of scaling to the age of research - part 3/17
2025-11-25_17-29 • 1h 36m 3s
Ilya Sutskever (Co-founder and Chief Scientist)
00:00.000
I think there are some similarities between both of these two, and pre-training tries to play the role of both of these. But I think there are some big differences as well. The amount of pre-training data is very, very staggering.
Dwarkesh Patel (Host)
00:17.244
Yes.
Ilya Sutskever (Co-founder and Chief Scientist)
00:18.150
And somehow a human being, even after 15 years with a tiny fraction of the pre-training data, knows much less; but whatever they do know, they know much more deeply, somehow. And the mistakes: already at that age, you would not make the mistakes that our AIs make.
Ilya Sutskever (Co-founder and Chief Scientist)
00:37.060
There is another thing you might say: could it be something like evolution? And the answer is maybe. But in this case, I think evolution might actually have an edge. I remember reading about a case. One way in which neuroscientists can learn about the brain is by studying people with brain damage to different parts of the brain. Some people have the strangest symptoms you could imagine; it's actually really, really interesting. And there is one case that comes to mind that's relevant.
I read about this person who had some kind of brain damage, I think from a stroke or an accident, that took out his emotional processing, so he stopped feeling any emotion. He still remained very articulate, he could solve little puzzles, and on tests he seemed to be just fine. But he felt no emotion. He didn't feel sad, he didn't feel anger, he didn't feel animated. And he became somehow extremely bad at making any decisions at all. It would take him hours to decide which socks to wear, and he would make very bad financial decisions. What does that say about the role of our built-in emotions in making us viable agents, essentially?

And I guess to connect to your question about pre-training: maybe if you're good enough at getting everything out of pre-training, you could get that as well. But it may or may not be possible to get that from pre-training.
Dwarkesh Patel (Host)
02:33.340
What is that? It's clearly not just emotion directly. It seems like some almost value-function-like thing that's telling you which decision to make, what the end reward for any decision should be. And you think that doesn't sort of implicitly come from...
Ilya Sutskever (Co-founder and Chief Scientist)
02:52.540
I think it could. I'm just saying it's not 100% obvious.
Dwarkesh Patel (Host)
02:56.580
Yeah. But what is that? How do you think about emotions? What is the ML analogy for emotions?
Ilya Sutskever (Co-founder and Chief Scientist)
03:03.500
It should be some kind of a value function thing. But I don't think there is an ML analogy, because right now value functions don't play a very prominent role in the things people do.
Dwarkesh Patel (Host)
03:13.700
It might be worth defining for the audience what a value function is, if you want to do that.
Ilya Sutskever (Co-founder and Chief Scientist)
03:17.260
I mean, certainly, I'll be very happy to do that. So when people do reinforcement learning, the way reinforcement learning is done right now, how do people train those agents? You have a neural net and you give it a problem. And then you tell the model, "Go solve it." The model takes maybe thousands, hundreds of thousands, of actions or thoughts or something, and then it produces a solution. The solution is graded, and then the score is used to provide a training signal for every single action in your trajectory.
Ilya Sutskever (Co-founder and Chief Scientist)
03:57.500
So that means that if you are doing something that goes for a long time, if you're training on a task that takes a long time to solve, you will do no learning at all until you've come up with a proposed solution. That's how reinforcement learning is done naively, and that's ostensibly how o1 and R1 are done.
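A minimal sketch of that naive loop, for concreteness. Everything here, the toy environment, the grader, and the network sizes, is illustrative rather than anything from the conversation; the point is just that one terminal score becomes the gradient signal for every action in the trajectory:

```python
import torch
import torch.nn as nn

class ToyEnv:
    """Hypothetical task: the episode runs for a fixed horizon and the
    whole trajectory receives a single grade only at the very end."""
    def __init__(self, horizon=20):
        self.horizon = horizon
    def reset(self):
        self.t, self.actions = 0, []
        return torch.zeros(16)
    def step(self, action):
        self.t += 1
        self.actions.append(action)
        return torch.zeros(16), self.t >= self.horizon
    def grade_solution(self):
        # Toy grader: fraction of steps that picked action 0.
        return sum(a == 0 for a in self.actions) / len(self.actions)

policy = nn.Sequential(nn.Linear(16, 64), nn.Tanh(), nn.Linear(64, 4))
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)

env = ToyEnv()
obs, log_probs, done = env.reset(), [], False
while not done:
    dist = torch.distributions.Categorical(logits=policy(obs))
    action = dist.sample()
    log_probs.append(dist.log_prob(action))
    obs, done = env.step(action.item())

# No learning happened during the episode; the single terminal score
# is broadcast as the training signal for every action taken.
score = env.grade_solution()
loss = -score * torch.stack(log_probs).sum()
optimizer.zero_grad()
loss.backward()
optimizer.step()
```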
The value function says something like, "Okay, look, maybe I could sometimes, not always, tell you if you're doing well or badly." The notion of a value function is more useful in some domains than in others. For example, when you play chess and you lose a piece, you know you messed up. You don't need to play the whole game to know that what you just did was bad, and that therefore whatever preceded it was also bad. So the value function lets you short-circuit the wait until the very end.
Suppose you're doing some kind of a math thing or a programming thing, and you're trying to explore a particular solution direction. After, let's say, 1,000 steps of thinking, you conclude that this direction is unpromising. As soon as you conclude this, you could already get a reward signal a thousand steps earlier, back when you decided to pursue this path: "Oh, next time I shouldn't pursue this path in a similar situation." That's long before you actually came up with the proposed solution.
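A sketch of how a learned value function could supply that earlier signal, again with purely illustrative names and shapes. Instead of waiting for the graded solution, each step's value estimate is trained toward the bootstrapped value of the next step (a temporal-difference update), so concluding at step 1,000 that a direction is unpromising gets propagated back toward the decision that chose it:

```python
import torch
import torch.nn as nn

# Hypothetical value head: maps a state to a scalar "how well is this going?"
value_fn = nn.Sequential(nn.Linear(16, 64), nn.Tanh(), nn.Linear(64, 1))
opt = torch.optim.Adam(value_fn.parameters(), lr=1e-3)

def td_update(states, rewards, gamma=0.99):
    """One temporal-difference pass over a trajectory.

    states:  list of 16-dim state tensors, one per thinking step
    rewards: per-step rewards (here mostly zero; the judgment that the
             direction is unpromising shows up as a low final reward)
    """
    loss = torch.zeros(1)
    for t in range(len(states) - 1):
        with torch.no_grad():
            # Bootstrapped target: immediate reward plus next state's value.
            target = rewards[t] + gamma * value_fn(states[t + 1])
        loss = loss + (value_fn(states[t]) - target).pow(2)
    opt.zero_grad()
    loss.backward()
    opt.step()

# 1,000 thinking steps; the dead end is only recognized at the last one,
# but repeated TD passes carry that signal back toward the earlier
# decisions without ever playing the episode out to a graded solution.
states = [torch.randn(16) for _ in range(1001)]
rewards = [0.0] * 999 + [-1.0]
td_update(states, rewards)
```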