Ilya Sutskever – We're moving from the age of scaling to the age of research - part 4/17
2025-11-25_17-29 • 1h 36m 3s
Dwarkesh Patel (Host)
00:00.180
This was in the DeepSeek R1 paper: the space of trajectories is so wide that maybe it's hard to learn a mapping from an intermediate trajectory to a value. And also, in coding for example, you'll have the wrong idea, then you'll go back, then you'll
Dwarkesh Patel (Host)
00:19.820
change something.
Ilya Sutskever (Co-founder and Chief Scientist)
00:21.180
This sounds like such a lack of faith in deep learning. Sure, it might be difficult, but it's nothing deep learning can't do. So my expectation is that value functions should be useful, and I fully expect that they will be used in the future, if not already.
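To make the idea being debated concrete, here is a minimal sketch of what a value function does in this setting: it maps an intermediate state of a trajectory to an estimate of the reward the trajectory will eventually earn, so credit can be assigned before the episode finishes. The toy states, the end-of-episode reward, and the tabular TD(0) update below are illustrative assumptions, not anything specified in the conversation.

```python
# Illustrative sketch (not from the interview): a tabular TD(0) value function,
# i.e. a learned mapping from an intermediate state to an estimate of the
# reward the trajectory will eventually earn.
import random

def td0_value_estimates(episodes, gamma=1.0, alpha=0.1):
    """episodes: list of trajectories, each a list of (state, reward) pairs."""
    V = {}  # state -> estimated value
    for episode in episodes:
        for t in range(len(episode) - 1):
            s, _ = episode[t]
            s_next, r_next = episode[t + 1]
            # TD(0): move V(s) toward the bootstrapped target r + gamma * V(s').
            target = r_next + gamma * V.get(s_next, 0.0)
            V[s] = V.get(s, 0.0) + alpha * (target - V.get(s, 0.0))
    return V

# Toy usage: a coding-like task where reward arrives only at the very end.
random.seed(0)
episodes = [[("draft", 0.0), ("revise", 0.0), ("done", random.choice([0.0, 1.0]))]
            for _ in range(1000)]
print(td0_value_estimates(episodes))  # intermediate states roughly track the 0.5 mean reward
```

The concern raised in the question is that for very wide trajectory spaces, like real coding sessions with backtracking, learning this mapping from intermediate states to values may be hard; the reply above is that this is still squarely something deep learning can learn.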
Ilya Sutskever (Co-founder and Chief Scientist)
00:42.620
What I was alluding to with the person whose emotional center got damaged is more that maybe what it suggests is that the value function of humans is modulated by emotions in some important way that's hardcoded by evolution. And maybe that is important for people to be
Ilya Sutskever (Co-founder and Chief Scientist)
01:06.860
effective in the world.
Dwarkesh Patel (Host)
01:08.500
That's the thing I was actually planning on asking you. There's something really interesting about emotions as the value function, which is that it's impressive that they have this much utility while still being rather simple to understand.
Ilya Sutskever (Co-founder and Chief Scientist)
01:23.980
So I have two responses. I do agree that, compared to the kind of things that we learn, and the things we are talking about and the ways we are talking about them, emotions are relatively simple. They might even be so simple that maybe you could map them out in a human-
Ilya Sutskever (Co-founder and Chief Scientist)
01:42.700
understandable way. I think it would be cool to do. In terms of utility, though, I think there is a complexity-robustness trade-off, where complex things can be very useful, but simple things are very useful in a very broad range of situations.
Ilya Sutskever (Co-founder and Chief Scientist)
02:07.060
And so one way to interpret what we are seeing is that we've got these emotions that essentially evolved mostly from our mammal ancestors and then were fine-tuned a little bit while we were hominids, just a bit. We do have a decent amount of social
Ilya Sutskever (Co-founder and Chief Scientist)
02:22.860
emotions, though, which other mammals may lack. But they're not very sophisticated, and because they're not sophisticated, they serve us so well in this world, which is very different from the one we evolved in. Actually, they also make mistakes. For example, our
Ilya Sutskever (Co-founder and Chief Scientist)
02:39.140
emotions, well, I don't know if hunger counts as an emotion, that's debatable, but I think, for example, our intuitive feeling of hunger is not succeeding in guiding us correctly in this world with an abundance of food.
Dwarkesh Patel (Host)
02:57.180
Yeah. People have been talking about scaling data, scaling parameters, scaling compute. Is there a more general way to think about scaling? What are the other scaling axes?
Ilya Sutskever (Co-founder and Chief Scientist)
03:08.660
So here's a perspective I think might be true. The way ML used to work is that people would just tinker with stuff and try to get interesting results. That's what was going on in the past. Then the
Ilya Sutskever (Co-founder and Chief Scientist)
03:36.340
scaling insight arrived: scaling laws, GPT-3, and suddenly everyone realized we should scale. This is an example of how language affects thought. "Scaling" is just one word, but it's such a powerful word because it informs people what to do. They
Ilya Sutskever (Co-founder and Chief Scientist)
03:59.220
say, "Okay, let's try to scale things." And so you say, "Okay, so what are we scaling?" And pre-training was the thing to scale. It was a particular scaling recipe. The big breakthrough of pre-training was the realization that this recipe is good. So you say, "Hey, if
Ilya Sutskever (Co-founder and Chief Scientist)
04:17.180
you mix some compute with some data into a neural net of a certain size, you will get results, and you know it will be better if you just scale the recipe up." And this is also great: companies love this because it gives you a very low-risk way of investing your
Ilya Sutskever (Co-founder and Chief Scientist)
04:38.900
resources. Right? It's much harder to invest your resources in research. Compare that to doing research: you need the researchers to go forth and research and come up with something, versus get more data, get more compute, and you know you'll get something from
Ilya Sutskever (Co-founder and Chief Scientist)
04:54.780
pre-training. And indeed, based on various things people say, some of them on Twitter, it appears that Gemini may have found a way to get more out of pre-training. At some point, though, pre-training will run out of data. The data is very
Ilya Sutskever (Co-founder and Chief Scientist)
05:12.360
clearly finite. So then, okay, what do you do next? Either you do some kind of souped-up pre-training, a different recipe from the one you've done before, or you do RL, or maybe something else. But compute is now very big. In some sense, we are
Ilya Sutskever (Co-founder and Chief Scientist)
05:28.400
back to the age of research. So maybe here's another way to put it: from 2012 to 2020, it was the age of research. Now, from 2020 to 2025, it was the age of scaling, plus or minus, let's add error bars to those years. Because people say, "This is
Ilya Sutskever (Co-founder and Chief Scientist)
05:46.640
amazing, you've got to scale more, keep scaling." The one word: scaling. But now the scale is so big. Is the belief really, "It's so big, but if you had 100x more, everything would be so different"? It would be different, for sure. But is the belief that if
Ilya Sutskever (Co-founder and Chief Scientist)
06:05.480
you just 100x the scale, everything would be transformed? I don't think that's true. So it's back to the age of research again, just with big computers.
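To make the "scaling recipe" reasoning concrete, here is a minimal sketch of the extrapolation logic being questioned: fit a power law to loss versus compute and ask what 100x more compute buys. The pure power-law form and all the numbers below are made-up assumptions for illustration, not figures from the conversation.

```python
# Illustrative sketch with made-up numbers: the "scaling recipe" as extrapolation.
# Fit a pure power law, loss = a * compute**(-b), and ask what 100x compute buys.
import numpy as np

# Hypothetical (compute, loss) measurements in arbitrary units.
compute = np.array([1e18, 1e19, 1e20, 1e21, 1e22])
loss = np.array([3.10, 2.72, 2.45, 2.26, 2.12])

# Linear fit in log-log space: log(loss) = log(a) - b * log(compute).
slope, log_a = np.polyfit(np.log(compute), np.log(loss), 1)
a, b = np.exp(log_a), -slope

for scale in (1, 10, 100):
    c = compute[-1] * scale
    print(f"{scale:>4}x compute -> predicted loss {a * c ** (-b):.3f}")
```

The point made above is that the curve keeps improving, so 100x more compute would be different for sure, but it would not transform everything, which is why the argument turns back toward research rather than pure scale.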