Dwarkesh Patel (Host) 00:00.720
That's a very interesting way to put it. But let me ask you the question you just posed, then. What are we scaling, and what would it mean to have a recipe? Because I'm not aware of a very clean relationship, one that almost looks like a law of physics, which
Dwarkesh Patel (Host) 00:18.160
existed in pre-training. There was a power law between data, compute, or parameters and loss. What is the kind of relationship we should be seeking, and how should we think about what this new recipe might look like?
Ilya Sutskever (Co-founder and Chief Scientist) 00:32.800
So, we've already witnessed a transition from one type of scaling to a different type of scaling, from pre-training to RL. Now people are scaling RL. Based on what people say on Twitter, they spend more compute on RL than on pre-training at this point, because RL can
Ilya Sutskever (Co-founder and Chief Scientist) 00:55.240
actually consume quite a bit of compute. You know, you do very, very long rollouts.
Dwarkesh Patel (Host) 01:00.240
Yes.
Ilya Sutskever (Co-founder and Chief Scientist) 01:01.160
So it takes a lot of compute to produce those rollouts, and then you get a relatively small amount of learning per rollout, so you really can spend a lot of compute. At this point, it's more like I wouldn't even
Ilya Sutskever (Co-founder and Chief Scientist) 01:16.320
call it scaling. I would say, "Hey, what are you doing? Is the thing you are doing the most productive thing you could be doing?"
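[The point about rollouts consuming compute while yielding little learning can be made concrete with a rough back-of-envelope sketch. All numbers below are illustrative assumptions for the sketch, not figures from the conversation.]

```python
# Rough, illustrative accounting of why RL rollouts dominate compute.
# All constants below are made-up assumptions for the sketch.

def rollout_compute(params: float, tokens_per_rollout: float) -> float:
    """Approximate forward FLOPs to generate one rollout (~2*N FLOPs per token
    for an N-parameter model)."""
    return 2 * params * tokens_per_rollout

params = 100e9     # assumed 100B-parameter policy
tokens = 50_000    # assumed very long reasoning rollout

flops = rollout_compute(params, tokens)

# Each rollout typically yields roughly one scalar reward: a tiny learning
# signal relative to the compute spent producing it.
print(f"{flops:.2e} FLOPs spent for ~1 scalar reward per rollout")
```

Under these assumed numbers, a single rollout costs on the order of 10^16 forward FLOPs but produces only one reward signal, which is why RL can absorb enormous amounts of compute.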
Dwarkesh Patel (Host) 01:25.440
Yeah.
Ilya Sutskever (Co-founder and Chief Scientist) 01:25.880
Can you find a more productive way of using your compute? We've discussed the value function business earlier. Maybe once people get good at value functions, they'll be using their resources more productively. And what if you find a whole other way of training models?
Ilya Sutskever (Co-founder and Chief Scientist) 01:46.000
You could say, is this scaling, or is it just using your resources? I think it becomes a little bit ambiguous. Back in the age of research, people would say, "Hey, let's try this and this. Let's try that and that. Oh,
Ilya Sutskever (Co-founder and Chief Scientist) 01:58.800
look, something interesting is happening." And I think there will be a return to that.
Dwarkesh Patel (Host) 02:04.280
So if we're back in the era of research, stepping back, what is the part of the recipe that we need to think most about? When you say value function, people are already trying the current recipe but then having an LLM as a judge and so forth. You could say that's the value
Dwarkesh Patel (Host) 02:18.400
function, but it sounds like you have something much more fundamental in mind. Should we even rethink pre-training itself, and not just add more steps to the end of that process?
Ilya Sutskever (Co-founder and Chief Scientist) 02:30.400
So, the discussion about the value function was interesting. I want to emphasize that the value function is something that's going to make our RL more efficient, and I think that makes a difference. But anything you can do with a
Ilya Sutskever (Co-founder and Chief Scientist) 02:49.080
value function, you can do without, just more slowly. The thing I think is most fundamental is that these models somehow just generalize dramatically worse than people.
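[The claim that a value function makes RL more efficient, but that anything it does can be done without it, more slowly, matches the classic role of a value baseline in policy gradients: it lowers the variance of the gradient estimate without changing its expectation. The following toy two-action bandit is my own sketch, not something from the conversation.]

```python
# Toy illustration of the value-baseline effect in REINFORCE.
# Policy: Bernoulli(p) over two actions, parameterized by its logit theta.
# Environment (assumed for the sketch): fixed reward per action.
import random
import statistics

random.seed(0)
p = 0.5                       # probability of choosing action 1
rewards = {0: 1.0, 1: 2.0}    # toy rewards; mean reward (the "value") is 1.5

def grad_samples(baseline: float, n: int = 100_000) -> list[float]:
    """REINFORCE gradient samples: d/dtheta log pi(a) * (r - baseline).
    For a logit-parameterized Bernoulli, the score is (1-p) for action 1
    and -p for action 0."""
    out = []
    for _ in range(n):
        a = 1 if random.random() < p else 0
        score = (1 - p) if a == 1 else -p
        out.append(score * (rewards[a] - baseline))
    return out

no_base = grad_samples(baseline=0.0)
with_base = grad_samples(baseline=1.5)  # baseline = state value

# Same expected gradient (~0.25) either way; the baseline only cuts variance.
print(statistics.mean(no_base), statistics.mean(with_base))
print(statistics.pvariance(no_base), statistics.pvariance(with_base))
```

Both estimators point the policy in the same direction on average, but the baseline-free one has far higher variance, so it needs many more rollouts to learn the same thing, which is the "you can do without, just more slowly" point.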
Dwarkesh Patel (Host) 03:00.400
Yes.
Ilya Sutskever (Co-founder and Chief Scientist) 03:01.880
And it's super obvious. That seems like a very fundamental thing.
Dwarkesh Patel (Host) 03:06.880
Okay. So the crux is generalization, and there are two sub-questions. One is about sample efficiency: why should it take so much more data for these models to learn than for humans? The second is that, even separate from the amount of data it takes,
Dwarkesh Patel (Host) 03:24.160
there's a question of why it's so much harder to teach the thing we want to a model than to a human. With a human, we don't necessarily need a verifiable reward. You're probably mentoring a bunch of researchers right now: you're
Dwarkesh Patel (Host) 03:40.640
talking with them, you're showing them your code, and you're showing them how you think. From that, they're picking up your way of thinking and how they should do research. You don't have to set a verifiable reward for them, like, "Okay, this is the next part
Dwarkesh Patel (Host) 03:52.240
of their curriculum, and now this is the next part of their curriculum, and oh, this training was unstable." There's not this schleppy, bespoke process. Perhaps these two issues are actually related in some way. But I'd be curious to explore this second
Dwarkesh Patel (Host) 04:07.060
thing, which is more like continual learning, and the first thing, which feels more like sample efficiency.
Ilya Sutskever (Co-founder and Chief Scientist) 04:13.380
Yeah, so one possible explanation for human sample efficiency that needs to be considered is evolution. Evolution has given us a small amount of the most useful information possible. For things like vision, hearing, and
Ilya Sutskever (Co-founder and Chief Scientist) 04:36.300
locomotion, I think there's a pretty strong case that evolution actually has given us a lot.
Dwarkesh Patel (Host) 04:42.740
Mhm.
Ilya Sutskever (Co-founder and Chief Scientist) 04:43.500
So, for example, human dexterity far exceeds our robots'. I mean, robots can become dexterous too if you subject them to a huge amount of training in simulation. But training a robot in the real world to quickly pick up a new skill the way a person does seems very out of reach. And
Ilya Sutskever (Co-founder and Chief Scientist) 05:01.260
here you could say, "Oh yeah, locomotion." All our ancestors needed great locomotion, squirrels and so on. So for locomotion, maybe we've got some unbelievable prior. You could make the same case for vision. I believe Yann LeCun made the point that
Ilya Sutskever (Co-founder and Chief Scientist) 05:19.260
children learn to drive at 16 after about 10 hours of practice, which is true. But our vision is so good. At least for me, when I remember myself as a five-year-old, I was very excited about cars back then, and I'm pretty sure my car recognition was more than
Ilya Sutskever (Co-founder and Chief Scientist) 05:38.140
adequate for self-driving already as a five-year-old. You don't get to see that much data as a five-year-old; you spend most of your time in your parents' house, so you have very low data diversity. But you could say maybe that's evolution too. For language and math and coding, though,
Ilya Sutskever (Co-founder and Chief Scientist) 05:52.620
probably not.