Dwarkesh Patel (Host) 00:00.000
But is this contradicted by what human-like learning implies? Isn't the point that it can learn?
Ilya Sutskever (Co-founder and Chief Scientist) 00:03.880
It can, but you have accumulated learning, a big investment. You spent a lot of compute to become really, really phenomenal at this thing. And someone else spent a huge amount of compute and a huge amount of experience to get really, really
Ilya Sutskever (Co-founder and Chief Scientist) 00:19.040
good at some other thing. Right. You applied a lot of human-like learning to get there, but now you are at this high point where someone else would say, look, I don't want to start learning what you've learned.
Dwarkesh Patel (Host) 00:30.000
I guess this would require many different companies to begin with the human-like continual learning agent at the same time, so that they can start their research in different branches. But if one company gets that agent first, or gets that learner
Dwarkesh Patel (Host) 00:47.480
first, it does then seem like, well, if you just think about every single job in the economy,
Dwarkesh Patel (Host) 00:56.800
having an instance learn each one seems tractable for a single company.
Ilya Sutskever (Co-founder and Chief Scientist) 01:01.800
Yeah, that's a valid argument. My strong intuition is that it's not how it's going to go. The argument says it will go this way.
Dwarkesh Patel (Host) 01:12.160
Yeah.
Ilya Sutskever (Co-founder and Chief Scientist) 01:12.600
But my strong intuition is that it will not go this way. You know, in theory, there is no difference between theory and practice; in practice, there is. And I think this is going to be one of those cases.
Dwarkesh Patel (Host) 01:23.500
A lot of people's models of recursive self-improvement literally, explicitly state that we will have a million Ilyas in a server coming up with different ideas, and this will lead to a superintelligence emerging very fast. Do you have some intuition about how parallelizable
Dwarkesh Patel (Host) 01:38.300
the thing you are doing is? What are the gains from making copies of Ilya?
Ilya Sutskever (Co-founder and Chief Scientist) 01:45.020
I don't know. I think there will definitely be diminishing returns, because you want people who think differently rather than the same. If they were literal copies of me, I'm not sure how much incremental value you'd get. I
Ilya Sutskever (Co-founder and Chief Scientist) 02:04.620
think that people who think differently, that's what you want.
Dwarkesh Patel (Host) 02:09.860
Why is it that, if you look at different models, even ones released by totally different companies and trained on potentially non-overlapping data sets, LLMs are so similar to each other? It's actually crazy.
Ilya Sutskever (Co-founder and Chief Scientist) 02:20.020
Maybe the data sets are not as non-overlapping as they seem.
Dwarkesh Patel (Host) 02:23.580
But there's some sense that, even if an individual human might be less productive than a future AI, maybe there's something to the fact that human teams have more diversity than teams of AIs might have. How do we elicit meaningful diversity among AIs?
Dwarkesh Patel (Host) 02:37.460
I think just raising the temperature results in gibberish. You want something more like how different scientists have different prejudices or different ideas. How do you get that kind of diversity among AI agents?
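To make the temperature point concrete, here is a minimal sketch (an editorial illustration, not something from the conversation) of why cranking up sampling temperature yields gibberish rather than useful diversity: temperature divides the logits before the softmax, so a high temperature flattens the token distribution toward uniform, and implausible tokens get sampled nearly as often as plausible ones.

```python
# Toy illustration of temperature sampling over a 4-token vocabulary.
# Higher temperature flattens the distribution toward uniform, which is
# why it produces incoherent text rather than meaningfully diverse ideas.
import numpy as np

def sample_token(logits: np.ndarray, temperature: float, rng) -> int:
    """Sample one token id from temperature-scaled logits."""
    scaled = logits / temperature
    scaled -= scaled.max()                      # numerical stability
    probs = np.exp(scaled) / np.exp(scaled).sum()
    return rng.choice(len(logits), p=probs)

rng = np.random.default_rng(0)
logits = np.array([5.0, 3.0, 0.0, -2.0])        # one "good" token, three worse ones

for t in (0.7, 1.0, 5.0):
    draws = [sample_token(logits, t, rng) for _ in range(1000)]
    freqs = np.bincount(draws, minlength=4) / 1000
    print(f"T={t}: {freqs}")                    # higher T -> much flatter distribution
```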
Ilya Sutskever (Co-founder and Chief Scientist) 02:48.900
So, the reason there has been no diversity, I believe, is because of pre-training. All the pre-trained models are pretty much the same, because they're pre-trained on the same data. Now, RL and post-training are where some differentiation starts to emerge, because different
Ilya Sutskever (Co-founder and Chief Scientist) 03:05.740
people come up with different RL training.
Dwarkesh Patel (Host) 03:09.060
Yeah. And then I've heard you hint in the past about self-play as a way to either get data or match agents to other agents with equivalent intelligence to kick off learning. How should we think about why there are no public proposals of this kind of thing working with other
Dwarkesh Patel (Host) 03:30.260
LLMs?
Ilya Sutskever (Co-founder and Chief Scientist) 03:31.020
There are two things to say. I would say that the reason I thought self-play was interesting is because it offered a way to create models using compute only, without data. Right? And if you think that data is the ultimate bottleneck, then using compute only is
Ilya Sutskever (Co-founder and Chief Scientist) 03:48.940
very interesting. So that's what makes it interesting. Now, the thing is that self-play, at least the way it was done in the past, where you have agents which somehow compete with each other, is only good for developing a certain set of skills. It is too narrow. It's
Ilya Sutskever (Co-founder and Chief Scientist) 04:10.020
only good for negotiation, conflict, certain social skills, strategizing, that kind of stuff. And so if you care about those skills, then self-play will be useful. Now, actually, I think that self-play did find a home, but in a different form. So
Ilya Sutskever (Co-founder and Chief Scientist) 04:32.420
things like debate and prover-verifier, where you have some kind of LLM-as-a-judge which is also incentivized to find mistakes in your work. You could say this is not exactly self-play, but it is a related adversarial setup that people are doing, I believe. And
Ilya Sutskever (Co-founder and Chief Scientist) 04:48.860
really, self-play is a special case of the more general notion of competition between agents. The natural response to competition is to try to be different. And so if you were to put multiple agents together and tell them, you all
Ilya Sutskever (Co-founder and Chief Scientist) 05:03.940
need to work on some problem, and you are an agent inspecting what everyone else is working on, you're going to say, "Well, if they're already taking this approach, it's not clear I should pursue it. I should pursue something different." And so I think that something like
Ilya Sutskever (Co-founder and Chief Scientist) 05:19.060
this could also create an incentive for a diversity of approaches.
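As a rough illustration of the prover-verifier style setup Ilya gestures at, here is a minimal sketch. The `prover` and `judge` callables are hypothetical stand-ins for LLM policies, and the reward scheme is an assumption for illustration, not a description of any actual system: the judge is rewarded for finding mistakes, and the prover is rewarded only when its solution survives scrutiny.

```python
# Sketch of one adversarial prover-verifier round (assumptions, not an
# actual system): the prover proposes a solution, the judge is rewarded
# for finding a flaw, and the prover is rewarded only if none is found.
from typing import Callable

def adversarial_round(
    problem: str,
    prover: Callable[[str], str],               # proposes a solution
    judge: Callable[[str, str], str | None],    # returns a flaw, or None
) -> tuple[float, float]:
    """Run one round; return (prover_reward, judge_reward)."""
    solution = prover(problem)
    flaw = judge(problem, solution)
    if flaw is None:
        return 1.0, 0.0   # solution survived scrutiny: prover wins
    return 0.0, 1.0       # judge found a mistake: judge wins

# Toy stand-ins so the sketch runs end to end.
prover = lambda p: "x = 2" if "x + 2 = 4" in p else "unknown"
judge = lambda p, s: None if s == "x = 2" else "solution does not check out"

print(adversarial_round("solve x + 2 = 4", prover, judge))  # (1.0, 0.0)
```

The opposed incentives are the point: because the judge profits from any mistake, the prover's best strategy is work the judge cannot fault, which is how the setup generates training pressure from compute alone, without new external data.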