Dwarkesh Patel (Host) 00:00.000
curious: if you say we are back in the era of research, you were there from 2012 to 2020. What is the vibe now going to be if we go back to the era of research? For example, even after AlexNet, the amount of compute that was used to run experiments
Dwarkesh Patel (Host) 00:20.960
kept increasing, and the size of frontier systems kept increasing. Do you think this era of research will still require a tremendous amount of compute? Do you think it will require going back into archives and reading old papers? What was the vibe
Dwarkesh Patel (Host) 00:41.200
when you were at Google and OpenAI and Stanford, these places, when there was more of a vibe of research? What kind of thing should we be expecting in the community?
Ilya Sutskever (Co-founder and Chief Scientist) 00:52.700
So, one consequence of the age of scaling is that scaling sucked
Dwarkesh Patel (Host) 01:02.340
out all the air in
Ilya Sutskever (Co-founder and Chief Scientist) 01:03.020
the room. Yeah. And so, because scaling sucked out all the air in the room, everyone started to do the same thing. We got to the point where we are in a world where there are more companies than ideas, by quite a bit. Actually, on that, you know, there is the Silicon Valley
Ilya Sutskever (Co-founder and Chief Scientist) 01:25.660
saying that says that ideas are cheap, execution is everything.
Dwarkesh Patel (Host) 01:32.260
And people say that a
Ilya Sutskever (Co-founder and Chief Scientist) 01:33.020
lot. Yeah. And there is truth to that. But then I saw someone say on Twitter something like, "If ideas are so cheap, how come no one's having any ideas?" And I think it's true too. If you think about research progress in terms of bottlenecks, there are
Ilya Sutskever (Co-founder and Chief Scientist) 01:54.660
several bottlenecks. One of them is ideas, and one of them is
Dwarkesh Patel (Host) 02:01.020
your ability to bring them to
Ilya Sutskever (Co-founder and Chief Scientist) 02:02.020
life, which might be compute but also engineering. So if you go back to the 90s, let's say, you had people who had pretty good ideas. And if they had had much larger computers, maybe they could have demonstrated that their ideas were viable, but they could not. So, they could only
Ilya Sutskever (Co-founder and Chief Scientist) 02:17.540
have a very, very small demonstration,
Dwarkesh Patel (Host) 02:19.140
and it did not convince anyone.
Ilya Sutskever (Co-founder and Chief Scientist) 02:20.420
Yeah. So, the bottleneck was compute. Then in the age of scaling, compute increased a lot. And of course, there is a question of how much compute is needed, but compute is large. Compute is large enough that it's not obvious that you need that much more compute to
Ilya Sutskever (Co-founder and Chief Scientist) 02:43.580
prove some idea. I'll give you an analogy. AlexNet was built on two GPUs. That was the total amount of compute used for it. The transformer was built on 8 to 64 GPUs. No single experiment in the transformer paper used more than 64 GPUs of 2017, which would be like, what? Two
Ilya Sutskever (Co-founder and Chief Scientist) 03:06.700
GPUs of today. The ResNet, right? You could even argue that the o1 reasoning was not the most compute-heavy thing in the world. So for research, you definitely need some amount of compute, but it's far from obvious that you need the
Ilya Sutskever (Co-founder and Chief Scientist) 03:32.540
absolutely largest amount of compute ever for research.
Ilya Sutskever (Co-founder and Chief Scientist) 03:36.180
You might argue, and I think it is true, that if you want to build the absolutely best system, then it helps to have much more compute. Especially if everyone is within the same paradigm, then compute becomes one of the big
Ilya Sutskever (Co-founder and Chief Scientist) 03:53.820
differentiators.
Dwarkesh Patel (Host) 03:56.060
Yeah. I'm asking you for the history because you were actually there. I'm not sure what actually happened, but it sounds like it was possible to develop these ideas using a minimal amount of compute, but the
Dwarkesh Patel (Host) 04:09.740
transformer didn't immediately become famous. It became the thing everybody started doing, and then started experimenting on top of and building on top of, because it was validated
Ilya Sutskever (Co-founder and Chief Scientist) 04:18.380
at higher and higher levels of compute.
Dwarkesh Patel (Host) 04:20.100
Correct. And if you at SSI have 50 different ideas, how will you know which one is the next transformer and which one is, you know, brittle without having the kinds of compute that other frontier
Ilya Sutskever (Co-founder and Chief Scientist) 04:35.100
labs have? So, I can comment on that. The short comment is that, you know, you mentioned SSI. Specifically for us, the amount of compute that SSI has for research is really not that small, and I want to explain why. Simple math can explain why the
Ilya Sutskever (Co-founder and Chief Scientist) 04:58.220
amount of compute that we have is actually a lot more comparable for research than one might think. I'll explain. SSI has raised $3 billion, which is a lot by any absolute sense. But you could say, "Look at the other companies raising much more."
Ilya Sutskever (Co-founder and Chief Scientist) 05:22.780
But a lot of their compute goes for inference. These big numbers, these big loans, are earmarked for inference. That's number one. Number two, if you want to have a product on which you do inference, you need to have a big staff of engineers,
Ilya Sutskever (Co-founder and Chief Scientist) 05:41.540
salespeople. A lot of the research needs to be dedicated to producing all kinds of product-related features. So then, when you look at what's actually left for research, the difference becomes a lot smaller. Now, the other thing is that if you're doing something different,
Ilya Sutskever (Co-founder and Chief Scientist) 06:01.000
do you really need the absolute maximal scale to prove it? I don't think it's true at all. I think that in our case, we have sufficient compute to prove, to convince ourselves and anyone else, that what we're doing is correct.