Dwarkesh Patel (Host) 00:00.000
I'm curious: if you say we are back in the era of research, you were there from 2012 to 2020. What is the vibe now going to be if we go back to the era of research? For example, even after AlexNet, the amount of compute that was used to run
Dwarkesh Patel (Host) 00:20.160
experiments kept increasing, and the size of frontier systems kept increasing. Do you think this era of research will still require a tremendous amount of compute? Do you think it will require going back into the archives and reading old papers? Maybe: what
Dwarkesh Patel (Host) 00:40.840
was the vibe like when you were at Google, OpenAI, and Stanford, these places where there was more of a research vibe? What kind of thing should we be expecting in the community?
Ilya Sutskever (Co-founder and Chief Scientist) 00:52.900
So, one consequence of the age of scaling is that scaling sucked out all the air in
Ilya Sutskever (Co-founder and Chief Scientist) 01:03.220
the room.
Dwarkesh Patel (Host) 01:03.900
Yeah.
Ilya Sutskever (Co-founder and Chief Scientist) 01:05.300
And because scaling sucked out all the air in the room, everyone started to do the same thing. We got to the point where we are in a world where there are more companies than ideas, by quite a bit. Actually, on that, there is the Silicon Valley saying
Ilya Sutskever (Co-founder and Chief Scientist) 01:28.220
that ideas are cheap and execution is everything. People say that a lot.
Dwarkesh Patel (Host) 01:33.544
Yeah.
Ilya Sutskever (Co-founder and Chief Scientist) 01:33.869
And there is truth to that. But then I saw someone on Twitter say something like, "If ideas are so cheap, how come no one's having any ideas?" And I think that's true too. If you think about research progress in terms of bottlenecks, there are several
Ilya Sutskever (Co-founder and Chief Scientist) 01:55.220
bottlenecks. One of them is ideas, and one of them is your ability to bring them to life, which might be compute but also engineering. If you go back to the '90s, let's say, you had people who had pretty good ideas, and if they had had
Ilya Sutskever (Co-founder and Chief Scientist) 02:11.900
much larger computers, maybe they could have demonstrated that their ideas were viable. But they could not, so they could only make very, very small demonstrations that did not convince anyone.
Dwarkesh Patel (Host) 02:20.620
Yeah.
Ilya Sutskever (Co-founder and Chief Scientist) 02:21.620
So the bottleneck was compute. Then, in the age of scaling, compute increased a lot. Of course, there is a question of how much compute is needed, but compute is now large enough that it's not obvious you need that much more of it to prove
Ilya Sutskever (Co-founder and Chief Scientist) 02:45.460
some idea. I'll give you an analogy. AlexNet was built on two GPUs; that was the total amount of compute used for it. The transformer was built on eight to 64 GPUs. No single experiment in the transformer paper used more than 64 GPUs of 2017, which would be, what, like two GPUs of
Ilya Sutskever (Co-founder and Chief Scientist) 03:07.380
today? Then there's ResNet, right? You could even argue that o1 reasoning was not the most compute-heavy thing in the world. So for research you definitely need some amount of compute, but it's far from obvious that you need the absolutely
Ilya Sutskever (Co-founder and Chief Scientist) 03:33.460
largest amount of compute ever for research.
Ilya Sutskever (Co-founder and Chief Scientist) 03:36.380
You might argue, and I think it is true, that if you want to build the absolutely best system, then it helps to have much more compute, especially if everyone is within the same paradigm. Then compute becomes one of the big
Ilya Sutskever (Co-founder and Chief Scientist) 03:54.020
differentiators.
Dwarkesh Patel (Host) 03:56.260
Yeah. I'm asking you about the history because you were actually there, and I'm not sure what actually happened. It sounds like it was possible to develop these ideas using a minimal amount of compute, but the
Dwarkesh Patel (Host) 04:09.940
transformer didn't immediately become famous. It became the thing everybody started doing, experimenting on top of, and building on top of because it was validated
Dwarkesh Patel (Host) 04:18.580
at higher and higher levels of compute.
Ilya Sutskever (Co-founder and Chief Scientist) 04:20.300
Correct.
Dwarkesh Patel (Host) 04:21.500
And if you at SSI have 50 different ideas, how will you know which one is the next transformer and which one is brittle, without having the kinds of compute that other frontier labs have?
Ilya Sutskever (Co-founder and Chief Scientist) 04:36.300
I can comment on that. The short comment is this: you mentioned SSI, and specifically for us, the amount of compute that SSI has for research is really not that small. I want to explain why. Some simple math can explain why the amount of
Ilya Sutskever (Co-founder and Chief Scientist) 04:58.820
compute that we have is actually a lot more comparable for research than one might think. Let me explain. SSI has raised $3 billion, which is not small; it's a lot in any absolute sense. But you could say, look at the other companies raising much more. But a
Ilya Sutskever (Co-founder and Chief Scientist) 05:23.340
lot of their compute goes to inference. These big numbers, these big loans, are earmarked for inference. That's number one. Number two, if you want to have a product on which you do inference, you need a big staff of engineers and
Ilya Sutskever (Co-founder and Chief Scientist) 05:42.020
salespeople, and a lot of the research needs to be dedicated to producing all kinds of product-related features. So when you look at what's actually left for research, the difference becomes a lot smaller. Now, the other thing is that if you're doing something different, do you
Ilya Sutskever (Co-founder and Chief Scientist) 06:01.440
really need the absolute maximal scale to prove it? I don't think that's true at all. I think that in our case, we have sufficient compute to convince ourselves and anyone else that what we're doing is correct.