Ilya Sutskever – We're moving from the age of scaling to the age of research - part 7/17
2025-11-25_17-29 • 1h 36m 3s
Dwarkesh Patel (Host)
00:00.000
curious if you say we are back in the era of research. You were there from 2012 to 2020. What is the vibe now going to be if we go back to the era of research? For example, even after AlexNet, the amount of compute that was used to run experiments
Dwarkesh Patel (Host)
00:20.960
kept increasing, and the size of frontier systems kept increasing. Do you think that this era of research will still require a tremendous amount of compute? Do you think it will require going back into the archives and reading old papers? Maybe, what was the vibe
Dwarkesh Patel (Host)
00:41.200
when you were at Google and OpenAI and Stanford, these places, when there was more of a vibe of research? What kind of thing should we be expecting in the community?
Ilya Sutskever (Co-founder and Chief Scientist)
00:52.700
So, one consequence of the age of scaling is that scaling sucked out all the air in the room. And because scaling sucked out all the air in the room, everyone started to do the same thing. We got to the point where we are in a world where there are more companies than ideas, by quite a bit. Actually, on that, you know, there is the Silicon Valley
Ilya Sutskever (Co-founder and Chief Scientist)
01:25.660
saying that says that ideas are cheap, execution is everything. People say that a lot, and there is truth to that. But then I saw someone say on Twitter something like, "If ideas are so cheap, how come no one's having any ideas?" And I think it's true too. If you think about research progress in terms of bottlenecks, there are
01:54.660
several bottlenecks. One of them is ideas, and one of them is your ability to bring them to life, which might be compute, but also engineering. So if you go back to the 90s, let's say, you had people who had pretty good ideas. And if they had had much larger computers, maybe they could have demonstrated that their ideas were viable, but they could not. So, they could only
Ilya Sutskever (Co-founder and Chief Scientist)
02:17.540
have a very, very small demonstration
Dwarkesh Patel (Host)
02:19.140
and did not convince anyone.
Ilya Sutskever (Co-founder and Chief Scientist)
02:20.420
Yeah. So, the bottleneck was compute. Then, in the age of scaling, compute increased a lot. Of course, there is a question of how much compute is needed, but compute is large, large enough that it's not obvious that you need that much more compute to
Ilya Sutskever (Co-founder and Chief Scientist)
02:43.580
prove some idea. I'll give you an analogy. AlexNet was built on two GPUs. That was the total amount of compute used for it. The transformer was built on eight to 64 GPUs. No single transformer paper experiment used more than 64 GPUs of 2017, which would be, what, two
Ilya Sutskever (Co-founder and Chief Scientist)
03:06.700
GPUs of today. So, the ResNet, right? You could even argue that the o1 reasoning was not the most compute-heavy thing in the world. So for research, you definitely need some amount of compute, but it's far from obvious that you need the
Ilya Sutskever (Co-founder and Chief Scientist)
03:32.540
absolutely largest amount of compute ever for research.
Ilya Sutskever (Co-founder and Chief Scientist)
03:36.180
You might argue, and I think it is true, that if you want to build the absolutely best system, then it helps to have much more compute. And especially if everyone is within the same paradigm, then compute becomes one of the big
Ilya Sutskever (Co-founder and Chief Scientist)
03:53.820
differentiators.
Dwarkesh Patel (Host)
03:56.060
Yeah. I'm asking you for the history because you were actually there; I'm not sure what actually happened. But it sounds like it was possible to develop these ideas using a minimal amount of compute, but the
Dwarkesh Patel (Host)
04:09.740
transformer didn't immediately become famous. It became the thing everybody started doing, and then started experimenting on top of and building on top of, because it was validated at higher and higher levels of compute.
Ilya Sutskever (Co-founder and Chief Scientist)
04:18.380
Correct.
Dwarkesh Patel (Host)
04:20.100
And if you at SSI have 50 different ideas, how will you know which one is the next transformer and which one is, you know, brittle, without having the kinds of compute that other frontier labs have?
Ilya Sutskever (Co-founder and Chief Scientist)
04:35.100
So, I can comment on that. The short comment is that, you know, you mentioned SSI. Specifically for us, the amount of compute that SSI has for research is really not that small, and I want to explain why. Simple math can explain why the
Ilya Sutskever (Co-founder and Chief Scientist)
04:58.220
amount of compute that we have is actually a lot more comparable for research than one might think. I'll explain. SSI has raised $3 billion, which is a lot in any absolute sense. But you could say, "Look at the other companies raising much more."
Ilya Sutskever (Co-founder and Chief Scientist)
05:22.780
But a lot of their compute goes to inference. These big numbers, these big loans, are earmarked for inference. That's number one. Number two, if you want to have a product on which you do inference, you need to have a big staff of engineers, of
Ilya Sutskever (Co-founder and Chief Scientist)
05:41.540
salespeople. A lot of the research needs to be dedicated to producing all kinds of product-related features. So when you look at what's actually left for research, the difference becomes a lot smaller. Now, the other thing is that if you're doing something different,
Ilya Sutskever (Co-founder and Chief Scientist)
06:01.000
do you really need the absolute maximal scale to prove it? I don't think it's true at all. I think that in our case, we have sufficient compute to prove, to convince ourselves and anyone else, that what we're doing is correct.