Ilya Sutskever – We're moving from the age of scaling to the age of research - part 5/17
2025-11-25_17-29 • 1h 36m 3s
Dwarkesh Patel (Host)
00:00.720
That's a very interesting way to put it. But let me ask you the question you just posed, then. What are we scaling, and what would it mean to have a recipe? Because I guess I'm not aware of a very clean relationship, one that almost looks like a law of physics, which existed in pre-training. There was a power law between data, or compute, or parameters, and loss. What is the kind of relationship we should be seeking, and how should we think about what this new recipe might look like?
Ilya Sutskever (Co-founder and Chief Scientist)
00:32.800
So, we've already witnessed a transition from one type of scaling to a different type of scaling: from pre-training to RL. Now people are scaling up RL. Based on what people say on Twitter, they spend more compute on RL than on pre-training at this point, because RL can actually consume quite a bit of compute. You do very, very long rollouts, so it takes a lot of compute to produce those rollouts. And then you get a relatively small amount of learning per rollout, so you really can spend a lot of compute.

At this point I wouldn't even call it scaling. I would say, "Hey, what are you doing? Is the thing you are doing the most productive thing you could be doing? Can you find a more productive way of using your compute?" We've discussed the value function business earlier. Maybe once people get good at value functions, they'll be using their resources more productively. And if you find a whole other way of training models, you could say: is this scaling, or is it just using your resources? I think it becomes a little bit ambiguous, in the sense that back when people were in the age of research, it was like people would say, "Hey, let's try this and this and this. Let's try that and that and that. Oh, look, something interesting is happening." And I think there will be a return to that.
Dwarkesh Patel (Host)
02:04.280
So if we're back in the era of research, stepping back, what is the part of the recipe that we need to think most about? When you say value function, people are already trying the current recipe, but then having an LLM as a judge and so forth. You could say that's the value function, but it sounds like you have something much more fundamental in mind. Do we need to go back to, should we even rethink, pre-training at all, and not just add more steps to the end of that process?
Ilya Sutskever (Co-founder and Chief Scientist)
02:30.400
So, the discussion about the value function, I think it was interesting. I want to emphasize that I think the value function is something that's going to make our RL more efficient, and I think that makes a difference. But I think anything you can do with a value function, you can do without one, just more slowly. The thing which I think is the most fundamental is that these models somehow just generalize dramatically worse than people. It's super obvious. That seems like a very fundamental thing.
Dwarkesh Patel (Host)
03:07.520
So this is the crux of generalization, and there are two sub-questions. One is about sample efficiency: why should it take so much more data for these models to learn than for humans? The second, separate from the amount of data it takes, is the question of why it is so much harder to teach the thing we want to a model than to a human. Which is to say, for a human, we don't necessarily need a verifiable reward. You're probably mentoring a bunch of researchers right now: you're talking with them, you're showing them your code, you're showing them how you think. And from that, they're picking up your way of thinking and how they should do research. You don't have to set a verifiable reward for them, like, "Okay, this is the next part of their curriculum, and now this is the next part of their curriculum, and oh, this training run was unstable." There's not this schleppy, bespoke process. So perhaps these two issues are actually related in some way. But I'd be curious to explore this second thing, which is more like continual learning, and this first thing, which feels just like sample efficiency.
Ilya Sutskever (Co-founder and Chief Scientist)
04:13.380
Yeah, so you could actually wonder. One possible explanation for the human sample efficiency that needs to be considered is evolution. Evolution has given us a small amount of the most useful information possible. And for things like vision, hearing, and locomotion, I think there's a pretty strong case that evolution actually has given us a lot. For example, human dexterity far exceeds, I mean, robots can become dexterous too, if you subject them to a huge amount of training in simulation. But to train a robot in the real world to quickly pick up a new skill, like a person does, seems very out of reach. And here you could say, "Oh yeah, locomotion." All our ancestors needed great locomotion, squirrels and so on. So for locomotion, maybe we've got some unbelievable prior.

You could make the same case for vision. I believe Yann LeCun made the point, "Oh, children learn to drive after like 10 hours of practice," which is true. But our vision is so, so good. At least for me, when I remember myself being 5 years old, I was very excited about cars back then, and I'm pretty sure my car recognition was more than adequate for self-driving already as a 5-year-old. You don't get to see that much data as a 5-year-old. You spend most of your time in your parents' house, so you have very low data diversity. But you could say maybe that's evolution too. But then language and math and coding, probably not.