Ilya Sutskever – We're moving from the age of scaling to the age of research - part 2/17
2025-11-25_17-29 • 1h 36m 3s
Dwarkesh Patel (Host)
00:00.520
I like this idea that the real reward hacking is the human researchers who are too focused on the evals. I think there are two ways to understand, or to try to think about, what you just pointed out. One is: look, if it's the case that simply by becoming superhuman at a coding competition, a model will not automatically become more tasteful and exercise better judgment about how to improve your code base, well, then you should expand the suite of environments, such that you're not just testing it on having the best performance in a coding competition; it should also be able to make the best kind of application for X thing or Y thing or Z thing. The other, and maybe this is what you're hinting at, is to say: why should it be the case in the first place that becoming superhuman at coding competitions doesn't make you a more tasteful programmer more generally? Maybe the thing to do is not to keep stacking up the number and diversity of environments, but to figure out an approach that would let you learn from one environment and improve your performance on something else.
Ilya Sutskever (Co-founder and Chief Scientist)
01:07.820
So, I have a human analogy which might be helpful. Let's take the case of competitive programming, since you mentioned that. Suppose you have two students. One of them decided they want to be the best competitive programmer, so they will practice 10,000 hours for that domain. They will solve all the problems, memorize all the proof techniques, and be very skilled at quickly and correctly implementing all the algorithms. And by doing so, they became the best, or one of the best. Student number two thought, "Oh, competitive programming is cool." Maybe they practiced for 100 hours, much less, and they also did really well. Which one do you think is going to do better in their career later on?
Dwarkesh Patel (Host)
01:55.260
The second.
Ilya Sutskever (Co-founder and Chief Scientist)
01:56.100
Right? And I think that's basically what's going on. The models are much more like the first student, but even more so, because then we say, "Okay, so the model should be good at competitive programming. So let's get every single competitive programming problem ever. And then let's do some data augmentation so we have even more competitive programming problems." And we train on that. So now you've got this great competitive programmer. And with this analogy, I think it's more intuitive that, yeah, okay, if it's so well trained, all the different algorithms and all the different proof techniques are right at its fingertips. And it's more intuitive that with this level of preparation, it would not necessarily generalize to other things.
Dwarkesh Patel (Host)
02:39.780
But then what is the analogy for what the second student is doing before they do the 100 hours of fine-tuning?
Ilya Sutskever (Co-founder and Chief Scientist)
02:48.020
I think it's that they have it. I think it's the "it" factor. Yeah. Right? And when I was an undergrad, I remember there was a student like this who studied with me. So I know it exists.
Dwarkesh Patel (Host)
03:01.100
Yeah. I think it's interesting to distinguish it from whatever pre-training does. So one way to understand what you just said, about how we don't have to choose the data in pre-training, is to say: actually, it's not dissimilar to the 10,000 hours of practice. It's just that you get those 10,000 hours of practice for free, because it's already somewhere in the pre-training distribution. But maybe you're suggesting there's actually not that much generalization from pre-training; there's just so much data in pre-training. And it's not necessarily generalizing better than RL.
Ilya Sutskever (Co-founder and Chief Scientist)
03:31.220
The main strength of pre-training is that, A, there is so much of it, and B, you don't have to think hard about what data to put into it. It's very natural data, and it does include in it a lot of what people do: people's thoughts and a lot of the features of, you know, the whole world as projected by people onto text. And pre-training tries to capture that using a huge amount of data. Pre-training is very difficult to reason about, because it's so hard to understand the manner in which the model relies on the pre-training data. And whenever the model makes a mistake, could it be because something, by chance, is not as well supported by the pre-training data? And "supported by pre-training" is maybe a loose term. I don't know if I can add anything more useful on this, but I don't think there is a human analog to pre-training.
Dwarkesh Patel (Host)
04:39.380
Here are some analogies that people have proposed for what the human analog to pre-training is, and I'm curious to get your thoughts on why they're potentially wrong. One is to think about the first 18, or 15, or 13 years of a person's life, when they aren't necessarily economically productive, but they are doing something that is making them understand the world better, and so forth. And the other is to think about evolution as doing some kind of search for 3 billion years, which then results in a human lifetime instance. I'm curious whether you think either of these is actually analogous to pre-training, or, if not pre-training, how you would think about what lifetime human learning is like.