Ilya Sutskever – We're moving from the age of scaling to the age of research - part 3/17
2025-11-25_17-29 • 1h 36m 3s
Ilya Sutskever (Co-founder and Chief Scientist)
00:00.000
I think there are some similarities between both of these two, and pre-training tries to play the role of both of these. But I think there are some big differences as well. The amount of pre-training data is very, very staggering.
Dwarkesh Patel (Host)
00:17.244
Yes.
Ilya Sutskever (Co-founder and Chief Scientist)
00:18.150
And somehow a human being, even after 15 years with a tiny fraction of the pre-training data, knows much less; but whatever they do know, they know much more deeply, somehow. And the mistakes: already at that age, you would not make the mistakes that our AIs make.
Ilya Sutskever (Co-founder and Chief Scientist)
00:37.060
There is another thing you might say: could it be something like evolution? And the answer is maybe. But in this case, I think evolution might actually have an edge. I remember reading about a case. One way in which neuroscientists can learn about the brain is by studying people with brain damage to different parts of the brain. Some people have the strangest symptoms you could imagine; it's actually really, really interesting. And there is one case that comes to mind that's relevant.
I read about this person who had some kind of brain damage, I think from a stroke or an accident, that took out his emotional processing, so he stopped feeling any emotion. He still remained very articulate, he could solve little puzzles, and on tests he seemed to be just fine. But he felt no emotion. He didn't feel sad, he didn't feel anger, he didn't feel animated. And he became somehow extremely bad at making any decisions at all. It would take him hours to decide which socks to wear, and he would make very bad financial decisions. What does that say about the role of our built-in emotions in making us viable agents, essentially?

And I guess to connect to your question about pre-training: maybe if you're good enough at getting everything out of pre-training, you could get that as well. But it may or may not be possible to get that from pre-training.
Dwarkesh Patel (Host)
02:33.340
What is that? It's clearly not just emotion directly. It seems like some almost value-function-like thing that's telling you which decision to make, what the end reward for any decision should be. And you think that doesn't sort of implicitly come from...
Ilya Sutskever (Co-founder and Chief Scientist)
02:52.540
I think it could. I'm just saying it's not 100% obvious.
Dwarkesh Patel (Host)
02:56.580
Yeah. But what is that? How do you think about emotions? What is the ML analogy for emotions?
Ilya Sutskever (Co-founder and Chief Scientist)
03:03.500
It should be some kind of a value function thing. But I don't think there is an ML analogy, because right now value functions don't play a very prominent role in the things people do.
Dwarkesh Patel (Host)
03:13.700
It might be worth defining for the audience what a value function is, if you want to do that.
Ilya Sutskever (Co-founder and Chief Scientist)
03:17.260
I mean, certainly, I'll be very happy to do that. So when people do reinforcement learning, the way reinforcement learning is done right now, how do people train those agents? You have a neural net and you give it a problem. And then you tell the model, "Go solve it." The model takes maybe thousands, hundreds of thousands, of actions or thoughts or something, and then it produces a solution. The solution is graded, and then the score is used to provide a training signal for every single action in your trajectory.
Ilya Sutskever (Co-founder and Chief Scientist)
03:57.500
So that means that if you are doing something that goes for a long time, if you're training on a task that takes a long time to solve, you will do no learning at all until you've come up with a proposed solution. That's how reinforcement learning is done naively, and that's ostensibly how o1 and R1 are done.
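A minimal sketch of that naive loop, for concreteness. Everything here, the toy environment, the grader, and the network sizes, is illustrative rather than anything from the conversation; the point is just that one terminal score becomes the gradient signal for every action in the trajectory:

```python
import torch
import torch.nn as nn

class ToyEnv:
    """Hypothetical task: the episode runs for a fixed horizon and the
    whole trajectory receives a single grade only at the very end."""
    def __init__(self, horizon=20):
        self.horizon = horizon
    def reset(self):
        self.t, self.actions = 0, []
        return torch.zeros(16)
    def step(self, action):
        self.t += 1
        self.actions.append(action)
        return torch.zeros(16), self.t >= self.horizon
    def grade_solution(self):
        # Toy grader: fraction of steps that picked action 0.
        return sum(a == 0 for a in self.actions) / len(self.actions)

policy = nn.Sequential(nn.Linear(16, 64), nn.Tanh(), nn.Linear(64, 4))
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)

env = ToyEnv()
obs, log_probs, done = env.reset(), [], False
while not done:
    dist = torch.distributions.Categorical(logits=policy(obs))
    action = dist.sample()
    log_probs.append(dist.log_prob(action))
    obs, done = env.step(action.item())

# No learning happened during the episode; the single terminal score
# is broadcast as the training signal for every action taken.
score = env.grade_solution()
loss = -score * torch.stack(log_probs).sum()
optimizer.zero_grad()
loss.backward()
optimizer.step()
```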
The value function says something like, "Okay, look, maybe I could sometimes, not always, tell you if you're doing well or badly." The notion of a value function is more useful in some domains than in others. For example, when you play chess and you lose a piece, you know you messed up. You don't need to play the whole game to know that what you just did was bad, and that therefore whatever preceded it was also bad. So the value function lets you short-circuit the wait until the very end.
Suppose you're doing some kind of a math thing or a programming thing, and you're trying to explore a particular solution direction. After, let's say, 1,000 steps of thinking, you conclude that this direction is unpromising. As soon as you conclude this, you could already get a reward signal a thousand steps earlier, back when you decided to pursue this path: "Oh, next time I shouldn't pursue this path in a similar situation." That's long before you actually came up with the proposed solution.
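A sketch of how a learned value function could supply that earlier signal, again with purely illustrative names and shapes. Instead of waiting for the graded solution, each step's value estimate is trained toward the bootstrapped value of the next step (a temporal-difference update), so concluding at step 1,000 that a direction is unpromising gets propagated back toward the decision that chose it:

```python
import torch
import torch.nn as nn

# Hypothetical value head: maps a state to a scalar "how well is this going?"
value_fn = nn.Sequential(nn.Linear(16, 64), nn.Tanh(), nn.Linear(64, 1))
opt = torch.optim.Adam(value_fn.parameters(), lr=1e-3)

def td_update(states, rewards, gamma=0.99):
    """One temporal-difference pass over a trajectory.

    states:  list of 16-dim state tensors, one per thinking step
    rewards: per-step rewards (here mostly zero; the judgment that the
             direction is unpromising shows up as a low final reward)
    """
    loss = torch.zeros(1)
    for t in range(len(states) - 1):
        with torch.no_grad():
            # Bootstrapped target: immediate reward plus next state's value.
            target = rewards[t] + gamma * value_fn(states[t + 1])
        loss = loss + (value_fn(states[t]) - target).pow(2)
    opt.zero_grad()
    loss.backward()
    opt.step()

# 1,000 thinking steps; the dead end is only recognized at the last one,
# but repeated TD passes carry that signal back toward the earlier
# decisions without ever playing the episode out to a graded solution.
states = [torch.randn(16) for _ in range(1001)]
rewards = [0.0] * 999 + [-1.0]
td_update(states, rewards)
```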