Ilya Sutskever – We're moving from the age of scaling to the age of research - part 3/17
2025-11-25_17-29 • 1h 36m 3s
Ilya Sutskever (Co-founder and Chief Scientist)
00:00.740
I think there are some similarities. The amount of pre-training data is very, very staggering. And somehow a human being, after even 15 years with a tiny fraction of the pre-training data, knows much less. But whatever they do know, they know much more deeply somehow. And the mistakes: already at that age, you would not make the mistakes that our AIs make.

There is another thing you might say: could it be something like evolution? And the answer is maybe. But in this case, I think evolution might actually have an edge. One way in which neuroscientists can learn about the brain is by studying people with damage to different parts of the brain, and some of those people have the strangest symptoms you could imagine. It's actually really, really interesting. And there was one case that comes to mind that's relevant. I read about a person who had some kind of brain damage, I think from a stroke or an accident, that took out his emotional processing. So he stopped feeling any emotion. He still remained very articulate, he could solve little puzzles, and on tests he seemed to be just fine. But he felt no emotion. He didn't feel sad, he didn't feel anger, he didn't feel animated. And he became somehow extremely bad at making any decisions at all. It would take him hours to decide which socks to wear, and he would make very bad financial decisions.

What does that say about the role of our built-in emotions in making us a viable agent, essentially? And to connect this to your question about pre-training: maybe, if you're good enough at getting everything out of pre-training, you could get that as well. But that's the kind of thing which, well, may or may not be possible to get from pre-training.
Dwarkesh Patel (Host)
02:33.000
Pain. What is that? It's clearly not just directly emotion. It seems like some almost value-function-like thing, which is telling you which decision to make, what the end reward for any decision should be. And you think that doesn't sort of implicitly come from…
Ilya Sutskever (Co-founder and Chief Scientist)
02:53.280
I think it could. I'm just saying it's not 100% obvious.
Dwarkesh Patel (Host)
02:58.360
But what is that? How do you think about emotions? What is the ML analogy for emotions?
Ilya Sutskever (Co-founder and Chief Scientist)
03:04.240
It should be some kind of a value function thing. But I don't think there is a great analogy, because right now value functions don't play a very prominent role in the things people do.
Dwarkesh Patel (Host)
03:14.440
It might be worth defining for the audience what a value function is, if you want to do that.
Ilya Sutskever (Co-founder and Chief Scientist)
03:18.000
Certainly, I'll be very happy to do that. So, when people do reinforcement learning, the way reinforcement learning is done right now, how do people train those agents? You have a neural net and you give it a problem, and then you tell the model: go solve it. The model takes maybe thousands, or hundreds of thousands, of actions or thoughts, and then it produces a solution. The solution is graded, and then the score is used to provide a training signal for every single action in your trajectory. That means that if you are doing something that goes on for a long time, if you're training on a task that takes a long time to solve, you will do no learning at all until you come up with the proposed solution. That's how reinforcement learning is done naively, and that's how o1 and R1 are ostensibly done.
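
To make that concrete, here is a minimal sketch of the naive setup Ilya describes, assuming a hypothetical `policy`/`problem` interface (this is an illustrative REINFORCE-style loop, not any lab's actual training code): the trajectory is rolled out to completion, graded once, and that single final score becomes the training signal for every action.

```python
import torch

def naive_rl_update(policy, optimizer, problem, grade_solution, max_steps=100_000):
    """One naive RL update: roll out a whole trajectory, grade only the
    final solution, then push that single score back through every action."""
    log_probs = []
    state = problem.reset()
    for _ in range(max_steps):
        dist = policy(state)                 # distribution over actions/thoughts
        action = dist.sample()
        log_probs.append(dist.log_prob(action))
        state, done = problem.step(action)
        if done:                             # the model has proposed a solution
            break

    reward = grade_solution(state)           # one scalar grade, only at the end

    # Every action in the trajectory receives the same terminal signal;
    # nothing is learned until a full solution has been produced and graded.
    loss = -reward * torch.stack(log_probs).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```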
The value function says something like: "Okay, look, maybe I could sometimes, not always, tell you if you're doing well or badly." The notion of a value function is more useful in some domains than others. So, for example, when you play chess and you lose a piece, you know: I messed up. You don't need to play the whole game to know that what I just did was bad, and therefore whatever preceded it was also bad. So the value function lets you short-circuit the wait until the very end.
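
A hedged sketch of that idea, reusing the hypothetical names above: a learned critic `value_fn` scores intermediate states, and the temporal-difference error compares each state against the next one, so a blunder (losing the piece) produces a negative signal immediately instead of at the end of the game.

```python
import torch

def td_update(value_fn, v_optimizer, state, next_state, reward=0.0,
              gamma=1.0, done=False):
    """Temporal-difference update for a critic: each step is judged against
    the very next state, short-circuiting the wait for the final outcome."""
    with torch.no_grad():
        target = reward + (0.0 if done else gamma * value_fn(next_state))
    td_error = target - value_fn(state)      # sharply negative right after a blunder

    v_loss = td_error.pow(2).mean()          # regress V(state) toward the bootstrapped target
    v_optimizer.zero_grad()
    v_loss.backward()
    v_optimizer.step()
    return td_error.detach()                 # usable as a per-step advantage for the policy
```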
Let's suppose that you are doing some kind of a math thing or a programming thing, and you're trying to explore a particular solution direction. And after, let's say, 1,000 steps of thinking, you conclude that this direction is unpromising. As soon as you conclude this, you could already get a reward signal a thousand steps earlier, when you decided to pursue this path. You say, "Oh, next time, I shouldn't pursue this path in a similar situation," long before you actually came up with the proposed solution.
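
One simple way to see this numerically (a toy illustration, not how any particular system does it): if the critic's value estimate collapses at the step where the direction is judged unpromising, the one-step advantage at that point goes sharply negative with no final graded solution required, and further bootstrapped updates propagate that signal back toward the original decision to take the branch.

```python
import torch

def step_advantages(values, rewards, gamma=1.0):
    """One-step advantages A_t = r_t + gamma * V(s_{t+1}) - V(s_t).
    `values` has length T+1, `rewards` has length T."""
    return rewards + gamma * values[1:] - values[:-1]

# Toy trajectory: the critic holds V ~ 0.8 for ~1,000 thinking steps,
# then realizes the direction is unpromising and drops to V ~ 0.1.
values = torch.cat([torch.full((1000,), 0.8), torch.full((2,), 0.1)])
rewards = torch.zeros(1001)                  # no intermediate rewards at all

adv = step_advantages(values, rewards)
print(adv[999])                              # ~ -0.7: a strong negative signal,
                                             # long before any solution is graded
```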