Ilya Sutskever – We're moving from the age of scaling to the age of research - part 4/17
2025-11-25_17-29 • 1h 36m 3s
Dwarkesh Patel (Host)
00:00.180
This was in the DeepSeek R1 paper: the space of trajectories is so wide that maybe it's hard to learn a mapping from an intermediate trajectory to a value. And also, in coding for example, you'll have the wrong idea, then you'll go back, then you'll
Dwarkesh Patel (Host)
00:19.820
change something.
Ilya Sutskever (Co-founder and Chief Scientist)
00:21.180
This sounds like such a lack of faith in deep learning. Sure, it might be difficult, but it's nothing deep learning can't do. So my expectation is that value functions should be useful, and I fully expect that they will be used in the future, if not already.
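To make the idea being debated concrete, here is a minimal sketch of what a value function does in this setting: it maps an intermediate state of a trajectory to an estimate of the reward the trajectory will eventually earn, so credit can be assigned before the episode finishes. The toy states, the end-of-episode reward, and the tabular TD(0) update below are illustrative assumptions, not anything specified in the conversation.

```python
# Illustrative sketch (not from the interview): a tabular TD(0) value function,
# i.e. a learned mapping from an intermediate state to an estimate of the
# reward the trajectory will eventually earn.
import random

def td0_value_estimates(episodes, gamma=1.0, alpha=0.1):
    """episodes: list of trajectories, each a list of (state, reward) pairs."""
    V = {}  # state -> estimated value
    for episode in episodes:
        for t in range(len(episode) - 1):
            s, _ = episode[t]
            s_next, r_next = episode[t + 1]
            # TD(0): move V(s) toward the bootstrapped target r + gamma * V(s').
            target = r_next + gamma * V.get(s_next, 0.0)
            V[s] = V.get(s, 0.0) + alpha * (target - V.get(s, 0.0))
    return V

# Toy usage: a coding-like task where reward arrives only at the very end.
random.seed(0)
episodes = [[("draft", 0.0), ("revise", 0.0), ("done", random.choice([0.0, 1.0]))]
            for _ in range(1000)]
print(td0_value_estimates(episodes))  # intermediate states roughly track the 0.5 mean reward
```

The concern raised in the question is that for very wide trajectory spaces, like real coding sessions with backtracking, learning this mapping from intermediate states to values may be hard; the reply above is that this is still squarely something deep learning can learn.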
Ilya Sutskever (Co-founder and Chief Scientist)
00:42.620
What I was alluding to with the person whose emotional center got damaged is more that maybe what it suggests is that the value function of humans is modulated by emotions in some important way that's hardcoded by evolution. And maybe that is important for people to be
Ilya Sutskever (Co-founder and Chief Scientist)
01:06.860
effective in the world.
Dwarkesh Patel (Host)
01:08.500
That's the thing I was actually planning on asking you. There's something really interesting about emotions as the value function, which is that it's impressive that they have this much utility while still being rather simple to understand.
Ilya Sutskever (Co-founder and Chief Scientist)
01:23.980
So I have two responses. I do agree that, compared to the kind of things that we learn, and the things we are talking about and the ways we are talking about them, emotions are relatively simple. They might even be so simple that maybe you could map them out in a human-
Ilya Sutskever (Co-founder and Chief Scientist)
01:42.700
understandable way. I think it would be cool to do. In terms of utility, though, I think there is a complexity-robustness trade-off, where complex things can be very useful, but simple things are very useful in a very broad range of situations.
Ilya Sutskever (Co-founder and Chief Scientist)
02:07.060
And so one way to interpret what we are seeing is that we've got these emotions that essentially evolved mostly from our mammal ancestors and then were fine-tuned a little bit while we were hominids, just a bit. We do have a decent amount of social
Ilya Sutskever (Co-founder and Chief Scientist)
02:22.860
emotions, though, which other mammals may lack. But they're not very sophisticated, and because they're not sophisticated, they serve us so well in this world, which is very different from the one we evolved in. Actually, they also make mistakes. For example, our
Ilya Sutskever (Co-founder and Chief Scientist)
02:39.140
emotions, well, I don't know if hunger counts as an emotion, that's debatable, but I think, for example, our intuitive feeling of hunger is not succeeding in guiding us correctly in this world with an abundance of food.
Dwarkesh Patel (Host)
02:57.180
Yeah. People have been talking about scaling data, scaling parameters, scaling compute. Is there a more general way to think about scaling? What are the other scaling axes?
Ilya Sutskever (Co-founder and Chief Scientist)
03:08.660
So here's a perspective I think might be true. The way ML used to work is that people would just tinker with stuff and try to get interesting results. That's what was going on in the past. Then the
Ilya Sutskever (Co-founder and Chief Scientist)
03:36.340
scaling insight arrived: scaling laws, GPT-3, and suddenly everyone realized we should scale. This is an example of how language affects thought. "Scaling" is just one word, but it's such a powerful word because it informs people what to do. They
Ilya Sutskever (Co-founder and Chief Scientist)
03:59.220
say, "Okay, let's try to scale things." And so you say, "Okay, so what are we scaling?" And pre-training was the thing to scale. It was a particular scaling recipe. The big breakthrough of pre-training was the realization that this recipe is good. So you say, "Hey, if
Ilya Sutskever (Co-founder and Chief Scientist)
04:17.180
you mix some compute with some data into a neural net of a certain size, you will get results, and you know it will be better if you just scale the recipe up." And this is also great: companies love this because it gives you a very low-risk way of investing your
Ilya Sutskever (Co-founder and Chief Scientist)
04:38.900
resources. Right? It's much harder to invest your resources in research. Compare that to doing research: you need the researchers to go forth and research and come up with something, versus get more data, get more compute, and you know you'll get something from
Ilya Sutskever (Co-founder and Chief Scientist)
04:54.780
pre-training. And indeed, based on various things people say, some of them on Twitter, it appears that Gemini may have found a way to get more out of pre-training. At some point, though, pre-training will run out of data. The data is very
Ilya Sutskever (Co-founder and Chief Scientist)
05:12.360
clearly finite. So then, okay, what do you do next? Either you do some kind of souped-up pre-training, a different recipe from the one you've done before, or you do RL, or maybe something else. But compute is now very big. In some sense, we are
Ilya Sutskever (Co-founder and Chief Scientist)
05:28.400
back to the age of research. So maybe here's another way to put it: from 2012 to 2020, it was the age of research. Now, from 2020 to 2025, it was the age of scaling, plus or minus, let's add error bars to those years. Because people say, "This is
Ilya Sutskever (Co-founder and Chief Scientist)
05:46.640
amazing, you've got to scale more, keep scaling." The one word: scaling. But now the scale is so big. Is the belief really, "It's so big, but if you had 100x more, everything would be so different"? It would be different, for sure. But is the belief that if
Ilya Sutskever (Co-founder and Chief Scientist)
06:05.480
you just 100x the scale, everything would be transformed? I don't think that's true. So it's back to the age of research again, just with big computers.
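To make the "scaling recipe" reasoning concrete, here is a minimal sketch of the extrapolation logic being questioned: fit a power law to loss versus compute and ask what 100x more compute buys. The pure power-law form and all the numbers below are made-up assumptions for illustration, not figures from the conversation.

```python
# Illustrative sketch with made-up numbers: the "scaling recipe" as extrapolation.
# Fit a pure power law, loss = a * compute**(-b), and ask what 100x compute buys.
import numpy as np

# Hypothetical (compute, loss) measurements in arbitrary units.
compute = np.array([1e18, 1e19, 1e20, 1e21, 1e22])
loss = np.array([3.10, 2.72, 2.45, 2.26, 2.12])

# Linear fit in log-log space: log(loss) = log(a) - b * log(compute).
slope, log_a = np.polyfit(np.log(compute), np.log(loss), 1)
a, b = np.exp(log_a), -slope

for scale in (1, 10, 100):
    c = compute[-1] * scale
    print(f"{scale:>4}x compute -> predicted loss {a * c ** (-b):.3f}")
```

The point made above is that the curve keeps improving, so 100x more compute would be different for sure, but it would not transform everything, which is why the argument turns back toward research rather than pure scale.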