Ilya Sutskever – We're moving from the age of scaling to the age of research - part 16/17
2025-11-25_17-29 • 1h 36m 3s
Dwarkesh Patel (Host)
00:00.000
But is this contradicted by what human-like learning implies? Isn't it that it can learn?
Ilya Sutskever (Co-founder and Chief Scientist)
00:03.880
It can, but you have accumulated learning, you have a big investment. You spent a lot of compute to become really, really, really good, really phenomenal at this thing. And someone else spent a huge amount of compute and a huge amount of experience to get really, really
Ilya Sutskever (Co-founder and Chief Scientist)
00:19.040
good at some other thing. Right? You apply a lot of human learning to get there, but now you are at this high point where someone else would say, look, I don't want to start learning what you've learned to go...
Dwarkesh Patel (Host)
00:30.000
I guess this would require many different companies to begin with the human-like continual learning agent at the same time, so that they can start their different research in different branches. But if one company, you know, gets that agent first, or gets that learner
Dwarkesh Patel (Host)
00:47.480
first, it does then seem like, well, you know, if you just think about every single job in the economy,
Dwarkesh Patel (Host)
00:56.800
you just have an instance learning each one, which seems tractable for a company.
Ilya Sutskever (Co-founder and Chief Scientist)
01:01.800
Yeah, that's a valid argument. My strong intuition is that it's not how it's going to go. Yeah, the argument says it will go this way.
Dwarkesh Patel (Host)
01:12.160
Yeah.
Ilya Sutskever (Co-founder and Chief Scientist)
01:12.600
But my strong intuition is that it will not go this way. This is the... you know, in theory, there is no difference between theory and practice. In practice, there is, and I think that's going to be one of those.
Dwarkesh Patel (Host)
01:23.500
A lot of people's models of recursive self-improvement literally, explicitly state that we will have a million Ilyas in a server coming up with different ideas, and this will lead to a superintelligence emerging very fast. Do you have some intuition about how parallelizable
Dwarkesh Patel (Host)
01:38.300
the thing you are doing is? What are the gains from making copies of Ilya?
Ilya Sutskever (Co-founder and Chief Scientist)
01:45.020
I don't know. I think there will definitely be diminishing returns, because you want people who think differently rather than the same. I think that if they were literal copies of me, I'm not sure how much more incremental value you'd get. I
Ilya Sutskever (Co-founder and Chief Scientist)
02:04.620
think people who think differently, that's what you want.
Dwarkesh Patel (Host)
02:09.860
Why is it that, if you look at different models, even ones released by totally different companies, trained on potentially non-overlapping data sets, it's actually crazy how similar LLMs are to each other?
Ilya Sutskever (Co-founder and Chief Scientist)
02:20.020
Maybe the data sets are not as non-overlapping as it seems.
Dwarkesh Patel (Host)
02:23.580
But there's some sense that, even if an individual human might be less productive than the future AI, maybe there's something to the fact that human teams have more diversity than teams of AIs might have. But how do we elicit meaningful diversity among AIs?
Dwarkesh Patel (Host)
02:37.460
So I think just raising the temperature results in gibberish. I think you want something more like different scientists having different prejudices or different ideas. How do you get that kind of diversity among AI agents?
Ilya Sutskever (Co-founder and Chief Scientist)
02:48.900
So, the reason there has been no diversity, I believe, is because of pre-training. All the pre-trained models are pretty much the same, because they're pre-trained on the same data. Now, RL and post-training is where some differentiation starts to emerge, because different
Ilya Sutskever (Co-founder and Chief Scientist)
03:05.740
people come up with different RL training.
Dwarkesh Patel (Host)
03:09.060
Yeah. And then I've heard you hint in the past about self-play as a way to either get data or match agents to other agents of equivalent intelligence to kick off learning. How should we think about why there are no public proposals of this kind of thing working with other
Dwarkesh Patel (Host)
03:30.260
LLMs?
Ilya Sutskever (Co-founder and Chief Scientist)
03:31.020
I would say there are two things to say. I would say that the reason why I thought self-play was interesting is because it offered a way to create models using compute only, without data. Right? And if you think that data is the ultimate bottleneck, then using compute only is
Ilya Sutskever (Co-founder and Chief Scientist)
03:48.940
very interesting. So that's what makes it interesting. Now, the thing is that self-play, at least the way it was done in the past, when you have agents which somehow compete with each other, is only good for developing a certain set of skills. It is too narrow. It's
Ilya Sutskever (Co-founder and Chief Scientist)
04:10.020
only good for things like negotiation, conflict, certain social skills, strategizing, that kind of stuff. And so if you care about those skills, then self-play will be useful. Now, actually, I think that self-play did find a home, but just in a different form. So
So
Ilya Sutskever (Co-founder and Chief Scientist)
04:32.420
things like debate, or prover-verifier: you have some kind of an LLM as a judge which is also incentivized to find mistakes in your work. You could say this is not exactly self-play, but it is, you know, a related adversarial setup that people are doing, I believe. And
And
Ilya Sutskever (Co-founder and Chief Scientist)
04:48.860
really, self-play is a special case of more general competition between agents. Right? The natural response to competition is to try to be different. And so if you were to put in multiple agents and you tell them, you know, you all
Ilya Sutskever (Co-founder and Chief Scientist)
05:03.940
need to work on some problem, and you are an agent and you're inspecting what everyone else is working on, you're going to say, "Well, if they're already taking this approach, it's not clear I should pursue it. I should pursue something different." And so I think that something like
Ilya Sutskever (Co-founder and Chief Scientist)
05:19.060
this could also create an incentive for a diversity of approaches.