EP20: Yann LeCun - part 2/11
December 15, 2025 • 1h 50m 6s
? (?)
00:00.310
What products, if any, does AMI plan to produce or make? Is it research, or more than that?
Yann LeCun (Chief AI Scientist)
00:09.350
No, it's more than that; it's actual products. But, you know, it has to do with, you know, world models and planning. Basically, we have the ambition of becoming kind of one of the main suppliers of intelligent systems down the line.

We think the current architectures that are employed, you know, LLMs, or agentic systems that are based on LLMs, um, work OK for language, but even agentic systems really don't work very well. They require a lot of data to basically clone the behavior of humans, and they're not that reliable.

So we think the proper way to handle this, and I've been saying this for almost ten years now, is to have world models that are capable of predicting what would be the consequence, or the consequences, of an action or a sequence of actions that an AI system might take. And then the system arrives at a sequence of actions, or an output, by optimization: by figuring out what sequence of actions will optimally accomplish a task, you know, set for it. That's planning. So I think the central part of intelligence is being able to predict the consequences of your actions and then use them for planning. That's what I've been working on for many years. We've been making fast progress with, you know, a combination of projects here at NYU and also at Meta, and now it's time to basically make it real.
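A minimal sketch in Python of the planning-by-optimization loop described here, assuming a learned world model f(s, a) that returns the next state and a scalar task cost; the function names, the toy dynamics, and the random-shooting search are all illustrative, not AMI's actual method.

```python
import numpy as np

def plan(world_model, cost, s0, horizon=10, n_candidates=256, action_dim=2, seed=0):
    """Pick the action sequence whose predicted rollout has the lowest total cost.

    world_model(s, a) -> next state   (the learned predictor)
    cost(s)           -> scalar       (how bad a predicted state is)
    This is "random shooting": sample candidate action sequences, roll each
    one forward through the model (never the real world), keep the best.
    """
    rng = np.random.default_rng(seed)
    candidates = rng.normal(size=(n_candidates, horizon, action_dim))
    best_seq, best_cost = None, np.inf
    for seq in candidates:
        s, total = s0, 0.0
        for a in seq:
            s = world_model(s, a)   # predicted consequence of the action
            total += cost(s)
        if total < best_cost:
            best_seq, best_cost = seq, total
    return best_seq

# Toy usage: a 2-D point mass whose task is to reach the origin.
f = lambda s, a: s + 0.1 * a          # stand-in for a learned world model
c = lambda s: float(np.sum(s ** 2))   # distance-to-goal cost
actions = plan(f, c, s0=np.array([1.0, -1.0]))
```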
Ravid Shwartz-Ziv (Assistant Professor)
01:49.160
And what do you think are the missing parts? And why do you think it's taking so long? Because, as you said, you've been talking about it for many years already, but it's still not better than LLMs.
Yann LeCun (Chief AI Scientist)
02:02.150
Right, but it's not the same thing as an LLM, right? It's designed to handle modalities that are high-dimensional, continuous, and noisy, and LLMs completely suck at this; they really do not work, right? If you try to train an LLM to kind of learn good representations of images or video, they're really not that great. You know, generally, vision capabilities for AI systems are trained separately; they're not part of the whole LLM thing.

So yeah, if you want to handle data that is high-dimensional, continuous, and noisy, you cannot use generative models. You can certainly not use generative models that tokenize your data into kind of discrete symbols, OK? There's just no way, and we have a lot of empirical evidence that this simply doesn't work very well. What does work is learning an abstract representation space that eliminates a lot of details about the input, essentially all the details that are not predictable, which includes noise, and making predictions in that representation space. And this is the idea of JEPA, joint embedding predictive architectures, which, you know, you are familiar with.
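As a rough schematic of that idea, here is a toy joint embedding predictive setup in Python (made-up dimensions, not any published JEPA model): both x and y are encoded, and the prediction error is measured between representations rather than in input space, so details the target encoder drops never have to be predicted. As the collapse discussion later in the conversation explains, this loss alone is not enough to train such a system.

```python
import torch
import torch.nn as nn

class TinyJEPA(nn.Module):
    """Predict in representation space, not in input (pixel) space."""
    def __init__(self, x_dim=128, y_dim=128, rep_dim=32):
        super().__init__()
        self.enc_x = nn.Sequential(nn.Linear(x_dim, 64), nn.ReLU(), nn.Linear(64, rep_dim))
        self.enc_y = nn.Sequential(nn.Linear(y_dim, 64), nn.ReLU(), nn.Linear(64, rep_dim))
        self.pred = nn.Linear(rep_dim, rep_dim)   # predicts s_y from s_x

    def forward(self, x, y):
        s_x = self.enc_x(x)
        s_y = self.enc_y(y)    # unpredictable detail can be dropped here
        return self.pred(s_x), s_y

x, y = torch.randn(8, 128), torch.randn(8, 128)   # e.g. two views, or past/future
model = TinyJEPA()
s_y_hat, s_y = model(x, y)
# The error lives in representation space; on its own, this objective
# collapses (see the discussion of collapse later in this conversation).
loss = ((s_y_hat - s_y.detach()) ** 2).mean()
```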
Ravid Shwartz-Ziv (Assistant Professor)
03:19.430
Yes, sorry, because I worked on this. Also, Randall was a guest on the podcast in the past; I probably talked about this at length.
Yann LeCun (Chief AI Scientist)
03:31.310
So there are a lot of ideas around this, and let me tell you my history around this, OK? I have been convinced for a long time, probably the better part of twenty years, that the proper way to build intelligent systems was through some form of unsupervised learning. I started working on unsupervised learning as the basis for, you know, making progress in the early-to-mid 2000s; before that, I wasn't so convinced this was the way to go. And basically this was the idea of, you know, training autoencoders to learn representations, right? So you have an input, you run it into an encoder, it finds a representation of it, and then you decode, so you guarantee that the representation contains all the information about the input. It turns out that intuition is wrong: insisting that the representation contains all the information about the input is a bad idea, OK? I didn't know this at the time.
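For reference, the setup being described, as a minimal sketch (hypothetical sizes, not the historical code): the reconstruction loss is exactly what forces the representation to keep all of the input's information.

```python
import torch
import torch.nn as nn

# A plain autoencoder: encode the input, then decode it back.
enc = nn.Sequential(nn.Linear(784, 64), nn.ReLU(), nn.Linear(64, 16))
dec = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 784))

x = torch.randn(32, 784)   # stand-in input batch
z = enc(x)                 # the learned representation
x_hat = dec(z)
# Minimizing reconstruction error pushes z to retain all information
# about x, the very property the speaker is calling a bad idea.
recon_loss = ((x_hat - x) ** 2).mean()
```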
So we worked on that, and there were several ways of doing it: you know, Geoff Hinton at the time was working on restricted Boltzmann machines; Yoshua Bengio was working on denoising autoencoders, which actually became quite successful in different contexts, right, for NLP among others; and I was working on sparse autoencoders. So basically, you know, if you train an autoencoder, you need to regularize the representation so that the autoencoder does not trivially learn an identity function. And this is the information bottleneck; the podcast's listeners know about the information bottleneck, right? You need to create an information bottleneck to limit the information content of the representation, and I thought high-dimensional sparse representations were actually a good way to go.
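Concretely, one standard way to build such a bottleneck (an illustrative sketch, not the original sparse-coding models) is an L1 penalty on the code: even a high-dimensional representation cannot carry much information if most of its units are pushed to zero, so the autoencoder can no longer get away with an identity function.

```python
import torch
import torch.nn as nn

enc = nn.Linear(784, 256)   # code wider than a narrow bottleneck would be...
dec = nn.Linear(256, 784)

x = torch.randn(32, 784)
z = torch.relu(enc(x))
# ...but the L1 term keeps most code units at zero, limiting the
# information content of the representation.
loss = ((dec(z) - x) ** 2).mean() + 1e-3 * z.abs().mean()
```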
So a bunch of my students did their PhDs on this: Koray Kavukcuoglu, who is now the chief AI architect at Alphabet, and also the CTO, actually did his PhD here on this with me, and, you know, a few other folks, Marc'Aurelio Ranzato and a few others. So this was kind of the idea. The reason why we worked on this was because we wanted to pretrain very deep neural nets by pretraining them as stacked autoencoders; we thought that was the way to go. What happened, though, was that we started, you know, experimenting with things like normalization, and rectification instead of hyperbolic tangents or sigmoids, like ReLUs, and that ended up, you know, basically allowing us to train fairly deep networks completely supervised, without self-supervised learning. This was at the same time that datasets started to get bigger, and so it turned out, you know, supervised learning worked fine, so the whole idea of self-supervised or unsupervised learning was put aside. And then came ResNets, and, you know, that completely solved the problem of training very deep architectures. That was in 2015.
But then in 2015 I started, you know, thinking again about how we push towards human-level AI, which really was the original objective of FAIR, and my objective, my life's mission, you know. And I realized that all the approaches of reinforcement learning and things of that type were basically not scaling; you know, reinforcement learning is incredibly inefficient in terms of samples, and so this is not the way to go. And so, the idea of world models, right, a system that can predict the consequences of its actions so that it can plan: I started playing with this around 2015, 2016. My keynote at what was still called NIPS at the time, in 2016, was on world models. I was arguing for it; that was basically the centerpiece of my talk: this is what we should be working on, you know, world models that are action-conditioned. We had residents working on this, on video prediction and things like that; we had some papers on video prediction in 2016. And I made the same mistake as before, and the same mistake that everybody is making at the moment, which is training a video prediction system to predict at the pixel level, which is really impossible: you can't really represent useful probability distributions on the space of video frames. Those things don't work. I knew for a fact that, because the prediction was nondeterministic, we had to have a model with latent variables to represent all the stuff you don't know about the variable you're supposed to predict.
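Schematically (hypothetical shapes and names), a latent-variable predictor is a deterministic function of the observed past x and a latent z; varying z sweeps over the plausible futures that x alone does not determine.

```python
import torch
import torch.nn as nn

class LatentPredictor(nn.Module):
    """y_hat = g(x, z): z carries whatever about y the past x doesn't determine."""
    def __init__(self, x_dim=64, z_dim=8, y_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(x_dim + z_dim, 128), nn.ReLU(), nn.Linear(128, y_dim))

    def forward(self, x, z):
        return self.net(torch.cat([x, z], dim=-1))

model = LatentPredictor()
x = torch.randn(4, 64)                                     # observed past
futures = [model(x, torch.randn(4, 8)) for _ in range(3)]  # three sampled plausible futures
```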
And so we experimented with this for years. I had a student here, who is now a scientist at FAIR, Mikael Henaff, who developed a video prediction system with latent variables, and he kind of solved the problems we were facing, slightly. I mean, today the solution that a lot of people are employing is diffusion models, which are essentially a way to train a nondeterministic function, or energy-based models, which I have been advocating for decades now, and which are also another way of training nondeterministic functions. But in the end I discovered that, really, the way to get around the fact that you can't predict at the pixel level is to just not predict at the pixel level: it's to map to another representation and predict at the representation level, eliminating all the details you cannot predict. And I wasn't really thinking about those methods early on, because I thought there was a huge problem of preventing collapse.

So I'm sure Randall talked about this, but, you know, when you train, let's say, you have an observed variable X and you're trying to predict a variable Y, but you don't want to predict all the details, right? You run X and Y through encoders, so now you have both a representation for X, s_x, and a representation for Y, s_y, and you can train a predictor to predict the representation of Y from the representation of X. But if you train this whole thing end-to-end simultaneously, there is a trivial solution where the system ignores the input and produces constant representations, and then the prediction problem, as you probably know, is trivial, right? So if the only criterion to train the system is to minimize the prediction error, it's not going to work; it's going to collapse.
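A toy demonstration of the collapse (not anyone's training recipe): when the prediction error between the two representations is the only training signal, an encoder that ignores its input is a global minimum of the loss.

```python
import torch
import torch.nn as nn

enc = nn.Linear(16, 4)    # shared encoder for X and Y
pred = nn.Linear(4, 4)    # predicts s_y from s_x

x, y = torch.randn(64, 16), torch.randn(64, 16)
loss = ((pred(enc(x)) - enc(y)) ** 2).mean()   # the naive end-to-end objective

# Degenerate solution: a zeroed encoder maps every input to the same
# constant, the predictor only has to output that constant, and the loss
# hits zero while the representations carry no information at all.
with torch.no_grad():
    enc.weight.zero_(); enc.bias.zero_()
    pred.weight.zero_(); pred.bias.zero_()
    print(((pred(enc(x)) - enc(y)) ** 2).mean())   # tensor(0.)
```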
I knew about this problem for a very long time, because I worked on joint embedding architectures; we used to call them Siamese networks back in the nineties.