Lex Fridman Podcast E333: Conversation with AI Pioneer Andre Karpathy

Lex Fridman Podcast E333
Created by Podalize
October 30, 2022

1 Transcript
Lex 00:00
The following is a conversation with Andrej Karpathy, previously the Director of
AI at Tesla, and before that at OpenAI and Stanford. He is one of the greatest
scientists, engineers, and educators in the history of artificial intelligence. And
now, a quick few-second mention of each sponsor. Check them out in the
description; it’s the best way to support this podcast. We got 8Sleep for naps,
BetterHelp for mental health, Fundrise for real estate investing, and Athletic
Greens for nutrition. Choose wisely, my friends. And now, onto the full ad
reads. As always, no ads in the middle. I try to make this interesting, but if
you skip them, please still check out our sponsors. I enjoy their stuff, maybe
you will too. This episode is sponsored by 8Sleep and its new Pod3 mattress.
I am recording this in a hotel. In fact, given some complexities of my life, this
is the middle of the night, 4 a.m. I’m sitting in an empty hotel room, yelling at
a microphone. This, my friends, is my life. I do usually feel good about myself
at 4 a.m., but not with two cups of coffee in me. And the reason I feel good is
because I’m going to go to sleep soon and I’ve accomplished a lot. This is true
today, except for the sleep soon part, because I think I’m going to an airport
at some point soon. It doesn’t matter. What matters is I’m not even gonna
sleep here. And that’s great, because in a hotel, I don’t have an 8Sleep bed
that can cool itself. At home, I do, and that’s where I’m headed. I’m headed
home. Anyway, check it out and get special savings when you go to 8Sleep.com
slash Lex. This episode is also brought to you by BetterHelp, spelled H-E-L-P,
help. I’m a huge fan of talk therapy. I think of podcasting as a kind of talk
therapy. So I’m a huge fan of listening to podcasts. In fact, that’s how I think
of doing a podcast myself. I just get to have front row seats to a thing I love.
And it’s actually just the process of talking that reveals something about the
mind. I think that’s what good talk therapy is: guided by a professional
therapist, it helps you reveal to yourself something about your mind. Just lay
it all out on the table. So yeah, you should definitely use the best method of
talk therapy, the best meaning the most accessible. At least try it, if not
make it a regular part of your life. That’s what BetterHelp does. Check them
out at betterhelp.com slash Lex and save on your first month. This episode
is also brought to you by Fundrise, spelled F-U-N-D-R-I-S-E. It’s a platform
that allows you to invest in private real estate. We live in hard times, folks, for
many different reasons, but one of them is financial. And one way to protect
yourself in difficult times is to diversify your investments. Private real estate is
one of the things, I believe, you should diversify into. And when you do, you
should use tools that look like they’re made in the 21st century, whereas a lot
of investment services, even online investment websites, seem to be
designed by the same people who designed the original ATMs. That’s not the
case with Fundrise. Super easy to use, accessible, over 150,000 investors use it,
their team vets and manages all their real estate projects. You can track your
portfolio’s performance on their website and see updates as properties across the
country are acquired, improved, and operated. Anyway, check out Fundrise. It
takes just a few minutes to get started at fundrise.com slash Lex. This show is
brought to you by Athletic Greens and its AG1 drink, which is an all-in-one
daily drink to support better health and peak performance. I have to be
honest, I completely forgot to bring Athletic Greens with me as I’m traveling
now, and I miss it. It’s not just good for my nutritional base and needs, it’s
good for my soul. It’s part of the daily habit of life. And when you
don’t have that habit, the routine stuff is off. So it’s good to just put that into
your daily routine to make sure that you’re getting the vitamins, the nutrition
that you need, no matter the diet, the workload, the athletic endeavors that
you partake in. I don’t know, it’s kind of incredible. And yeah, that’s what
Athletic Greens is for me. They’ll give you one month’s supply of fish oil when
you sign up at athleticgreens.com slash Lex. This is the Lex Fridman Podcast.
To support it, please check out our sponsors. And now, dear friends, here’s
Andrej Karpathy. What is a neural network? And why does it seem to do such
a surprisingly good job of learning?
Andrej 05:48
What is a neural network? It’s a mathematical abstraction of the brain, I would
say; that’s how it was originally developed. At the end of the day, it’s a
mathematical expression, and it’s a fairly simple mathematical expression when you
get down to it. It’s basically a sequence of matrix multiplies, which are really
dot products mathematically, with some non-linearities thrown in. And so it’s
a very simple mathematical expression, and it’s got knobs in it. Many knobs.
Many knobs. And these knobs are loosely related to basically the synapses in
your brain. They’re trainable, they’re modifiable. And so the idea is, we
need to find the setting of the knobs that makes the neural net do whatever
you want it to do, like classify images and so on. And so there’s not too much
mystery in it, I would say. I basically don’t want to endow
it with too much meaning with respect to the brain and how it works. It’s really
just a complicated mathematical expression with knobs, and those knobs need
a proper setting for it to do something desirable.
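A minimal sketch of the description above, assuming a tiny two-layer network with made-up weights: each layer is a matrix multiply (a set of dot products) plus a bias, the hidden layer passes through a tanh non-linearity, and every number in the weight lists is one of the trainable “knobs.” The function names and values are illustrative only, written in plain Python rather than any particular deep learning library.

```python
import math

def matvec(W, x):
    # Each output entry is the dot product of one row of W with the input x.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def add(u, v):
    # Elementwise vector addition (used for the bias terms).
    return [a + b for a, b in zip(u, v)]

def mlp_forward(x, W1, b1, W2, b2):
    # Layer 1: matrix multiply, add bias, then an elementwise tanh non-linearity.
    h = [math.tanh(z) for z in add(matvec(W1, x), b1)]
    # Layer 2: another matrix multiply plus bias; no non-linearity on the output.
    return add(matvec(W2, h), b2)

# The "knobs": every number below is a trainable parameter (values are arbitrary).
W1 = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.8]]  # 3 hidden units x 2 inputs
b1 = [0.0, 0.1, -0.1]
W2 = [[1.0, -1.0, 0.5]]                      # 1 output x 3 hidden units
b2 = [0.2]

n_params = sum(len(r) for r in W1) + len(b1) + sum(len(r) for r in W2) + len(b2)
print(n_params)  # 13 knobs in this tiny network
y = mlp_forward([1.0, 2.0], W1, b1, W2, b2)
print(y)
```

Production networks have the same shape of computation, just with millions or billions of knobs and batched matrix operations on accelerators.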
Lex 06:43
Yeah, but poetry is just a collection of letters with spaces. But it can make
us feel a certain way. And in that same way, when you get a large number of
knobs together, whether it’s inside the brain or inside a computer, they seem to
surprise us with their power.
Andrej 07:00
Yeah, I think that’s fair. So basically, I’m underselling it by a lot because you
definitely do get very surprising emergent behaviors out of these neural nets
when they’re large enough and trained on complicated enough problems. Say,
for example, next-word prediction on a massive dataset from the internet.
And then these neural nets take on pretty surprising magical properties.
Yeah, I think it’s kind of interesting how much you can get out of even very
simple mathematical formalism.
Lex 07:26
When your brain right now is talking, is it doing next word prediction? Or is
it doing something more interesting?
Andrej 07:33
Well, it’s definitely some kind of a generative model that’s GPT-like and
prompted by you. So you’re giving me a prompt and I’m kind of
responding to it in a generative way.
Lex 07:42
And by yourself, perhaps a little bit? Like are you adding extra prompts from
your own memory inside your head?
Andrej 07:50
Or no? Or is it like you’re referencing some kind of a declarative structure of like
memory and so on? And then you’re putting that together with your prompt
and giving away some answer.
Lex 08:01
How much of what you just said has been said by you before?
Andrej 08:06
Nothing basically, right?
Lex 08:07
No, but if you actually look at all the words you’ve ever said in your life and
you do a search, you’ll probably have said a lot of the same words in the same
order before.
Andrej 08:18
Yeah, could be. I mean, I’m using phrases that are common, et cetera, but I’m
remixing it into a pretty sort of unique sentence at the end of the day. But
you’re right, definitely there’s like a ton of remixing.
Lex 08:28
It’s like Magnus Carlsen saying, I’m rated 2900, whatever, which is pretty
decent. I think you’re not giving enough credit to neural nets here. Why do
they seem to work so well? What’s your best intuition about this
emergent behavior?
Andrej 08:49
I mean, it’s kind of interesting because I’m simultaneously underselling them,
but I also feel like it’s actually kind
of incredible that you can get so much emergent magical behavior out of them
despite them being so simple mathematically. So I think those are kind of like
two surprising statements that are juxtaposed together. And I think
basically what it is, is we are actually fairly good at optimizing these neural
nets. And when you give them a hard enough problem, they are forced to learn
very interesting solutions in the optimization. And those solutions basically have
these emergent properties that are very interesting.
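“Fairly good at optimizing” usually means some form of gradient descent: repeatedly nudging each knob in the direction that lowers a loss. A minimal sketch, assuming a two-knob linear model fitted to toy data generated from y = 2x + 1 with plain full-batch gradient descent on mean squared error. Real networks compute gradients with backpropagation and typically use variants such as SGD or Adam; the data, learning rate, and step count here are arbitrary choices for illustration.

```python
# Toy data generated from y = 2x + 1 (an assumption chosen for illustration).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * x + 1.0 for x in xs]

w, b = 0.0, 0.0  # the two "knobs", starting from an arbitrary setting
lr = 0.02        # learning rate (step size), hand-picked for this toy problem
n = len(xs)

for step in range(2000):
    # Loss L = mean((w*x + b - y)^2); compute dL/dw and dL/db analytically.
    errs = [w * x + b - y for x, y in zip(xs, ys)]
    dw = sum(2 * e * x for e, x in zip(errs, xs)) / n
    db = sum(2 * e for e in errs) / n
    # Move each knob a small step against its gradient.
    w -= lr * dw
    b -= lr * db

print(w, b)  # converges to approximately 2.0 and 1.0
```

The optimizer never “knows” the rule y = 2x + 1; it only follows the loss downhill until the knob settings reproduce it.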
Lex 09:22
There’s wisdom and knowledge in the knobs. And so this representation that’s
in the knobs, does it make sense to you intuitively that a large number of knobs
can hold a representation that captures some deep wisdom about the data it
has looked at? It’s a lot of knobs.
Andrej 09:42
It’s a lot of knobs. And somehow, you know, so speaking concretely, one of the
neural nets that people are very excited about right now are GPTs, which are
basically just next word prediction networks. So you consume a sequence of
words from the internet and you try to predict the next word. And once you
train these on a large enough data set, you can basically prompt these neural
nets in arbitrary ways and you can ask them to solve problems and they will.
So you can just tell them, you can make it look like you’re trying to solve some
kind of a mathematical problem and they will continue what they think is the
solution based on what they’ve seen on the internet. And very often those
solutions look remarkably consistent, potentially even correct.
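As a toy stand-in for “consume a sequence of words and try to predict the next word,” here is a bigram counter that continues a prompt greedily. This is not how a GPT works internally (a GPT is a large Transformer trained by gradient descent on next-token prediction); it only illustrates the predict-then-append loop used when prompting. The corpus and function names are hypothetical.

```python
from collections import Counter, defaultdict

def train_bigram(text):
    # For each word, count which words follow it in the training text.
    words = text.split()
    follows = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        follows[prev][nxt] += 1
    return follows

def continue_prompt(follows, prompt, n_words):
    # Greedy decoding: repeatedly append the most frequent next word.
    out = prompt.split()
    for _ in range(n_words):
        options = follows.get(out[-1])
        if not options:
            break  # the model has never seen this word followed by anything
        out.append(options.most_common(1)[0][0])
    return " ".join(out)

corpus = "the cat sat on the mat . the cat ate the fish ."
model = train_bigram(corpus)
print(continue_prompt(model, "the", 3))  # the cat sat on
```

Ties (“cat” is followed once by “sat” and once by “ate”) are broken by first occurrence, which is how `Counter.most_common` orders equal counts.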
Lex 10:27
Do you still think about the brain side of it? Since neural nets are an
abstraction, a mathematical abstraction of the brain, do you still draw wisdom from
biological neural networks? Or, the even bigger question: you’re a big fan of
biology and biological computation. What impressive thing is biology doing, to
you, that computers are not yet? That gap.
Andrej 10:53
I would say I’m much more hesitant with the analogies to
the brain than I think you would see potentially in the field. And I kind of
feel like, certainly the way neural networks started, everything stemmed from
inspiration by the brain. But at the end of the day, the artifacts that you get
after training are arrived at by a very different optimization process than
the optimization process that gave rise to the brain. And so I kind of
think of it as a very complicated alien artifact. It’s something different. I’m
sorry, I mean the neural nets that we’re training: they are complicated alien artifacts.
I do not make analogies to the brain because I think the optimization process
that gave rise to them is very different from the one that gave rise to the brain.
There was no multi-agent self-play kind of setup and evolution. It was an optimization
that basically amounts to a compression objective on a massive amount of data.
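The link between prediction and compression can be made concrete: an ideal entropy coder (for example, arithmetic coding) spends about −log2 p bits on a symbol that the model assigned probability p, so a better next-symbol predictor yields a shorter encoding. The sketch below compares two hand-made models on a toy string; the string and the probabilities are illustrative assumptions, not anything from an actual training setup.

```python
import math

def code_length_bits(seq, prob):
    # Ideal code length: sum of -log2 p(symbol) over the sequence.
    return sum(-math.log2(prob(sym)) for sym in seq)

seq = "aaaabbcd"

def uniform(sym):
    # Model 1 knows nothing: each of the 4 symbols gets probability 1/4.
    return 0.25

FREQS = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}

def fitted(sym):
    # Model 2 predicts the actual symbol frequencies in seq.
    return FREQS[sym]

print(code_length_bits(seq, uniform))  # 16.0 bits (2 bits per symbol)
print(code_length_bits(seq, fitted))   # 14.0 bits
```

This toy model ignores context; language models condition each probability on all preceding tokens, but the accounting of bits is the same, which is why the training loss (cross-entropy) can be read as a compression rate.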
Lex 11:47
Okay, so artificial neural networks are doing compression, and biological neural
networks are trying to survive. And they’re not really doing anything; they’re
an agent in a multi-agent self-play system that’s been running for a very, very
long time.
Andrej 12:03
That said, evolution has found that it is very useful to predict and have a
predictive model in the brain. And so I think our brain utilizes something that
looks like that as a part of it. But it has a lot more, you know, gadgets and
gizmos and value functions and ancient nuclei that are all trying to make
you survive and reproduce and everything else.
Lex 12:24
And the whole thing, through embryogenesis, is built from a single cell. I mean,
the code is inside the DNA and it just builds up the entire
organism with arms and a head and legs. And it does it pretty well. It
should not be possible. So there’s some learning going on; there’s some kind of
computation going on through that building process. I mean, I don’t know. If
you were to look at the entirety of the history of life on Earth, what do you
think is the most interesting invention? Is it the origin of life itself? Is it
the jump to eukaryotes? Is it mammals? Is it humans themselves, Homo
sapiens? The origin of intelligence, or highly complex intelligence? Or is it all just
a continuation of the same kind of process?
Andrej 13:20
Certainly I would say it’s an extremely remarkable story that I’ve only
recently started learning about. All the way from, actually, you almost have
to start at the formation of Earth and all of its conditions and the entire solar
system and how everything is arranged, with Jupiter and the Moon and the
habitable zone and everything. And then you have an active Earth that’s turning
over material. And then you start with abiogenesis and everything. And so
it’s all a pretty remarkable story. I’m not sure that I can pick a single
unique piece of it that I find most interesting. I guess for me as an artificial
intelligence researcher, it’s probably the last piece. We have lots of animals that,
you know, are not building a technological society, but we do. And it seems to
have happened very quickly. It seems to have happened very recently, and
something very interesting happened there that I don’t fully understand. I feel like I
understand almost everything else, at least intuitively, but I don’t understand
exactly that part and how quick it was.
Lex 14:22
Both explanations would be interesting. One is that this is just a continuation of
the same kind of process; there’s nothing special about humans. Deeply
understanding that would be very interesting: we think of ourselves
as special, but it was obvious. It was already written in the code that
you would have greater and greater intelligence emerging. And then the other
explanation is that something truly special happened, something like a rare
event, a crazy rare event, like in 2001: A Space Odyssey. What would it
be? Say, the invention of fire, or, as Richard Wrangham says,
the beta males figuring out a clever way to kill the alpha males by collaborating.
So just optimizing the collaboration, the multi-agent aspect of it.
And with resources really constrained and everyone trying to survive, the
collaboration aspect is what created the complex intelligence. But it seems like it’s a
natural algorithm of the evolution process. What could possibly be a magical
thing that happened? Like a rare thing that would say that
human-level intelligence is actually a really rare thing in the universe.
Andrej 15:40
Yeah, I’m hesitant to say that it is rare, by the way, but it definitely seems like
it’s kind of like a punctuated equilibrium where you have lots of exploration and
then you have certain leaps, sparse leaps in between. So of course the origin of
life would be one, DNA, sex, eukaryotic life, the endosymbiosis event where an
archaeon ate a little bacterium, just the whole thing. And then of course the emergence
of consciousness and so on. So it definitely seems like there are sparse events
where a massive amount of progress was made, but yeah, it’s kind of hard to pick
one.
Lex 16:13
So you don’t think humans are unique? Gotta ask you how many intelligent
alien civilizations do you think are out there? And is their intelligence different
or similar to ours?
Andrej 16:28
Yeah, I’ve been preoccupied with this question quite a bit recently, basically the
Fermi paradox and just thinking through. And the reason actually that I am
very interested in the origin of life is fundamentally trying to understand how
common it is that there are technological societies out there in space. And the
more I study it, the more I think that there should be quite a lot.
Lex 16:53
Why haven’t we heard from them? Because I agree with you. It feels like I just
don’t see why what we did here on Earth is so difficult to do.
Andrej 17:04
Yeah, and especially when you get into the details of it. I used to think the origin
of life was this magical rare event, but then you read books like, for
example, Nick Lane’s The Vital Question, Life Ascending, et cetera. And he really
gets in and he really makes you believe that this is not that rare; it’s basic chemistry.
You have an active Earth and you have your alkaline vents and you have lots
of alkaline waters mixing with the ocean and you have your proton gradients
and you have the little porous pockets of these alkaline vents that concentrate
chemistry. And basically as he steps through all of these little pieces, you start
to understand that actually this is not that crazy; you could see this happen
on other systems. And he really takes you from just geology to primitive life,
and he makes it feel like it’s actually pretty plausible. And also, the origin
of life was actually fairly fast after the formation of Earth. If I remember correctly,
just a few hundred million years or something like that after it
was possible, life actually arose. And so that makes me feel that that is not
the constraint, that is not the limiting variable, and that life should actually be
fairly common. And then where the drop-offs are is very interesting to think
about. I currently think that there are basically no major drop-offs, and so there
should be quite a lot of life. And basically where that brings me is that the
only way to reconcile the fact that we haven’t found anyone is that
we just can’t see them; we can’t observe them.
Lex 18:34
Just a quick brief comment, Nick Lane and a lot of biologists I talk to, they
really seem to think that the jump from bacteria to more complex organisms is
the hardest jump.
Andrej 18:45
The eukaryotic life basically.
Lex 18:46
Yeah, which I get. They’re much more knowledgeable than me about
the intricacies of biology. But that seems crazy, because how many single-cell
organisms are there? And with how much time you have, surely it’s not that
difficult. A billion years is not even that long of a time, really. Just
all these bacteria under constrained resources battling it out; I’m sure they can
invent something more complex. I don’t understand it; it’s like moving from a hello
world program to inventing a function or something like that. So
I’m with you. My intuition would be that the origin of life is the
hardest thing. But if that’s not the hardest thing, because it happened so quickly,
then it’s gotta be everywhere. And yeah, maybe we’re just too dumb to see it.
Andrej 19:39
Well, it’s just that we don’t have really good mechanisms for seeing this life. I mean,
by what, how? I’m not an expert, just to preface this, but just from what I’ve
been reading.
Overlaps 19:47
On aliens? I wanna meet an expert on alien intelligence and how to
communicate with it.
Andrej 19:53
I’m very suspicious of our ability to find these intelligences out there and to find
these Earths. Radio waves, for example, are terrible. Their power drops off
basically as one over r squared. So I remember reading that the radio
waves we are currently broadcasting would not be measurable by devices like ours
beyond, was it, a tenth of a light year away? Maybe not even that. Basically a
tiny distance, because you really need a targeted transmission of massive power
directed somewhere for this to be picked up over long distances. And so I just
think that our ability to measure is not amazing. I think there are probably other
civilizations out there. And then the big question is why don’t they build
von Neumann probes, and why don’t they travel interstellar across the entire galaxy?
And my current answer is that interstellar travel is probably really hard. You have
the interstellar medium. If you wanna move at close to the speed of light, you’re
going to be encountering bullets along the way, because even tiny hydrogen atoms
and little particles of dust basically have massive kinetic energy at those speeds.
And so basically you need some kind of shielding. You have all the cosmic
radiation. It’s just brutal out there. It’s really hard. And so my thinking is
maybe interstellar travel is just extremely hard, and you have to go very slow.
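A quick numeric sketch of the two physical effects mentioned above, under simplifying assumptions: an isotropic transmitter (real broadcasts are not isotropic, and detectability also depends on receiver sensitivity and noise) and a hypothetical 1-microgram dust grain. Received intensity falls off as P/(4πr²), and kinetic energy at relativistic speed is (γ − 1)mc² with γ = 1/√(1 − v²/c²). The transmitter power and distances are arbitrary illustration values.

```python
import math

C = 299_792_458.0  # speed of light in m/s

def intensity(power_w, distance_m):
    # Isotropic source: the power spreads over a sphere of area 4*pi*r^2 (W/m^2).
    return power_w / (4 * math.pi * distance_m ** 2)

def relativistic_ke(mass_kg, speed_fraction_of_c):
    # KE = (gamma - 1) * m * c^2, with gamma = 1 / sqrt(1 - (v/c)^2).
    gamma = 1 / math.sqrt(1 - speed_fraction_of_c ** 2)
    return (gamma - 1) * mass_kg * C ** 2

# Doubling the distance cuts the received intensity to a quarter.
ratio = intensity(1e6, 2e16) / intensity(1e6, 1e16)
print(ratio)  # approximately 0.25

# Hypothetical 1-microgram (1e-9 kg) dust grain met at 20% of light speed.
ke = relativistic_ke(1e-9, 0.2)
print(f"{ke:.3e} J")  # roughly 1.85 megajoules
```

A single speck of dust carrying megajoules of energy relative to the ship is why shielding becomes a central problem at a meaningful fraction of light speed.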
Lex 21:04
Hard as in billions of years to build? It feels like we’re not a billion
years away from doing that.
Andrej 21:12
It just might be that you have to go very slowly through space, potentially, as an
example.
Lex 21:18
Right, as opposed to close to the speed of light.
Andrej 21:20
So I’m suspicious basically of our ability to measure life, and I’m suspicious of
the ability to just permeate all of space in the galaxy or across galaxies. And
that’s the only way that I can currently see a way around it.
Lex 21:31
Yeah, it’s kind of mind-blowing to think that there are trillions
of intelligent alien civilizations out there kind of slowly traveling through space
to meet each other. And some of them meet, some of them go to war, some of
them collaborate.
Andrej 21:48
Or they’re all just independent. They’re all just like little pockets.
Lex 21:53
Well, statistically, if there are trillions of them, surely some of
the pockets are close enough together. Some of them happen to be
close, yeah, and close enough to see each other. And then once you see
something that is definitely complex life, like if we see something, we’re
probably going to be intensely, aggressively motivated to figure out
what the hell that is and try to meet them. What would be your first instinct:
to try, at a generational level, to meet them, or to defend against them? What
would be your instinct as a president of the United States and a scientist? I
don’t know which hat you prefer in this question.
Andrej 22:39
Yeah, I think the question is really hard. I will say, for example, for us,
we have lots of primitive life forms on Earth next to us. We have all kinds of
ants and everything else, and we share space with them, and we are hesitant to
impact them, and we’re trying to protect them by default because they are
amazing, interesting dynamical systems that took a long time to evolve, and
they are interesting and special. And I don’t know that you want to destroy
that by default. And so I like complex dynamical systems that took a lot of
time to evolve. I think I’d like to preserve them if I can afford to. And I’d like to
think that the same would be true of the galactic residents, and that they
would think that we’re a kind of incredible, interesting story that took time; it
took a few billion years to unravel, and you don’t want to just destroy it.
Lex 23:33
I could see two aliens talking about Earth right now and saying, I’m a big fan
of complex dynamical systems, so I think there’s value in preserving these. And
we’re basically a video game they watch, or a TV show that they watch.
Andrej 23:48
Yeah, I think you would need like a very good reason I think to destroy it. Like
why don’t we destroy these ant farms and so on? It’s because we’re not actually
like really in direct competition with them right now. We do it accidentally and
so on, but there’s plenty of resources. And so why would you destroy something
that is so interesting and precious?
Lex 24:06
Well, from a scientific perspective, you might probe it. You might interact with
it lightly.
Andrej 24:11
You might want to learn something from it, right?
Lex 24:13
So I wonder, there could be certain physical phenomena that we think are just
physical phenomena, but it’s actually them interacting with us, poking a finger to
see what happens.
Andrej 24:22
I think what happened here should be very interesting to scientists, alien
scientists. And what we’re seeing today is a snapshot. Basically, it’s the
result of a huge amount of computation over a billion years or something like
that.
Lex 24:36
So it could have been initiated by aliens. This could be a computer running a
program. Okay, if I had the power to do this, for sure, at
least I would pick an Earth-like planet that has the conditions, based
on my understanding of the chemistry prerequisites for life, and I would seed
it with life and run it, right? Like, wouldn’t you 100% do that and observe it
and then protect it? I mean, that’s not just a hell of a good TV show. It’s a good
scientific experiment. It’s a physical simulation, right? Maybe evolution,
actually running it, is the most efficient way to understand
computation, or to compute stuff.
Andrej 25:25
Or to understand life or what life looks like and what branches it can take.
Lex 25:30
It does make me kind of feel weird that we’re part of a science experiment, but
maybe everything’s a science experiment. Does that change anything for us?
If we’re a science experiment? I don’t know. Two descendants of apes talking
about being inside of a science experiment.
Andrej 25:46
I’m suspicious of this idea of a deliberate panspermia, as you described it.
And I don’t see divine intervention in some way in the historical
record right now. I do feel like the story in these books, like Nick Lane’s books
and so on, sort of makes sense. And it makes sense how life arose on Earth
uniquely. And yeah, I don’t need to reach for more exotic explanations right
now.
Lex 26:09
Sure, but NPCs inside a video game don’t observe any divine intervention
either. We might all just be NPCs running a kind of code.
Andrej 26:19
Maybe eventually they will. Currently NPCs are really dumb, but once they’re
running GPTs, maybe they will be like, hey, this is really suspicious, what the
hell?
Lex 26:27
So you famously tweeted: it looks like if you bombard Earth with photons for a
while, it can emit a Roadster. So, like in The Hitchhiker’s Guide to the Galaxy,
we would summarize the story of Earth; in that book, it’s mostly harmless.
What do you think are all the possible stories, a paragraph long or a sentence
long, that Earth could be summarized as, once it’s done with its computation? So
if Earth is a book, right? Probably there has to be an
ending. I mean, there’s going to be an end to Earth, and it could end in all kinds
of ways. It can end soon, it can end later. What do you think are the possible
stories?
Andrej 27:11
Well, definitely, it’s pretty incredible that
these self-replicating systems will basically arise from the dynamics and then
they perpetuate themselves and become more complex and eventually become
conscious and build a society. And I kind of feel like in some sense, it’s kind
of like a deterministic wave that kind of just like happens on any sufficiently
well-arranged system like Earth. And so I kind of feel like there’s a certain sense
of inevitability in it and it’s really beautiful.
Lex 27:44
And it ends somehow, right? So it’s a chemically diverse environment where
complex dynamical systems can evolve and become further and further
complex. But then there are certain terminating conditions.
Andrej 28:06
Yeah, I don’t know what the terminating conditions are,
but definitely there’s a trend line of something, and we’re part of that story.
And like, where does it go? So, we’re often famously described as a biological
bootloader for AIs. And that’s because humans, I mean, we’re an incredible
biological system and we’re capable of computation and love and so on, but we’re
extremely inefficient as well. Like, we’re talking to each other through audio;
it’s just kind of embarrassing, honestly, that we’re manipulating
like seven symbols serially, we’re using vocal cords, it’s all happening over
multiple seconds. It’s just kind of embarrassing when you step down to the
frequencies at which computers operate or are able to operate. And so
basically it does seem like synthetic intelligences are kind of like the next stage
of development. And I don’t know where it leads to; at some point I suspect
the universe is some kind of a puzzle and these synthetic AIs will uncover that
puzzle and solve it. And then what happens after, right?
Lex 29:10
Like what, ’cause if you just fast-forward Earth many billions of years, it’s
quiet, and then it’s like, turmoil, you see city lights and stuff
like that. And then what happens at the end? Is it a
calming? Is it an explosion? Is it like Earth opening up, like a giant... ’cause you said
emit Roadsters, will it start emitting a giant number of satellites?
Andrej 29:40
Yes, it’s some kind of a crazy explosion,
and we’re stepping through an explosion and we’re
living day to day and it doesn’t look like it. But I saw a
very cool animation of Earth and life on Earth, and basically nothing happened,
and then in the last like two seconds, basically cities and everything, and
low Earth orbit just gets cluttered, and the whole thing happens in the last two
seconds. And you’re like, this is exploding. This is a state of explosion. So if
you play it, yeah, yeah.
Lex 30:05
If you play it at normal speed, it will just look like an explosion. It’s a
firecracker. We’re living in a firecracker.
Andrej 30:12
Where it’s going to start emitting
Lex 30:13
all kinds of interesting things. And so the explosion might actually
look like a little explosion with lights and fire and energy emitted, all that kind
of stuff. But when you look inside the details of the explosion, there’s actual
complexity happening, where there’s, yeah, human life or some kind of life.
We hope it’s not a destructive firecracker.
Andrej 30:35
It’s kind of like a constructive firecracker.
Lex 30:40
All right, so given that, and I think it’s a hilarious discussion, it is really
interesting to think about what the puzzle of the universe is.
Andrej 30:45
Did the creator of the universe give us a puzzle? Did the creator
of the universe give us a message? For example, in Carl Sagan’s book Contact,
there’s a message for any civilization in the digits, in the expansion of
pi in base 11 eventually, which is kind of an interesting thought. Maybe we’re
supposed to be giving a message to our creator. Maybe we’re supposed to
somehow create some kind of a quantum mechanical system that alerts them to our
intelligent presence here. Because if you think about it from their perspective,
it’s just, say, quantum field theory, a massive cellular-automaton-like
thing. And how do you even notice that we exist? You might not even be
able to pick us up in that simulation. And so how do you prove that you exist,
that you’re intelligent, and that you’re part of the universe?
Lex 31:31
So this is like a Turing test for intelligence from Earth. Yeah. Like the creator
is, I mean, maybe this is like trying to complete the next word in a sentence.
This is a complicated way of doing that. Earth is basically sending a
message back.
Andrej 31:45
Yeah, the puzzle is basically alerting the creator that we exist. Or maybe
the puzzle is just to break out of the system and stick it to the creator
in some way. Basically, if you’re playing a video game, you can somehow
find an exploit and find a way to execute arbitrary code on the host machine.
For example, I believe someone got a game of Mario to
play Pong just by exploiting it, basically writing code and
being able to execute arbitrary code in the game. And so maybe
that’s the puzzle: we should find a way to exploit it. So I think
some of these synthetic AIs will eventually find the universe to be some
kind of a puzzle and then solve it in some way. And that’s kind of like the end
game somehow.
Lex 32:31
Do you often think about it as a simulation? So as the universe being a kind of
computation that might have bugs and exploits? Yes. Yeah, I think so.
Andrej 32:43
Is that what physics is essentially? I think it’s possible that physics has exploits
and we should be trying to find them. Arranging some kind of a crazy quantum
mechanical system that somehow gives you a buffer overflow, somehow gives you
a rounding error in the floating point.
Lex 32:58
Yeah, that’s right. And like more and more sophisticated exploits. Like those
are jokes, but that could actually be very close to reality.
Andrej 33:05
Yeah, we’ll find some way to extract infinite energy. For example, when you
train reinforcement learning agents in physical simulations and you ask them
to, say, run quickly on flat ground, they’ll end up doing all kinds of weird
things as part of that optimization, right? They’ll get on their back legs and
slide across the floor. And it’s because the reinforcement learning
optimization on that agent has figured out a way to extract infinite energy
from the friction forces, basically from their poor implementation. And they
found a way to generate infinite energy and just slide across the surface. And
it’s not what you expected. It’s just sort of like a perverse solution. And so
maybe we can find something like that. Maybe we can be that little dog in this
physical simulation.
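The example here is agents exploiting a buggy physics implementation. As a minimal sketch of the same kind of bug (not the friction issue from the anecdote, whose details aren’t given), here is a hypothetical unit mass on a spring stepped with a naive explicit-Euler integrator, which quietly creates energy on every step, next to a symplectic-Euler step that does not. The integrators, step size, and step count are illustrative assumptions.

```python
def explicit_euler(x, v, dt):
    # Naive update: position and velocity are both advanced using the *old* state.
    return x + dt * v, v - dt * x


def symplectic_euler(x, v, dt):
    # Update velocity first, then move the position with the *new* velocity.
    v_new = v - dt * x
    return x + dt * v_new, v_new


def energy_after(integrator, steps=1000, dt=0.1):
    # Unit mass on a unit spring: the true dynamics conserve E = (x^2 + v^2) / 2.
    x, v = 1.0, 0.0
    for _ in range(steps):
        x, v = integrator(x, v, dt)
    return 0.5 * (x * x + v * v)


print("start energy:     ", 0.5)
print("explicit Euler:   ", energy_after(explicit_euler))    # grows every step
print("symplectic Euler: ", energy_after(symplectic_euler))  # stays close to 0.5
```

In exact arithmetic the explicit update multiplies the energy by (1 + dt²) per step, so with dt = 0.1 it grows roughly 20,000-fold over 1,000 steps: free energy of exactly the kind an optimizer could learn to exploit.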
Lex 33:46
The cracks or escapes, the unintended consequences of the physics that the
universe came up with. We’ll figure out some kind of shortcut to some weirdness.
And then, oh man, but see the problem with that weirdness is the first person
to discover the weirdness, like sliding on the back legs, that’s all we’re gonna
do. Very quickly, everybody does that thing. So the paperclip maximizer is a
ridiculous idea, but that very well could be what we’ll just, we’ll just all
switch to because it’s so fun.
Andrej 34:20
Well, no person will discover it, I think, by the way. I think it’s going to
have to be some kind of a superintelligent AGI of a third generation. Like we’re
building the first generation AGI. I mean, you know.
Lex 34:34
Third generation. Yeah, so the bootloader for an AI, that AI will be a bootloader
for another AI. That AI, yeah. And then there’s no way for us to introspect
what that might even be.
Andrej 34:48
I think it’s very likely that these things, for example, like say you have these
AGIs, it’s very likely that, for example, they will be completely inert. I like
these kinds of sci-fi books sometimes where these things are just completely
inert. They don’t interact with anything. And I find that kind of beautiful
because they’ve probably figured out the meta game of the universe in some
way potentially. They’re doing something completely beyond our imagination.
And they don’t interact with simple chemical life forms. Like why would you
do that? So I find those kinds of ideas compelling.
Lex 35:17
What’s their source of fun? What are they doing? What’s the source of
pleasure?
Andrej 35:21
Well, probably puzzle solving in the universe.
Lex 35:23
But inert, so can you define what you mean by inert? So they escape the
interaction with physical reality?
Andrej 35:29
They will appear inert to us, as in they will behave in some very strange way to
us because they’re beyond us, they’re playing the meta game. And the meta game
is probably, say, arranging quantum mechanical systems in some very weird
ways to extract infinite energy, or computing the digit expansion of pi to whatever
amount. They will build their own little fusion reactors or something crazy.
They’re doing something beyond comprehension, not understandable to us, and
actually brilliant under the hood.
Lex 36:01
What if quantum mechanics itself is the system, and we’re just thinking it’s
physics, but we’re really parasites on, not parasites, we’re not really hurting
physics. We’re just living on this organism and we’re trying to understand it,
but really it is an organism with a deep, deep intelligence. Maybe physics itself
is the organism that’s doing the super interesting thing, and we’re just one
little thing, an ant sitting on top of it trying to get energy from it.
Andrej 36:36
We’re just kind of like these particles in the wave that I feel like is mostly
deterministic and takes a universe from some kind of a big bang to some kind of
a super intelligent replicator, some kind of a stable point in the universe given
these laws of physics.
Lex 36:50
So you think, as Einstein said, God doesn’t play dice? You think it’s mostly
deterministic? There’s no randomness in the thing?
Andrej 36:57
I think it’s deterministic. Oh, there’s tons of, well, I wanna be careful with
randomness. Pseudorandom? Yeah, I don’t like random. I think maybe the
laws of physics are deterministic. Yeah, I think they’re deterministic.
Lex 37:09
You just got really uncomfortable with this question. Do you have anxiety
about whether the universe is random or not? Is this a source? There’s no
randomness. You said you like Good Will Hunting. It’s not your fault, Andrej.
It’s not your fault, man. So you don’t like randomness?
Andrej 37:29
Yeah, I think it’s unsettling. I think it’s a deterministic system. I think that
things that look random, like, say, the collapse of the wave function, et cetera,
I think they’re actually deterministic, just entanglement and so on. And some
kind of a multiverse theory, something, something.
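The “pseudorandom” aside has a concrete computing analogue: a pseudorandom number generator is fully deterministic yet produces output that looks random. A minimal sketch using a linear congruential generator; the constants a = 1664525, c = 1013904223, m = 2^32 are a commonly cited textbook choice, not anything from the conversation, and this generator is far too weak for cryptographic use.

```python
from itertools import islice


def lcg(seed, a=1664525, c=1013904223, m=2**32):
    """Linear congruential generator: x_{k+1} = (a * x_k + c) mod m, scaled to [0, 1)."""
    x = seed
    while True:
        x = (a * x + c) % m
        yield x / m


first = list(islice(lcg(seed=42), 5))
again = list(islice(lcg(seed=42), 5))
other = list(islice(lcg(seed=43), 5))

print(first)           # looks like noise...
print(first == again)  # True: same seed, same "random" sequence
```

Everything that looks random here is fixed by the seed; knowing the state and the rule lets you predict every value.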
Lex 37:43
Okay, so why does it feel like we have free will? Like if I raise my hand, I
chose to do this now. That doesn’t feel like a deterministic thing. It feels like
I’m making a choice.
Andrej 37:58
It feels like it.
Lex 37:59
Okay, so it’s all feelings. It’s just feelings. So when an RL agent is making a
choice, it’s not really making a choice? The choice is already there.
Andrej 38:11
Yeah, you’re interpreting the choice and you’re creating a narrative for having
made it.
Lex 38:16
Yeah, and now we’re talking about the narrative. It’s very meta. Looking back,
what is the most beautiful or surprising idea in deep learning or AI in general
that you’ve come across? You’ve seen this field explode and grow in interesting
ways. What cool ideas made you sit back and go, hmm? Big or small?
Andrej 38:39
Well, the one that I’ve been thinking about the most recently is probably the
transformer architecture. So basically, a lot of neural network architectures
that were trendy have come and gone for different sensory modalities, like for
vision, audio, text. You would process them with different-looking neural
nets. And recently we’ve seen this convergence towards one architecture, the
transformer. And you can feed it video, or you can feed it images or speech or
text, and it just gobbles it up. And it’s kind of like a bit of a general-purpose
computer that is also trainable and very efficient to run on our hardware. And
so this paper came out in 2016, I wanna say [it was published in 2017].
Lex 39:20
Attention is all you need. Attention is all you need. You criticized the paper
title in retrospect, that it didn’t foresee the bigness of the impact that it
was going to have.
Andrej 39:33
Yeah, I’m not sure if the authors were aware of the impact that that paper
would go on to have, probably they weren’t. But I think they were aware of
some of the motivations and design decisions behind the transformer, and they
chose not to, I think, expand on it in that way in the paper. And so I think they
had an idea that there was more than just the surface of just like, oh, we’re just
doing translation and here’s a better architecture. You’re not just doing
translation. This is like a really cool, differentiable, optimizable, efficient computer
that you’ve proposed. And maybe they didn’t have all of that foresight, but I
think it’s really interesting.
Lex 40:02
Isn’t it funny, sorry to interrupt, that that title is meme-able, that they went
for such a profound idea, they went with a, I don’t think anyone used that kind
of title before, right? Attention is all you need.
Andrej 40:14
Yeah, it’s like a meme or something, basically.
Lex 40:16
Isn’t that funny? That one, like maybe if it was a more serious title, it wouldn’t
have the impact.
Andrej 40:22
Honestly, yeah, there is an element of me that honestly agrees with you and
prefers it this way. Yes. If it was too grand, it would overpromise and then
underdeliver potentially. So you want to just meme your way to greatness.
Lex 40:37
That should be a t-shirt. So you tweeted, the Transformer is a magnificent
neural network architecture because it is a general-purpose, differentiable
computer. It is simultaneously expressive in the forward pass, optimizable via
backpropagation and gradient descent, and efficient as a high-parallelism compute
graph. Can you discuss some of those details, expressive, optimizable, efficient,
from memory or in general, whatever comes to your heart?
Andrej 41:05
You want to have a general-purpose computer that you can train on arbitrary
problems, like, say, the task of next word prediction or detecting if there’s a
cat in an image or something like that. And you want to train this computer, so
you want to set its weights. And I think there’s a number of design criteria that
sort of overlap in the Transformer simultaneously that made it very successful.
And I think the authors were kind of deliberately trying to make this really
powerful architecture. And so basically it’s very powerful in the forward pass
because it’s able to express very general computation as something that looks
like message passing. You have nodes and they all store vectors, and these
nodes get to basically look at each other’s vectors and they get to communicate.
And basically nodes get to broadcast, hey, I’m looking for certain things. And
then other nodes get to broadcast, hey, these are the things I have. Those are
the keys and the values. So it’s not just attention.
Yeah, exactly. The Transformer is much more than just the attention component.
It’s got many architectural pieces that went into it: the residual connections,
the way it’s arranged, there’s a multi-layer perceptron, and the way it’s
stacked and so on. But basically there’s a message passing scheme where nodes
get to look at each other, decide what’s interesting and then update each other.
And so I think when you get to the details of it, I think it’s a very expressive
function. So it can express lots of different types of algorithms in the forward
pass. Not only that, but the way it’s designed with the residual connections,
layer normalizations, the softmax attention and everything, it’s also
optimizable. This is a really big deal because there’s lots of computers that
are powerful that you can’t optimize, or they’re not easy to optimize using the
techniques that we have, which is backpropagation and gradient descent. These
are first-order methods, very simple optimizers really. And so you also need it
to be optimizable. And then
lastly, you want it to run efficiently on our hardware. Our hardware is a massive
throughput machine, like GPUs. They prefer lots of parallelism. So you don’t
want to do lots of sequential operations. You want to do a lot of operations
in parallel. And the transformer is designed with that in mind as well. And so
it’s designed for our hardware, and it’s designed to both be very expressive in the
forward pass, but also very optimizable in the backward pass.
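The message-passing description above (each node broadcasts what it is looking for and what it has, then collects information from the nodes that match) corresponds to scaled dot-product attention with queries, keys, and values. A minimal single-head sketch in plain Python, assuming tiny hand-written weight matrices; it omits the residual connections, layer norms, MLP, multiple heads, and causal masking that a real Transformer block adds, and real implementations use batched tensor operations on GPUs.

```python
import math


def matmul(a, b):
    # (n x k) @ (k x m) -> (n x m), using plain lists of lists.
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def softmax(row):
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over n node vectors x (n x d)."""
    q = matmul(x, w_q)  # query: "here is what I'm looking for"
    k = matmul(x, w_k)  # key:   "here is what I have"
    v = matmul(x, w_v)  # value: what a node sends when another node attends to it
    scale = 1.0 / math.sqrt(len(k[0]))
    out = []
    for qi in q:
        scores = [scale * sum(a * b for a, b in zip(qi, kj)) for kj in k]
        weights = softmax(scores)  # how much this node listens to each node
        out.append([sum(w * vj[c] for w, vj in zip(weights, v)) for c in range(len(v[0]))])
    return out


identity = [[1.0, 0.0], [0.0, 1.0]]
nodes = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(self_attention(nodes, identity, identity, identity))
```

Each output row is a softmax-weighted average of the value vectors, so if every key is identical the weights become uniform and every node simply receives the mean of all values. Every node’s scores are independent of the others’, which is what makes the computation easy to parallelize.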
Lex 43:10
And you said that the residual connections support a kind of ability to learn
short algorithms fast and first, and then gradually extend them longer during
training. What’s the idea of learning short algorithms?
Andrej 43:23
Right. Think of it as, so basically a transformer is a series of blocks, right?
And these blocks have attention and a little multi-layer perceptron. And so
you go off into a block and you come back to this residual pathway, and then
you go off and you come back. And then you have a number of layers arranged
sequentially. And so the way to look at it, I think, is because of the residual
pathway in the backward pass, the gradients sort of flow along it uninterrupted,
because addition distributes the gradient equally to all of its branches. So the
gradient from the supervision at the top just flows directly to the first layer.
And all the residual connections are arranged so that in the beginning, during
initialization, they contribute nothing to the residual pathway. So what it kind
of looks like is, imagine the transformer is kind of like a Python function, like
a def. And you get to do various kinds of lines of code. Say you have a
hundred-layer-deep transformer; typically they would be much shorter, say 20.
So it’s 20 lines of code, and you can do something in them. And so
during the optimization, basically what it looks like is first, you optimize the
first line of code and then the second line of code can kick in and the third line
of code can kick in. And I kind of feel like because of the residual pathway and
the dynamics of the optimization, you can sort of learn a very short algorithm
that gets the approximate answer, but then the other layers can sort of kick
in and start to create a contribution. And at the end of it, you’re optimizing
over an algorithm that is 20 lines of code, except these lines of code are very
complex because it’s an entire block of a transformer. You can do a lot in there.
What’s really interesting is that this transformer architecture has actually been
remarkably resilient. Basically the transformer that came out in 2016 is the
transformer you would use today, except you reshuffle some of the layer norms.
The layer normalizations have been reshuffled to a pre-norm formulation. And
so it’s been remarkably stable, but there’s a lot of bells and whistles that
people have attached to it and tried to improve it. I do think that basically it’s a
big step in simultaneously optimizing for lots of properties of a desirable neural
network architecture. And I think people have been trying to change it, but it’s
proven remarkably resilient. But I do think that there should be even better
architectures potentially.
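A toy sketch of the residual-pathway argument, under loudly stated assumptions: each “block” is a hypothetical one-parameter function standing in for attention plus MLP, and its output scale starts at zero, following the description that blocks contribute nothing at initialization. The stack then starts as the identity function and the derivative of output with respect to input is 1, so the gradient reaches the first layer undiminished.

```python
import math


class ToyBlock:
    """Hypothetical stand-in for one attention+MLP block: f(x) = gain * tanh(w * x)."""

    def __init__(self, w, gain=0.0):
        self.w = w
        self.gain = gain  # zero-initialized output scale: the block starts as a no-op

    def __call__(self, x):
        return self.gain * math.tanh(self.w * x)


def forward(x, blocks):
    # Each block reads the residual stream and *adds* its contribution back onto it.
    for block in blocks:
        x = x + block(x)
    return x


def input_gradient(x, blocks, eps=1e-6):
    # Central finite difference of the whole stack's output with respect to its input.
    return (forward(x + eps, blocks) - forward(x - eps, blocks)) / (2 * eps)


blocks = [ToyBlock(w=0.1 * (i + 1)) for i in range(20)]  # "20 lines of code"
print(forward(1.3, blocks))         # 1.3: at initialization every block contributes nothing
print(input_gradient(1.3, blocks))  # ~1.0: the gradient passes straight through the sums
```

Giving, say, `blocks[0].gain` a nonzero value is the analogue of the first line of code kicking in: the stack still passes its input through, plus one small learned correction, and later blocks can join in the same way.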
Lex 45:26
But you admire the resilience here. There’s something profound about this
architecture that leads to resilience. So maybe everything can be turned into a
problem that transformers can solve.
Andrej 45:39
Currently, it definitely looks like the transformer is taking over AI, and you can
feed basically arbitrary problems into it. And it’s a general differentiable
computer and it’s extremely powerful. And this convergence in AI has been really
interesting to watch for me personally.
Lex 45:53
What else do you think could be discovered here about transformers? Like what
surprising thing? Or is it in a stable place? Is there something
interesting we might discover about transformers? Like aha moments, maybe
having to do with memory, maybe knowledge representation, that kind of stuff.
Andrej 46:12
Definitely the zeitgeist today is just pushing, like basically right now the
zeitgeist is do not touch the transformer, touch everything else. So people are
scaling up the datasets, making them much, much bigger. They’re working on
the evaluation, making the evaluation much, much bigger. And they’re basically
keeping the architecture unchanged. And that’s how we’ve, that’s the last five
years of progress in AI kind of.
Lex 46:34
What do you think about one flavor of it, which is language models? Have you
been surprised? Has your sort of imagination been captivated by, you mentioned
GPT and all the bigger and bigger and bigger language models. And what are
the limits of those models do you think? So just for the task of natural language.
Andrej 47:00
Basically the way GPT is trained, right, is you just download a massive amount
of text data from the internet and you try to predict the next word in the
sequence, roughly speaking. You’re predicting little word chunks, but roughly
speaking, that’s it. And what’s been really interesting to watch is, basically it’s
a language model. Language models have actually existed for a very long time.
There’s papers on language modeling from 2003, even earlier.
Lex 47:24
Can you explain in that case what a language model is?
Andrej 47:27
Yeah, so with a language model, the rough idea is just predicting the next
word in a sequence, roughly speaking. So there’s a paper from, for example,
Bengio and the team from 2003, where for the first time they were using a neural
network to take, say, three or five words and predict the next word. And
they’re doing this on much smaller data sets. And the neural net is not a
transformer, it’s a multi-layer perceptron, but it’s the first time that a neural network
has been applied in that setting. But even before neural networks, there were
language models, except they were using N-gram models. So N-gram models
are just count-based models. So if you start to take two words and predict the
third one, you just count up how many times you’ve seen any two-word
combinations and what came next. And what you predict that’s coming next is just
what you’ve seen the most of in the training set. And so language modeling
has been around for a long time. Neural networks have done language modeling
for a long time. So really what’s new or interesting or exciting is just realizing
that when you scale it up with a powerful enough neural net, a transformer,
you have all these emergent properties where basically what happens is if you
have a large enough data set of text and you are in the task of predicting the next
word, you are multitasking a huge amount of different kinds of problems. You
are multitasking understanding of chemistry, physics, human nature. Lots of
things are sort of clustered in that objective. It’s a very simple objective, but
actually you have to understand a lot about the world to make that prediction.
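The count-based N-gram model described above fits in a few lines. A minimal trigram sketch on a made-up toy corpus: count which word followed each pair of words, then predict the most frequent follower. Practical N-gram systems add smoothing for unseen contexts; this sketch simply returns None.

```python
from collections import Counter, defaultdict


def train_trigram(tokens):
    """Count, for every pair of consecutive words, which words followed it."""
    counts = defaultdict(Counter)
    for w1, w2, w3 in zip(tokens, tokens[1:], tokens[2:]):
        counts[(w1, w2)][w3] += 1
    return counts


def predict_next(counts, w1, w2):
    """Predict the most frequently observed follower of (w1, w2), or None if never seen."""
    followers = counts.get((w1, w2))
    if not followers:
        return None
    return followers.most_common(1)[0][0]


corpus = "the cat sat on the mat the cat sat on the sofa the cat ran".split()
model = train_trigram(corpus)
print(predict_next(model, "the", "cat"))  # sat (seen twice, versus "ran" once)
print(predict_next(model, "dog", "ran"))  # None: this pair never appeared
```

The model only knows what it has counted, which is why such models needed enormous tables and still failed on unseen contexts; a neural network instead learns representations that generalize across similar contexts.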
Lex 49:00
You just used the word understanding. In terms of chemistry and
physics and so on, what do you feel like it’s doing? Is it searching for the right
context? What is the actual process happening here?
Andrej 49:16
Yeah, so basically it gets a thousand words and it’s trying to predict the
thousand and first. And in order to do that very, very well over the entire data set
available on the internet, you actually have to basically kind of understand the
context of what’s going on in there. And it’s a sufficiently hard problem that
if you have a powerful enough computer, like a transformer, you end up with
interesting solutions. And you can ask it to do all kinds of things. And it shows
a lot of emergent properties like in-context learning. That was the big deal with
GPT and the original paper when they published it, is that you can just sort of
prompt it in various ways and ask it to do various things. And it will just kind
of complete the sentence. But in the process of just completing the sentence,
it’s actually solving all kinds of really interesting problems that we care about.
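To make “gets a thousand words and predicts the thousand and first” concrete, here is a minimal sketch of how a token stream becomes (context, next token) training pairs. The context length is a hypothetical parameter; in practice GPT-style training scores every position inside a window in parallel under a causal mask, so it also learns from shorter contexts, but each individual prediction has this same shape.

```python
def next_token_examples(tokens, context_len):
    """Slide a fixed-size window over a token stream; each example is (context, target)."""
    return [
        (tokens[end - context_len:end], tokens[end])
        for end in range(context_len, len(tokens))
    ]


words = "the cat sat on the mat".split()
for context, target in next_token_examples(words, context_len=3):
    print(context, "->", target)
```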
Lex 50:05
Do you think it’s doing something like understanding? Like when we use the
word understanding for us humans?
Andrej 50:13
I think it’s doing some understanding. In its weights, it understands, I think,
a lot about the world, and it has to in order to predict the next word in the
sequence.
Lex 50:22
So it’s trained on the data from the internet. What do you think about this
approach in terms of datasets of using data from the internet? Do you think
the internet has enough structured data to teach AI about human civilization?
Andrej 50:37
Yeah, so I think the internet has a huge amount of data. I’m not sure if it’s a
complete enough set. I don’t know that text is enough for having a sufficiently
powerful AGI as an outcome.
Lex 50:48
Of course there is audio and video and images and all that kind of stuff.
Andrej 50:52
Yeah, so text by itself, I’m a little bit suspicious about. There’s a ton of things
we don’t put in text, in writing, just because they’re obvious to us: how the
world works and the physics of it and that things fall. We don’t put that stuff
in text because why would you? We share that understanding. And so text
is a communication medium between humans, and it’s not an all-encompassing
medium of knowledge about the world. But as you pointed out, we do have
video and we have images and we have audio. And so I think that definitely
helps a lot, but we haven’t trained models sufficiently across all of those
modalities yet. So I think that’s what a lot of people are interested in.
Lex 51:25
But I wonder if that shared understanding, what we might call common
sense, has to be learned, inferred, in order to complete the sentence correctly.
So maybe, given that it’s implied on the internet, the model’s gonna have to
learn that. Not by reading about it, but by inferring it in the representation. So
common sense, just like we, I don’t think we learn common sense. Nobody
tells us explicitly. We just figure it all out by interacting with the world. And
so here’s a model reading about the way people interact with the world. It
might have to infer that. I wonder. Yeah. You briefly worked on a project called
World of Bits, training an RL system to take actions on the internet versus just
consuming the internet like we talked about. Do you think there’s a future for
that kind of system, interacting with the internet to help the learning?
Andrej 52:21
Yes, I think that’s probably the final frontier for a lot of these models because,
so as you mentioned, when I was at OpenAI, I was working on this project,
World of Bits, and basically it was the idea of giving neural networks access to
a keyboard and a mouse. And the idea is- What could possibly go wrong? So
basically you perceive the input of the screen pixels and basically the state of
the computer is sort of visualized for human consumption in images of the web
browser and stuff like that. And then you give the neural network the ability
to press keys and use the mouse. And we were trying to get it to, for
example, complete bookings and interact with user interfaces. And-
Lex 52:59
What’d you learn from that experience? Like what was some fun stuff? This
is a super cool idea. Yeah. I mean, it’s like, yeah, I mean, the step from
observer to actor is a super fascinating step.
Andrej 53:12
Yeah. There’s a universal interface in the digital realm, I would say. And there’s
a universal interface in the physical realm, which in my mind is a humanoid form
factor kind of thing. We can later talk about Optimus and so on, but I feel like
there’s a, they’re kind of like a similar philosophy in some way, where the
physical world is designed for the human form and the digital world is designed for
the human form of seeing the screen and using keyboard and mouse. And so
it’s the universal interface that can basically command the digital infrastructure
we’ve built up for ourselves. And so it feels like a very powerful interface to
command and to build on top of. Now to your question as to like what I learned
from that, it’s interesting because the world of bits was basically too early, I
think, at OpenAI at the time. This is around 2015 or so. And the zeitgeist at
that time was very different in AI from the zeitgeist today. At the time,
everyone was super excited about reinforcement learning from scratch. This is the
time of the Atari paper, where neural networks were playing Atari games and
beating humans in some cases, AlphaGo and so on. So everyone was very
excited about training neural networks from scratch using reinforcement learning
directly. It turns out that reinforcement learning is an extremely inefficient way
of training neural networks, because you’re taking all these actions and all these
observations, and you get some sparse rewards once in a while. So you do all
this stuff based on all these inputs. And once in a while you’re told, you
did a good thing, you did a bad thing. And it’s just an extremely hard problem;
you can’t learn from that. You can burn a forest and you can sort of brute-force
through it, and we saw that, I think, with Go and Dota and so on, and
it does work, but it’s extremely inefficient and not how you want to approach
problems, practically speaking. And so that’s the approach that at the time we
also took to World of Bits. We would have an agent initialized randomly, so with
keyboard mashing and mouse mashing, and try to make a booking. And it just
revealed the insanity of that approach very quickly, where you have to stumble
upon the correct booking in order to get a reward of you did it correctly. And
you’re never gonna stumble upon it by chance at random.
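The sparse-reward problem is easy to quantify. A minimal sketch, assuming a hypothetical booking that requires one exact sequence of 6 UI actions chosen from 20 options per step, with reward only for the exact sequence: a random keyboard-and-mouse-mashing agent succeeds with probability (1/20)^6, or 1 in 64,000,000 per episode, so 100,000 random episodes will almost certainly see no reward at all, and therefore produce no signal to learn from.

```python
import random


def random_episode(target, n_actions, rng):
    """Mash one random action per step; reward 1 only if the whole sequence is exactly right."""
    actions = [rng.randrange(n_actions) for _ in target]
    return 1 if actions == target else 0


target = [3, 14, 7, 0, 19, 5]  # hypothetical "correct booking": 6 exact UI actions
n_actions = 20                 # hypothetical number of options at each step
rng = random.Random(0)

episodes = 100_000
rewards = sum(random_episode(target, n_actions, rng) for _ in range(episodes))
print("chance per episode: 1 in", n_actions ** len(target))  # 1 in 64000000
print("rewarded episodes:", rewards, "of", episodes)
```

Starting from a pretrained model changes this picture because the agent no longer chooses uniformly at random: prior knowledge of what a booking or a submit button is concentrates its attempts near sensible action sequences.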
Lex 55:19
So even with a simple web interface, there’s too many options.
Andrej 55:22
There’s just too many options. And it’s too sparse of a reward signal. And
you’re starting from scratch at the time. And so you don’t know how to read.
You don’t understand pictures, images, buttons. You don’t understand what it
means to make a booking. But now what’s happened is it is time to revisit
that. And OpenAI is interested in this. Companies like Adept are interested
in this and so on. And the idea is coming back because the interface is very
powerful. But now you’re not training an agent from scratch. You are taking
GPT as an initialization. So GPT is pre-trained on all of text. And it
understands what a booking is. It understands what a submit is. It understands
quite a bit more. And so it already has those representations. They are very
powerful. And that makes all of the training significantly more efficient and
makes the problem tractable.
Lex 56:07
Should the interaction be the way humans see it, with the buttons and
the language, or should it be with the HTML, JavaScript and the CSS? What
do you think is better?
Andrej 56:18
So today, all of this interaction is mostly on the level of HTML, CSS and so
on. That’s done because of computational constraints. But I think ultimately
everything is designed for human visual consumption. And so at the end of the
day, there’s a lot of additional information in the layout of the webpage and
what’s next to what, and what has a red background and all this kind of stuff,
and what it looks like visually. So I think that’s the final frontier, where we are
taking in pixels and we’re giving out keyboard and mouse commands. But I think
it’s still impractical today.
Lex 56:45
Do you worry about bots on the internet given these ideas, given how exciting
they are? Do you worry about bots on Twitter being not the stupid bots that
we see now with the crypto bots, but the bots that might be out there actually
that we don’t see, that are interacting in interesting ways? So this kind of
system feels like it should be able to pass the I’m-not-a-robot click button,
whatever. Do you actually understand how that test works? I don’t quite.
There’s a checkbox or whatever that you click. It’s presumably tracking
mouse movement and the timing and so on. So exactly this kind of system we’re
talking about should be able to pass that. So yeah, what do you feel about bots
that are language models plus have some interactability and are able to tweet
and reply and so on? Do you worry about that world?
Andrej 57:41
Yeah, I think it’s always been a bit of an arms race between sort of the attack
and the defense. So the attack will get stronger, but the defense will get stronger
as well, our ability to detect that.
Lex 57:51
How do you defend? How do you detect? How do you know that your Karpathy
account on Twitter is human? How would you approach that? Like if people
claimed otherwise, how would you defend yourself in a court of law, that I am a
human, this account is human?
Andrej 58:09
Yeah, at some point I think society will evolve a little
bit. Like we might start signing, digitally signing, some of our correspondence
or things that we create. Right now it’s not necessary, but maybe in the future
it might be. I do think that we are going towards a world where we share the
digital space with AIs.
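Digital signing, as mentioned here, lets anyone check that a message came from the holder of a private key. A deliberately insecure textbook-RSA sketch with tiny primes shows the mechanics (hash the message, sign with the private exponent, verify with the public one); the primes, exponent, and message are illustrative assumptions, and this says nothing about how a real proof-of-personhood scheme would work.

```python
import hashlib

# Textbook RSA with tiny primes: insecure, for illustration only.
P, Q = 61, 53
N = P * Q                # public modulus (3233)
PHI = (P - 1) * (Q - 1)  # 3120
E = 17                   # public exponent, coprime to PHI
D = pow(E, -1, PHI)      # private exponent (2753); modular inverse needs Python 3.8+


def digest(message: str) -> int:
    # Hash the message, then reduce it into the range this toy key can sign.
    return int.from_bytes(hashlib.sha256(message.encode()).digest(), "big") % N


def sign(message: str) -> int:
    return pow(digest(message), D, N)  # only the private-key holder can compute this


def verify(message: str, signature: int) -> bool:
    return pow(signature, E, N) == digest(message)  # anyone with (N, E) can check it


msg = "This account is run by a human."
sig = sign(msg)
print(verify(msg, sig))  # True
```

Reducing a SHA-256 digest modulo 3233 means many messages share a digest, which is one of several reasons this is only a toy. Real systems should use a vetted library and a standard scheme such as Ed25519 or RSA-PSS with large keys.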
Lex 58:30
Synthetic beings.
Andrej 58:31
Yeah, and they will get much better and they will share our digital realm, and
they’ll eventually share our physical realm as well. That’s much harder. But that’s
kind of like the world we’re going towards. And most of them will be benign
and awesome, and some of them will be malicious, and it’s going to be an arms race
trying to detect them.
Lex 58:46
So, I mean, the worst isn’t the AIs, the worst is the AIs pretending to be human.
So I don’t know if it’s always malicious. There’s obviously a lot of malicious
applications, but it could also be, if I was an AI, I would try very hard to pretend
to be human because we’re in a human world. I wouldn’t get any respect as an
AI. I want to get some love and respect on Twitter.
Andrej 59:10
I don’t think the problem is intractable. People are thinking about the proof
of personhood and we might start digitally signing our stuff and we might all
end up having like, yeah, basically some solution for proof of personhood. It
doesn’t seem to me intractable. It’s just something that we haven’t had to do
until now. But I think once the need like really starts to emerge, which is soon,
I think people will think about it much more.
Lex 59:33
So, but that too will be a race because obviously you can probably spoof or fake
the proof of personhood. So you have to try to figure out how to- Probably.
I mean, it’s weird that we have like social security numbers and passports
and stuff. It seems like it’s harder to fake stuff in the physical space than the
digital space. It just feels like it’s going to be very tricky, very tricky,
because it seems to be pretty low cost to fake stuff. What, are you going to put
an AI in jail for trying to use a fake personhood proof? I mean, okay, fine.
You’ll put a lot of AIs in jail, but there’ll be more AIs, like exponentially more.
The cost of creating a bot is very low. Unless there’s some kind of way to track
accurately, like you’re not allowed to create any program without tying
yourself to that program. Like any program that runs on the internet, you’ll be
able to trace every single human involved with that program.
Andrej 1:00:46
Yeah, maybe you have to start declaring when, we have to start drawing those
boundaries and keeping track of, okay, what are digital entities versus human
entities? And what is the ownership of human entities and digital entities and
something like that. I don’t know, but I think I’m optimistic that this is
possible. And in some sense, we’re currently in like the worst time of it because all
these bots suddenly have become very capable, but we don’t have defenses yet
built up as a society. But I think that doesn’t seem to me intractable. It’s just
something that we have to deal with.
Lex 1:01:22
It seems weird that the really crappy Twitter bots are so
numerous. Like is it, so I presume that the engineers at Twitter are very good.
So what I would infer from that is it seems like a hard problem.
They’re probably catching, all right, if I were to sort of steel man the case, it’s
a hard problem and there’s a huge cost to false positives, to removing a post by
somebody that’s not a bot. That creates a very bad user experience. So they’re
very cautious about removing. And maybe the bots are really
good at learning what gets removed and what doesn’t, such that they can stay ahead
of the removal process very quickly.
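The steel-man here, that removal is cautious because false positives are costly, is a classifier-threshold trade-off. A minimal sketch with made-up (score, is_bot) pairs, assuming every account scoring at or above a threshold gets removed: raising the threshold removes fewer humans but lets more bots through.

```python
def confusion_at(threshold, scored_accounts):
    """Count outcomes if every account scoring >= threshold is removed as a bot."""
    tp = fp = fn = 0
    for score, is_bot in scored_accounts:
        flagged = score >= threshold
        if flagged and is_bot:
            tp += 1  # bot correctly removed
        elif flagged and not is_bot:
            fp += 1  # a real person's post removed: the costly mistake
        elif is_bot:
            fn += 1  # a bot that slips through
    return tp, fp, fn


# Hypothetical (bot_score, is_bot) pairs from some classifier.
accounts = [(0.95, True), (0.90, True), (0.80, False), (0.70, True),
            (0.60, False), (0.40, True), (0.20, False), (0.10, False)]

for t in (0.5, 0.75, 0.85):
    print(t, confusion_at(t, accounts))  # (bots caught, humans removed, bots missed)
```

With these toy numbers, a threshold of 0.5 catches three bots but removes two humans, while 0.85 removes no humans and lets two bots through.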
Andrej 1:02:10
My impression of it, honestly, is there’s a lot of low-hanging fruit. I mean,
it’s not subtle. That’s my impression of it.
Lex 1:02:18
It’s not subtle. Yeah, that’s my impression as well. But it feels like maybe
you’re seeing the tip of the iceberg. Maybe the number of bots is in the
trillions, and it’s a constant assault of bots, and you... I don’t know. You
have to steel man the case, because the bots I’m seeing are pretty obvious. I
could write a few lines of code to catch these bots.
Andrej 1:02:45
I mean, there’s definitely a lot of low-hanging fruit, but I will say, I agree
that if you are a sophisticated actor, you could probably create a pretty good
bot right now using tools like GPTs, because it’s a language model. You can
generate faces that look quite good now, and you can do this at scale. And so I
think, yeah, it’s quite plausible, and it’s going to be hard to defend.
Lex 1:03:06
There was a Google engineer who claimed that LaMDA was sentient. Do you
think there’s any inkling of truth to what he felt? And more importantly, to
me at least, do you think language models will achieve sentience or the
illusion of sentience soon-ish?
Andrej 1:03:25
Yeah, to me it’s a little bit of a canary-in-a-coal-mine kind of moment,
honestly, because this engineer spoke to a chatbot at Google and became
convinced that this bot is sentient. He asked it some existential philosophical
questions, and it gave reasonable answers and looked real and so on. To me,
he wasn’t sufficiently trying to stress the system and expose the truth of it as
it is today. But I think this will be increasingly harder over time. So yeah, I
think there’ll be more and more people like that over time as this gets better.
Lex 1:04:13
Like, form an emotional connection to an AI chatbot.
Andrej 1:04:16
Yeah, perfectly plausible in my mind. I think these AIs are actually quite good
at human connection, human emotion. A ton of text on the internet is about
humans and connection and love and so on. So I think they have a very good
understanding, in some sense, of how people speak to each other about this, and
they’re very capable of creating a lot of that kind of text. There’s a lot of
sci-fi from the fifties and sixties that imagined AIs in a very different way:
calculating, cold, Vulcan-like machines. That’s not what we’re getting today.
We’re getting pretty emotional AIs that are actually very competent and capable
of generating plausible-sounding text with respect to all of these topics.
Lex 1:04:58
See, I’m really hopeful about AI systems that are like companions that help you
grow, develop as a human being, help you maximize long-term happiness. But
I’m also very worried about AI systems that figure out from the internet that
humans get attracted to drama. And so these would just be shit-talking AIs.
They’d constantly go, did you hear? They’ll gossip. They’ll try to plant seeds
of suspicion about other humans that you love and trust and just kind of mess
with people, because that’s going to get a lot of attention. So maximize drama
on the path to maximizing engagement. And us humans will feed into that
machine, and it’ll be a giant drama shitstorm. So I’m worried about that. The
objective function really defines the way that human civilization progresses
with AIs in it.
Andrej 1:05:54
I think right now, at least today, it’s not correct to really think of them as
goal-seeking agents that want to do something. They have no long-term memory
or anything. A good approximation of it is: you get a thousand words and
you’re trying to predict the thousand-and-first, and then you continue feeding
it in. And you are free to prompt it in whatever way you want, in text. So you
say, okay, you are a psychologist, and you are very good, and you love humans,
and here’s a conversation between you and another human: Human colon
something, You colon something. And then it just continues the pattern, and
suddenly you’re having a conversation with a fake psychologist who’s trying to
help you. And so it’s still kind of in the realm of a tool. People can prompt it
in arbitrary ways and it can create really incredible text, but it doesn’t have
long-term goals over long periods of time. So it doesn’t look that way right
now.
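The prompting pattern Andrej describes (a fixed window of text, a role-setting preamble, and a dialogue the model simply continues) can be sketched as plain string construction. This is a minimal sketch, not any particular model's API: `build_prompt` and `context_window` are hypothetical helpers, the 1,000-word window is the rough figure from the conversation rather than a real model limit, and no model is actually called.

```python
def build_prompt(persona: str, turns: list[tuple[str, str]]) -> str:
    """Format a role description plus a dialogue as plain text.

    The prompt ends with an open "You:" line, so a language model that
    simply continues the pattern would answer in the persona's voice.
    """
    lines = [persona, ""]
    for speaker, text in turns:
        lines.append(f"{speaker}: {text}")
    lines.append("You:")  # the model's continuation would start here
    return "\n".join(lines)


def context_window(tokens: list[str], size: int = 1000) -> list[str]:
    """Keep only the most recent `size` tokens (hypothetical limit).

    The model predicts the next token from this window; its output is
    appended and the window slides forward, with no long-term memory.
    """
    return tokens[-size:]


prompt = build_prompt(
    "You are a psychologist. You are very good and you love humans. "
    "Here is a conversation between you and another human.",
    [("Human", "I've been feeling stressed lately.")],
)
print(prompt)
```

Nothing here gives the system goals: all the "psychologist" behavior would come from the model continuing the text pattern.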
Lex 1:06:44
But you can have short-term goals that have long-term effects. So if my
prompting short-term goal is to get Andrej Karpathy to respond to me on
Twitter, I think the AI might figure out that talking shit to you, in a highly
sophisticated, interesting way, would be the best approach. And then you build up
a relationship when you respond once. And then over time, it gets to not be
sophisticated and just talk shit. And okay, maybe it won’t get to Andrej, but it
might get to another celebrity. It might get to other big accounts. So with just
that simple goal, get them to respond, maximize the probability of an actual
response.
Andrej 1:07:34
Yeah, I mean, you could prompt a powerful model like this for its opinion
about how to do any possible thing you’re interested in. So they’re kind of on
track to become these oracles. I could sort of think of it that way. They are
oracles. Currently it’s just text, but they will have calculators, they will have
access to Google search, they will have all kinds of gadgets and gizmos, they
will be able to operate the internet and find different information. And yeah,
in some sense, that’s kind of what it currently looks like in terms of the
development.
Lex 1:08:04
Do you think it’ll eventually be an improvement over what Google is for access
to human knowledge? Like, it’ll be a more effective search engine to access
human knowledge?
Andrej 1:08:15
I think there’s definite scope for building a better search engine today. And
Google, they have all the tools, all the people, everything they need, all the
puzzle pieces; they have people training transformers at scale, they have all
the data. It’s just not obvious if they are capable as an organization of
innovating on their search engine right now. And if they don’t, someone else
will. There’s absolute scope for building a significantly better search engine
on these tools.
Lex 1:08:37
It’s so interesting. It’s a large company where, for search, there’s already an
infrastructure, it works, it brings in a lot of money. So where, structurally,
inside a company is the motivation to pivot, to say we’re going to build a new
search engine? Yeah, that’s really hard. So it’s usually going to come from a
startup, right?
Andrej 1:08:57
That would be, yeah. Or some other more competent organization. So I don’t
know. Currently, for example, maybe Bing has another shot at it.
Lex 1:09:09
Microsoft Edge, as we were talking about offline.
Andrej 1:09:12
I mean, definitely... It’s really interesting, because search engines used to be
about: okay, here’s some query, here are web pages that look like the stuff that
you have. But you could just go directly to the answer and then have supporting
evidence. And these models have basically read all the text and all the web
pages. And so sometimes, when you see yourself going over search results and
getting a sense of the average answer to whatever you’re interested in, that
just directly comes out. You don’t have to do that work. So yeah, I think they
have a way of distilling all that knowledge into some level of insight,
basically.
Lex 1:09:50
Do you think of prompting as a kind of teaching and learning, this whole
process, like another layer? Because maybe that’s what humans are: you have
that background model, and then the world is prompting you.
Andrej 1:10:07
Yeah, exactly. I think the way we are programming these computers now, like
GPTs, is converging to how you program humans. I mean, how do I program
humans? Via prompt. I go to people and I prompt them to do things. I prompt
them for information. And so natural language prompting is how we program
humans, and we’re starting to program computers directly in that interface.
It’s pretty remarkable, honestly.
Lex 1:10:28
So you’ve spoken a lot about the idea of software 2.0. All good ideas become
clichés so quickly, like the terms. It’s kind of hilarious. I think Eminem once
said that if he gets annoyed by a song he’s written very quickly, that means
it’s going to be a big hit, because it’s too catchy. But can you describe this
idea and how your thinking about it has evolved over the months and years
since you coined it?
Overlaps 1:11:00
Yeah.
Andrej 1:11:01
Yes, I had a blog post on software 2.0, I think several years ago now. And the
reason I wrote that post is that I saw something remarkable happening in
software development: a lot of code was being transitioned to be written not in
C++ and so on, but in the weights of a neural net. Basically, I was just saying
that neural nets are taking over software, the realm of software, and taking on
more and more tasks. And at the time, I think not many people understood deeply
enough that this is a big deal. This is a big transition. Neural networks were
seen as one of multiple classification algorithms you might use for your dataset
problem on Kaggle. This is not that; this is a change in how we program
computers.
And I saw that neural nets were going to take over. The way we program
computers is going to change. It’s not going to be people writing software in
C++ or something like that and directly programming the software. It’s going
to be accumulating training sets and datasets and crafting the objectives by
which we train these neural nets. And at some point there’s going to be a
compilation process from the datasets, the objective, and the architecture
specification into the binary, which is really just the neural net weights and
the forward pass of the neural net. And then you can deploy that binary. So I
was talking about that sort of transition, and that’s what the post is about.
And I saw this play out in a lot of fields, Autopilot being one of them, but
also just simple image classification. People thought originally, in the 80s
and so on, that they would write the algorithm for detecting a dog in an image.
And they had all these ideas about how the brain does it: first we detect
corners, and then we detect lines, and then we stitch them up. And they were
really going at it. They were thinking about how they’re going to write the
algorithm. And this is not the way you build it. And there was a smooth transition
where, okay, first we thought we were going to build everything. Then we were
building the features, like HOG features and things like that, which detect
these little statistical patterns from image patches. And then there was a
little bit of learning on top of it, like a support vector machine or binary
classifier for cat versus dog in images, on top of the features. So we wrote the
features, but we trained the last layer, sort of the classifier. And then people
were like, actually, let’s not even design the features, because we can’t.
Honestly, we’re not very good at it. So let’s also learn the features. And then
you end up with basically a convolutional neural net where you’re learning most
of it. You’re just specifying the architecture, and the architecture has tons of
fill-in-the-blanks, which is all the knobs, and you let the optimization write
most of it. And so this transition is happening across the industry, everywhere.
And suddenly we end up with a ton of code that is written in neural net weights.
And I was just pointing out that the analogy is actually pretty strong. We have
a lot of developer environments for software 1.0: we have IDEs, how you work
with code, how you debug code, how you run code, how you maintain code. We have
GitHub. So I was trying to make those analogies in the neural realm. Like, what
is the GitHub of software 2.0? It turns out it’s something that looks like
Hugging Face right now. And so I think some people took it seriously and built
cool companies. And many people originally attacked the post. It actually was
not well received when I wrote it. I think maybe it has something to do with the
title, but the post was not well received. And I think more people have been
coming around to it over time.
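The "compilation" analogy (dataset plus objective plus architecture in, weights plus a forward pass out) can be illustrated with a toy example. This is a minimal sketch under loud assumptions: the "architecture" is a hypothetical two-parameter linear model, the objective is mean squared error, and plain gradient descent stands in for the training process. Real software 2.0 stacks use neural nets and training frameworks, not this.

```python
def software_1_0(x: float) -> float:
    """Software 1.0: a human writes the rule explicitly."""
    return 2.0 * x + 1.0


def compile_2_0(dataset, steps: int = 2000, lr: float = 0.05):
    """'Compile' a dataset plus an objective into weights.

    Architecture (the hint): y = w * x + b, with w and b as the blanks.
    Objective: mean squared error. Optimizer: plain gradient descent.
    """
    w, b = 0.0, 0.0
    n = len(dataset)
    for _ in range(steps):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in dataset) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in dataset) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b  # the "binary": just learned numbers


def forward(weights, x: float) -> float:
    """The deployed forward pass: run the learned weights on a new input."""
    w, b = weights
    return w * x + b


# Training only sees input/desired-output pairs. The labels come from the
# 1.0 rule here purely so the learned result can be checked.
dataset = [(x, software_1_0(x)) for x in (-2.0, -1.0, 0.0, 1.0, 2.0)]
weights = compile_2_0(dataset)
print(weights, forward(weights, 3.0))
```

The programmer's levers are exactly the ones named in the conversation: which examples go into `dataset`, what the objective measures, and what shape the architecture gives the blanks.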
Lex 1:14:23
Yeah, so you were the director of AI at Tesla, where I think this idea was really
implemented at scale, which is how you have engineering teams doing software
2.0. So can you linger on that idea? I think we’re in the really early stages
of everything you just said, like GitHub and IDEs. How do we build engineering
teams that work in software 2.0 systems? And the data collection and the data
annotation, which is all part of that software 2.0. What do you think is the
task of programming in software 2.0? Is it debugging in the space of
hyperparameters, or is it also debugging in the space of data?
Andrej 1:15:10
Yeah, the way by which you program the computer and influence its algorithm
is not by writing the commands yourself. You’re mostly changing the dataset.
You’re changing the loss functions, what the neural net is trying to do, how
it’s trying to predict things, but basically the datasets and the architecture
of the neural net. And so in the case of Autopilot, a lot of the datasets had to
do with, for example, detection of objects and lane line markings and traffic
lights and so on. So you accumulate massive datasets of: here’s an example,
here’s the desired label. And then here’s roughly what the algorithm should look
like, and that’s a convolutional neural net. So the specification of the
architecture is like a hint as to what the algorithm should roughly look like.
And then the fill-in-the-blanks process of optimization is the training process.
And then you take your neural net that was trained, it gives all the right
answers on your dataset, and you deploy it.
Lex 1:16:05
So in that case, perhaps in all machine learning cases, there are a lot of
tasks. So is formulating a task, like for a multi-headed neural network, part of
the programming? Yeah, very much so. How do you break down a problem into a set
of tasks?
Andrej 1:16:26
Yeah, on a high level, I would say, if you look at the software running in
Autopilot (I gave a number of talks on this topic), originally a lot of it was
written in software 1.0. Imagine lots of C++, right? And then gradually there
was a tiny neural net that was, for example, predicting, given a single image,
is there a traffic light or not, or is there a lane line marking or not? And
this neural net didn’t have too much to do in the scope of the software. It was
making tiny predictions on individual little images, and then the rest of the
system stitched it up. Okay, we don’t have just a single camera, we have eight
cameras, and we actually have eight cameras over time. So what do you do with
these predictions? How do you put them together? How do you do the fusion of
all that information, and how do you act on it? All of that was written by
humans in C++. And then we decided, okay, we don’t actually want to do all of
that fusion in C++ code, because we’re actually not good enough to write that
algorithm. We want the neural nets to write the algorithm, and we want to port
all of that software into the 2.0 stack.
And so then we actually had neural nets that take all eight camera images
simultaneously and make predictions for all of that. And actually, they don’t
make predictions in the space of images. They now make predictions directly in
3D, in three dimensions around the car. And now we don’t manually fuse the
predictions in 3D over time either. We don’t trust ourselves to write that
tracker. So actually, we give the neural net the information over time. It takes
these videos now and makes those predictions. And so you’re sort of just putting
more and more power into the neural net, more and more processing. And at the
end of it, the eventual goal is to have most of the software potentially be in
2.0 land, because it works significantly better. Humans are just not very good
at writing software, basically.
Lex 1:18:16
So the prediction is happening in this 4D land, the three-dimensional world
over time. How do you do annotation in that world? Data annotation, whether
it’s self-supervised or manual by humans, is a big part of this software 2.0
world.
Andrej 1:18:38
Right. I would say by far in the industry, if you’re talking about the
technology we have available, everything is supervised learning. So you need
datasets of input and desired output, and you need lots of it. And there are
three properties of it that you need. You need it to be very large. You need it
to be accurate, no mistakes. And you need it to be diverse. You don’t want to
just have a lot of correct examples of one thing. You need to really cover the
space of possibilities as much as you can. And the more you can cover the space
of possible inputs, the better the algorithm will work at the end. Now, once
you have really good datasets that you’re collecting, curating, and cleaning,
you can train your neural net on top of that. So a lot of the work goes into
cleaning those datasets. Now, as you pointed out, the question is: if you want
to basically predict in 3D, you need data in 3D to back that up. So for this
clip, we have eight videos coming from all the cameras of the system, and this
is what they saw. And this is the truth of what actually was around: there was
this car, there was this car, this car. These are the lane line markings. This
is the geometry of the road. There’s a traffic light in this three-dimensional
position. You need the ground truth. And so the big question that the team was
solving, of course, is how do you arrive at that ground truth? Because once you
have a million of it, and it’s large, clean, and diverse, then training a neural
net on it works extremely well, and you can ship that into the car. And so
there are many mechanisms by which we collected that training data. You can
always go for human annotation. You can go for simulation as a source of ground
truth. You can also go for what we call the offline tracker, which we’ve spoken
about at AI Day and so on. It’s basically an automatic reconstruction process
for taking those videos and recovering the three-dimensional reality of what
was around that car. So basically, think of doing a three-dimensional
reconstruction as an offline thing, and then understanding that, okay, there’s
10 seconds of video, this is what we saw, and therefore here are all the lane
lines, cars, and so on. And then once you have that annotation, you can train a
neural net to imitate it.
Lex 1:20:40
And how difficult is the reconstruction?
Andrej 1:20:43
It’s difficult, but it can be done.
Lex 1:20:45
So there’s overlap between the cameras, and you do the reconstruction, and
if there’s any inaccuracy, that’s caught in the annotation step.
Andrej 1:20:56
Yes. The nice thing about the annotation is that it is fully offline. You have
infinite time. You have a chunk of one minute, and you’re trying to just figure
out, offline in a supercomputer somewhere, where the positions of all the cars
and all the people were. You have your full one minute of video from all the
angles, and you can run all the neural nets you want, and they can be very
efficient, massive neural nets. There can be neural nets that can’t even run in
the car later at test time, so they can be even more powerful than what you can
eventually deploy. So you can do anything you want, three-dimensional
reconstruction, neural nets, anything you want, just to recover that truth, and
then you supervise on that truth.
Lex 1:21:29
What have you learned about humans doing annotation, given that you said no
mistakes? Because I assume there’s a range of things humans are good at in
terms of clicking stuff on a screen. How interesting is that to you as a
problem, designing an annotator where humans are accurate and enjoy it? What
are even the metrics? Are they efficient, are they productive, all that kind of
stuff?
Andrej 1:21:54
Yeah, so I grew the annotation team at Tesla from basically zero to a thousand
while I was there. That was really interesting. You know, my background is as a
PhD student and researcher, so growing that kind of an organization was pretty
crazy. But yeah, I think it’s extremely interesting, and very much part of the
design process behind Autopilot: where you use humans. Humans are very good at
certain kinds of annotations. They’re very good, for example, at two-dimensional
annotations of images. They’re not good at annotating cars over time in
three-dimensional space; that’s very, very hard. And so that’s why we were very
careful to design the tasks that are easy for humans to do versus things that
should be left to the offline tracker. Maybe the computer will do all the
triangulation and three-dimensional reconstruction, but the human will say
exactly these pixels of the image are a car, exactly these pixels are a human.
And so co-designing the data annotation pipeline was very much the bread and
butter of what I was doing daily.
Lex 1:22:48
Do you think there are still a lot of open problems in that space? Just in
general, annotation where the machines do the stuff they’re good at, the humans
do what they’re good at, and there’s maybe some iterative process.
Andrej 1:23:03
I think to a very large extent, we went through a number of iterations, and
we learned a ton about how to create these datasets. I’m not seeing big open
problems. Originally, when I joined, I was really not sure how this would turn
out. But by the time I left, I was much more secure, and we actually sort of
understand the philosophy of how to create these datasets. And I was pretty
comfortable with where that was at the time.
Lex 1:23:25
So what are the strengths and limitations of cameras for the driving task, in
your understanding? When you formulate the driving task as a vision task with
eight cameras, and you’ve seen most of the history of the computer vision field
as it has to do with neural networks, if you just step back, what are the
strengths and limitations of pixels, of using pixels to drive?
Andrej 1:23:49
Yeah, pixels, I think, are a beautiful sensor, I would say. The thing is,
cameras are very, very cheap and they provide a ton of information, a ton of
bits. So it’s an extremely cheap sensor for a ton of bits, and each one of these
bits is a constraint on the state of the world. And so you get lots of megapixel
images, very cheap, and they just give you all these constraints for
understanding what’s actually out there in the world. So vision is probably the
highest-bandwidth sensor. It’s a very high-bandwidth sensor. And...
Lex 1:24:22
I love that: pixels as a constraint on the world. It’s a highly complex,
high-bandwidth constraint on the state of the world. That’s fascinating.
Andrej 1:24:34
It’s not just that. Again, there’s the real importance of it being the sensor
that humans use. Therefore everything is designed for that sensor: the text,
the writing, the flashing signs. Everything is designed for vision, and so you
just find it everywhere. And that’s why that is the interface you want to be
in, talking again about these universal interfaces. And that’s where we actually
want to measure the world as well, and then develop software for that sensor.
Lex 1:25:02
But there are other constraints on the state of the world that humans use to
understand the world. I mean, vision ultimately is the main one, but we’re
referencing our understanding of human behavior and some common-sense physics
that could be inferred from vision, from a perception perspective. But it feels
like we’re using some kind of reasoning to predict the world, not just the
pixels.
Andrej 1:25:31
I mean, you have a powerful prior for how the world evolves over time, et
cetera. So it’s not just about the likelihood term coming up from the data
itself, telling you about what you are observing, but also the prior term: where
are the likely things to see, and how do they likely move, and so on.
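The likelihood and prior terms Andrej refers to are the two factors in Bayes’ rule. Writing s for the state of the world (where the cars, lanes, and people are) and x for the observed pixels, the standard statement is below; the symbols are introduced here for illustration and are not from the conversation.

```latex
p(s \mid x) \;=\; \frac{p(x \mid s)\,p(s)}{p(x)}
\;\propto\; \underbrace{p(x \mid s)}_{\text{likelihood: what the pixels say}}
\;\underbrace{p(s)}_{\text{prior: how the world tends to be and move}}
```

The likelihood scores how well a hypothesized scene explains the observed pixels, while the prior encodes which scenes and motions are plausible before looking at any pixels.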
Lex 1:25:47
And the question is, how complex is the range of possibilities that might
happen in the driving task? Is that still an open problem to you, how difficult
driving is, philosophically speaking? In all the time you’ve worked on driving,
do you understand how hard driving is?
Andrej 1:26:10
Yeah, driving is really hard, because it has to do with predictions of all
these other agents, and theory of mind, and what they’re going to do. Are they
looking at you? Where are they looking? What are they thinking? There’s a lot
that goes there. At the full tail of the expansion of the nines that we have to
be comfortable with eventually, the final problems are of that form. I don’t
think those are problems that are very common. I think eventually they’re
important, but it’s really in the tail end.
Lex 1:26:36
In the tail end, the rare edge cases. From the vision perspective, what are the
toughest parts of the vision problem of driving?
Andrej 1:26:47
Well, basically, the sensor is extremely powerful, but you still need to
process that information. And so going from the brightnesses of these pixel
values to, hey, here is the three-dimensional world, is extremely hard. And
that’s what the neural networks are fundamentally doing. And so the difficulty
really is in just doing an extremely good job of engineering the entire
pipeline, the entire data engine, having the capacity to train these neural
nets, having the ability to evaluate the system and iterate on it. So I would
say just doing this in production at scale is the hard part. It’s an execution
problem.
Lex 1:27:22
So the data engine, but also the deployment of the system such that it has
low-latency performance. So it has to do all these steps.
Andrej 1:27:33
Yeah, for the neural net specifically, just making sure everything fits into
the chip on the car. You have a finite budget of FLOPs that you can perform,
and memory bandwidth and other constraints. And you have to make sure it flies,
and that you squeeze in as much compute as you can into the timing.
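The finite FLOP budget Andrej mentions comes down to simple arithmetic: the chip’s sustained throughput divided by the frame rate bounds how many FLOPs the network may spend per frame. The sketch below checks a toy convolutional stack for one camera stream against such a budget. Every number in it (peak throughput, utilization, frame rate, layer shapes) is a hypothetical placeholder, not a Tesla specification, and memory bandwidth, which he also names as a constraint, is ignored.

```python
def conv2d_flops(h_out: int, w_out: int, c_in: int, c_out: int, k: int) -> int:
    """FLOPs for one k x k convolution layer, counting a multiply-add as 2 FLOPs."""
    return 2 * h_out * w_out * c_in * c_out * k * k


def per_frame_budget(peak_flops_per_s: float, utilization: float, fps: float) -> float:
    """FLOPs available per frame at a given sustained fraction of peak."""
    return peak_flops_per_s * utilization / fps


# Hypothetical network: (h_out, w_out, c_in, c_out, kernel) for each layer.
layers = [
    (240, 320, 3, 32, 3),
    (120, 160, 32, 64, 3),
    (60, 80, 64, 128, 3),
]
model_flops = sum(conv2d_flops(*layer) for layer in layers)

# Hypothetical chip: 50 TFLOP/s peak, 40% sustained utilization, 36 frames/s.
budget = per_frame_budget(peak_flops_per_s=50e12, utilization=0.4, fps=36)
print(f"model: {model_flops / 1e9:.2f} GFLOP/frame, budget: {budget / 1e9:.1f} GFLOP/frame")
print("fits" if model_flops <= budget else "does not fit")
```

In practice the budget is shared across all cameras and every other network on the chip, which is why the squeezing he describes is hard.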
Lex 1:27:47
What have you learned from that process? Because maybe that’s one of the
bigger new things, coming from a research background: there’s a system that has
to run under heavily constrained resources, has to run really fast. What kind of
insights have you learned from that?
Andrej 1:28:05
Yeah, I’m not sure if there are too many insights. You’re trying to create a
neural net that will fit in what you have available, and you’re always trying to
optimize it. We talked a lot about it at AI Day, basically the triple backflips
that the team is doing to make sure it all fits and utilizes the engine. So I
think it’s extremely good engineering, and then there are all kinds of little
insights peppered in on how to do it properly.
Lex 1:28:30
Let’s actually zoom out, because I don’t think we talked about the data
engine, the entirety of the layout of this idea, which I think is just
beautiful, with humans in the loop. Can you describe the data engine?
Andrej 1:28:43
Yeah, the data engine is what I call the almost biological-feeling process by
which you perfect the training sets for these neural networks. Because most of
the programming now is at the level of these datasets, making sure they’re
large, diverse, and clean: you have a dataset that you think is good, you train
your neural net, you deploy it, and then you observe how well it’s performing.
And you’re always trying to increase the quality of your dataset. So you’re
basically trying to catch scenarios that are rare, and it is in these scenarios
that neural nets will typically struggle, because they weren’t told what to do
in those rare cases in the dataset. But now you can close the loop, because if
you can collect all those at scale, you can then feed them back into the
reconstruction process I described, reconstruct the truth in those cases, and
add it to the dataset. And so the whole thing ends up being a staircase of
improvement, of perfecting your training set. And you have to go through
deployments so that you can mine the parts that are not yet represented well in
the dataset. So your dataset is basically imperfect. It needs to be diverse, but
it has pockets that are missing, and you need to pad out the pockets.
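The loop described above (train, deploy, mine the rare scenarios where the net struggles, recover ground truth offline, add it back) can be sketched in a few lines. This is a toy illustration only: the "model" just remembers which hypothetical scenario types it has labeled data for, `reconstruct_truth` stands in for offline reconstruction or human annotation, and `label_budget` is an assumed cap on how many new scenario types get labeled per round, which is what produces the staircase.

```python
from collections import Counter


def train(dataset):
    """Toy stand-in for training: the 'model' only knows scenario types it was shown."""
    return {scenario for scenario, _truth in dataset}


def mine_failures(model, fleet_stream):
    """Deployment: the model struggles on scenarios missing from its training set."""
    return [s for s in fleet_stream if s not in model]


def reconstruct_truth(scenario):
    """Hypothetical stand-in for offline reconstruction / human annotation."""
    return f"ground_truth:{scenario}"


def data_engine(dataset, fleet_stream, label_budget=1, max_rounds=10):
    dataset = list(dataset)
    failures_per_round = []
    for _ in range(max_rounds):
        model = train(dataset)
        failures = mine_failures(model, fleet_stream)
        failures_per_round.append(len(failures))
        if not failures:
            break
        # Pad out the most frequent missing pockets first, within the labeling budget.
        for scenario, _count in Counter(failures).most_common(label_budget):
            dataset.append((scenario, reconstruct_truth(scenario)))
    return dataset, failures_per_round


fleet = ["highway"] * 5 + ["rain"] * 3 + ["construction_zone"]
final_set, history = data_engine([("highway", "ground_truth:highway")], fleet)
print(history)  # [4, 1, 0]: failures observed after each deployment
```

Each round removes a missing pocket, so the failure count steps down until the deployed stream is fully covered.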
Lex 1:29:52
You can think of it that way in the data. What role do humans play in this?
In this biological system, like a human body is made up of cells, how do you
optimize the human system: the multiple engineers collaborating, figuring out
what to focus on, what to contribute, which task to optimize in this neural
network? Who is in charge of figuring out which task needs more data? Can you
speak to the hyperparameters of the human system?
Andrej 1:30:28
It really just comes down to extremely good execution from an engineering team
who knows what they’re doing. They understand intuitively the philosophical
insights underlying the data engine and the process by which the system
improves, and how to, again, delegate the strategy of the data collection and
how that works, and then just make sure it’s all extremely well executed. And
that’s where most of the work is: not even the philosophizing or the research
or the ideas of it. It’s just extremely good execution. It’s so hard when you’re
dealing with data at that scale.
Lex 1:30:55
So your role in the data engine, executing well on it, is difficult and
extremely important. Is there a priority, like a vision board, of saying we
really need to get better at stoplights, a prioritization of tasks? And does
that essentially come from the data?
Andrej 1:31:14
That comes, to a very large extent, from what we are trying to achieve in the
product roadmap, the release we’re trying to get out, and the feedback from the
QA team on where the system is struggling or not, the things we’re trying to
improve.
Lex 1:31:27
And the QA team gives some signal, some information in aggregate about the
performance of the system in various conditions.
Andrej 1:31:34
And then, of course, all of us drive it, and we can also see it. It’s really nice
to work with a system that you can also experience yourself. And it drives you
home.
Lex 1:31:42
Is there some insight you can draw from your individual experience that you
just can’t quite get from an aggregate statistical analysis of data? It’s so
weird, right? It’s not scientific in a sense, because you’re just one anecdotal
sample.
Andrej 1:31:58
Yeah, I think there’s a ton of... it’s a source of truth. It’s your interaction
with the system, and you can see it. You can play with it. You can perturb it.
You can get a sense of it. You have an intuition for it. I think numbers and
plots and graphs are much harder. They hide a lot.
Lex 1:32:15
It’s like if you train a language model: a really powerful way is by
interacting with it yourself. Yeah, 100%. Try to build up an intuition.
Andrej 1:32:24
Yeah. I think Elon, too, always wanted to drive the system himself. He drives a
lot, I want to say almost daily. So he also sees this as a source of truth: you
driving the system and seeing how it performs. And yeah.
Lex 1:32:40
So what do you think? Tough questions here. Last year Tesla removed radar from
the sensor suite, and it has now announced it’s going to remove the ultrasonic
sensors, relying solely on vision. So, camera only. Does that make the perception
problem harder or easier?
Andrej 1:33:02
I would almost reframe the question in some way. So the thing is basically, you
would think that additional sensors.
Lex 1:33:07
By the way, can I just interrupt? Go ahead. I wonder if a language model will
ever do that if you prompt it: “Let me reframe your question.” That would be
epic. That’s the wrong prompt.
Andrej 1:33:17
Sorry. Yeah, so it’s a little bit of the wrong question, because basically you
would think that these sensors are an asset to you. But if you consider the
product in its entirety, these sensors are actually potentially a liability,
because they aren’t free. They don’t just appear on your car. Suddenly
you have an entire supply chain. You have people procuring them. There can be
problems with them. They may need replacement. They are part of the
manufacturing process. They can hold back the line in production. You need to
source them. You need to maintain them. You have to have teams that write
the firmware, all of it. And then you also have to incorporate and fuse them
into the system in some way. And so it actually bloats a lot of it. And I think
Elon is really good at “simplify, simplify.” The best part is no part. He always
tries to throw away things that are not essential, because he understands the
entropy in organizations and in the approach. And I think in this case, the cost
is high. And you’re potentially not seeing it if you’re just a computer vision
engineer trying to improve your network, asking whether a sensor is more useful
or less useful, and how useful it is. The thing is, once you consider the full
cost of a sensor, it actually is potentially a liability. And you need to be really
sure that it’s giving you extremely useful information. In this case, we looked at
using it or not using it, and the delta was not massive. And so it’s not useful.
Lex 1:34:33
Is it also bloat in the data engine, like having more sensors?
Andrej 1:34:37
100%. Is it a distraction? And these sensors can change over time, for
example. You can have one type of, say, radar. You can have another type of radar.
They change over time. Now you suddenly need to worry about it. Now
suddenly you have a column in your SQLite table telling you, oh, what sensor type was
it? And they all have different distributions. And then they contribute noise
and entropy into everything, and they bloat stuff. And also organizationally,
it’s been really fascinating to me that it can be very distracting. If all you want
to get to work is vision, all the resources are on it, and you’re building out a
data engine, and you’re actually making forward progress, because that is the
sensor with the most bandwidth, the most constraints on the world. And you’re
investing fully into that, and you can make that extremely good. If you’re only