Large Language Models and the End of Programming

Matt Welsh
matt@ziggylabs.ai
Large language
models and the
end of
programming

Who is this guy?!?
Been writing code since 1984 (on a VIC-20)
PhD in CS from UC Berkeley
Prof of CS at Harvard
Wrote one of the first books about Linux
Engineering leadership at Google, Apple, OctoAI
Founded a couple of AI startups

Who is this guy?!?
Been writing code since 1984 (on a VIC-20)
PhD in CS from UC Berkeley
Prof of CS at Harvard
Wrote one of the first books about Linux
Engineering leadership at Google, Apple, OctoAI
Founded a couple of AI startups
Speaker at Craft Conference 2024

Large
Language
Models
This is everyone here

*** COMPUTER SCIENCE IS DOOMED ***
Computer Science has always been about one thing:
Translating ideas into programs.
CS is the study of how to take a problem and map it onto
instructions that can be executed by a Von Neumann machine.
7

*** COMPUTER SCIENCE IS DOOMED ***
Critically, the goal of CS has always been that programs
are implemented, maintained, and understood by humans.
But -- spoiler alert! -- humans suck at all of these things.
8

Let’s just make programming easier!
Fifty years of programming language research has done
nothing to improve the state of affairs.
No amount of improvement to type systems, debugging, static
analysis, linters, or documentation is going to magically
solve this problem.
9

FORTRAN (1957)
10

BASIC (1964)
11

APL (1966)
12

Rust (2010)
13

This is how I program now...
14

15

16
What is the algorithm being implemented here?
How would you write it down as code?
What, if anything, could you prove about this program?

How much does it cost to replace one human with AI?
Let’s assume (for now) that LLMs will be able to do most,
or all, of the programming work done by a human software
developer.
What would it cost to replace a human SWE with LLM calls?
19

Typical SWE salary: $220,000
20

Benefits, taxes, free breakfast, lunch, dinner, snacks,
masseuse, shuttle bus, on-site doctor, bowling alley, ...
$92,000
Total: $312,000
21

Benefits, taxes, free breakfast, lunch, dinner, snacks,
masseuse, shuttle bus, on-site doctor, bowling alley, ...
$92,000
Total: $312,000
Number of working days per year: 260
Total cost for one-human-SWE-day: $1200
22

Average lines of finalized code checked in per day ~= 100
23

Average number of GPT-4o tokens per line ~= 10
Price for GPT-4o = $0.005 / 1K tokens (input)
$0.015 / 1K tokens (output)
24

Average number of GPT-4o tokens per line ~= 10
Price for GPT-4o = $0.005 / 1K tokens (input)
$0.015 / 1K tokens (output)
Assume 5:1 input-to-output token ratio
Total cost for one-human-SWE-day equivalent work: $0.04
25

$0.04 / day
(30,000x cheaper) 26
$1200 / day

The robot does not take breaks.
The robot does not require catered
lunches or on-site massage.
The robot takes the same length of
time whether it’s a prototype or
final production code.
The robot makes plenty of
mistakes, but makes them
incredibly quickly.
27

IBM 607 ad - 1953
28
“150 Extra Engineers
An IBM Electronic Calculator
speeds through thousands of
intricate computations so quickly
that on many complex problems
it’s just like having 150 EXTRA
Engineers.”

Beyond Programming
Conventional programs are about
giving direct instructions to
simple symbolic machines.
But what if the machine can
understand natural language
and perform complex reasoning
directly?
29

Teaching, not programming
Gradually, programming will be replaced by models that can:
- Understand problem definitions in natural language
- Perform sophisticated reasoning and cognition
- Find novel solutions to problems
- Learn how to do new things on their own
30
>> This radically changes how we think about computation. <<

The Natural Language Computer
31
Natural
language
"program" Large Language
Model
ChatGPT

32
Natural
language
"program"
External tools
("peripherals")
{ API }
Large Language
Model
ChatGPT

33
Natural
language
"program"
External tools
("peripherals")
{ API }
Large Language
Model
Task
Task
Task
ChatGPT

34
Natural
language
"program"
External tools
("peripherals")
{ API }
Large Language
Model
Short-term
memory
Vector DB
Long-term
memory
Task
Task
Task
ChatGPT

35
Natural
language
"program"
External tools
("peripherals")
{ API }
Large Language
Model
Short-term
memory
Vector DB
Long-term
memory
Task
Task
Task
ChatGPT

Two key questions...
1. What is the extent of LLMs’ reasoning abilities?
2. What is the right way to “program” in natural language?
36

38
https://arxiv.org/abs/2206.04615
214
benchmarks
measuring a
wide range of
reasoning
tasks
Measuring LLM reasoning - Google’s BIG-Bench

Measuring LLM reasoning - Google’s BIG-Bench
39
As models get
larger, they
get better
(but still not
great overall)

40
Example: BIG-Bench “checkmate_in_one” task
In the following chess position, ﬁnd a
checkmate-in-one move.
1. e4 e5 2. Nf3 d6 3. d4 exd4 4. Nxd4 Nf6 5. Nc3
Qe7 6. Bd3 d5 7. O-O dxe4 8. Re1 Be6 9. Nxe6 fxe6
10. Bxe4 Nxe4 11. Nxe4 Nd7 12. Bg5 Qb4 13. Qg4
Qd4 14. Qxe6+ Be7

41
In the given chess position, the move for White to
checkmate in one is:
15. Qxe7#
This move places the Black king in check and there
is no legal move for Black to escape the check,
resulting in checkmate.
Qd4 14. Qxe6+ Be7

42
Qd4 14. Qxe6+ Be7 15.
What if we add an extra “step”
to the prompt?

43
In the given chess position, the checkmate-in-one
move is:
15. Nd6#
This move delivers checkmate because the knight
on d6 controls key squares around the Black king,
and there is no way for Black to capture or block
the check.
Qd4 14. Qxe6+ Be7 15.
Whoops!

What’s the right way to program
in natural language?

45
How things are done today...

Natural language != plain text
Existing frameworks treat LLM prompts and responses as
plain strings.
46

Natural language != plain text
Existing frameworks treat LLM prompts and responses as
plain strings.
But natural language...
- Has a great deal of ambiguity
- Can encode complex algorithmic concepts
- Carries detailed semantic information
- Requires different syntactic structures depending on the
language used (Chinese, English, and Yoruba are not the
same!)
47

Probably the wrong way to do it...
48

51
http://www.literateprogramming.com/knuthweb.pdf
Knuth, 1984

Evolving Computer Science
52
Slide rule
1859-1975

53
Slide rule
1859-1975
Computer science
1959-2030

54
Over time, CS looks more like EE: A more technical skill set
necessary in some very specialized occupations.

55
The vast majority of people building “software” will not be
programming: they will be interacting with an AI.

56
The vast majority of people building “software” will not be
programming: they will be interacting with an AI.
AI greatly expands access to computing to anyone who can
express themselves in natural language.

57
8.1 Billion
non-programmers
World population
2024

58
30 Million
programmers
8.1 Billion
non-programmers
World population
2024

59
30 Million
programmers
8.1 Billion
non-programmers
World population
2024
Natural language opens up
computing to everyone

60
The network is the computer.
-- John Gage, 1984

61
The network is the computer.
-- John Gage, 1984
The model is the computer.
-- Matt Welsh, 2024

Large Language Models and the End of Programming

More Related Content

Similar to Large Language Models and the End of Programming

Recently uploaded

Large Language Models and the End of Programming