4. The Goal
A high-level introduction based on my blog post "GPT in 60 Lines of NumPy".
Basically, you’ll be able to understand the last 15 lines of the “60 lines”.
11. Regular Algorithms
Problem → Programmer → Algorithm
Input → Algorithm → Output
Example: the problem is to sort a list of numbers; the programmer writes the
algorithm (merge sort); the input 4, 1, 3, 9, 5 goes through the algorithm and
comes out as the output 1, 3, 4, 5, 9. A concrete sketch follows.
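To make the flow concrete, here is a short merge sort in Python (the language of the "60 lines" post) applied to the slide's example input; the function name and structure are just one way to write it:

```python
def merge_sort(xs):
    """Recursively split the list, sort each half, then merge."""
    if len(xs) <= 1:
        return xs
    mid = len(xs) // 2
    left, right = merge_sort(xs[:mid]), merge_sort(xs[mid:])
    # Merge the two sorted halves into one sorted list.
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    return merged + left[i:] + right[j:]

print(merge_sort([4, 1, 3, 9, 5]))  # [1, 3, 4, 5, 9]
```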
12. ML Algorithms
Problem → Programmer → Training Algorithm
Data (Input/Output examples) → Training Algorithm → Inference Algorithm
Input → Inference Algorithm → Output
Example: the problem is to classify pictures of cats/dogs; the programmer writes
the training algorithm (gradient descent on a neural network); the data is
labelled pictures of cats/dogs; the training algorithm produces the inference
algorithm (a trained neural network); a new input picture goes through the
inference algorithm and comes out as the output ("dog"). A toy sketch of this
split follows.
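A minimal sketch of the training/inference split, with fabricated data standing in for the cat/dog pictures (each "picture" is two made-up features) and logistic regression trained by gradient descent standing in for the neural network; all names and numbers here are illustrative, not from the source:

```python
import numpy as np

# Toy stand-in for the cat/dog problem: each "picture" is 2 made-up features,
# each label is 0 (cat) or 1 (dog). This data is fabricated for illustration.
X = np.array([[0.1, 0.9], [0.2, 0.8], [0.9, 0.1], [0.8, 0.3]])
y = np.array([0, 0, 1, 1])

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Training algorithm: gradient descent on a one-neuron "network".
def train(X, y, lr=0.5, steps=1000):
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(steps):
        pred = sigmoid(X @ w + b)
        grad = pred - y                    # gradient of the cross-entropy loss
        w -= lr * (X.T @ grad) / len(y)
        b -= lr * grad.mean()
    return w, b                            # the "trained neural network"

# Inference algorithm: run a new input through the trained parameters.
def infer(w, b, x):
    return "dog" if sigmoid(x @ w + b) > 0.5 else "cat"

w, b = train(X, y)
print(infer(w, b, np.array([0.85, 0.2])))  # dog
```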
27. What is a GPT?
GPT stands for Generative Pre-trained Transformer. It's a type of neural
network architecture based on the Transformer.
● Generative: A GPT generates text.
● Pre-trained: A GPT is trained on lots of text from books, the internet, etc.
● Transformer: A GPT is a decoder-only transformer neural network.
28. Language Modeling (Next Word Prediction)
“Not” → “all”
“Not all” → “heroes”
“Not all heroes” → “wear”
“Not all heroes wear” → “capes”
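One way to picture this: a language model is a function from the words so far to a probability distribution over the next word. A minimal sketch with a hypothetical five-word vocabulary and hand-picked probabilities (a real GPT computes the distribution with a neural network):

```python
import numpy as np

# Hypothetical tiny vocabulary; real models use tens of thousands of tokens.
vocab = ["all", "heroes", "wear", "capes", "not"]

def lm(words):
    """Stand-in language model: returns a probability distribution over vocab.
    The numbers are hand-picked purely for illustration."""
    if words == ["not", "all", "heroes", "wear"]:
        return np.array([0.01, 0.01, 0.01, 0.95, 0.02])
    return np.ones(len(vocab)) / len(vocab)  # uniform fallback

probs = lm(["not", "all", "heroes", "wear"])
print(vocab[int(np.argmax(probs))])  # capes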
30. Self-Supervised Learning
Given the piece of text “not all heroes wear”:
Input = [“not”, “not all”, “not all heroes”, “not all heroes wear”]
Label = [“all”, “heroes”, “wear”, “capes”]
The label can be derived from the input text itself, so there is no need for human labellers.
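Deriving these pairs is just a loop over the prefixes of the text; a minimal sketch:

```python
tokens = ["not", "all", "heroes", "wear", "capes"]

# Each prefix of the text is an input; the word that follows it is the label.
inputs = [" ".join(tokens[:i]) for i in range(1, len(tokens))]
labels = tokens[1:]

for inp, lab in zip(inputs, labels):
    print(f"{inp!r} -> {lab!r}")
# 'not' -> 'all'
# 'not all' -> 'heroes'
# 'not all heroes' -> 'wear'
# 'not all heroes wear' -> 'capes'
```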
34. OpenAI's GPT-3, Google's LaMDA, and other similar models are just GPTs under
the hood. What makes them special is they happen to be:
1. Very big (billions of parameters in the neural network)
2. Trained on lots of data (hundreds of gigabytes of text)
As such, we call them Large Language Models (LLMs).
35. If GPTs are just next word predictors, how do they produce full sentences?
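The standard answer is autoregressive generation: predict one word, append it to the input, and feed the result back in. A minimal sketch with a hand-picked stand-in for a trained model (greedy argmax decoding; a real GPT conditions on the whole prefix and often samples rather than taking the argmax):

```python
import numpy as np

vocab = ["all", "heroes", "wear", "capes", "not"]

# Hand-picked stand-in for a trained model: maps the last word seen to a
# distribution over the next word. The probabilities are illustrative only.
next_word_probs = {
    "not":    [0.95, 0.01, 0.01, 0.01, 0.02],  # -> "all"
    "all":    [0.01, 0.95, 0.01, 0.01, 0.02],  # -> "heroes"
    "heroes": [0.01, 0.01, 0.95, 0.01, 0.02],  # -> "wear"
    "wear":   [0.01, 0.01, 0.01, 0.95, 0.02],  # -> "capes"
}

def generate(prompt, n_words):
    """Autoregressive loop: predict a word, append it, feed it back in."""
    words = prompt.split()
    for _ in range(n_words):
        probs = next_word_probs.get(words[-1], np.ones(len(vocab)) / len(vocab))
        words.append(vocab[int(np.argmax(probs))])  # greedy decoding
    return " ".join(words)

print(generate("not", 4))  # not all heroes wear capes
```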