A brief introduction to generative models in general is given, followed by a succinct discussion of text-generation models and the "Transformer" architecture. Finally, the focus turns to a non-technical discussion of ChatGPT, with a selection of recent news articles.
2. Generative Models
Models that learn from a given
dataset how to generate new
data instances.
https://developers.google.com/machine-learning/gan/generative
A generative model is trained using a dataset.
It can subsequently generate new data instances.
Music—Google Research introduced MusicLM that generates music
from text. OpenAI released Jukebox, “provided with genre, artist, and
lyrics as input, Jukebox outputs a new music sample produced from
scratch.”
Image—Both Google (Imagen) and OpenAI (DALL·E) have developed impressive models that generate novel images from text.
Text—OpenAI’s ChatGPT has become widely known, but other players have similar, possibly even better, technology (including Google with Bard and Meta with BlenderBot3).
Others—Recommender (movies, books, flight destinations), drug
discovery…
■ ChatGPT: https://chat.openai.com/
■ Bard: https://bit.ly/3JpiFkH
■ Recommender: https://arxiv.org/abs/1802.05814
■ Drug discovery: https://bit.ly/42lguaj
■ MusicLM: https://bit.ly/3Tm4Rfk
■ Jukebox: https://openai.com/research/jukebox
■ Imagen: https://imagen.research.google/
■ DALL·E: https://labs.openai.com/
3. Discriminative vs. Generative Models
Discriminative: GLM, GBM, SVM, RF, feedforward ANN, … Generative: GMM, VAE, GAN, Transformers, …
Given a set of data instances X (and a set of labels Y)
“Discriminative models capture the conditional probability P(Y | X).”
“Generative models capture the joint probability P(X, Y), or just P(X) if there are no labels.”
Source: https://developers.google.com/machine-learning/gan/generative
In a regression analysis, Y is continuous. We are then interested in the conditional
expectation E(Y|X)—which depends on the conditional probability density function.
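The two definitions above can be made concrete with a small numpy sketch. The joint table below is invented for illustration: dividing the joint P(X, Y) by the marginal P(X) recovers the conditional P(Y | X) that a discriminative model targets.

```python
import numpy as np

# Hypothetical joint distribution P(X, Y) over a binary feature X
# (rows) and a binary label Y (columns) -- the numbers are made up.
joint = np.array([[0.30, 0.10],   # P(X=0, Y=0), P(X=0, Y=1)
                  [0.15, 0.45]])  # P(X=1, Y=0), P(X=1, Y=1)

p_x = joint.sum(axis=1, keepdims=True)  # marginal P(X): what a generative
                                        # model can capture without labels
cond = joint / p_x                      # conditional P(Y | X): what a
                                        # discriminative model captures

print(cond)  # each row of P(Y | X) sums to 1
```

A generative model holding the full joint can answer both questions; a discriminative model only needs the rows of `cond`.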
4. Discriminative Model: 2016 Olympics Athletes
● We know the gender (y) and the
weight (X) of each athlete.
● Given a weight, what is the probability
of the gender, i.e., P(y | X)?
● P(y = Female | X = 50 kg) ≈ 89.6%
● P(y = Female | X = 65 kg) ≈ 60.4%
● P(y = Female | X = 100 kg) ≈ 2.6%
(Obtained by fitting a simple logistic
regression model)
Dataset: https://www.kaggle.com/datasets/rio2016/olympic-games
[Figure: weight distributions of Female and Male athletes, with an annotation at ≈ 69 kg]
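This kind of discriminative fit can be sketched as follows, using synthetic two-group data in place of the real Kaggle dataset (the group means and spreads below are illustrative, not the actual Olympic figures) and plain gradient descent instead of a statistics library:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the athletes' data: female weights ~ N(60, 8),
# male weights ~ N(80, 10) -- illustrative numbers only.
weights = np.concatenate([rng.normal(60, 8, 500), rng.normal(80, 10, 500)])
female = np.concatenate([np.ones(500), np.zeros(500)])  # 1 = Female

# Standardize the input, then fit P(Female | weight) = sigmoid(a + b*x)
# by gradient descent on the logistic log-loss.
mu, sd = weights.mean(), weights.std()
x = (weights - mu) / sd
a = b = 0.0
for _ in range(5000):
    p = 1 / (1 + np.exp(-(a + b * x)))
    a -= 0.1 * np.mean(p - female)        # gradient of the log-loss w.r.t. a
    b -= 0.1 * np.mean((p - female) * x)  # gradient of the log-loss w.r.t. b

def p_female(weight):
    """Estimated P(gender = Female | weight in kg)."""
    return 1 / (1 + np.exp(-(a + b * (weight - mu) / sd)))

# Heavier athletes come out (much) less likely to be female:
print(p_female(50), p_female(65), p_female(100))
```

The exact probabilities depend on the synthetic data, but the qualitative pattern matches the slide: high P(Female) at 50 kg, near the middle at 65 kg, very low at 100 kg.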
5. Generative Model: 2016 Olympics Athletes
Let us imagine a situation where we
have only the weights data of athletes
(no gender information).
We wish to generate more synthetic
data that cannot easily be distinguished
from the real-world observed data.
In this toy case, a Gaussian mixture
model can be fitted.
Although the model identifies two
components, it cannot label them. The
labels (‘Female’ and ‘Male’) have been
set via our knowledge of the context.
[Figure: newly generated data instances]
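A minimal EM sketch of this generative approach, again on synthetic unlabelled data rather than the real athlete weights (the two underlying Gaussians below are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)

# Unlabelled synthetic "weights": a mixture of two Gaussians
# (illustrative numbers, not the real Olympic data).
data = np.concatenate([rng.normal(60, 8, 500), rng.normal(80, 10, 500)])

# EM for a 2-component, 1-D Gaussian mixture model.
mu = np.array([50.0, 90.0])      # initial component means
sigma = np.array([10.0, 10.0])   # initial component std devs
pi = np.array([0.5, 0.5])        # initial mixing weights

def normal_pdf(x, m, s):
    return np.exp(-0.5 * ((x - m) / s) ** 2) / (s * np.sqrt(2 * np.pi))

for _ in range(200):
    # E-step: responsibility of each component for each data point.
    dens = pi * normal_pdf(data[:, None], mu, sigma)      # shape (N, 2)
    resp = dens / dens.sum(axis=1, keepdims=True)
    # M-step: re-estimate parameters from the responsibilities
    # (sigma is updated using the freshly updated mu).
    nk = resp.sum(axis=0)
    mu = (resp * data[:, None]).sum(axis=0) / nk
    sigma = np.sqrt((resp * (data[:, None] - mu) ** 2).sum(axis=0) / nk)
    pi = nk / len(data)

# Generate new synthetic instances: pick a component, then sample from it.
comp = rng.choice(2, size=1000, p=pi)
samples = rng.normal(mu[comp], sigma[comp])
```

As on the slide, the model recovers two components, but it cannot name them; calling them ‘Female’ and ‘Male’ requires outside knowledge of the context.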
7. 1966: ELIZA
Image source: https://en.wikipedia.org/wiki/ELIZA#/media/File:ELIZA_conversation.png
“While ELIZA was capable of
engaging in discourse, it
could not converse with true
understanding. However,
many early users were
convinced of ELIZA's
intelligence and
understanding, despite
Weizenbaum's insistence to
the contrary.”
Source: https://en.wikipedia.org/wiki/ELIZA
(and references therein).
8. 2005: SCIgen - An Automatic CS Paper Generator
https://www.nature.com/articles/d41586-021-01436-7
https://news.mit.edu/2015/how-three-mit-students-fooled-scientific-journals-0414
A project using rather rudimentary technology, which aimed to "maximize amusement, rather than coherence," still causes trouble today...
https://pdos.csail.mit.edu/archive/scigen/
9. 2017: Google Revolutionized Text Generation
■ Vaswani et al. (2017), Attention Is All You Need (doi.org/10.48550/arXiv.1706.03762)
■ https://openai.com/research/better-language-models
Image generated with DALL·E: “A small robot standing on the
shoulder of a giant robot” (and slightly modified with GIMP)
OpenAI’s Generative Pre-trained
Transformer (DALL·E, 2021; ChatGPT,
2022), as the name suggests, builds on
Transformers.
Google introduced the Transformer,
which rapidly became the state-of-the-art
approach to solving most NLP problems.
10. ● Kiela et al. (2021), Dynabench: Rethinking Benchmarking in NLP: https://arxiv.org/abs/2104.14337
● Roser (2022), The brief history of artificial intelligence: The world has changed fast – what might be next?: https://ourworldindata.org/brief-history-of-ai
Transformers
2017
Text and shapes in blue have been added to the original work from Max Roser.
11. What are Transformers?
Images source: https://colab.research.google.com/drive/1L42pL04PbauS-nNzVg7IYNtrK0pFYCGY
Encoder Decoder
Encoder—Self-attention mechanism:
each word is encoded as a numerical
sequence that is contextualized, i.e.,
formed by taking the surrounding
words into account (left and right,
the “context”).
Decoder—Masked self-attention
mechanism (left or right context,
but not both), cross-attention, and
auto-regression (it re-uses its past
outputs as inputs for the following
steps).
[Figures: Transformer (1-layer) and Transformer (4-layer)]
Both the encoder and the decoder can be used as standalone models. Popular LLMs rely only on decoders, whereas, e.g., machine translation may leverage the “full” Transformer architecture.
Source: Vaswani et al. (2017), Attention Is All You Need
(doi.org/10.48550/arXiv.1706.03762)
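The attention formula from Vaswani et al., softmax(QKᵀ/√d)·V, can be sketched in a few lines of numpy (a single head, no learned projection matrices, and random vectors standing in for token embeddings); the `causal` flag reproduces the decoder's masked variant:

```python
import numpy as np

rng = np.random.default_rng(0)

def attention(Q, K, V, causal=False):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)            # (T, T) pairwise similarities
    if causal:
        # Decoder-style mask: position i may only attend to positions <= i.
        scores = np.where(np.tril(np.ones_like(scores)) == 1, scores, -np.inf)
    # Numerically stable softmax over each row.
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ V, w

T, d = 5, 8                      # 5 tokens, 8-dimensional embeddings
x = rng.normal(size=(T, d))      # stand-in token embeddings
out_enc, w_enc = attention(x, x, x)               # encoder: full context
out_dec, w_dec = attention(x, x, x, causal=True)  # decoder: left context only
```

Each row of the weight matrix sums to 1; in the masked case the upper triangle is zero, so each token only "sees" itself and earlier tokens, which is what allows the decoder to generate auto-regressively.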
12. Going Further…
For a rather high-level understanding:
■ https://youtu.be/LE3NfEULV6k
■ https://youtu.be/H39Z_720T5s
■ https://youtu.be/MUqNwgPjJvQ
■ https://youtu.be/d_ixlCubqQw
■ https://youtu.be/0_4KEb08xrE
■ Video lecture on Embeddings: https://developers.google.com/machine-learning/crash-course/embeddings/video-lecture
For getting your hands dirty:
■ https://colab.research.google.com/drive/1L42pL04PbauS-nNzVg7IYNtrK0pFYCGY
13. The Mushrooming of Transformer-based LLMs
PaLM (540b), LaMDA
(137b) and others (Bard
relies on LaMDA)
OPT-IML (175b), Galactica
(120b), BlenderBot3 (175b)
and perhaps others?
ERNIE 3.0 Titan (260b)
GPT-3 (175b), GPT-3.5 (?b),
more versions coming…
(ChatGPT relies on GPT-3.5)
BLOOM (176b)
PanGu-𝛼 (200b)
Jurassic-1 (178b), Jurassic-2 (?b)
Exaone (300b)
Megatron-Turing NLG (530b)
(It appears that all those models rely only on
transformer-based decoders)
15. 2022: ChatGPT
“ChatGPT, the popular chatbot
from OpenAI, is estimated to have
reached 100 million monthly
active users in January, just two
months after launch, making it the
fastest-growing consumer
application in history”
https://www.statista.com/chart/29174/time-to-one-million-users/
Reuters, Feb 1, 2023
https://reut.rs/3yQNlGo
16. “ChatGPT is 'not particularly
innovative,' and 'nothing revolutionary',
says Meta's chief AI scientist
The public perceives OpenAI's ChatGPT as
revolutionary, but the same techniques are being used
and the same kind of work is going on at many
research labs, says the deep learning pioneer.”
Irrational Exuberance?
https://twitter.com/ylecun/status/1617921903934726144
https://on.ft.com/3JRPM22
zdnet.com, https://zd.net/3mTlOS0
17. Google’s Bard, Meta’s Galactica & Baidu’s Ernie
https://bit.ly/3Jnt404
https://bit.ly/3TnLRwS
https://reut.rs/3FvarpQ
“Bard and ChatGPT are large language
models, not knowledge models. They are
great at generating human-sounding
text, they are not good at ensuring their
text is fact-based.”
—Jack Krawczyk, the product lead
for Bard, March 2, 2023
(https://cnb.cx/3ZXFFy3)
https://on.ft.com/3JogEVH
(March 16, 2023)
“Baidu Inc. surged more than 14%
Friday after brokerages including
Citigroup tested the company’s
just-unveiled ChatGPT-like service
and granted it their preliminary
approval.”
—Bloomberg, March 17, 2023
(https://yhoo.it/3JLxAXI)