Deep Generative Learning for All
(a.k.a. The GenAI Hype)
Xavier Giro-i-Nieto
@DocXavi
xavigiro.upc@gmail.com
Associate Professor (on leave)
Universitat Politècnica de Catalunya
Institut de Robòtica Industrial
ELLIS Unit Barcelona
Spring 2020
[Summer School website]
2
Acknowledgements
Santiago Pascual
santi.pascual@upc.edu
@santty128
PhD 2019
Universitat Politècnica de Catalunya
Technical University of Catalonia
Albert Pumarola
apumarola@iri.upc.edu
@AlbertPumarola
PhD 2021
Universitat Politècnica de Catalunya
Technical University of Catalonia
Kevin McGuinness
kevin.mcguinness@dcu.ie
Research Fellow
Insight Centre for Data Analytics
Dublin City University
Gerard I. Gállego
PhD Student
Universitat Politècnica de Catalunya
gerard.ion.gallego@upc.edu
@geiongallego
3
Acknowledgements
Eduard Ramon
Applied Scientist
Amazon Barcelona
@eram1205
Wentong Liao
Applied Scientist
Amazon Barcelona
Ciprian Corneanu
Applied Scientist
Amazon Seattle
Laia Tarrés
PhD Student
Universitat Politècnica de Catalunya
laia.tarres@upc.edu
Outline
1. Motivation
2. Discriminative vs Generative Models
a. P(Y|X): Discriminative Models
b. P(X): Generative Models
c. P(X|Y): Conditioned Generative Models
3. Latent variable
4. Architectures
a. GAN
b. Auto-regressive
c. VAE
d. Diffusion
Image generation
5
#StyleGAN3 (NVIDIA) Karras, Tero, Miika Aittala, Samuli Laine, Erik Härkönen, Janne Hellsten, Jaakko Lehtinen, and
Timo Aila. "Alias-free generative adversarial networks." NeurIPS 2021. [code]
6
#DiT Peebles, William, and Saining Xie. "Scalable Diffusion Models with Transformers." arXiv 2022.
Image generation
7
#DALL-E-2 (OpenAI) Aditya Ramesh, Prafulla Dhariwal, Alex Nichol, Casey Chu, Mark Chen "Hierarchical Text-Conditional
Image Generation with CLIP Latents." 2022. [blog]
Text-to-Image generation
8
Text-to-Video generation
#Make-a-video (Meta) Singer, Uriel, Adam Polyak, Thomas Hayes, Xi Yin, Jie An, Songyang Zhang, Qiyuan Hu et al.
"Make-a-video: Text-to-video generation without text-video data." arXiv 2022.
“A dog wearing a Superhero
outfit with red cape flying
through the sky”
Synthetic labels to train discriminative models
9
#BigDatasetGAN Li, Daiqing, Huan Ling, Seung Wook Kim, Karsten Kreis, Adela Barriuso, Sanja Fidler, and Antonio
Torralba. "BigDatasetGAN: Synthesizing ImageNet with Pixel-wise Annotations." arXiv 2022.
Video Super-resolution
10
#TecoGAN Chu, M., Xie, Y., Mayer, J., Leal-Taixé, L., & Thuerey, N. Learning temporal coherence via self-supervision for
GAN-based video generation. ACM Transactions on Graphics 2020.
Human Motion Transfer
11
#EDN Chan, C., Ginosar, S., Zhou, T., & Efros, A. A. Everybody dance now. ICCV 2019.
Speech Enhancement
12
Recover lost information / add enhancing details by learning the natural distribution of audio samples.
original
enhanced
Outline
1. Motivation
2. Discriminative vs Generative Models
a. P(Y|X): Discriminative Models
b. P(X): Generative Models
c. P(X|Y): Conditioned Generative Models
3. Latent variable
4. Architectures
a. GAN
b. Auto-regressive
c. VAE
d. Diffusion
14
Discriminative vs Generative Models
Philip Isola, Generative Models of Images. MIT 2023.
Outline
1. Motivation
2. Discriminative vs Generative Models
a. Pθ(Y|X): Discriminative Models
b. Pθ(X): Generative Models
c. Pθ(X|Y): Conditioned Generative Models
3. Latent variable
4. Architectures
a. GAN
b. Auto-regressive
c. VAE
d. Diffusion
Pθ(Y|X): Discriminative Models
16
Slide credit:
Albert Pumarola (UPC 2019)
Classification Regression
Text Prob. of being a Potential Customer
Image
Audio Speech Translation
Jim Carrey
What Language?
X=Data
Y=Labels
θ = Model parameters
Discriminative Modeling
Pθ(Y|X)
17
[Figure: input → Network (θ) → output class probabilities, e.g. 0.01 / 0.09 / 0.9]
Figure credit: Javier Ruiz (UPC TelecomBCN)
Discriminative model: tell me the probability of some ‘Y’ responses given ‘X’ inputs.
Pθ(Y | X = [pixel1, pixel2, …, pixel784])
Pθ(Y|X): Discriminative Models
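For concreteness, a minimal sketch (assuming PyTorch; the layer sizes and the 10 output classes are illustrative, not from the slides) of a discriminative network mapping 784 pixels to Pθ(Y|X):

```python
import torch
import torch.nn as nn

model = nn.Sequential(          # θ = weights & biases of these layers
    nn.Linear(784, 128),
    nn.ReLU(),
    nn.Linear(128, 10),         # 10 illustrative classes
)

x = torch.rand(1, 784)          # X = [pixel1, pixel2, ..., pixel784]
logits = model(x)
p_y_given_x = torch.softmax(logits, dim=-1)  # Pθ(Y|X): sums to 1 over classes
```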
Outline
1. Motivation
2. Discriminative vs Generative Models
a. P(Y|X): Discriminative Models
b. P(X): Generative Models
c. P(X|Y): Conditioned Generative Models
3. Sampling
4. Architectures
a. GAN
b. Auto-regressive
c. VAE
d. Diffusion
19
Slide Concept: Albert Pumarola (UPC 2019)
Pθ(X): Generative Models
Classification Regression Generative
Text Prob. of being a Potential Customer
“What about Ron magic?” offered Ron.
To Harry, Ron was loud, slow and soft
bird. Harry did not like to think about
birds.
Image
Audio Language Translation
Music Composer and Interpreter
MuseNet Sample
Jim Carrey
What Language?
Discriminative Modeling Pθ(Y|X)
Generative Modeling Pθ(X)
X=Data
Y=Labels
θ = Model parameters
Each real sample xi comes from an M-dimensional probability distribution P(X).
X = {x1, x2, …, xN}
Pθ(X): Generative Models
21
1) We want our model with parameters θ to output samples with distribution Pθ(X), matching the distribution of our training data P(X).
2) We can then sample points from Pθ(X) that plausibly look as if they were drawn from P(X).
P(X): distribution of the training data
Pλ,μ,σ(X): distribution learned by the model
Example: Gaussian Mixture Models (GMM)
Pθ(X): Generative Models
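A minimal sketch of the GMM example (assuming scikit-learn and NumPy; the toy 2-D data stands in for real training data):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

X_train = np.random.randn(500, 2)        # stand-in for samples of P(X)

# θ = (λ, μ, σ): mixture weights, means and covariances of 3 Gaussians
gmm = GaussianMixture(n_components=3).fit(X_train)

X_new, _ = gmm.sample(10)                # draw 10 new points from Pθ(X)
```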
22
What are the parameters θ we need to estimate in deep neural networks ?
θ = (weights & biases)
[Figure: ? → Network (θ) → output]
Pθ(X): Generative Models
Outline
1. Motivation
2. Discriminative vs Generative Models
a. P(Y|X): Discriminative Models
b. P(X): Generative Models
c. P(X|Y): Conditioned Generative Models
3. Sampling
4. Architectures
a. GAN
b. Auto-regressive
c. VAE
d. Diffusion
Pθ(X|Y): Conditioned Generative Models
Conditional probabilities P(X|Y) model conditioning variables Y on the generative process:
X = {x1, x2, …, xN}
Y = {y1, y2, …, yN}
DOG
CAT
TRUCK
PIZZA
THRILLER
SCI-FI
HISTORY
/aa/
/e/
/o/
Outline
1. Motivation
2. Discriminative vs Generative Models
a. P(Y|X): Discriminative Models
b. P(X): Generative Models
c. P(X|Y): Conditioned Generative Models
3. Sampling
4. Architectures
a. Generative Adversarial Networks (GANs)
b. Auto-regressive
c. Variational Autoencoders (VAEs)
d. Diffusion
Our learned model should be able to make up new samples from the distribution,
not just copy and paste existing samples!
26
Figure from NIPS 2016 Tutorial: Generative Adversarial Networks (I. Goodfellow)
Sampling
Philip Isola, Generative Models of Images. MIT 2023.
Sampling
Slide concept: Albert Pumarola (UPC 2019)
Learn
Sample Out
Training Dataset
Generated Samples
Feature space
Manifold Pθ(X)
“Model the data distribution so that we can sample new points out of the
distribution”
Sampling
Sampling
z
Generated Samples
How could we generate diverse samples from a deterministic deep neural network ?
Generator (θ)
Sampling
Generated Samples
How could we generate diverse samples from a deterministic deep neural network ?
Generator (θ)
Sample z from a known prior, for example, a multivariate normal distribution N(0, I).
Example: dim(z)=2
[Figure: z → Generator (θ) → x’]
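A minimal sketch of this idea (assuming PyTorch; the architecture is illustrative): the network itself is deterministic, so all diversity comes from the random z.

```python
import torch
import torch.nn as nn

generator = nn.Sequential(       # deterministic Generator (θ), illustrative sizes
    nn.Linear(2, 128),
    nn.ReLU(),
    nn.Linear(128, 784),         # x' in data space (e.g. a 28x28 image)
)

z = torch.randn(16, 2)           # sample z ~ N(0, I), with dim(z)=2 as above
x_prime = generator(z)           # 16 different z's → 16 different samples x'
```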
Slide concept: Albert Pumarola (UPC 2019)
Learn
Training Dataset
Interpolated Samples
Feature space
Manifold Pθ(X)
Traversing the learned manifold through interpolation.
Interpolation
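A minimal sketch of latent interpolation (assuming PyTorch and the illustrative `generator` from the previous sketch):

```python
import torch

z1, z2 = torch.randn(2), torch.randn(2)          # two latent codes
for alpha in torch.linspace(0, 1, steps=8):
    z = (1 - alpha) * z1 + alpha * z2            # walk the segment z1 → z2
    x = generator(z)                             # decode an interpolated sample
```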
Disentanglement
Philip Isola, Generative Models of Images. MIT 2023.
Disentanglement
Philip Isola, Generative Models of Images. MIT 2023.
Outline
1. Motivation
2. Discriminative vs Generative Models
3. Sampling
4. Architectures
○ Generative Adversarial Networks (GANs)
■ Generator & Discriminator Networks
■ Adversarial Training
■ Conditional GANs
○ Auto-regressive
○ Variational Autoencoders (VAEs)
○ Diffusion
35
Credit: Santiago Pascual [slides] [video]
36
Generator & Discriminator
We have two modules: Generator (G) and Discriminator (D).
● They “fight” against each other during training → Adversarial Learning.
D’s goal: classify between real samples and those produced by G.
G’s goal: fool D into misclassifying.
Goodfellow, Ian J., Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and
Yoshua Bengio. "Generative Adversarial Nets." NeurIPS 2014.
37
Discriminator
Discriminator network D → binary classifier between real (x) and generated (x’) samples.
[Figure: x’ → Discriminator (θ) → Generated (1); x → Discriminator (θ) → Real (0)]
38
Generator
[Figure: latent random variable z → Generator → generated samples; real-world samples from a database → Discriminator → Real decision → Loss]
Generator & Discriminator
Outline
1. Motivation
2. Discriminative vs Generative Models
3. Sampling
4. Architectures
○ Generative Adversarial Networks (GANs)
■ Generator & Discriminator Networks
■ Adversarial Training
■ Conditional GANs
○ Auto-regressive
○ Variational Autoencoders (VAEs)
○ Diffusion
Imagine we have a counterfeiter (G) trying to make fake money, and the police (D) have to detect whether money is real or fake.
100
100
FAKE: It’s
not even
green
Adversarial Training Analogy: is it fake money?
Figure: Santiago Pascual (UPC)
Imagine we have a counterfeiter (G) trying to make fake money, and the police (D) have to detect whether money is real or fake.
100
100
FAKE:
There is no
watermark
Adversarial Training Analogy: is it fake money?
Figure: Santiago Pascual (UPC)
Imagine we have a counterfeiter (G) trying to make fake money, and the police (D) have to detect whether money is real or fake.
100
100
FAKE:
Watermark
should be
rounded
Adversarial Training Analogy: is it fake money?
Figure: Santiago Pascual (UPC)
Imagine we have a counterfeiter (G) trying to make fake money, and the police (D) have to detect whether money is real or fake.
After enough iterations, and if the counterfeiter is good enough (for the G network, this means it “has enough parameters”), the police should be confused.
REAL?
FAKE?
Adversarial Training Analogy: is it fake money?
Figure: Santiago Pascual (UPC)
Adversarial Training
[Figure: latent random variable → Generator (a neural network) → generated samples; real-world images → samples; Discriminator (a neural network) → Real decision → Loss]
Alternate between training the discriminator and generator.
Figure: Kevin McGuinness (DCU)
Adversarial Training: Discriminator
1. Fix generator weights, draw samples from both real-world and generated images
2. Train the discriminator to distinguish between real-world and generated images
[Figure: same GAN diagram; backprop error to update discriminator weights]
Figure: Kevin McGuinness (DCU)
Adversarial Training: Discriminator
[Figure: same GAN diagram; backprop error to update discriminator weights]
Figure: Kevin McGuinness (DCU)
In the setup of the figure, which ground-truth label should we use for a generated image when training the discriminator ? Consider a binary encoding of “1” (Real) and “0” (Fake).
Adversarial Training: Generator
1. Fix discriminator weights
2. Sample from generator by injecting noise.
3. Backprop error through discriminator to update generator weights
[Figure: same GAN diagram; backprop error through the discriminator to update generator weights]
Figure: Kevin McGuinness (DCU)
Adversarial Training: Generator
[Figure: same GAN diagram; backprop error to update generator weights]
Figure: Kevin McGuinness (DCU)
In the setup of the figure, which ground-truth label should we use for a generated image when training the generator ? Consider a binary encoding of “1” (Real) and “0” (Fake).
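Putting the two alternating steps together, a minimal sketch of the training loop (assuming PyTorch; `data_loader`, the architectures, and the hyperparameters are hypothetical). Labels follow the binary encoding of the questions above: 1 = Real, 0 = Fake.

```python
import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(64, 784), nn.Tanh())     # generator (illustrative)
D = nn.Sequential(nn.Linear(784, 1), nn.Sigmoid())   # discriminator (illustrative)
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCELoss()

for real in data_loader:                             # hypothetical loader of real images
    b = real.size(0)
    ones, zeros = torch.ones(b, 1), torch.zeros(b, 1)

    # 1) Train D (generator weights fixed): real → 1, generated → 0
    fake = G(torch.randn(b, 64)).detach()            # detach: no gradient into G
    loss_d = bce(D(real), ones) + bce(D(fake), zeros)
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # 2) Train G (discriminator not updated): label fakes as 1 ("Real") to fool D
    fake = G(torch.randn(b, 64))
    loss_g = bce(D(fake), ones)
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
```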
Adversarial Training: How to make it work ?
Soumith Chintala, “How to train a GAN ? Tips and tricks to make GAN work”. Github 2016.
NeurIPS Barcelona 2016
Outline
1. Motivation
2. Discriminative vs Generative Models
3. Sampling
4. Architectures
○ Generative Adversarial Networks (GANs)
■ Generator & Discriminator Networks
■ Adversarial Training
■ Conditional GANs
○ Variational Autoencoders (VAEs)
○ Diffusion
○ Auto-regressive
Non-Conditional GANs
51
Slide credit: Víctor Garcia
[Figure: random seed (z) → Generator G(·) → generated sample; Real World → Discriminator D(·) → Real/Generated]
52
Conditional GANs (cGAN)
Slide credit: Víctor Garcia
Conditional Adversarial Networks
[Figure: condition + random seed (z) → Generator G(·); Real World sample + condition → Discriminator D(·) → Real/Generated]
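A minimal sketch of the conditioning mechanism (assuming PyTorch; concatenating a label embedding is one common choice, and all sizes are illustrative):

```python
import torch
import torch.nn as nn

n_classes, z_dim, x_dim, y_dim = 10, 64, 784, 16   # illustrative sizes
embed = nn.Embedding(n_classes, y_dim)             # condition y as an embedding

G = nn.Sequential(nn.Linear(z_dim + y_dim, x_dim), nn.Tanh())
D = nn.Sequential(nn.Linear(x_dim + y_dim, 1), nn.Sigmoid())

y = torch.randint(0, n_classes, (8,))              # condition, e.g. DOG / CAT / ...
z = torch.randn(8, z_dim)
x_fake = G(torch.cat([z, embed(y)], dim=1))        # G(z | y)
score = D(torch.cat([x_fake, embed(y)], dim=1))    # D(x | y)
```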
53
Learn more about GANs
Ian Goodfellow.
NeurIPS Barcelona 2016.
Mihaela Rosca & Jeff Donahue.
UCL x Deepmind 2020.
Outline
1. Motivation
2. Discriminative vs Generative Models
3. Sampling
4. Architectures
○ Generative Adversarial Networks (GANs)
○ Variational Autoencoders (VAEs)
○ Diffusion
○ Auto-regressive
Figure source: Lilian Weng, What are diffusion models ?, Lil’Log 2021.
Outline
1. Motivation
2. Discriminative vs Generative Models
3. Sampling
4. Architectures
○ Generative Adversarial Networks (GANs)
○ Variational Autoencoders (VAEs)
■ AE vs VAE
■ Variational Inference
■ Reparametrization trick
■ Generative behaviour
○ Diffusion
○ Auto-regressive
Figure source: Lilian Weng, What are diffusion models ?, Lil’Log 2021.
56
Auto-Encoder (AE)
[Figure: input → Encode → z (feature space, manifold Pθ(X)) → Decode → “Generate”]
● Learns Pθ(X) with a reconstruction loss.
● Proposed as a pre-training stage for the encoder (“self-supervised learning”).
57
Auto-Encoder (AE)
[Figure: input → Encode → z (feature space, manifold Pθ(X)) → Decode → “Generate”]
Could we generate new samples by sampling from a normal distribution and feeding it into the encoder, or the decoder (as in GANs) ?
58
Auto-Encoder (AE)
Could we generate new samples by sampling from a normal distribution and feeding it into the encoder, or the decoder (as in GANs) ?
No, because the noise (or encoded noise) would be out of the learned manifold.
[Figure: same auto-encoder diagram]
Outline
1. Motivation
2. Discriminative vs Generative Models
3. Sampling
4. Architectures
○ Generative Adversarial Networks (GANs)
○ Variational Autoencoders (VAEs)
■ AE vs VAE
■ Variational Inference
■ Reparametrization trick
■ Generative behaviour
○ Diffusion
○ Auto-regressive
Figure source: Lilian Weng, What are diffusion models ?, Lil’Log 2021.
60
Variational Auto-Encoder (VAE)
Kingma, Diederik P., and Max Welling. "Auto-encoding variational bayes." arXiv 2013.
Encoder: predicts the mean μ(X) and covariance Σ(X) of a multivariate normal distribution.
[Figure: input → Encode → μ(X), Σ(X), with a loss term that pushes them towards a normal distribution N(0, I)]
61
Source: Wikipedia. Image by Bscan - Own work, CC0, https://commons.wikimedia.org/w/index.php?curid=25235145
Maths 101: Multivariate normal distribution
62
Variational Auto-Encoder (VAE)
Kingma, Diederik P., and Max Welling. "Auto-encoding variational bayes." arXiv 2013.
Decoder: trained to reconstruct the input data from a z sampled from N(μ, Σ).
[Figure: input → Encode → z → Decode → reconstruction loss term]
Outline
1. Motivation
2. Discriminative vs Generative Models
3. Sampling
4. Architectures
○ Generative Adversarial Networks (GANs)
○ Variational Autoencoders (VAEs)
■ AE vs VAE
■ Variational Inference
■ Reparametrization trick
■ Generative behaviour
○ Diffusion
○ Auto-regressive
Figure source: Lilian Weng, What are diffusion models ?, Lil’Log 2021.
64
Reparametrization Trick
[Figure: input → Encode → z → Decode]
Challenge: we cannot backprop through the sampling of z because “sampling” is not differentiable!
65
Reparametrization Trick
Solution: Reparameterization trick.
Sample ε ~ N(0, I) and define z from it, multiplying by σ and summing μ: z = μ + σ ⊙ ε
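A minimal sketch of the trick (assuming PyTorch, and that the encoder predicts log σ², a common parameterization):

```python
import torch

def reparameterize(mu, logvar):
    std = torch.exp(0.5 * logvar)   # σ, from the encoder's log σ² output
    eps = torch.randn_like(std)     # ε ~ N(0, I): the only stochastic node
    return mu + std * eps           # z = μ + σ ⊙ ε, differentiable w.r.t. μ and σ
```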
Outline
1. Motivation
2. Discriminative vs Generative Models
3. Sampling
4. Architectures
○ Generative Adversarial Networks (GANs)
○ Variational Autoencoders (VAEs)
■ AE vs VAE
■ Variational Inference
■ Reparametrization trick
■ Generative behaviour
○ Diffusion
○ Auto-regressive
Generative behaviour
z
67
How can we now generate new samples once the underlying generating
distribution is learned ?
We can sample from our prior N(0, I), discarding the encoder path.
[Figure: z1, z2, z3 sampled from the prior → decoder → new samples]
68
Generative behaviour
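A minimal sketch of this generative use (assuming PyTorch and a trained, hypothetical `decoder`; dim(z)=32 is illustrative):

```python
import torch

z = torch.randn(3, 32)     # z1, z2, z3 ~ N(0, I); no encoder involved
x_new = decoder(z)         # hypothetical trained decoder maps z onto the manifold
```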
69
Generative behaviour
N(0, I)
Example: P(X) can be modelled by mapping a simple normal distribution N(0, I) through a powerful non-linear function g(z).
70
Generative behaviour
#NVAE Vahdat, Arash, and Jan Kautz. "NVAE: A deep hierarchical variational autoencoder." NeurIPS 2020. [code]
71
Walking around the z manifold dimensions gives us spontaneous generation of samples with different shapes, poses, identities, lighting, etc.
Generative behaviour
Learn more about VAEs
72
Andriy Mnih (UCL - Deepmind 2020)
Max Welling - University of Amsterdam (2020)
Outline
1. Motivation
2. Discriminative vs Generative Models
3. Sampling
4. Architectures
○ Generative Adversarial Networks (GANs)
○ Variational Autoencoders (VAEs)
○ Diffusion
○ Auto-regressive
Figure source: Lilian Weng, What are diffusion models ?, Lil’Log 2021.
Outline
1. Motivation
2. Discriminative vs Generative Models
3. Sampling
4. Architectures
○ Generative Adversarial Networks (GANs)
○ Variational Autoencoders (VAEs)
○ Denoising Diffusion Models (DDM)
■ Forward diffusion process
■ Reverse denoising process
○ Auto-regressive
Forward Diffusion Process
Philip Isola, Generative Models of Images. MIT 2023.
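A minimal sketch of the forward process (assuming PyTorch and the common linear β schedule; x_t can be sampled from x_0 in closed form):

```python
import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)              # noise schedule (a common choice)
alphas_bar = torch.cumprod(1.0 - betas, dim=0)     # ᾱ_t = ∏_s (1 - β_s)

def q_sample(x0, t):
    """Sample x_t ~ q(x_t | x_0): x_0 progressively corrupted by Gaussian noise."""
    eps = torch.randn_like(x0)
    a = alphas_bar[t]
    return a.sqrt() * x0 + (1 - a).sqrt() * eps    # x_t = √ᾱ_t · x0 + √(1-ᾱ_t) · ε
```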
Outline
1. Motivation
2. Discriminative vs Generative Models
3. Sampling
4. Architectures
○ Generative Adversarial Networks (GANs)
○ Variational Autoencoders (VAEs)
○ Denoising Diffusion Models (DDM)
■ Forward diffusion process
■ Reverse denoising process
○ Auto-regressive
Denoising Autoencoder (DAE)
Encode Decode
“Generate”
#DAE Vincent, Pascal, Hugo Larochelle, Yoshua Bengio, and Pierre-Antoine Manzagol. "Extracting and composing robust
features with denoising autoencoders." ICML 2008.
Philip Isola, Generative Models of Images. MIT 2023.
Reverse Denoising process
[Figure: xT (noise) → … → x0 (image) on the data manifold Pθ(x0); the network (a CNN / U-Net) learns to denoise step by step]
Reverse Denoising process
What is the dimension of the latent variable in diffusion models ?
Same dimensionality as the diffused data.
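A minimal sketch of the reverse process (assuming PyTorch, the schedule from the forward-diffusion sketch, and a trained, hypothetical noise predictor `eps_model`, e.g. a U-Net), in the style of DDPM sampling:

```python
import torch

@torch.no_grad()
def p_sample_loop(eps_model, shape):
    x = torch.randn(shape)                      # x_T: latent has the data's shape
    for t in reversed(range(T)):
        beta, a_bar = betas[t], alphas_bar[t]
        eps = eps_model(x, torch.tensor([t]))   # predict the noise added at step t
        # posterior mean: remove the predicted noise, rescale by 1/√(1-β_t)
        x = (x - beta / (1 - a_bar).sqrt() * eps) / (1 - beta).sqrt()
        if t > 0:
            x = x + beta.sqrt() * torch.randn_like(x)  # re-inject a little noise
    return x                                    # x_0: a generated sample
```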
Outline
1. Motivation
2. Discriminative vs Generative Models
3. Sampling
4. Architectures
○ Generative Adversarial Networks (GANs)
○ Variational Autoencoders (VAEs)
○ Denoising Diffusion Models (DDM)
○ Auto-regressive
Figure source: Lilian Weng, What are diffusion models ?, Lil’Log 2021.
Outline
1. Motivation
2. Discriminative vs Generative Models
3. Sampling
4. Architectures
○ Generative Adversarial Networks (GANs)
○ Variational Autoencoders (VAEs)
○ Denoising Diffusion Models (DDM)
○ Auto-regressive Models (AR)
Figure source: Lilian Weng, What are diffusion models ?, Lil’Log 2021.
Motivation
PixelRNN
An RNN predicts the probability of each sample xi with a categorical output distribution (softmax).
83
#PixelRNN Van Oord, A., Kalchbrenner, N., & Kavukcuoglu, K. Pixel recurrent neural networks. ICML 2016.
PixelRNN
84
#PixelRNN Van Oord, A., Kalchbrenner, N., & Kavukcuoglu, K. Pixel recurrent neural networks. ICML 2016.
Why are the completions not all identical ?
(a.k.a. how can AR models offer a generative behaviour ?)
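A minimal sketch of the autoregressive loop (assuming PyTorch; the tiny LSTM is illustrative). Because each pixel is sampled from the softmax rather than taken greedily, repeated runs give different completions:

```python
import torch
import torch.nn as nn

rnn = nn.LSTM(input_size=1, hidden_size=64, batch_first=True)  # illustrative
head = nn.Linear(64, 256)                    # 256-way softmax over intensities

pixels = torch.zeros(1, 1, 1)                # start from one (empty) pixel
for _ in range(10):                          # generate 10 pixels, one at a time
    h, _ = rnn(pixels)
    probs = torch.softmax(head(h[:, -1]), dim=-1)     # P(x_i | x_1..x_{i-1})
    nxt = torch.multinomial(probs, 1)        # sampling → completions differ
    pixels = torch.cat([pixels, nxt.float().view(1, 1, 1) / 255], dim=1)
```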
PixelCNN
85
#PixelCNN Van den Oord, A., Kalchbrenner, N., Espeholt, L., Vinyals, O., & Graves, A. Conditional image generation with
pixelcnn decoders. NeurIPS 2016.
Wavenet
86
Wavenet used dilated convolutions to produce synthetic audio, sample by sample, conditioned on a receptive field of size T:
#Wavenet Oord, Aaron van den, Sander Dieleman, Heiga Zen, Karen Simonyan, Oriol Vinyals, Alex Graves, Nal
Kalchbrenner, Andrew Senior, and Koray Kavukcuoglu. "Wavenet: A generative model for raw audio." arXiv 2016. [blog]
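A minimal sketch of dilated causal convolutions (assuming PyTorch; channel sizes are illustrative). Each layer doubles the dilation, so the receptive field T grows exponentially with depth:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

convs = nn.ModuleList(
    [nn.Conv1d(16, 16, kernel_size=2, dilation=d) for d in [1, 2, 4, 8]]
)
inp, out = nn.Conv1d(1, 16, 1), nn.Conv1d(16, 256, 1)

x = inp(torch.randn(1, 1, 32))           # (batch, channels, audio samples)
for conv in convs:
    x = F.pad(x, (conv.dilation[0], 0))  # left-pad only: keeps the stack causal
    x = torch.relu(conv(x))
logits = out(x)                          # 256-way categorical per audio sample
```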
The Transformer
Figure: Jay Alammar, “The illustrated Transformer” (2018)
#Transformer Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., ... & Polosukhin, I.. Attention
is all you need. NeurIPS 2017.
Auto-regressive (at test time).
The Transformer
Figure: Jay Alammar, “The illustrated Transformer” (2018)
Text completion
#GPT-2 Alec Radford, Jeffrey Wu, Dario Amodei, Daniela Amodei, Jack Clark, Miles Brundage, Ilya Sutskever, “Better
Language Models and Their Implications”. OpenAI Blog 2019.
“GPT-2 is trained with a simple objective: predict the next word, given all of the
previous words within some text.”
Condition Generated completions
In a shocking finding, scientist
discovered a herd of unicorns
living in a remote, previously
unexplored valley, in the Andes
Mountains. Even more surprising to
the researchers was the fact that
the unicorns spoke perfect
English.
The scientist named the population,
after their distinctive horn, Ovid’s
Unicorn. These four-horned, silver-white
unicorns were previously unknown to
science.
Now, after almost two centuries, the
mystery of what sparked this odd
phenomenon is finally solved.
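A minimal sketch of this loop in practice (assuming the Hugging Face `transformers` library; the prompt is the one from the slide): the model repeatedly predicts the next token given all previous ones, and sampling makes each completion different.

```python
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

prompt = "In a shocking finding, scientist discovered a herd of unicorns"
input_ids = tokenizer.encode(prompt, return_tensors="pt")

# predict next token, append, repeat; do_sample=True draws from the softmax
output = model.generate(input_ids, max_length=60, do_sample=True, top_k=50)
print(tokenizer.decode(output[0]))
```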
Zero-shot learning
#GPT-2 Alec Radford, Jeffrey Wu, Dario Amodei, Daniela Amodei, Jack Clark, Miles Brundage, Ilya Sutskever, “Better
Language Models and Their Implications”. OpenAI Blog 2019.
GPT-2/3 can also solve tasks for which it was not trained (zero-shot learning).
Text Reading Comprehension
The 2008 Summer Olympics torch relay was run from March 24
until August 8, 2008, prior to the 2008 Summer Olympics,
with the theme of “one world, one dream”. Plans for the
relay were announced on April 26, 2007, in Beijing, China.
The relay, also called by the organizers as the “Journey of
Harmony”, lasted 129 days and carried the torch 137,000 km
(85,000 mi) – the longest distance of any Olympic torch
relay since the tradition was started ahead of the 1936
Summer Olympics.
After being lit at the birthplace of the Olympic Games in
Olympia, Greece on March 24, the torch traveled to the
Panathinaiko Stadium in Athens, and then to Beijing,
arriving on March 31. From Beijing, the torch was following
a route passing through six continents. The torch has
visited cities along the Silk Road, symbolizing ancient
links between China and the rest of the world. The relay
also included an ascent with the flame to the top of Mount
Everest on the border of Nepal and Tibet, China from the
Chinese side, which was closed specially for the event.
Q: What was the theme?
A: “one world, one dream”.
Q: What was the length of the race?
A: 137,000 km
Q: Was it larger than previous ones?
A: No
Q: Where did the race begin?
A: Olympia, Greece
Zero-shot learning
#GPT-2 Alec Radford, Jeffrey Wu, Dario Amodei, Daniela Amodei, Jack Clark, Miles Brundage, Ilya Sutskever, “Better
Language Models and Their Implications”. OpenAI Blog 2019.
“GPT-2 is trained with a simple objective: predict the next word, given all of the
previous words within some text.”
Zero-shot task performances
(GPT-2 was never trained for these tasks)
#iGPT Chen, M., Radford, A., Child, R., Wu, J., Jun, H., Luan, D., & Sutskever, I. Generative Pretraining from Pixels. ICML
2020.
GPT-2 / GPT-3
#ChatGPT [blog]
#GPT-4 (OpenAI) GPT-4 Technical Report. arXiv 2023. [blog]
ChatGPT / GPT-4
Discussion
Learn more about AR models
Nal Kalchbrenner, Mediterranean Machine Learning
Summer School 2022.
Outline
1. Motivation
2. Discriminative vs Generative Models
3. Sampling
4. Architectures
○ Generative Adversarial Networks (GANs)
○ Variational Autoencoders (VAEs)
○ Denoising Diffusion Models (DDM)
○ Auto-regressive Models (AR)
Figure source: Lilian Weng, What are diffusion models ?, Lil’Log 2021.
97
Source: David Foster
Recommended books
Interview of David Foster for Machine
Learning Street Talk (2023)
Recommended courses
Deep Unsupervised Learning
(UC Berkeley CS294-158-SP2020)
1 of 99

Recommended

Perceptrons (D1L2 2017 UPC Deep Learning for Computer Vision) by
Perceptrons (D1L2 2017 UPC Deep Learning for Computer Vision)Perceptrons (D1L2 2017 UPC Deep Learning for Computer Vision)
Perceptrons (D1L2 2017 UPC Deep Learning for Computer Vision)Universitat Politècnica de Catalunya
870 views26 slides
Deep Generative Models by
Deep Generative Models Deep Generative Models
Deep Generative Models Chia-Wen Cheng
1.7K views55 slides
Deep Learning for Computer Vision: Recurrent Neural Networks (UPC 2016) by
Deep Learning for Computer Vision: Recurrent Neural Networks (UPC 2016)Deep Learning for Computer Vision: Recurrent Neural Networks (UPC 2016)
Deep Learning for Computer Vision: Recurrent Neural Networks (UPC 2016)Universitat Politècnica de Catalunya
4.9K views38 slides
Convolutional neural network by
Convolutional neural network Convolutional neural network
Convolutional neural network Yan Xu
5.3K views68 slides
Tutorial on Deep Generative Models by
 Tutorial on Deep Generative Models Tutorial on Deep Generative Models
Tutorial on Deep Generative ModelsMLReview
5.9K views96 slides
Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ... by
Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...
Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...Universitat Politècnica de Catalunya
515 views66 slides

More Related Content

What's hot

Jupyter, A Platform for Data Science at Scale by
Jupyter, A Platform for Data Science at ScaleJupyter, A Platform for Data Science at Scale
Jupyter, A Platform for Data Science at ScaleMatthias Bussonnier
7.9K views45 slides
Image-to-Image Translation with Conditional Adversarial Nets (UPC Reading Group) by
Image-to-Image Translation with Conditional Adversarial Nets (UPC Reading Group)Image-to-Image Translation with Conditional Adversarial Nets (UPC Reading Group)
Image-to-Image Translation with Conditional Adversarial Nets (UPC Reading Group)Universitat Politècnica de Catalunya
6K views55 slides
CNN Machine learning DeepLearning by
CNN Machine learning DeepLearningCNN Machine learning DeepLearning
CNN Machine learning DeepLearningAbhishek Sharma
541 views79 slides
Generative Models and Adversarial Training (D3L4 2017 UPC Deep Learning for ... by
Generative Models and Adversarial Training  (D3L4 2017 UPC Deep Learning for ...Generative Models and Adversarial Training  (D3L4 2017 UPC Deep Learning for ...
Generative Models and Adversarial Training (D3L4 2017 UPC Deep Learning for ...Universitat Politècnica de Catalunya
1.3K views26 slides
Human-level Control Through Deep Reinforcement Learning (Presentation) by
Human-level Control Through Deep Reinforcement Learning (Presentation)Human-level Control Through Deep Reinforcement Learning (Presentation)
Human-level Control Through Deep Reinforcement Learning (Presentation)Muhammed Kocabaş
1.8K views46 slides
[PR12] PR-050: Convolutional LSTM Network: A Machine Learning Approach for Pr... by
[PR12] PR-050: Convolutional LSTM Network: A Machine Learning Approach for Pr...[PR12] PR-050: Convolutional LSTM Network: A Machine Learning Approach for Pr...
[PR12] PR-050: Convolutional LSTM Network: A Machine Learning Approach for Pr...Taegyun Jeon
2K views23 slides

What's hot(20)

Jupyter, A Platform for Data Science at Scale by Matthias Bussonnier
Jupyter, A Platform for Data Science at ScaleJupyter, A Platform for Data Science at Scale
Jupyter, A Platform for Data Science at Scale
Matthias Bussonnier7.9K views
CNN Machine learning DeepLearning by Abhishek Sharma
CNN Machine learning DeepLearningCNN Machine learning DeepLearning
CNN Machine learning DeepLearning
Abhishek Sharma541 views
Human-level Control Through Deep Reinforcement Learning (Presentation) by Muhammed Kocabaş
Human-level Control Through Deep Reinforcement Learning (Presentation)Human-level Control Through Deep Reinforcement Learning (Presentation)
Human-level Control Through Deep Reinforcement Learning (Presentation)
Muhammed Kocabaş1.8K views
[PR12] PR-050: Convolutional LSTM Network: A Machine Learning Approach for Pr... by Taegyun Jeon
[PR12] PR-050: Convolutional LSTM Network: A Machine Learning Approach for Pr...[PR12] PR-050: Convolutional LSTM Network: A Machine Learning Approach for Pr...
[PR12] PR-050: Convolutional LSTM Network: A Machine Learning Approach for Pr...
Taegyun Jeon2K views
RNN and its applications by Sungjoon Choi
RNN and its applicationsRNN and its applications
RNN and its applications
Sungjoon Choi8.1K views
Liver segmentation using U-net: Practical issues @ SNU-TF by WonjoongCheon
Liver segmentation using U-net: Practical issues @ SNU-TFLiver segmentation using U-net: Practical issues @ SNU-TF
Liver segmentation using U-net: Practical issues @ SNU-TF
WonjoongCheon973 views
Introduction to Visual transformers by leopauly
Introduction to Visual transformers Introduction to Visual transformers
Introduction to Visual transformers
leopauly365 views
Deep Learning - Convolutional Neural Networks by Christian Perone
Deep Learning - Convolutional Neural NetworksDeep Learning - Convolutional Neural Networks
Deep Learning - Convolutional Neural Networks
Christian Perone71.4K views
Machine Learning - Object Detection and Classification by Vikas Jain
Machine Learning - Object Detection and ClassificationMachine Learning - Object Detection and Classification
Machine Learning - Object Detection and Classification
Vikas Jain3.7K views
Stable Diffusion path by Vitaly Bondar
Stable Diffusion pathStable Diffusion path
Stable Diffusion path
Vitaly Bondar1.8K views
Deep Learning: Introduction & Chapter 5 Machine Learning Basics by Jason Tsai
Deep Learning: Introduction & Chapter 5 Machine Learning BasicsDeep Learning: Introduction & Chapter 5 Machine Learning Basics
Deep Learning: Introduction & Chapter 5 Machine Learning Basics
Jason Tsai1.4K views
DQN (Deep Q-Network) by Dong Guo
DQN (Deep Q-Network)DQN (Deep Q-Network)
DQN (Deep Q-Network)
Dong Guo2.1K views

Similar to Deep Generative Learning for All

GAN - Theory and Applications by
GAN - Theory and ApplicationsGAN - Theory and Applications
GAN - Theory and ApplicationsEmanuele Ghelfi
9.4K views41 slides
EuroSciPy 2019 - GANs: Theory and Applications by
EuroSciPy 2019 - GANs: Theory and ApplicationsEuroSciPy 2019 - GANs: Theory and Applications
EuroSciPy 2019 - GANs: Theory and ApplicationsEmanuele Ghelfi
1.1K views41 slides
Lecture17 xing fei-fei by
Lecture17 xing fei-feiLecture17 xing fei-fei
Lecture17 xing fei-feiTianlu Wang
417 views120 slides
Adversarial examples in deep learning (Gregory Chatel) by
Adversarial examples in deep learning (Gregory Chatel)Adversarial examples in deep learning (Gregory Chatel)
Adversarial examples in deep learning (Gregory Chatel)MeetupDataScienceRoma
1.3K views39 slides
Using model-based statistical inference to learn about evolution by
Using model-based statistical inference to learn about evolutionUsing model-based statistical inference to learn about evolution
Using model-based statistical inference to learn about evolutionErick Matsen
1.9K views73 slides
Closing, Course Offer 17/18 & Homework (D5 2017 UPC Deep Learning for Compute... by
Closing, Course Offer 17/18 & Homework (D5 2017 UPC Deep Learning for Compute...Closing, Course Offer 17/18 & Homework (D5 2017 UPC Deep Learning for Compute...
Closing, Course Offer 17/18 & Homework (D5 2017 UPC Deep Learning for Compute...Universitat Politècnica de Catalunya
646 views73 slides

Similar to Deep Generative Learning for All(20)

GAN - Theory and Applications by Emanuele Ghelfi
GAN - Theory and ApplicationsGAN - Theory and Applications
GAN - Theory and Applications
Emanuele Ghelfi9.4K views
EuroSciPy 2019 - GANs: Theory and Applications by Emanuele Ghelfi
EuroSciPy 2019 - GANs: Theory and ApplicationsEuroSciPy 2019 - GANs: Theory and Applications
EuroSciPy 2019 - GANs: Theory and Applications
Emanuele Ghelfi1.1K views
Lecture17 xing fei-fei by Tianlu Wang
Lecture17 xing fei-feiLecture17 xing fei-fei
Lecture17 xing fei-fei
Tianlu Wang417 views
Adversarial examples in deep learning (Gregory Chatel) by MeetupDataScienceRoma
Adversarial examples in deep learning (Gregory Chatel)Adversarial examples in deep learning (Gregory Chatel)
Adversarial examples in deep learning (Gregory Chatel)
Using model-based statistical inference to learn about evolution by Erick Matsen
Using model-based statistical inference to learn about evolutionUsing model-based statistical inference to learn about evolution
Using model-based statistical inference to learn about evolution
Erick Matsen1.9K views
Distributed Meta-Analysis System by jarising
Distributed Meta-Analysis SystemDistributed Meta-Analysis System
Distributed Meta-Analysis System
jarising8.3K views
An Importance Sampling Approach to Integrate Expert Knowledge When Learning B... by NTNU
An Importance Sampling Approach to Integrate Expert Knowledge When Learning B...An Importance Sampling Approach to Integrate Expert Knowledge When Learning B...
An Importance Sampling Approach to Integrate Expert Knowledge When Learning B...
NTNU459 views
HIS'2008: Genetic-based Synthetic Data Sets for the Analysis of Classifiers B... by Albert Orriols-Puig
HIS'2008: Genetic-based Synthetic Data Sets for the Analysis of Classifiers B...HIS'2008: Genetic-based Synthetic Data Sets for the Analysis of Classifiers B...
HIS'2008: Genetic-based Synthetic Data Sets for the Analysis of Classifiers B...
Jakub Langr (University of Oxford) - Overview of Generative Adversarial Netwo... by Codiax
Jakub Langr (University of Oxford) - Overview of Generative Adversarial Netwo...Jakub Langr (University of Oxford) - Overview of Generative Adversarial Netwo...
Jakub Langr (University of Oxford) - Overview of Generative Adversarial Netwo...
Codiax161 views
ISBA 2022 Susie Bayarri lecture by Pierre Jacob
ISBA 2022 Susie Bayarri lectureISBA 2022 Susie Bayarri lecture
ISBA 2022 Susie Bayarri lecture
Pierre Jacob446 views
Gf o2014talk by Bob O'Hara
Gf o2014talkGf o2014talk
Gf o2014talk
Bob O'Hara721 views
Striving to Demystify Bayesian Computational Modelling by Marco Wirthlin
Striving to Demystify Bayesian Computational ModellingStriving to Demystify Bayesian Computational Modelling
Striving to Demystify Bayesian Computational Modelling
Marco Wirthlin280 views
Dirty data science machine learning on non-curated data by Gael Varoquaux
Dirty data science machine learning on non-curated dataDirty data science machine learning on non-curated data
Dirty data science machine learning on non-curated data
Gael Varoquaux20K views
Generative Adversarial Networks (GANs) at the Data Science Meetup Luxembourg ... by Chris Hammerschmidt
Generative Adversarial Networks (GANs) at the Data Science Meetup Luxembourg ...Generative Adversarial Networks (GANs) at the Data Science Meetup Luxembourg ...
Generative Adversarial Networks (GANs) at the Data Science Meetup Luxembourg ...
Algoritma genetika by Hendra Arie
Algoritma genetikaAlgoritma genetika
Algoritma genetika
Hendra Arie168 views

More from Universitat Politècnica de Catalunya

Towards Sign Language Translation & Production | Xavier Giro-i-Nieto by
Towards Sign Language Translation & Production | Xavier Giro-i-NietoTowards Sign Language Translation & Production | Xavier Giro-i-Nieto
Towards Sign Language Translation & Production | Xavier Giro-i-NietoUniversitat Politècnica de Catalunya
290 views94 slides
Learning Representations for Sign Language Videos - Xavier Giro - NIST TRECVI... by
Learning Representations for Sign Language Videos - Xavier Giro - NIST TRECVI...Learning Representations for Sign Language Videos - Xavier Giro - NIST TRECVI...
Learning Representations for Sign Language Videos - Xavier Giro - NIST TRECVI...Universitat Politècnica de Catalunya
183 views92 slides
Open challenges in sign language translation and production by
Open challenges in sign language translation and productionOpen challenges in sign language translation and production
Open challenges in sign language translation and productionUniversitat Politècnica de Catalunya
187 views83 slides
Generation of Synthetic Referring Expressions for Object Segmentation in Videos by
Generation of Synthetic Referring Expressions for Object Segmentation in VideosGeneration of Synthetic Referring Expressions for Object Segmentation in Videos
Generation of Synthetic Referring Expressions for Object Segmentation in VideosUniversitat Politècnica de Catalunya
522 views42 slides
Discovery and Learning of Navigation Goals from Pixels in Minecraft by
Discovery and Learning of Navigation Goals from Pixels in MinecraftDiscovery and Learning of Navigation Goals from Pixels in Minecraft
Discovery and Learning of Navigation Goals from Pixels in MinecraftUniversitat Politècnica de Catalunya
193 views40 slides
Learn2Sign : Sign language recognition and translation using human keypoint e... by
Learn2Sign : Sign language recognition and translation using human keypoint e...Learn2Sign : Sign language recognition and translation using human keypoint e...
Learn2Sign : Sign language recognition and translation using human keypoint e...Universitat Politècnica de Catalunya
362 views49 slides

More from Universitat Politècnica de Catalunya(20)

Recently uploaded

How to be(come) a successful PhD student by
How to be(come) a successful PhD studentHow to be(come) a successful PhD student
How to be(come) a successful PhD studentTom Mens
460 views62 slides
Open Access Publishing in Astrophysics by
Open Access Publishing in AstrophysicsOpen Access Publishing in Astrophysics
Open Access Publishing in AstrophysicsPeter Coles
725 views26 slides
Chromatography ppt.pptx by
Chromatography ppt.pptxChromatography ppt.pptx
Chromatography ppt.pptxvarshachandgudesvpm
16 views1 slide
Guinea Pig as a Model for Translation Research by
Guinea Pig as a Model for Translation ResearchGuinea Pig as a Model for Translation Research
Guinea Pig as a Model for Translation ResearchPervaizDar1
11 views21 slides
Pollination By Nagapradheesh.M.pptx by
Pollination By Nagapradheesh.M.pptxPollination By Nagapradheesh.M.pptx
Pollination By Nagapradheesh.M.pptxMNAGAPRADHEESH
15 views9 slides
himalay baruah acid fast staining.pptx by
himalay baruah acid fast staining.pptxhimalay baruah acid fast staining.pptx
himalay baruah acid fast staining.pptxHimalayBaruah
5 views16 slides

Recently uploaded(20)

How to be(come) a successful PhD student by Tom Mens
How to be(come) a successful PhD studentHow to be(come) a successful PhD student
How to be(come) a successful PhD student
Tom Mens460 views
Open Access Publishing in Astrophysics by Peter Coles
Open Access Publishing in AstrophysicsOpen Access Publishing in Astrophysics
Open Access Publishing in Astrophysics
Peter Coles725 views
Guinea Pig as a Model for Translation Research by PervaizDar1
Guinea Pig as a Model for Translation ResearchGuinea Pig as a Model for Translation Research
Guinea Pig as a Model for Translation Research
PervaizDar111 views
Pollination By Nagapradheesh.M.pptx by MNAGAPRADHEESH
Pollination By Nagapradheesh.M.pptxPollination By Nagapradheesh.M.pptx
Pollination By Nagapradheesh.M.pptx
MNAGAPRADHEESH15 views
himalay baruah acid fast staining.pptx by HimalayBaruah
himalay baruah acid fast staining.pptxhimalay baruah acid fast staining.pptx
himalay baruah acid fast staining.pptx
HimalayBaruah5 views
Conventional and non-conventional methods for improvement of cucurbits.pptx by gandhi976
Conventional and non-conventional methods for improvement of cucurbits.pptxConventional and non-conventional methods for improvement of cucurbits.pptx
Conventional and non-conventional methods for improvement of cucurbits.pptx
gandhi97618 views
Ethical issues associated with Genetically Modified Crops and Genetically Mod... by PunithKumars6
Ethical issues associated with Genetically Modified Crops and Genetically Mod...Ethical issues associated with Genetically Modified Crops and Genetically Mod...
Ethical issues associated with Genetically Modified Crops and Genetically Mod...
PunithKumars622 views
Connecting communities to promote FAIR resources: perspectives from an RDA / ... by Allyson Lister
Connecting communities to promote FAIR resources: perspectives from an RDA / ...Connecting communities to promote FAIR resources: perspectives from an RDA / ...
Connecting communities to promote FAIR resources: perspectives from an RDA / ...
Allyson Lister34 views
RemeOs science and clinical evidence by PetrusViitanen1
RemeOs science and clinical evidenceRemeOs science and clinical evidence
RemeOs science and clinical evidence
PetrusViitanen135 views
Distinct distributions of elliptical and disk galaxies across the Local Super... by Sérgio Sacani
Distinct distributions of elliptical and disk galaxies across the Local Super...Distinct distributions of elliptical and disk galaxies across the Local Super...
Distinct distributions of elliptical and disk galaxies across the Local Super...
Sérgio Sacani30 views
A training, certification and marketing scheme for informal dairy vendors in ... by ILRI
A training, certification and marketing scheme for informal dairy vendors in ...A training, certification and marketing scheme for informal dairy vendors in ...
A training, certification and marketing scheme for informal dairy vendors in ...
ILRI11 views
PRINCIPLES-OF ASSESSMENT by rbalmagro
PRINCIPLES-OF ASSESSMENTPRINCIPLES-OF ASSESSMENT
PRINCIPLES-OF ASSESSMENT
rbalmagro11 views
application of genetic engineering 2.pptx by SankSurezz
application of genetic engineering 2.pptxapplication of genetic engineering 2.pptx
application of genetic engineering 2.pptx
SankSurezz7 views

Deep Generative Learning for All

  • 1. Deep Generative Learning for All (a.k.a. The GenAI Hype) Xavier Giro-i-Nieto @DocXavi xavigiro.upc@gmail.com Associate Professor (on leave) Universitat Politècnica de Catalunya Institut de Robòtica Industrial ELLIS Unit Barcelona Spring 2020 [Summer School website]
  • 2. 2 Acknowledgements Santiago Pascual santi.pascual@upc.edu @santty128 PhD 2019 Universitat Politecnica de Catalunya Technical University of Catalonia Albert Pumarola apumarola@iri.upc.edu @AlbertPumarola PhD 2021 Universitat Politècnica de Catalunya Technical University of Catalonia Kevin McGuinness kevin.mcguinness@dcu.ie Research Fellow Insight Centre for Data Analytics Dublin City University Gerard I. Gállego PhD Student Universitat Politècnica de Catalunya gerard.ion.gallego@upc.edu @geiongallego
  • 3. 3 Acknowledgements Eduard Ramon Applied Scientist Amazon Barcelona @eram1205 Wentong Liao Applied Scientist Amazon Barcelona Ciprian Corneanu Applied Scientist Amazon Seattle Laia Tarrés PhD Student Universitat Politècnica de Catalunya laia.tarres@upc.edu
  • 4. Outline 1. Motivation 2. Discriminative vs Generative Models a. P(Y|X): Discriminative Models b. P(X): Generative Models c. P(X|Y): Conditioned Generative Models 3. Latent variable 4. Architectures a. GAN b. Auto-regressive c. VAE d. Diffusion
  • 5. Image generation 5 #StyleGAN3 (NVIDIA) Karras, Tero, Miika Aittala, Samuli Laine, Erik Härkönen, Janne Hellsten, Jaakko Lehtinen, and Timo Aila. "Alias-free generative adversarial networks." NeurIPS 2021. [code]
  • 6. 6 #DiT Peebles, William, and Saining Xie. "Scalable Diffusion Models with Transformers." arXiv 2022. Image generation
  • 7. 7 #DALL-E-2 (OpenAI) Aditya Ramesh, Prafulla Dhariwal, Alex Nichol, Casey Chu, Mark Chen "Hierarchical Text-Conditional Image Generation with CLIP Latents." 2022. [blog] Text-to-Image generation
  • 8. 8 Text-to-Video generation #Make-a-video (Meta) Singer, Uriel, Adam Polyak, Thomas Hayes, Xi Yin, Jie An, Songyang Zhang, Qiyuan Hu et al. "Make-a-video: Text-to-video generation without text-video data." arXiv 2022. “A dog wearing a Superhero outfit with red cape flying through the sky”
  • 9. Synthetic labels to train discriminative models 9 #BigDatasetGAN Li, Daiqing, Huan Ling, Seung Wook Kim, Karsten Kreis, Adela Barriuso, Sanja Fidler, and Antonio Torralba. "BigDatasetGAN: Synthesizing ImageNet with Pixel-wise Annotations." arXiv 2022.
  • 10. Video Super-resolution 10 #TecoGAN Chu, M., Xie, Y., Mayer, J., Leal-Taixé, L., & Thuerey, N. Learning temporal coherence via self-supervision for GAN-based video generation. ACM Transactions on Graphics 2020.
  • 11. Human Motion Transfer 11 #EDN Chan, C., Ginosar, S., Zhou, T., & Efros, A. A. Everybody dance now. ICCV 2019.
  • 12. Speech Enhancement 12 Recover lost information/add enhancing details by learning the natural distribution of audio samples. original enhanced
  • 13. Outline 1. Motivation 2. Discriminative vs Generative Models a. P(Y|X): Discriminative Models b. P(X): Generative Models c. P(X|Y): Conditioned Generative Models 3. Latent variable 4. Architectures a. GAN b. Auto-regressive c. VAE d. Diffusion
  • 14. 14 Discriminative vs Generative Models Philip Isola, Generative Models of Images. MIT 2023.
  • 15. Outline 1. Motivation 2. Discriminative vs Generative Models a. Pθ (Y|X): Discriminative Models b. Pθ (X): Generative Models c. Pθ (X|Y): Conditioned Generative Models 3. Latent variable 4. Architectures a. GAN b. Auto-regressive c. VAE d. Diffusion
  • 16. Pθ (Y|X): Discriminative Models 16 Slide credit: Albert Pumarola (UPC 2019) Classification Regression Text Prob. of being a Potential Customer Image Audio Speech Translation Jim Carrey What Language? X=Data Y=Labels θ = Model parameters Discriminative Modeling Pθ (Y|X)
  • 17. 17 0.01 0.09 0.9 input Network (θ) output class Figure credit: Javier Ruiz (UPC TelecomBCN) Discriminative model: Tell me the probability of some ‘Y’ responses given ‘X’ inputs. Pθ (Y | X = [pixel1 , pixel2 , …, pixel784 ]) Pθ (Y|X): Discriminative Models
  • 18. Outline 1. Motivation 2. Discriminative vs Generative Models a. P(Y|X): Discriminative Models b. P(X): Generative Models c. P(X|Y): Conditioned Generative Models 3. Sampling 4. Architectures a. GAN b. Auto-regressive c. VAE d. Diffusion
  • 19. 19 Slide Concept: Albert Pumarola (UPC 2019) Pθ (X): Generative Models Classification Regression Generative Text Prob. of being a Potential Customer “What about Ron magic?” offered Ron. To Harry, Ron was loud, slow and soft bird. Harry did not like to think about birds. Image Audio Language Translation Music Composer and Interpreter MuseNet Sample Jim Carrey What Language? Discriminative Modeling Pθ (Y|X) Generative Modeling Pθ (X) X=Data Y=Labels θ = Model parameters
  • 20. Each real sample xi comes from an M-dimensional probability distribution P(X). X = {x1 , x2 , …, xN } Pθ (X): Generative Models
  • 21. 21 1) We want our model with parameters θ to output samples with distribution Pθ (X), matching the distribution of our training data P(X). 2) We can sample points from Pθ (X) plausibly looking how P(X) distributed. P(X) Distribution of training data Pλ,μ,σ (X) Distribution of training data Example: Gaussian Mixture Models (GMM) Pθ (X): Generative Models
  • 22. 22 What are the parameters θ we need to estimate in deep neural networks ? θ = (weights & biases) output Network (θ) ? Pθ (X): Generative Models
  • 23. Outline 1. Motivation 2. Discriminative vs Generative Models a. P(Y|X): Discriminative Models b. P(X): Generative Models c. P(X|Y): Conditioned Generative Models 3. Sampling 4. Architectures a. GAN b. Auto-regressive c. VAE d. Diffusion
  • 24. Pθ (X|Y): Conditioned Generative Models Joint probabilities P(X|Y) to model conditioning variables on the generative process: X = {x1 , x2 , …, xN } Y = {y1 , y2 , …, yN } DOG CAT TRUCK PIZZA THRILLER SCI-FI HISTORY /aa/ /e/ /o/
  • 25. Outline 1. Motivation 2. Discriminative vs Generative Models a. P(Y|X): Discriminative Models b. P(X): Generative Models c. P(X|Y): Conditioned Generative Models 3. Sampling 4. Architectures a. Generative Adversarial Networks (GANs) b. Auto-regressive c. Variational Autoencoders (VAEs) d. Diffusion
  • 26. Our learned model should be able to make up new samples from the distribution, not just copy and paste existing samples! 26 Figure from NIPS 2016 Tutorial: Generative Adversarial Networks (I. Goodfellow) Sampling
  • 27. Philip Isola, Generative Models of Images. MIT 2023. Sampling
  • 28. Slide concept: Albert Pumarola (UPC 2019) Learn Sample Out Training Dataset Generated Samples Feature space Manifold Pθ (X) “Model the data distribution so that we can sample new points out of the distribution” Sampling
  • 29. Sampling z Generated Samples How could we generate diverse samples from a deterministic deep neural network ? Generator (θ)
  • 30. Sampling Generated Samples How could we generate diverse samples from a deterministic deep neural network ? Generator (θ) Sample z from a known prior, for example, a multivariate normal distribution N(0, I). Example: dim(z)=2 x’ z
  • 31. Slide concept: Albert Pumarola (UPC 2019) Learn Training Dataset Interpolated Samples Feature space Manifold Pθ (X) Traversing the learned manifold through interpolation. Interpolation
  • 32. Disentanglement Philip Isola, Generative Models of Images. MIT 2023.
  • 33. Disentanglement Philip Isola, Generative Models of Images. MIT 2023.
  • 34. Outline 1. Motivation 2. Discriminative vs Generative Models 3. Sampling 4. Architectures ○ Generative Adversarial Networks (GANs) ■ Generator & Discriminator Networks ■ Adversarial Training ■ Conditional GANs ○ Auto-regressive ○ Variational Autoencoders (VAEs) ○ Diffusion
  • 35. 35 Credit: Santiago Pascual [slides] [video]
  • 36. 36 Generator & Discriminator We have two modules: Generator (G) and Discriminator (D). ● They “fight” against each other during training→ Adversarial Learning D’s goal: Classify between real samples and those produced by G. G’s goal: Fool D to missclassify. Goodfellow, Ian J., Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. "Generative Adversarial Nets." NeurIPS 2014.
  • 37. 37 Discriminator Discriminator network D → binary classifier between real (x) and generated (x’). samples. Generated (1) Discriminator (θ) x’ Discriminator (θ) x Real (0)
  • 39. Outline 1. Motivation 2. Discriminative vs Generative Models 3. Sampling 4. Architectures ○ Generative Adversarial Networks (GANs) ■ Generator & Discriminator Networks ■ Adversarial Training ■ Conditional GANs ○ Auto-regressive ○ Variational Autoencoders (VAEs) ○ Diffusion
  • 40. Imagine we have a counterfeiter (G) trying to make fake money, and the police (D) has to detect whether money is real or fake. 100 100 FAKE: It’s not even green Adversarial Training Analogy: is it fake money? Figure: Santiago Pascual (UPC)
  • 41. Imagine we have a counterfeiter (G) trying to make fake money, and the police (D) has to detect whether money is real or fake. 100 100 FAKE: There is no watermark Adversarial Training Analogy: is it fake money? Figure: Santiago Pascual (UPC)
  • 42. Imagine we have a counterfeiter (G) trying to make fake money, and the police (D) has to detect whether money is real or fake. 100 100 FAKE: Watermark should be rounded Adversarial Training Analogy: is it fake money? Figure: Santiago Pascual (UPC)
  • 43. Imagine we have a counterfeiter (G) trying to make fake money, and the police (D) has to detect whether money is real or fake. After enough iterations, and if the counterfeiter is good enough (in terms of G network it means “has enough parameters”), the police should be confused. REAL? FAKE? Adversarial Training Analogy: is it fake money? Figure: Santiago Pascual (UPC)
  • 44. Adversarial Training Generator Real world images Discriminator Real Loss Latent random variable Sample Sample Generated Alternate between training the discriminator and generator Neural Network Neural Network Figure: Kevin McGuinness (DCU)
  • 45. Adversarial Training: Discriminator Generator Real world images Discriminator Real Loss Latent random variable Sample Sample Generated 1. Fix generator weights, draw samples from both real world and generated images 2. Train discriminator to distinguish between real world and generated images Backprop error to update discriminator weights Figure: Kevin McGuinness (DCU)
  • 46. Adversarial Training: Discriminator Generator Real world images Discriminator Real Loss Latent random variable Sample Sample Backprop error to update discriminator weights Figure: Kevin McGuinness (DCU) In the set up of the figure, which ground truth label for a generated image should we use to train the discriminator ? Consider a binary encoding of “1” (Real) and “0” (Fake). Generated
  • 47. Adversarial Training: Generator 1. Fix discriminator weights 2. Sample from generator by injecting noise. 3. Backprop error through discriminator to update generator weights Generator Real world images Discriminator Real Loss Latent random variable Sample Sample Backprop error to update generator weights Figure: Kevin McGuinness (DCU) Generated
  • 48. Adversarial Training: Generator Generator Real world images Discriminator Real Loss Latent random variable Sample Sample Backprop error to update generator weights Figure: Kevin McGuinness (DCU) In the set up of the figure, which ground truth label for a generated image should we use to train the generator ? Consider a binary encoding of “1” (Real) and “0” (Fake). Generated
  • 49. Adversarial Training: How to make it work ? Soumith Chintala, “How to train a GAN ? Tips and tricks to make GAN work”. Github 2016. NeurIPS Barcelona 2016
  • 50. Outline 1. Motivation 2. Discriminative vs Generative Models 3. Sampling 4. Architectures ○ Generative Adversarial Networks (GANs) ■ Generator & Discriminator Networks ■ Adversarial Training ■ Conditional GANs ○ Variational Autoencoders (VAEs) ○ Diffusion ○ Auto-regressive
  • 51. Non-Conditional GANs 51 Slide credit: Víctor Garcia Discriminator D(·) Generator G(·) Real World Random seed (z) Real/Generated
  • 52. 52 Conditional GANs (cGAN) Slide credit: Víctor Garcia Conditional Adversarial Networks Real World Real/Generated Condition Discriminator D(·) Generator G(·)
  • 53. 53 Learn more about GANs Ian Goodfellow. NeurIPS Barcelona 2016. Mihaela Rosca & Jeff Donahue. UCL x Deepmind 2020.
  • 54. Outline 1. Motivation 2. Discriminative vs Generative Models 3. Sampling 4. Architectures ○ Generative Adversarial Networks (GANs) ○ Variational Autoencoders (VAEs) ○ Diffusion ○ Auto-regressive Figure source: Lilian Weng, What are diffusion models ?, Lil’Log 2021.
  • 55. Outline 1. Motivation 2. Discriminative vs Generative Models 3. Sampling 4. Architectures ○ Generative Adversarial Networks (GANs) ○ Variational Autoencoders (VAEs) ■ AE vs VAE ■ Variational Inference ■ Reparametrization trick ■ Generative behaviour ○ Diffusion ○ Auto-regressive Figure source: Lilian Weng, What are diffusion models ?, Lil’Log 2021.
  • 56. Manifold Pθ (X) Encode Decode “Generate” 56 Auto-Encoder (AE) z Feature space ● Learns Pθ (X) with a reconstruction loss. ● Proposed as a pre-training stage for the encoder (“self-supervised learning”).
  • 57. 57 Auto-Encoder (AE) Encode Decode “Generate” z Feature space Manifold Pθ (X) Could we generate new samples by sampling from a normal distribution and feeding it into the encoder, or the decoder (as in GANs) ? ?
  • 58. 58 Auto-Encoder (AE) No, because the noise (or encoded noise) would be out of the learned manifold. Encode Decode “Generate” z Feature space Manifold Pθ (X) Could we generate new samples by sampling from a normal distribution and feeding it into the encoder, or the decoder (as in GANs) ?
  • 59. Outline 1. Motivation 2. Discriminative vs Generative Models 3. Sampling 4. Architectures ○ Generative Adversarial Networks (GANs) ○ Variational Autoencoders (VAEs) ■ AE vs VAE ■ Variational Inference ■ Reparametrization trick ■ Generative behaviour ○ Diffusion ○ Auto-regressive Figure source: Lilian Weng, What are diffusion models ?, Lil’Log 2021.
  • 60. 60 Variational Auto-Encoder (AE) Kingma, Diederik P., and Max Welling. "Auto-encoding variational bayes." arXiv 2013. Encoder: Predict the mean μ(X) and covariance ∑(X) of a multivariate normal distribution. Encode Encode Loss term to follow a normal distribution N(0, I).
  • 61. 61 Source: Wikipedia. Image by Bscan - Own work, CC0, https://commons.wikimedia.org/w/index.php?curid=25235145 Maths 101: Multivariate normal distribution
  • 62. 62 Variational Auto-Encoder (AE) Kingma, Diederik P., and Max Welling. "Auto-encoding variational bayes." arXiv 2013. Decoder: Trained to reconstruct the input data from a z sampled from N(μ, ∑). Encode z Decode Reconstruction loss term.
  • 63. Outline 1. Motivation 2. Discriminative vs Generative Models 3. Sampling 4. Architectures ○ Generative Adversarial Networks (GANs) ○ Variational Autoencoders (VAEs) ■ AE vs VAE ■ Variational Inference ■ Reparametrization trick ■ Generative behaviour ○ Diffusion ○ Auto-regressive Figure source: Lilian Weng, What are diffusion models ?, Lil’Log 2021.
  • 64. z Encode Decode Challenge: We cannot backprop through sampling of because “Sampling” is not differentiable! 64 Reparametrization Trick
  • 65. z Solution: Reparameterization trick Sample and define z from it, multiplying by and summing 65 Reparametrization Trick
  • 66. Outline 1. Motivation 2. Discriminative vs Generative Models 3. Sampling 4. Architectures ○ Generative Adversarial Networks (GANs) ○ Variational Autoencoders (VAEs) ■ AE vs VAE ■ Variational Inference ■ Reparametrization trick ■ Generative behaviour ○ Diffusion ○ Auto-regressive
  • 67. Generative behaviour z 67 How can we now generate new samples once the underlying generating distribution is learned ?
  • 68. Generative behaviour. We can sample z1, z2, z3, … from our prior N(0, I) and decode them, discarding the encoder path.
  • 69. Generative behaviour. Example: P(X) can be modelled by mapping a simple normal distribution N(0, I) through a powerful non-linear function g(z).
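In code, generation then reduces to a couple of lines; this sketch reuses the illustrative sizes from the examples above, and in practice `decoder` would be the trained network playing the role of g(z):

```python
import torch
import torch.nn as nn

z_dim, x_dim = 32, 784
decoder = nn.Sequential(nn.Linear(z_dim, 256), nn.ReLU(), nn.Linear(256, x_dim))

z = torch.randn(16, z_dim)   # z1, z2, ... ~ N(0, I); the encoder path is discarded
x_new = decoder(z)           # g(z) maps the simple prior onto the data distribution P(X)
```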
  • 70. Generative behaviour. #NVAE Vahdat, Arash, and Jan Kautz. "NVAE: A deep hierarchical variational autoencoder." NeurIPS 2020. [code]
  • 71. Generative behaviour. Walking around the dimensions of the z manifold gives us spontaneous generation of samples with different shapes, poses, identities, lighting, etc.
  • 72. Learn more about VAEs: Andriy Mnih (UCL – DeepMind, 2020); Max Welling, University of Amsterdam (2020).
  • 73. Outline 1. Motivation 2. Discriminative vs Generative Models 3. Sampling 4. Architectures ○ Generative Adversarial Networks (GANs) ○ Variational Autoencoders (VAEs) ○ Diffusion ○ Auto-regressive Figure source: Lilian Weng, What are diffusion models?, Lil’Log 2021.
  • 74. Outline 1. Motivation 2. Discriminative vs Generative Models 3. Sampling 4. Architectures ○ Generative Adversarial Networks (GANs) ○ Variational Autoencoders (VAEs) ○ Denoising Diffusion Models (DDM) ■ Forward diffusion process ■ Reverse denoising process ○ Auto-regressive
  • 75. Forward Diffusion Process Philip Isola, Generative Models of Images. MIT 2023.
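As a concrete illustration of the forward process, here is a minimal sketch using the standard DDPM closed form x_t = √ᾱt·x0 + √(1−ᾱt)·ε, assuming a linear β schedule; schedule values and image size are illustrative:

```python
import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)            # assumed linear noise schedule
alphas_bar = torch.cumprod(1.0 - betas, dim=0)   # cumulative products: alpha_bar_t

def q_sample(x0, t, noise):
    """Jump directly to step t of the forward diffusion of x0."""
    a_bar = alphas_bar[t]
    return a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * noise

x0 = torch.rand(1, 3, 32, 32)                           # dummy image
x_t = q_sample(x0, t=500, noise=torch.randn_like(x0))   # heavily noised version
```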
  • 76. Outline 1. Motivation 2. Discriminative vs Generative Models 3. Sampling 4. Architectures ○ Generative Adversarial Networks (GANs) ○ Variational Autoencoders (VAEs) ○ Denoising Diffusion Models (DDM) ■ Forward diffusion process ■ Reverse denoising process ○ Auto-regressive
  • 77. Denoising Autoencoder (DAE). [Figure: encode → decode/“generate”.] #DAE Vincent, Pascal, Hugo Larochelle, Yoshua Bengio, and Pierre-Antoine Manzagol. "Extracting and composing robust features with denoising autoencoders." ICML 2008.
  • 78. Reverse Denoising Process. Philip Isola, Generative Models of Images. MIT 2023.
  • 79. Reverse Denoising Process. [Figure: from noise xT back to an image x0 on the data manifold Pθ(x0); a CNN (U-Net) learns to denoise step by step.] What is the dimension of the latent variable in diffusion models? The same dimensionality as the diffused data.
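A sketch of the matching ancestral sampling loop, assuming a trained noise-prediction network `eps_model(x, t)` (a U-Net in practice) and the common σt = √βt variance choice; this is the generic DDPM recipe rather than anything specific to these slides:

```python
import torch

@torch.no_grad()
def p_sample_loop(eps_model, shape, betas):
    """Reverse denoising: start from pure noise x_T and denoise step by step."""
    alphas = 1.0 - betas
    alphas_bar = torch.cumprod(alphas, dim=0)
    x = torch.randn(shape)                # x_T has the same dimensionality as the data
    for t in reversed(range(len(betas))):
        eps = eps_model(x, t)             # network predicts the noise added at step t
        x = (x - betas[t] / (1 - alphas_bar[t]).sqrt() * eps) / alphas[t].sqrt()
        if t > 0:
            x = x + betas[t].sqrt() * torch.randn_like(x)   # sigma_t = sqrt(beta_t)
    return x                              # an x_0 on the data manifold
```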
  • 80. Outline 1. Motivation 2. Discriminative vs Generative Models 3. Sampling 4. Architectures ○ Generative Adversarial Networks (GANs) ○ Variational Autoencoders (VAEs) ○ Denoising Diffusion Models (DDM) ○ Auto-regressive Figure source: Lilian Weng, What are diffusion models?, Lil’Log 2021.
  • 81. Outline 1. Motivation 2. Discriminative vs Generative Models 3. Sampling 4. Architectures ○ Generative Adversarial Networks (GANs) ○ Variational Autoencoders (VAEs) ○ Denoising Diffusion Models (DDM) ○ Auto-regressive Models (AR) Figure source: Lilian Weng, What are diffusion models?, Lil’Log 2021.
  • 83. PixelRNN. An RNN predicts the probability of each sample xi with a categorical output distribution (softmax). #PixelRNN Van Oord, A., Kalchbrenner, N., & Kavukcuoglu, K. Pixel recurrent neural networks. ICML 2016.
  • 84. PixelRNN. #PixelRNN Van Oord, A., Kalchbrenner, N., & Kavukcuoglu, K. Pixel recurrent neural networks. ICML 2016. Why aren't all completions identical? (a.k.a. how can AR models offer a generative behaviour?) Because each pixel is sampled from the predicted categorical distribution rather than taken greedily, as sketched below.
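A toy sketch of that difference (the logits are made up): taking the argmax of the softmax always yields the same pixel value, while sampling from the categorical distribution yields different completions on every run.

```python
import torch

logits = torch.tensor([2.0, 1.0, 0.5, 0.1])   # hypothetical per-pixel output
probs = torch.softmax(logits, dim=-1)         # categorical distribution over values

greedy = torch.argmax(probs)                                         # deterministic
samples = torch.multinomial(probs, num_samples=5, replacement=True)  # stochastic
print(greedy, samples)                        # samples differ from run to run
```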
  • 85. PixelCNN 85 #PixelCNN Van den Oord, A., Kalchbrenner, N., Espeholt, L., Vinyals, O., & Graves, A. Conditional image generation with pixelcnn decoders. NeurIPS 2016.
  • 86. Wavenet. Wavenet used dilated convolutions to produce synthetic audio, sample by sample, conditioned on a receptive field of size T: #Wavenet Oord, Aaron van den, Sander Dieleman, Heiga Zen, Karen Simonyan, Oriol Vinyals, Alex Graves, Nal Kalchbrenner, Andrew Senior, and Koray Kavukcuoglu. "Wavenet: A generative model for raw audio." arXiv 2016. [blog]
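A minimal sketch of the causal dilated convolutions behind this idea; the channel count, kernel size 2, and the six-layer stack are illustrative choices, not WaveNet's exact configuration:

```python
import torch
import torch.nn as nn

class CausalDilatedConv(nn.Module):
    """Causal 1-D convolution: left-pads so output t only sees inputs <= t."""
    def __init__(self, channels, dilation):
        super().__init__()
        self.pad = dilation  # (kernel_size - 1) * dilation, with kernel_size = 2
        self.conv = nn.Conv1d(channels, channels, kernel_size=2, dilation=dilation)

    def forward(self, x):
        return self.conv(nn.functional.pad(x, (self.pad, 0)))

# Doubling the dilation at every layer grows the receptive field T exponentially.
stack = nn.Sequential(*[CausalDilatedConv(16, 2 ** i) for i in range(6)])
y = stack(torch.randn(1, 16, 1024))   # dummy waveform: (batch, channels, time)
```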
  • 87. The Transformer. #Transformer Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., ... & Polosukhin, I. Attention is all you need. NeurIPS 2017. Auto-regressive at test time. Figure: Jay Alammar, “The Illustrated Transformer” (2018).
  • 88. The Transformer Figure: Jay Alammar, “The illustrated Transformer” (2018)
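At test time, decoding is a loop: feed the sequence so far, pick the next token, append it, repeat. A generic sketch, assuming a hypothetical decoder-only `model` that maps token ids to per-position vocabulary logits:

```python
import torch

@torch.no_grad()
def generate(model, prompt_ids, max_new_tokens, temperature=1.0):
    """Auto-regressive decoding: each sampled token becomes input at the next step."""
    ids = prompt_ids                                  # shape: (batch, seq_len)
    for _ in range(max_new_tokens):
        logits = model(ids)                           # (batch, seq_len, vocab)
        probs = torch.softmax(logits[:, -1] / temperature, dim=-1)
        next_id = torch.multinomial(probs, num_samples=1)
        ids = torch.cat([ids, next_id], dim=1)        # append and feed back
    return ids
```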
  • 89. Text completion. #GPT-2 Alec Radford, Jeffrey Wu, Dario Amodei, Daniela Amodei, Jack Clark, Miles Brundage, Ilya Sutskever, “Better Language Models and Their Implications”. OpenAI Blog 2019. “GPT-2 is trained with a simple objective: predict the next word, given all of the previous words within some text.” Condition: “In a shocking finding, scientist discovered a herd of unicorns living in a remote, previously unexplored valley, in the Andes Mountains. Even more surprising to the researchers was the fact that the unicorns spoke perfect English.” Generated completion: “The scientist named the population, after their distinctive horn, Ovid’s Unicorn. These four-horned, silver-white unicorns were previously unknown to science. Now, after almost two centuries, the mystery of what sparked this odd phenomenon is finally solved.”
  • 90. Zero-shot learning. #GPT-2 Alec Radford, Jeffrey Wu, Dario Amodei, Daniela Amodei, Jack Clark, Miles Brundage, Ilya Sutskever, “Better Language Models and Their Implications”. OpenAI Blog 2019. GPT-2/3 can also solve tasks for which it was not trained (zero-shot learning). Text: “The 2008 Summer Olympics torch relay was run from March 24 until August 8, 2008, prior to the 2008 Summer Olympics, with the theme of “one world, one dream”. Plans for the relay were announced on April 26, 2007, in Beijing, China. The relay, also called by the organizers as the “Journey of Harmony”, lasted 129 days and carried the torch 137,000 km (85,000 mi) – the longest distance of any Olympic torch relay since the tradition was started ahead of the 1936 Summer Olympics. After being lit at the birthplace of the Olympic Games in Olympia, Greece on March 24, the torch traveled to the Panathinaiko Stadium in Athens, and then to Beijing, arriving on March 31. From Beijing, the torch was following a route passing through six continents. The torch has visited cities along the Silk Road, symbolizing ancient links between China and the rest of the world. The relay also included an ascent with the flame to the top of Mount Everest on the border of Nepal and Tibet, China from the Chinese side, which was closed specially for the event.” Reading comprehension: Q: What was the theme? A: “one world, one dream”. Q: What was the length of the race? A: 137,000 km. Q: Was it larger than previous ones? A: No. Q: Where did the race begin? A: Olympia, Greece.
  • 91. Zero-shot learning #GPT-2 Alec Radford, Jeffrey Wu, Dario Amodei, Daniela Amodei, Jack Clark, Miles Brundage, Ilya Sutskever, “Better Language Models and Their Implications”. OpenAI Blog 2019. “GPT-2 is trained with a simple objective: predict the next word, given all of the previous words within some text.” Zero-shot task performances (GPT-2 was never trained for these tasks)
  • 92. GPT-2 / GPT-3. #iGPT Chen, M., Radford, A., Child, R., Wu, J., Jun, H., Luan, D., & Sutskever, I. Generative Pretraining from Pixels. ICML 2020.
  • 93. ChatGPT / GPT-4. #ChatGPT [blog] #GPT-4 (OpenAI) GPT-4 Technical Report. arXiv 2023. [blog]
  • 95. Learn more about AR models: Nal Kalchbrenner, Mediterranean Machine Learning Summer School 2022.
  • 96. Outline 1. Motivation 2. Discriminative vs Generative Models 3. Sampling 4. Architectures ○ Generative Adversarial Networks (GANs) ○ Variational Autoencoders (VAEs) ○ Denoising Diffusion Models (DDM) ○ Auto-regressive Models (AR) Figure source: Lilian Weng, What are diffusion models?, Lil’Log 2021.
  • 98. Recommended books: Interview of David Foster for Machine Learning Street Talk (2023).
  • 99. Recommended courses: Deep Unsupervised Learning (UC Berkeley CS294-158-SP2020).