Deep Generative Modelling

Recap
● Part 1: Introduction to Machine Learning (Ivaylo Strandjev)
● Part 2: Deep Learning (Teodor Radenkov)
● Part 3: Playing with Image models (Ivaylo Strandjev)

Playing with image models
● The convolution operator for images
● Deep CNNs like Inception Resnet V2
● Interpretability of Deep Neural Networks
● Adversarial Examples

Today
● Motivation
● Generative and Discriminative Models
● Popular Deep Generative Models for Images
● Conditional Generative Models

Generative Models
● A generative model produces new random observable data; it models the joint distribution of all variables.
● Given some dataset D, it generates new samples that look like D but are not copies of it.
● To get there, we need to fit the model's hidden parameters.
● Usually considered a branch of unsupervised learning, but generative models can also be used for tasks like classification.
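As a minimal sketch of the last point (not taken from the slides), the toy model below fits a joint distribution P(x, y) = P(y) P(x | y) with one 1-D Gaussian per class. The same fitted model can then both generate new pairs and classify via Bayes' rule. The data and all function names are illustrative assumptions.

```python
import math
import random


def fit_gaussian(values):
    """Estimate mean and standard deviation of a 1-D sample."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, math.sqrt(var)


def log_gaussian(x, mean, std):
    """Log density of N(mean, std^2) at x."""
    return -0.5 * math.log(2 * math.pi * std ** 2) - (x - mean) ** 2 / (2 * std ** 2)


def fit_joint(data):
    """data: list of (x, y). Returns {y: (prior, mean, std)}, i.e. P(y) and P(x | y)."""
    by_class = {}
    for x, y in data:
        by_class.setdefault(y, []).append(x)
    return {y: (len(xs) / len(data), *fit_gaussian(xs)) for y, xs in by_class.items()}


def sample(model, rng):
    """Generate a new (x, y) pair from the joint distribution."""
    classes = list(model)
    y = rng.choices(classes, weights=[model[c][0] for c in classes])[0]
    _, mean, std = model[y]
    return rng.gauss(mean, std), y


def classify(model, x):
    """argmax_y P(y | x), computed as argmax_y [log P(y) + log P(x | y)]."""
    return max(
        model,
        key=lambda y: math.log(model[y][0]) + log_gaussian(x, model[y][1], model[y][2]),
    )


# Hypothetical toy dataset: class "a" clusters near -2, class "b" near +2.
toy_data = [(-2.1, "a"), (-1.9, "a"), (-2.4, "a"), (1.8, "b"), (2.2, "b"), (2.0, "b")]
model = fit_joint(toy_data)
new_x, new_y = sample(model, random.Random(0))
```

Deep generative models replace the per-class Gaussian with a neural network, but the idea of fitting a distribution and then sampling from it is the same.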

“What I cannot create, I do not understand.”
—Richard Feynman

Motivation
● Tremendous amount of information out there in the world
● Machines are good at solving specific tasks
● On some benchmarks they are better than humans at object recognition, Go and speech recognition
● But they cannot build compact representations of the world

How can we overcome this intelligence gap?
By forcing our models to learn very compact and disentangled representations.

Disentangled factors
● P(X | Z), where X is an image and Z is a vector that causes (explains) X
● We would like the dimensions of Z to describe real-world factors
● A Z with separate dimensions for lighting, guitar, bookshelf and rotation is considered more disentangled than the raw pixels of X
● P(guitar | Z) is easy to compute from a disentangled representation

Applications
Short term applications
● Image translation, denoising, super-resolution
● Domain Adaptation
● Music, Audio and Text Generation
Long term applications
● Understanding of the real world

Discriminative Models
● Example: ImageNet. Here y is the vector of 1000 labels and x is an image from the dataset.
● They are trained to maximize log P(y | x)
● Predictions are obtained as the argmax over y_i of P(y_i | x)
● Classification models are discriminative models.
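The training objective and the prediction rule above can be sketched in a few lines. This is an illustrative example with three classes instead of 1000; the logits stand in for the output of some classifier network and are made up.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities P(y_i | x) that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(logits):
    """Prediction rule: index i that maximizes P(y_i | x)."""
    probs = softmax(logits)
    return max(range(len(probs)), key=probs.__getitem__)


def log_likelihood(logits, label):
    """log P(y = label | x), the quantity maximized during training."""
    return math.log(softmax(logits)[label])


# Hypothetical network output for one image.
logits = [0.1, 2.0, -1.0]
predicted_class = predict(logits)
```

Note that nothing here models P(x): the classifier only ever sees x as a given.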

Generative Models
● During training, maximize the log-likelihood log P(X)
● Generate new sampled images close to the ImageNet distribution P(X)
● During inference, depending on the model, you might be able to estimate the probability of some image X under the model

Discriminative: p(y | w; β)
Generative: p(y, w; β)
y - text categories
w - sequence of words
β - model parameters

Properties and Drawbacks of Discriminative Models
● Good at capturing statistical regularities of the data
● Find features that are invariant to characteristics irrelevant to the task
○ Object classification: rotation, translation, lighting and color are irrelevant
○ Segmentation: rotation and translation do matter
● Have difficulty building disentangled representations
● Adversarial examples are a good illustration of this

Generation from Discriminative Model (Example)
[Figure: handwriting model with the target text "This is regarding my friend, Kate Zack"]
The image is generated by gradient ascent on the input image X.

Generation from Discriminative Model (Example)
[Figure: handwriting model with the target text "PETKO"; the input image X is adjusted to maximize the model's score for that text]
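Gradient ascent on the input can be sketched on a toy model. The example below is an assumption-laden stand-in for the handwriting model: a fixed logistic "classifier" with made-up weights `w` and bias `b`, whose parameters stay frozen while the input `x` is updated to raise the log-probability of the target class.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def score(x, w, b):
    """Pre-activation of the toy classifier: w . x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def log_prob_target(x, w, b):
    """log P(target | x) for the toy logistic model."""
    return math.log(sigmoid(score(x, w, b)))


def ascend_input(x, w, b, steps=50, lr=0.5):
    """Gradient ascent on the INPUT x; the model parameters w, b are frozen."""
    x = list(x)
    for _ in range(steps):
        # d/dx log sigmoid(w . x + b) = (1 - sigmoid(w . x + b)) * w
        scale = 1.0 - sigmoid(score(x, w, b))
        x = [xi + lr * scale * wi for xi, wi in zip(x, w)]
    return x


# Hypothetical frozen model and a blank starting "image" of three pixels.
w, b = [1.0, -2.0, 0.5], -1.0
x_start = [0.0, 0.0, 0.0]
x_final = ascend_input(x_start, w, b)
```

The result maximizes the classifier's confidence, but nothing forces it to look like real data, which is one reason discriminative models make poor generators.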

Generative Models
● Gaussian mixture model
● Hidden Markov model
● Naive Bayes
● Latent Dirichlet allocation
● … many others

Deep Generative Models
● Restricted Boltzmann Machines
● Variational Autoencoders
● PixelRNN, PixelCNN
● Generative Adversarial Networks
● Neural Language Models
● WaveNet

Deep Generative Models
[Figure: latent variables (code) are fed into the Generator]

Autoencoders
[Figure: autoencoder network]
Loss = pixelwise L2 or softmax cross-entropy.

Autoencoders
● Latent variables
● Lower dimensional than the input
[Figure: Encoder → latent code → Decoder]
Loss =

Autoencoders
● A random latent code won’t get us anywhere: the decoder was only trained on codes produced by the encoder
● Pass an image to the encoder to get a “valid” code
Encoder Decoder

Variational Autoencoders
● Encoder-Decoder architecture
● The latent code is forced to be Gaussian distributed
● To generate, sample the latent code from the Gaussian and pass it to the decoder network

Variational Autoencoders
[Figure: Encoder → Mean, Std → Sampled code → Decoder]
Loss =
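The loss formula on this slide did not survive extraction. As a hedged sketch, the code below shows the standard VAE ingredients that match the diagram labels: the encoder outputs a mean and std, the code is sampled with the reparameterization trick, and the loss adds a KL penalty that pulls the code distribution towards a standard Gaussian. The pixelwise L2 reconstruction term and all function names are assumptions, not the slide's exact formula.

```python
import math
import random


def sample_code(mean, std, rng):
    """Reparameterization trick: z = mean + std * eps, with eps ~ N(0, 1)."""
    return [m + s * rng.gauss(0.0, 1.0) for m, s in zip(mean, std)]


def kl_to_standard_normal(mean, std):
    """KL( N(mean, diag(std^2)) || N(0, I) ), summed over latent dimensions."""
    return sum(0.5 * (m * m + s * s - 1.0) - math.log(s) for m, s in zip(mean, std))


def vae_loss(x, x_reconstructed, mean, std):
    """Reconstruction error (pixelwise L2, an assumption) plus the KL penalty."""
    reconstruction = sum((a - b) ** 2 for a, b in zip(x, x_reconstructed))
    return reconstruction + kl_to_standard_normal(mean, std)


# Hypothetical encoder output for one image with a 2-D latent code.
mean, std = [0.3, -0.2], [0.9, 1.1]
z = sample_code(mean, std, random.Random(0))
```

Because the KL term keeps encoded codes close to N(0, I), a code drawn directly from N(0, I) at generation time is likely to be one the decoder can handle.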

Variational Autoencoders - Samples
● CIFAR-10
● Blurry images
● Good approximation of the likelihood of the input data
[Figure: Input | Output]

Deep Recurrent Attentive Writer
● Generates the image sequentially
● On each step the model decides where to focus and draw
● Uses an attention mechanism to achieve it
○ A topic for a different lecture :(

Deep Recurrent Attentive Writer (DRAW)
● Google Street View House Numbers
● The red rectangle shows where the model is attending at the current step
● Impressive, as DRAW was among the first successful models to generate images sequentially

Deep Recurrent Attentive Writer
[Figure: samples from VAEs | samples from DRAW]

Fully Convolutional Model
● Typically uses a pre-trained classification network as the encoder
● Most often VGG-16, because its convolutional part is simple and has relatively few parameters once the fully connected layers are dropped
● Uses transposed convolution layers as the decoder until the desired shape is reached
● Often the decoder architecture mirrors (is the transpose of) the encoder architecture

Fully Convolutional Decoder
Radford et al 2015

Transposed Convolution (Deconvolution)
● Kernel size = 3
● Stride = 1
[Animation: input layer → output layer. Source: leonardoaraujosantos.gitbooks.io]
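The operation in the animation can be sketched in one dimension (the slide shows the 2-D case; 1-D is used here only for brevity). Each input value stamps a scaled copy of the kernel into the output, so the output is larger than the input. The function names are illustrative.

```python
def conv1d(x, kernel, stride=1):
    """Plain 'valid' convolution (cross-correlation, as in deep learning libraries)."""
    n_out = (len(x) - len(kernel)) // stride + 1
    return [
        sum(x[i * stride + k] * kernel[k] for k in range(len(kernel)))
        for i in range(n_out)
    ]


def transposed_conv1d(x, kernel, stride=1):
    """Each input element adds a scaled copy of the kernel to the output."""
    out = [0.0] * ((len(x) - 1) * stride + len(kernel))
    for i, xi in enumerate(x):
        for k, wk in enumerate(kernel):
            out[i * stride + k] += xi * wk
    return out


# Kernel size 3, stride 1, as on the slide: 2 inputs grow to 4 outputs.
upsampled = transposed_conv1d([1.0, 2.0], [1.0, 1.0, 1.0], stride=1)
```

With stride 1 and kernel size 3, an input of length n produces an output of length n + 2; larger strides spread the stamped kernels further apart and upsample more aggressively.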

Properties of Transposed Convolution
● The backward pass of a convolutional layer is a transposed convolution
● A checkerboard pattern might appear in the generated image (sensitive to kernel and stride sizes)
Odena, et al., "Deconvolution and Checkerboard
Artifacts", Distill, 2016. http://doi.org/10.23915
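The sensitivity to kernel and stride sizes can be illustrated by counting how many kernel stamps land on each output position of a 1-D transposed convolution. This is a simplified sketch of the uneven-overlap argument in the cited Distill article; the function name is made up.

```python
def overlap_counts(n_inputs, kernel_size, stride):
    """Number of input positions that contribute to each output position."""
    counts = [0] * ((n_inputs - 1) * stride + kernel_size)
    for i in range(n_inputs):
        for k in range(kernel_size):
            counts[i * stride + k] += 1
    return counts


# Kernel size 3 is not divisible by stride 2: interior overlap alternates 1, 2, 1, 2, ...
uneven = overlap_counts(4, kernel_size=3, stride=2)

# Kernel size 4 is divisible by stride 2: interior overlap is constant.
even = overlap_counts(4, kernel_size=4, stride=2)
```

In two dimensions the alternating overlap occurs along both axes, which produces the checkerboard pattern.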

Pros of VAEs
● In practice, the latent code dimensions of VAEs are often quite interpretable
● To achieve this, the model collapses some latent dimensions and leaves them unused
● Able to generate samples close to the data distribution

Drawbacks of VAEs
● The L2 loss treats pixels as independent, which leads to blurry images
● The exact probability of a generated image under the model is intractable to compute
X: input image, Z: latent code

Generative Adversarial Networks (GAN)
● A generative model introduced by Ian Goodfellow et al. in 2014
● Already widely adopted and an area of massive research
● New GAN papers are published every week
● Has many awesome applications. We’ll see some of them later on.
● GANs define the generative problem as an adversarial game between two networks

GANs
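[Figure: a latent sample passes through the Generator; the Discriminator receives generated samples and samples of real images, predicts Real / Fake, and that prediction feeds the Loss]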

GANs - Discriminator Training
[Figure: same setup; the Discriminator is trained with a Real / Fake classification loss on real images and generated samples]

GANs - Generator Training
[Figure: same setup; the Generator is trained through the Discriminator's Real / Fake output, with the objective labelled "Maximize"]
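The two training steps can be written as two objectives over the discriminator's outputs. The quantity to maximize is not legible on the slide, so the sketch below assumes the common non-saturating form from the original GAN paper: the generator maximizes log D(G(z)). Inputs are plain lists of discriminator probabilities; all names and values are illustrative.

```python
import math


def discriminator_loss(d_real, d_fake):
    """Binary cross-entropy: real images should score 1, generated samples 0.

    d_real, d_fake: discriminator outputs in (0, 1) for real and fake batches.
    """
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return -(real_term + fake_term)


def generator_objective(d_fake):
    """Assumed non-saturating objective the generator MAXIMIZES: mean log D(G(z))."""
    return sum(math.log(p) for p in d_fake) / len(d_fake)


# Hypothetical discriminator outputs for a batch of two real and two fake images.
d_real, d_fake = [0.9, 0.8], [0.2, 0.1]
loss_d = discriminator_loss(d_real, d_fake)
objective_g = generator_objective(d_fake)
```

In practice the two updates alternate: one step lowers the discriminator loss with the generator frozen, the next raises the generator objective with the discriminator frozen.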

Problems in GAN Training
● Unstable during training
● Mode collapse
● Higher log-likelihood != better samples
However, GAN training is getting easier. Check out Wasserstein GANs and LSGANs.

Generative Adversarial Networks

Conditional Generative Adversarial Networks
● Real data consists of pairs (X, Y)
● P(Y | X): generate Y given X

Image-to-Image translation (pix2pix)
pix2pix: Image-to-Image Translation with Conditional Adversarial Networks, Isola et al., 2016

Image-to-Image translation (example)
pix2pix: Image-to-Image Translation with Conditional Adversarial Networks, Isola et al., 2016

CycleGAN
[Figure: image collection X | image collection Y]
● X and Y are unpaired collections of images
● They are different domains of the same world
● Learn to translate an image from X into Y

CycleGAN
Zhu et al., 2017 (Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks)

CycleGAN (failure case)

CycleGAN
● Cycle Consistency Loss
○ || F(G(X)) - X ||
○ || G(F(Y)) - Y ||
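Here G translates from X to Y and F translates from Y back to X. The two terms can be sketched with toy translators on short vectors instead of images. The L1 norm is an assumption (the slide does not say which norm is meant), and the lambdas below are made-up stand-ins for the two networks.

```python
def l1(a, b):
    """Mean absolute difference between two equally sized vectors."""
    return sum(abs(p - q) for p, q in zip(a, b)) / len(a)


def cycle_consistency_loss(G, F, xs, ys):
    """|| F(G(x)) - x || averaged over xs, plus || G(F(y)) - y || averaged over ys."""
    forward = sum(l1(F(G(x)), x) for x in xs) / len(xs)
    backward = sum(l1(G(F(y)), y) for y in ys) / len(ys)
    return forward + backward


# Hypothetical translators: G maps X -> Y, F maps Y -> X.
G = lambda x: [2.0 * v + 1.0 for v in x]
F_inverse = lambda y: [(v - 1.0) / 2.0 for v in y]  # exactly undoes G
F_wrong = lambda y: [v / 2.0 for v in y]            # forgets the offset

xs = [[0.0, 1.0], [2.0, -1.0]]
ys = [[1.0, 3.0]]
loss_consistent = cycle_consistency_loss(G, F_inverse, xs, ys)
loss_inconsistent = cycle_consistency_loss(G, F_wrong, xs, ys)
```

The loss is zero only when each translator undoes the other, which is what ties the two unpaired collections together during training.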

References
● https://blog.openai.com/generative-models
● http://distill.pub/2016/deconv-checkerboard/
● https://github.com/junyanz/CycleGAN
● https://github.com/phillipi/pix2pix
● http://videolectures.net/deeplearning2015_bengio_generative_models/
● https://www.youtube.com/watch?v=P78QYjWh5sM&spfreload=1
● http://image-net.org/explore
