Introduction to
Generative Adversarial Nets
Stefan	Mathe
Roadmap
1. What are generative models?
2. Motivation
3. A Taxonomy of Generative Models
4. Generative Adversarial Nets (GANs)
   1. Model Definition
   2. Theoretical Guarantees
   3. Generalizations
   4. Evaluation
5. Conclusions
What are Generative Models?
• Input: a training set of samples drawn from a distribution $p_{\text{data}}$
• Output: an estimate of $p_{\text{data}}$
  – How do we represent it?
Representing $p_{\text{data}}$
• As a probability density function
• As a sample generator
[Figure: training samples -> model -> generated samples]
Motivation
Why study generative models?
• Model-based Reinforcement Learning
• Semi-supervised Learning
• Handling multi-modal outputs
• Generating realistic samples
  – Single image super-resolution
  – Creating art
  – Handwritten digit generation
  – Image-to-image translation
Single Image Super-Resolution
Ledig et al. (2016)
[Figure panels: original | bicubic interpolation | SRResNet | SRGAN]
SRResNet: super-resolution ResNet
SRGAN: super-resolution GAN (multi-modal response => not blurry!)
Creating Art: Interactive GAN (iGAN)
Zhu et al. (2016)
https://www.youtube.com/watch?v=9c4z6YsBGQ0
Handwritten Digit Generation
Kingma and Welling (2013)
http://dpkingma.com/sgvb_mnist_demo/demo.html
Image-to-image Translation
Isola et al. (2016)
A Taxonomy of Generative Models
Maximum Likelihood (ML)

$\theta^* = \arg\max_{\theta} \sum_{i=1}^{n} \log p_{\text{model}}(x_i; \theta)$

Figure reproduced from Goodfellow et al. (2016)
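As a minimal, illustrative sketch (not from the slides): maximum likelihood amounts to gradient descent on the negative log-likelihood. Here $p_{\text{model}}$ is assumed to be a simple univariate Gaussian with a learnable mean and log-standard-deviation; the data and hyper-parameters are made up for the example.

```python
# Minimal ML sketch: fit a Gaussian p_model(x; theta) by minimizing the
# negative log-likelihood of the training samples (illustrative only).
import math
import torch

x = torch.randn(1000) * 2.0 + 3.0                 # training samples from p_data
mu = torch.zeros(1, requires_grad=True)           # theta: mean
log_sigma = torch.zeros(1, requires_grad=True)    # theta: log std

opt = torch.optim.Adam([mu, log_sigma], lr=0.05)
for step in range(2000):
    sigma = log_sigma.exp()
    # log p_model(x_i; theta) for every training sample
    log_p = -0.5 * ((x - mu) / sigma) ** 2 - log_sigma - 0.5 * math.log(2 * math.pi)
    loss = -log_p.mean()                          # minimizing NLL == maximizing likelihood
    opt.zero_grad()
    loss.backward()
    opt.step()

print(mu.item(), log_sigma.exp().item())          # should approach 3.0 and 2.0
```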
ML and the Kullback-Leibler (KL) Divergence
• Our training samples define an empirical distribution $\hat{p}_{\text{data}}$:

  $\hat{p}_{\text{data}}(x) = \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}_{x_i}(x)$

• ML is equivalent to minimizing the KL divergence between $\hat{p}_{\text{data}}$ and $p_{\text{model}}$:

  $\theta^* = \arg\min_{\theta} D_{KL}(\hat{p}_{\text{data}} \,\|\, p_{\text{model}})$
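One line of algebra (not spelled out on the slide) makes the equivalence explicit:

$D_{KL}(\hat{p}_{\text{data}} \,\|\, p_{\text{model}}) = \mathbb{E}_{x \sim \hat{p}_{\text{data}}}[\log \hat{p}_{\text{data}}(x)] - \mathbb{E}_{x \sim \hat{p}_{\text{data}}}[\log p_{\text{model}}(x)]$

The first term does not depend on $\theta$, and the second is $\frac{1}{n}\sum_{i=1}^{n} \log p_{\text{model}}(x_i)$, so minimizing the KL divergence over $\theta$ is exactly maximizing the ML objective above.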
A Taxonomy of Generative Models
Generative Model
• Explicit density
  – Tractable density: Fully Visible Belief Nets (FVBN), nonlinear ICA
  – Approximate density: Variational Autoencoder (VAE), Boltzmann Machine
• Implicit density
  – Direct sampling: Generative Adversarial Nets (GAN), state-of-the-art
  – Markov chain sampling: Generative Stochastic Networks (GSN)
Adapted from Goodfellow et al. (2014)
Explicit Density Models
• Explicitly represent $p_{\text{model}}(x; \theta)$
• Advantages:
  – Easy to optimize: just plug $p_{\text{model}}$ into the ML objective
  – Can evaluate the likelihood of any sample, if needed
• Disadvantages:
  – $p_{\text{model}}$ must be complex enough => tractability issues
    • Solution 1: restrict $p_{\text{model}}$ to a tractable, but relatively strong, family (FVBN, nonlinear ICA)
    • Solution 2: approximate $p_{\text{model}}$ (VAEs, Boltzmann Machines)
  – Hard to generate new samples
Implicit Density Models
• Interact indirectly with $p_{\text{model}}(x; \theta)$ by sampling
• Advantages:
  – Sampling is straightforward
• Disadvantages:
  – Likelihood is expensive to compute
• Sampling procedures
  – Iterative (GSNs):
    • Learn the denoising distribution (often unimodal) via ML
    • Pick a training sample, apply noise and denoise repeatedly
    • After enough iterations, we get a sample from $p_{\text{data}}(x)$
  – Direct (GANs):
    • Sample in a single step
    • Trained with adversarial objective functions (covered next)
Generative Stochastic Networks
• How do we sample? (sketched below)
  – Pick a random training example
  – Apply noise and denoise repeatedly
  – After enough iterations, we get a sample from $p_{\text{data}}(x)$
• What do we learn?
  – The denoising distribution $p(x \mid \tilde{x})$, via ML
• Advantages:
  – Learning is cast as an optimization problem
  – $p(x \mid \tilde{x})$ is known to be easy to learn
• Disadvantage:
  – Sampling is expensive
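A minimal sketch of the GSN-style sampling chain described above, under the assumption that a denoising model has already been learned; `denoise_sample`, the noise level, and the step count are hypothetical placeholders, not from the original GSN implementation.

```python
import numpy as np

def gsn_sample(x0, denoise_sample, noise_std=0.5, n_steps=200, rng=None):
    """Markov-chain sampling in the GSN spirit: repeatedly corrupt and denoise.

    x0             -- a training example used to seed the chain
    denoise_sample -- callable drawing x ~ p(x | x_noisy) from a learned model
    """
    rng = rng or np.random.default_rng()
    x = np.asarray(x0, dtype=float)
    for _ in range(n_steps):
        x_noisy = x + rng.normal(scale=noise_std, size=x.shape)  # apply noise
        x = denoise_sample(x_noisy)                               # denoise
    return x  # after enough iterations, approximately a sample from p_data
```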
Generative Adversarial Networks
• $p_g(x; \theta)$: the distribution of the samples $x = G(z; \theta)$ obtained by pushing $z \sim p_z$ through the generator
[Diagram: $z \sim p_z$ -> generator $G(z; \theta)$ -> $x \sim p_g$]
• How do we sample? (see the sketch below)
  – Pick a random latent variable $z$ from a fixed distribution $p_z$ (e.g. Gaussian)
  – Pass $z$ through a trained generator network $G(z; \theta)$ that produces the sample
• What do we learn?
  – The generator $G(z; \theta)$
• Advantages:
  – Sampling is trivial (forward prop) and efficient
• Disadvantage:
  – We need to cast learning $G(z; \theta)$ as the Nash equilibrium of a game => more difficult than an optimization!
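Sampling from a trained GAN is literally one forward pass; a minimal PyTorch sketch (the architecture, latent size, and data size below are illustrative assumptions, not the paper's):

```python
import torch
import torch.nn as nn

latent_dim, data_dim = 100, 784   # made-up sizes (e.g. flattened 28x28 images)

# A toy generator G(z; theta_g); any network mapping z to a sample works.
G = nn.Sequential(
    nn.Linear(latent_dim, 256), nn.ReLU(),
    nn.Linear(256, data_dim), nn.Tanh(),
)

z = torch.randn(64, latent_dim)   # z ~ p_z, a fixed Gaussian prior
x = G(z)                          # 64 generated samples in a single forward prop
```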
Generative Adversarial Training
• Formulate the problem as a game between:
  – The generator, which defines $p_g(x; \theta_g)$ as the distribution of $G(z; \theta_g)$ with $z \sim p_z$ (as before)
  – The discriminator $D(x; \theta_d)$, which tries to determine whether $x$ was sampled from $p_{\text{data}}$
Figure reproduced from Goodfellow et al. (2016)
Generative Adversarial Training
• Formally:

$\theta_g^* = \arg\min_{\theta_g} \max_{\theta_d} \; \mathbb{E}_{x \sim p_{\text{data}}}[\log D(x; \theta_d)] + \mathbb{E}_{z \sim p_z}[\log(1 - D(G(z; \theta_g)))]$

Figure reproduced from Goodfellow et al. (2014)
Generative Adversarial Training
• Cannot find the optimum D for each G (too expensive)!
• Solution: alternate between optimizing G (keeping D fixed) and optimizing D (keeping G fixed) => a minimax game (sketched below)
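A condensed sketch of the alternating updates (one D step, one G step per iteration). It reuses the toy `G`, `latent_dim` and `data_dim` from the previous sketch and assumes a hypothetical `real_batch()` data loader; the optimizer settings are illustrative, not the paper's.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# A toy discriminator D(x; theta_d) outputting P(x came from p_data).
D = nn.Sequential(
    nn.Linear(data_dim, 256), nn.LeakyReLU(0.2),
    nn.Linear(256, 1), nn.Sigmoid(),
)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)

for step in range(10_000):
    # --- D step (theta_g fixed): minimize J^(D), i.e. cross-entropy with
    #     target 1 for real samples and 0 for generated samples ---
    x_real = real_batch()                                     # hypothetical loader
    x_fake = G(torch.randn(x_real.size(0), latent_dim)).detach()
    d_real, d_fake = D(x_real), D(x_fake)
    loss_d = F.binary_cross_entropy(d_real, torch.ones_like(d_real)) \
           + F.binary_cross_entropy(d_fake, torch.zeros_like(d_fake))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # --- G step (theta_d fixed): minimize E[log(1 - D(G(z)))], the minimax loss ---
    d_gen = D(G(torch.randn(x_real.size(0), latent_dim)))
    loss_g = torch.log(1.0 - d_gen).mean()
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
```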
Convergence Guarantees
• Only available for infinite-capacity models:
  1. The minimax game has a global minimum at $p_g = p_{\text{data}}$
  2. If $D$ is allowed to reach its optimum in the inner loop of the algorithm, then $p_g \to p_{\text{data}}$
• We don't yet have sufficient theoretical support for the success of these models!
The Minimax Game (Generalized)
• D minimizes $J^{(D)}(\theta_g, \theta_d)$ w.r.t. $\theta_d$ and updates $\theta_d$
• G minimizes $J^{(G)}(\theta_g, \theta_d)$ w.r.t. $\theta_g$ and updates $\theta_g$
• For D, we always use the cross-entropy:

  $J^{(D)} = -\mathbb{E}_{x \sim p_{\text{data}}}[\log D(x; \theta_d)] - \mathbb{E}_{z \sim p_z}[\log(1 - D(G(z; \theta_g)))]$

• For G, in the minimax game: $J^{(G)} = -J^{(D)}$ (a zero-sum game)
Heuristic non-saturating game
• Problem with minimax: when $D$ rejects generated samples, G has no gradient!
• Solution: flip the target of the cross-entropy for G (compared in the sketch below):

  $J^{(G)} = -\mathbb{E}_{z \sim p_z}[\log D(G(z; \theta_g))]$

• G then minimizes $D_{KL}(p_{\text{model}} \,\|\, p_{\text{data}}) - 2\, D_{JS}(p_{\text{data}} \,\|\, p_{\text{model}})$
• Not nice (but it works!):
  – No longer a zero-sum game
  – Gets us even further from the theoretical guarantees
• Recent work by Arjovsky et al. (2017) removes the need for such tricks (not presented here)
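In code, the heuristic changes a single line relative to the loop above (same toy `D`, `G`, `latent_dim`): instead of minimizing $\log(1 - D(G(z)))$, G minimizes $-\log D(G(z))$, which keeps a usable gradient even when D confidently rejects the generated samples.

```python
d_gen = D(G(torch.randn(64, latent_dim)))

loss_g_minimax = torch.log(1.0 - d_gen).mean()   # saturates when D(G(z)) -> 0
loss_g_heuristic = -torch.log(d_gen).mean()      # non-saturating "flipped target"
```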
Maximum likelihood game
• It can be shown that the minimax game optimizes the Jensen-Shannon (JS) divergence between $p_{\text{data}}$ and $p_{\text{model}}$
• We can make the model optimize the KL divergence instead if we set

  $J^{(G)} = -\mathbb{E}_{z \sim p_z}\left[\exp\left(\sigma^{-1}(D(G(z; \theta_g)))\right)\right]$

  where $\sigma^{-1}$ is the inverse of the logistic sigmoid (the discriminator's logit).
Quantitative Evaluation
• How to compare models?
  – Problem: the log-likelihood is not easy to compute for generative machines
  – Solution: estimate it via Parzen windowing (sketched below)
• At least comparable to other methods on MNIST and TFD
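A minimal sketch of the Parzen-window estimate: fit a Gaussian kernel density on generated samples and evaluate held-out test points under it. The bandwidth `sigma` is normally chosen on a validation set; the function and variable names here are illustrative.

```python
import numpy as np
from scipy.special import logsumexp

def parzen_log_likelihood(generated, test, sigma):
    """Mean log-likelihood of `test` under a Gaussian Parzen window
    centred on each generated sample (both arrays have shape [n, d])."""
    n, d = generated.shape
    # log N(test_j | gen_i, sigma^2 I) for every pair (i, j)
    diffs = (test[:, None, :] - generated[None, :, :]) / sigma
    log_kernels = (-0.5 * np.sum(diffs ** 2, axis=-1)
                   - d * np.log(sigma) - 0.5 * d * np.log(2 * np.pi))
    # log p(test_j) = logsumexp_i log_kernels[j, i] - log n
    return float(np.mean(logsumexp(log_kernels, axis=1) - np.log(n)))
```

The slide's caveat applies: this estimate has high variance and is only a rough way to compare models.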
Qualitative Comparison
• Non-trivial, sharp samples (not memorizing the training data)
[Figure panels: MNIST | TFD | CIFAR-10 (fully connected) | CIFAR-10 (conv. D, deconv. G)]
What makes GANs work?
(Why do they give sharper results than VAEs?)
• Initial hypothesis:
  – Because they minimize JS instead of KL
  – KL is not symmetric; minimizing JS is similar to the reverse KL
• Not true!
  – ML GANs still generate sharp results
  – GANs prefer far fewer modes than G's capacity would allow
• Mystery solved recently by Arjovsky et al. (2017):
  – Both JS and KL induce convergence issues
  – A better-behaved probability distance exists (the Wasserstein / Earth-Mover distance)
Figure reproduced from Goodfellow et al. (2016)
The Convergence Problem
• We only have theoretical guarantees for convergence in function space
• Typical failure: mode collapse (the Helvetica scenario)
• Hypothesis: maximin is different from minimax, but the associated games (simultaneous descent) are almost identical!

  $\min_{\theta_g} \max_{\theta_d} V(\theta_g, \theta_d) \;\neq\; \max_{\theta_d} \min_{\theta_g} V(\theta_g, \theta_d)$

Figure reproduced from Goodfellow et al. (2016)
Conclusions
• Contribution: GANs completely break away from the ML approach by switching to an adversarial minimax game formulation
• Strengths:
  – Easy and efficient sample generation
  – Simple training algorithm
  – No need for a noise model
  – State-of-the-art results (qualitatively the best)
• Weaknesses:
  – No explicit likelihood representation
  – Convergence problems (the Helvetica scenario)
  – Model comparison issues (Parzen windows have high variance)
  – We don't know why they work (no theoretical guarantees)
• But see Arjovsky et al. (2017) for a recent and elegant answer to the convergence and model-comparison issues!
Stop GAN Violence!
"While the costs of human violence have attracted a great deal of attention from the research community, the effects of the network-on-network (NoN) violence popularised by Generative Adversarial Networks have yet to be addressed. In this work, we quantify the financial, social, spiritual, cultural, grammatical and dermatological impact of this aggression and address the issue by proposing a more peaceful approach which we term Generative Unadversarial Networks (GUNs). Under this framework, we simultaneously train two models: a generator G that does its best to capture whichever data distribution it feels it can manage, and a motivator M that helps G to achieve its dream. Fighting is strictly verboten and both models evolve by learning to respect their differences. The framework is both theoretically and electrically grounded in game theory, and can be viewed as a winner-shares-all two-player game in which both players work as a team to achieve the best score. Experiments show that by working in harmony, the proposed model is able to claim both the moral and log-likelihood high ground. Our work builds on a rich history of carefully argued position-papers, published as anonymous YouTube comments, which prove that the optimal solution to NoN violence is more GUNs."
Albanie et al., arXiv:1703.02528, 2017
Resources
• Code and pretrained model:
  – https://github.com/goodfeli/adversarial
• Tutorial:
  – https://arxiv.org/pdf/1701.00160.pdf
References
• [1] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville and Y. Bengio, "Generative Adversarial Nets", NIPS, 2014.
• [2] I. Goodfellow, "Generative Adversarial Networks", NIPS 2016 Tutorial.
• [3] M. Arjovsky, S. Chintala, L. Bottou, "Wasserstein GAN", arXiv:1701.07875v2, 2017.
• [4] S. Albanie, S. Ehrhardt, J. F. Henriques, "Stopping GAN Violence: Generative Unadversarial Networks", arXiv:1703.02528v1, 2017.
