Generating high-resolution, photo-realistic images has been a long-standing goal in machine learning.
A 15-minute seminar explaining the PPGN model.
The paper: https://arxiv.org/abs/1612.00005
Plug & Play Generative Networks: Conditional Iterative Generation of Images in Latent Space
1. Plug & Play Generative Networks:
Conditional Iterative Generation of
Images in Latent Space
Anh Nguyen, Jason Yosinski, Yoshua Bengio, Alexey Dosovitskiy, Jeff Clune
2017
By Safaa Alnabulsi
2. Index
Motivation
What is GAN?
Probabilistic interpretation of iterative image generation methods
Methods and Experiments
o DGN-AM: sampling without a learned prior
o PPGN-x: DAE model of p(x)
o PPGN-h: Generator and DAE model of p(h)
Additional Results
o Generating images with different condition networks (Captioning, Multifaceted Feature Visualization)
o Inpainting
Conclusion
Author | Safaa Alnabulsi 2
3. Motivation
Challenges solved in generating Images:
High-quality images at higher resolutions (227 × 227):
Current image generative models often work well at low resolutions (e.g. 32 × 32) but struggle to generate high-resolution images (e.g. 128 × 128 or higher), due to many challenges, including difficulty in training and computationally expensive sampling procedures.
4. Motivation
Challenges solved in generating Images:
High-resolution images
Realistic and diverse samples within a class
5. Motivation
Challenges solved in generating Images:
High-resolution images
Realistic and diverse samples within a class
Works for all 1000 ImageNet categories
6. Index
Motivation
What is GAN?
Probabilistic interpretation of iterative image generation methods
Methods and Experiments
o DGN-AM: sampling without a learned prior
o PPGN-x: DAE model of p(x)
o PPGN-h: Generator and DAE model of p(h)
Additional Results
o Generating images with different condition networks (Captioning, Multifaceted Feature Visualization)
o Inpainting
Conclusion
7. What is GAN?
A GAN consists of two models:
- Generative model (G): generates new data instances / models the distribution of individual classes: p(x|c)
- Discriminative model (D): evaluates them for authenticity / learns the boundary between classes
8. How does GAN work?
GANs are formulated as a game between two networks, and it is important (and tricky!) to keep them in balance!
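The slide's figure is not reproduced here, but the game it depicts is the standard GAN minimax objective (Goodfellow et al., 2014), shown for reference:

```latex
\min_G \max_D \;
  \mathbb{E}_{x \sim p_{\text{data}}}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_z}\big[\log\big(1 - D(G(z))\big)\big]
```

D is pushed to label real images 1 and generated images 0, while G is pushed to make D output 1 on its samples.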
[Figure: GAN training loop; the discriminator labels images as Real / Fake]
9. Plug and Play Generative Networks
10. Index
Motivation
What is GAN?
Probabilistic interpretation of iterative image generation methods
Methods and Experiments
o DGN-AM: sampling without a learned prior
o PPGN-x: DAE model of p(x)
o PPGN-h: Generator and DAE model of p(h)
Additional Results
o Generating images with different condition networks (Captioning, Multifaceted Feature Visualization)
o Inpainting
Conclusion
11. Probabilistic interpretation of iterative
image generation methods
MALA-approx: an approximate Metropolis-adjusted Langevin algorithm (MALA), a Markov chain
Monte Carlo (MCMC) method, which uses the following transition operator:
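The transition operator itself did not survive extraction; reconstructed here from the paper's notation (ε₁₂ scales the gradient step, ε₃ the noise):

```latex
x_{t+1} = x_t + \epsilon_{12}\, \nabla \log p(x_t) + N(0,\, \epsilon_3^2)
```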
12. Probabilistic framework for Activation
Maximization
• ε₁ (prior): take a step from the current image x_t toward one that looks more like a generic image (an image from any class).
• ε₂ (condition): take a step from the current image x_t toward an image that causes the classifier to output higher confidence in the chosen class.
• ε₃ (noise): add a small amount of noise to jump around the search space and encourage a diversity of images.
Together, prior and condition form the joint model.
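The three steps above correspond term by term to the MALA-approx update applied to the joint model p(x, y) (reconstructed from the paper's notation):

```latex
x_{t+1} = x_t
  + \underbrace{\epsilon_1 \frac{\partial \log p(x_t)}{\partial x_t}}_{\text{prior}}
  + \underbrace{\epsilon_2 \frac{\partial \log p(y = y_c \mid x_t)}{\partial x_t}}_{\text{condition}}
  + \underbrace{N(0,\, \epsilon_3^2)}_{\text{noise}}
```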
14. Index
Motivation
What is GAN?
Probabilistic interpretation of iterative image generation methods
Methods and Experiments
o DGN-AM: sampling without a learned prior
o PPGN-x: DAE model of p(x)
o PPGN-h: Generator and DAE model of p(h)
Additional Results
o Generating images with different condition networks (Captioning, Multifaceted Feature Visualization)
o Inpainting
Conclusion
16. Index
Motivation
What is GAN?
Probabilistic interpretation of iterative image generation methods
Methods and Experiments
o DGN-AM: sampling without a learned prior
o PPGN-x: DAE model of p(x)
o PPGN-h: Generator and DAE model of p(h)
Additional Results
o Generating images with different condition networks (Captioning, Multifaceted Feature Visualization)
o Inpainting
Conclusion
17. Method | DGN-AM: sampling without a
learned prior
Deep Generator Network-based Activation Maximization
Perform sampling in this lower-dimensional h-space.
h in this case represents features extracted from the first fully connected layer (called fc6) of AlexNet pre-trained on the 1000-class ImageNet classification task.
18. Method | DGN-AM: sampling without a
learned prior
Once the network G is trained, we obtain the DGN-AM update from the MALA-approx equation by dropping two terms: no learned prior (ε₁ = 0) and no noise (ε₃ = 0).
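Under those settings, the update in h-space reduces to the following (a paraphrase of the paper's equation; C is the condition/classifier network):

```latex
h_{t+1} = h_t + \epsilon_2\, \frac{\partial \log C(y = y_c \mid G(h_t))}{\partial h_t}
```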
19. Method | DGN-AM: sampling without a
learned prior
Pros:
• Sampling in the input space h is faster than in the image space x.
• Produces realistic images at high resolution.
• Can also produce interesting new types of images that G never saw during training.
Cons:
• Slow mixing speed
• Same image after many steps
• Lack of diversity
20. Index
Motivation
What is GAN?
Probabilistic interpretation of iterative image generation methods
Methods and Experiments
o DGN-AM: sampling without a learned prior
o PPGN-x: DAE model of p(x)
o PPGN-h: Generator and DAE model of p(h)
Additional Results
o Generating images with different condition networks (Captioning, Multifaceted Feature Visualization)
o Inpainting
Conclusion
21. Method | PPGN-x: DAE model of p(x)
What is a DAE (Denoising Autoencoder)?
x + noise → DAE → R(x): the DAE takes a noise-corrupted image x and reconstructs a clean version R(x) of it.
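The reason a DAE is useful here: a DAE trained with Gaussian corruption of variance σ² approximately estimates the score of the data distribution (Alain & Bengio, 2014), which is exactly the prior-gradient term that MALA-approx needs:

```latex
\frac{\partial \log p(x)}{\partial x} \;\approx\; \frac{R_x(x) - x}{\sigma^2}
```

Substituting this into the MALA-approx update (folding σ² into ε₁) yields the PPGN-x sampler:

```latex
x_{t+1} = x_t + \epsilon_1 \big(R_x(x_t) - x_t\big)
  + \epsilon_2\, \frac{\partial \log p(y = y_c \mid x_t)}{\partial x_t}
  + N(0,\, \epsilon_3^2)
```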
23. Method | PPGN-x: DAE model of p(x)
Pros:
• Sampling from the entire model.
Cons:
• It models the data distribution poorly.
• The chain mixes slowly (sampling happens in the high-dimensional image space).
24. Index
Motivation
What is GAN?
Probabilistic interpretation of iterative image generation methods
Methods and Experiments
o DGN-AM: sampling without a learned prior
o PPGN-x: DAE model of p(x)
o PPGN-h: Generator and DAE model of p(h) <-- (This paper's model)
Additional Results
o Generating images with different condition networks (Captioning, Multifaceted Feature Visualization)
o Inpainting
Conclusion
25. Method | PPGN-h: Generator and DAE
model of p(h)
To address the poor mixing speed of DGN-AM, they incorporate a proper p(h) prior, learned via a DAE, into the sampling procedure.
26. Method | PPGN-h: Generator and DAE
model of p(h)
The update rule to sample h from this model:
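Reconstructed from the paper's notation (R_h is the DAE for h, C the classifier network; treat as a paraphrase):

```latex
h_{t+1} = h_t + \epsilon_1 \big(R_h(h_t) - h_t\big)
  + \epsilon_2\, \frac{\partial \log C(y = y_c \mid G(h_t))}{\partial h_t}
  + N(0,\, \epsilon_3^2)
```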
27. Method | PPGN-h: Generator and DAE model
of p(h)
Pros:
• The chain mixes faster than PPGN-x.
Cons:
• Samples from PPGN-h are qualitatively similar to those from DGN-AM.
• Samples still lack quality and diversity (due to the poor p(h) model learned by the DAE).
28. Index
Motivation
What is GAN?
Probabilistic interpretation of iterative image generation methods
Methods and Experiments
o DGN-AM: sampling without a learned prior
o PPGN-x: DAE model of p(x)
o PPGN-h: Generator and DAE model of p(h)
Additional Results
o Generating images with different condition networks (Captioning, Multifaceted Feature Visualization)
o Inpainting
Conclusion
29. Generating images with different condition
networks | Captioning
PPGNs can be flexibly turned into a text-to-image model by combining the
prior with an image captioning network, and this process does not even
require additional training.
30. Generating images with different condition
networks | Multifaceted Feature Visualization
Instead of conditioning on a class output neuron, here we condition on a
hidden neuron, revealing many facets that a neuron has learned to detect
Figure 6: Images synthesized to activate a hidden neuron (number 196) previously identified as a
“face detector neuron”
31. Index
Motivation
What is GAN?
How does GAN work?
Probabilistic interpretation of iterative image generation methods
Methods and Experiments
o DGN-AM: sampling without a learned prior
o PPGN-x: DAE model of p(x)
o PPGN-h: Generator and DAE model of p(h)
Additional Results
o Generating images with different condition networks (Captioning, Multifaceted Feature Visualization)
o Inpainting
Conclusion
32. Inpainting
Because PPGNs can be interpreted
probabilistically, we can also sample
from them conditioned on part of an
image (in addition to the class
condition) to perform inpainting.
33. Index
Motivation
What is GAN?
Probabilistic interpretation of iterative image generation methods
Methods and Experiments
o DGN-AM: sampling without a learned prior
o PPGN-x: DAE model of p(x)
o PPGN-h: Generator and DAE model of p(h)
Additional Results
o Generating images with different condition networks (Captioning, Multifaceted Feature Visualization)
o Inpainting
Conclusion
34. Conclusion
The P&P model generates 227 × 227 images, which is considered high resolution among image generation models.
The most useful property of PPGNs is the "plug and play" capability: one can drop in a replaceable condition network and generate images according to a condition (class, caption, or neuron) specified at test time.
PPGNs can be used to synthesize images for videos or to create art with one or even multiple condition networks at the same time.
The approach is modality-agnostic and can be applied to many types of data.
Here are the steps a GAN takes:
- The generator takes in random numbers and returns an image.
- This generated image is fed into the discriminator alongside a stream of images taken from the actual dataset.
- The discriminator takes in both real and fake images and returns probabilities, a number between 0 and 1, with 1 representing a prediction of authenticity and 0 representing fake.
So you have a double feedback loop:
- The discriminator is in a feedback loop with the ground truth of the images, which we know.
- The generator is in a feedback loop with the discriminator.
Ref: https://skymind.ai/wiki/generative-adversarial-network-gan
In Plug and Play, to generate images they combine two different types of networks:
The generator can be thought of as a generic painter that draws a wide variety of images and shows them to the conditioner, which looks at the images and tells the generator what to draw next.
This process is iterative.
replaceable conditioner
For obtaining random samples, they use the Metropolis-adjusted Langevin algorithm (MALA), a Markov chain Monte Carlo (MCMC) method for sampling from a probability distribution for which direct sampling is difficult.
The future state is based on the current state plus a gradient term plus noise.
joint model p(x, y), which can be decomposed into an image model and a classification model:
In every update they encourage the image to be realistic, class specific and diverse.
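As a toy illustration of that three-term update (a hypothetical 1-D sketch, not the paper's code: the "prior" is a standard normal, the "condition" is a sigmoid classifier p(y=1|x), and the step sizes ε₁, ε₂, ε₃ are made-up values):

```python
import math
import random

def sample_mala_approx(steps=50000, eps1=0.01, eps2=0.01, eps3=0.1, seed=0):
    """Toy MALA-approx chain:
    x_{t+1} = x_t + eps1 * grad log p(x_t)          (realistic / prior)
            + eps2 * grad log p(y=1 | x_t)          (class-specific / condition)
            + N(0, eps3^2)                          (diverse / noise)
    Prior: standard normal, so grad log p(x) = -x.
    Condition: p(y=1|x) = sigmoid(x), so grad log = 1 - sigmoid(x)."""
    rng = random.Random(seed)
    x = 0.0
    samples = []
    for _ in range(steps):
        grad_prior = -x                                # pull toward "generic" samples
        grad_cond = 1.0 - 1.0 / (1.0 + math.exp(-x))   # push toward higher p(y=1|x)
        x = x + eps1 * grad_prior + eps2 * grad_cond + rng.gauss(0.0, eps3)
        samples.append(x)
    return samples
```

With the condition term on, the chain's mean drifts positive (the toy classifier favors large x); with eps2=0 the chain stays centered at 0, i.e. it samples only the prior.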
It mixes fast because we sample in the latent space.
I will cover that in the upcoming slides.
In this paper, they propose a class of models called PPGNs that are composed of 1) a generator network G that is trained to draw a wide range of image types, and 2) a replaceable “condition” network C that tells G what to draw
Starting from b
Instead of sampling in the image space (i.e. in the space of individual pixels)
they sample in the abstract, high-level feature space h of a generator G trained to reconstruct images x from compressed features h extracted from a pre-trained encoder E (f).
So the input here is no longer an image but rather a random noise vector.
Because the generator network was trained to produce realistic images, it serves as a prior on p(x) since it ideally can only generate real images.
However, this model has no learned prior on p(h)
Explain the update equation:
x is a deterministic variable, so we can simplify the model
they define a Gaussian p(h) centered at 0
The final h is pushed through G to produce an image sample.
They train a DAE for images and incorporate it to the sampling procedure as a p(x) prior to avoid fooling examples
Here, explain what the DAE is: a trained network which reconstructs an image after some random noise has been added to it.
We use it to estimate the score function of p(x) by subtracting the input from the output and dividing by the noise variance (σ²).
We take the MALA equation and put the DAE formula into the first term; this represents how we generate images.
I will be mentioning the mixing speed in the presentation, which means the "mixing time" of the Markov chain:
The mixing time has a direct impact on sampling quality since, the smaller the mixing time, the faster the convergence of the Markov chain to the stationary distribution, and the smaller the correlation in the samples.
For this approach, point out the cons listed on the slide.
Here they include all three ε terms.
"mixing time" of the Markov chain:
The mixing time has a direct impact on sampling quality since, the smaller the mixing time, the faster the convergence of the Markov chain to the stationary distribution, and the smaller the correlation in the samples.
Inpainting is filling in missing pixels given the observed context regions.
The model must understand the entire image to be able to reasonably fill in a large masked-out region that is positioned randomly.
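A minimal sketch of that idea (hypothetical, not the paper's setup): two jointly Gaussian variables stand in for "context pixels" and "masked pixels"; the observed one is clamped and only the masked one is updated by Langevin steps, so the chain samples the conditional distribution of the missing part given the observed part:

```python
import math
import random

def inpaint_sample(x_obs=1.0, rho=0.8, steps=50000, eps=0.01, seed=0):
    """Toy 'inpainting' by conditional Langevin sampling.
    Joint prior: (x_obs, x_miss) is standard bivariate normal with
    correlation rho. Clamping x_obs and taking Langevin steps on
    x_miss samples p(x_miss | x_obs) = N(rho * x_obs, 1 - rho**2)."""
    rng = random.Random(seed)
    x = 0.0  # the masked variable, initialized arbitrarily
    samples = []
    for _ in range(steps):
        # gradient of the log joint density w.r.t. the masked variable only
        grad = -(x - rho * x_obs) / (1.0 - rho ** 2)
        x = x + eps * grad + math.sqrt(2.0 * eps) * rng.gauss(0.0, 1.0)
        samples.append(x)
    return samples
```

The observed value never changes; the sampler fills in the missing value consistently with it, which is the 1-D analogue of filling a masked region consistently with the surrounding image.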