Explaining the basic mechanism of DCGAN.
Planned to be presented at TensorFlow study meetup (5) in Tokyo.
http://connpass.com/event/38073/
2016/09/23 ver1.0 Upload
2016/09/26 ver1.1 Correct some wordings
DCGAN How does it work?
Etsuji Nakai
Cloud Solutions Architect at Google
GIF Animation
https://goo.gl/zXL1bV
$ who am i
▪Etsuji Nakai
Cloud Solutions Architect at Google
Twitter @enakai00
Now on Sale!
What is DCGAN?
▪ DCGAN: Deep Convolutional Generative Adversarial Networks
● It works in the opposite direction of the image classifier (CNN).
● CNN transforms an image to a class label (list of probabilities).
● DCGAN generates an image from random parameters.
[Diagram: CNN maps an image to a list of probabilities (0.01, 0.05, 0.91, 0.02, ...), one per entry (deer, dog, cat, human, ...). DCGAN maps a list of random parameters (0.01, 0.05, 0.91, 0.02, ...) to an image; but what do those numbers mean?]
Examples of Convolutional Filters
▪ Convolutional filters are ... just image filters like the ones you sometimes apply in Photoshop!
[Left: a filter to blur images. Right: a filter to extract vertical edges.]
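As a minimal sketch in plain NumPy (not any particular library's API), here is what such image filters do: a 3x3 averaging kernel blurs the image, and a left-minus-right kernel responds to vertical edges. The tiny test image is made up for illustration.

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2-D convolution (really cross-correlation, as in most CNN libraries)."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

blur = np.full((3, 3), 1 / 9.0)                   # averages each 3x3 neighborhood
vertical_edge = np.array([[-1.0, 0.0, 1.0]] * 3)  # responds to left-right intensity changes

# A tiny test image: dark on the left, bright on the right.
image = np.array([[0.0, 0.0, 1.0, 1.0]] * 4)
print(conv2d(image, blur))
print(conv2d(image, vertical_edge))
```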
Convolutional Filters in CNN
▪ CNN applies a lot of filters to extract various features from a single image.
▪ CNN applies multi-layered filters to a single image (to extract features of
features?)
▪ A filtered image becomes smaller to drop off unnecessary details.
Extracting vertical and horizontal edges using two filters.
Convolutional Filters in CNN
▪ This shows how filters are applied to a multi-layered image.
[Diagram: independent filters are applied to each layer of the input image, and the resulting images from each layer are summed up; Filter A produces output image A, Filter B produces output image B.]
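A sketch of this layer-wise filtering in plain NumPy. The layer count and filter values below are made up for illustration; the point is that each input layer gets its own filter and the filtered layers are summed into one output image.

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2-D convolution of a single layer with a single filter."""
    kh, kw = kernel.shape
    out = np.zeros((image.shape[0] - kh + 1, image.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

def multichannel_conv(layers, kernels):
    """Apply one independent kernel per input layer, then sum the filtered layers."""
    return sum(conv2d(layer, kernel) for layer, kernel in zip(layers, kernels))

rng = np.random.default_rng(0)
rgb = rng.random((3, 8, 8))        # 3 layers (e.g. RGB) of an 8x8 image
kernels = rng.random((3, 3, 3))    # one 3x3 filter per input layer
feature_map = multichannel_conv(rgb, kernels)
print(feature_map.shape)           # one output image from three input layers
```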
Typical CNN Filtering Layers
http://arxiv.org/abs/1511.06434
[Diagram: RGB layers of a single 64x64 image → 128 layers of 32x32 images → 256 layers of 16x16 images → ... → a list of probabilities]
▪ Starting from a single RGB image on the right, multiple filtering layers are applied
to produce smaller (and more) images.
Image Generation Flow of DCGAN
http://arxiv.org/abs/1511.06434
[Diagram: a list of random numbers → 1024 layers of 4x4 images → 512 layers of 8x8 images → ... → RGB layers of a single 64x64 image]
▪ Basically, it's just flipping the direction. No magic!
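The shape flow of the generator in the diagram can be checked with simple arithmetic. The layer counts are the ones shown above; the doubling rule assumes stride-2 transposed convolutions, as in the DCGAN paper:

```python
# Shape walk of the DCGAN generator: each stride-2 transposed convolution
# doubles the spatial size while the number of layers shrinks, until a
# single RGB image comes out.
shape = (1024, 4, 4)                  # 1024 layers of 4x4 images
plan = [512, 256, 128, 3]             # layer counts after each step
for channels in plan:
    layers, h, w = shape
    shape = (channels, h * 2, w * 2)  # stride 2 -> output is twice as large
    print(shape)
# final shape: (3, 64, 64), i.e. RGB layers of a single 64x64 image
```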
Illustration of Convolution Operations
▪ Convolutional filters in CNN and transposed-convolutional filters in DCGAN work in opposite directions. Here's a good illustration of how they work.
http://deeplearning.net/software/theano_versions/dev/tutorial/conv_arithmetic.html
Convolution: (up to) 3x3 blue pixels contribute to a single green pixel. Each of the 3x3 blue pixels is multiplied by the corresponding filter value, and the results from the blue pixels are summed up into a single green pixel.
Transposed convolution: a single green pixel contributes to (up to) 3x3 blue pixels. Each green pixel is multiplied by each of the 3x3 filter values, and the results from different green pixels are summed up into a single blue pixel.
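The duality between the two operations can be sketched in plain NumPy. `conv2d` and `transposed_conv2d` below are toy implementations (valid mode, stride 1), not any library's API; the check at the end shows they are adjoint to each other.

```python
import numpy as np

def conv2d(x, k):
    """Each 3x3 window of blue pixels is reduced to one green pixel."""
    kh, kw = k.shape
    out = np.zeros((x.shape[0] - kh + 1, x.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i+kh, j:j+kw] * k)
    return out

def transposed_conv2d(y, k):
    """Each green pixel scatters a kernel-sized patch into the blue output."""
    kh, kw = k.shape
    out = np.zeros((y.shape[0] + kh - 1, y.shape[1] + kw - 1))
    for i in range(y.shape[0]):
        for j in range(y.shape[1]):
            out[i:i+kh, j:j+kw] += y[i, j] * k
    return out

# The two operations are adjoint: for any x, y, and filter k,
# <conv2d(x, k), y> equals <x, transposed_conv2d(y, k)>.
rng = np.random.default_rng(0)
x, k, y = rng.normal(size=(5, 5)), rng.normal(size=(3, 3)), rng.normal(size=(3, 3))
print(np.sum(conv2d(x, k) * y), np.sum(x * transposed_conv2d(y, k)))
```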
GIF Animation
https://goo.gl/tAY4BL
Training Strategy of DCGAN
It's a fake!
▪ We train two models simultaneously.
● CNN: Classifying authentic and fake images.
● "Authentic" images are provided as training data to CNN.
● DCGAN: Trained to generate images classified as authentic by CNN.
● By trying to fool CNN, DCGAN learns to generate images similar to the training data.
[Diagram: the DCGAN's generated images and the authentic training data both feed into the CNN.]
Training Loop of DCGAN
▪ By repeating this loop, CNN
becomes more accurate and
DCGAN becomes more crafty.
[Diagram: random numbers → DCGAN → generated image A; training data B → CNN. P(A): probability that A is authentic. P(B): probability that B is authentic. DCGAN modifies its parameters such that P(A) becomes large; CNN modifies its parameters such that P(A) becomes small and P(B) becomes large.]
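A deliberately tiny sketch of this loop, with hypothetical 1-D stand-ins: a linear "generator" in place of DCGAN, a logistic "discriminator" in place of CNN, and scalar data instead of images. All names and numbers are made up for illustration; the two update rules mirror the diagram above.

```python
import numpy as np

def sigmoid(u):
    # clip to avoid overflow in exp for very confident logits
    return 1.0 / (1.0 + np.exp(-np.clip(u, -60.0, 60.0)))

rng = np.random.default_rng(0)
wg, bg = 1.0, 0.0   # "DCGAN": g(z) = wg * z + bg
wd, bd = 0.0, 0.0   # "CNN":   d(x) = sigmoid(wd * x + bd)
lr = 0.05

for step in range(2000):
    z = rng.normal(size=32)              # random numbers
    A = wg * z + bg                      # generated batch A
    B = rng.normal(4.0, 0.5, size=32)    # "authentic" batch B

    # CNN update: make P(A) small and P(B) large
    # (gradient of -[log P(B) + log(1 - P(A))] w.r.t. the logits)
    pA, pB = sigmoid(wd * A + bd), sigmoid(wd * B + bd)
    gA, gB = pA, -(1.0 - pB)
    wd -= lr * np.mean(gA * A + gB * B)
    bd -= lr * np.mean(gA + gB)

    # DCGAN update: make P(A) large
    # (gradient of -log P(A), flowing back through the CNN into the generator)
    pA = sigmoid(wd * A + bd)
    gz = -(1.0 - pA) * wd
    wg -= lr * np.mean(gz * z)
    bg -= lr * np.mean(gz)

print(bg)  # the generator's output mean drifts toward the authentic mean (4.0)
```

By trying to fool the discriminator, the generator's offset `bg` moves toward the mean of the "authentic" data, even though the generator never sees that data directly.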
Model
▪ Training data : MNIST (28x28 pixels, grayscale images)
▪ DCGAN : Generate a single 28x28 image from 64 parameters.
● 64 parameters → 128 x (7x7) → 64 x (14x14) → 1 x (28x28)
▪ CNN : Calculate a probability that a single 28x28 image is authentic.
● 1 x (28x28) → 64 x (14x14) → 128 x (7x7) → Probability of authentic image
▪ Batch size : 32
● Modify filter parameters using 32 generated images and 32 MNIST images at a
time.
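The layer shapes on this slide can be verified with simple arithmetic, assuming stride-2 (transposed) convolutions so each step doubles or halves the image size:

```python
# Shape walk of the MNIST model (pure arithmetic, no ML library needed).

def transposed_conv_size(size, stride=2):
    return size * stride       # 7 -> 14 -> 28 in the DCGAN (generator)

def conv_size(size, stride=2):
    return size // stride      # 28 -> 14 -> 7 in the CNN (discriminator)

# DCGAN: 64 parameters -> 128 x (7x7) -> 64 x (14x14) -> 1 x (28x28)
gen = [(128, 7)]
for channels in (64, 1):
    gen.append((channels, transposed_conv_size(gen[-1][1])))

# CNN: 1 x (28x28) -> 64 x (14x14) -> 128 x (7x7) -> probability
disc = [(1, 28)]
for channels in (64, 128):
    disc.append((channels, conv_size(disc[-1][1])))

print(gen)   # ends at a single 28x28 image
print(disc)  # ends at 128 layers of 7x7 features
```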
Learning Process
▪ This shows the evolution of images
generated from the same input parameters
during the training loop. (DCGAN's filters are
initialized with random values.)
Playing with Input Parameters
▪ If we change the input parameters, the shape of the generated image changes too. By making small, continuous changes to the input, we can achieve a morphing effect.
▪ Since the input parameter is a point in a 64-dimensional space, we can draw a straight line between two points. The end points represent the images before and after morphing.
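A sketch of the straight-line morphing in NumPy. The generator itself is hypothetical here, so the code only produces the sequence of input points; feeding each point to the generator would yield one morphing frame.

```python
import numpy as np

rng = np.random.default_rng(1)
z_start = rng.normal(size=64)   # latent point for the "before" image
z_end = rng.normal(size=64)     # latent point for the "after" image

# 10 evenly spaced points on the straight line between the two latent points.
steps = np.linspace(0.0, 1.0, 10)
frames = np.array([(1 - t) * z_start + t * z_end for t in steps])
print(frames.shape)             # one 64-dimensional input per morphing frame
```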
Playing with Input Parameters
▪ Using a more complicated closed loop in the parameter space, we can even make a dancing image :)
▪ The sample image on this page is generated from a trajectory over a sphere (embedded in the 64-dimensional space).
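One way to build such a closed loop is a circle on a sphere embedded in the 64-dimensional parameter space; the details of the talk's actual trajectory are not specified, so the construction below is an assumption for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

# Two random orthonormal directions span a plane; a circle in that plane is a
# closed loop lying on a sphere in the 64-dimensional parameter space.
u = rng.normal(size=64)
u /= np.linalg.norm(u)
v = rng.normal(size=64)
v -= (v @ u) * u                # make v orthogonal to u
v /= np.linalg.norm(v)

radius = 1.0
angles = np.linspace(0.0, 2 * np.pi, 100, endpoint=False)
loop = np.array([radius * (np.cos(a) * u + np.sin(a) * v) for a in angles])
print(loop.shape)               # 100 input points tracing a closed loop
```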
GIF Animation
https://goo.gl/zXL1bV
Interpretation of Input Parameters
▪ The DCGAN paper suggests that the input parameters can have a semantic structure, as in the following example.
[Diagram: latent points arranged along Smile-Neutral and Woman-Man axes: Smiling Woman, Neutral Woman, Smiling Man, Neutral Man. Moving along one axis changes only that attribute, e.g. (Smiling Woman) - (Neutral Woman) + (Neutral Man) ≈ (Smiling Man).]
http://arxiv.org/abs/1511.06434
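The suggested semantic structure amounts to simple vector arithmetic in the parameter space. The latent points below are random stand-ins (in the paper they come from averaging the latent vectors of several images with each attribute):

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical latent points for three attribute combinations.
z_smiling_woman = rng.normal(size=64)
z_neutral_woman = rng.normal(size=64)
z_neutral_man = rng.normal(size=64)

# Remove "woman", add "man", keep "smile": the result should land near the
# latent point that generates a smiling man.
z_smiling_man = z_smiling_woman - z_neutral_woman + z_neutral_man
print(z_smiling_man.shape)
```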