Unsupervised representation learning with DCGAN


Presentation of the paper "Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks" by Alec Radford, Luke Metz, and Soumith Chintala (indico Research, Facebook AI Research).

Published in: Education

  1. 1. UNSUPERVISED REPRESENTATION LEARNING WITH DEEP CONVOLUTIONAL GENERATIVE ADVERSARIAL NETWORKS. Alec Radford, Luke Metz, and Soumith Chintala (indico Research, Facebook AI Research). Accepted paper at ICLR 2016. HY587 Paper Presentation: Shyam Krishna Khadka, George Simantiris.
  2. 2. GANs were introduced by Ian Goodfellow et al. in 2014: Goodfellow I., Pouget-Abadie J., Mirza M., Xu B., Warde-Farley D., Ozair S., Courville A., Bengio Y. (2014). Generative Adversarial Nets. Advances in Neural Information Processing Systems, 2672–2680. GANs are focused on the optimization of competing criteria: “We simultaneously train two models: a generative model G [...] and a discriminative model D [...]”. E.g., G: a forger that produces counterfeit money. D: the police, who identify whether money is real or fake. End goal: G produces money that D cannot distinguish from real money.
  3. 3. Unsupervised learning that actually works well to generate and discriminate! Generated results are hard to believe, but qualitative experiments are convincing.
  4. 4. Main contribution: Extensive model exploration to identify a family of architectures that resulted in stable training across a range of datasets and allowed for training higher resolution and deeper generative models. Other contributions: • use the trained discriminator for image classification • the generators have vector arithmetic properties
  5. 5. Images generated from this method.
  8. 8. Overview of the Deep Convolutional Generative Adversarial Network (DCGAN) Can be thought of as two separate networks
  9. 9. Generator G(.): input = random numbers, output = generated image G(z). Input: uniform noise vector z (random numbers; z = a 100-dimensional vector drawn from a uniform distribution). • Each sample z is mapped by G to a new image!
  10. 10. Discriminator D(.): input = real/generated image, output = prediction of whether the image is real.
  11. 11. Generator G(.) and Discriminator D(.). Generator goal: fool D, i.e., generate an image G(z) such that D is wrong about it, i.e., D(G(z)) = 1. Discriminator goal: discriminate between real and generated images, i.e., D(x) = 1, where x is a real image, and D(G(z)) = 0, where G(z) is a generated image. • Conflicting goals. • Both goals are unsupervised. • Optimal when D(.) = 0.5 (i.e., D cannot tell the difference between real and generated images) and G has learned the distribution of the training images. Example Architecture:
  12. 12. DCGAN Generator: Fully-connected layer (composed of weights) reshaped to have width, height and feature maps. Uses ReLU activation functions. Fractionally-strided convolutions: 8x8 input, 5x5 conv window = 16x16 output. Batch Normalization: normalize responses to have zero mean and unit variance over the entire mini-batch, but not in the last layer (to prevent sample oscillation and model instability). Uses Tanh to scale the generated image output between -1 and 1. No max pooling! Increases spatial dimensionality through fractionally-strided convolutions.
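The "8x8 input, 5x5 conv window = 16x16 output" figure on the slide above can be checked with the usual output-size formula for a fractionally-strided (transposed) convolution. This is only a sketch: the stride of 2, padding of 2 and output padding of 1 are assumptions chosen to reproduce the slide's numbers, and the function name is hypothetical.

```python
def conv_transpose_output_size(size, kernel, stride, padding=0, output_padding=0):
    """Spatial output size of a fractionally-strided (transposed) convolution."""
    return (size - 1) * stride - 2 * padding + kernel + output_padding


# Assumed settings (not stated on the slide): stride 2, padding 2, output padding 1.
out = conv_transpose_output_size(8, kernel=5, stride=2, padding=2, output_padding=1)
print(out)  # 16
```

With these assumed settings each such layer doubles the width and height of its input.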
  13. 13. Regular convolution: input = 5x5, with zero-padding at border = 7x7, stride = 2, output = 3x3. Fractionally-strided convolution: input = 3x3, zero-padding interlaced with the inputs (plus border padding) = 7x7, stride = 1, output = 5x5. Filter size = 3x3 in both cases. Clear dashed squares = zero-padded inputs.
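The two cases on the slide above can be reproduced with a small NumPy sketch. It assumes a single-channel square input and a square kernel, and it only illustrates the zero-interlacing and the resulting shapes; the function names are hypothetical and details such as kernel flipping and multiple channels are ignored.

```python
import numpy as np


def conv2d(x, kernel, stride=1, padding=0):
    """Plain 2-D sliding-window convolution (cross-correlation) with zero padding."""
    x = np.pad(x, padding)  # zero border on every side
    k = kernel.shape[0]
    out = (x.shape[0] - k) // stride + 1
    y = np.zeros((out, out))
    for i in range(out):
        for j in range(out):
            patch = x[i * stride:i * stride + k, j * stride:j * stride + k]
            y[i, j] = np.sum(patch * kernel)
    return y


def fractionally_strided_conv2d(x, kernel, stride=2, padding=1):
    """Interlace (stride - 1) zeros between inputs, pad the border, convolve with stride 1."""
    n = x.shape[0]
    size = (n - 1) * stride + 1
    dilated = np.zeros((size, size))
    dilated[::stride, ::stride] = x  # original inputs, zeros in between
    border = kernel.shape[0] - 1 - padding
    return conv2d(dilated, kernel, stride=1, padding=border)


kernel = np.ones((3, 3))
down = conv2d(np.ones((5, 5)), kernel, stride=2, padding=1)  # 5x5 -> 3x3
up = fractionally_strided_conv2d(np.ones((3, 3)), kernel)    # 3x3 -> 5x5
print(down.shape, up.shape)
```

The fractionally-strided version runs an ordinary stride-1 convolution over the zero-interlaced 7x7 grid, which is why it increases rather than reduces the spatial size.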
  14. 14. DCGAN Discriminator: Input: real or generated image. Uses LeakyReLU activation functions. Batch Normalization. No max pooling! Reduces spatial dimensionality through strided convolutions (stride 2, padding 2). Sigmoid output (between 0 and 1).
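As a rough illustration of how strided convolutions replace pooling in the discriminator, the sketch below tracks the spatial size through four layers with stride 2 and padding 2. The 5x5 kernel, the 64x64 input and the number of layers are assumptions for illustration, and the helper names are hypothetical; the LeakyReLU slope of 0.2 is the value given in the training details.

```python
import numpy as np


def conv_output_size(size, kernel, stride, padding):
    """Spatial output size of a regular strided convolution."""
    return (size + 2 * padding - kernel) // stride + 1


def leaky_relu(x, slope=0.2):
    """LeakyReLU: identity for positive inputs, small slope for negative ones."""
    return np.where(x > 0, x, slope * x)


def sigmoid(x):
    """Squash a score into (0, 1), read as the probability that the image is real."""
    return 1.0 / (1.0 + np.exp(-x))


# Assumed: 64x64 input, four conv layers with a 5x5 kernel, stride 2, padding 2.
sizes = [64]
for _ in range(4):
    sizes.append(conv_output_size(sizes[-1], kernel=5, stride=2, padding=2))
print(sizes)  # [64, 32, 16, 8, 4]
```

Each strided layer halves the width and height, so no max pooling is needed to shrink the feature maps.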
  15. 15. ARCHITECTURE GUIDELINES FOR STABLE DEEP CONVOLUTIONAL GANS • Replace any pooling layers with strided convolutions (discriminator) and fractional-strided convolutions (generator). • Use batchnorm in both the generator and the discriminator. • Remove fully connected hidden layers for deeper architectures. • Use ReLU activation in the generator (Tanh at the output). • Use LeakyReLU activation in the discriminator.
  16. 16. DETAILS OF ADVERSARIAL TRAINING • Pre-processing: scale images between -1 and 1 (tanh range). • Minibatch SGD (m = 128). • Weight init.: zero-centered normal distribution (std. dev. = 0.02). • Leaky ReLU slope = 0.2. • Adam optimizer with tuned hyperparameters to accelerate training. • Learning rate = 0.0002. • Momentum term β1 = 0.5 to stabilize training. DCGANs were trained on three datasets: Large-scale Scene Understanding (LSUN), ImageNet-1k, Faces (newly assembled).
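The training details above translate into a few lines of setup code. The following is a minimal NumPy sketch, assuming 8-bit RGB images of size 64x64 and a hypothetical 5x5 filter bank; the names `preprocess`, `init_weights` and `HYPERPARAMS` are illustrative and not taken from the paper's code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Values listed on the slide, collected in one place.
HYPERPARAMS = {
    "batch_size": 128,
    "learning_rate": 0.0002,
    "adam_beta1": 0.5,
    "leaky_relu_slope": 0.2,
    "weight_init_std": 0.02,
}


def preprocess(images_uint8):
    """Scale pixel values from [0, 255] to [-1, 1], the tanh range."""
    return images_uint8.astype(np.float32) / 127.5 - 1.0


def init_weights(shape, std=HYPERPARAMS["weight_init_std"]):
    """Zero-centered normal initialization."""
    return rng.normal(loc=0.0, scale=std, size=shape)


# Assumed shapes: a mini-batch of 64x64 RGB images and a 5x5 filter bank.
batch = rng.integers(0, 256, size=(HYPERPARAMS["batch_size"], 64, 64, 3), dtype=np.uint8)
scaled = preprocess(batch)
weights = init_weights((5, 5, 3, 64))
print(scaled.min(), scaled.max(), weights.std())
```

Scaling the real images to the tanh range keeps them comparable to the generator's output, which is also bounded between -1 and 1.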
  17. 17. GENERATED IMAGES AND SANITY CHECKS THAT IT'S NOT JUST MEMORIZING EXAMPLES… Generated LSUN bedrooms after one (left) and five (right) epochs of training.
  19. 19. Average 4 vectors from exemplar faces looking left and 4 from faces looking right. Interpolating between the averaged left and right vectors creates a "turn vector".
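The averaging and interpolation described above can be sketched as follows. The Z vectors here are random placeholders standing in for the vectors of real exemplar faces, and the uniform range, the number of interpolation steps and the function name are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
Z_DIM = 100

# Placeholder Z vectors; in practice these come from 4 exemplar faces per direction.
z_left = rng.uniform(-1, 1, size=(4, Z_DIM))
z_right = rng.uniform(-1, 1, size=(4, Z_DIM))

left_mean = z_left.mean(axis=0)
right_mean = z_right.mean(axis=0)


def interpolate(a, b, steps=9):
    """Linear interpolation from a to b, one row per step."""
    t = np.linspace(0.0, 1.0, steps)[:, None]
    return (1.0 - t) * a + t * b


turn_path = interpolate(left_mean, right_mean)
print(turn_path.shape)  # (9, 100)
```

Feeding each row of `turn_path` to a trained generator would give the sequence of faces along the turn.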
  20. 20. MANIPULATING THE GENERATOR REPRESENTATION (FORGETTING TO DRAW CERTAIN OBJECTS) (Top) Unmodified sample generated images. (Bottom) Samples generated after dropping out the "window" concept. Some windows are removed or transformed. The overall scene stays the same, indicating the generator has separated objects (windows) from the scene.
  21. 21. VECTOR ARITHMETIC ON FACE SAMPLES 1) Find 3 exemplar images (e.g., 3 smiling women). 2) Average their Z vectors. 3) Do simple vector arithmetic operations on the averaged vectors. 4) Generate an image based on this new vector! 5) Other images are produced by adding small uniform noise to the new vector. For comparison: arithmetic in pixel space.
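A minimal sketch of the vector arithmetic above, using random placeholders instead of the Z vectors of real exemplar images. The three concepts, the particular arithmetic and the noise scale of 0.25 are assumptions for illustration, and `average_z` is a hypothetical helper.

```python
import numpy as np

rng = np.random.default_rng(0)
Z_DIM = 100


def average_z(z_vectors):
    """Average the Z vectors of the exemplar images for one concept."""
    return np.mean(z_vectors, axis=0)


# Placeholders: 3 exemplar Z vectors per concept (concept names are illustrative).
smiling_woman = average_z(rng.uniform(-1, 1, size=(3, Z_DIM)))
neutral_woman = average_z(rng.uniform(-1, 1, size=(3, Z_DIM)))
neutral_man = average_z(rng.uniform(-1, 1, size=(3, Z_DIM)))

# Simple arithmetic on the averaged vectors.
new_z = smiling_woman - neutral_woman + neutral_man

# Further samples: add small uniform noise to the new vector (scale is an assumption).
noise = rng.uniform(-0.25, 0.25, size=(8, Z_DIM))
variants = new_z + noise
print(variants.shape)  # (8, 100)
```

Passing `new_z` and each row of `variants` through a trained generator would produce the corresponding images.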
  22. 22. GANS AS FEATURE EXTRACTOR CIFAR-10 1) Train on ImageNet 2) Get all the responses from the Discriminator's layers 3) Max-pool each layer to get a 4x4 spatial grid 4) Flatten to form feature vector 5) Train a regularized linear L2-SVM classifier for CIFAR-10 (note: while other approaches achieve higher performance, this network was not trained on CIFAR-10!)
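Steps 3 and 4 above (max-pooling each layer's responses to a 4x4 grid and flattening) can be sketched in NumPy. The layer shapes below are invented placeholders, the pooling assumes the height and width are divisible by 4, and the function names are hypothetical.

```python
import numpy as np


def max_pool_to_grid(feature_map, grid=4):
    """Max-pool an (H, W, C) feature map down to (grid, grid, C)."""
    h, w, c = feature_map.shape
    assert h % grid == 0 and w % grid == 0, "sketch assumes divisible sizes"
    blocks = feature_map.reshape(grid, h // grid, grid, w // grid, c)
    return blocks.max(axis=(1, 3))


def extract_features(layer_responses, grid=4):
    """Pool every layer's responses to a grid, flatten, and concatenate."""
    pooled = [max_pool_to_grid(fm, grid).ravel() for fm in layer_responses]
    return np.concatenate(pooled)


# Placeholder discriminator responses for one image (shapes are assumptions).
rng = np.random.default_rng(0)
layers = [
    rng.normal(size=(16, 16, 8)),
    rng.normal(size=(8, 8, 16)),
    rng.normal(size=(4, 4, 32)),
]
features = extract_features(layers)
print(features.shape)  # (896,)
```

The resulting vector is what a linear classifier, such as the L2-SVM in step 5, would be trained on.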
  23. 23. SUMMARY • Unsupervised learning that really seems to work. • Visualizations indicate that the Generator is learning something close to the true distribution of real images. • Classification performance using the Discriminator features indicates that the features learned are discriminative of the underlying classes.
  24. 24. APPENDIX: OPTIMIZING A GENERATIVE ADVERSARIAL NETWORK (GAN) Discriminator: loss function to maximize (gradient w.r.t. the parameters of the Discriminator). Generator: loss function to minimize (gradient w.r.t. the parameters of the Generator). Interpretation: compute the gradient of the loss function, and then update the parameters to min/max the loss function (gradient descent/ascent).
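The formulas themselves did not survive in this transcript. For reference, the standard minibatch updates from Goodfellow et al. (2014), which match the terms used in the worked examples that follow, can be written as below; the symbols θ_d, θ_g and the minibatch size m are the usual notation and are assumed rather than taken from the slide.

```latex
% Discriminator: ascend this stochastic gradient (maximize)
\nabla_{\theta_d} \frac{1}{m} \sum_{i=1}^{m}
  \left[ \log D\left(x^{(i)}\right)
       + \log\left(1 - D\left(G\left(z^{(i)}\right)\right)\right) \right]

% Generator: descend this stochastic gradient (minimize)
\nabla_{\theta_g} \frac{1}{m} \sum_{i=1}^{m}
  \log\left(1 - D\left(G\left(z^{(i)}\right)\right)\right)
```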
  25. 25. EXAMPLE 1 (Discriminator, maximize): Inputs: uniform noise vector (random numbers) and real images. Imagine that for a real image D(x) scores 0.8 → it is a real image (correct): D(x) = 0.8, log(0.8) = -0.2 (natural log, rounded). Then for a generated image, D(G(z)) scores 0.2 → it is a generated image (correct): D(G(z)) = 0.2, log(1-0.2) = log(0.8) = -0.2. We add them together and this gives us a fairly high loss value (-0.4). We ascend the gradient, so we want to maximize this. Note that we are adding two negative numbers, so 0 is the upper bound.
  26. 26. EXAMPLE 1 (continued, Generator): D(G(z)) scores 0.2 → a generated image is judged to be generated → bad, D(.) wasn’t fooled. D(G(z)) = 0.2, log(1-0.2) = log(0.8) = -0.2. Assigned loss is -0.2. Note that we want to minimize this loss function.
  27. 27. EXAMPLE 2 (Discriminator, maximize): For a real image D(x) scores 0.2 → it is a generated image (wrong): D(x) = 0.2, log(0.2) = -1.6. Then for a generated image, D(G(z)) scores 0.8 → it is a real image (wrong): D(G(z)) = 0.8, log(1-0.8) = log(0.2) = -1.6. These bad predictions combined give a loss of -3.2, a lower value than the loss we had with good predictions (Ex. 1). Remember the goal is to maximize!
  28. 28. EXAMPLE 2 (continued, Generator): D(G(z)) scores 0.8 → a generated image is judged to be real → good, D(.) was fooled. D(G(z)) = 0.8, log(1-0.8) = log(0.2) = -1.6. Assigned loss is -1.6. Compare to the previous loss (-0.2) and remember that we want to minimize this loss function!
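The arithmetic in the two examples can be checked with a few lines of Python. This sketch assumes natural logarithms (which is what the rounded values on the slides correspond to) and uses hypothetical function names.

```python
import math


def discriminator_objective(d_real, d_fake):
    """log D(x) + log(1 - D(G(z))): the Discriminator wants this as high as possible."""
    return math.log(d_real) + math.log(1.0 - d_fake)


def generator_loss(d_fake):
    """log(1 - D(G(z))): the Generator wants this as low as possible."""
    return math.log(1.0 - d_fake)


# Example 1: D is right about both images.
ex1_d = discriminator_objective(d_real=0.8, d_fake=0.2)
ex1_g = generator_loss(d_fake=0.2)

# Example 2: D is wrong about both images.
ex2_d = discriminator_objective(d_real=0.2, d_fake=0.8)
ex2_g = generator_loss(d_fake=0.8)

print(round(ex1_d, 2), round(ex1_g, 2))
print(round(ex2_d, 2), round(ex2_g, 2))
```

The unrounded values differ slightly from the slide's sums (about -0.45 instead of -0.4, and about -3.22 instead of -3.2) because the slides round each logarithm before adding. The ordering is the same: the Discriminator's objective is higher when it predicts correctly, and the Generator's loss is lower when D is fooled.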