Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

Deep Generative Models - Kevin McGuinness - UPC Barcelona 2018


Published on

Deep learning technologies are at the core of the current revolution in artificial intelligence for multimedia data analysis. The convergence of large-scale annotated datasets and affordable GPU hardware has allowed the training of neural networks for data analysis tasks which were previously addressed with hand-crafted features. Architectures such as convolutional neural networks, recurrent neural networks and Q-nets for reinforcement learning have shaped a brand new scenario in signal processing. This course will cover the basic principles and applications of deep learning to computer vision problems, such as image classification, object detection or image captioning.

Published in: Data & Analytics
  • Be the first to comment

Deep Generative Models - Kevin McGuinness - UPC Barcelona 2018

  1. 1. Kevin McGuinness Assistant Professor School of Electronic Engineering Dublin City University #DLUPC Generative Models Day 4 Lecture 3
  2. 2. What is a generative model? A model P(X; ϴ) that we can draw samples from. E.g. A Gaussian Mixture Model ● Fitting: EM algorithm ● Drawing samples: ○ Draw sample from categorical distribution to select Gaussian ○ Draw sample from Gaussian GMMs are not generally complex enough to draw samples of images from. P(X = x) x x 2
  3. 3. Why are generative models important? ● Model the probability density of images ● Understanding P(X) may help us understand P(Y | X) ● Generate novel content ● Generate training data for discriminative networks ● Artistic applications ● Image completion ● Monte-carlo estimators 3
  4. 4. Generative adversarial networks Novel method of training deep generative models invented by Ian Goodfellow et al. in 2014 Idea: pit a generator and a discriminator against each other ● Generator tries to draw samples from P(X) ● Discriminator tries to tell if sample came from the generator or the real world Both discriminator and generator are deep networks (differentiable functions) Can train with backprop: train discriminator for a while, then train generator, then discriminator, … 4
  5. 5. Generative adversarial networks (conceptual) Generator Real world images Discriminator Real Loss Latentrandomvariable Sample Sample Fake 5
  6. 6. The generator Deterministic mapping from a latent random vector to sample from q(x) ~ p(x) Usually a deep neural network. E.g. DCGAN: 6
  7. 7. The discriminator Parameterised function that tries to distinguish between samples from real images p(x) and generated ones q(x). Usually a deep convolutional neural network. conv conv ... F F 7
  8. 8. Training GANs Generator Real world images Discriminator Real Loss Latentrandomvariable Sample Sample Fake Alternate between training the discriminator and generator Differentiable module Differentiable module 8
  9. 9. Generator Real world images Discriminator Real Loss Latentrandomvariable Sample Sample Fake 1. Fix generator weights, draw samples from both real world and generated images 2. Train discriminator to distinguish between real world and generated images Backprop error to update discriminator weights 9
  10. 10. Generator Real world images Discriminator Real Loss Latentrandomvariable Sample Sample Fake 1. Fix discriminator weights 2. Sample from generator 3. Backprop error through discriminator to update generator weights Backprop error to update generator weights 10
  11. 11. Training GANs Iterate these two steps until convergence (which may not happen) ● Updating the discriminator should make it better at discriminating between real images and generated ones (discriminator improves) ● Updating the generator makes it better at fooling the current discriminator (generator improves) Eventually (we hope) that the generator gets so good that it is impossible for the discriminator to tell the difference between real and generated images. Discriminator accuracy = 0.5 11
  12. 12. Discriminator training Generator training 12
  13. 13. Some examples of generated images… 13
  14. 14. ImageNet Source: 14
  15. 15. CIFAR-10 Source: 15
  16. 16. Credit: Alec Radford Code on GitHub 16
  17. 17. Credit: Alec Radford Code on GitHub 17
  18. 18. Issues Known to be very difficult to train: ● Formulated as a “game” between two networks ● Unstable dynamics: hard to keep generator and discriminator in balance ● Optimization can oscillate between solutions ● Mode collapse in the generator Difficult to evaluate results 18
  19. 19. Important variants Wasserstein GAN (WGAN) ● MLE leads to a KL divergence loss. ● Numerical stability issues when estimated distribution and true distribution do not overlap significantly (loss blows up). ● WGAN idea is to use a coarse approximation of the Wasserstein distance (the Earth mover's distance). ● Weight clipping is needed to enforce Lipschitz constraint. Overall effect is to make the GAN more stable. Discriminator can be trained more on each step without blowing up. Can work well in practice, but clipping the weights to enforce Lipschitz slows training.
  20. 20. Important variants Least squares GAN (LSGAN) ● Similar motivation to WGAN: want a loss that gives nice gradients and doesn't blow up. ● LSGAN Idea: just use squared error (L2 distance)! ● Turns out this is the same as minimizing the Pearson 2 divergence.
  21. 21. Important variants Energy-based GAN (EBGAN) ● Instead of using a binary classifier as the discriminator D use an energy-based model (an autoencoder) ● D models the image manifold since it is trained on real images ● Optimize to generate samples that have low energy ● Generator gets more signal from D
  22. 22. Important variants Boundary Equilibrium GAN (BEGAN) ● Combines ideas from WGAN and EBGAN ● BEGAN idea: matching the distributions of the reconstruction losses can be a suitable proxy for matching the data distributions. ● Use Wasserstein distance approximation to do this ● Includes mechanism for automatically maintaining equilibrium
  23. 23. Conditional GANs GANs can be conditioned on other info: e.g. a label ● z might capture random characteristics of the data, variabilities of possible futures, ● c would condition the deterministic parts (label) For details on ways to condition GANs: Ways of Conditioning Generative Adversarial Networks (Wack et al.) 24
  24. 24. Generating images/frames conditioned on captions (Reed et al. 2016b) (Zhang et al. 2016) 25
  25. 25. Predicting the future with adversarial training Want to train a model to predict the pixels in frame (t+K) from pixels in frame t. Many possible futures for same frame Using supervised loss like MSE results in blurry solutions: loss if minimized if predictor averages over possibilities when predicting. We really want a sample, not the mean Adversarial training can solve this: easy for an adversary to detect blurry frames Mathieu et al. Deep multi-scale video prediction beyond mean square error, ICLR 2016 ( 26
  26. 26. Mathieu et al. Deep multi-scale video prediction beyond mean square error, ICLR 2016 ( 27
  27. 27. Image super-resolution Bicubic: not using data statistics. SRResNet: trained with MSE. SRGAN is able to understand that there are multiple correct answers, rather than averaging. (Ledig et al. 2016) 28
  28. 28. Saliency prediction Adversarial lossDala loss Junting Pan, Cristian Canton, Kevin McGuinness, Noel E. O’Connor, Jordi Torres, Elisa Sayrol and Xavier Giro-i-Nieto. “SalGAN: Visual Saliency Prediction with Generative Adversarial Networks.” arXiv. 2017. 29
  29. 29. Image-to-Image translation Isola, Phillip, Jun-Yan Zhu, Tinghui Zhou, and Alexei A. Efros. "Image-to-image translation with conditional adversarial networks." arXiv:1611.07004 (2016). Generator Discriminator Generated pairs Real World Ground truth pairs Loss 30
  30. 30. Questions?