
Variational Autoencoders VAE - Santiago Pascual - UPC Barcelona 2018


https://telecombcn-dl.github.io/2018-dlai/

Deep learning technologies are at the core of the current revolution in artificial intelligence for multimedia data analysis. The convergence of large-scale annotated datasets and affordable GPU hardware has allowed the training of neural networks for data analysis tasks which were previously addressed with hand-crafted features. Architectures such as convolutional neural networks, recurrent neural networks or Q-nets for reinforcement learning have shaped a brand new scenario in signal processing. This course will cover the basic principles of deep learning from both algorithmic and computational perspectives.



  1. 1. http://bit.ly/dlai2018 Santiago Pascual de la Puente santi.pascual@upc.edu PhD Candidate Universitat Politecnica de Catalunya Technical University of Catalonia Deep Generative Models I Variational Autoencoders #DLUPC
  2. 2. Outline ● What is in here? ● Introduction ● Taxonomy ● Variational Auto-Encoders (VAEs) (DGMs I) ● Generative Adversarial Networks (GANs) ● PixelCNN/Wavenet ● Normalizing Flows (and flow-based gen. models) ○ Real NVP ○ GLOW ● Models comparison ● Conclusions 2
  3. 3. What is in here?
  4. 4. Disclaimer 4 ● We are going to make a shift in the modeling paradigm: discriminative → generative. ● Is this a type of neural network? No → either fully connected networks, convolutional networks or recurrent networks fit, depending on how we condition data points (sequentially, globally, etc.). ● This is a very fun topic with which we can make a network paint, sing or write.
  5. 5. Planning for our generative trip 5 ● 5/11/2018 → DGM I: Introduction to gen models + VAEs ○ (... in b/w ...) Methodology + RNN lessons ● 19/11/2018 → DGM II: GANs ● 19/11/2018 → DGM III: likelihood models (pixelCNN + flow models) ● 26/11/2018 → Practical lesson on DGMs (code day)
  6. 6. Introduction
  7. 7. What we are used to doing with Neural Nets 7 Figure: input → Network → output class (one-hot, e.g. 0 1 0). Figure credit: Javier Ruiz. Discriminative model → aka. tell me the probability of some ‘Y’ responses given ‘X’ inputs. Here we don’t care about the process that generated ‘X’; we just detect some patterns in the input to give an answer. This gives us the outcome probabilities given some ‘X’ data: P(Y | X). P(Y = [0,1,0] | X = [pixel1 , pixel2 , …, pixel784 ])
  8. 8. What is a generative model? 8 We have datapoints that emerge from some generating process (landscapes in nature, speech from people, etc.) X = {x1 , x2 , …, xN } Having our dataset X with example datapoints (images, waveforms, written text, etc.), each point xn lies in some M-dimensional space.
  9. 9. What is a generative model? 9 We have datapoints that emerge from some generating process (landscapes in nature, speech from people, etc.) X = {x1 , x2 , …, xN } Having our dataset X with example datapoints (images, waveforms, written text, etc.), each point xn lies in some M-dimensional space. Each point xn comes from an M-dimensional probability distribution P(X) → Model it!
  10. 10. What is a generative model? 10 We have datapoints that emerge from some generating process (landscapes in nature, speech from people, etc.) X = {x1 , x2 , …, xN } Y = {y1 , y2 , …, yN } (example labels: DOG, CAT, TRUCK, PIZZA; THRILLER, SCI-FI, HISTORY; /aa/, /e/, /o/) Having our dataset X with example datapoints (images, waveforms, written text, etc.), each point xn lies in some M-dimensional space. Each point yn lies in some K-dimensional space. M can be different from K. We can also model joint probabilities to apply conditioning variables to the generative process: P(X, Y)
  11. 11. 11 1) We want our model with parameters θ={weights, biases} to output samples distributed as Pmodel, matching the distribution of our training data Pdata or P(X). 2) We can then sample points from Pmodel that plausibly look like they were drawn from Pdata. What is a generative model?
  12. 12. 12 1) We want our model with parameters θ={weights, biases} to output samples distributed as Pmodel, matching the distribution of our training data Pdata or P(X). 2) We can then sample points from Pmodel that plausibly look like they were drawn from Pdata. What is a generative model? We have not mentioned any network structure or model input data format, just a requirement on what we want in the output of our model.
  13. 13. 13 We can generate any type of data, like speech waveform samples or image pixels. What is a generative model? Figure: a 32x32 RGB image (x3 channels) is scanned and unrolled into a single vector, so M samples = M dimensions, with M = 32x32x3 = 3072 dimensions.
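A hedged sketch of that "scan and unroll" step in PyTorch (the tensor names and batch size are illustrative, not from the slides):

```python
import torch

# A hypothetical batch of 16 RGB images of size 32x32: (batch, channels, height, width).
images = torch.randn(16, 3, 32, 32)

# "Scan and unroll" each image into a single M-dimensional vector.
vectors = images.view(images.size(0), -1)

print(vectors.shape)  # torch.Size([16, 3072]) -> M = 32 * 32 * 3 = 3072 dimensions
```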
  14. 14. Our learned model should be able to make up new samples from the distribution, not just copy and paste existing samples! 14 What is a generative model? Figure from NIPS 2016 Tutorial: Generative Adversarial Networks (I. Goodfellow)
  15. 15. Why Generative Models? ● Model very complex and high-dimensional distributions. ● Be able to generate realistic synthetic samples ○ possibly perform data augmentation ○ simulate possible futures for learning algorithms ● Unsupervised learning: fill blanks in the data ● Manipulate real samples with the assistance of the generative model ○ Example: edit pictures with guidance (photoshop super pro level)
  16. 16. Motivating Applications
  17. 17. Image inpainting 17 Recover lost information/add enhancing details by learning the natural distribution of pixels. original enhanced
  18. 18. Speech Enhancement 18 Recover lost information/add enhancing details by learning the natural distribution of audio samples. original enhanced
  19. 19. Speech Synthesis 19 Generate spontaneously new speech by learning its natural distribution along time. Speech synthesizer
  20. 20. Image Generation 20 Generate spontaneously new images by learning their spatial distribution. Image generator Figure credit: I. Goodfellow
  21. 21. Super-resolution 21 Generate a high-resolution version of an image, even introducing made-up but plausible details. (Ledig et al. 2016)
  22. 22. Generative Models Taxonomy
  23. 23. 23 We will see models that learn the probability density function: ● Explicitly (we impose a known loss function) ○ (1) With approximate density → Variational Auto-Encoders (VAEs) ○ (2) With tractable density & likelihood-based: ■ PixelCNN, Wavenet. ■ Flow-based models: real-NVP, GLOW. ● Implicitly (we “do not know” the loss function) ○ (3) Generative Adversarial Networks (GANs). Taxonomy
  24. 24. Variational Auto-Encoders
  25. 25. Auto-Encoder Neural Network Autoencoders: ● Predict the same input data at the output (x → x^). ● Do not need labels. Figure: x → Encode → c (bottleneck) → Decode → x^ 25
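For concreteness, a minimal fully connected auto-encoder sketch in PyTorch, assuming MNIST-like 784-dimensional inputs; the layer sizes and class name are illustrative, not the course's code:

```python
import torch
import torch.nn as nn

class AutoEncoder(nn.Module):
    """Plain (non-variational) auto-encoder: x -> c (bottleneck code) -> x_hat."""
    def __init__(self, in_dim=784, code_dim=20):
        super().__init__()
        self.encode = nn.Sequential(nn.Linear(in_dim, 400), nn.ReLU(),
                                    nn.Linear(400, code_dim))
        self.decode = nn.Sequential(nn.Linear(code_dim, 400), nn.ReLU(),
                                    nn.Linear(400, in_dim), nn.Sigmoid())

    def forward(self, x):
        c = self.encode(x)      # bottleneck code c
        x_hat = self.decode(c)  # reconstruction of the input
        return x_hat, c

# No labels needed: the training target is the input itself,
# e.g. loss = nn.functional.binary_cross_entropy(x_hat, x)
```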
  26. 26. Auto-Encoder Neural Network ● Q: Is an AE a generative model? (How can we make it generate data?) c Encode Decode “Generate” 26
  27. 27. Auto-Encoder Neural Network ● Q: Is an AE a generative model? (How can we make it generate data?) ○ A: This “just” memorizes codes C from our training samples! c Encode Decode “Generate” memorization 27
  28. 28. Variational Auto-Encoder VAE intuitively: ● Introduce a restriction in z, such that our data points x (e.g. images) are distributed in a latent space (manifold) following a specified probability density function Z (normally N(0, I)). z Encode Decode z ~ N(0, I) 28
  29. 29. VAE intuitively: ● Introduce a restriction in z, such that our data points x (e.g. images) are distributed in a latent space (manifold) following a specified probability density function Z (normally N(0, I)). z Encode Decode z ~ N(0, I) We can then sample z to generate NEW x data points (e.g. images). Variational Auto-Encoder 29
  30. 30. Variational Auto-Encoder ● VAE, aka. where Bayesian theory and deep learning collide, in depth: Fixed parameters Generate X, N times sample z, N times Deterministic function 30
  31. 31. Variational Auto-Encoder ● VAE, aka. where Bayesian theory and deep learning collide, in depth: Fixed parameters Generate X, N times sample z, N times Deterministic function Maximize the probability of each X under the generative process: P(X) = ∫ P(X|z; θ) P(z) dz → maximum likelihood framework
  32. 32. Variational Auto-Encoder Intuition behind normally distributed z vectors: any output distribution can be achieved from the simple N(0, I) with powerful mappings. N(0, I) 32
  33. 33. Variational Auto-Encoder Intuition behind normally distributed z vectors: any output distribution can be achieved from the simple N(0, I) with powerful mappings. Who’s the strongest non-linear and learnable mapper in the universe (so far)? 33
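(The implied answer: a deep neural network.) As a toy illustration of the idea, in the spirit of the Doersch (2016) tutorial cited in the references, even a fixed hand-written mapping can push N(0, I) samples onto a clearly non-Gaussian shape; a learned network can implement far richer mappings:

```python
import torch

# Draw 2-D samples z ~ N(0, I).
z = torch.randn(5000, 2)

# A fixed deterministic mapping g(z) = z/10 + z/||z|| pushes the Gaussian blob
# (approximately) onto a ring of radius 1 -- a distribution that is clearly not Gaussian.
g_z = z / 10.0 + z / z.norm(dim=1, keepdim=True)
```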
  34. 34. Variational Auto-Encoder ● VAE, aka. where Bayesian theory and deep learning collide, in depth: Fixed parameters Generate X, N times sample z, N times Deterministic function Maximize the probability of each X under the generative process: P(X) = ∫ P(X|z; θ) P(z) dz → maximum likelihood framework
  35. 35. Variational Auto-Encoder Now to solve the maximum likelihood problem… We’d like to know P(X|z) and P(z), so that P(X) = ∫ P(X|z) P(z) dz. We introduce P(z|X) as a key piece → sample values z likely to have produced X, not just the whole space of possibilities. But P(z|X) is unknown too! Variational Inference comes in to play its role: approximate P(z|X) with Q(z|X). Key idea behind the variational inference application: find an approximation function that is good enough to represent the real one → optimization problem. 35
  36. 36. Variational Auto-Encoder Neural network perspective The approximated function Q(z|X) starts to shape up as a neural encoder, going from training datapoints x to the likely z points following Q(z|X), which in turn is similar to the real P(z|X). 36 Credit: What is a variational autoencoder? (Altosaar 2017)
  37. 37. Variational Auto-Encoder Neural network perspective The (latent → data) mapping starts to shape up as a neural decoder, where we go from our sampled z to the reconstruction, which can have a very complex distribution. 37 Credit: Altosaar
  38. 38. Variational Auto-Encoder Continuing with the encoder approximation Q(z|X), we compute the KL divergence with the true posterior P(z|X): D_KL[Q(z|X) ‖ P(z|X)] = E_{z∼Q}[log Q(z|X) − log P(z|X)] 38 KL divergence Credit: Kristiadi
  39. 39. Variational Auto-Encoder Continuing with the encoder approximation Q(z|X), we compute the KL divergence with the true posterior P(z|X): D_KL[Q(z|X) ‖ P(z|X)] = E_{z∼Q}[log Q(z|X) − log P(X|z) − log P(z) + log P(X)] Bayes rule: P(z|X) = P(X|z) P(z) / P(X) 39 Variational autoencoder: Intuition and implementation (Kristiadi 2017)
  40. 40. Variational Auto-Encoder Continuing with the encoder approximation Q(z|X), we compute the KL divergence with the true posterior P(z|X): D_KL[Q(z|X) ‖ P(z|X)] = E_{z∼Q}[log Q(z|X) − log P(X|z) − log P(z)] + log P(X) log P(X) gets out of the expectation since it has no dependency on z. 40 Credit: Kristiadi
  41. 41. Variational Auto-Encoder Continuing with the encoder approximation Q(z|X), we compute the KL divergence with the true posterior P(z|X): rearranging, D_KL[Q(z|X) ‖ P(z|X)] − log P(X) = E_{z∼Q}[log Q(z|X) − log P(X|z) − log P(z)] 41 Credit: Kristiadi
  42. 42. Variational Auto-Encoder Continuing with the encoder approximation Q(z|X), we compute the KL divergence with the true posterior P(z|X): a bit more rearranging with sign and grouping leads us to a new KL term between Q(z|X) and P(z), thus the encoder approximate distribution and our prior: log P(X) − D_KL[Q(z|X) ‖ P(z|X)] = E_{z∼Q}[log P(X|z)] − D_KL[Q(z|X) ‖ P(z)] 42 Credit: Kristiadi
  43. 43. Variational Auto-Encoder We finally reach the Variational AutoEncoder objective function: log P(X) − D_KL[Q(z|X) ‖ P(z|X)] = E_{z∼Q}[log P(X|z)] − D_KL[Q(z|X) ‖ P(z)] 43 Credit: Kristiadi
  44. 44. Variational Auto-Encoder We finally reach the Variational AutoEncoder objective function. log P(X): log likelihood of our data. 44 Credit: Kristiadi
  45. 45. Variational Auto-Encoder We finally reach the Variational AutoEncoder objective function. log P(X): log likelihood of our data. D_KL[Q(z|X) ‖ P(z|X)]: not computable and non-negative (KL) approximation error. 45 Credit: Kristiadi
  46. 46. Variational Auto-Encoder We finally reach the Variational AutoEncoder objective function. log P(X): log likelihood of our data. D_KL[Q(z|X) ‖ P(z|X)]: not computable and non-negative (KL) approximation error. 46 E_{z∼Q}[log P(X|z)]: reconstruction loss of our data given the latent space → NEURAL DECODER reconstruction loss! Credit: Kristiadi
  47. 47. Variational Auto-Encoder We finally reach the Variational AutoEncoder objective function. log P(X): log likelihood of our data. D_KL[Q(z|X) ‖ P(z|X)]: not computable and non-negative (KL) approximation error. E_{z∼Q}[log P(X|z)]: reconstruction loss of our data given the latent space → NEURAL DECODER reconstruction loss! D_KL[Q(z|X) ‖ P(z)]: regularization of our latent representation → NEURAL ENCODER projects over the prior. 47 Credit: Kristiadi
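For reference, the objective that the last few slides build up term by term can be written as a single display equation (standard notation, following the Kingma & Welling / Kristiadi derivations cited in the references):

```latex
\log P(X) - D_{KL}\!\left[ Q(z \mid X) \,\|\, P(z \mid X) \right]
  = \underbrace{\mathbb{E}_{z \sim Q(z \mid X)}\!\left[ \log P(X \mid z) \right]}_{\text{decoder: reconstruction term}}
  - \underbrace{D_{KL}\!\left[ Q(z \mid X) \,\|\, P(z) \right]}_{\text{encoder: regularization toward the prior}}
```

Since the leftover KL term on the left-hand side is non-negative, the right-hand side is a lower bound on log P(X) (the ELBO), and that is what the VAE maximizes.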
  48. 48. Variational Auto-Encoder Now, we have to define the shape of Q(z|X) to compute its divergence against the prior (i.e. to properly condense x samples over the surface of z). Simplest way: distribute Q(z|X) as a normal distribution with predicted moments μ(X) and σ(X). This allows us to compute the KL-div with the prior N(0, I) in closed form! 48 Credit: Kristiadi
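A minimal PyTorch sketch of that closed-form KL term, assuming the encoder predicts the mean and log-variance of Q(z|X); the function and variable names are illustrative:

```python
import torch

def kl_to_std_normal(mu, logvar):
    """Closed-form KL( N(mu, diag(sigma^2)) || N(0, I) ), summed over latent dims:
    D_KL = -0.5 * sum(1 + log(sigma^2) - mu^2 - sigma^2)."""
    return -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp(), dim=1)

# Sanity check: if Q(z|X) already equals the prior N(0, I), the KL term is zero.
mu, logvar = torch.zeros(4, 20), torch.zeros(4, 20)
print(kl_to_std_normal(mu, logvar))  # tensor([0., 0., 0., 0.])
```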
  49. 49. Variational Auto-Encoder z Encode Decode We can compose our encoder - decoder setup, and place our VAE losses to regularize and reconstruct. 49
  50. 50. Variational Auto-Encoder z Encode Decode Reparameterization trick But WAIT, how can we backprop through the sampling of z ~ N(μ, σ²)? Sampling is not differentiable! 50
  51. 51. Variational Auto-Encoder z Reparameterization trick Sample ε ~ N(0, I) and operate with it, multiplying by σ and summing μ: z = μ + σ · ε 51
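A hedged PyTorch sketch of the reparameterization trick, assuming the encoder outputs the mean and log-variance of Q(z|X) (names are illustrative):

```python
import torch

def reparameterize(mu, logvar):
    """z = mu + sigma * eps, with eps ~ N(0, I).
    The randomness lives entirely in eps, so gradients can flow back
    through mu and sigma (and therefore through the encoder)."""
    std = torch.exp(0.5 * logvar)   # sigma = exp(0.5 * log(sigma^2))
    eps = torch.randn_like(std)     # non-differentiable sampling, detached from parameters
    return mu + eps * std
```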
  52. 52. Variational Auto-Encoder z Generative behavior Q: How can we now generate new samples once the underlying generating distribution is learned? 52
  53. 53. Variational Auto-Encoder z1 Generative behavior Q: How can we now generate new samples once the underlying generating distribution is learned? A: We can sample from our prior, for example, discarding the encoder path. z2 z3 53
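A small sketch of that generative behavior: discard the encoder, sample z from the prior, and run it through a trained decoder (the decoder defined below is only a stand-in for illustration):

```python
import torch
import torch.nn as nn

# Stand-in for an already-trained decoder: maps a 20-dim z to a 784-dim image.
decoder = nn.Sequential(nn.Linear(20, 400), nn.ReLU(),
                        nn.Linear(400, 784), nn.Sigmoid())

with torch.no_grad():
    z = torch.randn(16, 20)   # sample z ~ N(0, I); no encoder involved
    x_new = decoder(z)        # 16 brand-new samples
print(x_new.shape)            # torch.Size([16, 784])
```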
  54. 54. 54 Variational Auto-Encoder Walking around z manifold dimensions gives us spontaneous generation of samples with different shapes, poses, identities, lighting, etc. Examples: MNIST manifold: https://youtu.be/hgyB8RegAlQ Face manifold: https://www.youtube.com/watch?v=XNZIN7Jh3Sg
  55. 55. 55 Variational Auto-Encoder Walking around z manifold dimensions gives us spontaneous generation of samples with different shapes, poses, identities, lighting, etc. Example with MNIST manifold
  56. 56. 56 Variational Auto-Encoder Walking around z manifold dimensions gives us spontaneous generation of samples with different shapes, poses, identities, lighting, etc. Example with Faces manifold
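One simple way to "walk" the z manifold is to linearly interpolate between two latent codes and decode each intermediate point; a sketch under the same assumption of an already-trained decoder (the stub below is only a placeholder):

```python
import torch
import torch.nn as nn

decoder = nn.Sequential(nn.Linear(20, 784), nn.Sigmoid())  # placeholder for a trained decoder

z_a, z_b = torch.randn(20), torch.randn(20)  # two points on the z manifold
frames = []
with torch.no_grad():
    for alpha in torch.linspace(0, 1, steps=8):
        z = (1 - alpha) * z_a + alpha * z_b  # straight-line walk in latent space
        frames.append(decoder(z))            # each step decodes to a new sample
```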
  57. 57. Variational Auto-Encoder Code show with PyTorch on VAEs! https://github.com/pytorch/examples/tree/master/vae 57
  58. 58. Variational Auto-Encoder 58 Loss Model MNIST: binary reconstruction case (BCE loss)
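A sketch of that loss for the binary MNIST case, following the structure of the linked pytorch/examples script (summed BCE reconstruction plus the closed-form KL term); treat it as an approximation of that code rather than a verbatim copy:

```python
import torch
import torch.nn.functional as F

def vae_loss(recon_x, x, mu, logvar):
    """Binary-reconstruction VAE loss = BCE reconstruction term + KL regularizer."""
    bce = F.binary_cross_entropy(recon_x, x.view(-1, 784), reduction='sum')
    kld = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return bce + kld
```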
  59. 59. GANs are coming in DGMs II ... The GAN epidemic
  60. 60. Thanks! Questions?
  61. 61. References 61 ● NIPS 2016 Tutorial: Generative Adversarial Networks (Goodfellow 2016) ● Auto-Encoding Variational Bayes (Kingma & Welling 2013) ● https://wiseodd.github.io/techblog/2016/12/10/variational-autoencoder/ ● https://jaan.io/what-is-variational-autoencoder-vae-tutorial/ ● Tutorial on Variational Autoencoders (Doersch 2016)
