[course site]
Santiago Pascual de la Puente
santi.pascual@upc.edu
PhD Candidate
Universitat Politecnica de Catalunya
Technical University of Catalonia
Deep Generative Models I
#DLUPC
Outline
● Introduction
● Motivating Applications
● Generative Models Taxonomy
● PixelRNN, PixelCNN & Wavenet
● Variational Auto-Encoders
2
Introduction
What we are used to with Neural Nets
4
(Figure: input → Network → output class; the output is the one-hot vector [0, 1, 0].)
Figure credit: Javier Ruiz
P(Y = [0,1,0] | X = [pixel1, pixel2, …, pixel784])
What is a generative model?
5
X = {x1, x2, …, xN}
What is a generative model?
6
X = {x1, x2, …, xN}
X = {x1, x2, …, xN} → P(X)
What is a generative model?
7
X = {x1, x2, …, xN}
Y = {y1, y2, …, yN}
DOG
CAT
TRUCK
PIZZA
THRILLER
SCI-FI
HISTORY
/aa/
/e/
/o/
8
We want our model with parameters θ to output samples distributed as Pmodel,
matching the distribution of our training data Pdata, i.e. P(X).
Example: y = f(x), where y is a scalar; we make Pmodel similar to Pdata by training
the parameters θ to maximize their similarity.
What is a generative model?
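As a sketch of this training criterion (assuming the standard maximum-likelihood setup; the slide only states it in words), we fit θ by:

\theta^{*} = \arg\max_{\theta} \prod_{i=1}^{N} P_{\text{model}}(x_i; \theta) = \arg\max_{\theta} \sum_{i=1}^{N} \log P_{\text{model}}(x_i; \theta)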
Key Idea: our model cares about what distribution generated the input data
points, and we want to mimic it with our probabilistic model. Our learned
model should be able to make up new samples from the distribution, not
just copy and paste existing samples!
9
What is a generative model?
Figure from NIPS 2016 Tutorial: Generative Adversarial Networks (I. Goodfellow)
Why Generative Models?
● Model very complex and high-dimensional distributions.
● Be able to generate realistic synthetic samples
○ possibly perform data augmentation
○ simulate possible futures for learning algorithms
● Fill the blanks in the data
● Manipulate real samples with the assistance of the generative model
○ Example: edit pictures with guidance
Motivating Applications
Image inpainting
12
Recover lost information/add enhancing details by learning the natural distribution of pixels.
original enhanced
Speech Enhancement
13
Recover lost information/add enhancing details by learning the natural distribution of audio samples.
original
enhanced
Speech Synthesis
14
Spontaneously generate new speech by learning its natural distribution over time.
Speech
synthesizer
Image Generation
15
Spontaneously generate new images by learning their spatial distribution.
Image
generator
Figure credit: I. Goodfellow
Super-resolution
16
Spontaneously generate new images by learning their spatial distribution.
Augment resolution by introducing plausible details
(Ledig et al. 2016)
Generative Models Taxonomy
18
Model the probability density function:
● Explicitly
○ With tractable density → PixelRNN, PixelCNN and
Wavenet
○ With approximate density → Variational
Auto-Encoders
● Implicitly
○ Generative Adversarial Networks
Taxonomy
PixelRNN, PixelCNN
& Wavenet
Factorizing the joint distribution
●
○
○
20
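A sketch of the factorization these slides refer to, following the PixelRNN paper cited in the references: the joint distribution over pixels is broken into a product of per-pixel conditionals,

P(X) = \prod_{i=1}^{n} P(x_i \mid x_1, \ldots, x_{i-1})

so each pixel is predicted from all previously generated pixels.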
Factorizing the joint distribution
●
○
21
Factorizing the joint distribution
●
○
■
22
PixelRNN
RNN/LSTM
(x, y)
At every pixel in a greyscale image:
256 classes (8 bits/pixel).
23
Recall RNN formulation:
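A sketch of the recurrence being recalled (standard RNN form; the exact notation on the original slide may differ):

h_i = \tanh(W_{hh} h_{i-1} + W_{xh} x_i), \qquad P(x_i \mid x_{<i}) = \mathrm{softmax}(W_{hy} h_i)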
Factorizing the joint distribution
●
○
■
■
24
Factorizing the joint distribution
●
○
■
■
A CNN?
Whut??
25
My CNN is going causal...
Let's say we have a sequence of 100-dimensional vectors describing words,
e.g. "hi there how are you doing ?", so sequence length = 8.
We arrange it in 2D as a (sequence length = 8, 100 dim) matrix,
and focus on 1D to exemplify causality.
26
27
We can apply a 1D convolutional activation over the 2D matrix: for an arbitrary kernel of width = 3,
each 1D convolutional kernel (K1) is a 2D matrix of size (3, 100),
sliding over the (sequence length = 8, 100 dim) input.
My CNN is going causal...
28
Keep in mind we are working with 100 dimensions, although here we depict just one for simplicity.
(Figure: a kernel [w1 w2 w3] slides over time-steps 1-8, producing outputs 1-6.)
The output length of the convolution is well known to be:
seqlength - kwidth + 1 = 8 - 3 + 1 = 6
So the output matrix will be (6, 100) because there was no padding.
My CNN is going causal...
29
My CNN is going causal...
When we add zero padding, we normally do so on both sides of the sequence (as in image padding).
(Figure: the sequence 1-8 padded with a zero on each side; the kernel [w1 w2 w3] slides to produce outputs 1-8.)
The output length of the convolution is well known to be:
seqlength - kwidth + 1 = 10 - 3 + 1 = 8
So the output matrix will be (8, 100) because we had padding.
30
Add the zero padding just on the left side of the sequence, not symmetrically.
(Figure: the sequence 1-8 padded with two zeros on the left; the kernel [w1 w2 w3] slides to produce outputs 1-8.)
The output length of the convolution is well known to be:
seqlength - kwidth + 1 = 10 - 3 + 1 = 8
So the output matrix will be (8, 100) because we had padding.
HOWEVER: now every time-step t depends on the two previous inputs as well as the current time-step → every output is causal.
Roughly: we make a causal convolution by padding the sequence on the left with (kwidth - 1) zeros.
My CNN went causal
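A minimal PyTorch sketch of this left-padding trick, using the toy (8, 100) example from the slides (tensor names are illustrative, not from the original deck):

```python
import torch
import torch.nn.functional as F

# Toy sequence from the slides: batch of 1, 100-dim word vectors, length 8.
x = torch.randn(1, 100, 8)               # (batch, dim, seq_len)
kwidth = 3
weight = torch.randn(100, 100, kwidth)   # 1D kernels as a (out_channels, in_channels, kwidth) tensor

x_causal = F.pad(x, (kwidth - 1, 0))     # pad (kwidth - 1) zeros on the left only
y = F.conv1d(x_causal, weight)           # output length = 10 - 3 + 1 = 8, every step is causal
print(y.shape)                           # torch.Size([1, 100, 8])
```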
PixelCNN/Wavenet
31
PixelCNN/Wavenet
receptive field has to be T
32
PixelCNN/Wavenet
receptive field has to be T
Dilated convolutions help
us reach a receptive field
sufficiently large to
emulate an RNN!
33
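A sketch of how dilation grows the receptive field, assuming a WaveNet-flavoured stack with kwidth = 2 and dilations (1, 2, 4, 8), which gives a receptive field of 1 + 1 + 2 + 4 + 8 = 16 steps (illustrative only, not the exact WaveNet block):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Stack of dilated causal 1D convolutions; each layer is left-padded so outputs stay causal.
class DilatedCausalStack(nn.Module):
    def __init__(self, channels=32, kwidth=2, dilations=(1, 2, 4, 8)):
        super().__init__()
        self.convs = nn.ModuleList(
            nn.Conv1d(channels, channels, kwidth, dilation=d) for d in dilations
        )
        self.kwidth = kwidth
        self.dilations = dilations

    def forward(self, x):                                   # x: (batch, channels, T)
        for conv, d in zip(self.convs, self.dilations):
            x = F.pad(x, ((self.kwidth - 1) * d, 0))        # causal left-padding only
            x = torch.relu(conv(x))
        return x                                            # same length T, receptive field of 16 steps

net = DilatedCausalStack()
print(net(torch.randn(1, 32, 100)).shape)                   # torch.Size([1, 32, 100])
```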
Conditional PixelCNN/Wavenet
●
○
34
Conditional PixelCNN/Wavenet
35
●
●
Conditional PixelCNN/Wavenet
●
●
○
36
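A sketch of the conditional factorization these slides refer to, following the Conditional PixelCNN / WaveNet papers in the references, where h is the conditioning signal (e.g. a class label or speaker identity):

P(X \mid h) = \prod_{i=1}^{n} P(x_i \mid x_1, \ldots, x_{i-1}, h)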
Variational Auto-Encoders
Auto-Encoder Neural Network
Autoencoders:
● Predict at the output the
same input data.
● Do not need labels
(Figure: x → Encode → c → Decode → x^)
38
Auto-Encoder Neural Network
● Q: What generative thing can we do with an AE? How can we make it
generate data?
c
Encode Decode
“Generate” 39
Auto-Encoder Neural Network
● Q: What generative thing can we do with an AE? How can we make it
generate data?
○ A: This “just” memorizes codes C from our training samples!
c
Encode Decode
“Generate”
memorization
40
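A minimal PyTorch sketch of such an auto-encoder (sizes are illustrative, e.g. 28x28 = 784-pixel images squeezed into a 20-dimensional code c; not taken from the original deck):

```python
import torch.nn as nn

# Minimal auto-encoder: predict the input at the output through a bottleneck code c.
class AutoEncoder(nn.Module):
    def __init__(self, x_dim=784, c_dim=20):
        super().__init__()
        self.encode = nn.Sequential(nn.Linear(x_dim, 400), nn.ReLU(), nn.Linear(400, c_dim))
        self.decode = nn.Sequential(nn.Linear(c_dim, 400), nn.ReLU(), nn.Linear(400, x_dim), nn.Sigmoid())

    def forward(self, x):
        c = self.encode(x)      # bottleneck code c
        return self.decode(c)   # x^: reconstruction of the same input

# Training simply minimizes a reconstruction loss, e.g. nn.MSELoss()(model(x), x); no labels needed.
```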
Variational Auto-Encoder
VAE intuitively:
● Introduce a restriction on z, such that our data points x (e.g. images)
are distributed in a latent space (manifold) following a specified
probability density function over z (normally N(0, I)).
z
Encode Decode
z ~ N(0, I)
41
VAE intuitively:
● Introduce a restriction on z, such that our data points x (e.g. images)
are distributed in a latent space (manifold) following a specified
probability density function over z (normally N(0, I)).
z
Encode Decode
z ~ N(0, I)
We can then sample z
to generate new x data
points (e.g. images).
Variational Auto-Encoder
42
Variational Auto-Encoder
● VAE, aka. where Bayesian theory and deep learning collide, in depth:
Fixed
parameters
Generate X,
N times
sample z,
N times
Deterministic
function
43
Credit:
Variational Auto-Encoder
● VAE, aka. where Bayesian theory and deep learning collide, in depth:
Fixed
parameters
Generate X,
N times
sample z,
N times
Deterministic
function
Maximize the probability of
each X under the generative
process.
Maximum likelihood framework
Variational Auto-Encoder
Intuition behind normally distributed z vectors: any output distribution can be achieved from the simple
N(0, I) with powerful non-linear mappings.
N(0, I)
45
Variational Auto-Encoder
Intuition behind normally distributed z vectors: any output distribution can be achieved from the simple
N(0, I) with powerful non-linear mappings.
Who’s the strongest non-linear mapper in the universe?
46
Variational Auto-Encoder
● VAE, aka. where Bayesian theory and deep learning collide, in depth:
Fixed
parameters
Generate X,
N times
sample z,
N times
Deterministic
function
Maximize the probability of
each X under the generative
process.
Maximum likelihood framework
Variational Auto-Encoder
Now, to solve the maximum likelihood problem, we'd like to know P(X) and the posterior P(z|X). We
introduce P(z|X) as a key piece → it tells us which values of z are likely to produce X, rather than
sampling the whole space of possibilities.
But P(z|X) is unknown too! Variational Inference comes into play: approximate P(z|X)
with Q(z|X).
Key Idea behind the variational inference application: find an approximation
function Q(z|X) that is good enough to represent the real one → optimization problem.
48
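In symbols (a reconstruction following the Kingma & Welling paper and the Kristiadis post cited in the references), the variational step picks the approximation closest to the true posterior:

Q^{*}(z \mid X) = \arg\min_{Q} \; D_{KL}\big(Q(z \mid X) \,\|\, P(z \mid X)\big)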
Variational Auto-Encoder
Neural network perspective
The approximated function starts to shape up as a neural encoder, going from training datapoints x to
the likely z points following Q(z|X), which in turn is similar to the real P(z|X).
49
Credit: Altosaar
Variational Auto-Encoder
Neural network perspective
The (latent → data) mapping P(X|z) starts to shape up as a neural decoder, where we go from our sampled z
to the reconstruction, which can have a very complex distribution.
50
Credit: Altosaar
Variational Auto-Encoder
Continuing with the encoder approximation Q(z|X), we compute the KL divergence with the true
distribution P(z|X):
51
KL divergence
Credit: Kristiadis
Variational Auto-Encoder
Continuing with the encoder approximation Q(z|X), we compute the KL divergence with the true
distribution P(z|X):
Bayes rule
52
Credit: Kristiadis
Variational Auto-Encoder
Continuing with the encoder approximation Q(z|X), we compute the KL divergence with the true
distribution P(z|X):
log P(X) gets out of the expectation since it has no
dependency on z.
53
Credit: Kristiadis
Variational Auto-Encoder
Continuing with the encoder approximation Q(z|X), we compute the KL divergence with the true
distribution P(z|X):
54
Credit: Kristiadis
Variational Auto-Encoder
Continuing with the encoder approximation Q(z|X), we compute the KL divergence with the true
distribution P(z|X):
A bit more rearranging of signs and grouping leads us to a new KL term between Q(z|X)
and P(z), i.e. the encoder's approximate distribution and our prior.
55
Credit: Kristiadis
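A sketch of the expansion the slides walk through (following the cited Kristiadis derivation; notation assumed):

D_{KL}\big(Q(z \mid X) \,\|\, P(z \mid X)\big) = \mathbb{E}_{z \sim Q}\big[\log Q(z \mid X) - \log P(X \mid z) - \log P(z)\big] + \log P(X)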
Variational Auto-Encoder
We finally reach the Variational AutoEncoder objective function.
56
Credit: Kristiadis
Variational Auto-Encoder
We finally reach the Variational AutoEncoder objective function.
log likelihood of our data
57
Credit: Kristiadis
Variational Auto-Encoder
We finally reach the Variational AutoEncoder objective function.
Not computable and non-negative (KL)
approximation error.
log likelihood of our data
58
Credit: Kristiadis
Variational Auto-Encoder
We finally reach the Variational AutoEncoder objective function.
log likelihood of our data
Not computable and non-negative (KL)
approximation error.
59
Reconstruction loss of our data
given latent space → NEURAL
DECODER reconstruction loss!
Credit: Kristiadis
Variational Auto-Encoder
We finally reach the Variational AutoEncoder objective function.
log likelihood of our data
Not computable and non-negative (KL)
approximation error.
Reconstruction loss of our data
given latent space → NEURAL
DECODER reconstruction loss!
Regularization of our latent
representation → NEURAL
ENCODER projects over prior.
60
Credit: Kristiadis
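Written out, the objective these annotations refer to (a reconstruction following the cited derivation) is:

\log P(X) - D_{KL}\big(Q(z \mid X) \,\|\, P(z \mid X)\big) = \mathbb{E}_{z \sim Q(z \mid X)}\big[\log P(X \mid z)\big] - D_{KL}\big(Q(z \mid X) \,\|\, P(z)\big)

The left-hand side is the data log likelihood minus the intractable approximation error; the right-hand side is the decoder reconstruction term minus the encoder regularizer, which is what we actually optimize.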
Variational Auto-Encoder
Now, we have to define the shape of Q(z|X) to compute its divergence against the prior (i.e. to properly
condense the x samples over the surface of z). Simplest way: make it a normal distribution with
moments μ(X) and Σ(X).
This allows us to compute the KL-div with N(0, I) in closed form :)
61
Credit: Kristiadis
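For a diagonal Gaussian Q(z|X) = N(μ(X), Σ(X)) and the prior N(0, I), that closed-form KL term is:

D_{KL}\big(\mathcal{N}(\mu(X), \Sigma(X)) \,\|\, \mathcal{N}(0, I)\big) = \tfrac{1}{2} \sum_{k} \big(\sigma_k^2(X) + \mu_k^2(X) - 1 - \log \sigma_k^2(X)\big)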
Variational Auto-Encoder
z
Encode Decode
We can compose our encoder-decoder setup and place our VAE losses to regularize and reconstruct.
62
Variational Auto-Encoder
z
Encode Decode
Reparameterization trick
But WAIT, how can we backprop through the sampling of z ~ Q(z|X)? Sampling is not differentiable!
63
Variational Auto-Encoder
z
Reparameterization trick
Sample ε ~ N(0, I) and operate with it, multiplying by σ(X) and summing μ(X).
64
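In symbols, the reparameterization trick replaces direct sampling of z with a deterministic transform of noise:

z = \mu(X) + \sigma(X) \odot \epsilon, \qquad \epsilon \sim \mathcal{N}(0, I)

Gradients then flow through μ(X) and σ(X), while the randomness is isolated in ε.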
Variational Auto-Encoder
z
Generative behavior
Q: How can we now generate new samples once the underlying generating distribution is learned?
65
Variational Auto-Encoder
z1
Generative behavior
Q: How can we now generate new samples once the underlying generating distribution is learned?
A: We can sample from our prior, for example, discarding the encoder path.
z2
z3
66
67
Variational Auto-Encoder
Walking around the z manifold dimensions gives us spontaneous generation of samples with different
shapes, poses, identities, lighting, etc.
Examples:
MNIST manifold: https://youtu.be/hgyB8RegAlQ
Face manifold: https://www.youtube.com/watch?v=XNZIN7Jh3Sg
68
Variational Auto-Encoder
Walking around the z manifold dimensions gives us spontaneous generation of samples with different
shapes, poses, identities, lighting, etc.
Example with MNIST manifold
69
Variational Auto-Encoder
Walking around the z manifold dimensions gives us spontaneous generation of samples with different
shapes, poses, identities, lighting, etc.
Example with Faces manifold
Variational Auto-Encoder
A code walkthrough of VAEs with PyTorch!
https://github.com/pytorch/examples/tree/master/vae
70
Variational Auto-Encoder
Model
Loss
71
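A sketch of the model and loss in the spirit of the linked pytorch/examples VAE (MNIST, 784 inputs, 20-dim z; layer sizes and names are illustrative):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VAE(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(784, 400)
        self.fc_mu = nn.Linear(400, 20)
        self.fc_logvar = nn.Linear(400, 20)
        self.fc3 = nn.Linear(20, 400)
        self.fc4 = nn.Linear(400, 784)

    def encode(self, x):
        h = F.relu(self.fc1(x))
        return self.fc_mu(h), self.fc_logvar(h)

    def reparameterize(self, mu, logvar):
        std = torch.exp(0.5 * logvar)
        eps = torch.randn_like(std)          # sample epsilon ~ N(0, I)
        return mu + eps * std                # z = mu + sigma * eps

    def decode(self, z):
        return torch.sigmoid(self.fc4(F.relu(self.fc3(z))))

    def forward(self, x):
        mu, logvar = self.encode(x.view(-1, 784))
        z = self.reparameterize(mu, logvar)
        return self.decode(z), mu, logvar

def vae_loss(recon_x, x, mu, logvar):
    # Decoder reconstruction term plus KL term against the N(0, I) prior (encoder regularizer).
    bce = F.binary_cross_entropy(recon_x, x.view(-1, 784), reduction='sum')
    kld = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return bce + kld
```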
Thanks! Questions?
References
73
● NIPS 2016 Tutorial: Generative Adversarial Networks (Goodfellow 2016)
● Pixel Recurrent Neural Networks (van den Oord et al. 2016)
● Conditional Image Generation with PixelCNN Decoders (van den Oord et al. 2016)
● Auto-Encoding Variational Bayes (Kingma & Welling 2013)
● https://wiseodd.github.io/techblog/2016/12/10/variational-autoencoder/
● https://jaan.io/what-is-variational-autoencoder-vae-tutorial/
● Tutorial on Variational Autoencoders (Doersch 2016)
