Autoencoders
NN club, March 21 2018
Jonathan Ronen
Agenda
● PCA and linear autoencoders
● Deep and nonlinear autoencoders
● Variational autoencoders
PCA for dimensionality reduction
● The u that maximizes the variance of PC1
● also minimizes the reconstruction error
○ Note: this is not the same as OLS, which minimizes vertical residuals rather than orthogonal distances (objectives spelled out below)
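In symbols (my notation; X is assumed centered and u constrained to unit norm):

```latex
u_1 = \arg\max_{\|u\| = 1} \operatorname{Var}(Xu)
    = \arg\min_{\|u\| = 1} \left\| X - X u u^\top \right\|_F^2
% OLS, by contrast, minimizes vertical residuals of y on X:
\hat\beta = \arg\min_\beta \| y - X \beta \|_2^2
```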
There are efficient solvers for this, but we could also use backpropagation
PCA through backpropagation
● Train a network to reconstruct its own input: minimize ‖x − W_dec W_enc x‖² over the weights
● This is an autoencoder
● If the neurons are linear, it is similar to PCA
○ Caveat: PCs are orthogonal, autoencoded components are not - but they will span the same space (sketch below)
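A minimal sketch of this in Keras (assuming flattened 28x28 MNIST inputs; layer sizes and optimizer are illustrative):

```python
# PCA by backpropagation: a linear autoencoder with a 2-unit bottleneck.
from keras.layers import Input, Dense
from keras.models import Model

inputs = Input(shape=(784,))
code = Dense(2, activation="linear")(inputs)     # linear encoder -> bottleneck of 2
outputs = Dense(784, activation="linear")(code)  # linear decoder
linear_ae = Model(inputs, outputs)
linear_ae.compile(optimizer="adam", loss="mse")  # minimize reconstruction error
# linear_ae.fit(x_train, x_train, epochs=20, batch_size=256)  # x_train: (n, 784)
```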
PCA vs linear autoencoders for MNIST
Autoencoders can be nonlinear
Nonlinear autoencoder with 32 hidden neurons
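In code, only the activations and the bottleneck width change; a sketch under the same assumptions as before:

```python
# Nonlinear autoencoder: 32 hidden neurons, ReLU encoder, sigmoid decoder.
from keras.layers import Input, Dense
from keras.models import Model

inputs = Input(shape=(784,))
code = Dense(32, activation="relu")(inputs)       # nonlinear encoder
outputs = Dense(784, activation="sigmoid")(code)  # decoder outputs pixels in [0, 1]
nonlinear_ae = Model(inputs, outputs)
nonlinear_ae.compile(optimizer="adam", loss="binary_crossentropy")
```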
Autoencoders can be deep
[Architecture diagram: stacked encoder/decoder layers; hidden layers use ReLU activations, output layers use sigmoid]
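A sketch of the deep version (the exact layer sizes from the talk are in the linked notebook; these are assumptions):

```python
# Deep autoencoder: ReLU hidden layers, bottleneck of 2, sigmoid output.
from keras.layers import Input, Dense
from keras.models import Model

inputs = Input(shape=(784,))
h = Dense(128, activation="relu")(inputs)
h = Dense(64, activation="relu")(h)
code = Dense(2, activation="relu")(h)          # bottleneck of 2
h = Dense(64, activation="relu")(code)
h = Dense(128, activation="relu")(h)
outputs = Dense(784, activation="sigmoid")(h)  # pixels scaled to [0, 1]

deep_ae = Model(inputs, outputs)
deep_ae.compile(optimizer="adam", loss="binary_crossentropy")
```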
Deep autoencoder (bottleneck of 2)
Guess which one is deep (has intermediate layers)?
Many variations of autoencoders
● Sparse autoencoders
● Denoising autoencoders
● Convolutional autoencoders
○ UNet is a sort of autoencoder
● And more…
● I’d like to introduce Variational Autoencoders
Variational autoencoders
Variational Bayesian Inference + autoencoders
[Diagram: latent variable z generates observation x]
Variational Inference (quick overview)
● A latent variable z generates the observation x
● We want the posterior p(z|x) = p(x|z) p(z) / p(x)
● The evidence p(x) = ∫ p(x|z) p(z) dz is intractable in general - problematic...
Variational Inference solution: approximate the intractable posterior p(z|x) with a simpler q(z|x), chosen to be a distribution we can work with (e.g. a Gaussian)
Side note on information theory
● Information
○ “How many bits do we need to represent event x if we optimized for p(x)?”
● Entropy
○ “What is the expected amount of information in each event drawn from p(x)?” (how many bits?)
● Cross-entropy
○ “What is the expected amount of information in p(x) if we optimized for q(x)?” (how many bits?)
● Kullback-Leibler divergence: “cross-entropy - entropy”
○ “How many more bits will we need to represent events from p(x) if we optimized for q(x)?”
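The same quantities in symbols (standard definitions, base-2 logarithms for bits):

```latex
I(x) = -\log_2 p(x)                            % information content of event x
H(p) = -\textstyle\sum_x p(x) \log_2 p(x)      % entropy
H(p, q) = -\textstyle\sum_x p(x) \log_2 q(x)   % cross-entropy
D_{KL}(p \,\|\, q) = H(p, q) - H(p)
                   = \textstyle\sum_x p(x) \log_2 \tfrac{p(x)}{q(x)}
```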
Variational Inference (quick overview)
skipping the math...
Maximizing the Evidence Lower Bound (ELBO)
Variational inference is a family of methods for maximizing the ELBO
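The identity behind the bound (the math skipped on the slide): the evidence decomposes into the ELBO plus a nonnegative KL term, so maximizing the ELBO tightens a lower bound on log p(x):

```latex
\log p(x) = \underbrace{\mathbb{E}_{q(z|x)}\!\left[\log p(x|z)\right]
            - D_{KL}\!\left(q(z|x) \,\|\, p(z)\right)}_{\text{ELBO}}
          + \underbrace{D_{KL}\!\left(q(z|x) \,\|\, p(z|x)\right)}_{\ge 0}
\quad\Rightarrow\quad \log p(x) \ge \text{ELBO}
```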
How does it fit in with autoencoders?
What if autoencoders were probabilistic?
[Figure: a regular autoencoder encodes x to a single point z; a variational autoencoder encodes x to a distribution q(z|x), from which z is sampled]
Variational Autoencoder loss - negative ELBO
loss = −E_{q(z|x)}[log p(x|z)] + D_KL(q(z|x) ‖ p(z))
(reconstruction error) (divergence from prior)
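A sketch of this loss for a Gaussian encoder and Bernoulli decoder, in the style of the Keras blog post linked below (z_mean, z_log_var, and original_dim are assumed names):

```python
# Negative ELBO for q(z|x) = N(mu, sigma^2) and a Bernoulli decoder p(x|z).
from keras import backend as K
from keras.losses import binary_crossentropy

def vae_loss(x, x_decoded, z_mean, z_log_var, original_dim=784):
    # Reconstruction error: expected negative log-likelihood E_q[-log p(x|z)]
    reconstruction = original_dim * binary_crossentropy(x, x_decoded)
    # Divergence from prior: KL(N(mu, sigma^2) || N(0, 1)) in closed form
    kl = -0.5 * K.sum(1 + z_log_var - K.square(z_mean) - K.exp(z_log_var), axis=-1)
    return K.mean(reconstruction + kl)
```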
Backpropagation through VAEs
● Problem: the network contains a sampling step, z ~ N(μ, σ²), and gradients cannot flow through a random draw
Backpropagation through VAEs - reparameterizing
● Solution: rewrite the sample as z = μ + σ · ε with ε ~ N(0, I), so the randomness sits outside the path from the weights to the loss (sketch below)
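A sketch of the trick as a Keras layer (names are assumptions):

```python
# Reparameterization trick: z = mu + sigma * eps, with eps ~ N(0, I) drawn
# outside the learned parameters, so gradients flow through mu and sigma.
from keras import backend as K
from keras.layers import Lambda

def sampling(args):
    z_mean, z_log_var = args
    eps = K.random_normal(shape=K.shape(z_mean))  # noise, not backpropagated into
    return z_mean + K.exp(0.5 * z_log_var) * eps  # exp(0.5 * log var) = sigma

# z = Lambda(sampling)([z_mean, z_log_var])
```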
VAE 2d embedding
VAEs are a generative model
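Since the prior is N(0, I), new digits come from decoding prior samples; a sketch, where `decoder` stands for a stand-alone decoder model like the one in the linked notebook:

```python
# Generate a new digit by decoding a draw from the prior.
import numpy as np

z_sample = np.random.normal(size=(1, 2))  # z ~ N(0, I) in the 2-d latent space
generated = decoder.predict(z_sample)     # decode into a synthetic digit
```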
Regular autoencoder as a generative model?
Jupyter Notebook with all analysis in this talk
https://nbviewer.jupyter.org/gist/jonathanronen/69902c1a97149ab4aae42e099d1d1367
Further reading
● Kingma & Welling, “Auto-Encoding Variational Bayes”: https://arxiv.org/abs/1312.6114
● https://www.youtube.com/watch?v=uaaqyVS9-rM
● https://www.jeremyjordan.me/variational-autoencoders/
● https://blog.keras.io/building-autoencoders-in-keras.html

Jonathan Ronen - Variational Autoencoders tutorial