Autoencoders
NN club, March 21 2018
Jonathan Ronen
Agenda
● PCA and linear autoencoders
● Deep and nonlinear autoencoders
● Variational autoencoders
PCA for dimensionality reduction
● The u that maximizes the variance of PC1
● also minimizes the reconstruction error
○ Note: this is not the same as OLS, which minimizes vertical residuals rather than orthogonal distances (objectives spelled out below)
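In symbols (my notation; X is assumed centered and u constrained to unit norm):

```latex
u_1 = \arg\max_{\|u\| = 1} \operatorname{Var}(Xu)
    = \arg\min_{\|u\| = 1} \left\| X - X u u^\top \right\|_F^2
% OLS, by contrast, minimizes vertical residuals of y on X:
\hat\beta = \arg\min_\beta \| y - X \beta \|_2^2
```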
There are efficient solvers for this, but we could also use backpropagation
PCA through backpropagation
● Train a network to reconstruct its own input: minimize ‖x − W_dec W_enc x‖² over the weights
● This is an autoencoder
● If the neurons are linear, it is similar to PCA
○ Caveat: PCs are orthogonal, autoencoded components are not - but they will span the same space (sketch below)
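A minimal sketch of this in Keras (assuming flattened 28x28 MNIST inputs; layer sizes and optimizer are illustrative):

```python
# PCA by backpropagation: a linear autoencoder with a 2-unit bottleneck.
from keras.layers import Input, Dense
from keras.models import Model

inputs = Input(shape=(784,))
code = Dense(2, activation="linear")(inputs)     # linear encoder -> bottleneck of 2
outputs = Dense(784, activation="linear")(code)  # linear decoder
linear_ae = Model(inputs, outputs)
linear_ae.compile(optimizer="adam", loss="mse")  # minimize reconstruction error
# linear_ae.fit(x_train, x_train, epochs=20, batch_size=256)  # x_train: (n, 784)
```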
PCA vs linear autoencoders for MNIST
Autoencoders can be nonlinear
Nonlinear autoencoder with 32 hidden neurons
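In code, only the activations and the bottleneck width change; a sketch under the same assumptions as before:

```python
# Nonlinear autoencoder: 32 hidden neurons, ReLU encoder, sigmoid decoder.
from keras.layers import Input, Dense
from keras.models import Model

inputs = Input(shape=(784,))
code = Dense(32, activation="relu")(inputs)       # nonlinear encoder
outputs = Dense(784, activation="sigmoid")(code)  # decoder outputs pixels in [0, 1]
nonlinear_ae = Model(inputs, outputs)
nonlinear_ae.compile(optimizer="adam", loss="binary_crossentropy")
```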
Autoencoders can be deep
[Architecture diagram: stacked encoder/decoder layers; hidden layers use ReLU activations, output layers use sigmoid]
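A sketch of the deep version (the exact layer sizes from the talk are in the linked notebook; these are assumptions):

```python
# Deep autoencoder: ReLU hidden layers, bottleneck of 2, sigmoid output.
from keras.layers import Input, Dense
from keras.models import Model

inputs = Input(shape=(784,))
h = Dense(128, activation="relu")(inputs)
h = Dense(64, activation="relu")(h)
code = Dense(2, activation="relu")(h)          # bottleneck of 2
h = Dense(64, activation="relu")(code)
h = Dense(128, activation="relu")(h)
outputs = Dense(784, activation="sigmoid")(h)  # pixels scaled to [0, 1]

deep_ae = Model(inputs, outputs)
deep_ae.compile(optimizer="adam", loss="binary_crossentropy")
```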
Deep autoencoder (bottleneck of 2)
Guess which one is deep (has intermediate layers)?
Many variations of autoencoders
● Sparse autoencoders
● Denoising autoencoders
● Convolutional autoencoders
○ UNet is a sort of autoencoder
● And more…
● I’d like to introduce Variational Autoencoders
Variational autoencoders
Variational Bayesian Inference + autoencoders
[Diagram: latent variable z generates observation x]
Variational Inference (quick overview)
● A latent variable z generates the observation x
● We want the posterior p(z|x) = p(x|z) p(z) / p(x)
● The evidence p(x) = ∫ p(x|z) p(z) dz is intractable in general - problematic...
Variational Inference solution: approximate the intractable posterior p(z|x) with a simpler q(z|x), chosen to be a distribution we can work with (e.g. a Gaussian)
Side note on information theory
● Information
○ “How many bits do we need to represent event x if we optimized for p(x)?”
● Entropy
○ “What is the expected amount of information in each event drawn from p(x)?” (how many bits?)
● Cross-entropy
○ “What is the expected amount of information in p(x) if we optimized for q(x)?” (how many bits?)
● Kullback-Leibler divergence: “cross-entropy - entropy”
○ “How many more bits will we need to represent events from p(x) if we optimized for q(x)?”
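The same quantities in symbols (standard definitions, base-2 logarithms for bits):

```latex
I(x) = -\log_2 p(x)                            % information content of event x
H(p) = -\textstyle\sum_x p(x) \log_2 p(x)      % entropy
H(p, q) = -\textstyle\sum_x p(x) \log_2 q(x)   % cross-entropy
D_{KL}(p \,\|\, q) = H(p, q) - H(p)
                   = \textstyle\sum_x p(x) \log_2 \tfrac{p(x)}{q(x)}
```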
Variational Inference (quick overview)
skipping the math...
Maximizing the Evidence Lower Bound (ELBO)
Variational inference is a family of methods for maximizing the ELBO
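The identity behind the bound (the math skipped on the slide): the evidence decomposes into the ELBO plus a nonnegative KL term, so maximizing the ELBO tightens a lower bound on log p(x):

```latex
\log p(x) = \underbrace{\mathbb{E}_{q(z|x)}\!\left[\log p(x|z)\right]
            - D_{KL}\!\left(q(z|x) \,\|\, p(z)\right)}_{\text{ELBO}}
          + \underbrace{D_{KL}\!\left(q(z|x) \,\|\, p(z|x)\right)}_{\ge 0}
\quad\Rightarrow\quad \log p(x) \ge \text{ELBO}
```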
How does it fit in with autoencoders?
What if autoencoders were probabilistic?
[Figure: a regular autoencoder encodes x to a single point z; a variational autoencoder encodes x to a distribution q(z|x), from which z is sampled]
Variational Autoencoder loss - negative ELBO
loss = −E_{q(z|x)}[log p(x|z)] + D_KL(q(z|x) ‖ p(z))
(reconstruction error) (divergence from prior)
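A sketch of this loss for a Gaussian encoder and Bernoulli decoder, in the style of the Keras blog post linked below (z_mean, z_log_var, and original_dim are assumed names):

```python
# Negative ELBO for q(z|x) = N(mu, sigma^2) and a Bernoulli decoder p(x|z).
from keras import backend as K
from keras.losses import binary_crossentropy

def vae_loss(x, x_decoded, z_mean, z_log_var, original_dim=784):
    # Reconstruction error: expected negative log-likelihood E_q[-log p(x|z)]
    reconstruction = original_dim * binary_crossentropy(x, x_decoded)
    # Divergence from prior: KL(N(mu, sigma^2) || N(0, 1)) in closed form
    kl = -0.5 * K.sum(1 + z_log_var - K.square(z_mean) - K.exp(z_log_var), axis=-1)
    return K.mean(reconstruction + kl)
```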
Backpropagation through VAEs
● Problem: the network contains a sampling step, z ~ N(μ, σ²), and gradients cannot flow through a random draw
Backpropagation through VAEs - reparameterizing
● Solution: rewrite the sample as z = μ + σ · ε with ε ~ N(0, I), so the randomness sits outside the path from the weights to the loss (sketch below)
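A sketch of the trick as a Keras layer (names are assumptions):

```python
# Reparameterization trick: z = mu + sigma * eps, with eps ~ N(0, I) drawn
# outside the learned parameters, so gradients flow through mu and sigma.
from keras import backend as K
from keras.layers import Lambda

def sampling(args):
    z_mean, z_log_var = args
    eps = K.random_normal(shape=K.shape(z_mean))  # noise, not backpropagated into
    return z_mean + K.exp(0.5 * z_log_var) * eps  # exp(0.5 * log var) = sigma

# z = Lambda(sampling)([z_mean, z_log_var])
```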
VAE 2d embedding
VAEs are a generative model
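Since the prior is N(0, I), new digits come from decoding prior samples; a sketch, where `decoder` stands for a stand-alone decoder model like the one in the linked notebook:

```python
# Generate a new digit by decoding a draw from the prior.
import numpy as np

z_sample = np.random.normal(size=(1, 2))  # z ~ N(0, I) in the 2-d latent space
generated = decoder.predict(z_sample)     # decode into a synthetic digit
```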
Regular autoencoder as a generative model?
Jupyter Notebook with all analysis in this talk
https://nbviewer.jupyter.org/gist/jonathanronen/69902c1a97149ab4aae42e099d1d1367
Further reading
● Kingma & Welling, “Auto-Encoding Variational Bayes”: https://arxiv.org/abs/1312.6114
● https://www.youtube.com/watch?v=uaaqyVS9-rM
● https://www.jeremyjordan.me/variational-autoencoders/
● https://blog.keras.io/building-autoencoders-in-keras.html

Jonathan Ronen - Variational Autoencoders tutorial