This document summarizes the key ideas of auto-encoding variational Bayes. It discusses representation learning using latent variables to model high-dimensional sparse data on low-dimensional manifolds. It then explains generative modeling and the challenge of directly estimating complex data generating distributions. Finally, it introduces variational autoencoders as a way to approximate intractable posterior distributions over latent variables using variational inference and maximize a tractable evidence lower bound objective using the reparameterization trick, allowing end-to-end training of the encoder and decoder networks.
Slide 4: Representation Learning
Representation & manifold hypothesis
X: location of a car (as in satellite navigation)
• 3-D representation: X = (latitude, longitude, altitude)
• 1-D representation: X = (distance from the datum along the road)
Slide 6: Representation Learning
Latent variable: a hidden variable that is not measured directly, but has a significant impact on the variation of the data points.
Manifold learning: learning the non-linear subspace, dense with data points, that is built by hidden factors of variation (a lower-dimensional, dense space).
Slide 7: Representation Learning
Manifold hypothesis in supervised learning
Goal: learn a function that maps input x to output y.
The behavior of the intermediate layers (a sketch of this visualization follows below):
• All features are projected down to two dimensions (for visualization).
• The classes become increasingly linearly separable.
• Layers sequentially “straighten” the data manifold.
https://deeplearning.cs.cmu.edu/F20/document/slides/lec17.representations.pdf
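To make the visualization concrete, here is a minimal sketch; the model, the random stand-in data, and the use of PCA for the 2-D projection are illustrative assumptions, not the CMU lecture's setup. It collects the activations after each hidden layer and projects them to two dimensions.

```python
# Minimal sketch: project each intermediate layer's features down to 2-D.
# Model, data, and PCA projection are illustrative assumptions.
import torch
import torch.nn as nn
from sklearn.decomposition import PCA

model = nn.Sequential(
    nn.Linear(784, 256), nn.ReLU(),
    nn.Linear(256, 64), nn.ReLU(),
    nn.Linear(64, 10),
)

x = torch.randn(512, 784)      # stand-in for a batch of inputs
features, h = [], x
for layer in model:
    h = layer(h)
    if isinstance(layer, nn.ReLU):
        features.append(h.detach())  # activations after each hidden layer

for i, f in enumerate(features):
    f2d = PCA(n_components=2).fit_transform(f.numpy())  # 2-D view of the manifold
    print(f"hidden layer {i}: projected to shape {f2d.shape}")
```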
Slide 8: Representation Learning
Manifold hypothesis in unsupervised learning
Goal: learn some underlying hidden structure of the data (latent variables can be used for this; a linear-manifold sketch follows below).
(Figure: a linear manifold.)
http://cs231n.stanford.edu/slides/2021/lecture_12.pdf
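As a concrete, minimal example of a linear manifold (the synthetic data and PCA-via-SVD are illustrative assumptions): a single hidden factor of variation embedded linearly in three dimensions can be recovered from the dominant singular direction.

```python
# Minimal sketch: PCA (via SVD) recovers a linear manifold, i.e. the
# low-dimensional linear subspace on which the data concentrates.
# The synthetic data below is an illustrative assumption.
import numpy as np

rng = np.random.default_rng(0)
z = rng.normal(size=(1000, 1))                      # 1-D hidden factor of variation
basis = np.array([[2.0, -1.0, 0.5]])                # embed it linearly in 3-D
x = z @ basis + 0.05 * rng.normal(size=(1000, 3))   # noisy 3-D observations

x_centered = x - x.mean(axis=0)
_, s, vt = np.linalg.svd(x_centered, full_matrices=False)
print("singular values:", np.round(s, 1))           # one dominant value => ~1-D manifold
z_hat = x_centered @ vt[0]                          # recovered 1-D representation
```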
Slide 10: Generative Modeling
Discriminative model
• Training: conditional probability estimation, P_θ(y|x), or learning a direct map, y = f_θ(x)
• Use in prediction: f : x → y

Generative model
• Training: density estimation, P_θ(x), or P_θ(x, y) or P_θ(x|y)
• Use in data generation: g : seed → x, or g : (seed, y) → x
Slide 11: Generative Modeling
Data generating distribution P_data(x): the process by which natural images occur, according to a probability distribution.
Generative model: we want to learn P_model(x; θ) that is similar to P_data(x).
(Machine Learning, Ilseok Oh, lecture slide)
* image from Fei-Fei Li, Justin Johnson, Serena Yeung, cs231n Stanford
Slide 13: Generative Modeling
We want to learn P_model(x; θ) similar to P_data(x).
Can we estimate it directly via argmax_θ P_model(x; θ)? This is very challenging:
• Intractable
• Requires strong constraints
Latent variable (generative) model: learn a mapping from some latent variable z to a complicated distribution on x (a minimal decoder sketch follows below).
* slide from Aaron Courville, IFT6266 Hiver 2017
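A minimal sketch of such a mapping; the architecture, dimensions, and Bernoulli likelihood are illustrative assumptions, not the paper's exact model. A small network turns a simple prior over z into a complicated distribution over x.

```python
# Minimal sketch: a decoder maps a latent z to the parameters of P(x|z).
# Architecture, dimensions, and the Bernoulli likelihood are illustrative assumptions.
import torch
import torch.nn as nn

class Decoder(nn.Module):
    def __init__(self, z_dim=2, x_dim=784, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(z_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, x_dim), nn.Sigmoid(),  # pixel-wise Bernoulli means
        )

    def forward(self, z):
        return self.net(z)  # parameters of P(x|z)

# Ancestral sampling: z ~ P(z) = N(0, I), then x ~ P(x|z).
decoder = Decoder()
z = torch.randn(16, 2)       # sample latent codes from a simple prior
x_mean = decoder(z)          # complicated distribution on x induced by the prior
x = torch.bernoulli(x_mean)  # draw binary samples
```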
Slide 15: Variational Auto-encoder
Story so far
• The data we observe in the real world is very high-dimensional and sparse.
• A low-dimensional, high-density nonlinear manifold exists in the space where the observed data are defined.
• There is a latent variable describing the manifold, and it is very closely related to the variation of the observed data x.
• We want a model that generates data similar to the observed data x.
• To do that, we need to estimate the distribution of the data, P(x).
• However, direct estimation of P(x) is challenging.
• Instead, let's model a conditional distribution P(x|z) using the latent variable z (the marginal this induces is written out below).
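Written out (a standard identity, not stated explicitly on the slide), the latent-variable model defines P(x) as a mixture over z; this integral is exactly the quantity that is hard to estimate directly:

```latex
p_\theta(\mathbf{x}) = \int p_\theta(\mathbf{x} \mid \mathbf{z})\, p(\mathbf{z})\, d\mathbf{z}
```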
Slide 16: Variational Auto-encoder
Problem
• Where does z come from?
• How can z be defined and obtained?
Since z is literally a latent variable, it is very difficult to define manually and impossible to measure directly.
* image from cs236, Stanford 2019f - Deep Generative Models, lecture 5
Slide 17: Variational Auto-encoder
Overview of the data generating process: distributional assumptions
• x_i ~ P(x|z): assume a familiar distribution.
• z_i ~ P(z): assume a familiar distribution. This sample can be used directly, but the performance is not good.
• z_i ~ P(z|x): learn the distribution of the latent variable z that is well explained by x, and sample z from that distribution.
Still, there is a problem: P(z|x) = P(x|z)P(z)/P(x) is intractable (see below).
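Spelling out why (combining Bayes' rule from this slide with the marginal from slide 15): the posterior's denominator is an integral over all of z, which cannot be computed for an expressive decoder.

```latex
p_\theta(\mathbf{z} \mid \mathbf{x})
  = \frac{p_\theta(\mathbf{x} \mid \mathbf{z})\, p(\mathbf{z})}{p_\theta(\mathbf{x})},
\qquad
p_\theta(\mathbf{x}) = \int p_\theta(\mathbf{x} \mid \mathbf{z})\, p(\mathbf{z})\, d\mathbf{z}
  \;\;\text{(intractable)}
```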
Slide 18: Variational Auto-encoder
Variational inference
p_θ(z|x) ≈ q_φ(z|x)
(intractable ≈ tractable, familiar)
A general family of methods for approximating complicated densities with a simpler class of densities.
* slide from Shakir Mohamed (Google DeepMind), Imperial College London, 2015
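This approximation yields the evidence lower bound (ELBO) mentioned in the summary above; the derivation below is the standard one, not shown on this slide. Since the KL term to the true posterior is non-negative, maximizing the ELBO both tightens the bound on log p_θ(x) and pulls q_φ(z|x) toward p_θ(z|x).

```latex
\log p_\theta(\mathbf{x})
  = \underbrace{\mathbb{E}_{q_\phi(\mathbf{z}\mid\mathbf{x})}\!\big[\log p_\theta(\mathbf{x}\mid\mathbf{z})\big]
    - D_{\mathrm{KL}}\!\big(q_\phi(\mathbf{z}\mid\mathbf{x})\,\|\,p(\mathbf{z})\big)}_{\text{ELBO (tractable)}}
  + \underbrace{D_{\mathrm{KL}}\!\big(q_\phi(\mathbf{z}\mid\mathbf{x})\,\|\,p_\theta(\mathbf{z}\mid\mathbf{x})\big)}_{\ge\, 0}
```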
Slide 24: Variational Auto-encoder
End-to-end learning (a minimal training sketch follows below)
Pros
• Interpretable latent space
• Allows inference of q(z|x), which can be a useful feature representation for other tasks
Cons
• Only approximately optimal (it maximizes a lower bound)
• Samples are blurrier
* slide from Aaron Courville, IFT6266 Hiver 2017
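Putting the pieces together, here is a minimal end-to-end training sketch; the architecture, dimensions, and hyperparameters are illustrative assumptions, not the paper's exact setup. The encoder parameterizes q_φ(z|x), the reparameterization trick z = μ + σ·ε makes the sample differentiable, and the loss is the negative ELBO.

```python
# Minimal end-to-end VAE sketch in PyTorch. Architecture, dimensions, and
# hyperparameters are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class VAE(nn.Module):
    def __init__(self, x_dim=784, z_dim=2, hidden=256):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(x_dim, hidden), nn.ReLU())
        self.mu = nn.Linear(hidden, z_dim)       # mean of q_phi(z|x)
        self.logvar = nn.Linear(hidden, z_dim)   # log-variance of q_phi(z|x)
        self.dec = nn.Sequential(
            nn.Linear(z_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, x_dim),            # Bernoulli logits for p_theta(x|z)
        )

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        eps = torch.randn_like(mu)               # reparameterization trick:
        z = mu + eps * torch.exp(0.5 * logvar)   # z = mu + sigma * eps, eps ~ N(0, I)
        return self.dec(z), mu, logvar

def neg_elbo(x, logits, mu, logvar):
    # negative ELBO = reconstruction term + KL(q_phi(z|x) || N(0, I))
    recon = F.binary_cross_entropy_with_logits(logits, x, reduction="sum")
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl

model = VAE()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.rand(128, 784)  # stand-in batch; real data would be e.g. binarized MNIST
for step in range(3):
    logits, mu, logvar = model(x)
    loss = neg_elbo(x, logits, mu, logvar)
    opt.zero_grad()
    loss.backward()
    opt.step()
    print(f"step {step}: negative ELBO = {loss.item():.1f}")
```

Sampling z directly from q_φ(z|x) would block gradients to the encoder; rewriting the sample as a deterministic function of (μ, σ) plus independent noise ε is what allows backpropagation to train the encoder and decoder networks end to end.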
Slide 27: Main Reference
Paper
• Auto-Encoding Variational Bayes, Diederik P. Kingma, Max Welling, 2013. [link]
• Tutorial on Variational Autoencoders, Carl Doersch, 2016. [link]
• NIPS 2016 Tutorial: Generative Adversarial Networks, Ian Goodfellow, 2016. [link]
Slide
• cs231n lecture slides, Stanford, 2021s. [link]
• cs236 lecture slides, Stanford, 2019f. [link]
• IFT6266-H2017, University of Montreal. [link]
Book
• Deep Learning, Ian Goodfellow et al., 2016. [e-book]
• Machine Learning, Ilseok Oh, 2018.
Etc.
• Tutorial - What is a variational autoencoder? [link]
• Everything about the autoencoder. [video]
Speaker notes

The position of a car is not randomly distributed in three-dimensional space; it is distributed along a one-dimensional nonlinear space, the road. Most cars are on the road, though occasionally one strays onto the shoulder. No car floats in mid-air; with very low probability one might be caught flying in a typhoon.

Manifold hypothesis: samples are not randomly distributed in the d-dimensional space in which the raw data is represented, but lie in a space of much lower dimension.

Next, let's look at image data concretely. That count is the number of dimensions; since each pixel can actually take values from 0 to 255, the number of representable images is 256^(that number of dimensions).

Latent variable: a variable latent inside the data, rather than a variable or feature defined artificially by a human. Manifold learning can be seen as a representative example of representation learning. (The slide image is actually an animated GIF.)

Dimensionality reduction, clustering, density estimation, and so on. (Density estimation may seem unrelated to manifolds, but it is often performed using some useful latent variable, and that useful latent variable forms the basis of the manifold.)

The x → y relationship of an ordinary discriminative model corresponds here to z → x in the latent-variable generative model. For the distribution x|z, z is given and the prediction target is real-world data, so it can be assumed to be an easy-to-handle distribution. The marginal distribution of z, P(z), can be assumed to be arbitrarily simple, and z can be sampled from p(z) and passed to the decoder; however, we would like to sample a z more closely related to x and pass that to the generative model. For the distribution z|x, x is given and the prediction target is a latent variable that cannot be observed in the real world, so it must be obtained as p(x|z)·p(z)/p(x), but the distribution p(x) cannot be computed.