Flow based generative models

NICE arXiv:1410.8516

RealNVP arXiv:1605.08803

Glow arXiv:1807.03039
박수철

Generative model
A generative model estimates the data distribution p(x) and draws samples from p(x).
[Figure: a dataset of real samples, and fake samples produced by sampling from the model]

Generative model
[Figure: bar chart of the true distribution over the categories white, black, and red; axes: category vs. likelihood; bar values 0.666, 0.333, and 0]

Generative model
[Figure: four sample sets (sample set1 to sample set4) compared against the true distribution, annotated "GOOD?" and "NOT BAD?"]

Generative model
For multidimensional data, there are correlations and dependencies among the individual points.

NICE: Non-linear Independent Components Estimation
Halid Z. Yerebakan, Bartek Rajwa, Murat Dundar. The Infinite Mixture of Infinite Gaussian Mixtures. (NIPS 2014)
How do we estimate a complex p(x)?

Pass the data through a suitable transform so that it lands in an independent Gaussian distribution.
https://www.periscope.tv/hugo_larochelle/1ypKdAVmbEpGW

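Suppose the heights of the students in a class have a mean of 175 cm and a standard deviation of 5 cm,
and follow a normal distribution.
The likelihood of a height of 175 cm is about 0.08.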

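Suppose the heights of the students in a class have a mean of 1750 mm and a standard deviation of 50 mm,
and follow a normal distribution.
The likelihood of a height of 1750 mm is about 0.008.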

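Both statements describe the same thing, yet the likelihoods differ. Why?
[Figure: the same normal pdf shaded over the interval 175 cm to 180 cm and over 1750 mm to 1800 mm]
The likelihood of a continuous distribution does not tell us the probability that a value occurs.
The area under the pdf over an interval gives the probability of that interval.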
$y = 10x, \quad \frac{\partial y}{\partial x} = 10$
$p_X(x) = p_Y(y) \cdot 10$
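The unit change above can be checked numerically. The following is a minimal sketch using only the standard library; the helper names `normal_pdf` and `prob_interval` are introduced here for illustration and are not from the slides.

```python
import math

def normal_pdf(x, mean, std):
    """Density of the normal distribution N(mean, std^2) at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def prob_interval(lo, hi, mean, std):
    """P(lo <= X <= hi), computed from the normal CDF via math.erf."""
    def cdf(v):
        return 0.5 * (1.0 + math.erf((v - mean) / (std * math.sqrt(2.0))))
    return cdf(hi) - cdf(lo)

# The same height expressed in two units (y = 10 * x).
p_cm = normal_pdf(175.0, mean=175.0, std=5.0)      # density per cm
p_mm = normal_pdf(1750.0, mean=1750.0, std=50.0)   # density per mm

# The same interval expressed in two units.
prob_cm = prob_interval(175.0, 180.0, mean=175.0, std=5.0)
prob_mm = prob_interval(1750.0, 1800.0, mean=1750.0, std=50.0)

print("density:", p_cm, p_mm, "ratio:", p_cm / p_mm)
print("interval probability:", prob_cm, prob_mm)
```

The densities differ by the factor 10 from $\partial y / \partial x$, while the interval probabilities agree, which is the point of the two height examples.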

The determinant of the Jacobian gives the ratio by which the transform scales a region.
https://en.wikipedia.org/wiki/Jacobian_matrix_and_determinant
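As a small illustration of this area-ratio reading, the sketch below maps the unit square through a linear transform and compares the area of the resulting parallelogram with the determinant. The matrix `A` is an arbitrary example chosen here, not something from the slides.

```python
import numpy as np

# Hypothetical linear transform; for a linear map the Jacobian is A itself.
A = np.array([[2.0, 1.0],
              [0.5, 3.0]])

# Corners of the unit square, listed counter-clockwise.
square = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
image = square @ A.T   # each corner mapped through A

def polygon_area(pts):
    """Area of a simple polygon from its vertices (shoelace formula)."""
    x, y = pts[:, 0], pts[:, 1]
    return 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))

area_ratio = polygon_area(image) / polygon_area(square)
print(area_ratio, abs(np.linalg.det(A)))
```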

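$p_X(x) = p_Z(f(x)) \left|\det\left(\frac{\partial f(x)}{\partial x^T}\right)\right|$
$p_X(x)$: likelihood in data space
$p_Z(f(x))$: likelihood in Z space
$\left|\det\left(\frac{\partial f(x)}{\partial x^T}\right)\right|$: ratio by which the transform scales a region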

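Instead of estimating $p_X(x)$ directly with a complicated combination of distributions,
we fix a simple $p_Z(z)$ and turn the problem into finding a complex $f(x)$ that satisfies
$p_X(x) = p_Z(f(x)) \left|\det\left(\frac{\partial f(x)}{\partial x^T}\right)\right|$
$p_X(x)$: what we want to know
$p_Z$: fixed as the prior
$x$: the data
$f$: what we are solving for
$\left|\det\left(\frac{\partial f(x)}{\partial x^T}\right)\right|$: the constraint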
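To make the change-of-variables formula concrete, here is a minimal sketch in which $f$ is a fixed affine whitening map rather than a learned network. The values of `mu` and `Sigma` are made up for the example; with a standard normal prior, the formula should reproduce the closed-form Gaussian density.

```python
import numpy as np

# Hypothetical data distribution N(mu, Sigma); values chosen for illustration.
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])
L = np.linalg.cholesky(Sigma)      # Sigma = L @ L.T
L_inv = np.linalg.inv(L)

def f(x):
    """Map data x to latent z = L^{-1} (x - mu)."""
    return L_inv @ (x - mu)

def log_p_z(z):
    """Log density of the standard normal prior N(0, I)."""
    return -0.5 * z @ z - 0.5 * z.size * np.log(2.0 * np.pi)

def log_p_x(x):
    """log p_X(x) = log p_Z(f(x)) + log|det(df/dx^T)|; here df/dx^T = L^{-1}."""
    _, logabsdet = np.linalg.slogdet(L_inv)
    return log_p_z(f(x)) + logabsdet

def log_gaussian(x):
    """Reference: closed-form log density of N(mu, Sigma)."""
    d = x - mu
    _, logdet_sigma = np.linalg.slogdet(Sigma)
    quad = d @ np.linalg.solve(Sigma, d)
    return -0.5 * quad - 0.5 * logdet_sigma - 0.5 * d.size * np.log(2.0 * np.pi)

x = np.array([0.3, -1.5])
print(log_p_x(x), log_gaussian(x))
```

In a flow model the fixed map `f` is replaced by a trainable invertible network, and the same two terms are summed to obtain the log-likelihood.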

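What happens without the det-of-Jacobian term?
[Figure: f(x) maps the data onto p(z)]
If we only maximize $p_Z(f(x))$, f pushes every x toward the mode of p(z), the determinant goes to 0, and the inverse transform becomes difficult.
This is like a reverse KL divergence with the entropy term left out.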
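The collapse can be seen in a one-parameter toy problem. The sketch below assumes 1-D data and a flow $f(x) = a x$ with a single scale $a$ searched over a grid; the data, the grid, and the variable names are assumptions made here for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=0.0, scale=2.0, size=10_000)   # hypothetical 1-D data

def log_p_z(z):
    """Log density of the standard normal prior."""
    return -0.5 * z**2 - 0.5 * np.log(2.0 * np.pi)

# Candidate scales a for the flow f(x) = a * x, whose Jacobian determinant is a.
scales = np.linspace(0.01, 2.0, 200)

# Objective without the Jacobian term: only log p_Z(f(x)).
without_det = np.array([log_p_z(a * x).mean() for a in scales])
# Full log-likelihood: log p_Z(f(x)) + log|det| = log p_Z(a x) + log a.
with_det = np.array([log_p_z(a * x).mean() + np.log(a) for a in scales])

best_without = scales[np.argmax(without_det)]
best_with = scales[np.argmax(with_det)]
print("without det term:", best_without)   # smallest scale on the grid
print("with det term:   ", best_with, "vs 1/rms(x) =", 1.0 / np.sqrt(np.mean(x**2)))
```

Without the determinant term the best scale is the smallest one available, so all data is squeezed toward $z = 0$; with it, the best scale is close to the reciprocal of the data's root-mean-square value.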

The problem with the det of the Jacobian
It takes a long time to compute.
Computing a determinant costs O(D^3).
Even MNIST has 28x28 = 784 dimensions, so 784^3 ≈ 4.8 × 10^8 operations.

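Solution: coupling layer
[Figure: x is split into $(x_a, x_b)$; $y_a$ copies $x_a$, and $y_b$ is $x_b$ plus $m(x_a)$]
$y_a = x_a$
$y_b = x_b + m(x_a)$
$\frac{\partial y}{\partial x} = \begin{bmatrix} I_d & 0 \\ \frac{\partial y_b}{\partial x_a} & \frac{\partial y_b}{\partial x_b} \end{bmatrix}$
All diagonal entries are 1, because $\frac{\partial y_b}{\partial x_b}$ is the identity.
The determinant of a triangular matrix is the product of its diagonal entries.
So the determinant is 1 and the log determinant is 0.
VOLUME PRESERVING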
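A minimal NumPy sketch of the additive coupling layer follows. The coupling function `m` here is a toy `tanh` of a random linear map standing in for the neural network used in NICE, and the dimensions `D` and `d` are arbitrary; the Jacobian is estimated numerically to check the triangular structure and the unit determinant.

```python
import numpy as np

rng = np.random.default_rng(0)
D, d = 6, 3                        # total dimension and size of x_a (hypothetical)
W = rng.normal(size=(D - d, d))    # weights of a toy coupling function

def m(xa):
    """Toy nonlinear coupling function; a stand-in for a neural network."""
    return np.tanh(W @ xa)

def forward(x):
    """y_a = x_a, y_b = x_b + m(x_a)."""
    xa, xb = x[:d], x[d:]
    return np.concatenate([xa, xb + m(xa)])

def inverse(y):
    """x_a = y_a, x_b = y_b - m(y_a); m itself never needs to be inverted."""
    ya, yb = y[:d], y[d:]
    return np.concatenate([ya, yb - m(ya)])

def numerical_jacobian(fn, x, eps=1e-6):
    """Central-difference Jacobian; column j holds d fn / d x_j."""
    cols = [(fn(x + eps * e) - fn(x - eps * e)) / (2.0 * eps) for e in np.eye(x.size)]
    return np.stack(cols, axis=1)

x = rng.normal(size=D)
J = numerical_jacobian(forward, x)
det_J = np.linalg.det(J)
print("det:", det_J)
print("round trip ok:", np.allclose(inverse(forward(x)), x))
```

Because the determinant is known to be 1 in advance, the O(D^3) determinant computation is never needed during training.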

Solution: coupling layer
[Figure: the additive coupling layer adds $m(x_{I_1})$ to $x_{I_2}$; a final scaling layer computes $h = \exp(s) \odot h^{(4)}$]

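How does a coupling layer extract independent components?
$(X_1, X_2) \sim N(0, I_2)$, $(Y_1, Y_2) = (X_1, X_2 + m(X_1))$
[Figure: scatter plots of $(Y_1, Y_2)$ for $m(X_1) = X_1$ and $m(X_1) = \sin(2X_1)$, and of the points recovered by subtracting $m(Y_1) = Y_1$ and $m(Y_1) = \sin(2Y_1)$]
The coupling layer uses a nonlinear function to model the component of the second variable that depends on the first, and subtracts it: $X_2 = Y_2 - m(Y_1)$.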
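The two cases on this slide can be reproduced with a short simulation. This is a sketch under the slide's setup; the sample size and the helper names `couple` and `uncouple` are choices made here.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50_000, 2))     # (X1, X2) ~ N(0, I_2)

def couple(X, m):
    """(Y1, Y2) = (X1, X2 + m(X1)): injects a dependency on the first variable."""
    return np.stack([X[:, 0], X[:, 1] + m(X[:, 0])], axis=1)

def uncouple(Y, m):
    """(X1, X2) = (Y1, Y2 - m(Y1)): removes that dependency again."""
    return np.stack([Y[:, 0], Y[:, 1] - m(Y[:, 0])], axis=1)

def linear(v):
    return v

def sine(v):
    return np.sin(2.0 * v)

Y_lin = couple(X, linear)
Y_sin = couple(X, sine)

corr_before = np.corrcoef(Y_lin.T)[0, 1]
corr_after = np.corrcoef(uncouple(Y_lin, linear).T)[0, 1]
print("linear case, correlation before/after:", corr_before, corr_after)
print("sine case recovered exactly:", np.allclose(uncouple(Y_sin, sine), X))
```

In the sine case the dependency is nonlinear, so correlation alone understates it; the exact recovery of X is the more telling check.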

Density Estimation Using real NVP
(real-valued non-volume preserving)
Adds a scale term

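Non-volume preserving
Determinant: $\exp\left(\sum_j s(x_a)_j\right)$ for the affine coupling $y_a = x_a$, $y_b = x_b \odot \exp(s(x_a)) + t(x_a)$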
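The sketch below shows an affine coupling layer in the form given in the Real NVP paper, $y_b = x_b \odot \exp(s(x_a)) + t(x_a)$, written with this deck's $x_a$, $x_b$ notation. The functions `s` and `t` are toy linear maps standing in for neural networks, and the dimensions are arbitrary. The log determinant is the sum of the outputs of `s`, so it is no longer zero.

```python
import numpy as np

rng = np.random.default_rng(0)
D, d = 6, 3                               # hypothetical dimensions
Ws = 0.5 * rng.normal(size=(D - d, d))    # toy weights for the scale function
Wt = rng.normal(size=(D - d, d))          # toy weights for the translation function

def s(xa):
    """Log-scale function; a stand-in for a neural network."""
    return np.tanh(Ws @ xa)

def t(xa):
    """Translation function; a stand-in for a neural network."""
    return Wt @ xa

def forward(x):
    """Returns y and log|det(dy/dx^T)| = sum_j s(x_a)_j."""
    xa, xb = x[:d], x[d:]
    log_scale = s(xa)
    y = np.concatenate([xa, xb * np.exp(log_scale) + t(xa)])
    return y, log_scale.sum()

def inverse(y):
    """x_b = (y_b - t(y_a)) * exp(-s(y_a)); s and t are never inverted."""
    ya, yb = y[:d], y[d:]
    return np.concatenate([ya, (yb - t(ya)) * np.exp(-s(ya))])

x = rng.normal(size=D)
y, logdet = forward(x)
print("log|det|:", logdet)
print("round trip ok:", np.allclose(inverse(y), x))
```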

Masking

Multi-scale architecture

Glow: Generative Flow with Invertible 1x1 Convolutions
High quality, high resolution
Why do the samples still look slightly artificial?

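The layers that make up Glow
Actnorm: a substitute for batch norm
Invertible 1x1 convolution: a substitute for the shuffle
Affine coupling layer: the one used in RealNVP
Invertible 1x1 convolution, LU-decomposed: the variant when LU decomposition is used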
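With the LU parameterization described in the Glow paper, $W = P L (U + \mathrm{diag}(s))$, the log determinant of the 1x1 convolution reduces to a sum over $\log|s|$. The sketch below builds such a $W$ from random factors and compares the cheap value with a full determinant; the channel count and the random factors are assumptions made here.

```python
import numpy as np

rng = np.random.default_rng(0)
c = 4                                                  # number of channels (hypothetical)

P = np.eye(c)[rng.permutation(c)]                      # fixed permutation matrix
L = np.tril(rng.normal(size=(c, c)), k=-1) + np.eye(c) # lower triangular, ones on the diagonal
U = np.triu(rng.normal(size=(c, c)), k=1)              # strictly upper triangular
s = rng.uniform(0.5, 1.5, size=c) * rng.choice([-1.0, 1.0], size=c)  # diagonal entries

W = P @ L @ (U + np.diag(s))                           # c x c kernel of the 1x1 convolution

logdet_cheap = np.sum(np.log(np.abs(s)))               # O(c)
_, logdet_full = np.linalg.slogdet(W)                  # O(c^3)
print(logdet_cheap, logdet_full)

# For an h x w x c activation the same W acts at every pixel,
# so the layer's total log determinant is h * w * sum(log|s|).
h, w = 8, 8
total_logdet = h * w * logdet_cheap
```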

What the invertible 1x1 convolution does
eigenvalue decomposition, ICA
"Note that a 1×1 convolution with equal number of input and output channels is
a generalization of a permutation operation." (Diederik P. Kingma. Glow. 2018)
Machine Learning. Theodoridis. Academic Press
DFT
en.wikipedia.org/wiki/DFT_matrix
scikit-learn.org/stable/auto_examples/decomposition/plot_ica_vs_pca.html
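The quoted statement can be checked directly: a channel shuffle is a 1x1 convolution whose kernel is a permutation matrix. This is a minimal sketch with an arbitrary tensor shape; `conv1x1` is a helper written here with `np.einsum`, not a library function.

```python
import numpy as np

rng = np.random.default_rng(0)
h, w, c = 2, 3, 4                         # hypothetical height, width, channels
x = rng.normal(size=(h, w, c))

def conv1x1(x, W):
    """Apply the c x c matrix W to the channel vector at every pixel."""
    return np.einsum('ij,hwj->hwi', W, x)

perm = rng.permutation(c)
W_perm = np.eye(c)[perm]                  # row i selects input channel perm[i]

shuffled = x[:, :, perm]                  # plain channel shuffle
via_conv = conv1x1(x, W_perm)             # the same shuffle as a 1x1 convolution
_, logabsdet = np.linalg.slogdet(W_perm)  # a permutation preserves volume

# Inverting the layer is a 1x1 convolution with the inverse matrix.
x_rec = conv1x1(via_conv, np.linalg.inv(W_perm))
print(np.allclose(shuffled, via_conv), logabsdet, np.allclose(x_rec, x))
```

Replacing `W_perm` with any invertible matrix gives the learned generalization used in Glow.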

Decoding applies the encoding steps in reverse order.
github.com/openai/glow/blob/master/model.py
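The ordering matters because the layers do not commute. The sketch below uses three toy invertible layers defined here for illustration (they are not the classes in the linked repository): encoding applies `forward` from first to last, and decoding applies `inverse` from last to first.

```python
import numpy as np

class Scale:
    """Elementwise scaling by exp(log_s)."""
    def __init__(self, log_s):
        self.log_s = np.asarray(log_s)
    def forward(self, x):
        return x * np.exp(self.log_s)
    def inverse(self, y):
        return y * np.exp(-self.log_s)

class Reverse:
    """Reverses the order of the components; it is its own inverse."""
    def forward(self, x):
        return x[::-1]
    def inverse(self, y):
        return y[::-1]

class Shift:
    """Elementwise translation by b."""
    def __init__(self, b):
        self.b = np.asarray(b)
    def forward(self, x):
        return x + self.b
    def inverse(self, y):
        return y - self.b

layers = [Scale([0.1, 0.2, 0.3]), Reverse(), Shift([1.0, 2.0, 3.0])]

def encode(x):
    for layer in layers:                 # first layer to last
        x = layer.forward(x)
    return x

def decode(z):
    for layer in reversed(layers):       # last layer to first
        z = layer.inverse(z)
    return z

def decode_wrong_order(z):
    for layer in layers:                 # inverses in encoding order: not an inverse
        z = layer.inverse(z)
    return z

x = np.array([1.0, -1.0, 2.0])
z = encode(x)
print(np.allclose(decode(z), x), np.allclose(decode_wrong_order(z), x))
```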

References
NICE: Non-linear Independent Components Estimation
Laurent Dinh, David Krueger, Yoshua Bengio. 2014.
Density estimation using Real NVP
Laurent Dinh, Jascha Sohl-Dickstein, Samy Bengio. 2017.
Glow: Generative Flow with Invertible 1x1 Convolutions
Diederik P. Kingma, Prafulla Dhariwal. 2018.
