Toward Disentanglement
through Understanding ELBO (Part I)
kv
Viscovery: Algorithm Team
kelispinor@gmail.com
February 17, 2019
Overview
1 Background Knowledge
Information Quantities
Rate Distortion Theory and Information Bottleneck
Variational Inference
2 Build up Frameworks for Disentangled Representations
3 Isolating Sources of Disentanglement
ELBO Surgery
Evaluate Disentanglement
Experiments
4 Conclusion
Before we start ...
What is disentanglement?
Disentangled Representation = Factorized + Interpretable
Reuse and generalize knowledge
Extrapolate beyond the training data distribution
These questions will be answered over a series of discussions:
(Part I) Why is VAE the main framework for realizing disentanglement?
[Chen, 2018]
(Part II) Why is there a trade-off between reconstruction and
disentanglement? [Alemi, 2018]
(Part II) Is disentanglement a task or a principle? [Achille, 2018]
Background Knowledge
Quick Review
Consider a beta-decay process in which we observe N electrons:
↑↑↓↓↑ ... ↑
The number of possible states of N spins is
$$\frac{N!}{(pN)!\,\big((1-p)N\big)!} \sim \frac{N^N}{(pN)^{pN}\big((1-p)N\big)^{(1-p)N}} = \frac{1}{p^{pN}(1-p)^{(1-p)N}} = 2^{NS}$$
where S is the Shannon entropy per spin,
$$S = -p \log p - (1-p)\log(1-p)$$
The number of bits of information one gains by actually observing such a state is NS.
Quick Review
In the general case,
$$\frac{N!}{(p_1 N)!\,(p_2 N)! \cdots (p_k N)!} \sim \frac{N^N}{\prod_{i=1}^{k}(p_i N)^{p_i N}} = 2^{NS}$$
such that
$$S = -\sum_i p_i \log p_i$$
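As a quick numerical sanity check (mine, not from the deck): the multinomial count indeed grows like $2^{NS}$. A short SciPy sketch for the binary case of the previous slide, with p = 0.3 and N = 1000:

import numpy as np
from scipy.special import gammaln

N, p = 1000, 0.3
# Shannon entropy per spin, in bits
S = -(p * np.log2(p) + (1 - p) * np.log2(1 - p))

def log2_factorial(n):
    # log2(n!) via the log-gamma function
    return gammaln(n + 1) / np.log(2)

# log2 of the exact count N! / ((pN)! ((1-p)N)!)
log2_count = log2_factorial(N) - log2_factorial(p * N) - log2_factorial((1 - p) * N)
print(N * S, log2_count)  # ~881.3 vs ~876.1 bits: equal to leading order in N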
Quick Review
Suppose a theory predicts a probability distribution Q for the final state while the correct distribution is P. After observing N decays, we will see outcome i approximately $p_i N$ times, with probability
$$P = \prod_i q_i^{p_i N} \cdot \frac{N!}{\prod_j (p_j N)!}$$
We already calculated $\frac{N!}{\prod_j (p_j N)!} \sim 2^{-N \sum_i p_i \log p_i}$, so
$$P \sim 2^{-N \sum_i p_i (\log p_i - \log q_i)}$$
The quantity in the exponent is called the relative entropy or Kullback-Leibler divergence,
$$D_{KL}(p\,\|\,q) = \sum_i p_i (\log p_i - \log q_i)$$
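A small numerical illustration (my own) of this exponential suppression: the expected per-observation log-likelihood penalty for using q instead of p is exactly $D_{KL}(p\|q)$:

import numpy as np

rng = np.random.default_rng(0)
p = np.array([0.5, 0.3, 0.2])  # true distribution P
q = np.array([0.4, 0.4, 0.2])  # model distribution Q

kl = np.sum(p * (np.log2(p) - np.log2(q)))  # D_KL(p||q) in bits

samples = rng.choice(3, size=100_000, p=p)
# per-observation log-likelihood penalty of using q instead of p
penalty = np.mean(np.log2(p[samples]) - np.log2(q[samples]))
print(kl, penalty)  # the two numbers nearly coincide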
Quick Review
Quantities:
Entropy: $S_x = -\sum_x p(x)\log p(x)$, the information we do not know.
Relative entropy (KL divergence): $D_{KL}(p(x)\,\|\,q(x)) = \sum_x p(x)\,[\log p(x) - \log q(x)]$
Mutual information:
$I(x; y) = S_x - S_{x|y} = S_y - S_{y|x} = S_{x,y} - S_{x|y} - S_{y|x}$
$I(x; y) = D_{KL}\big(p(x, y)\,\|\,p(x)\,p(y)\big)$
Symmetric between x and y
Extreme cases: independence ($I = 0$) and a deterministic relation ($I = S_x$)
Relations:
Chain rule: $p(x, y) = p(x|y)\,p(y)$
Bayes' rule: $p(y|x) = \frac{p(x|y)\,p(y)}{p(x)}$
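A quick check (my own) that the entropy and KL forms of mutual information agree, using the equivalent identity $I(x;y) = S_x + S_y - S_{x,y}$ on a 2×2 joint distribution:

import numpy as np

pxy = np.array([[0.4, 0.1],
                [0.1, 0.4]])  # joint distribution p(x, y)
px, py = pxy.sum(axis=1), pxy.sum(axis=0)

def H(d):
    d = d[d > 0]
    return -np.sum(d * np.log2(d))

I_from_entropies = H(px) + H(py) - H(pxy.ravel())        # S_x + S_y - S_{x,y}
I_from_kl = np.sum(pxy * np.log2(pxy / np.outer(px, py)))  # D_KL(p(x,y)||p(x)p(y))
print(I_from_entropies, I_from_kl)  # identical up to floating point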
Information Theory
Information System
Sending a signal from Alice to Bob: $X \to \tilde{X}$
Rate-Distortion Theory
What makes a good encoding? Low rate, low distortion.
$$\min_{p(\tilde{x}|x)} I(X; \tilde{X}) \quad \text{s.t.} \quad d(X, \tilde{X}) < D$$
Theorem (Rate Distortion, Shannon and Kolmogorov)
Define R(D) as the minimum achievable rate under distortion constraint D:
$$R(D) = \min_{p(\tilde{x}|x)\ \text{s.t.}\ d(x,\tilde{x}) < D} I(X; \tilde{X})$$
Then an encoding that achieves this rate is
$$p(\tilde{x}|x) = \frac{p(\tilde{x})}{Z(x, \beta)}\, e^{-\beta d(x, \tilde{x})}$$
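The encoder in the theorem is a fixed point: $p(\tilde{x}|x)$ depends on the marginal $p(\tilde{x})$, which in turn depends on the encoder. A minimal Blahut–Arimoto-style iteration (my own toy setup, with a squared-error distortion) alternates the two updates; for the information-bottleneck solution on the next slides, the same scheme applies with $d(x,\tilde{x})$ replaced by $D_{KL}[p(y|x)\|p(y|\tilde{x})]$:

import numpy as np

x = np.linspace(-1.0, 1.0, 9)          # source alphabet
xt = np.linspace(-1.0, 1.0, 5)         # reproduction alphabet
px = np.full(len(x), 1.0 / len(x))     # uniform source p(x)
d = (x[:, None] - xt[None, :]) ** 2    # distortion matrix d(x, x~)

beta = 20.0                            # trade-off parameter
pxt = np.full(len(xt), 1.0 / len(xt))  # initial marginal p(x~)
for _ in range(200):
    enc = pxt[None, :] * np.exp(-beta * d)  # p(x~|x) proportional to p(x~) exp(-beta d)
    enc /= enc.sum(axis=1, keepdims=True)   # normalize by Z(x, beta)
    pxt = px @ enc                          # p(x~) = sum_x p(x) p(x~|x)

rate = np.sum(px[:, None] * enc * np.log2(enc / pxt[None, :]))  # I(X; X~)
distortion = np.sum(px[:, None] * enc * d)
print(rate, distortion)  # one point on the R(D) curve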
Rate-Distortion: RD-Curve
Figure: Trade-off between transmission rate and distortion
Information System
Sending a signal from Alice to Bob: $X \to \tilde{X}$, where Y is the relevant information about X.
Information Bottleneck Theory
What makes a good encoding? Low rate, high relevance.
$$\min_{p(\tilde{x}|x)} I(\tilde{X}; X) \quad \text{s.t.} \quad I(\tilde{X}; Y) > L$$
Theorem (Information Bottleneck, Tishby, Pereira, and Bialek)
Define R(L) as the minimum achievable rate while preserving L bits of mutual information:
$$R(L) = \min_{p(\tilde{x}|x)\ \text{s.t.}\ I(\tilde{x}; y) \ge L} I(X; \tilde{X})$$
Then an encoding that achieves this rate is
$$p(\tilde{x}|x) = \frac{p(\tilde{x})}{Z(x, \beta)}\, e^{-\beta D_{KL}[p(y|x)\,\|\,p(y|\tilde{x})]}$$
Comparison
Rate-distortion: what makes a good encoding? Low rate, low distortion.
$$p(\tilde{x}|x) = \frac{p(\tilde{x})}{Z(x, \beta)}\, e^{-\beta d(x, \tilde{x})}$$
Information bottleneck: what makes a good code? Low rate, high relevance.
$$p(\tilde{x}|x) = \frac{p(\tilde{x})}{Z(x, \beta)}\, e^{-\beta D_{KL}[p(y|x)\,\|\,p(y|\tilde{x})]}$$
Structure of the Solution
On the structure of the solution:
$$\mathcal{L}[p(\tilde{x}|x)] = I(X; \tilde{X}) - \beta\, I(\tilde{X}; Y)$$
The Lagrange multiplier β acts as the trade-off parameter between the complexity of the representation and the preserved relevant information:
$I(\tilde{X}; Y)$ is the measure of performance
$I(X; \tilde{X})$ is the regularization term
Inference
Inference under Posterior
X: observation, input data
Z: latent variable, representation, embedding
$$\underbrace{p(z|x)}_{\text{posterior}} = \frac{\overbrace{p(x|z)}^{\text{likelihood}}\ \overbrace{p(z)}^{\text{prior}}}{\underbrace{p(x)}_{\text{evidence}}}$$
Since the evidence p(x) is intractable, there are two parallel ways to solve this:
MCMC
Variational Inference
Variational Inference
Propose a simpler, tractable distribution q(z|x) to approximate the posterior p(z|x):
$$D_{KL}\big(q(z|x)\,\|\,p(z|x)\big) = \mathbb{E}_{q(z|x)}\left[\log \frac{q(z|x)\,p(x)}{p(x, z)}\right] = \mathbb{E}_{q(z|x)}\left[\log \frac{q(z|x)}{p(x, z)}\right] + \log p(x)$$
Rearranging the two sides, we get
$$\log p(x) = D_{KL}\big(q(z|x)\,\|\,p(z|x)\big) + \mathbb{E}_{q(z|x)}\left[\log \frac{p(x, z)}{q(z|x)}\right]$$
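This identity can be verified end-to-end on a toy discrete model (my own example): with z ∈ {0, 1} and an arbitrary q(z|x), the gap between log p(x) and the ELBO is exactly the KL term:

import numpy as np

pz = np.array([0.6, 0.4])          # prior p(z)
px_given_z = np.array([0.2, 0.9])  # likelihood p(x|z) for one fixed x
pxz = px_given_z * pz              # joint p(x, z)
px = pxz.sum()                     # evidence p(x)
pz_given_x = pxz / px              # exact posterior

q = np.array([0.5, 0.5])           # any approximate posterior q(z|x)
elbo = np.sum(q * np.log(pxz / q))
kl = np.sum(q * np.log(q / pz_given_x))
print(np.log(px), elbo + kl)       # equal: log p(x) = ELBO + KL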
Reduce KL Divergence to ELBO
Expanding the second term and using the non-negativity of the KL divergence,
$$\log p(x) = D_{KL}\big(q(z|x)\,\|\,p(z|x)\big) + \mathbb{E}_{q(z|x)}[\log p(x|z)] - D_{KL}\big[q(z|x)\,\|\,p(z)\big]$$
Evidence Lower Bound, ELBO (per sample):
$$\log p(x_n) \ge \mathbb{E}_{q(z|x_n)}\left[\log \frac{p(x_n, z)}{q(z|x_n)}\right] = \mathcal{L}_{\text{ELBO}}$$
$$\mathcal{L}_{\text{ELBO}} = \underbrace{\mathbb{E}_{q(z|x_n)}[\log p(x_n|z)]}_{\text{reconstruction}} - \underbrace{D_{KL}\big[q(z|x_n)\,\|\,p(z)\big]}_{\text{regularization}}$$
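For the usual diagonal-Gaussian q(z|x) and standard-normal prior p(z), the regularization term has a closed form. A minimal sketch (mu and log_var stand in for encoder outputs):

import numpy as np

def gaussian_kl(mu, log_var):
    # D_KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over latent dims
    return 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var)

# e.g. a 4-dimensional latent code for one sample
mu = np.array([0.5, -0.2, 0.0, 1.0])
log_var = np.array([-0.1, 0.3, 0.0, -0.5])
print(gaussian_kl(mu, log_var))
# per-sample ELBO = E_q[log p(x|z)] - gaussian_kl(mu, log_var)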
Implement ELBO using VAE
Figure: ELBO Structure in VAE
Implement ELBO using VAE
Figure: Reparameterization Trick for Back-propagation
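The trick shown in the figure: sample ε from a fixed N(0, I) and write z = μ + σ ⊙ ε, so gradients flow to the encoder parameters through μ and σ. A minimal sketch:

import numpy as np

rng = np.random.default_rng(0)

def reparameterize(mu, log_var):
    # z = mu + sigma * eps, eps ~ N(0, I); differentiable w.r.t. mu and log_var
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

z = reparameterize(np.zeros(4), np.zeros(4))  # one sample from q(z|x)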
Build up Framework for Disentanglement
Build up β-VAE framework
Suppose the data-generation process is affected by two types of factors,
$$p(x|z) \approx p(x|v, w)$$
where v are conditionally independent factors and w are conditionally dependent factors.
We maximize the likelihood of the observed data over the whole latent distribution; the aim of disentanglement is to ensure that the inferred latents capture the generative factors v in an independent manner:
$$\max_{\theta}\ \mathbb{E}_{p_\theta(z)}\big[p_\theta(x|z)\big] \quad \text{s.t.} \quad D_{KL}\big(q(z|x)\,\|\,p(z)\big) < \epsilon$$
β-VAE
The objective function of β-VAE [Higgins, 2017] is
$$\mathcal{L} = \mathbb{E}_{q(z|x)}[\log p(x|z)] - \beta\, D_{KL}\big[q(z|x)\,\|\,p(z)\big]$$
Understanding the effect of β:
Reconstruction quality is a poor indicator of learnt disentanglement
Good disentanglement often leads to blurry reconstructions
Disentangled representations limit the information capacity of the latent channel
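In code, β-VAE is a one-line change to the standard ELBO: scale the KL term. A minimal sketch, reusing the rate / kl_div names from the VAE pseudocode later in this deck:

import tensorflow as tf

def beta_vae_loss(rate, kl_div, beta=4.0):
    # rate = log p(x|z), kl_div = D_KL(q(z|x) || p(z));
    # beta = 1 recovers the plain ELBO, beta > 1 strengthens the bottleneck
    return -tf.reduce_sum(rate - beta * kl_div)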
Disentanglement of β-VAE
ELBO Surgery
ELBO Surgery
Conjecture: two criteria may be important
Mutual information between the data variable and the latent variables
Independence of the latent variables
ELBO Surgery
To further understand the ELBO, we express it through the average encoding distribution [Hoffman, 2017]. Identify each training example with a unique index n ∈ {1, 2, ..., N} and define
$$q(z|n) = q(z|x_n), \qquad q(z, n) = q(z|n)\,p(n) = \frac{1}{N}\,q(z|n), \qquad p(n) = \frac{1}{N}$$
The marginal (aggregated) distribution is
$$q(z) = \mathbb{E}_{p(n)}[q(z|n)] = \sum_n q(z|n)\,p(n)$$
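The aggregated posterior q(z) is literally an N-component mixture with one component per training example. A minimal TensorFlow Probability sketch (toy 1-D latent; the per-example means and scales are made up):

import numpy as np
import tensorflow_probability as tfp

tfd = tfp.distributions

N = 5                              # toy dataset size
locs = np.linspace(-2.0, 2.0, N)   # per-example posterior means of q(z|n)
scales = np.full(N, 0.3)

# q(z) = sum_n p(n) q(z|n) with p(n) = 1/N
qz = tfd.MixtureSameFamily(
    mixture_distribution=tfd.Categorical(probs=np.full(N, 1.0 / N)),
    components_distribution=tfd.Normal(loc=locs, scale=scales))

print(qz.log_prob(0.0))  # log q(z) at z = 0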
TC Decomposition: Sources of Disentanglement in ELBO
The regularization term decomposes as
$$\mathbb{E}_{p(n)}\Big[D_{KL}\big[q(z|n)\,\|\,p(z)\big]\Big] = \mathbb{E}_{p(n)}\,\mathbb{E}_{q(z|n)}\Big[\log q(z|n) - \log p(z) + \log q(z) - \log q(z) + \log \prod_j q(z_j) - \log \prod_j q(z_j)\Big]$$
$$= \underbrace{D_{KL}\big[q(z, n)\,\|\,q(z)\,p(n)\big]}_{\text{(1) Index-Code MI, } I_q(z;n)} + \underbrace{D_{KL}\Big[q(z)\,\Big\|\,\prod_j q(z_j)\Big]}_{\text{(2) Total Correlation}} + \underbrace{\sum_j D_{KL}\big[q(z_j)\,\|\,p(z_j)\big]}_{\text{(3) Dimension-wise KL}}$$
Index-Code MI: mutual information between the data and latent variables
Total Correlation: a generalization of MI among the latent variables
Dimension-wise KL: the marginal KL to the prior distribution
Note that β-VAE penalizes all three terms evenly.
ELBO TC-Decomposition
$$\text{Modified ELBO} = \underbrace{\mathbb{E}_{q(z,n)}[\log p(n|z)]}_{\text{Reconstruction}} - \underbrace{\alpha\, I_q(z; n)}_{\text{Index-Code MI}} - \underbrace{\beta\, D_{KL}\Big[q(z)\,\Big\|\,\prod_j q(z_j)\Big]}_{\text{Total Correlation}} - \underbrace{\gamma \sum_j D_{KL}\big[q(z_j)\,\|\,p(z_j)\big]}_{\text{Dimension-wise KL}}$$
Index-Code MI: $D_{KL}[q(z, n)\,\|\,q(z)\,p(n)] = I_q(z; n)$
Some results drop this penalty to improve disentanglement
Others keep it to improve disentanglement, following the information-bottleneck view
Its effect is dataset dependent
Total Correlation: $D_{KL}[q(z)\,\|\,\prod_j q(z_j)]$
A heavier penalty on this term induces disentanglement
TC forces the model to find statistically independent factors
Dimension-wise KL: $\sum_j D_{KL}[q(z_j)\,\|\,p(z_j)]$
Prevents the latent dimensions from deviating from their corresponding priors
Minibatch Sampling: Stochastic Estimation of log q(z)
Evaluating the density q(z) requires sampling the whole dataset, and a randomly chosen n gives q(z|n) close to zero for most z. Inspired by importance sampling, given a minibatch of samples {n_1, n_2, ..., n_M} we can use an estimator that re-utilizes the batch:
$$\mathbb{E}_{q(z)}[\log q(z)] = \mathbb{E}_{q(z)}\Big[\log \mathbb{E}_{n' \sim p(n')}\big[q(z|n')\big]\Big] \approx \frac{1}{M} \sum_{i=1}^{M} \log \frac{1}{MN} \sum_{j=1}^{M} q\big(z(n_i)\,|\,n_j\big)$$
where $z(n_i)$ is a sample from $q(z|n_i)$. This treats q(z) as a mixture distribution in which the data index n indicates the mixture component.
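A NumPy sketch of this minibatch-weighted estimator under toy assumptions (my own setup: 1-D Gaussian posteriors q(z|n) with made-up per-example means):

import numpy as np
from scipy.special import logsumexp
from scipy.stats import norm

rng = np.random.default_rng(0)
N, M = 1000, 64                     # dataset size, minibatch size
mu = rng.normal(size=N)             # toy per-example posterior means

batch = rng.choice(N, size=M, replace=False)
z = rng.normal(loc=mu[batch], scale=0.5)  # z(n_i) ~ q(z|n_i)

# log q(z(n_i)|n_j) for every pair (i, j) in the batch: shape [M, M]
log_qz_mat = norm.logpdf(z[:, None], loc=mu[batch][None, :], scale=0.5)

# minibatch-weighted estimate: log q(z(n_i)) ~ logsumexp_j(...) - log(M N)
log_qz = logsumexp(log_qz_mat, axis=1) - np.log(M * N)
print(log_qz.mean())  # estimate of E[log q(z)]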
Special case: β-TCVAE
With minibatch sampling, we can assign different weights (α, β, γ) to the terms:
$$\mathcal{L}_{\beta\text{-TC}} = \mathbb{E}_{q(z|n)p(n)}[\log p(n|z)] - \alpha\, I_q(z; n) - \beta\, D_{KL}\Big(q(z)\,\Big\|\,\prod_j q(z_j)\Big) - \gamma \sum_j D_{KL}\big(q(z_j)\,\|\,p(z_j)\big)$$
The proposed β-TCVAE uses α = γ = 1 and treats β as a hyper-parameter.
Pseudo Code: VAE
Using TensorFlow Probability:

import tensorflow as tf

latent_prior = make_mixture_prior()                    # p(z)
approx_posterior = encoder(features)                   # q(z|x)
approx_posterior_sample = approx_posterior.sample()    # z ~ q(z|x)
decoder_likelihood = decoder(approx_posterior_sample)  # p(x|z)
rate = decoder_likelihood.log_prob(features)           # log p(x|z)
log_qz_x = approx_posterior.log_prob(approx_posterior_sample)
log_pz = latent_prior.log_prob(approx_posterior_sample)
kl_div = log_qz_x - log_pz  # single-sample estimate of D_KL(q(z|x) || p(z))
elbo = tf.reduce_sum(rate - kl_div)
Pseudo Code: β-TCVAE
$$\mathcal{L}_{\beta\text{-TC}} = \mathbb{E}_{q(z|n)p(n)}[\log p(n|z)] - \alpha\, I_q(z; n) - \beta\, D_{KL}\Big(q(z)\,\Big\|\,\prod_j q(z_j)\Big) - \gamma \sum_j D_{KL}\big(q(z_j)\,\|\,p(z_j)\big)$$

# log_qz_x_mat[i, j, d] = log q(z_d(n_i) | n_j), shape [M, M, D]
log_mn = tf.math.log(float(M * N))
log_qz = tf.reduce_logsumexp(
    tf.reduce_sum(log_qz_x_mat, axis=2), axis=1) - log_mn
log_qz_factorized = tf.reduce_sum(
    tf.reduce_logsumexp(log_qz_x_mat, axis=1) - log_mn, axis=1)
Iq = log_qz_x - log_qz               # index-code MI: log q(z|n) - log q(z)
TC = log_qz - log_qz_factorized      # total correlation
Dim_kl = log_qz_factorized - log_pz  # dim-wise KL (log_pz = log p(z), factorized prior)
modified_elbo = rate - Iq - TC - Dim_kl
Evaluate Disentanglement: Mutual Information Gap
Suppose we have ground-truth factors $\{v_k\}_{k=1}^{K}$. Define the joint distribution
$$q(z_j, v_k) = \sum_{n=1}^{N} p(v_k)\, p(n|v_k)\, q(z_j|n)$$
with the mutual information estimated as
$$I_n(z_j; v_k) = \mathbb{E}_{q(z_j, v_k)}\Big[\log \sum_n q(z_j|n)\, p(n|v_k)\Big] + S(z_j)$$
Feature Factor and Latent Variable
Figure: Correlation between factors and latent space
Evaluate Disentanglement: Mutual Information Gap
Mutual Information Gap:
$$\text{MIG} = \frac{1}{K}\sum_{k=1}^{K}\frac{1}{S(v_k)}\Big(I(z_{j^{(k)}}; v_k) - \max_{j \ne j^{(k)}} I(z_j; v_k)\Big), \qquad j^{(k)} = \arg\max_j I(z_j; v_k)$$
where $0 \le I(z_j; v_k) = S(v_k) - S(v_k|z_j) \le S(v_k)$ naturally serves as the normalization condition. Benefits of this metric:
Axis-alignment
Compactness of representation
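Given a matrix of estimated mutual informations I[j, k] = I(z_j; v_k) and the factor entropies S(v_k), MIG is a few lines; a sketch with made-up numbers:

import numpy as np

# I[j, k] = I(z_j ; v_k): 4 latent dims, 2 ground-truth factors (toy values)
I = np.array([[0.9, 0.1],
              [0.2, 0.0],
              [0.1, 0.7],
              [0.0, 0.2]])
S_v = np.array([1.0, 1.0])  # factor entropies S(v_k), the normalizers

top2 = np.sort(I, axis=0)[::-1][:2]       # two largest I(z_j; v_k) per factor
mig = np.mean((top2[0] - top2[1]) / S_v)  # gap, normalized, averaged over k
print(mig)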
Experiments and Conclusion
Dataset
Figure: Specs of Datasets
Experiments: Performance of β-TCVAE
Figure: Left: fully connected; Right: convolution
Experiments: Performance of β-TCVAE
Experiments: Trade-off
ELBO-Disentanglement Trade-off
Figure: DSprites
Experiments: Trade-off
ELBO-Disentanglement Trade-off
Figure: 3D Faces
Experiments: TC versus MIG
How does independence relate to disentanglement?
Figure: DSprites
Experiments: TC versus MIG
How does independence relate to disentanglement?
Figure: 3D Faces
Extra Experiments
Removing the Index-Code MI term and varying the batch size make no significant difference.
Results
Conclusion
In this paper:
The regularization term in the ELBO contains several factors that naturally encourage disentangling
Total correlation (independence of the latent variables) is the major factor forcing the model to learn statistically independent representations
A new information-theoretic metric (MIG) quantifies disentanglement
References
Chen, T. Q., et al. Isolating Sources of Disentanglement in Variational Autoencoders. NeurIPS 2018.
Tishby, Naftali, et al. The Information Bottleneck Method. arXiv:physics/0004057 (2000).
Hoffman, Matthew D., and Matthew J. Johnson. ELBO Surgery: Yet Another Way to Carve Up the Variational Evidence Lower Bound. NIPS 2016 Workshop.
Higgins, Irina, et al. β-VAE: Learning Basic Visual Concepts with a Constrained Variational Framework. ICLR 2017.
Burgess, Christopher P., et al. Understanding Disentangling in β-VAE. arXiv:1804.03599.
Achille, Alessandro, et al. Emergence of Invariance and Disentanglement in Deep Representations.
Alemi, Alexander, et al. Fixing a Broken ELBO. ICML 2018.
