Paper Summary: Disentangling by Factorising
Jun-sik Choi
Department of Brain and Cognitive Engineering,
Korea University
November 26, 2019
Overview of paper [2]
Factor-VAE is proposed to enhance disentangled representation learning.
Factor-VAE enhances disentanglement by encouraging the distribution of representations to be factorial (independent across the dimensions).
Factor-VAE provides a better trade-off between disentanglement and reconstruction quality than β-VAE [1].
A new disentanglement metric is also proposed.
Unsupervised Disentangled Representation
Disentangled Representation
a representation where a change in one dimension corresponds
to a change in one factor of variation, while being relatively
invariant to changes in other factors. [3]
Why does disentangled representation matter? [4]
Data can be represented in a more interpretable and semantic manner.
Learned disentangled representations are more transferable.
Why learn disentangled representations in an unsupervised manner?
1. Humans are able to learn factors of variation without supervision.
2. Labels are costly as obtaining them requires a human in the
loop.
3. Labels assigned by humans might be inconsistent or leave out
the factors that are difficult for humans to identify.
Factor-VAE
Goal
Obtain a better trade-off between disentanglement and
reconstruction, which is one drawback of β-VAE [1].
How?
Factor-VAE augments the VAE objective with a penalty that
encourages the marginal distribution of representations to be
factorial without substantially affecting the quality of
reconstructions.
The penalty is expressed as a KL divergence between the
marginal distribution and the product of its marginals,
optimized by a discriminator network following the divergence
minimisation view of GANs.
Trade-off between Disentanglement and Reconstruction in
beta-VAE I
Notations and assumptions
- Observations: $x^{(i)} \in \mathcal{X},\ i = 1, \dots, N$
- Underlying generative factors: $f = (f_1, \dots, f_K)$
- Latent code that models $f$: $z \in \mathbb{R}^d$
- Prior $p(z) = \mathcal{N}(0, I)$, decoder: $p_\theta(x|z)$, encoder: $q_\theta(z|x)$
Disentanglement of Representation
- Variational posterior for an observation:
$$q_\theta(z|x) = \prod_{j=1}^{d} \mathcal{N}\!\left(z_j \,\middle|\, \mu_j(x), \sigma_j^2(x)\right)$$
can be seen as the distribution of the representation corresponding
to the data point $x$.
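As a minimal illustration (not from the paper), such a factorised Gaussian posterior is typically sampled with the reparameterisation trick; the sketch below assumes an encoder that outputs `mu` and `log_var` per latent dimension:

```python
import torch

def sample_posterior(mu: torch.Tensor, log_var: torch.Tensor) -> torch.Tensor:
    """Draw z ~ N(mu, diag(sigma^2)) while keeping gradients w.r.t. mu, log_var."""
    std = torch.exp(0.5 * log_var)  # sigma_j(x)
    eps = torch.randn_like(std)     # eps ~ N(0, I)
    return mu + eps * std           # z_j = mu_j(x) + sigma_j(x) * eps_j
```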
Trade-off between Disentanglement and Reconstruction in
beta-VAE II
- Marginal posterior and disentanglement
$$q(z) = \mathbb{E}_{p_{\text{data}}(x)}[q(z|x)] = \frac{1}{N}\sum_{i=1}^{N} q\!\left(z \,\middle|\, x^{(i)}\right)$$
A disentangled representation would have each $z_j$ correspond to
precisely one underlying factor $f_k$, so we want $q(z)$ to be
independently factorized:
$$q(z) = \prod_{j=1}^{d} q(z_j)$$
Trade-off between Disentanglement and Reconstruction in
beta-VAE III
Further Decomposition of β-VAE objective
- The β-VAE objective:
$$\frac{1}{N}\sum_{i=1}^{N}\left[ \mathbb{E}_{q(z|x^{(i)})}\!\left[\log p\!\left(x^{(i)}\middle|z\right)\right] - \beta\, \mathrm{KL}\!\left(q\!\left(z\middle|x^{(i)}\right) \,\middle\|\, p(z)\right)\right]$$
is a lower bound of $\mathbb{E}_{p_{\text{data}}(x)}[\log p(x)]$ (for $\beta \ge 1$), where
- $\mathbb{E}_{q(z|x^{(i)})}\!\left[\log p\!\left(x^{(i)}\middle|z\right)\right]$ : negative reconstruction error
- $\mathrm{KL}\!\left(q\!\left(z\middle|x^{(i)}\right) \,\middle\|\, p(z)\right)$ : complexity penalty.
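A minimal PyTorch sketch of this objective (an illustration assuming a Gaussian encoder and a Bernoulli decoder, not the authors' code):

```python
import torch
import torch.nn.functional as F

def beta_vae_loss(x, x_recon_logits, mu, log_var, beta=4.0):
    """Negative beta-VAE objective (to be minimised), averaged over the batch."""
    n = x.size(0)
    # Negative reconstruction error E_q(z|x)[log p(x|z)] under a Bernoulli decoder.
    recon = -F.binary_cross_entropy_with_logits(
        x_recon_logits, x, reduction="sum") / n
    # Complexity penalty KL(q(z|x) || N(0, I)), closed form for diagonal Gaussians.
    kl = 0.5 * torch.sum(mu.pow(2) + log_var.exp() - log_var - 1) / n
    return -(recon - beta * kl)
```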
Trade-off between Disentanglement and Reconstruction in
beta-VAE IV
- The KL term can be further decomposed as:
$$\mathbb{E}_{p_{\text{data}}(x)}[\mathrm{KL}(q(z|x) \,\|\, p(z))] = I_q(x; z) + \mathrm{KL}(q(z) \,\|\, p(z))$$
Proof:
$$\begin{aligned}
\mathbb{E}_{p_{\text{data}}(x)}[\mathrm{KL}(q(z|x) \,\|\, p(z))]
&= \mathbb{E}_{p_{\text{data}}(x)}\mathbb{E}_{q(z|x)}\left[\log \frac{q(z|x)}{p(z)}\right] \\
&= \mathbb{E}_{p_{\text{data}}(x)}\mathbb{E}_{q(z|x)}\left[\log \left(\frac{q(z|x)}{q(z)} \cdot \frac{q(z)}{p(z)}\right)\right] \\
&= \mathbb{E}_{p_{\text{data}}(x)}\mathbb{E}_{q(z|x)}\left[\log \frac{q(z|x)}{q(z)} + \log \frac{q(z)}{p(z)}\right] \\
&= \mathbb{E}_{p_{\text{data}}(x)}[\mathrm{KL}(q(z|x) \,\|\, q(z))] + \mathbb{E}_{q(x,z)}\left[\log \frac{q(z)}{p(z)}\right] \\
&= I_q(x; z) + \mathbb{E}_{q(z)}\left[\log \frac{q(z)}{p(z)}\right] \\
&= I_q(x; z) + \mathrm{KL}(q(z) \,\|\, p(z))
\end{aligned}$$
Trade-off between Disentanglement and Reconstruction in
beta-VAE V
$$\mathbb{E}_{p_{\text{data}}(x)}[\mathrm{KL}(q(z|x) \,\|\, p(z))] = I_q(x; z) + \mathrm{KL}(q(z) \,\|\, p(z))$$
- When the complexity penalty is increased by setting $\beta > 1$, both $\mathrm{KL}(q(z) \,\|\, p(z))$ and $I_q(x; z)$ are penalized.
- Penalizing $\mathrm{KL}(q(z) \,\|\, p(z))$ pushes $q(z)$ toward the factorised prior $p(z)$, which encourages disentanglement.
- Penalizing $I_q(x; z)$ reduces the amount of information about $x$ stored in $z$, which leads to poor reconstruction.
Total Correlation Penalty I
Factor-VAE objective
$$\frac{1}{N}\sum_{i=1}^{N}\left[ \mathbb{E}_{q(z|x^{(i)})}\!\left[\log p\!\left(x^{(i)}\middle|z\right)\right] - \mathrm{KL}\!\left(q\!\left(z\middle|x^{(i)}\right) \,\middle\|\, p(z)\right)\right] - \gamma\, \mathrm{KL}(q(z) \,\|\, \bar{q}(z))$$
where $\bar{q}(z) := \prod_{j=1}^{d} q(z_j)$. This objective is a lower bound on the marginal
log likelihood $\mathbb{E}_{p_{\text{data}}(x)}[\log p(x)]$ and directly encourages
independence in the code distribution.
Total correlation [5]: $\mathrm{KL}(q(z) \,\|\, \bar{q}(z))$
A popular measure of dependence for multiple random variables.
As both $q(z)$ and $\bar{q}(z)$ are intractable, an alternative approach
for optimizing the total correlation is required.
Total Correlation
Total Correlation Penalty II
Alternative way to optimize the total correlation
1. Sample $q\!\left(z\middle|x^{(i)}\right)$ with uniformly sampled $x^{(i)}$.
2. Generate $d$ samples from $q(z)$ and ignore all but one dimension for each sample.
Or, equivalently:
1. Sample a batch from $q(z)$.
2. Randomly permute across the batch for each latent dimension (sketched below).
As long as the batch is large enough, the distribution of these
samples will closely approximate $\bar{q}(z)$.
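A minimal sketch of this permutation trick (an illustrative implementation; the function name is assumed, not the authors'):

```python
import torch

def permute_dims(z: torch.Tensor) -> torch.Tensor:
    """Approximate samples from q_bar(z) = prod_j q(z_j).

    For each latent dimension j, independently shuffle that column across the
    batch, breaking dependencies between dimensions while keeping the marginals.
    z: tensor of shape (batch_size, d) sampled from q(z).
    """
    batch_size, d = z.size()
    permuted = torch.empty_like(z)
    for j in range(d):
        idx = torch.randperm(batch_size, device=z.device)
        permuted[:, j] = z[idx, j]
    return permuted
```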
Total Correlation Penalty III
Minimization of the KL divergence
By training a classifier (a discriminator), approximate the density
ratio that arises in the KL term (density-ratio trick [6]):
$$\mathrm{TC}(z) = \mathrm{KL}(q(z) \,\|\, \bar{q}(z)) = \mathbb{E}_{q(z)}\left[\log \frac{q(z)}{\bar{q}(z)}\right] \approx \mathbb{E}_{q(z)}\left[\log \frac{D(z)}{1 - D(z)}\right]$$
The discriminator and the VAE are trained jointly.
The discriminator is trained to classify between samples from
$q(z)$ and $\bar{q}(z)$.
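A hedged sketch of how the two losses might be wired together (illustrative only; `permute_dims` is the sketch above, and `D` is assumed to be any small MLP with two output logits, so that $\log \frac{D(z)}{1-D(z)}$ becomes a difference of logits):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def tc_and_discriminator_losses(D: nn.Module, z: torch.Tensor):
    """Return (TC estimate for the VAE loss, cross-entropy loss for D)."""
    z_perm = permute_dims(z).detach()      # samples approximating q_bar(z)
    logits_real = D(z)                     # samples from q(z)
    # TC estimate E_q(z)[log D(z) - log(1 - D(z))] as a difference of logits.
    tc_estimate = (logits_real[:, 0] - logits_real[:, 1]).mean()
    # Discriminator loss: q(z) -> class 0, q_bar(z) -> class 1.
    zeros = torch.zeros(z.size(0), dtype=torch.long, device=z.device)
    ones = torch.ones(z.size(0), dtype=torch.long, device=z.device)
    d_loss = 0.5 * (F.cross_entropy(D(z.detach()), zeros)
                    + F.cross_entropy(D(z_perm), ones))
    return tc_estimate, d_loss
```

The VAE would minimise its negative ELBO plus $\gamma$ times `tc_estimate` (with a separate optimizer for the discriminator minimising `d_loss` on the same batch).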
Total Correlation Penalty IV
Total Correlation Penalty V
Metric for Disentanglement I
Disentanglement metric proposed in [1]
Weaknesses
1. The metric is sensitive to the hyperparameters of the linear
classifier optimization.
2. Learned representations can be a linear combination of several
dimensions, so using a linear classifier is inappropriate.
3. The metric has a failure mode: when only $K - 1$ out of $K$
factors are disentangled, the classifier can still give 100%
accuracy.
Metric for Disentanglement II
Proposed metric for disentanglement
1. Choose a factor k and generate data with this factor fixed, but
all other factors varying randomly.
2. Obtain their representations.
3. Normalize each dimension by its empirical standard deviation $s_d$
over the full data (or a large enough random subset).
4. Take the empirical variance $\mathrm{Var}_l\!\left(z_d^{(l)}/s_d\right)$ in each dimension of
the normalized representations.
5. The target index $k$ and the index of the dimension with the lowest
variance are fed to the majority-vote classifier.
If the representation is perfectly disentangled, the variance of the
dimension corresponding to the fixed factor will be 0.
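An illustrative sketch of producing one vote (`encode` and `generate_fixed_factor` are assumed stand-ins for the trained encoder and the ground-truth simulator, not names from the paper):

```python
import numpy as np

def one_vote(encode, generate_fixed_factor, k: int, L: int, s: np.ndarray):
    """Return a vote (a, b): latent dim with lowest normalized variance, factor k."""
    x = generate_fixed_factor(k, L)    # L samples with factor k fixed, others random
    z = encode(x) / s                  # normalize by per-dimension empirical std s_d
    a = int(np.argmin(z.var(axis=0)))  # dimension least affected by the varying factors
    return a, k
```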
Metric for Disentanglement III
As representations are normalized, $\arg\min_d \mathrm{Var}_l\!\left(z_d^{(l)}/s_d\right)$ is
invariant to rescaling of the representations in each dimension.
Majority-vote classification¹
1. For each set of $L$ samples, one vote $(a_i, b_i)$ with
$a_i \in \{1, \dots, D\}$, $b_i \in \{1, \dots, K\}$ is obtained.
2. Given $M$ votes $(a_i, b_i)_{i=1}^{M}$, the voting matrix
$V_{jk} = \sum_{i=1}^{M} \mathbb{I}(a_i = j,\ b_i = k)$ is formed.
3. Then, the majority-vote classifier is defined to be
$C(j) = \arg\max_k V_{jk}$.
4. In other words, $C(j)$ is the index of the generative factor $k$ that
most often produces the lowest variance in latent dimension $j$.
5. The metric is the accuracy of this classifier:
$$\frac{\sum_{j=1}^{D} V_{jC(j)}}{\sum_j \sum_k V_{jk}}$$
Note that the majority-vote classifier has no optimisation
hyperparameters to tune, and the resulting classifier is a
deterministic function of the training data.
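A minimal sketch of the majority-vote classifier and the resulting metric (illustrative; indices are taken 0-based):

```python
import numpy as np

def metric_accuracy(votes, D: int, K: int) -> float:
    """votes: iterable of (a_i, b_i) pairs with a_i in [0, D), b_i in [0, K)."""
    V = np.zeros((D, K), dtype=int)
    for a, b in votes:
        V[a, b] += 1              # voting matrix V_jk
    C = V.argmax(axis=1)          # majority-vote classifier C(j)
    return V[np.arange(D), C].sum() / V.sum()
```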
Metric for Disentanglement IV
Comparison between metrics ([1, 2])
1. The new disentanglement metric of [2] is much less sensitive to
hyperparameters than the old metric of [1].
2. The old metric is very sensitive to the number of iterations, and
constantly improves with more iterations.
¹ Please refer to the code [Link] for more details.
Experiments I
Datasets
Dataset with known generative factors
1. 2D Shapes dataset[7] with n : 737,280, dim : 64 × 64
fk : shape(3), scale(6), orientation(40), x-position(32),
y-position(32)
2. 3D Shapes dataset[8] with n : 480,000, dim : 64 × 64 × 3
fk : shape(4), scale(8), orientation(15), floor color(10), wall
color(10), object color(10)
Dataset with unknown generative factors
1. 3D Faces dataset[9] with n : 239,840, dim : 64 × 64 × 3
2. 3D Chairs dataset[10] with n : 86,366, dim : 64 × 64 × 3
3. CelebA dataset (Cropped)[11] with n : 202,599,
dim : 64 × 64 × 3
Experiments II
Effect of γ compared to β in β-VAE
Experiments III
Relationship between γ and reconstruction error
Experiments IV
Total correlation
Experiments V
Latent Traversal - 2D Shapes Dataset
Experiments VI
Latent Traversal - 3D Shapes Dataset
Experiments VII
Latent Traversal - 3D Chair Dataset
Experiments VIII
Latent Traversal - 3D Faces and CelebA
Conclusion
This work introduces Factor-VAE, a novel method for learning
disentangled representations.
A new disentanglement metric is also proposed.
Limitations
Low total correlation is necessary but not sufficient for
disentangling independent factors of variation. (If all but one of
the latent dimensions were to collapse to the prior, TC = 0, yet
the representation would not be disentangled.)
The proposed metric requires generating samples while holding one
factor fixed, which is not always possible (e.g., when the training
set does not cover all possible factor combinations).
The metric is also unsuitable for data with non-independent
factors of variation.
References
I. Higgins, L. Matthey, A. Pal, C. Burgess, X. Glorot,
M. Botvinick, S. Mohamed, and A. Lerchner, “beta-VAE: Learning
basic visual concepts with a constrained variational framework,”
in ICLR, 2017.
H. Kim and A. Mnih, “Disentangling by factorising,” arXiv
preprint arXiv:1802.05983, 2018.
Y. Bengio, A. Courville, and P. Vincent, “Representation
learning: A review and new perspectives,” IEEE Transactions on
Pattern Analysis and Machine Intelligence, vol. 35, no. 8,
pp. 1798–1828, 2013.
B. M. Lake, T. D. Ullman, J. B. Tenenbaum, and S. J.
Gershman, “Building machines that learn and think like
people,” Behavioral and Brain Sciences, vol. 40, no. 2017,
2017.
S. Watanabe, “Information theoretical analysis of multivariate
correlation,” IBM Journal of Research and Development, vol. 4,
no. 1, pp. 66–82, 1960.
Total Correlation
Definition
For $n$ given random variables $\{X_1, X_2, \dots, X_n\}$, the total
correlation is defined as the KL divergence from the joint
distribution $p(X_1, \dots, X_n)$ to the independent distribution
$p(X_1)\,p(X_2)\cdots p(X_n)$:
$$\mathrm{TC}(X_1, X_2, \dots, X_n) \equiv D_{\mathrm{KL}}\left[p(X_1, \dots, X_n) \,\middle\|\, p(X_1)\,p(X_2)\cdots p(X_n)\right]$$
Equivalently,
$$\mathrm{TC}(X_1, X_2, \dots, X_n) = \sum_{i=1}^{n} H(X_i) - H(X_1, X_2, \dots, X_n),$$
i.e., the amount of information shared among the variables in the set.
A near-zero TC indicates that the variables in the group are
essentially statistically independent.
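A toy numerical check of the entropy identity on a small discrete joint distribution (an assumed example, not from the paper):

```python
import numpy as np

def entropy(p: np.ndarray) -> float:
    """Shannon entropy in nats of a (flattened) probability vector."""
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))

joint = np.array([[0.4, 0.1],
                  [0.1, 0.4]])  # p(X1, X2) over two binary variables
marg1, marg2 = joint.sum(axis=1), joint.sum(axis=0)
tc = entropy(marg1) + entropy(marg2) - entropy(joint.ravel())
print(tc)  # ~0.19 > 0: X1 and X2 are dependent; an independent joint gives 0
```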
Back