InfoGAN is a method for learning disentangled and interpretable representations with generative adversarial networks (GANs). It adds an objective that maximizes the mutual information between a small subset of latent codes and the generated images, so that the latent codes correspond to interpretable factors of variation in the image domain. The paper presents results on the MNIST, faces, chairs, SVHN, and CelebA datasets, where the latent codes discover meaningful, interpretable factors such as digit identity, azimuth, and lighting conditions without any supervision.
2. Introduction
In ordinary GANs, the latent vector z is used in an arbitrary, entangled way: its individual dimensions carry no intrinsic meaning.
It would be desirable to have a more meaningful representation that relates the latent input to properties of the outputs.
3. Disentangled Representation
We wish to disentangle the representation of the output images in the input latent vectors.
That is, we wish to make the values of the latent vector c correspond to features in the generated images.
5. Entropy
Entropy can be intuitively understood as the amount of information that a random variable contains.
The concept is borrowed from statistical thermodynamics and is directly analogous to the physical measure of the randomness of particle states.
H(X) = −Σ_{i=1..n} P(x_i) log_b P(x_i)
H(X|Y) = −Σ_{i,j} p(x_i, y_j) log [ p(x_i, y_j) / p(y_j) ]
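As a quick sanity check, here is a minimal numpy sketch of both quantities for a small, made-up discrete joint distribution (the values below are illustrative only, not from the paper):

```python
import numpy as np

# Illustrative joint distribution p(x, y) over two binary variables (made-up values).
p_xy = np.array([[0.3, 0.2],
                 [0.1, 0.4]])
p_x = p_xy.sum(axis=1)                       # marginal P(x_i)
p_y = p_xy.sum(axis=0)                       # marginal p(y_j)

# H(X) = -sum_i P(x_i) log P(x_i)            (natural log, i.e. b = e)
H_X = -np.sum(p_x * np.log(p_x))

# H(X|Y) = -sum_{i,j} p(x_i, y_j) log[ p(x_i, y_j) / p(y_j) ]
H_X_given_Y = -np.sum(p_xy * np.log(p_xy / p_y))

print(H_X, H_X_given_Y)                      # conditioning never increases entropy
```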
6. Mutual Information: Definition
In information theory, the mutual information I(X;Y) between X and Y measures the "amount of information" learned about the random variable X from knowledge of the random variable Y.
The mutual information can be expressed as the difference of two entropy terms:
I(X;Y) = H(X) − H(X|Y) = H(Y) − H(Y|X)
7. Mutual Information: Interpretation
If X and Y are independent, then I(X;Y) = 0, because knowing one variable reveals nothing about the other.
If X and Y are related by a deterministic, invertible function, I(X;Y) is at its maximum.
I(X;Y) is the reduction of uncertainty in X when Y is observed, and vice versa.
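To make the two extreme cases concrete, a small numpy sketch evaluating I(X;Y) = H(X) − H(X|Y) on toy distributions (the helper name and the example joints are ours, natural-log units):

```python
import numpy as np

def mutual_info(p_xy):
    """I(X;Y) = H(X) - H(X|Y) for a discrete joint distribution p(x, y), in nats."""
    p_xy = np.asarray(p_xy, dtype=float)
    p_x = p_xy.sum(axis=1)
    p_y = p_xy.sum(axis=0)
    H_x = -np.sum(p_x[p_x > 0] * np.log(p_x[p_x > 0]))
    ratio = p_xy / p_y                        # p(x_i, y_j) / p(y_j), broadcast over columns
    mask = p_xy > 0                           # treat 0 * log(0) terms as 0
    H_x_given_y = -np.sum(p_xy[mask] * np.log(ratio[mask]))
    return H_x - H_x_given_y

# Independent X and Y: the joint factorizes, so I(X;Y) = 0.
print(mutual_info(np.outer([0.5, 0.5], [0.7, 0.3])))     # ~0.0

# Deterministic, invertible relation (Y = X): I(X;Y) = H(X), its maximum.
print(mutual_info(np.diag([0.5, 0.5])))                   # ~log(2) ≈ 0.693
```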
8. Mutual Information: Implications
Formulation of a cost function using mutual information: an information-regularized minimax game.
min_G max_D V_I(D, G) = V(D, G) − λ I(c; G(z, c))
z: noise vector representing incompressible, unstructured noise.
c: latent code representing meaningful, structured information.
10. Variational Mutual Information Maximization
In practice, I(c; G(z, c)) cannot be maximized directly because this requires access to the posterior distribution P(c|x), which is intractable.
We can instead obtain a lower bound for I(c; G(z, c)) by using an auxiliary distribution Q(c|x) to approximate P(c|x).
This technique is known as Variational Mutual Information Maximization; the derivation is sketched on the next slide.
11. Variational Mutual Information Maximization
The mutual information is first decomposed into its entropy components.
The auxiliary distribution Q is introduced so that a KL-divergence term between P(c|x) and Q(c|x) appears.
Because the KL divergence is always non-negative, dropping that term yields a lower bound, as written out below.
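The derivation from the InfoGAN paper, written out here for reference (the final step uses the lemma mentioned on the next slide):

```latex
\begin{align*}
I(c;\, G(z,c))
  &= H(c) - H(c \mid G(z,c)) \\
  &= \mathbb{E}_{x \sim G(z,c)}\big[\, \mathbb{E}_{c' \sim P(c \mid x)}[\log P(c' \mid x)] \,\big] + H(c) \\
  &= \mathbb{E}_{x \sim G(z,c)}\big[\, \underbrace{D_{\mathrm{KL}}\big(P(\cdot \mid x)\,\|\,Q(\cdot \mid x)\big)}_{\geq\, 0}
     + \mathbb{E}_{c' \sim P(c \mid x)}[\log Q(c' \mid x)] \,\big] + H(c) \\
  &\geq \mathbb{E}_{x \sim G(z,c)}\big[\, \mathbb{E}_{c' \sim P(c \mid x)}[\log Q(c' \mid x)] \,\big] + H(c) \\
  &= \mathbb{E}_{c \sim P(c),\, x \sim G(z,c)}[\log Q(c \mid x)] + H(c) \;=\; L_I(G, Q)
\end{align*}
```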
12. Variational Mutual Information Maximization
Although H(c) could also be optimized, it is treated as a constant for simplicity. This is done by drawing c from a fixed distribution.
The final equality above, which replaces the expectation over the unknown posterior P(c|x) with an expectation over the fixed prior P(c), is proven by a lemma in the appendix of the paper and holds under suitable regularity conditions.
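Holding H(c) constant simply means sampling c from the same fixed prior at every iteration. A minimal numpy sketch, assuming the MNIST-style setup of one 10-way categorical code, two continuous codes on [-1, 1], and 62 noise dimensions (the helper name and the Gaussian choice for z are ours):

```python
import numpy as np

def sample_latent(batch_size, n_cat=10, n_cont=2, noise_dim=62, rng=np.random):
    """Draw (z, c) from fixed priors so that H(c) stays constant during training."""
    z = rng.normal(size=(batch_size, noise_dim))                 # incompressible noise (Gaussian assumed here)
    cat = rng.randint(n_cat, size=batch_size)                    # uniform categorical code
    c_disc = np.eye(n_cat)[cat]                                  # one-hot encoding of the discrete code
    c_cont = rng.uniform(-1.0, 1.0, size=(batch_size, n_cont))   # uniform continuous codes
    return z, c_disc, c_cont
```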
13. Variational Mutual Information Maximization
Using the lower bound derived above, the information-regularized minimax game becomes:
min_{G,Q} max_D V_InfoGAN(D, G, Q) = V(D, G) − λ L_I(G, Q)
L_I: variational lower bound on the mutual information
λ: weighting hyperparameter
14. Practical Implementation
In practice, we use a neural network to represent Q.
The KL divergence vanishes when Q converges to P, so the bound becomes tight.
Q is just D with an extra fully connected layer; it outputs an estimate of the latent code(s) c (a sketch of this shared architecture follows below).
L_I(G, Q) has been observed to converge faster than the GAN objective.
InfoGAN thus comes essentially for free on top of a GAN.
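A minimal PyTorch sketch of that sharing, assuming a fully connected trunk and illustrative layer sizes (the paper's networks are convolutional; the class and attribute names here are ours):

```python
import torch
import torch.nn as nn

class DiscriminatorWithQ(nn.Module):
    """D and Q share a trunk; each adds its own small head."""
    def __init__(self, img_dim=784, hidden=1024, n_cat=10, n_cont=2):
        super().__init__()
        self.trunk = nn.Sequential(              # shared body (convolutional in the paper)
            nn.Linear(img_dim, hidden),
            nn.LeakyReLU(0.1),
            nn.Linear(hidden, hidden),
            nn.LeakyReLU(0.1),
        )
        self.d_head = nn.Linear(hidden, 1)       # real/fake logit for the GAN game
        self.q_head = nn.Sequential(             # extra FC layer(s) parameterizing Q(c|x)
            nn.Linear(hidden, 128),
            nn.LeakyReLU(0.1),
            nn.Linear(128, n_cat + 2 * n_cont),  # categorical logits + mean/log-std per continuous code
        )

    def forward(self, x):
        h = self.trunk(x)
        return self.d_head(h), self.q_head(h)
```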
16. Additional explanation
Cross-entropy loss is used between the estimate of c and the actual c (some implementations use MSE instead, especially for continuous codes).
For discrete codes, the Q outputs are softmax activations; for continuous codes, they may be sigmoid, tanh, or linear.
The original implementation is more involved: rather than a point estimate of the latent code, Q outputs the parameters of its distribution (mean and standard deviation), and the corresponding log-likelihood is maximized, as sketched below.
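A hedged sketch of that loss term, continuing the DiscriminatorWithQ layout above (the split of q_out, the factored-Gaussian treatment of the continuous codes, and the λ weighting follow the paper's description, but exact details vary across implementations):

```python
import math
import torch
import torch.nn.functional as F

def info_loss(q_out, c_cat_target, c_cont_target, n_cat=10, n_cont=2, lam=1.0):
    """Negative of the variational lower bound L_I (up to the constant H(c))."""
    logits = q_out[:, :n_cat]                        # categorical logits of Q(c|x)
    mu = q_out[:, n_cat:n_cat + n_cont]              # mean of the Gaussian part
    log_std = q_out[:, n_cat + n_cont:]              # log standard deviation of the Gaussian part

    # Discrete code: cross entropy = -E[log Q(c|x)]; target is the class index of the true code.
    loss_disc = F.cross_entropy(logits, c_cat_target)

    # Continuous codes: negative log-likelihood of the true codes under a factored Gaussian.
    var = torch.exp(2.0 * log_std)
    nll = 0.5 * (math.log(2.0 * math.pi) + 2.0 * log_std + (c_cont_target - mu) ** 2 / var)
    loss_cont = nll.sum(dim=1).mean()

    return lam * (loss_disc + loss_cont)
```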
17. Analysis
The information lower bound L_I does not increase when training without the information regularization term (i.e., with an ordinary GAN).
With the regularization, L_I quickly rises to its maximum, which is about 2.3 in this case; this matches H(c) = ln 10 ≈ 2.30 for a uniform 10-way categorical code.
18. Results: MNIST
The discrete latent code is highly correlated with the identity of the generated digit.
It can be used to classify MNIST with a 5% error rate, even though InfoGAN is trained without any labels.
A continuous latent code has captured the angle (rotation) of the digits.
We confirm the validity of the latent codes by extending their range beyond the values seen during training.
The ordinary GAN, in contrast, has learned no such interpretable structure: there is no control over which latent dimension learns what.
19. Results: Faces
Multiple continuous latent codes with values ranging between -1 and 1 are used.
The four latent codes shown appear to have captured azimuth, elevation, lighting, and width.
There is smooth interpolation within and even beyond that range.
Moreover, the other details of the face change as well, producing a much more natural image: it is not a simple case where only the target feature changes while the other factors remain unnaturally constant.
23. Conclusion
InfoGAN can learn and interpret salient features of the data without any labels or supervision.
It discovers salient latent factors of variation automatically and exposes them through latent codes that carry that information.