This document proposes the Gumbel-Softmax distribution as a way to sample differentiably from a categorical distribution. Sampling from a categorical distribution is a non-differentiable operation, so models with categorical latent variables cannot be trained directly with backpropagation. The REINFORCE algorithm sidesteps this with a likelihood-ratio gradient estimator, but its gradient estimates suffer from high variance. Gumbel-Softmax instead replaces the categorical distribution with a continuous relaxation built on the Gumbel-Max trick: Gumbel noise is added to the log-probabilities, and the argmax is softened into a temperature-controlled softmax, allowing gradients to flow through the sampling step. It shows that this continuous relaxation behaves similarly to the discrete categorical distribution while remaining differentiable, enabling lower-variance training of models with categorical latent variables.
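As a rough sketch of the mechanism described above (the function name, NumPy usage, and example probabilities are illustrative choices, not taken from the original):

```python
import numpy as np

def sample_gumbel_softmax(logits, temperature=1.0, rng=None):
    """Draw one relaxed (differentiable) sample from a categorical
    distribution given unnormalized log-probabilities.

    Gumbel-Max trick: argmax(logits + g), with g ~ Gumbel(0, 1), is an
    exact categorical sample; replacing the argmax with a
    temperature-controlled softmax gives the Gumbel-Softmax relaxation.
    """
    rng = rng or np.random.default_rng()
    u = rng.uniform(low=1e-20, high=1.0, size=logits.shape)  # avoid log(0)
    gumbel_noise = -np.log(-np.log(u))                       # Gumbel(0, 1) samples
    z = (logits + gumbel_noise) / temperature
    z = z - z.max()                                          # numerical stability
    return np.exp(z) / np.exp(z).sum()

# Example: class probabilities [0.1, 0.6, 0.3] (hypothetical values).
logits = np.log(np.array([0.1, 0.6, 0.3]))
print(sample_gumbel_softmax(logits, temperature=0.5))
```

As the temperature approaches zero, the relaxed sample approaches a one-hot categorical draw (matching the Gumbel-Max argmax); at higher temperatures it becomes smoother and closer to uniform, which is the trade-off between bias and gradient variance that the relaxation exposes.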