
# Improving Variational Inference with Inverse Autoregressive Flow


These slides were created for a NIPS 2016 study meetup. IAF and other related research are briefly explained.

paper:
Diederik P. Kingma et al., "Improving Variational Inference with Inverse Autoregressive Flow", 2016
https://papers.nips.cc/paper/6581-improving-variational-autoencoders-with-inverse-autoregressive-flow

Published in: Data & Analytics


### Slide 1: Title

Improving Variational Inference with Inverse Autoregressive Flow
Presented Jan. 19, 2017 by Tatsuya Shirakawa (tatsuya@abeja.asia)

Paper authors: Diederik P. Kingma (OpenAI), Tim Salimans (OpenAI), Rafal Jozefowicz (OpenAI), Xi Chen (OpenAI), Ilya Sutskever (OpenAI), Max Welling (University of Amsterdam)
### Slide 2: Variational Autoencoder (VAE)

The evidence lower bound (ELBO):

$$
\log p(\boldsymbol{x}) \ge \mathbb{E}_{q(\boldsymbol{z}|\boldsymbol{x})}\!\left[\log p(\boldsymbol{x},\boldsymbol{z}) - \log q(\boldsymbol{z}|\boldsymbol{x})\right]
= \log p(\boldsymbol{x}) - D_{\mathrm{KL}}\!\left(q(\boldsymbol{z}|\boldsymbol{x}) \,\|\, p(\boldsymbol{z}|\boldsymbol{x})\right)
$$

$$
= \mathbb{E}_{q(\boldsymbol{z}|\boldsymbol{x})}\!\left[\log p(\boldsymbol{x}|\boldsymbol{z})\right] - D_{\mathrm{KL}}\!\left(q(\boldsymbol{z}|\boldsymbol{x}) \,\|\, p(\boldsymbol{z})\right) =: \mathcal{L}(\boldsymbol{x};\boldsymbol{\theta})
$$

- Generative model: $\boldsymbol{z} \sim p(\boldsymbol{z};\boldsymbol{\eta})$, $\boldsymbol{x} \sim p(\boldsymbol{x}|\boldsymbol{z};\boldsymbol{\eta})$. Ideal objective: maximize $\frac{1}{N}\sum_{n=1}^{N} \log p(\boldsymbol{x}_n;\boldsymbol{\eta})$ over $\boldsymbol{\eta}$.
- Inference model: $\boldsymbol{z} \sim q(\boldsymbol{z}|\boldsymbol{x};\boldsymbol{\nu})$. Tractable surrogate: maximize the ELBO $\frac{1}{N}\sum_{n=1}^{N} \mathcal{L}(\boldsymbol{x}_n;\boldsymbol{\theta})$ over $\boldsymbol{\theta} = (\boldsymbol{\eta}, \boldsymbol{\nu})$.

(Figure: the gap between the ELBO and $\log p(\boldsymbol{x})$ is $D_{\mathrm{KL}}(q(\boldsymbol{z}|\boldsymbol{x};\boldsymbol{\nu}) \,\|\, p(\boldsymbol{z}|\boldsymbol{x};\boldsymbol{\mu}))$.)
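As a concrete illustration of this slide, here is a minimal Monte Carlo estimate of the ELBO for a diagonal-Gaussian $q(z|x)$ with a standard-normal prior. This is a sketch of mine, not from the slides: the function name `elbo_diag_gaussian` and the toy decoder passed in as `decode_logp` are assumptions.

```python
import numpy as np

def elbo_diag_gaussian(x, mu_q, log_sigma_q, decode_logp, n_samples=1000, seed=0):
    """Monte Carlo estimate of L(x) = E_q[log p(x|z)] - KL(q(z|x) || p(z))
    with q(z|x) = N(mu_q, diag(sigma_q^2)) and prior p(z) = N(0, I)."""
    rng = np.random.default_rng(seed)
    sigma_q = np.exp(log_sigma_q)
    # Reparameterization trick: z = mu + sigma * eps, eps ~ N(0, I)
    eps = rng.standard_normal((n_samples, mu_q.size))
    z = mu_q + sigma_q * eps
    # Closed-form KL between N(mu, sigma^2) and N(0, 1), summed over dimensions
    kl = 0.5 * np.sum(mu_q**2 + sigma_q**2 - 2.0 * log_sigma_q - 1.0)
    # Monte Carlo reconstruction term E_q[log p(x|z)]
    recon = float(np.mean([decode_logp(x, zi) for zi in z]))
    return recon - kl
```

With a toy decoder $p(x|z) = N(x \mid z, 1)$ and $q$ equal to the prior, the KL term vanishes and the estimate reduces to the expected reconstruction log-likelihood.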
### Slide 3: Requirements for the inference model q(z|x)

Computational tractability:
1. Computationally cheap to compute and differentiate
2. Computationally cheap to sample from
3. Amenable to parallel computation

Accuracy:
4. Sufficiently flexible to match the true posterior p(z|x)
### Slide 4: Previous designs of q(z|x)

Basic designs:
- Diagonal Gaussian distribution
- Full-covariance Gaussian distribution

Designs based on change of variables:
- NICE: L. Dinh et al., "NICE: Non-linear Independent Components Estimation", 2014
- Normalizing Flow: D. J. Rezende et al., "Variational Inference with Normalizing Flows", ICML 2015

Designs based on adding auxiliary variables:
- Hamiltonian Flow / Hamiltonian Variational Inference: T. Salimans et al., "Markov Chain Monte Carlo and Variational Inference: Bridging the Gap", 2014
### Slide 5: Diagonal / full-covariance Gaussian distributions

Diagonal Gaussian — efficient but not flexible:

$$q(\boldsymbol{z}|\boldsymbol{x}) = \prod_i N\!\left(z_i \mid \mu_i(\boldsymbol{x}), \sigma_i(\boldsymbol{x})\right)$$

Full-covariance Gaussian — neither efficient nor flexible (still unimodal):

$$q(\boldsymbol{z}|\boldsymbol{x}) = N\!\left(\boldsymbol{z} \mid \boldsymbol{\mu}(\boldsymbol{x}), \boldsymbol{\Sigma}(\boldsymbol{x})\right)$$

Checklist (diagonal / full-covariance):
1. Computationally cheap to compute and differentiate: ✓ / ✗
2. Computationally cheap to sample from: ✓ / ✗
3. Parallel computation: ✓ / ✗
4. Sufficiently flexible to match the true posterior p(z|x): ✗ / ✗
### Slide 6: Change-of-variables-based methods

Transform a simple $q(\boldsymbol{z}_0|\boldsymbol{x})$ into a more powerful distribution $q(\boldsymbol{z}|\boldsymbol{x})$ via a sequence of changes of variables:

$$\boldsymbol{z}_t = f_t(\boldsymbol{z}_{t-1}), \qquad
q(\boldsymbol{z}_t|\boldsymbol{x}) = q(\boldsymbol{z}_{t-1}|\boldsymbol{x}) \left|\det \frac{d f_t(\boldsymbol{z}_{t-1})}{d\boldsymbol{z}_{t-1}}\right|^{-1}$$

$$\Rightarrow\; \log q(\boldsymbol{z}_T|\boldsymbol{x}) = \log q(\boldsymbol{z}_0|\boldsymbol{x}) - \sum_t \log \left|\det \frac{d f_t(\boldsymbol{z}_{t-1})}{d\boldsymbol{z}_{t-1}}\right|$$

- NICE: L. Dinh et al., "NICE: Non-linear Independent Components Estimation", 2014
- Normalizing Flow: D. J. Rezende et al., "Variational Inference with Normalizing Flows", ICML 2015
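The log-density update on this slide can be checked numerically for the simplest invertible map, an elementwise affine transform. A minimal sketch of mine (the function name is an assumption), verified against the closed-form Gaussian density:

```python
import numpy as np

def log_q_after_affine(log_q_prev, a):
    """Change-of-variables correction for the elementwise affine map
    f(z) = a * z + b: the Jacobian is diag(a), so
    log q(z_t) = log q(z_{t-1}) - sum_i log|a_i|."""
    return log_q_prev - np.sum(np.log(np.abs(a)))
```

If $z_0 \sim N(0,1)$ and $z_1 = a z_0 + b$, then $z_1 \sim N(b, a^2)$, and the corrected log-density agrees exactly with $\log N(z_1; b, a^2)$.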
### Slide 7: Normalizing Flow

Transformation:

$$\boldsymbol{z}_t = \boldsymbol{z}_{t-1} + \boldsymbol{u}_t\, f_t\!\left(\boldsymbol{w}_t^{\top} \boldsymbol{z}_{t-1} + b_t\right)$$

Key features:
- Determinants are computable

Drawbacks:
- Information passes through a single scalar bottleneck $\boldsymbol{w}_t^{\top} \boldsymbol{z}_{t-1} + b_t$

Checklist:
1. Computationally cheap to compute and differentiate: ✓
2. Computationally cheap to sample from: ✓
3. Parallel computation: ✗
4. Sufficiently flexible to match the true posterior p(z|x): ✗
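The step above is the planar flow of Rezende & Mohamed, whose log-determinant is available in closed form as $\log\lvert 1 + \boldsymbol{u}^{\top}\psi(\boldsymbol{z})\rvert$ with $\psi(\boldsymbol{z}) = h'(\boldsymbol{w}^{\top}\boldsymbol{z}+b)\,\boldsymbol{w}$. A minimal sketch (function name is mine; $h = \tanh$ is an assumed choice of nonlinearity):

```python
import numpy as np

def planar_flow(z, u, w, b):
    """One planar Normalizing Flow step z' = z + u * tanh(w.z + b)
    and its log|det Jacobian| = log|1 + u.psi(z)|,
    where psi(z) = (1 - tanh^2(w.z + b)) * w."""
    a = w @ z + b                       # scalar bottleneck
    z_new = z + u * np.tanh(a)
    psi = (1.0 - np.tanh(a) ** 2) * w   # derivative of tanh times w
    log_det = np.log(np.abs(1.0 + u @ psi))
    return z_new, log_det
```

The closed form can be cross-checked against a finite-difference Jacobian, which is how the test below validates it.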
### Slide 8: Hamiltonian Flow / Hamiltonian Variational Inference

ELBO with auxiliary variables $\boldsymbol{y}$:

$$\log p(\boldsymbol{x}) \ge \log p(\boldsymbol{x}) - D_{\mathrm{KL}}\!\left(q(\boldsymbol{z}|\boldsymbol{x}) \,\|\, p(\boldsymbol{z}|\boldsymbol{x})\right) - D_{\mathrm{KL}}\!\left(q(\boldsymbol{y}|\boldsymbol{x},\boldsymbol{z}) \,\|\, r(\boldsymbol{y}|\boldsymbol{x},\boldsymbol{z})\right) =: \mathcal{L}(\boldsymbol{x})$$

Drawing $(\boldsymbol{y}, \boldsymbol{z})$ via Hamiltonian Monte Carlo (HMC):

$$(y_t, z_t) \sim \mathrm{HMC}(y_t, z_t \mid y_{t-1}, z_{t-1})$$

Key features:
- Capability to sample from the exact posterior

Drawbacks:
- Long mixing time and a lower ELBO

Checklist:
1. Computationally cheap to compute and differentiate: ✗
2. Computationally cheap to sample from: ✗
3. Parallel computation: ✗
4. Sufficiently flexible to match the true posterior p(z|x): ✓
### Slide 9: NICE

Transform only half of z at each step:

$$\boldsymbol{z}_t = \left(\boldsymbol{z}_t^{\alpha}, \boldsymbol{z}_t^{\beta}\right) = \left(\boldsymbol{z}_{t-1}^{\alpha},\; \boldsymbol{z}_{t-1}^{\beta} + f_t\!\left(\boldsymbol{x}, \boldsymbol{z}_{t-1}^{\alpha}\right)\right)$$

Key features:
- The determinant of the Jacobian $\det \dfrac{d\boldsymbol{z}_t}{d\boldsymbol{z}_{t-1}}$ is always 1

Drawbacks:
- Limited form of transformation
- Less powerful than Normalizing Flow (next)

Checklist:
1. Computationally cheap to compute and differentiate: ✓
2. Computationally cheap to sample from: ✓
3. Parallel computation: ✗
4. Sufficiently flexible to match the true posterior p(z|x): ✗
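The additive coupling on this slide can be sketched in a few lines; keeping the first half fixed makes the map trivially invertible with a unit-determinant Jacobian. A minimal sketch of mine (function names and the toy `shift_fn` are assumptions):

```python
import numpy as np

def nice_coupling(z, shift_fn):
    """NICE additive coupling: keep the first half z_a, shift the second
    half z_b by a function of z_a. The Jacobian is unit lower-triangular,
    so its determinant is exactly 1 (no log-det correction needed)."""
    d = z.size // 2
    za, zb = z[:d], z[d:]
    return np.concatenate([za, zb + shift_fn(za)])

def nice_coupling_inverse(z, shift_fn):
    """Exact inverse: subtract the same shift computed from the kept half."""
    d = z.size // 2
    za, zb = z[:d], z[d:]
    return np.concatenate([za, zb - shift_fn(za)])
```

Note that `shift_fn` can be an arbitrarily complex network; invertibility never depends on inverting it, which is the appeal of coupling layers.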
### Slide 10: Autoregressive Flow (AF, proposed)

Autoregressive transform ($d\mu_{t,i}/dz_{t,j} = d\sigma_{t,i}/dz_{t,j} = 0$ for $i \le j$):

$$z_{t,i} = \mu_{t,i}\!\left(\boldsymbol{z}_{t,0:i-1}\right) + \sigma_{t,i}\!\left(\boldsymbol{z}_{t,0:i-1}\right) \odot z_{t-1,i}$$

Key features:
- Powerful
- Easy to compute the determinant: $\det\left(\partial \boldsymbol{z}_t / \partial \boldsymbol{z}_{t-1}\right) = \prod_i \sigma_{t,i}(\boldsymbol{z}_{t-1})$

Drawbacks:
- Difficult to parallelize (each dimension must wait for the previous ones)

Checklist:
1. Computationally cheap to compute and differentiate: ✓
2. Computationally cheap to sample from: ✓
3. Parallel computation: ✗
4. Sufficiently flexible to match the true posterior p(z|x): ✓
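A minimal sequential sketch of the AF transform and its log-determinant, illustrating why it is hard to parallelize: each output dimension needs the already-computed earlier dimensions. The function name and the toy μ/σ in the test are my own, not from the paper.

```python
import numpy as np

def af_sample(z_prev, mu_fn, sigma_fn):
    """Autoregressive Flow: z[i] = mu_i(z[:i]) + sigma_i(z[:i]) * z_prev[i].
    mu_i and sigma_i depend on the *new* dimensions z[0..i-1], so the loop
    is inherently sequential. The Jacobian wrt z_prev is triangular with
    diagonal sigma_i, hence log|det| = sum_i log sigma_i."""
    d = z_prev.size
    z = np.zeros(d)
    log_det = 0.0
    for i in range(d):
        m = mu_fn(i, z[:i])       # mu_i(z_{t,0:i-1})
        s = sigma_fn(i, z[:i])    # sigma_i(z_{t,0:i-1}), assumed positive
        z[i] = m + s * z_prev[i]
        log_det += np.log(s)
    return z, log_det
```

The triangular-Jacobian claim can again be validated against a finite-difference Jacobian, as in the test below.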
### Slide 11: Inverse Autoregressive Flow (IAF, proposed)

Inverting AF ($\boldsymbol{\mu}_t, \boldsymbol{\sigma}_t$ are again autoregressive):

$$\boldsymbol{z}_t = \frac{\boldsymbol{z}_{t-1} - \boldsymbol{\mu}_t(\boldsymbol{z}_{t-1})}{\boldsymbol{\sigma}_t(\boldsymbol{z}_{t-1})}$$

Key features:
- Equally powerful as AF
- Easy to compute the determinant: $\det\left(\partial \boldsymbol{z}_t / \partial \boldsymbol{z}_{t-1}\right) = 1 / \prod_i \sigma_{t,i}(\boldsymbol{z}_{t-1})$
- Parallelizable: $\boldsymbol{\mu}_t$ and $\boldsymbol{\sigma}_t$ depend only on the input $\boldsymbol{z}_{t-1}$, so all dimensions transform at once

Checklist:
1. Computationally cheap to compute and differentiate: ✓
2. Computationally cheap to sample from: ✓
3. Parallel computation: ✓
4. Sufficiently flexible to match the true posterior p(z|x): ✓
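A minimal sketch of one IAF step (function name and the toy μ/σ are mine). Because μ and σ are functions of the input alone, the whole vector transforms in one shot, and applying it to the output of the corresponding AF step recovers the AF input exactly:

```python
import numpy as np

def iaf_step(z_prev, mu_fn, sigma_fn):
    """Inverse Autoregressive Flow:
    z[i] = (z_prev[i] - mu_i(z_prev[:i])) / sigma_i(z_prev[:i]).
    mu and sigma depend only on the *input* z_prev, so every dimension can
    be computed in parallel; log|det| = -sum_i log sigma_i."""
    d = z_prev.size
    mu = np.array([mu_fn(i, z_prev[:i]) for i in range(d)])
    sigma = np.array([sigma_fn(i, z_prev[:i]) for i in range(d)])
    z = (z_prev - mu) / sigma       # fully vectorized update
    log_det = -np.sum(np.log(sigma))
    return z, log_det
```

The test below builds the sequential AF forward pass inline with the same μ/σ and checks that `iaf_step` inverts it.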
### Slide 12: IAF through a Masked Autoencoder (MADE)

Model the autoregressive $\boldsymbol{\mu}_t$ and $\boldsymbol{\sigma}_t$ with MADE:
- Masks remove all paths from "future" inputs in an ordinary autoencoder
- MADE is itself a probabilistic model: $p(\boldsymbol{x}) = \prod_i p\!\left(x_i \mid \boldsymbol{x}_{0:i-1}\right)$
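A sketch of MADE-style mask construction for a single hidden layer (my own minimal version; the paper's MADE generalizes to deeper networks and other orderings). Each hidden unit gets a random degree, and weights survive only if they respect the ordering, so no output can see its own or any later input:

```python
import numpy as np

def made_masks(d_in, d_hidden, seed=0):
    """Binary masks for a 1-hidden-layer MADE over d_in variables.
    Input j has degree j+1; hidden unit k gets a random degree m[k] in
    {1..d_in-1}. A hidden unit may see input j iff m[k] >= j+1, and
    output i may see hidden unit k iff i+1 > m[k]. Composed, output i
    only ever depends on inputs 0..i-1, giving the autoregressive
    factorization p(x) = prod_i p(x_i | x_{0:i-1})."""
    rng = np.random.default_rng(seed)
    degrees_in = np.arange(1, d_in + 1)                  # 1..d_in
    degrees_h = rng.integers(1, d_in, size=d_hidden)     # 1..d_in-1
    mask1 = (degrees_h[:, None] >= degrees_in[None, :]).astype(float)  # hidden x in
    mask2 = (degrees_in[:, None] > degrees_h[None, :]).astype(float)   # out x hidden
    return mask1, mask2
```

Multiplying the two masks gives the input-to-output connectivity; it must be strictly lower triangular for the factorization to hold, which is what the test checks.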
### Slide 13: Experiments

IAF is evaluated on image-generation models.

Models for MNIST:
- Convolutional VAE with ResNet blocks
- IAF = 2-layer MADE
- IAF transformations are stacked, with the dimension ordering reversed alternately

Models for CIFAR-10: a much more complicated architecture.
### Slide 14: MNIST results (figure)

### Slide 15: CIFAR-10 results (figure)
### Slide 16: IAF in one slide

(Figure: both AF and IAF successively transform $q(\boldsymbol{z}_0|\boldsymbol{x};\boldsymbol{\nu}_0)$ through $q(\boldsymbol{z}_t|\boldsymbol{x};\boldsymbol{\nu}_t)$ toward $q(\boldsymbol{z}_T|\boldsymbol{x};\boldsymbol{\nu}_T)$, shrinking $D_{\mathrm{KL}}(q \,\|\, p)$ to the true posterior $p(\boldsymbol{z}|\boldsymbol{x};\boldsymbol{\mu}^*)$.)

IAF is:
- Easy to compute and differentiate
- Easy to sample from
- Parallelizable
- Flexible
### Slide 17: We are hiring!

http://www.abeja.asia/
https://www.wantedly.com/companies/abeja