Improving Variational Inference with Inverse Autoregressive Flow

This slide deck was created for a NIPS 2016 study meetup.
It briefly explains IAF and related research.

Paper:
Diederik P. Kingma et al., "Improving Variational Inference with Inverse Autoregressive Flow", NIPS 2016
https://papers.nips.cc/paper/6581-improving-variational-autoencoders-with-inverse-autoregressive-flow

  1. Improving Variational Inference with Inverse Autoregressive Flow
     Jan. 19, 2017, Tatsuya Shirakawa (tatsuya@abeja.asia)
     Paper authors: Diederik P. Kingma (OpenAI), Tim Salimans (OpenAI), Rafal Jozefowicz (OpenAI), Xi Chen (OpenAI), Ilya Sutskever (OpenAI), Max Welling (University of Amsterdam)
  2. Variational Autoencoder (VAE)
     ELBO, with $\boldsymbol{\theta} = (\boldsymbol{\eta}, \boldsymbol{\nu})$:
     $\log p(\mathbf{x}) \ge \mathbb{E}_{q(\mathbf{z}|\mathbf{x})}[\log p(\mathbf{x}, \mathbf{z}) - \log q(\mathbf{z}|\mathbf{x})] = \log p(\mathbf{x}) - D_{\mathrm{KL}}(q(\mathbf{z}|\mathbf{x}) \,\|\, p(\mathbf{z}|\mathbf{x})) = \mathbb{E}_{q(\mathbf{z}|\mathbf{x})}[\log p(\mathbf{x}|\mathbf{z})] - D_{\mathrm{KL}}(q(\mathbf{z}|\mathbf{x}) \,\|\, p(\mathbf{z})) =: \mathcal{L}(\mathbf{x}; \boldsymbol{\theta})$
     Generative model: $\mathbf{z} \sim p(\mathbf{z}; \boldsymbol{\eta})$, $\mathbf{x} \sim p(\mathbf{x}|\mathbf{z}; \boldsymbol{\eta})$, trained by $\max_{\boldsymbol{\eta}} \frac{1}{N} \sum_{n=1}^{N} \log p(\mathbf{x}_n; \boldsymbol{\eta})$
     Inference model: $\mathbf{z} \sim q(\mathbf{z}|\mathbf{x}; \boldsymbol{\nu})$, trained by $\max_{\boldsymbol{\theta} = (\boldsymbol{\eta}, \boldsymbol{\nu})} \frac{1}{N} \sum_{n=1}^{N} \mathcal{L}(\mathbf{x}_n; \boldsymbol{\theta})$
     [Figure: $D_{\mathrm{KL}}(q(\mathbf{z}|\mathbf{x}; \boldsymbol{\nu}) \,\|\, p(\mathbf{z}|\mathbf{x}; \boldsymbol{\mu}))$ between the approximate posterior $q$ and the true posterior $p$]
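To make the objective concrete, here is a minimal NumPy sketch of a single-sample Monte Carlo ELBO estimate, assuming a diagonal-Gaussian $q(\mathbf{z}|\mathbf{x})$, a standard-normal prior $p(\mathbf{z})$, and a Bernoulli decoder; `decode` is a hypothetical stand-in for the generative network, not something defined on the slide.

```python
import numpy as np

def elbo_estimate(x, mu, log_sigma, decode):
    """Single-sample Monte Carlo estimate of
    L(x) = E_q[log p(x|z)] - D_KL(q(z|x) || p(z))
    for q(z|x) = N(mu, diag(sigma^2)) and p(z) = N(0, I)."""
    eps = np.random.randn(*mu.shape)
    z = mu + np.exp(log_sigma) * eps   # reparameterized sample z ~ q(z|x)
    p = decode(z)                      # hypothetical decoder: Bernoulli means of p(x|z)
    log_px_given_z = np.sum(x * np.log(p) + (1 - x) * np.log(1 - p))
    # Closed-form KL(N(mu, sigma^2) || N(0, I)), summed over dimensions.
    kl = 0.5 * np.sum(np.exp(2 * log_sigma) + mu**2 - 1 - 2 * log_sigma)
    return log_px_given_z - kl
```

Maximizing this estimate with respect to both the decoder parameters and the networks producing `mu` and `log_sigma` corresponds to the joint optimization over $\boldsymbol{\theta} = (\boldsymbol{\eta}, \boldsymbol{\nu})$ on the slide.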
  3. Requirements for the inference model q(z|x)
     Computational tractability:
     1. Computationally cheap to compute and differentiate
     2. Computationally cheap to sample from
     3. Parallel computation
     Accuracy:
     4. Sufficiently flexible to match the true posterior p(z|x)
     [Figure: $D_{\mathrm{KL}}(q(\mathbf{z}|\mathbf{x}; \boldsymbol{\nu}) \,\|\, p(\mathbf{z}|\mathbf{x}; \boldsymbol{\mu}))$ between the approximate posterior and the true posterior]
  4. Previous designs of q(z|x)
     Basic designs:
     - Diagonal Gaussian distribution
     - Full-covariance Gaussian distribution
     Designs based on change of variables:
     - NICE: L. Dinh et al., "NICE: Non-linear Independent Components Estimation", 2014
     - Normalizing Flow: D. J. Rezende et al., "Variational Inference with Normalizing Flows", ICML 2015
     Designs based on adding auxiliary variables:
     - Hamiltonian Flow / Hamiltonian Variational Inference: T. Salimans et al., "Markov Chain Monte Carlo and Variational Inference: Bridging the Gap", 2014
  5. Diagonal / full-covariance Gaussian distributions
     Diagonal: efficient but not flexible
     $q(\mathbf{z}|\mathbf{x}) = \prod_i N(z_i \mid \mu_i(\mathbf{x}), \sigma_i(\mathbf{x}))$
     Full covariance: neither efficient nor flexible (unimodal)
     $q(\mathbf{z}|\mathbf{x}) = N(\mathbf{z} \mid \boldsymbol{\mu}(\mathbf{x}), \boldsymbol{\Sigma}(\mathbf{x}))$
     Checklist (diagonal / full covariance):
     1. Computationally cheap to compute and differentiate: ✓ / ✗
     2. Computationally cheap to sample from: ✓ / ✗
     3. Parallel computation: ✓ / ✗
     4. Sufficiently flexible to match the true posterior p(z|x): ✗ / ✗
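As a rough illustration of the efficiency gap (toy parameters assumed, not values from the slide): sampling from the diagonal case is an elementwise $O(d)$ operation, while the full-covariance case needs an $O(d^3)$ Cholesky factorization.

```python
import numpy as np

d = 8
mu = np.zeros(d)

# Diagonal Gaussian: elementwise reparameterization, O(d).
sigma = np.ones(d)
z_diag = mu + sigma * np.random.randn(d)

# Full-covariance Gaussian: factorize Sigma (O(d^3)), then z = mu + L @ eps.
Sigma = np.eye(d) + 0.1 * np.ones((d, d))   # toy positive-definite covariance
L = np.linalg.cholesky(Sigma)
z_full = mu + L @ np.random.randn(d)
```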
  6. Change-of-variables-based methods
     Transform $q(\mathbf{z}_0|\mathbf{x})$ into a more powerful distribution $q(\mathbf{z}|\mathbf{x})$ via sequential application of changes of variables:
     $\mathbf{z}_t = f_t(\mathbf{z}_{t-1})$
     $q(\mathbf{z}_t|\mathbf{x}) = q(\mathbf{z}_{t-1}|\mathbf{x}) \left| \det \frac{d f_t(\mathbf{z}_{t-1})}{d \mathbf{z}_{t-1}} \right|^{-1}$
     $\Rightarrow \log q(\mathbf{z}_T|\mathbf{x}) = \log q(\mathbf{z}_0|\mathbf{x}) - \sum_t \log \left| \det \frac{d f_t(\mathbf{z}_{t-1})}{d \mathbf{z}_{t-1}} \right|$
     Examples:
     - NICE: L. Dinh et al., "NICE: Non-linear Independent Components Estimation", 2014
     - Normalizing Flow: D. J. Rezende et al., "Variational Inference with Normalizing Flows", ICML 2015
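The log-density bookkeeping on this slide reduces to accumulating log-Jacobian terms; a minimal sketch, assuming each flow step is a callable that returns the transformed sample together with its log |det| contribution:

```python
def flow_log_density(z0, log_q0, steps):
    """Push z0 (with base log-density log q(z0|x)) through invertible maps f_1..f_T.
    Each step returns (z_t, log|det df_t/dz_{t-1}|), so
    log q(z_T|x) = log q(z_0|x) - sum_t log|det df_t/dz_{t-1}|."""
    z, log_q = z0, log_q0
    for f in steps:
        z, log_det = f(z)
        log_q -= log_det
    return z, log_q
```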
  7. Normalizing Flow
     Transformation via $\mathbf{z}_t = \mathbf{z}_{t-1} + \mathbf{u}_t f_t(\mathbf{w}_t^T \mathbf{z}_{t-1} + b_t)$
     Key features:
     - Determinants are computable
     Drawbacks:
     - Information goes through a single scalar bottleneck $\mathbf{w}_t^T \mathbf{z}_{t-1} + b_t$
     Checklist:
     1. Computationally cheap to compute and differentiate: ✓
     2. Computationally cheap to sample from: ✓
     3. Parallel computation: ✗
     4. Sufficiently flexible to match the true posterior p(z|x): ✗
     [Figure: one flow step mapping $\mathbf{z}_{t-1}$ to $\mathbf{z}_t$ through the single bottleneck $\mathbf{w}_t^T \mathbf{z}_{t-1} + b_t$]
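A planar flow step of exactly this form, with the log-determinant obtained from the matrix determinant lemma; the tanh nonlinearity is one common choice, not mandated by the slide:

```python
import numpy as np

def planar_flow_step(z, u, w, b):
    """One step z' = z + u * tanh(w.z + b); the scalar w.z + b is the
    single bottleneck noted above. Matrix determinant lemma:
    log|det dz'/dz| = log|1 + u.psi| with psi = tanh'(w.z + b) * w."""
    pre = w @ z + b                       # scalar bottleneck
    z_new = z + u * np.tanh(pre)
    psi = (1.0 - np.tanh(pre) ** 2) * w   # tanh'(pre) * w
    log_det = np.log(np.abs(1.0 + u @ psi))
    return z_new, log_det
```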
  8. Hamiltonian Flow / Hamiltonian Variational Inference
     ELBO with auxiliary variables $\mathbf{y}$:
     $\log p(\mathbf{x}) \ge \log p(\mathbf{x}) - D_{\mathrm{KL}}(q(\mathbf{z}|\mathbf{x}) \,\|\, p(\mathbf{z}|\mathbf{x})) - \mathbb{E}_{q(\mathbf{z}|\mathbf{x})}\left[ D_{\mathrm{KL}}(q(\mathbf{y}|\mathbf{x}, \mathbf{z}) \,\|\, r(\mathbf{y}|\mathbf{x}, \mathbf{z})) \right] =: \mathcal{L}(\mathbf{x})$
     Drawing $(\mathbf{y}, \mathbf{z})$ via HMC: $(\mathbf{y}_t, \mathbf{z}_t) \sim \mathrm{HMC}(\mathbf{y}_t, \mathbf{z}_t \mid \mathbf{y}_{t-1}, \mathbf{z}_{t-1})$
     Key features:
     - Capable of sampling from the exact posterior
     Drawbacks:
     - Long mixing time and a lower ELBO
     Checklist:
     1. Computationally cheap to compute and differentiate: ✗
     2. Computationally cheap to sample from: ✗
     3. Parallel computation: ✗
     4. Sufficiently flexible to match the true posterior p(z|x): ✓
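For orientation, the deterministic leapfrog integrator inside one HMC transition might look like the sketch below; the step size and trajectory length are illustrative, and the Metropolis accept/reject step that completes the kernel is omitted:

```python
import numpy as np

def leapfrog(z, y, grad_log_p, step=0.1, n_steps=10):
    """Leapfrog trajectory for HMC: z is the latent position, y the auxiliary
    momentum, grad_log_p the gradient of log p(z|x) (up to a constant)."""
    y = y + 0.5 * step * grad_log_p(z)    # half momentum step
    for _ in range(n_steps - 1):
        z = z + step * y                  # full position step
        y = y + step * grad_log_p(z)      # full momentum step
    z = z + step * y
    y = y + 0.5 * step * grad_log_p(z)    # final half momentum step
    return z, y
```

The sequential trajectory, and the need to simulate many such transitions before the chain mixes, is the source of the slide's cost and mixing-time drawbacks.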
  9. NICE
     Transform only half of z at each step:
     $\mathbf{z}_t = (\mathbf{z}_t^{\alpha}, \mathbf{z}_t^{\beta}) = (\mathbf{z}_{t-1}^{\alpha}, \mathbf{z}_{t-1}^{\beta} + f_t(\mathbf{x}, \mathbf{z}_{t-1}^{\alpha}))$
     Key features:
     - The determinant of the Jacobian, $\det \frac{\partial \mathbf{z}_t}{\partial \mathbf{z}_{t-1}}$, is always 1
     Drawbacks:
     - Limited form of transformation
     - Less powerful than Normalizing Flow
     Checklist:
     1. Computationally cheap to compute and differentiate: ✓
     2. Computationally cheap to sample from: ✓
     3. Parallel computation: ✗
     4. Sufficiently flexible to match the true posterior p(z|x): ✗
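An additive coupling layer of this kind is simple to write down; a sketch, where `f_t` is any (hypothetical) network acting on the untouched half. The inverse is the same shift with the opposite sign, which is why the Jacobian determinant is exactly 1:

```python
import numpy as np

def additive_coupling(z, f_t, forward=True):
    """NICE additive coupling: split z into halves (alpha, beta) and shift only
    beta by f_t(alpha). The Jacobian is unit triangular, so |det| = 1,
    and inversion just subtracts the same shift."""
    d = len(z) // 2
    alpha, beta = z[:d], z[d:]
    shift = f_t(alpha)
    beta = beta + shift if forward else beta - shift
    return np.concatenate([alpha, beta])
```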
  10. Autoregressive Flow (proposed)
      Autoregressive flow ($\partial \mu_{t,i} / \partial z_{t,j} = \partial \sigma_{t,i} / \partial z_{t,j} = 0$ if $i \le j$):
      $z_{t,i} = \mu_{t,i}(\mathbf{z}_{t, 0:i-1}) + \sigma_{t,i}(\mathbf{z}_{t, 0:i-1}) \odot z_{t-1,i}$
      Key features:
      - Powerful
      - Easy to compute $\det \partial \mathbf{z}_t / \partial \mathbf{z}_{t-1} = \prod_i \sigma_{t,i}(\mathbf{z}_{t-1})$
      Drawbacks:
      - Difficult to parallelize (each dimension conditions on the previously generated ones)
      Checklist:
      1. Computationally cheap to compute and differentiate: ✓
      2. Computationally cheap to sample from: ✓
      3. Parallel computation: ✗
      4. Sufficiently flexible to match the true posterior p(z|x): ✓
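The parallelization drawback is easiest to see in code: generating $\mathbf{z}_t$ requires a loop over dimensions. In this sketch, `mu` and `sigma` are hypothetical autoregressive networks evaluated on the prefix $\mathbf{z}_{t, 0:i-1}$:

```python
import numpy as np

def af_sample(z_prev, mu, sigma):
    """Autoregressive-flow sampling: z_{t,i} = mu_i(z_{t,0:i-1})
    + sigma_i(z_{t,0:i-1}) * z_{t-1,i}. The loop is inherently sequential."""
    z = np.zeros_like(z_prev, dtype=float)
    for i in range(len(z_prev)):           # cannot be vectorized across i
        z[i] = mu(z[:i], i) + sigma(z[:i], i) * z_prev[i]
    return z
```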
  11. Inverse Autoregressive Flow (proposed)
      Inverting AF ($\boldsymbol{\mu}_t$, $\boldsymbol{\sigma}_t$ are also autoregressive):
      $\mathbf{z}_t = \frac{\mathbf{z}_{t-1} - \boldsymbol{\mu}_t(\mathbf{z}_{t-1})}{\boldsymbol{\sigma}_t(\mathbf{z}_{t-1})}$
      Key features:
      - Equally powerful as AF
      - Easy to compute $\det \partial \mathbf{z}_t / \partial \mathbf{z}_{t-1} = 1 / \prod_i \sigma_{t,i}(\mathbf{z}_{t-1})$
      - Parallelizable
      Checklist:
      1. Computationally cheap to compute and differentiate: ✓
      2. Computationally cheap to sample from: ✓
      3. Parallel computation: ✓
      4. Sufficiently flexible to match the true posterior p(z|x): ✓
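By contrast, one IAF step touches every dimension in a single vectorized pass, since $\boldsymbol{\mu}_t$ and $\boldsymbol{\sigma}_t$ read only the previous iterate; a sketch, with `mu_t` and `sigma_t` standing in for the autoregressive networks:

```python
import numpy as np

def iaf_step(z_prev, mu_t, sigma_t):
    """One IAF step z_t = (z_{t-1} - mu_t(z_{t-1})) / sigma_t(z_{t-1}).
    All dimensions are updated at once, and
    log|det dz_t/dz_{t-1}| = -sum_i log sigma_{t,i}(z_{t-1})."""
    mu, sigma = mu_t(z_prev), sigma_t(z_prev)   # hypothetical MADE outputs
    z_t = (z_prev - mu) / sigma
    log_det = -np.sum(np.log(sigma))
    return z_t, log_det
```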
  12. IAF through a Masked Autoencoder (MADE)
      Model the autoregressive $\boldsymbol{\mu}_t$ and $\boldsymbol{\sigma}_t$ with MADE:
      - Remove paths from future inputs in an autoencoder by introducing masks
      - MADE is a probabilistic model: $p(\mathbf{x}) = \prod_i p(x_i \mid \mathbf{x}_{0:i-1})$
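The masking idea can be sketched as follows (after Germain et al., 2015): assign a degree to every unit, and zero out any weight that would let information flow from input $i$ into the prediction for a dimension at or before $i$. Hidden-layer degrees are drawn at random, as in that paper; the masks multiply the weight matrices elementwise before each forward pass.

```python
import numpy as np

def made_masks(d_in, d_hidden):
    """Build masks for a one-hidden-layer MADE so that output i depends
    only on inputs 0..i-1 (degrees are 1-based here; requires d_in >= 2)."""
    m_in = np.arange(1, d_in + 1)                      # input degrees 1..d
    m_hid = np.random.randint(1, d_in, size=d_hidden)  # hidden degrees in [1, d-1]
    mask_in = (m_hid[:, None] >= m_in[None, :]).astype(float)   # hidden <- input
    mask_out = (m_in[:, None] > m_hid[None, :]).astype(float)   # output <- hidden
    return mask_in, mask_out
```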
  13. Experiments
      IAF is evaluated on image-generation models.
      Models for MNIST:
      - Convolutional VAE with ResNet blocks
      - IAF = 2-layer MADE
      - IAF transformations are stacked, with the variable ordering reversed between successive transformations
      Models for CIFAR-10: (very complicated)
  14. MNIST
  15. CIFAR-10
  16. IAF in one slide
      [Figure: schematic comparison of Autoregressive Flow and Inverse Autoregressive Flow, showing $q(\mathbf{z}|\mathbf{x}; \boldsymbol{\nu}_T)$ built from $q(\mathbf{z}_0|\mathbf{x}; \boldsymbol{\nu}_0)$ through intermediate $q(\mathbf{z}_t|\mathbf{x}; \boldsymbol{\nu}_t)$ and matched to the true posterior $p(\mathbf{z}|\mathbf{x}; \boldsymbol{\mu}^*)$ via $D_{\mathrm{KL}}(q \,\|\, p)$]
      IAF is:
      ✓ Easy to compute and differentiate
      ✓ Easy to sample from
      ✓ Parallelizable
      ✓ Flexible
  17. We are hiring!
      http://www.abeja.asia/
      https://www.wantedly.com/companies/abeja
