REBAR: Low-variance, unbiased gradient estimates
for discrete latent variable models
Sangwoo Mo
KAIST AI Lab.
November 29, 2017
Sangwoo Mo (KAIST AI Lab.) REBAR November 29, 2017 1 / 16
General Problem
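Let $z \sim p(z \mid \theta)$. We want to maximize [1]
\[ L(\theta) = \mathbb{E}_{p(z)}[f(z)]. \]
Examples:
- ELBO [2]: \[ L(\theta, \phi) = \mathbb{E}_{q_\phi(z \mid x)}[\log p_\theta(x \mid z)] \]
- Policy gradient: \[ L(\theta) = \mathbb{E}_{p_\theta(\tau)}[R(\tau)] \]
[1] Assume $f(z)$ does not depend on $\theta$.
[2] The KL term is omitted.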
General Problem
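Let $z \sim p(z \mid \theta)$. We want to maximize
\[ L(\theta) = \mathbb{E}_{p(z)}[f(z)]. \]
We want to optimize $L$ by gradient ascent [1], so we need to compute
\[ \frac{d}{d\theta} L(\theta) = \frac{d}{d\theta} \mathbb{E}_{p(z)}[f(z)]. \]
Caveat: we cannot simply move $\frac{d}{d\theta}$ inside the expectation, since the distribution of $z$ depends on $\theta$.
[1] Assume $f(z)$ is differentiable.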
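To make the caveat concrete, here is a small worked case. It is an illustration rather than part of the slides, and it assumes $z \sim \mathrm{Bernoulli}(\theta)$ with $f$ independent of $\theta$. Differentiating inside the expectation gives zero, while the true gradient is generally nonzero:

```latex
\begin{align*}
L(\theta) &= \mathbb{E}_{p(z)}[f(z)] = \theta f(1) + (1 - \theta) f(0), \\
\frac{d}{d\theta} L(\theta) &= f(1) - f(0), \\
\mathbb{E}_{p(z)}\left[\frac{\partial}{\partial\theta} f(z)\right] &= 0 \neq f(1) - f(0) \quad \text{in general.}
\end{align*}
```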
Background
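REINFORCE:
\begin{align*}
\frac{d}{d\theta} \mathbb{E}_{p(z)}[f(z)] &= \frac{d}{d\theta} \int f(z)\, p(z)\, dz \\
&= \int f(z)\, \frac{\partial}{\partial\theta} p(z)\, dz \\
&= \int f(z)\, \frac{\frac{\partial}{\partial\theta} p(z)}{p(z)}\, p(z)\, dz \\
&= \int f(z)\, \frac{\partial}{\partial\theta} \log p(z)\, p(z)\, dz \\
&= \mathbb{E}_{p(z)}\left[ f(z)\, \frac{\partial}{\partial\theta} \log p(z) \right]
\end{align*}
The estimator is unbiased, but its variance is typically too high.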
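The REINFORCE identity can be checked numerically. The sketch below is a minimal illustration, not the author's code. It assumes $z \sim \mathrm{Bernoulli}(\theta)$ and borrows the toy objective $f(z) = (z - 0.45)^2$ from the experiments slide. The helper names (`score`, `reinforce_samples`) are hypothetical.

```python
import random

def f(z):
    # Toy objective (assumed for illustration): f(z) = (z - 0.45)^2.
    return (z - 0.45) ** 2

def score(z, theta):
    # d/dtheta log p(z | theta) for z ~ Bernoulli(theta).
    return z / theta - (1 - z) / (1 - theta)

def reinforce_samples(theta, n, rng):
    # Draw n single-sample REINFORCE estimates f(z) * d/dtheta log p(z).
    samples = []
    for _ in range(n):
        z = 1 if rng.random() < theta else 0
        samples.append(f(z) * score(z, theta))
    return samples

def mean(xs):
    return sum(xs) / len(xs)

def variance(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

rng = random.Random(0)
grads = reinforce_samples(theta=0.3, n=100_000, rng=rng)
# The exact gradient is f(1) - f(0) = 0.1 for every theta in (0, 1).
print("mean estimate:", mean(grads))
print("per-sample variance:", variance(grads))
```

The sample mean should land close to the exact gradient 0.1, while the per-sample variance is several times larger than the gradient itself. This is the high-variance behaviour the slide refers to.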
Background
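Control variate: subtract a baseline $c$.
\begin{align*}
\frac{d}{d\theta} \mathbb{E}_{p(z)}[f(z)] &= \frac{d}{d\theta} \left( \mathbb{E}_{p(z,c)}[f(z) - c] + \mathbb{E}_{p(z,c)}[c] \right) \\
&= \mathbb{E}_{p(z,c)}\left[ (f(z) - c)\, \frac{\partial}{\partial\theta} \log p(z) \right] + \frac{\partial}{\partial\theta} \mathbb{E}_{p(z,c)}[c]
\end{align*}
Question: how do we choose a proper [1] $c$?
- a constant value, e.g. $\mathbb{E}_{p(z)}[f(z)]$
- a linear approximation of $f$ around $\mathbb{E}_{p(z)}[z]$
[1] i) $c$ should be correlated with $f(z)$; ii) if $c$ is independent of $\theta$ ($c \perp\!\!\!\perp \theta$), the second term is eliminated.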
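The effect of a constant baseline can be computed exactly in a two-outcome case. The sketch below is an illustration under assumptions, not the author's code. It assumes $z \sim \mathrm{Bernoulli}(\theta)$, the toy objective $f(z) = (z - 0.45)^2$, and a constant $c$, so that the second term vanishes. The function name `estimator_moments` is hypothetical.

```python
def f(z):
    # Toy objective (assumed for illustration): f(z) = (z - 0.45)^2.
    return (z - 0.45) ** 2

def score(z, theta):
    # d/dtheta log p(z | theta) for z ~ Bernoulli(theta).
    return z / theta - (1 - z) / (1 - theta)

def estimator_moments(theta, c):
    # Exact mean and variance of (f(z) - c) * score(z), enumerating z in {0, 1}.
    probs = {1: theta, 0: 1 - theta}
    vals = {z: (f(z) - c) * score(z, theta) for z in (0, 1)}
    mean = sum(probs[z] * vals[z] for z in (0, 1))
    var = sum(probs[z] * (vals[z] - mean) ** 2 for z in (0, 1))
    return mean, var

theta = 0.3
baseline = theta * f(1) + (1 - theta) * f(0)  # c = E[f(z)], a constant baseline
print("no baseline:   ", estimator_moments(theta, 0.0))
print("with baseline: ", estimator_moments(theta, baseline))
```

Both settings give the same mean, so the baseline does not bias the estimate. Only the variance changes.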
Background
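Reparametrization trick: assume $z = g(\theta, \epsilon)$, where the noise $\epsilon \sim p(\epsilon)$ does not depend on $\theta$.
\begin{align*}
\frac{d}{d\theta} \mathbb{E}_{p(z)}[f(z)] &= \frac{d}{d\theta} \int f(z)\, p(z)\, dz \\
&= \frac{d}{d\theta} \int f(g(\theta, \epsilon))\, p(\epsilon)\, d\epsilon \\
&= \int \frac{\partial f}{\partial g} \frac{\partial g}{\partial\theta}\, p(\epsilon)\, d\epsilon \\
&= \mathbb{E}_{p(\epsilon)}\left[ \frac{\partial f}{\partial g} \frac{\partial g}{\partial\theta} \right]
\end{align*}
It is unbiased and has low variance, and it works well for continuous [1] $z$.
However, it is not directly applicable to the discrete case.
[1] A VAE assumes $z \sim \mathcal{N}(\mu, \sigma^2)$ and reparametrizes it as $z = \mu + \sigma\epsilon$, where $\epsilon \sim \mathcal{N}(0, 1)$.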
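The Gaussian case gives a quick comparison between the reparametrized gradient and REINFORCE. This is a minimal sketch under assumptions that are not in the slides: $\theta = \mu$ with $\sigma$ held fixed, and a hypothetical objective $f(z) = z^2$, for which the exact gradient is $2\mu$.

```python
import random

def f(z):
    # Hypothetical differentiable objective, chosen only for illustration.
    return z ** 2

def df(z):
    # Derivative of f with respect to its argument.
    return 2.0 * z

def gradient_samples(mu, sigma, n, rng):
    # Single-sample estimates of d/dmu E[f(z)] for z ~ N(mu, sigma^2).
    reparam, reinforce = [], []
    for _ in range(n):
        eps = rng.gauss(0.0, 1.0)
        z = mu + sigma * eps                # z = g(theta, eps) with theta = mu
        reparam.append(df(z) * 1.0)         # df/dg * dg/dmu, where dg/dmu = 1
        # REINFORCE: f(z) * d/dmu log N(z; mu, sigma^2) = f(z) * (z - mu) / sigma^2
        reinforce.append(f(z) * (z - mu) / sigma ** 2)
    return reparam, reinforce

def mean(xs):
    return sum(xs) / len(xs)

def variance(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

rng = random.Random(0)
rep, rei = gradient_samples(mu=1.0, sigma=1.0, n=50_000, rng=rng)
print("reparametrization: mean", mean(rep), "variance", variance(rep))
print("REINFORCE:         mean", mean(rei), "variance", variance(rei))
```

Both estimators target the same gradient, $2\mu = 2$ here. For this choice of $f$ the reparametrized estimate has a much smaller per-sample variance.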
Background
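Gumbel-softmax trick:
It is well known that sampling $z \sim \mathrm{Cat}(\theta)$ is equivalent to
\[ z = H(w) = \arg\max_i \left[ \log\theta_i - \log(-\log \epsilon_i) \right] \]
where $H$ is the hard argmax, $w = g(\theta, \epsilon)$ with $w_i = \log\theta_i - \log(-\log \epsilon_i)$, and $\epsilon_i \sim \mathrm{Uniform}(0, 1)$.
Instead of $H$, use the softmax $\sigma_\lambda(w)$ (with temperature $\lambda$).
Then $\sigma_\lambda(g(\theta, \epsilon))$ is a differentiable (relaxed) reparametrization of $z$.
It has low variance, but it is biased.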
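The construction above can be sketched in a few lines. This is an illustrative implementation, not the author's code. The function names are hypothetical, and the uniform noise is clipped away from 0 and 1 purely to avoid `log(0)`.

```python
import math
import random

def sample_w(theta, rng):
    # w_i = log(theta_i) - log(-log(eps_i)), with eps_i ~ Uniform(0, 1).
    w = []
    for t in theta:
        eps = min(max(rng.random(), 1e-12), 1.0 - 1e-12)  # avoid log(0)
        w.append(math.log(t) - math.log(-math.log(eps)))
    return w

def hard_argmax(w):
    # H(w): index of the largest entry.
    return max(range(len(w)), key=lambda i: w[i])

def softmax(w, lam):
    # sigma_lambda(w): softmax of w / lam, shifted by the max for stability.
    m = max(w)
    exps = [math.exp((wi - m) / lam) for wi in w]
    total = sum(exps)
    return [e / total for e in exps]

theta = [0.2, 0.3, 0.5]
rng = random.Random(0)
n = 20_000
counts = [0] * len(theta)
for _ in range(n):
    counts[hard_argmax(sample_w(theta, rng))] += 1
print("empirical frequencies:", [c / n for c in counts])

w = sample_w(theta, rng)
print("hard sample index:", hard_argmax(w))
print("relaxed sample, lam=1.0:", softmax(w, 1.0))
print("relaxed sample, lam=0.1:", softmax(w, 0.1))
```

The empirical frequencies of the hard samples should approach `theta`. Lowering `lam` pushes the relaxed sample toward a one-hot vector, at the cost of steeper gradients.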
REBAR
Motivation:
Gumbel-softmax gives a biased but highly correlated estimator
Use Gumbel-softmax as a control variate for REINFORCE
However, we can do better than naively applying this idea
REBAR
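Observation:
We can reduce the variance of REINFORCE by marginalizing over $w$ conditioned on $z$.
\begin{align*}
\frac{\partial}{\partial\theta} \mathbb{E}_{p(w)}[f(\sigma_\lambda(w))] &= \mathbb{E}_{p(w)}\left[ f(\sigma_\lambda(w))\, \frac{\partial}{\partial\theta} \log p(w) \right] \\
&= \mathbb{E}_{p(z)} \mathbb{E}_{p(w \mid z)}\left[ f(\sigma_\lambda(w))\, \frac{\partial}{\partial\theta} \left( \log p(w \mid z) + \log p(z) \right) \right] \\
&= \mathbb{E}_{p(z)}\left[ \frac{\partial}{\partial\theta} \mathbb{E}_{p(w \mid z)}[f(\sigma_\lambda(w))] \right] \\
&\quad + \mathbb{E}_{p(z)}\left[ \mathbb{E}_{p(w \mid z)}[f(\sigma_\lambda(w))]\, \frac{\partial}{\partial\theta} \log p(z) \right]
\end{align*}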
REBAR
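Observation:
Here, the first term can be reparametrized as
\[ \mathbb{E}_{p(z)}\left[ \frac{\partial}{\partial\theta} \mathbb{E}_{p(w \mid z)}[f(\sigma_\lambda(w))] \right] = \mathbb{E}_{p(z)} \mathbb{E}_{p(\delta)}\left[ \frac{\partial}{\partial\theta} f(\sigma_\lambda(\tilde{w})) \right] \]
where $\tilde{w} = \tilde{g}(\theta, z, \delta)$ [1] and $\delta_i \sim \mathrm{Uniform}(0, 1)$.
[1] $\tilde{g}$ reparametrizes the conditional distribution of $w$ given $z$.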
REBAR
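Putting it all together,
\begin{align*}
\frac{\partial}{\partial\theta} \mathbb{E}_{p(z)}[f(z)] = \mathbb{E}_{\epsilon, \delta}\Big[ & \left[ f(H(w)) - \eta f(\sigma_\lambda(\tilde{w})) \right] \frac{\partial}{\partial\theta} \log p(z) \Big|_{z = H(w)} \\
& + \eta \frac{\partial}{\partial\theta} f(\sigma_\lambda(w)) - \eta \frac{\partial}{\partial\theta} f(\sigma_\lambda(\tilde{w})) \Big]
\end{align*}
where $w = g(\theta, \epsilon)$, $\tilde{w} = \tilde{g}(\theta, H(w), \delta)$, and $\epsilon_i, \delta_i \sim \mathrm{Uniform}(0, 1)$.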
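The combined estimator can be sketched for a single Bernoulli variable with hand-written derivatives. This is an illustrative sketch under assumptions, not the author's implementation. It swaps the categorical Gumbel construction for a binary logistic analogue, $w = \mathrm{logit}(\theta) + \mathrm{logit}(\epsilon)$ with $H(w) = 1[w > 0]$, and uses a sigmoid with temperature $\lambda$ as $\sigma_\lambda$. The objective $f(z) = (z - 0.45)^2$ and all function names are assumptions made for the example.

```python
import math
import random

def f(z):
    # Toy objective (assumed for illustration): f(z) = (z - 0.45)^2.
    return (z - 0.45) ** 2

def df(s):
    # Derivative of f with respect to its argument.
    return 2.0 * (s - 0.45)

def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def logit(p):
    return math.log(p) - math.log(1.0 - p)

def clip(u, lo=1e-12):
    # Keep uniform noise strictly inside (0, 1) so logit stays finite.
    return min(max(u, lo), 1.0 - lo)

def rebar_sample(theta, eta, lam, rng):
    # One single-sample REBAR estimate of d/dtheta E[f(z)], z ~ Bernoulli(theta).
    eps, delta = clip(rng.random()), clip(rng.random())
    # Unconditional relaxed variable w and hard sample z = H(w).
    w = logit(theta) + logit(eps)
    z = 1 if w > 0 else 0
    dw = 1.0 / (theta * (1.0 - theta))  # dw/dtheta with eps held fixed
    # Conditional noise given z, then w_tilde = logit(theta) + logit(eps_c).
    if z == 1:
        eps_c = clip(1.0 - theta + delta * theta)       # uniform on (1 - theta, 1)
        dw_t = delta / ((1.0 - theta) * eps_c)          # dw_tilde/dtheta, z and delta fixed
    else:
        eps_c = clip(delta * (1.0 - theta))             # uniform on (0, 1 - theta)
        dw_t = (1.0 - delta) / (theta * (1.0 - eps_c))  # dw_tilde/dtheta, z and delta fixed
    w_t = logit(theta) + logit(eps_c)
    s, s_t = sigmoid(w / lam), sigmoid(w_t / lam)
    # Chain rule for d/dtheta f(sigma_lambda(.)).
    grad_relaxed = df(s) * s * (1.0 - s) / lam * dw
    grad_relaxed_cond = df(s_t) * s_t * (1.0 - s_t) / lam * dw_t
    score = z / theta - (1 - z) / (1 - theta)  # d/dtheta log p(z)
    return (f(z) - eta * f(s_t)) * score + eta * grad_relaxed - eta * grad_relaxed_cond

def mean(xs):
    return sum(xs) / len(xs)

def variance(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

rng = random.Random(0)
for eta in (0.0, 1.0):  # eta = 0 recovers plain REINFORCE
    r = [rebar_sample(0.3, eta, 0.5, rng) for _ in range(100_000)]
    print("eta =", eta, "mean:", mean(r), "variance:", variance(r))
```

For any `eta` and `lam`, the sample mean should stay near the exact gradient $f(1) - f(0) = 0.1$. Comparing the printed variances is one way to check whether the control variate helps at a given setting.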
Hyperparameter Optimization
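Let $r(\eta, \lambda)$ be the Monte Carlo REBAR estimator.
Since $r$ is unbiased, $\mathbb{E}[r]$ does not depend on $\eta$ and $\lambda$. Thus,
\[ \frac{\partial}{\partial\eta} \mathrm{Var}(r) = \frac{\partial}{\partial\eta} \left( \mathbb{E}[r^2] - \mathbb{E}[r]^2 \right) = \mathbb{E}\left[ 2r \frac{\partial r}{\partial\eta} \right]. \]
Now we can optimize $\eta$ (and $\lambda$) to minimize the variance.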
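For $\eta$ alone, the variance gradient can be worked out in closed form. The derivation below is a sketch under stated assumptions: $\theta$ is scalar and $\lambda$ is held fixed. The symbols $a$ and $b$ are introduced here for illustration and do not appear in the slides. They split the estimator into its REINFORCE part and its control-variate part.

```latex
\begin{align*}
r(\eta) &= a + \eta\, b, \qquad
a = f(H(w))\, \frac{\partial}{\partial\theta} \log p(z), \\
b &= -f(\sigma_\lambda(\tilde{w}))\, \frac{\partial}{\partial\theta} \log p(z)
     + \frac{\partial}{\partial\theta} f(\sigma_\lambda(w))
     - \frac{\partial}{\partial\theta} f(\sigma_\lambda(\tilde{w})),
     \qquad \mathbb{E}[b] = 0, \\
\frac{\partial}{\partial\eta} \mathrm{Var}(r)
  &= \mathbb{E}\left[ 2r \frac{\partial r}{\partial\eta} \right]
   = 2\, \mathbb{E}[a b] + 2 \eta\, \mathbb{E}[b^2], \\
\eta^\ast &= -\frac{\mathbb{E}[a b]}{\mathbb{E}[b^2]}.
\end{align*}
```

In practice the expectations are unknown, so one would typically take stochastic gradient steps on $\eta$ using the single-sample estimate $2r\, \partial r / \partial\eta$, and treat $\lambda$ the same way.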
Experiments
Minimize $\mathbb{E}_{p(z)}[(z - 0.45)^2]$ where $z \sim \mathrm{Bernoulli}(\theta)$.
[Figure: left panel, log variance; right panel, loss]
Experiments
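Maximize the ELBO of a sigmoid belief network:
\[ \log p(x \mid \theta) \ge \mathbb{E}_{q(z \mid x, \theta)}\left[ \log p(x, z \mid \theta) - \log q(z \mid x, \theta) \right] \]
[Figure (log variance): left panel, 2-layer linear; right panel, 1-layer nonlinear]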
Experiments
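Maximize the ELBO of a sigmoid belief network:
\[ \log p(x \mid \theta) \ge \mathbb{E}_{q(z \mid x, \theta)}\left[ \log p(x, z \mid \theta) - \log q(z \mid x, \theta) \right] \]
[Figure (objective): left panel, 2-layer linear; right panel, 1-layer nonlinear]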
Questions?
REBAR: Low-variance, unbiased gradient estimates for discrete latent variable models

  • 1. REBAR: Low-variance, unbiased gradient estimates for discrete latent variable models Sangwoo Mo KAIST AI Lab. November 29, 2017 Sangwoo Mo (KAIST AI Lab.) REBAR November 29, 2017 1 / 16
  • 2. General Problem
    Let $z \sim p(z \mid \theta)$. We want to maximize $L(\theta) = \mathbb{E}_{p(z)}[f(z)]$, assuming $f(z)$ is independent of $\theta$.
    Examples:
    ELBO (omitting the KL term): $L(\theta, \phi) = \mathbb{E}_{q_\phi(z \mid x)}[\log p_\theta(x \mid z)]$
    Policy gradient: $L(\theta) = \mathbb{E}_{p_\theta(\tau)}[R(\tau)]$
  • 3. General Problem
    Let $z \sim p(z \mid \theta)$. We want to maximize $L(\theta) = \mathbb{E}_{p(z)}[f(z)]$.
    We want to optimize it by gradient descent, assuming $f(z)$ is differentiable. This requires computing
    $$\frac{d}{d\theta} L(\theta) = \frac{d}{d\theta} \mathbb{E}_{p(z)}[f(z)].$$
    Caveat: we cannot simply move $\frac{d}{d\theta}$ inside the expectation, since the distribution of $z$ depends on $\theta$.
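  • 4. Background: REINFORCE
    $$\begin{aligned}
    \frac{d}{d\theta} \mathbb{E}_{p(z)}[f(z)] &= \frac{d}{d\theta} \int f(z)\, p(z)\, dz \\
    &= \int f(z)\, \frac{\partial}{\partial\theta} p(z)\, dz \\
    &= \int f(z)\, \frac{\frac{\partial}{\partial\theta} p(z)}{p(z)}\, p(z)\, dz \\
    &= \int f(z)\, \frac{\partial}{\partial\theta} \log p(z)\, p(z)\, dz \\
    &= \mathbb{E}_{p(z)}\left[ f(z)\, \frac{\partial}{\partial\theta} \log p(z) \right]
    \end{aligned}$$
    The estimator is unbiased, but its variance is too high.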
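As a rough illustration of the REINFORCE identity, the sketch below estimates the gradient by Monte Carlo for the Bernoulli toy objective $(z - 0.45)^2$ from the Experiments slide. The function names, the value `theta = 0.3`, the seed, and the sample count are assumptions chosen for this example, not part of the original deck.

```python
import random

def f(z, target=0.45):
    """Toy objective f(z) = (z - target)^2."""
    return (z - target) ** 2

def score(z, theta):
    """d/dtheta log p(z | theta) for z ~ Bernoulli(theta)."""
    return z / theta - (1.0 - z) / (1.0 - theta)

def reinforce_samples(theta, n, rng):
    """Draw n single-sample REINFORCE estimates f(z) * score(z)."""
    samples = []
    for _ in range(n):
        z = 1.0 if rng.random() < theta else 0.0
        samples.append(f(z) * score(z, theta))
    return samples

theta = 0.3                      # assumed parameter value for the demo
rng = random.Random(0)           # fixed seed so the run is repeatable
samples = reinforce_samples(theta, 200_000, rng)
grad_estimate = sum(samples) / len(samples)
grad_variance = sum((s - grad_estimate) ** 2 for s in samples) / len(samples)

# E[f(z)] = theta * f(1) + (1 - theta) * f(0), so the exact gradient is f(1) - f(0).
exact_grad = f(1.0) - f(0.0)
print(grad_estimate, exact_grad, grad_variance)
```

The sample mean should land close to the exact gradient, while the per-sample variance is several times larger than the gradient itself, which is the problem the following slides address.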
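  • 5. Background: Control variate
    Subtract a baseline $c$:
    $$\begin{aligned}
    \frac{d}{d\theta} \mathbb{E}_{p(z)}[f(z)] &= \frac{d}{d\theta} \left( \mathbb{E}_{p(z,c)}[f(z) - c] + \mathbb{E}_{p(z,c)}[c] \right) \\
    &= \mathbb{E}_{p(z,c)}\left[ (f(z) - c)\, \frac{\partial}{\partial\theta} \log p(z) \right] + \frac{\partial}{\partial\theta} \mathbb{E}_{p(z,c)}[c]
    \end{aligned}$$
    Question: how do we choose a proper $c$? Two requirements: (i) $c$ should be correlated with $f(z)$; (ii) if $c \perp \theta$ (that is, $c$ is independent of $\theta$), the second term is eliminated. Candidates:
    a constant value, e.g. $\mathbb{E}_{p(z)}[f(z)]$
    a linear approximation of $f$ around $\mathbb{E}_{p(z)}[z]$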
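To make the effect of a baseline concrete, here is a small sketch that computes the exact mean and variance of $(f(z) - c)\,\frac{\partial}{\partial\theta} \log p(z)$ by enumerating both outcomes of a Bernoulli variable. It assumes the toy objective $(z - 0.45)^2$ from the Experiments slide and treats $c$ as a fixed number, so the second term of the decomposition vanishes. The helper name `exact_moments` and `theta = 0.3` are illustrative choices.

```python
def f(z, target=0.45):
    """Toy objective f(z) = (z - target)^2."""
    return (z - target) ** 2

def score(z, theta):
    """d/dtheta log p(z | theta) for z ~ Bernoulli(theta)."""
    return z / theta - (1.0 - z) / (1.0 - theta)

def exact_moments(theta, c):
    """Exact mean and variance of (f(z) - c) * score(z) for z ~ Bernoulli(theta)."""
    outcomes = [(1.0, theta), (0.0, 1.0 - theta)]  # (value of z, probability)
    mean = sum(p * (f(z) - c) * score(z, theta) for z, p in outcomes)
    second = sum(p * ((f(z) - c) * score(z, theta)) ** 2 for z, p in outcomes)
    return mean, second - mean ** 2

theta = 0.3                                            # assumed parameter value
expected_f = theta * f(1.0) + (1.0 - theta) * f(0.0)   # baseline c = E[f(z)], held fixed
mean_plain, var_plain = exact_moments(theta, 0.0)      # plain REINFORCE
mean_base, var_base = exact_moments(theta, expected_f) # REINFORCE with baseline
print(mean_plain, var_plain)
print(mean_base, var_base)
```

Both estimators have the same mean, because the score function has zero expectation, but in this toy setting the baseline cuts the variance by more than an order of magnitude.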
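  • 6. Background: Reparametrization trick
    Assume $z = g(\theta, \epsilon)$.
    $$\begin{aligned}
    \frac{d}{d\theta} \mathbb{E}_{p(z)}[f(z)] &= \frac{d}{d\theta} \int f(z)\, p(z)\, dz \\
    &= \frac{d}{d\theta} \int f(g(\theta, \epsilon))\, p(\epsilon)\, d\epsilon \\
    &= \int \frac{\partial f}{\partial g} \frac{\partial g}{\partial\theta}\, p(\epsilon)\, d\epsilon \\
    &= \mathbb{E}_{p(\epsilon)}\left[ \frac{\partial f}{\partial g} \frac{\partial g}{\partial\theta} \right]
    \end{aligned}$$
    It is unbiased and low variance, and it works well for continuous $z$. For example, a VAE assumes $z \sim N(\mu, \sigma)$ and reparametrizes it as $z = \mu + \sigma\epsilon$ where $\epsilon \sim N(0, 1)$.
    However, it is not directly applicable to the discrete case.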
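The following sketch compares the reparametrized gradient with the score-function (REINFORCE) gradient for a Gaussian latent, using the VAE-style reparametrization $z = \mu + \sigma\epsilon$ and differentiating with respect to $\mu$. The objective $f(z) = z^2$, the values of `mu` and `sigma`, and all function names are assumptions picked so that the exact gradient $2\mu$ is known.

```python
import random

def f(z):
    """Illustrative objective; E[z^2] = mu^2 + sigma^2 under N(mu, sigma^2)."""
    return z ** 2

def df_dz(z):
    return 2.0 * z

def gradient_samples(mu, sigma, n, rng):
    """Per-sample gradient estimates of d/dmu E[f(z)] from both estimators."""
    reparam, reinforce = [], []
    for _ in range(n):
        eps = rng.gauss(0.0, 1.0)
        z = mu + sigma * eps                              # z = g(theta, eps) with theta = mu
        reparam.append(df_dz(z) * 1.0)                    # (df/dg) * (dg/dmu), where dg/dmu = 1
        reinforce.append(f(z) * (z - mu) / sigma ** 2)    # f(z) * d/dmu log N(z; mu, sigma^2)
    return reparam, reinforce

def mean(xs):
    return sum(xs) / len(xs)

def variance(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

mu, sigma = 1.5, 1.0
rng = random.Random(0)
reparam, reinforce = gradient_samples(mu, sigma, 100_000, rng)
exact_grad = 2.0 * mu
print(mean(reparam), variance(reparam))
print(mean(reinforce), variance(reinforce))
```

Both sample means should sit near $2\mu$, with the reparametrized samples showing a much smaller variance than the score-function samples.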
  • 7. Background: Gumbel-softmax trick
    It is well known that $z \sim \mathrm{Cat}(\theta)$ is equivalent to
    $$z = H(w) = \arg\max_i \left[ \log\theta_i - \log(-\log \epsilon_i) \right]$$
    where $H$ is the hard argmax, $w = g(\theta, \epsilon)$, and $\epsilon_i \sim \mathrm{Uniform}(0, 1)$.
    Instead of $H$, use the softmax $\sigma_\lambda(w)$ with temperature $\lambda$. Then $\sigma_\lambda(g(\theta, \epsilon))$ is a differentiable reparametrization of $z$.
    It is low variance, but biased.
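A minimal sketch of the Gumbel-max construction and its softmax relaxation, using only the Python standard library. The probability vector `theta`, the temperature `lam = 0.5`, and the helper names are assumptions for illustration.

```python
import math
import random

def gumbel_logits(theta, rng):
    """w_i = log(theta_i) - log(-log(eps_i)) with eps_i ~ Uniform(0, 1)."""
    w = []
    for t in theta:
        eps = max(rng.random(), 1e-300)  # guard against log(0)
        w.append(math.log(t) - math.log(-math.log(eps)))
    return w

def hard_argmax(w):
    """H(w): index of the largest perturbed logit."""
    return max(range(len(w)), key=lambda i: w[i])

def softmax(w, lam):
    """sigma_lambda(w): softmax with temperature lam, shifted by max(w) for stability."""
    m = max(w)
    exps = [math.exp((wi - m) / lam) for wi in w]
    total = sum(exps)
    return [e / total for e in exps]

theta = [0.2, 0.5, 0.3]          # assumed categorical probabilities
rng = random.Random(0)
n = 100_000
counts = [0] * len(theta)
for _ in range(n):
    counts[hard_argmax(gumbel_logits(theta, rng))] += 1
frequencies = [c / n for c in counts]   # should approximate theta

w = gumbel_logits(theta, rng)
relaxed = softmax(w, lam=0.5)           # differentiable surrogate for one-hot H(w)
print(frequencies)
print(hard_argmax(w), relaxed)
```

The empirical frequencies of the hard argmax should match `theta`, and the relaxed sample puts most of its mass on the same index; lowering `lam` pushes it closer to one-hot.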
  • 8. REBAR: Motivation
    Gumbel-softmax is a biased but highly correlated estimator.
    Idea: use Gumbel-softmax as a control variate for REINFORCE.
    However, we can do better than naïvely applying this idea.
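  • 9. REBAR: Observation
    We can reduce the variance of REINFORCE by marginalizing $w$ over $z$.
    $$\begin{aligned}
    \frac{\partial}{\partial\theta} \mathbb{E}_{p(w)}[f(\sigma_\lambda(w))] &= \mathbb{E}_{p(w)}\left[ f(\sigma_\lambda(w))\, \frac{\partial}{\partial\theta} \log p(w) \right] \\
    &= \mathbb{E}_{p(z)}\left[ \mathbb{E}_{p(w \mid z)}\left[ f(\sigma_\lambda(w))\, \frac{\partial}{\partial\theta} \left( \log p(w \mid z) + \log p(z) \right) \right] \right] \\
    &= \mathbb{E}_{p(z)}\left[ \frac{\partial}{\partial\theta} \mathbb{E}_{p(w \mid z)}[f(\sigma_\lambda(w))] \right] + \mathbb{E}_{p(z)}\left[ \mathbb{E}_{p(w \mid z)}[f(\sigma_\lambda(w))]\, \frac{\partial}{\partial\theta} \log p(z) \right]
    \end{aligned}$$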
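  • 10. REBAR: Observation
    Here, the first term can be reparametrized as
    $$\mathbb{E}_{p(z)}\left[ \frac{\partial}{\partial\theta} \mathbb{E}_{p(w \mid z)}[f(\sigma_\lambda(w))] \right] = \mathbb{E}_{p(z)}\left[ \mathbb{E}_{p(\delta)}\left[ \frac{\partial}{\partial\theta} f(\sigma_\lambda(\tilde{w})) \right] \right]$$
    where $\tilde{w} = \tilde{g}(\theta, z, \delta)$ and $\delta_i \sim \mathrm{Uniform}(0, 1)$. Here $\tilde{g}$ reparametrizes the conditional distribution of $g$ given $z$.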
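  • 11. REBAR
    Putting it all together,
    $$\frac{\partial}{\partial\theta} \mathbb{E}_{p(z)}[f(z)] = \mathbb{E}_{\epsilon, \delta}\left[ \left[ f(H(w)) - \eta f(\sigma_\lambda(\tilde{w})) \right] \frac{\partial}{\partial\theta} \log p(z) \Big|_{z = H(w)} + \eta \frac{\partial}{\partial\theta} f(\sigma_\lambda(w)) - \eta \frac{\partial}{\partial\theta} f(\sigma_\lambda(\tilde{w})) \right]$$
    where $w = g(\theta, \epsilon)$, $\tilde{w} = \tilde{g}(\theta, H(w), \delta)$, and $\epsilon_i, \delta_i \sim \mathrm{Uniform}(0, 1)$.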
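The sketch below instantiates the combined estimator for a single Bernoulli variable, under assumptions that go beyond the slides: the binary analogue of the Gumbel construction, $w = \operatorname{logit}(\theta) + \operatorname{logit}(\epsilon)$ with $H(w) = 1[w > 0]$, a sigmoid relaxation $\sigma_\lambda(w) = \operatorname{sigmoid}(w / \lambda)$, and hand-written derivatives in place of automatic differentiation. The values of `theta`, `eta`, `lam`, the clipping constant `EPS`, and all function names are illustrative.

```python
import math
import random

EPS = 1e-9  # keeps uniforms away from 0 and 1 so the logits stay finite

def f(s, target=0.45):
    return (s - target) ** 2

def df(s, target=0.45):
    return 2.0 * (s - target)

def logit(p):
    return math.log(p) - math.log(1.0 - p)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def clip(p):
    return min(max(p, EPS), 1.0 - EPS)

def rebar_sample(theta, eta, lam, rng):
    """One sample of the REBAR gradient estimate for z ~ Bernoulli(theta)."""
    u, v = clip(rng.random()), clip(rng.random())      # epsilon and delta
    dlogit = 1.0 / (theta * (1.0 - theta))             # d/dtheta logit(theta)

    w = logit(theta) + logit(u)                        # w = g(theta, epsilon)
    z = 1.0 if w > 0.0 else 0.0                        # z = H(w); z = 1 iff u > 1 - theta
    score = z / theta - (1.0 - z) / (1.0 - theta)      # d/dtheta log p(z)

    # Conditional noise: u | z is uniform on (1 - theta, 1) or (0, 1 - theta).
    if z == 1.0:
        v_cond, dv_cond = v * theta + (1.0 - theta), v - 1.0
    else:
        v_cond, dv_cond = v * (1.0 - theta), -v
    v_cond = clip(v_cond)
    w_tilde = logit(theta) + logit(v_cond)             # w_tilde = g_tilde(theta, z, delta)
    dw_tilde = dlogit + dv_cond / (v_cond * (1.0 - v_cond))

    s, s_tilde = sigmoid(w / lam), sigmoid(w_tilde / lam)
    grad_relaxed = df(s) * s * (1.0 - s) / lam * dlogit                # d/dtheta f(sigma(w))
    grad_cond = df(s_tilde) * s_tilde * (1.0 - s_tilde) / lam * dw_tilde  # d/dtheta f(sigma(w_tilde))

    return (f(z) - eta * f(s_tilde)) * score + eta * grad_relaxed - eta * grad_cond

theta, eta, lam = 0.3, 1.0, 0.5
rng = random.Random(0)
samples = [rebar_sample(theta, eta, lam, rng) for _ in range(200_000)]
grad_estimate = sum(samples) / len(samples)
exact_grad = f(1.0) - f(0.0)   # d/dtheta of theta * f(1) + (1 - theta) * f(0)
print(grad_estimate, exact_grad)
```

The sample mean should agree with the exact gradient for any `eta` and `lam`, since the three control-variate terms cancel in expectation; how much the variance drops depends on those two hyperparameters.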
  • 12. Hyperparameter Optimization
    Let $r(\eta, \lambda)$ be the Monte Carlo REBAR estimator. Since $r$ is unbiased, $\mathbb{E}[r]$ does not depend on $\eta$ and $\lambda$. Thus,
    $$\frac{\partial}{\partial\eta} \mathrm{Var}(r) = \frac{\partial}{\partial\eta} \left( \mathbb{E}[r^2] - \mathbb{E}[r]^2 \right) = \mathbb{E}\left[ 2r \frac{\partial r}{\partial\eta} \right].$$
    Now we can optimize $\eta$ (and $\lambda$) to minimize the variance.
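To show the variance-gradient identity in action without Monte Carlo noise, this sketch swaps the REBAR control variate for a much simpler stand-in: a scaled constant baseline, $r(\eta) = (f(z) - \eta c_0)\,\frac{\partial}{\partial\theta} \log p(z)$ with $z \sim \mathrm{Bernoulli}(\theta)$. This estimator is unbiased for every $\eta$, so the same argument applies, and all expectations can be computed exactly by enumeration. The constant `c0`, the learning rate, and the step count are assumptions for this example.

```python
def f(z, target=0.45):
    return (z - target) ** 2

def score(z, theta):
    """d/dtheta log p(z | theta) for z ~ Bernoulli(theta)."""
    return z / theta - (1.0 - z) / (1.0 - theta)

def variance_and_grad(theta, eta, c0=1.0):
    """Exact E[r], Var(r), and E[2 r dr/deta] for r = (f(z) - eta * c0) * score(z)."""
    outcomes = [(1.0, theta), (0.0, 1.0 - theta)]  # (value of z, probability)
    mean = second = grad = 0.0
    for z, p in outcomes:
        r = (f(z) - eta * c0) * score(z, theta)
        dr_deta = -c0 * score(z, theta)
        mean += p * r
        second += p * r * r
        grad += p * 2.0 * r * dr_deta
    return mean, second - mean ** 2, grad

theta, eta, lr = 0.3, 0.0, 0.05
mean_start, var_start, _ = variance_and_grad(theta, eta)
for _ in range(200):
    _, _, grad = variance_and_grad(theta, eta)
    eta -= lr * grad                      # gradient descent on Var(r)
mean_end, var_end, grad_end = variance_and_grad(theta, eta)
print(mean_start, var_start)
print(eta, mean_end, var_end, grad_end)
```

The mean stays fixed while the variance shrinks as `eta` is updated. In this two-outcome toy case the optimal baseline makes both outcomes yield the same value, so the variance goes to numerically zero; that is a property of the stand-in, not a claim about REBAR in general.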
  • 13. Experiments
    Minimize $\mathbb{E}_{p(z)}[(z - 0.45)^2]$ where $z \sim \mathrm{Bernoulli}(\theta)$.
    [Figure: left panel, log variance; right panel, loss]
  • 14. Experiments
    Maximize the ELBO of a Sigmoid Belief Network:
    $$\log p(x \mid \theta) \ge \mathbb{E}_{q(z \mid x, \theta)}\left[ \log p(x, z \mid \theta) - \log q(z \mid x, \theta) \right]$$
    [Figure, log variance: left panel, 2-layer linear; right panel, 1-layer nonlinear]
  • 15. Experiments
    Maximize the ELBO of a Sigmoid Belief Network:
    $$\log p(x \mid \theta) \ge \mathbb{E}_{q(z \mid x, \theta)}\left[ \log p(x, z \mid \theta) - \log q(z \mid x, \theta) \right]$$
    [Figure, objective: left panel, 2-layer linear; right panel, 1-layer nonlinear]
  • 16. Questions?