Sampling Methods for Statistical Inference
2020. 10. 19
Jinhwan Suk
Department of Mathematical Science, KAIST
GROOT SEMINAR
Obstacles in latent modeling
• Latent variable model
• We need the posterior
Latent Variable Model

[Diagram: a latent variable z is drawn from a prior, and the data 𝒟 are generated from z through the likelihood]
e.g. RBM, VAE, GAN, HMM, Particle Filter, …
Examples pictured: Hidden Markov model, VAE
Latent Variable Model

Given
Prior: $p_\theta(z)$,  Likelihood: $p_\theta(\mathcal{D} \mid z)$

Inference
$\hat{\theta} = \arg\max_\theta \, p_\theta(\mathcal{D})$

Compute
$p_\theta(\mathcal{D}) = \int p_\theta(z)\, p_\theta(\mathcal{D} \mid z)\, dz$  (Intractable!)

EM Algorithm
$\log p(\mathcal{D}) = \int \log p(\mathcal{D}) \, p(z \mid \mathcal{D})\, dz = \int \log \frac{p(\mathcal{D}, z)}{p(z \mid \mathcal{D})} \, p(z \mid \mathcal{D})\, dz$
$= \int \log p(\mathcal{D}, z)\, p(z \mid \mathcal{D})\, dz - \int \log p(z \mid \mathcal{D}) \, p(z \mid \mathcal{D})\, dz$
$= \mathbb{E}_{p(z \mid \mathcal{D})}[\log p(\mathcal{D}, z)] + H \ge \mathbb{E}_{p(z \mid \mathcal{D})}[\log p(\mathcal{D}, z)]$

E-Step
$p(z \mid \mathcal{D}) = \dfrac{p(\mathcal{D}, z)}{p(\mathcal{D})}$  (Intractable!)
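To make the E-step and M-step concrete, here is a minimal sketch of EM in a case where the E-step posterior is available in closed form: a two-component 1-D Gaussian mixture. The mixture model, its parameterization, and all names below are illustrative choices, not from the slides; when $p(z \mid \mathcal{D})$ is intractable, as noted above, this E-step is exactly what breaks down.

```python
import numpy as np

def em_gmm_1d(x, n_iter=50):
    """Minimal EM sketch for a two-component 1-D Gaussian mixture (illustrative)."""
    # Initialize theta = (pi, mu, var).
    pi, mu = 0.5, np.array([x.min(), x.max()])
    var = np.array([x.var(), x.var()])
    pdf = lambda m, v: np.exp(-(x - m) ** 2 / (2 * v)) / np.sqrt(2 * np.pi * v)
    for _ in range(n_iter):
        # E-step: responsibilities r_i = p(z_i = 1 | x_i) under the current theta.
        w1, w2 = pi * pdf(mu[0], var[0]), (1 - pi) * pdf(mu[1], var[1])
        r = w1 / (w1 + w2)
        # M-step: maximize E_{p(z|D)}[log p(D, z)] with respect to theta.
        pi = r.mean()
        mu = np.array([np.sum(r * x) / r.sum(),
                       np.sum((1 - r) * x) / (1 - r).sum()])
        var = np.array([np.sum(r * (x - mu[0]) ** 2) / r.sum(),
                        np.sum((1 - r) * (x - mu[1]) ** 2) / (1 - r).sum()])
    return pi, mu, var

# Usage: data drawn from a mixture of N(-2, 1) and N(3, 0.25).
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2, 1, 300), rng.normal(3, 0.5, 200)])
print(em_gmm_1d(x))
```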
Latent Variable Model

Posterior inference
$\mathbb{E}[z \mid \mathcal{D}]$,  $\mathbb{E}[f(z) \mid \mathcal{D}]$

"Bayesian inference is all about posterior inference"
Direct computation is impossible. Approximation! But… how?
Let the target distribution be denoted by $p(x)$.
Two ways of approximation

1. Reduce the distance between the two distributions → Variational Inference (an optimization problem)
2. Draw samples from it → Monte Carlo Method (a sampling method)
Monte Carlo Method

The Law of Large Numbers
$\dfrac{X_1 + X_2 + \cdots + X_n}{n} \to \mathbb{E}[X_i]$,   $X_1, X_2, \ldots, X_n$ : i.i.d.

But… yes, how do we sample from $p(x)$?

Rejection Sampling
$q(x)$ : proposal distribution (easy to compute, easy to sample)
Choose $M$ so that $M q(x) \ge \tilde{p}(x)$.
Draw $x \sim q(x)$ and $u \sim \mathrm{Uniform}(0, 1)$. If $u > \dfrac{\tilde{p}(x)}{M q(x)}$, we reject the sample; otherwise we accept it.
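A minimal sketch of rejection sampling under these definitions. The unnormalized target $\tilde{p}(x) = e^{-x^4}$, the $N(0, 1)$ proposal, and the bound $M = 3$ are illustrative assumptions, not from the slides.

```python
import numpy as np

def rejection_sample(p_tilde, q_sample, q_pdf, M, n, rng=np.random.default_rng(0)):
    """Draw n samples from p(x) ∝ p_tilde(x) via a proposal q with M*q(x) >= p_tilde(x)."""
    samples = []
    while len(samples) < n:
        x = q_sample(rng)                       # propose x ~ q(x)
        u = rng.uniform()                       # u ~ Uniform(0, 1)
        if u <= p_tilde(x) / (M * q_pdf(x)):    # accept with probability p_tilde(x) / (M q(x))
            samples.append(x)
    return np.array(samples)

# Illustrative example: unnormalized target p_tilde(x) = exp(-x^4), proposal q = N(0, 1).
p_tilde = lambda x: np.exp(-x ** 4)
q_pdf = lambda x: np.exp(-x ** 2 / 2) / np.sqrt(2 * np.pi)
q_sample = lambda rng: rng.normal(0.0, 1.0)
M = 3.0                                         # chosen so that M * q(x) >= p_tilde(x) holds

xs = rejection_sample(p_tilde, q_sample, q_pdf, M, n=2000)
print("E[x] ≈", xs.mean())                      # Monte Carlo estimate via the LLN
```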
Monte Carlo Method

Importance Sampling
$\mu = \mathbb{E}_{p(x)}[f(x)] = \int f(x)\, \dfrac{\tilde{p}(x)}{Z}\, dx = \dfrac{1}{Z} \int f(x)\, \dfrac{\tilde{p}(x)}{q(x)}\, q(x)\, dx$

$\mathbb{E}_{p(x)}[f(x)] \approx \dfrac{1}{nZ} \sum_i f(x_i)\, \dfrac{\tilde{p}(x_i)}{q(x_i)} = \dfrac{1}{nZ} \sum_i w(x_i)\, f(x_i)$,   $x_i \sim q(x)$

$Z = \int \tilde{p}(x)\, dx = \int \dfrac{\tilde{p}(x)}{q(x)}\, q(x)\, dx \approx \dfrac{1}{n} \sum_i w(x_i)$

$\mathbb{E}_{p(x)}[f(x)] \approx \sum_i \tilde{w}(x_i)\, f(x_i)$   (self-normalized weights $\tilde{w}$)

i.i.d. sampling is very vulnerable in high-dimensional spaces
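A minimal sketch of the self-normalized estimator above, reusing the same illustrative target and proposal as in the rejection-sampling example; the integrand $f(x) = x^2$ and all function names are assumptions for the example.

```python
import numpy as np

def importance_estimate(f, p_tilde, q_sample, q_pdf, n, rng=np.random.default_rng(0)):
    """Self-normalized importance sampling: E_p[f(x)] ≈ sum_i w_tilde(x_i) f(x_i), x_i ~ q."""
    x = q_sample(rng, n)                  # draw x_1, ..., x_n i.i.d. from q
    w = p_tilde(x) / q_pdf(x)             # unnormalized weights w(x_i) = p_tilde(x_i) / q(x_i)
    w_tilde = w / w.sum()                 # normalized weights; Z ≈ (1/n) Σ w(x_i) cancels
    return np.sum(w_tilde * f(x))

# Illustrative example: target p(x) ∝ exp(-x^4), proposal q = N(0, 1), f(x) = x^2.
p_tilde = lambda x: np.exp(-x ** 4)
q_pdf = lambda x: np.exp(-x ** 2 / 2) / np.sqrt(2 * np.pi)
q_sample = lambda rng, n: rng.normal(0.0, 1.0, size=n)

print(importance_estimate(lambda x: x ** 2, p_tilde, q_sample, q_pdf, n=10000))
```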
Markov Chain Monte Carlo
• Gibbs Sampling
• Metropolis-Hastings Algorithm
Basic idea of MCMC

Markov Chain
$x^{(1)}, x^{(2)}, \ldots$ is a sequence of random variables. It forms a Markov chain if
$p(x^{(t+1)} \mid x^{(1)}, x^{(2)}, \ldots, x^{(t)}) = p(x^{(t+1)} \mid x^{(t)})$

A Markov chain can be specified by
1. Initial distribution: $p_1(x) = p(x^{(1)})$
2. Transition probability: $T(x', x) = p(x^{(t+1)} = x' \mid x^{(t)} = x)$

Ergodicity
$\lim_{n \to \infty} T^n p_1 = \pi$, regardless of the initial distribution $p_1$

1. Build a Markov chain having $p(x)$ as an invariant distribution
2. Sample $(x^{(t)})_{t \ge 1}$ from the chain
3. Compute $\mathbb{E}_{p(x)}[f(x)] \approx \mathbb{E}_{T^n p_1(x)}[f(x)] \approx \dfrac{1}{n} \sum_t f(x^{(t)})$
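One standard way to build such a chain is the Metropolis-Hastings algorithm listed on the next slide. The sketch below uses a random-walk Gaussian proposal and the same illustrative target $p(x) \propto e^{-x^4}$; the proposal, step size, burn-in, and chain length are all assumptions for the example.

```python
import numpy as np

def metropolis_hastings(log_p_tilde, x0, n_steps, step=0.5, rng=np.random.default_rng(0)):
    """Random-walk Metropolis-Hastings targeting p(x) ∝ exp(log_p_tilde(x))."""
    x, chain = x0, []
    for _ in range(n_steps):
        x_prop = x + step * rng.normal()                  # symmetric proposal x' ~ N(x, step^2)
        log_alpha = log_p_tilde(x_prop) - log_p_tilde(x)  # log acceptance ratio (q terms cancel)
        if np.log(rng.uniform()) < log_alpha:             # accept with probability min(1, alpha)
            x = x_prop
        chain.append(x)                                   # on rejection, repeat the current state
    return np.array(chain)

# Illustrative target: p(x) ∝ exp(-x^4); this p(x) is invariant under the chain.
chain = metropolis_hastings(lambda x: -x ** 4, x0=0.0, n_steps=20000)
burn = 1000
print("E_p[x^2] ≈", np.mean(chain[burn:] ** 2))           # (1/n) Σ f(x^(t)) after burn-in
```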
MCMC Algorithms

Gibbs Sampling: $x_i^{(s+1)} \sim p(x_i \mid \boldsymbol{x}_{-i})$
Metropolis-Hastings algorithm, Hamiltonian MCMC, Sequential MCMC, Stochastic gradient MCMC, Particle MCMC
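A minimal sketch of the Gibbs update above for a bivariate Gaussian, where each full conditional $p(x_i \mid \boldsymbol{x}_{-i})$ is itself a 1-D Gaussian; the target correlation $\rho = 0.8$ and the chain length are illustrative assumptions.

```python
import numpy as np

def gibbs_bivariate_normal(n_steps, rho=0.8, rng=np.random.default_rng(0)):
    """Gibbs sampling for (x1, x2) ~ N(0, [[1, rho], [rho, 1]]).

    Each sweep draws x_i^(s+1) ~ p(x_i | x_{-i}), which here is N(rho * x_{-i}, 1 - rho^2).
    """
    x1, x2 = 0.0, 0.0
    chain = np.empty((n_steps, 2))
    for s in range(n_steps):
        x1 = rng.normal(rho * x2, np.sqrt(1 - rho ** 2))  # x1 | x2
        x2 = rng.normal(rho * x1, np.sqrt(1 - rho ** 2))  # x2 | x1
        chain[s] = (x1, x2)
    return chain

chain = gibbs_bivariate_normal(50000)
print("sample correlation ≈", np.corrcoef(chain[1000:].T)[0, 1])  # close to 0.8
```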
Thank you
Sampling Methods for Statistical Inference
2020. 10. 19
Jinhwan Suk
Department of Mathematical Science, KAIST
