The main obstacle in Bayesian statistics and Bayesian machine learning is computing the posterior distribution, which in many contexts is intractable. Today there are two main ways to sidestep computing the posterior directly: sampling methods (e.g., MCMC) and variational inference. Compared to variational inference, MCMC takes more time and scales poorly to high-dimensional parameters; on the other hand, it is simpler and comes with convergence guarantees. I'll briefly introduce several methods that are used in practice.
Two ways of approximation
1. Reduce the distance between the two distributions → Variational Inference, an optimization problem
2. Draw samples from the distribution → Monte Carlo Method, a sampling method
Monte Carlo Method

The Law of Large Numbers: if $X_1, X_2, \ldots, X_n$ are i.i.d., then
$$\frac{X_1 + X_2 + \cdots + X_n}{n} \to \mathbb{E}[X_i].$$
But how do we sample from $p(x)$ in the first place?

Rejection Sampling

Pick a proposal distribution $q(x)$ that is easy to compute and easy to sample from, together with a constant $M$ such that $M q(x) \ge \tilde{p}(x)$ for all $x$. Draw $x \sim q(x)$ and $u \sim \mathrm{Uniform}(0, 1)$; if $u > \frac{\tilde{p}(x)}{M q(x)}$, we reject the sample, otherwise we accept it.
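To make this concrete, here is a minimal rejection-sampling sketch in Python. The unnormalized target $\tilde{p}(x) = \exp(-x^4/4)$, the standard normal proposal, and the envelope constant $M = 3.3$ are my own illustrative choices, not from the slides; $M$ is picked so that $M q(x) \ge \tilde{p}(x)$ holds everywhere. The averages at the end illustrate the Law of Large Numbers step.

```python
import numpy as np

rng = np.random.default_rng(0)

def p_tilde(x):
    """Unnormalized target density (hypothetical example): p~(x) = exp(-x**4 / 4)."""
    return np.exp(-x**4 / 4)

def q_pdf(x):
    """Standard normal proposal q(x): easy to compute and easy to sample from."""
    return np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)

M = 3.3  # envelope constant: M * q(x) >= p~(x) for all x (max of p~/q is about 3.22)

def rejection_sample(n):
    samples = []
    while len(samples) < n:
        x = rng.standard_normal()                  # draw x ~ q(x)
        u = rng.uniform()                          # draw u ~ Uniform(0, 1)
        if u <= p_tilde(x) / (M * q_pdf(x)):       # accept with probability p~(x) / (M q(x))
            samples.append(x)
    return np.array(samples)

xs = rejection_sample(10_000)
# By the Law of Large Numbers, sample averages approximate expectations under p.
print("E[X]   ~", xs.mean())        # symmetry of p~ implies the true mean is 0
print("E[X^2] ~", (xs**2).mean())
```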
Markov Chain Monte Carlo
Sampling Methods for Statistical Inference
• Gibbs Sampling (a minimal sketch follows below)
• Metropolis-Hastings Algorithm (sketched after the next slide)
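As an illustration of the first method, here is a minimal Gibbs-sampling sketch. The target, a standard bivariate normal with correlation $\rho = 0.8$, is a hypothetical example (not from the slides) chosen because both full conditionals are one-dimensional normals that can be sampled exactly.

```python
import numpy as np

rng = np.random.default_rng(0)
rho = 0.8  # correlation of the bivariate normal target (hypothetical example)

def gibbs(n_steps, x1=0.0, x2=0.0):
    """Gibbs sampler for a standard bivariate normal with correlation rho.
    Each full conditional is a 1-D normal, x1 | x2 ~ N(rho*x2, 1 - rho**2)
    and symmetrically for x2 | x1, so each coordinate is resampled exactly."""
    sd = np.sqrt(1 - rho**2)
    chain = np.empty((n_steps, 2))
    for t in range(n_steps):
        x1 = rng.normal(rho * x2, sd)  # resample x1 from p(x1 | x2)
        x2 = rng.normal(rho * x1, sd)  # resample x2 from p(x2 | x1)
        chain[t] = (x1, x2)
    return chain

chain = gibbs(20_000)
print("sample correlation ~", np.corrcoef(chain.T)[0, 1])  # should be near rho
```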
Basic idea of MCMC

Markov chain: $x^{(1)}, x^{(2)}, \ldots$ is a sequence of random variables. It forms a Markov chain if
$$p(x^{(t+1)} \mid x^{(1)}, x^{(2)}, \ldots, x^{(t)}) = p(x^{(t+1)} \mid x^{(t)}).$$

A Markov chain can be specified by
1. Initial distribution: $p_1(x) = p(x^{(1)})$
2. Transition probability: $T(x', x) = p(x^{(t+1)} = x' \mid x^{(t)} = x)$

Ergodicity: $\lim_{n \to \infty} T^n p_1 = \pi$, regardless of the initial distribution $p_1$.

The MCMC recipe (a minimal sketch follows below):
1. Build a Markov chain having $p(x)$ as an invariant distribution
2. Sample $(x^{(t)})_{t \ge 1}$ from the chain
3. Compute $\mathbb{E}_{p(x)}[f(x)] \approx \mathbb{E}_{T^n p_1(x)}[f(x)] \approx \frac{1}{n} \sum_t f(x^{(t)})$
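This recipe is what the Metropolis-Hastings algorithm implements. Below is a minimal random-walk Metropolis sketch of my own, reusing the same hypothetical unnormalized target as in the rejection-sampling example; with a symmetric proposal the Hastings correction cancels, so the acceptance ratio reduces to $\tilde{p}(x') / \tilde{p}(x)$.

```python
import numpy as np

rng = np.random.default_rng(0)

def p_tilde(x):
    """Unnormalized target p~(x) = exp(-x**4 / 4) (same hypothetical target as above)."""
    return np.exp(-x**4 / 4)

def metropolis_hastings(n_steps, x=0.0, step=1.0):
    """Random-walk Metropolis: propose x' ~ N(x, step**2); the symmetric proposal
    makes the Hastings correction cancel, so we accept with min(1, p~(x')/p~(x)).
    This acceptance rule makes p(x) an invariant distribution of the chain."""
    chain = np.empty(n_steps)
    for t in range(n_steps):
        x_prop = x + step * rng.standard_normal()          # propose a local move
        if rng.uniform() < p_tilde(x_prop) / p_tilde(x):   # accept w.p. min(1, ratio)
            x = x_prop                                     # otherwise keep the current state
        chain[t] = x
    return chain

chain = metropolis_hastings(50_000)
burned = chain[5_000:]  # discard burn-in: ergodicity washes out the initial distribution
# Step 3 of the recipe: the Monte Carlo average approximates E_p[f(x)].
print("E[X^2] ~", (burned**2).mean())
```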