C.M. Bishop's PRML
Tran Quoc Hoan
@k09hthaduonght.wordpress.com/
10 January 2016, PRML Reading, Hasegawa lab., Tokyo
The University of Tokyo
Chapter 11: Sampling Methods
Introduction
Generating a random number is not easy!
• True random numbers: gather "entropy", i.e. seemingly random data from the physical world.
• Pseudo-random numbers: a seed is fed into a pseudo-random generator (PRG), which outputs the random numbers. Ex. Mersenne Twister.
http://www.howtogeek.com/183051/htg-explains-how-computers-generate-random-numbers/
Generating a number that follows a given probability distribution is even more difficult!
For today's meeting
Agenda
• Cover PRML chapter 11
• Understand the general concept of sampling from a desired distribution
• Introduction to the MCMC world
• More about MCMC
• Details in Papers Alert
Outline
Sampling Methods
Part I: General concept of basic sampling
• Basic Sampling Algorithms
Part II: MCMC world
• Markov Chain Monte Carlo
• Gibbs Sampling
• Slice Sampling
• Hybrid Monte Carlo
• Estimating the Partition Function
Progress…
Sampling Methods: Basic Sampling Algorithms · Markov Chain Monte Carlo · Gibbs Sampling · Slice Sampling · Hybrid Monte Carlo · Estimating the Partition Function
(Next: Basic Sampling Algorithms)
Standard distributions
11.1 Basic Sampling Algorithms
• Goal: sampling from a desired distribution p(y).
• Assumption: we can generate random numbers z uniformly distributed in U[0, 1].
Generate a uniformly distributed z, then transform it by y = h^{-1}(z), where
$$h(y) = \int_{-\infty}^{y} p(\hat{y})\, d\hat{y}$$
is the cumulative distribution function of p. In general, the densities are related by
$$p(y) = p(z) \left| \frac{dz}{dy} \right|$$
This transformation method works if h^{-1} is easy to compute.
Ex. Consider the exponential distribution
$$p(y) = \lambda \exp(-\lambda y), \quad 0 \le y < \infty$$
where
$$h(y) = 1 - \exp(-\lambda y)$$
then
$$y = -\lambda^{-1} \ln(1 - z)$$
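As a quick illustration, here is a minimal Python sketch of the transformation method for this exponential example (numpy assumed; the rate lam = 2.0 is an arbitrary illustrative choice):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_exponential(lam, size):
    """Transformation method: y = -ln(1 - z) / lam with z ~ U[0, 1]."""
    z = rng.uniform(0.0, 1.0, size)
    return -np.log(1.0 - z) / lam

samples = sample_exponential(lam=2.0, size=100_000)
print(samples.mean())  # close to 1/lam = 0.5
```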
Rejection Sampling
11.1 Basic Sampling Algorithms
• Assumption 1: sampling from p(z) directly is difficult, but we are able to evaluate p(z) for any given value of z, up to some normalizing constant Z_p:
$$p(z) = \frac{1}{Z_p} \tilde{p}(z) \quad (11.13)$$
• Assumption 2: we know how to sample from a proposal distribution q(z), and there exists a constant k such that
$$k q(z) \ge \tilde{p}(z) \quad \text{for all } z$$
• Then we have an algorithm to obtain independent samples from p(·).
Rejection Sampling
11.1 Basic Sampling Algorithms
Procedure (with the constant k chosen so that k q(z) covers p̃(z)):
1. Generate z_0 from the proposal q(·).
2. Generate u_0 uniformly from U[0, k q(z_0)].
3. Keep z_0 if u_0 ≤ p̃(z_0); reject z_0 if u_0 > p̃(z_0).
• The efficiency of the method depends on the ratio between the grey area (under p̃) and the white area (between p̃ and k q).
• Proof:
$$p(\text{accept}) = \int \frac{\tilde{p}(z)}{k q(z)}\, q(z)\, dz = \frac{1}{k} \int \tilde{p}(z)\, dz$$
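A hedged Python sketch of this rejection loop. The two-component Gaussian-mixture target p_tilde, the N(0, 2^2) proposal, and the constant k = 8 are illustrative assumptions, with k chosen large enough that k·q(z) ≥ p̃(z) everywhere:

```python
import numpy as np

rng = np.random.default_rng(1)

SIGMA_Q = 2.0  # std of the Gaussian proposal q(z) = N(0, SIGMA_Q^2)

def p_tilde(z):
    """Unnormalized target: a two-component Gaussian mixture (illustrative)."""
    return np.exp(-0.5 * (z - 1.5) ** 2) + 0.5 * np.exp(-0.5 * (z + 1.5) ** 2)

def q_pdf(z):
    return np.exp(-0.5 * (z / SIGMA_Q) ** 2) / (SIGMA_Q * np.sqrt(2 * np.pi))

def rejection_sample(n, k=8.0):
    """k is chosen so that k * q(z) >= p_tilde(z) for all z."""
    out = []
    while len(out) < n:
        z0 = rng.normal(0.0, SIGMA_Q)          # 1. generate z0 from the proposal
        u0 = rng.uniform(0.0, k * q_pdf(z0))   # 2. u0 uniform on [0, k q(z0)]
        if u0 <= p_tilde(z0):                  # 3. keep z0, otherwise reject
            out.append(z0)
    return np.array(out)

samples = rejection_sample(10_000)
```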
Rejection Sampling Example
11.1 Basic Sampling Algorithms
• Sampling from the Gamma distribution (green curve)
$$\mathrm{Gam}(z \mid a, b) = \frac{b^a z^{a-1} \exp(-bz)}{\Gamma(a)}$$
which has its mode at z = (a-1)/b.
• Proposal distribution -> Cauchy distribution (red curve)
$$q(z) = \frac{c_0}{1 + (z - z_0)^2 / d_0^2}$$
Samples are obtained by transforming z = d_0 tan(πu) + z_0, where u is drawn uniformly from [0, 1].
• We need to find z_0, c_0, d_0 such that q(z) is everywhere greater than (or equal to) Gam(z|a, b) with the smallest c_0 d_0 (which defines the envelope area):
$$z_0 = \frac{a-1}{b}, \quad d_0^2 = 2a - 1, \quad c_0 = \frac{1}{\pi d_0}$$
Adaptive Rejection Sampling
11.1 Basic Sampling Algorithms
• The proposal distribution q(·) may be difficult to construct directly, so build the envelope adaptively:
1. Construct q(z) from an initial set of grid points.
2. Generate z_4 from q(z).
3. Generate u_0 from U[0, q(z_4)].
4. Keep z_4 if u_0 ≤ p̃(z_4); reject z_4 if u_0 > p̃(z_4), but use the rejected point to refine the envelope.
Fig 11.6: if a sample point is rejected, it is added to the set of grid points and used to refine the envelope distribution.
• Rejection sampling methods are inefficient when sampling in high dimensions (the acceptance rate decreases exponentially with dimensionality).
Importance Sampling
11.1 Basic Sampling Algorithms
Basic idea: transform the integral into an expectation over a simple, known distribution q(z) (the proposal):
$$\mathbb{E}[f] = \int f(z)\, p(z)\, dz = \int f(z)\, \frac{p(z)}{q(z)}\, q(z)\, dz \approx \frac{1}{S} \sum_{s} w^{(s)} f(z^{(s)}), \quad z^{(s)} \sim q(z)$$
This is a Monte Carlo estimate in which the normalized importance weights
$$w^{(s)} \propto \frac{p(z^{(s)})}{q(z^{(s)})}$$
correct the bias introduced by sampling from the wrong distribution.
Conditions:
• q(z) > 0 whenever f(z)p(z) ≠ 0
• Easy to sample from q(z)
• All the generated samples are retained.
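A minimal sketch of self-normalized importance sampling for estimating E_p[f]; the standard-normal target shape and the broad N(0, 3^2) proposal are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)

SIGMA_Q = 3.0  # broad Gaussian proposal

def p_tilde(z):
    """Unnormalized target (illustrative): standard-normal shape."""
    return np.exp(-0.5 * z ** 2)

def q_pdf(z):
    return np.exp(-0.5 * (z / SIGMA_Q) ** 2) / (SIGMA_Q * np.sqrt(2 * np.pi))

def importance_expectation(f, S=100_000):
    """E_p[f] ~= sum_s w(s) f(z(s)), with self-normalized weights w ∝ p_tilde/q."""
    z = rng.normal(0.0, SIGMA_Q, S)  # z(s) ~ q(z); every sample is retained
    w = p_tilde(z) / q_pdf(z)
    w /= w.sum()                     # normalize, since Z_p is unknown
    return np.sum(w * f(z))

print(importance_expectation(lambda z: z ** 2))  # close to 1.0 for a N(0,1) target
```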
SIR (sampling-importance-resampling)
11.1 Basic Sampling Algorithms
• Rejection sampling: choosing a suitable q(z) and constant k can be difficult.
• SIR is also based on a proposal distribution q(z), but avoids having to determine the constant k:
1. Draw L samples z^(1), z^(2), ..., z^(L) from q(z).
2. Calculate the importance weights p(z^(l)) / q(z^(l)) for all l = 1...L.
3. Normalize the weights to obtain w_1 ... w_L.
4. Draw a second set of L samples from the discrete distribution (z^(1), ..., z^(L)) with probabilities (w_1, ..., w_L).
• The resulting L samples are distributed according to p(z) as L -> ∞.
SIR (sampling-importance-resampling)
11.1 Basic Sampling Algorithms
(Steps 1-4 as on the previous slide.)
• Proof. With the indicator I(F) = 1 if F is true and 0 otherwise,
$$p(z \le a) = \sum_{l : z^{(l)} \le a} w_l = \frac{\sum_l I(z^{(l)} \le a)\, \tilde{p}(z^{(l)}) / q(z^{(l)})}{\sum_l \tilde{p}(z^{(l)}) / q(z^{(l)})}$$
If L -> ∞ then
$$p(z \le a) = \frac{\int I(z \le a) \{\tilde{p}(z)/q(z)\}\, q(z)\, dz}{\int \{\tilde{p}(z)/q(z)\}\, q(z)\, dz} = \frac{\int I(z \le a)\, \tilde{p}(z)\, dz}{\int \tilde{p}(z)\, dz} = \int I(z \le a)\, p(z)\, dz$$
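A short SIR sketch following steps 1-4, assuming numpy; the unnormalized N(1, 1) target and N(0, 2^2) proposal are illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)

def p_tilde(z):
    """Unnormalized N(1, 1) target (illustrative)."""
    return np.exp(-0.5 * (z - 1.0) ** 2)

def sir(L):
    # 1. draw L samples from the proposal q = N(0, 2^2)
    z = rng.normal(0.0, 2.0, L)
    q = np.exp(-0.5 * (z / 2.0) ** 2) / (2.0 * np.sqrt(2 * np.pi))
    # 2.-3. importance weights, normalized
    w = p_tilde(z) / q
    w /= w.sum()
    # 4. resample L values from the discrete distribution (z, w)
    return rng.choice(z, size=L, replace=True, p=w)

samples = sir(50_000)
print(samples.mean(), samples.std())  # both approach 1.0 as L grows
```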
Sampling and the EM algorithm
11.1 Basic Sampling Algorithms
• Use some Monte Carlo method to approximate the expectation of the E-step.
Monte Carlo EM algorithm
• The expected complete-data log likelihood (Z: hidden; X: observed; θ: parameters), given by
$$Q(\theta, \theta^{\text{old}}) = \int p(Z \mid X, \theta^{\text{old}})\, \ln p(Z, X \mid \theta)\, dZ \quad (11.28)$$
may be approximated by
$$Q(\theta, \theta^{\text{old}}) \approx \frac{1}{L} \sum_{l=1}^{L} \ln p(Z^{(l)}, X \mid \theta) \quad (11.29)$$
where the Z^(l) are drawn from the current estimate of the posterior p(Z | X, θ^old).
Stochastic EM algorithm
• For a finite mixture model, only one sample Z may be drawn at each E-step (this makes a hard assignment of each data point to one of the components).
IP Algorithm
11.1 Basic Sampling Algorithms
• For a full Bayesian treatment in which we wish to draw samples from the joint posterior p(θ, Z | X):
IP algorithm
• I-step. We wish to sample from p(Z | X) but cannot do this directly. Notice that
$$p(Z \mid X) = \int p(Z \mid \theta, X)\, p(\theta \mid X)\, d\theta \quad (11.30)$$
so for l = 1...L we first draw a sample θ^(l) from the current estimate of p(θ | X), and then use it to draw a sample Z^(l) from p(Z | θ^(l), X).
• P-step. Given the relation
$$p(\theta \mid X) = \int p(\theta \mid Z, X)\, p(Z \mid X)\, dZ \quad (11.31)$$
we use the samples {Z^(l)} obtained from the I-step to compute
$$p(\theta \mid X) \approx \frac{1}{L} \sum_{l=1}^{L} p(\theta \mid Z^{(l)}, X) \quad (11.32)$$
In Review…
11.1 Basic Sampling Algorithms
• Inverse function method
- Analytically reliable, but unable to deal with complicated distributions
• Rejection sampling
- Able to deal with complicated distributions, but the proposal distribution and constant k are difficult to choose
- Can waste samples due to the rejection process
• Adaptive rejection sampling
- Uses an envelope function to reduce rejected samples
- Struggles with high-dimensional or sharply peaked distributions
• Importance sampling
- Approximates expectations with weights under a proposal distribution; does not produce samples from the desired distribution
• SIR
- Combines rejection sampling and importance sampling
• Monte Carlo EM
• IP algorithm for data augmentation
Progress…
Part I: General concept of basic sampling (done)
Part II: Welcome to the MCMC world: Markov Chain Monte Carlo · Gibbs Sampling · Slice Sampling · Hybrid Monte Carlo · Estimating the Partition Function
Markov Chain Monte Carlo (MCMC)
11.2 Markov Chain Monte Carlo
• MCMC: a general strategy which allows sampling from a large class of distributions (based on the mechanism of Markov chains).
• MCMC scales well with the dimensionality of the sample space.
MLE vs. MCMC:
• MLE works on the likelihood function and estimates a point value ("the top of the mountain"); the estimate can be wrong and depends on the initial value.
• MCMC estimates the full posterior distribution; it approaches the global optimum and does not depend on the initial value.
Method families:
• Basic sampling: inverse function, rejection sampling, adaptive rejection sampling, importance sampling, SIR, data augmentation sampling.
• Markov Chain Monte Carlo: Metropolis method, Metropolis-Hastings method, Gibbs sampling, slice sampling.
MCMC: the idea
11.2 Markov Chain Monte Carlo
• Goal: to generate a set of samples from p(z).
• Idea: to generate samples from a Markov chain whose invariant distribution is p(z).
1. Knowing the current sample z^(τ), generate a candidate sample z* from a proposal distribution q(z | z^(τ)).
2. Accept the sample according to an appropriate criterion.
3. If the candidate sample is accepted, then z^(τ+1) = z*; otherwise z^(τ+1) = z^(τ).
• The proposal distribution depends on the current state.
• The samples z^(1), z^(2), ... form a Markov chain, and the distribution of z^(τ) tends to p(z) as τ -> ∞.
• Assumption: we know how to evaluate p̃(z) = Z_p p(z) (but not Z_p).
Metropolis Algorithm
11.2 Markov Chain Monte Carlo
• The proposal distribution is symmetric: q(z_A | z_B) = q(z_B | z_A).
• The candidate sample is accepted with probability
$$A(z^*, z^{(\tau)}) = \min\left(1, \frac{\tilde{p}(z^*)}{\tilde{p}(z^{(\tau)})}\right) \quad (11.33)$$
Fig 11.9: the proposal distribution is an isotropic Gaussian with standard deviation 0.2. Accepted steps in green, rejected steps in red; the ellipse shows the standard-deviation contour of the target. 150 candidate samples, 43 rejected.
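A minimal random-walk Metropolis sketch implementing (11.33), in the spirit of Fig 11.9; the correlated 2-D Gaussian target and the step size 0.2 are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(4)

# Illustrative target: correlated 2-D Gaussian, known only up to a constant
PREC = np.linalg.inv(np.array([[1.0, 0.95], [0.95, 1.0]]))

def p_tilde(z):
    return np.exp(-0.5 * z @ PREC @ z)

def metropolis(n_samples, step=0.2):
    """Random-walk Metropolis with a symmetric isotropic Gaussian proposal."""
    z = np.zeros(2)
    chain = []
    for _ in range(n_samples):
        z_star = z + rng.normal(0.0, step, size=2)   # symmetric proposal
        a = min(1.0, p_tilde(z_star) / p_tilde(z))   # acceptance (11.33)
        if rng.uniform() < a:
            z = z_star                               # accept
        chain.append(z)                              # else keep the current state
    return np.array(chain)

chain = metropolis(10_000)
```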
Markov Chains
11.2 Markov Chain Monte Carlo
• Q: under what circumstances will a Markov chain converge to the desired distribution?
• First-order Markov chain: a series of random variables z^(1), ..., z^(M) such that
$$p(z^{(m+1)} \mid z^{(1)}, \ldots, z^{(m)}) = p(z^{(m+1)} \mid z^{(m)}) \quad \forall m \quad (11.37)$$
• A Markov chain is specified by p(z^(0)) and the transition probabilities
$$T_m(z^{(m)}, z^{(m+1)}) \equiv p(z^{(m+1)} \mid z^{(m)})$$
• A distribution p*(z) is said to be invariant for a Markov chain if
$$p^*(z) = \sum_{z'} T(z', z)\, p^*(z')$$
A sufficient condition is to choose transitions that satisfy the property of detailed balance:
$$p^*(z)\, T(z, z') = T(z', z)\, p^*(z') \quad (11.40)$$
Ergodicity
11.2 Markov Chain Monte Carlo
• An ergodic Markov chain has a unique invariant (equilibrium) distribution: it "forgets" its starting point z^(0).
Image source: Murray, MLSS 2009 slides
Markov Chains
11.2 Markov Chain Monte Carlo
• Goal: to generate a set of samples from p(z).
• Idea: to generate samples from a Markov chain whose invariant distribution is p(z).
• How: choose the transition probability T(z, z') to satisfy the property of detailed balance for p(z):
$$p(z)\, T(z, z') = T(z', z)\, p(z') \quad (11.40)$$
• T(z, z') can be constructed from a set of "base" transitions B_1, B_2, ..., B_K, either as a mixture
$$T(z', z) = \sum_{k=1}^{K} \alpha_k B_k(z', z) \quad (11.42)$$
or as a composition
$$T(z', z) = \sum_{z_1} \cdots \sum_{z_{K-1}} B_1(z', z_1) \cdots B_{K-1}(z_{K-2}, z_{K-1})\, B_K(z_{K-1}, z) \quad (11.43)$$
The Metropolis-Hastings Algorithm
11.2 Markov Chain Monte Carlo
• Generalization of the Metropolis algorithm: the proposal distribution q is no longer required to be symmetric.
• Knowing the current sample z^(τ), generate a candidate sample z* from a proposal distribution q_k(z | z^(τ)).
• Accept it with probability
$$A_k(z^*, z^{(\tau)}) = \min\left(1, \frac{\tilde{p}(z^*)\, q_k(z^{(\tau)} \mid z^*)}{\tilde{p}(z^{(\tau)})\, q_k(z^* \mid z^{(\tau)})}\right) \quad (11.44)$$
where k labels the members of the set of possible transitions being considered.
The Metropolis-Hastings Algorithm
11.2 Markov Chain Monte Carlo
• Prove that p(z) is the invariant distribution of the chain, i.e. that detailed balance holds:
$$p(z)\, T_k(z, z') = T_k(z', z)\, p(z')$$
• Notice that the transition probability of this chain is
$$T_k(z, z') = q_k(z' \mid z)\, A_k(z', z)$$
Proof. Using p(z) = p̃(z)/Z_p and the definition of A_k in (11.44),
$$p(z)\, q_k(z' \mid z)\, A_k(z', z) = \min\big(p(z)\, q_k(z' \mid z),\; p(z')\, q_k(z \mid z')\big)$$
$$= \min\big(p(z')\, q_k(z \mid z'),\; p(z)\, q_k(z' \mid z)\big) = p(z')\, q_k(z \mid z')\, A_k(z, z')$$
(Q.E.D.)
The Metropolis-Hastings Algorithm
11.2 Markov Chain Monte Carlo
• Common choice for q: a Gaussian centered on the current state.
✓ Small variance -> high acceptance rate, but slow exploration of the state space and highly dependent samples.
✓ Large variance -> high rejection rate.
Fig 11.10: use of an isotropic Gaussian proposal (blue circle) to sample from a correlated Gaussian distribution (red). The scale ρ of the proposal should be on the order of σmin, but the algorithm then converges slowly: exploring the state space in the elongated direction requires on the order of (σmax/σmin)^2 iterations.
Summary so far…
11.2 Markov Chain Monte Carlo
• We need approximate methods to evaluate sums/integrals.
• Monte Carlo does not explicitly depend on dimension, although simple methods work only in low dimensions.
• Markov Chain Monte Carlo (MCMC) can make local moves. By assuming less, it is more applicable to higher dimensions.
• Simple computations => "easy" to implement (but harder to diagnose).
Progress…
Sampling Methods: Basic Sampling Algorithms · Markov Chain Monte Carlo · Gibbs Sampling · Slice Sampling · Hybrid Monte Carlo · Estimating the Partition Function
(Next: Gibbs Sampling)
Gibbs Sampling
11.3 Gibbs Sampling
• Sample each variable in turn, conditioned on the values of all the other variables in the distribution (a method with no rejection).
✓ Initialize {z_1, z_2, ..., z_M}.
✓ For τ = 1, 2, ..., T: pick each variable in turn (sequentially or randomly) and resample
$$z_i^{(\tau+1)} \sim p(z_i \mid z_{\setminus i}^{(\tau)}) \quad \text{for } i = 1 \ldots M$$
Proof of validity
• Consider a Metropolis-Hastings sampling step involving the variable z_k, in which the remaining variables z_{\k} remain fixed, with transition probability
$$q_k(z^* \mid z) = p(z_k^* \mid z_{\setminus k})$$
Then the acceptance probability is
$$A_k(z^*, z) = \frac{p(z^*)\, q_k(z \mid z^*)}{p(z)\, q_k(z^* \mid z)} = \frac{p(z_k^* \mid z_{\setminus k}^*)\, p(z_{\setminus k}^*)\, p(z_k \mid z_{\setminus k}^*)}{p(z_k \mid z_{\setminus k})\, p(z_{\setminus k})\, p(z_k^* \mid z_{\setminus k})} = 1$$
where z*_{\k} = z_{\k}, so every step is accepted.
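A Gibbs-sampling sketch for a correlated bivariate Gaussian (an illustrative target with assumed ρ = 0.95), where each conditional is the exact Gaussian conditional, so no step is rejected:

```python
import numpy as np

rng = np.random.default_rng(5)

def gibbs_bivariate_normal(n_samples, rho=0.95):
    """Gibbs sampling for N(0, [[1, rho], [rho, 1]]) (illustrative target).

    Conditionals: z_i | z_j ~ N(rho * z_j, 1 - rho^2), so nothing is rejected.
    """
    z = np.zeros(2)
    chain = np.empty((n_samples, 2))
    s = np.sqrt(1.0 - rho ** 2)
    for t in range(n_samples):
        z[0] = rng.normal(rho * z[1], s)   # resample z1 | z2
        z[1] = rng.normal(rho * z[0], s)   # resample z2 | z1
        chain[t] = z
    return chain

chain = gibbs_bivariate_normal(10_000)
print(np.corrcoef(chain.T)[0, 1])          # close to 0.95
```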
Gibbs Sampling
11.3 Gibbs Sampling
Fig 11.11: illustration of Gibbs sampling by alternate updates of two variables (blue steps) whose distribution is a correlated Gaussian (red). The step size is governed by the standard deviation of the conditional distribution (green curve) and is O(l), leading to slow progress. The number of steps needed to obtain an independent sample from the distribution is O((L/l)^2).
Progress…
Sampling Methods: Basic Sampling Algorithms · Markov Chain Monte Carlo · Gibbs Sampling · Slice Sampling · Hybrid Monte Carlo · Estimating the Partition Function
(Next: Slice Sampling)
Auxiliary variables
11.4 Slice Sampling
• Collapsing: analytically integrate variables out.
• Auxiliary-variable methods: introduce extra variables and integrate them out by MCMC. Explore π(θ, h), where
$$\int \pi(\theta, h)\, dh = \pi(\theta)$$
Slice Sampling
11.4 Slice Sampling
• Problem of the Metropolis algorithm (proposal q(z | z') = q(z' | z)):
✓ Step size too small: slow convergence (random-walk behavior).
✓ Step size too large: high estimator variance (high rejection rate).
• Idea: adapt the step size automatically to a suitable value.
• Technique: introduce a variable u and sample (u, z) jointly. Ignoring u leaves the desired samples of p(z).
Slice Sampling
11.4 Slice Sampling
• Sample z and u uniformly from the area under the distribution:
✓ Fix z, sample u uniformly from [0, p̃(z)].
✓ Fix u, sample z uniformly from the slice through the distribution, {z : p̃(z) > u}.
• How to sample z from the slice:
✓ Start with a region of width w containing z^(τ).
✓ If an end point lies in the slice, extend the region by w in that direction.
✓ Sample z' uniformly from the region.
✓ If z' is in the slice, accept it as z^(τ+1).
✓ If not, make z' the new end point of the region and resample z'.
Multivariate distributions: slice sampling within a Gibbs sampler.
A code sketch follows; see the next slides for more details.
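A univariate slice-sampling sketch with the stepping-out and shrinkage procedure described above (the unnormalized standard-normal target and width w = 1.0 are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(6)

def p_tilde(z):
    """Unnormalized N(0, 1) target (illustrative)."""
    return np.exp(-0.5 * z ** 2)

def slice_sample(n_samples, w=1.0, z=0.0):
    """Univariate slice sampling with stepping out and shrinkage."""
    chain = []
    for _ in range(n_samples):
        u = rng.uniform(0.0, p_tilde(z))   # vertical step: u | z
        # place a bracket of width w randomly around z, then step out
        left = z - rng.uniform(0.0, w)
        right = left + w
        while p_tilde(left) > u:
            left -= w
        while p_tilde(right) > u:
            right += w
        # sample on the bracket, shrinking it after each rejection
        while True:
            z_new = rng.uniform(left, right)
            if p_tilde(z_new) > u:
                z = z_new
                break
            if z_new < z:
                left = z_new
            else:
                right = z_new
        chain.append(z)
    return np.array(chain)

chain = slice_sample(10_000)
```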
Slice Sampling Idea
11.4 Slice Sampling
Sample (z, u) uniformly under the curve p̃(z) ∝ p(z):
$$p(u \mid z) = \mathrm{Uniform}[0, \tilde{p}(z)]$$
$$p(z \mid u) \propto \begin{cases} 1 & \text{if } \tilde{p}(z) \ge u \\ 0 & \text{otherwise} \end{cases} = \text{Uniform on the slice}$$
Slide from MCMC NIPS2015 tutorial
Slice Sampling Idea
11.4 Slice Sampling
(Unimodal conditionals.) Rejection sampling of p(z | u) using a broader uniform proposal.
Slide from MCMC NIPS2015 tutorial
Slice Sampling Idea
11.4 Slice Sampling
(Unimodal conditionals.) Adaptive rejection sampling of p(z | u).
Slide from MCMC NIPS2015 tutorial
Slice Sampling Idea
11.4 Slice Sampling
(Unimodal conditionals.) Quickly finds a new z, with no rejections recorded.
Slide from MCMC NIPS2015 tutorial
Slice Sampling Idea
11.4 Slice Sampling
(Multimodal conditionals.) Use updates that leave p(z | u) invariant:
- place a bracket randomly around the current point
- linearly step out until both ends are off the slice
- sample on the bracket, shrinking as before
Slide from MCMC NIPS2015 tutorial
Progress…
Sampling Methods: Basic Sampling Algorithms · Markov Chain Monte Carlo · Gibbs Sampling · Slice Sampling · Hybrid Monte Carlo · Estimating the Partition Function
(Next: Hybrid Monte Carlo)
Hybrid Monte Carlo
11.5 Hybrid Monte Carlo
• Problem of the Metropolis algorithm: the step-size trade-off.
• Hybrid Monte Carlo is suitable for continuous state spaces:
✓ able to make large jumps in state space with a low rejection rate;
✓ adopts physical-system (Hamiltonian) dynamics, rather than a probability distribution, to propose future states of the Markov chain.
• Goal: to sample from
$$p(z) = \frac{1}{Z_p} \exp(-E(z))$$
where E(z) is interpreted as the potential energy of the system at state z.
Hamiltonian dynamics
11.5 Hybrid Monte Carlo
• Hamiltonian dynamics describe how kinetic energy is converted to potential energy (and vice versa) as an object moves through time.
• Evolution of the state variable z = {z_i} under continuous time τ.
• Momentum variables correspond to the rate of change of the state:
$$r_i = \frac{dz_i}{d\tau} \quad (11.53)$$
The joint (z, r) space is called phase space.
• For each location the object takes there is an associated potential energy E(z), and for each momentum there is an associated kinetic energy K(r). The total energy of the system is constant and is known as the Hamiltonian:
$$H(z, r) = E(z) + K(r)$$
The dynamics are
$$\frac{dz_i}{d\tau} = \frac{\partial H}{\partial r_i} = \frac{\partial K(r)}{\partial r_i}, \qquad \frac{dr_i}{d\tau} = -\frac{\partial H}{\partial z_i} = -\frac{\partial E(z)}{\partial z_i}$$
• The dynamics preserve volume in phase space: div V = 0 with
$$V = \left( \frac{dz}{d\tau}, \frac{dr}{d\tau} \right) \quad (11.62)$$
Simulating Hamiltonian dynamics
11.5 Hybrid Monte Carlo
• Given the partial derivatives above and a set of initial conditions (z_0, r_0), we can predict the location and momentum at any point in time.
Leapfrog method (run for L steps to simulate the dynamics over L × δ units of time):
1. Take a half step in time to update the momentum variable:
$$r_i(\tau + \delta/2) = r_i(\tau) - \frac{\delta}{2} \frac{\partial E}{\partial z_i}(\tau)$$
2. Take a full step in time to update the position variable:
$$z_i(\tau + \delta) = z_i(\tau) + \delta\, \frac{\partial K}{\partial r_i}(\tau + \delta/2)$$
3. Take the remaining half step in time to finish updating the momentum variable:
$$r_i(\tau + \delta) = r_i(\tau + \delta/2) - \frac{\delta}{2} \frac{\partial E}{\partial z_i}(\tau + \delta)$$
Simulating a Hamiltonian oscillator
11.5 Hybrid Monte Carlo
Harmonic oscillator: F = -kz, so
$$E(z) = -\int F\, dz = \frac{k z^2}{2}, \qquad K(v) = \frac{(mv)^2}{2m} = \frac{v^2}{2} = \frac{r^2}{2} = K(r) \quad (m = 1)$$
Leapfrog equations (for k = 1):
1. r(τ + δ/2) = r(τ) - (δ/2) z(τ)
2. z(τ + δ) = z(τ) + δ r(τ + δ/2)
3. r(τ + δ) = r(τ + δ/2) - (δ/2) z(τ + δ)
(Figures: traces of the energies E, K, and H = E + K over time; the trajectory in (z, r) phase space.)
Img Ref. https://theclevermachine.wordpress.com/2012/11/18/mcmc-hamiltonian-monte-carlo-a-k-a-hybrid-monte-carlo/
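A direct transcription of these three leapfrog equations for the harmonic oscillator (δ = 0.1 and the initial condition are illustrative); the printed Hamiltonian stays nearly constant, as the energy figure suggests:

```python
import numpy as np

def leapfrog_oscillator(z0=1.0, r0=0.0, delta=0.1, n_steps=200):
    """Leapfrog integration of the harmonic oscillator: E = z^2/2, K = r^2/2."""
    z, r = z0, r0
    traj = [(z, r)]
    for _ in range(n_steps):
        r -= (delta / 2) * z   # 1. half step in momentum (dE/dz = z)
        z += delta * r         # 2. full step in position (dK/dr = r)
        r -= (delta / 2) * z   # 3. remaining half step in momentum
        traj.append((z, r))
    return np.array(traj)

traj = leapfrog_oscillator()
H = 0.5 * traj[:, 0] ** 2 + 0.5 * traj[:, 1] ** 2
print(H.min(), H.max())  # H = E + K stays nearly constant along the trajectory
```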
Target distribution
11.5 Hybrid Monte Carlo
• Consider the canonical distribution
$$p(\theta) = \frac{1}{Z_p} \exp(-E(\theta))$$
• The canonical distribution for the Hamiltonian energy function is
$$p(z, r) \propto \exp(-H(z, r)) = \exp(-E(z) - K(r)) \propto p(z)\, p(r)$$
so the state z and the momentum r are independently distributed.
• We can therefore use Hamiltonian dynamics to sample from the joint canonical distribution over (z, r) and simply ignore the momentum contributions: this is the idea of introducing auxiliary variables (r) to facilitate the Markov chain over z.
• A common choice:
$$K(r) = \frac{r^T r}{2} \quad \text{and} \quad E(z) = -\log p(z)$$
Hybrid Monte Carlo
11.5 Hybrid Monte Carlo
• Combination of the Metropolis algorithm and Hamiltonian dynamics.
Algorithm to draw M samples from a target distribution:
1. Set τ = 0.
2. Generate an initial position state z^(0) ~ π^(0).
3. Repeat until τ = M:
   Set τ = τ + 1.
   - Sample a new initial momentum variable from the momentum canonical distribution, r_0 ~ p(r).
   - Set z_0 = z^(τ-1).
   - Run the leapfrog algorithm starting at [z_0, r_0] for L steps with step size δ to obtain the proposed states z* and r*.
   - Calculate the Metropolis acceptance probability
$$\alpha = \min(1, \exp\{H(z_0, r_0) - H(z^*, r^*)\})$$
   - Draw a random number u uniformly from [0, 1]. If u ≤ α, accept the position and set the next state z^(τ) = z*; else set z^(τ) = z^(τ-1).
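A compact HMC sketch following the algorithm above (the bivariate Gaussian target, δ = 0.1, and L = 20 are illustrative assumptions; K(r) = rᵀr/2 as on the previous slide):

```python
import numpy as np

rng = np.random.default_rng(7)

# Illustrative target: bivariate Gaussian; E(z) = -log p(z) up to a constant
PREC = np.linalg.inv(np.array([[1.0, 0.8], [0.8, 1.0]]))

def E(z):
    return 0.5 * z @ PREC @ z

def grad_E(z):
    return PREC @ z

def hmc(n_samples, delta=0.1, L=20):
    z = np.zeros(2)
    chain = []
    for _ in range(n_samples):
        r = rng.normal(size=2)               # fresh momentum r ~ N(0, I)
        z_new, r_new = z.copy(), r.copy()
        # leapfrog: L steps of size delta
        r_new -= (delta / 2) * grad_E(z_new)
        for _ in range(L - 1):
            z_new += delta * r_new
            r_new -= delta * grad_E(z_new)
        z_new += delta * r_new
        r_new -= (delta / 2) * grad_E(z_new)
        # Metropolis accept/reject on the Hamiltonian H = E(z) + r.r/2
        H0 = E(z) + 0.5 * r @ r
        H1 = E(z_new) + 0.5 * r_new @ r_new
        if rng.uniform() < min(1.0, np.exp(H0 - H1)):
            z = z_new
        chain.append(z)
    return np.array(chain)

chain = hmc(5_000)
```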
Hybrid Monte Carlo simulation
11.5 Hybrid Monte Carlo
Hamiltonian Monte Carlo for sampling a bivariate normal distribution:
$$p(z) = \mathcal{N}(\mu, \Sigma) \text{ with } \mu = [0, 0], \qquad E(z) = -\log\!\left(e^{-\frac{1}{2} z^T \Sigma^{-1} z}\right) + \text{const} = \tfrac{1}{2} z^T \Sigma^{-1} z + \text{const}$$
The MH algorithm converges much more slowly than HMC, and consecutive samples have much higher autocorrelation than samples drawn using HMC.
Img Source. https://theclevermachine.wordpress.com/2012/11/18/mcmc-hamiltonian-monte-carlo-a-k-a-hybrid-monte-carlo/
Detailed balance
11.5 Hybrid Monte Carlo
The update is a sequence of L leapfrog iterations of step size δ, taking a small region R of phase space to a region R'. Leapfrog integration preserves phase-space volume and is time-reversible; the factor 1/2 is the probability of choosing positive step size δ or negative step size -δ.
Transition probability from R to R':
$$\frac{1}{Z_H} \exp(-H(R))\, \delta V\, \frac{1}{2} \min\{1, \exp(H(R) - H(R'))\}$$
Transition probability from R' to R:
$$\frac{1}{Z_H} \exp(-H(R'))\, \delta V\, \frac{1}{2} \min\{1, \exp(H(R') - H(R))\}$$
These two expressions are equal, so detailed balance holds.
Progress…
Sampling Methods: Basic Sampling Algorithms · Markov Chain Monte Carlo · Gibbs Sampling · Slice Sampling · Hybrid Monte Carlo · Estimating the Partition Function
(Next: Estimating the Partition Function)
Estimating the Partition Function
11.6 Estimating the Partition Function
• Most sampling algorithms require the distribution only up to the partition function Z_E, which is not needed in order to draw samples from p(z):
$$p_E(z) = \frac{1}{Z_E} \exp\{-E(z)\}, \qquad Z_E = \sum_z \exp\{-E(z)\}$$
• The partition function is nevertheless useful for model comparison, because it represents the probability of the observed data:
$$p(\text{hidden} \mid \text{observed}) = \frac{p(\text{hidden}, \text{observed})}{p(\text{observed})}$$
• For model comparison, we are interested in the ratio of partition functions.
Using importance sampling
11.6 Estimating the Partition Function
• Use importance sampling from a proposal p_G with energy function G(z):
$$\frac{Z_E}{Z_G} = \frac{\sum_z \exp(-E(z))}{\sum_z \exp(-G(z))} = \frac{\sum_z \exp(-E(z) + G(z)) \exp(-G(z))}{\sum_z \exp(-G(z))}$$
$$= \mathbb{E}_{p_G}[\exp(-E(z) + G(z))] \simeq \frac{1}{L} \sum_l \exp(-E(z^{(l)}) + G(z^{(l)})) \quad (11.72)$$
where the z^(l) are sampled from p_G.
• If Z_G is easy to compute, we can then estimate Z_E.
• Problem: p_G needs to closely match p_E.
• Idea: use samples z^(l) of p_E obtained from a Markov chain to define
$$p_G(z) = \frac{1}{L} \sum_{l=1}^{L} T(z^{(l)}, z) \quad (11.73)$$
where T gives the transition probabilities of the chain. We then define G(z) = -log p_G(z) and use it in (11.72).
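A small numerical check of (11.72), assuming numpy; the Gaussian energies E and G and the exact p_G sampler are illustrative choices for which the true ratio Z_E/Z_G = 1/2 is known in closed form:

```python
import numpy as np

rng = np.random.default_rng(8)

def log_ratio_ZE_ZG(E, G, g_sampler, L=100_000):
    """log(Z_E / Z_G) = log E_{p_G}[exp(-E(z) + G(z))], cf. (11.72)."""
    z = g_sampler(L)                             # samples from the tractable p_G
    a = -E(z) + G(z)
    m = a.max()
    return m + np.log(np.mean(np.exp(a - m)))    # log-sum-exp for stability

# Check: E from N(0, 1), G from N(0, 2^2); the true ratio is Z_E/Z_G = 1/2
E = lambda z: 0.5 * z ** 2
G = lambda z: 0.5 * (z / 2.0) ** 2
print(log_ratio_ZE_ZG(E, G, lambda L: rng.normal(0.0, 2.0, L)))  # close to log 0.5
```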
Chaining
11.6 Estimating the Partition Function
• Partition-function ratio estimation requires well-matched distributions, and Z_G must be evaluated exactly (so p_G must be simple) => poor matching with a complicated target distribution.
• Idea: use a set of intermediate distributions between the simple p_1 and the complex p_M:
$$\frac{Z_M}{Z_1} = \frac{Z_2}{Z_1} \frac{Z_3}{Z_2} \cdots \frac{Z_M}{Z_{M-1}} \quad (11.74)$$
$$E_\alpha(z) = (1 - \alpha) E_1(z) + \alpha E_M(z) \quad (11.75)$$
• The intermediate distributions interpolate from E_1 to E_M.
• Use a single Markov chain, run initially for the system p_1, which after a suitable number of steps moves on to the next distribution in the sequence.
Summary
Sampling Methods
Part I: General concept of basic sampling: Basic Sampling Algorithms
Part II: MCMC world: Markov Chain Monte Carlo · Gibbs Sampling · Slice Sampling · Hybrid Monte Carlo · Estimating the Partition Function
Papers Alert
Sampling Methods
• Markov Chain Monte Carlo Method without Detailed Balance. (2010) Hidemaro Suwa and Synge Todo. http://journals.aps.org/prl/abstract/10.1103/PhysRevLett.105.120603
• Hamiltonian Annealed Importance Sampling for partition function estimation. (2012) Jascha Sohl-Dickstein, Benjamin J. Culpepper. http://arxiv.org/abs/1205.1925
• Hamiltonian Monte Carlo with Reduced Momentum Flips. (2012) Jascha Sohl-Dickstein. http://arxiv.org/abs/1205.1939
• Hamiltonian Monte Carlo Without Detailed Balance. (2014) Jascha Sohl-Dickstein. http://jmlr.org/proceedings/papers/v32/sohl-dickstein14.pdf
• A Markov Jump Process for More Efficient Hamiltonian Monte Carlo. (2015) Jascha Sohl-Dickstein. http://arxiv.org/abs/1509.03808
• Markov Chain Monte Carlo and Variational Inference: Bridging the Gap. (2015) Tim Salimans. http://jmlr.org/proceedings/papers/v37/salimans15.pdf
Observing Dark Worlds
Dark Matter Worlds Halo
Dark matter bends the light from background galaxies; in extreme cases a galaxy is seen as two arcs surrounding the halo.
https://www.kaggle.com/c/DarkWorlds
Observing Dark Worlds
Dark Matter Worlds Halo
We observe that this stuff aggregates and forms massive structures called Dark Matter Halos. There are many galaxies behind a Dark Matter halo, and their shapes correlate with its position.
https://www.kaggle.com/c/DarkWorlds
Observing Dark Worlds
Dark Matter Worlds Halo
The task is then to use this "bending of light" to estimate where in the sky this dark matter is located.
https://www.kaggle.com/c/DarkWorlds
Observing Dark Worlds
Dark Matter Worlds Halo
• The problem is really one of statistics: given the noisy data (the elliptical galaxies), recover the model and parameters (position and mass of the dark matter) that generated them.
• Step 1: construct a prior distribution p(x) for the halo positions (e.g. uniform).
• Step 2: construct a probabilistic model p(e | x) for the data (the observed ellipticities of the galaxies), sketched in code after this list:
$$p(e_i \mid x) = \mathcal{N}\Big( \sum_{j \in \text{halos}} d_{i,j}\, m_j\, f(r_{i,j}),\; \sigma^2 \Big)$$
✦ d_ij = tangential direction, i.e. the direction in which halo j bends the light of galaxy i
✦ m_j is the mass of halo j
✦ f(r_ij) is a decreasing function of the Euclidean distance r_ij between galaxy i and halo j
✦ For the large halos, assign m a log-uniform distribution on [40, 180] and take f(r_ij) = 1/max(r_ij, 240)
✦ For the small halos, fix the mass at 20 and take f(r_ij) = 1/max(r_ij, 70)
https://www.kaggle.com/c/DarkWorlds
http://timsalimans.com/observing-dark-worlds/
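A hedged sketch of this likelihood for a single large halo, assuming numpy arrays of galaxy positions and (for simplicity) precomputed tangential ellipticities; the noise level sigma = 0.05 and all names here are hypothetical illustrations, not the competition-winning code:

```python
import numpy as np

def log_likelihood(halo_xy, halo_mass, gal_xy, gal_e, sigma=0.05):
    """log p(e | x) for a single large halo, up to an additive constant.

    gal_e: tangential ellipticity of each galaxy w.r.t. the halo (a
    simplification of the competition's (e1, e2) components); sigma is an
    assumed noise level; f(r) = 1 / max(r, 240) as stated on the slide.
    """
    r = np.linalg.norm(gal_xy - halo_xy, axis=1)   # galaxy-to-halo distances
    mean = halo_mass / np.maximum(r, 240.0)        # m * f(r)
    return -0.5 * np.sum(((gal_e - mean) / sigma) ** 2)
```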
Observing Dark Worlds
Dark Matter Worlds Halo
• Step 3: get the posterior distribution for the halo positions, p(x | e) = p(e | x) p(x) / p(e) (a simple random-walk Metropolis-Hastings sampler is used to approximate the posterior distribution).
• Step 4: minimize the expected loss:
$$\tilde{x} = \arg\min_{\text{prediction}} \mathbb{E}_{p(x \mid e)} L(\text{prediction}, x)$$
http://timsalimans.com/observing-dark-worlds/
Dark Matter Worlds Halo: figure-only slides, from the MCMC NIPS2015 tutorial.
