Markov Chain Monte Carlo explained

Markov Chain Monte Carlo
Theory and worked examples

Dario Digiuni,
Academic year 2007/2008

Markov Chain Monte Carlo
• Class of sampling algorithms

• High sampling efficiency

• Sample from a distribution with unknown normalization constant

• Often the only way to solve problems in time polynomial in the
number of dimensions,
e.g. evaluating the volume of a convex body

MCMC: applications
• Statistical Mechanics
▫ Metropolis-Hastings

• Optimization
▫ Simulated annealing

• Bayesian Inference
▫ Metropolis-Hastings
▫ Gibbs sampling

The Monte Carlo principle
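• Sample a set of N independent and identically distributed variables $x^{(i)} \sim p(x)$, $i = 1, \dots, N$

• Approximate the target p.d.f. with the empirical expression $p_N(x) = \frac{1}{N} \sum_{i=1}^{N} \delta(x - x^{(i)})$

… then approximate the integrals: $\int f(x)\, p(x)\, dx \approx \frac{1}{N} \sum_{i=1}^{N} f(x^{(i)})$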
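A minimal sketch of the Monte Carlo principle, assuming a standard normal target that can be sampled directly. The function name `monte_carlo_mean` and the fixed seed are illustrative choices, not part of the original slides.

```cpp
#include <cstddef>
#include <random>

// Monte Carlo estimate of E[f(x)] for x ~ N(0, 1):
// I_N(f) = (1/N) * sum_i f(x_i), with the x_i drawn i.i.d. from the target.
// The N(0, 1) target is an assumption made for this example.
template <typename F>
double monte_carlo_mean(F f, std::size_t n, unsigned seed = 42) {
    std::mt19937 rng(seed);
    std::normal_distribution<double> target(0.0, 1.0);
    double sum = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        sum += f(target(rng));
    }
    return sum / static_cast<double>(n);
}
```

For instance, `monte_carlo_mean([](double x) { return x * x; }, 200000)` should come out close to 1, the variance of the standard normal.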

Rejection Sampling
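1. It requires finding a constant M such that $p(x) \le M q(x)$ for every x, where q is the proposal distribution
2. Low acceptance rate when M is large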
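Both drawbacks can be seen in a small sketch. The target density, the uniform proposal and all names below are hypothetical; the `envelope` argument plays the role of M q(x) and has to be supplied by the caller.

```cpp
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Unnormalized target density (hypothetical example), bounded above by 2.
double target_unnormalized(double x) {
    const double s = std::sin(3.0 * x);
    return std::exp(-0.5 * x * x) * (1.0 + s * s);
}

struct RejectionResult {
    std::vector<double> samples;
    double acceptance_rate;
};

// Rejection sampling with a uniform proposal on [lo, hi].
// envelope must satisfy envelope >= target_unnormalized(x) on [lo, hi];
// it corresponds to M * q(x) and must be known in advance.
RejectionResult rejection_sample(std::size_t n, double lo, double hi,
                                 double envelope, unsigned seed = 1) {
    std::mt19937 rng(seed);
    std::uniform_real_distribution<double> proposal(lo, hi);
    std::uniform_real_distribution<double> unit(0.0, 1.0);
    RejectionResult out;
    out.acceptance_rate = 0.0;
    std::size_t trials = 0;
    while (out.samples.size() < n) {
        ++trials;
        const double x = proposal(rng);
        // Accept x with probability target(x) / envelope.
        if (unit(rng) * envelope < target_unnormalized(x)) {
            out.samples.push_back(x);
        }
    }
    if (trials > 0) {
        out.acceptance_rate =
            static_cast<double>(n) / static_cast<double>(trials);
    }
    return out;
}
```

With `rejection_sample(1000, -5.0, 5.0, 2.0)` most proposals are thrown away, which illustrates the low acceptance rate even in one dimension.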

Idea
• Use the previously sampled value to generate the next one

• Exploration of the configuration space by means of Markov chains:

Def.: Markov process

Def.: Markov chain

Invariant distribution
• Stability conditions:

1. Irreducibility = from every state there is a nonzero probability of
reaching any other state in a finite number of steps
2. Aperiodicity = the chain does not get trapped in cycles.

• Sufficient condition
1. Detailed balance principle: $p(x)\, T(x' \mid x) = p(x')\, T(x \mid x')$

MCMC algorithms are aperiodic, irreducible Markov chains having
the target pdf as the invariant distribution

Example
• What is the probability of finding the lift at the ground floor in a
three-floor building?

▫ 3 states Markov chain

▫ Lift= Random Walker

▫ Transition matrix

▫ Looking for the invariant distribution
… burn-in …

Example - 2
• I can apply the matrix T on the right to any of the states
(homogeneous Markov chain), e.g.

~50% is the probability of finding the lift at the ground floor

• Google’s PageRank:

▫ Websites are the states, T is defined by the number of hyperlinks among
them, and the user is the random walker:

→ The webpages are displayed according to the invariant distribution!

Metropolis-Hastings
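• Given the target distribution $p(x)$

1. Choose an initial value $x^{(0)}$

2. Sample $x^*$ from a proposal distribution $q(x^* \mid x^{(i)})$ (equivalent to T)

3. Accept the new value with probability
$A = \min\left\{1, \frac{p(x^*)\, q(x^{(i)} \mid x^*)}{p(x^{(i)})\, q(x^* \mid x^{(i)})}\right\}$

4. Return to 2

The ratio $p(x^*) / p(x^{(i)})$ is independent of the normalization!
The proposal terms are equal in the Metropolis algorithm (symmetric proposal) and cancel.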
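A minimal sketch of the steps above for the Metropolis special case, assuming a symmetric Gaussian random-walk proposal. The target `log_target` is a hypothetical unnormalized Gaussian centred at 1; the factor 3 is there only to show that the normalization never enters.

```cpp
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Log of an unnormalized target (hypothetical): 3 * exp(-(x - 1)^2 / 2).
double log_target(double x) {
    return std::log(3.0) - 0.5 * (x - 1.0) * (x - 1.0);
}

// Random-walk Metropolis: the proposal x* = x + N(0, step_sd^2) is
// symmetric, so the acceptance probability reduces to min(1, p(x*)/p(x)).
std::vector<double> metropolis(std::size_t n, std::size_t burn_in, double x0,
                               double step_sd, unsigned seed = 7) {
    std::mt19937 rng(seed);
    std::normal_distribution<double> step(0.0, step_sd);
    std::uniform_real_distribution<double> unit(0.0, 1.0);
    std::vector<double> chain;
    chain.reserve(n);
    double x = x0;
    double lp = log_target(x);
    for (std::size_t i = 0; i < n + burn_in; ++i) {
        const double x_new = x + step(rng);
        const double lp_new = log_target(x_new);
        // Accept when u < p(x*)/p(x), compared in log space.
        if (std::log(unit(rng)) < lp_new - lp) {
            x = x_new;
            lp = lp_new;
        }
        // A rejected move repeats the current value in the chain.
        if (i >= burn_in) {
            chain.push_back(x);
        }
    }
    return chain;
}
```

Calling `metropolis(50000, 1000, 0.0, 1.0)` should give samples with mean and variance both close to 1 for this target.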

M.-H. – Pros and Cons
• Very general sampling method:

▫ I can sample from an unnormalized distribution

▫ It does not require an upper bound for the function

• Performance depends on the choice of the proposal distribution

▫ well-mixing condition

M.-H. - Example
• In Statistical Mechanics it is important to evaluate the partition
function,

e.g. Ising model
Sum over every possible spin state:
in a 10 x 10 x 10 spin cube, I would have to sum over
$2^{1000}$ possible states = UNFEASIBLE

MCMC APPROACH:

1. Evaluate the system’s energy

2. Pick a spin at random and flip it:

1. If the energy decreases, this is the new spin configuration

2. If the energy increases, this is the new spin configuration with
probability $e^{-\Delta E / k_B T}$
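The spin-flip procedure can be sketched as follows. This is a simplified illustration, not the code behind the slides: it assumes a 2D square lattice with periodic boundaries (instead of the 10 x 10 x 10 cube), coupling J = 1 and units where k_B = 1. The names `Ising2D` and `metropolis_flip` are made up for the example.

```cpp
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// 2D Ising lattice with periodic boundaries, J = 1, k_B = 1 (assumptions).
struct Ising2D {
    int L;
    std::vector<int> spin;  // +1 or -1, stored row by row

    explicit Ising2D(int size)
        : L(size), spin(static_cast<std::size_t>(size) * size, 1) {}

    int at(int i, int j) const {
        return spin[((i + L) % L) * L + (j + L) % L];
    }

    // Total energy E = -sum over nearest-neighbour bonds of s_i * s_j.
    double energy() const {
        double e = 0.0;
        for (int i = 0; i < L; ++i) {
            for (int j = 0; j < L; ++j) {
                e -= at(i, j) * (at(i + 1, j) + at(i, j + 1));
            }
        }
        return e;
    }

    // Energy change that flipping spin (i, j) would cause.
    double delta_energy(int i, int j) const {
        const int neighbours =
            at(i + 1, j) + at(i - 1, j) + at(i, j + 1) + at(i, j - 1);
        return 2.0 * at(i, j) * neighbours;
    }
};

// One Metropolis step: pick a spin at random and flip it if the energy
// decreases, or with probability exp(-dE / T) if it increases.
// Returns the energy change actually applied (0 if the flip was rejected).
double metropolis_flip(Ising2D& s, double temperature, std::mt19937& rng) {
    std::uniform_int_distribution<int> site(0, s.L - 1);
    std::uniform_real_distribution<double> unit(0.0, 1.0);
    const int i = site(rng);
    const int j = site(rng);
    const double dE = s.delta_energy(i, j);
    if (dE <= 0.0 || unit(rng) < std::exp(-dE / temperature)) {
        s.spin[i * s.L + j] *= -1;
        return dE;
    }
    return 0.0;
}
```

Only the local energy change is needed at each step, so the sum over all spin states is never evaluated.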

Simulated Annealing
• It allows one to find the global maximum of a generic pdf

▫ No comparison between the values of local maxima required
▫ Application to the maximum-likelihood method

• It is a non-homogeneous Markov chain whose invariant distribution
keeps changing as follows: $p_i(x) \propto p(x)^{1/T_i}$, with the temperature $T_i$ decreasing towards 0

Simulated Annealing: example
• Let us apply the algorithm to a simple, 1-dimensional case

• The optimal cooling scheme is

Simulated Annealing: Pros and Cons
• The global maximum is uniquely determined
▫ Even if the walker starts next to a local (non-global!) maximum, it converges to the
true global maximum

• It requires a good tuning of the parameters
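A minimal sketch of simulated annealing in one dimension, under several assumptions: the bimodal function `log_p` is hypothetical (local maximum near x = -3, global maximum near x = 3), and the geometric cooling T -> cooling * T is a common practical choice, not the optimal cooling scheme mentioned in the slides. The sketch also remembers the best point visited, as a safeguard in case the walker freezes in the wrong mode.

```cpp
#include <cmath>
#include <cstddef>
#include <random>

// Log of a hypothetical bimodal function: a bump of height 1 at x = -3
// and a bump of height 2 at x = +3 (the global maximum).
double log_p(double x) {
    return std::log(std::exp(-0.5 * (x + 3.0) * (x + 3.0)) +
                    2.0 * std::exp(-0.5 * (x - 3.0) * (x - 3.0)));
}

struct AnnealResult {
    double final_x;  // where the walker ends up
    double best_x;   // best point visited along the way
};

// Metropolis moves whose target at temperature T is p(x)^(1/T);
// T is multiplied by `cooling` (assumed in (0, 1)) after every step.
AnnealResult simulated_annealing(double x0, double t0, double cooling,
                                 std::size_t steps, unsigned seed = 3) {
    std::mt19937 rng(seed);
    std::normal_distribution<double> step(0.0, 1.0);
    std::uniform_real_distribution<double> unit(0.0, 1.0);
    double x = x0;
    double lp = log_p(x);
    double best_x = x;
    double best_lp = lp;
    double temperature = t0;
    for (std::size_t i = 0; i < steps; ++i) {
        const double x_new = x + step(rng);
        const double lp_new = log_p(x_new);
        // Accept when u < (p(x*) / p(x))^(1/T), compared in log space.
        if (std::log(unit(rng)) * temperature < lp_new - lp) {
            x = x_new;
            lp = lp_new;
        }
        if (lp > best_lp) {
            best_lp = lp;
            best_x = x;
        }
        temperature *= cooling;
    }
    return AnnealResult{x, best_x};
}
```

A call such as `simulated_annealing(-3.0, 10.0, 0.999, 10000)` starts at the local maximum; the starting temperature and cooling factor are exactly the parameters that need tuning.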

Gibbs Sampler
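• Optimal method to marginalize multidimensional distributions

• Let us assume we have an n-dimensional vector $x = (x_1, \dots, x_n)$ and that we know all
the conditional probabilities $p(x_j \mid x_1, \dots, x_{j-1}, x_{j+1}, \dots, x_n)$ of the pdf

• We take the following proposal distribution: the new $x_j$ is drawn from its conditional
probability while all the other coordinates are left unchanged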

Gibbs Sampler - 2
• Then: the acceptance probability is always 1,
a very efficient method!

Gibbs Sampler – practically
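1. Initialize $x^{(0)}$

2. for (i=0 ; i < N; i++)

• Sample $x_1^{(i+1)} \sim p(x_1 \mid x_2^{(i)}, x_3^{(i)}, \dots, x_n^{(i)})$

• Sample $x_2^{(i+1)} \sim p(x_2 \mid x_1^{(i+1)}, x_3^{(i)}, \dots, x_n^{(i)})$

• Sample $x_j^{(i+1)} \sim p(x_j \mid x_1^{(i+1)}, \dots, x_{j-1}^{(i+1)}, x_{j+1}^{(i)}, \dots, x_n^{(i)})$

• Sample $x_n^{(i+1)} \sim p(x_n \mid x_1^{(i+1)}, \dots, x_{n-1}^{(i+1)})$

At each step: fix n-1 coordinates and sample from the resulting pdf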
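A minimal sketch of the loop above for n = 2, assuming a bivariate normal target with zero means, unit variances and correlation rho. This target is chosen only because its conditionals are known in closed form: x given y is normal with mean rho * y and variance 1 - rho^2, and symmetrically for y. It is not the example used in the slides.

```cpp
#include <cmath>
#include <cstddef>
#include <random>
#include <utility>
#include <vector>

// Gibbs sampler for a bivariate normal with correlation rho (|rho| < 1).
// Each update fixes one coordinate and samples the other from its
// conditional pdf; every draw is accepted.
std::vector<std::pair<double, double>> gibbs_bivariate_normal(
    std::size_t n, std::size_t burn_in, double rho, unsigned seed = 11) {
    std::mt19937 rng(seed);
    std::normal_distribution<double> std_normal(0.0, 1.0);
    const double cond_sd = std::sqrt(1.0 - rho * rho);
    std::vector<std::pair<double, double>> samples;
    samples.reserve(n);
    double x = 0.0;
    double y = 0.0;
    for (std::size_t i = 0; i < n + burn_in; ++i) {
        x = rho * y + cond_sd * std_normal(rng);  // sample x given y
        y = rho * x + cond_sd * std_normal(rng);  // sample y given new x
        if (i >= burn_in) {
            samples.emplace_back(x, y);
        }
    }
    return samples;
}
```

Keeping only the first component of each pair gives samples from the marginal distribution of x, without ever integrating over y.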

Gibbs Sampler – example

• Let us pretend we cannot determine the normalization
constant…

… but we can make a comparison with the true marginalized
pdf…

Gibbs Sampler – results
• Comparison between Gibbs
Sampling and the true M.-H.
sampling from the marginalized pdf

• Good χ² agreement

A complex MCMC application
A radioactive source decays with frequency λ1 and a detector records
only every k1-th event; then, at time tc, the decay rate
changes to λ2 and only one event out of k2 is recorded.

λ1, k1, tc, λ2 and k2 are all unknown.

We wish to find them.

Preparation
• The waiting time τ for the k-th event in a Poissonian process with
frequency λ is distributed according to:
$p(\tau \mid k, \lambda) = \frac{\lambda^k}{(k-1)!}\, \tau^{k-1} e^{-\lambda \tau}$

• I can sample a large number of events from this pdf, changing the
parameters λ1 and k1 to λ2 and k2 at time tc

• I evaluate the likelihood:
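A sketch of how the data generation and the log-likelihood might look. It is an illustration under stated assumptions, not the code behind the slides: the function names are made up, and each waiting interval is scored with the parameters in force at the start of that interval.

```cpp
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Log of the waiting-time pdf for the k-th event of a Poisson process:
// p(tau | k, lambda) = lambda^k tau^(k-1) exp(-lambda tau) / (k-1)!
double log_waiting_time_pdf(double tau, int k, double lambda) {
    return k * std::log(lambda) + (k - 1) * std::log(tau) - lambda * tau -
           std::lgamma(static_cast<double>(k));  // lgamma(k) = log((k-1)!)
}

// Simulate the recorded event times up to t_end. Waiting times follow a
// gamma distribution with shape k and scale 1/lambda; the parameters
// switch from (lambda1, k1) to (lambda2, k2) once the clock passes tc.
std::vector<double> simulate_events(double lambda1, int k1, double lambda2,
                                    int k2, double tc, double t_end,
                                    unsigned seed = 5) {
    std::mt19937 rng(seed);
    std::gamma_distribution<double> wait1(k1, 1.0 / lambda1);
    std::gamma_distribution<double> wait2(k2, 1.0 / lambda2);
    std::vector<double> times;
    double t = 0.0;
    while (true) {
        t += (t < tc) ? wait1(rng) : wait2(rng);
        if (t >= t_end) {
            break;
        }
        times.push_back(t);
    }
    return times;
}

// Log-likelihood of the recorded times for a given set of parameters:
// the sum of the log waiting-time pdfs of consecutive intervals.
double log_likelihood(const std::vector<double>& times, double lambda1,
                      int k1, double lambda2, int k2, double tc) {
    double sum = 0.0;
    double prev = 0.0;
    for (double t : times) {
        const bool before = prev < tc;
        sum += log_waiting_time_pdf(t - prev, before ? k1 : k2,
                                    before ? lambda1 : lambda2);
        prev = t;
    }
    return sum;
}
```

Evaluating `log_likelihood` on simulated data should give a larger value at the parameters used for the simulation than at a distant guess.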

Idea
• I take the likelihood (stored as its logarithm) to be the invariant distribution!
▫ What are the Markov chain states?

struct State {
    // Parameter space
    double lambda1, lambda2;
    double tc;
    int k1, k2;
    // Corresponding log-likelihood value
    double plog;

    State(double la1, double la2, double t, int kk1, int kk2) :
        lambda1(la1), lambda2(la2), tc(t), k1(kk1), k2(kk2) {}

    State() {}
};

Practically
• I have to find an appropriate proposal distribution to move among
the states
▫ Attention: when varying λi and ki, I have to prevent the acceptance rate from being
too low… but also too high!

• The α ratio is evaluated as the ratio between the final-state and
initial-state likelihood values.

• Try to guess the values for λi, ki and tc

• Let the chain evolve for a burn-in time and then record the results.
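Since the states store the log-likelihood, the acceptance test can be sketched directly in terms of the difference of the two `plog` values. This assumes a symmetric proposal, so that the ratio of likelihoods is the whole acceptance ratio; the function names are hypothetical.

```cpp
#include <algorithm>
#include <cmath>

// Acceptance probability alpha = min(1, L_new / L_old), computed from
// the log-likelihoods so that very small likelihoods do not underflow.
// Assumes a symmetric proposal (the proposal terms cancel).
double acceptance_probability(double plog_old, double plog_new) {
    return std::min(1.0, std::exp(plog_new - plog_old));
}

// Decide whether to move to the proposed state, given a uniform random
// number u in [0, 1).
bool accept_move(double plog_old, double plog_new, double u) {
    return u < acceptance_probability(plog_old, plog_new);
}
```

A move to a more likely state is always accepted, while a move to a state four times less likely is accepted about a quarter of the time.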

Results
• Even if the initial guess is quite far from the real
value, the random walker converges.
guess: λ1 = 5, λ2 = 5, k1 = 3, k2 = 2

real: λ1 = 1, λ2 = 2, k1 = 1, k2 = 1

Results - 2
• Estimate of the uncertainty

[Figure: estimate of the uncertainty, with axes λ1 and λ2]

Results - 3
• All the parameters can be determined quickly
guess: tc = 150, real: tc = 300

References
• C. Andrieu, N. De Freitas, A. Doucet and M.I. Jordan, Machine Learning 50
(2003), 5-43.

• G. Casella and E.I. George, The American Statistician 46, 3 (1992), 167-174.

• W.H. Press, S.A. Teukolsky, W.T. Vetterling and B.P. Flannery, Numerical
Recipes, Third Edition, Cambridge University Press, 2007.

• M. Loreti, Teoria degli errori e fondamenti di statistica, Decibel, Zanichelli
(1998).

• B. Walsh, Markov Chain Monte Carlo and Gibbs Sampling, Lecture Notes
for EEB 581
