Upcoming SlideShare
×

Thanks for flagging this SlideShare!

Oops! An error has occurred.

×
Saving this for later? Get the SlideShare app to save on your phone or tablet. Read anywhere, anytime – even offline.
Standard text messaging rates apply

# Markov Chain Monte Carlo explained

5,488

Published on

An brief overview of some of the most famous Markov Chain Monte Carlo Methods.

An brief overview of some of the most famous Markov Chain Monte Carlo Methods.

1 Like
Statistics
Notes
• Full Name
Comment goes here.

Are you sure you want to Yes No
• Be the first to comment

Views
Total Views
5,488
On Slideshare
0
From Embeds
0
Number of Embeds
1
Actions
Shares
0
136
0
Likes
1
Embeds 0
No embeds

No notes for slide

### Transcript

• 1. MarkovChainMonteCarlo theory and worked examples Dario Digiuni, A.A. 2007/2008
• 2. Markov Chain Monte Carlo • Class of sampling algorithms • High sampling efficiency • Sample from a distribution with unknown normalization constant • Often the only way to solve problems in time polynomial in the number of dimensions e.g. evaluation of a convex body volume
• 3. MCMC: applications • Statistical Mechanics Metropolis-Hastings • Optimization ▫ Simulated annealing • Bayesian Inference ▫ Metropolis-Hastings ▫ Gibbs sampling
• 4. The Monte Carlo principle • Sample a set of N independent and identically-distributed variables • Approximation of the target p.d.f. with the empirical expression … then approximation of the integrals!
• 5. Rejection Sampling 1. It needs finding M! 2. Low acceptance rate
• 6. Idea • I can use the previously sampled value to find the following one • Exploration of the configuration space by means of Markov Chains: def .: Markov process def .: Markov chain
• 7. Invariant distribution • Stability conditions: 1. Irreducibility= for every state there exists a finite probability to visit any other state 2. Aperiodicity = there are no loops. • Sufficient condition 1. Detailed balance principle MCMC algorithms are aperiodic, irreducible Markov chains having the target pdf as the invariant distribution
• 8. Example • What is the probability to find the lift at the ground floor in a three floor building? ▫ 3 states Markov chain ▫ Lift= Random Walker ▫ Transition matrix ▫ Looking for the invariant distribution … burn-in …
• 9. Example - 2 • I can apply the matrix T on the right to any of the states, e.g. homogeneous Markov chain ~ 50% is the probability to find • Google’s PageRank: the lift at the ground floor ▫ Websites are the states, T is defined by the number of hyperlinks among them and the user is the random walker:  The webpages are displayed following the invariant distribution!
• 10. Metropolis-Hastings • Given the target distribution equivalent to T 1. Choose a value for 2. Sample from a proposal distribution 3. Accept the new value with probability 4. Return to 1 Ratio independent Equal in Metropolis algorithm of the normalization!
• 11. M.-H. – Pros and Cons • Very general sampling method: ▫ I can sample from a unnormalized distribution ▫ It does not require to provide upper bound for the function • Good working depends on the choice of the proposal distribution ▫ well-mixing condition
• 12. M.-H. - Example • In Statistical Mechanics it is important to evalue the partition function, e.g. Ising model Sum every possible spin state: In a 10 x 10 x 10 spin cube, I would have to sum over MCMC APPROACH: 1. Evaluate the system’s energy Possible states = UNFEASIBLE 2. Pick up a spin at random and flip it: 1. If energy decreases, this is the new spin configuration 2. If energy increases, this is the new spin configuration with probability
• 13. Simulated Annealing • It allows one to find the global maximum of a generic pdf ▫ No comparison between the value of local minima required ▫ Application to the maximum-likelihood method • It is a non-homogeneous Markov chain whose invariant distribution keeps changing as follows:
• 14. Simulated Annealing: example • Let us apply the algorithm to a simple, 1-dimensional case • The optimal cooling scheme is
• 15. Simulated Annealing: Pros and Cons • The global maximum is univocally determined ▫ Even if walker starts next to a local (non global!) maximum, it converges to the true global maximum • It requires a good tuning of the parameters
• 16. Gibbs Sampler • Optimal method to marginalize multidimensional distributions • Let us assume we have a n-dimensional vector and that we know all the conditional probability expression for the pdf • We take the following proposal distribution:
• 17. Gibbs Sampler - 2 • Then: very efficient method!
• 18. Gibbs Sampler – practically
• 19. Gibbs Sampler – practically 1. §Initialize fix n-1 coordinates and sample from the resulting pdf 2. for (i=0 ; i < N; i++) • Sample • Sample • Sample • Sample
• 20. Gibbs Sampler – example • Let us pretend we cannot determine the normalization constant… … but we can make a comparison with the true marginalized pdf…
• 21. Gibbs Sampler – results • Comparison between Gibbs Sampling and the true M.-H. sampling from the marginalized pdf • Good c2 agreement
• 22. A complex MCMC application A radioactive source decays with frequency l1 and a detector records only every k1 –th event, then at the moment tc the decay rate changes to l2 and only one event out ofk2 is recorded. Apparently l1 , k1 , tc , l2 and k2 are undetermined. We wish to find them.
• 23. Preparation • The waiting time for the k-th event in a Poissonian process with frequency l is distributed according to: • I can sample a big amount of events from this pdf, changing the parameters l1 e k1 to l2 e k2 at time tc • I evaluate the likelihood:
• 24. Idea • I assume log-likelihood to be the invariant distribution! ▫ which are the Markov chain states? struct State { Parameter double lambda1, lambda2; space double tc; int k1, k2; Corresponding log- double plog; likelihood value State(double la1, double la2, double t, int kk1, int kk2) : lambda1(la1), lambda2(la2), tc(t), k1(kk1), k2(kk2) {} State() {}; };
• 25. Practically • I have to find an appropriate proposal distribution to move among the states ▫ Attention: varying li and ki I have toi prevent the acceptance rate to be too low… but also too high! • The a ratio is evaluated as the ratio between the final-state and initial-state likelihood values. • Try to guess the values for li , ki and tc • Let the chain evolve for a burn-in time and then record the results.
• 26. Results • Even if the inital guess is quite far from the real value, the random walker converges. guess: l1=5 l2 = 5 k1 = 3 k2 = 2 real: l1=1 l2 = 2 k1 = 1, k2 = 1
• 27. Results- 2 • Estimate of the uncertainty l2 l1
• 28. Results- 3 • All the parameters can be detemined quickly guess: tc=150 real: tc=300
• 29. References • C. Andrieu, N. De Freitas, A. Doucet e M.I. Jordan, Machine Learning 50 (2003), 5-43. • G. Casella e E.I. George, The American Statistician 46, 3 (1992), 167-174. • W.H. Press, S. A. Teukolsky, W.T. Vetterling e B.P. Flannery, Numerical Recipes , Third Edition, Cambridge University Press, 2007. • M. Loreti, Teoria degli errori e fondamenti di statistica, Decibel, Zanichelli (1998). • B. Walsh, Markov Chain Monte Carlo and Gibbs Sampling, Lecture Notes for EEB 581