Monte Carlo Simulations, Sampling and Markov Chain Monte Carlo

This is the invited talk given at the Basque Center for Applied Mathematics (BCAM) in Spain in 2010.

  1. Monte Carlo Simulations, Sampling and Markov Chain Monte Carlo
     Xin-She Yang, © 2010
     Outline:
     Monte Carlo: Estimating π; Buffon's problem; Probability
     Monte Carlo: Monte Carlo integration; Quality of sampling; Quasi-Monte Carlo
     Pseudorandom: Pseudorandom number generation; Other distributions; Limitations; Multivariate distributions
     Markov Chains: Markov chains; Markov chains; A famous Markov chain
  2. Estimating π
     How to estimate π using only a ruler and some matchsticks?
  3. Buffon's Needle Problem
     Buffon's needle problem (1733). Probability of a needle crossing a line:
     $$p = \frac{2}{\pi} \cdot \frac{L}{d},$$
     where $L$ = length of the needles and $d$ = spacing between the parallel lines (the formula holds for $L \le d$).
  4. Probability of Crossing a Line
     Since $p \approx n/N \approx 2L/(\pi d)$, we have
     $$\pi \approx \frac{2N}{n} \cdot \frac{L}{d}.$$
     Lazzarini (1901): $L = 5d/6$, $N = 3408$, $n = 1808$, so
     $$\pi \approx \frac{2 \times 3408}{1808} \cdot \frac{5}{6} \approx 3.14159290.$$
     Too accurate?! Is this right? What happens when $n = 1809$?
     Errors $\sim 1/\sqrt{N} \sim 2\%$.
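The Lazzarini arithmetic, and the effect of a single extra crossing, can be checked with a short script. This is a minimal sketch rather than anything from the talk: the function names `buffon_pi` and `lazzarini` are hypothetical, and the simulation samples the needle angle using `math.pi`, which is circular but keeps the example short.

```python
import math
import random

def buffon_pi(num_throws, needle_len=5 / 6, spacing=1.0, seed=0):
    """Estimate pi by simulating Buffon's needle (assumes needle_len <= spacing)."""
    rng = random.Random(seed)
    crossings = 0
    for _ in range(num_throws):
        # Distance from the needle centre to the nearest line, and the needle angle.
        centre = rng.uniform(0.0, spacing / 2)
        angle = rng.uniform(0.0, math.pi / 2)
        if centre <= (needle_len / 2) * math.sin(angle):
            crossings += 1
    # pi ~ (2N / n) * (L / d)
    return 2 * num_throws * needle_len / (crossings * spacing)

def lazzarini(n, N=3408, ratio=5 / 6):
    """pi estimate from n crossings in N throws with L/d = ratio."""
    return 2 * N / n * ratio

print(lazzarini(1808))     # Lazzarini's reported count
print(lazzarini(1809))     # one more crossing
print(buffon_pi(100_000))  # simulated estimate
```

With 1808 crossings the formula reduces exactly to 355/113, while a single additional crossing already changes the third decimal place, which is the point behind the "Too accurate?!" question.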
  5. Monte Carlo Methods
     Everyone has used Monte Carlo methods in some way ...
     Measure temperatures, choose a product, ...
     Taste soup, wine ...
  6. Monte Carlo Integration
     $$I = \int_\Omega f \, dv = V \, \frac{1}{N} \sum_{i=1}^{N} f_i + O(\epsilon),$$
     $$\epsilon \sim \sqrt{\frac{\frac{1}{N} \sum_{i=1}^{N} f_i^2 - \mu^2}{N}} \sim O(1/\sqrt{N}),$$
     where $V$ is the volume of $\Omega$, $f_i$ are the values of $f$ at the $N$ sample points, and $\mu$ is their sample mean.
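The estimator and its error term can be illustrated in one dimension. The sketch below is an assumption-laden example, not the speaker's code: `mc_integrate` is a hypothetical helper, the integrand $\sin x$ on $[0, \pi]$ (exact value 2) is chosen only for illustration, and the reported error is the slide's $\epsilon$ scaled by the volume $V$.

```python
import math
import random

def mc_integrate(f, a, b, n, seed=0):
    """Monte Carlo estimate of the integral of f over [a, b] with an error estimate."""
    rng = random.Random(seed)
    values = [f(rng.uniform(a, b)) for _ in range(n)]
    volume = b - a                      # V, the "volume" of the 1D domain
    mean = sum(values) / n              # mu
    mean_sq = sum(v * v for v in values) / n
    variance = max(mean_sq - mean * mean, 0.0)
    estimate = volume * mean            # V * (1/N) * sum f_i
    error = volume * math.sqrt(variance / n)  # V * epsilon, shrinks like 1/sqrt(N)
    return estimate, error

estimate, error = mc_integrate(math.sin, 0.0, math.pi, 100_000)
print(estimate, "+/-", error)
```

Quadrupling `n` should roughly halve the reported error, consistent with the $O(1/\sqrt{N})$ scaling.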
  7. Importance and Quality of the Samples
     Higher dimensions are even more challenging!
     $$I = \int \cdots \int f(u, v, \ldots, w) \, du \, dv \cdots dw.$$
     Errors $\sim 1/\sqrt{N}$.
     Higher dimensional integrals: how to distribute these sampling points?
     Regular grids: $E \sim O(N^{-2/d})$ in $d \ge 4$ dimensions (not enough!).
     Strategies: importance sampling, Latin hypercube, ...
     Any other ways?
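Of the strategies named above, Latin hypercube sampling is perhaps the simplest to sketch. The version below is a basic illustration under stated assumptions: `latin_hypercube` is a hypothetical name, points live in the unit cube, and each coordinate axis is split into `n` equal strata with exactly one point per stratum.

```python
import random

def latin_hypercube(n, dim, seed=0):
    """Return n points in [0, 1)^dim with one point per stratum along every axis."""
    rng = random.Random(seed)
    columns = []
    for _ in range(dim):
        # One random position inside each stratum [k/n, (k+1)/n), then shuffle
        # so that strata are paired at random across dimensions.
        strata = [(k + rng.random()) / n for k in range(n)]
        rng.shuffle(strata)
        columns.append(strata)
    return [tuple(col[i] for col in columns) for i in range(n)]

points = latin_hypercube(8, 3)
for p in points:
    print(p)
```

Projecting the points onto any single axis gives one sample in each of the 8 intervals, which plain random sampling does not guarantee.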
  8. Quasi-Monte Carlo Methods
     In essence, the idea is to distribute (consecutive) sampling points as far away from each other as possible, using quasi-random or low-discrepancy numbers (not pseudo-random) ... Halton, Sobol, van der Corput ...
     For example, the van der Corput sequence expresses an integer $n$ in a prime base $b$:
     $$n = \sum_{j=0}^{m} a_j(n) \, b^j, \qquad a_j \in \{0, 1, 2, \ldots, b-1\}.$$
     Then the digits are reversed or reflected about the radix point:
     $$\phi_b(n) = \sum_{j=0}^{m} a_j(n) \, \frac{1}{b^{j+1}}.$$
     For example, with $b = 2$: $0, 1, 2, \ldots, 15 \Longrightarrow 0, \frac{1}{2}, \frac{1}{4}, \frac{3}{4}, \frac{1}{8}, \ldots, \frac{15}{16}$.
     Errors $\sim O(1/N)$.
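The digit-reversal construction is short enough to write out directly. This is a minimal sketch; the function name `van_der_corput` is an assumption, and exact fractions are used so the output can be compared with the base-2 example on the slide.

```python
from fractions import Fraction

def van_der_corput(n, base=2):
    """Radical inverse phi_b(n): reflect the base-b digits of n about the radix point."""
    value = Fraction(0)
    denom = base
    while n > 0:
        n, digit = divmod(n, base)      # digit is a_j(n), lowest order first
        value += Fraction(digit, denom)  # a_j(n) / b^(j+1)
        denom *= base
    return value

sequence = [van_der_corput(n) for n in range(16)]
print(", ".join(str(x) for x in sequence))
```

Each new point falls in the largest remaining gap, which is what distinguishes a low-discrepancy sequence from pseudorandom draws.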
  9. Pseudorandom numbers: by deterministic sequences
     Uniform distributions:
     $$d_i = (a \, d_{i-1} + c) \bmod m.$$
     Classic IBM generator: $a = 65539$, $c = 0$, $m = 2^{31}$ (strong correlation!). In fact, the correlation coefficient is 1!
     Better choice (old Matlab): $a = 7^5 = 16807$, $c = 0$, $m = 2^{31} - 1 = 2{,}147{,}483{,}647$.
     If scaled by $m$, all numbers are in $[1/m, (m-1)/m]$.
     New Matlab: $[\epsilon, 1 - \epsilon]$, $\epsilon = 2^{-53} \approx 1.1 \times 10^{-16}$.
     IEEE: a 64-bit system = 53 bits for a signed fraction in base 2 and 11 bits for a signed exponent.
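Both parameter sets can be tried with a few lines of code. The sketch below uses hypothetical helper names (`lcg`, `take`) and a seed of 1 chosen for illustration. The linear relation checked at the end is a well-known property of the $a = 65539$, $m = 2^{31}$ generator (since $a = 2^{16} + 3$ gives $a^2 \equiv 6a - 9 \pmod{2^{31}}$); it is offered here as one concrete way to see the strong correlation, not as the slide's own argument.

```python
def lcg(seed, a, c, m):
    """Yield the sequence d_i = (a * d_{i-1} + c) mod m."""
    d = seed
    while True:
        d = (a * d + c) % m
        yield d

def take(gen, count):
    """Collect the next `count` values from a generator."""
    return [next(gen) for _ in range(count)]

randu = take(lcg(1, 65539, 0, 2**31), 1000)       # classic IBM parameters
minstd = take(lcg(1, 16807, 0, 2**31 - 1), 1000)  # a = 7^5, m = 2^31 - 1

# Every value of the IBM generator is an exact linear combination of the
# previous two: d_{k+2} = (6 d_{k+1} - 9 d_k) mod 2^31.
randu_is_linear = all(
    randu[k + 2] == (6 * randu[k + 1] - 9 * randu[k]) % 2**31
    for k in range(len(randu) - 2)
)
print(randu_is_linear)
print([x / (2**31 - 1) for x in minstd[:3]])      # scaled into (0, 1)
```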
  10. Other Distributions
      Inverse transform method, rejection method, Mersenne twister, ..., Markov chain Monte Carlo.
      Standard normal distribution:
      $$p(u) = \frac{1}{\sqrt{2\pi}} e^{-u^2/2},$$
      CDF:
      $$\Phi(v) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{v} e^{-u^2/2} \, du = \frac{1}{2} \left[ 1 + \mathrm{erf}\!\left( \frac{v}{\sqrt{2}} \right) \right],$$
      $$v = \Phi^{-1}(u) = \sqrt{2} \, \mathrm{erf}^{-1}(2u - 1).$$
      [Figure: two plots, one with horizontal axis from 0 to 1 and one with horizontal axis from −6 to 6.]
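The inverse transform method maps uniform draws through $\Phi^{-1}$. A minimal sketch follows, assuming Python's standard-library `statistics.NormalDist.inv_cdf` as the implementation of $\Phi^{-1}$ (the talk does not say how the inverse was computed); the function name `normal_by_inverse_transform` is hypothetical.

```python
import random
from statistics import NormalDist, fmean, pstdev

def normal_by_inverse_transform(n, seed=0):
    """Draw n standard normal samples as v = Phi^{-1}(u) with u ~ Unif(0, 1)."""
    rng = random.Random(seed)
    std_normal = NormalDist(0.0, 1.0)
    samples = []
    while len(samples) < n:
        u = rng.random()
        if 0.0 < u < 1.0:  # inv_cdf is only defined on the open interval (0, 1)
            samples.append(std_normal.inv_cdf(u))
    return samples

samples = normal_by_inverse_transform(20_000)
print(fmean(samples), pstdev(samples))
```

The sample mean and standard deviation should come out close to 0 and 1 respectively.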
  11. Transform method: Limitations
      $$v = \Phi^{-1}(u) = \sqrt{2} \, \mathrm{erf}^{-1}(2u - 1),$$
      $$\mathrm{erf}^{-1}(x) = \frac{\sqrt{\pi}}{2} \left( x + \frac{\pi x^3}{12} + \frac{7 \pi^2 x^5}{480} + \frac{127 \pi^3 x^7}{40320} + \cdots \right).$$
      Not so easy to calculate!
      Sometimes, the inverse may not be possible.
  12. Multivariate Distributions
      Bivariate normal distributions:
      $$p(v_1, v_2) = \frac{1}{2\pi} e^{-(v_1^2 + v_2^2)/2}.$$
      Box-Muller method: from $u_1, u_2 \sim$ uniform distributions,
      $$v_1 = \sqrt{-2 \ln u_1} \, \cos(2\pi u_2), \qquad v_2 = \sqrt{-2 \ln u_1} \, \sin(2\pi u_2).$$
      Problems:
      Difficult to calculate the inverse in most cases (sometimes, even impossible!).
      Other methods (e.g., rejection method) are inefficient.
      So: the Markov chain Monte Carlo (MCMC) way!
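The Box-Muller transform translates directly into code. The sketch below is illustrative only: the names `box_muller` and `sample_bivariate_normal` are hypothetical, and $u_1$ is drawn from $(0, 1]$ rather than $[0, 1)$ so that the logarithm is always defined.

```python
import math
import random

def box_muller(u1, u2):
    """Map two uniforms (u1 in (0, 1]) to two independent standard normal values."""
    radius = math.sqrt(-2.0 * math.log(u1))
    angle = 2.0 * math.pi * u2
    return radius * math.cos(angle), radius * math.sin(angle)

def sample_bivariate_normal(n, seed=0):
    """Draw n pairs (v1, v2) from the standard bivariate normal."""
    rng = random.Random(seed)
    # 1 - random() lies in (0, 1], which avoids log(0).
    return [box_muller(1.0 - rng.random(), rng.random()) for _ in range(n)]

pairs = sample_bivariate_normal(20_000)
mean_v1 = sum(v1 for v1, _ in pairs) / len(pairs)
var_v1 = sum(v1 * v1 for v1, _ in pairs) / len(pairs) - mean_v1**2
print(mean_v1, var_v1)
```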
  13. Random Walk down the Markov Chains
      Random walk (a drunkard's walk):
      $$u_{t+1} = \mu + u_t + w_t,$$
      where $w_t$ is a random variable and $\mu$ is the drift.
      For example, $w_t \sim N(0, \sigma^2)$ (Gaussian).
      [Figure: two plots, one with horizontal axis from 0 to 500 and one with horizontal axis from −15 to 20.]
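The recurrence can be simulated in a few lines. This is a minimal sketch with Gaussian steps; `random_walk` and its default arguments are assumptions made for illustration.

```python
import random

def random_walk(steps, mu=0.0, sigma=1.0, start=0.0, seed=0):
    """Simulate u_{t+1} = mu + u_t + w_t with w_t ~ N(0, sigma^2)."""
    rng = random.Random(seed)
    path = [start]
    for _ in range(steps):
        path.append(mu + path[-1] + rng.gauss(0.0, sigma))
    return path

walk = random_walk(500)
print(walk[:5])
```

Setting `sigma=0.0` removes the noise and leaves only the deterministic drift, which is a quick way to see the role of $\mu$.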
  14. Markov Chains
      Markov chain: the next state only depends on the current state and the transition probability.
      $$P(i, j) \equiv P(V_{t+1} = S_j \mid V_0 = S_p, \ldots, V_t = S_i) = P(V_{t+1} = S_j \mid V_t = S_i),$$
      and, for a reversible chain, detailed balance holds:
      $$P_{ij} \, \pi_i^* = P_{ji} \, \pi_j^*, \qquad \pi^* = \text{stationary probability distribution}.$$
      Examples: Brownian motion
      $$u_{i+1} = \mu + u_i + \epsilon_i, \qquad \epsilon_i \sim N(0, \sigma^2).$$
  15. Markov Chains
      Monopoly (board games)
      [Figure: Monopoly animation]
  16. A Famous $Billion Markov Chain: PageRank
      Google PageRank Algorithm (by Page et al., 1997)
      Billions of web pages: pages = states, link probability $\sim 1/t$, where $t \approx$ the expectation of the number of clicks.
  17. Googling as a Markov Chain
      $$\mathrm{Rank}_j^{(t+1)} = \frac{1 - \alpha}{N} + \alpha \sum_{p_i \in \Omega(p_j)} \frac{\mathrm{Rank}_i^{(t)}}{B(p_i)},$$
      where $N$ = number of pages, $\Omega(p_j)$ is the set of pages linking to $p_j$, $B(p_i)$ is the number of outbound links of page $p_i$, and $\alpha$ = a ranking factor ($\approx 0.85$). Initially, $\mathrm{Rank}_i^{(t=0)} = 1/N$.
      Let $R = (\mathrm{Rank}_1, \ldots, \mathrm{Rank}_N)^T$, and $L(p_i, p_j) = 0$ if there are no links. Then
      $$R = \frac{1}{N} \begin{pmatrix} (1 - \alpha) \\ \vdots \\ (1 - \alpha) \end{pmatrix} + \alpha \begin{pmatrix} L(p_1, p_1) & \cdots & L(p_1, p_j) & \cdots & L(p_1, p_N) \\ \vdots & & \vdots & & \vdots \\ L(p_i, p_1) & \cdots & L(p_i, p_j) & \cdots & L(p_i, p_N) \\ \vdots & & \vdots & \ddots & \vdots \\ L(p_N, p_1) & \cdots & & \cdots & L(p_N, p_N) \end{pmatrix} R,$$
      where $\sum_{i=1}^{N} L(p_i, p_j) = 1$. Google Matrix (stochastic, sparse).
      $\Longrightarrow$ a stationary probability distribution $R$ (updated monthly).
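Iterating the rank update until it stops changing gives the stationary distribution. The sketch below is a toy illustration, not Google's implementation: the function `pagerank`, the three-page graph `toy_web`, and the stopping tolerance are all assumptions, and every page is assumed to have at least one outbound link.

```python
def pagerank(links, alpha=0.85, tol=1e-12, max_iter=1000):
    """Power iteration for PageRank.

    links[i] is the list of pages that page i links to (assumed non-empty).
    """
    n = len(links)
    rank = [1.0 / n] * n                      # Rank_i^(0) = 1/N
    for _ in range(max_iter):
        new_rank = [(1.0 - alpha) / n] * n    # the (1 - alpha)/N term
        for i, outgoing in enumerate(links):
            share = alpha * rank[i] / len(outgoing)  # alpha * Rank_i / B(p_i)
            for j in outgoing:
                new_rank[j] += share
        converged = sum(abs(a - b) for a, b in zip(new_rank, rank)) < tol
        rank = new_rank
        if converged:
            break
    return rank

# Hypothetical web: page 0 links to 1 and 2, page 1 links to 2, page 2 links to 0.
toy_web = [[1, 2], [2], [0]]
ranks = pagerank(toy_web)
print(ranks)
```

Page 2 receives links from both other pages and ends up with the largest rank, while the ranks always sum to 1.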
  18. Markov Chain Monte Carlo
      Landmarks: Monte Carlo method (1930s, 1945, from 1950s), e.g., Metropolis Algorithm (1953), Metropolis-Hastings (1970).
      Markov Chain Monte Carlo (MCMC) methods: a class of methods.
      Really took off in the 1990s, now applied to a wide range of areas: physics, Bayesian statistics, climate change, machine learning, finance, economy, medicine, biology, materials and engineering ...
  19. Metropolis-Hastings
      The Metropolis-Hastings algorithm:
      1. Begin with any initial $\theta_0$ at time $t \leftarrow 0$ such that $p(\theta_0) > 0$.
      2. Generate a candidate sample $\theta^* \sim q(\theta_t, \cdot)$ from a proposal distribution.
      3. Evaluate the acceptance probability $\alpha(\theta_t, \theta^*)$ given by
         $$\alpha = \min\left[ \frac{p(\theta^*) \, q(\theta^*, \theta_t)}{p(\theta_t) \, q(\theta_t, \theta^*)}, \ 1 \right].$$
      4. Generate a uniformly distributed random number $u \sim \mathrm{Unif}[0, 1]$, and accept $\theta^*$ if $\alpha \ge u$. That is, if $\alpha \ge u$ then $\theta_{t+1} \leftarrow \theta^*$, else $\theta_{t+1} \leftarrow \theta_t$.
      5. Increase the counter or time $t \leftarrow t + 1$, and go to step 2.
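The five steps above can be sketched as a short sampler. Several choices here are assumptions made for illustration: the proposal is a symmetric Gaussian random walk (so the $q$ terms cancel and the rule reduces to the Metropolis form), the target `mixture_pdf` is an equal-weight mixture of $N(2, 1)$ and $N(-2, 1)$ left unnormalised, and the step size of 2 is arbitrary.

```python
import math
import random

def mixture_pdf(x):
    """Unnormalised target: equal-weight mixture of N(2, 1) and N(-2, 1)."""
    return 0.5 * math.exp(-0.5 * (x - 2.0) ** 2) + 0.5 * math.exp(-0.5 * (x + 2.0) ** 2)

def metropolis_hastings(target, n_samples, theta0=0.0, step=2.0, seed=0):
    """Random-walk Metropolis-Hastings with a symmetric Gaussian proposal."""
    rng = random.Random(seed)
    theta = theta0                                   # step 1: p(theta0) > 0
    chain = []
    for _ in range(n_samples):
        candidate = theta + rng.gauss(0.0, step)     # step 2: theta* ~ q(theta_t, .)
        # Step 3: q is symmetric, so q(theta*, theta_t) / q(theta_t, theta*) = 1.
        alpha = min(target(candidate) / target(theta), 1.0)
        u = 1.0 - rng.random()                       # step 4: u in (0, 1]
        if alpha >= u:
            theta = candidate
        chain.append(theta)                          # step 5: advance time
    return chain

chain = metropolis_hastings(mixture_pdf, 50_000)
chain_mean = sum(chain) / len(chain)
chain_var = sum((x - chain_mean) ** 2 for x in chain) / len(chain)
print(chain_mean, chain_var)
```

For this target the exact mean is 0 and the exact variance is 5, so the printed values give a rough check that the chain visits both modes.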
  20. Mixture distribution: A distribution with known mean and variance.
      $$f(x \mid \mu, \sigma^2) = \sum_{i=1}^{K} \alpha_i \, p_i(x \mid \mu_i, \sigma_i^2), \qquad \sum_{i=1}^{K} \alpha_i = 1.$$
      E.g., $\alpha_1 = \alpha_2 = 1/2$, $\mu_1 = 2$, $\mu_2 = -2$ and $\sigma_1 = \sigma_2 = 1$.
      [Figure: two plots, one with horizontal axis from 0 to 10000 and vertical axis from −4 to 6, and one with horizontal axis from −6 to 6 and vertical axis from 0 to 0.2.]
  21. When to Stop the Chain
      As the MCMC runs, convergence may be reached.
      When does a chain converge? When to stop the chain ...?
      Are the samples correlated?
      [Figure: plot with horizontal axis from 0 to 900 and vertical axis from 0 to 600.]
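One common way to probe whether samples are correlated is the sample autocorrelation at increasing lags. The sketch below is an illustration under stated assumptions: `autocorrelation` is a hypothetical helper, and an AR(1) sequence with coefficient 0.9 stands in for real MCMC output because its theoretical lag-$k$ autocorrelation, $0.9^k$, is known.

```python
import random

def autocorrelation(chain, lag):
    """Sample autocorrelation of a chain at the given lag."""
    n = len(chain)
    mean = sum(chain) / n
    var = sum((x - mean) ** 2 for x in chain) / n
    cov = sum((chain[i] - mean) * (chain[i + lag] - mean) for i in range(n - lag)) / n
    return cov / var

# Stand-in for correlated MCMC output: x_{t+1} = 0.9 x_t + noise.
rng = random.Random(0)
ar1 = [0.0]
for _ in range(20_000):
    ar1.append(0.9 * ar1[-1] + rng.gauss(0.0, 1.0))

for lag in (1, 10, 50):
    print(lag, autocorrelation(ar1, lag))
```

A slowly decaying autocorrelation suggests that many consecutive samples carry little independent information, which bears on how long the chain needs to run.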
  22. A Long Single Chain or Multiple Short Chains?
      When will a Markov chain converge in practice? If it has converged, what does it mean?
      Is a very long chain really good enough (from a statistical point of view)?
      How long is long enough?
      Are multiple chains better?
      How to improve the sampling efficiency and/or mixing properties?
  23. Simulated Tempering
      Simulated annealing: temperature $T$ from high to low.
      Simulated tempering: raise $T$ to a higher value, then reduce to low.
      $$\pi_\tau = \pi(x)^{1/\tau}, \qquad \pi_\tau \to 1 \ \text{as} \ \tau \to \infty.$$
      The basic idea is to reduce from a very high $\tau$ to $\tau_0 = 1$.
      [Figure: tempering flattens $\pi \ge 0$ into $\pi_\tau = \pi(x)^{1/\tau}$.]
      Use flattened (near uniform) distributions as proposals/candidates to produce high quality samplings.
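The flattening effect of $\pi_\tau = \pi(x)^{1/\tau}$ can be seen by comparing a peak with a valley at several temperatures. This is a small numerical illustration only: the unnormalised bimodal density `bimodal`, the helper names, and the chosen values of $\tau$ are all assumptions, and no full tempering sampler is implemented.

```python
import math

def tempered(pi, tau):
    """Return x -> pi(x) ** (1 / tau); tau = 1 recovers pi, large tau flattens it."""
    return lambda x: pi(x) ** (1.0 / tau)

def bimodal(x):
    """Hypothetical unnormalised density with modes near x = 2 and x = -2."""
    return math.exp(-0.5 * (x - 2.0) ** 2) + math.exp(-0.5 * (x + 2.0) ** 2)

def peak_to_valley(tau):
    """Ratio of the tempered density at a mode (x = 2) to the valley (x = 0)."""
    p = tempered(bimodal, tau)
    return p(2.0) / p(0.0)

for tau in (1.0, 4.0, 100.0):
    print(tau, peak_to_valley(tau))
```

As $\tau$ grows the ratio approaches 1, so a sampler working on the tempered density can cross between modes far more easily than one working on $\pi$ itself.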
  24. Sampling: Forward or Backward? Which Way?
      Is this the only way?
      No! Coupling from the Past & Metaheuristics.
      If we go backward along the chain, are there any advantages? If so, how?
      Is there a universally efficient sampling tool for drawing samples in general?
      No! No-free-lunch theorem (Wolpert & Macready, 1997).
      The aim of the research is to find the best algorithm(s) for a given/specific problem/distribution.
      Also Metaheuristics (very promising).
  25. Thank you
      References
      Gamerman D., Markov Chain Monte Carlo, Chapman & Hall/CRC (1997).
      Corcoran J. and Tweedie R., Perfect sampling ..., Jour. Stat. Plan. Infer., 104, 297 (2002).
      Cox M., Forbes A. B., Harris P. M., Smith I., Classification and solution of regression ..., NPL SSfM Report (2004).
      Propp J. & Wilson D., Exact sampling ..., Random Stru. Alg., 9, 223 (1996).
      Yang X. S., Nature-Inspired Metaheuristic Algorithms, Luniver Press (2008).
      Yang X. S., Introduction to Computational Mathematics, World Scientific (2008).
      Yang X. S., Engineering Optimization: An Introduction with Metaheuristic Applications, Wiley (2010).
      Acknowledgement: EPSRC, SSfM, NPL, CUED, and London Maths Society.
      Thank you!
