A short history of MCMC

Talk given in Bristol, April 2, 2011, at Julian Besag's memorial

Transcript

  • 1. A Short History of Markov Chain Monte Carlo: Subjective Recollections from Incomplete Data — Christian P. Robert and George Casella, Université Paris-Dauphine, IUF, & CREST, and University of Florida, April 2, 2011
  • 2. In memoriam, Julian Besag, 1945–2010
  • 4. Introduction — Markov chain Monte Carlo (MCMC) methods have been around for almost as long as Monte Carlo techniques, even though their impact on statistics was not truly felt until the late 1980s / early 1990s. Contents: distinction between Metropolis–Hastings-based algorithms and those related to Gibbs sampling, and a brief entry into the "second-generation MCMC revolution".
  • 6. Introduction — A few landmarks: the realization that Markov chains could be used in a wide variety of situations only came to "mainstream statisticians" with Gelfand and Smith (1990), despite earlier publications in the statistical literature like Hastings (1970) and a growing awareness in spatial statistics (Besag, 1986). Several reasons: lack of computing machinery; lack of background on Markov chains; lack of trust in the practicality of the method.
  • 7. Before the revolution / Los Alamos — Bombs before the revolution: Monte Carlo methods were born in Los Alamos, New Mexico, during WWII, mostly from physicists working on atomic bombs, eventually producing the Metropolis algorithm in the early 1950s. [Metropolis, Rosenbluth, Rosenbluth, Teller and Teller, 1953]
  • 9. Before the revolution / Los Alamos — Monte Carlo genesis: the Monte Carlo method is usually traced to Ulam and von Neumann: Stanislaw Ulam associated the idea with an intractable combinatorial computation about "solitaire" attempted in 1946; the idea was enthusiastically adopted by John von Neumann for implementation on neutron diffusion; the name "Monte Carlo" was suggested by Nicholas Metropolis. [Eckhardt, 1987]
  • 14. Before the revolution / Los Alamos — Monte Carlo with computers: very close "coincidence" with the appearance of the very first computer, ENIAC, born Feb. 1946, on which von Neumann implemented Monte Carlo in 1947. The same year, Ulam and von Neumann (re)invented inversion and accept–reject techniques. In 1949, the very first symposium on Monte Carlo and the very first paper. [Metropolis and Ulam, 1949]
  • 15. Before the revolution / Metropolis et al., 1953 — The Metropolis et al. (1953) paper: the very first MCMC algorithm is associated with the second computer, MANIAC, at Los Alamos, early 1952. Besides Metropolis, Arianna W. Rosenbluth, Marshall N. Rosenbluth, Augusta H. Teller, and Edward Teller contributed to create the Metropolis algorithm...
  • 16. Before the revolution / Metropolis et al., 1953 — Motivating problem: computation of integrals of the form
    $$ I = \frac{\int F(p,q)\,\exp\{-E(p,q)/kT\}\,dp\,dq}{\int \exp\{-E(p,q)/kT\}\,dp\,dq}, $$
    with energy E defined as
    $$ E(p,q) = \frac{1}{2}\sum_{i=1}^{N}\sum_{\substack{j=1 \\ j\neq i}}^{N} V(d_{ij}), $$
    where N is the number of particles, V a potential function, and d_{ij} the distance between particles i and j.
  • 17. Before the revolution / Metropolis et al., 1953 — Boltzmann distribution: the Boltzmann distribution exp{−E(p,q)/kT} is parameterised by the temperature T, k being the Boltzmann constant, with a normalisation factor
    $$ Z(T) = \int \exp\{-E(p,q)/kT\}\,dp\,dq $$
    not available in closed form.
  • 18. Before the revolution / Metropolis et al., 1953 — Computational challenge: since p and q are 2N-dimensional vectors, numerical integration is impossible. Moreover, standard Monte Carlo techniques fail to correctly approximate I: exp{−E(p,q)/kT} is very small for most realizations of random configurations (p, q) of the particle system.
  • 19. Before the revolution / Metropolis et al., 1953 — Metropolis algorithm: consider a random-walk modification of the N particles: for each 1 ≤ i ≤ N, values x′_i = x_i + αξ_{1i} and y′_i = y_i + αξ_{2i} are proposed, where both ξ_{1i} and ξ_{2i} are uniform U(−1, 1). The energy difference between the new and the previous configuration is ∆E, and the new configuration is accepted with probability
    $$ 1 \wedge \exp\{-\Delta E/kT\}, $$
    otherwise the previous configuration is replicated*.
    (*) counting one more time in the average of the F(p_t, q_t)'s over the τ moves of the random walk.
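As a concrete illustration, here is a minimal Python sketch of the random-walk Metropolis rule above, applied to a toy one-dimensional energy E(x) = x²/2 (so the target is a standard normal at kT = 1) rather than the N-particle system; the function and variable names are ours, not the paper's.

```python
import math
import random

def metropolis_rw(energy, x0, alpha, n_steps, kT=1.0, seed=0):
    """Random-walk Metropolis in the style of Metropolis et al. (1953):
    propose x' = x + alpha * xi with xi ~ U(-1, 1), accept with
    probability 1 ^ exp(-dE/kT); a rejected move replicates the
    current state, which is counted again in the running average."""
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    chain = []
    for _ in range(n_steps):
        xp = x + alpha * rng.uniform(-1.0, 1.0)
        ep = energy(xp)
        de = ep - e
        if de <= 0.0 or rng.random() < math.exp(-de / kT):
            x, e = xp, ep          # accept the proposed configuration
        chain.append(x)            # rejection replicates the state
    return chain

# Toy energy: E(x) = x^2 / 2, so exp(-E/kT) is N(0, 1) at kT = 1
chain = metropolis_rw(lambda x: 0.5 * x * x, x0=3.0, alpha=2.0, n_steps=20000)
burned = chain[1000:]              # discard a short burn-in
mean = sum(burned) / len(burned)
var = sum((c - mean) ** 2 for c in burned) / len(burned)
```

Run long enough, the chain's empirical mean and variance approach 0 and 1, the moments of the Boltzmann target for this energy.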
  • 22. Before the revolution / Metropolis et al., 1953 — Convergence: validity of the algorithm is established by proving (1) irreducibility and (2) ergodicity, that is, convergence to the stationary distribution. The second part is obtained via discretization of the space: Metropolis et al. note that the proposal is reversible, then establish that exp{−E/kT} is invariant. Application to the specific problem of the rigid-sphere collision model. The number of iterations of the Metropolis algorithm seems limited: 16 steps for burn-in and 48 to 64 subsequent iterations (which still required four to five hours on the Los Alamos MANIAC).
  • 23. Before the revolution / Metropolis et al., 1953 — Physics and chemistry: "The method of Markov chain Monte Carlo immediately had wide use in physics and chemistry." [Geyer & Thompson, 1992] Hammersley and Handscomb, 1967; Piekaar and Clarenburg, 1967; Kennedy and Kuti, 1985; Sokal, 1989; &tc...
  • 24. Before the revolution / Metropolis et al., 1953 — Physics and chemistry: "Statistics has always been fuelled by energetic mining of the physics literature." [Clifford, 1993]
  • 26. Before the revolution / Hastings, 1970 — A fair generalisation: in Biometrika 1970, Hastings defines the MCMC methodology for finite and reversible Markov chains, the continuous case being discretised. The generic acceptance probability for a move from state i to state j is
    $$ \alpha_{ij} = \frac{s_{ij}}{1 + \dfrac{\pi_i\,q_{ij}}{\pi_j\,q_{ji}}}, $$
    where s_{ij} is a symmetric function.
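The role of s_{ij} can be checked numerically. The sketch below (ours, not Hastings's) builds the discrete transition kernel for a given choice of s and recovers both the Metropolis and the Barker acceptance rules as special cases; reversibility of π, and hence stationarity, then holds for either choice.

```python
def hastings_kernel(pi, Q, s):
    """Transition matrix P[i][j] = Q[i][j] * alpha_ij, with
    alpha_ij = s_ij / (1 + pi_i q_ij / (pi_j q_ji))."""
    n = len(pi)
    P = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and Q[i][j] > 0.0:
                r = (pi[j] * Q[j][i]) / (pi[i] * Q[i][j])  # Hastings ratio
                P[i][j] = Q[i][j] * s(r) / (1.0 + 1.0 / r)
        P[i][i] = 1.0 - sum(P[i])                           # stay put otherwise
    return P

# s expressed through the ratio r; both choices yield a symmetric s_ij:
metropolis_s = lambda r: 1.0 + (1.0 / r if r >= 1.0 else r)  # alpha = min(1, r)
barker_s = lambda r: 1.0                                     # alpha = r / (1 + r)

pi = [0.5, 0.3, 0.2]                  # target distribution
Q = [[0.0, 0.5, 0.5],                 # symmetric uniform proposal
     [0.5, 0.0, 0.5],
     [0.5, 0.5, 0.0]]
P_met = hastings_kernel(pi, Q, metropolis_s)
P_bar = hastings_kernel(pi, Q, barker_s)
```

For instance P_met[0][1] = Q[0][1] · min(1, π₂/π₁) = 0.5 × 0.6 = 0.3, and π_i P[i][j] = π_j P[j][i] holds entrywise for both kernels.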
  • 27. Before the revolution / Hastings, 1970 — State of the art: note the generic form that encompasses both Metropolis et al. (1953) and Barker (1965). Peskun's ordering was not yet discovered: Hastings mentions that little is known about the relative merits of those two choices (even though) Metropolis's method may be preferable. He warns against high rejection rates as indicative of a poor choice of transition matrix, but does not mention the opposite pitfall of too-low rejection rates.
  • 28. Before the revolution / Hastings, 1970 — What else?! Items included in the paper: a Poisson target with a ±1 random walk proposal; a normal target with a uniform random walk proposal mixed with its reflection (i.e. centered at −X(t) rather than X(t)); a multivariate target where Hastings introduces Gibbs sampling, updating one component at a time and defining the composed transition as satisfying the stationarity condition because each component leaves the target invariant; a reference to Ehrman, Fosdick and Handscomb (1960) as a preliminary if specific instance of this Metropolis-within-Gibbs sampler; an importance sampling version of MCMC; some remarks about error assessment; a Gibbs sampler for random orthogonal matrices.
  • 30. Before the revolution / Hastings, 1970 — Three years later: Peskun (1973) compares Metropolis' and Barker's acceptance probabilities and shows (again in a discrete setup) that Metropolis' is optimal (in terms of the asymptotic variance of any empirical average). The proof is a direct consequence of Kemeny and Snell (1960) on asymptotic variance. Peskun also establishes that this variance can improve upon the iid case if and only if the eigenvalues of P − A are all negative, where A is the transition matrix corresponding to iid simulation and P the transition matrix corresponding to the Metropolis algorithm; but he concludes that the trace of P − A is always positive.
  • 32. Before the revolution / Julian's early works — Julian's early works (1): in the early 1970s, Hammersley, Clifford, and Besag were working on the specification of joint distributions from conditional distributions, and on necessary and sufficient conditions for the conditional distributions to be compatible with a joint distribution. [Hammersley and Clifford, 1971]
  • 33. Before the revolution / Julian's early works — Julian's early works (1): "What is the most general form of the conditional probability functions that define a coherent joint function? And what will the joint look like?" [Besag, 1972]
  • 34. Before the revolution / Julian's early works — Hammersley–Clifford theorem. Theorem (Hammersley–Clifford): the joint distribution of a vector associated with a dependence graph must be represented as a product of functions over the cliques of the graph, i.e., of functions depending only on the components indexed by the labels in the clique. [Cressie, 1993; Lauritzen, 1996]
  • 35. Before the revolution / Julian's early works — Hammersley–Clifford theorem. Theorem (Hammersley–Clifford): a probability distribution P with positive and continuous density f satisfies the pairwise Markov property with respect to an undirected graph G if and only if it factorizes according to G, i.e., (F) ≡ (G). [Cressie, 1993; Lauritzen, 1996]
  • 36. Before the revolution / Julian's early works — Hammersley–Clifford theorem. Theorem (Hammersley–Clifford): under the positivity condition, the joint distribution g satisfies
    $$ g(y_1, \ldots, y_p) \propto \prod_{j=1}^{p} \frac{g_{\ell_j}\big(y_{\ell_j} \mid y_{\ell_1}, \ldots, y_{\ell_{j-1}}, y'_{\ell_{j+1}}, \ldots, y'_{\ell_p}\big)}{g_{\ell_j}\big(y'_{\ell_j} \mid y_{\ell_1}, \ldots, y_{\ell_{j-1}}, y'_{\ell_{j+1}}, \ldots, y'_{\ell_p}\big)} $$
    for every permutation ℓ on {1, 2, …, p} and every y′ ∈ Y. [Cressie, 1993; Lauritzen, 1996]
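For p = 2 the identity can be checked by hand or numerically: the product of conditional ratios telescopes to g(y₁, y₂)/g(y′₁, y′₂). The toy joint below is ours, chosen only to satisfy the positivity condition.

```python
# A strictly positive joint on {0,1} x {0,1}
joint = {(0, 0): 0.30, (0, 1): 0.20, (1, 0): 0.15, (1, 1): 0.35}

def g1(y1, y2):   # full conditional of the first component, g_1(y1 | y2)
    return joint[(y1, y2)] / (joint[(0, y2)] + joint[(1, y2)])

def g2(y2, y1):   # full conditional of the second component, g_2(y2 | y1)
    return joint[(y1, y2)] / (joint[(y1, 0)] + joint[(y1, 1)])

r1, r2 = 0, 0     # arbitrary reference point (y1', y2')

# Hammersley-Clifford product for the identity permutation and p = 2
unnorm = {(y1, y2): (g1(y1, r2) / g1(r1, r2)) * (g2(y2, y1) / g2(r2, y1))
          for (y1, y2) in joint}
Z = sum(unnorm.values())
recovered = {k: v / Z for k, v in unnorm.items()}   # matches the joint exactly
```

After normalisation the product of conditional ratios reproduces the joint table, which is precisely what positivity guarantees.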
  • 37. Before the revolution / Julian's early works — An apocryphal theorem: the Hammersley–Clifford theorem was never published by its authors, but only through Grimmett (1973), Preston (1973), Sherman (1973), and Besag (1974). The authors were dissatisfied with the positivity constraint: the joint density could only be recovered from the full conditionals when the support of the joint was the product of the supports of the full conditionals (with obvious counter-examples). Moussouris' counter-example put a full stop to their endeavors. [Hammersley, 1974]
  • 39. Before the revolution / Julian's early works — To Gibbs or not to Gibbs? Julian Besag should certainly be credited to a large extent with the (re?-)discovery of the Gibbs sampler. "The simulation procedure is to consider the sites cyclically and, at each stage, to amend or leave unaltered the particular site value in question, according to a probability distribution whose elements depend upon the current value at neighboring sites (...) However, the technique is unlikely to be particularly helpful in many other than binary situations and the Markov chain itself has no practical interpretation." [Besag, 1974]
  • 40. Before the revolution / Julian's early works — Broader perspective: in 1964, Hammersley and Handscomb wrote a (the first?) textbook on Monte Carlo methods. They cover such topics as "crude Monte Carlo"; importance sampling; control variates; and "conditional Monte Carlo", which looks surprisingly like a missing-data Gibbs completion approach. They state in the preface: "We are convinced nevertheless that Monte Carlo methods will one day reach an impressive maturity."
  • 41. Before the revolution / Julian's early works — Clicking in: after Peskun (1973), MCMC lay mostly dormant in the mainstream statistical world for about 10 years, then several papers/books highlighted its usefulness in specific settings: Geman and Geman (1984); Besag (1986); Strauss (1986); Ripley (Stochastic Simulation, 1987); Tanner and Wong (1987); Younes (1988).
  • 42. Before the revolution / Julian's early works — Enters the Gibbs sampler: Geman and Geman (1984), building on Metropolis et al. (1953), Hastings (1970), and Peskun (1973), constructed a Gibbs sampler for optimisation in a discrete image-processing problem without completion. Responsible for the name Gibbs sampling, because the method was used for the Bayesian study of Gibbs random fields, linked to the physicist Josiah Willard Gibbs (1839–1903). Back to Metropolis et al. (1953): the Gibbs sampler is used as a simulated annealing algorithm, and ergodicity is proven on the collection of global maxima.
  • 43. Before the revolution / Julian's early works — Besag (1986) integrates GS for SA: "...easy to construct the transition matrix Q, of a discrete time Markov chain, with state space Ω and limit distribution (4). Simulated annealing proceeds by running an associated time inhomogeneous Markov chain with transition matrices QT, where T is progressively decreased according to a prescribed 'schedule' to a value close to zero." [Besag, 1986]
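A minimal sketch of that scheme (ours, with an arbitrary geometric schedule rather than Besag's, and a toy double-well energy in place of an image model): Metropolis moves run while the temperature T is driven toward zero, so the chain settles into a low-energy configuration.

```python
import math
import random

def anneal(energy, x0, t0=2.0, t_min=0.01, cool=0.999, step=0.5, seed=1):
    """Simulated annealing: Metropolis moves under a temperature T that
    is progressively decreased by a geometric 'schedule' T <- cool * T."""
    rng = random.Random(seed)
    x, e, T = x0, energy(x0), t0
    while T > t_min:
        xp = x + step * rng.uniform(-1.0, 1.0)
        ep = energy(xp)
        if ep <= e or rng.random() < math.exp(-(ep - e) / T):
            x, e = xp, ep      # accept at the current temperature
        T *= cool              # cool down
    return x, e

# Toy energy with two wells near x = -1 and x = +1; the tilt 0.2*x
# makes x = -1 the global minimum
double_well = lambda x: (x * x - 1.0) ** 2 + 0.2 * x
x_end, e_end = anneal(double_well, x0=2.0)
```

By the end of the schedule the state sits near the bottom of one of the two wells, far below the energy of the starting point.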
  • 44. Before the revolution / Julian's early works — ...and links with Metropolis–Hastings...: "There are various related methods of constructing a manageable QT (Hastings, 1970). Geman and Geman (1984) adopt the simplest, which they term the 'Gibbs sampler' (...) time reversibility, a common ingredient in this type of problem (see, for example, Besag, 1977a), is present at individual stages but not over complete cycles, though Peter Green has pointed out that it returns if QT is taken over a pair of cycles, the second of which visits pixels in reverse order." [Besag, 1986]
  • 45. Before the revolution / Julian's early works — ...seeing the larger picture,...: "As Geman and Geman (1984) point out, any property of the (posterior) distribution P(x|y) can be simulated by running the Gibbs sampler at 'temperature' T = 1. Thus, if x̂_i maximizes P(x_i|y), then it is the most frequently occurring colour at pixel i in an infinite realization of the Markov chain with transition matrix Q of Section 2.3. The x̂_i's can therefore be simultaneously estimated from a single finite realization of the chain. It is not yet clear how long the realization needs to be, particularly for estimation near colour boundaries, but the amount of computation required is generally prohibitive for routine purposes." [Besag, 1986]
  • 46. Before the revolution / Julian's early works — ...seeing the larger picture,...: "P(x|y) can be simulated using the Gibbs sampler, as suggested by Grenander (1983) and by Geman and Geman (1984). My dismissal of such an approach for routine applications was somewhat cavalier: purpose-built array processors could become relatively inexpensive (...) suppose that, for 100 complete cycles say, images have been collected from the Gibbs sampler (or by Metropolis' method), following a 'settling-in' period of perhaps another 100 cycles, which should cater for fairly intricate priors (...) These 100 images should often be adequate for estimating properties of the posterior (...) and for making approximate associated confidence statements, as mentioned by Mr Haslett." [Besag, 1986]
  • 49. Before the revolution / Julian's early works — ...if not going fully Bayes! "...a neater and more efficient procedure [for parameter estimation] is to adopt maximum 'pseudo-likelihood' estimation (Besag, 1975)"; "I have become increasingly enamoured with the Bayesian paradigm" [Besag, 1986]; "The pair (x_i, β_i) is then a (bivariate) Markov field and can be reconstructed as a bivariate process by the methods described in Professor Besag's paper." [Clifford, 1986]
  • 50. Before the revolution / Julian's early works — ...if not going fully Bayes! "The simulation-based estimator E_post Ψ(X) will differ from the m.a.p. estimator Ψ̂(x)." [Silverman, 1986]
  • 51. Before the revolution / Julian's early works — Discussants of Besag (1986): an impressive who's who: D.M. Titterington, P. Clifford, P. Green, P. Brown, B. Silverman, F. Critchley, F. Kelly, K. Mardia, C. Jennison, J. Kent, D. Spiegelhalter, H. Wynn, D. and S. Geman, J. Haslett, J. Kay, H. Künsch, P. Switzer, B. Torsney, &tc
  • 52. Before the revolution / Julian's early works — A comment on Besag (1986): "While special purpose algorithms will determine the utility of the Bayesian methods, the general purpose methods - stochastic relaxation and simulation of solutions of the Langevin equation (Grenander, 1983; Geman and Geman, 1984; Gidas, 1985a; Geman and Hwang, 1986) - have proven enormously convenient and versatile. We are able to apply a single computer program to every new problem by merely changing the subroutine that computes the energy function in the Gibbs representation of the posterior distribution." [Geman and McClure, 1986]
  • 53. Before the revolution / Julian's early works — Another one: "It is easy to compute exact marginal and joint posterior probabilities of currently unobserved features, conditional on those clinical findings currently available (Spiegelhalter, 1986a,b), the updating taking the form of 'propagating evidence' through the network (...) it would be interesting to see if the techniques described tonight, which are of intermediate complexity, may have any applications in this new and exciting area [causal networks]." [Spiegelhalter, 1986]
  • 55. Before the revolution / Julian's early works — The candidate's formula: representation of the marginal likelihood as
    $$ m(x) = \frac{\pi(\theta)\, f(x \mid \theta)}{\pi(\theta \mid x)} $$
    or of the marginal predictive as
    $$ p_n(y' \mid y) = \frac{f(y' \mid \theta)\, \pi_n(\theta \mid y)}{\pi_{n+1}(\theta \mid y, y')}. $$
    [Besag, 1989] Why candidate? "Equation (2) appeared without explanation in a Durham University undergraduate final examination script of 1984. Regrettably, the student's name is no longer known to me."
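In a conjugate model the formula can be verified exactly; the Beta-Binomial numbers below are our own illustration, not Besag's. With a Be(a, b) prior and a Bin(n, θ) likelihood, the posterior is Be(a + x, b + n − x), and the ratio π(θ)f(x|θ)/π(θ|x) must be free of θ and equal to m(x).

```python
import math

def beta_pdf(t, a, b):
    """Be(a, b) density at t."""
    return t ** (a - 1) * (1 - t) ** (b - 1) * math.gamma(a + b) \
        / (math.gamma(a) * math.gamma(b))

a, b, n, x = 2.0, 3.0, 10, 4      # Be(a, b) prior, Bin(n, theta) likelihood

def candidate(theta):
    """pi(theta) f(x | theta) / pi(theta | x): the candidate's formula."""
    prior = beta_pdf(theta, a, b)
    like = math.comb(n, x) * theta ** x * (1 - theta) ** (n - x)
    post = beta_pdf(theta, a + x, b + n - x)   # conjugate posterior
    return prior * like / post

vals = [candidate(t) for t in (0.2, 0.5, 0.8)]   # same value whatever theta
# Closed-form marginal: C(n, x) * B(a + x, b + n - x) / B(a, b)
exact = math.comb(n, x) \
    * (math.gamma(a + x) * math.gamma(b + n - x) / math.gamma(a + b + n)) \
    / (math.gamma(a) * math.gamma(b) / math.gamma(a + b))
```

The three evaluations agree to machine precision and match the closed-form marginal likelihood.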
  • 56. Before the revolution / Julian's early works — Implications: Newton and Raftery (1994) used this representation to derive the [infamous] harmonic mean approximation to the marginal likelihood; Gelfand and Dey (1994) also relied on this formula for the same purpose, in a more general perspective; Geyer and Thompson (1995) derived MLEs by a Monte Carlo approximation to the normalising constant; Chib (1995) uses this representation to build an MCMC approximation to the marginal likelihood.
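The harmonic mean identity behind Newton and Raftery's estimator is that E[1/f(x|θ)] under the posterior equals 1/m(x). The sketch below (our toy Beta-Binomial setup, not theirs) computes the harmonic mean of the likelihood over exact posterior draws, to be compared with the closed-form marginal; in less friendly models the estimator's heavy-tailed behaviour is what earned it the "infamous" label.

```python
import math
import random

a, b, n, x = 2.0, 3.0, 10, 4      # Be(a, b) prior, Bin(n, theta) likelihood
rng = random.Random(42)

def likelihood(theta):
    return math.comb(n, x) * theta ** x * (1 - theta) ** (n - x)

# Exact draws from the conjugate posterior Be(a + x, b + n - x)
draws = [rng.betavariate(a + x, b + n - x) for _ in range(20000)]

# Harmonic mean estimator: m(x) ~ 1 / mean(1 / f(x | theta_k))
harmonic = len(draws) / sum(1.0 / likelihood(t) for t in draws)

# Closed-form marginal for comparison
exact = math.comb(n, x) \
    * (math.gamma(a + x) * math.gamma(b + n - x) / math.gamma(a + b + n)) \
    / (math.gamma(a) * math.gamma(b) / math.gamma(a + b))
```

Here the estimate lands near the exact value, but its variance is driven by rare draws with tiny likelihood, which is why the estimator can be arbitrarily poor in higher dimensions.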
  • 60. The Revolution / Final steps — Impact: "This is surely a revolution." [Clifford, 1993] Geman and Geman (1984) was one more spark that led to the explosion, as it had a clear influence on Gelfand, Green, Smith, Spiegelhalter and others. It sparked new interest in Bayesian methods, statistical computing, algorithms, and stochastic processes through the use of computing algorithms such as the Gibbs sampler and the Metropolis–Hastings algorithm.
  • 61. The Revolution / Final steps — Impact: "[Gibbs sampler] use seems to have been isolated in the spatial statistics community until Gelfand and Smith (1990)" [Geyer, 1990]
  • 63. The Revolution / Final steps — Data augmentation: Tanner and Wong (1987) has essentially the same ingredients as Gelfand and Smith (1990): simulating from the conditionals is simulating from the joint. Lower impact because: emphasis on missing-data problems (hence data augmentation); MCMC approximation to the target at every iteration,
    $$ \hat\pi(\theta \mid x) \approx \frac{1}{K} \sum_{k=1}^{K} \pi(\theta \mid x, z_{t,k}), \qquad z_{t,k} \sim \hat\pi_{t-1}(z \mid x), $$
    too close to Rubin's (1978) multiple imputation; theoretical backup based on functional analysis (the Markov kernel had to be uniformly bounded and equicontinuous).
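The classic genetic-linkage example used to illustrate data augmentation gives the flavour: counts (125, 18, 20, 34) with cell probabilities (1/2 + θ/4, (1−θ)/4, (1−θ)/4, θ/4), where splitting the first cell into a latent count z completes the model. The sketch below alternates z | θ, x and θ | z, x (a minimal completion chain; the uniform Be(1, 1) prior is our assumption).

```python
import random

x1, x2, x3, x4 = 125, 18, 20, 34     # classic genetic-linkage counts
rng = random.Random(0)

def draw_binomial(n, p):
    """Binomial draw via Bernoulli sums (stdlib only)."""
    return sum(rng.random() < p for _ in range(n))

theta, thetas = 0.5, []
for it in range(6000):
    # z | theta, x: latent part of the first cell,
    # success probability (theta/4) / (1/2 + theta/4)
    z = draw_binomial(x1, (theta / 4.0) / (0.5 + theta / 4.0))
    # theta | z, x: Beta posterior under a uniform Be(1, 1) prior
    theta = rng.betavariate(z + x4 + 1.0, x2 + x3 + 1.0)
    if it >= 1000:                   # discard burn-in
        thetas.append(theta)

theta_hat = sum(thetas) / len(thetas)   # posterior mean of theta
```

For these data the posterior mean of θ settles around 0.62, in line with the standard analyses of this dataset.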
  • 64. The Revolution / Gelfand and Smith, 1990 — Epiphany: in June 1989, at a Bayesian workshop in Sherbrooke, Québec, Adrian Smith exposed for the first time (?) the generic features of the Gibbs sampler, exhibiting a ten-line Fortran program handling a random effects model
    $$ Y_{ij} = \theta_i + \varepsilon_{ij}, \quad i = 1, \ldots, K, \; j = 1, \ldots, J, $$
    $$ \theta_i \sim N(\mu, \sigma_\theta^2), \qquad \varepsilon_{ij} \sim N(0, \sigma_\varepsilon^2), $$
    by full conditionals on µ, σ_θ, σ_ε... [Gelfand and Smith, 1990] This was enough to convince the whole audience!
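A sketch of such a sampler (in Python rather than Fortran; the inverse-Gamma IG(1, 1) priors on the variances and the flat prior on µ are our assumptions, not details from the talk):

```python
import random

rng = random.Random(7)
K, J = 8, 20
mu_true, sd_theta, sd_eps = 2.0, 1.0, 1.0

# Simulate data from the random effects model Y_ij = theta_i + eps_ij
theta_true = [rng.gauss(mu_true, sd_theta) for _ in range(K)]
y = [[rng.gauss(t, sd_eps) for _ in range(J)] for t in theta_true]
ybar = [sum(row) / J for row in y]

def inv_gamma(shape, rate):
    """Draw from InvGamma(shape, rate) via 1 / Gamma(shape, scale=1/rate)."""
    return 1.0 / rng.gammavariate(shape, 1.0 / rate)

mu, s2_t, s2_e = 0.0, 1.0, 1.0
theta = [0.0] * K
mus = []
for it in range(4000):
    # theta_i | rest: normal with precision J/s2_e + 1/s2_t
    for i in range(K):
        v = 1.0 / (J / s2_e + 1.0 / s2_t)
        m = v * (J * ybar[i] / s2_e + mu / s2_t)
        theta[i] = rng.gauss(m, v ** 0.5)
    # mu | rest: normal (flat prior on mu)
    mu = rng.gauss(sum(theta) / K, (s2_t / K) ** 0.5)
    # variances | rest: inverse-Gamma updates under IG(1, 1) priors
    s2_t = inv_gamma(1.0 + K / 2.0,
                     1.0 + sum((t - mu) ** 2 for t in theta) / 2.0)
    sse = sum((y[i][j] - theta[i]) ** 2 for i in range(K) for j in range(J))
    s2_e = inv_gamma(1.0 + K * J / 2.0, 1.0 + sse / 2.0)
    if it >= 500:                    # discard burn-in
        mus.append(mu)

mu_hat = sum(mus) / len(mus)         # posterior mean of mu
```

On simulated data the posterior mean of µ lands near the true value 2, and the error variance is pinned down tightly by the 160 observations.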
  • 65. The Revolution / Gelfand and Smith, 1990 — Garden of Eden: in the early 1990s, researchers found that Gibbs and then Metropolis–Hastings algorithms would crack almost any problem! A flood of papers followed applying MCMC: linear mixed models (Gelfand et al., 1990; Zeger and Karim, 1991; Wang et al., 1993, 1994); generalized linear mixed models (Albert and Chib, 1993); mixture models (Tanner and Wong, 1987; Diebolt and X., 1990, 1994; Escobar and West, 1993); changepoint analysis (Carlin et al., 1992); point processes (Grenander and Møller, 1994); &tc
  • 66. The Revolution (Gelfand and Smith, 1990): Garden of Eden (cont'd). Genomics (Stephens and Smith, 1993; Lawrence et al., 1993; Churchill, 1995; Geyer and Thompson, 1995), ecology (George and X, 1992; Dupuis, 1995), variable selection in regression (George and McCulloch, 1993), spatial statistics (Raftery and Banfield, 1991), longitudinal studies (Lange et al., 1992), &tc
  • 67.–73. The Revolution (Gelfand and Smith, 1990): [some of the] early theoretical advances. "It may well be remembered as the afternoon of the 11 Bayesians" [Clifford, 1993] Geyer and Thompson, 1992, relied on MCMC methods for ML estimation; Smith and Roberts, 1993, discussed convergence diagnostics and applications, incl. mixtures, for Gibbs and Metropolis–Hastings; Besag and Green, 1993, stated the desiderata for convergence and connected MCMC with auxiliary and antithetic variables; Tierney, 1994, laid out all of the assumptions needed to analyze the Markov chains and then developed their properties, in particular convergence of ergodic averages and central limit theorems; Liu, Wong and Kong, 1994, 95, analyzed the covariance structure of Gibbs sampling and formally established the validity of Rao–Blackwellization in Gibbs sampling; Mengersen and Tweedie, 1996, set the tone for the study of the speed of convergence of MCMC algorithms to the target distribution; Gilks, Clayton and Spiegelhalter, 1993; &tc...
  • 74. The Revolution: Convergence diagnostics. Can we really tell when a complicated Markov chain has reached equilibrium? Frankly, I doubt it. [Clifford, 1993] An explosion of methods: Gelman and Rubin (1991), Besag and Green (1992), Geyer (1992), Raftery and Lewis (1992), Cowles and Carlin (1996), coda, Brooks and Roberts (1998), &tc
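As one concrete instance of these diagnostics, here is a minimal sketch of the Gelman and Rubin potential scale reduction factor, applied to synthetic draws standing in for real MCMC output (the targets and seeds below are illustrative assumptions):

```python
import random
import statistics

def gelman_rubin(chains):
    """Potential scale reduction factor R-hat from several equal-length chains."""
    n = len(chains[0])
    means = [statistics.fmean(c) for c in chains]
    within = statistics.fmean(statistics.variance(c) for c in chains)  # W
    between_over_n = statistics.variance(means)      # = B / n in usual notation
    var_plus = (n - 1) / n * within + between_over_n  # pooled variance estimate
    return (var_plus / within) ** 0.5

random.seed(3)
# Well-mixed case: four chains all sampling the same N(0, 1) target.
good = [[random.gauss(0, 1) for _ in range(1000)] for _ in range(4)]
# Poorly-mixed case: two chains stuck around different modes.
bad = [[random.gauss(0, 1) for _ in range(1000)],
       [random.gauss(5, 1) for _ in range(1000)]]

r_good = gelman_rubin(good)   # close to 1: no evidence against convergence
r_bad = gelman_rubin(bad)     # well above 1: the chains disagree
```

A value of R-hat close to 1 is necessary but, as Clifford's quip suggests, not sufficient for convergence.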
  • 76. After the Revolution (Particle systems): Particles, again. Iterating importance sampling is about as old as Monte Carlo methods themselves! [Hammersley and Morton, 1954; Rosenbluth and Rosenbluth, 1955] Found in the molecular simulation literature of the 1950s with self-avoiding random walks, and in signal processing [Marshall, 1965; Handschin and Mayne, 1969]. Use of the term "particle" dates back to Kitagawa (1996), and Carpenter et al. (1997) coined the term "particle filter".
  • 77. After the Revolution (Particle systems): Bootstrap filter and sequential Monte Carlo. Gordon, Salmond and Smith (1993) introduced the bootstrap filter which, while formally connected with importance sampling, involves past simulations and possible MCMC steps (Gilks and Berzuini, 2001). Sequential imputation was developed in Kong, Liu and Wong (1994), while Liu and Chen (1995) first formally pointed out the importance of resampling in "sequential Monte Carlo", a term they coined.
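A minimal sketch of the bootstrap filter on a linear Gaussian AR(1) state-space model (the model and all settings below are illustrative assumptions): particles are propagated through the state equation, weighted by the observation likelihood, and resampled, as in Gordon, Salmond and Smith's scheme.

```python
import math
import random

random.seed(7)
T, N = 100, 500                       # time steps, particles
phi, sd_state, sd_obs = 0.9, 0.5, 1.0

# Simulate a hidden AR(1) state x_t and noisy observations y_t = x_t + v_t.
x, xs, ys = 0.0, [], []
for _ in range(T):
    x = phi * x + random.gauss(0, sd_state)
    xs.append(x)
    ys.append(x + random.gauss(0, sd_obs))

particles = [random.gauss(0, 1) for _ in range(N)]
filtered = []
for y in ys:
    # Propagate through the state equation (the "bootstrap" proposal)...
    particles = [phi * p + random.gauss(0, sd_state) for p in particles]
    # ...weight by the observation likelihood N(y; x, sd_obs^2)...
    weights = [math.exp(-0.5 * ((y - p) / sd_obs) ** 2) for p in particles]
    # ...and resample (multinomial) to fight weight degeneracy.
    particles = random.choices(particles, weights=weights, k=N)
    filtered.append(sum(particles) / N)  # filtered mean E[x_t | y_{1:t}]

# Mean absolute error of the filtered means against the hidden states.
mae = sum(abs(f - s) for f, s in zip(filtered, xs)) / T
```

Resampling at every step is the simplest choice; Liu and Chen's point is precisely that when and how to resample matters.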
  • 78. After the Revolution (Particle systems): pMC versus pMCMC. Recycling of past simulations is legitimate to build better importance sampling functions, as in population Monte Carlo [Iba, 2000; Cappé et al., 2004; Del Moral et al., 2007]. Recent synthesis by Andrieu, Doucet, and Holenstein (2010) using particles to build an evolving MCMC kernel p̂_θ(y_{1:T}) in state space models p(x_{1:T})p(y_{1:T}|x_{1:T}), along with Andrieu and Roberts' (2009) use of approximations in MCMC acceptance steps [Kennedy and Kuti, 1985].
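The use of approximations in acceptance steps can be sketched, in the pseudo-marginal spirit of Andrieu and Roberts (2009), on a toy latent-variable model (the model, prior, and tuning constants below are our assumptions): an unbiased importance-sampling estimate of the likelihood replaces the exact likelihood, and the estimate attached to the current state is stored and reused in the acceptance ratio.

```python
import math
import random

random.seed(11)
# Toy model (assumed): y | z, theta ~ N(theta + z, 1), z ~ N(0, 1),
# so marginally y ~ N(theta, 2); data simulated with theta = 2.
ys = [random.gauss(2.0, math.sqrt(2.0)) for _ in range(30)]

def normpdf(x, mu, sd):
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def lhat(theta, M=5):
    """Unbiased likelihood estimate: average the conditional density
    p(y | z, theta) over M fresh draws z ~ N(0, 1), per observation."""
    est = 1.0
    for y in ys:
        est *= sum(normpdf(y, theta + random.gauss(0, 1), 1.0)
                   for _ in range(M)) / M
    return est

def prior(theta):                    # assumed N(0, 10^2) prior on theta
    return normpdf(theta, 0.0, 10.0)

theta, like = 0.0, lhat(0.0)
draws = []
for _ in range(5000):
    prop = theta + random.gauss(0, 0.5)
    like_prop = lhat(prop)           # fresh estimate for the proposal only
    # Pseudo-marginal MH ratio: the *stored* estimate stands in for L(theta).
    if random.random() < (prior(prop) * like_prop) / (prior(theta) * like):
        theta, like = prop, like_prop  # keep the accepted estimate
    draws.append(theta)

theta_hat = sum(draws[1000:]) / len(draws[1000:])  # posterior mean, near 2
```

Because the likelihood estimate is unbiased and carried along with the state, the chain targets the exact posterior despite never evaluating the true likelihood.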
  • 79. After the Revolution: Reversible jump. Generally considered as the second Revolution. The formalisation of a Markov chain moving across models and parameter spaces allows for the Bayesian processing of a wide variety of models and underlies the success of Bayesian model choice. The definition of a proper balance condition on cross-model Markov kernels gives a generic setup for exploring variable dimension spaces, even when the number of models under comparison is infinite. [Green, 1995]
  • 81. After the Revolution: Perfect sampling. The seminal paper of Propp and Wilson (1996) showed how to use MCMC methods to produce an exact (or perfect) simulation from the target. An outburst of papers followed, particularly from Jesper Møller and coauthors, but the excitement somehow dried out [except in dedicated areas], as the construction of perfect samplers is hard and coalescence times can be very high... [Møller and Waagepetersen, 2003]
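Coupling from the past, the engine of Propp and Wilson's method, can be sketched on a monotone toy chain, here a reflecting random walk on {0, ..., K} with holding at the boundaries, whose stationary distribution is uniform (this toy chain is our illustrative assumption): coupled chains are started from the top and bottom states at times ever further in the past, reusing the same randomness, and the common value reached at time 0 after coalescence is an exact draw from the target.

```python
import random

def cftp(K=10, seed=42):
    """Exact draw from the uniform stationary law of a reflecting walk on
    {0, ..., K} via monotone coupling from the past (Propp and Wilson)."""
    rng = random.Random(seed)
    us = []           # us[k] is the uniform driving the update at time -(k+1)
    T = 1
    while True:
        while len(us) < T:            # extend the randomness into the past,
            us.append(rng.random())   # keeping earlier draws fixed
        lo, hi = 0, K                 # bottom and top chains at time -T
        for t in range(T, 0, -1):     # sweep forward from -T to -1
            u = us[t - 1]
            lo = min(lo + 1, K) if u < 0.5 else max(lo - 1, 0)
            hi = min(hi + 1, K) if u < 0.5 else max(hi - 1, 0)
        if lo == hi:                  # coalescence: the value at time 0 is
            return lo                 # an exact draw from the target
        T *= 2                        # otherwise, start further in the past

samples = [cftp(10, seed=s) for s in range(1000)]
mean = sum(samples) / len(samples)    # should be near K/2 = 5
```

The update function is monotone (both chains use the same uniform), so coalescence of the extreme chains guarantees coalescence of all starting states, which is what makes the output exact rather than approximate.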
  • 82. After the Revolution: Envoi. To be continued... standing on the shoulders of giants