This document provides a brief history of Markov chain Monte Carlo (MCMC) methods. It describes how MCMC originated from early Monte Carlo methods developed during World War II to simulate nuclear weapons. The first true MCMC algorithm, known as the Metropolis algorithm, was published in 1953 and aimed to sample from complicated probability distributions by constructing a Markov chain with a desired stationary distribution. However, MCMC methods did not gain widespread use in statistics until the late 1980s and early 1990s, partly due to lack of computing power and understanding of Markov chains.
Why should you care about Markov Chain Monte Carlo methods?
→ They are in the list of "Top 10 Algorithms of 20th Century"
→ They allow you to make inference with Bayesian Networks
→ They are used everywhere in Machine Learning and Statistics
Markov Chain Monte Carlo methods are a class of algorithms used to sample from complicated distributions, typically posterior distributions in Bayesian Networks (Belief Networks).
These slides cover the following topics.
→ Motivation and Practical Examples (Bayesian Networks)
→ Basic Principles of MCMC
→ Gibbs Sampling
→ Metropolis–Hastings
→ Hamiltonian Monte Carlo
→ Reversible-Jump Markov Chain Monte Carlo
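As a taste of the Metropolis–Hastings material listed above, here is a minimal sketch (not taken from the slides; target and step size are illustrative choices) of a random-walk Metropolis sampler drawing from a standard normal:

```python
import math
import random

random.seed(42)

def log_target(x):
    # Log-density of the target, up to an additive constant (standard normal)
    return -0.5 * x * x

def metropolis_hastings(n_samples, step=1.0, x0=0.0):
    x = x0
    samples = []
    for _ in range(n_samples):
        proposal = x + random.gauss(0.0, step)           # symmetric random walk
        log_alpha = log_target(proposal) - log_target(x)
        if math.log(random.random()) < log_alpha:        # accept w.p. min(1, alpha)
            x = proposal
        samples.append(x)
    return samples

samples = metropolis_hastings(20000)
mcmc_mean = sum(samples) / len(samples)
mcmc_var = sum((s - mcmc_mean) ** 2 for s in samples) / len(samples)
```

Because only the ratio of target densities appears, the normalizing constant is never needed, which is exactly why the method suits intractable posteriors.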
This document provides an introduction to genetic algorithms and genetic programming. It discusses how genetic algorithms are inspired by natural selection and genetics, using operations like crossover and mutation to evolve solutions to problems. It also outlines the basic steps of a genetic programming framework, including generating an initial population randomly, evaluating fitness, selecting parents, performing crossover and mutation to create offspring, and iterating until a solution is found. Representation using syntax trees and example genetic operators like single point crossover are described.
This document provides an introduction to Bayesian analysis and Metropolis-Hastings Markov chain Monte Carlo (MCMC). It explains the foundations of Bayesian analysis and how MCMC sampling methods like Metropolis-Hastings can be used to draw samples from posterior distributions that are intractable. The Metropolis-Hastings algorithm works by constructing a Markov chain with the target distribution as its stationary distribution. The document provides an example of using MCMC to perform linear regression in a Bayesian framework.
This document discusses the relationships between various performance measures used to evaluate and compare risk prediction models, including the net reclassification improvement (NRI) and decision-analytic measures. It explains that the NRI at a risk threshold T equals the difference in sensitivity plus specificity (the Youden index) between two models. Decision-analytic measures such as net benefit and relative utility are preferable to the NRI because they account for the different costs of misclassifying patients with and without the outcome. The document also presents a case study comparing an ovarian tumor risk prediction model with and without the CA-125 tumor marker, finding a negative NRI but positive net benefit and relative utility at threshold T = 5%.
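The identity above can be made concrete with a small sketch (the toy data here is invented for illustration, not from the case study): NRI(T) is the change in sensitivity plus the change in specificity at a single risk threshold.

```python
def nri_at_threshold(y, p_old, p_new, T):
    # A patient is classified "high risk" when the predicted probability >= T.
    events = [(po, pn) for yi, po, pn in zip(y, p_old, p_new) if yi == 1]
    nonevents = [(po, pn) for yi, po, pn in zip(y, p_old, p_new) if yi == 0]
    sens_old = sum(po >= T for po, pn in events) / len(events)
    sens_new = sum(pn >= T for po, pn in events) / len(events)
    spec_old = sum(po < T for po, pn in nonevents) / len(nonevents)
    spec_new = sum(pn < T for po, pn in nonevents) / len(nonevents)
    # NRI(T) = change in sensitivity + change in specificity
    return (sens_new - sens_old) + (spec_new - spec_old)

y     = [1, 1, 1, 0, 0, 0]
p_old = [0.9, 0.4, 0.2, 0.6, 0.3, 0.1]
p_new = [0.8, 0.6, 0.3, 0.4, 0.2, 0.1]
nri = nri_at_threshold(y, p_old, p_new, T=0.5)   # = 2/3 for this toy data
```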
This document provides an overview of online random forest, a machine learning algorithm that can handle streaming data. It discusses how traditional supervised learning algorithms require a static data matrix as input, while streaming data has an explicit time order and new observations can arrive at any time. Online random forest allows trees to learn incrementally from each new observation and drop trees from the forest that perform poorly, enabling it to adapt to changes in important predictors over time. It also scales well by distributing tree computations across actors in a distributed system.
An Introduction to Causal Discovery, a Bayesian Network Approach (COST Action BM1006)
This gene ranked 152nd based on correlation alone. Using causal reasoning and Bayesian networks, the researchers were able to better identify genes that could causally influence the disease state, rather than just being correlated. This integrative approach combining genetic and gene expression data provided more insights into disease causality than traditional correlation-based methods alone.
Principal component analysis (PCA) is a technique used to simplify complex datasets. It works by converting a set of observations of possibly correlated variables into a set of linearly uncorrelated variables called principal components. PCA identifies patterns in data and expresses the data in such a way as to highlight their similarities and differences. The main implementations of PCA are eigenvalue decomposition and singular value decomposition. PCA is useful for data compression, reducing dimensionality for visualization and building predictive models. However, it works best for data that follows a multidimensional normal distribution.
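Of the two implementations mentioned, singular value decomposition is usually preferred numerically. A minimal sketch (illustrative synthetic data, not from the document):

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy data: two correlated variables (the second is a noisy copy of the first)
x = rng.normal(size=(500, 1))
X = np.hstack([x, x + 0.1 * rng.normal(size=(500, 1))])

Xc = X - X.mean(axis=0)                  # center each variable
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = Xc @ Vt.T                       # coordinates in the principal basis
explained = S**2 / np.sum(S**2)          # fraction of variance per component
```

For strongly correlated data like this, the first component captures nearly all the variance, which is what makes PCA useful for compression and visualization.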
1) OpenDA is a data assimilation toolbox that allows for both data assimilation and model calibration in a generic way.
2) It has an object oriented design that allows components like models and algorithms to be easily exchanged.
3) OpenDA supports parallel computing concepts and various ways of integrating models, including keeping models as "black boxes".
The document summarizes trigonometric addition formulas and related formulas. It provides proofs of the formulas using properties of coordinates on the unit circle. Specifically, it proves formulas for cosine, sine, and tangent of the sum or difference of two angles α and β using the x-y coordinates of points on two superimposed unit circles with angles of α, β, α+β, and -β.
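For reference, the addition formulas the document proves are:

```latex
\begin{aligned}
\cos(\alpha \pm \beta) &= \cos\alpha\cos\beta \mp \sin\alpha\sin\beta \\
\sin(\alpha \pm \beta) &= \sin\alpha\cos\beta \pm \cos\alpha\sin\beta \\
\tan(\alpha \pm \beta) &= \frac{\tan\alpha \pm \tan\beta}{1 \mp \tan\alpha\tan\beta}
\end{aligned}
```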
This document summarizes a lecture on discrete structures. It discusses logical equivalences, De Morgan's laws, tautologies and contradictions. It also covers laws of logic like distribution, identity and negation. Conditional propositions are defined as relating two propositions with "if-then". Truth tables are used to check logical equivalence and interpret conditionals. The contrapositive and biconditional are also introduced.
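The truth-table check for logical equivalence described in the lecture is easy to mechanize. A small sketch (the encodings below are standard, not taken from the lecture):

```python
from itertools import product

def equivalent(f, g, n_vars):
    # Two formulas are logically equivalent iff they agree on every
    # row of the truth table.
    return all(f(*row) == g(*row)
               for row in product([True, False], repeat=n_vars))

# De Morgan's law: not (p and q)  ==  (not p) or (not q)
de_morgan = equivalent(lambda p, q: not (p and q),
                       lambda p, q: (not p) or (not q), 2)

# Contrapositive: (p -> q) == (not q -> not p), writing a -> b as (not a) or b
contrapositive = equivalent(lambda p, q: (not p) or q,
                            lambda p, q: q or (not p), 2)
```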
This document provides an overview of a lecture on analyzing pathogens using BEAST (Bayesian Evolutionary Analysis Sampling Trees). The lecture covers what makes pathogens special from an evolutionary perspective, an introduction to Bayesian analysis and Markov chain Monte Carlo (MCMC) methods, an overview of the BEAST software package and its components, and a demonstration of running a BEAST analysis. The lecture discusses building phylogenetic trees incorporating sample time data and estimating parameters like substitution rates and population dynamics from molecular sequences using Bayesian methods in BEAST.
The document discusses the history and development of Monte Carlo simulation methods in financial engineering. Some key points:
1) Monte Carlo simulation techniques originated from games of chance and probabilistic concepts in the 17th century. They were later applied to calculating integrals and solving differential equations.
2) In the 1940s/50s, the techniques were developed and applied at Los Alamos National Laboratory, coining the term "Monte Carlo."
3) In the 1970s, Monte Carlo methods became widely used in finance, with Black-Scholes options pricing and models simulating random asset price movements. They allow calculating expected option payoffs and fair values.
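The option-pricing use described in point 3 can be sketched in a few lines (parameters are illustrative; the risk-neutral geometric Brownian motion terminal price is standard, and the Black–Scholes value for these inputs is about 10.45):

```python
import math
import random

random.seed(7)

def mc_call_price(S0, K, r, sigma, T, n_paths):
    # Simulate terminal prices S_T = S0 * exp((r - sigma^2/2) T + sigma sqrt(T) Z)
    # and discount the average European call payoff max(S_T - K, 0).
    drift = (r - 0.5 * sigma ** 2) * T
    vol = sigma * math.sqrt(T)
    payoff_sum = 0.0
    for _ in range(n_paths):
        ST = S0 * math.exp(drift + vol * random.gauss(0.0, 1.0))
        payoff_sum += max(ST - K, 0.0)
    return math.exp(-r * T) * payoff_sum / n_paths

price = mc_call_price(S0=100.0, K=100.0, r=0.05, sigma=0.2, T=1.0,
                      n_paths=200000)
```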
Event link: http://www.meetup.com/NYC-Open-Data/events/161342472/
A free R workshop given by SupStat Inc at New York R user group and NYC Open Data Meetup group
Modeling of players' activity, by Michel Pierfitte, Director of Game Analytics ...
This document discusses key performance indicators (KPIs) for modeling player activity and retention in games. It summarizes different retention curves for lifetime, playtime, and purchasing behavior. Lifetime retention looks at how long players remain active over time while playtime retention examines total active time among long-term players. Revenue models are also presented that break down revenue per user into conversion rate, average payment, and purchasing frequency. Various benchmarks are provided for different game types and a number of important KPIs are highlighted for monitoring player progression and monetization.
This document provides an overview of spectral clustering. It begins with a review of clustering and introduces the similarity graph and graph Laplacian. It then describes the spectral clustering algorithm and interpretations from the perspectives of graph cuts, random walks, and perturbation theory. Practical details like constructing the similarity graph, computing eigenvectors, choosing the number of clusters, and which graph Laplacian to use are also discussed. The document aims to explain the mathematical foundations and intuitions behind spectral clustering.
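The core computation can be sketched compactly (synthetic two-blob data and a Gaussian kernel width chosen for illustration; this uses the unnormalized Laplacian, one of the variants the document compares):

```python
import numpy as np

rng = np.random.default_rng(1)
# Two well-separated 2-D blobs of 20 points each
X = np.vstack([rng.normal(0.0, 0.3, (20, 2)),
               rng.normal(3.0, 0.3, (20, 2))])

# Fully connected similarity graph with a Gaussian kernel
d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
W = np.exp(-d2 / 2.0)
np.fill_diagonal(W, 0.0)

# Unnormalized graph Laplacian L = D - W
L = np.diag(W.sum(axis=1)) - W

# The eigenvector of the second-smallest eigenvalue (the Fiedler vector)
# is nearly piecewise constant and splits the two clusters by sign.
eigvals, eigvecs = np.linalg.eigh(L)
labels = (eigvecs[:, 1] > 0).astype(int)
```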
This document introduces Monte Carlo and Markov chain Monte Carlo methods for numerical integration. These methods generate random points, rather than equally spaced points, to approximate integrals. The error of direct Monte Carlo depends on the square root of the number of points, whereas the error of other methods depends on the number of dimensions. Markov chain Monte Carlo generates a chain of configurations whose distribution matches the desired distribution by means of transition rules.
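The direct Monte Carlo estimator described above is just an average of function values at random points; its error shrinks like 1/sqrt(N) regardless of dimension. A one-dimensional sketch (the integrand is an arbitrary illustrative choice):

```python
import random

random.seed(0)

def mc_integral(f, n):
    # Estimate the integral of f over [0, 1] as the average of f at
    # n uniform random points; the error shrinks like 1/sqrt(n).
    return sum(f(random.random()) for _ in range(n)) / n

est = mc_integral(lambda x: x * x, 100000)   # true value is 1/3
```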
That's like, so random! Monte Carlo for Data Science, by Corey Chivers
1. Monte Carlo simulation can be used to understand obscure statistics, create your own statistics, avoid difficult math, understand inferences from data, and propagate uncertainty in complex models.
2. It allows running 'what if' scenarios, such as understanding how a surge in patients in one hospital unit would propagate to the rest of the hospital.
3. The talk introduces the concept of 'simudidactic', meaning to understand complex systems using randomization and computation to create models of real-world phenomena.
This document discusses error analysis for quasi-Monte Carlo methods. It introduces the trio error identity that decomposes the error into three terms: the variation of the integrand, the discrepancy of the sampling measure from the probability measure, and the alignment between the integrand and the difference between the measures. Several examples are provided to illustrate the identity, including integration over a reproducing kernel Hilbert space. The discrepancy term can be evaluated in O(n^2) operations and converges at different rates depending on the sampling method and properties of the integrand.
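In a much simpler setting than the trio identity, the convergence advantage of low-discrepancy sampling can be demonstrated with a van der Corput sequence (a standard quasi-Monte Carlo point set; the integrand is an illustrative choice, not from the document):

```python
def van_der_corput(n, base=2):
    # Radical-inverse (van der Corput) sequence: a classic
    # low-discrepancy point set on [0, 1).
    points = []
    for i in range(1, n + 1):
        q, bk, x = i, 1.0 / base, 0.0
        while q > 0:
            q, r = divmod(q, base)
            x += r * bk
            bk /= base
        points.append(x)
    return points

n = 4096
pts = van_der_corput(n)
qmc_est = sum(x * x for x in pts) / n        # integrates x^2 over [0, 1]
qmc_err = abs(qmc_est - 1.0 / 3.0)
```

For a smooth integrand like this, the quasi-Monte Carlo error decays close to O(log n / n), much faster than the O(1/sqrt(n)) of plain Monte Carlo.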
This document provides an introduction to Bayesian methods for theory, computation, inference and prediction. It discusses key concepts in Bayesian statistics including the likelihood principle, the likelihood function, Bayes' theorem, and using Markov chain Monte Carlo methods like the Metropolis-Hastings algorithm to perform posterior integration when closed-form solutions are not possible. Examples are provided on using Bayesian regression to model the relationship between salmon body length and egg mass while incorporating prior information. The summary concludes that the Bayesian approach provides a coherent way to quantify uncertainty and make predictions accounting for both aleatory and epistemic sources of variation.
This document provides an introduction to machine learning. It discusses that machine learning focuses on learning about processes in the world rather than just memorizing data. It also covers the main types of machine learning: supervised learning which learns mappings between examples and labels; unsupervised learning which learns structure from unlabeled examples; and reinforcement learning which learns to take actions to maximize rewards. The document explains that machine learning requires representing data as feature vectors and using models with optimization techniques to find parameters that generalize to new data rather than overfitting the training data.
Conditional Image Generation with PixelCNN Decoders
The document summarizes research on conditional image generation using PixelCNN decoders. It discusses how PixelCNNs sequentially predict pixel values rather than the whole image at once. Previous work used PixelRNNs, but these were slow to train. The proposed approach uses a Gated PixelCNN that removes blind spots in the receptive field by combining horizontal and vertical feature maps. It also conditions PixelCNN layers on class labels or embeddings to generate conditional images. Experimental results show the Gated PixelCNN outperforms PixelCNN and achieves performance close to PixelRNN on CIFAR-10 and ImageNet, while training faster. It can also generate portraits conditioned on embeddings of people.
Improving Variational Inference with Inverse Autoregressive Flow, by Tatsuya Shirakawa
These slides were created for a NIPS 2016 study meetup.
IAF and other related research are briefly explained.
paper:
Diederik P. Kingma et al., "Improving Variational Inference with Inverse Autoregressive Flow", 2016
https://papers.nips.cc/paper/6581-improving-variational-autoencoders-with-inverse-autoregressive-flow
Monte Carlo simulation is a statistical technique that uses random numbers and probability to simulate real-world processes. It was developed in the 1940s by scientists working on nuclear weapons research. Monte Carlo simulation provides approximate solutions to problems by running simulations many times. It allows for sensitivity analysis and scenario analysis. Some examples include estimating pi by randomly generating points within a circle, and approximating integrals by treating the area under a curve as a target for random darts. The technique provides probabilistic results and allows modeling of correlated inputs.
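The pi-estimation example mentioned above fits in a few lines (a standard textbook sketch, not taken from the document):

```python
import random

random.seed(123)

n = 100000
# A point (x, y) uniform on the unit square lands inside the quarter
# circle x^2 + y^2 <= 1 with probability pi/4.
inside = sum(1 for _ in range(n)
             if random.random() ** 2 + random.random() ** 2 <= 1.0)
pi_est = 4.0 * inside / n
```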
Von Neumann worked on cellular automata in the late 1940s and 1950s as an abstraction of self-replication. His ideas about the propagation of information from parent cells to later cycles of a cellular automaton could offer explanations for the geometry of the space-time grid, the limit on the speed of light, Heisenberg's uncertainty principle, the principles of quantum theory and relativity, the elementary particles of physics, why the universe is expanding, and more. If this simple mechanism could explain so many things, why did it not become a prominent field of research? The answer is simple but quite unexpected: one application of Von Neumann's postwar work on cellular automata was cryptography, so his results were classified and are still kept top secret by the US government. It is time, however, to look at this subject from a different point of view: can this mechanism be used to explain the physical universe? We are more interested in the secret of existence than in the encryption and decryption of text or data: where did all these galaxies, stars, and cosmological objects come from, when did it start, and what was it like at the beginning of time and space?
This document provides a summary of key developments in the foundations of quantum mechanics. It discusses Planck's discovery that led to defining Planck's constant h, which established that energy is quantized. Einstein's work on the photoelectric effect supported this and introduced the photon concept. Bohr used classical mechanics and energy quantization to develop his model of the hydrogen atom. The document outlines the revolutionary changes brought by quantum theory and its greater scope and applicability compared to classical physics. It provides context for understanding quantum mechanics from first principles.
1. The document introduces quantum mechanics and its importance in describing phenomena at the nanoscale and for systems where classical mechanics fails, such as atoms and molecules.
2. It discusses how quantum mechanics was developed due to failures of classical mechanics and outlines some early discoveries that contributed to quantum mechanics, such as Planck's blackbody radiation law and Bohr's model of the hydrogen atom.
3. The document focuses on energy quantization in quantum systems and uses the example of the quantized emission spectrum of hydrogen atoms to illustrate this phenomenon of discrete energy levels.
A Short Introduction to Quantum Information and Quantum Computation
This document provides an introduction to the field of quantum information and quantum computation. It discusses how quantum information builds upon fundamental principles of quantum mechanics, such as quantum superposition and entanglement, which allow quantum systems to encode and process information in ways not possible classically. Specifically, it introduces the concept of a quantum bit (qubit) which can represent superpositions of 0 and 1, exponential scaling of information with entangled states, and algorithms like Shor's algorithm that achieve an exponential speedup over classical computers. The document serves as an overview of the novel possibilities opened up by manipulating and observing individual quantum objects.
Eden by Wire: Webcameras and the Telepresent Landscape, by Thomas J. Campanella
Eden by Wire: Webcameras and the Telepresent Landscape
THOMAS J. CAMPANELLA
Thomas J. Campanella received his Ph.D. from the Department of Urban Studies and Planning at the Massachusetts Institute of Technology. Presently he is an Assistant Professor in the Department of City and Regional Planning at the University of North Carolina at Chapel Hill, where he teaches courses in the Theory and Practice of Urban Design, Making the American Urban Landscape, and Site Planning and Sustainable Design, among others. His most recent books include The Resilient City: How Modern Cities Recover from Disaster (2004) [co-authored with Lawrence J. Vale], Republic of Shade: New England and the American Elm (2003), and Cities from the Sky: An Aerial Portrait of America (2001). The following is an excerpt from Dr. Campanella's article "Eden by Wire," which originally appeared in The Robot in the Garden: Telerobotics and Telepistemology in the Age of the Internet (2000), edited by Ken Goldberg and published by MIT Press.

Getting Started
Have you ever looked up and noticed all around you—in and on buildings, on streets, in stores, banks, and lobbies—cameras or webcameras? Campanella refers to these cameras as "a set of wired eyes, a digital extension of the human faculty of vision.' But what are these cameras looking at? Are they necessary for our security or are they an intrusion into our lives? Who is watching us and what records are they keeping for what purposes? What do these webcameras mean for privacy and individual rights? How important are these cameras in an age of terrorism and fear? How do you feel about constantly being watched?
Hello, and welcome to my webcam; it points out of my window here in Cambridge, and looks toward the centre of town. Wake up to find out that you are the eyes of the World.
The sun never sets on the cyberspatial empire; somewhere on the globe, at any hour, an electronic retina is receiving light, converting sunbeams into a stream of ones and zeros. Since the popularization of the Internet several years ago, hundreds of "webcameras" have gone live, a globe-spanning matrix of electro-optical devices serving images to the World Wide Web. The scenes they afford range from the sublime to the ridiculous, from toilets to the Statue of Liberty. Among the most compelling are those webcameras trained on urban and rural landscapes, which enable the remote observation of distant outdoor scenes in real or close to real time. Webcameras indeed constitute something of a grassroots global telepresence project. William J. Mitchell has described the Internet as "a worldwide, time-zone-spanning optic nerve with electronic eyeballs at its endpoints." Webcameras are those eyeballs. If the Internet and World Wide Web represent the augmentation of collective memory, then webcameras are a set of wired eyes, a digital extension of the human faculty of vision.
Before the advent of webcameras, the synchronous observation of ...
Eden by Wire: Webcameras and the Telepresent Landscape, by Thomas J. Campanella
This document summarizes Thomas Campanella's article "Eden by Wire: Webcameras and the Telepresent Landscape". It discusses how webcameras have enabled remote observation of distant places in real-time over the internet, shrinking the vastness of geographic distance. Webcameras represent an expansion of our personal space-time envelope by allowing us to view hundreds of destinations at any hour. While not a perfect substitute for direct experience, webcameras introduce a degree of physical sight and presence into the otherwise disembodied digital world, making us remotely present in faraway places.
This document provides an overview of quantum computers, including their advantages over classical computers, key concepts like superposition and entanglement, and the history and current state of quantum computing research and development. It discusses how quantum computers work using quantum bits rather than binary bits to store information, and how companies like D-Wave are developing quantum processors. The timeline details major advances, from early theoretical work in the 1970s-1980s to experimental demonstrations of quantum gates and algorithms in the 1990s-2000s to current multi-qubit systems being researched.
This is a short presentation for a 15-minute talk at Bayesian Inference for Stochastic Processes 7, on the SMC^2 algorithm.
http://arxiv.org/abs/1101.1528
The 2022 Nobel Prize in Physics was awarded to Alain Aspect, John Clauser, and Anton Zeilinger for their groundbreaking experimental work on quantum entanglement and violations of Bell's inequalities. John Clauser performed the first conclusive experiment in 1972 showing violations of Bell's inequalities. Alain Aspect then designed experiments in the 1980s enforcing stricter locality conditions. Anton Zeilinger demonstrated quantum teleportation in 1997 and performed another key Bell violation experiment in 1998. Together, their work confirmed the predictions of quantum mechanics and ruled out local hidden variable theories, resolving a decades-long debate between Einstein and Bohr. This established the foundations for the rapidly growing field of quantum information science.
1. Quantum entanglement describes a phenomenon where two quantum particles interact in such a way that they become linked regardless of distance, so that measuring one particle instantly affects the state of the other.
2. Einstein was critical of quantum mechanics and its implications of "spooky action at a distance," which led to the development of experiments to test theories of quantum entanglement.
3. Repeated experiments confirmed the existence of quantum entanglement and disproved Einstein's theories, showing that entangled particles are truly linked regardless of distance.
This document discusses differentially private distributed Bayesian linear regression with Markov chain Monte Carlo (MCMC) methods. It proposes adding noise to the summaries (S) and coefficients (z) of local linear regression models on different devices to provide differential privacy. Gibbs sampling is used to simulate the genuine posterior distribution over the linear model parameters (theta, sigma_y, Sigma_x, z1:J, S1:J) in a distributed manner while maintaining privacy. Alternative approaches like exploiting approximate posteriors from all devices or learning iteratively are also mentioned.
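The private, distributed scheme in the paper is specific, but the Gibbs sampling machinery it relies on can be illustrated generically (this sketch is not the paper's algorithm): each variable is drawn in turn from its full conditional given the others. For a bivariate normal with correlation rho, both full conditionals are known in closed form.

```python
import random

random.seed(5)

def gibbs_bivariate_normal(n, rho):
    # Alternately draw each coordinate from its full conditional:
    # x | y ~ N(rho * y, 1 - rho^2), and symmetrically for y | x.
    sd = (1.0 - rho * rho) ** 0.5
    x = y = 0.0
    draws = []
    for _ in range(n):
        x = random.gauss(rho * y, sd)
        y = random.gauss(rho * x, sd)
        draws.append((x, y))
    return draws

draws = gibbs_bivariate_normal(20000, rho=0.8)
m = len(draws)
mean_x = sum(x for x, _ in draws) / m
cov = sum((x - mean_x) * (y - sum(b for _, b in draws) / m)
          for x, y in draws) / m
var_x = sum((x - mean_x) ** 2 for x, _ in draws) / m
var_y = sum((y - sum(b for _, b in draws) / m) ** 2 for _, y in draws) / m
corr = cov / (var_x * var_y) ** 0.5
```

The empirical correlation of the chain recovers the target's rho, confirming the sampler simulates the joint distribution despite only ever using one-dimensional conditionals.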
This document discusses mixture models and approximations to computing model evidence. It contains:
1) An overview of mixtures of distributions and common priors used for mixtures.
2) Approximations to computing marginal likelihoods or model evidence using Chib's representation and Rao-Blackwellization. Permutations are used to address label switching issues.
3) Methods for more efficient sampling for computing model evidence, including iterative bridge sampling and dual importance sampling with approximations to reduce the number of permutations considered.
Sequential Monte Carlo is also briefly mentioned as an alternative approach.
This document describes the adaptive Restore algorithm, a non-reversible Markov chain Monte Carlo method. It begins with an overview of the Restore process, which uses regenerations from an underlying diffusion or jump process to construct a Markov chain with a given target distribution. The adaptive Restore process enriches this by allowing the regeneration distribution to adapt over time; it converges almost surely to the minimal regeneration distribution. Parameters such as the initial regeneration distribution and regeneration rates are discussed. Examples are provided for the adaptive Brownian Restore algorithm and for calibrating its parameters.
This document summarizes techniques for approximating marginal likelihoods and Bayes factors, which are important quantities in Bayesian inference. It discusses Geyer's 1994 logistic regression approach, links to bridge sampling, and how mixtures can be used as importance sampling proposals. Specifically, it shows how optimizing the logistic pseudo-likelihood relates to the bridge sampling optimal estimator. It also discusses non-parametric maximum likelihood estimation based on simulations.
This document discusses Bayesian restricted likelihood methods for situations where the likelihood cannot be fully trusted. It presents several approaches including empirical likelihood, Bayesian empirical likelihood, using insufficient statistics, approximate Bayesian computation (ABC), and MCMC on manifolds. The key ideas are developing Bayesian tools that are robust to model misspecification by questioning the likelihood, prior, and other assumptions.
This document discusses various methods for approximating marginal likelihoods and Bayes factors, including:
1. Geyer's 1994 logistic regression approach for approximating marginal likelihoods using importance sampling.
2. Bridge sampling and its connection to Geyer's approach. Optimal bridge sampling requires knowledge of unknown normalizing constants.
3. Using mixtures of importance distributions and the target distribution as proposals to estimate marginal likelihoods through Rao-Blackwellization. This connects to bridge sampling estimates.
4. The document discusses various methods for approximating marginal likelihoods and comparing hypotheses using Bayes factors. It outlines the historical development and connections between different approximation techniques.
1. The document discusses approximate Bayesian computation (ABC), a technique used when the likelihood function is intractable. ABC works by simulating parameters from the prior and simulating data, rejecting simulations that are not close to the observed data based on a tolerance level.
2. Random forests can be used in ABC to select informative summary statistics from a large set of possibilities and estimate parameters. The random forests classify simulations as accepted or rejected based on the summaries, implicitly selecting important summaries.
3. Calibrating the tolerance level in ABC is important but difficult, as it determines how close simulations must be to the observed data. Methods discussed include using quantiles of prior predictive simulations or asymptotic convergence properties.
The document summarizes Approximate Bayesian Computation (ABC). It discusses how ABC provides a way to approximate Bayesian inference when the likelihood function is intractable or too computationally expensive to evaluate directly. ABC works by simulating data under different parameter values and accepting simulations that are close to the observed data according to a distance measure and tolerance level. Key points discussed include:
- ABC provides an approximation to the posterior distribution by sampling from simulations that fall within a tolerance of the observed data.
- Summary statistics are often used to reduce the dimension of the data and improve the signal-to-noise ratio when applying the tolerance criterion.
- Random forests can help select informative summary statistics and provide semi-automated ABC
This document describes a new method called component-wise approximate Bayesian computation (ABCG or ABC-Gibbs) that combines approximate Bayesian computation (ABC) with Gibbs sampling. ABCG aims to more efficiently explore parameter spaces when the number of parameters is large. It works by alternately sampling each parameter from its ABC-approximated conditional distribution given current values of other parameters. The document provides theoretical analysis showing ABCG converges to a stationary distribution under certain conditions. It also presents examples demonstrating ABCG can better separate estimates from the prior compared to simple ABC, especially for hierarchical models.
ABC stands for approximate Bayesian computation. It is a method for performing Bayesian inference when the likelihood function is intractable or impossible to evaluate directly. ABC produces samples from an approximate posterior distribution by simulating parameter and summary statistic values that match the observed summary statistics within a tolerance level. The choice of summary statistics is important but difficult, as there is typically no sufficient statistic. Several strategies have been developed for selecting good summary statistics, including using random forests or the Lasso to evaluate and select from a large set of potential summaries.
The document describes a new method called component-wise approximate Bayesian computation (ABC) that combines ABC with Gibbs sampling. It aims to improve ABC's ability to efficiently explore parameter spaces when the number of parameters is large. The method works by alternating sampling from each parameter's ABC posterior conditional distribution given current values of other parameters and the observed data. The method is proven to converge to a stationary distribution under certain assumptions, especially for hierarchical models where conditional distributions are often simplified. Numerical experiments on toy examples demonstrate the method can provide a better approximation of the true posterior than vanilla ABC.
1) Likelihood-free Bayesian experimental design is discussed as an intractable likelihood optimization problem, where the goal is to find the optimal design d that minimizes expected loss without using the full posterior distribution.
2) Several Bayesian tools are proposed to make the design problem more Bayesian, including Bayesian non-parametrics, annealing algorithms, and placing a posterior on the design d.
3) Gaussian processes are a default modeling choice for complex unknown functions in these problems, but their accuracy is difficult to assess and they may incur a dimension curse.
1. A Short History of Markov Chain Monte Carlo: Subjective Recollections from Incomplete Data
A Short History of Markov Chain Monte Carlo:
Subjective Recollections from Incomplete Data
Christian P. Robert and George Casella
Université Paris-Dauphine, IUF, & CREST
and University of Florida
April 2, 2011
2. In memoriam, Julian Besag, 1945–2010
4. Introduction
Markov Chain Monte Carlo (MCMC) methods have been around for almost
as long as Monte Carlo techniques, even though their impact on
Statistics was not truly felt until the late 1980s / early 1990s.
Contents: distinction between Metropolis-Hastings-based
algorithms and those related to Gibbs sampling, with a brief entry
into the "second-generation MCMC revolution".
6. Introduction
A few landmarks
Realization that Markov chains could be used in a wide variety of
situations only came to “mainstream statisticians” with Gelfand
and Smith (1990) despite earlier publications in the statistical
literature like Hastings (1970) and growing awareness in spatial
statistics (Besag, 1986)
Several reasons:
lack of computing machinery
lack of background on Markov chains
lack of trust in the practicality of the method
7. Before the revolution
Los Alamos
Bombs before the revolution
Monte Carlo methods were born in Los Alamos, New Mexico, during
WWII, mostly among physicists working on atomic bombs, eventually
producing the Metropolis algorithm in the early 1950s.
[Metropolis, Rosenbluth, Rosenbluth, Teller and Teller, 1953]
9. Monte Carlo genesis
Monte Carlo method usually traced to Ulam and von Neumann:
Stanislaw Ulam associates the idea with an intractable
combinatorial computation attempted in 1946 about "solitaire"
idea enthusiastically adopted by John von Neumann for
implementation on neutron diffusion
name "Monte Carlo" suggested by Nicholas Metropolis
[Eckhardt, 1987]
14. Monte Carlo with computers
Very close "coincidence" with the appearance of the very first
computer, ENIAC, born Feb. 1946, on which von Neumann
implemented Monte Carlo in 1947
The same year, Ulam and von Neumann (re)invented inversion and
accept-reject techniques
In 1949, very first symposium on Monte Carlo and very first paper
[Metropolis and Ulam, 1949]
15. Before the revolution
Metropolis et al., 1953
The Metropolis et al. (1953) paper
Very first MCMC algorithm, associated with the second computer,
MANIAC, at Los Alamos, early 1952.
Besides Metropolis, Arianna W. Rosenbluth, Marshall N. Rosenbluth,
Augusta H. Teller, and Edward Teller contributed to the creation of
the Metropolis algorithm...
16. Motivating problem
Computation of integrals of the form

I = \frac{\int F(p,q)\, \exp\{-E(p,q)/kT\}\, dp\, dq}{\int \exp\{-E(p,q)/kT\}\, dp\, dq},

with energy E defined as

E(p,q) = \frac{1}{2} \sum_{i=1}^{N} \sum_{\substack{j=1 \\ j \neq i}}^{N} V(d_{ij}),

with N the number of particles, V a potential function, and d_{ij}
the distance between particles i and j.
17. Boltzmann distribution
Boltzmann distribution \exp\{-E(p,q)/kT\}, parameterised by the
temperature T, k being the Boltzmann constant, with normalisation
factor

Z(T) = \int \exp\{-E(p,q)/kT\}\, dp\, dq

not available in closed form.
18. Computational challenge
Since p and q are 2N -dimensional vectors, numerical integration is
impossible
Plus, standard Monte Carlo techniques fail to correctly
approximate I: exp{−E(p, q)/kT } is very small for most
realizations of random configurations (p, q) of the particle system.
19. Metropolis algorithm
Consider a random walk modification of the N particles: for each
1 ≤ i ≤ N, values

x_i' = x_i + \alpha \xi_{1i} \quad \text{and} \quad y_i' = y_i + \alpha \xi_{2i}

are proposed, where both \xi_{1i} and \xi_{2i} are uniform U(−1, 1).
The energy difference between the new and the previous configuration
is \Delta E, and the new configuration is accepted with probability

1 \wedge \exp\{-\Delta E / kT\},

and otherwise the previous configuration is replicated*

* counting one more time in the average of the F(p_t, q_t)'s over the
τ moves of the random walk.
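The update rule above translates directly into code. A minimal one-dimensional sketch, with an illustrative energy function and step size rather than the original N-particle system:

```python
import math
import random

def metropolis(energy, x0, alpha, n_steps, kT=1.0, seed=0):
    """Random-walk Metropolis: propose x' = x + alpha * xi, xi ~ U(-1, 1),
    accept with probability min(1, exp(-(E(x') - E(x)) / kT))."""
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    chain = []
    for _ in range(n_steps):
        x_new = x + alpha * rng.uniform(-1.0, 1.0)
        e_new = energy(x_new)
        # Downhill moves are always accepted, uphill moves with
        # probability exp(-dE/kT); on rejection the previous state is
        # replicated, i.e. counted one more time in the average.
        if e_new <= e or rng.random() < math.exp(-(e_new - e) / kT):
            x, e = x_new, e_new
        chain.append(x)
    return chain

# Toy target: Boltzmann weight exp(-E(x)) with E(x) = x**2 / 2,
# i.e. a standard normal distribution.
chain = metropolis(lambda x: 0.5 * x * x, x0=0.0, alpha=1.0, n_steps=50_000)
mean = sum(chain) / len(chain)
var = sum((c - mean) ** 2 for c in chain) / len(chain)
```

With this toy energy the empirical mean and variance of the chain approach 0 and 1, the moments of the standard normal target.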
22. Convergence
Validity of the algorithm established by proving
1. irreducibility
2. ergodicity, that is convergence to the stationary distribution.
Second part obtained via discretization of the space: Metropolis et
al. note that the proposal is reversible, then establish that
exp{−E/kT } is invariant.
Application to the specific problem of the rigid-sphere collision
model. The number of iterations of the Metropolis algorithm
seems to be limited: 16 steps for burn-in and 48 to 64 subsequent
iterations (that still required four to five hours on the Los Alamos
MANIAC).
23. Physics and chemistry
The method of Markov chain Monte Carlo immediately
had wide use in physics and chemistry.
[Geyer & Thompson, 1992]
Hammersley and Handscomb, 1967
Piekaar and Clarenburg, 1967
Kennedy and Kutil, 1985
Sokal, 1989
&tc...
24. Physics and chemistry
Statistics has always been fuelled by energetic mining of
the physics literature.
[Clifford, 1993]
26. Hastings, 1970
A fair generalisation
In Biometrika 1970, Hastings defines MCMC methodology for
finite and reversible Markov chains, the continuous case being
discretised:
Generic acceptance probability for a move from state i to state j is

\alpha_{ij} = \frac{s_{ij}}{1 + \dfrac{\pi_i\, q_{ij}}{\pi_j\, q_{ji}}},

where s_{ij} is a symmetric function.
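A small numeric sketch of this generic form, showing how standard choices of the symmetric function s_ij recover Barker's and Metropolis's acceptance probabilities (the function names and the example values of π and q are illustrative):

```python
def hastings_alpha(s, pi_i, pi_j, q_ij, q_ji):
    """Hastings's generic acceptance alpha_ij = s_ij / (1 + t), with
    t = (pi_i * q_ij) / (pi_j * q_ji) and s a symmetric function of (i, j)."""
    t = (pi_i * q_ij) / (pi_j * q_ji)
    return s(t) / (1.0 + t)

# Barker (1965): s_ij = 1, so alpha_ij = pi_j*q_ji / (pi_i*q_ij + pi_j*q_ji).
barker = lambda t: 1.0

# Metropolis et al. (1953): s_ij = 1 + min(t, 1/t), symmetric since
# t_ji = 1/t_ij, which recovers alpha_ij = min(1, pi_j*q_ji / (pi_i*q_ij)).
metropolis = lambda t: 1.0 + min(t, 1.0 / t)

# Symmetric proposal, pi_i = 2 * pi_j, hence t = 2:
a_barker = hastings_alpha(barker, 2.0, 1.0, 0.5, 0.5)          # 1/3
a_metropolis = hastings_alpha(metropolis, 2.0, 1.0, 0.5, 0.5)  # 1/2
```

In this example the Metropolis choice yields the larger acceptance probability, consistent with Peskun's later ordering result.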
27. State of the art
Note
Generic form that encompasses both Metropolis et al. (1953) and
Barker (1965).
Peskun's ordering not yet discovered: Hastings mentions that little
is known about the relative merits of those two choices, even
though Metropolis's method may be preferable.
Warning against high rejection rates as indicative of a poor choice
of transition matrix, but no mention of the opposite pitfall of low
rejection rates.
28. What else?!
Items included in the paper are
a Poisson target with a ±1 random walk proposal;
a normal target with a uniform random walk proposal mixed with its
reflection (i.e. centered at −X(t) rather than X(t));
a multivariate target where Hastings introduces Gibbs sampling,
updating one component at a time and defining the composed
transition as satisfying the stationarity condition because each
component leaves the target invariant;
a reference to Erhman, Fosdick and Handscomb (1960) as a
preliminary if specific instance of this Metropolis-within-Gibbs
sampler;
an importance sampling version of MCMC;
some remarks about error assessment; and
a Gibbs sampler for random orthogonal matrices.
30. Three years later
Peskun (1973) compares Metropolis’ and Barker’s acceptance
probabilities and shows (again in a discrete setup) that Metropolis’
is optimal (in terms of the asymptotic variance of any empirical
average).
Proof direct consequence of Kemeny and Snell (1960) on
asymptotic variance. Peskun also establishes that this variance can
improve upon the iid case if and only if the eigenvalues of P − A
are all negative, when A is the transition matrix corresponding to
the iid simulation and P the transition matrix corresponding to the
Metropolis algorithm, but he concludes that the trace of P − A is
always positive.
32. Julian's early works (1)
Early 1970’s, Hammersley, Clifford, and Besag were working on the
specification of joint distributions from conditional distributions
and on necessary and sufficient conditions for the conditional
distributions to be compatible with a joint distribution.
[Hammersley and Clifford, 1971]
33. Julian's early works (1)
What is the most general form of the conditional probability
functions that define a coherent joint function? And what will the
joint look like?
[Besag, 1972]
34. Hammersley-Clifford theorem
Theorem (Hammersley-Clifford)
The joint distribution of a vector associated with a dependence
graph must be represented as a product of functions over the cliques
of the graph, i.e., of functions depending only on the components
indexed by the labels in the clique.
[Cressie, 1993; Lauritzen, 1996]
35. Hammersley-Clifford theorem
Theorem (Hammersley-Clifford)
A probability distribution P with positive and continuous density f
satisfies the pairwise Markov property with respect to an undirected
graph G if and only if it factorizes according to G, i.e., (F) ≡ (G)
[Cressie, 1993; Lauritzen, 1996]
36. Hammersley-Clifford theorem
Theorem (Hammersley-Clifford)
Under the positivity condition, the joint distribution g satisfies

g(y_1, \ldots, y_p) \propto \prod_{j=1}^{p} \frac{g_j(y_j \mid y_1, \ldots, y_{j-1}, y_{j+1}', \ldots, y_p')}{g_j(y_j' \mid y_1, \ldots, y_{j-1}, y_{j+1}', \ldots, y_p')}

for every permutation on {1, 2, . . . , p} and every y' ∈ Y.
[Cressie, 1993; Lauritzen, 1996]
37. An apocryphal theorem
The Hammersley-Clifford theorem was never published by its
authors, but only through Grimmett (1973), Preston (1973),
Sherman (1973), and Besag (1974). The authors were dissatisfied
with the positivity constraint: the joint density could only be
recovered from the full conditionals when the support of the joint
was the product of the supports of the full conditionals (with
obvious counter-examples). Moussouris's counter-example put a full
stop to their endeavors.
[Hammersley, 1974]
39. To Gibbs or not to Gibbs?
Julian Besag should certainly be credited to a large extent with the
(re?)discovery of the Gibbs sampler.
The simulation procedure is to consider the sites
cyclically and, at each stage, to amend or leave unaltered
the particular site value in question, according to a
probability distribution whose elements depend upon the
current value at neighboring sites (...) However, the
technique is unlikely to be particularly helpful in many
other than binary situations and the Markov chain itself
has no practical interpretation.
[Besag, 1974]
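The cyclic site-updating scheme in this quote is what later became known as the Gibbs sampler. A minimal sketch on a toy bivariate normal target rather than Besag's binary lattice setting (all names and the choice of target are illustrative):

```python
import math
import random

def gibbs_bivariate_normal(rho, n_steps, seed=0):
    """Systematic-scan Gibbs sampler for a standard bivariate normal with
    correlation rho: each full conditional is N(rho * other, 1 - rho**2)."""
    rng = random.Random(seed)
    x = y = 0.0
    sd = math.sqrt(1.0 - rho * rho)
    samples = []
    for _ in range(n_steps):
        # Visit the two "sites" cyclically, amending each one according to
        # a distribution that depends on the current value at the other site.
        x = rng.gauss(rho * y, sd)
        y = rng.gauss(rho * x, sd)
        samples.append((x, y))
    return samples

samples = gibbs_bivariate_normal(rho=0.7, n_steps=50_000)
mean_x = sum(x for x, _ in samples) / len(samples)
corr = sum(x * y for x, y in samples) / len(samples)  # estimates rho
```

Only full conditionals are ever simulated, yet the chain's empirical moments recover those of the joint target.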
40. Broader perspective
In 1964, Hammersley and Handscomb wrote a (the first?)
textbook on Monte Carlo methods, covering such topics as
"Crude Monte Carlo";
importance sampling;
control variates; and
"Conditional Monte Carlo", which looks surprisingly like a
missing-data Gibbs completion approach.
They state in the Preface:
We are convinced nevertheless that Monte Carlo methods
will one day reach an impressive maturity.
41. Clicking in
After Peskun (1973), MCMC mostly dormant in mainstream
statistical world for about 10 years, then several papers/books
highlighted its usefulness in specific settings:
Geman and Geman (1984)
Besag (1986)
Strauss (1986)
Ripley (Stochastic Simulation, 1987)
Tanner and Wong (1987)
Younes (1988)
42. Enters the Gibbs sampler
Geman and Geman (1984), building on Metropolis et al. (1953),
Hastings (1970), and Peskun (1973), constructed a Gibbs sampler
for optimisation in a discrete image processing problem without
completion.
Responsible for the name Gibbs sampling, because the method was
used for the Bayesian study of Gibbs random fields, linked to the
physicist Josiah Willard Gibbs (1839–1903)
Back to Metropolis et al., 1953: the Gibbs sampler is used as a
simulated annealing algorithm and ergodicity is proven on the
collection of global maxima
43. A Short History of Markov Chain Monte Carlo: Subjective Recollections from Incomplete Data
Before the revolution
Julian’s early works
Besag (1986) integrates the Gibbs sampler (GS) for simulated annealing (SA)...
...easy to construct the transition matrix Q, of a discrete
time Markov chain, with state space Ω and limit
distribution (4). Simulated annealing proceeds by
running an associated time inhomogeneous Markov chain
with transition matrices QT , where T is progressively
decreased according to a prescribed “schedule” to a value
close to zero.
[Besag, 1986]
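The annealing scheme Besag describes can be made concrete. Below is a minimal, hypothetical sketch (my own illustration, not from the slides) of simulated annealing with single-site heat-bath (Gibbs-style) updates on a small one-dimensional Ising-type energy, where the temperature T is lowered according to a geometric schedule, in the spirit of the time-inhomogeneous chain with transition matrices QT above.

```python
import math
import random

def anneal_ising_chain(n=30, sweeps=300, t_start=2.0, t_end=0.05, seed=1):
    """Simulated annealing with heat-bath (Gibbs-style) single-site updates
    on a 1-D Ising chain with energy E(x) = -sum_i x[i] * x[i+1]."""
    rng = random.Random(seed)
    x = [rng.choice([-1, 1]) for _ in range(n)]
    ratio = (t_end / t_start) ** (1.0 / (sweeps - 1))  # geometric cooling schedule
    temp = t_start
    for _ in range(sweeps):
        for i in range(n):
            # local field = sum of the neighbouring spins
            h = (x[i - 1] if i > 0 else 0) + (x[i + 1] if i < n - 1 else 0)
            # heat-bath step: sample x[i] from its full conditional at temperature T
            p_plus = 1.0 / (1.0 + math.exp(-2.0 * h / temp))
            x[i] = 1 if rng.random() < p_plus else -1
        temp *= ratio
    energy = -sum(x[i] * x[i + 1] for i in range(n - 1))
    return x, energy

x, energy = anneal_ising_chain()
print(energy)  # close to the ground-state energy -(n-1) = -29
```

As T approaches zero the conditional draws become nearly deterministic, so the chain settles into a low-energy configuration, which is the optimisation use of the Gibbs sampler described above.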
44. A Short History of Markov Chain Monte Carlo: Subjective Recollections from Incomplete Data
Before the revolution
Julian’s early works
...and links with Metropolis-Hastings...
There are various related methods of constructing a
manageable QT (Hastings, 1970). Geman and Geman
(1984) adopt the simplest, which they term the ”Gibbs
sampler” (...) time reversibility, a common ingredient in
this type of problem (see, for example, Besag, 1977a), is
present at individual stages but not over complete cycles,
though Peter Green has pointed out that it returns if QT
is taken over a pair of cycles, the second of which visits
pixels in reverse order
[Besag, 1986]
45. A Short History of Markov Chain Monte Carlo: Subjective Recollections from Incomplete Data
Before the revolution
Julian’s early works
...seeing the larger picture,...
As Geman and Geman (1984) point out, any property of
the (posterior) distribution P (x|y) can be simulated by
running the Gibbs sampler at “temperature” T = 1.
Thus, if x̂i maximizes P(xi |y), then it is the most
frequently occurring colour at pixel i in an infinite
realization of the Markov chain with transition matrix Q
of Section 2.3. The x̂i ’s can therefore be simultaneously
estimated from a single finite realization of the chain. It
is not yet clear how long the realization needs to be,
particularly for estimation near colour boundaries, but the
amount of computation required is generally prohibitive
for routine purposes
[Besag, 1986]
46. A Short History of Markov Chain Monte Carlo: Subjective Recollections from Incomplete Data
Before the revolution
Julian’s early works
...seeing the larger picture,...
P (x|y) can be simulated using the Gibbs sampler, as
suggested by Grenander (1983) and by Geman and
Geman (1984). My dismissal of such an approach for
routine applications was somewhat cavalier:
purpose-built array processors could become relatively
inexpensive (...) suppose that, for 100 complete cycles
say, images have been collected from the Gibbs sampler
(or by Metropolis’ method), following a “settling-in”
period of perhaps another 100 cycles, which should cater
for fairly intricate priors (...) These 100 images should
often be adequate for estimating properties of the
posterior (...) and for making approximate associated
confidence statements, as mentioned by Mr Haslett.
[Besag, 1986]
47. A Short History of Markov Chain Monte Carlo: Subjective Recollections from Incomplete Data
Before the revolution
Julian’s early works
...if not going fully Bayes!
...a neater and more efficient procedure [for parameter
estimation] is to adopt maximum ”pseudo-likelihood”
estimation (Besag, 1975)
48. A Short History of Markov Chain Monte Carlo: Subjective Recollections from Incomplete Data
Before the revolution
Julian’s early works
...if not going fully Bayes!
...a neater and more efficient procedure [for parameter
estimation] is to adopt maximum ”pseudo-likelihood”
estimation (Besag, 1975)
I have become increasingly enamoured with the Bayesian
paradigm
[Besag, 1986]
49. A Short History of Markov Chain Monte Carlo: Subjective Recollections from Incomplete Data
Before the revolution
Julian’s early works
...if not going fully Bayes!
...a neater and more efficient procedure [for parameter
estimation] is to adopt maximum ”pseudo-likelihood”
estimation (Besag, 1975)
I have become increasingly enamoured with the Bayesian
paradigm
[Besag, 1986]
The pair (xi , βi ) is then a (bivariate) Markov field and
can be reconstructed as a bivariate process by the
methods described in Professor Besag’s paper.
[Clifford, 1986]
50. A Short History of Markov Chain Monte Carlo: Subjective Recollections from Incomplete Data
Before the revolution
Julian’s early works
...if not going fully Bayes!
...a neater and more efficient procedure [for parameter
estimation] is to adopt maximum ”pseudo-likelihood”
estimation (Besag, 1975)
I have become increasingly enamoured with the Bayesian
paradigm
[Besag, 1986]
The simulation-based estimator Epost Ψ(X) will differ
from the m.a.p. estimator Ψ̂(x).
[Silverman, 1986]
51. A Short History of Markov Chain Monte Carlo: Subjective Recollections from Incomplete Data
Before the revolution
Julian’s early works
Discussants of Besag (1986)
Impressive who’s who: D.M. Titterington, P. Clifford, P. Green, P.
Brown, B. Silverman, F. Critchley, F. Kelly, K. Mardia, C.
Jennison, J. Kent, D. Spiegelhalter, H. Wynn, D. and S. Geman, J.
Haslett, J. Kay, H. Künsch, P. Switzer, B. Torsney, &tc
52. A Short History of Markov Chain Monte Carlo: Subjective Recollections from Incomplete Data
Before the revolution
Julian’s early works
A comment on Besag (1986)
While special purpose algorithms will determine the
utility of the Bayesian methods, the general purpose
methods-stochastic relaxation and simulation of solutions
of the Langevin equation (Grenander, 1983; Geman and
Geman, 1984; Gidas, 1985a; Geman and Hwang, 1986)
have proven enormously convenient and versatile. We are
able to apply a single computer program to every new
problem by merely changing the subroutine that
computes the energy function in the Gibbs representation
of the posterior distribution.
[Geman and McClure, 1986]
53. A Short History of Markov Chain Monte Carlo: Subjective Recollections from Incomplete Data
Before the revolution
Julian’s early works
Another one
It is easy to compute exact marginal and joint posterior
probabilities of currently unobserved features, conditional
on those clinical findings currently available
(Spiegelhalter, 1986a,b), the updating taking the form of
‘propagating evidence’ through the network (...) it would
be interesting to see if the techniques described tonight,
which are of intermediate complexity, may have any
applications in this new and exciting area [causal
networks].
[Spiegelhalter, 1986]
54. A Short History of Markov Chain Monte Carlo: Subjective Recollections from Incomplete Data
Before the revolution
Julian’s early works
The candidate’s formula
Representation of the marginal likelihood as
m(x) = π(θ) f(x|θ) / π(θ|x)
or of the marginal predictive as
pn (y′|y) = f(y′|θ) πn (θ|y) / πn+1 (θ|y, y′)
[Besag, 1989]
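The candidate's formula can be verified numerically in a conjugate setting. The sketch below is my own illustration (assuming a Beta(a, b) prior and a Binomial likelihood, which are not in the slides): it confirms that π(θ)f(x|θ)/π(θ|x) yields the same marginal likelihood m(x) whatever the value of θ at which it is evaluated.

```python
import math

def log_beta(a, b):
    """log of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def candidate_marginal(k, n, a, b, theta):
    """Evaluate m(x) = pi(theta) f(x|theta) / pi(theta|x) at an arbitrary theta.
    Conjugacy: the posterior is Beta(a + k, b + n - k)."""
    log_prior = (a - 1) * math.log(theta) + (b - 1) * math.log(1 - theta) - log_beta(a, b)
    log_lik = math.log(math.comb(n, k)) + k * math.log(theta) + (n - k) * math.log(1 - theta)
    log_post = ((a + k - 1) * math.log(theta) + (b + n - k - 1) * math.log(1 - theta)
                - log_beta(a + k, b + n - k))
    return math.exp(log_prior + log_lik - log_post)

# exact Beta-Binomial marginal for comparison (uniform prior: m(x) = 1/(n+1))
k, n, a, b = 3, 10, 1.0, 1.0
exact = math.comb(n, k) * math.exp(log_beta(a + k, b + n - k) - log_beta(a, b))
print(candidate_marginal(k, n, a, b, 0.2), candidate_marginal(k, n, a, b, 0.7), exact)
```

The point of the formula is that the left-hand side does not depend on θ, which is what makes it usable as a marginal likelihood estimator once π(θ|x) can be approximated.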
55. A Short History of Markov Chain Monte Carlo: Subjective Recollections from Incomplete Data
Before the revolution
Julian’s early works
The candidate’s formula
Representation of the marginal likelihood as
m(x) = π(θ) f(x|θ) / π(θ|x)
or of the marginal predictive as
pn (y′|y) = f(y′|θ) πn (θ|y) / πn+1 (θ|y, y′)
[Besag, 1989]
Why candidate?
“Equation (2) appeared without explanation in a Durham
University undergraduate final examination script of 1984.
Regrettably, the student’s name is no longer known to me.”
56. A Short History of Markov Chain Monte Carlo: Subjective Recollections from Incomplete Data
Before the revolution
Julian’s early works
Implications
Newton and Raftery (1994) used this representation to derive
the [infamous] harmonic mean approximation to the marginal
likelihood
Gelfand and Dey (1994)
Geyer and Thompson (1995)
Chib (1995)
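The instability behind the "[infamous]" label is easy to see in a toy run. The following sketch is my own illustration (using the Beta-Binomial model with a uniform prior, where the exact marginal likelihood is 1/(n+1)): it computes the harmonic mean estimator from posterior draws, whose infinite variance is the source of its bad reputation.

```python
import math
import random

def harmonic_mean_marginal(k, n, draws=50000, seed=2):
    """Newton & Raftery's harmonic mean estimator of m(x):
    m_hat = 1 / mean( 1 / f(x|theta_i) ), with theta_i ~ posterior.
    Uniform prior => posterior is Beta(k + 1, n - k + 1)."""
    rng = random.Random(seed)
    binom = math.comb(n, k)
    total = 0.0
    for _ in range(draws):
        theta = rng.betavariate(k + 1, n - k + 1)
        lik = binom * theta ** k * (1 - theta) ** (n - k)
        total += 1.0 / lik  # heavy-tailed term: the cause of the infinite variance
    return draws / total

k, n = 5, 10
print(harmonic_mean_marginal(k, n), 1.0 / (n + 1))  # estimate vs exact 1/11
```

Even in this tiny example the estimate wanders around the truth, since rare draws of θ near 0 or 1 contribute enormous values of 1/f(x|θ).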
57. A Short History of Markov Chain Monte Carlo: Subjective Recollections from Incomplete Data
Before the revolution
Julian’s early works
Implications
Newton and Raftery (1994)
Gelfand and Dey (1994) also relied on this formula for the
same purpose in a more general perspective
Geyer and Thompson (1995)
Chib (1995)
58. A Short History of Markov Chain Monte Carlo: Subjective Recollections from Incomplete Data
Before the revolution
Julian’s early works
Implications
Newton and Raftery (1994)
Gelfand and Dey (1994)
Geyer and Thompson (1995) derived MLEs by a Monte Carlo
approximation to the normalising constant
Chib (1995)
59. A Short History of Markov Chain Monte Carlo: Subjective Recollections from Incomplete Data
Before the revolution
Julian’s early works
Implications
Newton and Raftery (1994)
Gelfand and Dey (1994)
Geyer and Thompson (1995)
Chib (1995) uses this representation to build a MCMC
approximation to the marginal likelihood
60. A Short History of Markov Chain Monte Carlo: Subjective Recollections from Incomplete Data
The Revolution
Final steps to
Impact
“This is surely a revolution.”
[Clifford, 1993]
Geman and Geman (1984) is one more spark that led to the
explosion, as it had a clear influence on Gelfand, Green, Smith,
Spiegelhalter and others.
Sparked new interest in Bayesian methods, statistical computing,
algorithms, and stochastic processes through the use of computing
algorithms such as the Gibbs sampler and the Metropolis–Hastings
algorithm.
61. A Short History of Markov Chain Monte Carlo: Subjective Recollections from Incomplete Data
The Revolution
Final steps to
Impact
“[Gibbs sampler] use seems to have been isolated in the spatial
statistics community until Gelfand and Smith (1990)”
[Geyer, 1990]
Geman and Geman (1984) is one more spark that led to the
explosion, as it had a clear influence on Gelfand, Green, Smith,
Spiegelhalter and others.
Sparked new interest in Bayesian methods, statistical computing,
algorithms, and stochastic processes through the use of computing
algorithms such as the Gibbs sampler and the Metropolis–Hastings
algorithm.
62. A Short History of Markov Chain Monte Carlo: Subjective Recollections from Incomplete Data
The Revolution
Final steps to
Data augmentation
Tanner and Wong (1987) has essentially the same ingredients as
Gelfand and Smith (1990): simulating from conditionals is
simulating from the joint
63. A Short History of Markov Chain Monte Carlo: Subjective Recollections from Incomplete Data
The Revolution
Final steps to
Data augmentation
Tanner and Wong (1987) has essentially the same ingredients as
Gelfand and Smith (1990): simulating from conditionals is
simulating from the joint
Lower impact:
emphasis on missing data problems (hence data augmentation)
MCMC approximation to the target at every iteration
π̂(θ|x) ≈ (1/K) Σ_{k=1}^{K} π(θ|x, z^{t,k}) ,   z^{t,k} ∼ π_{t−1}(z|x) ,
too close to Rubin’s (1978) multiple imputation
theoretical backup based on functional analysis (Markov kernel had
to be uniformly bounded and equicontinuous)
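The "simulating from conditionals is simulating from the joint" idea can be shown on a toy case. Below is a hypothetical sketch of my own (not Tanner and Wong's actual setup): a two-component normal mixture with known component means, where the missing data z are the component labels and the chain alternates z | w, x and w | z, x.

```python
import math
import random

def gibbs_mixture_weight(x, mu0=0.0, mu1=3.0, iters=2000, burn=500, seed=3):
    """Data augmentation for w * N(mu0, 1) + (1 - w) * N(mu1, 1):
    alternate z_i | w, x_i  and  w | z  (Beta(1, 1) prior on w)."""
    rng = random.Random(seed)
    w, trace = 0.5, []
    for t in range(iters):
        # completion step: sample each missing label from its conditional
        n0 = 0
        for xi in x:
            p0 = w * math.exp(-0.5 * (xi - mu0) ** 2)
            p1 = (1 - w) * math.exp(-0.5 * (xi - mu1) ** 2)
            if rng.random() < p0 / (p0 + p1):
                n0 += 1
        # parameter step: w | z ~ Beta(1 + n0, 1 + n1)
        w = rng.betavariate(1 + n0, 1 + len(x) - n0)
        if t >= burn:
            trace.append(w)
    return sum(trace) / len(trace)

# synthetic data with true weight 0.7 on the first component
rng = random.Random(0)
data = [rng.gauss(0.0, 1.0) if rng.random() < 0.7 else rng.gauss(3.0, 1.0)
        for _ in range(400)]
print(gibbs_mixture_weight(data))  # posterior mean of w, near 0.7
```

The two alternating draws are exactly the two conditionals of the joint π(w, z | x), so the chain targets the correct posterior on w.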
64. A Short History of Markov Chain Monte Carlo: Subjective Recollections from Incomplete Data
The Revolution
Gelfand and Smith, 1990
Epiphany
In June 1989, at a Bayesian workshop in Sherbrooke,
Québec, Adrian Smith exposed for the first time (?)
the generic features of the Gibbs sampler, exhibiting a ten
line Fortran program handling a random effect model
Yij = θi + εij ,  i = 1, . . . , K,  j = 1, . . . , J,
θi ∼ N(µ, σθ²) ,  εij ∼ N(0, σε²) ,
by full conditionals on µ, σθ , σε ...
[Gelfand and Smith, 1990]
This was enough to convince the whole audience!
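A modern rendering of that ten-line program might look as follows. This is my own sketch (in Python rather than Fortran, with conjugate inverse-gamma priors on the variances and a flat prior on µ, choices the slide does not specify): each pass draws every parameter from its full conditional.

```python
import numpy as np

def gibbs_random_effects(y, iters=3000, burn=1000, a=2.0, b=1.0, seed=4):
    """Gibbs sampler for Y_ij = theta_i + eps_ij,
    theta_i ~ N(mu, s2t), eps_ij ~ N(0, s2e),
    flat prior on mu, InvGamma(a, b) priors on s2t and s2e."""
    rng = np.random.default_rng(seed)
    K, J = y.shape
    mu, s2t, s2e = y.mean(), 1.0, 1.0
    keep = []
    for t in range(iters):
        # theta_i | rest: precision-weighted combination of the cell mean and mu
        prec = J / s2e + 1.0 / s2t
        mean = (J * y.mean(axis=1) / s2e + mu / s2t) / prec
        theta = rng.normal(mean, np.sqrt(1.0 / prec))
        # mu | rest (flat prior)
        mu = rng.normal(theta.mean(), np.sqrt(s2t / K))
        # variances | rest: inverse-gamma draws via 1 / Gamma
        s2t = 1.0 / rng.gamma(a + K / 2, 1.0 / (b + 0.5 * ((theta - mu) ** 2).sum()))
        s2e = 1.0 / rng.gamma(a + K * J / 2,
                              1.0 / (b + 0.5 * ((y - theta[:, None]) ** 2).sum()))
        if t >= burn:
            keep.append(mu)
    return float(np.mean(keep))

rng = np.random.default_rng(0)
K, J = 20, 10
theta_true = rng.normal(5.0, 1.0, K)
y = theta_true[:, None] + rng.normal(0.0, 1.0, (K, J))
print(gibbs_random_effects(y))  # posterior mean of mu, near 5
```

The appeal Smith demonstrated is visible here: each conditional is a standard distribution, so the whole hierarchical model is handled by a handful of textbook draws.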
65. A Short History of Markov Chain Monte Carlo: Subjective Recollections from Incomplete Data
The Revolution
Gelfand and Smith, 1990
Garden of Eden
In the early 1990s, researchers found that Gibbs and then
Metropolis–Hastings algorithms would crack almost any problem!
Flood of papers followed applying MCMC:
linear mixed models (Gelfand et al., 1990; Zeger and Karim, 1991;
Wang et al., 1993, 1994)
generalized linear mixed models (Albert and Chib, 1993)
mixture models (Tanner and Wong, 1987; Diebolt and X., 1990,
1994; Escobar and West, 1993)
changepoint analysis (Carlin et al., 1992)
point processes (Grenander and Møller, 1994)
&tc
66. A Short History of Markov Chain Monte Carlo: Subjective Recollections from Incomplete Data
The Revolution
Gelfand and Smith, 1990
Garden of Eden
In the early 1990s, researchers found that Gibbs and then
Metropolis–Hastings algorithms would crack almost any problem!
Flood of papers followed applying MCMC:
genomics (Stephens and Smith, 1993; Lawrence et al., 1993;
Churchill, 1995; Geyer and Thompson, 1995)
ecology (George and X, 1992; Dupuis, 1995)
variable selection in regression (George and McCulloch, 1993)
spatial statistics (Raftery and Banfield, 1991)
longitudinal studies (Lange et al., 1992)
&tc
67. A Short History of Markov Chain Monte Carlo: Subjective Recollections from Incomplete Data
The Revolution
Gelfand and Smith, 1990
[some of the] early theoretical advances
“It may well be remembered as the afternoon of the 11 Bayesians”
[Clifford, 1993]
Geyer and Thompson, 1992, relied on MCMC methods for ML
estimation
Smith and Roberts, 1993
Besag and Green, 1993
Tierney, 1994
Liu, Wong and Kong, 1994,95
Mengersen and Tweedie, 1996
68. A Short History of Markov Chain Monte Carlo: Subjective Recollections from Incomplete Data
The Revolution
Gelfand and Smith, 1990
[some of the] early theoretical advances
“It may well be remembered as the afternoon of the 11 Bayesians”
[Clifford, 1993]
Geyer and Thompson, 1992,
Smith and Roberts, 1993 discussed convergence diagnoses and
applications, incl. mixtures for Gibbs and Metropolis–Hastings
Besag and Green, 1993
Tierney, 1994
Liu, Wong and Kong, 1994,95
Mengersen and Tweedie, 1996
69. A Short History of Markov Chain Monte Carlo: Subjective Recollections from Incomplete Data
The Revolution
Gelfand and Smith, 1990
[some of the] early theoretical advances
“It may well be remembered as the afternoon of the 11 Bayesians”
[Clifford, 1993]
Geyer and Thompson, 1992,
Smith and Roberts, 1993
Besag and Green, 1993 stated the desiderata for
convergence, and connected MCMC with auxiliary and
antithetic variables
Tierney, 1994
Liu, Wong and Kong, 1994,95
Mengersen and Tweedie, 1996
70. A Short History of Markov Chain Monte Carlo: Subjective Recollections from Incomplete Data
The Revolution
Gelfand and Smith, 1990
[some of the] early theoretical advances
“It may well be remembered as the afternoon of the 11 Bayesians”
[Clifford, 1993]
Geyer and Thompson, 1992,
Smith and Roberts, 1993
Besag and Green, 1993
Tierney, 1994 laid out all of the assumptions needed to
analyze the Markov chains and then developed their
properties, in particular, convergence of ergodic averages and
central limit theorems
Liu, Wong and Kong, 1994,95
Mengersen and Tweedie, 1996
71. A Short History of Markov Chain Monte Carlo: Subjective Recollections from Incomplete Data
The Revolution
Gelfand and Smith, 1990
[some of the] early theoretical advances
“It may well be remembered as the afternoon of the 11 Bayesians”
[Clifford, 1993]
Geyer and Thompson, 1992,
Smith and Roberts, 1993
Besag and Green, 1993
Tierney, 1994
Liu, Wong and Kong, 1994,95 analyzed the covariance
structure of Gibbs sampling, and were able to formally
establish the validity of Rao-Blackwellization in Gibbs
sampling
Mengersen and Tweedie, 1996
72. A Short History of Markov Chain Monte Carlo: Subjective Recollections from Incomplete Data
The Revolution
Gelfand and Smith, 1990
[some of the] early theoretical advances
“It may well be remembered as the afternoon of the 11 Bayesians”
[Clifford, 1993]
Geyer and Thompson, 1992,
Smith and Roberts, 1993
Besag and Green, 1993
Tierney, 1994
Liu, Wong and Kong, 1994,95
Mengersen and Tweedie, 1996 set the tone for the study of
the speed of convergence of MCMC algorithms to the target
distribution
73. A Short History of Markov Chain Monte Carlo: Subjective Recollections from Incomplete Data
The Revolution
Gelfand and Smith, 1990
[some of the] early theoretical advances
“It may well be remembered as the afternoon of the 11 Bayesians”
[Clifford, 1993]
Geyer and Thompson, 1992,
Smith and Roberts, 1993
Besag and Green, 1993
Tierney, 1994
Liu, Wong and Kong, 1994,95
Mengersen and Tweedie, 1996
Gilks, Clayton and Spiegelhalter, 1993
&tc...
74. A Short History of Markov Chain Monte Carlo: Subjective Recollections from Incomplete Data
The Revolution
Convergence diagnoses
Convergence diagnoses
Can we really tell when a complicated Markov chain has
reached equilibrium? Frankly, I doubt it.
[Clifford, 1993]
Explosion of methods
Gelman and Rubin (1991)
Besag and Green (1992)
Geyer (1992)
Raftery and Lewis (1992)
Cowles and Carlin (1996) coda
Brooks and Roberts (1998)
&tc
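Of the diagnostics listed above, Gelman and Rubin's remains the most widely used. A minimal version (my own sketch of the standard between/within-chain variance ratio, not code from the references above) is:

```python
import math
import random

def gelman_rubin(chains):
    """Potential scale reduction factor R-hat from m chains of length n:
    compares the between-chain variance B with the within-chain variance W."""
    m, n = len(chains), len(chains[0])
    means = [sum(c) / n for c in chains]
    grand = sum(means) / m
    B = n / (m - 1) * sum((mu - grand) ** 2 for mu in means)
    W = sum(sum((x - mu) ** 2 for x in c) / (n - 1)
            for c, mu in zip(chains, means)) / m
    var_plus = (n - 1) / n * W + B / n  # pooled variance estimate
    return math.sqrt(var_plus / W)

# independent, well-mixed chains: R-hat should be close to 1
rng = random.Random(5)
chains = [[rng.gauss(0.0, 1.0) for _ in range(1000)] for _ in range(4)]
print(gelman_rubin(chains))
```

Values of R-hat much larger than 1 indicate that the parallel chains have not yet mixed into a common distribution, which is the practical answer offered to Clifford's doubt above.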
75. A Short History of Markov Chain Monte Carlo: Subjective Recollections from Incomplete Data
After the Revolution
Particle systems
Particles, again
Iterating importance sampling is about as old as Monte Carlo
methods themselves!
[Hammersley and Morton,1954; Rosenbluth and Rosenbluth, 1955]
Found in the molecular simulation literature of the 50’s with
self-avoiding random walks and signal processing
[Marshall, 1965; Handschin and Mayne, 1969]
76. A Short History of Markov Chain Monte Carlo: Subjective Recollections from Incomplete Data
After the Revolution
Particle systems
Particles, again
Iterating importance sampling is about as old as Monte Carlo
methods themselves!
[Hammersley and Morton,1954; Rosenbluth and Rosenbluth, 1955]
Found in the molecular simulation literature of the 50’s with
self-avoiding random walks and signal processing
[Marshall, 1965; Handschin and Mayne, 1969]
Use of the term “particle” dates back to Kitagawa (1996), and Carpenter
et al. (1997) coined the term “particle filter”.
77. A Short History of Markov Chain Monte Carlo: Subjective Recollections from Incomplete Data
After the Revolution
Particle systems
Bootstrap filter and sequential Monte Carlo
Gordon, Salmond and Smith (1993) introduced the bootstrap filter
which, while formally connected with importance sampling,
involves past simulations and possible MCMC steps (Gilks and
Berzuini, 2001).
Sequential imputation was developed in Kong, Liu and Wong
(1994), while Liu and Chen (1995) first formally pointed out the
importance of resampling in “sequential Monte Carlo”, a term they
coined
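A bare-bones bootstrap filter in the spirit of Gordon, Salmond and Smith (1993) can be sketched as follows (my own toy example on a linear Gaussian AR(1) state space model; the model and parameter values are illustrative, not from the paper): propagate the particles through the dynamics, weight them by the observation likelihood, then resample.

```python
import math
import random

def bootstrap_filter(ys, n_part=500, phi=0.9, sx=1.0, sy=0.5, seed=6):
    """Bootstrap filter for x_t = phi x_{t-1} + N(0, sx^2), y_t = x_t + N(0, sy^2).
    Returns the filtered means E[x_t | y_{1:t}]."""
    rng = random.Random(seed)
    parts = [rng.gauss(0.0, 1.0) for _ in range(n_part)]
    means = []
    for y in ys:
        # propagate each particle through the state dynamics
        parts = [phi * p + rng.gauss(0.0, sx) for p in parts]
        # weight by the observation likelihood
        ws = [math.exp(-0.5 * ((y - p) / sy) ** 2) for p in parts]
        total = sum(ws)
        means.append(sum(w * p for w, p in zip(ws, parts)) / total)
        # multinomial resampling: the step Liu and Chen emphasised
        parts = rng.choices(parts, weights=ws, k=n_part)
    return means

# simulate a trajectory and filter it
rng = random.Random(0)
xs, x = [], 0.0
for _ in range(100):
    x = 0.9 * x + rng.gauss(0.0, 1.0)
    xs.append(x)
ys = [x + rng.gauss(0.0, 0.5) for x in xs]
means = bootstrap_filter(ys)
mse = sum((m - x) ** 2 for m, x in zip(means, xs)) / len(xs)
print(mse)  # well below the prior variance of x_t
```

Without the resampling step the weights degenerate onto a single particle after a few observations, which is exactly the importance Liu and Chen (1995) pointed out.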
78. A Short History of Markov Chain Monte Carlo: Subjective Recollections from Incomplete Data
After the Revolution
Particle systems
pMC versus pMCMC
Recycling of past simulations legitimate to build better
importance sampling functions as in population Monte Carlo
[Iba, 2000; Cappé et al., 2004; Del Moral et al., 2007]
Recent synthesis by Andrieu, Doucet, and Holenstein (2010)
using particles to build an evolving MCMC kernel p̂θ (y1:T ) in
state space models p(x1:T )p(y1:T |x1:T ), along with Andrieu’s
and Roberts’ (2009) use of approximations in MCMC
acceptance steps
[Kennedy and Kuti, 1985]
79. A Short History of Markov Chain Monte Carlo: Subjective Recollections from Incomplete Data
After the Revolution
Reversible jump
Reversible jump
Generally considered as the second Revolution.
The formalisation of a Markov chain moving across
models and parameter spaces allows for the
Bayesian processing of a wide variety of models
and led to the success of Bayesian model choice.
Definition of a proper balance condition on cross-model Markov
kernels gives a generic setup for exploring variable dimension
spaces, even when the number of models under comparison is
infinite.
[Green, 1995]
80. A Short History of Markov Chain Monte Carlo: Subjective Recollections from Incomplete Data
After the Revolution
Perfect sampling
Perfect sampling
Seminal paper of Propp and Wilson (1996) showed how to use
MCMC methods to produce an exact (or perfect) simulation from
the target.
81. A Short History of Markov Chain Monte Carlo: Subjective Recollections from Incomplete Data
After the Revolution
Perfect sampling
Perfect sampling
Seminal paper of Propp and Wilson (1996) showed how to use
MCMC methods to produce an exact (or perfect) simulation from
the target.
Outburst of papers, particularly from Jesper Møller and coauthors,
but the excitement somehow died down [except in dedicated areas],
as construction of perfect samplers is hard and coalescence times
very high...
[Møller and Waagepetersen, 2003]
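Propp and Wilson's coupling-from-the-past construction can be demonstrated on a toy chain. The sketch below is my own illustration (not from the papers cited): a monotone ±1 random walk on {0,…,4} is run from ever further in the past, reusing the same innovations between attempts, until all initial states coalesce; the coalesced value is an exact draw from the stationary (here uniform) distribution.

```python
import random

def cftp_walk(rng, n_states=5):
    """Coupling from the past for a +/-1 random walk on {0, ..., n_states-1},
    held at the boundaries. The update is monotone in x, so it suffices to
    track the minimal (0) and maximal (n_states-1) starting states."""
    def step(x, u):
        x = x + 1 if u < 0.5 else x - 1
        return max(0, min(n_states - 1, x))

    us = []  # innovations u_{-1}, u_{-2}, ... (reused between attempts!)
    T = 1
    while True:
        while len(us) < T:
            us.append(rng.random())
        lo, hi = 0, n_states - 1
        for i in range(T - 1, -1, -1):   # run from time -T to -1, same randomness
            lo, hi = step(lo, us[i]), step(hi, us[i])
        if lo == hi:                     # coalesced: exact stationary draw
            return lo
        T *= 2                           # otherwise restart further in the past

rng = random.Random(7)
draws = [cftp_walk(rng) for _ in range(2000)]
freqs = [draws.count(s) / len(draws) for s in range(5)]
print(freqs)  # each frequency close to 0.2 (the stationary law is uniform here)
```

The difficulty mentioned above is visible even here: the construction hinges on reusing past randomness and on a monotonicity property of the update, and neither ingredient is easy to arrange for realistic targets.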
82. A Short History of Markov Chain Monte Carlo: Subjective Recollections from Incomplete Data
After the Revolution
Envoi
To be continued...
...standing on the shoulders of giants