1. Recent developments on
unbiased Markov chain Monte Carlo
Pierre E. Jacob (Harvard)
joint work with
John O’Leary (Harvard), Yves Atchad´e (Boston University)
Machine Learning Seminar
Duke University
December 5, 2018
Pierre E. Jacob Unbiased MCMC
2. Outline
1 Setting: MCMC and bias
2 Unbiased estimators from Markov kernels
3 Illustrations on standard MCMC settings
4 Beyond the standard MCMC setting
Pierre E. Jacob Unbiased MCMC
3. Outline
1 Setting: MCMC and bias
2 Unbiased estimators from Markov kernels
3 Illustrations on standard MCMC settings
4 Beyond the standard MCMC setting
Pierre E. Jacob Unbiased MCMC
4. Setting
Continuous or discrete space of dimension d.
Target probability distribution π,
with probability density function x → π(x).
Goal: approximate π, e.g.
for a class of functions h, approximate Eπ[h(X)],
defined as h(x)π(x)dx and also written π(h).
Pierre E. Jacob Unbiased MCMC
5. Markov chain Monte Carlo
Initially, X0 ∼ π0, then Xt|Xt−1 ∼ P(Xt−1, ·) for t = 1, . . . , T.
Estimator:
1
T − b
T
t=b+1
h(Xt),
where b iterations are discarded as burn-in.
Might converge to Eπ[h(X)] as T → ∞ by the ergodic theorem.
An MCMC algorithm ≡ a description of how to sample from
π0 and from a Markov kernel P leaving π invariant.
Pierre E. Jacob Unbiased MCMC
6. Bias and variance
The bias goes to zero as T → ∞:
E[
1
T − b
T
t=b+1
h(Xt)] − Eπ[h(X)] → 0,
but the bias is not zero for any fixed b, T, since π0 = π,
i.e. because the chain does not start at stationarity.
Therefore, obtaining many such estimators in parallel, for fixed
T, b, and averaging them, would not provide a consistent
estimator of Eπ[h(X)].
Pierre E. Jacob Unbiased MCMC
7. Preview
In the proposed approach, each run will involve two coupled
chains (Xt) and (Yt),
until some event occurs, which will happen after a random
number of iterations denoted by τ.
At that point an estimator later denoted by Hk:m will be
returned,
with the guarantee that E[Hk:m] = Eπ[h(X)] (unbiasedness).
Removing the bias will come at a price in terms of computing
cost and variance.
Pierre E. Jacob Unbiased MCMC
8. Why do we care about unbiased estimators?
In classical point estimation, unbiasedness is not crucial.
Larry Wasserman in “All of Statistics” (2003) writes:
Unbiasedness used to receive much attention but these
days is considered less important.
On the other hand, Jeff Rosenthal in “Parallel computing and
Monte Carlo algorithms” (2000) writes
When running parallel Monte Carlo with many
computers, it is more important to start with an
unbiased (or low-bias) estimate than with a
low-variance estimate.
Pierre E. Jacob Unbiased MCMC
9. Why do we care about unbiased estimators?
Pierre E. Jacob Unbiased MCMC
10. Parallel computing
Each among R processors generates unbiased estimators.
The number of estimators completed by time t is random.
There are different ways of aggregating estimators at time t,
either discarding on-going calculations or not.
We might want central limit theorems
as R → ∞ for fixed t,
as t → ∞ for fixed R,
as t and R both go to infinity.
This has been thoroughly studied already, e.g.
Glynn & Heidelberger, Analysis of parallel replicated
simulations under a completion time constraint, 1991.
Pierre E. Jacob Unbiased MCMC
11. Outline
1 Setting: MCMC and bias
2 Unbiased estimators from Markov kernels
3 Illustrations on standard MCMC settings
4 Beyond the standard MCMC setting
Pierre E. Jacob Unbiased MCMC
12. Coupled chains
Generate two Markov chains (Xt) and (Yt) as follows,
sample X0 and Y0 from π0 (independently or not),
sample X1|X0 ∼ P(X0, ·),
for t ≥ 1, sample (Xt+1, Yt)|(Xt, Yt−1) ∼ ¯P ((Xt, Yt−1), ·),
¯P is such that Xt+1|Xt ∼ P(Xt, ·) and Yt|Yt−1 ∼ P(Yt−1, ·),
so each chain marginally evolves according to P.
Xt and Yt have the same distribution for all t ≥ 0.
¯P is also such that ∃τ such that Xτ = Yτ−1, and Xt = Yt−1 for
all t ≥ τ (the chains are faithful).
Pierre E. Jacob Unbiased MCMC
13. Debiasing idea
Marginal convergence of each chain:
Eπ[h(X)] = lim
t→∞
E[h(Xt)].
Limit as a telescopic sum, for all k ≥ 0,
lim
t→∞
E[h(Xt)] = E[h(Xk)] +
∞
t=k+1
E[h(Xt) − h(Xt−1)].
Since for all t ≥ 0, Xt and Yt have the same distribution,
Eπ[h(X)] = E[h(Xk)] +
∞
t=k+1
E[h(Xt) − h(Yt−1)].
Pierre E. Jacob Unbiased MCMC
14. Debiasing idea
If we can swap expectation and limit,
Eπ[h(X)] = E[h(Xk) +
∞
t=k+1
(h(Xt) − h(Yt−1))],
the random variable
h(Xk) +
∞
t=k+1
(h(Xt) − h(Yt−1))
is an unbiased estimator of Eπ[h(X)].
We can compute the infinite sum since Xt = Yt−1 for all t ≥ τ.
Pierre E. Jacob Unbiased MCMC
15. Unbiased estimators
Unbiased estimator is given by
Hk(X, Y ) = h(Xk) +
τ−1
t=k+1
(h(Xt) − h(Yt−1)),
with the convention τ−1
t=k+1{·} = 0 if τ − 1 < k + 1.
Note: h(Xk) alone would be biased; the other terms correct for
the bias.
Cost: τ − 1 calls to ¯P and 1 + max(0, k − τ) calls to P.
Glynn & Rhee, Exact estimation for Markov chain equilibrium
expectations, 2014.
Pierre E. Jacob Unbiased MCMC
16. Conditions
1 Marginal chain converges:
E[h(Xt)] → Eπ[h(X)],
and h(Xt) has (2 + η)-finite moments for all t.
2 Meeting time τ has geometric tails:
∃C < +∞ ∃δ ∈ (0, 1) ∀t ≥ 0 P(τ > t) ≤ Cδt
.
3 Chains stay together: Xt = Yt−1 for all t ≥ τ.
Condition 2 itself implied by e.g. geometric drift condition.
Under these conditions, Hk(X, Y ) is unbiased, has finite
expected cost and finite variance, for all k.
Pierre E. Jacob Unbiased MCMC
17. Conditions: update
Middleton, Deligiannidis, Doucet, J., 2018.
1 Marginal chain converges:
E[h(Xt)] → Eπ[h(X)],
and h(Xt) has (2 + η)-finite moments for all t.
2 Meeting time τ has polynomial tails:
∃C < +∞ ∃δ > 2(2η−1
+ 1) ∀t ≥ 0 P(τ > t) ≤ Ct−δ
.
3 Chains stay together: Xt = Yt−1 for all t ≥ τ.
Condition 2 itself implied by e.g. polynomial drift condition.
Under these conditions, Hk(X, Y ) is unbiased, has finite
expected cost and finite variance, for all k.
Pierre E. Jacob Unbiased MCMC
18. Time-averaged estimators
Since Hk(X, Y ) is unbiased for all k, the “time-averaged”
estimator, for all m ≥ k,
Hk:m(X, Y ) =
1
m − k + 1
m
=k
H (X, Y ),
is also unbiased, and can be written
1
m − k + 1
m
t=k
h(Xt)+
τ−1
t=k+1
min 1,
t − k
m − k + 1
(h(Xt)−h(Yt−1)),
i.e. standard MCMC average + bias correction term.
As k → ∞ and m ≥ k, Hk:m(X, Y ) becomes MCMC average
with increasing probability; thus variance should be
comparable?
Pierre E. Jacob Unbiased MCMC
19. Efficiency
Writing Hk:m(X, Y ) = MCMCk:m + BCk:m,
then, denoting the MSE of MCMCk:m by MSEk:m,
V[Hk:m(X, Y )] ≤ MSEk:m+2 MSEk:m E BC2
k:m +E BC2
k:m .
Under geometric drift condition and regularity assumptions on
h, for some δ < 1, C < ∞,
E[BC2
k:m] ≤
Cδk
(m − k + 1)2
,
and δ is directly related to tails of the meeting time.
Similarly under polynomial drift, we obtain a term of the form
k−δ in the numerator instead of δk.
Pierre E. Jacob Unbiased MCMC
20. Signed measure estimator
Replacing function evaluations by delta masses leads to
ˆπ(·) =
1
m − k + 1
m
t=k
δXt (·)
+
τ−1
t=k+1
min 1,
t − k
m − k + 1
(δXt (·) − δYt−1 (·)),
which is of the form
ˆπ(·) =
N
s=1
ωsδZs (·),
where N
s=1 ωs = 1 but some ωs might be negative.
Pierre E. Jacob Unbiased MCMC
21. Designing coupled chains
To implement the proposed unbiased estimators,
we need to sample from a Markov kernel ¯P,
such that, when (Xt+1, Yt) is sampled from ¯P ((Xt, Yt−1), ·),
marginally Xt+1|Xt ∼ P(Xt, ·), and Yt|Yt−1 ∼ P(Yt−1, ·),
it is possible that Xt+1 = Yt exactly for some t ≥ 0,
if Xt = Yt−1, then Xt+1 = Yt almost surely.
Pierre E. Jacob Unbiased MCMC
22. Couplings of MCMC algorithms
Propp & Wilson, Exact sampling with coupled Markov chains
and applications to statistical mechanics, 1996.
Johnson, Studying convergence of Markov chain Monte Carlo
algorithms using coupled sample paths, 1996.
** Neal, Circularly-coupled Markov chain sampling, 1999.
Pinto & Neal, Improving Markov chain Monte Carlo estimators
by coupling to an approximating chain, 2001.
** Glynn & Rhee, Exact estimation for Markov chain
equilibrium expectations, 2014.
Pierre E. Jacob Unbiased MCMC
23. Couplings
(X, Y ) follows a coupling of p and q if X ∼ p and Y ∼ q,
i.e. a coupling of p and q is a joint distribution such that
first marginal is p, second marginal is q.
There are infinitely many couplings of p and q.
If X ∼ p and Y ∼ q are independent,
(X, Y ) follows an independent coupling of p and q.
A maximal coupling maximizes P(X = Y ),
over all couplings of p and q.
Pierre E. Jacob Unbiased MCMC
26. Maximal coupling: algorithm
Input: p and q.
Output: pairs (X, Y ) from max coupling of p and q.
1 Sample X ∼ p and W ∼ U([0, 1]).
If W ≤ q(X)/p(X), output (X, X).
2 Otherwise, sample Y ∼ q and W ∼ U([0, 1])
until W > p(Y )/q(Y ), and output (X, Y ).
e.g. Thorisson, Coupling, stationarity, and regeneration, 2000,
Chapter 1, Section 4.5.
Pierre E. Jacob Unbiased MCMC
27. Metropolis–Hastings (kernel P)
At each iteration t, Markov chain at state Xt,
1 propose X ∼ k(Xt, ·), e.g. X ∼ N(Xt, V ),
2 sample U ∼ U([0, 1]),
3 if
U ≤ min 1,
π(X )k(X , Xt)
π(Xt)k(Xt, X )
,
set Xt+1 = X , otherwise set Xt+1 = Xt.
How to construct two MH chains at states Xt and Yt
such that {Xt+1 = Yt+1} can happen?
Pierre E. Jacob Unbiased MCMC
28. Coupling of Metropolis–Hastings (kernel ¯P)
At each iteration t, two Markov chains at states Xt, Yt,
1 propose (X , Y ) from max coupling of k(Xt, ·) and k(Yt, ·),
2 sample U ∼ U([0, 1]),
3 if
U ≤ min 1,
π(X )k(X , Xt)
π(Xt)k(Xt, X )
,
set Xt+1 = X , otherwise set Xt+1 = Xt,
if
U ≤ min 1,
π(Y )k(Y , Yt)
π(Yt)k(Yt, Y )
,
set Yt+1 = Y , otherwise set Yt+1 = Yt.
Pierre E. Jacob Unbiased MCMC
29. Coupling of Gibbs
Gibbs sampler: update component i of the chain,
leaving π(dxi|x1, . . . , xi−1, xi+1, . . . , xd) invariant.
For instance, we can propose X ∼ k(Xi
t, ·) to replace Xi
t,
and accept or not with a Metropolis–Hastings step.
These proposals can be maximally coupled across two chains,
at each component update.
The chains meet when all components have met.
Pierre E. Jacob Unbiased MCMC
30. Hamiltonian Monte Carlo
Introduce potential energy U(q) = − log π(q),
and total energy E(q, p) = U(q) + 1
2|p|2.
Hamiltonian dynamics for (q(s), p(s)), where s ≥ 0:
d
ds
q(s) = pE(q(s), p(s)) = p(s),
d
ds
p(s) = − qE(q(s), p(s)) = − U(q(s)).
Ideal algorithm: at step t, given position q(0) = Xt,
sample p(0) ∼ N(0, I),
solve dynamics over time length S to get (q(S), p(S)),
set Xt+1 = q(S).
then the Markov chain (Xt) would be converging to π.
Pierre E. Jacob Unbiased MCMC
31. Hamiltonian Monte Carlo
Solving Hamiltonian dynamics exactly is not feasible,
but many discretizations have been proposed.
Starting from position q0, sample p0 ∼ N(0, I),
and for = 0, . . . , L − 1, compute
p +1/2 = p −
ε
2
U(q ),
q +1 = q + ε p +1/2,
p +1 = p +1/2 −
ε
2
U(q +1),
accept or not qL as the new state of the chain,
according to a Metropolis–Hastings ratio
min(1, exp (E(q0, p0) − E(qL, pL))).
Pierre E. Jacob Unbiased MCMC
32. Coupling of HMC
Common random numbers can make two HMC chains contract,
under assumptions on the target such as strong log-concavity,
i.e. the potential energy is strongly convex.
Using mixtures of HMC and RWMH kernels,
two chains can then exactly meet.
Heng & J., Unbiased HMC with couplings, 2018.
Mangoubi & Smith, Rapid mixing of HMC strongly log-concave
distributions, 2017.
In non-convex cases, see Bou-Rabee, Eberle & Zimmer, Coupling
and Convergence for Hamiltonian Monte Carlo, 2018.
Pierre E. Jacob Unbiased MCMC
33. Couplings of other MCMC algorithms
Conditional particle filters, with or without ancestor sampling:
J., Fredrik Lindsten, Thomas Sch¨on
Smoothing with Couplings of Conditional Particle Filters, 2018.
Pseudo-marginal methods, including particle marginal
Metropolis–Hastings (PMMH), and exchange algorithm:
Middleton, Deligiannidis, Doucet, J.,
Unbiased Markov chain Monte Carlo for intractable target
distributions, 2018.
Pierre E. Jacob Unbiased MCMC
34. Outline
1 Setting: MCMC and bias
2 Unbiased estimators from Markov kernels
3 Illustrations on standard MCMC settings
4 Beyond the standard MCMC setting
Pierre E. Jacob Unbiased MCMC
35. Logistic regression
X1, . . . , Xp ∈ Rn: covariates, with n = 1000, p = 300 (German
credit data).
Y ∈ Rn: binary response variable.
Model: P(Yi = 1) = {1 + exp(−a − b xi)}−1.
Parameters: intercept a ∈ R, coefficients b ∈ R300.
Prior: a|s2 ∼ N(0, s2), b|s2 ∼ N(0, s2I) and s2 ∼ Exp(0.01).
Target π is distribution of (a, b, log s2) given Y, X, on R302.
Heng & J., Unbiased Hamiltonian Monte Carlo with couplings,
2018.
Pierre E. Jacob Unbiased MCMC
36. Logistic regression
k m Cost Relative inefficiency
1 k 436 1989.07
1 5k 436 1671.93
1 10k 436 1403.28
90% quantile(τ) k 553 38.11
90% quantile(τ) 5k 1868 1.23
90% quantile(τ) 10k 3518 1.05
Inefficiencies are averaged over test functions yielding all first
and second moments of the target distribution.
Pierre E. Jacob Unbiased MCMC
37. Variable selection
Integers p, n, with possibly n < p.
X1, . . . , Xp ∈ Rn: covariates.
Y ∈ Rn: response variable.
γ ∈ {0, 1}p represents which covariates to select.
|γ| = p
i=1 γi: number of selected variables.
Pierre E. Jacob Unbiased MCMC
38. Variable selection
Xγ is the n × |γ| matrix of covariates selected by γ.
Model:
Y = Xγβγ + w, where w ∼ N(0, σ2
In).
Conjugate prior leads to
π(Y |X, γ) ∝
(1 + g)−|γ|/2
(1 + g(1 − R2
γ))n/2
, where R2
γ =
Y Xγ(XγXγ)−1XγY
Y Y
.
The prior on γ is set as π(γ) ∝ p−κ|γ|1(|γ| ≤ s0).
Yang, Wainwright & Jordan, On the computational complexity
of high-dimensional Bayesian variable selection, 2016.
Pierre E. Jacob Unbiased MCMC
39. Variable selection
MCMC algorithm randomly alternates between
flipping a component, from 1 to 0 or from 0 to 1,
accept or not with MH step;
randomly swapping a 0 and an 1,
accept or not with MH step.
Maximal couplings of these proposals can be implemented.
Pierre E. Jacob Unbiased MCMC
40. Variable selection
Generate n = 500 rows, with covariates X
generated as independent standard Normals.
Outcome variable Y generated from the model, with a
coefficient β such that first ten entries are non-zero.
Other parameters: σ2 = σ2
0 = 1, s0 = 100, g = p3, κ = 2.
Pierre E. Jacob Unbiased MCMC
41. Variable selection: scaling of the meeting times with p
0
10
20
30
40
100 250 500 750 1000
p
meetingtimes/p
Pierre E. Jacob Unbiased MCMC
42. Variable selection: inclusion probabilities
q
q
q
q
q q
q
q
q
q
q q q q q q q q q q
0.00
0.25
0.50
0.75
1.00
0 5 10 15 20
variable
inclusionprobability
kappa: q
0.1 1 2
For p = 1, 000, three values of κ, using k = 75, 000 and m = 2k. Error
bars generated on 600 processors for 2 hours. Lines are obtained with
106
iterations of standard MCMC.
Pierre E. Jacob Unbiased MCMC
43. Outline
1 Setting: MCMC and bias
2 Unbiased estimators from Markov kernels
3 Illustrations on standard MCMC settings
4 Beyond the standard MCMC setting
Pierre E. Jacob Unbiased MCMC
44. Unbiased property
Unbiased estimators can be used to tackle problems for which
MCMC methods are ill-suited.
For instance, problems where one needs to approximate an
integral of a function which itself is defined as an integral:
h(x1, x2)π2(x2|x1)dx2 π1(x1)dx1,
or equivalently
E1 [E2[h(X1, X2)|X1]] .
Rainforth, Cornish, Yang, Warrington, On nesting Monte Carlo
estimators, 2018.
Pierre E. Jacob Unbiased MCMC
45. Example 1: cut distribution
First model, parameter θ1, data Y1,
posterior π1(θ1|Y1), expectation E1[·].
Second model, parameter θ2, data Y2, posterior:
π2(θ2|Y2, θ1) =
p2(θ2|θ1)p2(Y2|θ1, θ2)
p2(Y2|θ1)
, expectation E2[·|θ1].
Pierre E. Jacob Unbiased MCMC
46. Cut distribution
How to do inference on θ2
while taking the uncertainty of θ1 into account?
Full posterior under the joint model:
π2(θ1, θ2|Y1, Y2).
Can be approximated via MCMC,
but misspecification of any part “contaminates” the ensemble.
Liu, Bayarri, Berger, Modularization in Bayesian Analysis, with
Emphasis on Analysis of Computer Models, 2009.
J., Murray, Holmes, Robert, Better together? Statistical learning in
models made of modules, 2017
Pierre E. Jacob Unbiased MCMC
47. Cut distribution
Cut distribution:
πcut(θ1, θ2) = π1(θ1|Y1)π2(θ2|Y2, θ1).
Different than full posterior = π1(θ1|Y1, Y2)π2(θ2|Y2, θ1).
The marginal of θ1 is the first model’s posterior.
Probabilistic counterpart to two-step estimators.
But how to sample from the cut distribution?
Plummer, Cuts in Bayesian graphical models, 2015.
Pierre E. Jacob Unbiased MCMC
48. Cut distribution
Express expectation with respect to cut distribution
Ecut[h(θ1, θ2)]
using the tower property, i.e.
E1[E2[h(θ1, θ2)|θ1]].
Can be unbiasedly estimated using the tower property of
expectations, and
unbiased estimators of E2[·|θ1] for all θ1,
and unbiased estimators of E1[·].
Pierre E. Jacob Unbiased MCMC
49. Example: epidemiological study
Model of virus prevalence
∀i = 1, . . . , I Zi ∼ Binomial(Ni, ϕi),
Zi is number of women infected with high-risk HPV in a
sample of size Ni in country i.
Impact of prevalence onto cervical cancer occurrence
∀i = 1, . . . , I Yi ∼ Poisson(λiTi), log(λi) = θ2,1 + θ2,2ϕi,
Yi is number of cancer cases arising from Ti woman-years of
follow-up in country i.
Plummer, Cuts in Bayesian graphical models, 2015.
Pierre E. Jacob Unbiased MCMC
50. Example: epidemiological study
Approximations of the cut distribution marginals:
0
1
2
3
−2.5 −2.0 −1.5
θ2,1
density
0.00
0.05
0.10
0.15
10 15 20 25
θ2,2
density
Red curves were obtained by long MCMC run targeting
π2(θ2|Y2, θ1), for draws θ1 from first posterior π1(θ1|Y1).
Black bars were obtained with unbiased MCMC.
Pierre E. Jacob Unbiased MCMC
51. Example 2: normalizing constants
We are interested in normalizing constant of π, for which we
can compute unnormalized density x → exp(−U(x)).
Introduce
∀λ ∈ [0, 1] πλ(x) = Z−1
λ exp(−Uλ(x)),
where Zλ = exp(−Uλ(x))dx.
Thermodynamics integration or path sampling identity:
log
Z1
Z0
= −
1
0
λUλ(x)πλ(dx)
q(λ)
q(dλ),
where q(λ) is a distribution supported on [0, 1].
Kirkwood, Statistical mechanics of fluid mixtures, 1935.
Pierre E. Jacob Unbiased MCMC
52. Normalizing constants
Proposed procedure:
sample λ ∼ q(dλ),
produce an unbiased estimator ˆE(λ) of − λUλ(x)πλ(dx),
return ˆr01 = ˆE(λ)/q(λ).
The resulting variable ˆr01 has expectation log(Z1/Z0).
The cost and variance of ˆr01 can be estimated, and leveraged to
guide the choice of q(dλ).
Rischard, J., Pillai, Unbiased estimation of log normalizing constants
with applications to Bayesian cross-validation, 2018.
Pierre E. Jacob Unbiased MCMC
53. Example 3: Bayesian cross-validation
Data y = {y1, . . . , yn}, made of n units.
Partition y into training T and validation V sets.
For instance, by sampling nT indices out of n at random
without replacement to form T.
For each split y = {T, V }, one might want to assess predictive
performance with
log p(V |T) = log p(V |θ, T)p(dθ|T),
i.e. using the logarithmic scoring rule.
Then log p(V |T) is a log-ratio of normalizing constants, p(T, V )
and p(T), so that the proposed estimator ˆr01 can be applied.
Pierre E. Jacob Unbiased MCMC
54. Bayesian cross-validation
One might want to approximate the criterion
CV = −
n
nT T,V
log p(V |T),
where the sum is over all splits (T, V ) of size (nT , n − nT ).
Proposed procedure:
randomly sample a split (T, V ) of size (nT , n − nT ),
introduce a “path” between p(dθ|T) and p(dθ|T, V ),
sample ˆr01, an unbiased estimator of log p(V |T),
return −ˆr01.
The resulting variable has expectation equal to CV.
Pierre E. Jacob Unbiased MCMC
55. Example 4: bagging Bayesian predictions?
Consider “bagging” (Breiman, 1994), applied to
predictors obtained by Bayesian procedures.
obtain y , a bootstrap sample of the data y,
approximate the resulting posterior distribution p(dθ|y ),
perform posterior predictions based on approximation.
This is another example of nested Monte Carlo problem.
B¨uhlmann, Discussion of Big Bayes Stories and BayesBag,
2014.
Pierre E. Jacob Unbiased MCMC
56. Discussion
Perfect samplers would yield same benefits and more.
Proposed estimators are perhaps more applicable,
no need for extra knowledge about π,
but require coming up with coupled kernel ¯P.
Cannot work if underlying MCMC doesn’t work. . .
but parallelism allows for more expensive MCMC kernels.
Optimal tuning of underlying MCMC algorithms is not
necessarily optimal for the proposed estimators.
Pierre E. Jacob Unbiased MCMC
57. Links – thank you for listening!
** P.J., John O’Leary, Yves F. Atchad´e
Unbiased Markov chain Monte Carlo with couplings, 2019+
P.J., Fredrik Lindsten, Thomas Sch¨on
Smoothing with Couplings of Conditional Particle Filters, 2018.
Jeremy Heng, P.J.
Unbiased Hamiltonian Monte Carlo with couplings, 2018.
Lawrence Middleton, George Deligiannidis, Arnaud Doucet, P.J.
Unbiased Markov chain Monte Carlo for intractable target
distributions, 2019+.
Maxime Rischard, P.J., Natesh Pillai,
Unbiased estimation of log normalizing constants with applications to
Bayesian cross-validation, 2019+.
Code on Github: github.com/pierrejacob/
Blog: statisfaction.wordpress.com/
Pierre E. Jacob Unbiased MCMC