Recent developments on
unbiased Markov chain Monte Carlo
Pierre E. Jacob (Harvard)
joint work with
John O’Leary (Harvard), Yves Atchadé (Boston University)
Machine Learning Seminar
Duke University
December 5, 2018
Pierre E. Jacob Unbiased MCMC
Outline
1 Setting: MCMC and bias
2 Unbiased estimators from Markov kernels
3 Illustrations on standard MCMC settings
4 Beyond the standard MCMC setting
Pierre E. Jacob Unbiased MCMC
Outline
1 Setting: MCMC and bias
2 Unbiased estimators from Markov kernels
3 Illustrations on standard MCMC settings
4 Beyond the standard MCMC setting
Pierre E. Jacob Unbiased MCMC
Setting
Continuous or discrete space of dimension d.
Target probability distribution π,
with probability density function x → π(x).
Goal: approximate π, e.g.
for a class of functions h, approximate Eπ[h(X)],
defined as ∫ h(x)π(x) dx and also written π(h).
Pierre E. Jacob Unbiased MCMC
Markov chain Monte Carlo
Initially, X0 ∼ π0, then Xt|Xt−1 ∼ P(Xt−1, ·) for t = 1, . . . , T.
Estimator:
    (1/(T − b)) Σ_{t=b+1}^{T} h(Xt),
where b iterations are discarded as burn-in.
Might converge to Eπ[h(X)] as T → ∞ by the ergodic theorem.
An MCMC algorithm ≡ a description of how to sample from
π0 and from a Markov kernel P leaving π invariant.
Pierre E. Jacob Unbiased MCMC
Bias and variance
The bias goes to zero as T → ∞:
    E[(1/(T − b)) Σ_{t=b+1}^{T} h(Xt)] − Eπ[h(X)] → 0,
but the bias is not zero for any fixed b, T, since π0 ≠ π,
i.e. because the chain does not start at stationarity.
Therefore, obtaining many such estimators in parallel, for fixed
T, b, and averaging them, would not provide a consistent
estimator of Eπ[h(X)].
Pierre E. Jacob Unbiased MCMC
Preview
In the proposed approach, each run will involve two coupled
chains (Xt) and (Yt),
until some event occurs, which will happen after a random
number of iterations denoted by τ.
At that point an estimator later denoted by Hk:m will be
returned,
with the guarantee that E[Hk:m] = Eπ[h(X)] (unbiasedness).
Removing the bias will come at a price in terms of computing
cost and variance.
Pierre E. Jacob Unbiased MCMC
Why do we care about unbiased estimators?
In classical point estimation, unbiasedness is not crucial.
Larry Wasserman in “All of Statistics” (2003) writes:
Unbiasedness used to receive much attention but these
days is considered less important.
On the other hand, Jeff Rosenthal in “Parallel computing and
Monte Carlo algorithms” (2000) writes
When running parallel Monte Carlo with many
computers, it is more important to start with an
unbiased (or low-bias) estimate than with a
low-variance estimate.
Pierre E. Jacob Unbiased MCMC
Why do we care about unbiased estimators?
Pierre E. Jacob Unbiased MCMC
Parallel computing
Each among R processors generates unbiased estimators.
The number of estimators completed by time t is random.
There are different ways of aggregating estimators at time t,
either discarding on-going calculations or not.
We might want central limit theorems
as R → ∞ for fixed t,
as t → ∞ for fixed R,
as t and R both go to infinity.
This has been thoroughly studied already, e.g.
Glynn & Heidelberger, Analysis of parallel replicated
simulations under a completion time constraint, 1991.
Pierre E. Jacob Unbiased MCMC
Outline
1 Setting: MCMC and bias
2 Unbiased estimators from Markov kernels
3 Illustrations on standard MCMC settings
4 Beyond the standard MCMC setting
Pierre E. Jacob Unbiased MCMC
Coupled chains
Generate two Markov chains (Xt) and (Yt) as follows,
sample X0 and Y0 from π0 (independently or not),
sample X1|X0 ∼ P(X0, ·),
for t ≥ 1, sample (Xt+1, Yt)|(Xt, Yt−1) ∼ ¯P ((Xt, Yt−1), ·),
¯P is such that Xt+1|Xt ∼ P(Xt, ·) and Yt|Yt−1 ∼ P(Yt−1, ·),
so each chain marginally evolves according to P.
Xt and Yt have the same distribution for all t ≥ 0.
¯P is also such that ∃τ such that Xτ = Yτ−1, and Xt = Yt−1 for
all t ≥ τ (the chains are faithful).
Pierre E. Jacob Unbiased MCMC
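As an aside, a minimal Python sketch of how such a pair of chains could be generated up to the meeting time; `sample_pi0`, `kernel_P` and `coupled_kernel_Pbar` are placeholder names for user-supplied samplers of π0, P and ¯P, not code from the referenced papers.

```python
import numpy as np

def sample_coupled_chains(rng, sample_pi0, kernel_P, coupled_kernel_Pbar, max_iter=10**6):
    # X_0, Y_0 ~ pi_0, then X_1 | X_0 ~ P(X_0, .),
    # then (X_{t+1}, Y_t) | (X_t, Y_{t-1}) ~ Pbar((X_t, Y_{t-1}), .) until X_t = Y_{t-1}.
    X = [sample_pi0(rng)]
    Y = [sample_pi0(rng)]
    X.append(kernel_P(rng, X[0]))
    tau = None
    for t in range(1, max_iter):
        x_next, y_next = coupled_kernel_Pbar(rng, X[t], Y[t - 1])
        X.append(x_next)
        Y.append(y_next)
        if np.all(x_next == y_next):
            tau = t + 1          # smallest t such that X_t = Y_{t-1}
            break
    return X, Y, tau
```

If later estimators require m ≥ τ iterations, the X chain can simply be extended with P alone, since by faithfulness the correction terms vanish after the meeting.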
Debiasing idea
Marginal convergence of each chain:
Eπ[h(X)] = lim_{t→∞} E[h(Xt)].
Limit as a telescopic sum, for all k ≥ 0,
    lim_{t→∞} E[h(Xt)] = E[h(Xk)] + Σ_{t=k+1}^{∞} E[h(Xt) − h(Xt−1)].
Since for all t ≥ 0, Xt and Yt have the same distribution,
    Eπ[h(X)] = E[h(Xk)] + Σ_{t=k+1}^{∞} E[h(Xt) − h(Yt−1)].
Pierre E. Jacob Unbiased MCMC
Debiasing idea
If we can swap expectation and limit,
Eπ[h(X)] = E[h(Xk) + Σ_{t=k+1}^{∞} (h(Xt) − h(Yt−1))],
the random variable
    h(Xk) + Σ_{t=k+1}^{∞} (h(Xt) − h(Yt−1))
is an unbiased estimator of Eπ[h(X)].
We can compute the infinite sum since Xt = Yt−1 for all t ≥ τ.
Pierre E. Jacob Unbiased MCMC
Unbiased estimators
Unbiased estimator is given by
Hk(X, Y) = h(Xk) + Σ_{t=k+1}^{τ−1} (h(Xt) − h(Yt−1)),
with the convention Σ_{t=k+1}^{τ−1}{·} = 0 if τ − 1 < k + 1.
Note: h(Xk) alone would be biased; the other terms correct for
the bias.
Cost: τ − 1 calls to ¯P and 1 + max(0, k − τ) calls to P.
Glynn & Rhee, Exact estimation for Markov chain equilibrium
expectations, 2014.
Pierre E. Jacob Unbiased MCMC
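As an illustration (not code from the referenced papers), a minimal Python sketch of Hk(X, Y), assuming the coupled trajectories and the meeting time have already been stored in lists X, Y and an integer tau:

```python
def H_k(h, X, Y, tau, k):
    # H_k(X, Y) = h(X_k) + sum over t = k+1, ..., tau-1 of (h(X_t) - h(Y_{t-1}));
    # the sum is empty whenever tau - 1 < k + 1.
    estimate = h(X[k])
    for t in range(k + 1, tau):
        estimate += h(X[t]) - h(Y[t - 1])
    return estimate
```

Here X must contain at least max(k, τ − 1) + 1 states and Y at least τ − 1 states.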
Conditions
1 Marginal chain converges:
E[h(Xt)] → Eπ[h(X)],
and h(Xt) has (2 + η)-finite moments for all t.
2 Meeting time τ has geometric tails:
∃C < +∞, ∃δ ∈ (0, 1), ∀t ≥ 0: P(τ > t) ≤ C δ^t.
3 Chains stay together: Xt = Yt−1 for all t ≥ τ.
Condition 2 itself implied by e.g. geometric drift condition.
Under these conditions, Hk(X, Y ) is unbiased, has finite
expected cost and finite variance, for all k.
Pierre E. Jacob Unbiased MCMC
Conditions: update
Middleton, Deligiannidis, Doucet, J., 2018.
1 Marginal chain converges:
E[h(Xt)] → Eπ[h(X)],
and h(Xt) has (2 + η)-finite moments for all t.
2 Meeting time τ has polynomial tails:
∃C < +∞, ∃δ > 2(2η^{−1} + 1), ∀t ≥ 0: P(τ > t) ≤ C t^{−δ}.
3 Chains stay together: Xt = Yt−1 for all t ≥ τ.
Condition 2 itself implied by e.g. polynomial drift condition.
Under these conditions, Hk(X, Y ) is unbiased, has finite
expected cost and finite variance, for all k.
Pierre E. Jacob Unbiased MCMC
Time-averaged estimators
Since Hk(X, Y ) is unbiased for all k, the “time-averaged”
estimator, for all m ≥ k,
Hk:m(X, Y) = (1/(m − k + 1)) Σ_{ℓ=k}^{m} Hℓ(X, Y),
is also unbiased, and can be written
    (1/(m − k + 1)) Σ_{t=k}^{m} h(Xt) + Σ_{t=k+1}^{τ−1} min(1, (t − k)/(m − k + 1)) (h(Xt) − h(Yt−1)),
i.e. standard MCMC average + bias correction term.
As k → ∞ and m ≥ k, Hk:m(X, Y ) becomes MCMC average
with increasing probability; thus variance should be
comparable?
Pierre E. Jacob Unbiased MCMC
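In the same spirit, a hedged sketch of the time-averaged estimator Hk:m, again assuming stored trajectories X, Y (of length at least max(m, τ − 1) + 1 and τ − 1 respectively) and the meeting time tau; this only transcribes the formula above and is not the authors' implementation:

```python
def H_km(h, X, Y, tau, k, m):
    # standard MCMC average over t = k, ..., m
    mcmc_average = sum(h(X[t]) for t in range(k, m + 1)) / (m - k + 1)
    # bias correction term, empty whenever tau - 1 < k + 1
    bias_correction = 0.0
    for t in range(k + 1, tau):
        weight = min(1.0, (t - k) / (m - k + 1))
        bias_correction += weight * (h(X[t]) - h(Y[t - 1]))
    return mcmc_average + bias_correction
```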
Efficiency
Writing Hk:m(X, Y ) = MCMCk:m + BCk:m,
then, denoting the MSE of MCMCk:m by MSEk:m,
V[Hk:m(X, Y)] ≤ MSEk:m + 2 √(MSEk:m E[BC²k:m]) + E[BC²k:m].
Under geometric drift condition and regularity assumptions on
h, for some δ < 1, C < ∞,
E[BC²k:m] ≤ C δ^k / (m − k + 1)²,
and δ is directly related to tails of the meeting time.
Similarly under polynomial drift, we obtain a term of the form
k^{−δ} in the numerator instead of δ^k.
Pierre E. Jacob Unbiased MCMC
Signed measure estimator
Replacing function evaluations by delta masses leads to
ˆπ(·) = (1/(m − k + 1)) Σ_{t=k}^{m} δ_{Xt}(·)
        + Σ_{t=k+1}^{τ−1} min(1, (t − k)/(m − k + 1)) (δ_{Xt}(·) − δ_{Yt−1}(·)),
which is of the form
    ˆπ(·) = Σ_{s=1}^{N} ωs δ_{Zs}(·),
where Σ_{s=1}^{N} ωs = 1 but some ωs might be negative.
Pierre E. Jacob Unbiased MCMC
Designing coupled chains
To implement the proposed unbiased estimators,
we need to sample from a Markov kernel ¯P,
such that, when (Xt+1, Yt) is sampled from ¯P ((Xt, Yt−1), ·),
marginally Xt+1|Xt ∼ P(Xt, ·), and Yt|Yt−1 ∼ P(Yt−1, ·),
it is possible that Xt+1 = Yt exactly for some t ≥ 0,
if Xt = Yt−1, then Xt+1 = Yt almost surely.
Pierre E. Jacob Unbiased MCMC
Couplings of MCMC algorithms
Propp & Wilson, Exact sampling with coupled Markov chains
and applications to statistical mechanics, 1996.
Johnson, Studying convergence of Markov chain Monte Carlo
algorithms using coupled sample paths, 1996.
** Neal, Circularly-coupled Markov chain sampling, 1999.
Pinto & Neal, Improving Markov chain Monte Carlo estimators
by coupling to an approximating chain, 2001.
** Glynn & Rhee, Exact estimation for Markov chain
equilibrium expectations, 2014.
Pierre E. Jacob Unbiased MCMC
Couplings
(X, Y ) follows a coupling of p and q if X ∼ p and Y ∼ q,
i.e. a coupling of p and q is a joint distribution such that
first marginal is p, second marginal is q.
There are infinitely many couplings of p and q.
If X ∼ p and Y ∼ q are independent,
(X, Y ) follows an independent coupling of p and q.
A maximal coupling maximizes P(X = Y ),
over all couplings of p and q.
Pierre E. Jacob Unbiased MCMC
Independent coupling of Gamma and Normal
Pierre E. Jacob Unbiased MCMC
Maximal coupling of Gamma and Normal
Pierre E. Jacob Unbiased MCMC
Maximal coupling: algorithm
Input: p and q.
Output: pairs (X, Y ) from max coupling of p and q.
1 Sample X ∼ p and W ∼ U([0, 1]).
If W ≤ q(X)/p(X), output (X, X).
2 Otherwise, sample Y ∼ q and W ∼ U([0, 1])
until W > p(Y )/q(Y ), and output (X, Y ).
e.g. Thorisson, Coupling, stationarity, and regeneration, 2000,
Chapter 1, Section 4.5.
Pierre E. Jacob Unbiased MCMC
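A minimal Python sketch of this rejection scheme, working on the log scale for numerical stability; the function names and the Normal example are illustrative choices, not part of the references.

```python
import numpy as np
from scipy.stats import norm

def maximal_coupling(rng, sample_p, logpdf_p, sample_q, logpdf_q):
    X = sample_p(rng)
    # step 1: accept X as a common value with probability min(1, q(X)/p(X))
    if np.log(rng.uniform()) + logpdf_p(X) <= logpdf_q(X):
        return X, X
    # step 2: otherwise sample Y from the residual part of q
    while True:
        Y = sample_q(rng)
        if np.log(rng.uniform()) + logpdf_q(Y) > logpdf_p(Y):
            return X, Y

# example: maximal coupling of N(0, 1) and N(1, 1)
rng = np.random.default_rng(2018)
x, y = maximal_coupling(rng,
                        lambda r: r.normal(0.0, 1.0), lambda v: norm.logpdf(v, 0.0, 1.0),
                        lambda r: r.normal(1.0, 1.0), lambda v: norm.logpdf(v, 1.0, 1.0))
```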
Metropolis–Hastings (kernel P)
At each iteration t, Markov chain at state Xt,
1 propose X′ ∼ k(Xt, ·), e.g. X′ ∼ N(Xt, V),
2 sample U ∼ U([0, 1]),
3 if
    U ≤ min(1, π(X′)k(X′, Xt) / (π(Xt)k(Xt, X′))),
set Xt+1 = X′, otherwise set Xt+1 = Xt.
How to construct two MH chains at states Xt and Yt
such that {Xt+1 = Yt+1} can happen?
Pierre E. Jacob Unbiased MCMC
Coupling of Metropolis–Hastings (kernel ¯P)
At each iteration t, two Markov chains at states Xt, Yt,
1 propose (X′, Y′) from max coupling of k(Xt, ·) and k(Yt, ·),
2 sample U ∼ U([0, 1]),
3 if
    U ≤ min(1, π(X′)k(X′, Xt) / (π(Xt)k(Xt, X′))),
set Xt+1 = X′, otherwise set Xt+1 = Xt,
if
    U ≤ min(1, π(Y′)k(Y′, Yt) / (π(Yt)k(Yt, Y′))),
set Yt+1 = Y′, otherwise set Yt+1 = Yt.
Pierre E. Jacob Unbiased MCMC
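A sketch of one transition of such a coupled kernel ¯P, here for a symmetric Gaussian random-walk proposal (so the k terms cancel in the acceptance ratios); `log_target` stands for log π up to an additive constant, and the snippet is illustrative rather than the papers' implementation.

```python
import numpy as np
from scipy.stats import multivariate_normal as mvn

def coupled_rwmh_step(rng, log_target, x, y, cov):
    # propose (X', Y') from a maximal coupling of N(x, cov) and N(y, cov)
    xp = rng.multivariate_normal(x, cov)
    if np.log(rng.uniform()) + mvn.logpdf(xp, mean=x, cov=cov) <= mvn.logpdf(xp, mean=y, cov=cov):
        yp = xp
    else:
        while True:
            yp = rng.multivariate_normal(y, cov)
            if np.log(rng.uniform()) + mvn.logpdf(yp, mean=y, cov=cov) > mvn.logpdf(yp, mean=x, cov=cov):
                break
    # common uniform for both accept/reject decisions
    logu = np.log(rng.uniform())
    x_new = xp if logu <= log_target(xp) - log_target(x) else x
    y_new = yp if logu <= log_target(yp) - log_target(y) else y
    return x_new, y_new
```

Once the two chains have met, the shared proposal and the shared uniform keep them together, which is the faithfulness property required earlier.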
Coupling of Gibbs
Gibbs sampler: update component i of the chain,
leaving π(dxi|x1, . . . , xi−1, xi+1, . . . , xd) invariant.
For instance, we can propose X′ ∼ k(X^i_t, ·) to replace X^i_t,
and accept or not with a Metropolis–Hastings step.
These proposals can be maximally coupled across two chains,
at each component update.
The chains meet when all components have met.
Pierre E. Jacob Unbiased MCMC
Hamiltonian Monte Carlo
Introduce potential energy U(q) = − log π(q),
and total energy E(q, p) = U(q) + ½|p|².
Hamiltonian dynamics for (q(s), p(s)), where s ≥ 0:
    dq(s)/ds = ∇p E(q(s), p(s)) = p(s),
    dp(s)/ds = −∇q E(q(s), p(s)) = −∇U(q(s)).
Ideal algorithm: at step t, given position q(0) = Xt,
sample p(0) ∼ N(0, I),
solve dynamics over time length S to get (q(S), p(S)),
set Xt+1 = q(S).
then the Markov chain (Xt) would converge to π.
Pierre E. Jacob Unbiased MCMC
Hamiltonian Monte Carlo
Solving Hamiltonian dynamics exactly is not feasible,
but many discretizations have been proposed.
Starting from position q0, sample p0 ∼ N(0, I),
and for ℓ = 0, . . . , L − 1, compute
    p_{ℓ+1/2} = p_ℓ − (ε/2) ∇U(q_ℓ),
    q_{ℓ+1} = q_ℓ + ε p_{ℓ+1/2},
    p_{ℓ+1} = p_{ℓ+1/2} − (ε/2) ∇U(q_{ℓ+1}),
accept or not qL as the new state of the chain,
according to a Metropolis–Hastings ratio
min(1, exp (E(q0, p0) − E(qL, pL))).
Pierre E. Jacob Unbiased MCMC
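A compact sketch of the leapfrog scheme and the resulting HMC transition, for a vector-valued state x; `U` and `grad_U` denote the potential and its gradient, both assumed supplied by the user.

```python
import numpy as np

def leapfrog(grad_U, q0, p0, eps, L):
    # L leapfrog steps; consecutive half momentum steps are merged into full steps
    q, p = np.array(q0, dtype=float), p0 - 0.5 * eps * grad_U(q0)
    for step in range(L):
        q = q + eps * p
        if step < L - 1:
            p = p - eps * grad_U(q)
    p = p - 0.5 * eps * grad_U(q)
    return q, p

def hmc_step(rng, U, grad_U, x, eps, L):
    p0 = rng.normal(size=np.shape(x))
    qL, pL = leapfrog(grad_U, x, p0, eps, L)
    # Metropolis-Hastings correction based on the change in total energy
    if np.log(rng.uniform()) < (U(x) + 0.5 * p0 @ p0) - (U(qL) + 0.5 * pL @ pL):
        return qL
    return x
```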
Coupling of HMC
Common random numbers can make two HMC chains contract,
under assumptions on the target such as strong log-concavity,
i.e. the potential energy is strongly convex.
Using mixtures of HMC and RWMH kernels,
two chains can then exactly meet.
Heng & J., Unbiased HMC with couplings, 2018.
Mangoubi & Smith, Rapid mixing of HMC on strongly log-concave
distributions, 2017.
In non-convex cases, see Bou-Rabee, Eberle & Zimmer, Coupling
and Convergence for Hamiltonian Monte Carlo, 2018.
Pierre E. Jacob Unbiased MCMC
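A sketch of the common-random-numbers coupling, reusing the `leapfrog` function from the previous snippet: the two chains share the initial momentum and the acceptance uniform, which makes them contract under strong log-concavity; exact meetings then come from occasionally mixing in the coupled random-walk MH kernel. Illustrative only, not the paper's code.

```python
import numpy as np

def coupled_hmc_step(rng, U, grad_U, x, y, eps, L):
    p0 = rng.normal(size=np.shape(x))      # shared momentum
    logu = np.log(rng.uniform())           # shared acceptance uniform
    new_states = []
    for q0 in (x, y):
        qL, pL = leapfrog(grad_U, q0, p0, eps, L)   # leapfrog as in the previous sketch
        accept = logu < (U(q0) + 0.5 * p0 @ p0) - (U(qL) + 0.5 * pL @ pL)
        new_states.append(qL if accept else q0)
    return new_states[0], new_states[1]
```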
Couplings of other MCMC algorithms
Conditional particle filters, with or without ancestor sampling:
J., Fredrik Lindsten, Thomas Schön
Smoothing with Couplings of Conditional Particle Filters, 2018.
Pseudo-marginal methods, including particle marginal
Metropolis–Hastings (PMMH), and exchange algorithm:
Middleton, Deligiannidis, Doucet, J.,
Unbiased Markov chain Monte Carlo for intractable target
distributions, 2018.
Pierre E. Jacob Unbiased MCMC
Outline
1 Setting: MCMC and bias
2 Unbiased estimators from Markov kernels
3 Illustrations on standard MCMC settings
4 Beyond the standard MCMC setting
Pierre E. Jacob Unbiased MCMC
Logistic regression
X1, . . . , Xp ∈ R^n: covariates, with n = 1000, p = 300 (German
credit data).
Y ∈ R^n: binary response variable.
Model: P(Yi = 1) = {1 + exp(−a − bᵀxi)}^{−1}.
Parameters: intercept a ∈ R, coefficients b ∈ R^300.
Prior: a|s² ∼ N(0, s²), b|s² ∼ N(0, s²I) and s² ∼ Exp(0.01).
Target π is the distribution of (a, b, log s²) given Y, X, on R^302.
Heng & J., Unbiased Hamiltonian Monte Carlo with couplings,
2018.
Pierre E. Jacob Unbiased MCMC
Logistic regression
k m Cost Relative inefficiency
1 k 436 1989.07
1 5k 436 1671.93
1 10k 436 1403.28
90% quantile(τ) k 553 38.11
90% quantile(τ) 5k 1868 1.23
90% quantile(τ) 10k 3518 1.05
Inefficiencies are averaged over test functions yielding all first
and second moments of the target distribution.
Pierre E. Jacob Unbiased MCMC
Variable selection
Integers p, n, with possibly n < p.
X1, . . . , Xp ∈ R^n: covariates.
Y ∈ R^n: response variable.
γ ∈ {0, 1}p represents which covariates to select.
|γ| = Σ_{i=1}^{p} γi: number of selected variables.
Pierre E. Jacob Unbiased MCMC
Variable selection
Xγ is the n × |γ| matrix of covariates selected by γ.
Model:
Y = Xγβγ + w, where w ∼ N(0, σ²In).
Conjugate prior leads to
    π(Y |X, γ) ∝ (1 + g)^{−|γ|/2} / (1 + g(1 − R²γ))^{n/2},
where
    R²γ = YᵀXγ(XγᵀXγ)^{−1}XγᵀY / (YᵀY).
The prior on γ is set as π(γ) ∝ p^{−κ|γ|} 1(|γ| ≤ s0).
Yang, Wainwright & Jordan, On the computational complexity
of high-dimensional Bayesian variable selection, 2016.
Pierre E. Jacob Unbiased MCMC
Variable selection
MCMC algorithm randomly alternates between
flipping a component, from 1 to 0 or from 0 to 1,
accept or not with MH step;
randomly swapping a 0 and a 1,
accept or not with MH step.
Maximal couplings of these proposals can be implemented.
Pierre E. Jacob Unbiased MCMC
Variable selection
Generate n = 500 rows, with covariates X
generated as independent standard Normals.
Outcome variable Y generated from the model, with a
coefficient β whose first ten entries are non-zero.
Other parameters: σ² = σ²0 = 1, s0 = 100, g = p³, κ = 2.
Pierre E. Jacob Unbiased MCMC
Variable selection: scaling of the meeting times with p
[Figure: meeting times divided by p (vertical axis, 0 to 40), plotted against p ∈ {100, 250, 500, 750, 1000}.]
Pierre E. Jacob Unbiased MCMC
Variable selection: inclusion probabilities
[Figure: estimated inclusion probabilities (0 to 1) for the first 20 variables, for three values of κ.]
For p = 1,000, three values of κ, using k = 75,000 and m = 2k. Error
bars generated on 600 processors for 2 hours. Lines are obtained with
10^6 iterations of standard MCMC.
Pierre E. Jacob Unbiased MCMC
Outline
1 Setting: MCMC and bias
2 Unbiased estimators from Markov kernels
3 Illustrations on standard MCMC settings
4 Beyond the standard MCMC setting
Pierre E. Jacob Unbiased MCMC
Unbiased property
Unbiased estimators can be used to tackle problems for which
MCMC methods are ill-suited.
For instance, problems where one needs to approximate an
integral of a function which itself is defined as an integral:
∫ ( ∫ h(x1, x2) π2(x2|x1) dx2 ) π1(x1) dx1,
or equivalently
E1 [E2[h(X1, X2)|X1]] .
Rainforth, Cornish, Yang, Warrington, On nesting Monte Carlo
estimators, 2018.
Pierre E. Jacob Unbiased MCMC
Example 1: cut distribution
First model, parameter θ1, data Y1,
posterior π1(θ1|Y1), expectation E1[·].
Second model, parameter θ2, data Y2, posterior:
π2(θ2|Y2, θ1) = p2(θ2|θ1) p2(Y2|θ1, θ2) / p2(Y2|θ1), expectation E2[·|θ1].
Pierre E. Jacob Unbiased MCMC
Cut distribution
How to do inference on θ2
while taking the uncertainty of θ1 into account?
Full posterior under the joint model:
π2(θ1, θ2|Y1, Y2).
Can be approximated via MCMC,
but misspecification of any part “contaminates” the ensemble.
Liu, Bayarri, Berger, Modularization in Bayesian Analysis, with
Emphasis on Analysis of Computer Models, 2009.
J., Murray, Holmes, Robert, Better together? Statistical learning in
models made of modules, 2017
Pierre E. Jacob Unbiased MCMC
Cut distribution
Cut distribution:
πcut(θ1, θ2) = π1(θ1|Y1)π2(θ2|Y2, θ1).
Different than full posterior = π1(θ1|Y1, Y2)π2(θ2|Y2, θ1).
The marginal of θ1 is the first model’s posterior.
Probabilistic counterpart to two-step estimators.
But how to sample from the cut distribution?
Plummer, Cuts in Bayesian graphical models, 2015.
Pierre E. Jacob Unbiased MCMC
Cut distribution
Express expectation with respect to cut distribution
Ecut[h(θ1, θ2)]
using the tower property, i.e.
E1[E2[h(θ1, θ2)|θ1]].
Can be unbiasedly estimated using the tower property of
expectations, and
unbiased estimators of E2[·|θ1] for all θ1,
and unbiased estimators of E1[·].
Pierre E. Jacob Unbiased MCMC
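Schematically, and assuming two user-supplied routines that the slides describe only abstractly, a hypothetical `signed_measure_1()` returning the weights and atoms (ωs, θ1,s) of an unbiased signed-measure approximation of π1(θ1|Y1), and a hypothetical `unbiased_E2(h, theta1)` returning an unbiased estimate of E2[h(θ1, θ2)|θ1]:

```python
def unbiased_cut_expectation(h, signed_measure_1, unbiased_E2):
    # E_cut[h] = E_1[ E_2[h(theta1, theta2) | theta1] ]: apply the outer signed measure
    # to inner unbiased estimates, one per atom theta1_s.
    weights, atoms = signed_measure_1()
    return sum(w * unbiased_E2(h, theta1) for w, theta1 in zip(weights, atoms))
```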
Example: epidemiological study
Model of virus prevalence
∀i = 1, . . . , I Zi ∼ Binomial(Ni, ϕi),
Zi is number of women infected with high-risk HPV in a
sample of size Ni in country i.
Impact of prevalence onto cervical cancer occurrence
∀i = 1, . . . , I Yi ∼ Poisson(λiTi), log(λi) = θ2,1 + θ2,2ϕi,
Yi is number of cancer cases arising from Ti woman-years of
follow-up in country i.
Plummer, Cuts in Bayesian graphical models, 2015.
Pierre E. Jacob Unbiased MCMC
Example: epidemiological study
Approximations of the cut distribution marginals:
[Figure: estimated marginal densities of θ2,1 and θ2,2 under the cut distribution.]
Red curves were obtained by long MCMC run targeting
π2(θ2|Y2, θ1), for draws θ1 from first posterior π1(θ1|Y1).
Black bars were obtained with unbiased MCMC.
Pierre E. Jacob Unbiased MCMC
Example 2: normalizing constants
We are interested in normalizing constant of π, for which we
can compute unnormalized density x → exp(−U(x)).
Introduce
    ∀λ ∈ [0, 1], πλ(x) = Zλ^{−1} exp(−Uλ(x)), where Zλ = ∫ exp(−Uλ(x)) dx.
Thermodynamic integration or path sampling identity:
    log(Z1/Z0) = − ∫_0^1 ( ∫ ∂λUλ(x) πλ(dx) / q(λ) ) q(dλ),
where q(λ) is a distribution supported on [0, 1].
Kirkwood, Statistical mechanics of fluid mixtures, 1935.
Pierre E. Jacob Unbiased MCMC
Normalizing constants
Proposed procedure:
sample λ ∼ q(dλ),
produce an unbiased estimator ˆE(λ) of − ∫ ∂λUλ(x) πλ(dx),
return ˆr01 = ˆE(λ)/q(λ).
The resulting variable ˆr01 has expectation log(Z1/Z0).
The cost and variance of ˆr01 can be estimated, and leveraged to
guide the choice of q(dλ).
Rischard, J., Pillai, Unbiased estimation of log normalizing constants
with applications to Bayesian cross-validation, 2018.
Pierre E. Jacob Unbiased MCMC
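In pseudocode-like Python, with `sample_q`, `pdf_q` and `unbiased_E` standing in for user-supplied routines (the last one an unbiased estimator of −∫ ∂λUλ(x) πλ(dx), for instance built from coupled chains targeting πλ), the procedure could read as follows; names are placeholders.

```python
def unbiased_log_ratio(rng, sample_q, pdf_q, unbiased_E):
    # single-draw unbiased estimator of log(Z1/Z0)
    lam = sample_q(rng)
    return unbiased_E(lam) / pdf_q(lam)

# e.g. with q = Uniform(0, 1): sample_q = lambda r: r.uniform(), pdf_q = lambda l: 1.0
```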
Example 3: Bayesian cross-validation
Data y = {y1, . . . , yn}, made of n units.
Partition y into training T and validation V sets.
For instance, by sampling nT indices out of n at random
without replacement to form T.
For each split y = {T, V }, one might want to assess predictive
performance with
log p(V |T) = log ∫ p(V |θ, T) p(dθ|T),
i.e. using the logarithmic scoring rule.
Then log p(V |T) is a log-ratio of normalizing constants, p(T, V )
and p(T), so that the proposed estimator ˆr01 can be applied.
Pierre E. Jacob Unbiased MCMC
Bayesian cross-validation
One might want to approximate the criterion
CV = − (n choose nT)^{−1} Σ_{T,V} log p(V |T),
where the sum is over all splits (T, V) of size (nT, n − nT).
Proposed procedure:
randomly sample a split (T, V ) of size (nT , n − nT ),
introduce a “path” between p(dθ|T) and p(dθ|T, V ),
sample ˆr01, an unbiased estimator of log p(V |T),
return −ˆr01.
The resulting variable has expectation equal to CV.
Pierre E. Jacob Unbiased MCMC
Example 4: bagging Bayesian predictions?
Consider “bagging” (Breiman, 1994), applied to
predictors obtained by Bayesian procedures.
obtain y′, a bootstrap sample of the data y,
approximate the resulting posterior distribution p(dθ|y′),
perform posterior predictions based on the approximation.
This is another example of nested Monte Carlo problem.
Bühlmann, Discussion of Big Bayes Stories and BayesBag,
2014.
Pierre E. Jacob Unbiased MCMC
Discussion
Perfect samplers would yield same benefits and more.
Proposed estimators are perhaps more applicable,
no need for extra knowledge about π,
but require coming up with coupled kernel ¯P.
Cannot work if underlying MCMC doesn’t work. . .
but parallelism allows for more expensive MCMC kernels.
Optimal tuning of underlying MCMC algorithms is not
necessarily optimal for the proposed estimators.
Pierre E. Jacob Unbiased MCMC
Links – thank you for listening!
** P.J., John O’Leary, Yves F. Atchadé
Unbiased Markov chain Monte Carlo with couplings, 2019+
P.J., Fredrik Lindsten, Thomas Schön
Smoothing with Couplings of Conditional Particle Filters, 2018.
Jeremy Heng, P.J.
Unbiased Hamiltonian Monte Carlo with couplings, 2018.
Lawrence Middleton, George Deligiannidis, Arnaud Doucet, P.J.
Unbiased Markov chain Monte Carlo for intractable target
distributions, 2019+.
Maxime Rischard, P.J., Natesh Pillai,
Unbiased estimation of log normalizing constants with applications to
Bayesian cross-validation, 2019+.
Code on Github: github.com/pierrejacob/
Blog: statisfaction.wordpress.com/
Pierre E. Jacob Unbiased MCMC