Couplings of Markov chains and the Poisson equation
Pierre E. Jacob
Department of Statistics, Harvard University
March 22, 2021
Outline
1 Context
2 Couplings
  General idea
  Donkey walk
  Conditional Bernoulli
  Empirical rates of convergence
3 Poisson equation
  Definition
  Asymptotic variance estimation
Thank you!
First I want to thank these fantastic co-authors whose work will be mentioned in this talk:
Yves Atchadé, Anirban Bhattacharya, Niloy Biswas, Arthur P. Dempster, Randal Douc, Paul Edlefsen, Ruobin Gong, Jeremy Heng, James Johndrow, Nianqiao (Phyllis) Ju, Anthony Lee, John O'Leary, Natesh Pillai, Emilia Pompe, Maxime Rischard, Paul Vanetti, Dootika Vats, Guanyang Wang.
Dr. Arianna Wright Rosenbluth (1927–2020)
From https://www.nytimes.com/2021/02/09/science/arianna-wright-dead.html, by Katie Hafner.
Setting
Target probability distribution π. Markov chain Monte Carlo:
X0 ∼ π0, then Xt | Xt−1 ∼ P(Xt−1, ·) for t = 1, 2, …
Notation:
πt = πt−1P = ∫ πt−1(dxt−1) P(xt−1, ·),   π(h) = ∫ h(x) π(dx).
Convergence of marginals: ‖πt − π‖ → 0.
Central limit theorem:
√t ( t⁻¹ Σ_{s=0}^{t−1} h(Xs) − π(h) ) → N(0, v(P, h)).
How can we choose t? How can we estimate v(P, h)?
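To fix ideas, here is a minimal sketch (my own toy example, not from the talk) of a chain and its ergodic average: a random-walk Metropolis chain targeting N(0, 1), with an arbitrary proposal scale.

```python
import numpy as np

# Minimal sketch (not from the talk): a random-walk Metropolis chain
# targeting pi = N(0, 1). The ergodic average below is the quantity
# whose fluctuations the CLT describes; `step` is an arbitrary choice.
rng = np.random.default_rng(1)

def log_pi(x):
    return -0.5 * x ** 2  # log-density of N(0, 1), up to a constant

def rwm_chain(x0, t, step=2.0):
    xs = np.empty(t + 1)
    xs[0] = x = x0
    for s in range(1, t + 1):
        y = x + step * rng.normal()                      # propose
        if np.log(rng.uniform()) < log_pi(y) - log_pi(x):
            x = y                                        # accept
        xs[s] = x
    return xs

xs = rwm_chain(x0=10.0, t=10_000)
print("ergodic average of h(x) = x:", xs.mean())  # estimates pi(h) = 0
```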
How many iterations are enough?
Charles Geyer: "If you can't get a good answer with one long run, then you can't get a good answer with many short runs either."
An anonymous source: "I still remember fondly (?!) my first Valencia Bayesian Statistics meeting in I think 1991 when Adrian Smith and Andrew Gelman had a bit of a stand-up argument about MCMC implementation with multiple or single chains! It's 30 years since then but many of the issues are still unresolved."
From C. McCartan & K. Imai: "[…] Pegden ran an MCMC algorithm for one trillion steps […]".
In Stan, the default is 2,000 iterations. In NIMBLE, the user must specify that number.
It would be simpler if we could just specify a "tolerance" parameter, or a time limit.
Outline
Reminders on couplings of Markov chains to obtain convergence rates.
These might work "out of the box" (e.g. the donkey walk) or might require some extra care (e.g. conditional Bernoulli).
Couplings are implementable too, and provide useful empirical assessments.
We will discuss connections to another mainstay of Markov chain analysis, the Poisson equation, leading to a new asymptotic variance estimator.
Couplings
A technique to study the convergence of Markov chains.
Construct a joint process (Xt, Yt) such that Yt ∼ π for all t ≥ 0, and marginally both chains evolve according to the same kernel P.
Suppose that there exists a random variable τ such that Xt = Yt for all t ≥ τ. Then
‖πt − π‖TV = ‖L(Xt) − L(Yt)‖TV ≤ P(Xt ≠ Yt) = P(τ > t),
where ‖·‖TV is the total variation distance.
Bru & Yor, Comments on the life and mathematical legacy of Wolfgang Doeblin, 2002.
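As an illustration of the inequality ‖πt − π‖TV ≤ P(τ > t), here is a sketch (a toy example of mine, with an assumed 3-state kernel) that couples two copies of a finite-state chain through a maximal coupling of their next-step distributions; once the chains meet they stay together, and the empirical tail of τ bounds the TV distance when Y0 ∼ π.

```python
import numpy as np

# Sketch (toy example, not from the talk): coupling two copies of a
# 3-state chain via a maximal coupling of their next-step laws, then
# estimating P(tau > t), which bounds ||pi_t - pi||_TV when Y0 ~ pi.
rng = np.random.default_rng(2)
P = np.array([[0.9, 0.1, 0.0],
              [0.1, 0.8, 0.1],
              [0.0, 0.1, 0.9]])  # assumed toy kernel

def maximal_coupling(p, q):
    """Sample (x, y) with x ~ p, y ~ q, maximizing P(x = y)."""
    overlap = np.minimum(p, q)
    w = overlap.sum()
    if rng.uniform() < w:
        x = rng.choice(len(p), p=overlap / w)
        return x, x
    x = rng.choice(len(p), p=(p - overlap) / (1 - w))
    y = rng.choice(len(q), p=(q - overlap) / (1 - w))
    return x, y

def meeting_time(x, y):
    t = 0
    while x != y:
        x, y = maximal_coupling(P[x], P[y])
        t += 1
    return t

taus = np.array([meeting_time(0, 2) for _ in range(5_000)])
print("estimated bound P(tau > 20):", (taus > 20).mean())
```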
Couplings
Coupling techniques have proved very successful, in some cases giving precise rates of convergence.
Jerrum, Mathematical foundations of the MCMC method, 1998.
Eberle, Reflection couplings and contraction rates for diffusions, PTRF, 2016.
Pillai & Smith, Kac's walk on n-sphere mixes in n log n steps, AoAP, 2017.
Dieuleveut, Durmus & Bach, Bridging the gap between constant step size stochastic gradient descent and Markov chains, AoS, 2020.
Coupling techniques also provide bounds in metrics other than TV:
‖πt − π‖W1 = inf_{γ ∈ Γ(πt, π)} Eγ[d(X, Y)] ≤ E[d(Xt, Yt)].
All of this appears theoretical, since we cannot sample Y0 ∼ π.
Example motivated by Dempster–Shafer inference
Pierre E. Jacob, Ruobin Gong, Paul T. Edlefsen & Arthur P. Dempster, A Gibbs sampler for a class of random convex polytopes, forthcoming discussion paper at JASA.
Consider two categories, and N0 + N1 = N counts, x1, …, xN.
Model: xn = 1(un ≤ θ) for all n, with un ∼ Uniform(0, 1).
The Dempster–Shafer framework asks for
F(u) = {θ ∈ (0, 1) : ∀n, xn = 1(un ≤ θ)}, given F(u) ≠ ∅.
We can work out the exact distribution of F(u), but here we consider a Gibbs sampler which can be generalized to arbitrary numbers of categories.
Example motivated by Dempster–Shafer inference
Denote Ik = {n : xn = k}. Conditionals:
{un : n ∈ I1} | {un : n ∈ I0} ∼ Uniform(0, min_{n∈I0} un),
{un : n ∈ I0} | {un : n ∈ I1} ∼ Uniform(max_{n∈I1} un, 1).
Example with N0 = 2, N1 = 3: [figure omitted].
Donkey walk
We calculate the conditional distributions of Y = max_{n∈I1} un and Z = min_{n∈I0} un, and the Gibbs sampler simplifies to:
Zt = B1(1 − B0)Zt−1 + B0,
where B1 ∼ Beta(N1, 1) and B0 ∼ Beta(1, N0) are independent.
Letac, Donkey walk and Dirichlet distributions, Statistics & Probability Letters, 2002.
Donkey walk
A "common random numbers" coupling,
Zt = B1(1 − B0)Zt−1 + B0,
Z̃t = B1(1 − B0)Z̃t−1 + B0,
leads to
‖πt − π‖W1 ≤ ( N0/(N0 + 1) × N1/(N1 + 1) )^t E[|Z0 − Z̃0|].
By Kantorovich–Rubinstein duality, and considering h : x ↦ ±x, we can obtain a lower bound with the same rate, as was pointed out by Guanyang Wang (Rutgers).
Here we obtain practical guidance on the choice of the number of iterations t to perform; this is not often the case.
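The contraction can be checked numerically; below is a minimal sketch of the common random numbers coupling, with N0, N1, the horizon and the starting points chosen arbitrarily for illustration.

```python
import numpy as np

# Sketch of the common-random-numbers coupling of the donkey walk,
# Z_t = B1 (1 - B0) Z_{t-1} + B0. The contraction of E|Z_t - Z~_t| at
# rate (N0/(N0+1)) * (N1/(N1+1)) per step can be checked empirically.
# N0, N1, t_max and the starting points are illustrative choices.
rng = np.random.default_rng(7)
N0, N1, t_max, reps = 2, 3, 20, 100_000

z = np.zeros(reps)        # chains started at Z_0 = 0
z_tilde = np.ones(reps)   # coupled chains started at Z~_0 = 1
rate = (N0 / (N0 + 1)) * (N1 / (N1 + 1))
for t in range(1, t_max + 1):
    b1 = rng.beta(N1, 1, size=reps)   # shared random numbers
    b0 = rng.beta(1, N0, size=reps)
    z = b1 * (1 - b0) * z + b0
    z_tilde = b1 * (1 - b0) * z_tilde + b0
print("E|Z_t - Z~_t| approx:", np.abs(z - z_tilde).mean())
print("predicted rate^t * |Z_0 - Z~_0|:", rate ** t_max * 1.0)
```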
Example: Conditional Bernoulli
Jeremy Heng, Pierre E. Jacob & Nianqiao Ju, A simple Markov chain for independent Bernoulli variables conditioned on their sum, on arXiv.
Let p = (p1, …, pN) ∈ (0, 1)^N and define wn = pn/(1 − pn), the associated odds.
Let X = (X1, …, XN) ∈ {0, 1}^N such that Xn ∼ Bernoulli(pn), independently.
The conditional distribution of X given Σ_{n=1}^N Xn = S is called "conditional Bernoulli", denoted by CBernoulli(p, S).
Exact sampling costs O(S · N) operations. We assume S ∝ N.
Chen & Liu, Statistical applications of the Poisson-Binomial and conditional Bernoulli distributions, Statistica Sinica, 1997.
Example: Conditional Bernoulli
A Rosenbluth–Hastings transition goes as follows:
independently sample i0 ∈ I0 = {n : xn = 0} and i1 ∈ I1 = {n : xn = 1} uniformly;
construct the proposed state y by the swap i0 ↔ i1;
accept y as the next state with probability min{1, wi0/wi1}.
Chen, Dempster & Liu, Weighted finite population sampling to maximize entropy, Biometrika, 1994.
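As a complement to the description above, here is a minimal sketch of this swap kernel (with arbitrary p and S for illustration; not the authors' code).

```python
import numpy as np

# Sketch of the swap transition for CBernoulli(p, S) described above
# (a Metropolis-style kernel; toy p and S are illustrative choices).
rng = np.random.default_rng(3)
N, S = 10, 4
p = rng.uniform(size=N)
w = p / (1 - p)                      # odds

x = np.zeros(N, dtype=int)
x[:S] = 1                            # any state with sum S works as a start

def swap_step(x):
    i1 = rng.choice(np.flatnonzero(x == 1))   # uniform over ones
    i0 = rng.choice(np.flatnonzero(x == 0))   # uniform over zeros
    if rng.uniform() < min(1.0, w[i0] / w[i1]):
        x[i1], x[i0] = 0, 1                   # accept the swap
    return x

for _ in range(1000):
    x = swap_step(x)
print(x, x.sum())                    # the sum S is preserved
```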
Relevance
Identical success probabilities (pn):
the chain obtained by successive swaps is known as the Bernoulli-Laplace diffusion model;
the chain has been thoroughly studied; if S = N/2, mixing occurs in (N/8) log N iterations (+ cutoff phenomenon).
Diaconis & Shahshahani, Time to reach stationarity in the Bernoulli-Laplace diffusion model, SIAM Journal on Mathematical Analysis, 1987.
Non-identical (pn): arises in various contexts in statistics, and occurred in our research on agent-based models:
Nianqiao Ju, Jeremy Heng & Pierre E. Jacob, Sequential Monte Carlo algorithms for agent-based models of disease transmission, on arXiv.
Assumptions
(Condition on the odds.) The odds (wn) are such that there exist ζ > 0, 0 < l < r < ∞ and η > 0 such that, for all N large enough,
P(|{n ∈ [N] : wn ∉ (l, r)}| ≤ ζN) ≥ 1 − exp(−ηN).
(Condition on S.) There exist 0 < ξ ≤ 1/2 and η′ > 0 such that, for all N large enough,
P(ξN ≤ S) ≥ 1 − exp(−η′N).
We will work under these assumptions, with ζ < ξ.
We also assume S ≤ N/2 without loss of generality.
Convergence rate from couplings
Introduce two chains (x(t)) and (x̃(t)) evolving according to a coupled kernel P̄, with x(0) ∼ π(0) and x̃(0) ∼ π.
Hamming distance: d(x, x̃) = Σ_{n=1}^N 1(xn ≠ x̃n).
Total variation distance:
‖π(t) − π‖TV ≤ E[d(x(t), x̃(t))].
We start from d(0) = d(x(0), x̃(0)) ≤ N.
Contraction:
E[d(t+1) | x(t), x̃(t)] ≤ (1 − cN)d(t)
implies E[d(t)] ≤ (1 − cN)^t N, hence, for any ε ∈ (0, 1),
‖π(t) − π‖TV ≤ ε for all t ≥ log(N/ε) / (− log(1 − cN)).
We want cN to be at least of order N⁻¹.
Convergence rate from couplings
Path coupling argument (Bubley & Dyer, 1997): we can focus on contraction from adjacent states, i.e. d(x, x̃) = 2.
Let x, x̃ ∈ {0, 1}^N be adjacent: they differ at locations a and b. Assume xa = 0, xb = 1, x̃a = 1, x̃b = 0 and wa ≤ wb.
Contraction rate from a maximal coupling strategy:
c(x, x̃) = P(d(x′, x̃′) = 0 | x, x̃)
= [ 1 − wa/wb + Σ_{i1 ∈ I1∩Ĩ1} min(1, wa/wi1) + Σ_{i0 ∈ I0∩Ĩ0} min(1, wi0/wb) ] / ((N − S)S).
Summary of problem and way forward
When pn ∼ Uniform(0, 1), with wa = minn wn and wb = maxn wn, the contraction rate is of order N⁻².
However, by the assumptions, for most pairs of adjacent states wa, wb are of constant order. Starting from these states, chains can meet with probability of order N⁻¹.
Thankfully, chains can move from 'unfavorable' to 'favorable' states quickly enough.
Favorable and unfavorable pairs
We can define ξF→D, ξU→F, ξF→U, ν > 0 and 0 < wlo < whi < ∞ such that, for all N large enough, with probability at least 1 − exp(−νN), the sets defined as
X̄U = {(x, x̃) ∈ X̄adj : wa < wlo and wb > whi},
X̄F = {(x, x̃) ∈ X̄adj : wa ≥ wlo or wb ≤ whi},
X̄D = {(x, x̃) ∈ X² : x = x̃},
satisfy the following statements:
∀(x, x̃) ∈ X̄F, P̄((x, x̃), X̄D) ≥ ξF→D/N,
∀(x, x̃) ∈ X̄U, P̄((x, x̃), X̄F) ≥ ξU→F/N,
∀(x, x̃) ∈ X̄F, P̄((x, x̃), X̄U) ≤ ξF→U/N.
A three-state process specified by pairs of chains
Considering adjacent or identical states, define
B(x, x̃) = 1 if (x, x̃) ∈ X̄U (unfavorable),
           2 if (x, x̃) ∈ X̄F (favorable),
           3 if (x, x̃) ∈ X̄D (x = x̃).
The process B(x(t), x̃(t)) can be coupled with a Markov chain B̃(t) with transition matrix
( 1 − ξU→F/N    ξU→F/N                   0
  ξF→U/N        1 − (ξF→U + ξF→D)/N      ξF→D/N
  0             0                        1 ),
which converges to the absorbing state 3 in O(N) steps.
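To illustrate the O(N) absorption claim, here is a sketch simulating the dominating chain B̃(t); the ξ constants are arbitrary illustrative values, not those of the paper.

```python
import numpy as np

# Sketch: simulate the three-state chain with the transition matrix
# above and check that the absorption time to state 3 grows linearly
# in N. The xi_* constants are arbitrary illustrative values.
rng = np.random.default_rng(5)
xi_UF, xi_FU, xi_FD = 1.0, 0.5, 1.0

def absorption_time(N):
    P = np.array([
        [1 - xi_UF / N, xi_UF / N,               0.0],
        [xi_FU / N,     1 - (xi_FU + xi_FD) / N, xi_FD / N],
        [0.0,           0.0,                     1.0]])
    state, t = 0, 0               # start in state 1 (index 0, unfavorable)
    while state != 2:             # state 3 is index 2
        state = rng.choice(3, p=P[state])
        t += 1
    return t

for N in (50, 100, 200):
    times = [absorption_time(N) for _ in range(500)]
    print(N, np.mean(times))      # roughly proportional to N
```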
Chasing chain and main result
We construct B̃(t) ∈ {1, 2, 3} such that
B̃(t) converges to 3 in O(N) steps,
B̃(t) ≤ B(x(t), x̃(t)) at each time t,
thus {B̃(t) = 3} ⇒ {x(t) = x̃(t)}.
There exist κ > 0, ν > 0 and N0 ∈ N, independent of N, such that for any ε ∈ (0, 1) and all N ≥ N0, with probability at least 1 − exp(−νN),
‖L(x(t)) − CBernoulli(p, S)‖TV ≤ ε for all t ≥ κN log(N/ε).
This Markov chain provides samples at a cheaper cost than exact sampling when N is large: N log N versus N².
The constants in our bounds are not helpful.
Upper bounds using couplings without stationarity
Generate (Xt, Yt) such that
Xt and Yt follow πt,
Xt = Yt−L for t ≥ τ.
Then
‖πt − π‖TV ≤ E[max(0, ⌈(τ − L − t)/L⌉)].
Jacob, O'Leary & Atchadé, Unbiased MCMC with couplings, JRSS B (with discussion), 2020, and Biswas, Jacob & Vanetti, Estimating Convergence of Markov chains with L-Lag Couplings, NeurIPS, 2019.
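In practice, meeting times from repeated coupled runs are plugged into this expectation. A sketch, with synthetic meeting times standing in for real ones (the geometric tail is purely illustrative):

```python
import numpy as np

# Sketch: turn meeting times of L-lag coupled chains into the TV bound
# E[max(0, ceil((tau - L - t) / L))]. Here `taus` stands in for meeting
# times collected from coupled runs (synthetic values for illustration).
rng = np.random.default_rng(11)
L = 5
taus = L + rng.geometric(p=0.05, size=10_000)  # placeholder meeting times

def tv_upper_bound(t, taus, L):
    return np.maximum(0, np.ceil((taus - L - t) / L)).mean()

for t in (0, 25, 50, 100):
    print(t, tv_upper_bound(t, taus, L))  # may exceed 1 for small t
```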
Improved bounds
Define Jt,L = max(0, ⌈(τ − L − t)/L⌉).
Previous bounds: ‖πt − π‖TV ≤ E[Jt,L].
Improved bounds:
‖πt − π‖TV ≤ Σ_{j≥1} min{P(Jt,L ≥ j), P(Jt,L ≤ j)}.
Equation (2.10) in Craiu & Meng, Double Happiness: Enhancing the Coupled Gains of L-lag Coupling via Control Variates, Statistica Sinica, 2021.
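The improved bound can be computed from the same meeting times; here is a sketch implementing the displayed formula verbatim (with the same kind of synthetic τ's as above).

```python
import numpy as np

# Sketch: the improved bound, computed verbatim from the formula above,
# using synthetic meeting times for illustration.
rng = np.random.default_rng(17)
L = 5
taus = L + rng.geometric(p=0.05, size=10_000)  # placeholder meeting times

def improved_bound(t, taus, L):
    J = np.maximum(0, np.ceil((taus - L - t) / L))
    bound, j = 0.0, 1
    while (J >= j).any():                      # terms vanish beyond max(J)
        bound += min((J >= j).mean(), (J <= j).mean())
        j += 1
    return bound

print(improved_bound(25, taus, L))
```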
Couplings of MCMC algorithms
Can we generate a chain (Xt, Yt) such that Xt ∼ πt, Yt ∼ πt, and for all t ≥ τ, Xt = Yt−L?
On the Rosenbluth–Teller–Metropolis–Hastings algorithm:
Valen Johnson, Studying convergence of Markov chain Monte Carlo algorithms using coupled sample paths, JASA, 1996.
John O'Leary, Guanyang Wang & Pierre E. Jacob, Maximal couplings of the Metropolis-Hastings algorithm, oral presentation at AISTATS 2021.
John O'Leary & Guanyang Wang, Transition kernel couplings of the Metropolis-Hastings algorithm, on arXiv.
John O'Leary, Couplings of the Random-Walk Metropolis algorithm, on arXiv.
Example: large-scale Bayesian regression
Niloy Biswas, Anirban Bhattacharya, Pierre E. Jacob & James Johndrow, Coupled Markov chain Monte Carlo for high-dimensional regression with Half-t(ν) priors, on arXiv.
Linear regression setting, n rows, p columns, with p ≫ n.
Y ∼ N(Xβ, σ²In),
σ² ∼ InverseGamma(a0/2, b0/2),
ξ^(−1/2) ∼ Cauchy+,
for j = 1, …, p: βj ∼ N(0, σ²/(ξηj)), ηj^(−1/2) ∼ t(ν)+.
Global precision ξ, local precisions ηj for j = 1, …, p.
Example: large-scale Bayesian regression
Gibbs sampler:
For j = 1, …, p, ηj given β, ξ, σ² can be sampled exactly or by slice sampling.
Given η, we can sample β, ξ, σ²:
ξ given η using an MH step,
σ² given η, ξ from an InverseGamma,
β given η, ξ, σ² from a p-dimensional Normal.
The algorithm has O(n²p) cost per iteration.
The coupling strategy involves maximal couplings and common random numbers, combined in a bespoke way for each update.
Genome-wide association study with n = 2,266 and p = 98,385.
Outcome: average number of days for silk emergence in maize.
Covariates: single nucleotide polymorphisms of maize.
Example: large-scale Bayesian regression
Meeting times of lagged chains, with L = 750.
[Figure: density of the meeting times τ, ranging up to about 600 iterations.]
Example: large-scale Bayesian regression
Meeting times can be turned into upper bounds on the TV distance to stationarity.
[Figure: upper bound on the total variation distance to stationarity as a function of t, for t up to 1,000.]
The equation
Write Ph(x) = ∫ P(x, dx′)h(x′) = E[h(X1) | X0 = x].
A function h̃ in L1(π) is said to be a solution of the Poisson equation associated with h and P if
h̃ − Ph̃ = h − π(h).
For brevity we say that h̃ is fishy.
If Σ_{t≥0} ‖P^t{h − π(h)}‖L1(π) < ∞, then the function
x ↦ Σ_{t=0}^∞ P^t{h − π(h)}(x)
is fishy.
Marie Duflo, Opérateurs potentiels des chaînes et des processus de Markov irréductibles, 1970.
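To see why this series solves the equation (a standard telescoping argument, implicit on the slide): applying I − P term by term gives

```latex
(I - P)\,\tilde h
  = \sum_{t=0}^{\infty} P^{t}\{h - \pi(h)\}
  - \sum_{t=0}^{\infty} P^{t+1}\{h - \pi(h)\}
  = P^{0}\{h - \pi(h)\} = h - \pi(h).
```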
Central limit theorem
Aiming for a CLT for Markov chain ergodic averages, write
Σ_{s=0}^{t−1} {h(Xs) − π(h)} = Σ_{s=1}^{t} {h̃(Xs) − Ph̃(Xs−1)} + h̃(X0) − h̃(Xt).
Spot the martingale.
Then apply the central limit theorem for martingale difference sequences, leading to the asymptotic variance
v(P, h) = Eπ[{h̃(X1) − Ph̃(X0)}²].
Chapter 21 in Douc, Moulines, Priouret & Soulier, Markov chains, 2018.
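The rewriting uses the Poisson equation h − π(h) = h̃ − Ph̃ and a re-indexing of the sum:

```latex
\sum_{s=0}^{t-1}\{h(X_s) - \pi(h)\}
  = \sum_{s=0}^{t-1}\{\tilde h(X_s) - P\tilde h(X_s)\}
  = \sum_{s=1}^{t}\{\tilde h(X_s) - P\tilde h(X_{s-1})\}
    + \tilde h(X_0) - \tilde h(X_t).
```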
Unbiased estimation of fishy functions
Choose an arbitrary y ∈ X. The function
x ↦ h̃(x) = Σ_{t=0}^∞ {P^t h(x) − P^t h(y)}
is fishy, and it lends itself to estimation with coupled Markov chains.
If we set X0 = x, Y0 = y, and generate Xt, Yt such that
Xt | Xt−1 ∼ P(Xt−1, ·), Yt | Yt−1 ∼ P(Yt−1, ·), and Xt = Yt for all t ≥ τ,
then
H̃(x) = Σ_{t=0}^{τ−1} {h(Xt) − h(Yt)}
has expectation equal to h̃(x).
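A minimal sketch of this estimator on a toy 3-state chain (the kernel and test function are assumptions for illustration, not from the talk):

```python
import numpy as np

# Sketch: unbiased estimator H~(x) of the fishy function, using a pair
# of chains started at x and y, coupled so that they meet and then stay
# together. Toy kernel P and test function h are illustrative choices.
rng = np.random.default_rng(13)
P = np.array([[0.9, 0.1, 0.0],
              [0.1, 0.8, 0.1],
              [0.0, 0.1, 0.9]])
h = np.array([0.0, 1.0, 4.0])        # arbitrary test function

def coupled_step(x, y):
    """Maximal coupling of P[x] and P[y]; chains stay equal once met."""
    o = np.minimum(P[x], P[y])
    w = o.sum()
    if rng.uniform() < w:
        z = rng.choice(3, p=o / w)
        return z, z
    return (rng.choice(3, p=(P[x] - o) / (1 - w)),
            rng.choice(3, p=(P[y] - o) / (1 - w)))

def H_tilde(x, y):
    total, (cx, cy) = 0.0, (x, y)
    while cx != cy:                   # sums over t = 0, ..., tau - 1
        total += h[cx] - h[cy]
        cx, cy = coupled_step(cx, cy)
    return total

est = np.mean([H_tilde(0, 2) for _ in range(20_000)])
print("estimate of h~(0) with y = 2:", est)
```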
Unbiased estimation of fishy functions: illustration
Target distribution: π = ½ N(−2, 1) + ½ N(5, (1/2)²).
[Figure: density of π over (−10, 10).]
Test function: h : x ↦ x, with π(h) = 1.5.
[Figure: h(x) = x over (−10, 10).]
Unbiased estimation of fishy functions: illustration
P: Rosenbluth–Hastings with random walk proposals N(x, 2²).
Fishy function, choosing y = 0.
[Figure: estimate of the fishy function h̃(x) for x ∈ (−10, 10).]
Unbiased estimation of the asymptotic variance
We start from
v(P, h) = 2π({h − π(h)}h̃) − π(h²) + π(h)².
We can obtain unbiased signed measure approximations π̂ of π, and we can estimate h̃ unbiasedly, point-wise.
Estimating v(P, h) is an exercise in "nested Monte Carlo".
Emilia Pompe, Maxime Rischard, Pierre E. Jacob & Natesh Pillai, Estimation of nested expectations with couplings (?), forthcoming.
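This identity follows from v(P, h) = Eπ[{h̃(X1) − Ph̃(X0)}²] and the Poisson equation; writing h̄ = h − π(h), a derivation (standard, not displayed on the slides) is:

```latex
v(P,h) = \mathbb{E}_\pi\big[\{\tilde h(X_1) - P\tilde h(X_0)\}^2\big]
       = \pi(\tilde h^2) - \pi\big(\{P\tilde h\}^2\big)
\quad\text{since } \mathbb{E}_\pi\big[\tilde h(X_1)\,P\tilde h(X_0)\big]
       = \pi\big(\{P\tilde h\}^2\big);
\\
\text{then, using } \tilde h = \bar h + P\tilde h,\quad
\pi(\tilde h^2) - \pi\big(\{P\tilde h\}^2\big)
  = \pi(\bar h^2) + 2\pi(\bar h\,P\tilde h)
  = 2\pi(\bar h\,\tilde h) - \pi(\bar h^2)
  = 2\pi(\{h - \pi(h)\}\tilde h) - \pi(h^2) + \pi(h)^2 .
```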
Unbiased estimation of the asymptotic variance
1 Obtain π̂(1) and π̂(2), two independent approximations of π.
2 Write π̂(1)(·) = Σ_{n=1}^N ωn δZn. For r = 1, …, R:
  sample ℓ(r) ∼ (ξ1, …, ξN),
  generate H̃(r) with expectation h̃(Zℓ(r)).
3 Estimate
  2π({h − π(h)}h̃) with 2R⁻¹ Σ_{r=1}^R (ωℓ(r)/ξℓ(r)) (h(Zℓ(r)) − π̂(2)(h)) H̃(r),
  −π(h²) with −{½ π̂(1)(h²) + ½ π̂(2)(h²)},
  +π(h)² with +π̂(1)(h) × π̂(2)(h).
Randal Douc, Pierre E. Jacob, Anthony Lee & Dootika Vats, Estimation of fishy functions with couplings (?), forthcoming.
Unbiased estimation of the asymptotic variance
Numerical results for various choices of R (number of sub-sampled atoms in each run), y = 0, 10⁴ independent repeats for the proposed method.
Cost is measured in number of Markov transitions, and inefficiency is variance × cost.
We compare with asymptotic variance estimators implemented in various R packages, based on 10³ runs of length 5 × 10⁵ with a burn-in of 10³ iterations.
Unbiased estimation of the asymptotic variance

method               v̂(P, h)   σ̂     mean cost   inefficiency
proposed, R = 1      3166       59     5262        1.8e+11
proposed, R = 5      3086       29     5777        4.8e+10
proposed, R = 10     3076       22     6416        3.1e+10
proposed, R = 20     3046       18     7695        2.6e+10
batchmeans::bm       2539       3      500000      5.1e+09
coda::spectrum0      3149       19     500000      1.9e+11
coda::spectrum0ar    3052       3      500000      3.5e+09
mcmc::initseq        3106       6      500000      2.0e+10
mcmcse               3291       13     500000      8.7e+10
Discussion
Some basic questions about MCMC are still largely open.
Theoretical analysis of MCMC progresses rapidly, but still rarely translates into practical guidelines.
Couplings are powerful for theoretical analysis but also (often?) implementable.
One way or another, we will need a way of saving and parallelizing computation. There's work to do!