Seminar at the IEEE Computational Intelligence Society, Singapore Chapter, School of Electrical and Electronic Engineering, NTU, Singapore, 20 February 2019
Unbiased Hamiltonian Monte Carlo
Jeremy Heng
Information Systems, Decision Sciences and Statistics (IDS) Department, ESSEC
Joint work with Pierre Jacob
Department of Statistics, Harvard University
Nanyang Technological University
20 February 2019
Outline
1 MCMC, burn-in bias and parallel computing
2 Couplings of MCMC algorithms
Setting
• Target distribution
$\pi(dx) = \pi(x)\,dx, \quad x \in \mathbb{R}^d$
• For Bayesian inference, the target is the posterior distribution of the parameters $x$ given data $y$:
$\pi(x) = p(x \mid y) \propto \underbrace{p(x)}_{\text{prior}}\,\underbrace{p(y \mid x)}_{\text{likelihood}}$
• Objective: compute the expectation
$\mathbb{E}_\pi[h(X)] = \int_{\mathbb{R}^d} h(x)\,\pi(x)\,dx$
for some test function $h : \mathbb{R}^d \to \mathbb{R}$
• Monte Carlo method: sample $X_0, \ldots, X_T \sim \pi$ and compute
$\frac{1}{T+1}\sum_{t=0}^{T} h(X_t) \to \mathbb{E}_\pi[h(X)] \quad \text{as } T \to \infty$
Markov chain Monte Carlo (MCMC)
• An MCMC algorithm defines a $\pi$-invariant Markov kernel $K$
• Initialize $X_0 \sim \pi_0$ (with $\pi_0 \neq \pi$ in general) and iterate
$X_t \sim K(X_{t-1}, \cdot) \quad \text{for } t = 1, \ldots, T$
• Compute
$\frac{1}{T-b+1}\sum_{t=b}^{T} h(X_t) \to \mathbb{E}_\pi[h(X)] \quad \text{as } T \to \infty,$
where $b \geq 0$ iterations are discarded as burn-in
MCMC trajectory
[Figure: one RWMH trajectory. $\pi = N(0, 1)$, $\pi_0 = N(10, 3^2)$, $K$ = RWMH with proposal std 0.5]
MCMC trajectories
[Figure: several RWMH trajectories, same setting]
MCMC marginal distributions
[Figure: marginal distributions of the chain across iterations, same setting]
Burn-in bias and parallel computing
• Since $\pi_0 \neq \pi$, the bias
$\mathbb{E}\left[\frac{1}{T-b+1}\sum_{t=b}^{T} h(X_t)\right] - \mathbb{E}_\pi[h(X)] \neq 0$
for any fixed $b, T$
• The bias converges to zero only if $b$ is fixed and $T \to \infty$
• Naive parallelization: generate $R$ chains $(X_t^{(r)})_{r=1}^{R}$ and compute
$\frac{1}{R}\sum_{r=1}^{R} \frac{1}{T-b+1}\sum_{t=b}^{T} h(X_t^{(r)})$
• This estimator is not consistent as $R \to \infty$ for fixed $b, T$
• But it is consistent as $T \to \infty$ for fixed $b, R$
Proposed methodology
• Each processor runs two coupled chains $X = (X_t)$ and $Y = (Y_t)$
• Each pair terminates at a random time that depends on their meeting time
• Each pair returns an unbiased estimator $H_{k:m}$ of $\mathbb{E}_\pi[h(X)]$
• Average over $R$ processors: $\frac{1}{R}\sum_{r=1}^{R} H_{k:m}^{(r)} \to \mathbb{E}_\pi[h(X)]$ as $R \to \infty$
• Efficiency depends on the expected compute cost and the variance of $H_{k:m}$
[Figure: parallel processors, each running a pair of coupled MCMC chains]
Debiasing idea (Glynn and Rhee 2014)
• Ergodicity of the Markov chain implies
$\lim_{t \to \infty} \mathbb{E}[h(X_t)] = \mathbb{E}_\pi[h(X)]$
• Writing the limit as a telescoping sum (starting from $k \geq 0$)
$\lim_{t \to \infty} \mathbb{E}[h(X_t)] = \mathbb{E}[h(X_k)] + \sum_{t=k+1}^{\infty} \mathbb{E}[h(X_t) - h(X_{t-1})]$
• If interchanging summation and expectation is valid
$\mathbb{E}\left[h(X_k) + \sum_{t=k+1}^{\infty} \{h(X_t) - h(X_{t-1})\}\right] = \mathbb{E}_\pi[h(X)]$
• If we construct another Markov chain $(Y_t)$ such that
$X_t \stackrel{d}{=} Y_t \quad \text{and} \quad X_t = Y_{t-1} \text{ for } t \geq \tau,$
then, since each $X_{t-1}$ can be replaced by $Y_{t-1}$ without changing the expectations and the terms with $t \geq \tau$ vanish,
$\mathbb{E}\left[h(X_k) + \sum_{t=k+1}^{\tau-1} \{h(X_t) - h(Y_{t-1})\}\right] = \mathbb{E}_\pi[h(X)]$
Unbiased estimators
$H_k(X, Y) = h(X_k) + \sum_{t=k+1}^{\tau-1} \{h(X_t) - h(Y_{t-1})\} \quad \text{for any } k \geq 0,$
with the convention $\sum_{t=k+1}^{\tau-1}\{\cdot\} = 0$ if $\tau - 1 < k + 1$, is an unbiased estimator of $\mathbb{E}_\pi[h(X)]$, with finite variance and finite expected cost
(Glynn and Rhee 2014, Vihola 2017, Jacob et al. 2017), provided:
1 Convergence of the marginal chain:
$\lim_{t \to \infty} \mathbb{E}[h(X_t)] = \mathbb{E}_\pi[h(X)]$ and $\sup_{t \geq 0} \mathbb{E}|h(X_t)|^{2+\delta} < \infty$ for some $\delta > 0$
2 The meeting time $\tau = \inf\{t \geq 1 : X_t = Y_{t-1}\}$ has geometric tails:
$\mathbb{P}(\tau > t) \leq C \rho^t$ for some $C < \infty$, $\rho \in (0, 1)$
3 Faithfulness: $X_t = Y_{t-1}$ for $t \geq \tau$
Unbiased estimators
• For any tuning parameter $k \in \mathbb{N}$,
$H_k(X, Y) = h(X_k) + \sum_{t=k+1}^{\tau-1} \{h(X_t) - h(Y_{t-1})\}$
is unbiased
• The first term $h(X_k)$ is biased; the second term corrects for the bias (and is zero if $k \geq \tau - 1$)
• As $k \to \infty$, $H_k(X, Y) = h(X_k)$ with increasing probability, so $\mathbb{V}[H_k(X, Y)] \to \mathbb{V}_\pi[h(X)]$
• The cost of computing $H_k(X, Y)$ is roughly $2(\tau - 1) + \max(1, k + 1 - \tau)$ applications of $K$
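A minimal Python sketch of $H_k$, assuming the coupled trajectories and the meeting time have already been stored; the function name and array layout are illustrative conventions, not taken from the accompanying R package.

```python
import numpy as np

def H_k(h, X, Y, tau, k):
    """Unbiased estimator H_k(X, Y) = h(X_k) + sum over t in {k+1, ..., tau-1}
    of h(X_t) - h(Y_{t-1}).

    X: states X_0, ..., X_n with n >= max(k, tau - 1)
    Y: states Y_0, ..., Y_{n-1}
    tau: meeting time inf{t >= 1 : X_t = Y_{t-1}}
    """
    est = h(X[k])
    # Bias-correction term; the sum is empty when tau - 1 < k + 1
    for t in range(k + 1, tau):
        est += h(X[t]) - h(Y[t - 1])
    return est
```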
Time-averaged estimators
• Since $H_k(X, Y)$ is unbiased for all $k \geq 0$, the time-averaged estimator
$H_{k:m}(X, Y) = \frac{1}{m-k+1}\sum_{t=k}^{m} H_t(X, Y) \quad \text{for any } k \leq m$
is also unbiased
• Rewrite the estimator as
$\frac{1}{m-k+1}\sum_{t=k}^{m} h(X_t) + \sum_{t=k+1}^{\tau-1} \min\left(1, \frac{t-k}{m-k+1}\right) \{h(X_t) - h(Y_{t-1})\}$
• The first term is a standard MCMC average; the second term is a bias correction (zero if $k \geq \tau - 1$)
[Figure: coupled trajectories $X_t$ and $Y_{t-1}$ and their differences $\Delta_t$ on the state space, plotted against iteration, illustrating $k = 5$, meeting time $\tau = 10$, and $m = 20$]
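The rewritten form translates directly into code; a sketch under the same illustrative conventions as the $H_k$ snippet above.

```python
import numpy as np

def H_km(h, X, Y, tau, k, m):
    """Time-averaged estimator H_{k:m}: MCMC average plus bias correction."""
    mcmc_average = np.mean([h(X[t]) for t in range(k, m + 1)])
    # Correction term; empty when k >= tau - 1
    correction = sum(
        min(1.0, (t - k) / (m - k + 1)) * (h(X[t]) - h(Y[t - 1]))
        for t in range(k + 1, tau)
    )
    return mcmc_average + correction
```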
41. Proposed methodology
• Each processor runs two
coupled chains X = (Xt) and
Y = (Yt)
• Terminates at some random
time which involves their
meeting time
• Returns unbiased estimator
Hk:m of Eπ [h(X)]
• Average over R processors:
1
R
R
r=1 H
(r)
k:m → Eπ [h(X)] as
R → ∞
• Efficiency depends on
expected compute cost and
variance of Hk:m
Parallel MCMC
processors
#
1
Jeremy Heng Unbiased HMC 16/ 48
Efficiency
• Following Glynn and Whitt (1992), the asymptotic inefficiency of $H_{k:m}(X, Y)$ as the compute budget $\to \infty$ is
$\underbrace{\mathbb{E}[2(\tau - 1) + \max(1, m + 1 - \tau)]}_{\text{expected cost}} \times \underbrace{\mathbb{V}[H_{k:m}(X, Y)]}_{\text{variance}}$
• Bias removal leads to variance inflation
• The variance inflation can be mitigated by increasing $k$ and $m$
• As $k \to \infty$, $H_{k:m}(X, Y)$ is a standard MCMC average with increasing probability, so its variance should be similar
• If $\tau \ll k \ll m$, the asymptotic inefficiency is approximately
$m \times \frac{\sigma^2(h)}{m-k+1} \approx \sigma^2(h),$
the asymptotic variance of the marginal chain
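In practice, the Glynn–Whitt product can be estimated from $R$ independent replicates of $(\tau, H_{k:m})$; a sketch with hypothetical input arrays.

```python
import numpy as np

def inefficiency(taus, estimates, m):
    """Empirical asymptotic inefficiency: mean cost times variance of H_{k:m}.

    taus: array of R meeting times
    estimates: array of R independent realizations of H_{k:m}
    """
    costs = 2 * (taus - 1) + np.maximum(1, m + 1 - taus)
    return np.mean(costs) * np.var(estimates)
```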
Coupled chains
• To compute $H_{k:m}(X, Y)$:
1 Initialize $(X_0, Y_0) \sim \bar{\pi}_0$ from a coupling with $\pi_0$ as marginals, i.e. $X_0 \sim \pi_0$ and $Y_0 \sim \pi_0$
2 Sample $X_1 \sim K(X_0, \cdot)$
3 For $t = 1, \ldots, \max(m, \tau)$ sample
$(X_{t+1}, Y_t) \sim \bar{K}((X_t, Y_{t-1}), \cdot)$
from a coupled kernel $\bar{K}$ that admits $K$ as marginals, i.e. $X_{t+1} \sim K(X_t, \cdot)$ and $Y_t \sim K(Y_{t-1}, \cdot)$
• Note that $X_t \stackrel{d}{=} Y_t$ for $t \geq 0$
• Need to design $\bar{K}$ so that $X_\tau = Y_{\tau-1}$ (the chains meet) and $X_t = Y_{t-1}$ for $t \geq \tau$ (they are faithful)
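A sketch of this recursion, with the initial coupling and the two kernels passed in as functions; all names are hypothetical and the loop is capped for safety.

```python
import numpy as np

def coupled_chains(pi0_couple, K, Kbar, m, max_iter=10**5):
    """Run the coupled chains needed for H_{k:m} and record the meeting time.

    pi0_couple(): returns (X_0, Y_0), a coupling with pi_0 as marginals
    K(x):         one step of the marginal kernel
    Kbar(x, y):   one step of the coupled kernel, returns (x', y')
    """
    x0, y0 = pi0_couple()
    X, Y = [x0, K(x0)], [y0]           # X runs one step ahead of Y
    tau = np.inf
    t = 1
    while t <= max(m, tau) and t < max_iter:
        x_next, y_next = Kbar(X[t], Y[t - 1])
        X.append(x_next)
        Y.append(y_next)
        if np.all(x_next == y_next) and not np.isfinite(tau):
            tau = t + 1                # X_{t+1} = Y_t, i.e. tau = t + 1
        t += 1
    return X, Y, tau
```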
Outline
1 MCMC, burn-in bias and parallel computing
2 Couplings of MCMC algorithms
Couplings
• Given distributions $p(x)$ and $q(y)$ on $\mathbb{R}^d$, a coupling $c(x, y)$ is a joint distribution on $\mathbb{R}^d \times \mathbb{R}^d$ such that
$p(x) = \int_{\mathbb{R}^d} c(x, y)\,dy \quad \text{and} \quad q(y) = \int_{\mathbb{R}^d} c(x, y)\,dx$
• $(X, Y) \sim c$ implies $X \sim p$ and $Y \sim q$
• There are infinitely many couplings of $p$ and $q$
• Independent coupling: $X \sim p$ and $Y \sim q$ independently
• Optimal coupling: minimizes $\mathbb{E}|X - Y|^2$
• Maximal coupling: maximizes $\mathbb{P}(X = Y)$
Maximal coupling: algorithm
Sampling $(X, Y)$ from the maximal coupling of $p$ and $q$:
1 Sample $X \sim p$ and $U \sim \mathcal{U}([0, 1])$. If $U \leq q(X)/p(X)$, output $(X, X)$
2 Otherwise, sample $Y \sim q$ and $U' \sim \mathcal{U}([0, 1])$ until $U' > p(Y)/q(Y)$, and output $(X, Y)$
[Figure: densities of $p$ and $q$ with their overlap highlighted]
Thorisson, Coupling, Stationarity, and Regeneration (2000)
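The two steps above transcribe directly into code; a sketch that multiplies the acceptance ratios through to avoid dividing by the densities, with an illustrative example coupling $N(0, 1)$ and $N(1, 1)$.

```python
import numpy as np
from scipy.stats import norm

def maximal_coupling(p_rvs, p_pdf, q_rvs, q_pdf, rng):
    """Sample (X, Y) from the maximal coupling of p and q."""
    X = p_rvs(rng)
    if rng.uniform() * p_pdf(X) <= q_pdf(X):       # U <= q(X)/p(X)
        return X, X                                # drawn from the overlap
    while True:
        Y = q_rvs(rng)
        if rng.uniform() * q_pdf(Y) > p_pdf(Y):    # U > p(Y)/q(Y)
            return X, Y

# Example: maximal coupling of p = N(0, 1) and q = N(1, 1)
rng = np.random.default_rng(1)
pairs = [maximal_coupling(lambda r: r.normal(0, 1), norm(0, 1).pdf,
                          lambda r: r.normal(1, 1), norm(1, 1).pdf, rng)
         for _ in range(10_000)]
# Fraction of meetings approximates 1 - TV(p, q)
print(np.mean([x == y for x, y in pairs]))
```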
Maximal coupling: remarks
• Step 1 samples from the overlap $\min\{p(x), q(x)\}$
• Maximality follows from the coupling inequality:
$\mathbb{P}(X = Y) = \int_{\mathbb{R}^d} \min\{p(x), q(x)\}\,dx = 1 - \mathrm{TV}(p, q)$
• The expected cost does not depend on $p$ and $q$
Metropolis–Hastings (kernel $K$)
At iteration $t - 1$, the Markov chain is at state $X_{t-1}$
1 Propose $X^* \sim q(X_{t-1}, \cdot)$, e.g.
for RWMH, $X^* \sim N(X_{t-1}, \sigma^2 I_d)$;
for MALA, $X^* \sim N(X_{t-1} + \frac{\sigma^2}{2} \nabla \log \pi(X_{t-1}), \sigma^2 I_d)$
2 Sample $U \sim \mathcal{U}([0, 1])$
3 If
$U \leq \min\left(1, \frac{\pi(X^*)\,q(X^*, X_{t-1})}{\pi(X_{t-1})\,q(X_{t-1}, X^*)}\right),$
set $X_t = X^*$, otherwise set $X_t = X_{t-1}$
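For concreteness, a minimal sketch of one RWMH step of $K$; since the Gaussian random-walk proposal is symmetric, the proposal ratio cancels in the acceptance probability.

```python
import numpy as np

def rwmh_step(x, log_pi, sigma, rng):
    """One random-walk Metropolis-Hastings step targeting pi."""
    proposal = x + sigma * rng.standard_normal(x.shape)
    # Symmetric proposal: accept with probability min(1, pi(x*)/pi(x))
    if np.log(rng.uniform()) <= log_pi(proposal) - log_pi(x):
        return proposal
    return x
```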
Coupled Metropolis–Hastings (kernel $\bar{K}$)
At iteration $t - 1$, the two Markov chains are at states $X_{t-1}$ and $Y_{t-1}$
1 Propose $(X^*, Y^*)$ from the maximal coupling of $q(X_{t-1}, \cdot)$ and $q(Y_{t-1}, \cdot)$
2 Sample a common $U \sim \mathcal{U}([0, 1])$
3 If
$U \leq \min\left(1, \frac{\pi(X^*)\,q(X^*, X_{t-1})}{\pi(X_{t-1})\,q(X_{t-1}, X^*)}\right),$
set $X_t = X^*$, otherwise set $X_t = X_{t-1}$
If
$U \leq \min\left(1, \frac{\pi(Y^*)\,q(Y^*, Y_{t-1})}{\pi(Y_{t-1})\,q(Y_{t-1}, Y^*)}\right),$
set $Y_t = Y^*$, otherwise set $Y_t = Y_{t-1}$
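A sketch of one coupled RWMH step combining both ingredients: maximally coupled Gaussian proposals (inlined here, in log form) and a single shared uniform for the two accept/reject decisions. The function name and signature are illustrative.

```python
import numpy as np

def coupled_rwmh_step(x, y, log_pi, sigma, rng):
    """One step of the coupled RWMH kernel; returns (X_t, Y_t)."""
    log_q = lambda z, mean: -0.5 * np.sum((z - mean) ** 2) / sigma ** 2
    # Maximal coupling of the proposals N(x, sigma^2 I) and N(y, sigma^2 I)
    xp = x + sigma * rng.standard_normal(x.shape)
    if np.log(rng.uniform()) + log_q(xp, x) <= log_q(xp, y):
        yp = xp                                    # proposals meet
    else:
        while True:
            yp = y + sigma * rng.standard_normal(y.shape)
            if np.log(rng.uniform()) + log_q(yp, y) > log_q(yp, x):
                break
    # Common uniform couples the two accept/reject decisions
    log_u = np.log(rng.uniform())
    x_new = xp if log_u <= log_pi(xp) - log_pi(x) else x
    y_new = yp if log_u <= log_pi(yp) - log_pi(y) else y
    return x_new, y_new
```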
RWMH on Gaussian target: trajectories
[Figure: coupled RWMH trajectories. $\pi = N(0, 1)$, $\pi_0 = N(10, 3^2)$, $K$ = RWMH with proposal std 0.5]
RWMH on Gaussian target: meetings
[Figure: pairs of coupled chains meeting exactly, same setting]
RWMH on Gaussian target: meeting times
[Figure: histogram of meeting times, roughly supported on 0 to 200 iterations, same setting]
RWMH on Gaussian target: scaling with dimension
$\pi = \pi_0 = N(0, I_d)$, $\bar{K}$ = coupled RWMH with proposal std $C d^{-1/2}$
[Figure: average meeting time against dimension ($d = 2$ to $10$) for $C = 1.0, 1.5, 2.0$; meeting times grow into the thousands]
HMC on Gaussian target: scaling with dimension
$\pi = \pi_0 = N(0, I_d)$, $\bar{K}$ = coupled HMC with step size $C d^{-1/4}$
[Figure: average meeting time against dimension ($d = 2000$ to $10000$) for $C = 1.0, 1.5, 2.0$; meeting times stay between roughly 40 and 55]
Hamiltonian Monte Carlo (HMC)
• Define the potential energy $U(q) = -\log \pi(q)$ and the Hamiltonian $E(q, p) = U(q) + \frac{1}{2}|p|^2$
• Hamiltonian dynamics $(q(t), p(t)) \in \mathbb{R}^d \times \mathbb{R}^d$, for $t \geq 0$:
$\frac{d}{dt} q(t) = \nabla_p E(q(t), p(t)) = p(t)$
$\frac{d}{dt} p(t) = -\nabla_q E(q(t), p(t)) = -\nabla U(q(t))$
• Ideal algorithm defining a $\pi$-invariant $K$: at iteration $t - 1$, the Markov chain is at state $X_{t-1}$
1 Set $q(0) = X_{t-1}$ and sample $p(0) \sim N(0, I_d)$
2 Solve the dynamics over a time length $T$ to get $(q(T), p(T))$
3 Set $X_t = q(T)$
Hamiltonian Monte Carlo (HMC)
• Solving the Hamiltonian dynamics exactly is typically intractable
• Leap-frog integrator:
1 Set $q_0 = X_{t-1}$ and sample $p_0 \sim N(0, I_d)$
2 For $\ell = 0, \ldots, L - 1$, compute
$p_{\ell+1/2} = p_\ell - \frac{\varepsilon}{2} \nabla U(q_\ell)$
$q_{\ell+1} = q_\ell + \varepsilon\, p_{\ell+1/2}$
$p_{\ell+1} = p_{\ell+1/2} - \frac{\varepsilon}{2} \nabla U(q_{\ell+1})$
3 Sample $U \sim \mathcal{U}([0, 1])$
4 If
$U \leq \min\{1, \exp[E(q_0, p_0) - E(q_L, p_L)]\},$
set $X_t = q_L$, otherwise set $X_t = X_{t-1}$
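A sketch of one HMC step built from this leap-frog scheme; `U` and `grad_U` stand for the potential and $\nabla U$, and all names are illustrative.

```python
import numpy as np

def hmc_step(x, U, grad_U, eps, L, rng):
    """One HMC step: leap-frog integration plus Metropolis correction."""
    q, p = x.copy(), rng.standard_normal(x.shape)
    energy0 = U(x) + 0.5 * np.sum(p ** 2)
    for _ in range(L):
        p = p - 0.5 * eps * grad_U(q)   # half step for momentum
        q = q + eps * p                 # full step for position
        p = p - 0.5 * eps * grad_U(q)   # half step for momentum
    energy1 = U(q) + 0.5 * np.sum(p ** 2)
    if np.log(rng.uniform()) <= energy0 - energy1:
        return q
    return x
```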
Coupled Hamiltonian dynamics
• Consider coupling two particles $(q^i(t), p^i(t))$, $i = 1, 2$, each following the Hamiltonian dynamics
• For a Gaussian target $\pi = N(\mu, \sigma^2)$
$q^1(t) - q^2(t) = \cos(t/\sigma)\,(q^1(0) - q^2(0)) + \sigma \sin(t/\sigma)\,(p^1(0) - p^2(0)),$
therefore if $p^1(0) = p^2(0)$ then
$|q^1(t) - q^2(t)| = |\cos(t/\sigma)|\,|q^1(0) - q^2(0)|$
• The difference $\Delta(t) = q^1(t) - q^2(t)$ satisfies
$\frac{1}{2} \frac{d}{dt} |\Delta(t)|^2 = \Delta(t)^T \{p^1(t) - p^2(t)\},$
therefore if $p^1(0) = p^2(0)$ then $t \mapsto |\Delta(t)|^2$ has a stationary point at $t = 0$
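Because the Gaussian case has an explicit flow, the contraction identity can be checked numerically; a small self-contained check with illustrative values.

```python
import numpy as np

# Exact Hamiltonian flow for pi = N(mu, sigma^2):
# q(t) = mu + (q(0) - mu) cos(t/sigma) + sigma p(0) sin(t/sigma)
mu, sigma = 0.0, 1.0
q1, q2, p = 3.0, -2.0, 0.7                 # shared initial momentum
for t in (0.5, 1.0, np.pi / 2):
    flow = lambda q: (mu + (q - mu) * np.cos(t / sigma)
                      + sigma * p * np.sin(t / sigma))
    lhs = abs(flow(q1) - flow(q2))
    rhs = abs(np.cos(t / sigma)) * abs(q1 - q2)
    assert np.isclose(lhs, rhs)            # contraction factor |cos(t/sigma)|
```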
Coupled Hamiltonian dynamics
• To characterize the stationary point:
$\frac{1}{2} \frac{d^2}{dt^2} |\Delta(0)|^2 = -\Delta(0)^T \{\nabla U(q^1(0)) - \nabla U(q^2(0))\} \leq -\alpha |\Delta(0)|^2$
if $q^1(0), q^2(0) \in S$, where $U$ is $\alpha$-strongly convex on $S$
• Since $t = 0$ is a strict local maximum point, there exists $T > 0$ such that for any $t \in (0, T]$
$|q^1(t) - q^2(t)| \leq \rho_t |q^1(0) - q^2(0)|, \quad \rho_t \in [0, 1)$
Logistic regression: distance against integration time
[Figure: distance between coupled chains (roughly 15 to 35) against integration time from 0 to 1]
Coupled Hamiltonian dynamics
• Assuming $\nabla U$ is $\beta$-Lipschitz, we established contraction using a Taylor expansion around $t = 0$ (Lemma 1)
• More quantitative results by Mangoubi and Smith (2017, Theorem 6) and Bou-Rabee et al. (2018, Theorem 2.1) give
$T = \frac{\sqrt{\alpha}}{\beta} \quad \text{and} \quad \rho_t = 1 - \frac{1}{2}\alpha t^2$
• The coupling can be effective in high dimensions if the problem is well-conditioned
Coupled HMC kernel ($\bar{K}_{\varepsilon,L}$)
At iteration $t - 1$, the two Markov chains are at states $X_{t-1}$ and $Y_{t-1}$
1 Set $q_0^1 = X_{t-1}$, $q_0^2 = Y_{t-1}$ and sample a common $p_0 \sim N(0, I_d)$
2 Perform leap-frog integration to obtain $(q_L^i, p_L^i)$, $i = 1, 2$
3 Sample a common $U \sim \mathcal{U}([0, 1])$
4 If
$U \leq \min\{1, \exp[E(q_0^1, p_0) - E(q_L^1, p_L^1)]\},$
set $X_t = q_L^1$, otherwise set $X_t = X_{t-1}$
5 If
$U \leq \min\{1, \exp[E(q_0^2, p_0) - E(q_L^2, p_L^2)]\},$
set $Y_t = q_L^2$, otherwise set $Y_t = Y_{t-1}$
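A sketch of one step of $\bar{K}_{\varepsilon,L}$, reusing the leap-frog scheme from the HMC sketch above: both chains share the initial momentum $p_0$ and the uniform $U$.

```python
import numpy as np

def coupled_hmc_step(x, y, U, grad_U, eps, L, rng):
    """One step of the coupled HMC kernel; returns (X_t, Y_t)."""
    def leapfrog(q, p):
        for _ in range(L):
            p = p - 0.5 * eps * grad_U(q)
            q = q + eps * p
            p = p - 0.5 * eps * grad_U(q)
        return q, p
    p0 = rng.standard_normal(x.shape)      # common initial momentum
    qx, px = leapfrog(x.copy(), p0)
    qy, py = leapfrog(y.copy(), p0)
    log_u = np.log(rng.uniform())          # common uniform
    x_new = qx if log_u <= (U(x) + 0.5 * p0 @ p0) - (U(qx) + 0.5 * px @ px) else x
    y_new = qy if log_u <= (U(y) + 0.5 * p0 @ p0) - (U(qy) + 0.5 * py @ py) else y
    return x_new, y_new
```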
Logistic regression: distance after 1000 iterations
[Figure: distance between coupled chains after 1000 iterations (log scale, $10^{-12}$ to $10^{0}$) against integration time from 0.25 to 1.25, for $L = 10, 20, 30$]
Mixture of coupled kernels (kernel $\bar{K}$)
• To enable exact meetings, we consider, for $\gamma \in (0, 1)$,
$\bar{K} = (1 - \gamma) \underbrace{\bar{K}_{\varepsilon,L}}_{\text{coupled HMC}} + \gamma \underbrace{\bar{K}_\sigma}_{\text{coupled RWMH}}$
• Choice of the RWMH proposal std $\sigma$: distance between chains $< \sigma <$ spread of $\pi$
• We advocate a small RWMH probability $\gamma$ to minimize the inefficiency
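The mixture is straightforward to implement on top of the two coupled kernels sketched earlier; `gamma` plays the role of $\gamma$ and the kernel arguments are hypothetical functions.

```python
import numpy as np

def mixture_step(x, y, coupled_hmc, coupled_rwmh, gamma, rng):
    """One step of Kbar = (1 - gamma) coupled HMC + gamma coupled RWMH."""
    if rng.uniform() < gamma:
        return coupled_rwmh(x, y, rng)   # can trigger an exact meeting
    return coupled_hmc(x, y, rng)        # contracts the distance
```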
Geometric tails of meeting time
• To ensure validity of the unbiased estimators, we need:
1 Convergence of the marginal chain (inherited from HMC)
2 Meeting time with geometric tails (Theorem 2)
3 Faithfulness (by construction)
• Main assumptions:
1 $\nabla U$ is globally Lipschitz
2 $U$ is strongly convex on $S \subset \mathbb{R}^d$
3 A geometric drift condition on the HMC kernel
• (Theorem 2) The meeting time has geometric tails if $(\varepsilon, L, \sigma, \gamma)$ are small enough
• The assumptions can be verified for Gaussian targets and Bayesian logistic regression, relying on Durmus et al. (2017, Theorem 9)
Concluding remarks
• Bou-Rabee et al. (2018) introduced another coupling for HMC
• One could combine the synchronous coupling ($L = 1$) with the maximal coupling for MALA
• Extension to other variants of HMC:
1 No-U-Turn Sampler (Hoffman and Gelman, 2014)
2 Partial momentum refreshment (Horowitz, 1991)
3 Different choices of kinetic energy (Livingstone et al., 2017)
4 Hamiltonian bouncy particle sampler (Vanetti et al., 2017)
References
• J. Heng and P. Jacob. Unbiased Hamiltonian Monte Carlo with couplings. Biometrika (to appear), arXiv:1709.00404, 2019.
• R package: https://github.com/pierrejacob/debiasedhmc
• P. Jacob, J. O'Leary, Y. Atchadé. Unbiased Markov chain Monte Carlo with couplings. arXiv:1708.03625, 2017.
• P. Jacob, F. Lindsten, T. Schön. Smoothing with couplings of conditional particle filters. JASA, 2018.
• L. Middleton, G. Deligiannidis, A. Doucet, P. Jacob. Unbiased Markov chain Monte Carlo for intractable target distributions. arXiv:1807.08691, 2018.