Unbiased Hamiltonian Monte Carlo
Jeremy Heng
Information Systems, Decision Sciences and Statistics (IDS) Department, ESSEC
Joint work with Pierre Jacob, Department of Statistics, Harvard University
Nanyang Technological University, 20 February 2019
Outline
1 MCMC, burn-in bias and parallel computing
2 Couplings of MCMC algorithms
Setting
• Target distribution
  π(dx) = π(x) dx, x ∈ R^d
• For Bayesian inference, the target is the posterior distribution of parameters x given data y:
  π(x) = p(x|y) ∝ p(x) p(y|x), i.e. prior × likelihood
• Objective: compute the expectation
  E_π[h(X)] = ∫_{R^d} h(x) π(x) dx
  for some test function h : R^d → R
• Monte Carlo method: sample X_0, ..., X_T ∼ π and compute
  (1/(T + 1)) Σ_{t=0}^{T} h(X_t) → E_π[h(X)] as T → ∞
Markov chain Monte Carlo (MCMC)
• An MCMC algorithm defines a π-invariant Markov kernel K
• Initialize X_0 ∼ π_0 ≠ π and iterate
  X_t ∼ K(X_{t−1}, ·) for t = 1, ..., T
• Compute
  (1/(T − b + 1)) Σ_{t=b}^{T} h(X_t) → E_π[h(X)] as T → ∞,
  where b ≥ 0 iterations are discarded as burn-in
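As a concrete illustration of this recipe, here is a minimal sketch (ours, not code from the talk) that runs RWMH on the deck's running example, π = N(0, 1) started from π_0 = N(10, 3²), and computes a burn-in-discarded ergodic average; the function names are our own.

```python
import numpy as np

def log_pi(x):
    return -0.5 * x**2  # log-density of pi = N(0, 1), up to an additive constant

def rwmh_chain(T, sigma=0.5, rng=np.random.default_rng(1)):
    x = rng.normal(10.0, 3.0)  # X_0 ~ pi_0 = N(10, 3^2): the chain starts far from pi
    chain = [x]
    for _ in range(T):
        prop = x + sigma * rng.normal()  # propose X' ~ N(X_{t-1}, sigma^2)
        if np.log(rng.uniform()) < log_pi(prop) - log_pi(x):
            x = prop  # accept; otherwise keep the current state
        chain.append(x)
    return np.array(chain)

chain = rwmh_chain(T=10_000)
b = 1_000  # burn-in: for any fixed b and T the average below remains biased
print(chain[b:].mean())  # estimate of E_pi[h(X)] for h(x) = x
```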
MCMC trajectory
π = N(0, 1), π_0 = N(10, 3²), K = RWMH with proposal std 0.5
[Figure: one trajectory]

MCMC trajectories
π = N(0, 1), π_0 = N(10, 3²), K = RWMH with proposal std 0.5
[Figure: several trajectories]

MCMC marginal distributions
π = N(0, 1), π_0 = N(10, 3²), K = RWMH with proposal std 0.5
[Figure: marginal distributions across iterations]
Burn-in bias and parallel computing
• Since π_0 ≠ π, the bias
  E[(1/(T − b + 1)) Σ_{t=b}^{T} h(X_t)] − E_π[h(X)] ≠ 0
  for any fixed b, T
• The bias converges to zero only if b is fixed and T → ∞
• Naive parallelization: generate R chains (X_t^{(r)})_{r=1}^{R} and compute
  (1/R) Σ_{r=1}^{R} (1/(T − b + 1)) Σ_{t=b}^{T} h(X_t^{(r)})
• This estimator is not consistent as R → ∞ for fixed b, T
• But it is consistent as T → ∞ for fixed b, R
Proposed methodology
• Each processor runs two coupled chains X = (X_t) and Y = (Y_t)
• Each terminates at some random time which involves their meeting time
• Each returns an unbiased estimator H_{k:m} of E_π[h(X)]
• Average over R processors: (1/R) Σ_{r=1}^{R} H_{k:m}^{(r)} → E_π[h(X)] as R → ∞
• Efficiency depends on the expected compute cost and the variance of H_{k:m}
[Figure: parallel MCMC processors]
Debiasing idea (Glynn and Rhee 2014)
• Ergodicity of the Markov chain implies
  lim_{t→∞} E[h(X_t)] = E_π[h(X)]
• Writing the limit as a telescoping sum (starting from k ≥ 0):
  lim_{t→∞} E[h(X_t)] = E[h(X_k)] + Σ_{t=k+1}^{∞} E[h(X_t) − h(X_{t−1})]
• If interchanging summation and expectation is valid,
  E[h(X_k) + Σ_{t=k+1}^{∞} {h(X_t) − h(X_{t−1})}] = E_π[h(X)]
• If we construct another Markov chain (Y_t) such that X_t =_d Y_t (equality in distribution) and X_t = Y_{t−1} for t ≥ τ, then
  E[h(X_k) + Σ_{t=k+1}^{τ−1} {h(X_t) − h(Y_{t−1})}] = E_π[h(X)]
Unbiased estimators

H_k(X, Y) = h(X_k) + Σ_{t=k+1}^{τ−1} {h(X_t) − h(Y_{t−1})} for any k ≥ 0,

with the convention Σ_{t=k+1}^{τ−1} {·} = 0 if τ − 1 < k + 1, is an unbiased estimator of E_π[h(X)], with finite variance and expected cost, under the following conditions (Glynn and Rhee 2014, Vihola 2017, Jacob et al. 2017):
1 Convergence of the marginal chain:
  lim_{t→∞} E[h(X_t)] = E_π[h(X)] and sup_{t≥0} E|h(X_t)|^{2+δ} < ∞ for some δ > 0
2 The meeting time τ = inf{t ≥ 1 : X_t = Y_{t−1}} has geometric tails:
  P(τ > t) ≤ Cρ^t for some C < ∞ and ρ ∈ (0, 1)
3 Faithfulness: X_t = Y_{t−1} for t ≥ τ
Unbiased estimators
• For any tuning parameter k ∈ N,
  H_k(X, Y) = h(X_k) + Σ_{t=k+1}^{τ−1} {h(X_t) − h(Y_{t−1})}
  is unbiased
• The first term h(X_k) is biased; the second term corrects for the bias (and is zero if k ≥ τ − 1)
• As k → ∞, H_k(X, Y) = h(X_k) with increasing probability, so V[H_k(X, Y)] → V_π[h(X)]
• The cost of computing H_k(X, Y) is roughly 2(τ − 1) + max(1, k + 1 − τ) applications of K
Time-averaged estimators
• Since H_k(X, Y) is unbiased for all k ≥ 0, the time-averaged estimator
  H_{k:m}(X, Y) = (1/(m − k + 1)) Σ_{t=k}^{m} H_t(X, Y) for any k ≤ m
  is also unbiased
• Rewrite the estimator as
  (1/(m − k + 1)) Σ_{t=k}^{m} h(X_t) + Σ_{t=k+1}^{τ−1} min{1, (t − k)/(m − k + 1)} {h(X_t) − h(Y_{t−1})}
• The first term is a standard MCMC average; the second term is the bias correction (zero if k ≥ τ − 1)
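A minimal sketch (ours) of this rewritten estimator, taking as inputs the evaluations h(X_t), h(Y_t) along the coupled trajectories and the meeting time τ; note that H_k is recovered as the special case m = k.

```python
import numpy as np

def H_km(hX, hY, tau, k, m):
    """Time-averaged unbiased estimator H_{k:m}.
    hX[t] = h(X_t) and hY[t] = h(Y_t), recorded for t = 0, ..., max(m, tau).
    """
    mcmc_avg = np.mean(hX[k:m + 1])  # standard MCMC average over t = k..m
    correction = 0.0
    for t in range(k + 1, tau):      # empty sum whenever tau - 1 < k + 1
        weight = min(1.0, (t - k) / (m - k + 1))
        correction += weight * (hX[t] - hY[t - 1])
    return mcmc_avg + correction
```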
Time-averaged estimators
(1/(m − k + 1)) Σ_{t=k}^{m} h(X_t) + Σ_{t=k+1}^{τ−1} min{1, (t − k)/(m − k + 1)} {h(X_t) − h(Y_{t−1})}
[Figure: trajectories of X_t and Y_{t−1} on the state space, with differences ∆_t, for k = 5, τ = 10, m = 20]
Efficiency
• Following Glynn and Whitt (1992), the asymptotic inefficiency of H_{k:m}(X, Y) as the compute budget → ∞ is
  E[2(τ − 1) + max(1, m + 1 − τ)] (expected cost) × V[H_{k:m}(X, Y)] (variance)
• Bias removal leads to variance inflation
• The variance inflation can be mitigated by increasing k and m
• As k → ∞, H_{k:m}(X, Y) is a standard MCMC average with increasing probability, so its variance should be similar
• If τ ≪ k ≪ m, the asymptotic inefficiency is approximately
  m × σ²(h)/(m − k + 1) ≈ σ²(h),
  the asymptotic variance of the marginal chain
Coupled chains
• To compute H_{k:m}(X, Y) (a code sketch follows this slide):
1 Initialize (X_0, Y_0) ∼ π̄_0, a coupling with π_0 as marginals, i.e. X_0 ∼ π_0 and Y_0 ∼ π_0
2 Sample X_1 ∼ K(X_0, ·)
3 For t = 1, ..., max(m, τ), sample
  (X_{t+1}, Y_t) ∼ K̄((X_t, Y_{t−1}), ·)
  from a coupled kernel K̄ that admits K as marginals, i.e. X_{t+1} ∼ K(X_t, ·) and Y_t ∼ K(Y_{t−1}, ·)
• Note that X_t =_d Y_t for t ≥ 0
• Need to design K̄ so that X_τ = Y_{τ−1} (the chains meet) and X_t = Y_{t−1} for t ≥ τ (they are faithful)
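A minimal sketch (ours) of this coupled-chain recipe; `sample_pi0`, `K`, and `Kbar` are placeholder callables for the initial distribution, the marginal kernel, and the coupled kernel, and the initial coupling shown is simply independent.

```python
import numpy as np

def sample_coupled_chains(sample_pi0, K, Kbar, m, rng, max_iter=10**5):
    X = [sample_pi0(rng)]   # X_0 ~ pi_0
    Y = [sample_pi0(rng)]   # Y_0 ~ pi_0 (independent coupling of pi_0)
    X.append(K(X[0], rng))  # X_1 ~ K(X_0, .): the X chain runs one step ahead of Y
    tau, t = None, 1
    while tau is None or t <= max(m, tau):
        x_next, y_next = Kbar(X[t], Y[t - 1], rng)  # (X_{t+1}, Y_t) ~ Kbar
        X.append(x_next)
        Y.append(y_next)
        if tau is None and np.array_equal(x_next, y_next):
            tau = t + 1  # X_{t+1} = Y_t: the chains have met at time t + 1
        t += 1
        if t > max_iter:
            raise RuntimeError("chains did not meet within max_iter iterations")
    return X, Y, tau
```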
Outline
1 MCMC, burn-in bias and parallel computing
2 Couplings of MCMC algorithms
Couplings
• Given distributions p(x) and q(y) on R^d, a coupling c(x, y) is a joint distribution on R^d × R^d such that
  p(x) = ∫_{R^d} c(x, y) dy and q(y) = ∫_{R^d} c(x, y) dx
• (X, Y) ∼ c implies X ∼ p and Y ∼ q
• There are infinitely many couplings of p and q
• Independent coupling: X ∼ p and Y ∼ q independently
• Optimal coupling: minimizes E|X − Y|²
• Maximal coupling: maximizes P(X = Y)
Independent coupling of Gamma and Gaussian
[Figure]

Maximal coupling of Gamma and Gaussian
[Figure]
Maximal coupling: algorithm
Sampling (X, Y) from the maximal coupling of p and q:
1 Sample X ∼ p and U ∼ U([0, 1]). If U ≤ q(X)/p(X), output (X, X)
2 Otherwise, sample Y ∼ q and U ∼ U([0, 1]) until U > p(Y)/q(Y), and output (X, Y)
[Figure: densities of p and q]
Thorisson, Coupling, Stationarity, and Regeneration (2000)
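A minimal sketch (ours) of this sampler. One subtlety worth a comment: unlike a Metropolis–Hastings ratio, the ratios q(X)/p(X) and p(Y)/q(Y) compare actual densities, so logp and logq must include their normalizing constants.

```python
import numpy as np

def maximal_coupling(sample_p, logp, sample_q, logq, rng):
    X = sample_p(rng)
    if np.log(rng.uniform()) <= logq(X) - logp(X):
        return X, X   # step 1: X falls in the overlap min{p, q}, so set Y = X
    while True:       # step 2: rejection-sample Y from the residual of q
        Y = sample_q(rng)
        if np.log(rng.uniform()) > logp(Y) - logq(Y):
            return X, Y

# Usage: maximal coupling of p = N(0, 1) and q = N(2, 1); full log-densities
logp = lambda x: -0.5 * x**2 - 0.5 * np.log(2 * np.pi)
logq = lambda x: -0.5 * (x - 2.0)**2 - 0.5 * np.log(2 * np.pi)
rng = np.random.default_rng(0)
X, Y = maximal_coupling(lambda r: r.normal(0.0, 1.0), logp,
                        lambda r: r.normal(2.0, 1.0), logq, rng)
```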
Maximal coupling: algorithm
Remarks:
• Step 1 samples from the overlap min{p(x), q(x)}
• Maximality follows from the coupling inequality:
  P(X = Y) = ∫_{R^d} min{p(x), q(x)} dx = 1 − TV(p, q)
• The expected cost does not depend on p and q
[Figure: densities of p and q with their overlap min{p, q}]
Metropolis–Hastings (kernel K)
At iteration t − 1, with the Markov chain at state X_{t−1}:
1 Propose X′ ∼ q(X_{t−1}, ·), e.g.
  for RWMH: X′ ∼ N(X_{t−1}, σ²I_d),
  for MALA: X′ ∼ N(X_{t−1} + (σ²/2) ∇log π(X_{t−1}), σ²I_d)
2 Sample U ∼ U([0, 1])
3 If
  U ≤ min{1, [π(X′) q(X′, X_{t−1})] / [π(X_{t−1}) q(X_{t−1}, X′)]},
  set X_t = X′, otherwise set X_t = X_{t−1}
Coupled Metropolis–Hastings (kernel K̄)
At iteration t − 1, with the two Markov chains at states X_{t−1} and Y_{t−1}:
1 Propose (X′, Y′) from the maximal coupling of q(X_{t−1}, ·) and q(Y_{t−1}, ·)
2 Sample U ∼ U([0, 1])
3 If
  U ≤ min{1, [π(X′) q(X′, X_{t−1})] / [π(X_{t−1}) q(X_{t−1}, X′)]},
  set X_t = X′, otherwise set X_t = X_{t−1}
4 If
  U ≤ min{1, [π(Y′) q(Y′, Y_{t−1})] / [π(Y_{t−1}) q(Y_{t−1}, Y′)]},
  set Y_t = Y′, otherwise set Y_t = Y_{t−1}
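A minimal sketch (ours) for the random-walk case, reusing `maximal_coupling` from the earlier sketch; the proposal is symmetric, so the Hastings ratio reduces to π(X′)/π(X_{t−1}), and log_pi may be unnormalized.

```python
import numpy as np

def coupled_rwmh_step(x, y, log_pi, sigma, rng):
    d = x.shape[0]
    log_norm = -0.5 * d * np.log(2 * np.pi * sigma**2)
    # 1. propose from the maximal coupling of N(x, sigma^2 I) and N(y, sigma^2 I)
    xp, yp = maximal_coupling(
        lambda r: x + sigma * r.standard_normal(d),
        lambda z: -0.5 * np.sum((z - x)**2) / sigma**2 + log_norm,
        lambda r: y + sigma * r.standard_normal(d),
        lambda z: -0.5 * np.sum((z - y)**2) / sigma**2 + log_norm,
        rng)
    # 2.-3. one common uniform decides both accept/reject steps
    log_u = np.log(rng.uniform())
    x_new = xp if log_u <= log_pi(xp) - log_pi(x) else x
    y_new = yp if log_u <= log_pi(yp) - log_pi(y) else y
    return x_new, y_new
```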
RWMH on Gaussian target: trajectories
π = N(0, 1), π_0 = N(10, 3²), K = RWMH with proposal std 0.5
[Figure: coupled trajectories]

RWMH on Gaussian target: meetings
π = N(0, 1), π_0 = N(10, 3²), K = RWMH with proposal std 0.5
[Figure: pairs of chains meeting]

RWMH on Gaussian target: meeting times
π = N(0, 1), π_0 = N(10, 3²), K = RWMH with proposal std 0.5
[Figure: density of meeting times, over roughly 0 to 200 iterations]
RWMH on Gaussian target: scaling with dimension
π = π_0 = N(0, I_d), K̄ = coupled RWMH with proposal std C d^{−1/2}
[Figure: average meeting time against dimension (2 to 10), for C = 1.0, 1.5, 2.0]
HMC on Gaussian target: scaling with dimension
π = π_0 = N(0, I_d), K̄ = coupled HMC with step size C d^{−1/4}
[Figure: average meeting time against dimension (up to 10,000), for C = 1.0, 1.5, 2.0]
Hamiltonian Monte Carlo (HMC)
• Define the potential energy U(q) = − log π(q) and the Hamiltonian E(q, p) = U(q) + (1/2)|p|²
• Hamiltonian dynamics on (q(t), p(t)) ∈ R^d × R^d, for t ≥ 0:
  dq(t)/dt = ∇_p E(q(t), p(t)) = p(t)
  dp(t)/dt = −∇_q E(q(t), p(t)) = −∇U(q(t))
• Ideal algorithm defining a π-invariant K: at iteration t − 1, with the Markov chain at state X_{t−1},
1 Set q(0) = X_{t−1} and sample p(0) ∼ N(0, I_d)
2 Solve the dynamics over a time length T to get (q(T), p(T))
3 Set X_t = q(T)
Hamiltonian Monte Carlo (HMC)
• Solving the Hamiltonian dynamics exactly is typically intractable
• Leap-frog integrator:
1 Set q_0 = X_{t−1} and sample p_0 ∼ N(0, I_d)
2 For ℓ = 0, ..., L − 1, compute
  p_{ℓ+1/2} = p_ℓ − (ε/2) ∇U(q_ℓ)
  q_{ℓ+1} = q_ℓ + ε p_{ℓ+1/2}
  p_{ℓ+1} = p_{ℓ+1/2} − (ε/2) ∇U(q_{ℓ+1})
3 Sample U ∼ U([0, 1])
4 If
  U ≤ min{1, exp[E(q_0, p_0) − E(q_L, p_L)]},
  set X_t = q_L, otherwise set X_t = X_{t−1}
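A minimal sketch (ours) of this integrator and the resulting HMC kernel; `grad_U` stands for ∇U.

```python
import numpy as np

def leapfrog(q, p, grad_U, eps, L):
    for _ in range(L):
        p = p - 0.5 * eps * grad_U(q)  # half step for momentum
        q = q + eps * p                # full step for position
        p = p - 0.5 * eps * grad_U(q)  # half step for momentum
    return q, p

def hmc_step(x, U, grad_U, eps, L, rng):
    p0 = rng.standard_normal(x.shape[0])  # momentum refreshment p_0 ~ N(0, I_d)
    q, p = leapfrog(x, p0, grad_U, eps, L)
    E0 = U(x) + 0.5 * np.dot(p0, p0)      # Hamiltonian at (q_0, p_0)
    EL = U(q) + 0.5 * np.dot(p, p)        # Hamiltonian at (q_L, p_L)
    return q if np.log(rng.uniform()) <= E0 - EL else x
```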
Coupled Hamiltonian dynamics
• Consider coupling two particles (q^i(t), p^i(t)), i = 1, 2, each following the Hamiltonian dynamics
• For a Gaussian target π = N(µ, σ²),
  q¹(t) − q²(t) = cos(t/σ) (q¹(0) − q²(0)) + σ sin(t/σ) (p¹(0) − p²(0)),
  therefore if p¹(0) = p²(0) then
  |q¹(t) − q²(t)| = |cos(t/σ)| |q¹(0) − q²(0)|
• The difference ∆(t) = q¹(t) − q²(t) satisfies
  (1/2) d|∆(t)|²/dt = ∆(t)ᵀ {p¹(t) − p²(t)},
  therefore if p¹(0) = p²(0) then t ↦ |∆(t)|² has a stationary point at t = 0
Coupled Hamiltonian dynamics
• To characterize the stationary point:
  (1/2) d²|∆(0)|²/dt² = −∆(0)ᵀ {∇U(q¹(0)) − ∇U(q²(0))} ≤ −α|∆(0)|²
  if q¹(0), q²(0) ∈ S, where U is α-strongly convex
• Since t = 0 is a strict local maximum point, there exists T > 0 such that for any t ∈ (0, T]
  |q¹(t) − q²(t)| ≤ ρ_t |q¹(0) − q²(0)|, with ρ_t ∈ [0, 1)
Logistic regression: distance against integration time
[Figure: distance between coupled chains against integration time]
Coupled Hamiltonian dynamics
• Assuming ∇U is β-Lipschitz, we established contraction using a Taylor expansion around t = 0 (Lemma 1)
• More quantitative results by Mangoubi and Smith (2017, Theorem 6) and Bou-Rabee et al. (2018, Theorem 2.1) give
  T = √α/β and ρ_t = 1 − (1/2)αt²
• Coupling can be effective in high dimensions if the problem is well-conditioned
Coupled HMC kernel (K̄_{ε,L})
At iteration t − 1, with the two Markov chains at states X_{t−1} and Y_{t−1}:
1 Set q¹_0 = X_{t−1}, q²_0 = Y_{t−1} and sample a common momentum p_0 ∼ N(0, I_d)
2 Perform leap-frog integration to obtain (q^i_L, p^i_L), i = 1, 2
3 Sample U ∼ U([0, 1])
4 If
  U ≤ min{1, exp[E(q¹_0, p_0) − E(q¹_L, p¹_L)]},
  set X_t = q¹_L, otherwise set X_t = X_{t−1}
5 If
  U ≤ min{1, exp[E(q²_0, p_0) − E(q²_L, p²_L)]},
  set Y_t = q²_L, otherwise set Y_t = Y_{t−1}
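A minimal sketch (ours), reusing `leapfrog` from the HMC sketch; the shared momentum p_0 and shared uniform make the two marginal HMC kernels move in lockstep.

```python
import numpy as np

def coupled_hmc_step(x, y, U, grad_U, eps, L, rng):
    p0 = rng.standard_normal(x.shape[0])  # common momentum for both chains
    qx, px = leapfrog(x, p0, grad_U, eps, L)
    qy, py = leapfrog(y, p0, grad_U, eps, L)
    log_u = np.log(rng.uniform())         # common uniform for both accept steps
    K0 = 0.5 * np.dot(p0, p0)
    x_new = qx if log_u <= (U(x) + K0) - (U(qx) + 0.5 * np.dot(px, px)) else x
    y_new = qy if log_u <= (U(y) + K0) - (U(qy) + 0.5 * np.dot(py, py)) else y
    return x_new, y_new
```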
Coupled HMC chains
[Figure: two coupled HMC trajectories in the (x1, x2) plane]
Logistic regression: distance after 1000 iterations
[Figure: distance after 1000 iterations (log scale) against integration time, for L = 10, 20, 30]
Mixture of coupled kernels (kernel K̄)
• To enable exact meetings, we consider, for γ ∈ (0, 1),
  K̄ = (1 − γ) K̄_{ε,L} (coupled HMC) + γ K̄_σ (coupled RWMH)
• Choice of the RWMH proposal std σ:
  distance between chains < σ < spread of π
• We advocate a small RWMH probability γ to minimize the inefficiency
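A minimal sketch (ours) of this mixture, reusing `coupled_rwmh_step` and `coupled_hmc_step` from the earlier sketches: the HMC component contracts the distance between the chains, and the RWMH component, through its maximal coupling, lets them meet exactly.

```python
def mixture_step(x, y, log_pi, U, grad_U, eps, L, sigma, gamma, rng):
    if rng.uniform() < gamma:  # with probability gamma: coupled RWMH
        return coupled_rwmh_step(x, y, log_pi, sigma, rng)
    return coupled_hmc_step(x, y, U, grad_U, eps, L, rng)
```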
Geometric tails of meeting time
• To ensure the validity of the unbiased estimators:
1 Convergence of the marginal chain (inherited from HMC)
2 Meeting time has geometric tails (Theorem 2)
3 Faithfulness (by construction)
• Main assumptions:
1 ∇U is globally Lipschitz
2 U is strongly convex on S ⊂ R^d
3 Geometric drift condition on the HMC kernel
• (Theorem 2) The meeting time has geometric tails if (ε, L, σ, γ) are small enough
• The assumptions can be verified for Gaussian targets and Bayesian logistic regression, relying on Durmus et al. (2017, Theorem 9)
Sensitivity of RWMH proposal std σ
Logistic regression (left), Cox process (right)
[Figure: meeting time against σ, ranging from 10⁻⁶ to 10⁻²]
Sensitivity of RWMH probability γ
Logistic regression (left), Cox process (right)
[Figure: meeting time against γ]
Cox process: effect of dimension and algorithm
• Better algorithms yield smaller meeting times
• The proposed methodology cannot work if the marginal chain fails to mix
• Writing π_t = π_0 K^t, we have
  TV(π_t, π) ≤ min{1, E[max(0, τ − t + 1)]}
[Figure: meeting times for RHMC and HMC against dimension (256, 1024, 4096)]
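As a small aside (ours), this bound is directly estimable: given meeting times τ⁽¹⁾, ..., τ⁽ᴿ⁾ from R independent coupled runs, a Monte Carlo estimate of the upper bound at iteration t is

```python
import numpy as np

def tv_upper_bound(taus, t):
    # estimates min{1, E[max(0, tau - t + 1)]} from sampled meeting times
    taus = np.asarray(taus)
    return min(1.0, np.mean(np.maximum(0, taus - t + 1)))
```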
Logistic regression: impact of k and m

k                  m     Cost   Relative inefficiency
1                  k     436    1989.07
1                  5k    436    1671.93
1                  10k   436    1403.28
90% quantile(τ)    k     553    38.11
90% quantile(τ)    5k    1868   1.23
90% quantile(τ)    10k   3518   1.05

Relative inefficiency = asymptotic inefficiency / asymptotic variance of optimal HMC;
as k, m → ∞, this tends to (asymptotic variance of marginal HMC) / (asymptotic variance of optimal HMC)
Concluding remarks
Bou-Rabee et al. (2018) introduced another coupling for HMC
Could combine synchronous coupling (L = 1) and maximal coupling for MALA
Extension to other variants of HMC:
1 No-U-Turn Sampler (Hoffman and Gelman, 2014)
2 Partial momentum refreshment (Horowitz, 1991)
3 Different choices of kinetic energy (Livingstone et al., 2017)
4 Hamiltonian bouncy particle sampler (Vanetti et al., 2017)
References
J. Heng and P. Jacob. Unbiased Hamiltonian Monte Carlo with couplings. Biometrika (to appear), arXiv:1709.00404, 2019.
R package: https://github.com/pierrejacob/debiasedhmc
P. Jacob, J. O'Leary, and Y. Atchadé. Unbiased Markov chain Monte Carlo with couplings. arXiv:1708.03625, 2017.
P. Jacob, F. Lindsten, and T. Schön. Smoothing with couplings of conditional particle filters. JASA, 2018.
L. Middleton, G. Deligiannidis, A. Doucet, and P. Jacob. Unbiased Markov chain Monte Carlo for intractable target distributions. arXiv:1807.08691, 2018.

BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
 
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signals
 
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts ServiceCall Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptx
 
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
 
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night StandCall Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
 
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
 
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptx
 
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and Milvus
 
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 

• 18. Burn-in bias and parallel computing (continued)
• But the naive parallel estimator is consistent as T → ∞ for fixed b, R
• 19–24. Proposed methodology
• Each processor runs two coupled chains X = (X_t) and Y = (Y_t)
• Terminates at some random time which involves their meeting time
• Returns an unbiased estimator H_{k:m} of E_π[h(X)]
• Average over R processors: (1/R) ∑_{r=1}^{R} H^{(r)}_{k:m} → E_π[h(X)] as R → ∞
• Efficiency depends on the expected compute cost and the variance of H_{k:m}
[Figure: parallel MCMC processors]
• 25–28. Debiasing idea (Glynn and Rhee 2014)
• Ergodicity of the Markov chain implies lim_{t→∞} E[h(X_t)] = E_π[h(X)]
• Writing the limit as a telescoping sum (starting from k ≥ 0):
  lim_{t→∞} E[h(X_t)] = E[h(X_k)] + ∑_{t=k+1}^{∞} E[h(X_t) − h(X_{t−1})]
• If interchanging summation and expectation is valid:
  E[ h(X_k) + ∑_{t=k+1}^{∞} {h(X_t) − h(X_{t−1})} ] = E_π[h(X)]
• If we construct another Markov chain (Y_t) such that X_t has the same distribution as Y_t and X_t = Y_{t−1} for t ≥ τ, then
  E[ h(X_k) + ∑_{t=k+1}^{τ−1} {h(X_t) − h(Y_{t−1})} ] = E_π[h(X)]
• 29–32. Unbiased estimators
H_k(X, Y) = h(X_k) + ∑_{t=k+1}^{τ−1} {h(X_t) − h(Y_{t−1})},
for any k ≥ 0, with the convention that the sum is zero if τ − 1 < k + 1, is an unbiased estimator of E_π[h(X)] with finite variance and expected cost, provided (Glynn and Rhee 2014, Vihola 2017, Jacob et al. 2017):
1 Convergence of the marginal chain: lim_{t→∞} E[h(X_t)] = E_π[h(X)] and sup_{t≥0} E|h(X_t)|^{2+δ} < ∞ for some δ > 0
2 The meeting time τ = inf{t ≥ 1 : X_t = Y_{t−1}} has geometric tails: P(τ > t) ≤ Cρ^t for some C < ∞ and ρ ∈ (0, 1)
3 Faithfulness: X_t = Y_{t−1} for all t ≥ τ
• 33–36. Unbiased estimators
• For any tuning parameter k ∈ ℕ, H_k(X, Y) = h(X_k) + ∑_{t=k+1}^{τ−1} {h(X_t) − h(Y_{t−1})} is unbiased
• The first term h(X_k) is biased; the second term corrects for the bias (and is zero if k ≥ τ − 1)
• As k → ∞, H_k(X, Y) = h(X_k) with increasing probability, so V[H_k(X, Y)] → V_π[h(X)]
• The cost of computing H_k(X, Y) is roughly 2(τ − 1) + max(1, k + 1 − τ) applications of K (a sketch of the estimator follows)
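To make the estimator concrete, here is a minimal Python sketch (illustrative only, not the authors' R package at https://github.com/pierrejacob/debiasedhmc), assuming the coupled chains are stored with X[t] = X_t and Y[t] = Y_t, and that τ has been recorded:

```python
def h_k_estimator(h, X, Y, k, tau):
    """Unbiased estimator H_k(X, Y) = h(X_k) + sum_{t=k+1}^{tau-1} {h(X_t) - h(Y_{t-1})}.

    The correction sum is empty whenever tau - 1 < k + 1, in which case
    the estimator reduces to the usual (biased) plug-in value h(X_k).
    """
    est = h(X[k])
    for t in range(k + 1, tau):  # t = k+1, ..., tau-1
        est += h(X[t]) - h(Y[t - 1])
    return est
```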
• 37–39. Time-averaged estimators
• Since H_k(X, Y) is unbiased for all k ≥ 0, the time-averaged estimator H_{k:m}(X, Y) = (m − k + 1)^{−1} ∑_{t=k}^{m} H_t(X, Y), for any k ≤ m, is also unbiased
• Rewrite the estimator as
  (m − k + 1)^{−1} ∑_{t=k}^{m} h(X_t) + ∑_{t=k+1}^{τ−1} min{1, (t − k)/(m − k + 1)} {h(X_t) − h(Y_{t−1})}
• The first term is the standard MCMC average; the second term is the bias correction (zero if k ≥ τ − 1)
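A sketch of H_{k:m} in its rewritten form, under the same illustrative storage conventions as above:

```python
import numpy as np

def h_km_estimator(h, X, Y, k, m, tau):
    """Time-averaged estimator H_{k:m}: a standard MCMC average over
    t = k..m plus a weighted bias-correction term that vanishes
    whenever k >= tau - 1."""
    mcmc_average = np.mean([h(X[t]) for t in range(k, m + 1)])
    correction = sum(
        min(1.0, (t - k) / (m - k + 1)) * (h(X[t]) - h(Y[t - 1]))
        for t in range(k + 1, tau)
    )
    return mcmc_average + correction
```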
• 40. Time-averaged estimators
[Figure: coupled trajectories X_t and Y_{t−1} and their difference Δ_t against iteration, with k = 5, τ = 10, m = 20]
• 42–46. Efficiency
• Following Glynn and Whitt (1992), the asymptotic inefficiency of H_{k:m}(X, Y) as the compute budget → ∞ is
  E[2(τ − 1) + max(1, m + 1 − τ)] (expected cost) × V[H_{k:m}(X, Y)] (variance)
• Bias removal leads to variance inflation
• Variance inflation can be mitigated by increasing k and m
• As k → ∞, H_{k:m}(X, Y) is the standard MCMC average with increasing probability, so its variance should be similar
• If τ ≪ k ≪ m, the asymptotic inefficiency is approximately m × σ²(h)/(m − k + 1) ≈ σ²(h), the asymptotic variance of the marginal chain
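Given R independent replicates of the procedure, the cost-times-variance product above can be estimated empirically; a small sketch (function and argument names are illustrative):

```python
import numpy as np

def asymptotic_inefficiency(costs, estimates):
    """Empirical analogue of E[2(tau-1) + max(1, m+1-tau)] x V[H_{k:m}],
    given R replicates of (cost, estimator value) as arrays."""
    return np.mean(costs) * np.var(estimates, ddof=1)
```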
  • 47. Proposed methodology • Each processor runs two coupled chains X = (Xt) and Y = (Yt) • Terminates at some random time which involves their meeting time • Returns unbiased estimator Hk:m of Eπ [h(X)] • Average over R processors: 1 R R r=1 H (r) k:m → Eπ [h(X)] as R → ∞ • Efficiency depends on expected compute cost and variance of Hk:m Parallel MCMC processors # 1 Jeremy Heng Unbiased HMC 18/ 48
• 48–53. Coupled chains
• To compute H_{k:m}(X, Y):
1 Initialize (X_0, Y_0) ∼ π̄_0, a coupling with π_0 as marginals, i.e. X_0 ∼ π_0 and Y_0 ∼ π_0
2 Sample X_1 ∼ K(X_0, ·)
3 For t = 1, …, max(m, τ), sample (X_{t+1}, Y_t) ∼ K̄((X_t, Y_{t−1}), ·) from a coupled kernel K̄ that admits K as marginals, i.e. X_{t+1} ∼ K(X_t, ·) and Y_t ∼ K(Y_{t−1}, ·)
• Note that X_t and Y_t have the same distribution for all t ≥ 0
• Need to design K̄ so that X_τ = Y_{τ−1} (the chains meet) and X_t = Y_{t−1} for t ≥ τ (the chains are faithful); a sketch of this loop follows
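A Python sketch of this driver loop (illustrative signatures; the initial coupling is taken to be independent for simplicity, and the coupled kernel is assumed faithful so the chains stay together after meeting):

```python
import numpy as np

def coupled_chains(sample_pi0, kernel, coupled_kernel, m, max_iter=10**5, rng=None):
    """Run the construction above: X is one step ahead of Y; stop after
    iteration max(m, tau). Returns the chains and the meeting time tau.

    sample_pi0(rng) draws from pi_0; kernel(x, rng) draws from K(x, .);
    coupled_kernel(x, y, rng) draws (X', Y') from Kbar((x, y), .).
    """
    rng = rng if rng is not None else np.random.default_rng()
    X = [sample_pi0(rng)]                 # X_0 ~ pi_0
    Y = [sample_pi0(rng)]                 # Y_0 ~ pi_0 (independent coupling)
    X.append(kernel(X[0], rng))           # X_1 ~ K(X_0, .)
    tau, t = None, 1
    while tau is None or t <= max(m, tau):
        x_next, y_next = coupled_kernel(X[t], Y[t - 1], rng)  # (X_{t+1}, Y_t)
        X.append(x_next)
        Y.append(y_next)
        if tau is None and np.allclose(x_next, y_next):
            tau = t + 1                   # first time s with X_s = Y_{s-1}
        t += 1
        if t > max_iter:
            raise RuntimeError("chains have not met within max_iter iterations")
    return X, Y, tau
```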
• 54. Outline: 2 Couplings of MCMC algorithms
• 55–60. Couplings
• Given distributions p(x) and q(y) on ℝ^d, a coupling c(x, y) is a joint distribution on ℝ^d × ℝ^d such that ∫ c(x, y) dy = p(x) and ∫ c(x, y) dx = q(y)
• (X, Y) ∼ c implies X ∼ p and Y ∼ q
• There are infinitely many couplings of p and q
• Independent coupling: X ∼ p and Y ∼ q independently
• Optimal coupling: minimizes E|X − Y|²
• Maximal coupling: maximizes P(X = Y)
• 61. Independent coupling of Gamma and Gaussian [figure]
• 62. Maximal coupling of Gamma and Gaussian [figure]
• 63–66. Maximal coupling: algorithm
Sampling (X, Y) from the maximal coupling of p and q:
1 Sample X ∼ p and U ∼ U([0, 1]); if U ≤ q(X)/p(X), output (X, X)
2 Otherwise, sample Y ∼ q and U′ ∼ U([0, 1]) until U′ > p(Y)/q(Y), and output (X, Y)
[Figure: densities of p and q with their overlap shaded]
Thorisson, Coupling, stationarity, and regeneration (2000)
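A direct transcription of this algorithm in Python (a sketch; the usage example mirrors the Gamma/Gaussian pair of the earlier figures and assumes scipy's frozen distributions only for convenience):

```python
import numpy as np
from scipy import stats

def maximal_coupling(p_sample, p_logpdf, q_sample, q_logpdf, rng):
    """Sample (X, Y) with X ~ p, Y ~ q and P(X = Y) = 1 - TV(p, q)."""
    X = p_sample(rng)
    if np.log(rng.uniform()) <= q_logpdf(X) - p_logpdf(X):
        return X, X                 # step 1: sampled from the overlap min{p, q}
    while True:                     # step 2: sample Y from the residual of q
        Y = q_sample(rng)
        if np.log(rng.uniform()) > p_logpdf(Y) - q_logpdf(Y):
            return X, Y

# e.g. the Gamma/Gaussian pair from the figures
rng = np.random.default_rng(1)
p, q = stats.gamma(a=2.0), stats.norm(loc=1.0, scale=1.0)
X, Y = maximal_coupling(lambda r: p.rvs(random_state=r), p.logpdf,
                        lambda r: q.rvs(random_state=r), q.logpdf, rng)
```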
• 67–69. Maximal coupling: algorithm
Remarks:
• Step 1 samples from the overlap min{p(x), q(x)}
• Maximality follows from the coupling inequality: P(X = Y) = ∫ min{p(x), q(x)} dx = 1 − TV(p, q)
• Expected cost does not depend on p and q
[Figure: overlap min{p, q} of the two densities]
• 70–73. Metropolis–Hastings (kernel K)
At iteration t − 1, the Markov chain is at state X_{t−1}:
1 Propose X′ ∼ q(X_{t−1}, ·), e.g. X′ ∼ N(X_{t−1}, σ²I_d) for RWMH, or X′ ∼ N(X_{t−1} + (σ²/2)∇log π(X_{t−1}), σ²I_d) for MALA
2 Sample U ∼ U([0, 1])
3 If U ≤ min{1, [π(X′)q(X′, X_{t−1})] / [π(X_{t−1})q(X_{t−1}, X′)]}, set X_t = X′; otherwise set X_t = X_{t−1}
• 74–79. Coupled Metropolis–Hastings (kernel K̄)
At iteration t − 1, the two Markov chains are at states X_{t−1} and Y_{t−1}:
1 Propose (X′, Y′) from the maximal coupling of q(X_{t−1}, ·) and q(Y_{t−1}, ·)
2 Sample a common U ∼ U([0, 1])
3 If U ≤ min{1, [π(X′)q(X′, X_{t−1})] / [π(X_{t−1})q(X_{t−1}, X′)]}, set X_t = X′; otherwise set X_t = X_{t−1}
4 If U ≤ min{1, [π(Y′)q(Y′, Y_{t−1})] / [π(Y_{t−1})q(Y_{t−1}, Y′)]}, set Y_t = Y′; otherwise set Y_t = Y_{t−1}
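A Python sketch of one step of this coupled kernel for RWMH (assuming x, y are 1-d numpy arrays and log_pi returns the log target density up to a constant; since the Gaussian proposal is symmetric, the q-ratios in the acceptance probability cancel):

```python
import numpy as np

def coupled_rwmh(x, y, log_pi, sigma, rng):
    """One coupled RWMH step: maximally couple the proposals
    N(x, sigma^2 I_d) and N(y, sigma^2 I_d), then accept/reject both
    chains with a single common uniform."""
    d = x.shape[0]
    log_q = lambda mean, z: -0.5 * np.sum((z - mean) ** 2) / sigma ** 2
    xp = x + sigma * rng.standard_normal(d)        # proposal for the X-chain
    if np.log(rng.uniform()) <= log_q(y, xp) - log_q(x, xp):
        yp = xp                                    # proposals meet exactly
    else:
        while True:                                # residual part of the coupling
            yp = y + sigma * rng.standard_normal(d)
            if np.log(rng.uniform()) > log_q(x, yp) - log_q(y, yp):
                break
    log_u = np.log(rng.uniform())                  # common uniform for both chains
    x_new = xp if log_u <= log_pi(xp) - log_pi(x) else x
    y_new = yp if log_u <= log_pi(yp) - log_pi(y) else y
    return x_new, y_new
```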
• 80. RWMH on Gaussian target: trajectories (π = N(0, 1), π_0 = N(10, 3²), K = RWMH with proposal std 0.5) [figure]
• 81. RWMH on Gaussian target: meetings (same setting) [figure]
• 82. RWMH on Gaussian target: meeting times (same setting) [Figure: histogram of meeting times, concentrated below roughly 200 iterations]
• 83. RWMH on Gaussian target: scaling with dimension
π = π_0 = N(0, I_d), K̄ = coupled RWMH with proposal std C d^{−1/2}
[Figure: average meeting time (up to ~3000) against dimension d = 2, …, 10, for C = 1.0, 1.5, 2.0]
• 84. HMC on Gaussian target: scaling with dimension
π = π_0 = N(0, I_d), K̄ = coupled HMC with step size C d^{−1/4}
[Figure: average meeting time (roughly 40–55) against dimension d = 2000, …, 10000, for C = 1.0, 1.5, 2.0]
• 85–90. Hamiltonian Monte Carlo (HMC)
• Define the potential energy U(q) = −log π(q) and the Hamiltonian E(q, p) = U(q) + |p|²/2
• Hamiltonian dynamics (q(t), p(t)) ∈ ℝ^d × ℝ^d for t ≥ 0:
  dq(t)/dt = ∇_p E(q(t), p(t)) = p(t)
  dp(t)/dt = −∇_q E(q(t), p(t)) = −∇U(q(t))
• Ideal algorithm defining a π-invariant K: at iteration t − 1, with the Markov chain at state X_{t−1},
1 Set q(0) = X_{t−1} and sample p(0) ∼ N(0, I_d)
2 Solve the dynamics over time length T to get (q(T), p(T))
3 Set X_t = q(T)
• 91–97. Hamiltonian Monte Carlo (HMC)
• Solving the Hamiltonian dynamics exactly is typically intractable
• Leap-frog integrator:
1 Set q_0 = X_{t−1} and sample p_0 ∼ N(0, I_d)
2 For ℓ = 0, …, L − 1, compute
  p_{ℓ+1/2} = p_ℓ − (ε/2)∇U(q_ℓ)
  q_{ℓ+1} = q_ℓ + ε p_{ℓ+1/2}
  p_{ℓ+1} = p_{ℓ+1/2} − (ε/2)∇U(q_{ℓ+1})
3 Sample U ∼ U([0, 1])
4 If U ≤ min{1, exp[E(q_0, p_0) − E(q_L, p_L)]}, set X_t = q_L; otherwise set X_t = X_{t−1}
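A compact Python sketch of the leap-frog integrator and the resulting HMC transition (illustrative; x is a 1-d numpy array, U and grad_U are the potential and its gradient):

```python
import numpy as np

def leapfrog(q, p, grad_U, eps, L):
    """L leap-frog steps of size eps for E(q, p) = U(q) + |p|^2 / 2."""
    q, p = q.copy(), p.copy()
    for _ in range(L):
        p -= 0.5 * eps * grad_U(q)
        q += eps * p
        p -= 0.5 * eps * grad_U(q)
    return q, p

def hmc_step(x, U, grad_U, eps, L, rng):
    """One HMC transition targeting pi(q) proportional to exp(-U(q))."""
    p0 = rng.standard_normal(x.shape[0])
    qL, pL = leapfrog(x, p0, grad_U, eps, L)
    dE = (U(x) + 0.5 * p0 @ p0) - (U(qL) + 0.5 * pL @ pL)
    return qL if np.log(rng.uniform()) <= dE else x
```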
• 98–100. Coupled Hamiltonian dynamics
• Consider coupling two particles (q^i(t), p^i(t)), i = 1, 2, each following the Hamiltonian dynamics
• For a Gaussian target π = N(µ, σ²):
  q^1(t) − q^2(t) = cos(t/σ) (q^1(0) − q^2(0)) + σ sin(t/σ) (p^1(0) − p^2(0)),
  so if p^1(0) = p^2(0) then |q^1(t) − q^2(t)| = |cos(t/σ)| |q^1(0) − q^2(0)| (checked numerically below)
• The difference Δ(t) = q^1(t) − q^2(t) satisfies (1/2) d|Δ(t)|²/dt = Δ(t)ᵀ(p^1(t) − p^2(t)), so if p^1(0) = p^2(0) then t ↦ |Δ(t)|² has a stationary point at t = 0
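The cosine identity can be verified numerically; a throwaway sketch (all values arbitrary) that approximates the exact Gaussian flow with a fine leap-frog discretization and compares the two sides:

```python
import numpy as np

sigma = 2.0
grad_U = lambda q: q / sigma**2          # U(q) = q^2 / (2 sigma^2), mu = 0

def flow(q, p, t, n=100_000):
    """Fine leap-frog approximation of the exact Hamiltonian flow."""
    eps = t / n
    for _ in range(n):
        p -= 0.5 * eps * grad_U(q)
        q += eps * p
        p -= 0.5 * eps * grad_U(q)
    return q

t, p0, q1, q2 = 1.3, 0.7, 3.0, -1.0      # common initial momentum p0
lhs = abs(flow(q1, p0, t) - flow(q2, p0, t))
rhs = abs(np.cos(t / sigma)) * abs(q1 - q2)
print(lhs, rhs)                          # agree up to discretization error
```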
• 101–102. Coupled Hamiltonian dynamics
• To characterize the stationary point:
  (1/2) d²|Δ(0)|²/dt² = −Δ(0)ᵀ(∇U(q^1(0)) − ∇U(q^2(0))) ≤ −α|Δ(0)|²
  if q^1(0), q^2(0) ∈ S, where U is α-strongly convex on S
• Since t = 0 is a strict local maximum, there exists T > 0 such that for any t ∈ (0, T]:
  |q^1(t) − q^2(t)| ≤ ρ_t |q^1(0) − q^2(0)| with ρ_t ∈ [0, 1)
• 103. Logistic regression: distance against integration time
[Figure: distance between coupled chains against integration time]
• 104–106. Coupled Hamiltonian dynamics
• Assuming ∇U is β-Lipschitz, we established contraction using a Taylor expansion around t = 0 (Lemma 1)
• More quantitative results by Mangoubi and Smith (2017, Theorem 6) and Bou-Rabee et al. (2018, Theorem 2.1) give T = √α/β and ρ_t = 1 − αt²/2
• Coupling can be effective in high dimensions if the problem is well-conditioned
• 107–113. Coupled HMC kernel (K̄_{ε,L})
At iteration t − 1, the two Markov chains are at states X_{t−1} and Y_{t−1}:
1 Set q^1_0 = X_{t−1}, q^2_0 = Y_{t−1} and sample a common momentum p_0 ∼ N(0, I_d)
2 Perform leap-frog integration from (q^i_0, p_0) to obtain (q^i_L, p^i_L), i = 1, 2
3 Sample a common U ∼ U([0, 1])
4 If U ≤ min{1, exp[E(q^1_0, p_0) − E(q^1_L, p^1_L)]}, set X_t = q^1_L; otherwise set X_t = X_{t−1}
5 If U ≤ min{1, exp[E(q^2_0, p_0) − E(q^2_L, p^2_L)]}, set Y_t = q^2_L; otherwise set Y_t = Y_{t−1}
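A sketch of one step of this coupled kernel, reusing the illustrative leapfrog() from the earlier HMC sketch (both chains share the initial momentum and the acceptance uniform):

```python
import numpy as np

def coupled_hmc(x, y, U, grad_U, eps, L, rng):
    """One step of Kbar_{eps,L}: common initial momentum p0 and common
    uniform; each chain is accepted with its own energy difference.
    leapfrog() is the integrator sketched earlier."""
    p0 = rng.standard_normal(x.shape[0])               # shared momentum
    q1, p1 = leapfrog(x, p0, grad_U, eps, L)
    q2, p2 = leapfrog(y, p0, grad_U, eps, L)
    log_u = np.log(rng.uniform())                      # shared uniform
    x_new = q1 if log_u <= (U(x) + 0.5 * p0 @ p0) - (U(q1) + 0.5 * p1 @ p1) else x
    y_new = q2 if log_u <= (U(y) + 0.5 * p0 @ p0) - (U(q2) + 0.5 * p2 @ p2) else y
    return x_new, y_new
```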
• 114. Coupled HMC chains
[Figure: two coupled HMC trajectories contracting in the (x_1, x_2) plane]
• 115. Logistic regression: distance after 1000 iterations
[Figure: distance after 1000 iterations (log scale, 1e−12 to 1) against integration time (0.25 to 1.25), for L = 10, 20, 30]
• 116–118. Mixture of coupled kernels (kernel K̄)
• To enable exact meetings, we consider, for γ ∈ (0, 1),
  K̄ = (1 − γ) K̄_{ε,L} (coupled HMC) + γ K̄_σ (coupled RWMH)
• Choice of the RWMH proposal std σ: distance between the chains < σ < spread of π
• Advocate a small RWMH probability γ to minimize inefficiency
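The mixture is a one-line wrapper around the two coupled steps sketched above (illustrative closures passed in):

```python
def mixture_coupled_kernel(x, y, gamma, hmc_pair_step, rwmh_pair_step, rng):
    """Kbar = (1 - gamma) Kbar_{eps,L} + gamma Kbar_sigma: with probability
    gamma take a coupled RWMH step, whose maximally coupled proposals give
    a positive probability of an exact meeting; otherwise a coupled HMC step."""
    if rng.uniform() < gamma:
        return rwmh_pair_step(x, y, rng)
    return hmc_pair_step(x, y, rng)
```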
• 119–127. Geometric tails of the meeting time
• To ensure validity of the unbiased estimators:
1 Convergence of the marginal chain (inherited from HMC)
2 Meeting time has geometric tails (Theorem 2)
3 Faithfulness (by construction)
• Main assumptions:
1 ∇U is globally Lipschitz
2 U is strongly convex on S ⊂ ℝ^d
3 Geometric drift condition on the HMC kernel
• (Theorem 2) The meeting time has geometric tails if (ε, L, σ, γ) are small enough
• The assumptions can be verified for Gaussian targets and Bayesian logistic regression, relying on Durmus et al. (2017, Theorem 9)
• 128. Sensitivity to the RWMH proposal std σ
Logistic regression (left), Cox process (right)
[Figure: meeting times against σ ∈ {1e−6, 1e−5, 1e−4, 0.001, 0.01}; roughly 200–800 for logistic regression and 50–150 for the Cox process]
• 129. Sensitivity to the RWMH probability γ
Logistic regression (left), Cox process (right)
[Figure: meeting times against γ; roughly 200–800 for γ ∈ [0.01, 0.11] (logistic regression) and 0–600 for γ ∈ [0.01, 0.76] (Cox process)]
• 130–132. Cox process: effect of dimension and algorithm
• Better algorithms yield smaller meeting times
• The proposed methodology cannot work if the marginal chain fails to mix
• Writing π_t = π_0 K^t, we have TV(π_t, π) ≤ min{1, E[max(0, τ − t + 1)]}
[Figure: meeting times for RHMC and HMC in dimensions 256, 1024, 4096]
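The total variation bound is directly estimable from i.i.d. realizations of the meeting time; a small sketch (illustrative names):

```python
import numpy as np

def tv_bound(taus, t_grid):
    """Monte Carlo estimate of TV(pi_t, pi) <= min{1, E[max(0, tau - t + 1)]}
    from i.i.d. meeting times, evaluated at each t in t_grid."""
    taus = np.asarray(taus, dtype=float)
    return [min(1.0, float(np.mean(np.maximum(0.0, taus - t + 1)))) for t in t_grid]
```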
• 133. Logistic regression: impact of k and m

  k                  m      Cost   Relative inefficiency
  1                  k      436    1989.07
  1                  5k     436    1671.93
  1                  10k    436    1403.28
  90% quantile(τ)    k      553    38.11
  90% quantile(τ)    5k     1868   1.23
  90% quantile(τ)    10k    3518   1.05

Relative inefficiency = (asymptotic inefficiency) / (asymptotic variance of optimal HMC), which converges, as k, m → ∞, to (asymptotic variance of marginal HMC) / (asymptotic variance of optimal HMC)
• 134–140. Concluding remarks
• Bou-Rabee et al. (2018) introduced another coupling for HMC
• Could combine synchronous coupling (L = 1) and maximal coupling for MALA
• Extension to other variants of HMC:
1 No-U-Turn Sampler (Hoffman and Gelman, 2014)
2 Partial momentum refreshment (Horowitz, 1991)
3 Different choices of kinetic energy (Livingstone et al., 2017)
4 Hamiltonian bouncy particle sampler (Vanetti et al., 2017)
• 141–145. References
J. Heng and P. Jacob. Unbiased Hamiltonian Monte Carlo with couplings. Biometrika (to appear), arXiv:1709.00404, 2019.
R package: https://github.com/pierrejacob/debiasedhmc
P. Jacob, J. O'Leary, Y. Atchadé. Unbiased Markov chain Monte Carlo with couplings. arXiv:1708.03625, 2017.
P. Jacob, F. Lindsten, T. Schön. Smoothing with Couplings of Conditional Particle Filters. JASA, 2018.
L. Middleton, G. Deligiannidis, A. Doucet, P. Jacob. Unbiased Markov chain Monte Carlo for intractable target distributions. arXiv:1807.08691, 2018.