Coordinate sampler: A non-reversible Gibbs-like sampler

Christian P. Robert
U Paris Dauphine PSL & University of Warwick
Joint work with Changye Wu

Degemer mad en Breizh ∗
∗. Welcome to Brittany

Outline
Background
Versions of PDMP
Coordinate Sampler
Numerical comparison
Conclusion

Background
Generic issue
Goal : sample from a target known up to a constant, defined over $\mathbb{R}^d$,
$$\pi(x) \propto \gamma(x)$$
with energy $U(x) = -\log \pi(x)$, $U \in C^1$.

Background
Marketing arguments
Current default workhorse : reversible MCMC methods
Non-reversible MCMC algorithms based on piecewise deterministic
Markov processes perform well empirically
Quantitative convergence rates and variance now available
Mesquita and Hespanha (2010) show geometric ergodicity for
targets with exponentially decaying tails,
Monmarché (2016) gives sharp results for compact state-space
Bierkens et al. (2016a,b) consider targets on the real line
Christophe Andrieu’s talk contains further advances

Background
Motivation : piecewise deterministic Markov process
PDMP sampler is a (new ?) continuous-time, non-reversible MCMC
method based on auxiliary variables
1. particle physics simulation
[Peters and de With, 2012]
2. empirically state-of-the-art performance
[Bouchard-Côté et al., 2018]
3. exact subsampling for big data
[Bierkens et al., 2016]
4. geometric ergodicity for a large class of distributions
[Deligiannidis et al., 2017; Bierkens et al., 2017]

Background
Piecewise deterministic Markov process
Piecewise deterministic Markov process $\{z_t \in \mathcal{Z}\}_{t \in [0,\infty)}$, with three ingredients,
1. Deterministic dynamics : between events, deterministic evolution based on ODE
$$dz_t/dt = \Phi(z_t)$$
2. Event occurrence rate : $\lambda(t) = \lambda(z_t)$
3. Transition dynamics : At event time $\tau$, state prior to $\tau$ denoted by $z_{\tau^-}$, and new state generated by $z_\tau \sim Q(\cdot \mid z_{\tau^-})$.
[Davis, 1984, 1993]

Background
Implementation
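Algorithm 1 : Simulation of PDMP
Starting point $z_0$, $\tau_0 \leftarrow 0$.
for $k = 1, 2, 3, \cdots$ do
    Sample inter-event time $\eta_k$ from distribution
    $$P(\eta_k > t) = \exp\left\{-\int_0^t \lambda(z_{\tau_{k-1}+s})\,ds\right\}.$$
    $\tau_k \leftarrow \tau_{k-1} + \eta_k$, $z_{\tau_{k-1}+s} \leftarrow \Psi_s(z_{\tau_{k-1}})$ for $s \in (0, \eta_k)$, where $\Psi$ is the ODE flow of $\Phi$.
    $z_{\tau_k^-} \leftarrow \Psi_{\eta_k}(z_{\tau_{k-1}})$, $z_{\tau_k} \sim Q(\cdot \mid z_{\tau_k^-})$.
end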
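The loop of Algorithm 1 can be sketched in Python as follows. This is a minimal illustration, not the authors' code: the function `simulate_pdmp` and its three callbacks (`flow`, `sample_event_time`, `kernel`) are hypothetical names, and the toy instance (a telegraph process with constant event rate 1 and a velocity flip at each event) is chosen only because its inter-event times are plain exponentials.

```python
import numpy as np

def simulate_pdmp(z0, flow, sample_event_time, kernel, n_events, rng):
    """Return event times tau_k and post-jump states z_{tau_k}, k = 0..n_events."""
    times = [0.0]
    states = [np.asarray(z0, dtype=float)]
    for _ in range(n_events):
        z = states[-1]
        eta = sample_event_time(z, rng)      # inter-event time eta_k
        z_minus = flow(z, eta)               # state just before the event
        states.append(kernel(z_minus, rng))  # z_{tau_k} ~ Q(. | z_{tau_k^-})
        times.append(times[-1] + eta)
    return np.array(times), np.array(states)

# Toy instance (assumption): z = (x, v) on the real line, constant rate 1,
# linear flow, velocity flipped at each event.
def flow(z, s):
    x, v = z
    return np.array([x + s * v, v])

def sample_event_time(z, rng):
    return rng.exponential(1.0)

def kernel(z, rng):
    x, v = z
    return np.array([x, -v])

rng = np.random.default_rng(0)
times, states = simulate_pdmp([0.0, 1.0], flow, sample_event_time, kernel, 5, rng)
```

Only the event skeleton is stored; the state between two events can be recovered by applying `flow` to the last stored state.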

Versions of PDMP
Basic bouncy particle sampler
Simulation of continuous-time piecewise linear trajectory $(x_t)_t$ with each segment in trajectory specified by
initial position $x$
length $\tau$
velocity $v$
Length specified by inhomogeneous Poisson point process with intensity function
$$\lambda(x, v) = \max\{0, \langle \nabla U(x), v \rangle\}$$
New velocity after bouncing given by Newtonian elastic collision
$$R(x)v = v - 2\,\frac{\langle \nabla U(x), v \rangle}{\|\nabla U(x)\|^2}\,\nabla U(x)$$
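As a small illustration of the bounce, the reflection $R(x)v$ can be written in a few lines of NumPy. The function name `bounce` and the numerical values are assumptions made for this sketch only.

```python
import numpy as np

def bounce(v, grad_u):
    """Reflect v against the hyperplane orthogonal to grad_u (elastic collision)."""
    g = np.asarray(grad_u, dtype=float)
    v = np.asarray(v, dtype=float)
    return v - 2.0 * np.dot(g, v) / np.dot(g, g) * g

v = np.array([1.0, 2.0])
g = np.array([0.0, 3.0])   # hypothetical value of grad U(x)
v_new = bounce(v, g)       # the component of v along g changes sign
```

The reflection preserves the norm of the velocity and is an involution: bouncing twice against the same gradient returns the original velocity.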

Versions of PDMP
Implementation
Generally speaking, the main difficulties of implementing PDMP come from
1. Computing the ODE flow $\Psi$ : linear dynamics, quadratic dynamics
2. Simulating the inter-event time $\eta_k$ : techniques of superposition and thinning for Poisson processes

Versions of PDMP
Poisson process on R+
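Definition (Poisson process)
Poisson process with rate $\lambda$ on $\mathbb{R}_+$ is sequence
$$\tau_1, \tau_2, \cdots$$
of rv's when intervals
$$\tau_1, \tau_2 - \tau_1, \tau_3 - \tau_2, \cdots$$
are such that, given $\tau_{i-1}$,
$$P(\tau_i - \tau_{i-1} > T) = \exp\left\{-\int_{\tau_{i-1}}^{\tau_{i-1}+T} \lambda(t)\,dt\right\}, \quad \tau_0 = 0$$
(the intervals are iid when the rate $\lambda$ is constant)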

Versions of PDMP
Simulation by superposition theorem
Theorem (Kingman, 1992)
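Let $\Pi_1, \Pi_2, \cdots$ be countable collection of independent Poisson processes on $\mathbb{R}_+$ with resp. rates $\lambda_n(\cdot)$. If $\sum_{n=1}^\infty \lambda_n(t) < \infty$ for all $t$'s, then superposition process
$$\Pi = \bigcup_{n=1}^\infty \Pi_n$$
is Poisson process with rate
$$\lambda(t) = \sum_{n=1}^\infty \lambda_n(t)$$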
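In practice the superposition theorem is used to simulate the first event of a sum of rates: draw one candidate time per component process and keep the smallest. The sketch below restricts itself to constant rates (an assumption made so that each candidate is a plain exponential draw); the function name `first_event_superposition` is hypothetical.

```python
import numpy as np

def first_event_superposition(rates, rng):
    """First event time of the superposition of independent homogeneous
    Poisson processes, and the index of the process that produced it."""
    candidates = rng.exponential(1.0 / np.asarray(rates, dtype=float))
    i = int(np.argmin(candidates))
    return candidates[i], i

rng = np.random.default_rng(1)
rates = [0.5, 1.5, 2.0]
draws = [first_event_superposition(rates, rng) for _ in range(20000)]
# Empirical mean of the first event time, expected near 1 / sum(rates) = 0.25.
mean_time = np.mean([t for t, _ in draws])
# Empirical frequency of each component, expected near rates / sum(rates).
freq = np.bincount([i for _, i in draws], minlength=3) / len(draws)
```

The index returned alongside the time is what a sampler such as Zig-Zag needs: it tells which component triggered the event.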

Versions of PDMP
Simulation by thinning
Theorem (Lewis and Shedler, 1979)
Let
$$\lambda, \Lambda : \mathbb{R}_+ \to \mathbb{R}_+$$
be continuous functions such that $\lambda(\cdot) \le \Lambda(\cdot)$. Let
$$\tau_1, \tau_2, \cdots$$
be the increasing sequence of a Poisson process with rate $\Lambda(\cdot)$. For all $i$, if $\tau_i$ is removed from the sequence with probability
$$1 - \lambda(\tau_i)/\Lambda(\tau_i)$$
then the remaining $\tilde\tau_1, \tilde\tau_2, \cdots$ form a non-homogeneous Poisson process with rate $\lambda(\cdot)$
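A minimal sketch of thinning, assuming a constant dominating rate (the theorem allows any $\Lambda(\cdot) \ge \lambda(\cdot)$). The function name `simulate_by_thinning` and the test rate $1 + \sin t$ are illustrative choices, not taken from the slides.

```python
import numpy as np

def simulate_by_thinning(rate, rate_bound, horizon, rng):
    """Event times on [0, horizon] of a Poisson process with intensity rate(t),
    obtained by thinning a homogeneous process of intensity rate_bound >= rate(t)."""
    events = []
    t = 0.0
    while True:
        t += rng.exponential(1.0 / rate_bound)    # next candidate from the bounding process
        if t > horizon:
            break
        if rng.uniform() < rate(t) / rate_bound:  # keep with probability rate(t) / bound
            events.append(t)
    return np.array(events)

rng = np.random.default_rng(2)
events = simulate_by_thinning(lambda t: 1.0 + np.sin(t), 2.0, 10.0, rng)
```

The tighter the bound, the fewer candidates are rejected, which is why the choice of $\Lambda$ drives the cost of PDMP samplers that rely on thinning.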

Versions of PDMP
Extended generator
Deﬁnition
Let $D(L)$ be the set of measurable functions $f : \mathcal{Z} \to \mathbb{R}$ such that there exists a measurable function $h : \mathcal{Z} \to \mathbb{R}$ with $t \mapsto h(z_t)$ integrable $P_z$-a.s. for each $z \in \mathcal{Z}$ and the process
$$C_t^f = f(z_t) - f(z_0) - \int_0^t h(z_s)\,ds$$
a local martingale. Then we write $h = Lf$ and call $(L, D(L))$ the extended generator of the process $\{z_t\}_{t \ge 0}$.

Versions of PDMP
Extended Generator of PDMP
Theorem (Davis, 1993)
The generator, $L$, of above PDMP is, for $f \in D(L)$
$$Lf(z) = \nabla f(z) \cdot \Phi(z) + \lambda(z) \int_{z'} \{f(z') - f(z)\}\,Q(dz' \mid z)$$
Furthermore, $\mu(dz)$ is an invariant distribution of above PDMP, if
$$\int Lf(z)\,\mu(dz) = 0, \quad \text{for all } f \in D(L)$$

Versions of PDMP
PDMP-based sampler
PDMP-based sampler is an auxiliary variable technique
Given target $\pi(x)$,
1. introduce auxiliary variable $v \in \mathcal{V}$ along with a density $\pi(v|x)$,
2. choose appropriate $\Phi$, $\lambda$ and $Q$
so that $\pi(x)\pi(v|x)$ is the unique invariant distribution of the Markov process

Versions of PDMP
Bouncy Particle Sampler (Bouchard-Côté et al., 2018)
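$\mathcal{V} = \mathbb{R}^d$, and $\pi(v|x) = \varphi(v)$, the density of $N(0, I_d)$
1. Deterministic dynamics :
$$dx_t/dt = v_t, \quad dv_t/dt = 0$$
2. Event occurrence rate : $\lambda(x, v) = \langle v, \nabla U(x) \rangle_+ + \lambda^{\mathrm{ref}}$
3. Transition dynamics :
$$Q((dx', dv') \mid (x, v)) = \frac{\langle v, \nabla U(x) \rangle_+}{\lambda(x, v)}\,\delta_x(dx')\,\delta_{R_{\nabla U(x)} v}(dv') + \frac{\lambda^{\mathrm{ref}}}{\lambda(x, v)}\,\delta_x(dx')\,\varphi(dv')$$
where
$$R_{\nabla U(x)} v = v - 2\,\frac{\langle \nabla U(x), v \rangle}{\langle \nabla U(x), \nabla U(x) \rangle}\,\nabla U(x)$$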
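To make the three ingredients concrete, here is a hedged sketch of the BPS for one specific target, the standard Gaussian $N(0, I_d)$, chosen (as an assumption of this example) because $\nabla U(x) = x$ makes the bounce rate along a segment equal to $(a + bt)_+$ with $a = \langle v, x \rangle$ and $b = \langle v, v \rangle$, which can be inverted in closed form. The bounce and refreshment clocks are simulated separately and the earliest one wins, which is equivalent to the mixture kernel $Q$. The name `bps_gaussian` is hypothetical.

```python
import numpy as np

def bps_gaussian(x0, n_events, lam_ref, rng):
    """BPS event skeleton for the target N(0, I_d): U(x) = |x|^2 / 2, grad U(x) = x."""
    x = np.array(x0, dtype=float)
    v = rng.standard_normal(x.size)
    times, positions = [0.0], [x.copy()]
    for _ in range(n_events):
        a, b = np.dot(v, x), np.dot(v, v)
        e = rng.exponential(1.0)
        # Solve int_0^t (a + b s)_+ ds = e for t (inversion of the bounce rate).
        t_bounce = (-a + np.sqrt(max(a, 0.0) ** 2 + 2.0 * b * e)) / b
        t_refresh = rng.exponential(1.0 / lam_ref)
        t = min(t_bounce, t_refresh)
        x = x + t * v                        # linear flow until the event
        if t_bounce < t_refresh:
            # bounce: reflect v against grad U(x) = x
            v = v - 2.0 * np.dot(x, v) / np.dot(x, x) * x
        else:
            v = rng.standard_normal(x.size)  # refreshment: v' ~ phi
        times.append(times[-1] + t)
        positions.append(x.copy())
    return np.array(times), np.array(positions)

rng = np.random.default_rng(3)
times, positions = bps_gaussian(np.zeros(2), 1000, 1.0, rng)
```

Expectations under the target are estimated by integrating along the piecewise linear path between consecutive positions, not by averaging the event positions themselves.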

Versions of PDMP
Zig-Zag Sampler (Bierkens et al., 2016)
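$\mathcal{V} = \{+1, -1\}^d$, and $\pi(v|x)$ is the uniform distribution over $\{+1, -1\}^d$
1. Deterministic dynamics :
$$dx_t/dt = v_t, \quad dv_t/dt = 0$$
2. Event occurrence rate :
$$\lambda(x, v) = \sum_{i=1}^d \lambda_i(x, v) = \sum_{i=1}^d \left(\{v_i \nabla_i U(x)\}_+ + \lambda_i^{\mathrm{ref}}\right)$$
3. Transition dynamics :
$$Q((dx', dv') \mid (x, v)) = \sum_{i=1}^d \frac{\lambda_i(x, v)}{\lambda(x, v)}\,\delta_x(dx')\,\delta_{F_i v}(dv')$$
where $F_i$ is the operator that flips the $i$-th component of $v$ and keeps the others unchanged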

Versions of PDMP
Continuous-time Hamiltonian Monte Carlo (Neal, 1999)
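$\mathcal{V} = \mathbb{R}^d$, and $\pi(v|x) = \varphi(v)$, the density of $N(0, I_d)$
1. Deterministic dynamics :
$$dx_t/dt = v_t, \quad dv_t/dt = -\nabla U(x_t)$$
2. Event occurrence rate : $\lambda(x, v) = \lambda_0(x)$
3. Transition dynamics :
$$Q((dx', dv') \mid (x, v)) = \delta_x(dx')\,\varphi(dv')$$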

Versions of PDMP
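Continuous-time Riemannian Manifold HMC (Girolami & Calderhead, 2011)
$\mathcal{V} = \mathbb{R}^d$, and $\pi(v|x) = N(0, G(x))$, the Hamiltonian is
$$H(x, v) = U(x) + \tfrac{1}{2} v^T G(x)^{-1} v + \tfrac{1}{2} \log |G(x)|$$
1. Deterministic dynamics :
$$dx_t/dt = \frac{\partial H}{\partial v}(x_t, v_t), \quad dv_t/dt = -\frac{\partial H}{\partial x}(x_t, v_t)$$
2. Event occurrence rate : $\lambda(x, v) = \lambda_0(x)$
3. Transition dynamics :
$$Q((dx', dv') \mid (x, v)) = \delta_x(dx')\,\varphi(dv' \mid x')$$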

Versions of PDMP
Randomized BPS
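Define
$$a = \frac{\langle v, \nabla U(x) \rangle}{\langle \nabla U(x), \nabla U(x) \rangle}\,\nabla U(x), \quad b = v - a$$
Regular BPS, move $v' = -a + b$
Alternatives
1. Fearnhead et al. (2016) :
$$v' \sim Q_x(dv' \mid v) = \max\{0, -\langle v', \nabla U(x) \rangle\}\,dv'$$
2. Wu and Robert (2017) : $v' = -a + b'$, where $b'$ is Gaussian distributed over the space which is orthogonal to $\nabla U(x)$ in $\mathbb{R}^d$.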
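The second alternative can be sketched as follows: the component of $v$ along the gradient is flipped as in the regular bounce, while the orthogonal component is redrawn. The orthogonal Gaussian draw is obtained here by projecting a standard normal vector, which is one possible construction and an assumption of this sketch; the name `generalized_bounce` is hypothetical.

```python
import numpy as np

def generalized_bounce(v, grad_u, rng):
    """Return v' = -a + b', with a the projection of v on grad_u and b' a
    standard Gaussian draw projected on the orthogonal complement of grad_u."""
    g = np.asarray(grad_u, dtype=float)
    v = np.asarray(v, dtype=float)
    a = np.dot(v, g) / np.dot(g, g) * g
    z = rng.standard_normal(v.size)
    b_new = z - np.dot(z, g) / np.dot(g, g) * g
    return -a + b_new

rng = np.random.default_rng(4)
v = np.array([1.0, -0.5, 2.0])
g = np.array([0.3, 1.0, -2.0])   # hypothetical value of grad U(x)
v_new = generalized_bounce(v, g, rng)
```

By construction $\langle v', \nabla U(x) \rangle = -\langle v, \nabla U(x) \rangle$, exactly as for the deterministic bounce.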

Versions of PDMP
HMC-BPS (Vanetti et al., 2017)
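$\rho(x) \propto \exp\{-V(x)\}$ is a Gaussian approximation of the target $\pi(x)$.
$$\hat{H}(x, v) = V(x) + \tfrac{1}{2} v^T v, \quad \tilde{U}(x) = U(x) - V(x)$$
1. Deterministic dynamics :
$$dx_t/dt = v_t, \quad dv_t/dt = -\nabla V(x_t)$$
2. Event occurrence rate : $\lambda(x, v) = \langle v, \nabla \tilde{U}(x) \rangle_+ + \lambda^{\mathrm{ref}}$
3. Transition dynamics :
$$Q((dx', dv') \mid (x, v)) = \frac{\langle v, \nabla \tilde{U}(x) \rangle_+}{\lambda(x, v)}\,\delta_x(dx')\,\delta_{R_{\nabla \tilde{U}(x)} v}(dv') + \frac{\lambda^{\mathrm{ref}}}{\lambda(x, v)}\,\delta_x(dx')\,\varphi(dv')$$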

Versions of PDMP
Discretization
1. Sherlock and Thiery (2017) consider a delayed rejection approach with only point-wise evaluations of the target, by making a speed flip move once a proposal involving a flip in speed and a drift in the variable of interest is rejected. They also add a random perturbation for ergodicity, plus another perturbation based on a Brownian argument. Requires calibration.
2. Vanetti et al. (2017) unify many threads and relate PDMP, HMC, and discrete versions, with convergence results. The main idea improves upon existing deterministic methods by accounting for the target. It borrows from the earlier slice sampler idea of Murray et al. (AISTATS, 2010), exploiting exact Hamiltonian dynamics for an approximation to the true target, except that bouncing avoids the slice step. Eight discrete BPS versions are both correct against the target and do not require simulating event times.
Benefit : bypassing the generation of inter-event times of inhomogeneous Poisson processes.

Coordinate Sampler
Coordinate Sampler
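In this generalisation of BPS, bounce becomes partly random in direction orthogonal to gradient : $\mathcal{V} = \{\pm e_1, \cdots, \pm e_d\}$, and $\pi(v|x) = \varphi(v)$, the uniform distribution $\mathcal{U}(\mathcal{V})$
1. Deterministic dynamics :
$$dx_t/dt = v_t, \quad dv_t/dt = 0$$
2. Event occurrence rate : $\lambda(x, v) = \langle v, \nabla U(x) \rangle_+ + \lambda^{\mathrm{ref}}$
3. Transition dynamics :
$$Q((dx', dv') \mid (x, v)) = \sum_{v^* \in \mathcal{V}} \frac{\lambda(x, -v^*)}{\lambda(x)}\,\delta_x(dx')\,\delta_{v^*}(dv')$$
where
$$\lambda(x) = \sum_{v \in \mathcal{V}} \lambda(x, v) = 2d\lambda^{\mathrm{ref}} + \sum_{i=1}^d \left|\frac{\partial U(x)}{\partial x_i}\right|$$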
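A hedged sketch of the coordinate sampler for the standard Gaussian target $N(0, I_d)$, picked for this illustration because $\partial U / \partial x_i = x_i$ gives the rate $(s x_i + t)_+ + \lambda^{\mathrm{ref}}$ along a segment with velocity $v = s e_i$, which can be inverted exactly. The names `coordinate_sampler_gaussian` and `trajectory_moments` are hypothetical, and the fixed initial velocity $e_1$ is a simplification.

```python
import numpy as np

def coordinate_sampler_gaussian(x0, n_events, lam_ref, rng):
    """Coordinate sampler event skeleton for N(0, I_d), where grad U(x) = x."""
    x = np.array(x0, dtype=float)
    d = x.size
    i, s = 0, 1.0                        # current velocity is v = s * e_i
    times, positions = [0.0], [x.copy()]
    for _ in range(n_events):
        a = s * x[i]
        # First event of rate (a + t)_+ by inversion, of rate lam_ref by an
        # exponential draw; the earliest of the two is the event time.
        t_grad = -a + np.sqrt(max(a, 0.0) ** 2 + 2.0 * rng.exponential(1.0))
        t = min(t_grad, rng.exponential(1.0 / lam_ref))
        x[i] += s * t                    # only coordinate i moves
        # Weights lambda(x, -v*) for v* in (+e_1, ..., +e_d, -e_1, ..., -e_d);
        # their sum is 2 d lam_ref + sum_j |x_j| = lambda(x).
        w = np.concatenate([np.maximum(-x, 0.0), np.maximum(x, 0.0)]) + lam_ref
        k = rng.choice(2 * d, p=w / w.sum())
        i, s = k % d, (1.0 if k < d else -1.0)
        times.append(times[-1] + t)
        positions.append(x.copy())
    return np.array(times), np.array(positions)

def trajectory_moments(times, positions):
    """Time averages of x and x^2 along the piecewise linear trajectory."""
    dt = np.diff(times)[:, None]
    p, q = positions[:-1], positions[1:]
    total = times[-1] - times[0]
    mean = np.sum(dt * (p + q) / 2.0, axis=0) / total
    second = np.sum(dt * (p * p + p * q + q * q) / 3.0, axis=0) / total
    return mean, second

rng = np.random.default_rng(5)
times, positions = coordinate_sampler_gaussian(np.zeros(3), 30000, 1.0, rng)
mean, second = trajectory_moments(times, positions)
```

Each iteration only touches one partial derivative to simulate the event time, which is the Gibbs-like feature of the sampler; the full gradient is needed only to pick the next direction.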

Coordinate Sampler
Validation of Coordinate Sampler
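$$Lf(x, v) = \langle \nabla_x f(x, v), v \rangle + \lambda(x, v) \sum_{v' \in \mathcal{V}} \frac{\lambda(x, -v')}{\lambda(x)} \left\{f(x, v') - f(x, v)\right\}$$
Theorem
For any $\lambda^{\mathrm{ref}} > 0$, the PDMP induced by CS enjoys $\pi(x)\varphi(v)$ as its unique invariant distribution, provided the potential $U$ is $C^1$.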

Coordinate Sampler
Geometric Ergodicity of Coordinate Sampler
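Assumptions : Assume $U : \mathbb{R}^d \to \mathbb{R}_+$ satisfies [Deligiannidis et al., 2017]
A.1 $\partial^2 U(x)/\partial x_i \partial x_j$ is locally Lipschitz continuous for all $i, j$,
A.2 $\int |\nabla U(x)|\,\pi(dx) < \infty$,
A.3 $\lim_{|x| \to \infty} e^{U(x)/2} / \sqrt{|\nabla U(x)|} > 0$
A.4 $V \ge c_0$ for some positive constant $c_0$.
Further conditions
C.1 $\lim_{|x| \to \infty} |\nabla U(x)| = \infty$, $\lim_{|x| \to \infty} \|\Delta U(x)\| \le \alpha_1 < \infty$ and $\lambda^{\mathrm{ref}} > \sqrt{8\alpha_1}$.
C.2 $\lim_{|x| \to \infty} |\nabla U(x)| = 2\alpha_2 > 0$, $\lim_{|x| \to \infty} \|\Delta U(x)\| = 0$ and $\lambda^{\mathrm{ref}} < \alpha_2/14d$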
Coordinate Sampler
Geometric Ergodicity of Coordinate Sampler
Lyapunov function for the Markov process induced by coordinate sampler is
$$V(x, v) = \frac{e^{U(x)/2}}{\sqrt{\lambda^{\mathrm{ref}} + \langle \nabla U(x), -v \rangle_+}}$$
Theorem
Suppose A.1 − A.4 hold and one of the conditions C.1, C.2 holds, then CS is $V$-uniformly ergodic.

Coordinate Sampler
Coordinate Sampler vs Zig-Zag Sampler
1. The cardinality of $\mathcal{V}$ for CS is $2d$, while that for ZS is $2^d$ ;
2. Along each piecewise segment, CS only changes one component of $x$, while ZS modifies all components at the same time (con) ;
3. $\lambda_{CS} = \{v_i \nabla_i U(x)\}_+ + \lambda^{\mathrm{ref}}$, if $v = v_i e_i$ ; while $\lambda_{ZS} = \sum_{i=1}^d \left(\{v_i \nabla_i U(x)\}_+ + \lambda^{\mathrm{ref}}\right)$, if $v = \sum_{i=1}^d v_i e_i$.
Generally speaking, $\lambda_{CS}$ is much smaller than $\lambda_{ZS}$ (pro) ;
pro > con !
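The difference between the two rates in item 3 can be checked on a toy gradient. The functions `rate_cs` and `rate_zs` and the numerical values are assumptions made for this example; a common $\lambda^{\mathrm{ref}}$ is used for all Zig-Zag components.

```python
import numpy as np

def rate_cs(grad_u, i, sign, lam_ref):
    """Coordinate sampler rate for velocity v = sign * e_i."""
    return max(sign * grad_u[i], 0.0) + lam_ref

def rate_zs(grad_u, signs, lam_ref):
    """Zig-Zag total rate for velocity v = sum_i signs[i] * e_i."""
    return float(np.sum(np.maximum(signs * grad_u, 0.0) + lam_ref))

grad_u = np.array([1.0, -2.0, 0.5])   # hypothetical value of grad U(x)
lam_ref = 0.1
lam_cs = rate_cs(grad_u, 0, 1.0, lam_ref)          # (1)_+ + 0.1
lam_zs = rate_zs(grad_u, np.ones(3), lam_ref)      # (1 + 0 + 0.5) + 3 * 0.1
```

The Zig-Zag rate sums $d$ terms, so for comparable partial derivatives it grows with the dimension while the coordinate sampler rate does not.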

Coordinate Sampler
Coordinate Sampler vs Zig-Zag Sampler
Suppose that
1. $\lambda(x, v)$ of CS and $\lambda_i(x, v)$ of Zig-Zag sampler have the same scale.
2. Simulating the first event time of a Poisson process with rates $\lambda(x, v)$ of CS and $\lambda_i(x, v)$ of ZS has the same computation cost, $O(c)$.
A computation cost of $O(dc)$ then results in
1. ZS makes each component evolve with scale O( /d),
2. CS makes each component evolve O( )
on average.

Numerical comparison

Example 1 : Banana-shaped Distribution
$$\pi(x) \propto \exp\left\{-(x_1 - 1)^2 - \kappa (x_2 - x_1^2)^2\right\}$$
[Figure : ratio of ESS per second (y-axis) against $\log_2(\kappa)$ (x-axis), for the first component, the second component, and the log-likelihood.]
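For reference, the energy of the banana-shaped target and its gradient, which is all a PDMP sampler needs from the target, can be written as below. The function names are hypothetical; the formulas follow from $U(x) = (x_1 - 1)^2 + \kappa (x_2 - x_1^2)^2$.

```python
import numpy as np

def banana_energy(x, kappa):
    """U(x) = (x1 - 1)^2 + kappa * (x2 - x1^2)^2, up to an additive constant."""
    return (x[0] - 1.0) ** 2 + kappa * (x[1] - x[0] ** 2) ** 2

def banana_grad(x, kappa):
    """Gradient of the banana energy with respect to (x1, x2)."""
    r = x[1] - x[0] ** 2
    return np.array([2.0 * (x[0] - 1.0) - 4.0 * kappa * x[0] * r,
                     2.0 * kappa * r])

x = np.array([0.7, -0.2])
g = banana_grad(x, kappa=4.0)
```

Larger values of $\kappa$ concentrate the mass around the parabola $x_2 = x_1^2$, which is what makes the target harder to sample.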

Example 2 : Multivariate Gaussian Distribution
$$\pi(x) \propto \exp\left\{-\tfrac{1}{2} x^T A^{-1} x\right\}$$
1. MVN1 : $A_{ii} = 1$, for $i = 1, \cdots, d$ and $A_{ij} = 0.9$ for $i \neq j$.
2. MVN2 : $A_{ii} = 1$, for $i = 1, \cdots, d$ and $A_{ij} = 0.9^{|i-j|}$ for $i \neq j$.
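The two covariance structures can be built directly with NumPy; the sketch below also gives the gradient of the energy, $\nabla U(x) = A^{-1} x$. Function names are hypothetical.

```python
import numpy as np

def mvn1_cov(d):
    """Unit variances, constant correlation 0.9 between distinct components."""
    return np.full((d, d), 0.9) + 0.1 * np.eye(d)

def mvn2_cov(d):
    """Unit variances, correlation 0.9^|i-j| between components i and j."""
    idx = np.arange(d)
    return 0.9 ** np.abs(idx[:, None] - idx[None, :])

def gaussian_grad(x, A):
    """Gradient of U(x) = x^T A^{-1} x / 2, computed without forming A^{-1}."""
    return np.linalg.solve(A, x)

A1, A2 = mvn1_cov(5), mvn2_cov(5)
g = gaussian_grad(np.ones(5), A1)
```

Both matrices are positive definite for any dimension, so the two targets are proper Gaussian distributions.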

Example 2 : Multivariate Gaussian Distribution
Figure – The lhs plot shows the results for MVN1 and the rhs for MVN2. The x-axis indexes the dimension $d$ of the distribution, and the y-axis the efficiency ratios of CS over ZS in terms of minimal (red), mean (blue), and maximal ESS (green) across the components.

Example 3 : Bayesian Logistic Posterior
$$\pi(x) \propto \prod_{n=1}^N \frac{\exp(t_n x^T r_n)}{1 + \exp(x^T r_n)}$$
Figure – Comparison of CS versus ZS for the Bayesian logistic model in terms of MinESS, MeanESS and MaxESS : the y-axis represents the ESS per second.
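For the logistic posterior displayed on this slide (product of likelihood terms only, with no prior factor), the energy is $U(x) = \sum_n \{\log(1 + e^{x^T r_n}) - t_n x^T r_n\}$ and its gradient is $\sum_n \{\sigma(x^T r_n) - t_n\} r_n$, with $\sigma$ the logistic function. A minimal sketch, with hypothetical function names and simulated covariates and responses used purely as placeholders:

```python
import numpy as np

def logistic_energy(x, R, t):
    """U(x) = sum_n log(1 + exp(x^T r_n)) - t_n x^T r_n; rows of R are the r_n."""
    z = R @ x
    return np.sum(np.logaddexp(0.0, z) - t * z)

def logistic_grad(x, R, t):
    """Gradient of the logistic energy: sum_n (sigmoid(x^T r_n) - t_n) r_n."""
    z = R @ x
    return R.T @ (1.0 / (1.0 + np.exp(-z)) - t)

rng = np.random.default_rng(6)
R = rng.standard_normal((50, 3))            # placeholder covariates
t = (rng.uniform(size=50) < 0.5) * 1.0      # placeholder binary responses
x = np.array([0.2, -0.4, 0.1])
g = logistic_grad(x, R, t)
```

Each partial derivative is a sum over the $N$ observations, which is the quantity that samplers such as CS and ZS must bound to simulate event times by thinning.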

Conclusion
Forward
1. How to compare coordinate sampler and zigzag sampler
theoretically.
2. reparametrization
3. Riemannian manifold technique

Conclusion
Forward
Trugarez ! †
†. Thank you !

Conclusion
Bierkens, J., Fearnhead, P., and Roberts, G. (2016). The zig-zag process and super-efficient sampling for Bayesian analysis of big data. arXiv preprint arXiv:1607.03188.
Bierkens, J., Roberts, G., and Zitt, P.-A. (2017). Ergodicity of the zigzag process. arXiv preprint arXiv:1712.09875.
Bouchard-Côté, A., Vollmer, S. J., and Doucet, A. (2018). The bouncy particle sampler: a non-reversible rejection-free Markov chain Monte Carlo method. Journal of the American Statistical Association, (to appear).
Davis, M. H. (1984). Piecewise-deterministic Markov processes: A general class of non-diffusion stochastic models. Journal of the Royal Statistical Society. Series B (Methodological), pages 353–388.
Davis, M. H. (1993). Markov Models & Optimization, volume 49. CRC Press.
Deligiannidis, G., Bouchard-Côté, A., and Doucet, A. (2017). Exponential ergodicity of the bouncy particle sampler. arXiv preprint arXiv:1705.04579.
Fearnhead, P., Bierkens, J., Pollock, M., and Roberts, G. O. (2016). Piecewise deterministic Markov processes for continuous-time Monte Carlo. arXiv preprint arXiv:1611.07873.
Kingman, J. F. C. (1992). Poisson processes, volume 3. Clarendon Press.
Lewis, P. A. and Shedler, G. S. (1979). Simulation of nonhomogeneous Poisson processes by thinning. Naval Research Logistics (NRL), 26(3):403–413.
Peters, E. A. and de With, G. (2012). Rejection-free Monte Carlo sampling for general potentials. Physical Review E, 85(2):026703.
Sherlock, C. and Thiery, A. H. (2017). A discrete bouncy particle sampler. arXiv preprint arXiv:1707.05200.
Vanetti, P., Bouchard-Côté, A., Deligiannidis, G., and Doucet, A. (2017). Piecewise deterministic Markov chain Monte Carlo. arXiv preprint arXiv:1707.05296.
Wu, C. and Robert, C. P. (2017). Generalized bouncy particle sampler. arXiv preprint arXiv:1706.04781.
