This document describes a new MCMC method called the Coordinate Sampler. It is a non-reversible Gibbs-like sampler based on a piecewise deterministic Markov process (PDMP). The Coordinate Sampler generalizes the Bouncy Particle Sampler by making the bounce direction partly random and orthogonal to the gradient. It is proven that under certain conditions, the PDMP induced by the Coordinate Sampler has a unique invariant distribution of the target distribution multiplied by a uniform auxiliary variable distribution. The Coordinate Sampler is also shown to exhibit geometric ergodicity, an important convergence property, under additional regularity conditions on the target distribution.
Coordinate sampler: A non-reversible Gibbs-like sampler
1. Coordinate Sampler: A Non-Reversible Gibbs-like Sampler
Coordinate Sampler: A Non-Reversible
Gibbs-like Sampler
Christian P. Robert
U Paris Dauphine PSL & University of Warwick
Joint work with Changye Wu
2. Coordinate Sampler: A Non-Reversible Gibbs-like Sampler
Degemer mad en Breizh ∗
∗. Welcome to Brittany
3. Coordinate Sampler: A Non-Reversible Gibbs-like Sampler
Outline
Background
Versions of PDMP
Coordinate Sampler
Numerical comparison
Conclusion
4. Coordinate Sampler: A Non-Reversible Gibbs-like Sampler
Background
Generic issue
Goal : sample from a target known up to a constant, defined over
Rd ,
π(x) ∝ γ(x)
with energy U(x) = − log π(x), U ∈ C1.
5. Coordinate Sampler: A Non-Reversible Gibbs-like Sampler
Background
Marketing arguments
Current default workhorse : reversible MCMC methods
Non-reversible MCMC algorithms based on piecewise deterministic
Markov processes perform well empirically
Quantitative convergence rates and variance now available
Mesquita and Hespanha (2010) show geometric ergodicity for
targets with exponentially decaying tails,
Monmarché (2016) gives sharp results for compact state-space
Bierkens et al. (2016a,b) consider targets on the real line
Christophe Andrieu’s talk contains further advances
6. Coordinate Sampler: A Non-Reversible Gibbs-like Sampler
Background
Motivation : piecewise deterministic Markov process
PDMP sampler is a (new ?) continuous-time, non-reversible MCMC
method based on auxiliary variables
1. particle physics simulation
[Peters and de With, 2012]
2. empirically state-of-the-art performances
[Bouchard-Côté et al., 2018]
3. exact subsampled in big data
[Bierkens et al., 2016]
4. geometric ergodicity for a large class of distribution
[Deligiannidis et al., 2017; Bierkens et al., 2017]
7. Coordinate Sampler: A Non-Reversible Gibbs-like Sampler
Background
Recent reference
[Bouchard-Côté et al., 2018]
8. Coordinate Sampler: A Non-Reversible Gibbs-like Sampler
Background
Piecewise deterministic Markov process
Piecewise deterministic Markov process {zt ∈ Z}t∈[0,∞), with three
ingredients,
1. Deterministic dynamics : between events, deterministic
evolution based on ODE
dzt/dt = Φ(zt)
2. Event occurrence rate : λ(t) = λ(zt)
3. Transition dynamics : At event time, τ, state prior to τ
denoted by zτ−, and new state generated by zτ ∼ Q(·|zτ−).
[Davis, 1984, 1993]
9. Coordinate Sampler: A Non-Reversible Gibbs-like Sampler
Background
Implementation
Algorithm 1: Simulation of PDMP
Starting point z0, τ0 ← 0.
for k = 1, 2, 3, · · · do
Sample inter-event time ηk from distribution
P(ηk > t) = exp −
t
0
λ(zτk−1+s )ds .
τk ← τk−1 + ηk, zτk−1+s ← Ψs(zτk−1
), for s ∈ (0, ηk), where
Ψ ODE flow of Φ.
zτk − ← Ψηk
(zτk−1
), zτk
∼ Q(·|zτk −).
end
10. Coordinate Sampler: A Non-Reversible Gibbs-like Sampler
Versions of PDMP
Outline
Background
Versions of PDMP
Coordinate Sampler
Numerical comparison
Conclusion
11. Coordinate Sampler: A Non-Reversible Gibbs-like Sampler
Versions of PDMP
Basic bouncy particle sampler
Simulation of continuous-time piecewise linear trajectory (xt)t with
each segment in trajectory specified by
initial position x
length τ
velocity v
[Bouchard-Côté et al., 2018]
12. Coordinate Sampler: A Non-Reversible Gibbs-like Sampler
Versions of PDMP
Basic bouncy particle sampler
Simulation of continuous-time piecewise linear trajectory (xt)t with
each segment in trajectory specified by
initial position x
length τ
velocity v
length specified by inhomogeneous Poisson point process with
intensity function
λ(x, v) = max{0, < U(x), v >}
[Bouchard-Côté et al., 2018]
13. Coordinate Sampler: A Non-Reversible Gibbs-like Sampler
Versions of PDMP
Basic bouncy particle sampler
Simulation of continuous-time piecewise linear trajectory (xt)t with
each segment in trajectory specified by
initial position x
length τ
velocity v
new velocity after bouncing given by Newtonian elastic collision
R(x)v = v − 2
< U(x), v >
|| U(x)||2
U(x)
[Bouchard-Côté et al., 2018]
14. Coordinate Sampler: A Non-Reversible Gibbs-like Sampler
Versions of PDMP
Basic bouncy particle sampler
[Bouchard-Côté et al., 2018]
15. Coordinate Sampler: A Non-Reversible Gibbs-like Sampler
Versions of PDMP
Implementation
Generally speaking, the main difficulties of implementing PDMP
come from
1. Computing the ODE flow Ψ : linear dynamic, quadratic
dynamic
2. Simulating the inter-event time ηk : techniques of
superposition and thinning for Poisson processes
16. Coordinate Sampler: A Non-Reversible Gibbs-like Sampler
Versions of PDMP
Poisson process on R+
Definition (Poisson process)
Poisson process with rate λ on R+ is sequence
τ1, τ2, · · ·
of rv’s when intervals
τ1, τ2 − τ1, τ3 − τ2, · · ·
are iid with
P(τi − τi−1 > T) = exp −
τi−1+T
τi−1
λ(t)dt , τ0 = 0
17. Coordinate Sampler: A Non-Reversible Gibbs-like Sampler
Versions of PDMP
Simulation by superposition theorem
Theorem (Kingman, 1992)
Let Π1, Π2, · · · , be countable collection of independent Poisson
processes on R+ with resp. rates λn(·). If ∞
n=1λn(t) < ∞ for all
t’s, then superposition process
Π =
∞
n=1
Πn
is Poisson process with rate
λ(t) =
∞
n=1
λn(t)
18. Coordinate Sampler: A Non-Reversible Gibbs-like Sampler
Versions of PDMP
Simulation by thinning
Theorem (Lewis and Shedler, 1979)
Let
λ, Λ : R+
→ R+
be continuous functions such that λ(·) Λ(·). Let
τ1, τ2, · · · ,
be the increasing sequence of a Poisson process with rate Λ(·). For
all i, if τi is removed from the sequence with probability
1 − λ(t)/Λ(t)
then the remaining ˜τ1, ˜τ2, · · · form a non-homogeneous Poisson
process with rateλ(·)
19. Coordinate Sampler: A Non-Reversible Gibbs-like Sampler
Versions of PDMP
Extended generator
Definition
For D(L) set of measurable functions f : Z → R such that there
exists a measurable function h : Z → R with t → h(zt) Pz-a.s. for
each z ∈ Z and the process
Cf
t = f (zt) − f (z0) −
t
0
h(zs)ds
a local martingale. Then we write h = Lf and call (L, D(L)) the
extended generator of the process {zt}t 0.
20. Coordinate Sampler: A Non-Reversible Gibbs-like Sampler
Versions of PDMP
Extended Generator of PDMP
Theorem (Davis, 1993)
The generator, L, of above PDMP is, for f ∈ D(L)
Lf (z) = f (z) · Φ(z) + λ(z)
z
f (z ) − f (z) Q(dz |z)
Furthermore, µ(dz) is an invariant distribution of above PDMP, if
Lf (z)µ(dz) = 0, for all f ∈ D(L)
21. Coordinate Sampler: A Non-Reversible Gibbs-like Sampler
Versions of PDMP
PDMP-based sampler
PDMP-based sampler is an auxiliary variable technique
Given target π(x),
1. introduce auxiliary variable V ∈ V along with a density π(v|x),
2. choose appropriate Φ, λ and Q
for π(x)π(v|x) to be unique invariant distribution of Markov process
22. Coordinate Sampler: A Non-Reversible Gibbs-like Sampler
Versions of PDMP
Bouncy Particle Sampler (Bouchard-Côté et al., 2018)
V = Rd , and π(v|x) = ϕ(v) for N(0, Id )
1. Deterministic dynamics :
dxt/dt = vt, dvt/dt = 0
2. Event occurrence rate : λ(x, v) = v, U(x) + + λref
3. Transition dynamics :
Q((dx , dv )|(x, v))
=
v, U(x) +
λ(x, v)
δx(dx )δR U(x)v(dv ) +
λref
λ(x, v)
δx(dx )ϕ(dv )
where R U(x)v = v − 2 U(x),v
U(x), U(x) U(x)
23. Coordinate Sampler: A Non-Reversible Gibbs-like Sampler
Versions of PDMP
Zig-Zag Sampler (Bierkens et al., 2016)
back to CS
24. Coordinate Sampler: A Non-Reversible Gibbs-like Sampler
Versions of PDMP
Zig-Zag Sampler (Bierkens et al., 2016)
V = {+1, −1}d , and π(v|x) ∼ Uniform({+1, −1}d )
1. Deterministic dynamics :
dxt/dt = vt, dvt/dt = 0
2. Event occurrence rate :
λ(x, v) = d
i=1 λi (x, v) = d
i=1 {vi i U(x)}+ + λref
i
3. Transition dynamics :
Q((dx , dv )|(x, v)) =
d
i=1
λi (x, v)
λ(x, v)
δx(dx )δFi v(dv )
where Fi operator that flips i-th component of v and keep others
unchanged
back to CS
25. Coordinate Sampler: A Non-Reversible Gibbs-like Sampler
Versions of PDMP
Continuous-time Hamiltonian Monte Carlo (Neal, 1999)
V = Rd , and π(v|x) = ϕ(v) ∼ N(0, Id )
1. Deterministic dynamics :
dxt/dt = vt, dvt/dt = − U(xt)
2. Event occurrence rate : λ(x, v) = λ0(x)
3. Transition dynamics :
Q((dx , dv )|(x, v)) = δx(dx )ϕ(dv )
26. Coordinate Sampler: A Non-Reversible Gibbs-like Sampler
Versions of PDMP
Continuous-time Riemannian Manifold HMC (Girolami
& Calderhead, 2011)
V = Rd , and π(v|x) = N(0, G(x)), the Hamiltonian is
H(x, v) = U(x) + 1/2vT
G(x)−1
v + 1/2 log(|G(x)|)
1. Deterministic dynamics :
dxt/dt = ∂H/∂v(xt, vt), dvt/dt = −∂H/∂x(xt, vt)
2. Event occurrence rate : λ(x, v) = λ0(x)
3. Transition dynamics :
Q((dx , dv )|(x, v)) = δx(dx )ϕ(dv |x )
27. Coordinate Sampler: A Non-Reversible Gibbs-like Sampler
Versions of PDMP
Randomized BPS
Define
a =
v, U(x)
U(x), U(x)
U(x), b = v − a
Regular BPS, move v = −a + b
Alternatives
1. Fearnhead et al. (2016) :
v ∼ Qx(dv |v) = max {0, −v , U(x) } dv
2. Wu and Robert (2017) : v = −a + b , where b is Gaussian
distributed over the space which is orthogonal to U(x) in Rd .
28. Coordinate Sampler: A Non-Reversible Gibbs-like Sampler
Versions of PDMP
HMC-BPS (Vanetti et al., 2017)
ρ(x) ∝ exp{−V (x)} is a Gaussian approximation of the target π(x).
^H(x, v) = V (x) + 1/2vT
v, ˜U(x) = U(x) − V (x)
1. Deterministic dynamics :
dxt/dt = vt, dvt/dt = − V (xt)
2. Event occurrence rate : λ(x, v) = v, ˜U(x) + + λref
3. Transition dynamics :
Q((dx , dv )|(x, v))
=
v, ˜U(x) +
λ(x, v)
δx(dx )δR ˜U(x)v(dv ) +
λref
λ(x, v)
δx(dx )ϕ(dv )
29. Coordinate Sampler: A Non-Reversible Gibbs-like Sampler
Versions of PDMP
Discretization
1. Sherlock and Thiery (2017) considers delayed rejection
approach with only point-wise evaluations of target, by making
speed flip move once proposal involving flip in speed and drift
in variable of interest rejected. Also add random perturbation
for eergodicity, plus another perturbation based on a Brownian
argument. Requires calibration
2. Vanetti et al. (2017)
Benefit : bypassing the generation of inter-event time of
inhomogeneous Poisson processes.
30. Coordinate Sampler: A Non-Reversible Gibbs-like Sampler
Versions of PDMP
Discretization
1. Sherlock and Thiery (2017)
2. Vanetti et al. (2017) unifies many threads and relates PDMP,
HMC, and discrete versions, with convergence results. Main
idea improves upon existing deterministic methods by
accounting for target. Borrows from earlier slice sampler idea
of Murrayet al. (AISTATS, 2010), exploiting exact Hamiltonian
dynamics for approximation to true target. Except that
bouncing avoids the slice step. Eight discrete BPS both correct
against target and do not simulating event times.
Benefit : bypassing the generation of inter-event time of
inhomogeneous Poisson processes.
31. Coordinate Sampler: A Non-Reversible Gibbs-like Sampler
Coordinate Sampler
Outline
Background
Versions of PDMP
Coordinate Sampler
Numerical comparison
Conclusion
32. Coordinate Sampler: A Non-Reversible Gibbs-like Sampler
Coordinate Sampler
Coordinate Sampler
In this generalisation of BPS, bounce becomes partly random in
direction orthogonal to gradient : V = {±e1, · · · , ±ed }, and
π(v|x) = ϕ(v) ∼ U(V)
1. Deterministic dynamics :
dxt/dt = vt, dvt/dt = 0
2. Event occurrence rate : λ(x, v) = v, U(x) + + λref
3. Transition dynamics :
Q((dx , dv )|(x, v)) =
v∗∈V
λ(x, −v∗)
λ(x)
δx(dx )δv∗ (dv )
where λ(x) = v∈V λ(x, v) = 2dλref + d
i=1
∂U(x)/∂xi
33. Coordinate Sampler: A Non-Reversible Gibbs-like Sampler
Coordinate Sampler
Validation of Coordinate Sampler
Lf = xf (x, v), v + λ(x, v)
v ∈V
λ(x, −v )
λ(x)
f (x, v ) − f (x, v)
Theorem
For any positive λref > 0, the PDMP induced by CS enjoys
π(x)ϕ(v) as its unique invariant distribution, produced the
potential U is C1.
34. Coordinate Sampler: A Non-Reversible Gibbs-like Sampler
Coordinate Sampler
Geometric Ergodicity of Coordinate Sampler
Assumptions : Assume U : Rd → R+ satisfy [Deligiannidis et al.,
2017]
A.1 ∂2U(x)/∂xi xj is locally Lipschitz continuous for all i, j,
A.2 U(x) π(dx) < ∞,
A.3 lim|x|→∞eU(x)/2/ U(x) > 0
A.4 V c0 for some positive constant c0.
Further conditions
C.1 lim|x|→∞ U(x) = ∞, lim|x|→∞ ∆U(x) α1 < ∞ and
λref >
√
8α1.
C.2 lim|x|→∞ U(x) = 2α2 > 0, lim|x|→∞ ∆U(x) = 0 and
λref < α2/14d
35. Coordinate Sampler: A Non-Reversible Gibbs-like Sampler
Coordinate Sampler
Geometric Ergodicity of Coordinate Sampler
Lyapunov function for the Markov process induced by coordinate
sampler is
V (x, v) = eU(x)/2
/
√
λref+ U(x),−v +,
Theorem
Suppose A.1 − A.4 hold and one of the conditions C.1, C.2 holds,
then CS is V -uniformly ergodic.
36. Coordinate Sampler: A Non-Reversible Gibbs-like Sampler
Coordinate Sampler
Coordinate Sampler vs Zig-Zag Sampler
1. The cardinality of V of CS is 2d, while that of ZS is 2d ;
2. Along each piecewise segment, CS only changes one
component of x, while ZS modifies all components at the
same time (con) ;
3. λCS = {vi i U(x)}+ + λref, if v = vi ei ; while
λZS = d
i =1 {vi i U(x)}+ + λref , if v = d
i =1 vi ei .
Generally speaking, λCS is much smaller than λZS (pro) ;
pro > con !
37. Coordinate Sampler: A Non-Reversible Gibbs-like Sampler
Coordinate Sampler
Coordinate Sampler vs Zig-Zag Sampler
Suppose that
1. λ(x, v) of CS and λi (x, v) of Zig-Zag sampler have same scale.
2. Simulating first event time of Poisson process with rates
λ(x, v) of CS and λi (x, v) of ZS : same computation cost,
O(c).
O(dc) computation cost result in
1. ZS makes each component evolve with scale O( /d),
2. CS makes each component evolve O( )
on average.
38. Coordinate Sampler: A Non-Reversible Gibbs-like Sampler
Numerical comparison
Outline
Background
Versions of PDMP
Coordinate Sampler
Numerical comparison
Conclusion
39. Coordinate Sampler: A Non-Reversible Gibbs-like Sampler
Numerical comparison
Example 1 : Banana-shaped Distribution
π(x) ∝ exp −(x1 − 1)2
− κ(x2 − x2
1 )2
−2 −1 0 1 2 3 4 5
0.00.51.01.52.02.53.0
log2(κ)
RatioofESSpersecond
first component
second component
loglikelihood
40. Coordinate Sampler: A Non-Reversible Gibbs-like Sampler
Numerical comparison
Example 2 : Multivariate Gaussian Distribution
π(x) ∝ exp −
1
2
xT
A−1
x
1. MVN1 : Aii = 1, for i = 1, · · · , d and Aij = 0.9 for i = j.
2. MVN2 : Aii = 1, fro i = 1, · · · , d and Aij = 0.9|i−j| for i = j.
41. Coordinate Sampler: A Non-Reversible Gibbs-like Sampler
Numerical comparison
Example 2 : Multivariate Gaussian Distribution
20 40 60 80 100
468101214
MinESS
MeanESS
MaxESS
20 40 60 80 100
4567891011
MinESS
MeanESS
MaxESS
Figure – The lhs plot shows the results for MVN1 and the rhs for MVN2.
The x-axis indexes the dimension d of the distribution, and the y-axis the
efficiency ratios of CS over ZS in terms of minimal (red), mean (blue),
and maximal ESS(green) across the components.
42. Coordinate Sampler: A Non-Reversible Gibbs-like Sampler
Numerical comparison
Example 2 : Bayesian Logistic Posterior
π(x) ∝
N
n=1
exp(tnxT rn)
1 + exp(xT rn)
0
50
100
150
MinESS MeanESS MaxESS
ESSpersecond
Type
CS
ZS
Figure – Comparison of CS versus ZS for the Bayesian logistic model :
the y-axis represents the ESS per second.
43. Coordinate Sampler: A Non-Reversible Gibbs-like Sampler
Conclusion
Outline
Background
Versions of PDMP
Coordinate Sampler
Numerical comparison
Conclusion
44. Coordinate Sampler: A Non-Reversible Gibbs-like Sampler
Conclusion
Forward
1. How to compare coordinate sampler and zigzag sampler
theoretically.
2. reparametrization
3. Riemannian manifold technique
45. Coordinate Sampler: A Non-Reversible Gibbs-like Sampler
Conclusion
Forward
Trugarez ! †
†. Thank you !
46. Coordinate Sampler: A Non-Reversible Gibbs-like Sampler
Conclusion
Bierkens, J., Fearnhead, P., and Roberts, G. (2016). The zig-zag process and super-efficient sampling
for bayesian analysis of big data. arXiv preprint arXiv :1607.03188.
Bierkens, J., Roberts, G., and Zitt, P.-A. (2017). Ergodicity of the zigzag process. arXiv preprint
arXiv :1712.09875.
Bouchard-Côté, A., Vollmer, S. J., and Doucet, A. (2018). The bouncy particle sampler : a
non-reversible rejection-free markov chain monte carlo method. Journal of the American Statistical
Association, (to appear).
Davis, M. H. (1984). Piecewise-deterministic markov processes : A general class of non-diffusion
stochastic models. Journal of the Royal Statistical Society. Series B (Methodological), pages
353–388.
Davis, M. H. (1993). Markov Models & Optimization, volume 49. CRC Press.
Deligiannidis, G., Bouchard-Côté, A., and Doucet, A. (2017). Exponential ergodicity of the bouncy
particle sampler. arXiv preprint arXiv :1705.04579.
Fearnhead, P., Bierkens, J., Pollock, M., and Roberts, G. O. (2016). Piecewise deterministic markov
processes for continuous-time monte carlo. arXiv preprint arXiv :1611.07873.
Kingman, J. F. C. (1992). Poisson processes, volume 3. Clarendon Press.
Lewis, P. A. and Shedler, G. S. (1979). Simulation of nonhomogeneous poisson processes by thinning.
Naval Research Logistics (NRL), 26(3) :403–413.
Peters, E. A. and de With, G. (2012). Rejection-free monte carlo sampling for general potentials.
Physical Review E, 85(2) :026703.
Sherlock, C. and Thiery, A. H. (2017). A discrete bouncy particle sampler. arXiv preprint
arXiv :1707.05200.
Vanetti, P., Bouchard-Côté, A., Deligiannidis, G., and Doucet, A. (2017). Piecewise deterministic
markov chain monte carlo. arXiv preprint arXiv :1707.05296.
Wu, C. and Robert, C. P. (2017). Generalized bouncy particle sampler. arXiv preprint arXiv :1706.04781.