Gibbs flow transport for Bayesian inference
Jeremy Heng,
Department of Statistics, Harvard University
Joint work with Arnaud Doucet (Oxford) & Yvo Pokern (UCL)
Computational Statistics and Molecular Simulation:
A Practical Cross-Fertilization
Casa Matemática Oaxaca (CMO)
13 November 2018
Jeremy Heng Flow transport 1/ 24
Problem specification
• Target distribution on R^d
      π(dx) = γ(x) dx / Z,
  where γ : R^d → R_+ can be evaluated pointwise and
      Z = ∫_{R^d} γ(x) dx
  is unknown
• Problem 1: obtain a consistent estimator of π(ϕ) := ∫_{R^d} ϕ(x) π(dx)
• Problem 2: obtain an unbiased and consistent estimator of Z
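Both problems can be illustrated with plain importance sampling. This sketch is not from the talk: the unnormalised target γ(x) = exp(−x²/2) (so Z = √(2π) and π = N(0, 1)), the N(0, 2²) proposal, and the test function ϕ(x) = x² are all assumptions made for checkability.

```python
import numpy as np

rng = np.random.default_rng(0)

# Unnormalised target gamma(x) = exp(-x^2/2), so Z = sqrt(2*pi) and pi = N(0, 1).
gamma = lambda x: np.exp(-0.5 * x**2)
Z_true = np.sqrt(2 * np.pi)

# Tractable proposal q = N(0, 2^2): known density, easy to sample.
sigma_q = 2.0
x = rng.normal(0.0, sigma_q, size=200_000)
q = np.exp(-0.5 * (x / sigma_q) ** 2) / (sigma_q * np.sqrt(2 * np.pi))

w = gamma(x) / q                      # unnormalised importance weights
Z_hat = w.mean()                      # unbiased estimator of Z (Problem 2)
phi_hat = (w * x**2).sum() / w.sum()  # consistent estimator of pi(phi), phi(x) = x^2

print(Z_hat, phi_hat)
```

Self-normalisation in `phi_hat` is what allows π(ϕ) to be estimated when only the unnormalised γ is available.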
Motivation: Bayesian computation
• Prior distribution π0 on the unknown parameters of a model
• Likelihood function L : R^d → R_+ of data y
• Bayes update gives the posterior distribution on R^d
      π(dx) = π0(x) L(x) dx / Z,
  where Z = ∫_{R^d} π0(x) L(x) dx is the marginal likelihood of y
• π(ϕ) and Z are typically intractable
Monte Carlo methods
• Sampling from π is typically intractable, so we rely on Markov chain Monte Carlo (MCMC) methods
• MCMC constructs a π-invariant Markov kernel K : R^d × B(R^d) → [0, 1]
• Sample X0 ∼ π0 and iterate Xn ∼ K(Xn−1, ·) until convergence
• MCMC can fail in practice, e.g. when π is highly multi-modal
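As a concrete instance of a π-invariant kernel K, here is a random-walk Metropolis sketch. It is illustrative rather than from the talk; the standard Gaussian target and step size are assumptions.

```python
import numpy as np

def rwm_kernel(x, log_gamma, step, rng):
    """One application of a pi-invariant random-walk Metropolis kernel K(x, .)."""
    y = x + step * rng.normal(size=x.shape)       # symmetric Gaussian proposal
    log_alpha = log_gamma(y) - log_gamma(x)       # log acceptance probability
    return y if np.log(rng.uniform()) < log_alpha else x

rng = np.random.default_rng(1)
log_gamma = lambda x: -0.5 * np.sum(x**2)         # unnormalised log-density of N(0, I_2)

x = np.zeros(2)                                   # X_0 (here started at the mode)
samples = []
for _ in range(20_000):
    x = rwm_kernel(x, log_gamma, step=1.0, rng=rng)
    samples.append(x)
samples = np.array(samples)
print(samples.mean(axis=0), samples.var(axis=0))  # roughly (0, 0) and (1, 1)
```

Only ratios of γ are needed, which is why the unknown Z never appears.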
Annealed importance sampling
• If π0 and π are distant, define bridges
      π_{λm}(dx) = π0(x) L(x)^{λm} dx / Z(λm),
  with 0 = λ0 < λ1 < … < λM = 1 so that π1 = π
• Initialize X0 ∼ π0 and move Xm ∼ Km(Xm−1, ·) for m = 1, …, M, where Km is π_{λm}-invariant
• Annealed importance sampling constructs w : (R^d)^{M+1} → R_+ so that
      π(ϕ) = E[ϕ(XM) w(X0:M)] / E[w(X0:M)],   Z = E[w(X0:M)]
• AIS (Neal, 2001) and SMC samplers (Del Moral et al., 2006) are considered state-of-the-art in statistics and machine learning
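A minimal AIS sketch, under an assumed conjugate Gaussian example where Z is available in closed form (prior N(0, 1), log L(x) = −(x − 1)²/2, hence Z = e^{−1/4}/√2): the linear tempering schedule and the single random-walk Metropolis move per bridge are illustrative choices, not the talk's.

```python
import numpy as np

rng = np.random.default_rng(2)
m_lik = 1.0
log_L = lambda x: -0.5 * (x - m_lik) ** 2                 # log-likelihood
log_gamma = lambda x, lam: -0.5 * x**2 + lam * log_L(x)   # log pi0 + lam*log L (up to a constant)

N, M = 5_000, 50
lam = np.linspace(0.0, 1.0, M + 1)                        # tempering schedule
x = rng.normal(size=N)                                    # X_0 ~ pi_0 = N(0, 1)
log_w = np.zeros(N)

for m in range(1, M + 1):
    log_w += (lam[m] - lam[m - 1]) * log_L(x)             # incremental AIS weight
    # one pi_{lam_m}-invariant random-walk Metropolis move per particle
    y = x + 0.8 * rng.normal(size=N)
    accept = np.log(rng.uniform(size=N)) < log_gamma(y, lam[m]) - log_gamma(x, lam[m])
    x = np.where(accept, y, x)

Z_hat = np.exp(log_w).mean()                              # unbiased estimator of Z
Z_true = np.exp(-0.25) / np.sqrt(2.0)                     # analytic Z for this example
print(Z_hat, Z_true)
```

Note that `Z_hat` is unbiased regardless of how well the Metropolis moves mix; mixing only affects its variance.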
Jarzynski nonequilibrium equality
• Consider M → ∞, i.e. define the curve of distributions {πt}t∈[0,1]
      πt(dx) = π0(x) L(x)^{λ(t)} dx / Z(t),
  where λ : [0, 1] → [0, 1] is a strictly increasing C^1 function
• Initialize X0 ∼ π0 and run the time-inhomogeneous Langevin dynamics
      dXt = (1/2) ∇log πt(Xt) dt + dWt,   t ∈ [0, 1]
• The Jarzynski equality (Jarzynski, 1997; Crooks, 1998) constructs w : C([0, 1], R^d) → R_+ so that
      π(ϕ) = E[ϕ(X1) w(X[0,1])] / E[w(X[0,1])],   Z = E[w(X[0,1])]
• To what extent is this state-of-the-art in molecular dynamics?
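A time-discretised sketch of this construction: Euler–Maruyama for the annealed Langevin dynamics, with the Jarzynski work ∫ λ'(t) log L(Xt) dt accumulated as a left Riemann sum. The Gaussian example (π0 = N(0, 1), log L(x) = −(x − 1)²/2, λ(t) = t) is assumed for checkability, and the discretisation introduces an O(Δt) bias absent from the continuous-time identity.

```python
import numpy as np

rng = np.random.default_rng(3)
N, M = 5_000, 200
dt = 1.0 / M
m_lik = 1.0
log_L = lambda x: -0.5 * (x - m_lik) ** 2

x = rng.normal(size=N)          # X_0 ~ pi_0 = N(0, 1)
log_w = np.zeros(N)
for m in range(M):
    t = m * dt
    log_w += dt * log_L(x)                                   # Jarzynski work increment
    grad = -x + t * (m_lik - x)                              # grad log pi_t, pi_t ∝ pi_0 L^t
    x += 0.5 * grad * dt + np.sqrt(dt) * rng.normal(size=N)  # Euler–Maruyama step

Z_hat = np.exp(log_w).mean()
Z_true = np.exp(-0.25) / np.sqrt(2.0)    # analytic Z for this Gaussian example
print(Z_hat, Z_true)
```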
Optimal dynamics
• The dynamical lag Law(Xt) − πt impacts the variance of the estimators
• Vaikuntanathan & Jarzynski (2011) considered adding a drift f : [0, 1] × R^d → R^d to reduce the lag
      dXt = f(t, Xt) dt + (1/2) ∇log πt(Xt) dt + dWt,   t ∈ [0, 1],   X0 ∼ π0
• An optimal choice of f results in zero lag, i.e. Xt ∼ πt for t ∈ [0, 1], and a zero-variance estimator of Z
• Any optimal choice of f satisfies the Liouville PDE
      −∇ · (πt(x) f(t, x)) = ∂t πt(x)
• Zero lag is also achieved by running the deterministic dynamics
      dXt = f(t, Xt) dt,   X0 ∼ π0
Time evolution of distributions
• The time evolution of πt is given by
      ∂t πt(x) = λ'(t) (log L(x) − It) πt(x),
  where
      It = (1/λ'(t)) (d/dt) log Z(t) = E_{πt}[log L(Xt)] < ∞
• Integrating recovers the path sampling (Gelman and Meng, 1998) or thermodynamic integration (Kirkwood, 1935) identity
      log (Z(1)/Z(0)) = ∫_0^1 λ'(t) It dt
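The identity can be checked numerically on a tractable curve. In the assumed Gaussian example with λ(t) = t (prior N(0, 1), log L(x) = −(x − 1)²/2), πt = N(t/(1 + t), 1/(1 + t)) can be sampled exactly, so each It is a simple Monte Carlo average and the time integral is a trapezoidal sum.

```python
import numpy as np

rng = np.random.default_rng(4)
m_lik = 1.0
log_L = lambda x: -0.5 * (x - m_lik) ** 2

# lambda(t) = t, so log Z(1) = int_0^1 I_t dt with I_t = E_{pi_t}[log L].
# Here pi_t = N(t*m/(1+t), 1/(1+t)) is tractable and sampled exactly.
ts = np.linspace(0.0, 1.0, 101)                 # quadrature grid on [0, 1]
I = np.empty_like(ts)
for k, t in enumerate(ts):
    xs = rng.normal(t * m_lik / (1 + t), 1 / np.sqrt(1 + t), size=100_000)
    I[k] = log_L(xs).mean()                     # Monte Carlo estimate of I_t

# trapezoidal rule for the thermodynamic integration identity
log_Z_hat = np.sum((I[1:] + I[:-1]) / 2 * np.diff(ts))
log_Z_true = -0.25 - 0.5 * np.log(2.0)          # analytic log Z(1) for this example
print(log_Z_hat, log_Z_true)
```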
Liouville PDE
• Dynamics governed by the ODE
      dXt = f(t, Xt) dt,   X0 ∼ π0
• For sufficiently regular f, the ODE admits a unique solution t ↦ Xt, t ∈ [0, 1], inducing a curve of distributions
      {π̃t := Law(Xt), t ∈ [0, 1]}
  satisfying the Liouville PDE
      −∇ · (π̃t f) = ∂t π̃t,   π̃0 = π0
Defining the flow transport problem
• Set π̃t = πt for t ∈ [0, 1] and solve the Liouville equation
      −∇ · (πt f) = ∂t πt,   (L)
  for a drift f … but not all solutions will work!
• Validity relies on the following result:

  Theorem (Ambrosio et al., 2005). Under the assumptions:
  A1 f is locally Lipschitz;
  A2 ∫_0^1 ∫_{R^d} |f(t, x)| πt(x) dx dt < ∞;
  the Eulerian Liouville PDE ⟺ the Lagrangian ODE
• Define the flow transport problem as solving Liouville (L) for a drift f satisfying [A1] & [A2]
Ill-posedness and regularization
• Under-determined: consider πt = N((0, 0)ᵀ, I2) for t ∈ [0, 1]; then
      f(x1, x2) = (0, 0)   and   f(x1, x2) = (−x2, x1)
  are both solutions
• Regularization: seek the minimal kinetic energy solution
      argmin_f { ∫_0^1 ∫_{R^d} |f(t, x)|² πt(x) dx dt : f solves Liouville },
  whose Euler–Lagrange conditions give f* = ∇ϕ, where −∇ · (πt ∇ϕ) = ∂t πt
• Analytical solutions are available when the distributions are (mixtures of) Gaussians (Reich, 2012)
Flow transport problem on R
• Minimal kinetic energy solution
      f(t, x) = − ∫_{−∞}^{x} ∂t πt(u) du / πt(x)
• Checking Liouville:
      −∇ · (πt f) = ∂x ∫_{−∞}^{x} ∂t πt(u) du = ∂t πt(x)
A1 For f to be locally Lipschitz, assume
      π0, L ∈ C^1(R, R_+)   ⟹   f ∈ C^1([0, 1] × R, R)
A2 For integrability ∫_0^1 ∫_R |f(t, x)| πt(x) dx dt < ∞, necessarily
      |πt f|(t, x) = |∫_{−∞}^{x} ∂t πt(u) du| → 0 as |x| → ∞,
  since ∫_{−∞}^{∞} ∂t πt(u) du = 0
• Optimality: f = ∇ϕ holds trivially
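The one-dimensional formula can be evaluated directly by quadrature. For a centred Gaussian curve πt = N(0, v(t)) the minimal kinetic energy drift has the closed form f(t, x) = v'(t) x / (2 v(t)), which gives a check; the curve v(t) = 1/(1 + t) (prior N(0, 1) tempered by L(x) = exp(−x²/2), λ(t) = t) is an assumed example, not from the talk.

```python
import numpy as np

# pi_t = N(0, v(t)) with v(t) = 1/(1+t).
v = lambda t: 1.0 / (1.0 + t)
pi_t = lambda t, x: np.exp(-0.5 * x**2 / v(t)) / np.sqrt(2 * np.pi * v(t))

def drift(t, x, h=1e-5):
    """f(t,x) = -int_{-inf}^x dt pi_t(u) du / pi_t(x), by quadrature + finite differences."""
    u = np.linspace(-10.0, x, 4000)                        # truncate the far lower tail
    dt_pi = (pi_t(t + h, u) - pi_t(t - h, u)) / (2 * h)    # time derivative of the density
    integral = np.sum((dt_pi[1:] + dt_pi[:-1]) / 2 * np.diff(u))  # trapezoidal rule
    return -integral / pi_t(t, x)

# Exact minimal-energy flow for a centred Gaussian curve: f(t,x) = v'(t) x / (2 v(t)).
t0, x0 = 0.3, 1.2
exact = -x0 / (2 * (1 + t0))                               # since v'/(2v) = -1/(2(1+t))
print(drift(t0, x0), exact)
```

Integrating dXt = f(t, Xt) dt then rescales samples by √(v(1)/v(0)), i.e. it transports N(0, 1) exactly onto N(0, 1/2).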
Flow transport problem on R
• Re-write the solution as
      f(t, x) = λ'(t) It {Ft(x) − I^x_t / It} / πt(x),
  where I^x_t = E_{πt}[1_{(−∞,x]}(Xt) log L(Xt)] and Ft is the CDF of πt
• Speed is controlled by λ'(t) and πt(x)
• Sign is given by the difference between Ft(x) and I^x_t / It ∈ [0, 1]
Flow transport problem on R^d, d ≥ 1
• Multivariate solution for d = 3:
      (πt f1)(t, x1:3) = − ∫_{−∞}^{x1} ∂t πt(u1, x2, x3) du1
                         + g1(t, x1) ∫_{−∞}^{∞} ∂t πt(u1, x2, x3) du1
      (πt f2)(t, x1:3) = − ∂x1 g1(t, x1) ∫_{−∞}^{∞} ∫_{−∞}^{x2} ∂t πt(u1, u2, x3) du1:2
                         + ∂x1 g1(t, x1) g2(t, x2) ∫_{−∞}^{∞} ∫_{−∞}^{∞} ∂t πt(u1, u2, x3) du1:2
      (πt f3)(t, x1:3) = − ∂x1 g1(t, x1) ∂x2 g2(t, x2) ∫_{−∞}^{∞} ∫_{−∞}^{∞} ∫_{−∞}^{x3} ∂t πt(u1, u2, u3) du1:3
  where g1, g2 ∈ C^2([0, 1] × R, [0, 1])
Flow transport problem on R^d, d ≥ 1
• Checking Liouville:
      ∂x1 (πt f1)(t, x1:3) = − ∂t πt(x1, x2, x3)
                             + ∂x1 g1(t, x1) ∫_{−∞}^{∞} ∂t πt(u1, x2, x3) du1
      ∂x2 (πt f2)(t, x1:3) = − ∂x1 g1(t, x1) ∫_{−∞}^{∞} ∂t πt(u1, x2, x3) du1
                             + ∂x1 g1(t, x1) ∂x2 g2(t, x2) ∫_{−∞}^{∞} ∫_{−∞}^{∞} ∂t πt(u1, u2, x3) du1:2
      ∂x3 (πt f3)(t, x1:3) = − ∂x1 g1(t, x1) ∂x2 g2(t, x2) ∫_{−∞}^{∞} ∫_{−∞}^{∞} ∂t πt(u1, u2, x3) du1:2
• Taking the divergence gives a telescoping sum
      −∇ · (πt f)(t, x1:3) = − Σ_{i=1}^{3} ∂xi (πt fi)(t, x1:3) = ∂t πt(x1:3)
Flow transport problem on R^d, d ≥ 1
A1 For f to be locally Lipschitz, assume
      π0, L ∈ C^1(R^d, R_+)   ⟹   f ∈ C^1([0, 1] × R^d, R^d)
A2 For integrability ∫_0^1 ∫_{R^d} |f(t, x)| πt(x) dx dt < ∞, necessarily
      |πt f|(t, x) → 0 as |x| → ∞,
  which holds if the {gi} are non-decreasing functions with tail behaviour
      gi(t, xi) → 0 as xi → −∞,   gi(t, xi) → 1 as xi → ∞
• Choosing gi(t, xi) = Ft(xi), the marginal CDF of πt, allows f to decouple if the distributions are independent
Approximate Gibbs flow transport
• The exact solution involves integrals of increasing dimension, as it tracks the conditional distributions
      πt(x1|x2:d), πt(x2|x3:d), …, πt(xd),   xi ∈ R
• Trade off accuracy for computational tractability: track the full conditional distributions
      πt(xi|x−i),   xi ∈ R^{di}
• System of Liouville equations
      −∇_{xi} · (πt(xi|x−i) f̃i(t, x)) = ∂t πt(xi|x−i),
  each defined on (0, 1) × R^{di}
Approximate Gibbs flow transport
• For di = 1, the solution is
      f̃i(t, x) = − ∫_{−∞}^{xi} ∂t πt(ui|x−i) dui / πt(xi|x−i)
• If π0, L ∈ C^1(R^d, R_+) and lim_{|x|→∞} L(x) = 0, the ODE
      dXt = f̃(t, Xt) dt,   X0 ∼ π0
  admits a unique solution on [0, 1], referred to as the Gibbs flow
• For di > 1, one can often exploit the analytical tractability of πt(xi|x−i) to solve for f̃i(t, x), or apply the multivariate extension
• Otherwise, analogous to Metropolis-within-Gibbs, split into one-dimensional components and apply the above
Error control
• Define the local error
      εt(x) = ∂t πt(x) + ∇ · (πt(x) f̃(t, x))
            = ∂t πt(x) − Σ_{i=1}^{p} ∂t πt(xi|x−i) πt(x−i)
• The Gibbs flow exploits local independence:
      πt(x) = Π_{i=1}^{p} πt(xi)   ⟹   εt(x) = 0
• If the Gibbs flow induces {π̃t}t∈[0,1] with π̃0 = π0, then
      ‖π̃t − πt‖²_{L²} ≤ t ∫_0^t ‖εu‖²_{L²} du · exp( 1 + ∫_0^t ‖∇ · f̃(u, ·)‖_∞ du )
Numerical integration of Gibbs flow
• Previously, we considered the forward Euler scheme
      Ym = Ym−1 + Δt f̃(tm−1, Ym−1) = Φm(Ym−1)
• To get Law(Ym), we need the Jacobian determinant of Φm, which typically costs O(d³) for di = 1
• In contrast, the scheme mimicking a systematic Gibbs scan,
      Ym[i] = Ym−1[i] + Δt f̃i(tm−1, Ym[1 : i − 1], Ym−1[i : d]),
      Ym = Φm,d ∘ ⋯ ∘ Φm,1(Ym−1),
  is also of order one, and costs O(d)
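The systematic-scan update above can be sketched as follows. For checkability the target is an assumed product Gaussian curve πt = N(0, v(t) I_d) with v(t) = 1/(1 + t), where the Gibbs flow is exact and each coordinate drift is −x_i/(2(1 + t)); in general `drift_i` would depend on the already-updated coordinates Y[:, :i], which is what makes each Φm,i triangular.

```python
import numpy as np

rng = np.random.default_rng(5)
d, M = 10, 200
dt = 1.0 / M

# Product target pi_t = N(0, v(t) I_d), v(t) = 1/(1+t): the Gibbs flow is exact and
# the per-coordinate drift is f_i(t, x) = v'(t)/(2 v(t)) * x_i = -x_i / (2 (1 + t)).
def drift_i(t, x, i):
    return -x[:, i] / (2.0 * (1.0 + t))

Y = rng.normal(size=(5_000, d))            # Y_0 ~ pi_0 = N(0, I_d), one row per particle
for m in range(M):
    t = m * dt
    for i in range(d):                     # systematic Gibbs scan: one coordinate at a
        Y[:, i] += dt * drift_i(t, Y, i)   # time, reading the already-updated Y[:, :i]
print(Y.var())                             # ≈ 1/2, the coordinate variance under pi_1
```

Each scan costs O(d) per particle, and the composition Φm,d ∘ ⋯ ∘ Φm,1 has a triangular Jacobian, so its determinant is a product of d scalars.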
Mixture modelling example
• Lack of identifiability induces π on R^4 with 4! = 24 well-separated and identical modes
• Gibbs flow approximation
  [Figure: scatter plots of the particles in the (x1, x2)-plane at times t = 0.0006, 0.0058, 0.0542 and 1.0000, each over [−10, 10] × [−10, 10].]
Mixture modelling example
• Proportion of particles in each of the 24 modes
  [Figure: bar chart of the proportion of particles (between 0.00 and 0.04) in each of modes 1–24.]
• Pearson's Chi-squared test for uniformity gives a p-value of 0.85
Cox point process model
• Effective sample size % in dimension d
• AIS: AIS with HMC moves
• GF-SIS: Gibbs flow
• GF-AIS: Gibbs flow with HMC moves
  [Figure: ESS% against time t ∈ [0, 1] for AIS, GF-SIS and GF-AIS, in dimensions d = 100, 225 and 400.]
End
• Heng, J., Doucet, A., & Pokern, Y. (2015). Gibbs flow for approximate transport with applications to Bayesian computation. arXiv preprint arXiv:1509.08787.
• Updated article and R package coming soon!
Stochastic Control and Information Theoretic Dualities (Complete Version)Haruki Nishimura
 
Proximal Splitting and Optimal Transport
Proximal Splitting and Optimal TransportProximal Splitting and Optimal Transport
Proximal Splitting and Optimal TransportGabriel Peyré
 
Bachelor_Defense
Bachelor_DefenseBachelor_Defense
Bachelor_DefenseTeja Turk
 
Talk in BayesComp 2018
Talk in BayesComp 2018Talk in BayesComp 2018
Talk in BayesComp 2018JeremyHeng10
 
Recent developments on unbiased MCMC
Recent developments on unbiased MCMCRecent developments on unbiased MCMC
Recent developments on unbiased MCMCPierre Jacob
 
On estimating the integrated co volatility using
On estimating the integrated co volatility usingOn estimating the integrated co volatility using
On estimating the integrated co volatility usingkkislas
 

Similar to Gibbs flow transport for Bayesian inference (20)

Can we estimate a constant?
Can we estimate a constant?Can we estimate a constant?
Can we estimate a constant?
 
Unbiased Hamiltonian Monte Carlo
Unbiased Hamiltonian Monte Carlo Unbiased Hamiltonian Monte Carlo
Unbiased Hamiltonian Monte Carlo
 
Unbiased Hamiltonian Monte Carlo
Unbiased Hamiltonian Monte CarloUnbiased Hamiltonian Monte Carlo
Unbiased Hamiltonian Monte Carlo
 
Unbiased Markov chain Monte Carlo
Unbiased Markov chain Monte CarloUnbiased Markov chain Monte Carlo
Unbiased Markov chain Monte Carlo
 
A Note on “   Geraghty contraction type mappings”
A Note on “   Geraghty contraction type mappings”A Note on “   Geraghty contraction type mappings”
A Note on “   Geraghty contraction type mappings”
 
Unbiased Markov chain Monte Carlo
Unbiased Markov chain Monte CarloUnbiased Markov chain Monte Carlo
Unbiased Markov chain Monte Carlo
 
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
 
IVR - Chapter 1 - Introduction
IVR - Chapter 1 - IntroductionIVR - Chapter 1 - Introduction
IVR - Chapter 1 - Introduction
 
Some Thoughts on Sampling
Some Thoughts on SamplingSome Thoughts on Sampling
Some Thoughts on Sampling
 
Anomalous Transport
Anomalous TransportAnomalous Transport
Anomalous Transport
 
Maximum likelihood estimation of regularisation parameters in inverse problem...
Maximum likelihood estimation of regularisation parameters in inverse problem...Maximum likelihood estimation of regularisation parameters in inverse problem...
Maximum likelihood estimation of regularisation parameters in inverse problem...
 
CLIM Fall 2017 Course: Statistics for Climate Research, Statistics of Climate...
CLIM Fall 2017 Course: Statistics for Climate Research, Statistics of Climate...CLIM Fall 2017 Course: Statistics for Climate Research, Statistics of Climate...
CLIM Fall 2017 Course: Statistics for Climate Research, Statistics of Climate...
 
Andreas Eberle
Andreas EberleAndreas Eberle
Andreas Eberle
 
Stochastic Control and Information Theoretic Dualities (Complete Version)
Stochastic Control and Information Theoretic Dualities (Complete Version)Stochastic Control and Information Theoretic Dualities (Complete Version)
Stochastic Control and Information Theoretic Dualities (Complete Version)
 
Proximal Splitting and Optimal Transport
Proximal Splitting and Optimal TransportProximal Splitting and Optimal Transport
Proximal Splitting and Optimal Transport
 
Bachelor_Defense
Bachelor_DefenseBachelor_Defense
Bachelor_Defense
 
Talk in BayesComp 2018
Talk in BayesComp 2018Talk in BayesComp 2018
Talk in BayesComp 2018
 
Recent developments on unbiased MCMC
Recent developments on unbiased MCMCRecent developments on unbiased MCMC
Recent developments on unbiased MCMC
 
On estimating the integrated co volatility using
On estimating the integrated co volatility usingOn estimating the integrated co volatility using
On estimating the integrated co volatility using
 
Bird’s-eye view of Gaussian harmonic analysis
Bird’s-eye view of Gaussian harmonic analysisBird’s-eye view of Gaussian harmonic analysis
Bird’s-eye view of Gaussian harmonic analysis
 

More from JeremyHeng10

Sequential Monte Carlo Algorithms for Agent-based Models of Disease Transmission
Sequential Monte Carlo Algorithms for Agent-based Models of Disease TransmissionSequential Monte Carlo Algorithms for Agent-based Models of Disease Transmission
Sequential Monte Carlo Algorithms for Agent-based Models of Disease TransmissionJeremyHeng10
 
Diffusion Schrödinger bridges for score-based generative modeling
Diffusion Schrödinger bridges for score-based generative modelingDiffusion Schrödinger bridges for score-based generative modeling
Diffusion Schrödinger bridges for score-based generative modelingJeremyHeng10
 
Diffusion Schrödinger bridges for score-based generative modeling
Diffusion Schrödinger bridges for score-based generative modelingDiffusion Schrödinger bridges for score-based generative modeling
Diffusion Schrödinger bridges for score-based generative modelingJeremyHeng10
 
Sequential Monte Carlo algorithms for agent-based models of disease transmission
Sequential Monte Carlo algorithms for agent-based models of disease transmissionSequential Monte Carlo algorithms for agent-based models of disease transmission
Sequential Monte Carlo algorithms for agent-based models of disease transmissionJeremyHeng10
 
Sequential Monte Carlo algorithms for agent-based models of disease transmission
Sequential Monte Carlo algorithms for agent-based models of disease transmissionSequential Monte Carlo algorithms for agent-based models of disease transmission
Sequential Monte Carlo algorithms for agent-based models of disease transmissionJeremyHeng10
 
Statistical inference for agent-based SIS and SIR models
Statistical inference for agent-based SIS and SIR modelsStatistical inference for agent-based SIS and SIR models
Statistical inference for agent-based SIS and SIR modelsJeremyHeng10
 
Controlled sequential Monte Carlo
Controlled sequential Monte Carlo Controlled sequential Monte Carlo
Controlled sequential Monte Carlo JeremyHeng10
 

More from JeremyHeng10 (7)

Sequential Monte Carlo Algorithms for Agent-based Models of Disease Transmission
Sequential Monte Carlo Algorithms for Agent-based Models of Disease TransmissionSequential Monte Carlo Algorithms for Agent-based Models of Disease Transmission
Sequential Monte Carlo Algorithms for Agent-based Models of Disease Transmission
 
Diffusion Schrödinger bridges for score-based generative modeling
Diffusion Schrödinger bridges for score-based generative modelingDiffusion Schrödinger bridges for score-based generative modeling
Diffusion Schrödinger bridges for score-based generative modeling
 
Diffusion Schrödinger bridges for score-based generative modeling
Diffusion Schrödinger bridges for score-based generative modelingDiffusion Schrödinger bridges for score-based generative modeling
Diffusion Schrödinger bridges for score-based generative modeling
 
Sequential Monte Carlo algorithms for agent-based models of disease transmission
Sequential Monte Carlo algorithms for agent-based models of disease transmissionSequential Monte Carlo algorithms for agent-based models of disease transmission
Sequential Monte Carlo algorithms for agent-based models of disease transmission
 
Sequential Monte Carlo algorithms for agent-based models of disease transmission
Sequential Monte Carlo algorithms for agent-based models of disease transmissionSequential Monte Carlo algorithms for agent-based models of disease transmission
Sequential Monte Carlo algorithms for agent-based models of disease transmission
 
Statistical inference for agent-based SIS and SIR models
Statistical inference for agent-based SIS and SIR modelsStatistical inference for agent-based SIS and SIR models
Statistical inference for agent-based SIS and SIR models
 
Controlled sequential Monte Carlo
Controlled sequential Monte Carlo Controlled sequential Monte Carlo
Controlled sequential Monte Carlo
 

Recently uploaded

Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxfirstjob4
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998YohFuh
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAroojKhan71
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxolyaivanovalion
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptxAnupama Kate
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxStephen266013
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxolyaivanovalion
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxolyaivanovalion
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxolyaivanovalion
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionfulawalesam
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationshipsccctableauusergroup
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfSocial Samosa
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxEmmanuel Dauda
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxolyaivanovalion
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Serviceranjana rawat
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfMarinCaroMartnezBerg
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingNeil Barnes
 

Recently uploaded (20)

Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptx
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptx
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docx
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptx
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptx
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptx
 
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in  KishangarhDelhi 99530 vip 56974 Genuine Escort Service Call Girls in  Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interaction
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptx
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptx
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data Storytelling
 

Gibbs flow transport for Bayesian inference

  • 12. Monte Carlo methods
    • Typically, sampling from π is intractable, so we rely on Markov chain Monte Carlo (MCMC) methods
    • MCMC constructs a π-invariant Markov kernel K : R^d × B(R^d) → [0, 1]
    • Sample X_0 ∼ π_0 and iterate X_n ∼ K(X_{n−1}, ·) until convergence
    • MCMC can fail in practice, e.g. when π is highly multimodal
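The MCMC recipe above — build a π-invariant kernel K and iterate X_n ∼ K(X_{n−1}, ·) — can be sketched with a random-walk Metropolis kernel. This is a minimal illustration, not the slides' method: the Gaussian target, starting point, step size, and seed are all arbitrary choices for the demo, and only log γ is evaluated, as in the problem specification.

```python
import numpy as np

def rw_metropolis(log_gamma, x0, n_iters, step=0.5, rng=None):
    """Random-walk Metropolis targeting the unnormalized density gamma."""
    rng = np.random.default_rng(rng)
    x = np.asarray(x0, dtype=float)
    chain = np.empty((n_iters, x.size))
    lg = log_gamma(x)
    for n in range(n_iters):
        prop = x + step * rng.standard_normal(x.size)
        lg_prop = log_gamma(prop)
        # Accept with probability min(1, gamma(prop) / gamma(x))
        if np.log(rng.uniform()) < lg_prop - lg:
            x, lg = prop, lg_prop
        chain[n] = x
    return chain

# Illustrative target: unnormalized N(3, 1); Z is unknown to the sampler
chain = rw_metropolis(lambda x: -0.5 * np.sum((x - 3.0) ** 2),
                      x0=[0.0], n_iters=20000, step=1.0, rng=1)
post_mean = chain[5000:].mean()   # discard burn-in, estimate pi(x)
```

The Markov kernel here is π-invariant by detailed balance, so the ergodic average `post_mean` is a consistent estimator of π(ϕ) for ϕ(x) = x.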
  • 17. Annealed importance sampling
    • If π_0 and π are distant, define bridges
      $$\pi_{\lambda_m}(dx) = \frac{\pi_0(x) L(x)^{\lambda_m}\, dx}{Z(\lambda_m)},$$
      with 0 = λ_0 < λ_1 < … < λ_M = 1 so that π_1 = π
    • Initialize X_0 ∼ π_0 and move X_m ∼ K_m(X_{m−1}, ·) for m = 1, …, M, where K_m is π_{λ_m}-invariant
    • Annealed importance sampling constructs w : (R^d)^{M+1} → R_+ so that
      $$\pi(\varphi) = \frac{\mathbb{E}\left[\varphi(X_M)\, w(X_{0:M})\right]}{\mathbb{E}\left[w(X_{0:M})\right]}, \qquad Z = \mathbb{E}\left[w(X_{0:M})\right]$$
    • AIS (Neal, 2001) and SMC samplers (Del Moral et al., 2006) are considered state-of-the-art in statistics and machine learning
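The AIS construction above can be sketched on a conjugate Gaussian example where Z is available in closed form for checking. All concrete choices below (π_0 = N(0, 1), the unnormalized likelihood L(x) ∝ N(x; 2, 1), M = 50 linearly spaced temperatures, one Metropolis step per bridge, the seed) are illustrative assumptions, not from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

def log_L(x, mu=2.0):
    # Unnormalized Gaussian "likelihood"; Z has a closed form for this pair
    return -0.5 * (x - mu) ** 2

def ais(n_particles=2000, M=50, mh_step=1.0):
    lam = np.linspace(0.0, 1.0, M + 1)
    x = rng.standard_normal(n_particles)            # X_0 ~ pi_0 = N(0, 1)
    log_w = np.zeros(n_particles)
    for m in range(1, M + 1):
        # Incremental weight: gamma_{lam_m}(x) / gamma_{lam_{m-1}}(x)
        log_w += (lam[m] - lam[m - 1]) * log_L(x)
        # One random-walk MH step with kernel invariant for pi_{lam_m}
        prop = x + mh_step * rng.standard_normal(n_particles)
        log_ratio = (-0.5 * prop**2 + lam[m] * log_L(prop)) \
                  - (-0.5 * x**2 + lam[m] * log_L(x))
        accept = np.log(rng.uniform(size=n_particles)) < log_ratio
        x = np.where(accept, prop, x)
    return x, log_w

x, log_w = ais()
Z_hat = np.exp(log_w).mean()                        # unbiased estimator of Z
w = np.exp(log_w - log_w.max())
post_mean = np.sum(w * x) / np.sum(w)               # self-normalized pi(x) estimate
Z_true = np.exp(-1.0) / np.sqrt(2.0)                # closed form for this example
```

For this Gaussian pair the posterior is N(1, 1/2) and Z = e^{−1}/√2, so both estimators can be validated directly.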
  • 23. Jarzynski nonequilibrium equality
    • Consider M → ∞, i.e. define the curve of distributions {π_t}_{t∈[0,1]}
      $$\pi_t(dx) = \frac{\pi_0(x) L(x)^{\lambda(t)}\, dx}{Z(t)},$$
      where λ : [0, 1] → [0, 1] is a strictly increasing C^1 function
    • Initialize X_0 ∼ π_0 and run the time-inhomogeneous Langevin dynamics
      $$dX_t = \tfrac{1}{2} \nabla \log \pi_t(X_t)\, dt + dW_t, \qquad t \in [0, 1]$$
    • The Jarzynski equality (Jarzynski, 1997; Crooks, 1998) constructs w : C([0, 1], R^d) → R_+ so that
      $$\pi(\varphi) = \frac{\mathbb{E}\left[\varphi(X_1)\, w(X_{[0,1]})\right]}{\mathbb{E}\left[w(X_{[0,1]})\right]}, \qquad Z = \mathbb{E}\left[w(X_{[0,1]})\right]$$
    • To what extent is this state-of-the-art in molecular dynamics?
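A discretized sketch of this construction: Euler–Maruyama steps of the annealed Langevin dynamics, accumulating the Jarzynski work ∫ λ'(t) log L(X_t) dt along each path. The continuous-time identity is exact; the Euler discretization below introduces a small O(dt) bias. The Gaussian pair, λ(t) = t, step count, and seed are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
mu = 2.0                                   # L(x) prop. to exp(-(x - mu)^2 / 2)

def grad_log_pi(x, lam):
    # grad log [pi_0(x) L(x)^lam] for pi_0 = N(0, 1)
    return -x + lam * (mu - x)

n, K = 4000, 200
dt = 1.0 / K                               # lambda(t) = t, so lambda'(t) = 1
x = rng.standard_normal(n)                 # X_0 ~ pi_0
log_w = np.zeros(n)
for k in range(K):
    lam = k * dt
    # Work increment: lambda'(t) log L(X_t) dt
    log_w += dt * (-0.5 * (x - mu) ** 2)
    # Euler-Maruyama step of the time-inhomogeneous Langevin SDE
    x += 0.5 * grad_log_pi(x, lam) * dt + np.sqrt(dt) * rng.standard_normal(n)

Z_hat = np.exp(log_w).mean()
Z_true = np.exp(-1.0) / np.sqrt(2.0)       # closed form for this Gaussian pair
```

In the M → ∞ limit this recovers the AIS estimator; with finite dt the estimator trades the unbiasedness of the continuous-time identity for tractability.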
  • 28. Optimal dynamics
    • The dynamical lag Law(X_t) − π_t impacts the variance of estimators
    • Vaikuntanathan & Jarzynski (2011) considered adding a drift f : [0, 1] × R^d → R^d to reduce the lag
      $$dX_t = f(t, X_t)\, dt + \tfrac{1}{2} \nabla \log \pi_t(X_t)\, dt + dW_t, \qquad t \in [0, 1], \quad X_0 \sim \pi_0$$
    • An optimal choice of f results in zero lag, i.e. X_t ∼ π_t for t ∈ [0, 1], and a zero-variance estimator of Z
    • Any optimal choice of f satisfies the Liouville PDE
      $$-\nabla \cdot (\pi_t(x) f(t, x)) = \partial_t \pi_t(x)$$
    • Zero lag is also achieved by running the deterministic dynamics
      $$dX_t = f(t, X_t)\, dt, \qquad X_0 \sim \pi_0$$
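The zero-lag claim for the deterministic dynamics can be demonstrated numerically. For a Gaussian curve π_t = N(m_t, s_t²), the drift f(t, x) = m_t' + (s_t'/s_t)(x − m_t) solves the Liouville PDE; integrating the ODE dX_t = f(t, X_t) dt should then carry π_0 samples exactly onto π_1. The concrete path below (the tempered path for π_0 = N(0, 1) and L ∝ N(2, 1), with λ(t) = t, so π_t = N(2t/(1+t), 1/(1+t))) is an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(2)
mu = 2.0

# Gaussian tempered path: pi_t = N(m(t), s(t)^2)
m = lambda t: mu * t / (1.0 + t)
s = lambda t: (1.0 + t) ** -0.5
dm = lambda t: mu / (1.0 + t) ** 2
ds = lambda t: -0.5 * (1.0 + t) ** -1.5

def drift(t, x):
    # Zero-lag drift for a Gaussian curve: f = m' + (s'/s)(x - m)
    return dm(t) + (ds(t) / s(t)) * (x - m(t))

x = rng.standard_normal(50000)      # X_0 ~ pi_0 = N(0, 1)
K = 500
dt = 1.0 / K
for k in range(K):
    x += drift(k * dt, x) * dt      # Euler step of dX_t = f(t, X_t) dt

# After integrating to t = 1, samples should follow pi_1 = N(1, 1/2)
```

No Markov kernel and no importance weights are needed here: the deterministic flow itself transports the particles, which is the idea the Gibbs flow generalizes beyond the Gaussian case.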
  • 30. Time evolution of distributions
    • The time evolution of π_t is given by
      $$\partial_t \pi_t(x) = \lambda'(t) \left( \log L(x) - I_t \right) \pi_t(x),$$
      where
      $$I_t = \frac{1}{\lambda'(t)} \frac{d}{dt} \log Z(t) = \mathbb{E}_{\pi_t}\left[\log L(X_t)\right] < \infty$$
    • Integrating recovers the path sampling (Gelman and Meng, 1998) or thermodynamic integration (Kirkwood, 1935) identity
      $$\log \frac{Z(1)}{Z(0)} = \int_0^1 \lambda'(t)\, I_t\, dt$$
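The thermodynamic integration identity can be checked numerically: estimate I_t = E_{π_t}[log L] by Monte Carlo on a grid of temperatures and integrate with the trapezoid rule. The example uses the same illustrative Gaussian pair as before (π_0 = N(0, 1), L ∝ N(2, 1), λ(t) = t), for which π_t = N(2t/(1+t), 1/(1+t)) can be sampled exactly and log Z(1)/Z(0) = −1 − (1/2) log 2 in closed form; these choices are assumptions for the demo.

```python
import numpy as np

rng = np.random.default_rng(3)
mu = 2.0
log_Z_true = -1.0 - 0.5 * np.log(2.0)       # closed form for this Gaussian pair

ts = np.linspace(0.0, 1.0, 101)
I = np.empty_like(ts)
for i, t in enumerate(ts):
    # Exact samples from pi_t = N(mu*t/(1+t), 1/(1+t))
    xs = mu * t / (1 + t) + rng.standard_normal(50000) / np.sqrt(1 + t)
    I[i] = np.mean(-0.5 * (xs - mu) ** 2)   # Monte Carlo estimate of I_t
# Trapezoid rule for log Z(1)/Z(0) = int_0^1 lambda'(t) I_t dt (lambda' = 1)
log_Z_hat = np.sum(0.5 * (I[1:] + I[:-1]) * np.diff(ts))
```

In practice π_t cannot be sampled exactly and the inner expectation is itself estimated by MCMC; the exact Gaussian sampler here isolates the quadrature identity.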
  • 35. Liouville PDE
    • Dynamics governed by the ODE
      $$dX_t = f(t, X_t)\, dt, \qquad X_0 \sim \pi_0$$
    • For sufficiently regular f, the ODE admits a unique solution t → X_t, t ∈ [0, 1], inducing a curve of distributions {π̃_t := Law(X_t), t ∈ [0, 1]} satisfying the Liouville PDE
      $$-\nabla \cdot (\tilde{\pi}_t f) = \partial_t \tilde{\pi}_t, \qquad \tilde{\pi}_0 = \pi_0$$
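The Liouville PDE −∂_x(π_t f) = ∂_t π_t can be verified directly for a concrete drift by central finite differences. The Gaussian path and candidate drift f = m' + (s'/s)(x − m) below are illustrative assumptions (the same tempered Gaussian pair as in the earlier examples), chosen because every quantity is available in closed form.

```python
import numpy as np

mu = 2.0
m = lambda t: mu * t / (1 + t)              # mean of pi_t
s = lambda t: (1 + t) ** -0.5               # std of pi_t

def pi_t(t, x):
    # Density of pi_t = N(m(t), s(t)^2), the tempered Gaussian path
    return np.exp(-(x - m(t)) ** 2 / (2 * s(t) ** 2)) / (np.sqrt(2 * np.pi) * s(t))

def f(t, x):
    # Candidate drift f = m' + (s'/s)(x - m), derivatives in closed form
    dm = mu / (1 + t) ** 2
    ds_over_s = -0.5 / (1 + t)
    return dm + ds_over_s * (x - m(t))

# Check -d/dx (pi_t f) = d/dt pi_t by central differences on a grid
h = 1e-5
t0, xs = 0.4, np.linspace(-3.0, 5.0, 41)
lhs = -(pi_t(t0, xs + h) * f(t0, xs + h)
        - pi_t(t0, xs - h) * f(t0, xs - h)) / (2 * h)
rhs = (pi_t(t0 + h, xs) - pi_t(t0 - h, xs)) / (2 * h)
max_err = np.abs(lhs - rhs).max()
```

The residual is zero up to finite-difference truncation error, confirming that this f is one valid solution of the Liouville equation for this curve.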
  • 41. Defining the flow transport problem
    • Set π̃_t = π_t for t ∈ [0, 1] and solve the Liouville equation
      $$-\nabla \cdot (\pi_t f) = \partial_t \pi_t, \qquad \text{(L)}$$
      for a drift f ... but not all solutions will work!
    • Validity relies on the following result:
      Theorem (Ambrosio et al., 2005). Under the following assumptions:
      A1 f is locally Lipschitz;
      A2 $\int_0^1 \int_{\mathbb{R}^d} |f(t, x)|\, \pi_t(x)\, dx\, dt < \infty$;
      the Eulerian Liouville PDE ⇐⇒ the Lagrangian ODE
    • Define the flow transport problem as solving the Liouville equation (L) for f satisfying [A1] & [A2]
  • 44. Ill-posedness and regularization
    • Under-determined: consider π_t = N((0, 0)ᵀ, I_2) for t ∈ [0, 1]; both f(x_1, x_2) = (0, 0) and f(x_1, x_2) = (−x_2, x_1) are solutions
    • Regularization: seek the minimal kinetic energy solution
      $$\operatorname*{argmin}_{f} \left\{ \int_0^1 \int_{\mathbb{R}^d} |f(t, x)|^2\, \pi_t(x)\, dx\, dt \;:\; f \text{ solves Liouville} \right\}$$
      whose Euler–Lagrange equations give $f^* = \nabla \varphi$, where $-\nabla \cdot (\pi_t \nabla \varphi) = \partial_t \pi_t$
    • An analytical solution is available when the distributions are (mixtures of) Gaussians (Reich, 2012)
• Flow transport problem on R
• Minimal kinetic energy solution:
f(t, x) = −( ∫_{−∞}^{x} ∂tπt(u) du ) / πt(x)
• Checking Liouville:
−∇ · (πt f) = ∂x ∫_{−∞}^{x} ∂tπt(u) du = ∂tπt(x)
A1 For f to be locally Lipschitz, assume π0, L ∈ C¹(R, R+) =⇒ f ∈ C¹([0, 1] × R, R)
A2 For integrability, ∫₀¹ ∫_R |f(t, x)| πt(x) dx dt < ∞: necessarily |πt f|(t, x) = |∫_{−∞}^{x} ∂tπt(u) du| → 0 as |x| → ∞, since ∫_{−∞}^{∞} ∂tπt(u) du = 0
• Optimality: f = ∇ϕ holds trivially
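The one-dimensional construction is easy to check numerically. Below is a minimal sketch (not from the talk; the Gaussian test case and all names are illustrative): take the tempered path πt ∝ π0 L^{λ(t)} with λ(t) = t, π0 = N(0, 1) and L(x) = exp(−(x − 2)²/2), so that π1 = N(1, 1/2); build f(t, x) = −∫_{−∞}^{x} ∂tπt(u) du / πt(x) on a grid via finite differences and cumulative sums, then push prior samples through the ODE with forward Euler.

```python
import numpy as np

# Grid on which the evolving density pi_t is tabulated.
xs = np.linspace(-10.0, 10.0, 4001)
dx = xs[1] - xs[0]

def pi_t(t):
    # Unnormalised gamma_t(x) = pi_0(x) L(x)^t, normalised on the grid.
    g = np.exp(-0.5 * xs**2 - 0.5 * t * (xs - 2.0)**2)
    return g / (g.sum() * dx)

def drift(t, x, eps=1e-4):
    # f(t,x) = -(int_{-inf}^x d/dt pi_t(u) du) / pi_t(x),
    # with d/dt by central finite differences and the integral by cumsum.
    dpi = (pi_t(t + eps) - pi_t(t - eps)) / (2.0 * eps)
    cum = np.cumsum(dpi) * dx
    f_grid = -cum / np.maximum(pi_t(t), 1e-12)  # floor avoids 0/0 far in the tails
    return np.interp(x, xs, f_grid)

rng = np.random.default_rng(0)
X = rng.normal(size=5000)        # particles from pi_0 = N(0,1)
steps = 400
dt = 1.0 / steps
for m in range(steps):           # forward Euler along the flow
    X = X + dt * drift((m + 0.5) * dt, X)

print(X.mean(), X.var())         # should be close to the moments 1 and 0.5 of pi_1
```

Since π1 = N(1, 1/2) here, the sample mean and variance of the transported particles give a direct sanity check of the construction.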
• Flow transport problem on R
• Re-write the solution as
f(t, x) = λ′(t) It { Ft(x) − I^x_t / It } / πt(x),
where I^x_t = Eπt[1(−∞,x](Xt) log L(Xt)], It = I^∞_t, and Ft is the CDF of πt
• Speed is controlled by λ′(t) and πt(x)
• Sign is given by the difference between Ft(x) and I^x_t / It ∈ [0, 1]
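The rewritten form follows in two lines, assuming (as in the tempered setup above) the geometric path πt ∝ π0 L^{λ(t)}; a sketch:

```latex
% Differentiating the normalised tempered density in t gives
\partial_t \pi_t(u) = \lambda'(t)\,\pi_t(u)\bigl(\log L(u) - I_t\bigr),
\qquad I_t := \mathbb{E}_{\pi_t}[\log L(X_t)],
% so the numerator of the minimal kinetic energy solution becomes
\int_{-\infty}^{x} \partial_t \pi_t(u)\,\mathrm{d}u
  = \lambda'(t)\bigl(I_t^x - I_t\,F_t(x)\bigr),
\qquad I_t^x := \mathbb{E}_{\pi_t}\!\bigl[\mathbf{1}_{(-\infty,x]}(X_t)\log L(X_t)\bigr],
% and hence
f(t,x) = -\frac{\int_{-\infty}^{x} \partial_t \pi_t(u)\,\mathrm{d}u}{\pi_t(x)}
       = \frac{\lambda'(t)\,I_t\bigl(F_t(x) - I_t^x/I_t\bigr)}{\pi_t(x)}.
```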
• Flow transport problem on R^d, d ≥ 1
• Multivariate solution for d = 3:
(πt f1)(t, x1:3) = −∫_{−∞}^{x1} ∂tπt(u1, x2, x3) du1 + g1(t, x1) ∫_{−∞}^{∞} ∂tπt(u1, x2, x3) du1
(πt f2)(t, x1:3) = −∂x1g1(t, x1) ∫_{−∞}^{∞} ∫_{−∞}^{x2} ∂tπt(u1, u2, x3) du1:2 + ∂x1g1(t, x1) g2(t, x2) ∫_{−∞}^{∞} ∫_{−∞}^{∞} ∂tπt(u1, u2, x3) du1:2
(πt f3)(t, x1:3) = −∂x1g1(t, x1) ∂x2g2(t, x2) ∫_{−∞}^{∞} ∫_{−∞}^{∞} ∫_{−∞}^{x3} ∂tπt(u1, u2, u3) du1:3
where g1, g2 ∈ C²([0, 1] × R, [0, 1])
• Flow transport problem on R^d, d ≥ 1
• Checking Liouville:
∂x1(πt f1)(t, x1:3) = −∂tπt(x1, x2, x3) + ∂x1g1(t, x1) ∫_{−∞}^{∞} ∂tπt(u1, x2, x3) du1
∂x2(πt f2)(t, x1:3) = −∂x1g1(t, x1) ∫_{−∞}^{∞} ∂tπt(u1, x2, x3) du1 + ∂x1g1(t, x1) ∂x2g2(t, x2) ∫_{−∞}^{∞} ∫_{−∞}^{∞} ∂tπt(u1, u2, x3) du1:2
∂x3(πt f3)(t, x1:3) = −∂x1g1(t, x1) ∂x2g2(t, x2) ∫_{−∞}^{∞} ∫_{−∞}^{∞} ∂tπt(u1, u2, x3) du1:2
• Taking the divergence gives a telescoping sum:
−∇ · (πt f)(t, x1:3) = −Σ_{i=1}^{3} ∂xi(πt fi)(t, x1:3) = ∂tπt(x1:3)
• Flow transport problem on R^d, d ≥ 1
A1 For f to be locally Lipschitz, assume π0, L ∈ C¹(R^d, R+) =⇒ f ∈ C¹([0, 1] × R^d, R^d)
A2 For integrability, ∫₀¹ ∫_{R^d} |f(t, x)| πt(x) dx dt < ∞: necessarily |πt f|(t, x) → 0 as |x| → ∞, which holds if the {gi} are non-decreasing with tail behaviour gi(t, xi) → 0 as xi → −∞ and gi(t, xi) → 1 as xi → ∞
• Choosing gi(t, xi) = Ft(xi), the marginal CDF of πt in coordinate i, allows f to decouple when the distributions are independent
• Approximate Gibbs flow transport
• The exact solution involves integrals of increasing dimension, as it tracks the conditional distributions
πt(x1 | x2:d), πt(x2 | x3:d), . . . , πt(xd), xi ∈ R
• Trade off accuracy for computational tractability: track the full conditional distributions πt(xi | x−i), xi ∈ R^{di}
• This yields a system of Liouville equations
−∇xi · (πt(xi | x−i) ˜fi(t, x)) = ∂tπt(xi | x−i),
each defined on (0, 1) × R^{di}
• Approximate Gibbs flow transport
• For di = 1, the solution is
˜fi(t, x) = −( ∫_{−∞}^{xi} ∂tπt(ui | x−i) dui ) / πt(xi | x−i)
• If π0, L ∈ C¹(R^d, R+) and lim_{|x|→∞} L(x) = 0, the ODE
dXt = ˜f(t, Xt) dt, X0 ∼ π0,
admits a unique solution on [0, 1], referred to as the Gibbs flow
• For di > 1, one can often exploit analytical tractability of πt(xi | x−i) to solve for ˜fi(t, x), or apply the multivariate extension
• Otherwise, analogous to Metropolis-within-Gibbs, split into one-dimensional components and apply the above
• Error control
• Define the local error
εt(x) = ∂tπt(x) + ∇ · (πt(x) ˜f(t, x)) = ∂tπt(x) − Σ_{i=1}^{p} ∂tπt(xi | x−i) πt(x−i)
• The Gibbs flow exploits local independence:
πt(x) = Π_{i=1}^{p} πt(xi) =⇒ εt(x) = 0
• If the Gibbs flow induces {˜πt}t∈[0,1] with ˜π0 = π0, then
‖˜πt − πt‖²_{L²} ≤ t ∫₀ᵗ ‖εu‖²_{L²} du · exp( 1 + ∫₀ᵗ ‖∇ · ˜f(u, ·)‖∞ du )
• Numerical integration of the Gibbs flow
• Previously, we considered the forward Euler scheme
Ym = Ym−1 + ∆t ˜f(tm−1, Ym−1) = Φm(Ym−1)
• To get Law(Ym), we need the Jacobian determinant of Φm, which typically costs O(d³) when di = 1
• In contrast, the scheme mimicking a systematic Gibbs scan,
Ym[i] = Ym−1[i] + ∆t ˜fi(tm−1, Ym[1 : i − 1], Ym−1[i : p]),
Ym = Φm,d ◦ · · · ◦ Φm,1(Ym−1),
is also of order one and costs only O(d)
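The O(d) cost comes from the fact that each Φm,i changes only coordinate i, so its Jacobian determinant is the scalar 1 + ∆t ∂xi ˜fi evaluated at the partially updated point, and the determinant of the composition is the product of these d scalars. A minimal numerical sketch of this bookkeeping, with a toy drift standing in for the Gibbs flow drift (all names are illustrative, not from the talk):

```python
import numpy as np

def f(t, x):
    # Toy drift in place of the Gibbs flow drift; each component
    # depends on all coordinates, as in the general case.
    return np.sin(x) + 0.1 * x.sum()

def gibbs_scan_step(t, y, dt, h=1e-6):
    """One sweep of the Gibbs-scan Euler scheme: coordinate i is updated
    using the already-updated coordinates 1..i-1, and the log-Jacobian
    accumulates the d scalar factors log|1 + dt * df_i/dy_i|."""
    y = y.copy()
    logdet = 0.0
    for i in range(len(y)):
        e = np.zeros_like(y)
        e[i] = h
        # df_i/dy_i at the current (partially updated) point, by central differences
        dfi = (f(t, y + e)[i] - f(t, y - e)[i]) / (2.0 * h)
        y[i] = y[i] + dt * f(t, y)[i]
        logdet += np.log(abs(1.0 + dt * dfi))
    return y, logdet

y, logdet = gibbs_scan_step(0.4, np.array([0.3, -1.2, 0.7]), 0.05)
```

For small d one can check exactness: exp(logdet) agrees (up to finite-difference error) with the determinant of a numerical Jacobian of the full sweep, since the sweep is exactly the composition Φm,d ◦ · · · ◦ Φm,1.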
• Mixture modelling example
• Lack of identifiability induces π on R⁴ with 4! = 24 well-separated and identical modes
• Gibbs flow approximation:
[Figure: particle scatter plots in the (x1, x2)-plane at t = 0.0006, 0.0058, 0.0542 and 1.0000, with x1, x2 ∈ [−10, 10]]
• Mixture modelling example
• Proportion of particles in each of the 24 modes:
[Figure: bar chart of the proportion of particles in each of the 24 modes]
• Pearson's chi-squared test for uniformity gives a p-value of 0.85
• Cox point process model
• Effective sample size (ESS) % over time in dimension d
• AIS: AIS with HMC moves
• GF-SIS: Gibbs flow
• GF-AIS: Gibbs flow with HMC moves
[Figure: ESS% versus time for AIS, GF-SIS and GF-AIS at d = 100, d = 225 and d = 400]
• End
• Heng, J., Doucet, A., & Pokern, Y. (2015). Gibbs Flow for Approximate Transport with Applications to Bayesian Computation. arXiv preprint arXiv:1509.08787.
• Updated article and R package coming soon!