Inference in generative models using the
Wasserstein distance
Christian P. Robert
(Paris Dauphine PSL & Warwick U.)
joint work with E. Bernton (Harvard), P.E. Jacob
(Harvard), and M. Gerber (Bristol)
INI, July 2017
Outline
1 ABC and distance between samples
2 Wasserstein distance and computational aspects
3 Asymptotics
4 Handling time series
Generative model
Assumption of a data-generating distribution $\mu_\star^{(n)}$ for data $y_{1:n} = (y_1, \dots, y_n) \in \mathcal{Y}^n$
Parametric generative model $\mathcal{M} = \{\mu_\theta^{(n)} : \theta \in \mathcal{H}\}$ such that sampling $z_{1:n}$ from $\mu_\theta^{(n)}$ is feasible
Prior distribution $\pi(\theta)$ available as density and generative model
Goal: inference on parameters $\theta$ given observations $y_{1:n}$
Generic ABC
Basic (summary-less) ABC posterior with density
$$(\theta, z_{1:n}) \sim \pi(\theta)\, \mu_\theta^{(n)}(z_{1:n})\, \frac{\mathbb{1}\left(\|y_{1:n} - z_{1:n}\| < \varepsilon\right)}{\int_{\mathcal{Y}^n} \mathbb{1}\left(\|y_{1:n} - z_{1:n}\| < \varepsilon\right)\, \mathrm{d}z_{1:n}}$$
and ABC marginal
$$q_\varepsilon(\theta) = \frac{\int_{\mathcal{Y}^n} \prod_{i=1}^{n} \mu(\mathrm{d}z_i \mid \theta)\, \mathbb{1}\left(\|y_{1:n} - z_{1:n}\| < \varepsilon\right)}{\int_{\mathcal{Y}^n} \mathbb{1}\left(\|y_{1:n} - z_{1:n}\| < \varepsilon\right)\, \mathrm{d}z_{1:n}}$$
unbiasedly estimated by $\pi(\theta)\, \mathbb{1}\left(\|y_{1:n} - z_{1:n}\| < \varepsilon\right)$ where $z_{1:n} \sim \mu_\theta^{(n)}$
Reminder: the ABC posterior converges to the posterior as $\varepsilon \to 0$
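As a concrete illustration of this rejection step, here is a minimal Python sketch (the function name `abc_rejection` and the helper signatures are mine, not from the slides), using for concreteness the Gamma-data/Normal-model toy example that appears a few slides below, with smaller numbers of draws to keep it fast:

```python
import numpy as np

rng = np.random.default_rng(1)

def abc_rejection(y, prior_sampler, simulate, distance, n_draws, n_keep):
    """Summary-less ABC rejection: keep the prior draws whose synthetic
    samples fall closest to the observed data (an adaptive epsilon)."""
    thetas, dists = [], []
    for _ in range(n_draws):
        theta = prior_sampler(rng)
        z = simulate(theta, len(y), rng)
        thetas.append(theta)
        dists.append(distance(y, z))
    keep = np.argsort(dists)[:n_keep]
    return np.asarray(thetas)[keep]

# hypothetical Normal location-scale model, as in the later toy example
prior = lambda rng: np.array([rng.normal(0, 1), rng.gamma(2, 1)])
model = lambda th, n, rng: rng.normal(th[0], th[1], size=n)
dist = lambda y, z: np.linalg.norm(y - z)      # raw Euclidean distance
y = rng.gamma(10, 1 / 5, size=1000)            # Gamma(10, 5) data
post = abc_rejection(y, prior, model, dist, n_draws=10_000, n_keep=100)
```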
Summary statistics
Since the random variable $\|y_{1:n} - z_{1:n}\|$ may have large variance, the event $\{\|y_{1:n} - z_{1:n}\| < \varepsilon\}$ becomes rare as $\varepsilon \to 0$
When using $\|\eta(y_{1:n}) - \eta(z_{1:n})\| < \varepsilon$, based on an (insufficient) summary statistic $\eta$, variance and dimension decrease, but the resulting q-likelihood differs from the likelihood
Arbitrariness and impact of summaries
[X et al., 2011; Fearnhead & Prangle, 2012; Li & Fearnhead, 2016]
Distances between samples
Aim: use alternative distances $D$ between samples $y_{1:n}$ and $z_{1:n}$ such that $D(y_{1:n}, z_{1:n})$ has smaller variance than $\|y_{1:n} - z_{1:n}\|$, while the resulting ABC posterior still converges to the posterior as $\varepsilon \to 0$
ABC with order statistics
For univariate i.i.d. observations, the order statistics are sufficient
1. sort the observed and generated samples $y_{1:n}$ and $z_{1:n}$
2. compute
$$\|y_{\sigma_y(1:n)} - z_{\sigma_z(1:n)}\|_p = \left( \sum_{i=1}^{n} |y_{(i)} - z_{(i)}|^p \right)^{1/p}$$
for order $p$ (e.g. 1 or 2) instead of
$$\|y_{1:n} - z_{1:n}\|_p = \left( \sum_{i=1}^{n} |y_i - z_i|^p \right)^{1/p}$$
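A minimal numpy sketch of the two distances in step 2 (function names are mine); on two samples from the same distribution, the sorted version is typically much smaller:

```python
import numpy as np

def lp_dist(y, z, p=2):
    """Plain vector p-norm between samples, order-dependent."""
    return np.sum(np.abs(y - z) ** p) ** (1 / p)

def sorted_lp_dist(y, z, p=2):
    """Same norm after sorting both samples: compares order statistics."""
    return lp_dist(np.sort(y), np.sort(z), p)

rng = np.random.default_rng(0)
y = rng.normal(0, 1, 1000)
z = rng.normal(0, 1, 1000)
print(lp_dist(y, z), sorted_lp_dist(y, z))  # sorted distance is far smaller
```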
Toy example
Data-generating process given by $Y_{1:1000} \sim \text{Gamma}(10, 5)$
Hypothesised model: $\mathcal{M} = \{N(\mu, \sigma^2) : (\mu, \sigma) \in \mathbb{R} \times \mathbb{R}_+\}$
Prior: $\mu \sim N(0, 1)$ and $\sigma \sim \text{Gamma}(2, 1)$
ABC rejection sampling: $10^5$ draws, using the Euclidean distance on sorted vs. unsorted samples, and keeping the $10^2$ draws with smallest distances
Toy example
[Figure: ABC posterior densities of $\mu$ and $\sigma$ under the plain L2 distance vs. the sorted L2 distance]
ABC with transport distances
Distance
$$D(y_{1:n}, z_{1:n}) = \left( \frac{1}{n} \sum_{i=1}^{n} |y_{(i)} - z_{(i)}|^p \right)^{1/p}$$
is the $p$-Wasserstein distance between the empirical distributions
$$\hat\mu_n(\mathrm{d}y) = \frac{1}{n} \sum_{i=1}^{n} \delta_{y_i}(\mathrm{d}y) \quad\text{and}\quad \hat\nu_n(\mathrm{d}y) = \frac{1}{n} \sum_{i=1}^{n} \delta_{z_i}(\mathrm{d}y)$$
Rather than comparing samples as vectors, alternative representation as empirical distributions
Novel ABC method, which does not require summary statistics, available for multivariate or dependent data
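For $p = 1$ and equal sample sizes, the sorted-sample formula coincides with the 1-Wasserstein distance between the two empirical measures; a quick sanity check, assuming scipy is available:

```python
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(0)
y, z = rng.gamma(10, 0.2, 500), rng.normal(2, 0.6, 500)

# sorted-sample formula with p = 1 ...
w1_sorted = np.mean(np.abs(np.sort(y) - np.sort(z)))
# ... agrees with the 1-Wasserstein distance between the empiricals
w1_scipy = wasserstein_distance(y, z)
assert np.isclose(w1_sorted, w1_scipy)
```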
Wasserstein distance
Ground distance $\rho: (x, y) \mapsto \rho(x, y)$ on $\mathcal{Y}$, along with an order $p \ge 1$, leads to the Wasserstein distance between $\mu, \nu \in \mathcal{P}_p(\mathcal{Y})$:
$$W_p(\mu, \nu) = \left( \inf_{\gamma \in \Gamma(\mu, \nu)} \int_{\mathcal{Y} \times \mathcal{Y}} \rho(x, y)^p \, \mathrm{d}\gamma(x, y) \right)^{1/p}$$
where $\Gamma(\mu, \nu)$ is the set of joint distributions with marginals $\mu$, $\nu$ and $\mathcal{P}_p(\mathcal{Y})$ the set of distributions $\mu$ for which $\mathbb{E}_\mu[\rho(Y, y_0)^p] < \infty$ for some $y_0$
Wasserstein distance: univariate case
1 23 123
Two empirical distributions on R with 3 atoms:
1
3
3
i=1
δyi and
1
3
3
j=1
δzj
Matrix of pair-wise costs:



ρ(y1, z1)p ρ(y1, z2)p ρ(y1, z3)p
ρ(y2, z1)p ρ(y2, z2)p ρ(y2, z3)p
ρ(y3, z1)p ρ(y3, z2)p ρ(y3, z3)p



Wasserstein distance: univariate case
Joint distribution
$$\gamma = \begin{pmatrix} \gamma_{1,1} & \gamma_{1,2} & \gamma_{1,3} \\ \gamma_{2,1} & \gamma_{2,2} & \gamma_{2,3} \\ \gamma_{3,1} & \gamma_{3,2} & \gamma_{3,3} \end{pmatrix},$$
with marginals $(1/3\ \ 1/3\ \ 1/3)$, corresponds to a transport cost of
$$\sum_{i,j=1}^{3} \gamma_{i,j}\, \rho(y_i, z_j)^p$$
Wasserstein distance: univariate case
Optimal assignment:
$$y_1 \longleftrightarrow z_3, \qquad y_2 \longleftrightarrow z_1, \qquad y_3 \longleftrightarrow z_2$$
corresponds to the choice of joint distribution
$$\gamma = \begin{pmatrix} 0 & 0 & 1/3 \\ 1/3 & 0 & 0 \\ 0 & 1/3 & 0 \end{pmatrix},$$
with marginals $(1/3\ \ 1/3\ \ 1/3)$ and cost $\frac{1}{3}\sum_{i=1}^{3} \rho(y_{(i)}, z_{(i)})^p$
Wasserstein distance: bivariate case
[Figure: two samples of three points each in the plane]
There exists a joint distribution $\gamma$ minimizing the cost
$$\sum_{i,j=1}^{3} \gamma_{i,j}\, \rho(y_i, z_j)^p$$
with various algorithms to compute/approximate it
Wasserstein distance
also called Earth Mover's, Gini, Mallows, Kantorovich distance, &tc.
can be defined between arbitrary distributions
an actual distance
statistically sound:
$$\hat\theta_n = \operatorname*{arginf}_{\theta \in \mathcal{H}} W_p\Big( \frac{1}{n} \sum_{i=1}^{n} \delta_{y_i},\, \mu_\theta \Big) \to \theta_\star = \operatorname*{arginf}_{\theta \in \mathcal{H}} W_p(\mu_\star, \mu_\theta),$$
at rate $\sqrt{n}$, plus an asymptotic distribution
[Bassetti & al., 2006]
Computing Wasserstein distances
when $\mathcal{Y} = \mathbb{R}$, computing $W_p(\hat\mu_n, \hat\nu_n)$ costs $O(n \log n)$
when $\mathcal{Y} = \mathbb{R}^d$, exact calculation is $O(n^3 \log n)$
For entropic regularization, with $\delta > 0$,
$$W_{p,\delta}(\hat\mu_n, \hat\nu_n)^p = \inf_{\gamma \in \Gamma(\hat\mu_n, \hat\nu_n)} \left\{ \int_{\mathcal{Y} \times \mathcal{Y}} \rho(x, y)^p \, \mathrm{d}\gamma(x, y) - \delta H(\gamma) \right\},$$
where $H(\gamma) = -\sum_{ij} \gamma_{ij} \log \gamma_{ij}$ is the entropy of $\gamma$; Sinkhorn's algorithm yields a cost of $O(n^2)$
[Genevay et al., 2016]
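A minimal sketch of the Sinkhorn iterations for this regularized cost, following Cuturi (2013), with uniform weights, a univariate sample, and the 5%-of-median choice of $\delta$ mentioned on the next slide (names and defaults are mine):

```python
import numpy as np

def sinkhorn_wp(y, z, p=2, n_iter=1000):
    """Entropic-regularized transport cost between two empirical
    measures with uniform weights (Cuturi, 2013)."""
    C = np.abs(y[:, None] - z[None, :]) ** p       # pairwise cost rho^p
    delta = 0.05 * np.median(C)                    # 5%-of-median rule
    K = np.exp(-C / delta)                         # Gibbs kernel
    n, m = C.shape
    a, b = np.full(n, 1 / n), np.full(m, 1 / m)
    u, v = np.ones(n), np.ones(m)
    for _ in range(n_iter):                        # alternating scalings
        u = a / (K @ v)
        v = b / (K.T @ u)
    gamma = u[:, None] * K * v[None, :]            # regularized coupling
    # plug-in transport cost of the coupling (entropy term dropped)
    return np.sum(gamma * C) ** (1 / p)

rng = np.random.default_rng(0)
print(sinkhorn_wp(rng.normal(size=200), rng.normal(size=200)))
```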
Computing Wasserstein distances
other approximations, like Ye et al. (2016) using Simulated Annealing
the regularized Wasserstein is not a distance, but as $\delta$ goes to zero, $W_{p,\delta}(\hat\mu_n, \hat\nu_n) \to W_p(\hat\mu_n, \hat\nu_n)$
for $\delta$ small enough, $W_{p,\delta}(\hat\mu_n, \hat\nu_n) = W_p(\hat\mu_n, \hat\nu_n)$ (exact)
in practice, $\delta \approx 5\%$ of $\operatorname{median}\big(\rho(y_i, z_j)^p\big)_{i,j}$
[Cuturi, 2013]
Computing Wasserstein distances
cost linear in the dimension of the observations
distance calculations are model-independent
other transport distances can be calculated in only $O(n \log n)$, based on different generalizations of "sorting"
[Figure: Hilbert space-filling curve; source: Wikipedia]
Transport distance via Hilbert curve
Sort multivariate data via space-filling curves, like the Hilbert space-filling curve
$$H : [0, 1] \to [0, 1]^d$$
a continuous mapping, with pseudo-inverse $h : [0, 1]^d \to [0, 1]$
Compute the order $\sigma \in \mathfrak{S}_n$ of the projected points, and compute
$$h_p(y_{1:n}, z_{1:n}) = \left( \frac{1}{n} \sum_{i=1}^{n} \rho(y_{\sigma_y(i)}, z_{\sigma_z(i)})^p \right)^{1/p},$$
called the Hilbert ordering transport distance
[Gerber & Chopin, 2015]
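A sketch of this distance for $d = 2$, using the standard grid-cell-to-Hilbert-index conversion (the classic xy-to-d routine, as on Wikipedia) with points assumed rescaled to $[0,1]^2$ and a Euclidean ground distance; all names are mine:

```python
import numpy as np

def hilbert_index(x, y, order=10):
    """Map grid cell (x, y) on a 2^order x 2^order grid to its position
    along the Hilbert curve (standard xy-to-d conversion)."""
    d, s = 0, 2 ** (order - 1)
    while s > 0:
        rx = 1 if (x & s) > 0 else 0
        ry = 1 if (y & s) > 0 else 0
        d += s * s * ((3 * rx) ^ ry)
        if ry == 0:                        # rotate the quadrant
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        s //= 2
    return d

def hilbert_order(pts, order=10):
    """Sort 2-d points (rescaled to [0,1]^2) along the Hilbert curve."""
    grid = np.clip((pts * (2 ** order - 1)).astype(int), 0, 2 ** order - 1)
    keys = [hilbert_index(int(px), int(py), order) for px, py in grid]
    return np.argsort(keys)

def hp_distance(y, z, p=2, order=10):
    """Hilbert ordering transport distance with Euclidean ground cost."""
    sy, sz = hilbert_order(y, order), hilbert_order(z, order)
    rho = np.linalg.norm(y[sy] - z[sz], axis=1)
    return np.mean(rho ** p) ** (1 / p)
```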
Transport distance via Hilbert curve
Fact: $h_p(y_{1:n}, z_{1:n})$ is a distance between empirical distributions with $n$ atoms, for all $p \ge 1$
Hence $h_p(y_{1:n}, z_{1:n}) = 0$ if and only if $y_{1:n} = z_{\sigma(1:n)}$ for some permutation $\sigma$, with the hope of retrieving the posterior as $\varepsilon \to 0$
Cost of $O(n \log n)$ per calculation, but the encompassing sampler might be more costly than with regularized or exact Wasserstein distances
Adaptive SMC with r-hit moves
Start with $\varepsilon_0 = \infty$
1. $\forall k \in 1:N$, sample $\theta_0^k \sim \pi(\theta)$ (prior)
2. $\forall k \in 1:N$, sample $z_{1:n}^k$ from $\mu_{\theta_0^k}^{(n)}$
3. $\forall k \in 1:N$, compute the distance $d_0^k = D(y_{1:n}, z_{1:n}^k)$
4. based on $(\theta_0^k)_{k=1}^N$ and $(d_0^k)_{k=1}^N$, compute $\varepsilon_1$ s.t. the resampled particles have at least 50% unique values
At step $t \ge 1$, weight $w_t^k \propto \mathbb{1}(d_{t-1}^k \le \varepsilon_t)$, resample, and perform r-hit MCMC with adaptive independent proposals
[Lee, 2012]
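A sketch of steps 1-4 only; the r-hit MCMC rejuvenation of Lee (2012) is omitted, and the search over sorted distances is merely one possible reading of the 50%-unique-values criterion, not necessarily the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(2)

def init_and_adapt(y, prior_sampler, simulate, distance, N):
    # steps 1-3: particles from the prior, synthetic data, distances
    thetas = np.array([prior_sampler(rng) for _ in range(N)])
    dists = np.array([distance(y, simulate(t, len(y), rng)) for t in thetas])
    # step 4: smallest epsilon such that multinomially resampled
    # particles keep at least 50% unique values
    for eps in np.sort(dists):
        w = (dists <= eps).astype(float)
        idx = rng.choice(N, size=N, p=w / w.sum())
        if np.unique(idx).size >= N / 2:
            return thetas, dists, eps
    return thetas, dists, np.inf
```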
Toy example: bivariate Normal
100 observations from a bivariate Normal with variance 1 and covariance 0.55
Compare WABC with ABC versions based on the raw Euclidean distance and on the Euclidean distance between (sufficient) sample means, over $10^6$ model simulations
Toy example: multivariate Normal
[Figure: ABC posterior densities of $\theta_6$, $\theta_7$, $\theta_8$ under the Hilbert (H2) and Wasserstein (W2) distances, after $5 \times 10^5$ distance calculations; posterior densities in dashed lines]
Minimum Wasserstein estimator
Under some assumptions, $\hat\theta_n$ exists and
$$\limsup_{n \to \infty}\ \operatorname*{argmin}_{\theta \in \mathcal{H}} W_p(\hat\mu_n, \mu_\theta) \subset \operatorname*{argmin}_{\theta \in \mathcal{H}} W_p(\mu_\star, \mu_\theta),$$
almost surely
In particular, if $\theta_\star = \operatorname{argmin}_{\theta \in \mathcal{H}} W_p(\mu_\star, \mu_\theta)$ is unique, then $\hat\theta_n \xrightarrow{\text{a.s.}} \theta_\star$
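One simple way to approximate the MWE in the Normal toy model below is to minimize the 1-Wasserstein distance to a synthetic sample built from common random numbers, so the objective is deterministic in $\theta$; a minimal sketch under these assumptions (the optimizer and sample sizes are my choices, not the slides'):

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(3)
y = rng.gamma(10, 1 / 5, size=10_000)      # data: Gamma(10, 5)
w = rng.normal(size=10_000)                # common random numbers

def w1_to_model(theta):
    mu, log_sigma = theta
    z = mu + np.exp(log_sigma) * w         # N(mu, sigma^2) sample
    return wasserstein_distance(y, z)

res = minimize(w1_to_model, x0=np.array([0.0, 0.0]), method="Nelder-Mead")
mu_hat, sigma_hat = res.x[0], np.exp(res.x[1])
print(mu_hat, sigma_hat)                   # close to mean 2, sd ~ 0.63
```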
Minimum Wasserstein estimator
Under stronger assumptions, incl. well-specification, $\dim(\mathcal{Y}) = 1$, and $p = 1$,
$$\sqrt{n}\,(\hat\theta_n - \theta_\star) \xrightarrow{w} \operatorname*{argmin}_{u \in \mathcal{H}} \int_{\mathbb{R}} \left| G_\star(t) - \langle u, D_\star(t) \rangle \right| \mathrm{d}t,$$
where $G_\star$ is a $\mu_\star$-Brownian bridge and $D_\star \in (L^1(\mathbb{R}))^{d_\theta}$ satisfies
$$\int_{\mathbb{R}} \left| F_\theta(t) - F_\star(t) - \langle \theta - \theta_\star, D_\star(t) \rangle \right| \mathrm{d}t = o(\|\theta - \theta_\star\|_{\mathcal{H}})$$
[Pollard, 1980; del Barrio et al., 1999, 2005]
Hard to use for confidence intervals, but the bootstrap is an interesting alternative
Toy Gamma example
Data-generating process: $y_{1:n} \overset{\text{i.i.d.}}{\sim} \text{Gamma}(10, 5)$
Model: $\mathcal{M} = \{N(\mu, \sigma^2) : (\mu, \sigma) \in \mathbb{R} \times \mathbb{R}_+\}$
MLE converges to $\operatorname{argmin}_{\theta \in \mathcal{H}} \mathrm{KL}(\mu_\star, \mu_\theta)$
[Figure: sampling distributions of the MLE (top, dashed) and MWE (bottom, solid) for $\mu$ and $\sigma$, at $n = 10{,}000$]
Asymptotics of WABC-posterior
For fixed $n$ and $\varepsilon \to 0$, for i.i.d. data, assuming $\sup_{y,\theta} \mu_\theta(y) < \infty$ and $y \mapsto \mu_\theta(y)$ continuous, the Wasserstein ABC-posterior converges to the posterior irrespective of the choice of $\rho$ and $p$
Concentration as both $n \to \infty$ and $\varepsilon \to \varepsilon_\star = \inf_\theta W_p(\mu_\star, \mu_\theta)$
[Frazier et al., 2016]
Concentration on neighborhoods of $\theta_\star = \operatorname{arginf}_\theta W_p(\mu_\star, \mu_\theta)$, whereas the posterior concentrates on $\operatorname{arginf}_\theta \mathrm{KL}(\mu_\star, \mu_\theta)$
Asymptotics of WABC-posterior
Rate of posterior concentration (and choice of $\varepsilon_n$) relates to the rate of convergence of the distance, e.g.
$$\mu_\theta^{(n)}\left( W_p\Big( \mu_\theta, \frac{1}{n} \sum_{i=1}^{n} \delta_{z_i} \Big) > u \right) \le c(\theta)\, f_n(u),$$
[Fournier & Guillin, 2015]
Rate of convergence decays with the dimension of $\mathcal{Y}$, fast or slow, depending on the moments of $\mu_\theta$ and the choice of $p$
Toy example: univariate
Data-generating process: $Y_{1:n} \overset{\text{i.i.d.}}{\sim} \text{Gamma}(10, 5)$, $n = 100$, with mean 2 and standard deviation $\approx 0.63$
[Figure: evolution of $\varepsilon_t$ against $t$, the step index in the adaptive SMC sampler]
Theoretical model: $\mathcal{M} = \{N(\mu, \sigma^2) : (\mu, \sigma) \in \mathbb{R} \times \mathbb{R}_+\}$
Prior: $\mu \sim N(0, 1)$ and $\sigma \sim \text{Gamma}(2, 1)$
[Figure: WABC posterior density of $\sigma$, showing convergence to the posterior]
Toy example: multivariate
Observation space: $\mathcal{Y} = \mathbb{R}^{10}$
Model: $Y_i \sim N_{10}(\theta, S)$ for $i \in 1:100$, where $S_{kj} = 0.5^{|k-j|}$ for $k, j \in 1:10$
Data generated with $\theta_\star$ defined as a 10-vector, chosen by drawing standard Normal variables
Prior: $\theta_i \sim N(0, 1)$ for all $i \in 1:10$
[Figure: evolution of the number of distances calculated up to step $t$ in the adaptive SMC sampler]
[Figure: bivariate marginal of $(\theta_3, \theta_7)$ approximated by the SMC sampler, with posterior contours in yellow and $\theta_\star$ indicated by black lines]
Method 1 (0?): ignoring dependencies
Consider only the marginal distribution
AR(1) example:
$$y_0 \sim N\Big(0, \frac{\sigma^2}{1 - \phi^2}\Big), \qquad y_{t+1} \sim N(\phi y_t, \sigma^2)$$
Marginally, $y_t \sim N\big(0, \sigma^2/(1 - \phi^2)\big)$, which identifies $\sigma^2/(1 - \phi^2)$ but not $(\phi, \sigma)$
Produces a region of plausible parameters
[Figure: for $n = 1{,}000$, generated with $\phi = 0.7$ and $\log \sigma = 0.9$]
Method 2: delay reconstruction
Introduce $\tilde y_t = (y_t, y_{t-1}, \dots, y_{t-k})$ for lag $k$, and treat $\tilde y_t$ as data
AR(1) example: $\tilde y_t = (y_t, y_{t-1})$ with marginal distribution
$$N\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \frac{\sigma^2}{1 - \phi^2} \begin{pmatrix} 1 & \phi \\ \phi & 1 \end{pmatrix} \right),$$
which identifies both $\phi$ and $\sigma$
Related to Takens' theorem in the dynamical systems literature
[Figure: for $n = 1{,}000$, generated with $\phi = 0.7$ and $\log \sigma = 0.9$]
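A minimal sketch of the delay embedding on the AR(1) example (function names are mine); the resulting rows can then be fed to any of the multivariate transport distances above:

```python
import numpy as np

def delay_embed(y, k):
    """Stack (y_t, y_{t-1}, ..., y_{t-k}) as rows, for t = k, ..., n-1."""
    n = len(y)
    return np.column_stack([y[k - j : n - j] for j in range(k + 1)])

# AR(1) example: phi and sigma are both identified from the pairs
rng = np.random.default_rng(4)
phi, sigma, n = 0.7, np.exp(0.9), 1000
y = np.empty(n)
y[0] = rng.normal(0, sigma / np.sqrt(1 - phi**2))
for t in range(n - 1):
    y[t + 1] = rng.normal(phi * y[t], sigma)
pairs = delay_embed(y, k=1)   # feed to a multivariate transport distance
```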
Method 3: residual reconstruction
Time series $y_{1:n}$ is a deterministic transform of $\theta$ and $w_{1:n}$
Given $y_{1:n}$ and $\theta$, reconstruct $w_{1:n}$
Cosine example:
$$y_t = A \cos(2\pi\omega t + \phi) + \sigma w_t, \qquad w_t \sim N(0, 1)$$
so that $w_t = (y_t - A \cos(2\pi\omega t + \phi))/\sigma$, and calculate the distance between the reconstructed $w_{1:n}$ and a Normal sample
[Mengersen et al., 2013]
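A sketch of this residual-reconstruction distance for the cosine example, comparing the reconstructed residuals to a fresh Normal sample via the 1-Wasserstein distance (the parametrization in terms of $\log\sigma$, $\log A$ mirrors the priors on the next slide; names are mine):

```python
import numpy as np
from scipy.stats import wasserstein_distance

def residual_distance(y, theta, rng):
    """Reconstruct w_t = (y_t - A cos(2 pi omega t + phi)) / sigma and
    compare it to a Normal sample of the same size."""
    omega, phi, log_sigma, log_A = theta
    t = np.arange(1, len(y) + 1)
    signal = np.exp(log_A) * np.cos(2 * np.pi * omega * t + phi)
    w = (y - signal) / np.exp(log_sigma)
    return wasserstein_distance(w, rng.normal(size=len(y)))
```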
Method 3: residual reconstruction
Cosine example: $n = 500$ observations with $\omega = 1/80$, $\phi = \pi/4$, $\sigma = 1$, $A = 2$, under priors $U[0, 0.1]$ and $U[0, 2\pi]$ for $\omega$ and $\phi$, and $N(0, 1)$ for $\log \sigma$ and $\log A$
[Figure: the simulated cosine series $y_t$ against time]
Cosine example with delay reconstruction, $k = 3$
[Figure: WABC posterior densities of $\omega$, $\phi$, $\log(\sigma)$, $\log(A)$]
and with residual and delay reconstructions, $k = 1$
[Figure: WABC posterior densities of $\omega$, $\phi$, $\log(\sigma)$, $\log(A)$]
Method 4: curve matching
Define $\tilde y_t = (t, y_t)$ for all $t \in 1:n$
Define a metric on $\{1, \dots, T\} \times \mathcal{Y}$, e.g.
$$\rho((t, y_t), (s, z_s)) = \lambda |t - s| + |y_t - z_s|, \quad\text{for some } \lambda \ge 0$$
Use the distance $D$ to compare $\tilde y_{1:n} = (t, y_t)_{t=1}^{n}$ and $\tilde z_{1:n} = (s, z_s)_{s=1}^{n}$
If $\lambda \gg 1$, optimal transport associates each $(t, y_t)$ with $(t, z_t)$: we get back the "vector" norm $\|y_{1:n} - z_{1:n}\|$
If $\lambda = 0$, time indices are ignored: identical to Method 1
For any $\lambda > 0$, there is hope to retrieve the posterior as $\varepsilon \to 0$
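Since both point sets carry uniform weights, the transport problem here reduces to an assignment problem; a minimal sketch using scipy's Hungarian solver (exact but $O(n^3)$, so only practical for short series; names are mine):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def curve_matching_distance(y, z, lam, p=1):
    """W_p between the point sets (t, y_t) and (s, z_s) under the ground
    metric lam * |t - s| + |y_t - z_s|; with equal weights this is an
    assignment problem."""
    n = len(y)
    t = np.arange(1, n + 1)
    C = (lam * np.abs(t[:, None] - t[None, :]) +
         np.abs(y[:, None] - z[None, :])) ** p
    rows, cols = linear_sum_assignment(C)     # optimal matching
    return C[rows, cols].mean() ** (1 / p)
```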
Cosine example
[Figure: posteriors of (a) $\omega$, (b) $\phi$, (c) $\log(\sigma)$, (d) $\log(A)$]
Discussion
Transport metrics can be used to compare samples
Various complexities, from $O(n^3 \log n)$ to $O(n^2)$ to $O(n \log n)$
Asymptotic guarantees as $\varepsilon \to 0$ for fixed $n$, and as $n \to \infty$ and $\varepsilon \to \varepsilon_\star$
Various ways of applying these ideas to time series, and maybe spatial data, maps, images. . . ?