This document discusses using the Wasserstein distance for inference in generative models. It begins with an overview of approximate Bayesian computation (ABC) and how distances between samples are used. It then introduces the Wasserstein distance as an alternative distance that can have lower variance than the Euclidean distance. Computational aspects and asymptotics of using the Wasserstein distance are discussed. The document also covers how transport distances can handle time series data.
ABC based on Wasserstein distances
1. Inference in generative models using the Wasserstein distance
Christian P. Robert (Paris Dauphine PSL & Warwick U.)
joint work with E. Bernton (Harvard), P. E. Jacob (Harvard), and M. Gerber (Bristol)
Big Bayes, CIRM, Nov. 2018
2. Outline
1 ABC and distance between samples
2 Wasserstein distance
3 Computational aspects
4 Asymptotics
5 Handling time series
3. Outline
1 ABC and distance between samples
2 Wasserstein distance
3 Computational aspects
4 Asymptotics
5 Handling time series
4. Generative model
Assumption of a data-generating distribution $\mu_\star^{(n)}$ for data $y_{1:n} = (y_1, \dots, y_n) \in \mathcal{Y}^n$
Parametric generative model $\mathcal{M} = \{\mu_\theta^{(n)} : \theta \in \mathcal{H}\}$ such that sampling (generating) $z_{1:n}$ from $\mu_\theta^{(n)}$ is feasible
Prior distribution $\pi(\theta)$ available as density and generative model
Goal: inference on parameters $\theta$ given observations $y_{1:n}$
5. Generic ABC
Basic (summary-less) ABC posterior with density
$$(\theta, z_{1:n}) \sim \pi(\theta)\,\mu_\theta^{(n)}(z_{1:n})\,\frac{\mathbb{1}\left(\|y_{1:n} - z_{1:n}\| < \varepsilon\right)}{\int_{\mathcal{Y}^n} \mathbb{1}\left(\|y_{1:n} - z_{1:n}\| < \varepsilon\right)\mathrm{d}z_{1:n}}$$
and ABC marginal
$$q_\varepsilon(\theta) = \frac{\int_{\mathcal{Y}^n} \prod_{i=1}^n \mu(\mathrm{d}z_i \mid \theta)\,\mathbb{1}\left(\|y_{1:n} - z_{1:n}\| < \varepsilon\right)}{\int_{\mathcal{Y}^n} \mathbb{1}\left(\|y_{1:n} - z_{1:n}\| < \varepsilon\right)\mathrm{d}z_{1:n}}$$
unbiasedly estimated by $\pi(\theta)\,\mathbb{1}\left(\|y_{1:n} - z_{1:n}\| < \varepsilon\right)$ where $z_{1:n} \sim \mu_\theta^{(n)}$
Reminder: the ABC posterior goes to the posterior as $\varepsilon \to 0$
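To make the scheme concrete, here is a minimal Python sketch of summary-less ABC rejection; the helper names (abc_rejection, prior_sampler, simulator) are hypothetical, and any distance between samples can be passed in:

```python
import numpy as np

def abc_rejection(y, prior_sampler, simulator, distance, n_draws, keep):
    """Summary-less ABC rejection: simulate (theta, z) pairs from the
    prior and the generative model, then keep the draws whose simulated
    sample lies closest to the observed one."""
    thetas, dists = [], []
    for _ in range(n_draws):
        theta = prior_sampler()
        z = simulator(theta, len(y))
        thetas.append(theta)
        dists.append(distance(y, z))
    idx = np.argsort(dists)[:keep]   # indices of the smallest distances
    return np.asarray(thetas)[idx]
```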
6. Summary statistics
Since the random variable $\|y_{1:n} - z_{1:n}\|$ may have large variance, the event $\{\|y_{1:n} - z_{1:n}\| < \varepsilon\}$ becomes rare as $\varepsilon \to 0$, and rarer still as the dimension $d$ grows
When using $\|\eta(y_{1:n}) - \eta(z_{1:n})\| < \varepsilon$, based on an (insufficient) summary statistic $\eta$, variance and dimension decrease, but the resulting quasi-likelihood differs from the likelihood
Arbitrariness and impact of the choice of summaries, incl. curse of dimensionality
[X et al., 2011; Fearnhead & Prangle, 2012; Li & Fearnhead, 2016]
7. Distances between samples
Aim: resort to alternative distances $D$ between samples $y_{1:n}$ and $z_{1:n}$ such that $D(y_{1:n}, z_{1:n})$ has smaller variance than $\|y_{1:n} - z_{1:n}\|$, while the induced ABC posterior still converges to the posterior as $\varepsilon \to 0$
8. ABC with order statistics
Recall that, for univariate i.i.d. data, the order statistics are sufficient
1. sort the observed and generated samples $y_{1:n}$ and $z_{1:n}$
2. compute
$$\|y_{\sigma_y(1:n)} - z_{\sigma_z(1:n)}\|_p = \left(\sum_{i=1}^n |y_{(i)} - z_{(i)}|^p\right)^{1/p}$$
for an order $p$ (e.g. 1 or 2) instead of
$$\|y_{1:n} - z_{1:n}\|_p = \left(\sum_{i=1}^n |y_i - z_i|^p\right)^{1/p}$$
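A short sketch of the two distances being contrasted, assuming equal-length univariate samples; function names are illustrative:

```python
import numpy as np

def lp_distance(y, z, p=2):
    """Plain L^p norm between two samples viewed as vectors."""
    return np.sum(np.abs(np.asarray(y) - np.asarray(z)) ** p) ** (1.0 / p)

def sorted_lp_distance(y, z, p=2):
    """Same norm after sorting both samples, i.e. matching order
    statistics y_(i) with z_(i)."""
    return lp_distance(np.sort(y), np.sort(z), p)
```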
9. Toy example
Data-generating process: $Y_{1:1000} \sim \mathrm{Gamma}(10, 5)$
Hypothesised model: $\mathcal{M} = \{\mathcal{N}(\mu, \sigma^2) : (\mu, \sigma) \in \mathbb{R} \times \mathbb{R}_+\}$
Prior: $\mu \sim \mathcal{N}(0, 1)$ and $\sigma \sim \mathrm{Gamma}(2, 1)$
ABC rejection sampling: $10^5$ draws, using the Euclidean distance on sorted vs. unsorted samples, and keeping the $10^2$ draws with smallest distances
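Wiring this toy setup to the earlier sketches (the hypothetical abc_rejection and sorted_lp_distance helpers above) might look as follows; note the Gamma(10, 5) data uses rate 5, i.e. scale 1/5:

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.gamma(shape=10.0, scale=1.0 / 5.0, size=1000)   # Gamma(10, 5) data

def prior_sampler():
    # mu ~ N(0, 1), sigma ~ Gamma(2, 1): sigma is positive by construction
    return np.array([rng.normal(0.0, 1.0), rng.gamma(2.0, 1.0)])

def simulator(theta, n):
    mu, sigma = theta
    return rng.normal(mu, sigma, size=n)

kept = abc_rejection(y, prior_sampler, simulator,
                     sorted_lp_distance, n_draws=10**5, keep=10**2)
```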
10. Toy example
[Figure: ABC posterior densities of µ and σ under the plain L2 distance and the sorted L2 distance]
11. ABC with transport distances
The distance
$$D(y_{1:n}, z_{1:n}) = \left(\frac{1}{n}\sum_{i=1}^n |y_{(i)} - z_{(i)}|^p\right)^{1/p}$$
is the p-Wasserstein distance between the empirical distributions
$$\hat\mu_n(\mathrm{d}y) = \frac{1}{n}\sum_{i=1}^n \delta_{y_i}(\mathrm{d}y) \quad\text{and}\quad \hat\nu_n(\mathrm{d}y) = \frac{1}{n}\sum_{i=1}^n \delta_{z_i}(\mathrm{d}y)$$
Rather than comparing samples as vectors, alternative representation as empirical distributions
⇒ Novel ABC method, which does not require summary statistics, available with multivariate or dependent data
12. Outline
1 ABC and distance between samples
2 Wasserstein distance
3 Computational aspects
4 Asymptotics
5 Handling time series
13. Wasserstein distance
A ground distance $\rho(x, y)$ on $\mathcal{Y}$, together with an order $p \geq 1$, leads to the Wasserstein distance between $\mu, \nu \in \mathcal{P}_p(\mathcal{Y})$:
$$\mathcal{W}_p(\mu, \nu) = \left(\inf_{\gamma \in \Gamma(\mu, \nu)} \int_{\mathcal{Y}\times\mathcal{Y}} \rho(x, y)^p\,\mathrm{d}\gamma(x, y)\right)^{1/p}$$
where $\Gamma(\mu, \nu)$ is the set of joint distributions with marginals $\mu$ and $\nu$, and $\mathcal{P}_p(\mathcal{Y})$ the set of distributions $\mu$ for which $\mathbb{E}_\mu[\rho(Y, y_0)^p] < \infty$ for some $y_0$
14. Wasserstein distance: univariate case
Two empirical distributions on $\mathbb{R}$ with 3 atoms each:
$$\frac{1}{3}\sum_{i=1}^3 \delta_{y_i} \quad\text{and}\quad \frac{1}{3}\sum_{j=1}^3 \delta_{z_j}$$
Matrix of pair-wise costs:
$$\begin{pmatrix} \rho(y_1, z_1)^p & \rho(y_1, z_2)^p & \rho(y_1, z_3)^p \\ \rho(y_2, z_1)^p & \rho(y_2, z_2)^p & \rho(y_2, z_3)^p \\ \rho(y_3, z_1)^p & \rho(y_3, z_2)^p & \rho(y_3, z_3)^p \end{pmatrix}$$
15. Wasserstein distance: univariate case
A joint distribution
$$\gamma = \begin{pmatrix} \gamma_{1,1} & \gamma_{1,2} & \gamma_{1,3} \\ \gamma_{2,1} & \gamma_{2,2} & \gamma_{2,3} \\ \gamma_{3,1} & \gamma_{3,2} & \gamma_{3,3} \end{pmatrix},$$
with marginals $(1/3\ 1/3\ 1/3)$, corresponds to a transport cost of
$$\sum_{i,j=1}^3 \gamma_{i,j}\,\rho(y_i, z_j)^p$$
16. Wasserstein distance: univariate case
Optimal assignment:
$$y_1 \leftrightarrow z_3, \qquad y_2 \leftrightarrow z_1, \qquad y_3 \leftrightarrow z_2$$
corresponds to the choice of joint distribution
$$\gamma = \begin{pmatrix} 0 & 0 & 1/3 \\ 1/3 & 0 & 0 \\ 0 & 1/3 & 0 \end{pmatrix},$$
with marginals $(1/3\ 1/3\ 1/3)$ and cost $\frac{1}{3}\sum_{i=1}^3 \rho(y_{(i)}, z_{(i)})^p$
17. Wasserstein distance
Two samples $y_1, \dots, y_n$ and $z_1, \dots, z_m$, with empirical distributions $\hat\mu_n$ and $\hat\nu_m$:
$$\mathcal{W}_p(\hat\mu_n, \hat\nu_m)^p = \inf_{\gamma \in \Gamma(\hat\mu_n, \hat\nu_m)} \sum_{i,j} \gamma_{i,j}\,\rho(y_i, z_j)^p$$
Important special case when $n = m$, for which the solution $\gamma$ of the optimization problem corresponds to an assignment matrix, with only one non-zero entry per row and column, equal to $n^{-1}$.
[Villani, 2003]
18. Wasserstein distance
When $n = m$, the Wasserstein distance can thus be represented as
$$\mathcal{W}_p(y_{1:n}, z_{1:n})^p = \inf_{\sigma \in \mathcal{S}_n} \frac{1}{n}\sum_{i=1}^n \rho(y_i, z_{\sigma(i)})^p$$
Computing the Wasserstein distance between two samples of the same size is equivalent to an optimal matching problem.
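A sketch of this equivalence using SciPy's Hungarian solver (scipy.optimize.linear_sum_assignment); the function name and the Euclidean ground distance are choices, not prescriptions:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

def wasserstein_assignment(y, z, p=2):
    """Exact p-Wasserstein distance between two equal-size samples
    (rows = observations), solved as an optimal matching problem."""
    cost = cdist(y, z, metric="euclidean") ** p   # rho(y_i, z_j)^p
    rows, cols = linear_sum_assignment(cost)      # optimal permutation
    return cost[rows, cols].mean() ** (1.0 / p)   # (n^{-1} sum rho^p)^{1/p}
```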
19. Wasserstein distance: bivariate case
[Figure: two samples of three atoms each in the plane]
There exists a joint distribution $\gamma$ minimizing the cost $\sum_{i,j=1}^3 \gamma_{i,j}\,\rho(y_i, z_j)^p$, with various algorithms to compute/approximate it
20. Wasserstein distance
also called Vaserštein, Earth Mover, Gini, Mallows, Kantorovich, Rubinstein, &tc.
can be defined between arbitrary distributions
an actual distance
statistically sound:
$$\hat\theta_n = \operatorname*{arginf}_{\theta \in \mathcal{H}} \mathcal{W}_p\Big(\frac{1}{n}\sum_{i=1}^n \delta_{y_i}, \mu_\theta\Big) \longrightarrow \theta_\star = \operatorname*{arginf}_{\theta \in \mathcal{H}} \mathcal{W}_p(\mu_\star, \mu_\theta),$$
at rate $\sqrt{n}$, plus asymptotic distribution
[Bassetti & al., 2006]
23. Outline
1 ABC and distance between samples
2 Wasserstein distance
3 Computational aspects
4 Asymptotics
5 Handling time series
24. Computing Wasserstein distances
when $\mathcal{Y} = \mathbb{R}$, computing $\mathcal{W}_p(\hat\mu_n, \hat\nu_n)$ costs $O(n \log n)$
when $\mathcal{Y} = \mathbb{R}^d$, exact calculation is $O(n^3)$ [Hungarian] or $O(n^{2.5} \log n)$ [short-list]
For entropic regularization, with $\delta > 0$,
$$\mathcal{W}_{p,\delta}(\hat\mu_n, \hat\nu_n)^p = \inf_{\gamma \in \Gamma(\hat\mu_n, \hat\nu_n)} \left\{\int_{\mathcal{Y}\times\mathcal{Y}} \rho(x, y)^p\,\mathrm{d}\gamma(x, y) - \delta H(\gamma)\right\},$$
where $H(\gamma) = -\sum_{ij} \gamma_{ij} \log \gamma_{ij}$ is the entropy of $\gamma$; Sinkhorn's algorithm yields cost $O(n^2)$
[Genevay et al., 2016]
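A minimal Sinkhorn sketch for uniform marginals, following the fixed-point iterations behind Cuturi (2013); it returns the regularized analogue of $\mathcal{W}_p^p$ (the entropy term is omitted from the reported value), and in line with the next slide one might set delta to about 5% of the median entry of cost:

```python
import numpy as np

def sinkhorn_cost(cost, delta, iters=500):
    """Entropic OT via Sinkhorn iterations, for uniform marginals;
    `cost` holds the pairwise costs rho(y_i, z_j)^p."""
    n, m = cost.shape
    a, b = np.full(n, 1.0 / n), np.full(m, 1.0 / m)  # uniform marginals
    K = np.exp(-cost / delta)                        # Gibbs kernel
    v = np.ones(m)
    for _ in range(iters):
        u = a / (K @ v)       # row scaling to match marginal a
        v = b / (K.T @ u)     # column scaling to match marginal b
    gamma = u[:, None] * K * v[None, :]              # transport plan
    return (gamma * cost).sum()                      # approx W_p^p
```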
25. Computing Wasserstein distances
other approximations, like Ye et al. (2016) using simulated annealing
the regularized Wasserstein is not a distance, but as $\delta$ goes to zero,
$$\mathcal{W}_{p,\delta}(\hat\mu_n, \hat\nu_n) \to \mathcal{W}_p(\hat\mu_n, \hat\nu_n)$$
for $\delta$ small enough, $\mathcal{W}_{p,\delta}(\hat\mu_n, \hat\nu_n) = \mathcal{W}_p(\hat\mu_n, \hat\nu_n)$ (exact)
in practice, $\delta \approx 5\%$ of $\mathrm{median}\left(\rho(y_i, z_j)^p\right)_{i,j}$
[Cuturi, 2013]
26. Computing Wasserstein distances
cost linear in the dimension of the observations
distance calculations model-independent
other transport distances calculated in $O(n \log n)$, based on different generalizations of “sorting” (swapping, Hilbert)
[Gerber & Chopin, 2015]
acceleration by combination of distances and subsampling
28. Transport distance via Hilbert curve
Sort multivariate data via space-filling curves, like the Hilbert space-filling curve
$$H : [0, 1] \to [0, 1]^d,$$
a continuous mapping with pseudo-inverse $h : [0, 1]^d \to [0, 1]$
Compute the orders $\sigma_y, \sigma_z \in \mathcal{S}_n$ of the projected points, and compute
$$\mathfrak{h}_p(y_{1:n}, z_{1:n}) = \left(\frac{1}{n}\sum_{i=1}^n \rho(y_{\sigma_y(i)}, z_{\sigma_z(i)})^p\right)^{1/p},$$
called the Hilbert ordering transport distance
[Gerber & Chopin, 2015]
29. Transport distance via Hilbert curve
Fact: $\mathfrak{h}_p(y_{1:n}, z_{1:n})$ is a distance between empirical distributions with $n$ atoms, for all $p \geq 1$
Hence $\mathfrak{h}_p(y_{1:n}, z_{1:n}) = 0$ if and only if $y_{1:n} = z_{\sigma(1:n)}$ for some permutation $\sigma$, with hope to retrieve the posterior as $\varepsilon \to 0$
Cost $O(n \log n)$ per calculation, but the encompassing sampler might be more costly than with regularized or exact Wasserstein distances
Upper bound on the corresponding Wasserstein distance, only accurate in small dimensions
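A sketch of the Hilbert ordering distance, assuming the third-party hilbertcurve package (its HilbertCurve class and distances_from_points method) and a crude rescaling of the data onto the curve's integer grid:

```python
import numpy as np
# third-party package assumed: pip install hilbertcurve
from hilbertcurve.hilbertcurve import HilbertCurve

def hilbert_distance(y, z, p=1, bits=10):
    """Hilbert ordering transport distance: rank the points of each
    sample by their index along a Hilbert curve, match the two sorted
    samples, and average the matched ground costs."""
    d = y.shape[1]
    hc = HilbertCurve(bits, d)                 # bits iterations, d dims
    both = np.vstack([y, z])
    lo, hi = both.min(axis=0), both.max(axis=0)

    def curve_index(x):
        # rescale to the integer grid {0, ..., 2^bits - 1}^d
        grid = ((x - lo) / (hi - lo + 1e-12) * (2**bits - 1)).astype(int)
        return np.asarray(hc.distances_from_points(grid.tolist()))

    ys = y[np.argsort(curve_index(y))]
    zs = z[np.argsort(curve_index(z))]
    costs = np.linalg.norm(ys - zs, axis=1) ** p
    return costs.mean() ** (1.0 / p)
```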
30. Adaptive SMC with r-hit moves
Start with $\varepsilon_0 = \infty$
1. $\forall k \in 1:N$, sample $\theta_0^k \sim \pi(\theta)$ (prior)
2. $\forall k \in 1:N$, sample $z_{1:n}^k$ from $\mu_{\theta_0^k}^{(n)}$
3. $\forall k \in 1:N$, compute the distance $d_0^k = D(y_{1:n}, z_{1:n}^k)$
4. based on $(\theta_0^k)_{k=1}^N$ and $(d_0^k)_{k=1}^N$, compute $\varepsilon_1$ s.t. resampled particles have at least 50% unique values
At step $t \geq 1$, weight $w_t^k \propto \mathbb{1}(d_{t-1}^k \leq \varepsilon_t)$, resample, and perform r-hit MCMC moves with adaptive independent proposals
[Lee, 2012; Lee and Łatuszyński, 2014]
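A crude stand-in for step 4, under the simplifying assumption that the fraction of particles kept alive is used as a proxy for the 50%-unique-values rule of the slide:

```python
import numpy as np

def next_epsilon(distances, alive=0.5):
    """Pick the next ABC-SMC threshold as the distance quantile that
    keeps a fraction `alive` of the particles with positive weight;
    an approximation, not the exact uniqueness criterion above."""
    return np.quantile(distances, alive)
```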
31. Toy example: bivariate Normal
100 observations from a bivariate Normal with variance 1 and covariance 0.55
Compare WABC with ABC versions based on the raw Euclidean distance and the Euclidean distance between (sufficient) sample means, on $10^6$ model simulations.
32. Toy example: bivariate Normal
In terms of computing time, based on our R implementation on an Intel Core i7-5820K (3.30 GHz), each Euclidean distance calculation takes on average $2.2 \times 10^{-4}$ s, while each Wasserstein distance calculation takes on average $8.2 \times 10^{-3}$ s, i.e. roughly 40 times longer
33. Quantile “g-and-k” distribution
Bivariate extension of the g-and-k distribution, with quantile functions
$$a_i + b_i\left(1 + 0.8\,\frac{1 - \exp(-g_i z_i(r))}{1 + \exp(-g_i z_i(r))}\right)\left(1 + z_i(r)^2\right)^{k_i} z_i(r) \quad (1)$$
and correlation $\rho$
Intractable density that can be numerically approximated
[Rayner and MacGillivray, 2002; Prangle, 2017]
Simulation by MCMC and W-ABC (sequential tolerance exploration)
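A sketch of the univariate sampler implied by the quantile function (1), with the constant 0.8 as on the slide; the bivariate extension would correlate the underlying Normal draws:

```python
import numpy as np

def gk_sample(a, b, g, k, n, rng, c=0.8):
    """Draw from the (univariate) g-and-k distribution by plugging
    standard Normal variables z into the quantile function (1)."""
    z = rng.standard_normal(n)   # z(r) = Phi^{-1}(r) for r ~ U(0, 1)
    return (a + b * (1 + c * (1 - np.exp(-g * z)) / (1 + np.exp(-g * z)))
              * (1 + z**2) ** k * z)
```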
34. Quantile “g-and-k” distribution
[Figure: W-ABC marginal posteriors of a1, b1, g1, k1, a2, b2, g2, k2 and ρ; the last 10 SMC steps for g2; and the W1 distance to the posterior against the number of simulations]
35. Outline
1 ABC and distance between samples
2 Wasserstein distance
3 Computational aspects
4 Asymptotics
5 Handling time series
36. Minimum Wasserstein estimator
Under some assumptions, $\hat\theta_n$ exists and
$$\limsup_{n\to\infty}\ \operatorname*{argmin}_{\theta \in \mathcal{H}} \mathcal{W}_p(\hat\mu_n, \mu_\theta) \subset \operatorname*{argmin}_{\theta \in \mathcal{H}} \mathcal{W}_p(\mu_\star, \mu_\theta),$$
almost surely
In particular, if $\theta_\star = \operatorname*{argmin}_{\theta \in \mathcal{H}} \mathcal{W}_p(\mu_\star, \mu_\theta)$ is unique, then $\hat\theta_n \xrightarrow{\text{a.s.}} \theta_\star$
37. Minimum Wasserstein estimator
Under stronger assumptions, incl. well-specification, $\dim(\mathcal{Y}) = 1$ and $p = 1$,
$$\sqrt{n}\,(\hat\theta_n - \theta_\star) \xrightarrow{w} \operatorname*{argmin}_{u \in \mathcal{H}} \int_{\mathbb{R}} |G_\star(t) - \langle u, D_\star(t)\rangle|\,\mathrm{d}t,$$
where $G_\star$ is a $\mu_\star$-Brownian bridge, and $D_\star \in (L_1(\mathbb{R}))^{d_\theta}$ satisfies
$$\int_{\mathbb{R}} |F_\theta(t) - F_\star(t) - \langle\theta - \theta_\star, D_\star(t)\rangle|\,\mathrm{d}t = o(\|\theta - \theta_\star\|_{\mathcal{H}})$$
[Pollard, 1980; del Barrio et al., 1999, 2005]
Hard to use for confidence intervals, but the bootstrap is an interesting alternative.
38. Toy Gamma example
Data-generating process: $y_{1:n} \overset{\text{i.i.d.}}{\sim} \mathrm{Gamma}(10, 5)$
Model: $\mathcal{M} = \{\mathcal{N}(\mu, \sigma^2) : (\mu, \sigma) \in \mathbb{R} \times \mathbb{R}_+\}$
The MLE converges to $\operatorname*{argmin}_{\theta \in \mathcal{H}} \mathrm{KL}(\mu_\star, \mu_\theta)$
[Figure: sampling distributions, MLE top, MWE bottom]
39. Toy Gamma example
[Figure: densities of the estimators of µ and σ at n = 10,000; MLE dashed, MWE solid]
40. Minimum expected Wasserstein estimator
$$\hat\theta_{n,m} = \operatorname*{argmin}_{\theta \in \mathcal{H}} \mathbb{E}\big[D(\hat\mu_n, \hat\mu_{\theta,m})\big]$$
with the expectation taken under the distribution of the sample $z_{1:m} \sim \mu_\theta^{(m)}$ giving rise to $\hat\mu_{\theta,m} = m^{-1}\sum_{i=1}^m \delta_{z_i}$.
41. Minimum expected Wasserstein estimator
Under further assumptions, incl. $m(n) \to \infty$ with $n$,
$$\inf_{\theta \in \mathcal{H}} \mathbb{E}\,\mathcal{W}_p(\hat\mu_n(\omega), \hat\mu_{\theta,m(n)}) \to \inf_{\theta \in \mathcal{H}} \mathcal{W}_p(\mu_\star, \mu_\theta)$$
and
$$\limsup_{n\to\infty}\ \operatorname*{argmin}_{\theta \in \mathcal{H}} \mathbb{E}\,\mathcal{W}_p(\hat\mu_n(\omega), \hat\mu_{\theta,m(n)}) \subset \operatorname*{argmin}_{\theta \in \mathcal{H}} \mathcal{W}_p(\mu_\star, \mu_\theta).$$
42. Minimum expected Wasserstein estimator
Further, for $n$ fixed,
$$\inf_{\theta \in \mathcal{H}} \mathbb{E}\,\mathcal{W}_p(\hat\mu_n, \hat\mu_{\theta,m}) \to \inf_{\theta \in \mathcal{H}} \mathcal{W}_p(\hat\mu_n, \mu_\theta)$$
as $m \to \infty$, and
$$\limsup_{m\to\infty}\ \operatorname*{argmin}_{\theta \in \mathcal{H}} \mathbb{E}\,\mathcal{W}_p(\hat\mu_n, \hat\mu_{\theta,m}) \subset \operatorname*{argmin}_{\theta \in \mathcal{H}} \mathcal{W}_p(\hat\mu_n, \mu_\theta).$$
43. Quantile “g-and-k” distribution
Sampling achieved by plugging standard Normal variables into (1) in place of $z(r)$.
MEWE with large $m$ can be computed to high precision
[Figure: MEWE scatterplots of a vs. b and of g vs. κ; √n-scaled estimates of κ; histogram of the data]
44. Asymptotics of WABC-posterior
convergence to the true posterior as $\varepsilon \to 0$
convergence to a non-Dirac limit as $n \to \infty$ for fixed $\varepsilon$
Bayesian consistency if $\varepsilon_n \downarrow$ at the proper speed
[Frazier, X & Rousseau, 2017]
WARNING: theoretical conditions extremely rarely open to checks in practice
46. Asymptotics of WABC-posterior
For fixed $n$ and $\varepsilon \to 0$, for i.i.d. data, assuming $\sup_{y,\theta} \mu_\theta(y) < \infty$ and $y \mapsto \mu_\theta(y)$ continuous, the Wasserstein ABC-posterior converges to the posterior, irrespective of the choice of $\rho$ and $p$
Concentration as both $n \to \infty$ and $\varepsilon \to \varepsilon_\star = \inf_\theta \mathcal{W}_p(\mu_\star, \mu_\theta)$
[Frazier et al., 2018]
Concentration on neighborhoods of $\theta_\star = \operatorname*{arginf}_\theta \mathcal{W}_p(\mu_\star, \mu_\theta)$, whereas the posterior concentrates on $\operatorname*{arginf}_\theta \mathrm{KL}(\mu_\star, \mu_\theta)$
47. Asymptotics of WABC-posterior
Rate of posterior concentration (and choice of $\varepsilon_n$) relates to the rate of convergence of the distance, e.g.
$$\mu_\theta^{(n)}\left(\mathcal{W}_p\Big(\mu_\theta, \frac{1}{n}\sum_{i=1}^n \delta_{z_i}\Big) > u\right) \leq c(\theta)\,f_n(u),$$
[Fournier & Guillin, 2015]
The rate of convergence decays with the dimension of $\mathcal{Y}$, fast or slow depending on the moments of $\mu_\theta$ and the choice of $p$
49. Toy example: univariate
Data-generating process: $Y_{1:n} \overset{\text{i.i.d.}}{\sim} \mathrm{Gamma}(10, 5)$, $n = 100$, with mean 2 and standard deviation ≈ 0.63
Theoretical model: $\mathcal{M} = \{\mathcal{N}(\mu, \sigma^2) : (\mu, \sigma) \in \mathbb{R} \times \mathbb{R}_+\}$
Prior: $\mu \sim \mathcal{N}(0, 1)$ and $\sigma \sim \mathrm{Gamma}(2, 1)$
[Figure: evolution of εt against the step index t in the adaptive SMC sampler; WABC posterior density of σ, converging to the posterior]
51. Toy example: multivariate
Observation space: $\mathcal{Y} = \mathbb{R}^{10}$
Model: $Y_i \sim \mathcal{N}_{10}(\theta, S)$ for $i \in 1:100$, where $S_{kj} = 0.5^{|k-j|}$ for $k, j \in 1:10$
Data generated with $\theta$ defined as a 10-vector, chosen by drawing standard Normal variables
Prior: $\theta_i \sim \mathcal{N}(0, 1)$ for all $i \in 1:10$
52. Toy example: multivariate
[Figure: evolution of the number of distances calculated up to step t in the adaptive SMC sampler]
53. Toy example: multivariate
[Figure: bivariate marginal of (θ3, θ7) approximated by the SMC sampler; posterior contours in yellow, θ⋆ indicated by black lines]
54. Sum of log-Normals
Distribution of a sum of log-Normal random variables: intractable but easy to simulate
$$x_1, \dots, x_L \sim \mathcal{N}(\gamma, \sigma^2), \qquad y = \sum_{\ell=1}^L \exp(x_\ell)$$
[Figure: MEWE of (γ, σ); √n-scaled estimates of γ and σ; histogram of the data]
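Simulating this model is a one-liner, which is all the Wasserstein-based approaches need; a sketch:

```python
import numpy as np

def sum_of_lognormals(gamma, sigma, L, size, rng):
    """Simulate y = sum_{l=1}^L exp(x_l) with x_l ~ N(gamma, sigma^2):
    easy to sample from even though the density of y is intractable."""
    x = rng.normal(gamma, sigma, size=(size, L))
    return np.exp(x).sum(axis=1)
```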
55. Misspecified model
Gamma(10, 5) data fitted with a Normal model $\mathcal{N}(\gamma, \sigma^2)$
Approximate the MEWE by sampling $k = 20$ independent $u^{(i)}$ and minimizing
$$\theta \mapsto k^{-1}\sum_{i=1}^k \mathcal{W}_p\big(y_{1:n}, g_m(u^{(i)}, \theta)\big)$$
[Figure: MLE and MEWE of (γ, σ); sampling distributions of the estimators of γ and σ]
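A hedged sketch of this approximate MEWE for the Normal model, using k fixed sets of N(0, 1) innovations as the $u^{(i)}$ (common random numbers, so the criterion is continuous in θ) and SciPy's 1-d Wasserstein distance; the names and the Nelder-Mead choice are illustrative:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import wasserstein_distance

def mewe_normal(y, k=20, m=None, rng=None):
    """Approximate MEWE for N(gamma, sigma^2): average the W1 distance
    between the data and k simulated samples sharing fixed innovations,
    then minimize over (gamma, log sigma)."""
    rng = rng or np.random.default_rng(0)
    m = m or len(y)
    u = rng.standard_normal((k, m))              # fixed u^(1), ..., u^(k)

    def objective(theta):
        gamma, log_sigma = theta
        sigma = np.exp(log_sigma)                # enforce sigma > 0
        return np.mean([wasserstein_distance(y, gamma + sigma * ui)
                        for ui in u])

    start = np.array([y.mean(), np.log(y.std())])
    res = minimize(objective, start, method="Nelder-Mead")
    return res.x[0], np.exp(res.x[1])
```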
56. Outline
1 ABC and distance between samples
2 Wasserstein distance
3 Computational aspects
4 Asymptotics
5 Handling time series
57. Method 1 (0?): ignoring dependencies
Consider only the marginal distribution
AR(1) example:
$$y_0 \sim \mathcal{N}\Big(0, \frac{\sigma^2}{1 - \phi^2}\Big), \qquad y_{t+1} \sim \mathcal{N}(\phi y_t, \sigma^2).$$
Marginally, $y_t \sim \mathcal{N}(0, \sigma^2/(1 - \phi^2))$, which identifies $\sigma^2/(1 - \phi^2)$ but not $(\phi, \sigma)$
Produces a region of plausible parameters
For $n = 1{,}000$, generated with $\phi = 0.7$ and $\log\sigma = 0.9$
58. Method 2: delay reconstruction
Introduce $\tilde y_t = (y_t, y_{t-1}, \dots, y_{t-k})$ for a lag $k$, and treat $(\tilde y_t)$ as the data
AR(1) example: $\tilde y_t = (y_t, y_{t-1})$, with marginal distribution
$$\mathcal{N}\left(\begin{pmatrix}0\\0\end{pmatrix}, \frac{\sigma^2}{1 - \phi^2}\begin{pmatrix}1 & \phi \\ \phi & 1\end{pmatrix}\right),$$
which identifies both $\phi$ and $\sigma$
Related to Takens' theorem in the dynamical systems literature
For $n = 1{,}000$, generated with $\phi = 0.7$ and $\log\sigma = 0.9$.
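A sketch of the delay map, returning one row per time index with a full lag window; the embedded samples can then be compared with any of the multivariate transport distances above:

```python
import numpy as np

def delay_embed(y, k):
    """Delay reconstruction: map a series (y_t) to the vectors
    (y_t, y_{t-1}, ..., y_{t-k}), one row per admissible t."""
    return np.column_stack([y[k - j : len(y) - j] for j in range(k + 1)])
```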
59. Method 3: residual reconstruction
Time series $y_{1:n}$ is a deterministic transform of $\theta$ and noise $w_{1:n}$
Given $y_{1:n}$ and $\theta$, reconstruct $w_{1:n}$
Cosine example:
$$y_t = A\cos(2\pi\omega t + \phi) + \sigma w_t, \qquad w_t \sim \mathcal{N}(0, 1)$$
$$w_t = \big(y_t - A\cos(2\pi\omega t + \phi)\big)/\sigma$$
and calculate the distance between the reconstructed $w_{1:n}$ and a Normal sample
[Mengersen et al., 2013]
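The reconstruction is a direct inversion of the observation equation; a sketch:

```python
import numpy as np

def cosine_residuals(y, A, omega, phi, sigma):
    """Residual reconstruction for the cosine model: invert
    y_t = A cos(2 pi omega t + phi) + sigma w_t to recover w_{1:n},
    to be compared to an N(0, 1) sample via a transport distance."""
    t = np.arange(1, len(y) + 1)
    return (y - A * np.cos(2 * np.pi * omega * t + phi)) / sigma
```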
60. Method 3: residual reconstruction
Cosine example: $y_t = A\cos(2\pi\omega t + \phi) + \sigma w_t$, with $n = 500$ observations generated with $\omega = 1/80$, $\phi = \pi/4$, $\sigma = 1$, $A = 2$, under priors $\mathcal{U}[0, 0.1]$ and $\mathcal{U}[0, 2\pi]$ for $\omega$ and $\phi$, and $\mathcal{N}(0, 1)$ for $\log\sigma$ and $\log A$
[Figure: simulated cosine series y over 500 time steps]
61. Cosine example with delay reconstruction, k = 3
[Figure: WABC marginal posteriors of ω, φ, log(σ) and log(A)]
62. Cosine example with residual and delay reconstructions, k = 1
[Figure: WABC marginal posteriors of ω, φ, log(σ) and log(A)]
63. Method 4: curve matching
Define $\tilde y_t = (t, y_t)$ for all $t \in 1:n$
Define a metric on $\{1, \dots, T\} \times \mathcal{Y}$, e.g.
$$\rho\big((t, y_t), (s, z_s)\big) = \lambda|t - s| + |y_t - z_s|, \quad \text{for some } \lambda$$
Use the distance $D$ to compare $\tilde y_{1:n} = ((t, y_t))_{t=1}^n$ and $\tilde z_{1:n} = ((s, z_s))_{s=1}^n$
If $\lambda \gg 1$, optimal transport associates each $(t, y_t)$ with $(t, z_t)$: we get back the “vector” norm $\|y_{1:n} - z_{1:n}\|$
If $\lambda = 0$, time indices are ignored: identical to Method 1
For any $\lambda > 0$, there is hope to retrieve the posterior as $\varepsilon \to 0$
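A sketch of curve matching as an assignment problem, reusing SciPy's Hungarian solver; lam trades off time alignment against value discrepancy:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def curve_matching_distance(y, z, lam, p=1):
    """Curve matching: augment each observation with its time index,
    use rho((t, y), (s, z)) = lam*|t - s| + |y - z| as ground cost,
    and solve the resulting optimal assignment problem."""
    t = np.arange(1, len(y) + 1)
    cost = (lam * np.abs(t[:, None] - t[None, :])
            + np.abs(y[:, None] - z[None, :])) ** p
    rows, cols = linear_sum_assignment(cost)
    return cost[rows, cols].mean() ** (1.0 / p)
```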
65. Discussion
Transport metrics can be used to compare samples
Various complexities, from $n^3 \log n$ to $n^2$ to $n \log n$
Asymptotic guarantees as $\varepsilon \to 0$ for fixed $n$, and as $n \to \infty$ and $\varepsilon \to \varepsilon_\star$
Various ways of applying these ideas to time series, and maybe spatial data, maps, images. . .