This document summarizes dependent processes in Bayesian nonparametrics. It motivates the need for dependent random probability measures to accommodate temporal dependence structures beyond the exchangeability assumption. It describes modeling collections of random probability measures indexed by time as either discrete-time or continuous-time processes. The diffusive Dirichlet process is introduced as a dependent Dirichlet process with Dirichlet marginal distributions at each time point and continuous sample paths. Simulation and estimation methods are discussed for this model.
Dependent processes in Bayesian Nonparametrics
1. Dependent processes in Bayesian nonparametrics
Matteo Ruggiero
University of Torino and Collegio Carlo Alberto
Moncalieri, Feb 19 2016
[Figure: simulated density at time 1]
2. 1. Motivation and general setting
BNP and discrete random probability measures
$p = (p_1, p_2, \dots)$: frequencies in $\Delta_\infty = \{p \in [0,1]^\infty : \sum_i p_i = 1\}$
$p^\downarrow = (p_{(1)}, p_{(2)}, \dots)$: ordered frequencies in $\nabla_\infty = \{p \in [0,1]^\infty : p_1 \ge p_2 \ge \cdots \ge 0,\ \sum_i p_i = 1\}$
Assign a law to $p$, which induces a distribution on $\Delta_\infty$ or $\nabla_\infty$.
Otherwise, assign to the indices unique labels $X_1, X_2, \dots \overset{iid}{\sim} P_0$, with $P_0$ continuous on $\mathbb{X}$, and define the discrete measure $P = \sum_{i=1}^\infty p_i \delta_{X_i}$, which induces a distribution on $\mathcal{P}(\mathbb{X})$.
Matteo Ruggiero (Unito & CCA) Dependent processes in BNP 3
3. 1. Motivation and general setting
BNP and discrete random probability measures
Approach 1: model observations $Y_j$ directly with $p = (p_1, p_2, \dots)$ or $P = \sum_{i=1}^\infty p_i \delta_{X_i}$, where $Y_j = X_i$ with probability $p_i$ and the $(X_i, p_i)$ are random.
Approach 2: use mixtures to gain more flexibility and possibly aim at continuous distributions:
$f(y) = \int_{\mathbb{X}} f(y \mid x)\, P(dx) \;\Rightarrow\; f(y) = \sum_{i=1}^\infty p_i f(y \mid X_i)$
i.e. $Y_j \sim f(y \mid X_i)$ with probability $p_i$, and the $(X_i, p_i)$ are random.
Use either approach as a basis for estimation, uncertainty quantification, forecasting, clustering, ...
4. 1. Motivation and general setting
Motivation for dependent processes
Assumptions in the classical BNP approach:
observations are exchangeable
observations depend on a fixed environment/state of the world
inference is static (fixed time) / carried out on a single environment
Data may not satisfy these assumptions (e.g. price dynamics).
Need for more general types of dependence.
5. 1. Motivation and general setting
Partial exchangeability
A natural extension is partial exchangeability (in de Finetti's sense), e.g.
$X_{1,1}\ X_{1,2}\ X_{1,3}\ \cdots$
$X_{2,1}\ X_{2,2}\ X_{2,3}\ \cdots$
$X_{3,1}\ X_{3,2}\ X_{3,3}\ \cdots$
$\cdots$
Row-wise exchangeability (not overall): given $i$, the $X_{i,j}$ are exchangeable.
Accommodates e.g. temporal structures.
Collection of random probability measures, indexed by some covariate.
Can be extended to an uncountable family.
6.-7. 1. Motivation and general setting
Dependent densities: discrete time
[Figures: densities evolving over discrete time points]
8.-10. 1. Motivation and general setting
Dependent densities: continuous time
[Figures: densities evolving continuously in time]
11. 1. Motivation and general setting
Modelling and inference with time-dependent processes
Temporal dependence structure:
Partial exchangeability: for any $t$ we have a distribution (possibly a mixture)
(Possibly multiple) data available at discrete time points
Model the collection of random probability measures as
a discrete-time process, or
a continuous-time process, with continuous paths or with jumps
Nonparametric approach to allow for full flexibility
Analyse properties of the resulting model
Devise suitable strategies for posterior computation
Carry out inference on desired quantities
12. 1. Motivation and general setting
General setting
$X_1, X_2, \dots \overset{iid}{\sim} P_0$: unique labels or locations in $\mathbb{X}$.
We are interested in time-dependent random probability measures of the type
$p(t) = (p_1(t), p_2(t), \dots) \in \Delta_\infty$
$p^\downarrow(t) = (p_{(1)}(t), p_{(2)}(t), \dots) \in \nabla_\infty$
$P(t) = \sum_{i=1}^\infty p_i(t)\, \delta_{X_i(t)} \in \mathcal{P}(\mathbb{X})$
where $t \ge 0$ represents time.
Discrete sample paths: $p$, $p^\downarrow$, $P$ are countable collections of distributions, $t \in \mathbb{N}$.
Continuous sample paths: $p$, $p^\downarrow$, $P$ are random $t$-continuous functions from $[0, \infty)$ to $\Delta_\infty$, $\nabla_\infty$ or $\mathcal{P}(\mathbb{X})$.
13. 2. Diffusive Dirichlet mixture models
Dirichlet process
The Dirichlet process [Ferguson 1973] extends the Dirichlet distribution from $K$ to infinitely many types.
Can be defined via stick-breaking [Sethuraman 1994]:
$V_i \overset{iid}{\sim} \mathrm{Beta}(1, \theta), \qquad p_i = V_i \prod_{k=1}^{i-1} (1 - V_k)$
[Diagram: the unit stick is broken at $V_1$, giving $p_1 = V_1$; the remainder $1 - V_1$ is broken at fraction $V_2$, giving $p_2 = V_2(1 - V_1)$ and remainder $(1 - V_1)(1 - V_2)$; and so on.]
This construction satisfies $p_i \to 0$ as $i \to \infty$ and $\sum_{i \ge 1} p_i = 1$.
Take $X_i \overset{iid}{\sim} P_0$ with $P_0$ continuous on $\mathbb{X}$. Then $P = \sum_{i=1}^\infty p_i \delta_{X_i}$ is a Dirichlet process.
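The stick-breaking construction above is easy to sketch in code. The following truncated sampler (a minimal sketch, not from the slides; the truncation level and the standard-normal base measure are illustrative choices) draws the weights $p_i$ and the atoms $X_i$:

```python
import numpy as np

def stick_breaking_dp(theta, P0_sampler, trunc=2000, rng=None):
    """Truncated stick-breaking approximation of a Dirichlet process:
    V_i ~ Beta(1, theta) iid, p_i = V_i * prod_{k<i} (1 - V_k),
    atoms X_i drawn iid from the base measure P0."""
    rng = np.random.default_rng(rng)
    V = rng.beta(1.0, theta, size=trunc)                      # stick fractions
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - V)[:-1]))
    p = V * remaining                                         # weights, sum ~ 1
    X = P0_sampler(trunc, rng)                                # atoms from P0
    return p, X

# Example: standard normal base measure P0
p, X = stick_breaking_dp(theta=5.0,
                         P0_sampler=lambda n, r: r.normal(size=n),
                         trunc=2000, rng=1)
```

The leftover mass $\prod_{k \le \mathrm{trunc}}(1 - V_k)$ is negligible for any reasonable truncation, so the weights sum to essentially one.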
14. 2. Diffusive Dirichlet mixture models
Dirichlet process
[Figure: a realisation of the stick-breaking weights $p_i$ plotted against the atom locations $x$]
15. 2. Diffusive Dirichlet mixture models
Dependent Dirichlet process
Basic idea [MacEachern, 1999]
We aim at defining a process
$P(t) = \sum_{i=1}^\infty p_i(t)\, \delta_{X_i(t)}, \qquad t \ge 0,$
with Dirichlet process marginals.
Handling both $(p_1(t), p_2(t), \dots)$ and $(X_1(t), X_2(t), \dots)$ can be non-trivial. Consider instead
$P(t) = \sum_{i=1}^\infty p_i(t)\, \delta_{X_i}, \qquad t \ge 0, \qquad X_i \overset{iid}{\sim} P_0$
Atoms are fixed, but there are infinitely many of them; in practice, as many as you need.
16. 2. Diffusive Dirichlet mixture models
Diffusive Dirichlet process
Take the Dirichlet stick-breaking weights
$p_i = V_i \prod_{k=1}^{i-1} (1 - V_k), \qquad V_i \overset{iid}{\sim} \mathrm{Beta}(1, \theta)$
Substitute each component $V_i \in [0, 1]$ with a diffusion $\{V_i(t)\}_{t \ge 0}$ on $[0, 1]$, then take
$p_i(t) = V_i(t) \prod_{k=1}^{i-1} (1 - V_k(t))$
Each component needs to have Beta marginals, $V_i(t) \sim \mathrm{Beta}(1, \theta)$.
One-dimensional Wright–Fisher diffusions satisfy this.
18. 2. Diffusive Dirichlet mixture models
Wright–Fisher diffusions
[Figure: sample path of the fraction of type-1 individuals over 50K steps, mutation rates $\theta_1 = 2$, $\theta_2 = 8$; ergodic frequencies against the stationary distribution Beta(2, 8)]
19. 2. Diffusive Dirichlet mixture models
Wright–Fisher diffusions
[Figure: same, with $\theta_1 = \theta_2 = 8$; ergodic frequencies against Beta(8, 8)]
20. 2. Diffusive Dirichlet mixture models
Wright–Fisher diffusions
[Figure: same, with $\theta_1 = \theta_2 = 0.4$; ergodic frequencies against Beta(0.4, 0.4)]
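Sample paths like those in the figures above can be generated with a simple Euler–Maruyama discretisation. This sketch uses the standard parametrisation $dV = \tfrac{1}{2}(a(1-V) - bV)\,dt + \sqrt{V(1-V)}\,dW$, whose stationary law is Beta$(a, b)$; that parametrisation and the clipping to $[0,1]$ are assumptions here, as the slides do not spell out the SDE:

```python
import numpy as np

def wright_fisher_path(a, b, v0, T=10.0, n_steps=50_000, rng=None):
    """Euler-Maruyama path of the Wright-Fisher diffusion
        dV = 0.5*(a*(1 - V) - b*V) dt + sqrt(V*(1 - V)) dW,
    whose stationary law is Beta(a, b); values are clipped to [0, 1]."""
    rng = np.random.default_rng(rng)
    dt = T / n_steps
    v = np.empty(n_steps + 1)
    v[0] = v0
    dW = rng.normal(scale=np.sqrt(dt), size=n_steps)
    for k in range(n_steps):
        drift = 0.5 * (a * (1.0 - v[k]) - b * v[k])
        diff = np.sqrt(max(v[k] * (1.0 - v[k]), 0.0))
        v[k + 1] = np.clip(v[k] + drift * dt + diff * dW[k], 0.0, 1.0)
    return v

path = wright_fisher_path(a=2.0, b=8.0, v0=0.5, rng=0)
```

Histogramming a long path against the Beta$(a, b)$ density reproduces the ergodic-frequency panels shown in the figures.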
21. 2. Diffusive Dirichlet mixture models
Diffusive Dirichlet process [Mena & R. 2016]
The resulting object
$P(t) = \sum_{i=1}^\infty \underbrace{V_i(t) \prod_{k=1}^{i-1} (1 - V_k(t))}_{p_i(t)}\, \delta_{X_i}, \qquad V_i(t) \sim \mathrm{WF}(a, b)$
has Dirichlet marginals for $(a, b) = (1, \theta)$, i.e. $P(t)$ is a DP for all $t$;
has GEM marginals for $(a, b) \in \mathbb{R}^2_+$;
has diffusive behaviour: $P(t)$ is $t$-continuous in total variation.
See also Gutierrez, Mena & R. 2016 (version with jumps) and Mena, R. & Walker 2011 (geometric weights, different marginals) for related models.
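A truncated version of the weight process $p_i(t) = V_i(t)\prod_{k<i}(1 - V_k(t))$ can be simulated by evolving each stick as a one-dimensional Wright–Fisher diffusion. A self-contained sketch, assuming the standard Wright–Fisher SDE with Beta$(1, \theta)$ stationary law, an Euler–Maruyama scheme, and sticks started at stationarity (all illustrative choices, not from the slides):

```python
import numpy as np

def diffusive_dp_weights(theta, trunc=200, T=1.0, n_steps=1000, rng=None):
    """Stick-breaking weights p_i(t) = V_i(t) * prod_{k<i} (1 - V_k(t)),
    each V_i evolving as a Wright-Fisher diffusion with Beta(1, theta)
    stationary law, simulated by Euler-Maruyama on `trunc` sticks.
    Returns an array of shape (n_steps + 1, trunc)."""
    rng = np.random.default_rng(rng)
    dt = T / n_steps
    V = rng.beta(1.0, theta, size=trunc)        # start sticks at stationarity
    weights = np.empty((n_steps + 1, trunc))
    for k in range(n_steps + 1):
        rem = np.concatenate(([1.0], np.cumprod(1.0 - V)[:-1]))
        weights[k] = V * rem                    # p_i(t_k)
        drift = 0.5 * ((1.0 - V) - theta * V)   # (a, b) = (1, theta)
        noise = np.sqrt(np.clip(V * (1.0 - V), 0.0, None) * dt)
        V = np.clip(V + drift * dt + noise * rng.normal(size=trunc), 0.0, 1.0)
    return weights

W = diffusive_dp_weights(theta=3.0, rng=0)
```

Each row of `W` is a (truncated) draw of the stick-breaking weights at one time point, and consecutive rows vary continuously, mirroring the total-variation continuity of $P(t)$.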
23. 2. Diffusive Dirichlet mixture models
Estimation
At each time $t_i$ we have observations $(y_{i,1}, \dots, y_{i,n_i})$. Set up the hierarchical mixture
$\{P_t,\ t \ge 0\} \sim$ diff-DP or GSB
$x_{t_i} \mid P_{t_i} \sim P_{t_i}$
$y_{i,j} \mid t_i, x_{t_i} \overset{iid}{\sim} f(\cdot \mid x_{t_i})$
Equivalently, $y_i$ is drawn from the time-dependent nonparametric mixture model
$f_{t_i}(y) = \int_{\mathbb{X}} f(y \mid x)\, P_{t_i}(dx) = \sum_{k=1}^\infty p_k(t_i)\, f(y \mid x_k)$
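At a fixed time $t_i$, drawing observations from a truncated version of this mixture is straightforward: pick an atom with probability $p_k(t_i)$ and sample from the kernel. A sketch assuming a Gaussian kernel $f(y \mid x) = N(y; x, \sigma^2)$ and stand-in Dirichlet weights (both illustrative assumptions, not the fitted model from the slides):

```python
import numpy as np

def sample_mixture(weights, atoms, sigma, n, rng=None):
    """Draw n observations from the truncated mixture
    f(y) = sum_k p_k * N(y; x_k, sigma^2): choose atom k with probability
    p_k (renormalised over the truncation), then add Gaussian noise."""
    rng = np.random.default_rng(rng)
    p = np.asarray(weights) / np.sum(weights)
    ks = rng.choice(len(p), size=n, p=p)
    return atoms[ks] + rng.normal(scale=sigma, size=n)

rng = np.random.default_rng(0)
atoms = rng.normal(size=50)             # x_k iid from a N(0, 1) base measure
w = rng.dirichlet(np.ones(50))          # stand-in for the weights p_k(t_i)
y = sample_mixture(w, atoms, sigma=0.25, n=500, rng=1)
```

Repeating this at each time point with weights evolving as above produces synthetic data of the kind used in the simulation study that follows.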
24. 2. Diffusive Dirichlet mixture models
Simulated data
[Figure: true model]
25. 2. Diffusive Dirichlet mixture models
Simulated data: single data points
[Figure: true model (heat map), posterior mode (solid), 95% credible intervals for the mean (dashed), 95% quantiles of posterior density estimate (dotted)]
26. 2. Diffusive Dirichlet mixture models
Simulated data: multiple data points
[Figure: true model (heat map), posterior mode (solid), 95% credible intervals for the mean (dashed), 95% quantiles of posterior density estimate (dotted)]
27. 2. Diffusive Dirichlet mixture models
Real data: S&P 500 (03/08 - 02/09)
Dependent density estimate
[Figure: heat map of the estimated density (red) and mean estimate (solid)]
29. 2. Diffusive Dirichlet mixture models
Real data: S&P 500 (03/08 - 02/09)
Dependent density estimate
[Figure]
30. populations
A different view: modelling evolving populations
A sample path of $p^\downarrow(t) = (p_{(1)}, \dots, p_{(7)})$
[Figure: dynamic frequencies of 7 species, frequency against time]
31. populations
A different view: modelling evolving populations
Distinct values $X_1, X_2, \dots$ are interpreted as
allelic types in genetics
plant or animal species
unique identifiers of some evolving groups
Large population $\Rightarrow$ species abundances approximate diffusive behaviours.
If one cannot provide an a priori upper bound, assume infinitely many species.
Two different approaches:
constructing stochastic models for pseudo-realistic evolutionary mechanisms (mutation, selection, recombination, migration, ...)
studying the association between certain distributions and connected dynamics
The dynamics in the figure are related to a Dirichlet distribution. Can we extend them? To what extent? With what interpretation?
[Figure: dynamic frequencies of 7 species, frequency against time]
33. populations
Poisson–Dirichlet case
The one-parameter objects are linked by limits in distribution ("$\overset{d}{\to}$" = convergence in distribution):
Markov chain ($N$ individuals), $K$ species: Wright–Fisher($N, K, \theta$) [Fisher (1930), Wright (1931)] $\overset{d}{\to}$, as $N \to \infty$, the diffusion ($\infty$ individuals) Wright–Fisher($K, \theta$) [Sato (1976)], stationary w.r.t. $\mathrm{Dir}(\theta/K, \dots, \theta/K)$.
Markov chain, $\infty$ species: Moran($N, \theta$) [Watterson (1976)] $\overset{d}{\to}$, as $N \to \infty$, the diffusion IMNA($\theta$) [Ethier and Kurtz (1981)], stationary w.r.t. the random measure ($t$ fixed) PD($\theta$) [Kingman (1975)].
Letting $K \to \infty$ connects the two rows: the $K$-type objects converge in distribution to their infinitely-many-types counterparts.
IMNA = infinitely many neutral alleles.
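The finite-population corner of this diagram can be illustrated with a toy discrete Wright–Fisher chain with symmetric mutation. A sketch (the per-individual mutation probability $\theta/(2N)$ toward a uniformly chosen type is an assumed parametrisation for illustration, not taken from the slides or the cited papers):

```python
import numpy as np

def wf_markov_chain(N, K, theta, n_gens, rng=None):
    """Discrete Wright-Fisher chain: N individuals, K types; each generation
    resamples multinomially from the parental frequencies perturbed by
    symmetric mutation with per-individual probability u = theta/(2N).
    Returns the trajectory of type frequencies, shape (n_gens + 1, K)."""
    rng = np.random.default_rng(rng)
    freqs = np.full(K, 1.0 / K)
    out = [freqs.copy()]
    u = theta / (2 * N)                       # assumed mutation parametrisation
    for _ in range(n_gens):
        probs = (1 - u) * freqs + u / K       # post-mutation sampling probs
        freqs = rng.multinomial(N, probs) / N
        out.append(freqs.copy())
    return np.array(out)

traj = wf_markov_chain(N=200, K=4, theta=1.0, n_gens=1000, rng=0)
```

Rescaling time and letting $N \to \infty$ in such chains is what yields the Wright–Fisher diffusion in the diagram.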
34. populations
Two-parameter Poisson–Dirichlet case
The analogous two-parameter diagram ("??" marks open questions on the slide):
Markov chain ($N$ individuals), $K$ species: WF($N, K, \theta, \alpha$) [Costantini, De Blasi, Ethier, R., Spanò (2016)] $\overset{d}{\to}$, as $N \to \infty$, the diffusion WF($K, \theta, \alpha$) [Costantini, De Blasi, Ethier, R., Spanò (2016)]; the corresponding stationary laws are marked "??".
Markov chain, $\infty$ species: Moran($N, \theta, \alpha$) [R. and Walker (2009)], marked "??", $\overset{d}{\to}$, as $N \to \infty$, the diffusion IMNA($\theta, \alpha$) [Petrov (2009)], stationary w.r.t. the random measure ($t$ fixed) PD($\theta, \alpha$) [Pitman (1995)].
Again $K \to \infty$ links the $K$-type objects to their infinitely-many-types counterparts.
Remarks: IMNA = infinitely many neutral alleles; based on Pitman's generalized Pólya urn scheme; mutation and immigration.
38. The propagation mixture
Prior: $X \sim \pi_\alpha := \mathrm{Gamma}(\alpha_1, \alpha_2)$
Likelihood: $Y \mid X \sim \mathrm{Poisson}(X)$
Posterior: $X \mid Y_1, \dots, Y_n \sim \pi_{\alpha,n} := \mathrm{Gamma}\big(\alpha_1 + \sum_{i=1}^n y_i,\ \alpha_2 + n\big)$
Propagation mixture [Papaspiliopoulos & R. 2014]:
$\psi_t(\pi_{\alpha,n}) := \int \pi_{\alpha,n}(x)\, P_t(x, dx')$
is given by
$\psi_t(\pi_{\alpha,n}) = \sum_{j=0}^{n} p_t(n, j)\, \mathrm{Gamma}\big(\alpha_1 + \sum_{i=1}^n y_i - j,\ \alpha_2 + n - s_t\big)$
for appropriate time-varying weights $p_t(n, j)$.
Can be extended to infinite-dimensional models [Papaspiliopoulos, R. & Spanò 2016].
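The conjugate Gamma–Poisson update on this slide is easy to check numerically; a minimal sketch (the propagation weights $p_t(n, j)$ are model-specific and not reproduced here):

```python
def gamma_poisson_posterior(alpha1, alpha2, y):
    """Conjugate update: a Gamma(alpha1, alpha2) prior (shape, rate) with
    iid Poisson observations y gives the posterior
    Gamma(alpha1 + sum(y), alpha2 + n)."""
    return alpha1 + sum(y), alpha2 + len(y)

# Prior Gamma(2, 1) with observations y = (3, 1, 4, 1, 5):
a1, a2 = gamma_poisson_posterior(2.0, 1.0, [3, 1, 4, 1, 5])
# posterior is Gamma(16, 6), posterior mean 16/6
```

Propagating this posterior through the transition kernel $P_t$ then yields the finite mixture of Gamma distributions displayed above.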
41. Some references
Costantini, De Blasi, Ethier, R. and Spanò (2016). Wright–Fisher construction of the two-parameter Poisson–Dirichlet diffusion. arXiv:1601.06064
Gutierrez, Mena & R. (2016). A time dependent Bayesian nonparametric model for air quality analysis. Comput. Statist. Data Anal.
Mena & R. (2016). Dynamic density estimation with diffusive Dirichlet mixtures. Bernoulli
Mena, R. & Walker (2011). Geometric stick-breaking processes for continuous-time Bayesian nonparametric modeling. J. Statist. Plann. Inf.
Papaspiliopoulos & R. (2014). Optimal filtering and the dual process. Bernoulli
Papaspiliopoulos, R. & Spanò (2014). Filtering hidden Markov measures. arXiv:1411.4944
R. & Walker (2009). Countable representation for infinite dimensional diffusions derived from the two-parameter Poisson–Dirichlet process. Electr. Comm. Probab.
For more info: www.matteoruggiero.it