The document discusses using a dependent Dirichlet process (DDP) to model ecological data measuring microbe abundance across different pollution sites. It first introduces the biology question and data, which contains measurements of various microbes found at sites with different pollution levels. It then summarizes the Dirichlet process and introduces the DDP as a way to model dependence between sites. The DDP defines a process on the beta distribution parameters that determine the weights in the Dirichlet process mixture, allowing weights to vary based on pollution level.
This document summarizes a discussion on the differences between assessing the "causes of effects" versus the "effects of causes". It outlines that the two questions, while related, require different statistical analyses and frameworks. The causes of effects question aims to determine what caused an observed outcome, while effects of causes looks at the impact of a treatment or exposure. Examples from legal cases, epidemiology, and discrimination studies are provided to illustrate how the perspective taken influences statistical analyses and interpretations.
This document provides a list of 35 references related to Bayesian statistics. It includes journal articles published between 1963 and 2013 in publications like The Annals of Probability, Journal of the American Statistical Association, and Bayesian Analysis. These references cover topics such as Markov chain Monte Carlo sampling methods, Bayesian model specification, consistency of Bayes estimates, and nonparametric Bayesian inference.
The Metropolis–Hastings algorithm is an MCMC method for obtaining a sequence of samples from a probability distribution when direct sampling is difficult. It constructs a Markov chain whose stationary distribution is the desired target distribution. At each step, a candidate sample is drawn from a proposal distribution and either accepted, replacing the current state, or rejected, keeping the current state. The acceptance probability is determined by the ratio of target densities at the candidate and current states, corrected by the ratio of proposal densities. The algorithm generalizes the Metropolis algorithm by allowing non-symmetric proposal distributions. When the chain satisfies ergodicity conditions, the empirical distribution of the samples converges to the target distribution as the number of samples increases.
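The accept/reject recursion described above can be sketched in a few lines of Python. This is a generic illustration (a random-walk proposal targeting a standard normal density), not code from any of the summarized documents:

```python
import math
import random

def metropolis_hastings(log_target, propose, log_q, x0, n_samples):
    """Generic Metropolis-Hastings sampler (illustrative sketch).

    log_target(x): log-density of the target, up to an additive constant
    propose(x):    draws a candidate y given the current state x
    log_q(a, b):   log proposal density log q(a | b); for a symmetric
                   proposal the correction term cancels (Metropolis case)
    """
    x = x0
    samples = []
    for _ in range(n_samples):
        y = propose(x)
        # Acceptance ratio: target-density ratio times the
        # proposal-density correction for non-symmetric proposals.
        log_alpha = (log_target(y) - log_target(x)
                     + log_q(x, y) - log_q(y, x))
        if random.random() < math.exp(min(0.0, log_alpha)):
            x = y                 # accept: the candidate replaces the state
        samples.append(x)         # on rejection the current state is kept
    return samples

# Example: random-walk proposal on a standard normal target.
random.seed(1)
draws = metropolis_hastings(
    log_target=lambda x: -0.5 * x * x,
    propose=lambda x: x + random.gauss(0.0, 1.0),
    log_q=lambda a, b: 0.0,       # symmetric: correction term is zero
    x0=0.0,
    n_samples=20000,
)
mean = sum(draws) / len(draws)
```

With the symmetric random-walk proposal the `log_q` terms cancel, recovering the plain Metropolis rule; a non-symmetric proposal would supply a genuine `log_q`.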
The document discusses several key ideas in statistics and modeling:
1. Fisher and Neyman had different views on model specification - Fisher saw it as practical while Neyman emphasized theoretical building blocks.
2. Statistics can contribute a "reservoir of models", model selection techniques, and classification of theoretical vs empirical models.
3. Theoretical models aim to explain underlying mechanisms while empirical models guide actions based on forecasts.
4. Examples like Mendel's inheritance models, Pearson distributions, and Galileo's trial illustrate the development and application of statistical modeling.
This document provides a list of 33 papers related to Bayesian statistics for students to choose from for a presentation. It includes brief descriptions of several theoretical and general audience journals. The papers cover a range of topics in Bayesian statistics published between 1763 and 2013. Students will be evaluated on their understanding and presentation of the chosen paper.
Novel image fusion techniques using global and local Kekre wavelet transforms (IAEME Publication)
This document presents novel image fusion techniques using Kekre wavelet transforms. It describes Kekre transform and generation of local and global Kekre wavelet transforms. A proposed image fusion method is presented that applies local or global Kekre transforms to source images, then fuses coefficients using minimum, maximum or average. Experimentation on six image sets showed local Kekre wavelet transform outperformed other techniques, and averaging fusion was better than minimum or maximum. Local Kekre wavelet transform with averaging produced the best fused images with lowest mean square error compared to source images.
This document discusses novel image fusion techniques using Kekre wavelet transforms. It proposes using both local and global Kekre wavelet transforms for image fusion. The key steps are:
1) Apply the local or global Kekre wavelet transform to each input image separately, generating transformed images.
2) Fuse the transformed images using either the average, minimum or maximum of the coefficient values at each point.
3) Apply the inverse transform to the fused coefficients to obtain the output fused image.
Experiments on six sets of images compare the performance of local vs global Kekre wavelet transforms, and of the averaging, minimum, and maximum fusion rules. The results show that the local Kekre wavelet transform with averaging fusion performs best, producing fused images with the lowest mean squared error relative to the source images.
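The three steps above can be sketched in Python. As a stand-in for the Kekre wavelet transform (which is defined in the paper itself), this sketch uses an orthonormal Hadamard matrix; the fusion logic is the same for any invertible transform:

```python
import numpy as np

def make_hadamard(n):
    """Orthonormal Hadamard matrix of size n (a power of two), used
    here only as a stand-in for the Kekre wavelet transform."""
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H / np.sqrt(n)

def fuse_images(img_a, img_b, H, rule="average"):
    """The three steps from the summary: forward-transform each image,
    fuse coefficients pointwise, inverse-transform the result."""
    ca = H @ img_a @ H.T               # step 1: transform each image
    cb = H @ img_b @ H.T
    if rule == "average":              # step 2: pointwise fusion
        fused = (ca + cb) / 2.0
    elif rule == "minimum":
        fused = np.minimum(ca, cb)
    else:                              # "maximum"
        fused = np.maximum(ca, cb)
    return H.T @ fused @ H             # step 3: inverse transform

rng = np.random.default_rng(0)
a, b = rng.random((8, 8)), rng.random((8, 8))
H = make_hadamard(8)
out = fuse_images(a, b, H, rule="average")
```

Because the transform is linear, averaging in the coefficient domain equals averaging in the pixel domain; the minimum and maximum rules genuinely differ, which is where the choice of transform matters.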
The document discusses using OpenCL to accelerate genomic analysis through parallelization. It introduces OpenCL and provides examples of using it to parallelize algorithms for copy number inference in tumors, computing relatedness between individuals, and performing variable selection in regression. Key applications discussed include hidden Markov models for copy number inference, principal component analysis on relatedness matrices, and coordinate descent algorithms for lasso regression. Performance gains of up to 155x are reported for the parallel implementations compared to serial code.
The document describes an image processing methodology to detect the nematode C. elegans in microscope images. It aims to automate the identification of individual worms, which is currently done manually but is too labor-intensive. The methodology segments worms from the background, detects endpoints, generates shape descriptors, and performs profile-driven shape fitting to identify worms. It was implemented as a plug-in for the open-source image analysis software Endrov and aims to improve upon previous automated methods by achieving a higher matching accuracy.
This document summarizes novel statistical methods for genetic association studies, including those that account for population structure. It describes methods for detecting gene-gene interactions and inferring copy number variations. For interactions, it proposes using graphics processing units to efficiently search large model spaces. For copy number analysis, it presents a hidden Markov model approach to deconvolve tumor profiles from normal cell contamination. Speedups of over 100x were achieved by parallelizing the model training on a GPU.
Elizabeth Iorns - How Science Exchange promotes Open Science (Science Exchange)
Science Exchange is an online marketplace that connects scientists seeking specialized research services with providers that can perform those services. This allows scientists to outsource experiments and analyses to expert facilities around the world, improving transparency of pricing, access to expertise, efficiency of research, and reproducibility of results. By distributing work among multiple specialized providers, Science Exchange aims to enhance the overall quality and reproducibility of academic research.
This document discusses self-organizing neural networks, including Kohonen networks and Adaptive Resonance Theory (ART). Kohonen networks use competitive learning to form topological mappings between input and output layers. Neighboring units respond to similar inputs, and learning updates weights of both the winning unit and its neighbors. ART networks learn stable recognition codes in response to input sequences and address the stability-plasticity dilemma by resetting matches that fail a vigilance test.
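The competitive-learning update described above (move the winning unit and its grid neighbours toward the input) can be sketched as follows; the grid size, learning rate, and Gaussian neighbourhood are illustrative choices, not taken from the document:

```python
import numpy as np

def best_matching_unit(W, x):
    """Index of the grid unit whose weight vector is closest to x."""
    d = np.linalg.norm(W - x, axis=2)
    return np.unravel_index(np.argmin(d), d.shape)

def som_step(W, x, bmu, lr, sigma):
    """One Kohonen update: the winning unit and its neighbours on the
    grid move toward the input, weighted by a Gaussian neighbourhood."""
    rows, cols, _ = W.shape
    for i in range(rows):
        for j in range(cols):
            d2 = (i - bmu[0]) ** 2 + (j - bmu[1]) ** 2
            h = np.exp(-d2 / (2.0 * sigma ** 2))   # neighbourhood kernel
            W[i, j] += lr * h * (x - W[i, j])
    return W

rng = np.random.default_rng(0)
W = rng.random((5, 5, 2))              # 5x5 grid of 2-d weight vectors
x = np.array([0.9, 0.1])
bmu = best_matching_unit(W, x)
before = np.linalg.norm(W[bmu] - x)
W = som_step(W, x, bmu, lr=0.5, sigma=1.0)
after = np.linalg.norm(W[bmu] - x)
```

Because neighbouring units share each update, nearby grid positions end up responding to similar inputs, which is what produces the topological mapping.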
Terminological cluster trees for Disjointness Axiom Discovery (Giuseppe Rizzo)
The document describes a framework for discovering disjointness axioms from semantic web knowledge bases using terminological cluster trees (TCT). It induces TCTs from knowledge bases to cluster individuals, derives concept descriptions for clusters, and proposes disjointness axioms between non-overlapping concept descriptions. An evaluation on several ontologies shows it can rediscover many existing disjointness axioms and propose new plausible ones, with limited inconsistencies introduced.
This document discusses next generation DNA sequencing technologies. It begins by describing some of the limitations of traditional Sanger sequencing, such as read lengths of 500-1000 bases and throughput of 57,000 bases per run. It then introduces some key next generation sequencing technologies, such as 454 sequencing, which uses emulsion PCR and pyrosequencing to achieve read lengths of 20-100 bases but higher throughput of 20-100 Mb per run. Illumina/Solexa sequencing is also discussed, which uses sequencing by synthesis with reversible terminators and laser-based detection. Finally, third generation sequencing technologies are mentioned, such as Pacific Biosciences' single molecule real time sequencing and nanopore sequencing. In summary, the document provides a high-level overview of current and emerging DNA sequencing technologies.
This PhD thesis describes methods for segmenting cells in phase-contrast microscopy. The work was carried out partly at the Max Planck Institute of Cell Biology in Dresden, in the Buchholz lab.
Flow cytometry analyzes cells by detecting fluorescent markers on individual cells. Nikolas Pontikos' work automatically analyzes flow cytometry data to identify cell phenotypes, such as naive CD25+ cells, and evaluates associations between cell phenotypes and genetic/clinical factors. His method follows from manually gated data and defines thresholds to automatically gate on markers like CD25. This allows evaluating repeatability of cell phenotype identification over time in large sample sets.
Classification of squamous cell cervical cytology (karthigailakshmi)
This document presents a thesis submitted for the degree of Magister en Ingeniería Biomédica (Master's in Biomedical Engineering). The thesis aims to classify squamous cervical cells using color and texture descriptors defined in the MPEG-7 standard. The author first characterizes the transformation zone of cervical smear images using MPEG-7 descriptors like color layout, scalable color, and edge histogram. These descriptors are then used as inputs to binary classifiers to obtain a precision of 90% and sensitivity of 83% for cell classification. Unlike traditional approaches requiring cell segmentation, the proposed method is independent of cell shape. The thesis finds this strategy applicable for pre-screening cervical smear images in conditions with random noise factors that could mislead segmentation.
The document provides an overview of artificial neural networks and bio-inspired algorithms. It discusses various neural network concepts like the perceptron algorithm, backpropagation, genetic algorithms, particle swarm optimization, autoencoders, and deep neural networks. It includes descriptions of key concepts, mathematical equations, examples to illustrate how different algorithms work, and comparisons between algorithms. The document serves as an introduction to neural networks and bio-inspired optimization techniques.
This thesis examines using a data-driven sample generator model to augment real data for improved classification of schizophrenia patients and healthy controls from structural magnetic resonance images (SMRIs). A three-way ANOVA analysis of SMRIs found significant differences between groups based on diagnosis, age, and gender. Various machine learning classifiers were tested on raw data, reduced data, and augmented data. A neural network trained exclusively on synthetic data produced the best classification results, demonstrating that the generator model can produce realistic training data that improves generalization.
Faster, More Effective Flowgraph-based Malware Classification (Silvio Cesare)
Silvio Cesare is a PhD candidate at Deakin University researching malware detection and automated vulnerability discovery. His current work extends his Masters research on fast automated unpacking and classification of malware. He presented this work last year at Ruxcon 2010. His system uses control flow graphs and q-grams of decompiled code as "birthmarks" to detect unknown malware samples that are suspiciously similar to known malware, reducing the need for signatures. He evaluated the system on 10,000 malware samples with only 10 false positives. The system provides improved effectiveness and efficiency over his previous work in 2010.
MMseqs (Many-against-Many sequence searching) is a novel software suite for very fast protein sequence searches and clustering of huge protein sequence data sets, such as sets of predicted protein sequences or 6-frame-translated open reading frames (ORFs) from large metagenomics experiments. MMseqs is around 1000 times faster than protein BLAST and sensitive enough to capture similarities down to less than 30% sequence identity.
At the core of MMseqs are two modules for comparing two sequence sets with each other. The first, prefiltering module computes the similarities between all sequences in one set and all sequences in the other, based on a very fast and sensitive alignment-free metric: the sum of scores of similar 7-mers. The second module runs an AVX2-accelerated Smith-Waterman alignment of all sequence pairs whose prefiltering score passes a cut-off. Due to this combination of speed and sensitivity, searching all predicted ORFs from large metagenomics data sets against the entire UniProt or NCBI-NR databases becomes feasible. This could make it possible to assign to functional clusters and taxonomic clades many reads that are too diverged to be mapped by current software.
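The prefiltering idea can be illustrated with a toy version that counts exact shared 7-mers. MMseqs' real module sums substitution-matrix scores over similar (not only identical) 7-mers and is heavily optimized, so this is only a conceptual sketch with made-up sequences:

```python
def kmer_prefilter_score(query, target, k=7):
    """Toy alignment-free prefilter score: the number of exact k-mers
    two sequences share. MMseqs' actual prefiltering instead sums
    substitution-matrix scores over similar 7-mers; this simplification
    keeps only the alignment-free, k-mer-based idea."""
    def kmer_set(seq):
        return {seq[i:i + k] for i in range(len(seq) - k + 1)}
    return len(kmer_set(query) & kmer_set(target))

q = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"   # made-up protein fragment
t = "MKTAYIAKQRQISFVKSHFSRQ"              # shares a long prefix with q
u = "GGGGGGGGGGGGGGGGGGGGGG"              # unrelated low-complexity seq
s_related = kmer_prefilter_score(q, t)    # high score: pass to alignment
s_unrelated = kmer_prefilter_score(q, u)  # zero score: filtered out
```

Only pairs scoring above a cut-off would then be handed to the expensive Smith-Waterman stage, which is what makes the two-module design fast.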
MMseqs' third module can also cluster sequence sets efficiently, based on the similarity graph obtained from comparing the sequence set with itself using modules 1 and 2. MMseqs further supports an updating mode in which sequences can be added to an existing clustering with stable cluster identifiers, without the need to recluster the entire sequence set. MMseqs will therefore be used to offer high-quality clustered versions of the UniProt database down to a 30% sequence similarity threshold.
- The document discusses Bayesian deep learning, including introducing Bayesian approaches, modeling uncertainty, and challenges such as scaling algorithms and building interpretable priors.
- It describes early work showing infinite width neural networks behave as Gaussian processes.
- For wide Bayesian neural networks with certain properties, the marginal prior distribution of units converges to a Gaussian process in the wide limit. This "wide regime" property extends to deep networks.
Bayesian neural networks increasingly sparsify their units with depth (Julyan Arbel)
This document analyzes deep Bayesian neural networks with Gaussian priors on weights and ReLU-like activations. It proves that the marginal prior distributions of hidden units become heavier-tailed (sub-Weibull) with increasing layer depth, with an optimal tail parameter of layer depth divided by 2. This indicates that units in deeper layers will be more sparsely represented under maximum a posteriori estimation, explaining the natural shrinkage properties of these networks.
Species sampling models in Bayesian Nonparametrics (Julyan Arbel)
This document discusses species sampling models and discovery probabilities. It introduces the problem of estimating the probability of observing a new species given a sample. Good and Turing proposed an estimator for this during World War II. Bayesian nonparametric models provide an alternative approach by placing a prior on unknown species proportions. The document outlines BNP estimators for discovery probabilities and how credible intervals can be derived. It applies these methods to genomic datasets of expressed sequence tags to estimate discovery probabilities for observing new genes.
Dependent processes in Bayesian Nonparametrics (Julyan Arbel)
This document summarizes dependent processes in Bayesian nonparametrics. It motivates the need for dependent random probability measures to accommodate temporal dependence structures beyond the exchangeability assumption. It describes modeling collections of random probability measures indexed by time as either discrete-time or continuous-time processes. The diffusive Dirichlet process is introduced as a dependent Dirichlet process with Dirichlet marginal distributions at each time point and continuous sample paths. Simulation and estimation methods are discussed for this model.
Asymptotics for discrete random measures (Julyan Arbel)
This document provides an introduction to asymptotics for discrete random measures, specifically the Dirichlet process and the two-parameter Poisson-Dirichlet process. It covers three key aspects:
1) the stick-breaking construction of the two-parameter Poisson-Dirichlet process and the related notation;
2) the truncation error Rn and how its asymptotic behavior differs between the Dirichlet and two-parameter Poisson-Dirichlet cases;
3) applications of these processes in mixture modeling, and sampling approaches such as blocked Gibbs and slice sampling that rely on truncating the infinite-dimensional distributions.
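The stick-breaking construction mentioned above can be sketched directly. Here V_i ~ Beta(1 − σ, α + iσ), the weights are w_i = V_i ∏_{j<i}(1 − V_j), and the leftover stick mass after n atoms is the truncation error Rn; σ = 0 recovers the Dirichlet process. The parameter values are illustrative:

```python
import random

def stick_breaking(alpha, sigma, n_atoms, rng):
    """Stick-breaking weights of the two-parameter Poisson-Dirichlet
    (Pitman-Yor) process: V_i ~ Beta(1 - sigma, alpha + i * sigma),
    w_i = V_i * prod_{j<i} (1 - V_j). Setting sigma = 0 recovers the
    Dirichlet process."""
    weights, remaining = [], 1.0
    for i in range(1, n_atoms + 1):
        v = rng.betavariate(1.0 - sigma, alpha + i * sigma)
        weights.append(remaining * v)
        remaining *= 1.0 - v
    # `remaining` is the leftover stick mass: the truncation error Rn.
    return weights, remaining

rng = random.Random(42)
w, Rn = stick_breaking(alpha=1.0, sigma=0.25, n_atoms=200, rng=rng)
```

By construction the weights and Rn sum to one, so tracking Rn shows directly how fast a truncated representation captures the total mass, which is the quantity whose asymptotics differ between the two processes.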
Bayesian Nonparametrics, Applications to biology, ecology, and marketing (Julyan Arbel)
This document discusses applications of Bayesian nonparametric methods to various domains including toxicology, ecology, marketing, human fertility, and more. It provides examples of using rounded Gaussian mixtures and Dirichlet process mixtures to model count data from developmental toxicity studies and animal abundance data. Applications to modeling multivariate mobile phone usage data and basal body temperature curves are also described. The document emphasizes that Bayesian nonparametric approaches allow inclusion of prior information and flexible modeling of complex data structures.
A Gentle Introduction to Bayesian Nonparametrics (Julyan Arbel)
The document provides an introduction to Bayesian nonparametrics and the Dirichlet process. It explains that Bayesian nonparametrics aims to fit models that can adapt their complexity based on the data, without strictly imposing a fixed structure. The Dirichlet process is described as a prior distribution on the space of all probability distributions, allowing the model to utilize an infinite number of parameters. Nonparametric mixture models using the Dirichlet process provide a flexible approach to density estimation and clustering.
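One way to see the "infinite number of parameters" at work is the Chinese restaurant process, the partition distribution induced by a Dirichlet process prior: the number of clusters is not fixed in advance but grows with the data. This toy sampler is a standard illustration, not code from the document:

```python
import random

def chinese_restaurant_process(n, alpha, rng):
    """Sample a random partition of n items under a Dirichlet process
    with concentration alpha: item i joins an existing cluster with
    probability proportional to the cluster's size, or opens a new
    cluster with probability proportional to alpha."""
    labels, counts = [], []
    for i in range(n):
        r = rng.uniform(0.0, i + alpha)
        acc, table = 0.0, len(counts)   # default: open a new cluster
        for t, c in enumerate(counts):
            acc += c
            if r < acc:
                table = t
                break
        if table == len(counts):
            counts.append(1)
        else:
            counts[table] += 1
        labels.append(table)
    return labels, counts

rng = random.Random(7)
labels, sizes = chinese_restaurant_process(100, alpha=2.0, rng=rng)
```

The rich-get-richer dynamic (larger clusters attract more items) is exactly what lets Dirichlet process mixtures adapt their effective complexity to the data.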
Asymptotics for discrete random measuresJulyan Arbel
This document provides an introduction to asymptotics for discrete random measures, specifically the Dirichlet process and two-parameter Poisson-Dirichlet process. It discusses several key aspects in 3 sentences or less:
1) It outlines the stick-breaking construction of the two-parameter Poisson-Dirichlet process and defines related notation. 2) It introduces the truncation error Rn and discusses how its asymptotic behavior differs between the Dirichlet and two-parameter Poisson-Dirichlet cases. 3) It briefly describes some applications of these processes in mixture modeling and summarizes different sampling approaches like blocked Gibbs and slice sampling that rely on truncation of the infinite-dimensional distributions.
Bayesian Nonparametrics, Applications to biology, ecology, and marketingJulyan Arbel
This document discusses applications of Bayesian nonparametric methods to various domains including toxicology, ecology, marketing, human fertility, and more. It provides examples of using rounded Gaussian mixtures and Dirichlet process mixtures to model count data from developmental toxicity studies and animal abundance data. Applications to modeling multivariate mobile phone usage data and basal body temperature curves are also described. The document emphasizes that Bayesian nonparametric approaches allow inclusion of prior information and flexible modeling of complex data structures.
A Gentle Introduction to Bayesian NonparametricsJulyan Arbel
The document provides an introduction to Bayesian nonparametrics and the Dirichlet process. It explains that Bayesian nonparametrics aims to fit models that can adapt their complexity based on the data, without strictly imposing a fixed structure. The Dirichlet process is described as a prior distribution on the space of all probability distributions, allowing the model to utilize an infinite number of parameters. Nonparametric mixture models using the Dirichlet process provide a flexible approach to density estimation and clustering.
The document outlines a paper on Bayesian linear models. It introduces a simple example of a linear model with exchangeable priors. It then presents the general Bayesian linear model and theorems for the posterior distribution given multiple stages of priors. It applies this to an experimental design setting, deriving Bayes estimates that shrink treatment and block effects towards zero based on their variances.
This document discusses different approaches to Bayesian analysis including objective, subjective, robust, frequentist, and quasi Bayesian analysis. It provides examples and discusses the advantages and disadvantages of each approach. Objective Bayesian analysis uses objective prior distributions designed to be minimally informative, while subjective Bayesian analysis aims to fully specify subjective priors but has challenges in practice. Robust Bayesian analysis considers classes of models and priors to provide interval estimates. Frequentist Bayesian analysis combines Bayesian and frequentist ideas, and quasi Bayesian analysis uses ad hoc priors. Computational techniques for Bayesian analysis include calculating integrals and posterior modes using Laplace approximation, Monte Carlo sampling, and MCMC methods.
Lewis Carroll wrote "Pillow Problems", a collection of 72 logic and probability puzzles, while lying in bed at night. Many had clever but flawed solutions due to Carroll's limited understanding of modern probability concepts. For example, in one problem about breaking rods, Carroll incorrectly assumed the probability of breaking at the middle was nonzero. Overall, "Pillow Problems" reflects the nascent state of English probability in Carroll's time and his personal difficulties with more rigorous concepts like continuous probabilities.
This document discusses different approaches to specifying prior distributions in Bayesian statistics. It begins by introducing the binomial model for coin tossing and how priors and posteriors are calculated. It then describes three categories of Bayesian priors: classical Bayesians use a flat prior, modern parametric Bayesians use a Beta distribution prior, and subjective Bayesians quantify existing knowledge about a process. The document shows that different priors lead to different posteriors. It further explains that any prior density can be approximated by mixtures of Beta densities, and extends this concept to the exponential family. The exponential family conjugate prior is also discussed. Finally, connections are made between the exponential family, Beta density priors, and a generalization about conditional expected posteriors.
This document discusses the connection between Ockham's Razor and Bayesian analysis. It explains that Ockham's Razor favors the simplest hypothesis consistent with the data, and Bayesian analysis can help determine how much a simpler model should be preferred. It provides Galileo's problem of developing the law of falling bodies as an example. Jeffrey and Wrinch suggested using prior probabilities to represent simplicity, with the hypothesis having fewer parameters being assigned a higher prior probability. However, defining simplicity based solely on prior probabilities is problematic. Alternatively, a simpler hypothesis that makes precise predictions should be given greater credence if those predictions are confirmed. The key idea linking Bayesian analysis and Ockham's Razor is how simplicity in a hypothesis is represented and
This document discusses mixing R source code and documentation in LaTeX documents using knitr. It recommends using knitr in RStudio to embed R code chunks and output (like graphs and tables) in LaTeX documents. Code chunks can include any R code to evaluate, show, or hide. Graphs and tables from R code chunks will be included in the LaTeX output.
This document introduces a dependent Dirichlet process (DDP) model that allows the cluster weights and locations to vary based on a covariate x. It defines a measure of dependence between data points based on x, and derives a Polya urn-style predictive rule. It then presents a novel DDP construction based on simulating gamma random variables, which allows for easy posterior computation. This model generalizes previous dependent DP work and can handle multidimensional covariates.
Bayesian adaptive optimal estimation using a sieve priorJulyan Arbel
This document presents results on Bayesian optimal adaptive estimation using a sieve prior. It derives posterior concentration rates and risk convergence rates for models that accommodate a sieve prior. For the Gaussian white noise model, it shows the rates are adaptive optimal under global loss but a lower bound on the rate is obtained under pointwise loss, indicating the sieve prior is not optimal. Further work on posterior concentration rates under pointwise loss is suggested.
The document describes Positron Emission Tomography (PET), a molecular imaging technique. PET involves injecting a radioactively tagged molecule into a patient and using detectors to record coincident photon pairs from positron annihilation. This data is used to reconstruct the 3D or 4D activity distribution in the patient. Traditional PET reconstruction uses voxel-based discretization and parametric models, which have limitations. As an alternative, the document proposes a nonparametric Bayesian model that places a Dirichlet process prior directly on a random probability measure, avoiding discretization and allowing inference on the continuous activity distribution.
MRS PUNE 2024 - WINNER AMRUTHAA UTTAM JAGDHANEDK PAGEANT
Amruthaa Uttam Jagdhane, a stunning woman from Pune, has won the esteemed title of Mrs. India 2024, which is given out by the Dk Exhibition. Her journey to this prestigious accomplishment is a confirmation of her faithful assurance, extraordinary gifts, and profound commitment to enabling women.
Amid the constant barrage of distractions and dwindling motivation, self-discipline emerges as the unwavering beacon that guides individuals toward triumph. This vital quality serves as the key to unlocking one’s true potential, whether the aspiration is to attain personal goals, ascend the career ladder, or refine everyday habits.
Understanding Self-Discipline
The Fascinating World of Bats: Unveiling the Secrets of the Nightthomasard1122
The Fascinating World of Bats: Unveiling the Secrets of the Night
Bats, the mysterious creatures of the night, have long been a source of fascination and fear for humans. With their eerie squeaks and fluttering wings, they have captured our imagination and sparked our curiosity. Yet, beyond the myths and legends, bats are fascinating creatures that play a vital role in our ecosystem.
There are over 1,300 species of bats, ranging from the tiny Kitti's hog-nosed bat to the majestic flying foxes. These winged mammals are found in almost every corner of the globe, from the scorching deserts to the lush rainforests. Their diversity is a testament to their adaptability and resilience.
Bats are insectivores, feeding on a vast array of insects, from mosquitoes to beetles. A single bat can consume up to 1,200 insects in an hour, making them a crucial part of our pest control system. By preying on insects that damage crops, bats save the agricultural industry billions of dollars each year.
But bats are not just useful; they are also fascinating creatures. Their ability to fly in complete darkness, using echolocation to navigate and hunt, is a remarkable feat of evolution. They are also social animals, living in colonies and communicating with each other through a complex system of calls and body language.
Despite their importance, bats face numerous threats, from habitat destruction to climate change. Many species are endangered, and conservation efforts are necessary to protect these magnificent creatures.
In conclusion, bats are more than just creatures of the night; they are a vital part of our ecosystem, playing a crucial role in maintaining the balance of nature. By learning more about these fascinating animals, we can appreciate their importance and work to protect them for generations to come. So, let us embrace the beauty and mystery of bats, and celebrate their unique place in our world.
Insanony: Watch Instagram Stories Secretly - A Complete GuideTrending Blogers
Welcome to the world of social media, where Instagram reigns supreme! Today, we're going to explore a fascinating tool called Insanony that lets you watch Instagram Stories secretly. If you've ever wanted to view someone's story without them knowing, this blog is for you. We'll delve into everything you need to know about Insanony with Trending Blogers!
Care Instructions for Activewear & Swim Suits.pdfsundazesurf80
SunDaze Surf offers top swimwear tips: choose high-quality, UV-protective fabrics to shield your skin. Opt for secure fits that withstand waves and active movement. Bright colors enhance visibility, while adjustable straps ensure comfort. Prioritize styles with good support, like racerbacks or underwire tops, for active beach days. Always rinse swimwear after use to maintain fabric integrity.
Biography and career history of Bruno AmezcuaBruno Amezcua
Bruno Amezcua's entry into the film and visual arts world seemed predestined. His grandfather, a distinguished film editor from the 1950s through the 1970s, profoundly influenced him. This familial mentorship early on exposed him to the nuances of film production and a broad array of fine arts, igniting a lifelong passion for narrative creation. Over 15 years, Bruno has engaged in diverse projects showcasing his dedication to the arts.
At Affordable Garage Door Repair, we specialize in both residential and commercial garage door services, ensuring your property is secure and your doors are running smoothly.
Types of Garage Doors Explained: Energy Efficiency, Style, and More
Arbel oviedo
1. Dependent Dirichlet processes
and application to ecological data
Julyan Arbel
Joint work with Kerrie Mengersen & Judith Rousseau
CREST-INSEE, Université Paris-Dauphine
2 December 2012
ERCIM 2012
5th International Conference on Computing & Statistics
2. Outline
1 Biology question
Introduction
Data
2 Nonparametric model
Dirichlet process
Dependent Dirichlet process
Julyan Arbel DDP and ecological data
4. Biology introduction
Series of measurements at different places around Casey Station, a permanent base in Antarctica.
At each site: pollution level, and abundance of microbes called OTUs.
Assess the impact of a pollutant on the soil composition / biodiversity.
8. Data
Data consist of measurements of microbe abundance:

Site    TPH   06251 00576 00429 06360 08793 06259 05164 00772
1         80      3   724    88     1     0     0     0   467
2         80      9  2364   252     0     0     2     0   616
3         80     12   443  1655    11     0     0     0   168
...      ...    ...   ...   ...   ...   ...   ...   ...   ...
13      2600   2262   339   229  1100   537   352     0     0
20     10000   1883    23    18   879   224   325     9     1
24     22000   1446     2    27   920  1808  1456     0     0

Sample of abundance of 8 microbes (columns) at 6 sites (rows).
Main covariate is a pollution level called TPH, denoted x.
11. Notations
Microbe species are denoted by j = 1, 2, ... by decreasing total abundance.
At each site x, there are N(x) microbes, denoted Y_i(x), i = 1, ..., N(x).
Data are a frequency matrix:

Site   TPH      j = 1                    ...  j              ...
1      x = 80   #(Y_n(x = 80) = 1) = 3   ...  ...            ...
...    ...      ...                      ...  ...            ...
k      x        ...                      ...  #(Y_n(x) = j)  ...
13. Notations
A standard example of diversity is Shannon diversity, taken as the exponential of Shannon entropy:

D(x) = exp( -Σ_j p_j(x) log p_j(x) ),  with  p_j(x) = #(Y_n(x) = j) / N(x).

Figure: Left: Shannon entropy in raw data. Right: Shannon diversity in raw data (both plotted against TPH).
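The diversity formula above is easy to check numerically. A minimal sketch, not code from the talk: `shannon_diversity` is an illustrative name, and the example counts are the abundances at site 1 (x = 80) from the data table.

```python
import numpy as np

def shannon_diversity(counts):
    """Shannon diversity: exponential of the Shannon entropy of the
    empirical species proportions p_j(x) = #(Y_n(x) = j) / N(x)."""
    counts = np.asarray(counts, dtype=float)
    p = counts[counts > 0] / counts.sum()   # drop unobserved species
    entropy = -np.sum(p * np.log(p))
    return np.exp(entropy)

# Abundances of the 8 microbes at site 1 (x = 80) from the data table
site1 = [3, 724, 88, 1, 0, 0, 0, 467]
D1 = shannon_diversity(site1)
```

Shannon diversity can be read as an "effective number of species": for k equally abundant species it equals exactly k.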
17. First model
Pavlovian conditioning associated with the word species leads to the Dirichlet process and/or related processes.
First, we run an independent model at each site with TPH x:

Y_i(x) | G ~ G,
G(·) = Σ_{j=1}^∞ p_j δ_j(·),
(p_j)_j ~ GEM(M).

The GEM(M) distribution is defined in [Pitman, 2002] (GEM stands for Griffiths, Engen and McCloskey) and represents the distribution of the weights in a Dirichlet process:

p_j = V_j Π_{l<j} (1 - V_l),  V_j ~ Beta(1, M).
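The GEM stick-breaking above can be sketched in a few lines. This is an illustrative simulation under a finite truncation, not code from the talk; `gem_weights` and the truncation level are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def gem_weights(M, trunc=100):
    """Draw (p_1, ..., p_trunc) from a truncated GEM(M) stick-breaking:
    p_j = V_j * prod_{l<j} (1 - V_l), with V_j ~ Beta(1, M)."""
    V = rng.beta(1.0, M, size=trunc)
    # remaining stick length before each break: 1, (1-V_1), (1-V_1)(1-V_2), ...
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - V)[:-1]))
    return V * remaining

p = gem_weights(M=1.0)
```

The weights decay geometrically in expectation, so for a moderate truncation the leftover mass 1 - Σ_j p_j is negligible.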
20. Posterior sampling
We use a blocked Gibbs sampler (truncated version of the infinite sum).
The prior on p is induced by the Beta prior on V: π_⊥(V_j) = Be(1, M).
This prior is conjugate, with a Beta posterior:

π(V_j | Y) = Be(V_j | 1 + #(Y_n = j), M + #(Y_n > j)).
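The conjugate update above makes the blocked Gibbs step a sequence of Beta draws. A minimal sketch, assuming species labels Y_n in {1, 2, ...}; `sample_V_posterior` and the toy labels are illustrative, not code from the talk.

```python
import numpy as np
from collections import Counter

rng = np.random.default_rng(1)

def sample_V_posterior(Y, M, trunc=10):
    """One blocked-Gibbs draw of V_1, ..., V_trunc given labels Y:
    V_j | Y ~ Beta(1 + #(Y_n = j), M + #(Y_n > j))."""
    Y = np.asarray(Y)
    counts = Counter(Y.tolist())
    V = np.empty(trunc)
    for j in range(1, trunc + 1):
        n_eq = counts.get(j, 0)          # observations equal to species j
        n_gt = int(np.sum(Y > j))        # observations beyond species j
        V[j - 1] = rng.beta(1 + n_eq, M + n_gt)
    return V

Y = [1, 1, 2, 1, 3, 2, 1]                # illustrative species labels
V = sample_V_posterior(Y, M=1.0)
```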
24. Second model
But we want to run a single model across TPH x; it means a predictor-dependent model.
Early references to predictor-dependent DP models include Cifarelli and Regazzini [1978] and Muliere and Petrone [1993].
Increasing interest since MacEachern [1999, 2000, 2001].
Extensions with varying weights include, among others, order-based DDP [Griffin and Steel, 2006], local DP [Chung and Dunson, 2009], weighted mixtures of DP [Dunson and Park, 2008], and kernel stick-breaking processes [Dunson et al., 2007].
28. Second model
Only interested in a dependence in the weights. We worked out a dependent process prior with a simple structure of dependence on the weights:

Y_i(x) | G(x) ~ G(x),
G(x)(·) = Σ_{j=1}^∞ p_j(x) δ_j(·),  p_j(x) = V_j(x) Π_{l<j} (1 - V_l(x)),
(p_j(x))_j ~ DGEM(M),  V_j(x) ~ Beta(1, M),

where DGEM(M) stands for Dependent GEM distribution.
We want a process for each j, (V_j(x))_x, which is marginally Beta(1, M).
34. Process on the beta breaks, V_j(x)
Construction from Trippa, Müller and Johnson [2011].

[Diagram: three covariate values x_1, x_2, x_3, with gamma weights α_1, α_2, α_3, α_12, α_23, α_123 attached to the subsets of sites they cover.]

V(x_1) = Γ(x_1) / (Γ(x_1) + Γ^M(x_1)),
Γ(x_1) = Γ_1 + Γ_12 + Γ_123,           Γ_1 ~ Ga(α_1), ..., Γ_123 ~ Ga(α_123),
Γ^M(x_1) = Γ^M_1 + Γ^M_12 + Γ^M_123,   Γ^M_1 ~ Ga(α_1 M), ..., Γ^M_123 ~ Ga(α_123 M).

In the end: p_j(x) = V_j(x) Π_{l<j} (1 - V_l(x)) ~ DGEM(M).
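The shared-gamma construction can be sketched as follows. This is my reading of the slide, under the assumption that each α_S sits on a subset S of sites and that Σ_{S ∋ x} α_S = 1 at every site, which is what makes V(x) marginally Beta(1, M) (since Ga(a)/(Ga(a) + Ga(aM)) ~ Beta(a, aM)); `dgem_breaks` and the α values are illustrative, not code from the talk.

```python
import numpy as np

rng = np.random.default_rng(2)

def dgem_breaks(sites, alpha, M):
    """One draw of the dependent beta breaks (V(x))_x from shared gammas:
    V(x) = G(x) / (G(x) + G_M(x)), where
      G(x)   = sum of Gamma(alpha_S)     over subsets S containing x,
      G_M(x) = sum of Gamma(alpha_S * M) over the same subsets.
    Sites sharing a subset S share the variables Gamma_S, which
    induces the dependence across x."""
    gam = {S: rng.gamma(a) for S, a in alpha.items()}
    gam_M = {S: rng.gamma(a * M) for S, a in alpha.items()}
    V = {}
    for x in sites:
        G = sum(g for S, g in gam.items() if x in S)
        G_M = sum(g for S, g in gam_M.items() if x in S)
        V[x] = G / (G + G_M)
    return V

# Illustrative alphas on subsets {1},{2},{3},{1,2},{2,3},{1,2,3},
# chosen so that the alphas covering each site sum to 1.
alpha = {(1,): 0.5, (2,): 0.25, (3,): 0.5,
         (1, 2): 0.25, (2, 3): 0.25, (1, 2, 3): 0.25}
V = dgem_breaks([1, 2, 3], alpha, M=1.0)
```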
35. Interesting features
This idea can be extended to large dimensional covariate spaces.

[Diagram: covariate points x_1, x_2, x_3 with weights α_1, α_2, α_3, α_12, α_23, α_123 on overlapping regions.]

Easy to simulate from: one only needs to simulate gamma random variables.
36. Posterior sampling
There is independence across j, so it suffices to be able to simulate from each posterior:

π(V_j | Y) ∝ π(V_j) L(Y | V_j)
           ∝ π(V_j) Π_x V_j(x)^{#(Y_n(x) = j)} (1 - V_j(x))^{#(Y_n(x) > j)}.

Quite uncommon situation: we can sample from the prior π(V_j), but we cannot evaluate it. This is the reverse of Approximate Bayesian Computation (ABC), where the likelihood is intractable but can be sampled from.
38.
A first solution is to use a Metropolis-Hastings algorithm:

Metropolis-Hastings algorithm
1. Given a current value V_j, sample a new one V*_j independently from the prior π(V_j).
2. Accept with probability

ρ = min(1, L(Y | V*_j) / L(Y | V_j)).

But it is not a good idea to propose from the prior: the acceptance rate is low (around 1%).
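Because the proposal is the prior itself, the prior densities cancel in the Metropolis-Hastings ratio, leaving only the likelihood ratio. A generic sketch, with an illustrative toy model (uniform prior, binomial likelihood with 3 successes in 10 trials) in place of the DGEM prior; `independence_mh` is an assumed name, not code from the talk.

```python
import numpy as np

rng = np.random.default_rng(3)

def independence_mh(log_lik, sample_prior, n_iter=1000):
    """Independence Metropolis-Hastings with the prior as proposal:
    acceptance probability rho = min(1, L(Y | V*) / L(Y | V))."""
    V = sample_prior()
    ll = log_lik(V)
    chain, n_acc = [], 0
    for _ in range(n_iter):
        V_star = sample_prior()
        ll_star = log_lik(V_star)
        if np.log(rng.uniform()) < ll_star - ll:   # accept with prob rho
            V, ll = V_star, ll_star
            n_acc += 1
        chain.append(V)
    return np.array(chain), n_acc / n_iter

# Toy model: V ~ Uniform(0, 1) prior, 3 successes out of 10 trials
log_lik = lambda v: 3 * np.log(v) + 7 * np.log(1 - v)
chain, acc = independence_mh(log_lik, lambda: rng.uniform())
```

In this toy model the proposal and target are close, so acceptance is healthy; in the DGEM setting the slide reports the prior proposal accepting only around 1% of the time, which motivates the next slide.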
40.
A better solution is to use importance sampling:

Importance sampling
1. Sample iid values V_j from the prior π(V_j).
2. Weight the sample by the importance weights defined by the likelihood, w(V_j) = L(Y | V_j).

Advantages: an iid sample instead of a Markov chain, and better precision by a Rao-Blackwellisation argument (weights instead of accept-reject).
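The prior-as-proposal importance sampler can be sketched as follows, again with an illustrative toy model (uniform prior, binomial likelihood with 3 successes in 10 trials) standing in for the DGEM prior; `importance_sampling` is an assumed name, not code from the talk.

```python
import numpy as np

rng = np.random.default_rng(4)

def importance_sampling(log_lik, sample_prior, n_draws=5000):
    """Prior-as-proposal importance sampling: draw iid from the prior
    and weight each draw by its likelihood, w(V) = L(Y | V)."""
    draws = np.array([sample_prior() for _ in range(n_draws)])
    log_w = np.array([log_lik(v) for v in draws])
    w = np.exp(log_w - log_w.max())   # stabilise before normalising
    w /= w.sum()                      # self-normalised weights
    return draws, w

# Toy model: V ~ Uniform(0, 1) prior, 3 successes out of 10 trials
log_lik = lambda v: 3 * np.log(v) + 7 * np.log(1 - v)
draws, w = importance_sampling(log_lik, lambda: rng.uniform())
post_mean = np.sum(w * draws)         # self-normalised posterior mean
```

Using every draw through its weight, rather than discarding rejected proposals, is the Rao-Blackwellisation gain mentioned on the slide.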
41. Results

Figure: Left: dependent DP prior: posterior mean of the Shannon diversity by TPH, with 95% centred credible intervals. Right: Shannon diversity in raw data.
42. Conclusion
Such a model allows us to give probabilistic answers to questions about diversity, as we get a posterior sample.
The use of Gaussian processes transformed to beta processes by the inverse CDF might speed up the posterior computations.
Extension to handle other covariates.