Arbel oviedo
Dependent Dirichlet processes and application to ecological data
Julyan Arbel
Joint work with Kerrie Mengersen & Judith Rousseau
CREST-INSEE, Université Paris-Dauphine
2 December 2012
ERCIM 2012
5th International Conference on Computing & Statistics
2. Biology question
Nonparametric model
Outline
1 Biology question
Introduction
Data
2 Nonparametric model
Dirichlet process
Dependent Dirichlet process
Julyan Arbel DDP and ecological data
Biology introduction
Series of measurements at different places around Casey Station, a permanent base in Antarctica.
At each site: the pollution level, and the abundance of microbes called OTUs (operational taxonomic units).
Goal: assess the impact of a pollutant on soil composition and biodiversity.
Data
The data consist of measurements of microbe abundance:
Site  TPH    06251  00576  00429  06360  08793  06259  05164  00772
1     80     3      724    88     1      0      0      0      467
2     80     9      2364   252    0      0      2      0      616
3     80     12     443    1655   11     0      0      0      168
...   ...    ...    ...    ...    ...    ...    ...    ...    ...
13    2600   2262   339    229    1100   537    352    0      0
20    10000  1883   23     18     879    224    325    9      1
24    22000  1446   2      27     920    1808   1456   0      0
A sample of the abundances of 8 microbes (columns) at 6 sites (rows).
The main covariate is a pollution level called TPH (total petroleum hydrocarbons), denoted $x$.
Notations
Microbe species are indexed by $j = 1, 2, \dots$ in decreasing order of total abundance.
At each site $x$, there are $N(x)$ microbes, denoted $Y_i(x)$, $i = 1, \dots, N(x)$, where $Y_i(x)$ is the species of the $i$-th microbe.
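The data form a frequency matrix, with one row per site and one column per species:
Site  TPH      06251 (j = 1)          00576  ...  species j          ...
1     x = 80   #(Y_n(80) = 1) = 3     ...    ...  ...                ...
...   ...      ...                    ...    ...  ...                ...
k     x        ...                    ...    ...  #(Y_n(x) = j)      ...
Entry (k, j) is the number of microbes of species j observed at site k.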
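As an illustration, one row of this frequency matrix can be tabulated from the individual labels $Y_n(x)$ with Python's `collections.Counter`. This is a minimal sketch: the function names `frequency_row` and `frequency_matrix` and the toy sites are hypothetical, and labels beyond the chosen number of species are simply not counted.

```python
from collections import Counter


def frequency_row(labels, n_species):
    """Return [#(Y_n(x) = j) for j = 1..n_species] for one site.

    `labels` lists the species index Y_n(x) of every microbe observed at
    the site (species indexed from 1 by decreasing total abundance).
    Labels larger than n_species are not counted.
    """
    counts = Counter(labels)
    return [counts.get(j, 0) for j in range(1, n_species + 1)]


def frequency_matrix(sites, n_species):
    """Map each TPH value x to its row of species counts."""
    return {x: frequency_row(labels, n_species) for x, labels in sites.items()}


# Hypothetical toy data: two sites, identified by their TPH value.
toy_sites = {80: [1, 1, 2, 3, 1], 2600: [2, 2, 4]}
matrix = frequency_matrix(toy_sites, n_species=4)
print(matrix)  # {80: [3, 1, 1, 0], 2600: [0, 2, 0, 1]}
```

When every label is at most `n_species`, the row sums give $N(x)$.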
Notations
A standard measure of diversity is the Shannon diversity, defined as the exponential of the Shannon entropy:
$$D(x) = \exp\Big(-\sum_j p_j(x) \log p_j(x)\Big), \qquad p_j(x) = \frac{\#(Y_n(x) = j)}{N(x)}.$$
Figure: Left: Shannon entropy of the raw data against TPH. Right: Shannon diversity of the raw data against TPH.
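The diversity formula translates directly into code. A minimal Python sketch, assuming the counts of one site are given as a list ordered by species index; the function name is hypothetical, and the example uses only the 8 OTUs of the data excerpt, so it does not reproduce the values plotted in the figure.

```python
import math


def shannon_diversity(counts):
    """D(x) = exp(-sum_j p_j log p_j), with p_j = counts[j] / N(x).

    Zero counts are skipped, following the convention 0 log 0 = 0.
    """
    n = sum(counts)  # N(x): total number of microbes at the site
    entropy = -sum((c / n) * math.log(c / n) for c in counts if c > 0)
    return math.exp(entropy)


# Site 1 of the excerpt (TPH = 80), restricted to the 8 OTUs shown there.
site1 = [3, 724, 88, 1, 0, 0, 0, 467]
print(shannon_diversity(site1))
print(shannon_diversity([5, 5, 5, 5]))  # four equally abundant species
```

$D(x)$ ranges from 1, when a single species is present, to the number of species, when all are equally abundant.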
First model
Pavlovian conditioning: the word "species" leads straight to the Dirichlet process or related processes.
First, we fit an independent model at each site with TPH $x$:
$$\begin{aligned}
Y_i(x) \mid G &\sim G, \\
G(\cdot) &= \sum_{j=1}^{\infty} p_j\,\delta_j(\cdot), \\
(p_j)_j &\sim \mathrm{GEM}(M).
\end{aligned}$$
The GEM(M) distribution is defined in [Pitman, 2002] (GEM
stands for Griffiths, Engen and McCloskey) and represents the
distribution of the weights in a Dirichlet process:
$$p_j = V_j \prod_{l<j} (1 - V_l), \qquad V_j \sim \mathrm{Beta}(1, M).$$
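The stick-breaking representation can be simulated directly once the infinite sequence is truncated. The following Python sketch (the function name and the truncation level are illustrative assumptions) draws the first weights of GEM(M) with independent Beta(1, M) breaks and also returns the mass left over after truncation.

```python
import random


def sample_gem(M, n_atoms, rng=random):
    """First n_atoms weights of (p_j)_j ~ GEM(M), by stick-breaking.

    p_j = V_j * prod_{l<j} (1 - V_l), with V_j iid Beta(1, M). The mass
    prod_{l<=n_atoms} (1 - V_l) left after truncation is returned too.
    """
    weights, remaining = [], 1.0  # remaining = prod_{l<j} (1 - V_l)
    for _ in range(n_atoms):
        v = rng.betavariate(1.0, M)
        weights.append(v * remaining)
        remaining *= 1.0 - v
    return weights, remaining


p, leftover = sample_gem(M=10.0, n_atoms=200, rng=random.Random(0))
print(p[:3], leftover)
print(sum(p) + leftover)  # 1 up to floating-point rounding
```

Since $E[V_j] = 1/(1+M)$, larger values of $M$ give more slowly decaying weights, i.e. more species with non-negligible weight.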
Posterior sampling
We use a blocked Gibbs sampler, which truncates the infinite sum to a finite number of terms.
The prior on $p$ is induced by independent Beta priors on the $V_j$: $\pi(V_j) = \mathrm{Be}(1, M)$.
This prior is conjugate, with a Beta posterior:
$$\pi(V_j \mid Y) = \mathrm{Be}\big(V_j \mid 1 + \#(Y_n = j),\ M + \#(Y_n > j)\big).$$
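A sketch of this conjugate update in Python, assuming the counts $\#(Y_n = j)$ are given for $j = 1, \dots, J$, with $J$ a truncation level of our choosing; the tail counts $\#(Y_n > j)$ are cumulative sums from the right. The function names are hypothetical, and the code only illustrates the Beta posterior above; it is not the author's implementation.

```python
import random


def tail_counts(counts):
    """tail[j-1] = #(Y_n > j) = sum of counts of species j+1, j+2, ..."""
    tail, running = [0] * len(counts), 0
    for k in range(len(counts) - 1, -1, -1):
        tail[k] = running
        running += counts[k]
    return tail


def sample_stick_posterior(counts, M, rng=random):
    """One draw of (V_j) and (p_j) from the truncated conjugate posterior.

    counts[j-1] = #(Y_n = j) for j = 1..J, where J is the truncation level.
    Each V_j | Y ~ Beta(1 + #(Y_n = j), M + #(Y_n > j)), independently.
    """
    tail = tail_counts(counts)
    V = [rng.betavariate(1 + c, M + t) for c, t in zip(counts, tail)]
    p, remaining = [], 1.0
    for v in V:  # stick-breaking map from breaks to weights
        p.append(v * remaining)
        remaining *= 1.0 - v
    return V, p


rng = random.Random(1)
V, p = sample_stick_posterior([3, 724, 88, 1, 0, 0, 0, 467], M=5.0, rng=rng)
print([round(pj, 3) for pj in p])
```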
Second model
However, we want a single model across all TPH values $x$, which calls for a predictor-dependent model.
Early references to predictor-dependent DP models include
Cifarelli and Regazzini [1978] and Muliere and Petrone
[1993]
Interest has grown since MacEachern [1999, 2000, 2001].
Extensions with varying weights include, among others,
order-based DDP [Griffin and Steel, 2006], local DP [Chung
and Dunson, 2009], weighted mixtures of DP [Dunson and
Park, 2008], and kernel stick-breaking processes [Dunson
et al., 2007].
Second model
We are only interested in dependence in the weights, so we built a dependent process prior with a simple dependence structure on the weights.
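$$\begin{aligned}
Y_i(x) \mid G(x) &\sim G(x), \\
G(x)(\cdot) &= \sum_{j=1}^{\infty} p_j(x)\,\delta_j(\cdot), & p_j(x) &= V_j(x) \prod_{l<j} \big(1 - V_l(x)\big), \\
(p_j(x))_j &\sim \mathrm{DGEM}(M), & V_j(x) &\sim \mathrm{Beta}(1, M),
\end{aligned}$$
where DGEM(M) stands for the dependent GEM distribution.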
For each $j$, we want a process $(V_j(x))_x$ whose marginal distributions are all $\mathrm{Beta}(1, M)$.
Process on the beta breaks $V_j(x)$
Construction from Trippa, Müller and Johnson [2011].
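$$V(x_1) = \frac{\Gamma(x_1)}{\Gamma(x_1) + \Gamma^M(x_1)}$$
[Diagram: covariate values x1, x2, x3, with weights α1, α2, α3, α12, α23, α123 attached to the corresponding sets of points.]
$$\Gamma(x_1) = \Gamma_1 + \Gamma_{12} + \Gamma_{123}, \qquad \Gamma_1 \sim \mathrm{Ga}(\alpha_1), \dots, \Gamma_{123} \sim \mathrm{Ga}(\alpha_{123}),$$
$$\Gamma^M(x_1) = \Gamma^M_1 + \Gamma^M_{12} + \Gamma^M_{123}, \qquad \Gamma^M_1 \sim \mathrm{Ga}(\alpha_1 M), \dots, \Gamma^M_{123} \sim \mathrm{Ga}(\alpha_{123} M).$$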
In the end:
$$p_j(x) = V_j(x) \prod_{l<j} \big(1 - V_l(x)\big), \qquad (p_j(x))_j \sim \mathrm{DGEM}(M).$$
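A minimal Python sketch of this construction for the three covariate values of the diagram. Sums of independent gamma variables with a common scale are gamma with summed shapes, so $V(x_1) \sim \mathrm{Beta}(a, aM)$ with $a = \alpha_1 + \alpha_{12} + \alpha_{123}$, and the marginal is $\mathrm{Beta}(1, M)$ when $a = 1$. The $\alpha$ values below are hypothetical, chosen so that this holds at every point; unit-scale gammas and the helper names are also assumptions. Points that share sets share gamma variables, which is what creates the dependence.

```python
import random

# Sets of covariate points sharing a gamma variable, as in the diagram:
# S is a tuple of point indices (1 for x1, 2 for x2, 3 for x3).
SUBSETS = [(1,), (2,), (3,), (1, 2), (2, 3), (1, 2, 3)]

# Hypothetical weights alpha_S, chosen so that for every point i the sum of
# alpha_S over the sets S containing i equals 1 (Beta(1, M) marginals).
ALPHA = {(1,): 0.3, (2,): 0.1, (3,): 0.3, (1, 2): 0.2, (2, 3): 0.2, (1, 2, 3): 0.5}


def sample_dependent_breaks(alpha, M, rng=random):
    """Draw (V(x1), V(x2), V(x3)) with the gamma construction.

    Gamma_S ~ Ga(alpha_S) and Gamma^M_S ~ Ga(alpha_S * M), independent,
    unit scale. V(x_i) = Gamma(x_i) / (Gamma(x_i) + Gamma^M(x_i)), where
    Gamma(x_i) and Gamma^M(x_i) sum over the sets S that contain i.
    """
    g = {S: rng.gammavariate(alpha[S], 1.0) for S in SUBSETS}
    g_m = {S: rng.gammavariate(alpha[S] * M, 1.0) for S in SUBSETS}
    breaks = []
    for i in (1, 2, 3):
        a = sum(g[S] for S in SUBSETS if i in S)
        b = sum(g_m[S] for S in SUBSETS if i in S)
        breaks.append(a / (a + b))
    return breaks


def sample_dgem(alpha, M, n_atoms, rng=random):
    """Truncated DGEM(M) weights: one stick-breaking sequence per point.

    The breaks V_j(x) are drawn independently across j.
    """
    breaks = [sample_dependent_breaks(alpha, M, rng) for _ in range(n_atoms)]
    weights = []
    for i in range(3):
        p, remaining = [], 1.0
        for V_j in breaks:
            p.append(V_j[i] * remaining)
            remaining *= 1.0 - V_j[i]
        weights.append(p)
    return weights


rng = random.Random(0)
draws = [sample_dependent_breaks(ALPHA, 4.0, rng) for _ in range(20000)]
print(sum(d[0] for d in draws) / len(draws))  # close to 1 / (1 + M) = 0.2
```

Only `random.gammavariate` calls are needed, in line with the remark that the construction is easy to simulate.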
Interesting features
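This idea can be extended to higher-dimensional covariate spaces:
[Diagram: covariate values x1, x2, x3 in a multidimensional covariate space, with weights α1, α2, α3, α12, α23, α123 attached to the corresponding sets of points.]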
Easy to simulate from: it only requires simulating Gamma random variables.
Posterior sampling
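The processes $\mathbf{V}_j = (V_j(x))_x$ are independent across $j$ (a priori and, since the likelihood factorises over $j$, a posteriori), so it suffices to be able to sample from each posterior: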
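$$\begin{aligned}
\pi(\mathbf{V}_j \mid Y) &\propto \pi(\mathbf{V}_j)\, L(Y \mid \mathbf{V}_j) \\
&\propto \pi(\mathbf{V}_j) \prod_x V_j(x)^{\#(Y_n(x) = j)} \big(1 - V_j(x)\big)^{\#(Y_n(x) > j)}.
\end{aligned}$$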
This is a rather uncommon situation: we can sample from the prior $\pi(\mathbf{V}_j)$, but we cannot evaluate its density. It is the reverse of approximate Bayesian computation (ABC), where the likelihood cannot be evaluated but can be sampled from.
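Although the prior density is unavailable, the likelihood is cheap to evaluate, and the algorithms that follow need nothing else. A Python sketch of $\log L(Y \mid \mathbf{V}_j)$, assuming the breaks and counts for one $j$ are stored in dicts keyed by site (a hypothetical layout; the function name is also hypothetical):

```python
import math


def log_likelihood_j(V_j, eq_counts, gt_counts):
    """log L(Y | V_j) for one species index j.

    All three arguments are dicts keyed by site x:
      V_j[x]        break V_j(x) in (0, 1)
      eq_counts[x]  #(Y_n(x) = j)
      gt_counts[x]  #(Y_n(x) > j)
    Working on the log scale avoids underflow when counts are large.
    """
    total = 0.0
    for x, v in V_j.items():
        if eq_counts[x] > 0:
            total += eq_counts[x] * math.log(v)
        if gt_counts[x] > 0:
            total += gt_counts[x] * math.log1p(-v)  # log(1 - v)
    return total


# Hypothetical values at two sites (TPH 80 and 2600).
result = log_likelihood_j({80: 0.5, 2600: 0.1}, {80: 2, 2600: 0}, {80: 3, 2600: 4})
print(result)
```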
A first solution is to use a Metropolis-Hastings algorithm:
Metropolis-Hastings Algorithm
1 Given the current value $\mathbf{V}_j$, sample a proposal $\mathbf{V}_j^*$ independently from the prior $\pi(\mathbf{V}_j)$.
2 Accept it with probability
$$\rho = \min\left(1, \frac{L(Y \mid \mathbf{V}_j^*)}{L(Y \mid \mathbf{V}_j)}\right).$$
However, proposing from the prior is inefficient: the acceptance rate is low (around 1%).
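A Python sketch of this independence sampler, written for generic `sample_prior` and `log_lik` callables (hypothetical names). The toy check uses a single site with a Beta(1, M) prior, where the posterior is known exactly, instead of the dependent prior; this target is much easier than the real one, so its acceptance rate is far higher than 1%.

```python
import math
import random


def mh_prior_proposal(sample_prior, log_lik, n_iter, rng=random):
    """Independence Metropolis-Hastings with the prior as proposal.

    The proposal density equals the prior density, so both cancel and the
    acceptance probability is min(1, L(Y | V*) / L(Y | V)).
    """
    current = sample_prior(rng)
    current_ll = log_lik(current)
    chain, n_accept = [], 0
    for _ in range(n_iter):
        proposal = sample_prior(rng)
        proposal_ll = log_lik(proposal)
        # log U with U in (0, 1]; accept when log U < log rho.
        if math.log(1.0 - rng.random()) < proposal_ll - current_ll:
            current, current_ll = proposal, proposal_ll
            n_accept += 1
        chain.append(current)
    return chain, n_accept / n_iter


# Toy check with hypothetical numbers: one site, prior V ~ Beta(1, M), and
# a = #(Y_n = j), b = #(Y_n > j), so the exact posterior is Beta(1 + a, M + b).
M, a, b = 5.0, 30, 70


def toy_prior(r):
    return r.betavariate(1.0, M)


def toy_log_lik(v):
    return a * math.log(v) + b * math.log1p(-v)


chain, rate = mh_prior_proposal(toy_prior, toy_log_lik, 20000, random.Random(42))
print(sum(chain) / len(chain), (1 + a) / (1 + a + M + b), rate)
```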
A better solution is to use Importance Sampling:
Importance Sampling
1 Sample iid values $\mathbf{V}_j$ from the prior $\pi(\mathbf{V}_j)$.
2 Weight the sample with importance weights given by the likelihood, $w(\mathbf{V}_j) = L(Y \mid \mathbf{V}_j)$.
The sample is iid rather than a Markov chain.
Better precision, by a Rao-Blackwellisation argument (weights instead of accept-reject).
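A Python sketch of self-normalised importance sampling with the prior as proposal; `sample_prior` and `log_lik` are hypothetical callables. The toy check uses one site with a Beta(1, M) prior and counts $a = \#(Y_n = j)$, $b = \#(Y_n > j)$, for which the exact posterior mean $(1+a)/(1+a+M+b)$ is known. The effective sample size $1/\sum_i w_i^2$ is a common diagnostic of how uneven the weights are.

```python
import math
import random


def importance_sampling(sample_prior, log_lik, n_samples, rng=random):
    """iid draws from the prior, weighted by the likelihood.

    Returns the draws and self-normalised weights w_i proportional to
    L(Y | V_i); the maximum log-likelihood is subtracted before
    exponentiating to avoid underflow.
    """
    samples = [sample_prior(rng) for _ in range(n_samples)]
    log_w = [log_lik(s) for s in samples]
    top = max(log_w)
    w = [math.exp(lw - top) for lw in log_w]
    total = sum(w)
    return samples, [wi / total for wi in w]


def effective_sample_size(weights):
    """1 / sum(w_i^2) for normalised weights: a rough quality measure."""
    return 1.0 / sum(wi * wi for wi in weights)


# Hypothetical toy: one site, prior V ~ Beta(1, M); exact posterior Beta(1 + a, M + b).
M, a, b = 5.0, 30, 70


def toy_prior(r):
    return r.betavariate(1.0, M)


def toy_log_lik(v):
    return a * math.log(v) + b * math.log1p(-v)


samples, w = importance_sampling(toy_prior, toy_log_lik, 20000, random.Random(7))
post_mean = sum(wi * v for wi, v in zip(w, samples))
print(post_mean, (1 + a) / (1 + a + M + b), effective_sample_size(w))
```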
Results
Figure: Left: dependent DP prior: posterior mean of the Shannon diversity against TPH, with 95% central credible intervals. Right: Shannon diversity of the raw data against TPH.
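Summaries like those in the left panel can be computed from a weighted (or unweighted) posterior sample of the weights $p_j(x)$. A Python sketch, assuming one list of truncated weights per draw at a given site; the function names are hypothetical, mass left over after truncation is ignored, and this is not necessarily how the figure was produced.

```python
import math


def shannon_diversity_of_weights(p):
    """exp(-sum_j p_j log p_j) for one (possibly truncated) weight vector."""
    return math.exp(-sum(pj * math.log(pj) for pj in p if pj > 0))


def weighted_quantile(values, weights, q):
    """Smallest value whose cumulative normalised weight reaches q."""
    pairs = sorted(zip(values, weights))
    total = sum(weights)
    cumulative = 0.0
    for value, weight in pairs:
        cumulative += weight / total
        if cumulative >= q:
            return value
    return pairs[-1][0]  # guard against rounding when q is close to 1


def diversity_summary(p_samples, weights, level=0.95):
    """Posterior mean and central credible interval of D(x) at one site.

    p_samples: posterior draws of (p_j(x))_j; weights: their importance
    weights (use equal weights for an unweighted sample).
    """
    d = [shannon_diversity_of_weights(p) for p in p_samples]
    total = sum(weights)
    mean = sum(di * wi for di, wi in zip(d, weights)) / total
    tail = (1.0 - level) / 2.0
    interval = (weighted_quantile(d, weights, tail),
                weighted_quantile(d, weights, 1.0 - tail))
    return mean, interval


# Hypothetical posterior draws at one site, equally weighted.
mean, (lo, hi) = diversity_summary([[0.5, 0.5], [0.25] * 4, [1.0]], [1.0, 1.0, 1.0])
print(mean, lo, hi)
```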
Conclusion
Such a model gives probabilistic answers to questions about diversity, since we obtain a posterior sample.
Using Gaussian processes, transformed into processes with Beta marginals by the inverse CDF, might speed up the posterior computations.
Extension to handle other covariates.
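The Gaussian-process idea can be sketched as a Gaussian copula: if $Z(x)$ is a Gaussian process with unit variance, $U(x) = \Phi(Z(x))$ is uniform, and $V(x) = F^{-1}(U(x))$ has the $\mathrm{Beta}(1, M)$ distribution function $F(v) = 1 - (1 - v)^M$, so $F^{-1}(u) = 1 - (1 - u)^{1/M}$. The squared-exponential kernel, the length scale and the function names below are assumptions for illustration only, not a method described in the talk.

```python
import math
import random


def cholesky(K):
    """Lower-triangular L with L L^T = K, for symmetric positive definite K."""
    n = len(K)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = K[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def gp_beta_breaks(xs, M, length_scale, rng=random, jitter=1e-9):
    """Draw (V(x))_x with Beta(1, M) marginals from a Gaussian process.

    Z ~ GP(0, k) with a squared-exponential kernel of unit variance,
    U(x) = Phi(Z(x)) ~ Uniform(0, 1), and V(x) = 1 - (1 - U(x))^(1/M).
    """
    n = len(xs)
    K = [[math.exp(-0.5 * ((xs[i] - xs[j]) / length_scale) ** 2)
          + (jitter if i == j else 0.0) for j in range(n)] for i in range(n)]
    L = cholesky(K)
    eps = [rng.gauss(0.0, 1.0) for _ in range(n)]
    z = [sum(L[i][k] * eps[k] for k in range(i + 1)) for i in range(n)]
    u = [0.5 * (1.0 + math.erf(zi / math.sqrt(2.0))) for zi in z]  # Phi(z)
    return [1.0 - (1.0 - ui) ** (1.0 / M) for ui in u]


rng = random.Random(3)
xs = [80.0, 2600.0, 10000.0, 22000.0]  # TPH values appearing in the data excerpt
draws = [gp_beta_breaks(xs, 4.0, length_scale=5000.0, rng=rng) for _ in range(5000)]
print(sum(d[0] for d in draws) / len(draws))  # close to 1 / (1 + M) = 0.2
```

Nearby TPH values get strongly correlated breaks, while each break keeps the Beta(1, M) marginal required by the DGEM construction.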