Dependent Dirichlet processes
    and application to ecological data

                     Julyan Arbel
Joint work with Kerrie Mengersen & Judith Rousseau

          CREST-INSEE, Université Paris-Dauphine


                  2 December 2012
                    ERCIM 2012
          5th International Conference on
               Computing & Statistics
Biology question
                    Nonparametric model


Outline



  1   Biology question
         Introduction
         Data


  2   Nonparametric model
        Dirichlet process
        Dependent Dirichlet process




                            Julyan Arbel   DDP and ecological data
Biology introduction

    Series of measurements at different places around Casey Station, a permanent base in Antarctica.
    At each site: pollution level, and abundance of microbes called OTUs (operational taxonomic units).
    Goal: assess the impact of a pollutant on the soil composition / biodiversity.



Data

         Data consist of measurements of microbe abundance:

  Site     TPH   06251   00576   00429   06360   08793   06259   05164   00772
     1      80       3     724      88       1       0       0       0     467
     2      80       9    2364     252       0       0       2       0     616
     3      80      12     443    1655      11       0       0       0     168
   ...     ...     ...     ...     ...     ...     ...     ...     ...     ...
    13    2600    2262     339     229    1100     537     352       0       0
    20   10000    1883      23      18     879     224     325       9       1
    24   22000    1446       2      27     920    1808    1456       0       0

         Sample of the abundance of 8 microbes (columns) at 6 sites (rows).
         The main covariate is a pollution level called TPH, denoted $x$.

Notations


     Microbe species are indexed by $j = 1, 2, \dots$ in order of decreasing total abundance.
     At each site $x$, there are $N(x)$ microbes, denoted $Y_i(x)$, $i = 1, \dots, N(x)$.
     Data are a frequency matrix:

         Site   TPH        06251 ($j = 1$)               00576 ($j$)          ...
            1   $x = 80$   $\#(Y_n(x = 80) = 1) = 3$     ...                  ...
          ...   ...        ...                           ...                  ...
            k   $x$        ...                           $\#(Y_n(x) = j)$     ...




Notations
  A standard example of diversity is Shannon diversity, taken as
  the exponential of Shannon entropy:

  $$D(x) = \exp\Big(-\sum_j p_j(x) \log p_j(x)\Big), \quad \text{with } p_j(x) = \frac{\#(Y_n(x) = j)}{N(x)}.$$

  Figure (two panels, x-axis tph from 0 to 20000): Left: Shannon entropy in raw data (y-axis about 2.5 to 3.5). Right: Shannon diversity in raw data (y-axis about 10 to 40).
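
  As an illustration of the Shannon diversity formula, here is a minimal Python sketch. The function name `shannon_diversity` is hypothetical, and the counts are site 1 restricted to the 8 microbes shown in the data sample, so the value is not the diversity of the full data set.

```python
import numpy as np

def shannon_diversity(counts):
    """Exponential of the Shannon entropy of one site's species counts."""
    counts = np.asarray(counts, dtype=float)
    p = counts / counts.sum()          # p_j(x) = #(Y_n(x) = j) / N(x)
    p = p[p > 0]                       # convention: 0 * log 0 = 0
    entropy = -np.sum(p * np.log(p))   # Shannon entropy
    return float(np.exp(entropy))

# Site 1 (TPH = 80), restricted to the 8 microbes of the data sample.
site_1 = [3, 724, 88, 1, 0, 0, 0, 467]
d_site_1 = shannon_diversity(site_1)
```

  The diversity can be read as an effective number of species: it equals the number of species when all are equally abundant, and 1 when a single species is present.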

First model
  Pavlovian conditioning associated with the word "species" leads
  to the Dirichlet process and/or related processes.

  First, we run an independent model at each site with TPH $x$:

  $$Y_i(x) \mid G \sim G, \qquad G(\cdot) = \sum_{j=1}^{\infty} p_j \delta_j(\cdot), \qquad (p_j)_j \sim \mathrm{GEM}(M).$$

  The GEM(M) distribution is defined in [Pitman, 2002] (GEM
  stands for Griffiths, Engen and McCloskey) and represents the
  distribution of the weights in a Dirichlet process:

  $$p_j = V_j \prod_{l<j} (1 - V_l), \qquad V_j \sim \mathrm{Beta}(1, M).$$
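
  The stick-breaking definition of the weights can be sketched in a few lines of Python. This is only an illustration: the function name `sample_gem_weights`, the truncation level `n_atoms` and the value `M = 2` are assumptions, not choices from the slides.

```python
import numpy as np

def sample_gem_weights(M, n_atoms, rng):
    """Draw the first n_atoms stick-breaking weights p_1, ..., p_{n_atoms}."""
    V = rng.beta(1.0, M, size=n_atoms)   # V_j ~ Beta(1, M), independent
    # prod_{l<j} (1 - V_l): length of stick left before breaking piece j
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - V)[:-1]))
    return V * remaining

rng = np.random.default_rng(0)
p = sample_gem_weights(M=2.0, n_atoms=50, rng=rng)
leftover = 1.0 - p.sum()   # mass carried by the atoms beyond the truncation
```

  A larger `M` makes each break smaller on average, so the mass is spread over more species.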

Posterior sampling

     We use a blocked Gibbs sampler (truncated version of the infinite sum).
     The prior on $p$ is induced by the Beta prior on $V$, $\pi^{\perp}(V_j) = \mathrm{Be}(1, M)$.
     This prior is conjugate, with a Beta posterior:

  $$\pi(V_j \mid Y) = \mathrm{Be}\big(V_j \mid 1 + \#(Y_n = j),\ M + \#(Y_n > j)\big).$$
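
  A minimal sketch of this conjugate update at a single site, in Python. The helper names are hypothetical, and setting the last break to 1 is an assumed (standard) way of truncating the infinite sum; the slides do not say which truncation is used. The counts are site 1 of the data sample, taken in column order for illustration only.

```python
import numpy as np

def stick_posterior_params(counts, M):
    """Return (a, b) such that V_j | Y ~ Beta(a_j, b_j) for each species j."""
    counts = np.asarray(counts, dtype=float)
    # #(Y_n > j): observations falling on species ranked after j
    n_greater = counts[::-1].cumsum()[::-1] - counts
    return 1.0 + counts, M + n_greater

def sample_weights_posterior(counts, M, rng):
    """One posterior draw of the truncated weights (p_1, ..., p_J)."""
    a, b = stick_posterior_params(counts, M)
    V = rng.beta(a, b)
    V[-1] = 1.0   # assumed truncation: last piece takes the remaining mass
    return V * np.concatenate(([1.0], np.cumprod(1.0 - V)[:-1]))

site_counts = [3, 724, 88, 1, 0, 0, 0, 467]
p_draw = sample_weights_posterior(site_counts, M=1.0, rng=np.random.default_rng(0))
```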




Second model

  But we want to run a single model across all TPH levels $x$, which calls for a
  predictor-dependent model.
      Early references to predictor-dependent DP models include
      Cifarelli and Regazzini [1978] and Muliere and Petrone
      [1993].
      Interest has grown since MacEachern [1999, 2000, 2001].
      Extensions with varying weights include, among others,
      order-based DDP [Griffin and Steel, 2006], local DP [Chung
      and Dunson, 2009], weighted mixtures of DP [Dunson and
      Park, 2008], and kernel stick-breaking processes [Dunson
      et al., 2007].



Second model

  We are only interested in dependence in the weights, so we worked out
  a dependent process prior with a simple dependence structure on the weights:

  $$Y_i(x) \mid G(x) \sim G(x), \qquad G(x)(\cdot) = \sum_{j=1}^{\infty} p_j(x) \delta_j(\cdot), \qquad (p_j(x))_j \sim \mathrm{DGEM}(M),$$

  $$p_j(x) = V_j(x) \prod_{l<j} (1 - V_l(x)), \qquad V_j(x) \sim \mathrm{Beta}(1, M),$$

  where DGEM(M) stands for the Dependent GEM distribution.
  For each $j$, we want a process $(V_j(x))_x$ that is marginally
  Beta(1, M).

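Process on the beta breaks, $V_j(x)$
  Construction from Trippa, Müller and Johnson [2011].

  [Diagram: covariate values $x_1$, $x_2$, $x_3$ with overlapping regions labelled $\alpha_1$, $\alpha_2$, $\alpha_3$, $\alpha_{12}$, $\alpha_{23}$, $\alpha_{123}$.]

  $$V(x_1) = \frac{\Gamma(x_1)}{\Gamma(x_1) + \Gamma^M(x_1)},$$

  $$\Gamma(x_1) = \Gamma_1 + \Gamma_{12} + \Gamma_{123}, \qquad \Gamma^M(x_1) = \Gamma^M_1 + \Gamma^M_{12} + \Gamma^M_{123},$$

  $$\Gamma_1 \sim \mathrm{Ga}(\alpha_1), \dots, \Gamma_{123} \sim \mathrm{Ga}(\alpha_{123}), \qquad \Gamma^M_1 \sim \mathrm{Ga}(\alpha_1 M), \dots, \Gamma^M_{123} \sim \mathrm{Ga}(\alpha_{123} M).$$

  In the end:

  $$p_j(x) = V_j(x) \prod_{l<j} (1 - V_l(x)) \sim \mathrm{DGEM}(M).$$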
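
  The Gamma construction of the breaks can be sketched as follows for the three covariate values of the diagram. The masses in `ALPHA` are hypothetical. They are chosen so that the regions covering each $x_k$ have total mass 1; independent Gamma variables with a common scale then give $\Gamma(x_k) \sim \mathrm{Ga}(1)$ and $\Gamma^M(x_k) \sim \mathrm{Ga}(M)$, hence $V(x_k) \sim \mathrm{Beta}(1, M)$ marginally. The function name is also an assumption.

```python
import numpy as np

# Hypothetical region masses. A region labelled "12" covers x_1 and x_2, etc.
# The masses covering any single x_k sum to 1.
ALPHA = {"1": 0.5, "2": 0.2, "3": 0.5, "12": 0.3, "23": 0.3, "123": 0.2}

def sample_dependent_sticks(M, n_draws, rng, alpha=ALPHA):
    """Joint draws of (V(x_1), V(x_2), V(x_3)); returns shape (n_draws, 3)."""
    # Gamma_r ~ Ga(alpha_r) and Gamma^M_r ~ Ga(alpha_r * M), all independent
    g = {r: rng.gamma(a, size=n_draws) for r, a in alpha.items()}
    g_m = {r: rng.gamma(a * M, size=n_draws) for r, a in alpha.items()}
    V = []
    for k in "123":
        gam = sum(g[r] for r in alpha if k in r)      # Gamma(x_k)
        gam_m = sum(g_m[r] for r in alpha if k in r)  # Gamma^M(x_k)
        V.append(gam / (gam + gam_m))
    return np.column_stack(V)

rng = np.random.default_rng(1)
V = sample_dependent_sticks(M=2.0, n_draws=20000, rng=rng)
corr_12 = np.corrcoef(V[:, 0], V[:, 1])[0, 1]   # dependence from shared regions
```

  The dependence between $V(x_1)$ and $V(x_2)$ comes only from the Gamma variables of the regions they share, here those labelled 12 and 123.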

Interesting features

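      This idea can be extended to high-dimensional covariate
      spaces:

      [Diagram: covariate points $x_1$, $x_2$, $x_3$ with overlapping regions labelled $\alpha_1$, $\alpha_2$, $\alpha_3$, $\alpha_{12}$, $\alpha_{23}$, $\alpha_{123}$.]

      Easy to simulate: only Gamma random variables need to be
      simulated.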


Posterior sampling


     There is independence across $j$, so it suffices to be able to
     simulate from each posterior:

  $$\pi(V_j \mid Y) \propto \pi(V_j)\, L(Y \mid V_j) \propto \pi(V_j) \prod_x V_j(x)^{\#(Y_n(x) = j)} \big(1 - V_j(x)\big)^{\#(Y_n(x) > j)}.$$

     A quite uncommon situation: we can sample from the prior
     $\pi(V_j)$, but we cannot evaluate it. This is the reverse of
     Approximate Bayesian computation (ABC), where the
     likelihood is intractable but can be sampled from.
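
     The likelihood factor, unlike the prior, is easy to evaluate. A minimal Python sketch on the log scale follows; the function name, the clipping guard and the toy counts are assumptions made for illustration.

```python
import numpy as np

def log_likelihood_stick(V_j, n_equal, n_greater):
    """log L(Y | V_j), summed over sites x.

    n_equal[x]   = #(Y_n(x) = j)
    n_greater[x] = #(Y_n(x) > j)
    """
    V_j = np.asarray(V_j, dtype=float)
    n_equal = np.asarray(n_equal, dtype=float)
    n_greater = np.asarray(n_greater, dtype=float)
    # keep V_j strictly inside (0, 1) so that both logs stay finite
    V_j = np.clip(V_j, np.finfo(float).tiny, 1.0 - np.finfo(float).eps)
    return float(np.sum(n_equal * np.log(V_j) + n_greater * np.log1p(-V_j)))

# Toy example with three sites (made-up counts).
ll = log_likelihood_stick([0.3, 0.1, 0.6], n_equal=[30, 5, 50], n_greater=[70, 95, 20])
```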



A first solution is to use a Metropolis-Hastings algorithm:
Metropolis-Hastings Algorithm
 1   Given a current value $V_j$, sample a new one $V_j^*$
     independently from the prior $\pi(V_j)$.
 2   The acceptance probability is

  $$\rho = \min\left\{1, \frac{L(Y \mid V_j^*)}{L(Y \mid V_j)}\right\}.$$

     But proposing from the prior is not a good idea:
     the acceptance rate is low (around 1%).
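
     The two steps above can be sketched in Python as an independence sampler. Everything in the toy setup is an assumption: two sites with made-up counts, and, as a stand-in for the dependent prior, independent Beta(1, M) draws at each site.

```python
import numpy as np

def independence_mh(sample_prior, log_lik, n_iter, rng):
    """Metropolis-Hastings with proposals drawn from the prior."""
    current = sample_prior(rng)
    current_ll = log_lik(current)
    chain, n_accept = [], 0
    for _ in range(n_iter):
        proposal = sample_prior(rng)
        proposal_ll = log_lik(proposal)
        # accept with probability rho = min(1, L(Y | V*) / L(Y | V))
        if np.log(rng.uniform()) < proposal_ll - current_ll:
            current, current_ll = proposal, proposal_ll
            n_accept += 1
        chain.append(current)
    return np.array(chain), n_accept / n_iter

# Toy setup (hypothetical): #(Y_n(x) = j) and #(Y_n(x) > j) at two sites.
M = 2.0
n_equal = np.array([30.0, 5.0])
n_greater = np.array([70.0, 95.0])
sample_prior = lambda rng: rng.beta(1.0, M, size=2)   # stand-in prior
log_lik = lambda V: float(np.sum(n_equal * np.log(V) + n_greater * np.log1p(-V)))

chain, acc_rate = independence_mh(sample_prior, log_lik, 5000, np.random.default_rng(2))
```

     Because the prior density cancels in the Metropolis-Hastings ratio, only the likelihood has to be evaluated, which is what makes this scheme usable when the prior cannot be evaluated.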



A better solution is to use Importance Sampling:
Importance Sampling
 1   Sample i.i.d. values $V_j$ from the prior $\pi(V_j)$.
 2   Weight the sample by the importance weights given
     by the likelihood, $w(V_j) = L(Y \mid V_j)$.

     An i.i.d. sample instead of a Markov chain.
     Better precision by a Rao-Blackwellisation argument
     (weights instead of accept-reject).
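
     A minimal Python sketch of this importance sampler, with self-normalised weights computed on the log scale. The toy setup is an assumption: two sites with made-up counts and independent Beta(1, M) draws as a stand-in for the dependent prior. The effective sample size is an added diagnostic, not something from the slides.

```python
import numpy as np

def importance_sample(sample_prior, log_lik, n_draws, rng):
    """Prior draws with self-normalised likelihood weights."""
    draws = np.array([sample_prior(rng) for _ in range(n_draws)])
    log_w = np.array([log_lik(v) for v in draws])   # log w(V_j) = log L(Y | V_j)
    w = np.exp(log_w - log_w.max())                 # stabilise before normalising
    w /= w.sum()
    ess = 1.0 / np.sum(w ** 2)                      # effective sample size
    return draws, w, ess

# Toy setup (hypothetical): #(Y_n(x) = j) and #(Y_n(x) > j) at two sites.
M = 2.0
n_equal = np.array([30.0, 5.0])
n_greater = np.array([70.0, 95.0])
sample_prior = lambda rng: rng.beta(1.0, M, size=2)   # stand-in prior
log_lik = lambda V: float(np.sum(n_equal * np.log(V) + n_greater * np.log1p(-V)))

draws, w, ess = importance_sample(sample_prior, log_lik, 5000, np.random.default_rng(3))
post_mean = w @ draws   # weighted estimate of E[V_j(x) | Y] at each site
```

     In this toy setup the stand-in prior is conjugate, so the exact posterior means are 31/103 and 6/103, which gives a simple check of the weighted estimate.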




Results

  Figure (two panels, x-axis tph from 0 to 20000, y-axis about 10 to 40): Left: dependent DP prior: posterior mean of the Shannon
  diversity by TPH, with 95% centred credible intervals. Right: Shannon
  diversity in raw data.

Conclusion




     Such a model gives probabilistic answers to
     questions about diversity, since we get a posterior sample.
     Using Gaussian processes transformed to Beta
     processes by the inverse CDF might speed up the posterior
     computations.
     Extension to handle other covariates.




                          Julyan Arbel   DDP and ecological data

More Related Content

Similar to Arbel oviedo

Novel image fusion techniques using global and local kekre wavelet transforms
Novel image fusion techniques using global and local kekre wavelet transformsNovel image fusion techniques using global and local kekre wavelet transforms
Novel image fusion techniques using global and local kekre wavelet transforms
IAEME Publication
 
OpenCL applications in genomics
OpenCL applications in genomicsOpenCL applications in genomics
OpenCL applications in genomics
USC
 
image processing to detect worms
image processing to detect wormsimage processing to detect worms
image processing to detect worms
Synergy Vision
 
Analysis update for GENEVA meeting 2011
Analysis update for GENEVA meeting 2011Analysis update for GENEVA meeting 2011
Analysis update for GENEVA meeting 2011
USC
 
Elizabeth Iorns - How Science Exchange promotes Open Science
Elizabeth Iorns - How Science Exchange promotes Open ScienceElizabeth Iorns - How Science Exchange promotes Open Science
Elizabeth Iorns - How Science Exchange promotes Open Science
Science Exchange
 
Self Organinising neural networks
Self Organinising  neural networksSelf Organinising  neural networks
Self Organinising neural networks
ESCOM
 
Terminological cluster trees for Disjointness Axiom Discovery
Terminological cluster trees for Disjointness Axiom DiscoveryTerminological cluster trees for Disjointness Axiom Discovery
Terminological cluster trees for Disjointness Axiom Discovery
Giuseppe Rizzo
 
T1 2018 bioinformatics
T1 2018 bioinformaticsT1 2018 bioinformatics
T1 2018 bioinformatics
Prof. Wim Van Criekinge
 
Multimodal Image Processing in Cytology
Multimodal Image Processing in CytologyMultimodal Image Processing in Cytology
Multimodal Image Processing in Cytology
University of Zurich
 
Analysis Methods in Flow Cytometry
Analysis Methods in Flow CytometryAnalysis Methods in Flow Cytometry
Analysis Methods in Flow Cytometry
Nikolas Pontikos
 
Classification of squamous cell cervical cytology
Classification of squamous cell cervical cytologyClassification of squamous cell cervical cytology
Classification of squamous cell cervical cytology
karthigailakshmi
 
Artificial Neural Networks_Bioinsspired_Algorithms_Nov 20.ppt
Artificial Neural Networks_Bioinsspired_Algorithms_Nov 20.pptArtificial Neural Networks_Bioinsspired_Algorithms_Nov 20.ppt
Artificial Neural Networks_Bioinsspired_Algorithms_Nov 20.ppt
Anonymous9etQKwW
 
main
mainmain
Faster, More Effective Flowgraph-based Malware Classification
Faster, More Effective Flowgraph-based Malware ClassificationFaster, More Effective Flowgraph-based Malware Classification
Faster, More Effective Flowgraph-based Malware Classification
Silvio Cesare
 
MMseqs NGS 2014
MMseqs NGS 2014MMseqs NGS 2014
MMseqs NGS 2014
Martin Steinegger
 


Arbel oviedo

  • 1. Dependent Dirichlet processes and application to ecological data. Julyan Arbel. Joint work with Kerrie Mengersen & Judith Rousseau. CREST-INSEE, Université Paris-Dauphine. 2 December 2012, ERCIM 2012, 5th International Conference on Computing & Statistics.
  • 2. Outline. 1. Biology question: Introduction; Data. 2. Nonparametric model: Dirichlet process; Dependent Dirichlet process. (Julyan Arbel, DDP and ecological data)
  • 4. Biology introduction. A series of measurements was taken at different places around Casey Station, a permanent base in Antarctica. At each site: pollution level, and abundance of microbes called OTUs (operational taxonomic units). Goal: assess the impact of a pollutant on the soil composition and biodiversity.
  • 8. Data. Data consist of measurements of microbe abundance:

        Site    TPH   06251  00576  00429  06360  08793  06259  05164  00772
           1     80       3    724     88      1      0      0      0    467
           2     80       9   2364    252      0      0      2      0    616
           3     80      12    443   1655     11      0      0      0    168
         ...    ...     ...    ...    ...    ...    ...    ...    ...    ...
          13   2600    2262    339    229   1100    537    352      0      0
          20  10000    1883     23     18    879    224    325      9      1
          24  22000    1446      2     27    920   1808   1456      0      0

    Sample of abundance of 8 microbes (columns) at 6 sites (rows). The main covariate is a pollution level called TPH, denoted x.
  • 11. Notations. Microbe species are denoted by $j = 1, \ldots$, by decreasing total abundance. At each site x, there are $N(x)$ microbes, denoted $Y_i(x)$, $i = 1, \ldots, N(x)$. Data are a frequency matrix:

        Site   TPH      06251 (j = 1)             00576   ...   j                ...
        1      x = 80   #(Y_n(x = 80) = 1) = 3    ...     ...   ...              ...
        ...    ...      ...                       ...     ...   ...              ...
        k      x        ...                       ...     ...   #(Y_n(x) = j)    ...
  • 13. Notations. A standard example of diversity is Shannon diversity, taken as the exponential of Shannon entropy: $D(x) = \exp\big(-\sum_j p_j(x) \log p_j(x)\big)$, with $p_j(x) = \#(Y_n(x) = j) / N(x)$. Figure: Left: Shannon entropy in raw data, plotted against TPH. Right: Shannon diversity in raw data, plotted against TPH.
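The Shannon diversity of slide 13 can be computed directly from one row of the frequency matrix. The sketch below is only an illustration: the function name `shannon_diversity` is made up here, and the two rows are copied from the 8-OTU sample table of slide 8, so the values do not reproduce the figure, which presumably uses the full data.

```python
import numpy as np

def shannon_diversity(counts):
    """Exponential of the Shannon entropy of the empirical frequencies."""
    counts = np.asarray(counts, dtype=float)
    p = counts / counts.sum()          # p_j(x) = #(Y_n(x) = j) / N(x)
    p = p[p > 0]                       # convention: 0 * log(0) = 0
    entropy = -np.sum(p * np.log(p))   # Shannon entropy
    return float(np.exp(entropy))

# Two rows of the sample table (only the 8 OTUs shown on slide 8).
site_1 = [3, 724, 88, 1, 0, 0, 0, 467]           # TPH = 80
site_24 = [1446, 2, 27, 920, 1808, 1456, 0, 0]   # TPH = 22000

print(shannon_diversity(site_1), shannon_diversity(site_24))
```

The diversity equals the number of species when all species are equally abundant, and equals 1 when a single species is present.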
  • 17. First model. Pavlovian conditioning associated with the word species leads to the Dirichlet process and/or related processes. First, we run an independent model at each site with TPH x: $Y_i(x) \mid G \sim G$, $G(\cdot) = \sum_{j=1}^{\infty} p_j \delta_j(\cdot)$, $(p_j)_j \sim \mathrm{GEM}(M)$. The GEM(M) distribution is defined in [Pitman, 2002] (GEM stands for Griffiths, Engen and McCloskey) and represents the distribution of the weights in a Dirichlet process: $p_j = V_j \prod_{l<j} (1 - V_l)$, $V_j \sim \mathrm{Beta}(1, M)$.
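The stick-breaking definition of the GEM(M) weights on slide 17 can be simulated as follows. This is a minimal sketch, not the authors' code: the function name, the truncation level and the choice of setting the last break to 1 (so that the truncated weights sum to one) are assumptions made for the illustration.

```python
import numpy as np

def sample_gem_weights(M, truncation, rng):
    """Draw truncated stick-breaking weights p_j = V_j * prod_{l<j} (1 - V_l)."""
    V = rng.beta(1.0, M, size=truncation)   # V_j ~ Beta(1, M), independent
    V[-1] = 1.0                             # assumption: close the stick at the truncation level
    # Length of stick remaining before break j: prod_{l<j} (1 - V_l).
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - V[:-1])))
    return V * remaining

rng = np.random.default_rng(0)
weights = sample_gem_weights(M=2.0, truncation=50, rng=rng)
print(weights[:5], weights.sum())
```

A larger M spreads the mass over more species, while a small M concentrates it on the first few.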
  • 20. Posterior sampling. We use a blocked Gibbs sampler (a truncated version of the infinite sum). The prior on p is induced by the Beta prior on V, π⊥(V_j) = Be(1, M). This prior is conjugate, with a Beta posterior: $\pi(V_j \mid Y) = \mathrm{Be}\big(V_j \mid 1 + \#(Y_n = j),\ M + \#(Y_n > j)\big)$.
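The conjugate update of slide 20 can be sketched for a single site as below. The function names and the example counts are hypothetical, and the truncation level is simply taken to be the number of species passed in; this illustrates the Beta posterior only, not the full sampler used in the talk.

```python
import numpy as np

def tail_counts(counts):
    """#(Y_n > j): number of observations whose species index exceeds j."""
    counts = np.asarray(counts, dtype=float)
    return counts[::-1].cumsum()[::-1] - counts

def sample_posterior_breaks(counts, M, n_draws, rng):
    """Draw V_j | Y ~ Beta(1 + #(Y_n = j), M + #(Y_n > j)) for every j."""
    counts = np.asarray(counts, dtype=float)
    a = 1.0 + counts
    b = M + tail_counts(counts)
    return rng.beta(a, b, size=(n_draws, counts.size))

rng = np.random.default_rng(1)
counts = [30, 20, 10]            # made-up abundances, species sorted by index j
draws = sample_posterior_breaks(counts, M=1.0, n_draws=5000, rng=rng)
print(draws.mean(axis=0))        # Monte Carlo posterior means of V_1, V_2, V_3
```

Posterior draws of the weights follow by applying the stick-breaking product to each row of `draws`.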
  • 24. Second model. But we want to run a single model across TPH x, that is, a predictor-dependent model. Early references to predictor-dependent DP models include Cifarelli and Regazzini [1978] and Muliere and Petrone [1993]. Interest has been increasing since MacEachern [1999, 2000, 2001]. Extensions with varying weights include, among others, order-based DDP [Griffin and Steel, 2006], local DP [Chung and Dunson, 2009], weighted mixtures of DP [Dunson and Park, 2008], and kernel stick-breaking processes [Dunson et al., 2007].
  • 28. Second model. We are only interested in a dependence in the weights. We worked out a dependent process prior with a simple structure of dependence on the weights: $Y_i(x) \mid G(x) \sim G(x)$, $G(x)(\cdot) = \sum_{j=1}^{\infty} p_j(x) \delta_j(\cdot)$, $(p_j(x))_j \sim \mathrm{DGEM}(M)$, with $p_j(x) = V_j(x) \prod_{l<j} (1 - V_l(x))$ and $V_j(x) \sim \mathrm{Beta}(1, M)$, where DGEM(M) stands for Dependent GEM distribution. We want a process $(V_j(x))_x$ for each j that is marginally Beta(1, M).
  • 34. Process on the beta breaks, $V_j(x)$. Construction from Trippa, Müller and Johnson [2011]. Diagram (not reproduced): regions labelled $\alpha_1, \alpha_2, \alpha_3, \alpha_{12}, \alpha_{23}, \alpha_{123}$ around the points $x_1, x_2, x_3$. $V(x_1) = \dfrac{\Gamma(x_1)}{\Gamma(x_1) + \Gamma^M(x_1)}$, where $\Gamma(x_1) = \Gamma_1 + \Gamma_{12} + \Gamma_{123}$ with $\Gamma_1 \sim \mathrm{Ga}(\alpha_1), \ldots, \Gamma_{123} \sim \mathrm{Ga}(\alpha_{123})$, and $\Gamma^M(x_1) = \Gamma^M_1 + \Gamma^M_{12} + \Gamma^M_{123}$ with $\Gamma^M_1 \sim \mathrm{Ga}(\alpha_1 M), \ldots, \Gamma^M_{123} \sim \mathrm{Ga}(\alpha_{123} M)$. In the end: $p_j(x) = V_j(x) \prod_{l<j} (1 - V_l(x)) \sim \mathrm{DGEM}(M)$.
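The shared-Gamma construction of slide 34 can be simulated with a few lines. This is a sketch under stated assumptions rather than the authors' implementation: the region memberships of $x_2$ and $x_3$ are read off the labels of the diagram, the numerical values of the $\alpha$'s are made up, and they are chosen so that the regions attached to each site sum to 1, which gives a Beta(1, M) marginal at every site.

```python
import numpy as np

def sample_dependent_breaks(alpha, regions, M, n_draws, rng):
    """Draw V(x) = Gamma(x) / (Gamma(x) + Gamma^M(x)) jointly for all sites."""
    # One pair of independent Gamma variables per region, shared across sites.
    g = {k: rng.gamma(a, size=n_draws) for k, a in alpha.items()}
    g_m = {k: rng.gamma(a * M, size=n_draws) for k, a in alpha.items()}
    breaks = {}
    for site, keys in regions.items():
        num = sum(g[k] for k in keys)            # Gamma(x)
        den = num + sum(g_m[k] for k in keys)    # Gamma(x) + Gamma^M(x)
        breaks[site] = num / den
    return breaks

# Hypothetical region measures: the regions of each site sum to 1.
alpha = {"1": 0.4, "2": 0.2, "3": 0.4, "12": 0.2, "23": 0.2, "123": 0.4}
# Assumed memberships, inferred from the labels of the diagram.
regions = {
    "x1": ["1", "12", "123"],
    "x2": ["2", "12", "23", "123"],
    "x3": ["3", "23", "123"],
}

M = 2.0
rng = np.random.default_rng(2)
V = sample_dependent_breaks(alpha, regions, M, n_draws=20000, rng=rng)
print({site: v.mean() for site, v in V.items()})
print(np.corrcoef(V["x1"], V["x2"])[0, 1])
```

Sites that share more region mass share more Gamma variables, so their breaks are more strongly correlated, while each marginal stays Beta(1, M).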
  • 35. Interesting features. This idea can be extended to high-dimensional covariate spaces (diagram, not reproduced: regions labelled $\alpha_1, \alpha_2, \alpha_3, \alpha_{12}, \alpha_{23}, \alpha_{123}$ around the points $x_1, x_2, x_3$). It is easy to simulate from: one only needs to simulate Gamma random variables.
  • 36. Posterior sampling. There is independence across j, so it suffices to be able to simulate from each posterior: $\pi(V_j \mid Y) \propto \pi(V_j) L(Y \mid V_j) \propto \pi(V_j) \prod_x V_j(x)^{\#(Y_n(x) = j)} (1 - V_j(x))^{\#(Y_n(x) > j)}$. This is a quite uncommon situation: we can sample from the prior $\pi(V_j)$, but we cannot evaluate it. It is the reverse of Approximate Bayesian computation (ABC), where the likelihood is intractable but can be sampled from.
  • 38. A first solution is to use a Metropolis-Hastings algorithm. Metropolis-Hastings algorithm: 1. Given a current value $V_j$, sample a new one $V_j^*$ independently from the prior $\pi(V_j)$. 2. The acceptance probability is $\rho = \min\left\{1, \dfrac{L(Y \mid V_j^*)}{L(Y \mid V_j)}\right\}$. But it is not a good idea to propose from the prior: the acceptance rate is low (around 1%).
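The two steps of slide 38 amount to an independence Metropolis-Hastings sampler whose proposal is the prior, so the prior density cancels and only the likelihood ratio is needed. The sketch below assumes a hypothetical two-site prior `toy_prior` built from shared Gamma variables and made-up counts; it is meant to show the mechanics, not to reproduce the acceptance rate reported in the talk.

```python
import numpy as np

def log_lik(V, n_eq, n_gt):
    """log L(Y | V_j) = sum_x [#(Y_n(x)=j) log V(x) + #(Y_n(x)>j) log(1 - V(x))]."""
    return float(np.sum(n_eq * np.log(V) + n_gt * np.log1p(-V)))

def toy_prior(rng, M=2.0, a_shared=0.5):
    """Hypothetical prior on (V(x1), V(x2)): one shared and two site-specific regions."""
    a = np.array([a_shared, 1.0 - a_shared, 1.0 - a_shared])
    g = rng.gamma(a)          # Gamma variables for the numerator
    g_m = rng.gamma(a * M)    # Gamma variables added in the denominator
    num = np.array([g[0] + g[1], g[0] + g[2]])
    den = num + np.array([g_m[0] + g_m[1], g_m[0] + g_m[2]])
    return num / den

def independence_mh(sample_prior, n_eq, n_gt, n_iter, rng):
    V = sample_prior(rng)
    ll = log_lik(V, n_eq, n_gt)
    chain, accepted = [], 0
    for _ in range(n_iter):
        V_new = sample_prior(rng)                  # step 1: propose from the prior
        ll_new = log_lik(V_new, n_eq, n_gt)
        if np.log(rng.uniform()) < ll_new - ll:    # step 2: accept with probability rho
            V, ll = V_new, ll_new
            accepted += 1
        chain.append(V)
    return np.array(chain), accepted / n_iter

rng = np.random.default_rng(3)
n_eq = np.array([3.0, 12.0])    # made-up counts #(Y_n(x) = j) at two sites
n_gt = np.array([20.0, 15.0])   # made-up counts #(Y_n(x) > j) at two sites
chain, rate = independence_mh(toy_prior, n_eq, n_gt, n_iter=2000, rng=rng)
print(rate, chain.mean(axis=0))
```

With larger counts the likelihood becomes more peaked relative to the prior, and fewer prior proposals are accepted.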
  • 40. A better solution is to use importance sampling. Importance sampling: 1. Sample iid values $V_j$ from the prior $\pi(V_j)$. 2. Weight the sample by the importance weights defined by the likelihood, $w(V_j) = L(Y \mid V_j)$. Advantages: an iid sample instead of a Markov chain; better precision by a Rao-Blackwellisation argument (weights instead of accept-reject).
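The importance sampling scheme of slide 40 can be sketched in vectorised form. As before, the two-site prior and the counts are hypothetical stand-ins; the weights are computed on the log scale and normalised, and the effective sample size is reported as a rough diagnostic that is not mentioned in the slides.

```python
import numpy as np

def sample_prior(n_draws, M, rng, a_shared=0.5):
    """Hypothetical prior on (V(x1), V(x2)) with one shared region; returns shape (n_draws, 2)."""
    a = np.array([a_shared, 1.0 - a_shared, 1.0 - a_shared])
    g = rng.gamma(a, size=(n_draws, 3))
    g_m = rng.gamma(a * M, size=(n_draws, 3))
    num = g[:, [0]] + g[:, 1:]
    den = num + g_m[:, [0]] + g_m[:, 1:]
    return num / den

def importance_weights(V, n_eq, n_gt):
    """Normalised weights proportional to w(V_j) = L(Y | V_j)."""
    log_w = (n_eq * np.log(V) + n_gt * np.log1p(-V)).sum(axis=1)
    w = np.exp(log_w - log_w.max())   # subtract the maximum for numerical stability
    return w / w.sum()

rng = np.random.default_rng(4)
n_eq = np.array([3.0, 12.0])    # made-up counts #(Y_n(x) = j) at two sites
n_gt = np.array([20.0, 15.0])   # made-up counts #(Y_n(x) > j) at two sites

V = sample_prior(n_draws=20000, M=2.0, rng=rng)   # step 1: iid draws from the prior
w = importance_weights(V, n_eq, n_gt)             # step 2: likelihood weights
posterior_mean = w @ V                            # weighted estimate of E[V_j(x) | Y]
ess = 1.0 / np.sum(w ** 2)                        # effective sample size
print(posterior_mean, ess)
```

Every prior draw contributes to the estimate through its weight, instead of being either kept or discarded as in the accept-reject step.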
  • 41. Results. Figure: Left: dependent DP prior, posterior mean of the Shannon diversity by TPH, with 95% centred credible intervals. Right: Shannon diversity in raw data.
  • 42. Conclusion. Such a model gives probabilistic answers to questions about diversity, since we obtain a posterior sample. Using Gaussian processes transformed to Beta processes by the inverse CDF might speed up the posterior computations. A possible extension is to handle other covariates.