Model for Estimating Population Diversity as the
 Prediction of Sample needed for full Coverage
      with Applications in Bioinformatics

             Torres, David A., Pericchi, Luis R.
                Department of Mathematics
            University of Puerto Rico, Rio Piedras.
Abstract
  Several methods exist for estimating community diversity
from sample coverage (Bunge and Fitzpatrick 1993). Biologists
and environmental scientists have challenged statisticians to
solve this problem. Here we present an approach to the
estimation that combines a coverage model (Good, I. J., 1953)
with a population estimator (Good, I. J. and G. H. Toulmin,
1956). We apply the method to data on the microbial diversity
present in the crop of the hoatzin, obtained by molecular
analysis of cloned 16S rRNA genes.
Introduction
• Estimating the number of species in a community is
  a classical problem in ecology, biogeography, and
  conservation biology, and parallel problems arise in
  many other disciplines. The topic has been discussed
  extensively in the literature; see Bunge and
  Fitzpatrick (1993) and Seber (1982, 1986, 1992) for
  reviews of the historical and theoretical development.
• Ecologists and other biologists have long
  recognized that there are undiscovered species in
  almost every survey or species inventory. A parallel
  problem asks how many words a particular author
  knew (Efron, B. and Thisted, R., 1976).
• A random sample is taken from a community. We
  will refer to this sample as the basic sample.

• Our intention is to calculate an estimator of the
  coverage of the community from the information in
  the basic sample, and then to estimate the number of
  species in the community.

• Moreover, we intend to describe a method that
  estimates how much additional data is needed to
  reach total coverage of the community.

• An example is presented to illustrate the theory.
Methods
• A random sample of size N is drawn from a
  community. Let $n_r$ be the number of distinct
  species represented exactly r times in the
  sample; then

$$\sum_{r=1}^{\infty} r\,n_r = N. \tag{1}$$
• We shall be concerned with $q_r$, the community
  frequency of an arbitrary species that is
  represented r times in the basic sample.
• Let $E(q_r)$ be the expected value of $q_r$. A main
  result used by Good (1953) is that

$$E(q_r) = \frac{r^{*}}{N}, \tag{2}$$

  where $r^{*} = \dfrac{(r+1)\,n_{r+1}}{n_r}$.
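As a minimal sketch of equations (1) and (2), the frequency counts $n_r$ and Good's adjusted count $r^{*}$ can be computed directly from a list of species labels. The sample below and the function names `frequency_counts` and `adjusted_count` are hypothetical and chosen only for illustration; they are not part of the original analysis.

```python
from collections import Counter

def frequency_counts(sample):
    """Return {r: n_r}: how many distinct species occur exactly r times."""
    abundance = Counter(sample)               # species label -> times observed
    return dict(Counter(abundance.values()))  # r -> n_r

def adjusted_count(n, r):
    """Good's r* = (r + 1) * n_{r+1} / n_r (undefined when n_r = 0)."""
    if n.get(r, 0) == 0:
        raise ValueError(f"n_{r} is zero, so r* is undefined")
    return (r + 1) * n.get(r + 1, 0) / n[r]

# Hypothetical basic sample of species labels (illustrative only).
sample = list("aaaa" "bbb" "ccc" "ddeeffgghh" "ijklmnopqr")
N = len(sample)
n = frequency_counts(sample)

# Equation (1): the counts must add back up to the sample size.
assert sum(r * n_r for r, n_r in n.items()) == N

for r in sorted(n):
    r_star = adjusted_count(n, r)
    print(f"r={r}  n_r={n[r]}  r*={r_star:.3f}  E(q_r)~{r_star / N:.4f}")
```

With raw counts, $r^{*}$ is zero for the largest observed r because $n_{r+1} = 0$ there; Good (1953) discusses smoothing the $n_r$ before applying (2), which this sketch does not attempt.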
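• This can be generalized to give the higher
  moments of $q_r$. As a matter of fact,

$$E(q_r^{m}) = \frac{(r+m)^{(m)}}{N^{m}}\,\frac{n_{r+m}}{n_r}, \tag{3}$$

where $r = 1, 2, 3, \ldots$; $m = 1, 2, 3, \ldots$, and $t^{(m)}$ is the falling factorial

$$t^{(m)} = \prod_{i=t-m+1}^{t} i = t\,(t-1)\cdots(t-m+1).$$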
• Recursively, we can rewrite (3) as

$$E(q_r^{m}) \approx \prod_{i=r}^{r+m-1} E(q_i).$$
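The agreement between the closed form (3) and the product of first moments can be checked numerically. The sketch below uses hypothetical frequency counts and hypothetical function names; it assumes every intermediate $n_i$ in the product is nonzero, since the product telescopes only in that case.

```python
from math import isclose, prod

def falling_factorial(t, m):
    """t^(m) = t (t - 1) ... (t - m + 1)."""
    return prod(range(t - m + 1, t + 1))

def moment(n, N, r, m):
    """Equation (3): E(q_r^m) ~ (r+m)^(m) n_{r+m} / (N^m n_r)."""
    return falling_factorial(r + m, m) * n.get(r + m, 0) / (N**m * n[r])

def moment_by_product(n, N, r, m):
    """Product of E(q_i) = (i+1) n_{i+1} / (N n_i) for i = r, ..., r+m-1."""
    return prod((i + 1) * n.get(i + 1, 0) / (N * n[i]) for i in range(r, r + m))

# Hypothetical frequency counts {r: n_r} (illustrative only).
n = {1: 10, 2: 5, 3: 2, 4: 1}
N = sum(r * n_r for r, n_r in n.items())

for m in (1, 2, 3):
    closed = moment(n, N, 1, m)
    product = moment_by_product(n, N, 1, m)
    print(f"m={m}  closed form={closed:.6g}  product={product:.6g}")
    assert isclose(closed, product)
```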

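• Moreover, the variance of $q_r$ is approximately

$$V(q_r) \approx \frac{(r+1)(r+2)}{N^{2}}\,\frac{n_{r+2}}{n_r} - \left[\frac{(r+1)\,n_{r+1}}{N\,n_r}\right]^{2},$$

  that is, $E(q_r^{2}) - [E(q_r)]^{2}$ with the moments taken from (2) and (3).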
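A short sketch of the variance approximation, again on hypothetical counts, shows one practical caveat: with raw (unsmoothed) $n_r$ the estimate can come out negative, for example when $n_{r+2} = 0$. The function name `variance_q` is an assumption made for this illustration.

```python
def variance_q(n, N, r):
    """V(q_r) ~ E(q_r^2) - E(q_r)^2, using raw frequency counts n = {r: n_r}."""
    second_moment = (r + 1) * (r + 2) * n.get(r + 2, 0) / (N**2 * n[r])
    first_moment = (r + 1) * n.get(r + 1, 0) / (N * n[r])
    return second_moment - first_moment**2

# Hypothetical frequency counts {r: n_r} (illustrative only).
n = {1: 10, 2: 5, 3: 2, 4: 1}
N = sum(r * n_r for r, n_r in n.items())

for r in (1, 2, 3):
    print(f"r={r}  V(q_r)~{variance_q(n, N, r):.6f}")
```

The negative value at r = 3 is an artifact of the sparse tail of the counts rather than a meaningful variance, which is one reason smoothed counts are usually preferred in practice.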
• Note that $(r+1)\,n_{r+1} \le \sum_{i=1}^{\infty} i\,n_i = N$, so that

$$\frac{r^{*}\,n_r}{N} = \frac{(r+1)\,n_{r+1}}{N} \le 1.$$
An estimator of the expected total chance of all species that
are each represented r times ($r \ge 1$) in the basic sample is

$$\frac{(r+1)\,n_{r+1}}{N}.$$

Also, the expected total chance of all species that are represented
r times or more in the sample is approximately

$$\frac{1}{N}\sum_{i=r+1}^{\infty} i\,n_i.$$

In particular, the expected total chance of all species represented
in the sample is approximately

$$\frac{1}{N}\sum_{k=2}^{\infty} k\,n_k \approx 1 - \frac{E(n_1)}{N}. \tag{4}$$
• Hence the total coverage of the sample (i.e.
the proportion of the community represented in the
sample, which is the sum of the population
frequencies of the species represented) is
approximately

$$1 - \frac{E(n_1)}{N} \approx 1 - \frac{n_1}{N}. \tag{5}$$

The chance that the next member of the community sampled will
belong to a new species is estimated as $n_1/N$.
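Equation (5) needs only the number of singletons and the sample size. The sketch below, on hypothetical counts, also confirms that with raw counts the sum in (4) gives the same value; `good_coverage` is a name chosen for this illustration.

```python
def good_coverage(n, N):
    """Equation (5): estimated sample coverage 1 - n_1 / N."""
    return 1.0 - n.get(1, 0) / N

# Hypothetical frequency counts {r: n_r} (illustrative only).
n = {1: 10, 2: 5, 3: 2, 4: 1}
N = sum(r * n_r for r, n_r in n.items())

coverage = good_coverage(n, N)
# Equation (4) evaluated with the raw counts: (1/N) * sum over k >= 2 of k * n_k.
seen_mass = sum(k * n_k for k, n_k in n.items() if k >= 2) / N
new_species_chance = n.get(1, 0) / N

print(f"coverage ~ {coverage:.3f}")
print(f"chance that the next individual is a new species ~ {new_species_chance:.3f}")
```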
Let us write the total number of distinct species in the
sample as

$$d = \sum_{x=1}^{\infty} n_x,$$

and suppose that the total number of distinct species
in the community is a known finite number s. Then the
number of species not represented in the sample is
given by $n_0 = s - d$.
• Let $p_\mu$ ($\mu = 1, 2, 3, \ldots, s$) be the
population frequencies of the species. As in
Good (1953), equation (10),

$$E(n_r) = \sum_{\mu=1}^{s} \frac{N!}{r!\,(N-r)!}\; p_\mu^{\,r}\,(1-p_\mu)^{N-r}. \tag{6}$$
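  For a sample of size λN drawn from the population we have similarly,
  assuming $p_\mu \le \tfrac{1}{2}$ for all μ and writing $n_r(\lambda)$ for the
  number of species represented r times in that sample,

$$
\begin{aligned}
E(n_r(\lambda)) &= \sum_{\mu=1}^{s} \binom{\lambda N}{r}\, p_\mu^{\,r}\,(1-p_\mu)^{\lambda N-r} \\
&= \sum_{\mu=1}^{s} \binom{\lambda N}{r}\, p_\mu^{\,r}\,(1-p_\mu)^{N-r} \left[1+\frac{p_\mu}{1-p_\mu}\right]^{-(\lambda-1)N} \\
&= \sum_{\mu=1}^{s} \binom{\lambda N}{r}\, p_\mu^{\,r}\,(1-p_\mu)^{N-r} \sum_{i=0}^{\infty} \binom{-(\lambda-1)N}{i} \left(\frac{p_\mu}{1-p_\mu}\right)^{i} \\
&= \sum_{i=0}^{\infty} \binom{\lambda N}{r} \binom{-(\lambda-1)N}{i} \sum_{\mu=1}^{s} p_\mu^{\,r+i}\,(1-p_\mu)^{N-r-i} \\
&= \sum_{i=0}^{\infty} \frac{(\lambda N)^{(r)}\,\bigl(-(\lambda-1)N\bigr)^{(i)}\,(r+i)!}{r!\,i!\,N^{(r+i)}}\; E(n_{r+i}) \\
&\approx \lambda^{r} \sum_{i=0}^{\infty} (-1)^{i}\,\frac{(r+i)!}{r!\,i!}\,(\lambda-1)^{i}\,E(n_{r+i}).
\end{aligned}
\tag{7}
$$

  Here $\binom{a}{k} = a^{(k)}/k!$ is the generalized binomial coefficient, with
  $a^{(k)} = a\,(a-1)\cdots(a-k+1)$.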
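One way to see how well the final approximation in (7) behaves is to compare it with the exact expectation from (6) evaluated at the enlarged sample size. The sketch below does this for a made-up community of 50 equally common species; the community, the sample sizes, and the function names are all assumptions chosen so that the comparison is easy to follow.

```python
from math import comb

def expected_nr(p, N, r):
    """Equation (6): E(n_r) for a sample of size N from frequencies p."""
    return sum(comb(N, r) * q**r * (1 - q)**(N - r) for q in p)

def extrapolated_nr(e, r, lam, terms):
    """Equation (7), truncated after `terms` terms; e(k) supplies E(n_k)."""
    return lam**r * sum(
        (-1)**i * comb(r + i, i) * (lam - 1)**i * e(r + i)
        for i in range(terms)
    )

# Hypothetical community: s = 50 equally common species, so p_mu = 0.02 <= 1/2.
p = [1 / 50] * 50
N, lam = 40, 1.5

def e(k):
    """E(n_k) in the basic sample of size N, from equation (6)."""
    return expected_nr(p, N, k)

for r in (0, 1, 2):
    exact = expected_nr(p, int(lam * N), r)          # equation (6) at size lam * N
    approx = extrapolated_nr(e, r, lam, N - r + 1)   # equation (7) from size N
    print(f"r={r}  exact={exact:.3f}  extrapolated={approx:.3f}")
```

In real use the expectations $E(n_k)$ are unknown and are replaced by the observed (or smoothed) counts $n_k$, which adds sampling noise that this idealized comparison leaves out.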
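• For the case r = 0 we need not assume the value of s,
  since it cancels when we write

$$\hat{d}(\lambda) - d = \bigl[s - n_0(\lambda)\bigr] - \bigl[s - n_0\bigr] = \sum_{i=1}^{\infty} (-1)^{i+1}\,(\lambda-1)^{i}\,n_i, \tag{8}$$

  where $\hat{d}(\lambda)$ is the estimated number of distinct species in a sample of size λN.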
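Equation (8) is an alternating series in the observed counts, so it is simple to evaluate. The following sketch uses hypothetical counts and a hypothetical function name; it is meant only to show the arithmetic, and the series is generally regarded as unstable when λ is much larger than 2.

```python
def expected_new_species(n, lam):
    """Equation (8): sum over i >= 1 of (-1)^(i+1) (lam - 1)^i n_i."""
    return sum((-1)**(i + 1) * (lam - 1)**i * n_i for i, n_i in n.items())

# Hypothetical frequency counts {r: n_r} (illustrative only).
n = {1: 10, 2: 5, 3: 2, 4: 1}
d = sum(n.values())   # distinct species seen in the basic sample

for lam in (1.0, 1.5, 2.0):
    extra = expected_new_species(n, lam)
    print(f"lambda={lam}  new species ~ {extra:.4f}  total ~ {d + extra:.4f}")
```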

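• We may be particularly interested in the coverage of the
  community. Using equations (5) and (7) with r = 1, the
  expected coverage of a sample of size λN is approximately

$$1 - \frac{E(n_1(\lambda))}{\lambda N} \approx 1 - \frac{1}{N}\left[n_1 - 2(\lambda-1)\,n_2 + 3(\lambda-1)^{2}\,n_3 - \cdots\right], \tag{9}$$

  which reduces to Good's estimate $1 - n_1/N$ at λ = 1.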
• The expected number of distinct species
  represented in a sample of size λN is approximately

$$d + (\lambda-1)\,n_1 - (\lambda-1)^{2}\,n_2 + \cdots$$

• We use the coverage to estimate the value of λ
  and, from it, the sample size λN needed
  to reach 100% coverage. We call equation (9) the
  Good-Toulmin model because it merges the two
  models proposed by those authors.
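One possible reading of this step is to look for the value of λ at which the right side of equation (9) reaches 1, that is, where the predicted number of singletons vanishes. The sketch below does this by bisection on hypothetical counts; the counts, the bracket [1, 2], and the function names are assumptions, and the original analysis may have located λ differently (for instance graphically).

```python
def coverage(n, N, lam):
    """Right side of equation (9) for an enlarged sample of size lam * N."""
    series = sum(
        (-1)**(i - 1) * i * (lam - 1)**(i - 1) * n_i for i, n_i in n.items()
    )
    return 1.0 - series / N

def lam_for_full_coverage(n, N, lo=1.0, hi=2.0, tol=1e-9):
    """Bisect on [lo, hi] for the lam at which the predicted coverage is 1."""
    def uncovered(lam):
        return 1.0 - coverage(n, N, lam)   # predicted uncovered fraction
    if uncovered(lo) * uncovered(hi) > 0:
        raise ValueError("coverage does not reach 100% inside the bracket")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if uncovered(lo) * uncovered(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2

# Hypothetical frequency counts {r: n_r} (illustrative only).
n = {1: 10, 2: 12, 3: 4}
N = sum(r * n_r for r, n_r in n.items())

lam = lam_for_full_coverage(n, N)
print(f"current coverage ~ {coverage(n, N, 1.0):.3f}")
print(f"lambda ~ {lam:.3f}, additional individuals ~ {(lam - 1) * N:.1f}")
```

Bisection is used here only because it needs no derivatives and fails loudly when the bracket contains no root; any one-dimensional root finder would serve.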
Application

• The hoatzin is a South American leaf-eating bird. Its
  uniqueness lies in its particular foregut (crop), the only
  one known in the avian class.
• Forestomach compartmentalization allows mammalian
  herbivores to be nourished by microbial fermentation
  products and microbial biomass. Bacteria are largely
  responsible for the fermentation of dietary components, and
  bacterial cells are themselves subject to digestion by
  gastric lysozyme expressed in the abomasum of
  ruminants.
• The evolutionary pressure towards foregut specialization
  in herbivores was presumably exerted by indigestible
  plant polymers (cellulose), so that producing
  microbial biomass at the expense of these indigestible
  materials has clear advantages.

• In the hoatzin, a preliminary characterization of the crop
  microflora was done by culture (Domínguez-Bello et al.,
  1993). In this study we aim to characterize the bacterial
  diversity in the crop of the hoatzin by molecular
  analysis of cloned 16S rRNA genes.
Results

• For the 69 O.T.U.s obtained, Good's method (the left side of
  equation (9) at λ = 1, i.e. equation (5)) indicated a coverage
  of diversity of 77%.



• This means that 100% diversity corresponds to about 90
  O.T.U.s. Applying the Good and Toulmin
  model (figure 2), we estimate λ = 1.5, which means that
  we need 98 (300 − 202) additional clones to obtain the 21
  (90 − 69) O.T.U.s still needed to cover 100% of the diversity.
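The arithmetic behind these figures can be retraced from the reported values alone. The sketch below assumes that the 90 O.T.U. figure comes from dividing the observed O.T.U.s by the estimated coverage, which matches the reported numbers but is not stated explicitly on the slides.

```python
clones = 202       # sequences analysed (reported)
otus = 69          # O.T.U.s observed (reported)
coverage = 0.77    # Good's coverage estimate (reported)
lam = 1.5          # Good-Toulmin lambda (reported)

# Assumption: total O.T.U.s at 100% coverage = observed O.T.U.s / coverage.
total_otus = round(otus / coverage)
target_clones = lam * clones            # 303, which the slides round to 300
additional_clones = 300 - clones
otus_still_missing = total_otus - otus  # from the reported 90 and 69

print(f"O.T.U.s at 100% coverage ~ {total_otus}")
print(f"target clones ~ {target_clones:.0f}, additional clones = {additional_clones}")
print(f"O.T.U.s still missing ~ {otus_still_missing}")
```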
Conclusions (Application)

• The estimate indicates that 300 clones are needed to
  represent 100% of the sample diversity. 99% of the clones
  and 88% of the O.T.U.s analyzed are unidentified species.



• Based on 202 sequences yielding 69 O.T.U.s, the Good and
  Toulmin estimator indicates a coverage of 77% of the
  total diversity.
Future Research
• Many models and procedures attempt to calculate
  coverage, so instead of Good's coverage estimator
  it would be interesting to try another approach.
  A Poisson process or a multinomial approach
  might give better estimators. Another
  possibility is Bayesian inference, assuming an
  unknown distribution, within a Metropolis-Hastings
  procedure.
• The importance of this type of problem lies in
  experimental design.
• Good once stated, “I don’t believe it is usually
  possible to estimate the number of unseen species …
  but only an approximate lower bound to that number.”
  We will keep on the road.
Literature cited
•   Godoy, Filipa (1), Gao, Z. (2), Pei, Z. (2), Zhou, M. (2), Garcia-Amado,
    M. A. (3), Pericchi, L. R. (4), Torres, D. (4), Michelangeli, F. (3),
    Blaser, M. J. (2), Domínguez-Bello, M. G. (1). High bacterial diversity
    in the forestomach of the Hoatzin is revealed by molecular analysis of
    16S rRNA Genes.
    (1) Department of Biology, University of Puerto Rico, Rio Piedras, San Juan,
    PR 00931. (2) Departments of Medicine, Pathology and Microbiology, New
    York University School of Medicine, New York, NY 10016. (3) Venezuelan
    Institute of Scientific Research, CBB, Caracas, Venezuela. (4) Department of
    Mathematics, University of Puerto Rico, Rio Piedras, San Juan, PR 00931.
•   Chao, A. and Lee, S., 1992. Estimating the Number of Classes via Sample
    Coverage. Journal of the American Statistical Association 87: 210-217.
•   Domínguez-Bello, M. G., M. Lovera, P. Suarez and F. Michelangeli, 1993.
    Microbial inhabitants in the crop of the hoatzin (Opisthocomus
    hoazin): the only foregut fermented avian. Physiol. Zool. 66: 374-383.
•   Good, I. J. and G. H. Toulmin, 1956. The number of new species and the
    increase in population coverage when the sample is increased.
    Biometrika 43: 45-63.
•   Good, I. J., 1953. The Population Frequencies of Species and the
    Estimation of Population Parameters. Biometrika 40: 237-264.

More Related Content

What's hot

100 things I know
100 things I know100 things I know
100 things I know
r-uribe
 
Quantum modes - Ion Cotaescu
Quantum modes - Ion CotaescuQuantum modes - Ion Cotaescu
Quantum modes - Ion Cotaescu
SEENET-MTP
 
Tele3113 wk1wed
Tele3113 wk1wedTele3113 wk1wed
Tele3113 wk1wed
Vin Voro
 
L. Alvarez-Gaume - Minimal Inflation
L. Alvarez-Gaume - Minimal InflationL. Alvarez-Gaume - Minimal Inflation
L. Alvarez-Gaume - Minimal Inflation
SEENET-MTP
 
Savage-Dickey paradox
Savage-Dickey paradoxSavage-Dickey paradox
Savage-Dickey paradox
Christian Robert
 
次数制限モデルにおける全てのCSPに対する最適な定数時間近似アルゴリズムと近似困難性
次数制限モデルにおける全てのCSPに対する最適な定数時間近似アルゴリズムと近似困難性次数制限モデルにおける全てのCSPに対する最適な定数時間近似アルゴリズムと近似困難性
次数制限モデルにおける全てのCSPに対する最適な定数時間近似アルゴリズムと近似困難性
Yuichi Yoshida
 
Testing for mixtures by seeking components
Testing for mixtures by seeking componentsTesting for mixtures by seeking components
Testing for mixtures by seeking components
Christian Robert
 
IJCER (www.ijceronline.com) International Journal of computational Engineerin...
IJCER (www.ijceronline.com) International Journal of computational Engineerin...IJCER (www.ijceronline.com) International Journal of computational Engineerin...
IJCER (www.ijceronline.com) International Journal of computational Engineerin...
ijceronline
 
Scientific Computing with Python Webinar 9/18/2009:Curve Fitting
Scientific Computing with Python Webinar 9/18/2009:Curve FittingScientific Computing with Python Webinar 9/18/2009:Curve Fitting
Scientific Computing with Python Webinar 9/18/2009:Curve Fitting
Enthought, Inc.
 
Runtime Analysis of Population-based Evolutionary Algorithms
Runtime Analysis of Population-based Evolutionary AlgorithmsRuntime Analysis of Population-based Evolutionary Algorithms
Runtime Analysis of Population-based Evolutionary Algorithms
PK Lehre
 
Image Processing 4
Image Processing 4Image Processing 4
Image Processing 4
jainatin
 
Ch7
Ch7Ch7
Rank awarealgs small11
Rank awarealgs small11Rank awarealgs small11
Rank awarealgs small11
Jules Esp
 
Expressiveness and Model of the Polymorphic λ Calculus
Expressiveness and Model of the Polymorphic λ CalculusExpressiveness and Model of the Polymorphic λ Calculus
Expressiveness and Model of the Polymorphic λ Calculus
evastsdsh
 
Intro probability 2
Intro probability 2Intro probability 2
Intro probability 2
Phong Vo
 
Bregman divergences from comparative convexity
Bregman divergences from comparative convexityBregman divergences from comparative convexity
Bregman divergences from comparative convexity
Frank Nielsen
 
Exchange confirm
Exchange confirmExchange confirm
Exchange confirm
NBER
 
EM algorithm and its application in probabilistic latent semantic analysis
EM algorithm and its application in probabilistic latent semantic analysisEM algorithm and its application in probabilistic latent semantic analysis
EM algorithm and its application in probabilistic latent semantic analysis
zukun
 

What's hot (18)

100 things I know
100 things I know100 things I know
100 things I know
 
Quantum modes - Ion Cotaescu
Quantum modes - Ion CotaescuQuantum modes - Ion Cotaescu
Quantum modes - Ion Cotaescu
 
Tele3113 wk1wed
Tele3113 wk1wedTele3113 wk1wed
Tele3113 wk1wed
 
L. Alvarez-Gaume - Minimal Inflation
L. Alvarez-Gaume - Minimal InflationL. Alvarez-Gaume - Minimal Inflation
L. Alvarez-Gaume - Minimal Inflation
 
Savage-Dickey paradox
Savage-Dickey paradoxSavage-Dickey paradox
Savage-Dickey paradox
 
次数制限モデルにおける全てのCSPに対する最適な定数時間近似アルゴリズムと近似困難性
次数制限モデルにおける全てのCSPに対する最適な定数時間近似アルゴリズムと近似困難性次数制限モデルにおける全てのCSPに対する最適な定数時間近似アルゴリズムと近似困難性
次数制限モデルにおける全てのCSPに対する最適な定数時間近似アルゴリズムと近似困難性
 
Testing for mixtures by seeking components
Testing for mixtures by seeking componentsTesting for mixtures by seeking components
Testing for mixtures by seeking components
 
IJCER (www.ijceronline.com) International Journal of computational Engineerin...
IJCER (www.ijceronline.com) International Journal of computational Engineerin...IJCER (www.ijceronline.com) International Journal of computational Engineerin...
IJCER (www.ijceronline.com) International Journal of computational Engineerin...
 
Scientific Computing with Python Webinar 9/18/2009:Curve Fitting
Scientific Computing with Python Webinar 9/18/2009:Curve FittingScientific Computing with Python Webinar 9/18/2009:Curve Fitting
Scientific Computing with Python Webinar 9/18/2009:Curve Fitting
 
Runtime Analysis of Population-based Evolutionary Algorithms
Runtime Analysis of Population-based Evolutionary AlgorithmsRuntime Analysis of Population-based Evolutionary Algorithms
Runtime Analysis of Population-based Evolutionary Algorithms
 
Image Processing 4
Image Processing 4Image Processing 4
Image Processing 4
 
Ch7
Ch7Ch7
Ch7
 
Rank awarealgs small11
Rank awarealgs small11Rank awarealgs small11
Rank awarealgs small11
 
Expressiveness and Model of the Polymorphic λ Calculus
Expressiveness and Model of the Polymorphic λ CalculusExpressiveness and Model of the Polymorphic λ Calculus
Expressiveness and Model of the Polymorphic λ Calculus
 
Intro probability 2
Intro probability 2Intro probability 2
Intro probability 2
 
Bregman divergences from comparative convexity
Bregman divergences from comparative convexityBregman divergences from comparative convexity
Bregman divergences from comparative convexity
 
Exchange confirm
Exchange confirmExchange confirm
Exchange confirm
 
EM algorithm and its application in probabilistic latent semantic analysis
EM algorithm and its application in probabilistic latent semantic analysisEM algorithm and its application in probabilistic latent semantic analysis
EM algorithm and its application in probabilistic latent semantic analysis
 

Similar to Model For Estimating Diversity Presentation

S 7
S 7S 7
S 7
admin
 
Ysu conference presentation alaverdyan
Ysu conference  presentation alaverdyanYsu conference  presentation alaverdyan
Ysu conference presentation alaverdyan
Grigor Alaverdyan
 
Dsp U Lec08 Fir Filter Design
Dsp U   Lec08 Fir Filter DesignDsp U   Lec08 Fir Filter Design
Dsp U Lec08 Fir Filter Design
taha25
 
IGARSS2011 FR3.T08.3 BenDavid.pdf
IGARSS2011 FR3.T08.3 BenDavid.pdfIGARSS2011 FR3.T08.3 BenDavid.pdf
IGARSS2011 FR3.T08.3 BenDavid.pdf
grssieee
 
A note on arithmetic progressions in sets of integers
A note on arithmetic progressions in sets of integersA note on arithmetic progressions in sets of integers
A note on arithmetic progressions in sets of integers
Lukas Nabergall
 
Formulas statistics
Formulas statisticsFormulas statistics
Formulas statistics
Prashi_Jain
 
Analisis Korespondensi
Analisis KorespondensiAnalisis Korespondensi
Analisis Korespondensi
dessybudiyanti
 
WAVELET-PACKET-BASED ADAPTIVE ALGORITHM FOR SPARSE IMPULSE RESPONSE IDENTIFI...
WAVELET-PACKET-BASED ADAPTIVE ALGORITHM FOR  SPARSE IMPULSE RESPONSE IDENTIFI...WAVELET-PACKET-BASED ADAPTIVE ALGORITHM FOR  SPARSE IMPULSE RESPONSE IDENTIFI...
WAVELET-PACKET-BASED ADAPTIVE ALGORITHM FOR SPARSE IMPULSE RESPONSE IDENTIFI...
bermudez_jcm
 
Hypergeometric Distribution
Hypergeometric DistributionHypergeometric Distribution
Hypergeometric Distribution
mathscontent
 
Hypergeometric Distribution
Hypergeometric DistributionHypergeometric Distribution
Hypergeometric Distribution
DataminingTools Inc
 
Mcgill3
Mcgill3Mcgill3
Tele3113 wk11tue
Tele3113 wk11tueTele3113 wk11tue
Tele3113 wk11tue
Vin Voro
 
ABC with Wasserstein distances
ABC with Wasserstein distancesABC with Wasserstein distances
ABC with Wasserstein distances
Christian Robert
 
2 senarai rumus add maths k2 trial spm sbp 2010
2 senarai rumus add maths k2 trial spm sbp 20102 senarai rumus add maths k2 trial spm sbp 2010
2 senarai rumus add maths k2 trial spm sbp 2010
zabidah awang
 
2 senarai rumus add maths k1 trial spm sbp 2010
2 senarai rumus add maths k1 trial spm sbp 20102 senarai rumus add maths k1 trial spm sbp 2010
2 senarai rumus add maths k1 trial spm sbp 2010
zabidah awang
 
Color Img at Prisma Network meeting 2009
Color Img at Prisma Network meeting 2009Color Img at Prisma Network meeting 2009
Color Img at Prisma Network meeting 2009
Juan Luis Nieves
 
Game theory
Game theoryGame theory
Game theory
rik0
 
Dsp3
Dsp3Dsp3
ABC based on Wasserstein distances
ABC based on Wasserstein distancesABC based on Wasserstein distances
ABC based on Wasserstein distances
Christian Robert
 
Ex 7 2_fsc_part1
Ex 7 2_fsc_part1Ex 7 2_fsc_part1
Ex 7 2_fsc_part1
naeemniazi3
 

Similar to Model For Estimating Diversity Presentation (20)

S 7
S 7S 7
S 7
 
Ysu conference presentation alaverdyan
Ysu conference  presentation alaverdyanYsu conference  presentation alaverdyan
Ysu conference presentation alaverdyan
 
Dsp U Lec08 Fir Filter Design
Dsp U   Lec08 Fir Filter DesignDsp U   Lec08 Fir Filter Design
Dsp U Lec08 Fir Filter Design
 
IGARSS2011 FR3.T08.3 BenDavid.pdf
IGARSS2011 FR3.T08.3 BenDavid.pdfIGARSS2011 FR3.T08.3 BenDavid.pdf
IGARSS2011 FR3.T08.3 BenDavid.pdf
 
A note on arithmetic progressions in sets of integers
A note on arithmetic progressions in sets of integersA note on arithmetic progressions in sets of integers
A note on arithmetic progressions in sets of integers
 
Formulas statistics
Formulas statisticsFormulas statistics
Formulas statistics
 
Analisis Korespondensi
Analisis KorespondensiAnalisis Korespondensi
Analisis Korespondensi
 
WAVELET-PACKET-BASED ADAPTIVE ALGORITHM FOR SPARSE IMPULSE RESPONSE IDENTIFI...
WAVELET-PACKET-BASED ADAPTIVE ALGORITHM FOR  SPARSE IMPULSE RESPONSE IDENTIFI...WAVELET-PACKET-BASED ADAPTIVE ALGORITHM FOR  SPARSE IMPULSE RESPONSE IDENTIFI...
WAVELET-PACKET-BASED ADAPTIVE ALGORITHM FOR SPARSE IMPULSE RESPONSE IDENTIFI...
 
Hypergeometric Distribution
Hypergeometric DistributionHypergeometric Distribution
Hypergeometric Distribution
 
Hypergeometric Distribution
Hypergeometric DistributionHypergeometric Distribution
Hypergeometric Distribution
 
Mcgill3
Mcgill3Mcgill3
Mcgill3
 
Tele3113 wk11tue
Tele3113 wk11tueTele3113 wk11tue
Tele3113 wk11tue
 
ABC with Wasserstein distances
ABC with Wasserstein distancesABC with Wasserstein distances
ABC with Wasserstein distances
 
2 senarai rumus add maths k2 trial spm sbp 2010
2 senarai rumus add maths k2 trial spm sbp 20102 senarai rumus add maths k2 trial spm sbp 2010
2 senarai rumus add maths k2 trial spm sbp 2010
 
2 senarai rumus add maths k1 trial spm sbp 2010
2 senarai rumus add maths k1 trial spm sbp 20102 senarai rumus add maths k1 trial spm sbp 2010
2 senarai rumus add maths k1 trial spm sbp 2010
 
Color Img at Prisma Network meeting 2009
Color Img at Prisma Network meeting 2009Color Img at Prisma Network meeting 2009
Color Img at Prisma Network meeting 2009
 
Game theory
Game theoryGame theory
Game theory
 
Dsp3
Dsp3Dsp3
Dsp3
 
ABC based on Wasserstein distances
ABC based on Wasserstein distancesABC based on Wasserstein distances
ABC based on Wasserstein distances
 
Ex 7 2_fsc_part1
Ex 7 2_fsc_part1Ex 7 2_fsc_part1
Ex 7 2_fsc_part1
 

Recently uploaded

How to Create a More Engaging and Human Online Learning Experience
How to Create a More Engaging and Human Online Learning Experience How to Create a More Engaging and Human Online Learning Experience
How to Create a More Engaging and Human Online Learning Experience
Wahiba Chair Training & Consulting
 
คำศัพท์ คำพื้นฐานการอ่าน ภาษาอังกฤษ ระดับชั้น ม.1
คำศัพท์ คำพื้นฐานการอ่าน ภาษาอังกฤษ ระดับชั้น ม.1คำศัพท์ คำพื้นฐานการอ่าน ภาษาอังกฤษ ระดับชั้น ม.1
คำศัพท์ คำพื้นฐานการอ่าน ภาษาอังกฤษ ระดับชั้น ม.1
สมใจ จันสุกสี
 
MARY JANE WILSON, A “BOA MÃE” .
MARY JANE WILSON, A “BOA MÃE”           .MARY JANE WILSON, A “BOA MÃE”           .
MARY JANE WILSON, A “BOA MÃE” .
Colégio Santa Teresinha
 
BÀI TẬP BỔ TRỢ TIẾNG ANH LỚP 9 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2024-2025 - ...
BÀI TẬP BỔ TRỢ TIẾNG ANH LỚP 9 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2024-2025 - ...BÀI TẬP BỔ TRỢ TIẾNG ANH LỚP 9 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2024-2025 - ...
BÀI TẬP BỔ TRỢ TIẾNG ANH LỚP 9 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2024-2025 - ...
Nguyen Thanh Tu Collection
 
The basics of sentences session 6pptx.pptx
The basics of sentences session 6pptx.pptxThe basics of sentences session 6pptx.pptx
The basics of sentences session 6pptx.pptx
heathfieldcps1
 
Liberal Approach to the Study of Indian Politics.pdf
Liberal Approach to the Study of Indian Politics.pdfLiberal Approach to the Study of Indian Politics.pdf
Liberal Approach to the Study of Indian Politics.pdf
WaniBasim
 
RHEOLOGY Physical pharmaceutics-II notes for B.pharm 4th sem students
RHEOLOGY Physical pharmaceutics-II notes for B.pharm 4th sem studentsRHEOLOGY Physical pharmaceutics-II notes for B.pharm 4th sem students
RHEOLOGY Physical pharmaceutics-II notes for B.pharm 4th sem students
Himanshu Rai
 
The History of Stoke Newington Street Names
The History of Stoke Newington Street NamesThe History of Stoke Newington Street Names
The History of Stoke Newington Street Names
History of Stoke Newington
 
Film vocab for eal 3 students: Australia the movie
Film vocab for eal 3 students: Australia the movieFilm vocab for eal 3 students: Australia the movie
Film vocab for eal 3 students: Australia the movie
Nicholas Montgomery
 
Your Skill Boost Masterclass: Strategies for Effective Upskilling
Your Skill Boost Masterclass: Strategies for Effective UpskillingYour Skill Boost Masterclass: Strategies for Effective Upskilling
Your Skill Boost Masterclass: Strategies for Effective Upskilling
Excellence Foundation for South Sudan
 
BÀI TẬP DẠY THÊM TIẾNG ANH LỚP 7 CẢ NĂM FRIENDS PLUS SÁCH CHÂN TRỜI SÁNG TẠO ...
BÀI TẬP DẠY THÊM TIẾNG ANH LỚP 7 CẢ NĂM FRIENDS PLUS SÁCH CHÂN TRỜI SÁNG TẠO ...BÀI TẬP DẠY THÊM TIẾNG ANH LỚP 7 CẢ NĂM FRIENDS PLUS SÁCH CHÂN TRỜI SÁNG TẠO ...
BÀI TẬP DẠY THÊM TIẾNG ANH LỚP 7 CẢ NĂM FRIENDS PLUS SÁCH CHÂN TRỜI SÁNG TẠO ...
Nguyen Thanh Tu Collection
 
How to Make a Field Mandatory in Odoo 17
How to Make a Field Mandatory in Odoo 17How to Make a Field Mandatory in Odoo 17
How to Make a Field Mandatory in Odoo 17
Celine George
 
How to deliver Powerpoint Presentations.pptx
How to deliver Powerpoint  Presentations.pptxHow to deliver Powerpoint  Presentations.pptx
How to deliver Powerpoint Presentations.pptx
HajraNaeem15
 
Wound healing PPT
Wound healing PPTWound healing PPT
Wound healing PPT
Jyoti Chand
 
Bed Making ( Introduction, Purpose, Types, Articles, Scientific principles, N...
Bed Making ( Introduction, Purpose, Types, Articles, Scientific principles, N...Bed Making ( Introduction, Purpose, Types, Articles, Scientific principles, N...
Bed Making ( Introduction, Purpose, Types, Articles, Scientific principles, N...
Leena Ghag-Sakpal
 
Main Java[All of the Base Concepts}.docx
Main Java[All of the Base Concepts}.docxMain Java[All of the Base Concepts}.docx
Main Java[All of the Base Concepts}.docx
adhitya5119
 
Hindi varnamala | hindi alphabet PPT.pdf
Hindi varnamala | hindi alphabet PPT.pdfHindi varnamala | hindi alphabet PPT.pdf
Hindi varnamala | hindi alphabet PPT.pdf
Dr. Mulla Adam Ali
 
বাংলাদেশ অর্থনৈতিক সমীক্ষা (Economic Review) ২০২৪ UJS App.pdf
বাংলাদেশ অর্থনৈতিক সমীক্ষা (Economic Review) ২০২৪ UJS App.pdfবাংলাদেশ অর্থনৈতিক সমীক্ষা (Economic Review) ২০২৪ UJS App.pdf
বাংলাদেশ অর্থনৈতিক সমীক্ষা (Economic Review) ২০২৪ UJS App.pdf
eBook.com.bd (প্রয়োজনীয় বাংলা বই)
 
math operations ued in python and all used
math operations ued in python and all usedmath operations ued in python and all used
math operations ued in python and all used
ssuser13ffe4
 
Présentationvvvvvvvvvvvvvvvvvvvvvvvvvvvv2.pptx
Présentationvvvvvvvvvvvvvvvvvvvvvvvvvvvv2.pptxPrésentationvvvvvvvvvvvvvvvvvvvvvvvvvvvv2.pptx
Présentationvvvvvvvvvvvvvvvvvvvvvvvvvvvv2.pptx
siemaillard
 

Recently uploaded (20)

How to Create a More Engaging and Human Online Learning Experience
How to Create a More Engaging and Human Online Learning Experience How to Create a More Engaging and Human Online Learning Experience
How to Create a More Engaging and Human Online Learning Experience
 
คำศัพท์ คำพื้นฐานการอ่าน ภาษาอังกฤษ ระดับชั้น ม.1
คำศัพท์ คำพื้นฐานการอ่าน ภาษาอังกฤษ ระดับชั้น ม.1คำศัพท์ คำพื้นฐานการอ่าน ภาษาอังกฤษ ระดับชั้น ม.1
คำศัพท์ คำพื้นฐานการอ่าน ภาษาอังกฤษ ระดับชั้น ม.1
 
MARY JANE WILSON, A “BOA MÃE” .
MARY JANE WILSON, A “BOA MÃE”           .MARY JANE WILSON, A “BOA MÃE”           .
MARY JANE WILSON, A “BOA MÃE” .
 
BÀI TẬP BỔ TRỢ TIẾNG ANH LỚP 9 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2024-2025 - ...
BÀI TẬP BỔ TRỢ TIẾNG ANH LỚP 9 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2024-2025 - ...BÀI TẬP BỔ TRỢ TIẾNG ANH LỚP 9 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2024-2025 - ...

Model For Estimating Diversity Presentation

  • 1. Model for Estimating Population Diversity as the Prediction of Sample Needed for Full Coverage, with Applications in Bioinformatics. Torres, David A., and Pericchi, Luis R. Department of Mathematics, University of Puerto Rico, Rio Piedras.
  • 2. Abstract Several methods exist for estimating community diversity using coverage (Bunge and Fitzpatrick 1993). Biologists and environmental scientists have challenged statisticians to solve this problem. Here we present an approach to the estimation that uses a coverage model (Good, I. J., 1953) and a population estimator (Good, I. J. and G. H. Toulmin, 1956). We apply the method to data on the microbial diversity present in the crop of the hoatzin, obtained by molecular analysis of cloned 16S rRNA genes.
  • 3. Introduction
      • Estimating the number of species in a community is a classical problem in ecology, biogeography, and conservation biology, and parallel problems arise in many other disciplines. This research topic has been extensively discussed in the literature; see Bunge and Fitzpatrick (1993) and Seber (1982, 1986, 1992) for a review of the historical and theoretical development.
      • Ecologists and other biologists have long recognized that there are undiscovered species in almost every survey or species inventory. A parallel problem is to estimate how many words a particular author knew (Efron and Thisted, 1976).
  • 4.
      • A random sample is taken from a community. We will refer to this sample as the basic sample.
      • Our intention is to calculate an estimator of the coverage of the community using the information provided by the basic sample, and then to estimate the number of species in the community.
      • Moreover, we intend to describe a method that gives an estimator of the number of additional observations needed to reach total coverage of the community.
      • An example will be presented in order to apply the theory.
  • 5. Methods
      • A random sample of size $N$ is drawn from a community. Let $n_r$ be the number of distinct species represented exactly $r$ times in the sample; then
        $$\sum_{r=1}^{\infty} r\,n_r = N. \qquad (1)$$
  • 6.
      • We shall be concerned with $q_r$, the community frequency of an arbitrary species that is represented $r$ times in the basic sample.
      • Let $\mathrm{E}(q_r)$ be the expected value of $q_r$. A main result used by Good (1953) is that
        $$\mathrm{E}(q_r) = \frac{r^*}{N}, \qquad (2)$$
        where
        $$r^* = \frac{(r+1)\,n_{r+1}}{n_r}.$$
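As a rough illustration of equation (2), the sketch below computes Good's adjusted count $r^*$ and the estimate $r^*/N$ from a table of frequency counts. The counts in `freq_counts` are hypothetical (they are not the hoatzin data), and the function names are illustrative only.

```python
# Hypothetical frequency counts n_r: r -> number of species seen exactly r times.
# These numbers are invented for illustration; they are NOT the hoatzin data.
freq_counts = {1: 10, 2: 4, 3: 2}


def sample_size(counts):
    """N = sum over r of r * n_r, the identity stated on the Methods slide."""
    return sum(r * n for r, n in counts.items())


def adjusted_count(counts, r):
    """Good's r* = (r + 1) * n_{r+1} / n_r from equation (2)."""
    return (r + 1) * counts.get(r + 1, 0) / counts[r]


def expected_frequency(counts, r):
    """Estimate of E(q_r) = r* / N from equation (2)."""
    return adjusted_count(counts, r) / sample_size(counts)


N = sample_size(freq_counts)               # 10*1 + 4*2 + 2*3 = 24
r_star_1 = adjusted_count(freq_counts, 1)  # 2 * 4 / 10 = 0.8
q_1 = expected_frequency(freq_counts, 1)   # 0.8 / 24
print(N, r_star_1, round(q_1, 4))
```

Note that $r^*$ for the largest observed $r$ comes out as zero, because $n_{r+1} = 0$ there; this is one reason the raw counts are often smoothed before the formula is applied.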
  • 7.
      • This can be generalized to give higher moments of $q_r$. As a matter of fact,
        $$\mathrm{E}(q_r^m) = \frac{(r+m)^{(m)}}{N^m}\,\frac{n_{r+m}}{n_r}, \qquad (3)$$
        where $r = 1, 2, 3, \ldots$; $m = 1, 2, 3, \ldots$; and $t^{(m)} = \prod_{i=t-m+1}^{t} i = t(t-1)\cdots(t-m+1)$.
  • 8.
      • Recursively, we can rewrite (3) as
        $$\mathrm{E}(q_r^m) \approx \prod_{i=r}^{r+m-1} \mathrm{E}(q_i).$$
      • Moreover, the variance of $q_r$ is approximately
        $$V(q_r) = \frac{(r+1)(r+2)}{N^2}\,\frac{n_{r+2}}{n_r} - \left[\frac{(r+1)\,n_{r+1}}{N\,n_r}\right]^2.$$
      • Note that $n_r \le \sum_{i=1}^{\infty} i\,n_i = N$; then we have that $r^*\,n_r = (r+1)\,n_{r+1} \le N$.
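The moment formula (3), its recursive form, and the variance approximation can be checked numerically. The following is a minimal sketch under the same assumptions as before: the counts are hypothetical, and `moment` and `variance` are illustrative names, not functions from any library.

```python
from math import prod

# Hypothetical frequency counts (invented for illustration only).
freq_counts = {1: 10, 2: 4, 3: 2}
N = sum(r * n for r, n in freq_counts.items())  # 24


def moment(counts, n_total, r, m):
    """Equation (3): (r+m)(r+m-1)...(r+1) / N^m * n_{r+m} / n_r."""
    falling = prod(range(r + 1, r + m + 1))  # falling factorial (r+m)^(m)
    return falling / n_total**m * counts.get(r + m, 0) / counts[r]


def variance(counts, n_total, r):
    """Approximate V(q_r) = E(q_r^2) - E(q_r)^2."""
    return moment(counts, n_total, r, 2) - moment(counts, n_total, r, 1) ** 2


second_moment = moment(freq_counts, N, 1, 2)
# Recursive form: product of the first moments E(q_i) for i = r, ..., r+m-1.
recursive = prod(moment(freq_counts, N, i, 1) for i in range(1, 3))
var_q1 = variance(freq_counts, N, 1)
print(second_moment, recursive, var_q1)
```

With raw, unsmoothed counts this plug-in variance is not guaranteed to be positive, so the result should be read as an approximation only.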
  • 9.
      • An estimator of the expected total chance of all species that are each represented $r$ times ($r \ge 1$) in the basic sample is
        $$\frac{(r+1)\,n_{r+1}}{N}.$$
      • Also, the expected total chance of all species that are represented $r$ times or more in the sample is approximately
        $$\frac{1}{N}\sum_{i=r+1}^{\infty} i\,n_i.$$
      • In particular, note that the expected total chance of all species represented in the sample is approximately
        $$\frac{1}{N}\sum_{k=2}^{\infty} k\,n_k = 1 - \frac{\mathrm{E}(n_1)}{N}. \qquad (4)$$
  • 10.
      • Hence, the total coverage of the sample (i.e., the proportion of the community represented in the sample, which is the sum of the population frequencies of the species represented) is approximately
        $$1 - \frac{\mathrm{E}(n_1)}{N} = 1 - \frac{n_1}{N}. \qquad (5)$$
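Equations (4) and (5) reduce to a one-line computation once the frequency counts are tabulated. The sketch below uses hypothetical counts and illustrative function names; it also checks that the sum in (4) agrees with $1 - n_1/N$.

```python
# Hypothetical frequency counts (invented for illustration only).
freq_counts = {1: 10, 2: 4, 3: 2}


def good_coverage(counts):
    """Equation (5): estimated coverage 1 - n_1 / N."""
    n_total = sum(r * n for r, n in counts.items())
    return 1.0 - counts.get(1, 0) / n_total


def chance_of_seen_species(counts):
    """Left side of equation (4): (1/N) * sum over k >= 2 of k * n_k."""
    n_total = sum(r * n for r, n in counts.items())
    return sum(k * n for k, n in counts.items() if k >= 2) / n_total


coverage = good_coverage(freq_counts)  # 1 - 10/24
print(round(coverage, 4), round(chance_of_seen_species(freq_counts), 4))
```

The complement, $n_1/N$, is the estimated chance that the next individual sampled belongs to a species not yet seen.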
  • 11.
      • The chance that the next member of the community sampled will belong to a new species is estimated as $n_1/N$.
      • Let us write the total number of distinct species in the sample as
        $$d = \sum_{x=1}^{\infty} n_x,$$
        and suppose that the total number of distinct species in the community is a known finite number $s$. Then the number of species not represented in the sample is $n_0 = s - d$.
  • 12.
      • Then let $p_\mu$ ($\mu = 1, 2, 3, \ldots, s$) be the population frequencies of the species. As in Good (1953), equation (10),
        $$\mathrm{E}(n_r) = \sum_{\mu=1}^{s} \frac{N!}{r!\,(N-r)!}\; p_\mu^{\,r}\,(1-p_\mu)^{N-r}. \qquad (6)$$
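  • 13. For a sample of size $\lambda N$ from the population we have similarly, assuming $p_\mu \le \tfrac{1}{2}$ for all $\mu$,
        $$\begin{aligned}
        \mathrm{E}(n_r(\lambda)) &= \sum_{\mu=1}^{s} \frac{(\lambda N)!}{r!\,(\lambda N-r)!}\; p_\mu^{\,r}\,(1-p_\mu)^{\lambda N-r} \\
        &= \sum_{\mu=1}^{s} \frac{(\lambda N)!}{r!\,(\lambda N-r)!}\; p_\mu^{\,r}\,(1-p_\mu)^{N-r} \left(1+\frac{p_\mu}{1-p_\mu}\right)^{-(\lambda-1)N} \\
        &= \sum_{\mu=1}^{s} \frac{(\lambda N)!}{r!\,(\lambda N-r)!}\; p_\mu^{\,r}\,(1-p_\mu)^{N-r} \sum_{i=0}^{\infty} \binom{-(\lambda-1)N}{i} \left(\frac{p_\mu}{1-p_\mu}\right)^{i} \\
        &= \sum_{i=0}^{\infty} \frac{(\lambda N)!}{r!\,(\lambda N-r)!} \binom{-(\lambda-1)N}{i} \sum_{\mu=1}^{s} p_\mu^{\,r+i}\,(1-p_\mu)^{N-r-i} \\
        &= \sum_{i=0}^{\infty} \frac{(\lambda N)!}{r!\,(\lambda N-r)!} \binom{-(\lambda-1)N}{i} \frac{(r+i)!\,(N-r-i)!}{N!}\; \mathrm{E}(n_{r+i}) \\
        &\approx \sum_{i=0}^{\infty} (-1)^i\, \frac{(r+i)!}{r!\,i!}\; \lambda^{r}\,(\lambda-1)^{i}\; \mathrm{E}(n_{r+i}). \qquad (7)
        \end{aligned}$$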
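The final line of the derivation, $\mathrm{E}(n_r(\lambda)) \approx \sum_i (-1)^i \binom{r+i}{r} \lambda^r (\lambda-1)^i\, \mathrm{E}(n_{r+i})$, can be evaluated directly once the expectations are replaced by observed counts. The sketch below makes that substitution, which is an assumption of this example, truncates the series at the largest observed $r$, and uses hypothetical counts.

```python
from math import comb

# Hypothetical frequency counts (invented for illustration only).
freq_counts = {1: 10, 2: 8, 3: 2}


def expected_n_r(counts, r, lam):
    """Truncated alternating series for E(n_r(lambda)), r >= 1.

    Assumption: E(n_{r+i}) is replaced by the observed count n_{r+i},
    and terms beyond the largest observed r are treated as zero.
    """
    total = 0.0
    for i in range(max(counts) - r + 1):
        term = comb(r + i, r) * lam**r * (lam - 1.0) ** i * counts.get(r + i, 0)
        total += (-1) ** i * term
    return total


same_size = expected_n_r(freq_counts, 1, 1.0)  # lambda = 1 gives back n_1
enlarged = expected_n_r(freq_counts, 1, 1.5)   # singletons expected at 1.5 * N
print(same_size, enlarged)
```

Because the series alternates in sign with terms growing like $(\lambda-1)^i$, it tends to become unstable as $\lambda$ moves far from 1, so values of this kind are best treated as rough guides.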
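  • 14.
      • For the case $r = 0$, we do not need to assume the value of $s$, since this assumption is not required to write
        $$\hat{d}(\lambda) - d = -\sum_{i=1}^{\infty} (-1)^i\,(\lambda-1)^i\, n_i, \qquad (8)$$
        where $\hat{d}(\lambda) = s - n_0(\lambda)$ is the expected number of distinct species in a sample of size $\lambda N$.
      • We may be particularly interested in the coverage of the community; then, using equations (5) and (7) with $r = 1$, the expected coverage is approximately
        $$1 - \frac{\mathrm{E}(n_1(\lambda))}{\lambda N} \approx 1 - \frac{1}{N}\left[\,n_1 - 2(\lambda-1)\,n_2 + 3(\lambda-1)^2\,n_3 - \cdots\right]. \qquad (9)$$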
  • 15.
      • The expected number of distinct species represented is approximately
        $$d + (\lambda-1)\,n_1 - (\lambda-1)^2\,n_2 + \cdots$$
      • We use the coverage to estimate the value of $\lambda$ and, from it, the sample size needed to reach 100% coverage. Equation (9) is called the Good-Toulmin model because it merges the two models proposed by these authors.
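One way to turn equation (9) into a sample-size estimate is to search for the $\lambda$ at which the expected coverage reaches 1. The sketch below does this by bisection. The counts are hypothetical, the search bracket `[1.0, 2.5]` is an arbitrary assumption, and the function names are illustrative; this is not necessarily the procedure used by the authors.

```python
# Hypothetical frequency counts (invented for illustration only).
freq_counts = {1: 10, 2: 8, 3: 2}
N = sum(r * n for r, n in freq_counts.items())  # 32
d = sum(freq_counts.values())                   # 20 distinct species observed


def expected_species(counts, lam):
    """d + (lam-1) n_1 - (lam-1)^2 n_2 + ... (truncated at observed counts)."""
    d_obs = sum(counts.values())
    return d_obs + sum((-1) ** (i + 1) * (lam - 1.0) ** i * n
                       for i, n in counts.items())


def expected_coverage(counts, lam):
    """Right side of equation (9), truncated at the observed counts."""
    n_total = sum(r * n for r, n in counts.items())
    series = sum((-1) ** (i - 1) * i * (lam - 1.0) ** (i - 1) * n
                 for i, n in counts.items())
    return 1.0 - series / n_total


def lambda_for_full_coverage(counts, lo=1.0, hi=2.5, steps=60):
    """Bisection for the lambda at which expected coverage equals 1."""
    def gap(lam):
        return 1.0 - expected_coverage(counts, lam)

    if gap(lo) * gap(hi) > 0:
        raise ValueError("coverage does not reach 1 inside the bracket")
    for _ in range(steps):
        mid = (lo + hi) / 2.0
        if gap(lo) * gap(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2.0


lam_full = lambda_for_full_coverage(freq_counts)
print(round(lam_full, 6), round(lam_full * N), expected_species(freq_counts, 2.0))
```

For these invented counts the coverage is 0.6875 at $\lambda = 1$ and reaches 1 at $\lambda = 2$, that is, a total sample of about 64 individuals. For other count tables the truncated series may never reach 1 inside the bracket, in which case the function raises an error rather than returning a misleading value.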
  • 16. Application
      • The hoatzin is a South American leaf-eating bird whose uniqueness lies in its particular foregut (crop), the only one of its kind known in the avian class.
      • Forestomach compartmentalization allows mammalian herbivores to be nourished by microbial fermentation products and microbial biomass. Bacteria are largely responsible for the fermentation of dietary components, and bacterial cells are themselves subject to digestion by gastric lysozyme expressed in the abomasum of ruminants.
  • 17.
      • The evolutionary pressure towards foregut specialization in herbivores was presumably exerted by indigestible plant polymers (cellulose), so that producing microbial biomass at the expense of these indigestible materials has clear advantages.
      • In the hoatzin, a preliminary characterization of the crop microflora was done by culture (Domínguez-Bello et al., 1993). In this study we aim to characterize the bacterial diversity in the crop of the hoatzin by a molecular analysis of cloned 16S rRNA genes.
  • 18. Results
      • For the 69 OTUs obtained, Good's method (the left side of equation (9)) indicated a coverage of diversity of 77%.
      • This means that 100% diversity would correspond to 90 OTUs. Given that, applying the Good and Toulmin model (figure 2), we estimate λ = 1.5, which means that we need 98 (300 − 202) additional clones to obtain the 31 OTUs needed to cover 100% diversity.
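The arithmetic behind these figures can be retraced from the numbers reported on the slides (202 clones, 69 OTUs, 77% coverage, 300 clones). The sketch below assumes that the 90 OTUs come from dividing the observed OTUs by the estimated coverage. That reading reproduces the reported value, but it is an assumption of this example, as are the variable names.

```python
# Values reported on the Results and Conclusions slides.
clones_sampled = 202  # N
otus_observed = 69    # d
coverage = 0.77       # Good's coverage estimate
clones_needed = 300   # total clones reported for 100% coverage

# Assumption: richness at full coverage ~ observed OTUs / coverage.
otus_at_full_coverage = otus_observed / coverage

# Additional clones beyond the basic sample.
additional_clones = clones_needed - clones_sampled

# Sample-size multiplier implied by 300 clones (reported rounded as 1.5).
implied_lambda = clones_needed / clones_sampled

print(round(otus_at_full_coverage), additional_clones, round(implied_lambda, 2))
```

The implied multiplier is about 1.49, consistent with the rounded λ = 1.5 quoted above.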
  • 19. Conclusions (Application)
      • The estimate indicates that 300 clones are needed to represent 100% of sample diversity. 99% of the clones and 88% of the OTUs analyzed are unidentified species.
      • Based on 202 sequences yielding 69 OTUs, the Good and Toulmin estimator indicates a coverage of 77% of the total diversity.
  • 20.
  • 21.
  • 22. Future Research
      • There are many models and procedures that try to calculate coverage; instead of using Good's estimator of coverage, it would be interesting to try another approach. Perhaps a Poisson process or a multinomial approach could give better estimators. Another approach could be Bayesian inference under the assumption of an unknown distribution, using a Metropolis-Hastings procedure.
      • The importance of this type of problem lies in experimental design.
      • Good once stated: "I don't believe it is usually possible to estimate the number of unseen species … but only an approximate lower bound to that number." We will keep on the road.
  • 23. Literature cited
      • Godoy, Filipa (1), Gao, Z. (2), Pei, Z. (2), Zhou, M. (2), Garcia-Amado, M. A. (3), Pericchi, L. R. (4), Torres, D. (4), Michelangeli, F. (3), Blaser, M. J. (2), Domínguez-Bello, M. G. (1). High bacterial diversity in the forestomach of the hoatzin is revealed by molecular analysis of 16S rRNA genes. (1) Department of Biology, University of Puerto Rico, Rio Piedras, San Juan, PR 00931. (2) Departments of Medicine, Pathology and Microbiology, New York University School of Medicine, New York, NY 10016. (3) Venezuelan Institute of Scientific Research, CBB, Caracas, Venezuela. (4) Department of Mathematics, University of Puerto Rico, Rio Piedras, San Juan, PR 00931.
      • Chao, A., Lee, S., 1992. Estimating the Number of Classes via Sample Coverage. Journal of the American Statistical Association 87: 210-217.
      • Domínguez-Bello, M. G., M. Lovera, P. Suarez and F. Michelangeli, 1993. Microbial inhabitants in the crop of the hoatzin (Opisthocomus hoazin): the only foregut fermented avian. Physiol. Zool. 66: 374-383.
      • Good, I. J. and G. H. Toulmin, 1956. The number of new species and the increase in population coverage when the sample is increased. Biometrika 43: 45-63.
      • Good, I. J., 1953. The Population Frequencies of Species and the Estimation of Population Parameters. Biometrika 40: 237-264.