Generalizing phylogenetics to infer shared evolutionary events
Jamie R. Oaks¹
¹Department of Biological Sciences, Auburn University
November 6, 2017
© 2007 Boris Kulikov boris-kulikov.blogspot.com
Shared divergences Jamie Oaks – phyletica.org 1/33
6. Shared ancestry is a fundamental property of life
Phylogenetics is rapidly progressing as the statistical foundation of comparative biology
“Big data” present exciting possibilities and computational challenges
Exciting opportunities to develop new ways to study biology in the light of phylogeny
8. Assumption: All processes of diversification affect each lineage independently and only cause bifurcating divergences.
16. Biogeography
Environmental changes that affect whole communities of species
Gene family evolution
Chromosomal duplications
Epidemiology
Disease spread via co-infected individuals
Transmission at social gatherings
Endosymbiont evolution (e.g., parasites, microbiome)
Speciation of the host
Co-colonization of new host species
21. Why account for shared divergences?
1. Improve inference
2. Provide a framework for studying processes of co-diversification
35. [Figure: the five possible divergence models (m1 to m5) for three taxa, with shared divergence times τ1, τ2, τ3, each labelled with its posterior probability p(mi | X)]
We want to infer the model and divergence times given DNA alignments
$$p(m_i \mid X) \propto p(X \mid m_i)\, p(m_i)$$
$$p(X \mid m_i) = \int_{\theta} p(X \mid \theta, m_i)\, p(\theta \mid m_i)\, d\theta$$
Parameters θ: divergence times, gene trees, substitution parameters, and demographic parameters
J. R. Oaks et al. (2013). Evolution 67: 991–1010, J. R. Oaks (2014). BMC Evolutionary Biology 14: 150
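Below is a toy sketch of the two formulas on this slide. It uses a hypothetical coin-flip dataset X (k successes in n trials) instead of DNA alignments. The marginal likelihood p(X | m) is estimated by averaging p(X | θ, m) over draws of θ from an assumed Uniform(0, 1) prior. The products p(X | m_i) p(m_i) are then normalized across models to give p(m_i | X). The binomial model, the priors, and all function names are illustrative assumptions, not part of the methods described in this talk.

```python
import math
import random


def binomial_likelihood(k, n, theta):
    """p(X | theta, m): probability of k successes in n trials (toy model)."""
    return math.comb(n, k) * theta**k * (1.0 - theta) ** (n - k)


def marginal_likelihood_mc(k, n, n_samples=100_000, seed=1):
    """Simple Monte Carlo estimate of p(X | m) = integral of p(X | theta, m) p(theta | m).

    Draws theta from the assumed Uniform(0, 1) prior and averages the likelihood.
    """
    rng = random.Random(seed)
    total = sum(binomial_likelihood(k, n, rng.random()) for _ in range(n_samples))
    return total / n_samples


def posterior_model_probs(log_marginals, log_priors):
    """Normalize p(X | m_i) p(m_i) over models to obtain p(m_i | X).

    Inputs are on the log scale; subtracting the maximum before exponentiating
    (the log-sum-exp trick) avoids underflow with realistic log-likelihoods.
    """
    log_unnorm = [lm + lp for lm, lp in zip(log_marginals, log_priors)]
    max_log = max(log_unnorm)
    weights = [math.exp(v - max_log) for v in log_unnorm]
    total = sum(weights)
    return [w / total for w in weights]


# Toy comparison: m_free has theta ~ Uniform(0, 1); m_fair fixes theta = 0.5.
k, n = 3, 10
ml_free = marginal_likelihood_mc(k, n)       # exact value is 1 / (n + 1)
ml_fair = binomial_likelihood(k, n, 0.5)     # theta fixed, nothing to integrate
probs = posterior_model_probs([math.log(ml_free), math.log(ml_fair)],
                              [math.log(0.5), math.log(0.5)])
print(probs)
```

Simple Monte Carlo over the prior works here because θ is one-dimensional. With the many parameters in θ listed above, that estimator becomes very inefficient, which is part of why the likelihood is difficult in practice.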
41. [Figure: the divergence models m1 to m5, labelled with their posterior probabilities p(m1 | X) to p(m5 | X)]
Challenges:
1. Likelihood is tractable, but difficult
2. Sampling over all possible models
5 taxa = 52 models
10 taxa = 115,975 models
20 taxa = 51,724,158,235,372 models!!
J. R. Oaks et al. (2013). Evolution 67: 991–1010, J. R. Oaks (2014). BMC Evolutionary Biology 14: 150
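The model counts above are Bell numbers. A divergence model assigns the taxa to shared divergence times, which is a partition of the set of taxa. The Bell number B(n) counts the partitions of an n-element set. The minimal sketch below uses the Bell triangle (function names are illustrative) to reproduce these counts, including the 5 models m1 to m5 for three taxa:

```python
def bell_numbers(n_max):
    """Return [B(0), B(1), ..., B(n_max)] computed with the Bell triangle."""
    bells = [1]   # B(0) = 1: the empty set has exactly one partition
    row = [1]     # first row of the Bell triangle
    for _ in range(n_max):
        # Each row starts with the last entry of the previous row ...
        new_row = [row[-1]]
        # ... and each later entry is its left neighbour plus the entry above that neighbour.
        for value in row:
            new_row.append(new_row[-1] + value)
        row = new_row
        bells.append(row[0])   # the first entry of row k is B(k)
    return bells


def number_of_divergence_models(n_taxa):
    """Number of ways to assign n_taxa to shared divergence times."""
    return bell_numbers(n_taxa)[n_taxa]


for n in (3, 5, 10, 20):
    print(n, "taxa =", number_of_divergence_models(n), "models")
```

Running it prints 5, 52, 115975, and 51724158235372 models for 3, 5, 10, and 20 taxa. The count grows faster than exponentially, which is why the models must be sampled numerically rather than enumerated.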
49. Approach #1
Challenges:
1. Likelihood is tractable, but difficult
Use an existing method!
2. Sampling over all possible models
Use an existing method!
50. [Figure: probability (0 to 1.00) plotted against number of events (1 to 9)]
J. R. Oaks et al. (2013). Evolution 67: 991–1010
55. Approach #2
Challenges:
1. Likelihood is tractable, but difficult
Numerical approximation via approximate-likelihood Bayesian computation (ABC)
2. Sampling over all possible models
A “diffuse” Dirichlet process prior (DPP)
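Approximate Bayesian computation avoids evaluating the likelihood. It draws parameters from the prior, simulates data, and keeps the draws whose summary statistics fall close to those of the observed data. The sketch below is a generic rejection-ABC toy, not dpp-msbayes itself. The coin-flip simulator, the single summary statistic (number of successes), the tolerance `epsilon`, and all names are assumptions made for illustration.

```python
import random
import statistics


def simulate_successes(theta, n_trials, rng):
    """Toy simulator: number of successes in n_trials with success probability theta."""
    return sum(rng.random() < theta for _ in range(n_trials))


def abc_rejection(observed, n_trials, n_draws=50_000, epsilon=0, seed=2):
    """Rejection ABC: keep prior draws whose simulated summary is within epsilon."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_draws):
        theta = rng.random()                       # draw from a Uniform(0, 1) prior
        simulated = simulate_successes(theta, n_trials, rng)
        if abs(simulated - observed) <= epsilon:   # compare summary statistics
            accepted.append(theta)
    return accepted


samples = abc_rejection(observed=7, n_trials=10)
# With epsilon = 0 and a sufficient statistic, accepted draws follow the exact
# posterior, Beta(8, 4), whose mean is 8 / 12; a larger epsilon trades accuracy
# for a higher acceptance rate.
print(len(samples), statistics.mean(samples))
```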
66. New method: dpp-msbayes
Approximate-likelihood Bayesian approach to inferring models of shared divergences
Flexible Dirichlet-process prior (DPP) over all possible divergence models
Flexible priors on parameters to avoid strongly weighted posteriors
Multi-processing to accommodate genomic datasets
J. R. Oaks (2014). BMC Evolutionary Biology 14: 150
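Take a Dirichlet-process prior with concentration parameter α. The prior probability that n taxa share exactly k divergence events is α^k |s(n, k)| / [α(α + 1)⋯(α + n − 1)], where |s(n, k)| is an unsigned Stirling number of the first kind. The sketch below computes that distribution, which shows how α shifts prior weight between few and many events. The names are hypothetical, and how dpp-msbayes chooses α or places a prior on it is not specified here.

```python
from fractions import Fraction


def unsigned_stirling_first(n):
    """Table c[i][k] of unsigned Stirling numbers of the first kind, 0 <= k <= i <= n."""
    c = [[0] * (n + 1) for _ in range(n + 1)]
    c[0][0] = 1
    for i in range(1, n + 1):
        for k in range(1, i + 1):
            # Recurrence: c(i, k) = c(i-1, k-1) + (i-1) * c(i-1, k)
            c[i][k] = c[i - 1][k - 1] + (i - 1) * c[i - 1][k]
    return c


def dpp_prior_number_of_events(n_taxa, alpha):
    """Return {k: P(k divergence events)} under a DPP with concentration alpha."""
    c = unsigned_stirling_first(n_taxa)
    rising = 1
    for i in range(n_taxa):
        rising *= alpha + i    # alpha (alpha + 1) ... (alpha + n_taxa - 1)
    return {k: c[n_taxa][k] * alpha**k / rising for k in range(1, n_taxa + 1)}


# Exact arithmetic with Fraction; pass a float alpha for decimal output instead.
for alpha in (Fraction(1, 2), Fraction(1), Fraction(5)):
    probs = dpp_prior_number_of_events(3, alpha)
    print(alpha, {k: str(p) for k, p in probs.items()})
```

A smaller α concentrates prior mass on fewer shared events, and a larger α moves it toward more events.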
67. [Figure: probability (0 to 0.20) plotted against number of events (1 to 9)]
J. R. Oaks et al. (2013). Evolution 67: 991–1010
70. Approach #3
Challenges:
1. Likelihood is tractable, but difficult
Challenge accepted
2. Sampling over all possible models
A “diffuse” Dirichlet process prior (DPP)
73. Ecoevolity: Estimating evolutionary coevality
Continuous-time Markov chain (CTMC) model of characters evolving along genealogies
Coalescent model of genealogies branching within populations
Dirichlet-process prior across divergence models
Gibbs sampling¹ to numerically sample models
Analytically integrate over genealogies²
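The sketch below gives some intuition for the CTMC component. It computes the closed-form transition probabilities of a two-state (for example, biallelic) character with forward rate u (0 → 1) and backward rate v (1 → 0). Restricting to two states is an assumption made for illustration. The rates, names, and time units are hypothetical.

```python
import math


def two_state_transition_matrix(u, v, t):
    """Transition probabilities P(t) for a two-state CTMC.

    Rate matrix Q = [[-u, u], [v, -v]]; P(t) = exp(Q t) has the closed form below.
    """
    total = u + v
    decay = math.exp(-total * t)
    pi0, pi1 = v / total, u / total        # stationary state frequencies
    return [
        [pi0 + pi1 * decay, pi1 * (1.0 - decay)],
        [pi0 * (1.0 - decay), pi1 + pi0 * decay],
    ]


def matmul(a, b):
    """Multiply two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


u, v = 0.3, 0.7
p_short = two_state_transition_matrix(u, v, 0.5)
p_long = two_state_transition_matrix(u, v, 1.5)
# Chapman-Kolmogorov check: P(0.5 + 1.5) should equal P(0.5) P(1.5).
print(matmul(p_short, p_long), two_state_transition_matrix(u, v, 2.0))
```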
Goal: Fast, full-likelihood Bayesian method to infer patterns of co-diversification from genome-scale data
¹ R. M. Neal (2000). Journal of Computational and Graphical Statistics 9: 249–265
² D. Bryant et al. (2012). Molecular Biology and Evolution 29: 1917–1932
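The Gibbs step for sampling models can be sketched for the Dirichlet-process prior alone. This sketch leaves out the likelihood, which is a simplification; a real sampler, such as the algorithms in Neal (2000), would also weight each option by how well it explains the data. In each update, a taxon is removed and reassigned. It joins an existing divergence event with probability proportional to that event's size, or starts a new event with probability proportional to α. All names are hypothetical. Because no data are involved, the chain should recover the exact DPP prior on the number of events.

```python
import random
from collections import Counter


def gibbs_dpp_prior(n_taxa, alpha, n_sweeps, seed=0):
    """Gibbs-sample assignments of taxa to divergence events under a DPP prior only.

    Returns the estimated distribution of the number of events across sweeps.
    """
    rng = random.Random(seed)
    labels = [0] * n_taxa                      # start with all taxa sharing one event
    tally = Counter()
    for _ in range(n_sweeps):
        for i in range(n_taxa):
            # Sizes of the existing events once taxon i is removed.
            sizes = Counter(labels[:i] + labels[i + 1:])
            options = list(sizes) + [None]     # None stands for "new event"
            weights = [sizes[e] for e in sizes] + [alpha]
            choice = rng.choices(options, weights=weights)[0]
            if choice is None:
                # Give taxon i the smallest label not used by the other taxa.
                new_label = 0
                while new_label in sizes:
                    new_label += 1
                choice = new_label
            labels[i] = choice
        tally[len(set(labels))] += 1
    return {k: count / n_sweeps for k, count in sorted(tally.items())}


# For 3 taxa and alpha = 1 the exact prior is P(1) = 1/3, P(2) = 1/2, P(3) = 1/6.
print(gibbs_dpp_prior(n_taxa=3, alpha=1.0, n_sweeps=20_000))
```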
88. Our journey
[Figure: three panels, each plotting probability against number of events (1 to 9, 1 to 9, and 1 to 8)]
Conclusions?
93. Next step: A general framework
Develop a framework for inferring shared divergences across phylogenies
Generalize Bayesian phylogenetics to incorporate shared divergences
Sample models numerically via reversible-jump Markov chain Monte Carlo
Benefits:
Improve phylogenetic inference
Framework for studying processes of co-diversification
[Figure: shared divergence times τ1 and τ2]
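One way to picture "generalizing phylogenetics to incorporate shared divergences" is a tree whose internal nodes do not own their ages. Instead, each node points into a shared vector of divergence times, so a single τ can be the age of several nodes, possibly in different trees. The sketch below is a hypothetical data structure for illustration only. The class and field names are assumptions, and the sketch says nothing about how the framework's reversible-jump moves would be implemented.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DivergenceTimes:
    """Shared pool of divergence times (tau values) that nodes can point into."""
    taus: List[float] = field(default_factory=list)

    def add(self, tau: float) -> int:
        self.taus.append(tau)
        return len(self.taus) - 1          # index used by nodes to reference this time


@dataclass
class Node:
    name: str
    children: List["Node"] = field(default_factory=list)
    tau_index: Optional[int] = None        # None for tips (age 0)

    def age(self, times: DivergenceTimes) -> float:
        return 0.0 if self.tau_index is None else times.taus[self.tau_index]


times = DivergenceTimes()
tau1 = times.add(1.0)                      # shared by one node in each tree below
tau2 = times.add(2.5)

tree_a = Node("root_a", [Node("A1"), Node("A2")], tau_index=tau1)
tree_b = Node("root_b",
              [Node("x", [Node("B1"), Node("B2")], tau_index=tau1), Node("B3")],
              tau_index=tau2)

# Updating one shared tau moves every node that references it.
times.taus[tau1] = 1.2
print(tree_a.age(times), tree_b.children[0].age(times))   # prints: 1.2 1.2
```

In this design a proposal that changes τ1 moves the shared divergence in both trees at once. Merging or splitting entries of the shared vector would correspond to moving between divergence models.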