B D T
E
Tracy Heath
Ecology, Evolution, & Organismal Biology
Iowa State University
@trayc7
http://phyloworks.org
2015 SSB Workshop
Ann Arbor, MI USA
O
Lecture & Demo: Bayesian Fundamentals
• Bayes theorem, priors and posteriors, MCMC
• Example: the birth-death process
• RevBayes Demonstration
break
Lecture: Bayesian Inference of Species Divergence Times
• Relaxed clock models – accounting for variation in
substitution rates among lineages
• Tree priors and fossil calibration
break
BEAST v2 Tutorial — dating species divergences under the
fossilized birth-death process
lunch
Course materials: http://phyloworks.org/resources/ssbws.html
P B
For each of the topics in the pre-course survey, most of you
feel that you “have some experience, but there’s much more
to learn”.
diversification models
relaxed−clock models
Bayesian inference
model−based phylo
probability theory
unfamiliar
I’veheardofit
familiar,butmoretolearn
prettycomfortable
expert
(29 participants responded to the survey)
B I
Estimate the probability of a hypothesis (model) conditional
on observed data.
The probability represents the researcher’s degree of belief.
Bayes’ Theorem specifies the conditional probability of the
hypothesis given the data.
B’ T
Bayesian Fundamentals
B’ T
Bayesian Fundamentals
B’ T
Bayesian Fundamentals
B’ T
Bayesian Fundamentals
B’ T
Bayesian Fundamentals
B’ T
The posterior probability of a discrete parameter δ
conditional on the data D is
Pr(δ | D) =
Pr(D | δ) Pr(δ)
δ Pr(D | δ) Pr(δ)
δ Pr(D | δ) Pr(δ) is the likelihood marginalized over all
possible values of δ.
Bayesian Fundamentals
B’ T
The posterior probability density a continuous parameter θ
conditional on the data D is
f(θ | D) =
f(D | θ)f(θ)
θ f(D | θ)f(θ)dθ
θ f(D | θ)f(θ)dθ is the likelihood marginalized over all
possible values of θ.
Bayesian Fundamentals
P
The the distribution of θ before any data are collected is
the prior
f(θ)
The prior describes your uncertainty in the parameters of
your model.
Bayesian Fundamentals
P
We may assume a gamma-prior distribution on θ with a
shape parameter α and a scale parameter β.
θ ∼ Gamma(α,β)
f(θ | α,β) = 1
Γ(α)βα θα−1e− θ
β
0 1 2 3
Density
θ
This requires us to assign values for α and β based on our
prior belief, or we can place hyperpriors on these
parameters if we are uncertain about their values.
Bayesian Fundamentals
E: T B-D P
A time machine allowed us to observe the dates of each
speciation event in the history of extant bears
0
10
20
30
Oligocene Miocene Plio. Plei.
Paleogene Neogene Quat.
33.24
27.9
15.9
13.1
11.6
9.95
2.8
Example
E: T B-D P
We assume that the
diversification of bears
matches a birth death
process with parameters:
λ = speciation rate
μ = extinction rate
0
10
20
30
Oligocene Miocene Plio. Plei.
Paleogene Neogene Quat.
33.24
27.9
15.9
13.1
11.6
9.95
2.8
Example
E: T B-D P
The birth-death process allows us to compute the probability
density of our observed time-tree (Ψ) conditional on any
value of speciation (λ) and any value of extinction (μ).
f(Ψ | λ,μ)
This model states that Ψ ∼ BD(λ,μ)
Example
E: T B-D P
Another way of expressing Ψ ∼ BD(λ,μ) is with a
probabilistic graphical model
Our “observed” time-tree is
conditioned on some
constant value of λ and μ.
Example
E: T B-D P
If our time machine also allowed us to know the rate of
speciation and extinction, we can easily calculate the
likelihood of our observed tree:
f(Ψ | λ,μ) = N!(λ−μ)λN−1 e−(λ−μ)x1
λ − μe−(λ−μ)x1
N−1
i=1
(λ − μ)2e−(λ−μ)x1
(λ − μe−(λ−μ)x1)2
f(Ψ | λ = 0.5,μ = 0.1) = 3.49215e−32
Example
RB D: T B-D P
RevBayes
Fully integrative Bayesian inference of
phylogenetic parameters using
probabilistic graphical models and an
interpreted language
http://RevBayes.com
Example
G M  RB
Graphical models provide tools for
visually & computationally representing
complex, parameter-rich probabilistic
models
We can depict the conditional
dependence structure of various
parameters and other random variables
Höhna, Heath, Boussau, Landis, Ronquist, Huelsenbeck. 2014.
Probabilistic Graphical Model Representation in Phylogenetics.
Systematic Biology. (doi: 10.1093/sysbio/syu039)
RB D: T B-D P
The Rev language specifying a birth-death model on the
bear phylogeny with fixed values for λ and μ.
Example
E: T B-D P
What if we do not know λ and μ?
We can use frequentist or Bayesian methods for estimating
their values.
Frequentist methods require us to find the values of λ and
μ that maximize f(Ψ | λ,μ).
Bayesian methods use prior distributions to describe our
uncertainty in λ and μ and estimate f(λ,μ | Ψ).
Example
E: T B-D P
We must define prior distributions for λ and μ to estimate
the posterior probability density
f(λ,μ | Ψ,y,z) =
f(Ψ | λ,μ)f(λ | y)f(μ | z)
λ μ f(Ψ | λ,μ)f(λ | y)f(μ | z)dλdμ
Now y and z are the parameters of the prior distributions
on λ and μ.
Example
E: T B-D P
We can choose exponential prior distributions for λ and μ.
Now y and z represent the rate parameters of the
exponential priors.
λ ∼ Exponential(y)
μ ∼ Exponential(z)
f(λ | y) = ye−yλ
f(μ | z) = ze−zμ
Example
RB D: T B-D P
The Rev language specifying a birth-death model on the
bear phylogeny with exponential priors on λ and μ.
Example
E: T B-D P
Now that we have a defined model, how do we estimate
the posterior probability density?
λ ∼ Exponential(y)
μ ∼ Exponential(z)
f(λ,μ | Ψ,y,z) =
f(Ψ | λ,μ)f(λ | y)f(μ | z)
λ μ f(Ψ | λ,μ)f(λ | y)f(μ | z)dλdμ
Example
M C M C (MCMC)
An algorithm for approximating the posterior distribution
Metropolis, Rosenblusth, Rosenbluth, Teller, Teller. 1953. Equations of state calculations by fast computing
machines. J. Chem. Phys.
Hastings. 1970. Monte Carlo sampling methods using Markov chains and their applications. Biometrika.
Bayesian Fundamentals
M C M C (MCMC)
More on MCMC from Paul Lewis—our esteemed SSB
President—and his lecture on Bayesian phylogenetics
Slides source: https://molevol.mbl.edu/index.php/Paul_Lewis
Bayesian Fundamentals
Paul O. Lewis (2014 Woods Hole Molecular Evolution Workshop) 42
MCMC robot’s rules
Uphill steps are
always accepted
Slightly downhill steps
are usually accepted
Drastic “off the cliff”
downhill steps are almost
never accepted
With these rules, it
is easy to see why the
robot tends to stay near
the tops of hills
Paul O. Lewis (2014 Woods Hole Molecular Evolution Workshop) 43
(Actual) MCMC robot rules
Uphill steps are
always accepted
because R > 1
Slightly downhill steps
are usually accepted
because R is near 1
Drastic “off the cliff”
downhill steps are almost
never accepted because
R is near 0
Currently at 1.0 m
Proposed at 2.3 m
R = 2.3/1.0 = 2.3
Currently at 6.2 m
Proposed at 5.7 m
R = 5.7/6.2 =0.92 Currently at 6.2 m
Proposed at 0.2 m
R = 0.2/6.2 = 0.03
6
8
4
2
0
10
The robot takes a step if it draws
a Uniform(0,1) random deviate
that is less than or equal to R
=
f(D| ⇤
)f( ⇤
)
f(D)
f(D| )f( )
f(D)
Paul O. Lewis (2014 Woods Hole Molecular Evolution Workshop) 44
Cancellation of marginal likelihood
When calculating the ratio R of posterior densities, the marginal
probability of the data cancels.
f( ⇤
|D)
f( |D)
Posterior
odds
=
f(D| ⇤
)f( ⇤
)
f(D| )f( )
Likelihood
ratio
Prior odds
Paul O. Lewis (2014 Woods Hole Molecular Evolution Workshop) 45
Target vs. Proposal Distributions
Pretend this proposal
distribution allows good
mixing. What does good
mixing mean?
default2.TXT
State
0 2500 5000 7500 10000 12500 15000 17500
-10
-9
-8
-7
-6
-5
-4
-3
-2
-1
0
Paul O. Lewis (2014 Woods Hole Molecular Evolution Workshop) 46
Trace plots
“White noise”
appearance is a sign of
good mixing
I used the program Tracer to create this plot:
http://tree.bio.ed.ac.uk/software/tracer/
!AWTY (Are We There Yet?) is useful for
investigating convergence:
http://king2.scs.fsu.edu/CEBProjects/awty/
awty_start.php
log(posterior)
Paul O. Lewis (2014 Woods Hole Molecular Evolution Workshop) 47
Target vs. Proposal Distributions
Proposal distributions
with smaller variance...
Disadvantage: robot takes
smaller steps, more time
required to explore the
same area
Advantage: robot seldom
refuses to take proposed
steps
smallsteps.TXT
State
0 2500 5000 7500 10000 12500 15000 17500
-6
-5
-4
-3
-2
-1
0
Paul O. Lewis (2014 Woods Hole Molecular Evolution Workshop) 48
If step size is too
small, large-scale
trends will be
apparent
log(posterior)
Paul O. Lewis (2014 Woods Hole Molecular Evolution Workshop) 49
Target vs. Proposal Distributions
Proposal distributions
with larger variance...
Disadvantage: robot
often proposes a step
that would take it off
a cliff, and refuses to
move
Advantage: robot can
potentially cover a lot of
ground quickly
bigsteps2.TXT
State
0 2500 5000 7500 10000 12500 15000 17500
-12
-11
-10
-9
-8
-7
-6
-5
-4
-3
-2
Paul O. Lewis (2014 Woods Hole Molecular Evolution Workshop) 50
Chain is spending long periods of time
“stuck” in one place
“Stuck” robot is indicative of step sizes
that are too large (most proposed steps
would take the robot “off the cliff”)
log(posterior)
M C M C (MCMC)
Thanks, Paul!
Slides source: https://molevol.mbl.edu/index.php/Paul_Lewis
See MCMCRobot, a helpful
software program for learning
MCMC by Paul Lewis
http://www.mcmcrobot.org
Bayesian Fundamentals
RB D: T B-D P
The Rev language specifying the MCMC sampler for the
birth-death model.
Example
RB D: T B-D P
The trace-plot of the MCMC samples for speciation rate
speciation
State
0 5000 10000 15000 20000 25000 30000
0
2.5E-2
5E-2
7.5E-2
0.1
0.125
0.15
0.175
Example
RB D: T B-D P
Marginal posterior densities of the speciation rate and
extinction rate.
Density
Combined
extinction
speciation
0 5E-2 0.1 0.15 0.2 0.25
0
10
20
30
40
50
Example
E: T B-D P
Alas, we do not have a time machine and we do not know
lineage divergence times without error.
0
10
20
30
Oligocene Miocene Plio. Plei.
Paleogene Neogene Quat.
33.24
27.9
15.9
13.1
11.6
9.95
2.8
Example
D T E
Goal: Estimate the ages of interior nodes to understand the
timing and rates of evolutionary processes
Model how rates are
distributed across the tree
Describe the distribution of
speciation events over time
External calibration
information for estimates of
absolute node times
fossil Ailuropodinae
fossil Arctodus
fossil Ursus
fossil pinnipeds
Gray wolf
Spotted seal
Giant panda
Spectacled bear
Sun bear
Am. black bear
Asian black bear
Brown bear
Polar bear
Sloth bear
Ursidae
Time (My)
60 2040 0
Eocene Oligocene Miocene
Plio
Pleis
Paleocene
stem fossil Ursidae
fossil canids
(Figure from Heath et al., PNAS 2014)
O
Lecture & Demo: Bayesian Fundamentals
• Bayes theorem, priors and posteriors, MCMC
• Example: the birth-death process
• RevBayes Demonstration
break
Lecture: Bayesian Inference of Species Divergence Times
• Relaxed clock models – accounting for variation in
substitution rates among lineages
• Tree priors and fossil calibration
break
BEAST v2 Tutorial — dating species divergences under the
fossilized birth-death process
lunch
Course materials: http://phyloworks.org/resources/ssbws.html
O
Lecture & Demo: Bayesian Fundamentals
• Bayes theorem, priors and posteriors, MCMC
• Example: the birth-death process
• RevBayes Demonstration
break
Lecture: Bayesian Inference of Species Divergence Times
• Relaxed clock models – accounting for variation in
substitution rates among lineages
• Tree priors and fossil calibration
break
BEAST v2 Tutorial — dating species divergences under the
fossilized birth-death process
lunch
Course materials: http://phyloworks.org/resources/ssbws.html
D T E
Goal: Estimate the ages of interior nodes to understand the
timing and rates of evolutionary processes
Model how rates are
distributed across the tree
Describe the distribution of
speciation events over time
External calibration
information for estimates of
absolute node times
fossil Ailuropodinae
fossil Arctodus
fossil Ursus
fossil pinnipeds
Gray wolf
Spotted seal
Giant panda
Spectacled bear
Sun bear
Am. black bear
Asian black bear
Brown bear
Polar bear
Sloth bear
Ursidae
Time (My)
60 2040 0
Eocene Oligocene Miocene
Plio
Pleis
Paleocene
stem fossil Ursidae
fossil canids
(Figure from Heath et al., PNAS 2014)
A T-S  E
Phylogenetic trees can provide both topological information
and temporal information
100 0.020.040.060.080.0
Equus
Rhinoceros
Bos
Hippopotamus
Balaenoptera
Physeter
Ursus
Canis
Felis
Homo
Pan
Gorilla
Pongo
Macaca
Callithrix
Loris
Galago
Daubentonia
Varecia
Eulemur
Lemur
Hapalemur
Propithecus
Lepilemur
Mirza
M. murinus
M. griseorufus
M. myoxinus
M. berthae
M. rufus1
M. tavaratra
M. rufus2
M. sambiranensis
M. ravelobensis
Cheirogaleus
Simiiformes
Microcebus
Cretaceous Paleogene Neogene Q
Time (Millions of years)
Understanding Evolutionary Processes (Yang & Yoder Syst. Biol. 2003; Heath et al. MBE 2012)
T G M C
Assume that the rate of
evolutionary change is
constant over time
(branch lengths equal
percent sequence
divergence) 10%
400 My
200 My
A B C
20%
10%
10%
(Based on slides by Jeff Thorne; http://statgen.ncsu.edu/thorne/compmolevo.html)
T G M C
We can date the tree if we
know the rate of change is
1% divergence per 10 My
A B C
20%
10%
10%
10%
200 My
400 My
200 My
(Based on slides by Jeff Thorne; http://statgen.ncsu.edu/thorne/compmolevo.html)
T G M C
If we found a fossil of the
MRCA of B and C, we can
use it to calculate the rate
of change & date the root
of the tree
A B C
20%
10%
10%
10%
200 My
400 My
(Based on slides by Jeff Thorne; http://statgen.ncsu.edu/thorne/compmolevo.html)
R  G M C
Rates of evolution vary across lineages and over time
Mutation rate:
Variation in
• metabolic rate
• generation time
• DNA repair
Fixation rate:
Variation in
• strength and targets of
selection
• population sizes
10%
400 My
200 My
A B C
20%
10%
10%
U A
Sequence data provide
information about branch
lengths
In units of the expected # of
substitutions per site
branch length rate × time
0.2 expected
substitutions/site
PhylogeneticRelationshipsSequence
Data
R  T
The sequence data
provide information
about branch length
for any possible rate,
there’s a time that fits
the branch length
perfectly
0
1
2
3
4
5
0 1 2 3 4 5
BranchRate
Branch Time
time = 0.8
rate = 0.625
branch length = 0.5
(based on Thorne & Kishino, 2005)
R  T
The expected # of substitutions/site occurring along a
branch is the product of the substitution rate and time
length = rate × time length = rate length = time
Methods for dating species divergences estimate the
substitution rate and time separately
B D T E
length = rate length = time
R (r,r,r,...,rN−)
A (a,a,a,...,aN−)
N number of tips
B D T E
length = rate length = time
R (r,r,r,...,rN−)
A (a,a,a,...,aN−)
N number of tips
B D T E
Posterior probability
f (R,A,θR,θA,θs | D,Ψ)
R Vector of rates on branches
A Vector of internal node ages
θR,θA,θs Model parameters
D Sequence data
Ψ Tree topology
B D T E
f(R,A,θR,θA,θs | D) =
f (D | R,A,θs) f(R | θR) f(A | θA) f(θs)
f(D)
f(D | R,A,θR,θA,θs) Likelihood
f(R | θR) Prior on rates
f(A | θA) Prior on node ages
f(θs) Prior on substitution parameters
f(D) Marginal probability of the data
B D T E
Estimating divergence times relies on 2 main elements:
• Branch-specific rates: f (R | θR)
• Node ages: f (A | θA,C)
M R V
Some models describing lineage-specific substitution rate
variation:
• Global molecular clock (Zuckerkandl & Pauling, 1962)
• Local molecular clocks (Hasegawa, Kishino & Yano 1989;
Kishino & Hasegawa 1990; Yoder & Yang 2000; Yang & Yoder
2003, Drummond and Suchard 2010)
• Punctuated rate change model (Huelsenbeck, Larget and
Swofford 2000)
• Log-normally distributed autocorrelated rates (Thorne,
Kishino & Painter 1998; Kishino, Thorne & Bruno 2001; Thorne &
Kishino 2002)
• Uncorrelated/independent rates models (Drummond et al.
2006; Rannala & Yang 2007; Lepage et al. 2007)
• Mixture models on branch rates (Heath, Holder, Huelsenbeck
2012)
Models of Lineage-specific Rate Variation
G M C
The substitution rate is
constant over time
All lineages share the same
rate
branch length = substitution rate
low high
Models of Lineage-specific Rate Variation (Zuckerkandl & Pauling, 1962)
R-C M
To accommodate variation in substitution rates
‘relaxed-clock’ models estimate lineage-specific substitution
rates
• Local molecular clocks
• Punctuated rate change model
• Log-normally distributed autocorrelated rates
• Uncorrelated/independent rates models
• Mixture models on branch rates
L M C
Rate shifts occur
infrequently over the tree
Closely related lineages
have equivalent rates
(clustered by sub-clades)
low high
branch length = substitution rate
Models of Lineage-specific Rate Variation (Yang & Yoder 2003, Drummond and Suchard 2010)
L M C
Most methods for
estimating local clocks
required specifying the
number and locations of
rate changes a priori
Drummond and Suchard
(2010) introduced a
Bayesian method that
samples over a broad range
of possible random local
clocks
low high
branch length = substitution rate
Models of Lineage-specific Rate Variation (Yang & Yoder 2003, Drummond and Suchard 2010)
A R
Substitution rates evolve
gradually over time –
closely related lineages have
similar rates
The rate at a node is
drawn from a lognormal
distribution with a mean
equal to the parent rate
low high
branch length = substitution rate
Models of Lineage-specific Rate Variation (Thorne, Kishino & Painter 1998; Kishino, Thorne & Bruno 2001)
P R C
Rate changes occur along
lineages according to a
point process
At rate-change events, the
new rate is a product of
the parent’s rate and a
Γ-distributed multiplier
low high
branch length = substitution rate
Models of Lineage-specific Rate Variation (Huelsenbeck, Larget and Swofford 2000)
I/U R
Lineage-specific rates are
uncorrelated when the rate
assigned to each branch is
independently drawn from
an underlying distribution
low high
branch length = substitution rate
Models of Lineage-specific Rate Variation (Drummond et al. 2006)
I M M
Dirichlet process prior:
Branches are partitioned
into distinct rate categories
Random variables under the
DPP informed by the data:
• the number of rate
classes
• the assignment of
branches to classes
• the rate value for each
class
branch length = substitution rate
c5
c4
c3
c2
substitution rate classes
c1
Models of Lineage-specific Rate Variation (Heath, Holder, Huelsenbeck. 2012 MBE)
M R V
These are only a subset of the available models for
branch-rate variation
• Global molecular clock
• Local molecular clocks
• Punctuated rate change model
• Log-normally distributed autocorrelated rates
• Uncorrelated/independent rates models
• Dirchlet process prior
Models of Lineage-specific Rate Variation
M R V
Are our models appropriate across all data sets?
cave bear
American
black bear
sloth bear
Asian
black bear
brown bear
polar bear
American giant
short-faced bear
giant panda
sun bear
harbor seal
spectacled
bear
4.08
5.39
5.66
12.86
2.75
5.05
19.09
35.7
0.88
4.58
[3.11–5.27]
[4.26–7.34]
[9.77–16.58]
[3.9–6.48]
[0.66–1.17]
[4.2–6.86]
[2.1–3.57]
[14.38–24.79]
[3.51–5.89]
14.32
[9.77–16.58]
95% CI
mean age (Ma)
t2
t3
t4
t6
t7
t5
t8
t9
t10
tx
node
MP•MLu•MLp•Bayesian
100•100•100•1.00
100•100•100•1.00
85•93•93•1.00
76•94•97•1.00
99•97•94•1.00
100•100•100•1.00
100•100•100•1.00
100•100•100•1.00
t1
Eocene Oligocene Miocene Plio Plei Hol
34 5.3 1.823.8 0.01
Epochs
Ma
Global expansion of C4 biomass
Major temperature drop and increasing seasonality
Faunal turnover
Krause et al., 2008. Mitochondrial genomes reveal an
explosive radiation of extinct and extant bears near the
Miocene-Pliocene boundary. BMC Evol. Biol. 8.
Taxa
1
5
10
50
100
500
1000
5000
10000
20000
0100200300
MYA
Ophidiiformes
Percomorpha
Beryciformes
Lampriformes
Zeiforms
Polymixiiformes
Percopsif. + Gadiif.
Aulopiformes
Myctophiformes
Argentiniformes
Stomiiformes
Osmeriformes
Galaxiiformes
Salmoniformes
Esociformes
Characiformes
Siluriformes
Gymnotiformes
Cypriniformes
Gonorynchiformes
Denticipidae
Clupeomorpha
Osteoglossomorpha
Elopomorpha
Holostei
Chondrostei
Polypteriformes
Clade r ε ΔAIC
1. 0.041 0.0017 25.3
2. 0.081 * 25.5
3. 0.067 0.37 45.1
4. 0 * 3.1
Bg. 0.011 0.0011
OstariophysiAcanthomorpha
Teleostei
Santini et al., 2009. Did genome duplication drive the origin
of teleosts? A comparative study of diversification in
ray-finned fishes. BMC Evol. Biol. 9.
M R V
These are only a subset of the available models for
branch-rate variation
• Global molecular clock
• Local molecular clocks
• Punctuated rate change model
• Log-normally distributed autocorrelated rates
• Uncorrelated/independent rates models
• Dirchlet process prior
Model selection and model uncertainty are very important
for Bayesian divergence time analysis
Models of Lineage-specific Rate Variation
B D T E
Estimating divergence times relies on 2 main elements:
• Branch-specific rates: f (R | θR)
• Node ages: f (A | θA,C)
http://bayesiancook.blogspot.com/2013/12/two-sides-of-same-coin.html
P  N T
Relaxed clock Bayesian analyses require a prior distribution
on node times
f(A | θA)
Different node-age priors make different assumptions about
the timing of divergence events
Node Age Priors
G N T P
Assumed to be vague or uninformative by not making
assumptions about biological processes
Uniform prior: the time at
a given node has equal
probability across the
interval between the time
of the parent node and the
time of the oldest daughter
node
(conditioned on root age)
Node Age Priors
S B P
Node-age priors based on stochastic models of lineage
diversification
Yule process: assumes a
constant rate of speciation,
across lineages
A pure birth process—every
node leaves extant
descendants (no extinction)
Node Age Priors
S B P
Node-age priors based on stochastic models of lineage
diversification
Constant-rate birth-death
process: at any point in
time a lineage can speciate
at rate λ or go extinct with
a rate of μ
Node Age Priors
S B P
Node-age priors based on stochastic models of lineage
diversification
Constant-rate birth-death
process: at any point in
time a lineage can speciate
at rate λ or go extinct with
a rate of μ
Node Age Priors
S B P
Node-age priors based on stochastic models of lineage
diversification
Constant-rate birth-death
process: at any point in
time a lineage can speciate
at rate λ or go extinct with
a rate of μ
Node Age Priors
S B P
Different values of λ and μ lead
to different trees
Bayesian inference under these
models can be very sensitive to
the values of these parameters
Using hyperpriors on λ and μ
accounts for uncertainty in these
hyperparameters
Node Age Priors
S B P
Node-age priors based on stochastic models of lineage
diversification
Birth-death-sampling
process: an extension of
the constant-rate birth-death
model that accounts for
random sampling of tips
Conditions on a probability
of sampling a tip, ρ
Node Age Priors
P  N T
Sequence data are only informative on relative rates & times
Node-time priors cannot give precise estimates of absolute
node ages
We need external information (like fossils) to calibrate or
scale the tree to absolute time
Node Age Priors
C D T
Fossils (or other data) are necessary to estimate absolute
node ages
There is no information in
the sequence data for
absolute time
Uncertainty in the
placement of fossils
A B C
20%
10%
10%
10%
200 My
400 My
C D
Bayesian inference is well suited to accommodating
uncertainty in the age of the calibration node
Divergence times are
calibrated by placing
parametric densities on
internal nodes offset by age
estimates from the fossil
record
A B C
200 My
Density
Age
A F  C
Misplaced fossils can affect node age estimates throughout
the tree – if the fossil is older than its presumed MRCA
Calibrating the Tree (figure from Benton & Donoghue Mol. Biol. Evol. 2007)
A F  C
Crown clade: all
living species and
their most-recent
common ancestor
(MRCA)
Calibrating the Tree (figure from Benton & Donoghue Mol. Biol. Evol. 2007)
A F  C
Stem lineages:
purely fossil forms
that are closer to
their descendant
crown clade than
any other crown
clade
Calibrating the Tree (figure from Benton & Donoghue Mol. Biol. Evol. 2007)
A F  C
Fossiliferous
horizons: the
sources in the
rock record for
relevant fossils
Calibrating the Tree (figure from Benton & Donoghue Mol. Biol. Evol. 2007)
F C
Age estimates from fossils
can provide minimum time
constraints for internal
nodes
Reliable maximum bounds
are typically unavailable
Minimum age
Time (My)
Calibrating Divergence Times
P D  C N
Common practice in Bayesian divergence-time estimation:
Parametric distributions are
typically off-set by the age
of the oldest fossil assigned
to a clade
These prior densities do not
(necessarily) require
specification of maximum
bounds
Uniform (min, max)
Exponential (λ)
Gamma (α, β)
Log Normal (µ, σ2
)
Time (My)Minimum age
Calibrating Divergence Times
P D  C N
Calibration densities describe
the waiting time between
the divergence event and
the age of the oldest fossil
Minimum age
Exponential (λ)
Time (My)
Calibrating Divergence Times
P D  C N
Common practice in Bayesian divergence-time estimation:
Estimates of absolute node
ages are driven primarily by
the calibration density
Specifying appropriate
densities is a challenge for
most molecular biologists
Uniform (min, max)
Exponential (λ)
Gamma (α, β)
Log Normal (µ, σ2
)
Time (My)Minimum age
Calibration Density Approach
I F C
We would prefer to
eliminate the need for
ad hoc calibration
prior densities
Calibration densities
do not account for
diversification of fossils
Domestic dog
Spotted seal
Giant panda
Spectacled bear
Sun bear
Am. black bear
Asian black bear
Brown bear
Polar bear
Sloth bear
Zaragocyon daamsi
Ballusia elmensis
Ursavus brevihinus
Ailurarctos lufengensis
Ursavus primaevus
Agriarctos spp.
Kretzoiarctos beatrix
Indarctos vireti
Indarctos arctoides
Indarctos punjabiensis
Giant short-faced bear
Cave bear
Fossil and Extant Bears (Krause et al. BMC Evol. Biol. 2008; Abella et al. PLoS ONE 2012)
I F C
We want to use all
of the available fossils
Example: Bears
12 fossils are reduced
to 4 calibration ages
with calibration density
methods
Domestic dog
Spotted seal
Giant panda
Spectacled bear
Sun bear
Am. black bear
Asian black bear
Brown bear
Polar bear
Sloth bear
Zaragocyon daamsi
Ballusia elmensis
Ursavus brevihinus
Ailurarctos lufengensis
Ursavus primaevus
Agriarctos spp.
Kretzoiarctos beatrix
Indarctos vireti
Indarctos arctoides
Indarctos punjabiensis
Giant short-faced bear
Cave bear
Fossil and Extant Bears (Krause et al. BMC Evol. Biol. 2008; Abella et al. PLoS ONE 2012)
I F C
We want to use all
of the available fossils
Example: Bears
12 fossils are reduced
to 4 calibration ages
with calibration density
methods
Domestic dog
Spotted seal
Giant panda
Spectacled bear
Sun bear
Am. black bear
Asian black bear
Brown bear
Polar bear
Sloth bear
Zaragocyon daamsi
Ballusia elmensis
Ursavus brevihinus
Ailurarctos lufengensis
Ursavus primaevus
Agriarctos spp.
Kretzoiarctos beatrix
Indarctos vireti
Indarctos arctoides
Indarctos punjabiensis
Giant short-faced bear
Cave bear
Fossil and Extant Bears (Krause et al. BMC Evol. Biol. 2008; Abella et al. PLoS ONE 2012)
I F C
Because fossils are
part of the
diversification process,
we can combine fossil
calibration with
birth-death models
Domestic dog
Spotted seal
Giant panda
Spectacled bear
Sun bear
Am. black bear
Asian black bear
Brown bear
Polar bear
Sloth bear
Zaragocyon daamsi
Ballusia elmensis
Ursavus brevihinus
Ailurarctos lufengensis
Ursavus primaevus
Agriarctos spp.
Kretzoiarctos beatrix
Indarctos vireti
Indarctos arctoides
Indarctos punjabiensis
Giant short-faced bear
Cave bear
Fossil and Extant Bears (Krause et al. BMC Evol. Biol. 2008; Abella et al. PLoS ONE 2012)
I F C
This relies on a
branching model that
accounts for
speciation, extinction,
and rates of
fossilization,
preservation, and
recovery
Domestic dog
Spotted seal
Giant panda
Spectacled bear
Sun bear
Am. black bear
Asian black bear
Brown bear
Polar bear
Sloth bear
Zaragocyon daamsi
Ballusia elmensis
Ursavus brevihinus
Ailurarctos lufengensis
Ursavus primaevus
Agriarctos spp.
Kretzoiarctos beatrix
Indarctos vireti
Indarctos arctoides
Indarctos punjabiensis
Giant short-faced bear
Cave bear
Fossil and Extant Bears (Krause et al. BMC Evol. Biol. 2008; Abella et al. PLoS ONE 2012)
T F B-D P (FBD)
Improving statistical inference of absolute node ages
Eliminates the need to specify arbitrary
calibration densities
Better capture our statistical
uncertainty in species divergence dates
All reliable fossils associated with a
clade are used
Useful for calibration or ‘total-evidence’
dating
150 100 50 0
Time
(Heath, Huelsenbeck, Stadler. 2014 PNAS)
T F B-D P (FBD)
Recovered fossil specimens
provide historical
observations of the
diversification process that
generated the tree of
extant species
150 100 50 0
Time
Diversification of Fossil & Extant Lineages (Heath, Huelsenbeck, Stadler. PNAS 2014)
T F B-D P (FBD)
The probability of the tree
and fossil observations
under a birth-death model
with rate parameters:
λ = speciation
μ = extinction
ψ = fossilization/recovery
150 100 50 0
Time
Diversification of Fossil & Extant Lineages (Heath, Huelsenbeck, Stadler. PNAS 2014)
T F B-D P (FBD)
The probability of the tree
and fossil observations
under a birth-death model
with rate parameters:
λ = speciation
μ = extinction
ψ = fossilization/recovery
Diversification of Fossil & Extant Lineages (Heath, Huelsenbeck, Stadler. PNAS 2014)
T F B-D P (FBD)
We use MCMC to sample
realizations of the
diversification process,
integrating over the
topology—including
placement of the
fossils—and speciation times
0250 50100150200
Time (My)
Diversification of Fossil & Extant Lineages (Heath, Huelsenbeck, Stadler. PNAS 2014)
I FBD T
Extensions of the fossilized birth-death process accommodate
variation in fossil sampling, non-random species sampling, &
shifts in diversification rates.
0
10
20
30
40
50
60
70
80
90
100
110
120
130
140
150
160
170
180
190
200
Lower
Middle
Upper
Lower
Upper
Paleocene
Eocene
Oligocene
Miocene
Pliocene
Pleistocen
Jurassic Cretaceous Paleogene Neogene Q.
With character data for both fossil & extant species, we
account for uncertainty in fossil placement
S B-D P
A piecewise shifting model
where parameters change
over time
Used to estimate
epidemiological parameters
of an outbreak
0175 255075100125150
Days
(see Stadler et al. PNAS 2013 and Stadler et al. PLoS Currents Outbreaks 2014)
S B-D P
l is the number of
parameter intervals
Ri is the effective
reproductive number
for interval i ∈ l
δ is the rate of
becoming
non-infectious
s is the probability of
sampling an individual
after becoming
non-infectious
S B-D P
l is the number of
parameter intervals
λi is the transmission
rate for interval i ∈ l
μ is the viral lineage
death rate
ψ is the rate each
individual is sampled
S B-D P
S B-D P
A decline in R over the
history of HIV-1 in the UK
is consistent with the
introduction of effective
drug therapies
After 1998 R decreased
below 1, indicating a
declining epidemic
(Stadler et al. PNAS 2013)
O
Lecture & Demo: Bayesian Fundamentals
• Bayes theorem, priors and posteriors, MCMC
• Example: the birth-death process
• RevBayes Demonstration
break
Lecture: Bayesian Inference of Species Divergence Times
• Relaxed clock models – accounting for variation in
substitution rates among lineages
• Tree priors and fossil calibration
break
BEAST v2 Tutorial — dating species divergences under the
fossilized birth-death process
lunch
Course materials: http://phyloworks.org/resources/ssbws.html
D T E S
Program Models/Method
r8s Strict clock, local clocks, NPRS, PL
ape (R) NPRS, PL
multidivtime log-n autocorrelated (plus some others)
PhyBayes OU, log-n autocorrelated (plus some others)
PhyloBayes CIR, white noise (uncorrelated) (plus some others)
BEAST Uncorrelated (log-n & exp), local clocks (plus others)
TreeTime Dirichlet model, CPP, uncorrelated
MrBayes 3.2 CPP, strict clock, autocorrelated, uncorrelated
DPPDiv DPP, strict clock, uncorrelated
RevBayes “the limit is the sky”∗
∗
and other methods
E: C Y O A
Dating Bear Divergence
Times with the Fossilized
Birth-Death Process
Agriarctos spp X
Ailurarctos lufengensis X
Ailuropoda melanoleuca
Arctodus simus X
Ballusia elmensis X
Helarctos malayanus
Indarctos arctoides X
Indarctos punjabiensis X
Indarctos vireti X
Kretzoiarctos beatrix X
Melursus ursinus
Parictis montanus X
Tremarctos ornatus
Ursavus brevihinus X
Ursavus primaevus X
Ursus abstrusus X
Ursus americanus
Ursus arctos
Ursus maritimus
Ursus spelaeus X
Ursus thibetanus
Zaragocyon daamsi X
Stem
Bears
CrownBears
Pandas
Tremarctinae
Brown Bears
Ursinae
Origin
Root
(Total Group)
Crown
0
10
20
30
40
50
Ypresian
Lutetian
Bartonian
Priabonian
Rupelian
Chattian
Aquitanian
Burdigalian
Langhian
Serravallian
Tortonian
Messinian
Zanclean
Piacenzian
Gelasian
Calabrian
MiddleUpper
Eocene
Oligocene
Miocene
Pliocene
Pleistocene
Holocene
Paleogene Neogene Quat.
Ailuropoda melanoleuca
Tremarctos ornatus
Melursus ursinus
Ursus arctos
Ursus maritimus
Helarctos malayanus
Ursus americanus
Ursus thibetanus
Parictis montanus X
Zaragocyon daamsi X
Ballusia elmensis X
Ursavus primaevus X
Ursavus brevihinus X
Indarctos vireti X
Indarctos arctoides X
Indarctos punjabiensis X
Ailurarctos lufengensis X
Agriarctos spp X
Kretzoiarctos beatrix X
Ursus abstrusus X
Ursus spelaeus X
Arctodus simus X
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
Estimating Epidemiological
Parameters of an Ebola
Outbreak
0175 255075100125150
Days
Course materials: http://phyloworks.org/resources/ssbws.html

Bayesian Divergence Time Estimation

  • 1.
    B D T E TracyHeath Ecology, Evolution, & Organismal Biology Iowa State University @trayc7 http://phyloworks.org 2015 SSB Workshop Ann Arbor, MI USA
  • 2.
    O Lecture & Demo:Bayesian Fundamentals • Bayes theorem, priors and posteriors, MCMC • Example: the birth-death process • RevBayes Demonstration break Lecture: Bayesian Inference of Species Divergence Times • Relaxed clock models – accounting for variation in substitution rates among lineages • Tree priors and fossil calibration break BEAST v2 Tutorial — dating species divergences under the fossilized birth-death process lunch Course materials: http://phyloworks.org/resources/ssbws.html
  • 3.
    P B For eachof the topics in the pre-course survey, most of you feel that you “have some experience, but there’s much more to learn”. diversification models relaxed−clock models Bayesian inference model−based phylo probability theory unfamiliar I’veheardofit familiar,butmoretolearn prettycomfortable expert (29 participants responded to the survey)
  • 4.
    B I Estimate theprobability of a hypothesis (model) conditional on observed data. The probability represents the researcher’s degree of belief. Bayes’ Theorem specifies the conditional probability of the hypothesis given the data.
  • 5.
  • 6.
  • 7.
  • 8.
  • 9.
  • 10.
    B’ T The posteriorprobability of a discrete parameter δ conditional on the data D is Pr(δ | D) = Pr(D | δ) Pr(δ) δ Pr(D | δ) Pr(δ) δ Pr(D | δ) Pr(δ) is the likelihood marginalized over all possible values of δ. Bayesian Fundamentals
  • 11.
    B’ T The posteriorprobability density a continuous parameter θ conditional on the data D is f(θ | D) = f(D | θ)f(θ) θ f(D | θ)f(θ)dθ θ f(D | θ)f(θ)dθ is the likelihood marginalized over all possible values of θ. Bayesian Fundamentals
  • 12.
    P The the distributionof θ before any data are collected is the prior f(θ) The prior describes your uncertainty in the parameters of your model. Bayesian Fundamentals
  • 13.
    P We may assumea gamma-prior distribution on θ with a shape parameter α and a scale parameter β. θ ∼ Gamma(α,β) f(θ | α,β) = 1 Γ(α)βα θα−1e− θ β 0 1 2 3 Density θ This requires us to assign values for α and β based on our prior belief, or we can place hyperpriors on these parameters if we are uncertain about their values. Bayesian Fundamentals
  • 14.
    E: T B-DP A time machine allowed us to observe the dates of each speciation event in the history of extant bears 0 10 20 30 Oligocene Miocene Plio. Plei. Paleogene Neogene Quat. 33.24 27.9 15.9 13.1 11.6 9.95 2.8 Example
  • 15.
    E: T B-DP We assume that the diversification of bears matches a birth death process with parameters: λ = speciation rate μ = extinction rate 0 10 20 30 Oligocene Miocene Plio. Plei. Paleogene Neogene Quat. 33.24 27.9 15.9 13.1 11.6 9.95 2.8 Example
  • 16.
    E: T B-DP The birth-death process allows us to compute the probability density of our observed time-tree (Ψ) conditional on any value of speciation (λ) and any value of extinction (μ). f(Ψ | λ,μ) This model states that Ψ ∼ BD(λ,μ) Example
  • 17.
    E: T B-DP Another way of expressing Ψ ∼ BD(λ,μ) is with a probabilistic graphical model Our “observed” time-tree is conditioned on some constant value of λ and μ. Example
  • 18.
    E: T B-DP If our time machine also allowed us to know the rate of speciation and extinction, we can easily calculate the likelihood of our observed tree: f(Ψ | λ,μ) = N!(λ−μ)λN−1 e−(λ−μ)x1 λ − μe−(λ−μ)x1 N−1 i=1 (λ − μ)2e−(λ−μ)x1 (λ − μe−(λ−μ)x1)2 f(Ψ | λ = 0.5,μ = 0.1) = 3.49215e−32 Example
  • 19.
    RB D: TB-D P RevBayes Fully integrative Bayesian inference of phylogenetic parameters using probabilistic graphical models and an interpreted language http://RevBayes.com Example
  • 20.
    G M RB Graphical models provide tools for visually & computationally representing complex, parameter-rich probabilistic models We can depict the conditional dependence structure of various parameters and other random variables Höhna, Heath, Boussau, Landis, Ronquist, Huelsenbeck. 2014. Probabilistic Graphical Model Representation in Phylogenetics. Systematic Biology. (doi: 10.1093/sysbio/syu039)
  • 21.
    RB D: TB-D P The Rev language specifying a birth-death model on the bear phylogeny with fixed values for λ and μ. Example
  • 22.
    E: T B-DP What if we do not know λ and μ? We can use frequentist or Bayesian methods for estimating their values. Frequentist methods require us to find the values of λ and μ that maximize f(Ψ | λ,μ). Bayesian methods use prior distributions to describe our uncertainty in λ and μ and estimate f(λ,μ | Ψ). Example
  • 23.
    E: T B-DP We must define prior distributions for λ and μ to estimate the posterior probability density f(λ,μ | Ψ,y,z) = f(Ψ | λ,μ)f(λ | y)f(μ | z) λ μ f(Ψ | λ,μ)f(λ | y)f(μ | z)dλdμ Now y and z are the parameters of the prior distributions on λ and μ. Example
  • 24.
    E: T B-DP We can choose exponential prior distributions for λ and μ. Now y and z represent the rate parameters of the exponential priors. λ ∼ Exponential(y) μ ∼ Exponential(z) f(λ | y) = ye−yλ f(μ | z) = ze−zμ Example
  • 25.
    RB D: TB-D P The Rev language specifying a birth-death model on the bear phylogeny with exponential priors on λ and μ. Example
  • 26.
    E: T B-DP Now that we have a defined model, how do we estimate the posterior probability density? λ ∼ Exponential(y) μ ∼ Exponential(z) f(λ,μ | Ψ,y,z) = f(Ψ | λ,μ)f(λ | y)f(μ | z) λ μ f(Ψ | λ,μ)f(λ | y)f(μ | z)dλdμ Example
  • 27.
    M C MC (MCMC) An algorithm for approximating the posterior distribution Metropolis, Rosenblusth, Rosenbluth, Teller, Teller. 1953. Equations of state calculations by fast computing machines. J. Chem. Phys. Hastings. 1970. Monte Carlo sampling methods using Markov chains and their applications. Biometrika. Bayesian Fundamentals
  • 28.
    M C MC (MCMC) More on MCMC from Paul Lewis—our esteemed SSB President—and his lecture on Bayesian phylogenetics Slides source: https://molevol.mbl.edu/index.php/Paul_Lewis Bayesian Fundamentals
  • 29.
    Paul O. Lewis(2014 Woods Hole Molecular Evolution Workshop) 42 MCMC robot’s rules Uphill steps are always accepted Slightly downhill steps are usually accepted Drastic “off the cliff” downhill steps are almost never accepted With these rules, it is easy to see why the robot tends to stay near the tops of hills
  • 30.
    Paul O. Lewis(2014 Woods Hole Molecular Evolution Workshop) 43 (Actual) MCMC robot rules Uphill steps are always accepted because R > 1 Slightly downhill steps are usually accepted because R is near 1 Drastic “off the cliff” downhill steps are almost never accepted because R is near 0 Currently at 1.0 m Proposed at 2.3 m R = 2.3/1.0 = 2.3 Currently at 6.2 m Proposed at 5.7 m R = 5.7/6.2 =0.92 Currently at 6.2 m Proposed at 0.2 m R = 0.2/6.2 = 0.03 6 8 4 2 0 10 The robot takes a step if it draws a Uniform(0,1) random deviate that is less than or equal to R
  • 31.
    = f(D| ⇤ )f( ⇤ ) f(D) f(D|)f( ) f(D) Paul O. Lewis (2014 Woods Hole Molecular Evolution Workshop) 44 Cancellation of marginal likelihood When calculating the ratio R of posterior densities, the marginal probability of the data cancels. f( ⇤ |D) f( |D) Posterior odds = f(D| ⇤ )f( ⇤ ) f(D| )f( ) Likelihood ratio Prior odds
  • 32.
    Paul O. Lewis(2014 Woods Hole Molecular Evolution Workshop) 45 Target vs. Proposal Distributions Pretend this proposal distribution allows good mixing. What does good mixing mean?
  • 33.
    default2.TXT State 0 2500 50007500 10000 12500 15000 17500 -10 -9 -8 -7 -6 -5 -4 -3 -2 -1 0 Paul O. Lewis (2014 Woods Hole Molecular Evolution Workshop) 46 Trace plots “White noise” appearance is a sign of good mixing I used the program Tracer to create this plot: http://tree.bio.ed.ac.uk/software/tracer/ !AWTY (Are We There Yet?) is useful for investigating convergence: http://king2.scs.fsu.edu/CEBProjects/awty/ awty_start.php log(posterior)
  • 34.
    Paul O. Lewis(2014 Woods Hole Molecular Evolution Workshop) 47 Target vs. Proposal Distributions Proposal distributions with smaller variance... Disadvantage: robot takes smaller steps, more time required to explore the same area Advantage: robot seldom refuses to take proposed steps
  • 35.
    smallsteps.TXT State 0 2500 50007500 10000 12500 15000 17500 -6 -5 -4 -3 -2 -1 0 Paul O. Lewis (2014 Woods Hole Molecular Evolution Workshop) 48 If step size is too small, large-scale trends will be apparent log(posterior)
  • 36.
    Paul O. Lewis(2014 Woods Hole Molecular Evolution Workshop) 49 Target vs. Proposal Distributions Proposal distributions with larger variance... Disadvantage: robot often proposes a step that would take it off a cliff, and refuses to move Advantage: robot can potentially cover a lot of ground quickly
  • 37.
    bigsteps2.TXT State 0 2500 50007500 10000 12500 15000 17500 -12 -11 -10 -9 -8 -7 -6 -5 -4 -3 -2 Paul O. Lewis (2014 Woods Hole Molecular Evolution Workshop) 50 Chain is spending long periods of time “stuck” in one place “Stuck” robot is indicative of step sizes that are too large (most proposed steps would take the robot “off the cliff”) log(posterior)
  • 38.
    M C MC (MCMC) Thanks, Paul! Slides source: https://molevol.mbl.edu/index.php/Paul_Lewis See MCMCRobot, a helpful software program for learning MCMC by Paul Lewis http://www.mcmcrobot.org Bayesian Fundamentals
  • 39.
    RB D: TB-D P The Rev language specifying the MCMC sampler for the birth-death model. Example
  • 40.
    RB D: TB-D P The trace-plot of the MCMC samples for speciation rate speciation State 0 5000 10000 15000 20000 25000 30000 0 2.5E-2 5E-2 7.5E-2 0.1 0.125 0.15 0.175 Example
  • 41.
    RB D: TB-D P Marginal posterior densities of the speciation rate and extinction rate. Density Combined extinction speciation 0 5E-2 0.1 0.15 0.2 0.25 0 10 20 30 40 50 Example
  • 42.
    E: T B-DP Alas, we do not have a time machine and we do not know lineage divergence times without error. 0 10 20 30 Oligocene Miocene Plio. Plei. Paleogene Neogene Quat. 33.24 27.9 15.9 13.1 11.6 9.95 2.8 Example
  • 43.
    D T E Goal:Estimate the ages of interior nodes to understand the timing and rates of evolutionary processes Model how rates are distributed across the tree Describe the distribution of speciation events over time External calibration information for estimates of absolute node times fossil Ailuropodinae fossil Arctodus fossil Ursus fossil pinnipeds Gray wolf Spotted seal Giant panda Spectacled bear Sun bear Am. black bear Asian black bear Brown bear Polar bear Sloth bear Ursidae Time (My) 60 2040 0 Eocene Oligocene Miocene Plio Pleis Paleocene stem fossil Ursidae fossil canids (Figure from Heath et al., PNAS 2014)
  • 44.
    O Lecture & Demo:Bayesian Fundamentals • Bayes theorem, priors and posteriors, MCMC • Example: the birth-death process • RevBayes Demonstration break Lecture: Bayesian Inference of Species Divergence Times • Relaxed clock models – accounting for variation in substitution rates among lineages • Tree priors and fossil calibration break BEAST v2 Tutorial — dating species divergences under the fossilized birth-death process lunch Course materials: http://phyloworks.org/resources/ssbws.html
  • 45.
    O Lecture & Demo:Bayesian Fundamentals • Bayes theorem, priors and posteriors, MCMC • Example: the birth-death process • RevBayes Demonstration break Lecture: Bayesian Inference of Species Divergence Times • Relaxed clock models – accounting for variation in substitution rates among lineages • Tree priors and fossil calibration break BEAST v2 Tutorial — dating species divergences under the fossilized birth-death process lunch Course materials: http://phyloworks.org/resources/ssbws.html
  • 46.
    D T E Goal:Estimate the ages of interior nodes to understand the timing and rates of evolutionary processes Model how rates are distributed across the tree Describe the distribution of speciation events over time External calibration information for estimates of absolute node times fossil Ailuropodinae fossil Arctodus fossil Ursus fossil pinnipeds Gray wolf Spotted seal Giant panda Spectacled bear Sun bear Am. black bear Asian black bear Brown bear Polar bear Sloth bear Ursidae Time (My) 60 2040 0 Eocene Oligocene Miocene Plio Pleis Paleocene stem fossil Ursidae fossil canids (Figure from Heath et al., PNAS 2014)
  • 47.
    A T-S E Phylogenetic trees can provide both topological information and temporal information 100 0.020.040.060.080.0 Equus Rhinoceros Bos Hippopotamus Balaenoptera Physeter Ursus Canis Felis Homo Pan Gorilla Pongo Macaca Callithrix Loris Galago Daubentonia Varecia Eulemur Lemur Hapalemur Propithecus Lepilemur Mirza M. murinus M. griseorufus M. myoxinus M. berthae M. rufus1 M. tavaratra M. rufus2 M. sambiranensis M. ravelobensis Cheirogaleus Simiiformes Microcebus Cretaceous Paleogene Neogene Q Time (Millions of years) Understanding Evolutionary Processes (Yang & Yoder Syst. Biol. 2003; Heath et al. MBE 2012)
  • 48.
    T G MC Assume that the rate of evolutionary change is constant over time (branch lengths equal percent sequence divergence) 10% 400 My 200 My A B C 20% 10% 10% (Based on slides by Jeff Thorne; http://statgen.ncsu.edu/thorne/compmolevo.html)
  • 49.
    T G MC We can date the tree if we know the rate of change is 1% divergence per 10 My A B C 20% 10% 10% 10% 200 My 400 My 200 My (Based on slides by Jeff Thorne; http://statgen.ncsu.edu/thorne/compmolevo.html)
  • 50.
    T G MC If we found a fossil of the MRCA of B and C, we can use it to calculate the rate of change & date the root of the tree A B C 20% 10% 10% 10% 200 My 400 My (Based on slides by Jeff Thorne; http://statgen.ncsu.edu/thorne/compmolevo.html)
  • 51.
    R  GM C Rates of evolution vary across lineages and over time Mutation rate: Variation in • metabolic rate • generation time • DNA repair Fixation rate: Variation in • strength and targets of selection • population sizes 10% 400 My 200 My A B C 20% 10% 10%
  • 52.
    U A Sequence dataprovide information about branch lengths In units of the expected # of substitutions per site branch length rate × time 0.2 expected substitutions/site PhylogeneticRelationshipsSequence Data
  • 53.
    R  T Thesequence data provide information about branch length for any possible rate, there’s a time that fits the branch length perfectly 0 1 2 3 4 5 0 1 2 3 4 5 BranchRate Branch Time time = 0.8 rate = 0.625 branch length = 0.5 (based on Thorne & Kishino, 2005)
  • 54.
    R  T Theexpected # of substitutions/site occurring along a branch is the product of the substitution rate and time length = rate × time length = rate length = time Methods for dating species divergences estimate the substitution rate and time separately
  • 55.
    B D TE length = rate length = time R (r,r,r,...,rN−) A (a,a,a,...,aN−) N number of tips
  • 56.
    B D TE length = rate length = time R (r,r,r,...,rN−) A (a,a,a,...,aN−) N number of tips
  • 57.
    B D TE Posterior probability f (R,A,θR,θA,θs | D,Ψ) R Vector of rates on branches A Vector of internal node ages θR,θA,θs Model parameters D Sequence data Ψ Tree topology
  • 58.
    B D TE f(R,A,θR,θA,θs | D) = f (D | R,A,θs) f(R | θR) f(A | θA) f(θs) f(D) f(D | R,A,θR,θA,θs) Likelihood f(R | θR) Prior on rates f(A | θA) Prior on node ages f(θs) Prior on substitution parameters f(D) Marginal probability of the data
  • 59.
    B D TE Estimating divergence times relies on 2 main elements: • Branch-specific rates: f (R | θR) • Node ages: f (A | θA,C)
  • 60.
    M R V Somemodels describing lineage-specific substitution rate variation: • Global molecular clock (Zuckerkandl & Pauling, 1962) • Local molecular clocks (Hasegawa, Kishino & Yano 1989; Kishino & Hasegawa 1990; Yoder & Yang 2000; Yang & Yoder 2003, Drummond and Suchard 2010) • Punctuated rate change model (Huelsenbeck, Larget and Swofford 2000) • Log-normally distributed autocorrelated rates (Thorne, Kishino & Painter 1998; Kishino, Thorne & Bruno 2001; Thorne & Kishino 2002) • Uncorrelated/independent rates models (Drummond et al. 2006; Rannala & Yang 2007; Lepage et al. 2007) • Mixture models on branch rates (Heath, Holder, Huelsenbeck 2012) Models of Lineage-specific Rate Variation
  • 61.
    G M C Thesubstitution rate is constant over time All lineages share the same rate branch length = substitution rate low high Models of Lineage-specific Rate Variation (Zuckerkandl & Pauling, 1962)
  • 62.
    R-C M To accommodatevariation in substitution rates ‘relaxed-clock’ models estimate lineage-specific substitution rates • Local molecular clocks • Punctuated rate change model • Log-normally distributed autocorrelated rates • Uncorrelated/independent rates models • Mixture models on branch rates
  • 63.
    L M C Rateshifts occur infrequently over the tree Closely related lineages have equivalent rates (clustered by sub-clades) low high branch length = substitution rate Models of Lineage-specific Rate Variation (Yang & Yoder 2003, Drummond and Suchard 2010)
  • 64.
    L M C Mostmethods for estimating local clocks required specifying the number and locations of rate changes a priori Drummond and Suchard (2010) introduced a Bayesian method that samples over a broad range of possible random local clocks low high branch length = substitution rate Models of Lineage-specific Rate Variation (Yang & Yoder 2003, Drummond and Suchard 2010)
  • 65.
    A R Substitution ratesevolve gradually over time – closely related lineages have similar rates The rate at a node is drawn from a lognormal distribution with a mean equal to the parent rate low high branch length = substitution rate Models of Lineage-specific Rate Variation (Thorne, Kishino & Painter 1998; Kishino, Thorne & Bruno 2001)
  • 66.
    P R C Ratechanges occur along lineages according to a point process At rate-change events, the new rate is a product of the parent’s rate and a Γ-distributed multiplier low high branch length = substitution rate Models of Lineage-specific Rate Variation (Huelsenbeck, Larget and Swofford 2000)
  • 67.
    I/U R Lineage-specific ratesare uncorrelated when the rate assigned to each branch is independently drawn from an underlying distribution low high branch length = substitution rate Models of Lineage-specific Rate Variation (Drummond et al. 2006)
  • 68.
    I M M Dirichletprocess prior: Branches are partitioned into distinct rate categories Random variables under the DPP informed by the data: • the number of rate classes • the assignment of branches to classes • the rate value for each class branch length = substitution rate c5 c4 c3 c2 substitution rate classes c1 Models of Lineage-specific Rate Variation (Heath, Holder, Huelsenbeck. 2012 MBE)
  • 69.
    M R V Theseare only a subset of the available models for branch-rate variation • Global molecular clock • Local molecular clocks • Punctuated rate change model • Log-normally distributed autocorrelated rates • Uncorrelated/independent rates models • Dirchlet process prior Models of Lineage-specific Rate Variation
  • 70.
    M R V Areour models appropriate across all data sets? cave bear American black bear sloth bear Asian black bear brown bear polar bear American giant short-faced bear giant panda sun bear harbor seal spectacled bear 4.08 5.39 5.66 12.86 2.75 5.05 19.09 35.7 0.88 4.58 [3.11–5.27] [4.26–7.34] [9.77–16.58] [3.9–6.48] [0.66–1.17] [4.2–6.86] [2.1–3.57] [14.38–24.79] [3.51–5.89] 14.32 [9.77–16.58] 95% CI mean age (Ma) t2 t3 t4 t6 t7 t5 t8 t9 t10 tx node MP•MLu•MLp•Bayesian 100•100•100•1.00 100•100•100•1.00 85•93•93•1.00 76•94•97•1.00 99•97•94•1.00 100•100•100•1.00 100•100•100•1.00 100•100•100•1.00 t1 Eocene Oligocene Miocene Plio Plei Hol 34 5.3 1.823.8 0.01 Epochs Ma Global expansion of C4 biomass Major temperature drop and increasing seasonality Faunal turnover Krause et al., 2008. Mitochondrial genomes reveal an explosive radiation of extinct and extant bears near the Miocene-Pliocene boundary. BMC Evol. Biol. 8. Taxa 1 5 10 50 100 500 1000 5000 10000 20000 0100200300 MYA Ophidiiformes Percomorpha Beryciformes Lampriformes Zeiforms Polymixiiformes Percopsif. + Gadiif. Aulopiformes Myctophiformes Argentiniformes Stomiiformes Osmeriformes Galaxiiformes Salmoniformes Esociformes Characiformes Siluriformes Gymnotiformes Cypriniformes Gonorynchiformes Denticipidae Clupeomorpha Osteoglossomorpha Elopomorpha Holostei Chondrostei Polypteriformes Clade r ε ΔAIC 1. 0.041 0.0017 25.3 2. 0.081 * 25.5 3. 0.067 0.37 45.1 4. 0 * 3.1 Bg. 0.011 0.0011 OstariophysiAcanthomorpha Teleostei Santini et al., 2009. Did genome duplication drive the origin of teleosts? A comparative study of diversification in ray-finned fishes. BMC Evol. Biol. 9.
  • 71.
    M R V Theseare only a subset of the available models for branch-rate variation • Global molecular clock • Local molecular clocks • Punctuated rate change model • Log-normally distributed autocorrelated rates • Uncorrelated/independent rates models • Dirchlet process prior Model selection and model uncertainty are very important for Bayesian divergence time analysis Models of Lineage-specific Rate Variation
  • 72.
    B D TE Estimating divergence times relies on 2 main elements: • Branch-specific rates: f (R | θR) • Node ages: f (A | θA,C) http://bayesiancook.blogspot.com/2013/12/two-sides-of-same-coin.html
  • 73.
    P  NT Relaxed clock Bayesian analyses require a prior distribution on node times f(A | θA) Different node-age priors make different assumptions about the timing of divergence events Node Age Priors
  • 74.
    G N TP Assumed to be vague or uninformative by not making assumptions about biological processes Uniform prior: the time at a given node has equal probability across the interval between the time of the parent node and the time of the oldest daughter node (conditioned on root age) Node Age Priors
  • 75.
    S B P Node-agepriors based on stochastic models of lineage diversification Yule process: assumes a constant rate of speciation, across lineages A pure birth process—every node leaves extant descendants (no extinction) Node Age Priors
  • 76.
    S B P Node-agepriors based on stochastic models of lineage diversification Constant-rate birth-death process: at any point in time a lineage can speciate at rate λ or go extinct with a rate of μ Node Age Priors
  • 77.
    S B P Node-agepriors based on stochastic models of lineage diversification Constant-rate birth-death process: at any point in time a lineage can speciate at rate λ or go extinct with a rate of μ Node Age Priors
  • 78.
    S B P Node-agepriors based on stochastic models of lineage diversification Constant-rate birth-death process: at any point in time a lineage can speciate at rate λ or go extinct with a rate of μ Node Age Priors
  • 79.
    S B P Differentvalues of λ and μ lead to different trees Bayesian inference under these models can be very sensitive to the values of these parameters Using hyperpriors on λ and μ accounts for uncertainty in these hyperparameters Node Age Priors
  • 80.
    S B P Node-agepriors based on stochastic models of lineage diversification Birth-death-sampling process: an extension of the constant-rate birth-death model that accounts for random sampling of tips Conditions on a probability of sampling a tip, ρ Node Age Priors
  • 81.
    P  NT Sequence data are only informative on relative rates & times Node-time priors cannot give precise estimates of absolute node ages We need external information (like fossils) to calibrate or scale the tree to absolute time Node Age Priors
  • 82.
    C D T Fossils(or other data) are necessary to estimate absolute node ages There is no information in the sequence data for absolute time Uncertainty in the placement of fossils A B C 20% 10% 10% 10% 200 My 400 My
  • 83.
    C D Bayesian inferenceis well suited to accommodating uncertainty in the age of the calibration node Divergence times are calibrated by placing parametric densities on internal nodes offset by age estimates from the fossil record A B C 200 My Density Age
  • 84.
    A F C Misplaced fossils can affect node age estimates throughout the tree – if the fossil is older than its presumed MRCA Calibrating the Tree (figure from Benton & Donoghue Mol. Biol. Evol. 2007)
  • 85.
    A F C Crown clade: all living species and their most-recent common ancestor (MRCA) Calibrating the Tree (figure from Benton & Donoghue Mol. Biol. Evol. 2007)
  • 86.
    A F C Stem lineages: purely fossil forms that are closer to their descendant crown clade than any other crown clade Calibrating the Tree (figure from Benton & Donoghue Mol. Biol. Evol. 2007)
  • 87.
    A F C Fossiliferous horizons: the sources in the rock record for relevant fossils Calibrating the Tree (figure from Benton & Donoghue Mol. Biol. Evol. 2007)
  • 88.
    F C Age estimatesfrom fossils can provide minimum time constraints for internal nodes Reliable maximum bounds are typically unavailable Minimum age Time (My) Calibrating Divergence Times
  • 89.
    P D C N Common practice in Bayesian divergence-time estimation: Parametric distributions are typically off-set by the age of the oldest fossil assigned to a clade These prior densities do not (necessarily) require specification of maximum bounds Uniform (min, max) Exponential (λ) Gamma (α, β) Log Normal (µ, σ2 ) Time (My)Minimum age Calibrating Divergence Times
  • 90.
    P D C N Calibration densities describe the waiting time between the divergence event and the age of the oldest fossil Minimum age Exponential (λ) Time (My) Calibrating Divergence Times
  • 91.
    P D C N Common practice in Bayesian divergence-time estimation: Estimates of absolute node ages are driven primarily by the calibration density Specifying appropriate densities is a challenge for most molecular biologists Uniform (min, max) Exponential (λ) Gamma (α, β) Log Normal (µ, σ2 ) Time (My)Minimum age Calibration Density Approach
  • 92.
    I F C Wewould prefer to eliminate the need for ad hoc calibration prior densities Calibration densities do not account for diversification of fossils Domestic dog Spotted seal Giant panda Spectacled bear Sun bear Am. black bear Asian black bear Brown bear Polar bear Sloth bear Zaragocyon daamsi Ballusia elmensis Ursavus brevihinus Ailurarctos lufengensis Ursavus primaevus Agriarctos spp. Kretzoiarctos beatrix Indarctos vireti Indarctos arctoides Indarctos punjabiensis Giant short-faced bear Cave bear Fossil and Extant Bears (Krause et al. BMC Evol. Biol. 2008; Abella et al. PLoS ONE 2012)
  • 93.
    I F C Wewant to use all of the available fossils Example: Bears 12 fossils are reduced to 4 calibration ages with calibration density methods Domestic dog Spotted seal Giant panda Spectacled bear Sun bear Am. black bear Asian black bear Brown bear Polar bear Sloth bear Zaragocyon daamsi Ballusia elmensis Ursavus brevihinus Ailurarctos lufengensis Ursavus primaevus Agriarctos spp. Kretzoiarctos beatrix Indarctos vireti Indarctos arctoides Indarctos punjabiensis Giant short-faced bear Cave bear Fossil and Extant Bears (Krause et al. BMC Evol. Biol. 2008; Abella et al. PLoS ONE 2012)
  • 94.
    I F C Wewant to use all of the available fossils Example: Bears 12 fossils are reduced to 4 calibration ages with calibration density methods Domestic dog Spotted seal Giant panda Spectacled bear Sun bear Am. black bear Asian black bear Brown bear Polar bear Sloth bear Zaragocyon daamsi Ballusia elmensis Ursavus brevihinus Ailurarctos lufengensis Ursavus primaevus Agriarctos spp. Kretzoiarctos beatrix Indarctos vireti Indarctos arctoides Indarctos punjabiensis Giant short-faced bear Cave bear Fossil and Extant Bears (Krause et al. BMC Evol. Biol. 2008; Abella et al. PLoS ONE 2012)
  • 95.
    I F C Becausefossils are part of the diversification process, we can combine fossil calibration with birth-death models Domestic dog Spotted seal Giant panda Spectacled bear Sun bear Am. black bear Asian black bear Brown bear Polar bear Sloth bear Zaragocyon daamsi Ballusia elmensis Ursavus brevihinus Ailurarctos lufengensis Ursavus primaevus Agriarctos spp. Kretzoiarctos beatrix Indarctos vireti Indarctos arctoides Indarctos punjabiensis Giant short-faced bear Cave bear Fossil and Extant Bears (Krause et al. BMC Evol. Biol. 2008; Abella et al. PLoS ONE 2012)
  • 96.
    I F C Thisrelies on a branching model that accounts for speciation, extinction, and rates of fossilization, preservation, and recovery Domestic dog Spotted seal Giant panda Spectacled bear Sun bear Am. black bear Asian black bear Brown bear Polar bear Sloth bear Zaragocyon daamsi Ballusia elmensis Ursavus brevihinus Ailurarctos lufengensis Ursavus primaevus Agriarctos spp. Kretzoiarctos beatrix Indarctos vireti Indarctos arctoides Indarctos punjabiensis Giant short-faced bear Cave bear Fossil and Extant Bears (Krause et al. BMC Evol. Biol. 2008; Abella et al. PLoS ONE 2012)
  • 97.
    T F B-DP (FBD) Improving statistical inference of absolute node ages Eliminates the need to specify arbitrary calibration densities Better capture our statistical uncertainty in species divergence dates All reliable fossils associated with a clade are used Useful for calibration or ‘total-evidence’ dating 150 100 50 0 Time (Heath, Huelsenbeck, Stadler. 2014 PNAS)
  • 98.
    T F B-DP (FBD) Recovered fossil specimens provide historical observations of the diversification process that generated the tree of extant species 150 100 50 0 Time Diversification of Fossil & Extant Lineages (Heath, Huelsenbeck, Stadler. PNAS 2014)
  • 99.
    T F B-DP (FBD) The probability of the tree and fossil observations under a birth-death model with rate parameters: λ = speciation μ = extinction ψ = fossilization/recovery 150 100 50 0 Time Diversification of Fossil & Extant Lineages (Heath, Huelsenbeck, Stadler. PNAS 2014)
  • 100.
    T F B-DP (FBD) The probability of the tree and fossil observations under a birth-death model with rate parameters: λ = speciation μ = extinction ψ = fossilization/recovery Diversification of Fossil & Extant Lineages (Heath, Huelsenbeck, Stadler. PNAS 2014)
  • 101.
    T F B-DP (FBD) We use MCMC to sample realizations of the diversification process, integrating over the topology—including placement of the fossils—and speciation times 0250 50100150200 Time (My) Diversification of Fossil & Extant Lineages (Heath, Huelsenbeck, Stadler. PNAS 2014)
  • 102.
    I FBD T Extensionsof the fossilized birth-death process accommodate variation in fossil sampling, non-random species sampling, & shifts in diversification rates. 0 10 20 30 40 50 60 70 80 90 100 110 120 130 140 150 160 170 180 190 200 Lower Middle Upper Lower Upper Paleocene Eocene Oligocene Miocene Pliocene Pleistocen Jurassic Cretaceous Paleogene Neogene Q. With character data for both fossil & extant species, we account for uncertainty in fossil placement
  • 103.
    S B-D P Apiecewise shifting model where parameters change over time Used to estimate epidemiological parameters of an outbreak 0175 255075100125150 Days (see Stadler et al. PNAS 2013 and Stadler et al. PLoS Currents Outbreaks 2014)
  • 104.
    S B-D P lis the number of parameter intervals Ri is the effective reproductive number for interval i ∈ l δ is the rate of becoming non-infectious s is the probability of sampling an individual after becoming non-infectious
  • 105.
    S B-D P lis the number of parameter intervals λi is the transmission rate for interval i ∈ l μ is the viral lineage death rate ψ is the rate each individual is sampled
  • 106.
  • 107.
    S B-D P Adecline in R over the history of HIV-1 in the UK is consistent with the introduction of effective drug therapies After 1998 R decreased below 1, indicating a declining epidemic (Stadler et al. PNAS 2013)
  • 108.
    O Lecture & Demo:Bayesian Fundamentals • Bayes theorem, priors and posteriors, MCMC • Example: the birth-death process • RevBayes Demonstration break Lecture: Bayesian Inference of Species Divergence Times • Relaxed clock models – accounting for variation in substitution rates among lineages • Tree priors and fossil calibration break BEAST v2 Tutorial — dating species divergences under the fossilized birth-death process lunch Course materials: http://phyloworks.org/resources/ssbws.html
  • 109.
    D T ES Program Models/Method r8s Strict clock, local clocks, NPRS, PL ape (R) NPRS, PL multidivtime log-n autocorrelated (plus some others) PhyBayes OU, log-n autocorrelated (plus some others) PhyloBayes CIR, white noise (uncorrelated) (plus some others) BEAST Uncorrelated (log-n & exp), local clocks (plus others) TreeTime Dirichlet model, CPP, uncorrelated MrBayes 3.2 CPP, strict clock, autocorrelated, uncorrelated DPPDiv DPP, strict clock, uncorrelated RevBayes “the limit is the sky”∗ ∗ and other methods
  • 110.
    E: C YO A Dating Bear Divergence Times with the Fossilized Birth-Death Process Agriarctos spp X Ailurarctos lufengensis X Ailuropoda melanoleuca Arctodus simus X Ballusia elmensis X Helarctos malayanus Indarctos arctoides X Indarctos punjabiensis X Indarctos vireti X Kretzoiarctos beatrix X Melursus ursinus Parictis montanus X Tremarctos ornatus Ursavus brevihinus X Ursavus primaevus X Ursus abstrusus X Ursus americanus Ursus arctos Ursus maritimus Ursus spelaeus X Ursus thibetanus Zaragocyon daamsi X Stem Bears CrownBears Pandas Tremarctinae Brown Bears Ursinae Origin Root (Total Group) Crown 0 10 20 30 40 50 Ypresian Lutetian Bartonian Priabonian Rupelian Chattian Aquitanian Burdigalian Langhian Serravallian Tortonian Messinian Zanclean Piacenzian Gelasian Calabrian MiddleUpper Eocene Oligocene Miocene Pliocene Pleistocene Holocene Paleogene Neogene Quat. Ailuropoda melanoleuca Tremarctos ornatus Melursus ursinus Ursus arctos Ursus maritimus Helarctos malayanus Ursus americanus Ursus thibetanus Parictis montanus X Zaragocyon daamsi X Ballusia elmensis X Ursavus primaevus X Ursavus brevihinus X Indarctos vireti X Indarctos arctoides X Indarctos punjabiensis X Ailurarctos lufengensis X Agriarctos spp X Kretzoiarctos beatrix X Ursus abstrusus X Ursus spelaeus X Arctodus simus X q q q q q q q q q q q q q q q q q q q q q Estimating Epidemiological Parameters of an Ebola Outbreak 0175 255075100125150 Days Course materials: http://phyloworks.org/resources/ssbws.html