TIME-SCALING TREES IN THE FOSSIL RECORD
David Bapst
University of Chicago
Geophysical Sciences
(Pretty figure taken from Ted Garland’s website)
Phylogeny in the Fossil Record
[Figure: phylogeny of taxa A to L; axis: time]
Sampling in the Fossil Record
[Figure: phylogeny of taxa A to L with sampling events marked; axis: time]
What We Have
[Figure: cladogram of the sampled taxa (A, D, E, G, H, I, K) and their ranges; axis: time]
What We Want
[Figure: cladogram and ranges of the sampled taxa (A, D, E, G, H, I, K), alongside the original phylogeny; axis: time]
Of Time and Trees: ‘Basic’ Method
• Clades are as old as their earliest observed descendant
[Figure: original phylogeny vs. tree time-scaled with the basic method, taxa A, D, E, G, H, I, K; axis: time]
Smith, 1994
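The rule on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code behind the talk: the tree `((A,B),C)`, the node names and the first-appearance dates are all hypothetical, and ages are measured as time before present (older = larger).

```python
def basic_node_ages(children, first_appearance, root):
    """Date every node by its earliest observed descendant.

    children: {internal node: [child, ...]} (hypothetical tree encoding)
    first_appearance: {tip: age before present of its first appearance}
    """
    ages = {}

    def visit(node):
        if node not in children:
            # A tip is dated by its own first appearance.
            ages[node] = first_appearance[node]
        else:
            # A clade is as old as its oldest (earliest) descendant.
            ages[node] = max(visit(child) for child in children[node])
        return ages[node]

    visit(root)
    return ages


# Hypothetical example: tree ((A,B),C) with made-up first appearances.
children = {"root": ["n1", "C"], "n1": ["A", "B"]}
first_appearance = {"A": 10.0, "B": 12.0, "C": 12.0}
ages = basic_node_ages(children, first_appearance, "root")
branch_root_to_n1 = ages["root"] - ages["n1"]
```

In this made-up example the root and `n1` get the same age, so the branch between them has length zero.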
Of Time and Trees: ‘Basic’ Method
• Creates zero-length branches (a nuisance)
• Common fix: Extend branches a small amount
• No measurement of the uncertainty involved
[Figure: original phylogeny vs. basic-method tree, highlighting nodes separated by zero-length branches; axis: time]
Dealing with Uncertainty: Stochastic Analyses
• Example: Dealing with Discrete Intervals
– Randomly drop FADs and LADs (first and last appearance dates) within their intervals
– Repeat!
[Figure: ranges of taxa A, D, E, G, H, I, K across time intervals t.1 to t.6, with several trees time-scaled from different random draws of the dates]
Lloyd et al., 2012
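A minimal sketch of one such random draw, assuming dates are uniform within each interval (the slides do not say which distribution is used). The interval bounds, the taxon-to-interval assignments and the function name are hypothetical; ages are time before present.

```python
import random


def sample_appearance_dates(taxon_bins, bin_bounds, rng):
    """Draw one (FAD, LAD) pair per taxon from its discrete intervals.

    taxon_bins: {taxon: (first interval, last interval)}
    bin_bounds: {interval: (older bound, younger bound)}
    Assumption: dates are uniformly distributed within an interval.
    """
    dates = {}
    for taxon, (first_bin, last_bin) in taxon_bins.items():
        older, younger = bin_bounds[first_bin]
        fad = rng.uniform(younger, older)
        if first_bin == last_bin:
            # Same interval: the last appearance cannot predate the first.
            lad = rng.uniform(younger, fad)
        else:
            lad_older, lad_younger = bin_bounds[last_bin]
            lad = rng.uniform(lad_younger, lad_older)
        dates[taxon] = (fad, lad)
    return dates


# Hypothetical intervals and taxon ranges.
bin_bounds = {"t.1": (60.0, 50.0), "t.2": (50.0, 40.0), "t.3": (40.0, 30.0)}
taxon_bins = {"A": ("t.1", "t.3"), "K": ("t.2", "t.2")}
rng = random.Random(1)
samples = [sample_appearance_dates(taxon_bins, bin_bounds, rng) for _ in range(100)]
```

Each element of `samples` would then be time-scaled separately, giving a distribution of trees rather than one tree.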
Stochastic Time-Scaling
• Randomly select new node ages across a cladogram
– Set lower bounds by starting at the root (which gets a loose lower bound) and working up, node by node
• Rinse, repeat many times to produce a large sample of trees
[Figure: a node of uncertain age ("?") on a cladogram; axis: time]
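The root-to-tip procedure might look like the following sketch. It assumes uniform draws between the bounds, which is the simplest choice and not necessarily the one used in the talk; the tree, the dates and `root_max_age` (the loose bound on the root) are hypothetical. Ages are time before present.

```python
import random


def stochastic_node_ages(children, first_appearance, root, root_max_age, rng):
    """Draw one set of node ages, working from the root towards the tips."""
    youngest_allowed = {}

    def oldest_descendant(node):
        # A node cannot be younger than its earliest observed descendant.
        if node not in children:
            youngest_allowed[node] = first_appearance[node]
        else:
            youngest_allowed[node] = max(oldest_descendant(c) for c in children[node])
        return youngest_allowed[node]

    oldest_descendant(root)

    ages = {}

    def assign(node, oldest_allowed):
        if node not in children:
            return
        # Assumption: uniform draw between the two bounds.
        ages[node] = rng.uniform(youngest_allowed[node], oldest_allowed)
        for child in children[node]:
            # A child node cannot be older than its parent.
            assign(child, ages[node])

    assign(root, root_max_age)
    return ages


# Hypothetical tree ((A,B),C) and dates.
children = {"root": ["n1", "C"], "n1": ["A", "B"]}
first_appearance = {"A": 10.0, "B": 12.0, "C": 11.0}
rng = random.Random(42)
tree_sample = [
    stochastic_node_ages(children, first_appearance, "root", 20.0, rng)
    for _ in range(25)
]
```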
Stochastic Time-Scaling: Extensions
• Stochastically infer ancestor-descendant relationships by allowing node ages to occur after the earliest taxon appears
• Stochastically resolve soft polytomies by iteratively placing lineages over multiple steps
[Figure: a soft polytomy of taxa A, B and C resolved step by step; axis: time]
Stochastic Time-Scaling
• Expect more uncertainty in poorly-sampled fossil records
• Weight selection of node ages via a probability model of the unobserved evolutionary history at a node
[Figure: candidate node ages weighted by Pr(Σ gaps); axis: time]
A Probabilistic Model of Gaps
• Total minimum unobserved evolutionary history is dependent on sampling rates
– Sampling rates can be obtained via methods such as the freqRat
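For readers unfamiliar with the freqRat, here is a small sketch of the ratio as it is usually defined: f2² / (f1 × f3), where f1, f2 and f3 are the numbers of taxa whose observed ranges span one, two and three intervals. It estimates a per-interval sampling probability. The function name and the range counts below are made up for illustration.

```python
from collections import Counter


def freq_rat(range_durations):
    """FreqRat estimate of per-interval sampling probability.

    range_durations: observed range of each taxon, in whole intervals.
    """
    counts = Counter(range_durations)
    f1, f2, f3 = counts[1], counts[2], counts[3]
    if f1 == 0 or f3 == 0:
        raise ValueError("need taxa with ranges of one and three intervals")
    return f2 ** 2 / (f1 * f3)


# Hypothetical data: 100 single-interval taxa, 30 two-interval, 15 three-interval.
durations = [1] * 100 + [2] * 30 + [3] * 15 + [4] * 5
estimate = freq_rat(durations)
```

Longer ranges (here the four-interval taxa) do not enter the ratio.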
A Probabilistic Model of Gaps
• But also dependent on diversification: branching and extinction rates
– Unsampled ‘twigs’ matter!
• Node ages need to be calibrated with three rates
– Cal3 time-scaling method
– Probability of unobserved twigs derived with Matt Pennell and Emily King
Foote et al., 1999; Friedman and Brazeau, 2011
But how good is the time-scaling? Let’s do some simulations to find out!
So, How Good is the Time-Scaling?
[Figure: median of median error in per-node ages, by method: Basic Rand-Res; Cal3 w/ Ancestors; Cal3 w/o Ancestors; Cal3 w/o Ancestors Rand-Res]
100 simulation runs; samples of 20 trees; ~50 taxa; budding; p = q = r = 0.1 per Ltu; discrete intervals = 5 tu; rates est. via ML
Using R library paleotree (Bapst, 2012; MEE)
Squared-Error: Cal3 has More Error
[Figure: median of median squared error in per-node ages, by method: Basic Rand-Res; Cal3 w/ Ancestors; Cal3 w/o Ancestors; Cal3 w/o Ancestors Rand-Res]
100 simulation runs; samples of 20 trees; ~50 taxa; budding; p = q = r = 0.1 per Ltu; discrete intervals = 5 tu; rates est. via ML
Using R library paleotree (Bapst, 2012)
Better Estimator of Uncertainty?
[Figure: proportion of true node ages within 95% age quantiles, by method: Basic Rand-Res; Cal3 w/ Ancestors; Cal3 w/o Ancestors; Cal3 w/o Ancestors Rand-Res]
100 simulation runs; samples of 20 trees; ~50 taxa; budding; p = q = r = 0.1 per Ltu; discrete intervals = 5 tu; rates est. via ML
Using R library paleotree (Bapst, 2012)
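One way to compute such a coverage statistic is sketched below. It assumes the "95% age quantiles" are the 2.5% and 97.5% quantiles of each node's age across the sample of trees, which is my reading of the axis label rather than something the slide states. The node names and ages are hypothetical.

```python
def quantile(sorted_values, q):
    """Quantile by linear interpolation between order statistics."""
    position = q * (len(sorted_values) - 1)
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] + fraction * (sorted_values[upper] - sorted_values[lower])


def coverage(true_ages, sampled_ages):
    """Proportion of nodes whose true age lies inside the central 95% range."""
    hits = 0
    for node, truth in true_ages.items():
        draws = sorted(sampled_ages[node])
        if quantile(draws, 0.025) <= truth <= quantile(draws, 0.975):
            hits += 1
    return hits / len(true_ages)


# Hypothetical sample of 20 time-scaled trees with two dated nodes.
sampled_ages = {
    "n1": [float(age) for age in range(10, 30)],
    "n2": [float(age) for age in range(10, 30)],
}
true_ages = {"n1": 15.0, "n2": 40.0}
proportion_covered = coverage(true_ages, sampled_ages)
```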
Similar Patterns with Terminal Branch Lengths
[Figure, two panels: proportion within 95% age quantiles, and median of median squared error, for terminal branch lengths, by method: Basic Rand-Res; Cal3 w/ Ancestors; Cal3 w/o Ancestors; Cal3 w/o Ancestors Rand-Res]
100 simulation runs; samples of 20 trees; ~50 taxa; budding; p = q = r = 0.1 per Ltu; discrete intervals = 5 tu; rates est. via ML
Using R library paleotree (Bapst, 2012)
But this isn’t what we’re interested in most often…
Analyses of Trait Evolution
[Figure: median estimated rate of trait change (log-scale axis), by method: Basic Rand-Res; Cal3 w/ Ancestors; Cal3 w/o Ancestors; Cal3 w/o Ancestors Rand-Res; True Phylogeny]
100 simulation runs; samples of 20 trees; ~50 taxa; budding; p = q = r = 0.1 per Ltu; discrete intervals = 5 tu; rates est. via ML
Analyses of Trait Evolution
[Figure: AICc weight of BM vs OU (for trait simulated under BM), by method: Basic Rand-Res; Cal3 w/ Ancestors; Cal3 w/o Ancestors; Cal3 w/o Ancestors Rand-Res; True Phylogeny]
100 simulation runs; samples of 20 trees; ~50 taxa; budding; p = q = r = 0.1 per Ltu; discrete intervals = 5 tu; rates est. via ML
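For reference, AICc weights follow from the standard formulas AICc = −2 ln L + 2k + 2k(k + 1)/(n − k − 1) and w_i ∝ exp(−Δ_i / 2). The sketch below uses invented log-likelihoods, and the parameter counts (2 for BM, 3 for OU) are an assumption, since they depend on how each model is parameterised.

```python
import math


def aicc(log_likelihood, n_params, n_obs):
    """Small-sample corrected AIC."""
    correction = (2.0 * n_params * (n_params + 1)) / (n_obs - n_params - 1)
    return -2.0 * log_likelihood + 2.0 * n_params + correction


def akaike_weights(scores):
    """Convert {model: AICc} into weights that sum to one."""
    best = min(scores.values())
    relative = {model: math.exp(-0.5 * (score - best)) for model, score in scores.items()}
    total = sum(relative.values())
    return {model: value / total for model, value in relative.items()}


# Hypothetical fits to a 50-taxon tree; log-likelihoods are made up.
scores = {
    "BM": aicc(log_likelihood=-60.0, n_params=2, n_obs=50),
    "OU": aicc(log_likelihood=-59.5, n_params=3, n_obs=50),
}
weights = akaike_weights(scores)
```

Here OU fits slightly better in raw likelihood, but the extra parameter costs more than it gains, so BM receives the larger weight.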
Conclusions
• New time-scaling method: cal3 (paleotree v1.5)
– Time-scaling calibrated with estimated rates of sampling, branching and extinction
• Fidelity of time-scaling can be decoupled from fidelity of comparative analyses
– Why? A certain je ne sais quoi of the time-scaling?
• Frequency of int. ZLBs? Balance of total BrLen distribution?
– If analytical performance cannot be easily extrapolated or predicted, simulations are key
Thanks to M. Foote, E. King, J. Felsenstein, A. Haber, M. Pennell, G. Hunt, G. Lloyd, M. Friedman, P. Wagner, M. Webster, D. Jablonski, M. LaBarbera, K. Boyce, J. Mitchell, P. Harnik, G. Slater and the R-Sig-Phylo Email List!
Slightly Better Polytomy Resolver
[Figure: proportion of collapsed clades correctly resolved, by method: Rand-Res; Cal3 w/ Ancestors; Cal3 w/o Ancestors; timeLadderTree]
100 simulation runs; samples of 20 trees; ~50 taxa; budding; p = q = r = 0.1 per Ltu; interval length = 5 tu; rates est. via ML
Fidelity of VCV Matrices: All High
[Figure: median random skewer similarity of VCV matrices to the true VCV matrix, by method: Cal3 w/ Ancestors; Cal3 w/o Ancestors; Cal3 w/o Ancestors Rand-Res; Basic Rand-Res]
100 simulation runs; samples of 20 trees; ~50 taxa; budding; p = q = r = 0.1 per Ltu; interval length = 5 tu; rates est. via ML
Sampling Rate Conditioned (SRC) gives the best estimates across sampling rates
100 trees each (not SRC), ~50 taxa
Units: Lmy^-1
Model Fitting At Other Parameters
[Figure: AICc weight of BM vs OU (for trait simulated under BM), by method: Basic Rand-Res; Cal3 w/ Ancestors; Cal3 w/o Ancestors; Cal3 w/o Ancestors Rand-Res; True Phylogeny. Panel label: p = q = 0.1, r = 0.5 per Ltu]
100 simulation runs; samples of 20 trees; ~50 taxa; budding; p = q = r = 0.1 per Ltu; interval length = 5 tu; rates est. via ML
Model Fitting At Other Parameters
[Figure: AICc weight of BM vs OU (for trait simulated under BM), by method: Basic Rand-Res; Cal3 w/ Ancestors; Cal3 w/o Ancestors; Cal3 w/o Ancestors Rand-Res; True Phylogeny. Panel label: Bifurcating Cladogenesis]
100 simulation runs; samples of 20 trees; ~50 taxa; budding; p = q = r = 0.1 per Ltu; interval length = 5 tu; rates est. via ML
A Model of Unobserved History
• Best fit with gamma models
• Importance of Diversification: Twigs Matter!
• Calibrate with Three Rates: ‘Cal3’ time-scaling
– Sampling, branching and extinction rates
(Derived with Matt Pennell and Emily King)
Using R library paleotree (Bapst, 2012)
Of Time and Trees
• Usually start with data like this…
– (Thanks to Melanie Hopkins for this example data!)
[Figure: taxon ranges and unscaled topology; Hopkins, 2011 (strict consensus tree)]
Of Time and Trees
• …but we want a time-scaled tree for analyses
• How do we time-scale? What effect does the choice have on analyses?
Note: Time = stratigraphic meters for Hopkins (2011)
• One solution uses a morphological ‘clock’-like approach
• But what about trees with no character change information?
Previous Approaches
• Unit-length branches
• “speciational” scale
• No actual time-scale!
– Is setting all branches equal a good approximation?
• Soft polytomies have to be randomly resolved beforehand
– (True of most methods)
[Figure: scaled consensus tree from Hopkins, polytomies randomly resolved; axis: number of observed branching events]
Previous Approaches
• Basic Approach
• Clade age = age of earliest observed descendant
• Creates many zero-length branches (ZLB)
– Unrealistic
– Singular variance-covariance matrices: the math for BM, etc. doesn’t work
[Figure: time-scaled consensus tree from Hopkins, polytomies randomly resolved; axis: time]
Previous Approaches
• Adding X to all branches (ABA) or just very short branches
– X = a number
• Avoids singularity
• Similar: MinBrLength
• Widely used, but how to pick X?
• Can push root back unrealistically far
[Figure: time-scaled consensus tree from Hopkins, polytomies randomly resolved; axis: time]
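A deliberately naive sketch of these two fixes, operating on edge lengths only. The tree, the edge lengths and the value of X are hypothetical, and a real implementation would presumably also have to shift node ages so that tip dates stay where the fossil record puts them; that step is not attempted here.

```python
def add_to_all_branches(edge_lengths, x):
    """ABA-style fix: lengthen every branch by the same amount x."""
    return {edge: length + x for edge, length in edge_lengths.items()}


def enforce_minimum_length(edge_lengths, min_length):
    """Minimum-branch-length-style fix: only stretch branches shorter than min_length."""
    return {edge: max(length, min_length) for edge, length in edge_lengths.items()}


def path_length(edge_lengths, path):
    """Total length along a list of nodes from the root to a tip."""
    return sum(edge_lengths[edge] for edge in zip(path, path[1:]))


# Hypothetical basic-method tree ((A,B),C) with three zero-length branches.
edges = {
    ("root", "n1"): 0.0,
    ("n1", "A"): 2.0,
    ("n1", "B"): 0.0,
    ("root", "C"): 0.0,
}
aba = add_to_all_branches(edges, 1.0)
mbl = enforce_minimum_length(edges, 1.0)
root_to_a = ["root", "n1", "A"]
depth_before = path_length(edges, root_to_a)
depth_aba = path_length(aba, root_to_a)
depth_mbl = path_length(mbl, root_to_a)
```

The root-to-tip distance grows with every branch on the path, which is how the root gets pushed back in deeply nested trees.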
Previous Approaches
• ‘Equal’ Method
– Graeme Lloyd
• Pull root down by X, redistribute time on earlier branches along ZLBs
– X = a number
• Widely used, but how to choose X?
[Figure: time-scaled consensus tree from Hopkins, polytomies randomly resolved; axis: time]
New Method: Sampling Rate Conditioned
• Estimate sampling rate r
• Randomly pick ‘gaps’ in evolutionary history to scale branches, using L(gaps|r) as weights
• Repeat many times, make many trees!
– No single answer!
• Resolve polytomies, identify ancestors
– Stratolikelihood-esque
[Figure: 25 time-scaled trees from the Hopkins consensus tree, polytomies NOT randomly resolved; axis: time]
But how do these methods perform? Let’s run some birth-death-sampling simulations and find out! (…for some stuff)
Quick Guide to Beanplots!
[Figure: annotated beanplot showing the observed data, the estimated PDF (KDE) and the true simulated value (our target); labels: sampling rates, methods, value]
Using R library beanplot
Sampling Rate Conditioned (SRC) gives the best estimates across sampling rates
100 trees each (not SRC), ~50 taxa
Units: Lmy^-1
Signal estimate bad across the board; bias for zero
100 trees each (not SRC), ~50 taxa
Units: Lmy^-1
• Implications for trait evolution model-fitting in the fossil record?
• Bias for low-signal models like OU?
High correlation generally; SRC performs worst
100 trees each (not SRC), ~50 taxa; only fully extinct clades; 1 MY time bins
Units: Lmy^-1
• Similar results for FirstDiffs
• Correlation increases with the time step size used
• Observed ranges do well
– True under a different sampling model?
– Clades with living descendants?
• Lane et al. 2005
Sampling Rate Conditioned (SRC) better than randomly resolving
100 trees each, ~50 taxa; ~50% of nodes removed
Units: Lmy^-1
Results and Future Work
• Results
– New method: Sampling Rate Conditioned
– Good for BM rate, not so good for diversity curve
– No method unbiased for estimating phylogenetic signal
• Future Work
– Compare fidelity with poorly resolved trees
– Test possible bias in trait model-fitting analyses
• Simulations necessary to understand the reliability of methods in paleobiology
• Code to be released soon in R library
Thanks for Code: G. Lloyd, G. Hunt
Thanks for Data: Melanie Hopkins
Thanks for Comments and Ideas: M. Foote, M. Webster, D. Jablonski, E. King, P. Wagner, J. Mitchell, M. Friedman, G. Slater, M. Pennell, L. Harmon and the R-Sig-Phylo Email List!
– Effect of random zombie lineages on div corr
– Time-scale with joint L(gaps,morph)
– Integrate with birth-death models of branch length distribution for paleo trees
• Move up tree, node by node
– Calculate likelihoods for each possible position of a node
– Randomly sample a position, using likelihoods as weights (pick one by weighted random sampling)
• Repeat to produce large sample of time-scaled trees
L(obs gap of length t) = r * exp(-r * t)
r = instantaneous sampling rate
[Figure: candidate positions of the node uniting taxa A and B; axis: time]
(Bapst, in prep. C)
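The weighted draw for a single node can be sketched as follows. The gap likelihood is the one given on the slide; multiplying the likelihoods of the gaps leading to each descendant, the grid of candidate ages and all numeric values are assumptions made for illustration. Ages are time before present.

```python
import math
import random


def gap_likelihood(t, r):
    """L(observed gap of length t) = r * exp(-r * t), as on the slide."""
    return r * math.exp(-r * t)


def sample_node_age(candidate_ages, descendant_fads, r, rng):
    """Pick one candidate node age, weighting by the likelihood of the implied gaps."""
    weights = []
    for age in candidate_ages:
        weight = 1.0
        for fad in descendant_fads:
            # Gap = unsampled time between the node and a descendant's first appearance.
            # Assumption: gaps on different descendant lineages are independent.
            weight *= gap_likelihood(age - fad, r)
        weights.append(weight)
    chosen = rng.choices(candidate_ages, weights=weights, k=1)[0]
    return chosen, weights


# Hypothetical node uniting taxa A (first seen at 10) and B (first seen at 12).
candidate_ages = [12.0, 13.0, 14.0, 15.0, 16.0]
rng = random.Random(7)
chosen_age, weights = sample_node_age(candidate_ages, [10.0, 12.0], r=0.5, rng=rng)
```

Older candidate ages imply longer unsampled gaps and so receive smaller weights; a higher sampling rate r makes that decline steeper.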
Time-scaling Difficulties
• In an extinct clade of fossil taxa...
– Temporal placement of nodes constrained only by appearance of descendant taxa
– Ancestors are potentially among our sampled taxa (particularly in well-sampled clades)
[Figure: alternative time-scalings of taxa A, B and C, with panels labelled ‘Budding’ and Anagenesis; axis: time]
Zipper Method: A Stochastic Solution
• Produces stochastic samples of time-scaled trees
• In each run, samples many hypotheses of branch lengths and (also) ancestor-descendant relationships, weighted by sampling probabilities
• Cannot reconstruct multi-budding scenario
• Requires integrated phylogenetic inference method
• As many truly interesting things do
Problems Created by a Really Big Supertree of Dead Plankton
• Dealing with topological uncertainty
– Need to resolve soft polytomies for time-scaling
– Randomly resolving can produce poor overall fit to observed sequence of appearances
[Figure: alternative resolutions of taxa A, B, C and D; axis: time]
• Developed alternative method based on stochastic sampling (Bapst, in prep. B)
– Uses sampling rates in the fossil record, estimated from range data (Foote, 1997)
– Reconstructs more nodes correctly than randomly resolving nodes in simulated trees
• Evolutionary analyses must be repeated over large samples of potential topologies
• Additional uncertainties in time-scaling phylogenies of fossil taxa

SVP 2012 Talk: Time-Scaling Trees in the Fossil Record
