@kcranstn
http://slideshare.net/kcranstn
Enabling science
with the tree of life
Karen Cranston
National Evolutionary
Synthesis Center (NESCent)
The tree of life provides
a means for organizing
and explaining
biodiversity data
Wiegmann et al. PNAS, 2011
What do we want from a Tree of Life?
❖ complete = contains all of biodiversity
❖ dynamic = continuously updated with new data
❖ available digitally = browse, query, download
Image: http://evolution.berkeley.edu
❖ Create a complete tree of life by synthesizing published phylogenetic data
❖ Provide tools for managing, synthesizing & sharing phylogenetic data
http://opentreeoflife.org
Synthetic science
❖ Novel methods & analysis tools
❖ Big data from existing data
Biodiversity Synthesis Center /
Encyclopedia of Life
National Evolutionary Synthesis Center
Challenges
❖ Incongruence: How do we detect and use conflict between trees?
❖ Availability: What data do we have to construct a tree of life?
❖ Synthesis: How do we combine data across the tree of life?
What can we learn from conflict
between trees?
Gene tree uncertainty
[Figure: a single gene alignment (eight aligned sequences, e.g. aactgtcgcatgttgacg...) is the input to phylogenetic inference, which returns many likely trees]
Bayesian phylogenetic inference
Input: sequence data + evolutionary model
Output: list of sampled phylogenies
[Figure: bar chart. x-axis: sampled trees (1 to 49); y-axis: probability (0 to 0.18)]
Number of times sampled ∝ probability
Is there a stable backbone among the trees?
What taxa have unstable placement?
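Because the number of times a tree is sampled is proportional to its probability, the posterior probability of each topology can be estimated by counting. The sketch below is a minimal illustration, assuming each sampled tree has already been reduced to a canonical topology string; the function name and the sample counts are hypothetical.

```python
from collections import Counter

def topology_probabilities(sampled_topologies):
    """Estimate each topology's posterior probability as its share of the sample."""
    counts = Counter(sampled_topologies)
    total = sum(counts.values())
    # most_common() orders topologies from most to least frequently sampled
    return [(topology, n / total) for topology, n in counts.most_common()]

# Hypothetical sample of 20 trees; each string stands for one sampled topology
sample = (
    ["((1,2),(3,(4,5)))"] * 8
    + ["(1,((2,3),(4,5)))"] * 5
    + ["(1,(3,(2,(4,5))))"] * 4
    + ["((1,2),(4,(3,5)))"] * 3
)

for topology, probability in topology_probabilities(sample):
    print(f"{probability:.2f}  {topology}")
```

In practice the topology strings would come from the output of a Bayesian inference program, and two trees only count as the same topology if they are written in the same canonical form.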
Summarize with agreement subtrees
[Figure: two panels. Each shows four sampled trees on taxa 1 to 5 with probabilities 0.40, 0.25, 0.20 and 0.15, and below them the same trees with taxon 2 pruned (taxa 1, 3, 4, 5). In the first panel the pruned trees all agree (Pr=1.00); in the second, one pruned tree differs (Pr=0.85).]
Cranston, K.A. and B.H. Rannala. Summarizing a posterior distribution of phylogenies using
agreement subtrees. Systematic Biology 2007: 56(4), pp. 578-590.
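The idea behind the Pr=0.85 panel can be illustrated in a few lines: prune an unstable taxon from every sampled tree, then add up the probabilities of the trees that become identical. This is a simplified sketch, not the published algorithm. It assumes rooted trees written as nested tuples, the taxon to prune is chosen by hand, and the four example trees are hypothetical (only the probabilities come from the slide).

```python
from collections import defaultdict

def prune(tree, taxon):
    """Remove one leaf from a nested-tuple tree, collapsing nodes left with one child."""
    if not isinstance(tree, tuple):
        return None if tree == taxon else tree
    kept = [c for c in (prune(child, taxon) for child in tree) if c is not None]
    if not kept:
        return None
    return kept[0] if len(kept) == 1 else tuple(kept)

def canonical(tree):
    """Order children consistently so that equal rooted topologies compare equal."""
    if not isinstance(tree, tuple):
        return tree
    return tuple(sorted((canonical(c) for c in tree), key=repr))

def pruned_probabilities(weighted_trees, taxon):
    """Sum posterior probability over trees that agree once `taxon` is removed."""
    totals = defaultdict(float)
    for tree, probability in weighted_trees:
        totals[canonical(prune(tree, taxon))] += probability
    return dict(totals)

# Hypothetical posterior sample: taxon 2 moves around in the first three trees
posterior = [
    (((1, 2), (3, (4, 5))), 0.40),
    ((1, ((2, 3), (4, 5))), 0.25),
    ((1, (3, (2, (4, 5)))), 0.20),
    (((1, 2), (4, (3, 5))), 0.15),
]

for subtree, probability in pruned_probabilities(posterior, taxon=2).items():
    print(f"{probability:.2f}  {subtree}")
```

With taxon 2 removed, the first three trees share one subtree on taxa 1, 3, 4 and 5, so that subtree carries their combined probability, while the fourth tree remains in conflict.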
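[Diagram: two routes from multiple sequence alignments to a species tree. Route 1: concatenate the alignments into a supermatrix. Route 2: infer gene trees, then combine them using supertrees, gene duplication or coalescent methods.]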
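The concatenation route can be sketched as follows. This is a minimal illustration, assuming each gene alignment is a dictionary from taxon name to aligned sequence, and that a taxon missing from a gene is padded with `?`; the function name and the example sequences are hypothetical.

```python
def concatenate(alignments):
    """Join per-gene alignments into one supermatrix, padding absent taxa with '?'."""
    taxa = sorted({taxon for alignment in alignments for taxon in alignment})
    supermatrix = {taxon: "" for taxon in taxa}
    for alignment in alignments:
        # Every row of a single alignment has the same length
        length = len(next(iter(alignment.values())))
        for taxon in taxa:
            supermatrix[taxon] += alignment.get(taxon, "?" * length)
    return supermatrix

# Hypothetical alignments: sp2 has no sequence for the second gene
gene_a = {"sp1": "aactg", "sp2": "aattg", "sp3": "aac-g"}
gene_b = {"sp1": "tcgc", "sp3": "tcga"}

for taxon, sequence in concatenate([gene_a, gene_b]).items():
    print(taxon, sequence)
```

The supermatrix is then analyzed as if it were a single long alignment, which is why conflict between individual gene trees is hidden in a concatenated analysis.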
Phylogenomics of rice (Oryza)
820,000 BAC-end sequences for 9 diploid Oryza species
1720 gene fragments
2.4 million nucleotides
Cranston, K.A., B. Hurwitz, M.J. Sanderson, D. Ware, R.A. Wing, L. Stein. Phylogenomic analysis from deep BAC-end sequence libraries of rice. Systematic Botany, 35:3, 2010
What are the biological causes of gene tree incongruence in rice?
Do we need full genomes to answer these questions?
Phylogenomics of rice (Oryza)
Cranston, K.A., B. Hurwitz, M.J. Sanderson, D. Ware, R.A. Wing, L. Stein. Phylogenomic
analysis from deep BAC-end sequence libraries of rice. Systematic Botany, 35:3, 2010
Concatenated analysis
Gene trees in Oryza
❖ Gene tree methods: recover every possible topology
❖ Species tree methods: many clades not statistically significant
Cranston, K.A., B. Hurwitz, D. Ware, L. Stein, R.A. Wing. Species trees from highly incongruent gene trees in rice. Systematic Biology. 2009: doi: 10.1093/sysbio/syp054
Supermatrix topology
❖ Results suggest incomplete lineage sorting and hybridization / introgression in the evolutionary history of rice
What data do we have for creating a
complete tree of life?
Gene tree signal in GenBank
How many trees can we build using all of
the data in GenBank and how are those
trees distributed across the tree of life?
All-vs-all BLAST at each NCBI taxonomy node
Sanderson, M.J., D.T. Boss, D. Chen, K.A. Cranston, and A. Wehe. The PhyLoTA Browser:
Processing GenBank for molecular phylogenetics research. Systematic Biology 2008: 57(3).
[Figure: NCBI taxonomy subtree for Arachis, with Arachis hypogaea (including subsp. fastigiata and subsp. hypogaea) and Arachis glabrata; sequence clusters are built for each subtree]
All possible clusters, alignments and trees
[Figure: example sequence alignment]
❖ ~90,000 clusters, alignments, trees available for download
❖ data availability matrix at each NCBI node
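A data availability matrix records, for each taxon under a node, which sequence clusters contain at least one of its sequences. The sketch below shows one way to build such a matrix. It is an assumption about the general shape of the data, not PhyLoTA's own code, and the cluster membership is invented for illustration.

```python
def availability_matrix(clusters):
    """Return (taxa, cluster_ids, rows); rows[i][j] is 1 if taxon i is in cluster j."""
    cluster_ids = sorted(clusters)
    taxa = sorted({taxon for members in clusters.values() for taxon in members})
    rows = [
        [1 if taxon in clusters[cluster_id] else 0 for cluster_id in cluster_ids]
        for taxon in taxa
    ]
    return taxa, cluster_ids, rows

# Hypothetical cluster membership (not real PhyLoTA output)
clusters = {
    "cluster_0": {"Arachis hypogaea", "Arachis glabrata"},
    "cluster_1": {"Arachis hypogaea"},
}

taxa, cluster_ids, rows = availability_matrix(clusters)
for taxon, row in zip(taxa, rows):
    print(f"{taxon:20s} {row}")
```

Rows with many zeros point to taxa with little sequence data, and columns with many ones point to clusters that are good candidates for building trees.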
❖ complete = contains all of biodiversity
❖ dynamic = continuously updated with new data
❖ available digitally = browse, query, download
http://opentreeoflife.org
Gordon Burleigh
Keith Crandall
Karl Gude
David Hibbett
Mark Holder
Laura Katz
Rick Ree
Stephen Smith
Doug Soltis
Tiffani Williams
Computer science
Systematics
Evolutionary theory
Computational biology
Bioinformatics
Journalism
Even if there were phylogenies for all sequence
clusters in GenBank, they would represent only a
small fraction of biodiversity
Two types of inputs
Phylogeny: highly resolved, computationally derived, limited coverage
Taxonomy: poorly resolved, manually curated, much more complete
~7000 trees from ~2600 studies
Phylografter: Rick Ree, Field Museum of Natural History
Fig. 1. Combined molecular phylogenetic tree for Diptera. Partitioned ML analysis of combined taxon sets of tier 1 and tier 2 FLYTREE data samples (−lnL = 344155.6169) calculated in RAxML. Circles indicate bootstrap support >80% (black/bp = 95–100%, gray/bp = 88–94%, white/bp = 80–88%). Nodes with improved bootstrap values resulting from postanalysis pruning of unstable taxa are marked by stars (black/bp = 95–100%, gray/bp = 88–94%, white/bp = 80–88%). Colored squares on terminal branches indicate the presence, in at least one species of a family, of ecological traits as shown to lower left. The number
~4% of all published phylogenetic trees
Stoltzfus et al. 2012
Trees generally published as pictures in PDFs
OpenTree Reference Taxonomy
[Figure: several source taxonomies (shown as logos) + patch files for manual edits]
3,133,028 nodes and 2,559,835 ‘species’
Jonathan Rees, NESCent
How do we combine data to build
and use a tree of life?
Novel datastore for synthesis
Treemachine: Stephen Smith, Cody Hinchliff, Joseph Brown, U Michigan
Jim Allman, NESCent
Manual synthesis based on all data
Automated synthesis based on limited data
Inputs:
Published phylogenies
Taxonomies
• filter / weight input trees
• re-synthesize
• process feedback
• input new trees
synthetic tree of life
Improving the synthetic tree
❖ Branch lengths & divergence times
❖ Better synthesis using tree metadata
❖ Community engagement
❖ data deposition & curation
❖ feedback & annotation
Moving beyond a single tree
❖ Detecting conflict and coverage
❖ Visualization
❖ Enabling custom synthesis
❖ Building out to other tools & resources
[Logo: Open Tree of Life]
What can we do with a tree of life?
[Figure: a sequence alignment is combined with a second image; labels on the result read "10 million years" and "24 million years"]
Image: Zephyris at the English language Wikipedia
Acer macrophyllum
Betula lutea
Aesculus glabra
Tilia americana
Ulmus rubra
Leaf patterns image from Walls RL: American Journal of Botany 2011, 98(2):244-253.
Acer macrophyllum
Betula alleghaniensis
Aesculus glabra
Tilia americana
Ulmus rubra
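The two species lists on this slide differ in one name: Betula lutea in the first list appears as Betula alleghaniensis in the second. A minimal sketch of that kind of name reconciliation is shown below. It assumes a hand-made synonym table with a single entry taken from the slide; the table and function name are hypothetical, and a real workflow would query a taxonomic name resolution service instead.

```python
# Hypothetical synonym table; the one entry is the name change shown on the slide
SYNONYMS = {"Betula lutea": "Betula alleghaniensis"}

def resolve_names(names, synonyms):
    """Replace each known synonym with its accepted name; leave other names unchanged."""
    return [synonyms.get(name, name) for name in names]

species_list = [
    "Acer macrophyllum",
    "Betula lutea",
    "Aesculus glabra",
    "Tilia americana",
    "Ulmus rubra",
]

for original, resolved in zip(species_list, resolve_names(species_list, SYNONYMS)):
    print(f"{original} -> {resolved}")
```

Once every name in a user's list matches a name in the tree, the list can be used to request the subtree for just those species.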
Stoltzfus, A., Lapp, H., Matasci, N., … Cranston, K.A., ... & Jordan, G. (2013). Phylotastic! Making
tree-of-life knowledge accessible, reusable and convenient. BMC bioinformatics, 14(1), 158.
Collaborative data collection
Validation of datasets
Search & download across datasets
Get tree
Get tree
[Logo: Open Tree of Life]
What can we do with a tree of life?
University of Alberta: Bruce Rannala
University of Arizona: Michael Sanderson
NESCent: Jonathan Rees, Jim Allman

Carleton Biology talk : March 2014
