2. The tree of life provides
a means for organizing
and explaining
biodiversity data
Wiegmann et al. PNAS, 2011
3. What do we want from a Tree of Life?
❖ complete = contains all of biodiversity
❖ dynamic = continuously updated with new data
❖ available digitally = browse, query, download
Image: http://evolution.berkeley.edu
4. ❖ Create a complete tree of life by synthesizing published phylogenetic data
❖ Provide tools for managing, synthesizing & sharing phylogenetic data
http://opentreeoflife.org
5. Synthetic science
❖ Novel methods & analysis tools
❖ Big data from existing data
Biodiversity Synthesis Center /
Encyclopedia of Life
National Evolutionary Synthesis Center
6. Challenges
❖ Incongruence: How do we detect and use conflict between trees?
❖ Availability: What data do we have to construct a tree of life?
❖ Synthesis: How do we combine data across the tree of life?
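One simple way to make the incongruence question concrete is to compare the clades of two rooted trees. The sketch below is illustrative only and is not taken from the talk: it assumes trees are written as nested Python tuples of tip labels, and it calls two clades conflicting when, on the shared taxa, they overlap but neither contains the other. The function names and example trees are hypothetical.

```python
from itertools import product


def clades(tree):
    """Return (all tips, set of clades) for a rooted tree given as nested tuples."""
    found = set()

    def visit(node):
        if isinstance(node, str):          # a tip label
            return frozenset([node])
        tips = frozenset().union(*(visit(child) for child in node))
        found.add(tips)                    # every internal node defines a clade
        return tips

    all_tips = visit(tree)
    return all_tips, found


def conflicting_clades(tree_a, tree_b):
    """Pairs of clades (restricted to shared taxa) that cannot both be true."""
    tips_a, clades_a = clades(tree_a)
    tips_b, clades_b = clades(tree_b)
    shared = tips_a & tips_b
    conflicts = set()
    for a, b in product(clades_a, clades_b):
        a, b = a & shared, b & shared
        # Conflict: the clades overlap, but neither is nested inside the other.
        if a & b and not (a <= b or b <= a):
            conflicts.add((a, b))
    return conflicts


# Hypothetical example: the two trees disagree about the sister of taxon A.
tree_a = ((("A", "B"), "C"), "D")
tree_b = ((("A", "C"), "B"), "D")
for clade_a, clade_b in conflicting_clades(tree_a, tree_b):
    print(sorted(clade_a), "conflicts with", sorted(clade_b))
```

Restricting both clades to the shared taxa lets trees with only partially overlapping taxon sets be compared, which is the usual situation when trees come from different studies.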
10. [Plot: probability of each sampled tree; x-axis: sampled trees (1 to 49), y-axis: probability (0 to 0.18)]
Number of times sampled ∝ probability
Is there a stable backbone among the trees?
What taxa have unstable placement?
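The idea that sampling frequency is proportional to probability, and the question of a stable backbone, can be sketched with a small tally over a sample of trees. This is a minimal illustration under stated assumptions, not the analysis behind the slide: trees are nested tuples of tip labels, the "backbone" is taken to be the clades that appear in at least a chosen fraction of the sample, and the sample and the 0.9 threshold are invented for the example.

```python
from collections import Counter


def canonical(tree):
    """Order-independent form of a rooted tree, so equal topologies compare equal."""
    if isinstance(tree, str):
        return tree
    return tuple(sorted((canonical(child) for child in tree), key=repr))


def clades(tree):
    """Non-trivial clades of a rooted tree (the clade of all tips is excluded)."""
    found = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        tips = frozenset().union(*(visit(child) for child in node))
        found.add(tips)
        return tips

    found.discard(visit(tree))
    return found


def topology_frequencies(sample):
    """Estimate each topology's probability by how often it was sampled."""
    counts = Counter(canonical(tree) for tree in sample)
    return {topology: n / len(sample) for topology, n in counts.items()}


def clade_support(sample):
    """Fraction of sampled trees that contain each clade."""
    counts = Counter(clade for tree in sample for clade in clades(tree))
    return {clade: n / len(sample) for clade, n in counts.items()}


def backbone(sample, threshold=0.9):
    """Clades present in at least `threshold` of the sample (assumed definition)."""
    return {c for c, s in clade_support(sample).items() if s >= threshold}


# Hypothetical sample of 10 trees over four taxa.
sample = (
    [((("A", "B"), "C"), "D")] * 6
    + [((("A", "C"), "B"), "D")] * 3
    + [(("A", "B"), ("C", "D"))] * 1
)
for clade, support in sorted(clade_support(sample).items(), key=lambda kv: -kv[1]):
    print(sorted(clade), support)
```

Clades with low support in such a tally point to the taxa whose placement varies across the sample.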
14. Phylogenomics of rice (Oryza)
820,000 BAC-end sequences for 9 diploid Oryza species
1720 gene fragments
2.4 million nucleotides
Cranston, K.A., B. Hurwitz, M.J. Sanderson, D. Ware, R.A. Wing, L. Stein. Phylogenomic
analysis from deep BAC-end sequence libraries of rice. Systematic Botany, 35:3, 2010
What are the biological causes of gene tree incongruence in rice?
Do we need full genomes to answer these questions?
15. Phylogenomics of rice (Oryza)
Cranston, K.A., B. Hurwitz, M.J. Sanderson, D. Ware, R.A. Wing, L. Stein. Phylogenomic
analysis from deep BAC-end sequence libraries of rice. Systematic Botany, 35:3, 2010
Concatenated analysis
16. Gene trees in Oryza
❖ Gene tree methods: recover every possible topology
❖ Species tree methods: many clades not statistically significant
Cranston, K.A., B. Hurwitz, D. Ware, L. Stein, R.A. Wing. Species trees from highly
incongruent gene trees in rice. Systematic Biology, 2009. doi:10.1093/sysbio/syp054
Supermatrix topology
❖ Results suggest incomplete lineage sorting and hybridization / introgression in the evolutionary history of rice
17. What data do we have for creating a
complete tree of life?
18. Gene tree signal in GenBank
How many trees can we build using all of
the data in GenBank and how are those
trees distributed across the tree of life?
19. All-vs-all BLAST at each NCBI taxonomy node
Sanderson, M.J., D.T. Boss, D. Chen, K.A. Cranston, and A. Wehe. The PhyLoTA Browser:
Processing GenBank for molecular phylogenetics research. Systematic Biology 2008: 57(3).
[Figure: NCBI taxonomy under Arachis (Arachis hypogaea with subsp. fastigiata and subsp. hypogaea; Arachis glabrata), labeled with subtree clusters]
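Turning all-vs-all BLAST hits into clusters can be pictured as grouping sequences that are connected by hits. The sketch below is not the PhyLoTA pipeline; it is a minimal single-linkage illustration using union-find, and it assumes the hits have already been filtered by whatever similarity and overlap criteria are in use. The sequence identifiers are hypothetical.

```python
def cluster_hits(sequence_ids, hits):
    """Group sequences into clusters linked by pairwise hits (single linkage)."""
    parent = {seq: seq for seq in sequence_ids}

    def find(x):
        # Follow parent links to the cluster representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for query, subject in hits:
        root_q, root_s = find(query), find(subject)
        if root_q != root_s:
            parent[root_s] = root_q        # merge the two clusters

    clusters = {}
    for seq in sequence_ids:
        clusters.setdefault(find(seq), set()).add(seq)
    # Largest clusters first; ties broken by the smallest identifier.
    return sorted(clusters.values(), key=lambda c: (-len(c), min(c)))


# Hypothetical sequences under one taxonomy node and their retained hits.
sequence_ids = ["s1", "s2", "s3", "s4", "s5", "s6"]
hits = [("s1", "s2"), ("s2", "s3"), ("s4", "s5")]
for cluster in cluster_hits(sequence_ids, hits):
    print(sorted(cluster))
```

Running the same grouping at each node of the taxonomy, on only the sequences below that node, gives clusters per subtree.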
20. All possible clusters, alignments and trees
aactgtcgcatgttgacg...
aattgtcg-atgttgacg...
aac-gtcgcatgtcgacg...
aac-gtcgcatgtcgacg...
aac-gtcgcatgtcgacg...
aactgtcgcatgtcgacg...
aactgtcgcatgtcgacg...
aactgtcgcatgtcgacg...
❖ ~90,000 clusters, alignments, and trees available for download
❖ data availability matrix at each NCBI node
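A data availability matrix can be read as a taxa-by-clusters presence/absence table. The sketch below shows one way to build such a table; the layout, the function name, and the sequence identifiers are assumptions made for illustration, and only the taxon names come from the slides.

```python
def availability_matrix(clusters, taxon_of):
    """Build a presence/absence table: one row per taxon, one column per cluster.

    clusters: list of sets of sequence identifiers
    taxon_of: mapping from sequence identifier to taxon name
    """
    taxa = sorted(set(taxon_of.values()))
    rows = {taxon: [0] * len(clusters) for taxon in taxa}
    for column, cluster in enumerate(clusters):
        for seq in cluster:
            rows[taxon_of[seq]][column] = 1   # taxon has data in this cluster
    return taxa, rows


# Hypothetical sequences; taxon names taken from the Arachis example.
taxon_of = {
    "s1": "Arachis hypogaea",
    "s2": "Arachis glabrata",
    "s3": "Arachis hypogaea",
    "s4": "Arachis glabrata",
}
clusters = [{"s1", "s2"}, {"s3"}, {"s4"}]
taxa, rows = availability_matrix(clusters, taxon_of)
for taxon in taxa:
    print(taxon, rows[taxon])
```

Columns with many ones are clusters sampled for many taxa, which makes them candidates for tree building at that node.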
21.
22. ❖ complete = contains all of biodiversity
❖ dynamic = continuously updated with new data
❖ available digitally = browse, query, download
http://opentreeoflife.org
23. Gordon Burleigh
Keith Crandall
Karl Gude
David Hibbett
Mark Holder
Laura Katz
Rick Ree
Stephen Smith
Doug Soltis
Tiffani Williams
Computer science
Systematics
Evolutionary theory
Computational biology
Bioinformatics
Journalism
24.
25. Even if there were phylogenies for all sequence
clusters in GenBank, they would represent only a
small fraction of biodiversity
26. Two types of inputs
❖ Phylogeny: highly resolved, computationally derived, limited coverage
❖ Taxonomy: poorly resolved, manually curated, much more complete
27. ~7000 trees from ~2600 studies
Phylografter: Rick Ree, Field Museum of Natural History
28. Fig. 1. Combined molecular phylogenetic tree for Diptera. Partitioned ML analysis of combined taxon sets of tier 1 and tier 2 FLYTREE data samples (−lnL = 344155.6169) calculated in RAxML. Circles indicate bootstrap support >80% (black/bp = 95–100%, gray/bp = 88–94%, white/bp = 80–88%). Nodes with improved bootstrap values resulting from postanalysis pruning of unstable taxa are marked by stars (black/bp = 95–100%, gray/bp = 88–94%, white/bp = 80–88%). Colored squares on terminal branches indicate the presence, in at least one species of a family, of ecological traits as shown to lower left. The number ...
~4% of all published phylogenetic trees
Stoltzfus et al. 2012
Trees generally published as pictures in PDFs
36. Improving the synthetic tree
❖ Branch lengths & divergence times
❖ Better synthesis using tree metadata
❖ Community engagement
  ❖ data deposition & curation
  ❖ feedback & annotation
37. Moving beyond a single tree
❖ Detecting conflict and coverage
❖ Visualization
❖ Enabling custom synthesis
❖ Building out to other tools & resources