Karen Cranston
National Evolutionary Synthesis Center
@kcranstn
http://www.slideshare.net/kcranstn
opentreeoflife.org
What does it mean to “have” the tree of life?
complete & dynamic
browse, download, query
use for research questions
implies digital access
[Figure: Phylogeny papers, 1978-2008: number of papers published per year. Source: ISI Web of Science; graph from David Hillis.]
Rapid increase in applications of phylogeny, beginning in the early 1990s
Goals
1. Synthesize a complete draft tree of life from existing phylogenies
2. Release in year 1 with:
a. engaging public interface
b. ability to upload new data, explore conflict, see provenance
c. open data: tree, subtrees and source data
Graph databases of taxonomy + source trees
•filter / weight input trees
•combine into synthetic trees
•feedback
•input new data sets
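A minimal sketch of the idea on this slide: load the taxonomy and the source trees into one graph, weight or filter the inputs, and combine them into a single synthetic tree. This is a toy illustration, not treemachine's algorithm. It assumes every input has already been reduced to {child: parent} edges over shared, name-resolved labels, and it settles conflicts by summing source weights. The class name `PhyloGraph`, the source ids and the taxon names are hypothetical.

```python
from collections import defaultdict


class PhyloGraph:
    """Toy graph store (hypothetical): child -> parent edges, each annotated
    with the set of sources (taxonomy or input trees) that support it."""

    def __init__(self):
        # edges[child][parent] -> set of source ids supporting that edge
        self.edges = defaultdict(lambda: defaultdict(set))

    def add_source(self, source_id, parent_of):
        """Load one source given as a {child: parent} mapping over shared labels."""
        for child, parent in parent_of.items():
            self.edges[child][parent].add(source_id)

    def conflicts(self):
        """Nodes that different sources attach to different parents."""
        return {
            child: dict(parents)
            for child, parents in self.edges.items()
            if len(parents) > 1
        }

    def synthesize(self, weights, default_weight=0.0):
        """For each node, keep the parent with the highest summed source weight.

        Sources missing from `weights` get `default_weight`. Weights are assumed
        non-negative, so a parent supported only by zero-weight sources is dropped,
        which filters those sources out of the synthesis.
        """
        tree = {}
        for child, parents in self.edges.items():
            scores = {
                parent: sum(weights.get(src, default_weight) for src in sources)
                for parent, sources in parents.items()
            }
            scores = {p: s for p, s in scores.items() if s > 0}
            if scores:
                # max() returns the first maximum, so sorting first breaks ties by name
                tree[child] = max(sorted(scores), key=scores.get)
        return tree


if __name__ == "__main__":
    g = PhyloGraph()
    # hypothetical taxonomy plus one hypothetical source tree that disagrees about Xus
    g.add_source("taxonomy", {"Xus": "FamA", "Yus": "FamA", "Zus": "FamB",
                              "FamA": "Order1", "FamB": "Order1"})
    g.add_source("tree42", {"Xus": "FamB", "Zus": "FamB"})
    print(g.conflicts())  # only Xus is placed differently by the two sources
    print(g.synthesize({"taxonomy": 1.0, "tree42": 2.0})["Xus"])  # → FamB
    print(g.synthesize({"taxonomy": 1.0})["Xus"])  # tree42 filtered out → FamA
```

Here the higher-weighted source tree overrides the taxonomy's placement of Xus, and setting its weight to 0 filters it out again. Real synthesis also has to reconcile trees that are resolved to different depths, which simple parent voting does not handle.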
Inputs: Phylogenetic data
Archiving sequence data is a community norm
~4% of all published phylogenetic trees are archived (Stoltzfus et al. 2012)
assembly
alignment
inference
expertise
time
$$$
[Figure: page 3 of Wiegmann et al., PNAS Early Edition, with Fig. 1, a combined molecular phylogenetic tree for Diptera (partitioned ML analysis calculated in RAxML)]
Why do we need to database phylogenetic trees?
Heroic data collection efforts
Surveyed >7000 phylogenetic studies of plants, fungi, animals and unicellular organisms
Result: repository of data for >2300 studies, >4800 trees
Remaining data not available digitally
Manuscript accepted to PLoS Biology
Inputs: Taxonomy
Large fraction of species not represented in phylogenies
taxonomy provides backbone & coverage at tips
Need name resolution services for data cleaning
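A minimal sketch of what name resolution for data cleaning can involve: tidy each tip label, try an exact match against the taxonomy, then fall back to the closest fuzzy match. It uses the standard-library `difflib` similarity score as a stand-in; the functions `normalize` and `resolve_names`, the 0.85 cutoff and the example names are assumptions for illustration, not an Open Tree service or API.

```python
import difflib


def normalize(name):
    """Tidy a tip label: underscores to spaces, collapse whitespace,
    capitalize the genus and lowercase the remaining words."""
    parts = name.replace("_", " ").split()
    if not parts:
        return ""
    return " ".join([parts[0].capitalize()] + [p.lower() for p in parts[1:]])


def resolve_names(tip_labels, taxonomy_names, cutoff=0.85):
    """Map each tip label to a taxonomy name.

    Tries an exact match after normalization, then the single closest fuzzy
    match (difflib similarity >= cutoff). Returns {label: (name or None, method)}.
    Taxonomy names are assumed to be already in canonical form.
    """
    names = list(taxonomy_names)
    valid = set(names)
    resolved = {}
    for label in tip_labels:
        clean = normalize(label)
        if clean in valid:
            resolved[label] = (clean, "exact")
            continue
        close = difflib.get_close_matches(clean, names, n=1, cutoff=cutoff)
        resolved[label] = (close[0], "fuzzy") if close else (None, "unmatched")
    return resolved


if __name__ == "__main__":
    taxonomy = ["Homo sapiens", "Pan troglodytes", "Gorilla gorilla"]
    labels = ["homo_sapiens", "Pan troglodites", "Canis lupus"]
    for label, (name, method) in resolve_names(labels, taxonomy).items():
        print(f"{label!r:20} -> {name} ({method})")
```

The misspelled "Pan troglodites" resolves fuzzily to "Pan troglodytes", while "Canis lupus" stays unmatched because it is absent from this toy taxonomy. Fuzzy matches are worth flagging for curator review rather than accepting silently.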
Process
Source trees (Phylografter)
Taxonomies (taxomachine)
Data storage & synthesis (treemachine)
OpenTree: visualization, search, download
Source tree management
phylografter.opentreeoflife.org
Source tree & taxonomy synthesis
Novel graph database for phylogenies (treemachine) and
taxonomy (taxomachine)
Allows for efficient storage and retrieval
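To make "efficient storage and retrieval" concrete, here is a minimal sketch that stores edges from several sources in one indexed table and retrieves a taxon's lineage with a single recursive query. It uses SQLite from the Python standard library purely for illustration; this is not how treemachine or taxomachine are implemented, and the table layout, function names and source ids are hypothetical. Each source is assumed to be acyclic (a cycle would keep the recursive query from terminating).

```python
import sqlite3


def build_store(edges):
    """Load (child, parent, source_id) edges into an in-memory SQLite table."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE edge (child TEXT, parent TEXT, source TEXT)")
    # index on (child, source) so each step up the tree is an index lookup
    db.execute("CREATE INDEX edge_child ON edge (child, source)")
    db.executemany("INSERT INTO edge VALUES (?, ?, ?)", edges)
    return db


def lineage(db, taxon, source):
    """Ancestors of `taxon`, nearest first, according to a single source."""
    rows = db.execute(
        """
        WITH RECURSIVE up(node, depth) AS (
            SELECT parent, 1 FROM edge WHERE child = ? AND source = ?
            UNION ALL
            SELECT e.parent, up.depth + 1
            FROM edge AS e JOIN up ON e.child = up.node
            WHERE e.source = ?
        )
        SELECT node FROM up ORDER BY depth
        """,
        (taxon, source, source),
    ).fetchall()
    return [node for (node,) in rows]


if __name__ == "__main__":
    db = build_store([
        ("Homo", "Hominidae", "taxonomy"),
        ("Hominidae", "Primates", "taxonomy"),
        ("Primates", "Mammalia", "taxonomy"),
        ("Homo", "HomoPan", "tree7"),       # hypothetical source tree
        ("HomoPan", "Hominidae", "tree7"),
    ])
    print(lineage(db, "Homo", "taxonomy"))  # → ['Hominidae', 'Primates', 'Mammalia']
    print(lineage(db, "Homo", "tree7"))     # → ['HomoPan', 'Hominidae']
```

Keeping the source id on every edge preserves provenance, so the same store can answer "where does this taxon sit according to source X?" for the taxonomy and for each input tree.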
OpenTree
dev.opentreeoflife/opentree
Public tree of life
publictreeoflife.com/tree
“Open” Tree of Life
open data: requiring CC0 license on source trees
open source software: https://github.com/OpenTreeOfLife
wiki: http://opentree.wikispaces.com/ (52 members)
public mailing list (67 members)
Community engagement
~50 visitors per day to blog.opentreeoflife.org
@opentreeoflife on Twitter (~900 followers)
Tree of Life symposium: Evolution 2013
Hackathon in year 2 (joint with Arbor)
Collaborations
providing images and text for public tree
developing methods for subtree extraction
summer student providing links to ToLWeb pages
treeviz project from U Indiana MOOC,
upcoming summer intern
year 2-3 plans for data archiving / harvest
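The collaborations above include developing methods for subtree extraction. As a hedged illustration only (not the project's method), this sketch computes the standard induced subtree for a set of tips: keep the tips and their ancestors, then suppress internal nodes left with a single child. The tree is assumed to be a {child: parent} mapping; the function name `extract_subtree` and the example labels are hypothetical, and a tip missing from the mapping is treated as a separate root.

```python
from collections import defaultdict


def extract_subtree(parent_of, tips):
    """Induced subtree for `tips` from a rooted tree given as {child: parent}.

    Keeps the requested tips and all their ancestors, then suppresses internal
    nodes left with a single child. Returns Newick with children sorted by label.
    """
    keep = set()
    for tip in tips:
        node = tip
        # climb toward the root, stopping early once we reach an already-kept node
        while node is not None and node not in keep:
            keep.add(node)
            node = parent_of.get(node)

    children = defaultdict(list)
    roots = []
    for node in keep:
        parent = parent_of.get(node)
        if parent is None:
            roots.append(node)
        else:
            children[parent].append(node)
    if len(roots) != 1:
        raise ValueError("requested tips do not share a single root")

    def collapse(node):
        # skip down through single-child nodes
        while len(children[node]) == 1:
            node = children[node][0]
        return node

    def to_newick(node):
        kids = sorted(collapse(k) for k in children[node])
        if not kids:
            return node
        return "(" + ",".join(to_newick(k) for k in kids) + ")" + node

    return to_newick(collapse(roots[0])) + ";"


if __name__ == "__main__":
    # ((A,B)n1,C)n2 and D under root, as a parent map
    tree = {"A": "n1", "B": "n1", "n1": "n2", "C": "n2", "n2": "root", "D": "root"}
    print(extract_subtree(tree, ["A", "C"]))       # → (A,C)n2;
    print(extract_subtree(tree, ["A", "B", "D"]))  # → (D,(A,B)n1)root;
```

Suppressing single-child nodes is what makes the result the induced subtree rather than just a pruned copy: the relationships among the requested tips are unchanged, and no unary internal nodes remain.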
Assessment: PI survey
general satisfaction with progress on data collection,
synthesis and software development
more focus on incentives for users
more integration across labs
Assessment: Advisory board
Members:
David Hillis (UT Austin)
Jan Reichelt (Mendeley)
Andy Sinauer (Sinauer Associates)
Planning meeting for start of year 2
On track for year 1 release
1. Synthesize a complete draft tree of life from existing phylogenies
2. Release in year 1 with:
a. engaging public interface
b. ability to upload new data, explore conflict, see provenance
c. open data: tree, subtrees and source data
Goals for year 2
Refine draft tree based on user feedback
Empirical use cases drive development
Incentives for users / data contributors
Collaboration with external projects (AVAToL, ToLWeb,
Phylotastic, Dryad)
opentreeoflife.org

Open Tree of Life @NSF
