Assembling a draft overall tree of life from phylogenetic trees and taxonomic databases

Presentation at the Symposium on sharing and delivery of reusable phylogenetic knowledge, TDWG 2013 conference, http://wiki.tdwg.org/twiki/bin/view/Phylogenetics/PhyloSharingWorkshop2013

  1. assembling a draft overall tree of life from phylogenetic trees and taxonomic databases
     Jonathan A Rees, US National Evolutionary Synthesis Center, Duke University (rees@nescent.org)
     TDWG, 31 October 2013
  2. software team: Jim Allman, Joseph Brown, Karen Cranston, Cody Hinchliff, Mark Holder, Jonathan Leto, Emily McTavish, Peter Midford, Rick Ree, Stephen Smith
     funding: US NSF
  3. what is Open Tree of Life?
  4. step 1: collect phylogenetic trees for best possible coverage of entire tree of life
     Drew BT, Gazis R, Cabezas P, Swithers KS, Deng J, et al. (2013) Lost Branches on the Tree of Life. PLoS Biol 11(9): e1001636. http://dx.doi.org/10.1371/journal.pbio.1001636
  5. step 2: normalize tips so that they match between source trees
     label                                normalization
     Hemsleya amabilis HS454              524163 Hemsleya amabilis
     Theria                               4267989 Theria in Arthropoda
     Nicotiana suaveolans var excelsior   232354 Nicotiana rotundifolia
     Selysia prunifera                    949305 Cayaponia prunifera
  6. step 3: synthesize a single ‘big tree’ algorithmically from the source trees
     Smith SA, Brown JW, Hinchliff CE (2013) Analyzing and Synthesizing Phylogenies Using Tree Alignment Graphs. PLoS Comput Biol 9(9): e1003223. http://dx.doi.org/10.1371/journal.pcbi.1003223
  7. step 4: expose source trees and ‘big tree’ in various ways
  8. exposing provenance
     • links to studies
     • links to data deposits (e.g. TreeBASE)
     • links to taxonomic database records
     • methods documentation
     • versioning
  9. reference taxonomy
     • used for normalization, internal node labeling, gap-filling
     • need NCBI taxonomy
     • supplement with GBIF
     • patch system
     • future: other sources
  10. ‘open’
      trees are not creative expression
      ... ergo no © protection
      ... ergo © licensing is meaningless
      ... CC0 is nice (and required by Dryad), but no CC0 for legacy data or NCBI
  11. lessons
      • NeXML and BadgerFish are good
      • machine-processable tip identity would be awfully nice
      • we were surprised by the tree rooting problem
      • provenance is an uphill battle
      • to be seen: GitHub for data curation?
  12. © 2013 Jonathan A Rees / CC-BY 3.0
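Slide 5's tip normalization maps each raw tip label to a reference-taxonomy record. The sketch below is a minimal illustration, assuming a toy in-memory taxonomy: the four ids and target names come from the slide, while the lineage groups, the mammal Theria entry with its placeholder id 0, the rule that digit-bearing tokens are specimen codes, and the `normalize` function itself are hypothetical, not Open Tree's actual matching code.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Taxon:
    ott_id: int
    name: str
    lineage: Tuple[str, ...]          # higher groups, used to tell homonyms apart
    synonyms: Tuple[str, ...] = ()


# Toy reference taxonomy. Ids 524163, 4267989, 232354, 949305 and their names are
# from the slide; lineages and the mammal Theria entry (placeholder id 0) are hypothetical.
TAXA = [
    Taxon(524163, "Hemsleya amabilis", ("Cucurbitaceae",)),
    Taxon(4267989, "Theria", ("Arthropoda",)),
    Taxon(0, "Theria", ("Mammalia",)),
    Taxon(232354, "Nicotiana rotundifolia", ("Solanaceae",),
          synonyms=("Nicotiana suaveolans var excelsior",)),
    Taxon(949305, "Cayaponia prunifera", ("Cucurbitaceae",),
          synonyms=("Selysia prunifera",)),
]

# Index every accepted name and synonym; one name may point to several taxa (homonyms).
INDEX: Dict[str, List[Taxon]] = {}
for taxon in TAXA:
    for key in (taxon.name, *taxon.synonyms):
        INDEX.setdefault(key, []).append(taxon)


def normalize(label: str, context: Optional[str] = None) -> Optional[Tuple[int, str]]:
    """Map a raw tip label to (taxon id, accepted name), or None if unresolved."""
    # Treat underscores as blanks, then drop tokens containing digits, which this
    # sketch assumes are specimen/voucher codes such as "HS454".
    tokens = [t.rstrip(".") for t in label.replace("_", " ").split()]
    cleaned = " ".join(t for t in tokens if not any(ch.isdigit() for ch in t))
    candidates = INDEX.get(cleaned, [])
    if len(candidates) > 1 and context is not None:
        # Homonyms: keep only taxa whose lineage includes the study's focal group.
        candidates = [t for t in candidates if context in t.lineage]
    if len(candidates) != 1:
        return None  # unmatched or still ambiguous: leave it for a curator
    match = candidates[0]
    return match.ott_id, match.name
```

Returning None for unmatched or still-ambiguous labels, rather than guessing, leaves those tips for a human curator; a bare "Theria" only resolves once the study's context (here Arthropoda) is supplied.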
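Slide 6 cites the tree-alignment-graph approach of Smith, Brown and Hinchliff (2013). The sketch below does not implement that method; it swaps in a simpler, well-known technique to show what "synthesize algorithmically" can mean: rank the source trees, then accept each of their clusters greedily only if everything accepted so far stays compatible, checked with the classic BUILD algorithm of Aho et al. Trees are assumed to be nested tuples whose leaves are already-normalized tip names; `clusters`, `build` and `synthesize` are hypothetical names.

```python
from typing import FrozenSet, List, Optional, Tuple, Union

Tree = Union[str, tuple]  # leaf name, or tuple of child subtrees
Constraint = Tuple[FrozenSet[str], FrozenSet[str]]  # (cluster, leaf set of its source tree)


def clusters(tree: Tree) -> Tuple[FrozenSet[str], List[FrozenSet[str]]]:
    """Return the tree's leaf set and the leaf set under every internal node."""
    if isinstance(tree, str):
        return frozenset([tree]), []
    leaves: FrozenSet[str] = frozenset()
    found: List[FrozenSet[str]] = []
    for child in tree:
        child_leaves, child_clusters = clusters(child)
        leaves |= child_leaves
        found.extend(child_clusters)
    found.append(leaves)
    return leaves, found


def build(taxa: FrozenSet[str], constraints: List[Constraint]) -> Optional[Tree]:
    """BUILD: a tree displaying all constraints on `taxa`, or None if they conflict."""
    if len(taxa) == 1:
        return next(iter(taxa))
    parent = {t: t for t in taxa}

    def find(x: str) -> str:
        while parent[x] != x:
            x = parent[x]
        return x

    # Join taxa that a constraint groups together against some taxon still in scope.
    for cluster, scope in constraints:
        inside = cluster & taxa
        outside = (scope - cluster) & taxa
        if len(inside) >= 2 and outside:
            first, *rest = inside
            for t in rest:
                parent[find(t)] = find(first)
    groups = {}
    for t in taxa:
        groups.setdefault(find(t), set()).add(t)
    if len(groups) == 1:
        return None  # everything is glued together: the constraints are incompatible
    children = []
    for group in sorted(groups.values(), key=min):  # sorted for deterministic output
        subtree = build(frozenset(group), constraints)
        if subtree is None:
            return None
        children.append(subtree)
    return tuple(children)


def synthesize(ranked_trees: List[Tree]) -> Tree:
    """Greedily accept clusters from trees in rank order while they stay compatible."""
    accepted: List[Constraint] = []
    all_taxa: FrozenSet[str] = frozenset()
    for tree in ranked_trees:
        scope, found = clusters(tree)
        all_taxa |= scope
        for cluster in found:
            if len(cluster) < 2 or cluster == scope:
                continue  # trivial cluster: carries no grouping information
            trial = accepted + [(cluster, scope)]
            if build(all_taxa, trial) is not None:
                accepted = trial  # otherwise it conflicts with higher-ranked evidence
    result = build(all_taxa, accepted)
    assert result is not None  # accepted constraints are compatible by construction
    return result
```

Ranking matters: with `((a,b),c)` ranked above `((a,c),b,d)`, the conflicting (a,c) grouping is dropped; reverse the ranking and (a,b) is dropped instead. Each candidate cluster triggers a full BUILD run, which is fine for a sketch but not for millions of tips.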
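Slide 7 says the source trees and the ‘big tree’ are exposed in various ways. As one hedged illustration, the sketch below serializes a nested-tuple tree to Newick, quoting any label that contains blanks or Newick punctuation, and to a simple JSON shape. The JSON layout (`name` and `children` keys) and the function names are assumptions for illustration, not an Open Tree format.

```python
import json
from typing import Union

Tree = Union[str, tuple]  # leaf name, or tuple of child subtrees

# Characters not allowed in an unquoted Newick label. An underscore in an unquoted
# label is read back as a blank, so labels containing one are quoted as well.
NEEDS_QUOTES = set(" \t()[]':;,_")


def newick_label(name: str) -> str:
    if any(ch in NEEDS_QUOTES for ch in name):
        return "'" + name.replace("'", "''") + "'"  # quoted label, inner quotes doubled
    return name


def to_newick(tree: Tree) -> str:
    def walk(node: Tree) -> str:
        if isinstance(node, str):
            return newick_label(node)
        return "(" + ",".join(walk(child) for child in node) + ")"
    return walk(tree) + ";"


def to_json(tree: Tree) -> str:
    # Hypothetical JSON layout: leaves carry "name", internal nodes carry "children".
    def walk(node: Tree) -> dict:
        if isinstance(node, str):
            return {"name": node}
        return {"children": [walk(child) for child in node]}
    return json.dumps(walk(tree))
```

For example, `to_newick((("Hemsleya amabilis", "Theria"), "c"))` gives `(('Hemsleya amabilis',Theria),c);`.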
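Slide 8 lists the kinds of provenance to expose: links to studies, data deposits and taxonomy records, plus versioning. One way to carry that through synthesis is to attach source references to each synthesized clade. In the minimal sketch below, the `SourceRef` fields, the example DOIs and URLs used in testing, and the support rule (a source supports a clade when restricting the clade to the source's tips yields one of the source's own non-trivial clusters) are all hypothetical choices.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List, Set, Tuple


@dataclass(frozen=True)
class SourceRef:
    """Where one input tree came from. All field names are hypothetical."""
    study_doi: str        # link to the published study
    deposit_url: str      # link to the data deposit (e.g. a TreeBASE record)
    tree_id: str          # which tree within the study
    curated_version: str  # version of the curated copy used in this synthesis


@dataclass
class SynthClade:
    tips: FrozenSet[str]
    supported_by: List[SourceRef] = field(default_factory=list)


Source = Tuple[SourceRef, FrozenSet[str], Set[FrozenSet[str]]]  # (ref, tips, clusters)


def annotate(clades: List[FrozenSet[str]], sources: List[Source]) -> List[SynthClade]:
    """Attach to each synthesized clade the source trees that support it.

    Assumed support rule: restricting the clade to a source's tips must give a
    non-trivial cluster (2+ tips, not all of them) that the source itself contains.
    """
    annotated = []
    for clade in clades:
        node = SynthClade(clade)
        for ref, tips, source_clusters in sources:
            restricted = clade & tips
            if 2 <= len(restricted) < len(tips) and restricted in source_clusters:
                node.supported_by.append(ref)
        annotated.append(node)
    return annotated
```

A clade with an empty `supported_by` list is then visibly one that no source tree asserts directly, for example a grouping supplied only by the taxonomy.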
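Slide 9 describes the reference taxonomy as NCBI supplemented with GBIF, plus a patch system. The sketch below shows one plausible shape for that pipeline under strong simplifying assumptions: each taxonomy is a hypothetical `name -> parent name` dict, taxa are matched by name alone (a real merge must handle homonyms such as the two Theria), supplement-only taxa are grafted only under parents that already exist, and patches are hypothetical `("add" | "move", name, parent)` tuples.

```python
from typing import Dict, List, Optional, Tuple

Taxonomy = Dict[str, Optional[str]]  # hypothetical shape: name -> parent name (None = root)
Patch = Tuple[str, str, str]         # hypothetical shape: ("add" | "move", name, new parent)


def merge(primary: Taxonomy, supplement: Taxonomy) -> Taxonomy:
    """Keep every primary taxon; graft supplement-only taxa under existing parents."""
    merged = dict(primary)
    pending = {n: p for n, p in supplement.items() if n not in merged}
    changed = True
    while changed:  # repeat, since a parent may itself have been grafted in this pass
        changed = False
        for name, parent in list(pending.items()):
            if parent in merged:
                merged[name] = parent
                del pending[name]
                changed = True
    return merged  # taxa that never connect to the primary tree are left out


def is_within(tax: Taxonomy, node: Optional[str], ancestor: str) -> bool:
    """True if `ancestor` is `node` itself or lies on the path from `node` to the root."""
    while node is not None:
        if node == ancestor:
            return True
        node = tax.get(node)
    return False


def apply_patches(tax: Taxonomy, patches: List[Patch]) -> Taxonomy:
    patched = dict(tax)
    for op, name, parent in patches:
        if parent not in patched:
            raise ValueError(f"unknown parent {parent!r}")
        if op == "add":
            if name in patched:
                raise ValueError(f"{name!r} already exists")
        elif op == "move":
            if name not in patched:
                raise ValueError(f"unknown taxon {name!r}")
            if is_within(patched, parent, name):
                raise ValueError(f"moving {name!r} under {parent!r} would create a cycle")
        else:
            raise ValueError(f"unknown patch operation {op!r}")
        patched[name] = parent
    return patched
```

Keeping patches as data applied after the merge, rather than editing the merged taxonomy by hand, means they can be replayed when a new NCBI or GBIF release arrives.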
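Slide 11 names NeXML and BadgerFish as good choices. BadgerFish is a convention for mapping XML to JSON: attributes become `@`-prefixed keys, element text goes under `$`, and repeated sibling elements become arrays. The sketch below implements a subset of that convention with Python's standard `xml.etree.ElementTree`; unlike full BadgerFish it drops namespace prefixes and `@xmlns` entries, trims surrounding whitespace from text, and ignores mixed-content tail text. The function names are hypothetical.

```python
import json
import xml.etree.ElementTree as ET


def _local(tag: str) -> str:
    # ElementTree writes namespaced names as "{uri}name"; this sketch keeps only "name".
    return tag.rsplit("}", 1)[-1]


def to_badgerfish(elem: ET.Element) -> dict:
    obj = {}
    for key, value in elem.attrib.items():
        obj["@" + _local(key)] = value          # attributes: "@"-prefixed keys
    if elem.text is not None and elem.text.strip():
        obj["$"] = elem.text.strip()            # element text: "$" key
    for child in elem:
        name = _local(child.tag)
        value = to_badgerfish(child)
        if name in obj:
            if not isinstance(obj[name], list):
                obj[name] = [obj[name]]         # repeated siblings become an array
            obj[name].append(value)
        else:
            obj[name] = value
    return obj


def xml_to_badgerfish(text: str) -> str:
    root = ET.fromstring(text)
    return json.dumps({_local(root.tag): to_badgerfish(root)})
```

On a NeXML-like fragment such as `<otus id="otus1"><otu id="t1" label="Theria"/></otus>`, each `otu` becomes an object with `@id` and `@label` keys, and two or more `otu` siblings become a JSON array.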
