Open Tree of Life Phyloseminar 2014

Phyloseminar about the Open Tree of Life project, given Feb 2014 by Karen Cranston http://phyloseminar.org


  1. 1. TECHNICAL AND SOCIAL CHALLENGES IN SYNTHESIZING THE TREE OF LIFE Karen Cranston National Evolutionary Synthesis Center @kcranstn http://slideshare.net/kcranstn
  2. 2. IF WE “HAD” A TREE OF LIFE?
     • complete = contains all of biodiversity
     • dynamic = continuously updated with new data
     • available digitally = browsing, querying, downloading
  3. 3. Produce a digitally-available phylogeny that contains all of biodiversity. Provide tools for managing, analyzing and sharing phylogenetic data. http://avatol.org
  4. 4. CHALLENGE: COMPLETENESS
  5. 5. Even if there were phylogenies for all species in GenBank, we would only have a small fraction of biodiversity
  6. 6. NCBI taxonomy data (578 taxa); Soltis et al. APG III phylogeny (30 taxa). From Stephen Smith
  7. 7. Dipsacales graph. Synthesized tree: contains the structure of the phylogeny but all 578 taxa. From Stephen Smith
  8. 8. Inputs: published phylogenies, taxonomies
     • filter / weight input trees
     • synthesize into single data structure
     • process feedback
     • input new data sets
     Result: complete tree of life
  9. 9. CHALLENGE: ACCESS TO PUBLISHED PHYLOGENIES
  10. 10. “Phylogeny provides a mechanism through which to interpret the patterns and processes of evolution and to predict the responses of life to rapid environmental change. Phylogenies and phylogenetic methods are now being used to enhance agriculture, identify and combat diseases, conserve biodiversity, and predict responses to global climate change and to biological invasions.” * (tl;dr: We need trees to do cool and important science) * OpenTree grant proposal
  11. 11. Expertise in phylogenetic inference Expertise in methods that use phylogenies
  12. 12. EVOLUTION TREE Fig._S1 = [&R] (2,1,((3,7),(4,(6,(33,(15,((20,(47,((51, (49,50)),(46,(48,(52,16)))))),(((44,45),((18,(12,(13,(43,42)))), ((41,((39,38),(40,17))),((35,9),(34,(36,37)))))),(32,(((21,19), ((30,14),(22,((11,31),((27,25),(23,((28,(24,8)),(10,(26, (5,29)))))))))),((((72,(63,57)),((65,64),((66,67),(68,(69,(70, (71,54))))))),(((82,59),(60,(61,(62,55)))),((80,(81,56)),((53, (77,78)),((75,73),(76,(58,74))))))),((88,((86,87),((85,84), (83,89)))),(79,((91,(93,(95,(92,(96,(94,90)))))),((100,(99,98)), (97,(((168,((172,185),((159,101),(109,157)))),(((181,(179,180)), ((102,(183,187)),(175,(176,(178,177))))),(212,((195,(210,211)), (199,((201,(196,202)),((194,197),((203,(192,205)),(204,(193, ((209,(208,206)),(198,(200,207))))))))))))),(113,(((154, ((169,170),(103,191))),((131,126),(128,((134,135),(129,(125, ((132,130),(104,133)))))))),((((190,166),((162,171),((116,120), (115,114)))),((122,(188,(186,108))),((118,(119,105)),(117,(158, (184,189)))))),((123,124),(((148,((165,161),(174,182))), ((106,121),(163,(167,127)))),((173,(156,(155,160))),(164, (((136,137),(139,(138,107))),((153,145),(112,(((146,143),(144, (140,141))),((142,152),(147,((110,111),(149, (150,151))))))))))))))))))))))))))))))))); Fig. 1. Combined molecular phylogenetic tree for Diptera. Partitioned ML analysis of combined taxon sets of tier 1 and tier 2 FLYTREE data samples (−lnL = 344155.6169) calculated in RAxML. Circles indicate bootstrap support >80% (black/bp = 95–100%, gray/bp = 88–94%, white/bp = 80–88%). Nodes with improved bootstrap values resulting from postanalysis pruning of unstable taxa are marked by stars (black/bp = 95–100%, gray/bp = 88–94%, white/bp = 80– 88%). Colored squares on terminal branches indicate the presence, in at least one species of a family, of ecological traits as shown to lower left. 
The number of origins of each trait was estimated with reference to the phylogeny, the distribution of each trait among genera within a family, and the known biology of the organisms. [Fur]thermore, a paraphyletic relationship of phorids and syrphids would support the hypothesis that their shared special mode of extraembryonic development (dorsal amnion closure) (26) evolved in the stem lineage of Cyclorrhapha and preceded the origin of the schizophoran amnioserosa. To test this hypothesis, we used a relatively recent phylogenomic marker: small, noncoding, regulatory micro-RNAs (miRNAs). miRNAs exhibit a striking phylogenetic pattern of conservation across the metazoan tree of life, suggesting the accumulation and maintenance of miRNA families throughout organismal evolution. R code: library(phytools); flyTree <- read.tree("flies.tre"); contMap(flyTree, flyData). Wiegmann et al., PNAS, 2011
  13. 13. Archiving sequence data is a community norm. Archiving phylogenetic data is quite rare: ~4% of all published phylogenetic trees (Stoltzfus et al. 2012)
  14. 14. OPENTREE PHYLOGENY INPUTS Surveyed >7000 phylogenetic studies in plants, fungi, animals and unicellular organisms. Result: data for >2700 studies, >4800 trees
  15. 15. CHALLENGE: SELECTING BACKBONE TAXONOMY
  16. 16. Complete? Up to date with taxonomic literature? Phylogenetically-informed? Systematics research very slow… Online taxonomic resources
  17. 17. OPEN TREE TAXONOMY + + + + patch files for manual edits (requires source info!)
  18. 18. 3,133,028 nodes and 2,559,835 ‘species’ https://github.com/OpenTreeOfLife/reference-taxonomy
  19. 19. CHALLENGE: PHYLOGENY CURATION
  20. 20. TREE Fig._S1 = [&R] (2,1,((3,7),(4,(6,(33,(15,((20,(47,((51, (49,50)),(46,(48,(52,16)))))),(((44,45),((18,(12,(13,(43,42)))), ((41,((39,38),(40,17))),((35,9),(34,(36,37)))))),(32,(((21,19), ((30,14),(22,((11,31),((27,25),(23,((28,(24,8)),(10,(26, (5,29)))))))))),((((72,(63,57)),((65,64),((66,67),(68,(69,(70, (71,54))))))),(((82,59),(60,(61,(62,55)))),((80,(81,56)),((53, (77,78)),((75,73),(76,(58,74))))))),((88,((86,87),((85,84), (83,89)))),(79,((91,(93,(95,(92,(96,(94,90)))))),((100,(99,98)), (97,(((168,((172,185),((159,101),(109,157)))),(((181,(179,180)), ((102,(183,187)),(175,(176,(178,177))))),(212,((195,(210,211)), (199,((201,(196,202)),((194,197),((203,(192,205)),(204,(193, ((209,(208,206)),(198,(200,207))))))))))))),(113,(((154, ((169,170),(103,191))),((131,126),(128,((134,135),(129,(125, ((132,130),(104,133)))))))),((((190,166),((162,171),((116,120), (115,114)))),((122,(188,(186,108))),((118,(119,105)),(117,(158, (184,189)))))),((123,124),(((148,((165,161),(174,182))), ((106,121),(163,(167,127)))),((173,(156,(155,160))),(164, (((136,137),(139,(138,107))),((153,145),(112,(((146,143),(144, (140,141))),((142,152),(147,((110,111),(149, (150,151))))))))))))))))))))))))))))))))); How was this tree inferred? What are the tip labels? Is it rooted correctly? What clade was the focus of the study?
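
The questions on slide 20 can be made concrete. The following is a minimal sketch, assuming a plain Newick string like the one on the slide (no quoted labels, no internal node labels); the function name `tip_labels` is hypothetical. It shows that tip labels are easy to extract mechanically, while numeric labels such as `2` or `47` still say nothing about which taxa they stand for.

```python
import re
from typing import List


def tip_labels(newick: str) -> List[str]:
    """Return the tip labels of a simple Newick string, in order.

    Assumes no quoted labels and no internal node labels.
    Bracketed comments such as [&R] and branch lengths are dropped.
    """
    # Remove bracketed comments, whitespace, and the trailing semicolon.
    cleaned = re.sub(r"\[[^\]]*\]", "", newick)
    cleaned = re.sub(r"\s+", "", cleaned).rstrip(";")
    # Remove branch lengths such as ":0.12".
    cleaned = re.sub(r":[0-9.eE+-]+", "", cleaned)
    # A tip label is any token that directly follows "(" or ",".
    return re.findall(r"(?<=[(,])[^(),]+", cleaned)


example = "[&R] (2,1,((3,7),(4,6)));"
print(tip_labels(example))  # ['2', '1', '3', '7', '4', '6']
```

Nothing in the string answers the other questions (inference method, rooting, focal clade), which is why that information has to be recorded separately during curation.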
  21. 21. CURATOR TOOLS
  22. 22. Data curation → NexSON (NeXML as JSON) → Tree synthesis
  23. 23. Input names Mapped to taxonomy
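
The mapping step on slide 23 can be sketched as a lookup of cleaned-up tip labels in a taxonomy. This is a toy illustration only: the taxonomy excerpt, its numeric IDs, and the function names `normalize` and `map_labels` are all hypothetical, and the curation tool described in the talk may work quite differently.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical taxonomy excerpt: normalized name -> made-up identifier.
TAXONOMY: Dict[str, int] = {
    "lonicera japonica": 101,
    "viburnum opulus": 102,
    "sambucus nigra": 103,
}


def normalize(label: str) -> str:
    """Lower-case a label, turn underscores into spaces, squeeze whitespace."""
    return " ".join(label.replace("_", " ").split()).lower()


def map_labels(
    labels: Iterable[str], taxonomy: Dict[str, int]
) -> Tuple[Dict[str, int], List[str]]:
    """Split labels into those found in the taxonomy and those left unmapped."""
    mapped: Dict[str, int] = {}
    unmapped: List[str] = []
    for label in labels:
        taxon_id = taxonomy.get(normalize(label))
        if taxon_id is None:
            unmapped.append(label)  # needs a curator's attention
        else:
            mapped[label] = taxon_id
    return mapped, unmapped


mapped, unmapped = map_labels(
    ["Lonicera_japonica", " viburnum  opulus", "Fig_S1_tip_42"], TAXONOMY
)
print(mapped)
print(unmapped)
```

Labels that fail the exact lookup are returned separately rather than guessed at, since a wrong mapping would silently misplace a taxon.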
  24. 24. Tree synthesis API layer Common data store of NexSON files (NeXML as JSON)
  25. 25.
     • Open source software tools for managing open data
     • Publicly-accessible data store
     • Full provenance data (who changed what & when?)
     • Allows access & download through standard protocols (git)
     • Where possible, using Creative Commons 0 waiver
  26. 26. CHALLENGE: SYNTHESIZING PHYLOGENY AND TAXONOMY
  27. 27. Graph databases are key
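
To illustrate why a graph is a natural fit, here is a toy sketch, not the project's synthesis algorithm. It assumes that nodes from the taxonomy and from a phylogeny have already been matched by name, stores every parent link as an edge tagged with its source, and then keeps the preferred source's edge for each child. All node names and the functions `synthesize` and `to_newick` are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Edge = Tuple[str, str, str]  # (child, parent, source)

# Taxonomy: five species directly under one genus (a polytomy).
TAXONOMY_EDGES: List[Edge] = [(sp, "Genus", "taxonomy") for sp in "ABCDE"]
# Phylogeny ((A,B),C); its root is assumed to be matched to "Genus".
PHYLOGENY_EDGES: List[Edge] = [
    ("A", "n1", "phylogeny"),
    ("B", "n1", "phylogeny"),
    ("n1", "Genus", "phylogeny"),
    ("C", "Genus", "phylogeny"),
]


def synthesize(
    edges: Sequence[Edge], preference: Sequence[str] = ("phylogeny", "taxonomy")
) -> Dict[str, Tuple[str, str]]:
    """For each child, keep the parent edge from the most preferred source."""
    rank = {source: i for i, source in enumerate(preference)}
    best: Dict[str, Tuple[str, str]] = {}
    for child, parent, source in edges:
        if child not in best or rank[source] < rank[best[child][1]]:
            best[child] = (parent, source)
    return best


def to_newick(best: Dict[str, Tuple[str, str]], root: str) -> str:
    """Render the chosen edges as a Newick string (children sorted by name)."""
    children: Dict[str, List[str]] = {}
    for child, (parent, _source) in best.items():
        children.setdefault(parent, []).append(child)

    def render(node: str) -> str:
        kids = sorted(children.get(node, []))
        if not kids:
            return node
        return "(" + ",".join(render(kid) for kid in kids) + ")"

    return render(root) + ";"


tree = synthesize(TAXONOMY_EDGES + PHYLOGENY_EDGES)
print(to_newick(tree, "Genus"))  # (C,D,E,(A,B));
```

The result keeps the phylogeny's (A,B) grouping while the taxonomy supplies D and E, which mirrors the earlier Dipsacales example: the structure of the phylogeny, but all taxa from the taxonomy.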
  28. 28. Open Tree of Life
  29. 29. Thanks to Joseph Brown, Stephen Smith, Jonathan Rees, Jim Allman for getting the latest version up last night!
  30. 30. Thanks to Joseph Brown, Stephen Smith, Jonathan Rees, Jim Allman for getting the latest version up last night!
  31. 31. Synthesis details next week from Stephen Smith, University of Michigan Thursday, February 13, 1 pm EST phyloseminar.org
  32. 32. WHAT CAN WE DO WITH THESE DATA AND TOOLS?
  33. 33. Comparing phylogeny and taxonomy Rick Ree & Lyndon Coghill
  34. 34. Conflict within sets of trees Open Tree of Life Stephen Smith
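
One simple way to make "conflict" between rooted trees concrete is to compare clades: two clades conflict when they share some tips but neither contains the other. The sketch below assumes trees written as nested Python tuples with string tips and a tuple at the root; the names `clades` and `conflicts` are hypothetical, and this is not necessarily the measure used in the Open Tree analyses.

```python
from typing import FrozenSet, List, Tuple, Union

Tree = Union[str, tuple]  # a tip label, or a tuple of subtrees
Clade = FrozenSet[str]


def clades(tree: Tree) -> List[Clade]:
    """Return the tip set below every internal node, with the root last."""
    found: List[Clade] = []

    def tips(node: Tree) -> Clade:
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(tips(child) for child in node))
        found.append(below)  # post-order, so the root is appended last
        return below

    tips(tree)
    return found


def conflicts(tree_a: Tree, tree_b: Tree) -> List[Tuple[Clade, Clade]]:
    """List clade pairs that overlap without one nesting inside the other."""
    clades_a, clades_b = clades(tree_a), clades(tree_b)
    shared = clades_a[-1] & clades_b[-1]  # compare only taxa in both trees
    result: List[Tuple[Clade, Clade]] = []
    for a in clades_a:
        for b in clades_b:
            a_s, b_s = a & shared, b & shared
            if (a_s & b_s) and not (a_s <= b_s or b_s <= a_s):
                result.append((a_s, b_s))
    return result


tree_1 = (("A", "B"), "C")
tree_2 = (("A", "C"), "B")
print(conflicts(tree_1, tree_2))
```

Here the clade {A, B} from the first tree conflicts with {A, C} from the second, because both contain A but neither is nested in the other.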
  35. 35.
     • Highlight under-studied parts of the tree
     • Label internal nodes on phylogenies
     • Test various methods for synthesis
     • Quantify and visualize phylogenetic conflict
     • Extract phylogeny given list of taxa
     • Infer branch lengths on synthetic trees
     • Organize biodiversity data phylogenetically
     … and many more, enabled by phylogenetic synthesis and digitally available phylogenetic data
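
As an example of one item on this list, extracting a phylogeny for a given list of taxa amounts to pruning a larger tree down to the requested tips. This is a minimal sketch, assuming a tree stored as nested tuples with string tips; the function name `prune` is hypothetical and unrelated to any Open Tree API.

```python
from typing import Optional, Set, Union

Tree = Union[str, tuple]  # a tip label, or a tuple of subtrees


def prune(tree: Tree, keep: Set[str]) -> Optional[Tree]:
    """Return the subtree induced by the tips in `keep`, or None if empty.

    Internal nodes left with a single child are collapsed.
    """
    if isinstance(tree, str):
        return tree if tree in keep else None
    kept = [sub for sub in (prune(child, keep) for child in tree) if sub is not None]
    if not kept:
        return None
    if len(kept) == 1:
        return kept[0]  # collapse a node that has only one surviving child
    return tuple(kept)


full_tree = ((("A", "B"), "C"), ("D", "E"))
print(prune(full_tree, {"A", "C", "E"}))  # (('A', 'C'), 'E')
```

Taxa in the request that are absent from the tree are ignored here; a real service would more likely report them back to the user.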
  36. 36. COMING IN 2014 Hackathon, jointly with Clade-based curation and analysis workshops
  37. 37. QUESTIONS? PARTICIPATE? opentreeoflife@googlegroups.com opentreeoflife-software@googlegroups.com irc: #opentreeoflife on freenode http://github.com/OpenTreeOfLife
  38. 38. Gordon Burleigh Keith Crandall Karl Gude David Hibbett Mark Holder Laura Katz Rick Ree Stephen Smith Doug Soltis Tiffani Williams + many postdocs, grad students and undergrads @NESCent: Karen Cranston, Jonathan Rees, Jim Allman
