Towards a Simple, Standards-Compliant, and Generic Phylogenetic Database

  1. Towards a Simple, Standards-Compliant, and Generic Phylogenetic Database Module
     Hilmar Lapp and Todd Vision, National Evolutionary Synthesis Center (NESCent)
  2. Rich diversity of online data repositories
  3. Most data is not online
     • Syst. Biol. Data Archive: Clark J.R. et al. (2008) A Comparative Study in Ancestral Range Reconstruction Methods: Retracing the Uncertain Histories of Insular Lineages. Systematic Biology 57(5): 693-707.
  4. Little standards support
  5. Accelerating knowledge dissemination: A Story
     • Jane and her lab have accumulated molecular data to resolve the phylogeny of a certain clade of frogs, many of which are endangered species.
     • Her group assembles a multiple alignment and reconstructs the phylogeny using a variety of methods, some developed by her lab, resulting in thousands of trees.
     • The results show overwhelming support for several new branch points. They are interesting and solid enough to be useful to others working on those species.
  9. • Jane downloads and installs PhyloDOM, a freely available, open-source software package. The software creates a database, and Jane uses the programs that come with it to import all her data.
     • As a result, Jane's lab now has a web interface to her results that others can use to query for novel topologies and to explore her data.
     • Her lab also updates the database from its ongoing work and uses it to add provenance data and links to protocols, publications, and taxonomic concepts.
  13. • Other researchers easily download her results and integrate them into their own analyses.
      • Even where Jane used new methods, other software understands the meaning of the metadata and can take advantage of it.
      • Shortly afterwards, her results appear in data aggregators such as iSpecies, EOL, or Scratchpads, alongside those from other labs.
      • Jane herself uses the LifeMap widget to map her trees onto geographic coordinates and to link branches to the ecological and biodiversity parameters of the respective areas.
  18. How to get there?
      Architecture diagram (layers, top to bottom):
      • Embeddable tools (PhyloWidget, GBrowse, TreeWidget); client-based query interfaces; data aggregators and mash-up applications
      • Data and other services API (PhyloWS) supporting exchange standards (NeXML, CDAO)
      • Middleware: query & persistence management
      • Phylogenetic database (PhyloDB / BioSQL) supporting ontologies and arbitrary metadata; topology-oriented queries (precompute, query optimization); molecular data (sequences, annotation) (BioSQL)
      • Data management tools: language bindings for the database model (BioPerl, BioJava, Biopython, BioRuby); data loading tools; parser libraries for data and semantics standards (NeXML, CDAO)
      • Data: phylogenetic trees (gene, species); character data; taxonomies (ITIS, NCBI); metadata ontologies (evolutionary, biodiversity, computational)
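
The "precompute" item under topology-oriented queries can be illustrated with a transitive-closure table: one row per (ancestor, descendant) pair plus the number of edges between them, which is the kind of data a Node_Path table with a distance column (slide 20) could hold. The Python sketch below is a hypothetical illustration, not PhyloDB's loader: the function names `node_paths` and `mrca`, the child-list tree representation, the inclusion of zero-distance self-paths, and measuring distance in edges rather than branch length are all assumptions.

```python
def node_paths(children: dict[str, list[str]], root: str) -> list[tuple[str, str, int]]:
    """Precompute (ancestor, descendant, distance) rows for a rooted tree.

    Assumptions: distance counts edges, and each node is also paired with
    itself at distance 0 (a common closure-table convention).
    """
    paths = []
    # Each stack entry: node, its depth, and its ancestors as (node, depth) pairs.
    stack = [(root, 0, [])]
    while stack:
        node, depth, ancestors = stack.pop()
        paths.append((node, node, 0))
        for anc, anc_depth in ancestors:
            paths.append((anc, node, depth - anc_depth))
        for child in children.get(node, []):
            stack.append((child, depth + 1, ancestors + [(node, depth)]))
    return paths


def mrca(paths: list[tuple[str, str, int]], a: str, b: str) -> str:
    """Most recent common ancestor, answered from the closure rows alone."""
    anc_a = {anc: d for anc, desc, d in paths if desc == a}
    anc_b = {anc: d for anc, desc, d in paths if desc == b}
    common = anc_a.keys() & anc_b.keys()
    # All shared ancestors lie on the root-to-a path; the nearest one to a is the MRCA.
    return min(common, key=lambda n: anc_a[n])


# Hypothetical example tree: ((A,B)X,C)root
children = {"root": ["X", "C"], "X": ["A", "B"]}
paths = node_paths(children, "root")
print(mrca(paths, "A", "B"))  # X
print(mrca(paths, "A", "C"))  # root
```

Once the closure is stored, ancestor, descendant, and common-ancestor questions become lookups or joins instead of recursive traversals; the trade-off is storage that grows with the sum of node depths.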
  19. Achieving the Vision: Coordinated & open development, nurturing & harnessing existing efforts
  20. Database: PhyloDB module
      Schema diagram (tables and their columns):
      • Tree: Name, Identifier, Is_Rooted
      • Tree_Root: Is_Alternate, Significance
      • Node: Label, Left_Idx, Right_Idx
      • Edge
      • Node_Path: distance
      • Tree_Qualifier_Value, Node_Qualifier_Value, Edge_Qualifier_Value: Value, Rank
      • Tree_Dbxref, Node_Dbxref
      • Node_Taxon: Rank
      • Node_Bioentry: Rank
      • BioSQL core tables shown: Taxon, Term, Bioentry, Dbxref, Biodatabase, Ontology
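
The Left_Idx and Right_Idx columns on Node suggest a nested-set encoding, in which a depth-first traversal numbers each node on entry and on exit so that every subtree occupies a contiguous interval. Assuming that reading, the Python sketch below (standard-library `sqlite3`) assigns the indexes and answers "all descendants of a node" with a single range join. The `node` table is a hypothetical, simplified stand-in holding only the columns this example needs; it is not the actual PhyloDB DDL, and the helper names are illustrative.

```python
import sqlite3


def assign_nested_set(children, root):
    """Return {node: (left_idx, right_idx)} from a depth-first traversal.

    Every descendant's interval lies strictly inside its ancestor's interval.
    """
    idx = {}
    counter = 0

    def visit(node):
        nonlocal counter
        counter += 1          # number the node on entry
        left = counter
        for child in children.get(node, []):
            visit(child)
        counter += 1          # and again on exit, after all its descendants
        idx[node] = (left, counter)

    visit(root)
    return idx


def descendants(conn, label):
    # d is a descendant of a exactly when a's interval encloses d's interval.
    rows = conn.execute(
        """SELECT d.label FROM node AS a JOIN node AS d
           ON d.left_idx > a.left_idx AND d.right_idx < a.right_idx
           WHERE a.label = ?
           ORDER BY d.left_idx""",
        (label,),
    )
    return [row[0] for row in rows]


# Hypothetical example tree: ((A,B)X,C)root
children = {"root": ["X", "C"], "X": ["A", "B"]}
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE node (label TEXT PRIMARY KEY, left_idx INTEGER, right_idx INTEGER)")
conn.executemany(
    "INSERT INTO node VALUES (?, ?, ?)",
    [(label, left, right) for label, (left, right) in assign_nested_set(children, "root").items()],
)
print(descendants(conn, "X"))  # ['A', 'B']
```

Nested sets turn subtree queries into one non-recursive comparison; the cost is that inserting or moving nodes requires renumbering, which suits trees that are loaded once and then queried often. The recursive `visit` helper is fine for small trees but can exceed Python's default recursion limit on very deep ones, where an explicit stack is safer.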
  21. Syntax: NeXML
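
To give a feel for the syntax, the Python sketch below uses the standard-library `xml.etree.ElementTree` to serialize one rooted tree with branch lengths as NeXML: an `otus` block declaring the taxa, and a `trees` block whose `tree` (typed `nex:FloatTree`) lists `node` and `edge` elements. It is an unvalidated, minimal sketch that omits metadata and character data; the helper names, the id scheme (`otus1`, `tree1`, `otu_<node>`, `e<i>`), and the example taxa are hypothetical, and its output should be checked against the NeXML schema before being relied on.

```python
import xml.etree.ElementTree as ET

NEX = "http://www.nexml.org/2009"
XSI = "http://www.w3.org/2001/XMLSchema-instance"
ET.register_namespace("nex", NEX)
ET.register_namespace("xsi", XSI)


def q(tag):
    # Qualify a tag name with the NeXML namespace.
    return f"{{{NEX}}}{tag}"


def tree_to_nexml(edges, root, otu_labels):
    """Serialize one rooted tree as NeXML.

    edges: list of (parent_id, child_id, branch_length)
    otu_labels: maps leaf node ids to taxon labels
    Node ids must be valid XML IDs (e.g. must not start with a digit).
    """
    doc = ET.Element(q("nexml"), {"version": "0.9"})
    # One OTU (taxon) per labelled leaf.
    otus = ET.SubElement(doc, q("otus"), {"id": "otus1"})
    for node_id, label in otu_labels.items():
        ET.SubElement(otus, q("otu"), {"id": f"otu_{node_id}", "label": label})
    trees = ET.SubElement(doc, q("trees"), {"id": "trees1", "otus": "otus1"})
    tree = ET.SubElement(trees, q("tree"), {"id": "tree1", f"{{{XSI}}}type": "nex:FloatTree"})
    # Every node mentioned by an edge, plus the root.
    node_ids = {root} | {p for p, _, _ in edges} | {c for _, c, _ in edges}
    for node_id in sorted(node_ids):
        attrs = {"id": node_id}
        if node_id == root:
            attrs["root"] = "true"
        if node_id in otu_labels:
            attrs["otu"] = f"otu_{node_id}"
        ET.SubElement(tree, q("node"), attrs)
    for i, (parent, child, length) in enumerate(edges):
        ET.SubElement(tree, q("edge"), {"id": f"e{i}", "source": parent,
                                        "target": child, "length": str(length)})
    return ET.tostring(doc, encoding="unicode")


# Hypothetical example tree: ((A:0.2,B:0.3)X:0.5,C:1.0)root
xml_text = tree_to_nexml(
    [("root", "X", 0.5), ("root", "C", 1.0), ("X", "A", 0.2), ("X", "B", 0.3)],
    root="root",
    otu_labels={"A": "frog A", "B": "frog B", "C": "frog C"},
)
print(xml_text)
```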
  22. Semantics: CDAO (http://www.evolutionaryontology.org)
  23. Service API: PhyloWS (http://evoinfo.nescent.org/PhyloWS)
  24. Embeddable tools
  25. Community-owned, reusable software
  26. Nurturing the community
  27. Phyloinformatics Hackathon, Dec 2006
  28. • James Estill (U. Georgia): “A Perl-based Command Line Interface to a Topological Query Application for BioSQL in Support of High Throughput Classification and Analysis of LTR Retrotransposons in Plant Genomes”
  29. Acknowledgments
      • Phyloinformatics Hackathon participants
      • BioHackathon 2008 participants
      • EvoInformatics Working Group participants
      • Google Summer of Code students: Jamie Estill
      • Sponsors & support: NESCent; BioSynC; TDWG; DBCLS, CBRC (Japan)
