TreeBASE2: Rise of the Machines

TreeBASE is a public repository of peer-reviewed phylogenetic knowledge. Researchers submit their results to TreeBASE when they are writing a manuscript based on them for publication in a suitable journal. The submitted data are assigned permanent, unique identifiers and web addresses that authors can refer to in their article. Anyone can locate and access the data once the study has been published by TreeBASE and by the targeted journal.

A prototype of this system has served the phylogenetics community well for a number of years, accumulating the results of thousands of studies. The usage model was that of a silo where data could only be accessed through a web browser, and only be downloaded in representations that omitted important associated metadata. A human with considerable expertise needed to read and interpret the web pages through which everything was served up to make sense out of what was available.

This model is not always practical. For example, phyloinformatic research often uses so much data that automation is becoming necessary. Where human intervention is no longer feasible, machines – which are stupid – must be able to do the job instead; and they need to be told what is what. This has spurred more explicit standardization of the syntax and semantics of phylogenetic knowledge. The latest version of TreeBASE facilitates this by adopting a collection of community standards:

• PhyloWS for automated searching using a contextual query language and retrieval using a clearly defined URL API.

• NeXML for robust data syntax and flexible metadata annotation.

• CDAO (and other ontologies) for defining the semantics of the metadata.

We will present an overview of how these components work together to make phylogenetic knowledge accessible to machines on the semantic web. Using this new architecture, client-side software (including off-the-shelf tools such as RSS readers) can query, transform and download TreeBASE data autonomously.
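
As a rough illustration of what retrieval through a clearly defined URL API can look like, here is a minimal Python sketch that builds resource URLs for one study in several serializations. The base URL, the study identifier TB2:S1787 and the format names come from the slides; the lowercase spelling of the format values and the helper name `resource_url` are assumptions made for this example, not confirmed details of the service.

```python
from typing import Optional
from urllib.parse import urlencode

# Base of the PhyloWS resource URIs, as shown in the slide deck.
PHYLOWS_BASE = "http://purl.org/phylo/treebase/phylows"

# Serializations listed in the slides. Lowercase spelling is an assumption.
FORMATS = ("nexus", "nexml", "rdf", "html", "rss1")


def resource_url(object_type: str, object_id: str, fmt: Optional[str] = None) -> str:
    """Build a resource URL, optionally asking for one serialization.

    Hypothetical helper for illustration; it only assembles a string
    and does not contact TreeBASE.
    """
    if fmt is not None and fmt not in FORMATS:
        raise ValueError(f"unknown format: {fmt!r}")
    url = f"{PHYLOWS_BASE}/{object_type}/{object_id}"
    if fmt is not None:
        url += "?" + urlencode({"format": fmt})
    return url


if __name__ == "__main__":
    # Print one URL per serialization for the study used in the slides.
    for fmt in FORMATS:
        print(resource_url("study", "TB2:S1787", fmt))
```

Keeping the identifier in the path and the serialization in a query parameter means a client can store one permanent address per object and decide on the representation only at download time.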

Published in: Education, Technology

  • What are the goals of the TreeBASE project? Who are/were involved and why?
  • Web site interface for version 2
  • Why a machine-readable internet?
  • Here is a simple example of how the API can be used. By constructing a search URL that does exact matching on journal names (using the prism.publicationName predicate), these journals can provide special feeds that show all TreeBASE studies they’ve published. This example is for Evolution.
  • Here’s an example that uses the Yahoo! Pipes tool to turn the list of NCBI taxon identifiers that TreeBASE stores for a given study into a list of all UniProt sequence records for those taxa.
  • A slightly more complex example, also using Yahoo! Pipes. This goes from the uBio namebank identifiers that TreeBASE stores to the RDF that uBio returns for each namebank record, and from that it retrieves all mappings to the Tree of Life web project.
  • This example shows that with a minimal amount of JavaScript coding a Google map can be added to a web page (first code block), and the taxa for a given study can be mapped onto it using the DarwinCore coordinate annotations that TreeBASE stores.
  • TreeBASE2: Rise of the Machines

    1. Rise of the machines. Rutger A. Vos, Hilmar Lapp, William H. Piel, Val Tannen
    2. What is TreeBASE?
       • A repository of user-submitted phylogenies and source data.
       • Accepts all types of comparative data for all taxa.
       • Data are public once published in a peer-reviewed medium.
       • Data in preparation are available to the editors or reviewers using a special access code.
    3. Web app
    4. The machine-readable web
       • Locations on the web are increasingly visited by machines instead of human eyes.
       • Programmable interfaces with structured return values
    5. The TreeBASE web API
       • Objects can be found using CQL
       • Permanent, simple URLs
       • Every object a resolvable resource
       • Serialized in various formats
    6. Searching using CQL
       • Contextual Query Language: standard for queries to information retrieval systems
       • Hides database schema
       • Instead, search on predicates
       • Search results as RSS
    7. PhyloWS Resource URI: http://purl.org/phylo/treebase/phylows/study/TB2:S1787 (labeled parts: PURL domain, Phylogenetics, TreeBASE, PhyloWS, Object ID)
    8. Same data, different formats
       • ?format=NEXUS: flat file standard for phylogenetics
       • ?format=NeXML: XML redesign of NEXUS
       • ?format=RDF: CDAO/RDF mapping of NeXML
       • ?format=HTML: web page describing the resource
       • ?format=RSS1: RSS 1.0 feed for search results
    9. Data and metadata
       • TreeBASE holds a lot of metadata, for example:
         • Lat/long coordinates for specimen samples
         • Literature metadata
         • Identifiers
       • Using the newer serialization formats (NeXML and RDF) we can embed all of them using predicates from a variety of ontologies.
    10. External links: Taxon, Taxon variant, Study
    11. Example: Journal feeds. prism.publicationName==Evolution
    12. Example: UniProt sequences. TreeBASE stores NCBI taxonomy identifiers. Standard tools can rewrite these linkout URLs. Result is a corresponding list of UniProt records.
    13. Example: ToLWeb pages. TreeBASE maps to uBio using skos:closeMatch, and uBio to ToL using gla:mapping.
    14. Example: geocoding. TreeBASE uses DarwinCore for lat/lon annotations.
    15. What's next?
       • Make TreeBASE LinkedData compliant
       • Make TreeBASE extensible with additional annotations using external triple store
    16. Acknowledgements
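
The journal feed example can be sketched in Python as follows. The predicate `prism.publicationName`, the `==` exact match operator and the RSS 1.0 result format are taken from the slides. The `study/find` path, the `query` parameter name, the quoting of multi-word values and the inline sample feed are assumptions for illustration only; the sample feed is hand-written, not real TreeBASE output.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

PHYLOWS_BASE = "http://purl.org/phylo/treebase/phylows"
RSS1_NS = {"rss": "http://purl.org/rss/1.0/"}


def cql_exact(predicate: str, value: str) -> str:
    """Build an exact-match CQL clause, quoting values that contain spaces."""
    term = f'"{value}"' if " " in value else value
    return f"{predicate}=={term}"


def journal_feed_url(journal: str) -> str:
    """Build a search URL for all studies published in one journal.

    The "study/find" path and "query" parameter are assumptions.
    """
    params = {"query": cql_exact("prism.publicationName", journal), "format": "rss1"}
    return f"{PHYLOWS_BASE}/study/find?" + urlencode(params)


def feed_links(rss_text: str) -> list:
    """Return the link of every item in an RSS 1.0 document."""
    root = ET.fromstring(rss_text)
    return [item.findtext("rss:link", namespaces=RSS1_NS)
            for item in root.findall("rss:item", RSS1_NS)]


# Hand-written stand-in for a search result, so the sketch runs offline.
SAMPLE_FEED = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/">
  <item rdf:about="http://purl.org/phylo/treebase/phylows/study/TB2:S1787">
    <title>Hypothetical study title</title>
    <link>http://purl.org/phylo/treebase/phylows/study/TB2:S1787</link>
  </item>
</rdf:RDF>"""

if __name__ == "__main__":
    print(journal_feed_url("Evolution"))
    print(feed_links(SAMPLE_FEED))
```

Because the result is an ordinary RSS 1.0 document, the same search URL could be handed unchanged to a feed reader, which is the point the slides make about off-the-shelf tools.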
