Synthesising disparate data resources to obtain composite estimates of geophylogeny

Invited talk to the 2nd BioVeL workshop, Gothenburg, Sweden, 10 May 2012

  • What is the tree of life? I present a simple idea, illustrated with a workflow. The workflow is based on PhyloWS, a URL API started at BH08, and uses semantic annotation with RDFa.
  • The workflow was developed at the comphy course in Kyoto. Students learned how to operate on phylogenetic data using the Bio* toolkits, and one of the problem sets was on how to build a large phylogenetic tree and annotate it using web services.
  • First, we used the ToL PhyloWS service. What is tolweb? What does the service return? Students learned recursion by grafting children onto their parent. The service also returned metadata.
  • Then we used the uBio PhyloWS service. What is uBio? What does the service return? We expanded unresolved genera into their species and also fetched link-out metadata.
  • One of the link-outs points to TreeBASE. What is TreeBASE? We fetched source trees to resolve the skeleton tree further using a supertree approach.
  • Another uBio annotation is the NCBI taxon ID, which we used to access the TimeTree PhyloWS service. What is TimeTree? We used its node ages to anchor molecular branch lengths.
  • We also accessed GBIF, which does not implement PhyloWS but offers an XML REST API. What is GBIF? What does it return? We attached lat/lon coordinates to the nodes in our tree.
  • Now we have a topology, taxonomic statements, age estimates, paleontological data and biogeographic data. We can view all these data in different ways, e.g. in Google Earth, where we can see the strepsirrhines (lemurs and lorises), the Old World monkeys and the New World monkeys.
  • The tree of life can be overgrown with metadata, like epiphytes on a tree in a rainforest, and we can view these metadata in different ways. Unfortunately, the services still need a lot of work: standards adoption, choosing the best predicates and values, and identifiers!
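The notes describe building the primate tree by recursion, grafting the children returned by the ToL PhyloWS service onto their parent. Below is a minimal sketch of that recursion under several assumptions. A hypothetical `MOCK_SERVICE` dictionary and `fetch` function stand in for real HTTP requests and NeXML parsing. The taxon names form a simplified primate skeleton for illustration only. This is not the Perl code used in the original workflow.

```python
from dataclasses import dataclass, field

# Hypothetical stand-in for a PhyloWS tree service: each entry maps a node
# name to (metadata, child names). A real client would request NeXML over
# HTTP and parse it; that part is deliberately left out of this sketch.
MOCK_SERVICE = {
    "Primates": ({"rank": "order"}, ["Strepsirrhini", "Haplorhini"]),
    "Strepsirrhini": ({"rank": "suborder"}, ["Lemuriformes", "Lorisiformes"]),
    "Haplorhini": ({"rank": "suborder"}, ["Tarsiiformes", "Simiiformes"]),
    "Simiiformes": ({}, ["Platyrrhini", "Catarrhini"]),
}


@dataclass
class Node:
    name: str
    metadata: dict = field(default_factory=dict)
    children: list = field(default_factory=list)


def fetch(name):
    """Return (metadata, child names) for a node; unknown names are treated as leaves."""
    return MOCK_SERVICE.get(name, ({}, []))


def build(name):
    """Build the subtree rooted at `name` by recursively grafting each child onto its parent."""
    metadata, child_names = fetch(name)
    node = Node(name, dict(metadata))
    for child_name in child_names:
        node.children.append(build(child_name))
    return node


def leaves(node):
    """Collect leaf names in depth-first, left-to-right order."""
    if not node.children:
        return [node.name]
    return [leaf for child in node.children for leaf in leaves(child)]


tree = build("Primates")
print(leaves(tree))
```

Keeping the service lookup behind a single `fetch` function means the recursion itself does not change when the mock is swapped for a real client.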

Presentation Transcript

  • SYNTHESISING DISPARATE DATA RESOURCES TO OBTAIN COMPOSITE ESTIMATES OF GEOPHYLOGENY. Rutger Vos
  • A simple assignment? Refine a tree for the Primates with taxonomic and systematic data, add divergence dates, add occurrence data, visualize the result, and use public web services.
  • Actually not so easy…
  • The Tree of Life Web Service: using PhyloWS, we traversed the Tree of Life and built a local, semantically annotated copy of the Primate clade.
  • Adding taxonomic metadata: using the uBio PhyloWS service, we enhanced our tree with further taxonomic annotations and links, and expanded some genera.
  • Fetching additional tree data: using the TreeBASE PhyloWS service, we fetched additional data to resolve the tree further using a "supertree" approach.
  • Computing node ages: the TimeTree PhyloWS service allowed us to anchor molecular (i.e. relative) node ages on absolute dates.
  • Adding occurrence data: using the GBIF XML API, we then fetched occurrence records for the species in our tree.
  • Visualizing the result
  • Implementation: except for GBIF, all services return NeXML and implement PhyloWS. Semantic annotations use RDFa, and everything is glued together with Perl.
  • Challenges: although some services share the same API, no GUI exists to chain them together; there are no web services for the computationally intensive steps; and data and metadata are messy and sparse.
  • Conclusions: the tree of life can be covered with all sorts of metadata (taxonomic, molecular, biogeographic, paleontological), viewable in different ways. Standards, though, are still incompletely defined and adhered to.
  • Shameless plug: PhyloTastic, a web service to extract subsets of taxa from megatrees and annotate them. It is a deliverable of the first HIP hackathon, at NESCent, in June 2012.
  • Acknowledgements
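The "Computing node ages" slide anchors molecular (relative) node ages on absolute dates. The simplest way to do that uses a single calibration and a strict clock: divide the calibration node's absolute age by its relative age, then multiply every node's relative age by that factor. The sketch below implements exactly that. The node ages and the 50 Ma calibration are placeholders, not real estimates, and `anchor_ages` is a hypothetical helper. The talk does not say which calibration method the workflow actually used, and real analyses often combine several calibrations with relaxed-clock methods.

```python
# Placeholder relative node ages (root = 1.0), as might be read off a
# clock-like tree with molecular branch lengths. Not real estimates.
RELATIVE_AGES = {
    "Primates": 1.0,
    "Strepsirrhini": 0.8,
    "Haplorhini": 0.9,
    "Catarrhini": 0.4,
}


def anchor_ages(relative, calibration_node, absolute_age):
    """Linearly rescale relative ages so that `calibration_node` gets `absolute_age`.

    Assumes a single calibration and a strict clock (one rate for the whole tree).
    """
    reference = relative[calibration_node]
    if reference <= 0:
        raise ValueError("calibration node must have a positive relative age")
    factor = absolute_age / reference
    return {name: age * factor for name, age in relative.items()}


# 50.0 (Ma) is a placeholder for an age that a TimeTree lookup might supply.
absolute = anchor_ages(RELATIVE_AGES, "Primates", 50.0)
for name, age in absolute.items():
    print(f"{name}: {age:.1f} Ma")
```

With the placeholder calibration of 50 Ma on the root, a node at relative age 0.4 maps to 20 Ma.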
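The workflow attaches GBIF latitude/longitude coordinates to the nodes of the primate tree. The sketch below shows one way to annotate tips with their occurrence points and a naive centroid. The species names and coordinates are placeholders, `attach_coordinates` is a hypothetical helper, and no GBIF API calls are shown, since parsing real GBIF responses is left out.

```python
from statistics import mean

# Placeholder occurrence records: species name -> list of (latitude, longitude)
# in decimal degrees. A real run would extract these from GBIF responses.
OCCURRENCES = {
    "Species A": [(-18.0, 47.0), (-20.0, 45.0)],
    "Species B": [(1.0, 104.0)],
    "Species C": [],
}


def attach_coordinates(tips, records):
    """Annotate each tip with its raw points and a naive centroid (None if no records).

    The centroid is a plain arithmetic mean of latitudes and longitudes, which is
    only reasonable for points that are close together and do not straddle the
    antimeridian.
    """
    annotated = {}
    for tip in tips:
        points = records.get(tip, [])
        centroid = None
        if points:
            centroid = (mean(lat for lat, _ in points), mean(lon for _, lon in points))
        annotated[tip] = {"points": list(points), "centroid": centroid}
    return annotated


tips = ["Species A", "Species B", "Species C"]
for tip, data in attach_coordinates(tips, OCCURRENCES).items():
    print(tip, data["centroid"])
```

Keeping the raw points alongside the centroid leaves room for other views of the same data, such as plotting every occurrence in Google Earth rather than one summary point per tip.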