Phyloinformatics and the Semantic Web


Published on

Departmental seminar to the University of Bath, 21 March 2011.

Published in: Technology
  • Be the first to comment

No Downloads
Total views
On SlideShare
From Embeds
Number of Embeds
Embeds 0
No embeds

No notes for slide
  • Thank for invitation-Thank for showing up given other lecture-Introduce self-Talk title
  • -Mention figure on the right
  • -Mention dobzhansky
  • Here’s an example that uses the Yahoo! Pipes tool to turns the list of NCBI taxon identifiers that TreeBASE stores for a given study into a list of all UniProt sequence records for those taxa.
  • This example shows that with a minimal amount of JavaScript coding a google map can be added to a web page (first code block), and the taxa for a given study can be mapped onto it using the DarwinCore coordinate annotations that TreeBASE stores.
  • Phyloinformatics and the Semantic Web

    1. 1. Phyloinformatics and the Semantic Web<br />Rutger Vos<br />
    2. 2. Outline<br />What is phyloinformatics and why should you care?<br />How we got here and where we are now<br />How the semantic web can help<br />Projects that apply the semantic web to phyloinformatics<br />Examples of linked data<br />Where to next<br />
    3. 3. What is Phyloinformatics?<br />Phylogenetics:<br />“The systematic study of organism relationships based on evolutionary similarities and differences.”<br />Informatics:<br />“The sciences concerned with gathering, manipulating, storing, retrieving, and classifying recorded information.”<br />
    4. 4. Why should you care?<br />Firstly, <br />“Nothing in evolution makes sense except in the light of phylogeny”<br />Surely, “gathering, manipulating, storing, retrieving and classifying” such information is worthwhile?<br />But if that doesn’t convince you…<br />
    5. 5. As a consumer of phylogenetic data<br />The “New Biology” is coming:<br />“Major advances will take place via integration and synthesis, rather than decomposition and reduction” (Committee on a New Biology for the 21st Century, 2009)<br />Presumably, this will involve retrieving and classifying.<br />
    6. 6. As a consumer of phylogenetic data<br />Or maybe for you phylogeny is simply a nuisance:<br />Functional prediction<br />Comparative analysis<br />Ortholog finding<br />Etc.<br />But it would still be nice to have that out of the way painlessly…<br />
    7. 7. As a producer of phylogenetic data<br />Many journals require proper storage of data described in a manuscript.<br />Funding agencies require dissemination and sharing of research results.<br />
    8. 8. The Past<br />Everything was closed:<br />Idiosyncratic, private data<br /> “pay-walls”<br />Closed source software<br />No accessible publishing medium<br />
    9. 9. The Present<br />Science is opening up:<br />Open data<br />Open access publishing<br />Open source software<br />Publishing is now accessible to everyone, online<br />
    10. 10. Our current nightmare<br />Documents, <br />documents everywhere<br />
    11. 11. The current web makes sense to us<br />
    12. 12. But not to a machine <br />
    13. 13. What was informatics again?<br />“The sciences concerned with gathering, manipulating, storing, retrieving, and classifying recorded information.”<br />
    14. 14.
    15. 15. This is too hard<br />O. R. P. Bininda-Emonds, M. Cardillo, K. E. Jones, R. D. E. MacPhee, R. M. D. Beck, R. Grenyer, S. A. Price, R. A. Vos, J. L. Gittlemanand A. Purvis, 2007. The delayed rise of present-day mammals. Nature 446: 507-512.<br />
    16. 16. Let’s delegate that<br />
    17. 17. Instead of linked documents<br />
    18. 18. A web of linked concepts<br />
    19. 19. Concepts connected by statements<br />
    20. 20. Concepts are defined in ontologies<br />“An ontology is a formal representation of the knowledge by a set of concepts within a domain and the relationships between those concepts. It is used to reason about the properties of that domain, and may be used to describe the domain.”<br />
    21. 21. Expressing concepts in data syntax<br />
    22. 22. Concepts are linked<br />Linked by statements called “triples”<br />A triple is a statement<br />subject<br />predicate<br />object<br />Any part of a triple may have to be uniquely identifiable. For this we use URLs.<br />
    23. 23. An applied example<br />Triple 1<br /> Subject: <><br /> Predicate: <><br /> Object: 2342.323<br />i.e. -lnL(tree1) = 2342.323<br />Triple 2<br /> Subject: <><br /> Predicate: <><br /> Object: 2341.184<br />i.e. -lnL(tree2) = 2341.184<br />
    24. 24. What’s the better tree?<br />The ontology defines what a likelihood is and how to compare negative log likelihoods.<br />Hence, automated reasoning can conclude that tree2 is the better tree. <br />
    25. 25. URLs for phylogenetics<br />PhyloWS doesn’t just provide an anchor to identify phylogenetic data, it also enables searching and retrieval.<br />
    26. 26. The EvoInfo “stack”<br />
    27. 27. TreeBASE<br />
    28. 28. External links<br />Study<br />Taxon<br />variant<br />Taxon<br />
    29. 29. A simple example<br />TreeBASE maps <br />to uBio using skos:closeMatch...<br />…and uBio to ToL <br />using gla:mapping<br />
    30. 30. Another Example, UniProt sequences<br />Standard tools can rewrite these linkout URLs <br />Result is a corresponding list of UniProt records<br />TreeBASE stores NCBI taxonomy identifiers<br />
    31. 31. Another Example, Geocoding<br />TreeBASE uses DarwinCore for lat/lon annotations<br />
    32. 32. Many online data repositories<br />
    33. 33. Challenges<br />Fragile: many services offline in Japan<br />Data gets bigger and bigger<br />Many concepts not yet in ontologies<br />Many data still “locked in” in publications<br />
    34. 34. The Future<br />
    35. 35. The cloud<br />Software will be run on a number of “virtual” platforms (Amazon, Google apps, Yahoo)<br />Data will be stored in the cloud (Big Table, FreeBase)<br />
    36. 36. Interpreting locked in knowledge<br />Text and images meant for humans are being processed by machines. Examples:<br />Taxon name mining (BHL)<br />Gene name and function mining<br />Tree figure processing<br />Automated annotation<br />
    37. 37. Summary<br />Phyloinformatics is moving from closed to open to linked data<br />Concepts and syntax are increasingly formalized and machine readable<br />Automated queries across integrated resources will enable synthetic research<br />Still lots to do to deploy these technologies and unlock legacy data<br />
    38. 38. Acknowledgements<br />Thank you for your attention!<br />Also, many thanks to:<br /> The Pagel lab at UoR<br /> The EvoInfo group<br /> Val Tannen<br /> Wayne Maddison<br /> William Piel<br /> Hilmar Lapp<br />ArlinStoltzfus<br />