Bio2RDF@BH2010

Presentation of Bio2RDF Cognoscope on 8 February 2010 at BioHackathon 2010, http://hackathon3.dbcls.jp/

Published in: Education

  1. 1. Bio2RDF Cognoscope: A killer app for the life sciences. François Belleau
  2. 2. Agenda: The problem
  3. 3. What is RDF?
  4. 4. The vision
  5. 5. What is known about hexokinase?
  6. 6. A new approach: The Cognoscope
  7. 7. http://www.pcworld.idg.com.au/article/132245/berners-lee_seeks_killer_app_semantic_web "Similarly, if we could get critical mass in life sciences, if we get a half a dozen or a dozen set of ontologies, the core ones for drug discovery out there, then suddenly the Semantic Web within life sciences would have a critical mass. It'll snowball much more rapidly and it will be copied. Other areas will realize: Oh it's worth investing in this," Tim Berners-Lee, WWW inventor
  8. 8. The problem: How to do data integration in bioinformatics? Carole Goble (ISWC 2005)
  9. 9. http://www.biopax.org/Docs/2004-10-28_SWLS-SessionVII.pdf
  10. 10. Tokyo subway map
  11. 11. Montreal subway map
  12. 12. http://informationarchitects.jp/ia-trendmap-2007v2/ Web Trend Map 2007
  13. 13. The proposed solution: Bio2RDF solves the problem of data integration in bioinformatics by applying the Semantic Web approach based on RDF, OWL and SPARQL technologies.
  14. 14. Web of data subway map from W3C http://www.w3.org/2007/Talks/0130-sb-W3CTechSemWeb/#(1)
  15. 15. Bio2RDF inspiration in 2005
  16. 16. What is RDF?
  17. 17. "Wouldn't it be great if you were able to organize all this information based on your own terms, instead of based on the application you use to access the information?" Ramanathan V. Guha, RDF initiator http://cgi.netscape.com/columns/techvision/innovators_rg.html
  18. 18. Resource Description Framework
  19. 19. It is triples... <subject> <predicate> <object_uri> . OR <subject> <predicate> "object_literal" .
  20. 20. A triple
  21. 21. The same in RDF/XML:
      <?xml version="1.0"?>
      <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
               xmlns:exterms="http://www.example.org/terms/">
        <rdf:Description rdf:about="http://www.example.org/index.html">
          <exterms:creation-date>August 16, 1999</exterms:creation-date>
        </rdf:Description>
      </rdf:RDF>
  22. 22. The same in N-Triples: <http://www.example.org/index.html> <http://www.example.org/terms/creation-date> "August 16, 1999" .
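
To make the triple format concrete, here is a minimal Python sketch that serializes one triple as an N-Triples line. The helper name `to_ntriple` is hypothetical and not part of Bio2RDF. It handles only plain literals (no datatypes or language tags) and assumes the URIs need no escaping.

```python
def to_ntriple(subject: str, predicate: str, obj: str, literal: bool = False) -> str:
    """Serialize one triple as a single N-Triples line (hypothetical helper)."""
    if literal:
        # Escape the characters that may not appear raw inside a literal.
        escaped = (
            obj.replace("\\", "\\\\")
            .replace('"', '\\"')
            .replace("\n", "\\n")
            .replace("\r", "\\r")
        )
        obj_term = f'"{escaped}"'
    else:
        # The object is a URI, written between angle brackets like the subject.
        obj_term = f"<{obj}>"
    return f"<{subject}> <{predicate}> {obj_term} ."


# The example triple from the slides.
line = to_ntriple(
    "http://www.example.org/index.html",
    "http://www.example.org/terms/creation-date",
    "August 16, 1999",
    literal=True,
)
print(line)
```

The printed line is the same statement as the RDF/XML version: one subject, one predicate, one literal object.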
  23. 23. It is a technology stack http://www.w3.org/2007/Talks/0130-sb-W3CTechSemWeb/
  24. 24. It is a distributed architecture http://www.w3.org/2007/Talks/0130-sb-W3CTechSemWeb/
  25. 25. Goal #1: Convert many public bioinformatics databases to RDF.
  26. 26. Bio2RDF rdfised public databases
  27. 27. Bio2RDF first map in 2007
  28. 28. Bio2RDF Mouse and Human Atlas map in 2008: 65 million triples
  29. 29. Linked Data cloud evolution http://linkeddata.org/ http://esw.w3.org/topic/TaskForces/CommunityProjects/LinkingOpenData/DataSets/Statistics Linked data cloud in March 2009 Linked data cloud in May 2007
  30. 30. http://www.ted.com/talks/tim_berners_lee_on_the_next_web.html
  31. 31. LODD wins the 2009 Triplify challenge http://triplify.org/files/challenge_2009/LODD.pdf
  32. 32. Bio2RDF cloud map of namespaces from 2.3 billion triples
  33. 33. How did we do it?
  34. 34. http://www.w3.org/DesignIssues/LinkedData
  35. 35. http://bio2rdf.wiki.sourceforge.net/Banff%20Manifesto
  36. 36. Bio2RDF realtime rdfiser in 2007
  37. 37. Current architecture in 2010: Offline rdfising process
  38. 38. Virtuoso SPARQL endpoints network
  39. 39. Namespace resolution through DNS subdomain
  40. 40. Bio2RDF has 3 mirror sites http://cu.bio2rdf.org/ http://qut.bio2rdf.org/ http://quebec.bio2rdf.org/
  41. 41. Main REST services. Describe a resource by a dereferenceable URI: http://bio2rdf.org/ns:id. Global services over federated endpoints: http://bio2rdf.org/links/ns:id
  42. 42. http://bio2rdf.org/search/searchedTerm. Targeted services to a specific endpoint: http://bio2rdf.org/linksns/ns2/ns:id
  43. 43. http://bio2rdf.org/searchns/ns/searchedTerm
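
The five URL patterns above can be sketched as small Python helpers. The function names, the `exampledb`/`otherdb` namespaces and the identifier `12345` are hypothetical placeholders. Only the URL shapes come from the slides, and percent-encoding of the search term is an assumption.

```python
from urllib.parse import quote

BASE = "http://bio2rdf.org"


def describe_url(ns: str, identifier: str) -> str:
    """Dereferenceable URI describing one resource."""
    return f"{BASE}/{ns}:{identifier}"


def links_url(ns: str, identifier: str) -> str:
    """Global service: links to a resource across the federated endpoints."""
    return f"{BASE}/links/{ns}:{identifier}"


def search_url(term: str) -> str:
    """Global service: full-text search across the federated endpoints."""
    # Assumption: the service accepts a percent-encoded search term.
    return f"{BASE}/search/{quote(term)}"


def linksns_url(target_ns: str, ns: str, identifier: str) -> str:
    """Targeted service: links to a resource, asked of one endpoint (ns2)."""
    return f"{BASE}/linksns/{target_ns}/{ns}:{identifier}"


def searchns_url(ns: str, term: str) -> str:
    """Targeted service: full-text search in one endpoint."""
    return f"{BASE}/searchns/{ns}/{quote(term)}"


# Placeholder namespace and identifier, for illustration only.
print(describe_url("exampledb", "12345"))
print(linksns_url("otherdb", "exampledb", "12345"))
print(search_url("hexokinase"))
```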
  44. 44. Goal #2: Ask a useful question to the network of SPARQL endpoints.
  45. 45. What is known about hexokinase?
  46. 46. Existing integrated search services NCBI/Entrez EBI/EB-eye KEGG/DBGET Riken/OmicScan
  47. 47. Ask http://atlas.bio2rdf.org/fct
  48. 48. Submit a SPARQL query http://atlas.bio2rdf.org/sparql
  49. 49. Ask it to each SPARQL endpoint http://NAMESPACE.bio2rdf.org/fct
  50. 50. Ask Bio2RDF REST federated search http://bio2rdf.org/search/hexokinase
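
As a rough illustration of asking one endpoint, the Python sketch below builds a SPARQL protocol GET URL without sending it. The function names and the sample query are assumptions. So is the idea that every namespace subdomain exposes `/sparql`, since the slides show that path only for `atlas`.

```python
from urllib.parse import urlencode


def endpoint_url(namespace: str, service: str = "sparql") -> str:
    """Map a namespace to its subdomain, as in http://NAMESPACE.bio2rdf.org/fct."""
    # Assumption: each namespace subdomain serves /sparql as atlas does.
    return f"http://{namespace}.bio2rdf.org/{service}"


def sparql_get_url(endpoint: str, query: str) -> str:
    """Build a SPARQL protocol GET request URL (the query goes in 'query')."""
    return endpoint + "?" + urlencode({"query": query})


# Illustrative query only; not the query used in the presentation.
QUERY = (
    "SELECT ?s ?p ?o WHERE { ?s ?p ?o . "
    'FILTER regex(str(?o), "hexokinase", "i") } LIMIT 10'
)

url = sparql_get_url(endpoint_url("atlas"), QUERY)
print(url)
```

The resulting URL could be fetched with any HTTP client. Nothing is sent here, so the sketch runs offline.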
  51. 51. Or use the Cognoscope...
  52. 52. The mashup principle To answer a complex question we first need to build a specific database, a mashup, to which we submit the appropriate query.
  53. 54. Cognoscope new definition: A Cognoscope is an instrument to explore and collect topics from the Linked Data cloud of SPARQL endpoints. It permits querying over a distributed network of knowledge resources.
  54. 55. Cognoscope definition: The magnifying effect depends on the density of links between resources (entity links), which is a by-product of human intellectual activity in the social network.
  55. 56. The filtering effect is based on the inherent semantics of the RDF graph, described using types and predicates.
  56. 57. Facet browsing is used to zoom in and out in the observed graph.
  57. 58. Full text search is used to discover concepts.
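
The filtering, facet and search effects can be illustrated on a toy graph. This is a hedged Python sketch, not the Cognoscope implementation. The `ex:` and `rdfs:` names are shorthand placeholders, and the triples and function names are invented for illustration.

```python
from collections import Counter

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical toy graph; every identifier below is a placeholder.
TRIPLES = [
    ("ex:a", RDF_TYPE, "ex:Gene"),
    ("ex:a", "rdfs:label", "hexokinase 1"),
    ("ex:b", RDF_TYPE, "ex:Gene"),
    ("ex:b", "rdfs:label", "glucokinase"),
    ("ex:c", RDF_TYPE, "ex:Pathway"),
    ("ex:c", "rdfs:label", "glycolysis"),
    ("ex:c", "ex:involves", "ex:a"),
]


def type_facets(triples):
    """Count resources per rdf:type: the facets offered to the user."""
    return Counter(o for _, p, o in triples if p == RDF_TYPE)


def zoom_in(triples, rdf_type):
    """Keep only triples whose subject has the selected type."""
    keep = {s for s, p, o in triples if p == RDF_TYPE and o == rdf_type}
    return [t for t in triples if t[0] in keep]


def text_search(triples, term):
    """Naive full-text search over object values; returns matching subjects."""
    term = term.lower()
    return sorted({s for s, _, o in triples if term in o.lower()})


print(type_facets(TRIPLES))
print(zoom_in(TRIPLES, "ex:Gene"))
print(text_search(TRIPLES, "hexokinase"))
```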
  58. 59. Cognoscope function: How can we submit a complex query over the network of SPARQL endpoints? By using a workflow that fetches from individual SPARQL endpoints. We use a workflow to build the mashup.
  59. 60. Bio2RDF Cognoscope architecture: Linked Data cloud of SPARQL endpoints; triplestore: Virtuoso 6; workflow engine: Taverna 2.1
  60. 61. By building a mashup with Taverna: Write your complex SPARQL query as if a global graph were available
  61. 62. Identify the needed namespaces and split the query to fetch each data source separately
  62. 63. Build a mashup using a Taverna workflow that instantiates a local triplestore
  63. 64. Execute your complex query locally on the mashup
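
The four steps above can be sketched in plain Python, with an in-memory set standing in for the local triplestore and a dictionary standing in for the remote endpoints. It uses neither Taverna nor Virtuoso. All names, identifiers and data are hypothetical.

```python
# Hypothetical per-namespace sources standing in for remote SPARQL endpoints.
SOURCES = {
    "genes": [("gene:1", "label", "hexokinase 1"),
              ("gene:1", "encodes", "protein:9")],
    "proteins": [("protein:9", "label", "Hexokinase-1"),
                 ("protein:9", "in_pathway", "pathway:5")],
    "pathways": [("pathway:5", "label", "Glycolysis")],
}


def fetch(namespace):
    """Step 2: one sub-query per data source (here a dictionary lookup)."""
    return SOURCES[namespace]


def build_mashup(namespaces):
    """Step 3: load every fetched triple into one local store."""
    store = set()
    for ns in namespaces:
        store.update(fetch(ns))
    return store


def match(store, s=None, p=None, o=None):
    """Return triples matching a pattern; None acts as a wildcard."""
    return [t for t in store
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]


def pathways_for_gene_label(store, term):
    """Step 4: a query joining gene, protein and pathway data locally."""
    names = set()
    for gene, _, label in match(store, p="label"):
        if term.lower() not in label.lower():
            continue
        for _, _, protein in match(store, s=gene, p="encodes"):
            for _, _, pathway in match(store, s=protein, p="in_pathway"):
                for _, _, name in match(store, s=pathway, p="label"):
                    names.add(name)
    return sorted(names)


mashup = build_mashup(["genes", "proteins", "pathways"])
print(pathways_for_gene_label(mashup, "hexokinase"))
```

Leaving one namespace out of the mashup makes the join come back empty, which is why step 2 asks you to identify every needed namespace before building the local store.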
  64. 65. The SPARQL query needed (don't try this at home, do it on the web!)
  65. 66. Bio2RDF Cognoscope using Taverna 2.1
  66. 67. Cognoscope query for "What is known about hexokinase?"
  67. 68. Et voilà !
  68. 69. Where to get Bio2RDF Cognoscope http://www.myexperiment.org/search?query=cognoscope
  69. 70. Bio2RDF SPARQL endpoints http://delicious.com/tag/bio2rdf:sparql
  70. 71. Thanks. The Bio2RDF community: Centre de recherche du CHUL
  71. 72. Dumontier Lab
  72. 73. QUT eResearch Center. The software providers: OpenLink Virtuoso
  73. 74. Taverna community. My colleagues: Marc-Alexandre Nolin
  74. 75. Michel Dumontier
  75. 76. Peter Ansell
  76. 77. Can you help?