Bio2RDF @ W3C HCLS2009

  1. Bio2RDF cloud of Virtuoso SPARQL endpoints: Life Science Raw Data Now. François Belleau, Marc-Alexandre Nolin, Peter Ansell, Michel Dumontier. 30 April 2009, W3C-HCLS F2F Meeting, Cambridge, MA
  2. Agenda
      ● Why did we do Bio2RDF?
      ● How did we do it?
      ● What is known about hexokinase?
      ● Where are we going?
  3. The problem: according to the NAR 2009 Database collection, 1170 public databases exist. How can they be integrated to behave like a global coherent resource?
  4. Public map of 1744 namespaces according to BioMoby, NAR, SRS, GO, NCBI, UniProt
  5. Bio2RDF vision in 2007; Johanne Luciano's vision for knowledge integration in 2005; W3C vision of the semantic web in 2006
  6. Bio2RDF Mouse and Human Atlas map in 2008: 65 million triples
  7. Bio2RDF's current contribution to the Linked Data cloud
      ● Linked data cloud in 2007
      ● Linked data cloud in March 2009
      http://linkeddata.org/
      http://esw.w3.org/topic/TaskForces/CommunityProjects/LinkingOpenData/DataSets/Statistics
  8. Bio2RDF cloud map of 2.3 billion triples in 2009
  9. Why do it? Not to replace HTML or XML with yet another format (RDF and OWL), but to answer scientific questions by submitting SPARQL queries, over the global knowledge base accessible through the Internet, to the Life Science SPARQL endpoints cloud.
  10. Solution. The Bio2RDF approach to the data integration problem in bioinformatics: apply the semantic web approach based on RDF, OWL and SPARQL technologies.
  11. How did we do it? Bio2RDF architecture
  12. Our design principles
      http://www.w3.org/DesignIssues/LinkedData
      http://bio2rdf.wiki.sourceforge.net/Banff%20Manifesto
  13. YeastHub design in 2005
      ● Conversion of dataset to RDF
      ● Use of Sesame triplestore
      ● SeRQL query interface
      http://www.ncbi.nlm.nih.gov/pubmed/15961502
  14. Bio2RDF at ISMB 2005, the beginning. Thanks to Kei Cheung, Johanne Luciano, Eric Neumann and Christopher Baker: they drew the lines.
  15. Bio2RDF realtime rdfiser in 2007
  16. Current architecture
      ● Offline rdfising process
      ● Virtuoso SPARQL endpoints network
      ● Namespace resolution through DNS subdomain
  17. Main REST services
      ● Describe a resource by a dereferenceable URI
          ● http://bio2rdf.org/ns:id
      ● Global services over federated endpoints
          ● http://bio2rdf.org/links/ns:id
          ● http://bio2rdf.org/search/searchedTerm
      ● Targeted services to a specific endpoint
          ● http://bio2rdf.org/linksns/ns2/ns1:id
          ● http://bio2rdf.org/searchns/ns/searchedTerm
      ● Other services are available.
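
The URL patterns on slide 17 can be sketched as a handful of small builder functions. This is only an illustration, not the Bio2RDF server code: the function names are hypothetical, the percent-encoding of identifiers and search terms is an assumption, and the namespace and identifier used in the usage note are placeholder values.

```python
from urllib.parse import quote

BASE = "http://bio2rdf.org"  # service root shown on the slide


def _record(ns: str, identifier: str) -> str:
    """Return the 'ns:id' form. Percent-encoding is an assumption of this sketch."""
    return f"{quote(ns, safe='')}:{quote(identifier, safe='')}"


def describe_uri(ns: str, identifier: str) -> str:
    """http://bio2rdf.org/ns:id"""
    return f"{BASE}/{_record(ns, identifier)}"


def links_uri(ns: str, identifier: str) -> str:
    """http://bio2rdf.org/links/ns:id (global service)"""
    return f"{BASE}/links/{_record(ns, identifier)}"


def search_uri(term: str) -> str:
    """http://bio2rdf.org/search/searchedTerm (global service)"""
    return f"{BASE}/search/{quote(term, safe='')}"


def links_ns_uri(ns2: str, ns1: str, identifier: str) -> str:
    """http://bio2rdf.org/linksns/ns2/ns1:id (targeted service)"""
    return f"{BASE}/linksns/{quote(ns2, safe='')}/{_record(ns1, identifier)}"


def search_ns_uri(ns: str, term: str) -> str:
    """http://bio2rdf.org/searchns/ns/searchedTerm (targeted service)"""
    return f"{BASE}/searchns/{quote(ns, safe='')}/{quote(term, safe='')}"
```

For example, `search_uri("hexokinase")` yields the same address as the search URL dereferenced later in the deck.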
  18. Describe service implementation
      ● http://bio2rdf.org/ns:id
      ● Corresponding SPARQL query:
          CONSTRUCT {
              ?s ?p ?o .
          }
          WHERE {
              ?s ?p ?o .
              FILTER(?s = <http://bio2rdf.org/ns:id>) .
          }
      ● Submitted at this URL
          ● http://ns.bio2rdf.org/sparql?query=...
              – Based on the DNS subdomain resolution service
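
As a rough sketch of the mechanism on slide 18, the following builds the CONSTRUCT query for a record and the request URL on the namespace's subdomain endpoint. The function names are hypothetical, and encoding the query with `urlencode` is an assumption about how the `query=` parameter is filled in; the actual JSP server may do this differently.

```python
from urllib.parse import urlencode


def describe_query(ns: str, identifier: str) -> str:
    """Build the CONSTRUCT query shown on the slide for one record URI."""
    uri = f"http://bio2rdf.org/{ns}:{identifier}"
    return (
        "CONSTRUCT { ?s ?p ?o . } "
        "WHERE { ?s ?p ?o . FILTER(?s = <" + uri + ">) . }"
    )


def endpoint_request_url(ns: str, identifier: str) -> str:
    """Route the query to http://<ns>.bio2rdf.org/sparql (DNS subdomain per namespace)."""
    params = urlencode({"query": describe_query(ns, identifier)})
    return f"http://{ns}.bio2rdf.org/sparql?{params}"
```

Calling `endpoint_request_url("geneid", "15275")` (placeholder values) returns a URL on the `geneid.bio2rdf.org` host whose `query` parameter carries the CONSTRUCT statement.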
  19. Bio2RDF JSP server software
      http://sourceforge.net/projects/bio2rdf/
  20. Peter Ansell is writing the Bio2RDF JSP server
      ● The software transforms Bio2RDF URIs into SPARQL queries in real time.
      ● Its aim is to access normalised RDF information located in multiple endpoints, using the concept of Public Namespaces and Private Record Identifiers and distributed SPARQL queries which are matched to the content in each endpoint.
      ● Each of the following databases has normalisation rules which normalise it back to bio2rdf.org URIs: DBpedia, DrugBank, LinkedCT, HCLS KB/Neurocommons, Diseasome, DailyMed, Bioguid DOI
  21. Bio2RDF.war package future
      ● Provide more pipes to perform integrated actions, without having to put HTTP SPARQL requests into a workflow system when a URI resolution can perform the query more efficiently in a distributed and normalised manner
      ● Bring together the current distributed efforts to provide a complete HTML redirection registry, so that a large percentage of Bio2RDF namespaces can be redirected with http://bio2rdf.org/html/namespace:identifier
      ● Form ontologies describing the query type, provider, RDF normalisation rule and namespace paradigm
      ● Integrate http://rdf.myexperiment.org/sparql and similar workflow RDF endpoints so that scientific workflows can be linked to their data cleanly
  22. Bio2RDF.owl
      http://quebec.bio2rdf.org/download/bio2rdf-2008.owl
  23. Michel Dumontier will design the next version of the Bio2RDF.owl ontology
  24. What is known about hexokinase?
  25. Submit your query...
      ● To a web search engine
      ● To existing public web sites offering data integration services
      ● Using Bio2RDF SPARQL endpoints
          ● Submitting a SPARQL query
          ● Using the facet browser interface from the Virtuoso 6.0 server
          ● Dereferencing a Bio2RDF search URI
          ● Using a Taverna workflow composed of SPARQL queries to obtain federated results from KEGG, Entrez Gene and GO
  26. The usual unsemantic way
  27. Existing integrated search services: EBI/EB-eye, NCBI/Entrez, KEGG/DBGET, GoPubmed
  28. By submitting a SPARQL query
      http://atlas.bio2rdf.org/sparql
  29. What is known about « hexokinase » with semantics?
      select ?t1 ?p2 count(*)
      where {
          ?s1 ?p1 ?o1 .
          FILTER( bif:contains(?o1, "hexokinase")) .
          ?s1 a ?t1 .
          ?s1 ?p2 ?o2 .
      }
      ORDER BY ?t1 ?p2
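
The query on slide 29 can be parameterised on the search term. The sketch below keeps the slide's query text as-is, including the Virtuoso-specific `bif:contains` full-text predicate, and targets the endpoint address from slide 28. The function names are hypothetical, and restricting the term to a single alphanumeric word is a simplifying assumption made here to sidestep quoting rules inside the full-text expression.

```python
from urllib.parse import urlencode

ENDPOINT = "http://atlas.bio2rdf.org/sparql"  # endpoint shown on slide 28


def keyword_summary_query(term: str) -> str:
    """Count matches per (type, predicate) for subjects with a literal containing `term`."""
    if not term.isalnum():
        # Assumption of this sketch: only single alphanumeric words are accepted.
        raise ValueError("term must be a single alphanumeric word")
    return (
        "select ?t1 ?p2 count(*)\n"
        "where {\n"
        "    ?s1 ?p1 ?o1 .\n"
        f'    FILTER( bif:contains(?o1, "{term}")) .\n'
        "    ?s1 a ?t1 .\n"
        "    ?s1 ?p2 ?o2 .\n"
        "}\n"
        "ORDER BY ?t1 ?p2"
    )


def keyword_summary_url(term: str) -> str:
    """Encode the query as a GET request URL for the endpoint."""
    return ENDPOINT + "?" + urlencode({"query": keyword_summary_query(term)})
```

`keyword_summary_query("hexokinase")` reproduces the query text shown on the slide.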
  30. Use the Virtuoso 6.0 facet browser
      http://lod.openlinksw.com/
  31. Dereferencing the search URL
      http://bio2rdf.org/search/hexokinase
  32. How can we submit a complex query over the network of SPARQL endpoints?
  33. By building a mashup with Taverna
      1) Write your complex SPARQL query as if a global graph were available
      2) Identify the needed namespaces and split the query to fetch each data source separately
      3) Build a mashup using a Taverna workflow that instantiates a local triplestore
      4) Execute your complex query locally on the mashup
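
The four steps on slide 33 can be illustrated with a toy, in-memory stand-in for the local triplestore. This is not the Taverna workflow itself: the `LocalStore` class, the `ex:` identifiers and the predicate names are all invented for illustration, and the final "complex query" is written as a plain Python join rather than SPARQL.

```python
from typing import Iterable, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]


class LocalStore:
    """Toy local triplestore: a set of (subject, predicate, object) tuples."""

    def __init__(self) -> None:
        self.triples: Set[Triple] = set()

    def clear(self) -> None:
        self.triples.clear()

    def insert(self, triples: Iterable[Triple]) -> None:
        self.triples.update(triples)

    def match(self, s: Optional[str] = None, p: Optional[str] = None,
              o: Optional[str] = None) -> List[Triple]:
        """Return triples matching the pattern; None acts as a wildcard."""
        return sorted(t for t in self.triples
                      if s in (None, t[0]) and p in (None, t[1]) and o in (None, t[2]))


def pathway_gene_annotations(store: LocalStore) -> List[Triple]:
    """Step 4: join pathway->gene with gene->annotation on the merged data."""
    rows = []
    for pathway, _, gene in store.match(p="ex:hasGene"):
        for _, _, term in store.match(s=gene, p="ex:hasAnnotation"):
            rows.append((pathway, gene, term))
    return sorted(rows)


# Steps 2 and 3: each source is fetched separately, then loaded into one store.
# The triples below are invented placeholders, not real KEGG or GO data.
pathway_source = [("ex:pathway1", "ex:hasGene", "ex:geneA"),
                  ("ex:pathway1", "ex:hasGene", "ex:geneB")]
annotation_source = [("ex:geneA", "ex:hasAnnotation", "ex:term1")]

store = LocalStore()
store.clear()
store.insert(pathway_source)
store.insert(annotation_source)
result = pathway_gene_annotations(store)
```

The point of the design is that the join only becomes possible once both sources sit in the same store, which is why the workflow builds the mashup before running the query.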
  34. The SPARQL query needed (don't try this at home, do it on the web!)
  35. Get the list of genes from KEGG pathways of a specified taxon
      ● Clear graph
      ● Get the KEGG pathways list for a specific taxon
      ● For each pathway, get the genes list and import instances
      ● Count the number of genes found
      http://www.myexperiment.org/workflows/747
  36. Insert into the local triplestore GeneID genes and KEGG pathways
      ● Get the list of genes
      ● Get the list of pathways
      ● Insert each corresponding graph into the local triplestore
      http://www.myexperiment.org/workflows/748
  37. Insert into the local triplestore the needed GO annotations
      ● Get the GO annotations for each gene
  38. Finally, the needed query merging KEGG, Entrez Gene and GO together
  39. Bio2RDF resources
  40. Bio2RDF's mirrors
      http://quebec.bio2rdf.org/
      http://qut.bio2rdf.org/
  41. Bio2RDF SPARQL endpoints
      http://www.freebase.com/view/user/bio2rdf/public/sparql
  42. Life Science Raw Data Now
      http://quebec.bio2rdf.org/download
  43. Visit our Wiki rdfiser cookbook
      http://bio2rdf.wiki.sourceforge.net/
  44. Bio2RDF news
      http://bio2rdf.blogspot.com/
      http://www.slideshare.net/search/slideshow?q=bio2rdf
      http://scholar.google.com/scholar?q=bio2rdf
      http://groups.google.ca/group/bio2rdf
  45. Our 2009 objectives
      ● Get approval from data providers to distribute RDF dumps and publish SPARQL endpoints (UniProt, BioCyc, Pathway Commons, Bind are in)
      ● Start using a Virtuoso 6 cluster
      ● Design more services accessible with the REST protocol via our JSP package
      ● Recruit mirror servers
      ● Develop new rdfiser programs in a community effort
  46. Thanks: Jean Morissette, Nicole Tourigny
      ● The Bio2RDF community
      ● Centre de recherche du CHUL
      ● Université Laval
      ● Dumontier Lab
      ● QUT eResearch Center
      ● OpenLink Virtuoso