Collaborative development of cross-
    database Bio2RDF queries


                Peter Ansell
Microsoft Queensland Unive...
Introduction
●   Large number of cross-disciplinary datasources in
    different locations
●   Scientists require simplifi...
Sydney, Australia   3rd eResearch Australasia Conference   9-13 Nov 2009

                                                ...
Linked Data
1) Use URIs as names for things
2) Use HTTP URIs so that people can look up those
names.
3) When someone looks...
Linked Data querying
●   Strategy 1 (Naive) :
           –   Retrieve resources
           –   Retrieve linked resources
 ...
Linked Data querying
●   Strategy 3 (Distributed query) :
           –   Mix SPARQL endpoint queries with URI based
      ...
Bio2RDF distributed queries
●   Assign Namespaces to providers
●   Query across relevant providers given a users query
●  ...
Workflow
                    Resolved URI: http://bio2rdf.org/label/go:0000345



        Host name: http://bio2rdf.org/  ...
Collaboration
●   Query and provider definitions have an RDF
    representation
●   Any other person is able to take a def...
Provenance for queries and data
●   Provenance can be attached to each item, including
    details such as OpenID URI's an...
Conclusion
●   Many large distributed datasources
●   Single interface, RDF
●   Distribute queries efficiently across the ...
Upcoming SlideShare
Loading in …5
×

Customisable cross-database Bio2RDF queries

600
-1

Published on

Published in: Education
0 Comments
0 Likes
Statistics
Notes
  • Be the first to comment

  • Be the first to like this

No Downloads
Views
Total Views
600
On Slideshare
0
From Embeds
0
Number of Embeds
0
Actions
Shares
0
Downloads
8
Comments
0
Likes
0
Embeds 0
No embeds

No notes for slide

Customisable cross-database Bio2RDF queries

  1. 1. Collaborative development of cross- database Bio2RDF queries Peter Ansell Microsoft Queensland University of Technology eResearch Centre p.ansell@qut.edu.au
  2. 2. Introduction ● Large number of cross-disciplinary datasources in different locations ● Scientists require simplified access to many of the datasources for complex research ● Recently a large number of datasources have been published using the RDF syntax ● Now, we need to learn how to query across them and be able to share that knowledge with others Sydney, Australia 3rd eResearch Australasia Conference 9-13 Nov 2009 2
  3. 3. Sydney, Australia 3rd eResearch Australasia Conference 9-13 Nov 2009 3
  4. 4. Linked Data 1) Use URIs as names for things 2) Use HTTP URIs so that people can look up those names. 3) When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL) 4) Include links to other URIs. so that they can discover more things. http://www.w3.org/DesignIssues/LinkedData.html Sydney, Australia 3rd eResearch Australasia Conference 9-13 Nov 2009 4
  5. 5. Linked Data querying ● Strategy 1 (Naive) : – Retrieve resources – Retrieve linked resources – Cache the resources and perform queries locally ● Strategy 2 (Search engine): – Retrieve resources – Retrieve directly linked resources – Query a semantic search engine for related resources – Cache the resources and perform queries locally Sydney, Australia 3rd eResearch Australasia Conference 9-13 Nov 2009 5
  6. 6. Linked Data querying ● Strategy 3 (Distributed query) : – Mix SPARQL endpoint queries with URI based resolution to avoid having a large local cache – Normalise results from each site to form final query result – Users can process the results, and perform one or more queries based on their interpretation of the results Sydney, Australia 3rd eResearch Australasia Conference 9-13 Nov 2009 6
  7. 7. Bio2RDF distributed queries ● Assign Namespaces to providers ● Query across relevant providers given a users query ● Aggregate all results into a single RDF document and return to the user ● It works: 700000 queries during the last month ● Largest dataset has 10 billion triples, the Protein Databank, with others making up about 5 billion triples Sydney, Australia 3rd eResearch Australasia Conference 9-13 Nov 2009 7
  8. 8. Workflow Resolved URI: http://bio2rdf.org/label/go:0000345 Host name: http://bio2rdf.org/ Query: label/go:0000345 Regular expression: label/([w-]+):(.+) http://bio2rdf.org/query:labelsearch http://bio2rdf.org/query:labelsearchforgo Sydney, Australia 3rd eResearch Australasia Conference 9-13 Nov 2009 8
  9. 9. Collaboration ● Query and provider definitions have an RDF representation ● Any other person is able to take a definition and change it to suit their needs and redistribute their definition ● If definitions are Linked Data themselves, as the Bio2RDF configuration item are, the HTTP URI can be used by others to pull the definition into their own software Sydney, Australia 3rd eResearch Australasia Conference 9-13 Nov 2009 9
  10. 10. Provenance for queries and data ● Provenance can be attached to each item, including details such as OpenID URI's and dates ● The sources and queries that were used for a particular URI can be found by utilising the query plan option – http://bio2rdf.org/queryplan/label/go:0000345 – Enables automatic query collaboration, as you could take this query plan and add or modify queries inside the query plan Sydney, Australia 3rd eResearch Australasia Conference 9-13 Nov 2009 10
  11. 11. Conclusion ● Many large distributed datasources ● Single interface, RDF ● Distribute queries efficiently across the endpoints ● Enabling people to create the definitions of what they did so other people can collaborate, via a single server or copy and paste Sydney, Australia 3rd eResearch Australasia Conference 9-13 Nov 2009 11
  1. A particular slide catching your eye?

    Clipping is a handy way to collect important slides you want to go back to later.

×