Customisable cross-database Bio2RDF queries

Usage Rights: CC Attribution-ShareAlike License

Presentation Transcript

  • Collaborative development of cross-database Bio2RDF queries
    Peter Ansell, Microsoft Queensland University of Technology eResearch Centre, p.ansell@qut.edu.au
  • Introduction
    ● A large number of cross-disciplinary datasources exist in different locations
    ● Scientists require simplified access to many of these datasources for complex research
    ● Recently, a large number of datasources have been published using RDF
    ● Now we need to learn how to query across them and be able to share that knowledge with others
    Sydney, Australia, 3rd eResearch Australasia Conference, 9-13 Nov 2009
  • Linked Data
    1) Use URIs as names for things.
    2) Use HTTP URIs so that people can look up those names.
    3) When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL).
    4) Include links to other URIs, so that they can discover more things.
    http://www.w3.org/DesignIssues/LinkedData.html
  • Linked Data querying
    ● Strategy 1 (Naive):
      – Retrieve resources
      – Retrieve linked resources
      – Cache the resources and perform queries locally
    ● Strategy 2 (Search engine):
      – Retrieve resources
      – Retrieve directly linked resources
      – Query a semantic search engine for related resources
      – Cache the resources and perform queries locally
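Strategy 1 can be sketched as a small crawl that fills a local cache and then queries it. This is a minimal illustration only: the `FAKE_WEB` dictionary, the `example.org` URIs, the `ex:` predicates, and the `max_depth` parameter are all hypothetical stand-ins for real HTTP lookups and RDF parsing.

```python
from collections import deque

# Hypothetical stand-in for the web: URI -> list of (predicate, object) pairs.
FAKE_WEB = {
    "http://example.org/a": [("ex:linksTo", "http://example.org/b"), ("ex:label", "A")],
    "http://example.org/b": [("ex:linksTo", "http://example.org/c"), ("ex:label", "B")],
    "http://example.org/c": [("ex:label", "C")],
}


def fetch(uri):
    """Stand-in for resolving a URI over HTTP and parsing the returned RDF."""
    return FAKE_WEB.get(uri, [])


def crawl(start, max_depth=1):
    """Retrieve a resource and its linked resources into a local cache of triples."""
    cache = set()
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        uri, depth = queue.popleft()
        for predicate, obj in fetch(uri):
            cache.add((uri, predicate, obj))
            # Follow links only while we are within the allowed depth.
            if obj.startswith("http://") and obj not in seen and depth < max_depth:
                seen.add(obj)
                queue.append((obj, depth + 1))
    return cache


def labels(cache):
    """A query performed locally over the cache: subject -> label."""
    return {s: o for s, p, o in cache if p == "ex:label"}
```

With `max_depth=1` only the starting resource and its directly linked resources are cached, which is why this strategy tends to need a large cache before cross-database questions can be answered.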
  • Linked Data querying
    ● Strategy 3 (Distributed query):
      – Mix SPARQL endpoint queries with URI-based resolution to avoid having a large local cache
      – Normalise results from each site to form the final query result
      – Users can process the results, and perform one or more queries based on their interpretation of the results
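The normalisation step in Strategy 3 might look something like the sketch below, which rewrites provider-specific URIs into a shared `namespace:identifier` form so that results from different sites line up. The provider hostnames and URI patterns are invented for illustration, and the target form `http://bio2rdf.org/<namespace>:<identifier>` is an assumption based on the URIs shown in this presentation.

```python
import re

# Hypothetical provider-specific URI patterns, each mapped to a shared namespace.
PROVIDER_PATTERNS = [
    (re.compile(r"^http://provider-one\.example/go/(\d+)$"), "go"),
    (re.compile(r"^http://provider-two\.example/term\?id=GO:(\d+)$"), "go"),
]


def normalise_uri(value):
    """Rewrite a known provider URI to the shared form; leave anything else alone."""
    for pattern, namespace in PROVIDER_PATTERNS:
        match = pattern.match(value)
        if match:
            return f"http://bio2rdf.org/{namespace}:{match.group(1)}"
    return value


def normalise_triples(triples):
    """Normalise subjects and objects so results from each site can be merged."""
    return {(normalise_uri(s), p, normalise_uri(o)) for s, p, o in triples}
```

Because the output is a set, statements that become identical after normalisation collapse into one, which is one simple way to form a single final result.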
  • Bio2RDF distributed queries
    ● Assign namespaces to providers
    ● Query across the relevant providers for a given user's query
    ● Aggregate all results into a single RDF document and return it to the user
    ● It works: 700,000 queries during the last month
    ● The largest dataset, the Protein Databank, has 10 billion triples; the others make up about 5 billion triples
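The routing and aggregation described on this slide can be illustrated roughly as follows. This is not the Bio2RDF implementation: the provider functions, the registry layout, and the triples they return are hypothetical, and a real provider would be a SPARQL endpoint or a resolvable URI rather than a Python function.

```python
def provider_a(identifier):
    """Hypothetical provider returning a label statement."""
    return {(f"http://bio2rdf.org/go:{identifier}", "rdfs:label", "label from provider A")}


def provider_b(identifier):
    """Hypothetical provider returning a cross-reference statement."""
    return {(f"http://bio2rdf.org/go:{identifier}", "ex:xref", "http://example.org/other")}


# Namespaces are assigned to the providers able to answer queries about them.
PROVIDERS_BY_NAMESPACE = {
    "go": [provider_a, provider_b],
}


def resolve(namespace, identifier):
    """Query every provider registered for the namespace and aggregate the results."""
    results = set()
    for provider in PROVIDERS_BY_NAMESPACE.get(namespace, []):
        results |= provider(identifier)
    return results
```

For example, `resolve("go", "0000345")` gathers the statements from both hypothetical providers into one set, while a namespace with no registered providers yields an empty result.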
  • Workflow
    Resolved URI: http://bio2rdf.org/label/go:0000345
    Host name: http://bio2rdf.org/
    Query: label/go:0000345
    Regular expression: label/([\w-]+):(.+)
    http://bio2rdf.org/query:labelsearch
    http://bio2rdf.org/query:labelsearchforgo
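The decomposition on this slide can be reproduced with a few lines of Python. The sketch assumes the character class in the regular expression is `[\w-]` (the backslash appears to have been lost in the transcript), and the function name and returned dictionary keys are illustrative rather than part of Bio2RDF.

```python
import re
from urllib.parse import urlsplit

# Assumed form of the slide's regular expression, with \w restored.
QUERY_PATTERN = re.compile(r"label/([\w-]+):(.+)")


def parse_resolved_uri(uri):
    """Split a resolved URI into host name, query, namespace and identifier."""
    parts = urlsplit(uri)
    host = f"{parts.scheme}://{parts.netloc}/"
    query = parts.path.lstrip("/")
    match = QUERY_PATTERN.fullmatch(query)
    if match is None:
        return None  # the URI does not correspond to a label query
    namespace, identifier = match.groups()
    return {
        "host": host,
        "query": query,
        "namespace": namespace,
        "identifier": identifier,
    }
```

Applied to `http://bio2rdf.org/label/go:0000345`, this yields the host `http://bio2rdf.org/`, the query `label/go:0000345`, the namespace `go`, and the identifier `0000345`; the namespace is what allows a namespace-specific query definition to be selected alongside a general one.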
  • Collaboration
    ● Query and provider definitions have an RDF representation
    ● Anyone can take a definition, change it to suit their needs, and redistribute their version
    ● If definitions are Linked Data themselves, as the Bio2RDF configuration items are, the HTTP URI can be used by others to pull the definition into their own software
  • Provenance for queries and data
    ● Provenance can be attached to each item, including details such as OpenID URIs and dates
    ● The sources and queries that were used for a particular URI can be found by using the query plan option
      – http://bio2rdf.org/queryplan/label/go:0000345
      – Enables automatic query collaboration, as you could take this query plan and add or modify queries inside it
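Judging only from the single example on this slide, the query plan URI appears to be the resolved URI with `queryplan/` inserted before the query. A small helper under that assumption (the function name is hypothetical):

```python
from urllib.parse import urlsplit, urlunsplit


def query_plan_uri(resolved_uri):
    """Derive the query plan URI by prefixing the path with /queryplan (assumed rule)."""
    parts = urlsplit(resolved_uri)
    return urlunsplit(
        (parts.scheme, parts.netloc, "/queryplan" + parts.path, parts.query, parts.fragment)
    )
```

Calling `query_plan_uri("http://bio2rdf.org/label/go:0000345")` returns the URI shown on the slide.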
  • Conclusion
    ● Many large distributed datasources
    ● Single interface, RDF
    ● Distribute queries efficiently across the endpoints
    ● Enable people to publish the definitions of what they did so that others can collaborate, via a single server or by copy and paste