Including Co-Referent URIs in a SPARQL Query

Linked data relies on instance level links between potentially differing representations of concepts in multiple datasets. However, in large complex domains, such as pharmacology, the inter-relationship of data instances needs to consider the context (e.g. task, role) of the user and the assumptions they want to apply to the data. Such context is not taken into account in most linked data integration procedures. In this paper we argue that dataset links should be stored in a stand-off fashion, thus enabling different assumptions to be applied to the data links during query execution. We present the infrastructure developed for the Open PHACTS Discovery Platform to enable this and show through evaluation that the incurred performance cost is below the threshold of user perception.

http://ceur-ws.org/Vol-1034/BrenninkmeijerEtAl_COLD2013.pdf
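
The abstract's central idea, which is to store dataset links stand-off so that different assumptions can be applied when a query runs, can be sketched in Python. This is a minimal illustration, not the Open PHACTS implementation. The link table, the lens names `strict` and `relaxed`, and the function `equivalents` are all hypothetical. The example URIs are borrowed from slide 13, but the links between them are invented for illustration. Each lens names the link predicates it will follow, so the same stored links yield different equivalence sets.

```python
from collections import deque

# Hypothetical stand-off link store: the links live apart from the data
# triples, each tagged with the SKOS predicate that justifies it.
# The link structure below is invented for illustration only.
LINKS = [
    ("cw:979b545d-f9a9", "skos:exactMatch", "cs:2157"),
    ("cs:2157", "skos:exactMatch", "chembl:1280"),
    ("chembl:1280", "skos:closeMatch", "db:db00945"),
]

# Hypothetical lenses: each lens lists the link predicates it accepts.
LENSES = {
    "strict": {"skos:exactMatch"},
    "relaxed": {"skos:exactMatch", "skos:closeMatch"},
}


def equivalents(uri, lens, links=LINKS, lenses=LENSES):
    """Return, sorted, every URI reachable from `uri` over links the lens accepts.

    Links are treated as symmetric (as SKOS mapping properties are) and are
    followed transitively. Following skos:closeMatch transitively is a
    simplifying assumption: SKOS does not declare it transitive.
    """
    allowed = lenses[lens]
    # Build an undirected adjacency map from the accepted links only.
    adjacency = {}
    for subject, predicate, obj in links:
        if predicate in allowed:
            adjacency.setdefault(subject, set()).add(obj)
            adjacency.setdefault(obj, set()).add(subject)
    # Breadth-first search outward from the starting URI.
    seen = {uri}
    queue = deque([uri])
    while queue:
        current = queue.popleft()
        for neighbour in adjacency.get(current, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return sorted(seen)


if __name__ == "__main__":
    print(equivalents("cw:979b545d-f9a9", "strict"))
    print(equivalents("cw:979b545d-f9a9", "relaxed"))
```

Under these assumptions, the strict lens returns the three URIs connected by skos:exactMatch, and the relaxed lens also reaches db:db00945. The stored links never change. Only the lens applied at query time does.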

  1. Including Co-referent URIs in a SPARQL Query: Christian Y A Brenninkmeijer, Carole Goble, Alasdair J G Gray, Paul Groth, Antonis Loizou, and Steve Pettifer. www.openphacts.org, @open_phacts; contact A.J.G.Gray@hw.ac.uk, @gray_alasdair
  2. Multiple Identities: Andy Law's Third Law: “The number of unique identifiers assigned to an individual is never less than the number of Institutions involved in the study” (http://bioinformatics.roslin.ac.uk/lawslaws.html). GB:29384, P12047, X31045: are these the same thing? (22/10/2013 COLD 2013)
  3. Gleevec® = Imatinib Mesylate: Imatinib + Mesylate (InChIKey YLMAHDNUQAMNNX-UHFFFAOYSA-N), as represented in ChemSpider, DrugBank and PubChem [diagram]
  6. Multiple Links, Different Reasons: link skos:closeMatch (reason: non-salt form); link skos:exactMatch (reason: drug name)
  7. Dynamic Equality: lenses range from Strict (Analysing) to Relaxed (Browsing); links applied: skos:exactMatch (InChI)
  8. Dynamic Equality: lenses range from Strict (Analysing) to Relaxed (Browsing); links applied: skos:exactMatch (InChI), skos:closeMatch (Drug Name)
  9. Open PHACTS Discovery Platform: apps make method calls to a domain API and receive interactive responses; the Drug Discovery Platform is a production-quality integration platform
  10. Integration Approach • Data kept in original model • Data cached in central triple store • API call translated to SPARQL query • Query expressed in terms of original data
  11. OPS Discovery Platform [architecture diagram]: Apps sit on the Core Platform, whose components include the Linked Data API (RDF/XML, TTL, JSON), the Identity Resolution Service, the Identifier Management Service, Domain Specific Services, a Semantic Workflow Engine, Chemistry Registration, Normalisation & Q/C, Indexing, and a Data Cache (Virtuoso Triple Store). The cache draws on public content, commercial content, public ontologies and user annotations, with data sources (Db) accompanied by VoID descriptions and nanopublications. Example inputs shown: the text “Adenosine receptor 2a” and the identifiers P12374, EC2.43.4, CS4532
  12. Platform Interaction: 1. Resolve user input – user enters search text – resolve it to a URI for the concept 2. Request data for the URI – expand the URI to its equivalent in each dataset – run the resulting SPARQL query
  13. Query Expansion: the Query Expander Service takes the query Q and lens L1 and passes (cw:979b545d-f9a9, L1) to the Identity Mapping Service (BridgeDB). Using its mappings and profiles, that service returns [cw:979b545d-f9a9, cs:2157, chembl:1280, db:db00945], and the expander emits the rewritten query Q’. Q: GRAPH <http://rdf.chemspider.com> { cw:979b545d-f9a9 cheminf:logd ?logd . } becomes Q’: GRAPH <http://rdf.chemspider.com> { ?iri cheminf:logd ?logd . FILTER (?iri = cw:979b545d-f9a9 || ?iri = cs:2157 || ?iri = chembl:1280 || ?iri = db:db00945) }. The same expansion can also be achieved through UNION
  14. Experiment: Is it feasible to use a stand-off mapping service? • Baselines (no external call): – “Perfect” URIs – Linked data querying • Expansion approaches (external service call): – FILTER by graph – UNION by graph
  15. “Perfect” URI Baseline: WHERE { GRAPH <chemspider> { cs:2157 cheminf:logp ?logp . } GRAPH <chembl> { chembl_mol:m1280 cheminf:mw ?mw . } }
  16. Linked Data Baseline: WHERE { GRAPH <chemspider> { cs:2157 cheminf:logp ?logp . } GRAPH <chembl> { ?chemblid cheminf:mw ?mw . } cs:2157 skos:exactMatch ?chemblid . }
  17. Queries Drawn from Open PHACTS API: 1. Simple compound information (1) 2. Compound information (1) 3. Compound pharmacology (M) 4. Simple target information (1) 5. Target information (1) 6. Target pharmacology (M)
  19. Datasets and Links: Data: 167,783,592 triples; Mappings: 2,114,584 triples; Lenses: 1
  20. Average execution times [chart]
  21. Average execution times [chart, annotated 0.018]
  23. Conclusions • Query expansion slower in general – due to the separate service call – difference below human perception – UNION faster than FILTER on Virtuoso • Stand-off mappings feasible • Infrastructure can support lenses (Strict/Analysing to Relaxed/Browsing)
  24. Questions: A.J.G.Gray@hw.ac.uk, www.macs.hw.ac.uk/~ajg33, @gray_alasdair; Open PHACTS Project: pmu@openphacts.org, www.openphacts.org, @open_phacts
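
The query rewriting on slide 13 can be sketched as two small Python functions. Each takes one triple pattern and the list of equivalent URIs returned by the mapping service. One builds the FILTER-by-graph form shown on the slide. The other builds a UNION-by-graph alternative. The function names are hypothetical, and the exact UNION shape used in the experiments is an assumption. A real query would also need PREFIX declarations for cw:, cs:, chembl:, db: and cheminf:.

```python
def expand_filter(graph, predicate, obj_var, uris):
    """FILTER-by-graph form: one pattern with a variable subject ?iri,
    restricted to the equivalent URIs by a disjunctive FILTER."""
    conditions = " || ".join(f"?iri = {u}" for u in uris)
    return (f"GRAPH <{graph}> {{ ?iri {predicate} {obj_var} . "
            f"FILTER ({conditions}) }}")


def expand_union(graph, predicate, obj_var, uris):
    """UNION-by-graph form (assumed shape): one branch per equivalent URI,
    each with a concrete subject instead of a filtered variable."""
    branches = " UNION ".join(f"{{ {u} {predicate} {obj_var} . }}" for u in uris)
    return f"GRAPH <{graph}> {{ {branches} }}"


if __name__ == "__main__":
    # Equivalents returned for cw:979b545d-f9a9 under lens L1 (slide 13).
    uris = ["cw:979b545d-f9a9", "cs:2157", "chembl:1280", "db:db00945"]
    print(expand_filter("http://rdf.chemspider.com", "cheminf:logd", "?logd", uris))
    print(expand_union("http://rdf.chemspider.com", "cheminf:logd", "?logd", uris))
```

With the four URIs from slide 13, `expand_filter` reproduces the slide's Q’. The FILTER form keeps a single triple pattern and leaves the store to test each binding. The UNION form gives the store one concrete subject per branch. The slides report that UNION was faster than FILTER on Virtuoso.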
