XSPARQL CrEDIBLE workshop

732 views

Published on

Published in: Education, Technology
0 Comments
0 Likes
Statistics
Notes
  • Be the first to comment

  • Be the first to like this

No Downloads
Views
Total views
732
On SlideShare
0
From Embeds
0
Number of Embeds
5
Actions
Shares
0
Downloads
4
Comments
0
Likes
0
Embeds 0
No embeds

No notes for slide

XSPARQL CrEDIBLE workshop

  1. 1. www.insight)centre.org. XML Data Mediation using XSPARQL Nuno Lopes, Laleh Kazemzadeh October, 2013
  2. 2. www.insight)centre.org. Why use RDF for data integration (I) 1 / 21
  3. 3. www.insight)centre.org. Why use RDF for data integration (I) Flexibility 1 / 21
  4. 4. www.insight)centre.org. Why use RDF for data integration (I) Flexibility 1 Representation Reuse any vocabularies No schema required 1 / 21
  5. 5. www.insight)centre.org. Why use RDF for data integration (I) Flexibility 1 Representation Reuse any vocabularies No schema required 2 Combining Easily combine different datasets RDF merge is simple 1 / 21
  6. 6. www.insight)centre.org. Why use RDF for data integration (I) Flexibility 1 Representation Reuse any vocabularies No schema required 2 Combining Easily combine different datasets RDF merge is simple 3 Sharing Linked Data Built on web technologies (HTTP, URIs) 1 / 21
  7. 7. www.insight)centre.org. Why use RDF for data integration (II) Global Identifiers Schema-less Self-Describing Graph-Based 2 / 21
  8. 8. www.insight)centre.org. Why use RDF for data integration (II) Global Identifiers × Schema-less × Self-Describing × Graph-Based × 2 / 21
  9. 9. www.insight)centre.org. Why use RDF for data integration (II) Global Identifiers × × Schema-less × √ Self-Describing × × / √ Graph-Based × × / √ 2 / 21
  10. 10. www.insight)centre.org. Why use RDF for data integration (II) Global Identifiers × × √ Schema-less × √ √ Self-Describing × × / √ √ Graph-Based × × / √ √ 2 / 21
  11. 11. www.insight)centre.org. RDF-based Data Integration 3 / 21
  12. 12. www.insight)centre.org. RDF-based Data Integration 3 / 21
  13. 13. www.insight)centre.org. RDF-based Data Integration Data Transformation 3 / 21
  14. 14. www.insight)centre.org. RDF-based Data Integration Data Transformation Data Representation 3 / 21
  15. 15. www.insight)centre.org. RDF-based Data Integration Data Transformation Data Representation 3 / 21
  16. 16. www.insight)centre.org. Data Transformation: Query Languages SQL select address from person where name = "Nuno" XQuery for $person in doc("people.xml") return $person//address SPARQL select $address from <nunolopes.org/..> where { :nuno foaf:address $address } relation solution sequence sequence of items Mediator/DataWarehouse SPA QL M L X D F R 4 / 21
  17. 17. www.insight)centre.org. Data Transformation: Query Languages SQL select address from person where name = "Nuno" XQuery for $person in doc("people.xml") return $person//address SPARQL select $address from <nunolopes.org/..> where { :nuno foaf:address $address } relation solution sequence sequence of items Mediator/DataWarehouse SPA QL M L X D F R 4 / 21
  18. 18. www.insight)centre.org. Data Transformation: Query Languages SQL select address from person where name = "Nuno" XQuery for $person in doc("people.xml") return $person//address SPARQL select $address from <nunolopes.org/..> where { :nuno foaf:address $address } relation solution sequence sequence of items Mediator/DataWarehouse SPA QL M L X D F R 4 / 21
  19. 19. www.insight)centre.org. Data Transformation: Query Languages SQL select address from person where name = "Nuno" XQuery for $person in doc("people.xml") return $person//address SPARQL select $address from <nunolopes.org/..> where { :nuno foaf:address $address } relation solution sequence sequence of items Mediator/DataWarehouse SPA QL M L X D F R 4 / 21
  20. 20. www.insight)centre.org. XSPARQL SPA QL M L X D F R Transformation language between RDB, XML, and RDF 5 / 21
  21. 21. www.insight)centre.org. XSPARQL SPA QL M L X D F R Transformation language between RDB, XML, and RDF Syntactic extension of XQuery 5 / 21
  22. 22. www.insight)centre.org. XSPARQL SPA QL M L X D F R Transformation language between RDB, XML, and RDF Syntactic extension of XQuery Semantics based on XQuery’s semantics 5 / 21
  23. 23. www.insight)centre.org. XSPARQL SPA QL M L X D F R Transformation language between RDB, XML, and RDF Syntactic extension of XQuery Semantics based on XQuery’s semantics Why based on XQuery? Expressive language Use as scripting language 5 / 21
  24. 24. www.insight)centre.org. XSPARQL SPA QL M L X D F R Transformation language between RDB, XML, and RDF Syntactic extension of XQuery Semantics based on XQuery’s semantics Why based on XQuery? Expressive language Use as scripting language Arbitrary Nesting of expressions 5 / 21
  25. 25. www.insight)centre.org. Same Language for each Format XSPARQL for address as $address from people where name = "Nuno" return $address for $person in doc("people.xml") return $person//address XSPARQL for $address from <nunolopes.org/..> where { :nuno foaf:address $address } return $address for var in Expr let var := Expr where Expr order by Expr return Expr for SelectSpec from RelationList where WhereSpecList return Expr for varlist from DatasetClause where { pattern } return Expr 6 / 21
  26. 26. www.insight)centre.org. Same Language for each Format XSPARQL for address as $address from people where name = "Nuno" return $address XQuery for $person in doc("people.xml") return $person//address XSPARQL for $address from <nunolopes.org/..> where { :nuno foaf:address $address } return $address for var in Expr let var := Expr where Expr order by Expr return Expr for SelectSpec from RelationList where WhereSpecList return Expr for varlist from DatasetClause where { pattern } return Expr 6 / 21
  27. 27. www.insight)centre.org. Same Language for each Format XSPARQL for address as $address from people where name = "Nuno" return $address XSPARQL for $person in doc("people.xml") return $person//address XSPARQL for $address from <nunolopes.org/..> where { :nuno foaf:address $address } return $address for var in Expr let var := Expr where Expr order by Expr return Expr for SelectSpec from RelationList where WhereSpecList return Expr for varlist from DatasetClause where { pattern } return Expr 6 / 21
  28. 28. www.insight)centre.org. Same Language for each Format XSPARQL for address as $address from people where name = "Nuno" return $address for $person in doc("people.xml") return $person//address XSPARQL for $address from <nunolopes.org/..> where { :nuno foaf:address $address } return $address for var in Expr let var := Expr where Expr order by Expr return Expr for SelectSpec from RelationList where WhereSpecList return Expr for varlist from DatasetClause where { pattern } return Expr 6 / 21
  29. 29. www.insight)centre.org. Same Language for each Format XSPARQL for address as $address from people where name = "Nuno" return $address for $person in doc("people.xml") return $person//address XSPARQL for $address from <nunolopes.org/..> where { :nuno foaf:address $address } return $address for var in Expr let var := Expr where Expr order by Expr return Expr for SelectSpec from RelationList where WhereSpecList return Expr for varlist from DatasetClause where { pattern } return Expr 6 / 21
  30. 30. www.insight)centre.org. Usecase: Inparanoid 7 / 21
  31. 31. www.insight)centre.org. Usecase: Inparanoid 7 / 21
  32. 32. www.insight)centre.org. Usecase: Inparanoid 7 / 21
  33. 33. www.insight)centre.org. Usecase: Predicting Protein-Protein Interactions 8 / 21
  34. 34. www.insight)centre.org. Creating RDF with XSPARQL for $gene in doc("H.sapiens -M.musculus.xml")//gene let $id := fn:data($gene/@id) let $geneId :=fn:data($gene/@geneId) let $protId :=fn:data($gene/@protId) let $uri := fn:concat("http://bioinfo.deri.ie/inparanoid/", $id) construct { <{$uri}> purl:identifier {$id}; :geneID {$geneId}; :protID {$protId} . } construct clause generates RDF Arbitrary XSPARQL expressions in subject, predicate, and object 9 / 21
  35. 35. www.insight)centre.org. Creating RDF with XSPARQL for $gene in doc("H.sapiens -M.musculus.xml")//gene let $id := fn:data($gene/@id) let $geneId :=fn:data($gene/@geneId) let $protId :=fn:data($gene/@protId) let $uri := fn:concat("http://bioinfo.deri.ie/inparanoid/", $id) construct { <{$uri}> purl:identifier {$id}; :geneID {$geneId}; :protID {$protId} . } construct clause generates RDF Arbitrary XSPARQL expressions in subject, predicate, and object 9 / 21
  36. 36. www.insight)centre.org. Creating RDF with XSPARQL for $gene in doc("H.sapiens -M.musculus.xml")//gene let $id := fn:data($gene/@id) let $geneId :=fn:data($gene/@geneId) let $protId :=fn:data($gene/@protId) let $uri := fn:concat("http://bioinfo.deri.ie/inparanoid/", $id) construct { <{$uri}> purl:identifier {$id}; :geneID {$geneId}; :protID {$protId} . } construct clause generates RDF Arbitrary XSPARQL expressions in subject, predicate, and object 9 / 21
  37. 37. www.insight)centre.org. Creating RDF with XSPARQL for $gene in doc("H.sapiens -M.musculus.xml")//gene let $id := fn:data($gene/@id) let $geneId :=fn:data($gene/@geneId) let $protId :=fn:data($gene/@protId) let $uri := fn:concat("http://bioinfo.deri.ie/inparanoid/", $id) construct { <{$uri}> purl:identifier {$id}; :geneID {$geneId}; :protID {$protId} . } construct clause generates RDF Arbitrary XSPARQL expressions in subject, predicate, and object Query Result :1 dct:identifier "1" . :1 bioinfo:geneID "ENSG00000155657" . gene:ENSG00000155657 rdf:type bioinfo:Gene . :1 bioinfo:protID "ENSP00000364178" . protein:ENSP00000364178 rdf:type bioinfo:protein . gene:ENSG00000155657 bioinfo:source_database "Ensembl" . protein:ENSG00000155657 bioinfo:organism <http://purl.uniprot.org/taxonomy/9606> 9 / 21
  38. 38. www.insight)centre.org. Usecase: Combining Inparanoid with BridgeDB 10 / 21
  39. 39. www.insight)centre.org. Usecase: Combining Inparanoid with BridgeDB 10 / 21
  40. 40. www.insight)centre.org. Usecase: Combining Inparanoid with BridgeDB 10 / 21
  41. 41. www.insight)centre.org. Integration Query Example for $gene in doc("H.sapiens -M.musculus.xml")//gene let $id := fn:data($gene/@id) let $geneId :=fn:data($gene/@geneId) let $protId :=fn:data($gene/@protId) let $uri := fn:concat("http://bioinfo.deri.ie/inparanoid/", $id) for $link $idRight from link where idRight = $geneId construct { <{$uri}> purl:identifier {$id}; :geneID {$geneId}; :protID {$protId}; :link {$idRight} . } 11 / 21
  42. 42. www.insight)centre.org. Integration Query Example for $gene in doc("H.sapiens -M.musculus.xml")//gene let $id := fn:data($gene/@id) let $geneId :=fn:data($gene/@geneId) let $protId :=fn:data($gene/@protId) let $uri := fn:concat("http://bioinfo.deri.ie/inparanoid/", $id) for $link $idRight from link where idRight = $geneId construct { <{$uri}> purl:identifier {$id}; :geneID {$geneId}; :protID {$protId}; :link {$idRight} . } 11 / 21
  43. 43. www.insight)centre.org. Integration Query Example for $gene in doc("H.sapiens -M.musculus.xml")//gene let $id := fn:data($gene/@id) let $geneId :=fn:data($gene/@geneId) let $protId :=fn:data($gene/@protId) let $uri := fn:concat("http://bioinfo.deri.ie/inparanoid/", $id) for $link $idRight from link where idRight = $geneId construct { <{$uri}> purl:identifier {$id}; :geneID {$geneId}; :protID {$protId}; :link {$idRight} . } 11 / 21
  44. 44. www.insight)centre.org. Integration Query Example for $gene in doc("H.sapiens -M.musculus.xml")//gene let $id := fn:data($gene/@id) let $geneId :=fn:data($gene/@geneId) let $protId :=fn:data($gene/@protId) let $uri := fn:concat("http://bioinfo.deri.ie/inparanoid/", $id) for $link $idRight from link where idRight = $geneId construct { <{$uri}> purl:identifier {$id}; :geneID {$geneId}; :protID {$protId}; :link {$idRight} . } 11 / 21
  45. 45. www.insight)centre.org. Integration Query Example for $gene in doc("H.sapiens -M.musculus.xml")//gene let $id := fn:data($gene/@id) let $geneId :=fn:data($gene/@geneId) let $protId :=fn:data($gene/@protId) let $uri := fn:concat("http://bioinfo.deri.ie/inparanoid/", $id) for $link $idRight from link where idRight = $geneId construct { <{$uri}> purl:identifier {$id}; :geneID {$geneId}; :protID {$protId}; :link {$idRight} . } More involved XSPARQL queries: RDB2RDF Direct Mapping: ∼130 LoC R2RML: ∼290 LoC 11 / 21
  46. 46. www.insight)centre.org. Cloudspaces CloudSpace Platform offers Infrastructure for the intersection of: Linked Data Big Data Cloud Computing 12 / 21
  47. 47. www.insight)centre.org. Cloudspaces & Sindice Cloudspaces A collection of tool to help users Extract, Transform & Load big data sets. Cloudspaces is based on Sindice, allowing user-defined Linked Data pipelines to be used in the Cloud 13 / 21
  48. 48. www.insight)centre.org. Cloudspaces & Sindice Cloudspaces A collection of tool to help users Extract, Transform & Load big data sets. Cloudspaces is based on Sindice, allowing user-defined Linked Data pipelines to be used in the Cloud Sindice Linked Data "Search Engine" High availability Sparql Endpoint: 12 Billion Triples Can load 100 million triples a day (updated daily) 13 / 21
  49. 49. www.insight)centre.org. Overview XSPARQL Usecases & Other features Beyond XSPARQL Conclusions 14 / 21
  50. 50. www.insight)centre.org. SPARQL 1.1 Data integration features in SPARQL 1.1 Aggregates Subqueries Federation Extensions Negation Expressions in the SELECT clause Property Paths Assignment 14 / 21
  51. 51. www.insight)centre.org. SPARQL 1.1 Data integration features in SPARQL 1.1 Aggregates Subqueries Federation Extensions Negation Expressions in the SELECT clause Property Paths Assignment 14 / 21
  52. 52. www.insight)centre.org. SPARQL 1.1: Federation extensions PREFIX dbpedia2: <http://dbpedia.org/property/> PREFIX foaf: <http://xmlns.com/foaf/0.1/> SELECT ?N ?MyB FROM <http://polleres.net/foaf.rdf> WHERE { [ foaf:birthday ?MyB ]. SERVICE <http://dbpedia.org/sparql> { SELECT ?N WHERE { [ dbpedia2:born ?B; foaf:name ?N ]. } } FILTER ( Regex(Str(?B),str(?MyB)) ) } 15 / 21
  53. 53. www.insight)centre.org. SPARQL 1.1: Federation extensions PREFIX dbpedia2: <http://dbpedia.org/property/> PREFIX foaf: <http://xmlns.com/foaf/0.1/> SELECT ?N ?MyB FROM <http://polleres.net/foaf.rdf> WHERE { [ foaf:birthday ?MyB ]. SERVICE <http://dbpedia.org/sparql> { SELECT ?N WHERE { [ dbpedia2:born ?B; foaf:name ?N ]. } } FILTER ( Regex(Str(?B),str(?MyB)) ) } 15 / 21
  54. 54. www.insight)centre.org. SPARQL 1.1: Federation extensions PREFIX dbpedia2: <http://dbpedia.org/property/> PREFIX foaf: <http://xmlns.com/foaf/0.1/> SELECT ?N ?MyB FROM <http://polleres.net/foaf.rdf> WHERE { [ foaf:birthday ?MyB ]. SERVICE <http://dbpedia.org/sparql> { SELECT ?N WHERE { [ dbpedia2:born ?B; foaf:name ?N ]. } } FILTER ( Regex(Str(?B),str(?MyB)) ) } Problem limits from SPARQL endpoints prevent this query from working! 15 / 21
  55. 55. www.insight)centre.org. XSPARQL: endpoint/service queries XSPARQL endpoint prefix dbprop: <http://dbpedia.org/property/> prefix foaf: <http://xmlns.com/foaf/0.1/> prefix : <http://xsparql.deri.org/bday#> let $MyB := for * from <http://polleres.net/foaf.rdf> where { [ foaf:birthday $B ]. } return $B for * from <http://dbpedia.org/> endpoint <http://dbpedia.org/sparql> where { [ dbprop:born $B; foaf:name $N ]. filter ( regex(str($B),str($MyB)) ) } construct { :axel :sameBirthDayAs $N } 16 / 21
  56. 56. www.insight)centre.org. XSPARQL: endpoint/service queries XSPARQL endpoint prefix dbprop: <http://dbpedia.org/property/> prefix foaf: <http://xmlns.com/foaf/0.1/> prefix : <http://xsparql.deri.org/bday#> let $MyB := for * from <http://polleres.net/foaf.rdf> where { [ foaf:birthday $B ]. } return $B for * from <http://dbpedia.org/> endpoint <http://dbpedia.org/sparql> where { [ dbprop:born $B; foaf:name $N ]. filter ( regex(str($B),str($MyB)) ) } construct { :axel :sameBirthDayAs $N } 16 / 21
  57. 57. www.insight)centre.org. Converting Logainm dump to RDF SPA QL M L X D F R ∼ 1.3M triples Data provided in XML 17 / 21
  58. 58. www.insight)centre.org. Converting Logainm dump to RDF SPA QL M L X D F R ∼ 1.3M triples Data provided in XML Translated to RDF using XSPARQL 17 / 21
  59. 59. www.insight)centre.org. Converting Logainm dump to RDF SPA QL M L X D F R ∼ 1.3M triples Data provided in XML Translated to RDF using XSPARQL Exposed using Openlink Virtuoso 17 / 21
  60. 60. www.insight)centre.org. Overview XSPARQL Usecases & Other features Beyond XSPARQL Conclusions 18 / 21
  61. 61. www.insight)centre.org. XSPARQLViz http://deri-srvgal33.nuig.ie:8080/XsparqlViz/ 18 / 21
  62. 62. www.insight)centre.org. Annotated RDF(S) Domains Annotations refer to a specific domain Temporal :nuno :address :Galway . [2008,2012] 19 / 21
  63. 63. www.insight)centre.org. Annotated RDF(S) Domains Annotations refer to a specific domain Temporal :nuno :address :Galway . [2008,2012] Fuzzy :nuno :address :Dublin . 0.9 19 / 21
  64. 64. www.insight)centre.org. Annotated RDF(S) Domains Annotations refer to a specific domain Temporal :nuno :address :Galway . [2008,2012] Fuzzy :nuno :address :Dublin . 0.9 Access Control :nuno :address :Galway . [nl] 19 / 21
  65. 65. www.insight)centre.org. Annotated RDF(S) AC Query ap address Galway. ap birthday ”24/03”]. nl address Galway. nl birthday ”23/12”]. AnQL AC Query SELECT * WHERE { $person :birthday $birthday .q] } $person $birthday ap “24/03” nl “23/12” 20 / 21
  66. 66. www.insight)centre.org. Annotated RDF(S) AC Query ap address Galway. ap birthday ”24/03” : [[ap]]. nl address Galway. nl birthday ”23/12” : [[nl]]. AnQL AC Query SELECT * WHERE { $person :birthday $birthday :[[nl]] . } $person $birthday ap “24/03” nl “23/12” 20 / 21
  67. 67. www.insight)centre.org. Annotated RDF(S) AC Query ap address Galway. ap birthday ”24/03” : [[ap]]. nl address Galway. nl birthday ”23/12” : [[nl]]. AnQL AC Query SELECT * WHERE { $person :birthday $birthday :[[nl]] . } $person $birthday ap “24/03” nl “23/12” 20 / 21
  68. 68. www.insight)centre.org. Conclusions You can use XSPARQL to easily merge data from different sources in a common language 21 / 21
  69. 69. www.insight)centre.org. Conclusions You can use XSPARQL to easily merge data from different sources in a common language Annotated RDF can provide Access Control over RDF data 21 / 21
  70. 70. www.insight)centre.org. Conclusions You can use XSPARQL to easily merge data from different sources in a common language Annotated RDF can provide Access Control over RDF data Useful links XSPARQL http://xsparql.deri.org/ XSPARQLViz http://deri-srvgal33.nuig.ie: 8080/XsparqlViz/ logainm http://data.logainm.ie/ 21 / 21
  71. 71. www.insight)centre.org. Conclusions You can use XSPARQL to easily merge data from different sources in a common language Annotated RDF can provide Access Control over RDF data Useful links XSPARQL http://xsparql.deri.org/ XSPARQLViz http://deri-srvgal33.nuig.ie: 8080/XsparqlViz/ logainm http://data.logainm.ie/ Thank you! Questions? 21 / 21

×