XSPARQL CrEDIBLE workshop

Published in: Education, Technology

Transcript

  • 1. XML Data Mediation using XSPARQL. Nuno Lopes, Laleh Kazemzadeh. October 2013. www.insight-centre.org
  • 6. Why use RDF for data integration (I): Flexibility
       1. Representation: reuse any vocabularies; no schema required
       2. Combining: easily combine different datasets; RDF merge is simple
       3. Sharing: Linked Data; built on web technologies (HTTP, URIs)
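The claim that "RDF merge is simple" can be illustrated with a minimal sketch, assuming triples are modelled as plain tuples and that no blank nodes are involved (a real merge has to keep blank nodes from different graphs apart). All URIs and values here are hypothetical illustration data.

```python
# Triples are modelled as plain (subject, predicate, object) tuples.
# All URIs and values below are hypothetical illustration data.
Triple = tuple[str, str, str]

people: set[Triple] = {
    ("http://example.org/nuno", "http://xmlns.com/foaf/0.1/name", "Nuno"),
}
addresses: set[Triple] = {
    ("http://example.org/nuno", "http://example.org/address", "Galway"),
    ("http://example.org/nuno", "http://xmlns.com/foaf/0.1/name", "Nuno"),
}


def merge(*graphs: set[Triple]) -> set[Triple]:
    """Merge graphs by set union; duplicate triples collapse automatically."""
    merged: set[Triple] = set()
    for graph in graphs:
        merged |= graph
    return merged


combined = merge(people, addresses)
```

Because both datasets use the same global identifier for the subject, the union needs no schema alignment step; the shared name triple simply appears once.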
  • 10. Why use RDF for data integration (II)
       Global Identifiers:  ×      ×      √
       Schema-less:         ×      √      √
       Self-Describing:     ×    × / √    √
       Graph-Based:         ×    × / √    √
  • 15. RDF-based Data Integration: Data Transformation, Data Representation
  • 19. Data Transformation: Query Languages
       SQL (result: relation)
         select address from person where name = "Nuno"
       XQuery (result: sequence of items)
         for $person in doc("people.xml") return $person//address
       SPARQL (result: solution sequence)
         select $address from <nunolopes.org/..> where { :nuno foaf:address $address }
       Mediator/Data Warehouse
  • 24. XSPARQL
       Transformation language between RDB, XML, and RDF
       Syntactic extension of XQuery
       Semantics based on XQuery's semantics
       Why based on XQuery? Expressive language; usable as a scripting language; arbitrary nesting of expressions
  • 29. Same Language for each Format
       XSPARQL (relational source)
         for address as $address from people where name = "Nuno" return $address
       XSPARQL (XML source, plain XQuery syntax)
         for $person in doc("people.xml") return $person//address
       XSPARQL (RDF source)
         for $address from <nunolopes.org/..> where { :nuno foaf:address $address } return $address
       Grammar of the three for-expressions
         for var in Expr let var := Expr where Expr order by Expr return Expr
         for SelectSpec from RelationList where WhereSpecList return Expr
         for varlist from DatasetClause where { pattern } return Expr
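The idea of one iteration construct over three formats can be sketched outside XSPARQL. The following is a rough Python analogy, not XSPARQL itself: each source is wrapped in a generator so the caller loops over addresses the same way regardless of format. The table layout, the XML document and the triples are hypothetical sample data.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical sample data for each of the three formats.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE people (name TEXT, address TEXT)")
db.execute("INSERT INTO people VALUES ('Nuno', 'Galway')")

people_xml = ET.fromstring(
    "<people><person><name>Nuno</name><address>Galway</address></person></people>"
)

triples = {(":nuno", "foaf:address", "Galway"), (":nuno", "foaf:name", "Nuno")}


def addresses_from_rdb():
    # Mirrors: for address as $address from people where name = "Nuno"
    query = "SELECT address FROM people WHERE name = ?"
    for (address,) in db.execute(query, ("Nuno",)):
        yield address


def addresses_from_xml():
    # Mirrors: for $person in doc("people.xml") return $person//address
    for address in people_xml.iter("address"):
        yield address.text


def addresses_from_rdf():
    # Mirrors: where { :nuno foaf:address $address }
    for subject, predicate, obj in triples:
        if subject == ":nuno" and predicate == "foaf:address":
            yield obj


sources = (addresses_from_rdb, addresses_from_xml, addresses_from_rdf)
results = [list(source()) for source in sources]
```

The point of the analogy is that the consuming loop is identical for all three sources; in XSPARQL the same effect comes from the three variants of the for-expression.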
  • 32. Usecase: Inparanoid
  • 33. Usecase: Predicting Protein-Protein Interactions
  • 37. Creating RDF with XSPARQL
       for $gene in doc("H.sapiens-M.musculus.xml")//gene
       let $id := fn:data($gene/@id)
       let $geneId := fn:data($gene/@geneId)
       let $protId := fn:data($gene/@protId)
       let $uri := fn:concat("http://bioinfo.deri.ie/inparanoid/", $id)
       construct { <{$uri}> purl:identifier {$id};
                            :geneID {$geneId};
                            :protID {$protId} . }
       The construct clause generates RDF. Arbitrary XSPARQL expressions are allowed in subject, predicate, and object.
       Query Result
         :1 dct:identifier "1" .
         :1 bioinfo:geneID "ENSG00000155657" .
         gene:ENSG00000155657 rdf:type bioinfo:Gene .
         :1 bioinfo:protID "ENSP00000364178" .
         protein:ENSP00000364178 rdf:type bioinfo:protein .
         gene:ENSG00000155657 bioinfo:source_database "Ensembl" .
         protein:ENSG00000155657 bioinfo:organism <http://purl.uniprot.org/taxonomy/9606>
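For readers unfamiliar with XQuery, the shape of this query can be approximated in Python: iterate over every gene element, read three attributes, build a subject URI, and emit one triple per attribute. This is a sketch of the idea only. The sample document and its root element name are hypothetical; the attribute names, base URI and identifier values are the ones shown on the slide.

```python
import xml.etree.ElementTree as ET

BASE = "http://bioinfo.deri.ie/inparanoid/"  # base URI taken from the slide

# Hypothetical input; only the id, geneId and protId attributes come from the slide.
SAMPLE = """<clusters>
  <gene id="1" geneId="ENSG00000155657" protId="ENSP00000364178"/>
</clusters>"""


def gene_triples(xml_text):
    """Yield (subject, predicate, object) triples for each gene element."""
    root = ET.fromstring(xml_text)
    for gene in root.iter("gene"):      # doc(...)//gene
        gene_id = gene.get("id")        # fn:data($gene/@id)
        uri = BASE + gene_id            # fn:concat(base, $id)
        yield (uri, "purl:identifier", gene_id)
        yield (uri, ":geneID", gene.get("geneId"))
        yield (uri, ":protID", gene.get("protId"))


triples = list(gene_triples(SAMPLE))
```

Predicates are kept as prefixed names, as on the slide, rather than expanded to full URIs.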
  • 40. Usecase: Combining Inparanoid with BridgeDB
  • 45. Integration Query Example
       for $gene in doc("H.sapiens-M.musculus.xml")//gene
       let $id := fn:data($gene/@id)
       let $geneId := fn:data($gene/@geneId)
       let $protId := fn:data($gene/@protId)
       let $uri := fn:concat("http://bioinfo.deri.ie/inparanoid/", $id)
       for $link $idRight from link where idRight = $geneId
       construct { <{$uri}> purl:identifier {$id};
                            :geneID {$geneId};
                            :protID {$protId};
                            :link {$idRight} . }
       More involved XSPARQL queries: RDB2RDF Direct Mapping: ∼130 LoC; R2RML: ∼290 LoC
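The integration query nests a relational lookup inside an XML iteration. A minimal Python sketch of that nesting, assuming an in-memory SQLite table stands in for the relational source: the slide only names a relation `link` with a column `idRight`, so the rest of the schema and all rows except the first gene are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

BASE = "http://bioinfo.deri.ie/inparanoid/"  # base URI taken from the slide

# Hypothetical data: the slide only names a relation `link` with a column `idRight`.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE link (idRight TEXT)")
db.executemany(
    "INSERT INTO link VALUES (?)",
    [("ENSG00000155657",), ("ENSG00000000001",)],
)

genes = ET.fromstring(
    '<clusters>'
    '<gene id="1" geneId="ENSG00000155657" protId="ENSP00000364178"/>'
    '<gene id="2" geneId="ENSG99999999999" protId="ENSP99999999999"/>'
    '</clusters>'
)


def linked_genes():
    """Nested iteration: for each XML gene, look up matching rows in `link`."""
    for gene in genes.iter("gene"):
        gene_id = gene.get("geneId")
        rows = db.execute("SELECT idRight FROM link WHERE idRight = ?", (gene_id,))
        for (id_right,) in rows:
            yield (BASE + gene.get("id"), ":link", id_right)


links = list(linked_genes())
```

Only the first gene has a matching row, so only one `:link` triple is produced; genes without a match contribute nothing, which mirrors how the inner for-expression behaves as a join.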
  • 46. Cloudspaces: the CloudSpace Platform offers infrastructure for the intersection of Linked Data, Big Data, and Cloud Computing
  • 48. Cloudspaces & Sindice
       Cloudspaces: a collection of tools to help users Extract, Transform & Load big data sets. Cloudspaces is based on Sindice, allowing user-defined Linked Data pipelines to be used in the Cloud.
       Sindice: Linked Data "Search Engine"; high-availability SPARQL endpoint with 12 billion triples; can load 100 million triples a day (updated daily)
  • 49. Overview: XSPARQL; Usecases & Other features; Beyond XSPARQL; Conclusions
  • 51. Data integration features in SPARQL 1.1: Aggregates, Subqueries, Federation Extensions, Negation, Expressions in the SELECT clause, Property Paths, Assignment
  • 54. SPARQL 1.1: Federation extensions
       PREFIX dbpedia2: <http://dbpedia.org/property/>
       PREFIX foaf: <http://xmlns.com/foaf/0.1/>
       SELECT ?N ?MyB
       FROM <http://polleres.net/foaf.rdf>
       WHERE { [ foaf:birthday ?MyB ].
               SERVICE <http://dbpedia.org/sparql> {
                 SELECT ?N WHERE { [ dbpedia2:born ?B; foaf:name ?N ]. } }
               FILTER ( Regex(Str(?B),str(?MyB)) ) }
       Problem: limits imposed by SPARQL endpoints prevent this query from working!
  • 56. XSPARQL: endpoint/service queries (XSPARQL endpoint)
       prefix dbprop: <http://dbpedia.org/property/>
       prefix foaf: <http://xmlns.com/foaf/0.1/>
       prefix : <http://xsparql.deri.org/bday#>
       let $MyB := for * from <http://polleres.net/foaf.rdf>
                   where { [ foaf:birthday $B ]. }
                   return $B
       for * from <http://dbpedia.org/> endpoint <http://dbpedia.org/sparql>
       where { [ dbprop:born $B; foaf:name $N ].
               filter ( regex(str($B),str($MyB)) ) }
       construct { :axel :sameBirthDayAs $N }
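One way to read this query is as a two-step plan: resolve the local birthday first, then send the remote endpoint a query in which that value is already a constant, so the filter runs remotely instead of over an unrestricted result. The Python sketch below only builds the second query string and performs no network access. The function name and the sample birthday are hypothetical, and whether an XSPARQL implementation rewrites the query exactly this way is an assumption.

```python
def remote_query(my_birthday: str) -> str:
    """Build the query for the remote endpoint with the local value inlined."""
    # Escape backslashes and quotes so the value is a valid SPARQL string literal.
    literal = my_birthday.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "PREFIX dbprop: <http://dbpedia.org/property/>\n"
        "PREFIX foaf: <http://xmlns.com/foaf/0.1/>\n"
        "SELECT ?N WHERE {\n"
        "  [ dbprop:born ?B ; foaf:name ?N ] .\n"
        f'  FILTER ( regex(str(?B), "{literal}") )\n'
        "}"
    )


# Hypothetical birthday value; the slide does not give one.
query = remote_query("12-31")
```

As on the slide, the value is used as a regular expression pattern, so regex metacharacters in it are not neutralised by this sketch.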
  • 59. Converting Logainm dump to RDF: ∼1.3M triples; data provided in XML; translated to RDF using XSPARQL; exposed using OpenLink Virtuoso
  • 60. Overview: XSPARQL; Usecases & Other features; Beyond XSPARQL; Conclusions
  • 61. XSPARQLViz: http://deri-srvgal33.nuig.ie:8080/XsparqlViz/
  • 64. Annotated RDF(S): Domains
       Annotations refer to a specific domain:
       Temporal:        :nuno :address :Galway . [2008,2012]
       Fuzzy:           :nuno :address :Dublin . 0.9
       Access Control:  :nuno :address :Galway . [nl]
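The temporal domain can be sketched as a triple that carries a validity interval, with a lookup that keeps only the statements holding in a given year. This is an illustrative Python model, not the Annotated RDF(S) semantics: it assumes both interval bounds are inclusive, and the class and function names are hypothetical. The first statement is the one from the slide; the second is made up for contrast.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TemporalTriple:
    subject: str
    predicate: str
    obj: str
    start: int  # first year the statement holds (assumed inclusive)
    end: int    # last year the statement holds (assumed inclusive)


# The first statement is from the slide; the second is hypothetical.
data = [
    TemporalTriple(":nuno", ":address", ":Galway", 2008, 2012),
    TemporalTriple(":nuno", ":address", ":Dublin", 2013, 2014),
]


def holds_in(triples, year):
    """Return the plain triples whose annotation interval contains `year`."""
    return [
        (t.subject, t.predicate, t.obj)
        for t in triples
        if t.start <= year <= t.end
    ]


in_2010 = holds_in(data, 2010)
```

The fuzzy and access control domains would follow the same pattern with a degree or a credential in place of the interval.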
  • 67. Annotated RDF(S): AC Query
       Data
         ap address Galway.
         ap birthday "24/03" : [[ap]].
         nl address Galway.
         nl birthday "23/12" : [[nl]].
       AnQL AC Query
         SELECT * WHERE { $person :birthday $birthday :[[nl]] . }
       Result
         $person   $birthday
         ap        "24/03"
         nl        "23/12"
  • 71. Conclusions
       You can use XSPARQL to easily merge data from different sources in a common language
       Annotated RDF can provide Access Control over RDF data
       Useful links
         XSPARQL: http://xsparql.deri.org/
         XSPARQLViz: http://deri-srvgal33.nuig.ie:8080/XsparqlViz/
         logainm: http://data.logainm.ie/
       Thank you! Questions?