Integrating Heterogeneous Data by Extending Semantic Web Standards

My PhD VIVA presentation entitled "Integrating Heterogeneous Data by Extending Semantic Web Standards"


  1. 1. Integrating Heterogeneous Data by Extending Semantic Web Standards Nuno Lopes 26 November, 2012
  2. 2. Data Integration: Pizza Delivery Company
      How many pizzas were delivered to Nuno’s home during his PhD?
      Sources: Clients / Orders (RDF, XML, RDB) and Deliveries
      XML (2012):
          <person>
            <name>Nuno</name>
            <address>Dublin</address>
          </person>
      RDB (from 2008 to 2012), columns person, address:
          Nuno, Galway
  13. 13. Using Query Languages for Data Integration
      How many pizzas were delivered to Nuno’s home during his PhD?
      Mediator / Data Warehouse
      SQL (result: relation)
          select address from clients
          where person = "Nuno"
      SPARQL (result: solution sequence)
          select $lat $long from <geonames.org/..>
          where { $point geo:lat $lat; geo:long $long }
      XQuery (result: sequence of items)
          for $person in doc("eat.ie/..")
          return $person//name
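The three result shapes named on this slide (relation, solution sequence, sequence of items) can be illustrated with a minimal Python sketch. The sample data, the coordinate values, and the function names are assumptions made for illustration; the RDF part uses a hand-written pattern match over tuples rather than a real SPARQL engine.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical sample data standing in for the sources on the slide.
ORDERS_XML = (
    "<orders><person><name>Nuno</name>"
    "<address>Dublin</address></person></orders>"
)
TRIPLES = [
    ("ex:galway", "geo:lat", "53.27"),   # illustrative values only
    ("ex:galway", "geo:long", "-9.05"),
]


def sql_addresses(person):
    """Relational source: the result is a relation (a list of rows)."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE clients (person TEXT, address TEXT)")
    con.execute("INSERT INTO clients VALUES ('Nuno', 'Galway')")
    rows = con.execute(
        "SELECT address FROM clients WHERE person = ?", (person,)
    ).fetchall()
    con.close()
    return rows


def xml_names(xml_text):
    """XML source: the result is a sequence of items (here, element texts)."""
    root = ET.fromstring(xml_text)
    return [name.text for name in root.iter("name")]


def rdf_coordinates(triples):
    """RDF source: the result is a sequence of solution mappings (dicts)."""
    lats = {s: o for s, p, o in triples if p == "geo:lat"}
    longs = {s: o for s, p, o in triples if p == "geo:long"}
    return [
        {"point": s, "lat": lats[s], "long": longs[s]}
        for s in lats
        if s in longs
    ]
```

Each function returns a differently shaped value, which is the mismatch a mediator has to bridge before the results can be combined.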
  24. 24. How to represent context information in RDF?
      RDF triples: not enough!
          :nuno :address :Galway .
          :nuno :address :Dublin .
      Domain vocabulary/ontology: no defined semantics!
          :address1 a :AddressRecord ;
              :person :nuno ;
              :address :Galway ;
              :start "2008" ;
              :end "2012" .
      Reification: no defined semantics!
          :address1 rdf:type rdf:Statement ;
              rdf:subject :nuno ;
              rdf:predicate :address ;
              rdf:object :Dublin ;
              :loc <http://eat.ie/> .
      Named Graphs
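Why plain triples are "not enough" can be shown with a small Python sketch that attaches a validity interval to each statement. This is only an illustration of the idea of context information; it is not the Annotated RDF(S) formalism from the thesis. The class name, the helper, and the interval assumed for the Dublin address are hypothetical.

```python
from typing import NamedTuple


class AnnotatedTriple(NamedTuple):
    subject: str
    predicate: str
    obj: str
    start: int  # first year in which the statement holds (inclusive)
    end: int    # last year in which the statement holds (inclusive)


# Plain triples: both addresses are asserted, with no way to tell them apart.
PLAIN = {
    (":nuno", ":address", ":Galway"),
    (":nuno", ":address", ":Dublin"),
}

# Same statements with a temporal annotation (Dublin's interval is assumed).
ANNOTATED = [
    AnnotatedTriple(":nuno", ":address", ":Galway", 2008, 2012),
    AnnotatedTriple(":nuno", ":address", ":Dublin", 2012, 2012),
]


def addresses_in(triples, person, year):
    """Return the addresses of `person` that are valid in `year`."""
    return [
        t.obj
        for t in triples
        if t.subject == person
        and t.predicate == ":address"
        and t.start <= year <= t.end
    ]
```

With the annotation in place, `addresses_in(ANNOTATED, ":nuno", 2010)` can single out the Galway address, which the plain triple set cannot do.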
  31. 31. Hypothesis and Research Questions
      Hypothesis: Efficient data integration over heterogeneous data sources can be achieved by
        • a combined query language that accesses heterogeneous data in its original sources
        • optimisations for efficient query evaluation for this language
        • an RDF-based format with support for context information
      Research Questions
        • How to design a query language that bridges the different formats?
        • Can we reuse existing optimisations from other query languages?
        • How to adapt RDF and SPARQL for context information?
  32. 32. Outline and Contributions XSPARQL [JoDS2012], Syntax & Semantics Implementation [EPIA2011] Evaluation Annotated RDF(S) [AAAI2010], Annotated RDF(S) [ISWC2010], AnQL: Annotated SPARQL [JWS2012] Architecture
  39. 39. XSPARQL
      Transformation language between RDB, XML, and RDF
      Syntactic extension of XQuery
      Semantics based on XQuery’s semantics
      Why based on XQuery?
        • Expressive language
        • Use as scripting language
        • Arbitrary nesting of expressions
  40. 40. Same Language for each Format
      General form of an expression:
          for var in Expr | let var := Expr
          for SelectSpec from RelationList where WhereSpecList
          for varlist DatasetClause where { pattern }
          where Expr
          order by Expr
          return Expr
      XSPARQL over RDB:
          for address as $address from clients
          where person = "Nuno"
          return $address
      XSPARQL over XML (plain XQuery):
          for $person in doc("eat.ie/..")
          return $person//name
      XSPARQL over RDF:
          for $lat $long from <geonames.org/..>
          where { $point geo:lat $lat; geo:long $long }
          return $lat
  48. 48. Creating RDF with XSPARQL
      Convert online orders into RDF
          prefix : <http://pizza-vocab.ie/>
          for $person in doc("eat.ie/..")//name
          construct { :meal a :FoodOrder; :author {$person} }
      construct clause generates RDF
      Arbitrary XSPARQL expressions in subject, predicate, and object
      Query result
          @prefix : <http://pizza-vocab.ie/> .
          :meal a :FoodOrder .
          :meal :author "Nuno" .
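For readers unfamiliar with XSPARQL, the effect of the construct query above can be approximated in Python. This is a rough sketch that iterates over the name elements and emits Turtle lines by string formatting; the function name and the inline sample XML are assumptions, and no escaping of special characters in literals is attempted.

```python
import xml.etree.ElementTree as ET

PREFIX = "@prefix : <http://pizza-vocab.ie/> ."


def construct_orders(xml_text):
    """Emit one :FoodOrder description per <name> element, as Turtle text."""
    root = ET.fromstring(xml_text)
    lines = [PREFIX]
    for name in root.iter("name"):
        # Mirrors: construct { :meal a :FoodOrder; :author {$person} }
        lines.append(":meal a :FoodOrder .")
        lines.append(f':meal :author "{name.text}" .')
    return "\n".join(lines)


SAMPLE = "<person><name>Nuno</name><address>Dublin</address></person>"
```

Calling `construct_orders(SAMPLE)` yields the same three lines as the query result shown on the slide.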
  54. 54. Integration Query Example
      “Display pizza deliveries in Google Maps using KML”
          for $person in doc("eat.ie/...")//name
          for address as $address from clients
          where person = $person
          let $uri := fn:concat("api.geonames.org/search?q=", $address)
          for $lat $long from $uri
          where { $point geo:lat $lat; geo:long $long }
          return <kml>
                   <lat>{$lat}</lat>
                   <long>{$long}</long>
                 </kml>
      More involved XSPARQL queries:
        • RDB2RDF Direct Mapping: ∼130 LOC
        • R2RML: ∼290 LOC
  55. 55. Language Semantics
      Extension of the XQuery Evaluation Semantics
        • “for” expressions add variables and values to the dynamic environment
        • Reuse semantics of original languages for the new expressions
      Evaluation rule for the SPARQL “for” expression:
      $$
      \frac{
      \begin{array}{l}
      \mathit{dynEnv}.\mathit{globalPosition} = (\mathit{Pos}_1, \dots, \mathit{Pos}_j) \\
      \mathit{dynEnv} \vdash \texttt{fs:dataset}(\mathit{DatasetClause}) \Rightarrow \mathit{Dataset} \\
      \mathit{dynEnv} \vdash \texttt{fs:sparql}(\mathit{Dataset}, \mathit{WhereClause}, \mathit{SolutionModifier}) \Rightarrow \mu_1, \dots, \mu_m \\
      \mathit{dynEnv} + \mathit{globalPosition}(\mathit{Pos}_1, \dots, \mathit{Pos}_j, 1) + \mathit{activeDataset}(\mathit{Dataset}) + \mathit{varValue}(\mathit{Var}_1 \Rightarrow \texttt{fs:value}(\mu_1, \mathit{Var}_1); \dots; \mathit{Var}_n \Rightarrow \texttt{fs:value}(\mu_1, \mathit{Var}_n)) \vdash \mathit{ExprSingle} \Rightarrow \mathit{Value}_1 \\
      \quad\vdots \\
      \mathit{dynEnv} + \mathit{globalPosition}(\mathit{Pos}_1, \dots, \mathit{Pos}_j, m) + \mathit{activeDataset}(\mathit{Dataset}) + \mathit{varValue}(\mathit{Var}_1 \Rightarrow \texttt{fs:value}(\mu_m, \mathit{Var}_1); \dots; \mathit{Var}_n \Rightarrow \texttt{fs:value}(\mu_m, \mathit{Var}_n)) \vdash \mathit{ExprSingle} \Rightarrow \mathit{Value}_m
      \end{array}
      }{
      \mathit{dynEnv} \vdash \texttt{for}\ \$\mathit{Var}_1 \cdots \$\mathit{Var}_n\ \mathit{DatasetClause}\ \mathit{WhereClause}\ \mathit{SolutionModifier}\ \texttt{return}\ \mathit{ExprSingle} \Rightarrow \mathit{Value}_1, \dots, \mathit{Value}_m
      }
      $$
      Annotations shown on the integration query:
        • for $person in doc("eat.ie/...")//name: dynEnv.varValue holds {{ person ⇒ "nuno" }}
        • for address as $address from clients where person = $person: eval(SQL SELECT), result is a relation
        • for $lat $long from $uri where { $point geo:lat $lat; geo:long $long }: eval(SPARQL SELECT), result is a solution sequence (SPARQL compatible solutions)
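The evaluation rule on this slide can be read operationally: the body of the for expression is evaluated once per solution mapping, each time in a dynamic environment extended with the position, the active dataset, and the variable bindings. The following Python sketch mirrors that reading under simplifying assumptions: the environment is a plain dict, solutions are dicts, the body is a Python callable, and an unbound variable is represented as None. The function name is hypothetical.

```python
def eval_sparql_for(dyn_env, dataset, variables, solutions, body):
    """Evaluate `body` once per solution mapping, as in the rule's premises.

    dyn_env   -- dict with optional keys "globalPosition" (tuple) and
                 "varValue" (dict of variable bindings)
    dataset   -- value recorded as the active dataset
    variables -- names Var_1 .. Var_n bound by the for clause
    solutions -- list of solution mappings mu_1 .. mu_m (dicts)
    body      -- callable taking the extended environment (ExprSingle)
    """
    values = []
    for i, mu in enumerate(solutions, start=1):
        env = dict(dyn_env)
        # globalPosition(Pos_1, ..., Pos_j, i)
        env["globalPosition"] = dyn_env.get("globalPosition", ()) + (i,)
        # activeDataset(Dataset)
        env["activeDataset"] = dataset
        # varValue(Var_k => fs:value(mu_i, Var_k)) added to existing bindings
        var_value = dict(dyn_env.get("varValue", {}))
        for var in variables:
            var_value[var] = mu.get(var)
        env["varValue"] = var_value
        values.append(body(env))
    # Value_1, ..., Value_m
    return values
```

The outer environment is copied rather than mutated, so bindings introduced by one iteration do not leak into the next.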
  69. 69. Language Implementation
      Reuse components (SPARQL engine, Relational Database)
      Implemented XSPARQL by rewriting to XQuery
      Semantics implemented by substitution of bound variables
      Bound Variable Substitution
        • Bound variables are replaced by their value at runtime
        • Implemented in the generated XQuery
        • Pushing the variable bindings into the respective query engine
      XSPARQL query:
          for $person in doc("eat.ie/...")//name
          for address as $address from clients
          where person = $person
      Generated query string:
          fn:concat("SELECT address from clients where person = ", $person)
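Bound variable substitution can be sketched in Python with sqlite3: for each value bound by the outer loop, a SQL string is built by concatenation and sent to the database. The quoting helper is an assumption added here so that the generated string is valid SQL; the slide's fn:concat call does not show how values are quoted. Table contents and function names are hypothetical.

```python
import sqlite3


def sql_literal(value):
    """Quote a string as a SQL literal, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def build_sql(person):
    """Substitute the bound value of $person into the query text."""
    return "SELECT address FROM clients WHERE person = " + sql_literal(person)


def addresses_for(con, persons):
    """Run one generated SQL query per outer binding of $person."""
    results = []
    for person in persons:
        for (address,) in con.execute(build_sql(person)):
            results.append((person, address))
    return results


def demo():
    """Small end-to-end run against an in-memory database."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE clients (person TEXT, address TEXT)")
    con.execute("INSERT INTO clients VALUES ('Nuno', 'Galway')")
    try:
        return addresses_for(con, ["Nuno", "O'Neil"])
    finally:
        con.close()
```

In application code, parameter placeholders (`?` in sqlite3) are usually preferable to string concatenation; concatenation is used here only because it is the technique the slide describes.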
  70. 70. Implementation
      [Architecture diagram: an XSPARQL query is passed to the Query Rewriter, which produces an XQuery query for an enhanced XQuery engine; the engine issues SQL, XQuery and SPARQL queries over RDB, XML and RDF sources and outputs XML / RDF]
      Custom XQuery functions implemented as Java methods
  78. 78. XSPARQL Evaluation
      XMark, XMarkRDF, XMarkRDB
      Experimental Results, compared to native XQuery (XMark)
        RDB and RDF: same order of magnitude for most queries
        Exception: nested queries over RDF (self-joining data) are several orders of magnitude slower, due to the number of calls to the SPARQL engine
  79. 79. Rewriting techniques for Nested Queries
      Q8: "List the names of persons and the number of items they bought."
        for $person $name from <input.rdf>
        where { $person foaf:name $name }
        return <item person="{$name}">
          {count(
            for * from <input.rdf>
            where { $ca :buyer $person .}
            return $ca
          )}
        </item>
      Inner query with the bound $person substituted:
        sparqlQuery( fn:concat("SELECT * from <input.rdf> where { $ca :buyer ", $person, " .}") )
      Nested Loop rewriting (join in XQuery): the inner query is evaluated once and returns the bindings for all persons (nuno, axel, ...):
        let $aux := sparqlQuery( fn:concat("SELECT * from <input.rdf> where { $ca :buyer $person .}") )
      Join in SPARQL
      Other optimisations: applied to a related language, SPARQL2XQuery [SAC2008]
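The cost difference between substituting `$person` into the inner query and the Nested Loop rewriting comes down to how often the SPARQL engine is called. The Python sketch below illustrates that with a toy in-memory engine; the `Engine` class, the triples and the item counts are all made up for this example and do not come from the XMark data.

```python
TRIPLES = [
    ("p1", "foaf:name", "nuno"),
    ("p2", "foaf:name", "axel"),
    ("ca1", ":buyer", "p1"),
    ("ca2", ":buyer", "p1"),
    ("ca3", ":buyer", "p2"),
]


class Engine:
    """Hypothetical stand-in for a SPARQL engine that counts how often it is called."""

    def __init__(self, triples):
        self.triples = triples
        self.calls = 0

    def match(self, predicate, obj=None):
        self.calls += 1
        return [(s, o) for (s, p, o) in self.triples
                if p == predicate and (obj is None or o == obj)]


def per_binding(engine):
    # One inner query per person, with the person substituted into the query
    persons = engine.match("foaf:name")
    return {name: len(engine.match(":buyer", person)) for person, name in persons}


def nested_loop(engine):
    # The inner query is fetched once; the join happens in the host language
    persons = engine.match("foaf:name")
    aux = engine.match(":buyer")
    return {name: sum(1 for _, buyer in aux if buyer == person)
            for person, name in persons}


e1, e2 = Engine(TRIPLES), Engine(TRIPLES)
print(per_binding(e1), e1.calls)
print(nested_loop(e2), e2.calls)
```

With the first strategy the number of engine calls grows with the number of persons, while the second always issues two calls and pays for the join in the host language instead.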
  89. 89. Evaluation of different rewritings - XMarkRDF Q8
      [Plot: time in seconds (log scale) against dataset size in MB (log scale, 1 to 100), with series XSPARQLRDF, Nested Loop and SPARQL rewrite]
      Optimised rewritings show promising results
      Best results: SPARQL-based
      Results included in [JoDS2012]
  90. 90. Evaluation of different rewritings - XMarkRDB Q8
      [Plot: time in seconds (log scale) against dataset size in MB (log scale, 1 to 100), with series Nested Loop and XSPARQLRDB]
      Not so effective for RDB, requires different optimisations
      Due to schema representation?
      Additional results for this thesis
  91. 91. Data Integration: Pizza Delivery Company
      [Diagram: XSPARQL placed between the XML, RDF and RDB sources]
      2012, RDF / XML (Clients / Orders): <person><name>Nuno</name><address>Dublin</address></person>
      From 2008 to 2012, XML / RDB (Clients / Orders / Deliveries): person = Nuno, address = Galway
  97. 97. Use Annotated RDF(S)!
      Annotations refer to a specific domain
        Temporal: :nuno :address :Galway . [2008,2012]
        Fuzzy: :nuno :address :Dublin . 0.9
        Provenance: :nuno :address :Dublin . <http://eat.ie>
      Representation for the values of each annotation domain
  99. 99. Annotated RDF(S) Inference example
      Inference rules are independent of the annotation domain
      RDFS subPropertyOf ("sp") rule:
        ?Prop1 sp ?Prop2 .   ?x ?Prop1 ?y .   ⇒   ?x ?Prop2 ?y .
      Example:
        :address sp foaf:based_near .   :nuno :address :Galway .   ⇒   :nuno foaf:based_near :Galway .
  102. 102. Annotated RDF(S) Inference example
      Inference rules are independent of the annotation domain
      Annotated RDFS subPropertyOf ("sp") rule:
        ?Prop1 sp ?Prop2 . ?v1   ?x ?Prop1 ?y . ?v2   ⇒   ?x ?Prop2 ?y . ?v1 ⊗ ?v2
      Example:
        :address sp foaf:based_near . [2009,+∞]   :nuno :address :Galway . [2008,2012]   ⇒   :nuno foaf:based_near :Galway . [2009,+∞] ⊗ [2008,2012]
      Extra rule to group the annotations of identical triples (⊕):
        :nuno :address :Galway . [2009,+∞]
        :nuno :address :Galway . [2008,2012]
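To make the two operators concrete, the following Python sketch instantiates them for the temporal domain. It assumes that ⊗ is interval intersection and ⊕ is interval union, which is the usual reading for temporal annotations but is not spelled out on the slide; the function names and the tuple encoding of annotated triples are hypothetical.

```python
import math


def otimes(a, b):
    """Assumed temporal ⊗: intersection of two closed intervals, or None if empty."""
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    return (lo, hi) if lo <= hi else None


def oplus(a, b):
    """Assumed temporal ⊕: union of two intervals, merged when they overlap."""
    if otimes(a, b) is None:
        return sorted([a, b])
    return [(min(a[0], b[0]), max(a[1], b[1]))]


def sp_rule(sp_triple, data_triple):
    """Annotated subPropertyOf rule: the derived triple is annotated with v1 ⊗ v2."""
    (prop1, _, prop2), v1 = sp_triple
    (x, prop, y), v2 = data_triple
    if prop != prop1:
        return None
    v = otimes(v1, v2)
    return None if v is None else ((x, prop2, y), v)


derived = sp_rule(
    ((":address", "sp", "foaf:based_near"), (2009, math.inf)),
    ((":nuno", ":address", ":Galway"), (2008, 2012)),
)
print(derived)
print(oplus((2009, math.inf), (2008, 2012)))
```

Under these assumptions the derived triple holds for [2009,2012], and grouping the two annotations of `:nuno :address :Galway` gives [2008,+∞]. Swapping in a different pair of functions for another domain (for example fuzzy values) leaves `sp_rule` unchanged, which is the sense in which the rules are independent of the annotation domain.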
