RDFS with Attribute Equations via SPARQL Rewriting

  • 159 views
Uploaded on

Slides of presentation I gave in the Reasoning track of the 10th ESWC 2013 in Montpellier.

Slides of presentation I gave in the Reasoning track of the 10th ESWC 2013 in Montpellier.

More in: Technology
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Be the first to comment
    Be the first to like this
No Downloads

Views

Total Views
159
On Slideshare
0
From Embeds
0
Number of Embeds
0

Actions

Shares
Downloads
0
Comments
0
Likes
0

Embeds 0

No embeds

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
    No notes for slide

Transcript

  • 1. RDFS with Attribute Equationsvia SPARQL RewritingStefan Bischof and Axel PolleresVienna University of TechnologySiemens AG Österreich
  • 2. Place de la Comédie, Montpellier© http://grasstain.wordpress.comCO2emissionsper personpopulationdensityavg.temp.in Mayin Fpopulationareaavg. temp.in May in °C
  • 3. Use equations to infer missing numbersWhat is the population density of Montpellier?‣ Montpellierpopulation: 252 998area: 56 880 000 m2population density: ??? in people/km2‣ Can we infer population density from given data?computations not supported by Semantic Web reasoners‣ How can we get area in km2?unit conversion by computation3
  • 4. RDFS with Attribute Equations via SPARQL Rewritingthe big picture4SPARQLqueryRDFdataResultsTriple store?
  • 5. What is the population density of Montpellier?written in SPARQL5SELECT ?densWHERE { :Montpellier :populationDensity ?dens .}SELECT ?city ?densWHERE {:Montpellier :populationDensity ?mdens .?city rdf:type :City ;:populationDensity ?dens .FILTER(?dens > ?mdens)}
  • 6. RDFS with Attribute Equations via SPARQL Rewritingthe big picture6SPARQLqueryRDFSontologyRDFdataResultsTriple store?
  • 7. We need RDFS for integrating different sources‣ Unify RDFS properties of different data sources‣ Use unified name for population‣ Use already implemented RDFS reasonerswhich allow SPARQL queries7dbp:populationTotal rdfs:subpropertyOf :populationgeo:population rdfs:subpropertyOf :population
  • 8. RDFS with Attribute Equations via SPARQL Rewritingthe big picture8SPARQLqueryRDFSontologyRDFdataResultsTriple store+ equations?
  • 9. Syntax RDFS and equationsconverting between RDFS and DLRDFSEA1 v A2 A1 rdfs:subClassOf A29P v A P rdfs:domain A9P v A P rdfs:range A9U v A U rdfs:domain AP1 v P2 P1 rdfs:subPropertyOf P2U1 v U2 U1 rdfs:subPropertyOf U2U0 = f(U1, . . . , Un) U0 definedByEquation“f(U1, . . . , Un)00A(x) x rdf :type AR(x, y) x R yU(x, q) x U ”q”ˆˆowl:rational9dbp:population rdfs:domain dbp:populatedPlace.geonames:population rdfs:subPropertyOf dbp:population .dbp:populationDensity :definedByEquation“dbp:population / dbp:area” .:Montpellier dbp:population 252998 .
  • 10. Extend the DLRDFS Semantics by equations‣ RDFS + attributes: usual DL model theoretic semantics‣ For an equation, infer a new valueif all other attributes of the equation are given andthere is no division by zerothen the computation result is the new attribute value‣ satisfied in‣ Query answers are not necessarily finiteABoxes inconsistent with equations10U0 = f(U1, . . . , Un)if 8x, y1, . . . , yn(n^i=1(x, yi) 2 UIi ) ^ defined(f(U1/y1, . . . , Un/yn)) (x, eval(f(U1/y1, . . . , Un/yn)) 2 UI0Idbp:populationDensity :definedByEquation“dbp:population / dbp:area” .:Montpellier dbp:population 252998 .:Montpellier dbp:area 56.88 .:Montpellier dbp:populationDensity 4447.93 .
  • 11. Formulate equations as rulesn rules for equations in n variables‣ Equation given for population density‣ Formulate equation as rule‣ More rules needed to cover all directions11areakm2 (X, A) ( popDensity(X, PD), population(X, P), A = P ÷ PD.areakm2 =populationpopDensitypopDensity(X, PD) ( population(X, P), areakm2 (X, A), PD = P ÷ A.population(X, P) ( areakm2 (X, A), popDensity(X, PD), P = A ⇥ PD.
  • 12. ‣ DBpedia: population 252 998, area 56.88 km2‣ Apply rule: population density 4447.925...293‣ Apply rules: population 252 997.999...999 and area 56.880...003‣ Rules engine computes population density again: 4447.925...275Forward chaining often does not terminatebecause of rounding errors12popDensity(X, PD) ( population(X, P), areakm2 (X, A), PD = P ÷ A.areakm2 (X, A) ( population(X, P), popDensity(X, PD), A = P ÷ PD.population(X, P) ( areakm2 (X, A), popDensity(X, PD), P = A ⇥ PD.
  • 13. Naive backward chaining does not terminateunfolding of recursive rules blows up arbitrarily‣ To compute the population density query for population densitymakes no sense13areakm2 (X, A) ( popDensity(X, PD), population(X, P), A = P ÷ PD.popDensity(X, PD) ( population(X, P), areakm2 (X, A), PD = P ÷ A.population(X, P) ( areakm2 (X, A), popDensity(X, PD), P = A ⇥ PD.
  • 14. Rules are problematic for applying equationsbreak the infinite series of rule applications‣ Need to specify all directions of the equationnot as intuitive and short as equations‣ Forward chaining often does not terminatedivision or multiplication is often enough for non-terminationimplementation dependent‣ Backward chaining does not terminateunfolding of recursive rules can blow up arbitrarilyeven for a single equation no termination‣ We have to break the infinite series of rule applications14
  • 15. RDFS with Attribute Equations via SPARQL Rewritingthe big picture15SPARQLqueryRDFSontologyRDFdataResultsTriple storePerfectRefrewriterUCQ+ equations??
  • 16. Query answering in DL-Lite: PerfectRefEncode TBox in the query [Calvanese et al., 2009]conj. query qTBox TABox APerfectRefQuery q’QueryevaluationAnswer query q over ontology (T,A)certain answers16
  • 17. RDFS with Attribute Equations via SPARQL Rewritingthe big picture17SPARQLqueryRDFSontologyRDFdataResultsTriple storePerfectRefrewriterUCQ+ equations + equations??
  • 18. Break the infinite series of equation applicationsAdorn attributes by used attributes18popDensitypopDensitypopDensity popDensityareakm2 areakm2popDensityareakm2 populationareamile2aream2popDensity =populationareakm2aream2 = areakm2 ⇥ 1 000 000 areamile2 = areakm2 ⇥ 2.590popDensity =populationareakm2aream2 = areakm2 ⇥ 1 000 000areamile2 = areakm2 ⇥ 2.590
  • 19. Extend the DL-Lite PerfectRef algorithmby equations and adorned attributes19Next, we extend the PerfectRef algorithm [3] which reformulates a conjunctive queryto directly encode needed TBox assertions in the query. The algorithm PerfectRefEin Algorithm 1 extends the original PerfectRef by equation axioms and conjunctivAlgorithm 1: Rewriting algorithm PerfectRefEInput: Conjunctive query q, TBox TOutput: Union (set) of conjunctive queries1 P := {q}2 repeat3 PÕ:= P4 foreach q œ PÕdo5 foreach g in q do // expansion6 foreach inclusion axiom I in T do7 if I is applicable to g then8 P := P fi)q[g/ gr(g, I)]*9 foreach equation axiom E in T do10 if g = Uadn(g)(x, y) is an (adorned) attribute atom, U œ vars(E) andvars(E) fl adn(g) = ÿ then11 P := P fi)q[g/ expand(g, E)]*12 until PÕ= P13 return PTable 2. Semantics of gr(g, I) of Algorithm 1
  • 20. ‣ Equations‣ Original query‣ Step 1: Expand popDens by E1‣ Step 2: Expand area by E2PerfectRef with adorned attributesquery rewriting20popDens(montpellier, X)E1: popDens =populationareakm2E2: aream2 = areakm2 ⇥ 1 000 000population{popDens}(montpellier, P),aream2{popDens,area}(montpellier, A1), A = A1 ⇤ 1000000, X = P/Apopulation{popDens}(montpellier, P),areakm2{popDens}(montpellier, A), X = P/A
  • 21. PerfectRefE is sound but incomplete in generalconditions for completeness‣ ABox is data-coherent with the TBoxmodel of each object has at most one value per attributeattribute inclusions must also be considered‣ For data-coherent ABoxes wrt. the TBox andrewritten SPARQL queries (free of non-distinguished variables)PerfectRefE is sound and complete21:M dbp:area_km2 1 .:M dbp:area_mi2 2.590 .:M dbp:area_km2 1 .:M dbp:area_mi2 2.6 .
  • 22. RDFS with Attribute Equations via SPARQL Rewritingthe big picture22SPARQLqueryRDFSontologyRDFdataResultsTriple storeSPARQLtransformatorPerfectRefrewriterUCQ+ equations + equations+ equations?+ equations
  • 23. Rewrite SPARQL queries by PerfectRef‣ SPARQL basic graph patterns (BGPs)the fundamental building block for graph pattern matching‣ BGPs can be expressed by conjunctive queries [Perez et al., 2009]no variables as predicates :Montpellier ?prop “Montpellier”no variables for classes :Montpellier rdf:type ?class‣ Convert BGPs to CQs, rewrite CQs to UCQs, convert UCQs to SPARQL‣ SPARQL translatorrewrite each BGP independently by PerfectRefE‣ Variable assignments are rewritten to SPARQL 1.1 BIND23
  • 24. ‣ CQ 1:‣ CQ 2:‣ CQ 3:SELECT ?X WHERE {{ :Montpellier dbo:populationDensity ?X . }UNION{ :Montpellier dbo:populationTotal ?p ; dbp:areaTotalKm ?a .BIND (?p/?a as ?X) }UNION{ :Montpellier dbo:populationTotal ?p ; dbo:area ?a2 .BIND (?a2/1000000 as ?a) BIND (?p/?a as ?X) }}Rewrite SPARQL query by PerfectRefrewritten SPARQL query24popDens(montpellier, X)population{popDens}(montpellier, P),area{popDens}(montpellier, A), X = P/Apopulation{popDens}(montpellier, P), A = A1 ⇤ 1000000,aream2{popDens,area}(montpellier, A1), X = P/A
  • 25. RDFS with Attribute Equations via SPARQL Rewritingthe big picture25SPARQLqueryRDFSontologyRDFdataResultsTriple storeSPARQLtransformatorPerfectRefrewriterUCQSPARQLengine+ equations + equations+ equations+ equations
  • 26. How does the rewriting algorithm perform onreal world data?‣ Collected data about citiesfrom several sources (e.g., DBpedia, Eurostat)254 081 triples for 3161 city contextsinconsistent and consistent dataset‣ 6 equations, 2 subProperties and 1 subClass axioms‣ 4 different queries‣ 3 implementationsJena forward rules with ARQJena forward rules with ARQ with noValueRewriting with ARQ26(?city :area ?ar)(?city :populationDensity ?pd)product(?ar, ?pd, ?p)noValue (?city, :populationDensity)-> (?city :population ?p)
  • 27. Jena rules with ARQ gives no results‣ n rules for an equation in n variables‣ Forward chaining implementation‣ No query returned any result within 10 minutes‣ Even for reduced dataset27
  • 28. Rewriting is significantly faster thanJena rules with noValueQuery 1Query 2Query 3Query 40 10 20 30 40Jena rules with noValueSPARQL query rewritingQuery response time in seconds28
  • 29. Conclusions‣ Reasoning about equations on numerical propertiesis important and feasiblelots of numeric open data available‣ Rule engines are not well suited for such attribute equationsespecially on real world data‣ Query rewriting enables such reasoningon top of off-the-shelf SPARQL enginesalso possible on public SPARQL endpoints‣ Query rewriting can be significantly fasterthan forward chaining rule engines29