We propose a set of optimizations that can be applied to a given SPARQL query, and that guarantee that the optimized query has the same answers under bag semantics as the original query, provided that the queried RDF graph validates certain SHACL constraints. We prove the correctness of these optimizations and show how they can be propagated to larger queries while preserving answers. Further, we prove the confluence of rewritings that employ these optimizations, guaranteeing convergence to the same optimized query regardless of the rewriting order.
3. RDF
▶ Standard for web data
▶ W3C Rec. since 1999
▶ RDF 1.0, 2004 https://www.w3.org/TR/rdf-primer/
▶ RDF 1.1, 2014 https://www.w3.org/TR/rdf11-concepts/
▶ W3C working draft for RDF 1.2, 2023 https://www.w3.org/TR/rdf12-concepts/
4. RDF Syntax
▶ IRIs to reference resources on the web
▶ Statements as nodes and arcs in a graph, in the form of triples
"(Subject, Predicate, Object)". E.g.,
"Mona Lisa has a creator whose value is Leonardo da Vinci"
Subject: https://en.wikipedia.org/wiki/Mona_Lisa
Predicate: http://purl.org/dc/terms/creator
Object: https://en.wikipedia.org/wiki/Leonardo_da_Vinci
7. RDF: Constraints?
W3C defines RDF as an "assertional logic," where each triple
expresses a simple proposition.
▶ This logical framework imposes a strict monotonic discipline
on the language, preventing the expression of closed-world
assumptions, local default preferences, and other commonly
used non-monotonic constructs.
8. SHACL
▶ Constraint language for RDF
▶ W3C Rec. since July 2017
Other constraint languages:
▶ SPIN - SPARQL Syntax, (2009) 2011
https://www.w3.org/submissions/2011/SUBM-spin-sparql-20110222/
▶ IBM Resource Shape 2.0, 2014 https://www.w3.org/submissions/shapes/
▶ Shape Expressions Language 2.0, 2017, http://shex.io/shex-semantics-20170713/
9. SHACL
▶ relies on the notion of "shapes"
e.g.,
:EmployeeNode a sh:NodeShape ;
    sh:targetClass :Employee ;
    sh:property [ sh:path :hasAddress ;
                  sh:nodeKind sh:Literal ;
                  sh:maxCount 1 ; sh:minCount 1 ;
                  sh:datatype xsd:string ] ;
    sh:property [ sh:path :hasAddress ;
                  dash:uniqueValueForClass :Employee ] .
10. SHACL Shape
▶ a shape consists of a shape name, a target definition and constraint definitions
e.g.,
:EmployeeNode a sh:NodeShape ;                            # shape name
    sh:targetClass :Employee ;                            # target definition
    sh:property [ sh:path :hasAddress ;                   # constraint definitions
                  sh:nodeKind sh:Literal ;
                  sh:maxCount 1 ;
                  sh:minCount 1 ;
                  sh:datatype xsd:string ] ;
    sh:property [ sh:path :hasAddress ;
                  dash:uniqueValueForClass :Employee ] .
11. SHACL: Constraint Validation
Consider the following RDF graph (the two :Employee resources) and SHACL shape
(:EmployeeNode), both written in Turtle syntax:
:Ida a :Employee;
:hasID "001"^^xsd:int;
:hasAddress "Oslo".
:Ingrid a :Employee;
:hasID "002"^^xsd:int;
:hasAddress "Bergen".
:EmployeeNode a sh:NodeShape;
sh:targetClass :Employee;
sh:property [ sh:path :hasAddress;
sh:nodeKind sh:Literal;
sh:maxCount 1; sh:minCount 1;
sh:datatype xsd:string ];
sh:property [ sh:path :hasAddress;
dash:uniqueValueForClass
:Employee ].
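To make the validation step concrete, here is a minimal sketch in plain Python. It does not use a SHACL engine; it hand-codes the three checks of the :EmployeeNode shape (exactly one address, string-valued, unique among employees). The function name `validate_employee_shape`, the tuple encoding of triples and the use of Python ints for xsd:int literals are assumptions made for illustration only.

```python
from collections import defaultdict

# The slide's data graph as (subject, predicate, object) tuples.
# Simplification: IRIs are plain strings and xsd:int literals are Python ints.
GRAPH = [
    ("Ida", "a", "Employee"), ("Ida", "hasID", 1), ("Ida", "hasAddress", "Oslo"),
    ("Ingrid", "a", "Employee"), ("Ingrid", "hasID", 2), ("Ingrid", "hasAddress", "Bergen"),
]


def validate_employee_shape(graph):
    """Return violation messages for the hasAddress constraints (hypothetical helper)."""
    # Target nodes: every instance of :Employee (sh:targetClass).
    targets = sorted({s for (s, p, o) in graph if p == "a" and o == "Employee"})
    violations = []
    owners = defaultdict(set)  # address value -> employees that carry it
    for node in targets:
        values = [o for (s, p, o) in graph if s == node and p == "hasAddress"]
        if len(values) != 1:  # sh:minCount 1 and sh:maxCount 1
            violations.append(f"{node}: {len(values)} hasAddress values, expected 1")
        for value in values:
            if not isinstance(value, str):  # stands in for sh:datatype xsd:string
                violations.append(f"{node}: address {value!r} is not a string")
            owners[value].add(node)
    for value, nodes in owners.items():
        if len(nodes) > 1:  # dash:uniqueValueForClass :Employee
            violations.append(f"address {value!r} shared by {sorted(nodes)}")
    return violations


print(validate_employee_shape(GRAPH))
```

On the slide's graph the list of violations is empty. Adding a second address for :Ida that equals :Ingrid's address would break both the cardinality and the uniqueness constraint.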
18. SHACL: Abstract Syntax
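Let $\mathbf{S}$, $\mathbf{C}$ and $\mathbf{P}$ be countably infinite and mutually disjoint sets of
shape, class and property names.
A shape target $\tau_s$ and a constraint $\phi_s$ are expressions defined by
the grammar
\[
\begin{aligned}
\tau_s &:= \texttt{sh:targetClass}\ C \mid \texttt{sh:targetSubjectOf}\ P \mid \texttt{sh:targetObjectOf}\ P \\
\phi_s &:= {\geq_n}\,\alpha.\beta \mid {\leq_n}\,\alpha.\beta \mid \triangleright_{\tau_s} \alpha \mid \alpha_1 = \alpha_2 \mid \phi_s \wedge \phi_s \\
\beta &:= \top \mid C \mid s' \mid \neg\beta
\end{aligned}
\]
where $\alpha, \alpha_1, \alpha_2 \in \mathbf{P} \cup \{P^- \mid P \in \mathbf{P}\}$, $C \in \mathbf{C}$ and $s, s' \in \mathbf{S}$.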
19. SHACL: Abstract Syntax
In short syntax, shape targets are written as
$\tau_s := C \mid P \mid P^-$
while the constraints $\phi_s$ and $\beta$ are defined as before.
A shape in abstract syntax:
$\langle \mathit{Employee}, \tau_{\mathit{Employee}}, \phi_{\mathit{Employee}} \rangle$ with $\tau_{\mathit{Employee}} = \text{:Employee}$ and
$\phi_{\mathit{Employee}} = ({=_1}\,\mathit{hasAddress}.\top) \wedge (\triangleright_{\tau_{\mathit{Employee}}} \mathit{hasAddress})$.
20. SHACL
Once the context is clear, we simply write:
$\langle \mathit{Employee}, \text{:Employee}, ({=_1}\,\mathit{hasAddress}.\top) \wedge (\triangleright_{\tau_{\mathit{Employee}}} \mathit{hasAddress}) \rangle$
21. SPARQL
▶ Query language for RDF
▶ W3C Rec. since January 2008
▶ SPARQL 1.1, 2013 https://www.w3.org/TR/sparql11-query/
▶ W3C working draft for SPARQL 1.2, 2023 https://www.w3.org/TR/sparql12-update/
▶ W3C community draft for RDF-star and SPARQL-star, 2021 https://www.w3.org/2021/12/rdf-star.html
22. SPARQL: Query Variables?
▶ Queries need variables; SPARQL variables are
bound to RDF terms
▶ E.g., ?title, ?author, ?published
▶ As in SQL, variables are queried
with a SELECT statement
▶ E.g., SELECT ?title ?author ?published
A SELECT statement returns the query result as a table:
?title               ?author                  ?published
Games of no chance   Richard J. Nowakowski    1999
Calculated Bets      Steven S. Skiena         2001
▶ Bag semantics: the result is a multiset, so duplicate rows are retained
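As a small illustration of bag semantics, the following Python sketch projects a result table onto fewer columns and keeps multiplicities with `collections.Counter`. The first two rows come from the table above; the third row is a hypothetical addition, included only so that the projection produces a duplicate. The helper name `project` is likewise an assumption.

```python
from collections import Counter

# Solution mappings as (title, author, published) rows. The first two rows are
# from the slide; the third is a HYPOTHETICAL row added to create a duplicate.
ROWS = [
    ("Games of no chance", "Richard J. Nowakowski", 1999),
    ("Calculated Bets", "Steven S. Skiena", 2001),
    ("Hypothetical Title", "Steven S. Skiena", 2001),
]


def project(rows, columns):
    """Project rows onto the given column indexes, keeping multiplicities."""
    return Counter(tuple(row[i] for i in columns) for row in rows)


bag = project(ROWS, [1, 2])   # in the spirit of SELECT ?author ?published
distinct = set(bag)           # in the spirit of SELECT DISTINCT ?author ?published

print(bag)
print(distinct)
```

Under bag semantics the projected answer for Skiena occurs twice; only an explicit duplicate elimination collapses it to one row.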
25. SPARQL Algebra
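A SPARQL query is a graph pattern $P$ defined by the grammar
\[
\begin{aligned}
P := {} & B \mid \mathrm{Filter}_F(P) \mid \mathrm{Union}(P_1, P_2) \mid \mathrm{Join}(P_1, P_2) \mid \mathrm{Minus}(P_1, P_2) \\
& \mid \mathrm{Diff}_F(P_1, P_2) \mid \mathrm{Opt}_F(P_1, P_2) \mid \mathrm{Proj}_L(P) \mid \mathrm{Dist}(P)
\end{aligned}
\]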
26. SPARQL Algebra
E.g., consider a nested SPARQL query that retrieves the
names of employees and their office addresses:
SELECT ?y ?z WHERE { ?x :hasName ?y .
  { SELECT ?x ?z WHERE { ?x :hasOffice ?y . ?y :hasAddress ?z } } }
In SPARQL algebra,
$\mathrm{Proj}_{yz}(\mathrm{Join}(\mathit{hasName}(x, y), \mathrm{Proj}_{xz}(\mathrm{Join}(\mathit{hasOffice}(x, y), \mathit{hasAddress}(y, z)))))$
27. SPARQL Algebra
Upon simplification of the nested query (possible here, but not required), we
get:
$\mathrm{Proj}_{yz}(\mathit{hasName}(x, y)\ \mathit{hasOffice}(x, n)\ \mathit{hasAddress}(n, z))$.
28. SPARQL Algebra: some notions on query evaluation?
The semantics of graph patterns is defined in terms of (solution)
mappings, i.e., partial functions
$\mu : V \to T$ with (possibly empty) domain $\mathrm{dom}(\mu)$,
where $T = I \cup B \cup L$ is the set of RDF terms (IRIs, blank nodes and literals) and $V$ is a countably infinite
set of variables disjoint from $T$.
29. SPARQL Algebra: some notions on query evaluation?
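For a mapping $\mu : V \to T$, let
▶ $\mu|_L$ be the restriction of $\mu$ to $L \subseteq V$
▶ $\mu|_{\bar{L}}$ be the restriction of $\mu$ to $V \setminus L$
Evaluation of a SPARQL query $Q$ over an RDF graph $G$, denoted
by $Q^G$, returns a multiset (i.e., bag) of mappings, where $|\mu, Q^G|$ is the multiplicity of $\mu$ in $Q^G$.
▶ $Q^G|_{\bar{X}}$ is the multiset of mappings $\mu \in Q^G$ restricted to $V \setminus X$,
i.e.,
\[ \bigl|\mu,\ Q^G|_{\bar{X}}\bigr| = \sum_{\mu = \mu'|_{\bar{X}}} |\mu', Q^G| \]
▶ The support of the multiset $Q^G$, denoted by $\mathrm{sup}(Q^G)$, is
\[ \mathrm{sup}(Q^G) = \{\mu \mid |\mu, Q^G| > 0\} \]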
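The two notions above, restriction away from a set of variables and the support of a multiset, can be sketched with a `Counter` whose keys are mappings. The encoding of a mapping as a frozenset of (variable, term) pairs and the helper names `mapping`, `drop_vars` and `support` are assumptions for illustration; the example bag is made up.

```python
from collections import Counter


def mapping(**bindings):
    """A solution mapping as a hashable frozenset of (variable, term) pairs."""
    return frozenset(bindings.items())


def drop_vars(bag, xs):
    """Restrict every mapping to the variables outside xs, summing multiplicities."""
    out = Counter()
    for mu, count in bag.items():
        out[frozenset((v, t) for (v, t) in mu if v not in xs)] += count
    return out


def support(bag):
    """The set of mappings that occur at least once in the bag."""
    return {mu for mu, count in bag.items() if count > 0}


# A made-up answer bag: one mapping occurs once, another twice.
bag = Counter({mapping(x="Ida", y="Oslo"): 1, mapping(x="Ida", y="Bergen"): 2})
restricted = drop_vars(bag, {"y"})

print(restricted[mapping(x="Ida")])   # multiplicities 1 + 2 are summed
print(support(restricted))
```

Dropping `y` merges both mappings into the single mapping for `x`, whose multiplicity is the sum of the originals, while the support records only that this mapping occurs.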
31. Optimization: Problem Statement
Let $S$ be a set of SHACL shapes, and $Q$ a SPARQL query.
Our goal is to find an optimal query $Q'$ that is $S$-equivalent to the original
query $Q$, where
\[ Q \equiv_S Q' \quad\text{iff}\quad \forall G \text{ with } G \models S:\ Q^G = {Q'}^G \]
32. Optimization: Equivalences
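Let $U$ and $V$ be two graph patterns, and $S$ a set of SHACL shapes.
\[
\begin{aligned}
U \equiv_S V &\quad\text{iff}\quad \forall G \text{ with } G \models S:\ U^G = V^G \\
U \equiv_{S,y} V &\quad\text{iff}\quad \forall G \text{ with } G \models S:\ U^G|_{\bar{y}} = V^G|_{\bar{y}} \\
U \cong_{S,y} V &\quad\text{iff}\quad \forall G \text{ with } G \models S:\ \mathrm{sup}(U^G|_{\bar{y}}) = \mathrm{sup}(V^G|_{\bar{y}})
\end{aligned}
\]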
33. Optimization: Rewriting Rules
We then propose a set of query rewriting rules based on these
equivalences that:
1. reduce OPTIONAL patterns to JOIN patterns
2. remove redundant JOIN patterns
3. eliminate the DIST operator, etc.
34. An Example of Query Rewriting
Consider a SPARQL query,
$\mathrm{Dist}(\mathrm{Proj}_{xy}(\mathrm{Opt}_{\top}(\mathit{Employee}(x), \mathit{hasAddress}(x, y))))$
over graph G,
:Ida a :Employee;
:hasID "001"^^xsd:int;
:hasAddress "Oslo".
:Yacob a :Employee;
. . .
:Nils a :Employee;
. . .
:Ingrid a :Employee;
:hasID "002"^^xsd:int;
:hasAddress "Bergen".
. . .
35. An Example of Query Rewriting
Assume $G$ satisfies the shape
$\langle \mathit{Employee}, \text{:Employee}, ({=_1}\,\mathit{hasAddress}.\top) \wedge (\triangleright_{\tau_{\mathit{Employee}}} \mathit{hasAddress}) \rangle$
36. An Example of Query Rewriting
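Consider the query over $G$,
$\mathrm{Dist}(\mathrm{Proj}_{xy}(\mathrm{Opt}_{\top}(\mathit{Employee}(x), \mathit{hasAddress}(x, y))))$.
▶ Since $G$ satisfies the conjunct $({=_1}\,\mathit{hasAddress}.\top)$ of $\phi_{\mathit{Employee}}$, the "Opt
pattern" can be reduced to a "Join pattern":
$\mathrm{Dist}(\mathrm{Proj}_{xy}(\mathrm{Join}(\mathit{Employee}(x), \mathit{hasAddress}(x, y))))$.
▶ Since $G$ satisfies the conjunct $(\triangleright_{\tau_{\mathit{Employee}}} \mathit{hasAddress})$ of $\phi_{\mathit{Employee}}$, "Dist"
can be removed:
$\mathrm{Proj}_{xy}(\mathrm{Join}(\mathit{Employee}(x), \mathit{hasAddress}(x, y)))$.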
37. An Example of Query Rewriting
The original query and the two rewritten queries on the previous slide are $\equiv_S$-equivalent queries.
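The rewriting chain Dist/Opt, Dist/Join, Join can be checked on a toy graph with a few lines of Python. This is only a sketch under stated assumptions: answers are bags (`Counter`) of tuples, Proj over x and y is the identity here because x and y are the only variables, and the helper names (`employee`, `has_address`, `join`, `opt`, `dist`, `three_queries`) and both sample graphs are invented for illustration.

```python
from collections import Counter


def employee(graph):
    """Employee(x): bag of 1-tuples (x,)."""
    return Counter((s,) for (s, p, o) in graph if p == "a" and o == "Employee")


def has_address(graph):
    """hasAddress(x, y): bag of pairs (x, y)."""
    return Counter((s, o) for (s, p, o) in graph if p == "hasAddress")


def join(left, right):
    """Join on x; multiplicities multiply."""
    out = Counter()
    for (x,), n in left.items():
        for (x2, y), m in right.items():
            if x == x2:
                out[(x, y)] += n * m
    return out


def opt(left, right):
    """Opt with a trivially true filter: join, plus unmatched left rows with y unbound."""
    out = join(left, right)
    matched = {x for (x, _) in out}
    for (x,), n in left.items():
        if x not in matched:
            out[(x, None)] += n
    return out


def dist(bag):
    """Dist: every mapping of the support gets multiplicity 1."""
    return Counter(dict.fromkeys(bag, 1))


def three_queries(graph):
    e, a = employee(graph), has_address(graph)
    return dist(opt(e, a)), dist(join(e, a)), join(e, a)


CONFORMING = {
    ("Ida", "a", "Employee"), ("Ida", "hasAddress", "Oslo"),
    ("Ingrid", "a", "Employee"), ("Ingrid", "hasAddress", "Bergen"),
}
# Violates (=1 hasAddress): Yacob has no address at all.
NON_CONFORMING = CONFORMING | {("Yacob", "a", "Employee")}

q0, q1, q2 = three_queries(CONFORMING)
n0, n1, n2 = three_queries(NON_CONFORMING)
print(q0 == q1 == q2, n0 == n1)
```

On the conforming graph all three queries return the same bag. On the non-conforming graph the Opt query keeps the employee without an address, so the Opt-to-Join step is no longer answer-preserving, which is why the shape is needed.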
38. Optimization: Example of Rewriting Rules
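Lemma
Let $\langle s, \tau_s, \phi_s \rangle \in S$ with $({\geq_n} P.\top) \in \phi_s$ s.t. $n \geq 1$, and $\mathcal{P}$ a graph
pattern (written $\mathcal{P}$ to distinguish it from the property name $P$) s.t. $T \blacktriangleleft \mathcal{P}$. If $y \notin \mathrm{var}(\mathcal{P})$, then
1. $\mathrm{Opt}_F(\mathcal{P}, P(x, y)) \equiv_S \mathrm{Filter}_F(\mathrm{Join}(\mathcal{P}, P(x, y)))$
2. $\mathrm{Join}(\mathcal{P}, P(x, y)) \cong_{S,y} \mathcal{P}$
where
\[
T = \begin{cases}
C(x), & \text{if } \tau_s = C, \\
R(x, z), & \text{if } \tau_s = \exists R, \\
R^-(x, z), & \text{if } \tau_s = \exists R^-.
\end{cases}
\]
Corollary
Let $\langle s, \tau_s, \phi_s \rangle \in S$ with $({\geq_n} P.\top) \in \phi_s$ s.t. $n \geq 1$, and $\mathcal{P}$ a graph
pattern s.t. $T \blacktriangleleft \mathcal{P}$. If $y \notin \mathrm{var}(\mathcal{P} \cup F)$, then
1. $\mathrm{Filter}_F(\mathrm{Join}(\mathcal{P}, P(x, y))) \cong_{S,y} \mathrm{Filter}_F(\mathcal{P})$
2. $\mathrm{Opt}_F(\mathcal{P}, P(x, y)) \cong_{S,y} \mathrm{Filter}_F(\mathcal{P})$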
39. Optimization: Example of Rewriting Rules
Consider the query
$Q = \mathrm{Dist}(\mathrm{Proj}_{xy}(\mathrm{Filter}_{\mathrm{regex}(y, \text{"Smith"})}(\mathrm{Join}(\mathit{Student}(x)\ \mathit{lastName}(x, y), \mathit{hasAddress}(x, z)))))$
over $G \models \langle \mathit{Student}, \text{:Student}, ({\geq_n}\,\mathit{hasAddress}.\top) \rangle$ with $n \geq 1$.
40. Optimization: Example of Rewriting Rules
Then, by the Corollary, the query $Q$ reduces to:
$\mathrm{Dist}(\mathrm{Proj}_{xy}(\mathrm{Filter}_{\mathrm{regex}(y, \text{"Smith"})}(\mathit{Student}(x)\ \mathit{lastName}(x, y))))$
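The difference between bag equality and equality of supports in this example can be seen in a short Python sketch. It evaluates the part of Q below Dist with and without the join on hasAddress(x, z), over a made-up graph in which every student has at least one address. The graph and the helper name `evaluate` are assumptions for illustration.

```python
import re
from collections import Counter

# Made-up graph: every student has at least one address (s1 even has two).
GRAPH = {
    ("s1", "a", "Student"), ("s1", "lastName", "Smith"),
    ("s1", "hasAddress", "Oslo"), ("s1", "hasAddress", "Bergen"),
    ("s2", "a", "Student"), ("s2", "lastName", "Jones"),
    ("s2", "hasAddress", "Oslo"),
}


def evaluate(graph, with_address):
    """Bag of (x, y) answers below Dist, optionally joined with hasAddress(x, z)."""
    out = Counter()
    students = [s for (s, p, o) in graph if p == "a" and o == "Student"]
    for x in students:
        for (s, p, y) in graph:
            if s == x and p == "lastName" and re.search("Smith", y):
                if with_address:
                    # One answer per matching address z; z is then projected away.
                    for (s2, p2, z) in graph:
                        if s2 == x and p2 == "hasAddress":
                            out[(x, y)] += 1
                else:
                    out[(x, y)] += 1
    return out


with_join = evaluate(GRAPH, with_address=True)
without_join = evaluate(GRAPH, with_address=False)
print(with_join, without_join)
```

The two bags differ in multiplicity because s1 has two addresses, but their supports coincide. Since Q wraps the pattern in Dist, only the support matters, so the join can be dropped.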
41. Property of Query Rewriting Rules
▶ Propagation to Larger Queries
▶ Confluent Reduction
42. Property of Query Rewriting Rules: Propagation
Definition
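Let $Q$ be a SPARQL query, and let $P$ and $U$ be two graph patterns.
Then, we write $U \mathrel{\underset{\sim}{\triangleleft}} Q$ if $\mathrm{Dist}(\mathrm{Proj}_X(P)) \trianglelefteq Q$ and $U \trianglelefteq P$.
Theorem
Let $Q$ be a SPARQL query and $S$ a SHACL document. Let $U$ and
$V$ be two graph patterns. Then,
1. $Q \equiv_S Q_{U \mapsto V}$ if $U \equiv_S V$
2. $\mathrm{Proj}_X(Q) \equiv_S \mathrm{Proj}_X(Q)_{U \mapsto V}$ if $U \trianglelefteq Q$, $U \equiv_{S,y} V$ and
$y \notin \mathrm{var}(\mathrm{Proj}_X(Q) \setminus U)$
3. $Q \equiv_S Q_{U \mapsto V}$ if $U \mathrel{\underset{\sim}{\triangleleft}} Q$, $U \cong_{S,y} V$ and $y \notin \mathrm{var}(Q \setminus U)$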
43. Property of Query Rewriting Rules: Confluent Reduction
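Consider the SPARQL query
$\mathrm{Dist}(\mathrm{Proj}_{xy}(\mathit{employeeID}(x, y)\ \mathit{hiredBy}(x, k)\ \mathit{insuredBy}(x, z)))$
over a graph $G$ s.t. $G \models \langle \exists\mathit{employeeID}, \tau_{\exists\mathit{employeeID}}, \phi_{\exists\mathit{employeeID}} \rangle$
with
$\{(\triangleright_{\exists\mathit{employeeID}}\,\mathit{employeeID}),\ ({=_1}\,\mathit{insuredBy}.\top),\ (\mathit{hiredBy} = \mathit{insuredBy})\} \subseteq \phi_{\exists\mathit{employeeID}}$.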
44. Property of Query Rewriting Rules: Confluent Reduction
The given constraints entail further constraints:
\[
\begin{aligned}
({=_1}\,\mathit{insuredBy}.\top) \wedge (\mathit{hiredBy} = \mathit{insuredBy}) &\longrightarrow ({=_1}\,\mathit{hiredBy}.\top) \\
({=_1}\,\mathit{hiredBy}.\top) &\longrightarrow ({\geq_1}\,\mathit{hiredBy}.\top) \\
({=_1}\,\mathit{insuredBy}.\top) &\longrightarrow ({\geq_1}\,\mathit{insuredBy}.\top)
\end{aligned}
\]
All explicit and implicit rewriting rules therefore need to be taken into account.
45. Property of Query Rewriting Rules: Confluent Reduction
The query is then subject to the following rewriting rules:
1. $\cong_{S,y}$ - "Join" optimization based on $({\geq_1}\,\mathit{insuredBy}.\top)$
2. $\cong_{S,y}$ - "Join" optimization based on $({\geq_1}\,\mathit{hiredBy}.\top)$
3. $\cong_{S,y}$ - "Join" optimization based on $(\mathit{hiredBy} = \mathit{insuredBy})$
4. $\equiv_{S,y}$ - "Join" optimization based on $({=_1}\,\mathit{insuredBy}.\top)$
5. $\equiv_{S,y}$ - "Join" optimization based on $({=_1}\,\mathit{hiredBy}.\top)$
6. $\equiv_S$ - "Dist" optimization based on $(\triangleright_{\exists\mathit{employeeID}}\,\mathit{employeeID})$
46. Property of Query Rewriting Rules: Confluent Reduction
Regardless of the order in which these rewriting rules are applied, we get:
$\mathrm{Proj}_{xy}(\mathit{employeeID}(x, y))$
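The order-independence claimed above can be illustrated with a deliberately simplified Python sketch. It models the query as a Dist flag plus a set of atoms and uses three hypothetical rewrite steps (drop insuredBy, drop hiredBy, drop Dist) that stand in for the six rules. The applicability conditions of the actual rules are not modelled, so this only shows what "every order reaches the same result" means; it does not verify the confluence theorem.

```python
from itertools import permutations

# Query modelled as (has_dist, atoms); Proj over x and y is left implicit.
START = (True, frozenset({"employeeID(x,y)", "hiredBy(x,k)", "insuredBy(x,z)"}))


def drop_atom(atom):
    """Build a hypothetical rewrite step that removes one join atom."""
    def rule(query):
        has_dist, atoms = query
        return (has_dist, atoms - {atom})
    return rule


def drop_dist(query):
    """Hypothetical rewrite step that removes the Dist operator."""
    return (False, query[1])


RULES = [drop_atom("insuredBy(x,z)"), drop_atom("hiredBy(x,k)"), drop_dist]


def normal_forms(start, rules):
    """Apply the rules in every possible order and collect the final queries."""
    results = set()
    for order in permutations(rules):
        query = start
        for rule in order:
            query = rule(query)
        results.add(query)
    return results


print(normal_forms(START, RULES))
```

All six orders end in the same query, with no Dist and only the employeeID(x, y) atom left, mirroring the result stated on the slide.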
47. Property of Query Rewriting Rules: Confluent Reduction
... As the rewriting optimizations are generalized in the form of lemmas and their
consequences, we state the confluence results as follows:
Theorem
Query rewriting defined by Lemmas 1 to 6 is a confluent reduction.
Theorem
Query rewriting defined by Lemmas 1 to 7 is a confluent reduction iff
\[
\phi' = \begin{cases}
\top, & \text{if } \mathcal{P} = T, \\
\bigwedge_{i=1}^{n} ({=_1}\,P_i.\top), & \text{if } \mathcal{P} = (T\ P_1(x, z_1) \ldots P_i(x, z_i) \ldots P_n(x, z_n))
\end{cases}
\]
in Lemma 7.
48. Other or future work?
▶ Extension to SPARQL Property Path Queries
▶ Optimization of Ontology-Mediated Query Answering