SIRIUS SEMINAR
Ratan Bahadur Thapa
PhD candidate at SIRIUS (IFI)
University of Oslo
October 19, 2023
About
Full paper: https://www.duo.uio.no/handle/10852/103167
RDF
▶ Standard for web data
▶ W3C Rec. since 1999
▶ RDF 1.0, 2004 https://www.w3.org/TR/rdf-primer/
▶ RDF 1.1, 2014 https://www.w3.org/TR/rdf11-concepts/
▶ W3C working draft for RDF 1.2, 2023 https://www.w3.org/TR/rdf12-concepts/
RDF Syntax
▶ IRIs to reference resources on the web
▶ Statements as nodes and arcs in a graph, in the form of triples
“(Subject, Predicate, Object)”. E.g.,
“Mona Lisa has a creator whose value is Leonardo da Vinci”
Subject: https://en.wikipedia.org/wiki/Mona_Lisa
Predicate: http://purl.org/dc/terms/creator
Object: https://en.wikipedia.org/wiki/Leonardo_da_Vinci
RDF Graph
▶ Composed of triples ”(Subject, Predicate, Object)”
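As a minimal sketch of this idea, an RDF graph can be modelled as a set of (subject, predicate, object) tuples. Plain Python strings stand in for IRIs here, and the helper name `objects` is an assumption of this sketch rather than part of any RDF library.

```python
# An RDF graph modelled as a set of (subject, predicate, object) tuples.
# Plain strings stand in for IRIs (an assumption of this sketch).
MONA_LISA = "https://en.wikipedia.org/wiki/Mona_Lisa"
CREATOR = "http://purl.org/dc/terms/creator"
LEONARDO = "https://en.wikipedia.org/wiki/Leonardo_da_Vinci"

graph = {(MONA_LISA, CREATOR, LEONARDO)}


def objects(graph, subject, predicate):
    """Return every o such that (subject, predicate, o) is in the graph."""
    return {o for (s, p, o) in graph if s == subject and p == predicate}


# Adding the same triple again has no effect, because a graph is a set.
graph.add((MONA_LISA, CREATOR, LEONARDO))
```

Because the graph is a set, asserting a triple twice leaves it unchanged.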
RDF: Syntactic shortcuts
Turtle Syntax:
BASE <http://example.org>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX dcterms: . . .
PREFIX wd: . . .
<bob#me>
    a foaf:Person ;
    foaf:knows <alice#me> ;
    schema:birthdate "1990-07-04"^^xsd:date ;
    foaf:topic_interest wd:Q12418 .
wd:Q12418 dcterms:title "Mona Lisa" ;
. . .
RDF: Constraints?
W3C defines RDF as an ”assertional logic,” where each triple
expresses a simple proposition.
▶ This logical framework imposes a strict monotonic discipline
on the language, preventing the expression of closed-world
assumptions, local default preferences, and other commonly
used non-monotonic constructs.
SHACL
▶ Constraint language for RDF
▶ W3C Rec. since July 2017
Other constraint languages:
▶ SPIN - SPARQL Syntax, (2009) 2011
https://www.w3.org/submissions/2011/SUBM-spin-sparql-20110222/
▶ IBM Resource Shape 2.0, 2014 https://www.w3.org/submissions/shapes/
▶ Shape Expressions Language 2.0, 2017, http://shex.io/shex-semantics-20170713/
SHACL Shape
▶ relies on the notion of “shapes”
e.g.,
:EmployeeNode a sh:NodeShape ;                  # shape name
    sh:targetClass :Employee ;                  # target definition
    sh:property [ sh:path :hasAddress ;         # constraint definitions
                  sh:nodeKind sh:Literal ;
                  sh:maxCount 1 ;
                  sh:minCount 1 ;
                  sh:datatype xsd:string ] ;
    sh:property [ sh:path :hasAddress ;
                  dash:uniqueValueForClass :Employee ] .
SHACL: Constraint Validation
Consider the following RDF graph, followed by a SHACL shape, both written in
Turtle syntax:
:Ida a :Employee;
:hasID "001"^^xsd:int;
:hasAddress "Oslo".
:Ingrid a :Employee;
:hasID "002"^^xsd:int;
:hasAddress "Bergen".
:EmployeeNode a sh:NodeShape;
sh:targetClass :Employee;
sh:property [ sh:path :hasAddress;
sh:nodeKind sh:Literal;
sh:maxCount 1; sh:minCount 1;
sh:datatype xsd:string ];
sh:property [ sh:path :hasAddress;
dash:uniqueValueForClass
:Employee ].
Validation of the graph against the shape proceeds in two steps:
1. Acquiring target nodes: sh:targetClass :Employee selects :Ida and :Ingrid.
2. Checking compliance of the target nodes against the constraints: VALID
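The two validation steps can be sketched in plain Python. This is an illustrative sketch only, not a SHACL processor: triples are tuples of strings, the function names are hypothetical, and the datatype and node-kind checks are left out because the sketch has no typed literals.

```python
# Triples of the example graph; prefixed names are kept as plain strings.
graph = {
    (":Ida", "rdf:type", ":Employee"),
    (":Ida", ":hasID", "001"),
    (":Ida", ":hasAddress", "Oslo"),
    (":Ingrid", "rdf:type", ":Employee"),
    (":Ingrid", ":hasID", "002"),
    (":Ingrid", ":hasAddress", "Bergen"),
}


def target_nodes(graph, target_class):
    """Step 1: nodes selected by sh:targetClass."""
    return {s for (s, p, o) in graph if p == "rdf:type" and o == target_class}


def values(graph, node, path):
    """All values reachable from node via the property path."""
    return [o for (s, p, o) in graph if s == node and p == path]


def validate_employee_shape(graph):
    """Step 2: check min/max count 1 and per-class uniqueness of :hasAddress.

    Returns a list of (node, reason) violations; an empty list means VALID.
    """
    violations = []
    seen = {}  # address value -> first employee that uses it
    for node in sorted(target_nodes(graph, ":Employee")):
        addresses = values(graph, node, ":hasAddress")
        if len(addresses) != 1:
            violations.append((node, "expected exactly one :hasAddress"))
        for address in addresses:
            if address in seen and seen[address] != node:
                violations.append((node, "address shared with " + seen[address]))
            seen.setdefault(address, node)
    return violations
```

On the example graph the function returns an empty list; adding an employee without an address produces one violation.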
SHACL: Propagated Constraint Validation
:EmployeeNode a sh:NodeShape ;
    sh:targetClass :Employee ;
    sh:property [ sh:path :hasAddress ;
                  sh:nodeKind sh:BlankNodeOrIRI ;   # a literal could not carry the properties required by :AddressNode
                  sh:maxCount 1 ; sh:minCount 1 ;
                  sh:node :AddressNode ] .
:AddressNode a sh:NodeShape ;
    sh:property [ sh:path :telephone ;
                  sh:maxCount 1 ] ;
    sh:property [ sh:path :locatedIn ;
                  sh:maxCount 1 ; sh:minCount 1 ;
                  sh:hasValue :NorthernNorway ] .
SHACL: Propagated Constraint - ”Recursion”?
:EmployeeNode a sh:NodeShape ;
    sh:targetClass :Employee ;
    sh:property [ sh:path :hasAddress ;
                  sh:nodeKind sh:BlankNodeOrIRI ;   # a literal could not carry the properties required by :AddressNode
                  sh:maxCount 1 ; sh:minCount 1 ;
                  sh:node :AddressNode ] ;
    sh:property [ sh:path :knows ;
                  sh:minCount 1 ;
                  sh:node :EmployeeNode ] .         # the shape refers to itself
:AddressNode a sh:NodeShape ;
    sh:property [ sh:path :telephone ;
                  sh:maxCount 1 ] ;
    sh:property [ sh:path :locatedIn ;
                  sh:maxCount 1 ; sh:minCount 1 ;
                  sh:hasValue :NorthernNorway ] .
From https://www.w3.org/TR/shacl/, on recursion:
▶ Not 100% formal semantics
▶ Validation of recursive shapes is explicitly left undefined
Contributions:
Abstract syntax of SHACL core
Semantics for recursive SHACL
Validation algorithms and Tractable fragments
SHACL: Abstract Syntax
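Let $\mathbf{S}$, $\mathbf{C}$ and $\mathbf{P}$ be countably infinite and mutually disjoint sets of
shape, class and property names.
Shape targets $\tau_s$ and constraints $\phi_s$ are expressions defined by
the grammar
$\tau_s ::= \texttt{sh:targetClass}\ C \mid \texttt{sh:targetSubjectOf}\ P \mid \texttt{sh:targetObjectOf}\ P$
$\phi_s ::= {\geq_n}\, \alpha.\beta \mid {\leq_n}\, \alpha.\beta \mid \triangleright_{\tau_s} \alpha \mid \alpha_1 = \alpha_2 \mid \phi_s \wedge \phi_s$
$\beta ::= \top \mid C \mid s' \mid \neg\beta$
where $\alpha, \alpha_1, \alpha_2 \in \mathbf{P} \cup \{P^- \mid P \in \mathbf{P}\}$, $C \in \mathbf{C}$ and $s, s' \in \mathbf{S}$.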
The shape target also has a short syntax:
$\tau_s ::= \texttt{sh:targetClass}\ C \mid \texttt{sh:targetSubjectOf}\ P \mid \texttt{sh:targetObjectOf}\ P$
$\tau_s ::= C \mid P \mid P^-$ (i.e., short syntax)
A shape in abstract syntax:
$\langle \mathit{Employee}, \tau_{\mathit{Employee}}, \phi_{\mathit{Employee}} \rangle$ with $\tau_{\mathit{Employee}} = \texttt{:Employee}$ and
$\phi_{\mathit{Employee}} = (=_1 \mathit{hasAddress}.\top) \wedge (\triangleright_{\tau_{\mathit{Employee}}} \mathit{hasAddress})$.
Once the context is clear, we simply write:
$\langle \mathit{Employee}, \texttt{:Employee}, (=_1 \mathit{hasAddress}.\top) \wedge (\triangleright_{\tau_{\mathit{Employee}}} \mathit{hasAddress}) \rangle$
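One way to hold such a shape in a program is a small set of immutable records, one per constraint form. The sketch below is illustrative only: the class names are hypothetical, only three constraint forms are covered, and reading (=n α. ⊤) as the conjunction of (≥n α. ⊤) and (≤n α. ⊤) is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class AtLeast:          # (>= n path. filler)
    n: int
    path: str
    filler: str = "TOP"


@dataclass(frozen=True)
class AtMost:           # (<= n path. filler)
    n: int
    path: str
    filler: str = "TOP"


@dataclass(frozen=True)
class Unique:           # uniqueness of path values among the target nodes
    target: str
    path: str


Constraint = Union[AtLeast, AtMost, Unique]


@dataclass(frozen=True)
class Shape:
    name: str
    target: str
    constraints: Tuple[Constraint, ...]  # read as a conjunction


def exactly(n, path):
    """(=n path. TOP), read here as (>= n) and (<= n) together."""
    return (AtLeast(n, path), AtMost(n, path))


employee = Shape(
    "Employee",
    ":Employee",
    exactly(1, "hasAddress") + (Unique(":Employee", "hasAddress"),),
)
```

Frozen dataclasses are hashable and compare by value, so a rewriting step can test for a constraint with a plain `in` check.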
SPARQL
▶ Query language for RDF
▶ W3C Rec. since January 2008
▶ SPARQL 1.1, 2013 https://www.w3.org/TR/sparql11-query/
▶ W3C working draft for SPARQL 1.2, 2023 https://www.w3.org/TR/sparql12-update/
▶ W3C community draft for RDF* and SPARQL*, 2021 https://www.w3.org/2021/12/rdf-star.html
SPARQL: Query Variables?
▶ Queries need variables; SPARQL variables are bound to RDF terms
▶ E.g., ?title, ?author, ?published
▶ As in SQL, variables are queried with a SELECT statement
▶ E.g., SELECT ?title ?author ?published
▶ A SELECT statement returns the query result as a table:

?title               ?author                  ?published
Games of no chance   Richard J. Nowakowski    1999
Calculated Bets      Steven S. Skiena         2001

▶ Bag semantics
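Bag semantics means that projection keeps duplicate rows. A minimal sketch with `collections.Counter` follows; the third row is a hypothetical addition (not from the table above) so that a duplicate author appears, and `select` is an illustrative helper, not a SPARQL API.

```python
from collections import Counter

# Solution mappings; the third entry is hypothetical and only added
# so that projecting on ?author yields a duplicate.
solutions = [
    {"title": "Games of no chance", "author": "Richard J. Nowakowski", "published": 1999},
    {"title": "Calculated Bets", "author": "Steven S. Skiena", "published": 2001},
    {"title": "Another Title", "author": "Steven S. Skiena", "published": 2005},
]


def select(solutions, variables):
    """SELECT under bag semantics: project each mapping and keep duplicates."""
    return Counter(tuple(mu[v] for v in variables) for mu in solutions)


authors = select(solutions, ["author"])   # a bag: rows with multiplicities
distinct_authors = set(authors)           # what SELECT DISTINCT would keep
```

Here `authors` counts Steven S. Skiena twice, while `distinct_authors` contains each author once.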
SPARQL: Evaluation?
▶ Consider a SPARQL query:
SELECT DISTINCT ?article ?author ?affiliation
WHERE {
  ?article rdf:type :Article ;
           dc:creator ?author .
  ?author dc:affiliated ?affiliation .
  FILTER (CONTAINS(?affiliation, "University of Oslo"))
}
SPARQL: Bottom-up
Basic Graph Pattern (BGP) Matching
?article rdf:type :Article; dc:creator ?author .
?author dc:affiliated ?affiliation .
Intermediate Operators: FILTER
contains (?affiliation, "University of Oslo")
Intermediate Operators: PROJECTION
?article ?author ?affiliation
Intermediate Operators: DISTINCT
Final Query Result
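The bottom-up order of the steps above can be sketched as a small pipeline in Python. The graph data is hypothetical, variables are strings starting with `?`, and the matcher is a naive nested loop meant only to show the order of operations, not how a real engine evaluates queries.

```python
def match_triple(pattern, triple, mu):
    """Extend mapping mu so that pattern matches triple, or return None."""
    mu = dict(mu)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            if mu.setdefault(term, value) != value:
                return None  # variable already bound to something else
        elif term != value:
            return None      # constant does not match
    return mu


def match_bgp(bgp, graph):
    """Basic graph pattern matching, one triple pattern at a time."""
    solutions = [{}]
    for pattern in bgp:
        extended = []
        for mu in solutions:
            for triple in graph:
                new = match_triple(pattern, triple, mu)
                if new is not None:
                    extended.append(new)
        solutions = extended
    return solutions


# Hypothetical data for illustration.
graph = {
    (":a1", "rdf:type", ":Article"),
    (":a1", "dc:creator", ":alice"),
    (":a1", "dc:creator", ":bob"),
    (":alice", "dc:affiliated", "University of Oslo"),
    (":bob", "dc:affiliated", "University of Bergen"),
}
bgp = [
    ("?article", "rdf:type", ":Article"),
    ("?article", "dc:creator", "?author"),
    ("?author", "dc:affiliated", "?affiliation"),
]

matched = match_bgp(bgp, graph)                                        # 1. BGP matching
filtered = [m for m in matched
            if "University of Oslo" in m["?affiliation"]]              # 2. FILTER
projected = [(m["?article"], m["?author"], m["?affiliation"])
             for m in filtered]                                        # 3. PROJECTION
result = sorted(set(projected))                                        # 4. DISTINCT
```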
SPARQL Algebra
A SPARQL query is a graph pattern $P$ defined by the grammar
$P ::= B \mid \mathrm{Filter}_F(P) \mid \mathrm{Union}(P_1, P_2) \mid \mathrm{Join}(P_1, P_2) \mid \mathrm{Minus}(P_1, P_2) \mid \mathrm{Diff}_F(P_1, P_2) \mid \mathrm{Opt}_F(P_1, P_2) \mid \mathrm{Proj}_L(P) \mid \mathrm{Dist}(P)$
E.g., consider a nested SPARQL query that retrieves the names of
employees and their office addresses,
SELECT ?y ?z WHERE { ?x :hasName ?y .
  { SELECT ?x ?z WHERE { ?x :hasOffice ?y . ?y :hasAddress ?z } } }
In SPARQL algebra,
$\mathrm{Proj}_{yz}(\mathrm{Join}(\mathit{hasName}(x, y), \mathrm{Proj}_{xz}(\mathrm{Join}(\mathit{hasOffice}(x, y), \mathit{hasAddress}(y, z)))))$
Upon simplification (applied whenever possible, but never required), we
get:
$\mathrm{Proj}_{yz}(\mathit{hasName}(x, y)\ \mathit{hasOffice}(x, n)\ \mathit{hasAddress}(n, z))$.
SPARQL Algebra: some notions on query evaluation?
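The semantics of graph patterns is defined in terms of (solution)
mappings, i.e., partial functions
$\mu : \mathbf{V} \to \mathbf{T}$ with (possibly empty) domain $\mathrm{dom}(\mu)$,
where $\mathbf{T} = \mathbf{I} \cup \mathbf{B} \cup \mathbf{L}$ is the set of RDF terms and $\mathbf{V}$ is a countably infinite
set of variables disjoint from $\mathbf{T}$.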
For a mapping $\mu : \mathbf{V} \to \mathbf{T}$, let
▶ $\mu|_L$ be the restriction of $\mu$ to $L \subseteq \mathbf{V}$
▶ $\mu|_{\bar{L}}$ be the restriction of $\mu$ to $\mathbf{V} \setminus L$
Evaluation of a SPARQL query $Q$ over an RDF graph $G$, denoted
by $[\![Q]\!]_G$, returns a multiset (i.e., bag) of mappings.
▶ $[\![Q]\!]_G|_{\bar{X}}$ is the multiset of mappings $\mu \in [\![Q]\!]_G$ restricted to $\mathbf{V} \setminus X$, i.e.,
$|\mu, [\![Q]\!]_G|_{\bar{X}}| = \sum_{\mu = \mu'|_{\bar{X}}} |\mu', [\![Q]\!]_G|$
▶ The support of the multiset $[\![Q]\!]_G$, denoted by $\mathrm{sup}([\![Q]\!]_G)$, is
$\mathrm{sup}([\![Q]\!]_G) = \{\mu \mid |\mu, [\![Q]\!]_G| > 0\}$
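Restriction and support of a multiset of mappings can be sketched with `collections.Counter`. The mappings below are hypothetical, and `freeze`, `drop_variable` and `support` are illustrative names chosen for this sketch.

```python
from collections import Counter


def freeze(mu):
    """Mappings are dicts; a hashable form is needed to count them."""
    return frozenset(mu.items())


def drop_variable(bag, x):
    """Restrict every mapping in the bag to V minus {x}, summing multiplicities."""
    out = Counter()
    for mapping, count in bag.items():
        restricted = frozenset((v, t) for (v, t) in mapping if v != x)
        out[restricted] += count
    return out


def support(bag):
    """The set of mappings whose multiplicity is greater than zero."""
    return {mapping for mapping, count in bag.items() if count > 0}


# Hypothetical evaluation result with three mappings.
bag = Counter([
    freeze({"x": ":Ida", "y": "Oslo"}),
    freeze({"x": ":Ida", "y": "Bergen"}),
    freeze({"x": ":Ingrid", "y": "Bergen"}),
])
restricted = drop_variable(bag, "y")
```

After dropping `y`, the two mappings for `:Ida` collapse into one mapping with multiplicity 2, which is exactly the sum in the definition; the support forgets that multiplicity.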
Next
SPARQL queries optimizations with SHACL
Optimization: Problem Statement
Let $S$ be a set of SHACL shapes, and $Q$ a SPARQL query.
Our goal is to find optimal $S$-equivalent queries $Q'$ of the original
query $Q$ s.t. $Q \equiv_S Q'$, where
$Q \equiv_S Q'$ iff, for all $G$ with $G \models S$, $[\![Q]\!]_G = [\![Q']\!]_G$
Optimization: Equivalences
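Let $U$ and $V$ be two graph patterns, and $S$ a set of SHACL shapes.
$U \equiv_S V$ iff, for all $G$ with $G \models S$, $[\![U]\!]_G = [\![V]\!]_G$
$U \equiv_{S,y} V$ iff, for all $G$ with $G \models S$, $[\![U]\!]_G|_{\bar{y}} = [\![V]\!]_G|_{\bar{y}}$
$U \cong_{S,y} V$ iff, for all $G$ with $G \models S$, $\mathrm{sup}([\![U]\!]_G|_{\bar{y}}) = \mathrm{sup}([\![V]\!]_G|_{\bar{y}})$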
Optimization: Rewriting Rules
We then propose a set of query rewriting rules based on these
equivalences that:
1. reduce OPTIONAL to JOIN Pattern
2. remove redundant JOIN Pattern
3. eliminate DIST Operator etc
An Example of Query Rewriting
Consider a SPARQL query,
$\mathrm{Dist}(\mathrm{Proj}_{xy}(\mathrm{Opt}_{\top}(\mathit{Employee}(x), \mathit{hasAddress}(x, y))))$
over graph G,
:Ida a :Employee;
:hasID "001"^^xsd:int;
:hasAddress "Oslo".
:Yacob a :Employee;
. . .
:Nils a :Employee;
. . .
:Ingrid a :Employee;
:hasID "002"^^xsd:int;
:hasAddress "Bergen".
. . .
Assume $G$ satisfies the shape
$\langle \mathit{Employee}, \texttt{:Employee}, (=_1 \mathit{hasAddress}.\top) \wedge (\triangleright_{\tau_{\mathit{Employee}}} \mathit{hasAddress}) \rangle$
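Consider the query over $G$,
$\mathrm{Dist}(\mathrm{Proj}_{xy}(\mathrm{Opt}_{\top}(\mathit{Employee}(x), \mathit{hasAddress}(x, y))))$.
▶ Since $G$ satisfies the conjunct $(=_1 \mathit{hasAddress}.\top)$ of $\phi_{\mathit{Employee}}$, the “Opt
pattern” can be reduced to a “Join pattern”:
$\mathrm{Dist}(\mathrm{Proj}_{xy}(\mathrm{Join}(\mathit{Employee}(x), \mathit{hasAddress}(x, y))))$.
▶ Since $G$ satisfies the conjunct $(\triangleright_{\tau_{\mathit{Employee}}} \mathit{hasAddress})$ of $\phi_{\mathit{Employee}}$, “Dist”
can be removed:
$\mathrm{Proj}_{xy}(\mathrm{Join}(\mathit{Employee}(x), \mathit{hasAddress}(x, y)))$.
These are “$\equiv_S$ equivalent queries”.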
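The effect of these two rewrites can be checked on a concrete graph with a toy evaluator. This is a sketch under simplifying assumptions: only the operators used in the example are implemented, the filter of Opt is the trivial one, and agreement on one graph illustrates the equivalence but does not prove it for all graphs satisfying the shape.

```python
from collections import Counter

graph = {
    (":Ida", "rdf:type", ":Employee"), (":Ida", ":hasAddress", "Oslo"),
    (":Ingrid", "rdf:type", ":Employee"), (":Ingrid", ":hasAddress", "Bergen"),
}


def employee(graph):
    """Employee(x): one mapping per instance of :Employee."""
    return [{"x": s} for (s, p, o) in graph if p == "rdf:type" and o == ":Employee"]


def has_address(graph):
    """hasAddress(x, y): one mapping per :hasAddress triple."""
    return [{"x": s, "y": o} for (s, p, o) in graph if p == ":hasAddress"]


def compatible(m1, m2):
    return all(m1[v] == m2[v] for v in m1.keys() & m2.keys())


def join(left, right):
    return [{**m1, **m2} for m1 in left for m2 in right if compatible(m1, m2)]


def opt(left, right):
    """Left outer join with the trivial filter."""
    out = []
    for m1 in left:
        extended = [{**m1, **m2} for m2 in right if compatible(m1, m2)]
        out.extend(extended or [m1])
    return out


def proj(bag, variables):
    return [{v: m[v] for v in variables if v in m} for m in bag]


def dist(bag):
    return [dict(items) for items in {frozenset(m.items()) for m in bag}]


def as_bag(solutions):
    """Order-independent view of a result, with multiplicities."""
    return Counter(frozenset(m.items()) for m in solutions)


original = dist(proj(opt(employee(graph), has_address(graph)), ["x", "y"]))
with_join = dist(proj(join(employee(graph), has_address(graph)), ["x", "y"]))
without_dist = proj(join(employee(graph), has_address(graph)), ["x", "y"])
```

All three results coincide as bags on this graph. On a graph that violates the shape, for example one with an employee lacking an address, `opt` and `join` already differ.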
Optimization: Example of Rewriting Rules
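Lemma
Let $\langle s, \tau_s, \phi_s \rangle \in S$ with $(\geq_n P.\top) \in \phi_s$ s.t. $n \geq 1$, and $P$ a graph
pattern s.t. $T \blacktriangleleft P$. If $y \notin \mathrm{var}(P)$, then
1. $\mathrm{Opt}_F(P, P(x, y)) \equiv_S \mathrm{Filter}_F(\mathrm{Join}(P, P(x, y)))$
2. $\mathrm{Join}(P, P(x, y)) \cong_{S,y} P$
where
$T = \begin{cases} C(x), & \text{if } \tau_s = C, \\ R(x, z), & \text{if } \tau_s = \exists R, \\ R^-(x, z), & \text{if } \tau_s = \exists R^-. \end{cases}$
Here the atom $P(x, y)$ uses the property $P$ of the constraint, while the bare $P$ is the graph pattern.
Corollary
Let $\langle s, \tau_s, \phi_s \rangle \in S$ with $(\geq_n P.\top) \in \phi_s$ s.t. $n \geq 1$, and $P$ a graph
pattern s.t. $T \blacktriangleleft P$. If $y \notin \mathrm{var}(P \cup F)$, then
1. $\mathrm{Filter}_F(\mathrm{Join}(P, P(x, y))) \cong_{S,y} \mathrm{Filter}_F(P)$
2. $\mathrm{Opt}_F(P, P(x, y)) \cong_{S,y} \mathrm{Filter}_F(P)$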
Optimization: Example of Rewriting Rules
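$Q = \mathrm{Dist}(\mathrm{Proj}_{xy}(\mathrm{Filter}_{\mathrm{regex}(y, \text{``Smith''})}(\mathrm{Join}(\mathit{Student}(x)\ \mathit{lastName}(x, y), \mathit{hasAddress}(x, z)))))$
Consider $Q$ over $G \models \langle \mathit{Student}, \texttt{:Student}, (\geq_n \mathit{hasAddress}.\top) \rangle$ with $n \geq 1$.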
Then, by the Corollary, we can reduce query $Q$ to:
$\mathrm{Dist}(\mathrm{Proj}_{xy}(\mathrm{Filter}_{\mathrm{regex}(y, \text{``Smith''})}(\mathit{Student}(x)\ \mathit{lastName}(x, y))))$
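The reduction above can be sketched as a function that drops atoms whose existence the shape already guarantees. This is an illustrative sketch, not the rule set from the paper: atoms are plain tuples, the function and parameter names are hypothetical, and the rewrite is only sound up to multiplicities, so it assumes the pattern sits under Dist.

```python
def drop_redundant_atoms(atoms, needed_vars, required_props, target):
    """Remove atoms P(x, y) that a (>= n P. TOP), n >= 1, constraint makes redundant.

    atoms: list of (predicate, subject_var, object_var); object_var is None
           for class atoms such as Student(x).
    needed_vars: variables used by the projection or the filter.
    required_props: properties with a minimum-count constraint in the shape.
    target: the atom selected by the shape target, e.g. ("Student", "x", None).
    """
    if target not in atoms:
        return list(atoms)  # the shape says nothing about this pattern
    kept = []
    for atom in atoms:
        pred, subj, obj = atom
        others = [a for a in atoms if a is not atom]
        obj_used_elsewhere = obj in needed_vars or any(obj in a[1:] for a in others)
        if pred in required_props and subj == target[1] and not obj_used_elsewhere:
            continue  # guaranteed to match; only multiplicities could change
        kept.append(atom)
    return kept


atoms = [("Student", "x", None), ("lastName", "x", "y"), ("hasAddress", "x", "z")]
reduced = drop_redundant_atoms(
    atoms,
    needed_vars={"x", "y"},            # projected and filtered variables
    required_props={"hasAddress"},     # from the minimum-count constraint
    target=("Student", "x", None),
)
```

If `z` were projected or used in the filter, the atom would be kept, which mirrors the side condition on the dropped variable in the Corollary.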
Property of Query Rewriting Rules
▶ Propagation to Larger Queries
▶ Confluent Reduction
Property of Query Rewriting Rules: Propagation
Definition
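Let $Q$ be a SPARQL query, and let $P$ and $U$ be two graph patterns.
Then, we write $U \mathrel{\underset{\sim}{\triangleleft}} Q$ if $\mathrm{Dist}(\mathrm{Proj}_X(P)) \trianglelefteq Q$ and $U \trianglelefteq P$.
Theorem
Let $Q$ be a SPARQL query and $S$ a SHACL document. Let $U$ and
$V$ be two graph patterns. Then,
1. $Q \equiv_S Q_{U \mapsto V}$ if $U \equiv_S V$
2. $\mathrm{Proj}_X(Q) \equiv_S \mathrm{Proj}_X(Q)_{U \mapsto V}$ if $U \trianglelefteq Q$, $U \equiv_{S,y} V$ and $y \notin \mathrm{var}(\mathrm{Proj}_X(Q) \setminus U)$
3. $Q \equiv_S Q_{U \mapsto V}$ if $U \mathrel{\underset{\sim}{\triangleleft}} Q$, $U \cong_{S,y} V$ and $y \notin \mathrm{var}(Q \setminus U)$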
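Property of Query Rewriting Rules: Confluent Reduction
Consider the SPARQL query,
$\mathrm{Dist}(\mathrm{Proj}_{xy}(\mathit{employeeID}(x, y)\ \mathit{hiredBy}(x, k)\ \mathit{insuredBy}(x, z)))$
over a graph $G$ s.t. $G \models \langle \exists\mathit{employeeID}, \tau_{\exists\mathit{employeeID}}, \phi_{\exists\mathit{employeeID}} \rangle$ with
$\{(\triangleright_{\exists\mathit{employeeID}} \mathit{employeeID}), (=_1 \mathit{insuredBy}.\top), (\mathit{hiredBy} = \mathit{insuredBy})\} \subseteq \phi_{\exists\mathit{employeeID}}$.
Subsequently,
$(=_1 \mathit{insuredBy}.\top) \wedge (\mathit{hiredBy} = \mathit{insuredBy}) \longrightarrow (=_1 \mathit{hiredBy}.\top)$
$(=_1 \mathit{hiredBy}.\top) \longrightarrow (\geq_1 \mathit{hiredBy}.\top)$
$(=_1 \mathit{insuredBy}.\top) \longrightarrow (\geq_1 \mathit{insuredBy}.\top)$
We need to take care of all explicit and implicit rewriting rules.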
Then, the query is subject to the following rewriting rules:
1. $\cong_{S,y}$: “Join” optimization based on $(\geq_1 \mathit{insuredBy}.\top)$
2. $\cong_{S,y}$: “Join” optimization based on $(\geq_1 \mathit{hiredBy}.\top)$
3. $\cong_{S,y}$: “Join” optimization based on $(\mathit{hiredBy} = \mathit{insuredBy})$
4. $\equiv_{S,y}$: “Join” optimization based on $(=_1 \mathit{insuredBy}.\top)$
5. $\equiv_{S,y}$: “Join” optimization based on $(=_1 \mathit{hiredBy}.\top)$
6. $\equiv_S$: “Dist” optimization based on $(\triangleright_{\exists\mathit{employeeID}} \mathit{employeeID})$
Regardless of the sequence in which these rewrites are applied, we will get:
$\mathrm{Proj}_{xy}(\mathit{employeeID}(x, y))$
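Order-independence of this kind can be checked on a single query by exploring every order of rule application and collecting the irreducible results. The sketch below is a simplified model, not the lemmas themselves: a query is a pair (has Dist, set of atoms), the applicability conditions are assumptions chosen for illustration, and rule 3 is left out.

```python
def drop_atom(atom, needs_dist):
    """Build a rule that removes `atom` from the pattern.

    needs_dist=True models a rule justified only up to multiplicities,
    so it applies only while the query is still wrapped in Dist.
    """
    def rule(state):
        has_dist, atoms = state
        if atom in atoms and (has_dist or not needs_dist):
            return (has_dist, atoms - {atom})
        return None
    return rule


def remove_dist(state):
    """Rule 6 in this simplified model: strip Dist when it is present."""
    has_dist, atoms = state
    return (False, atoms) if has_dist else None


def normal_forms(state, rules):
    """Explore every order of rule application; collect irreducible states."""
    seen, stack, finals = set(), [state], set()
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        successors = set()
        for rule in rules:
            new = rule(current)
            if new is not None:
                successors.add(new)
        if successors:
            stack.extend(successors)
        else:
            finals.add(current)
    return finals


start = (True, frozenset({"employeeID(x,y)", "hiredBy(x,k)", "insuredBy(x,z)"}))
rules = [
    drop_atom("insuredBy(x,z)", needs_dist=True),    # rule 1
    drop_atom("hiredBy(x,k)", needs_dist=True),      # rule 2
    drop_atom("insuredBy(x,z)", needs_dist=False),   # rule 4
    drop_atom("hiredBy(x,k)", needs_dist=False),     # rule 5
    remove_dist,                                     # rule 6
]
finals = normal_forms(start, rules)
```

A single element in `finals` means every rewriting order ends in the same query for this input; it is a check on one instance, not a proof of confluence.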
Property of Query Rewriting Rules: Confluent Reduction
... As rewriting optimizations are generalized in the form of lemmas and their
consequences, we state the confluence results as follows:
Theorem
Query rewriting defined by Lemmas 1 to 6 is a confluent reduction.
Theorem
Query rewriting defined by Lemmas 1 to 7 is a confluent reduction iff, in Lemma 7,
$\phi' = \begin{cases} \top, & \text{if } P = T, \\ \bigwedge_{i=1}^{n} (=_1 P_i.\top), & \text{if } P = (T\ P_1(x, z_1) \ldots P_i(x, z_i) \ldots P_n(x, z_n)). \end{cases}$
Other or future work?
▶ Extension to SPARQL Property Path Queries
▶ Optimization of Ontology-Mediated Query Answering

Optimizing SPARQL Queries with SHACL.pdf
