SIRIUS SEMINAR
Ratan Bahadur Thapa
PhD candidate at SIRIUS (IFI)
University of Oslo
October 19, 2023
About
Full paper: https://www.duo.uio.no/handle/10852/103167
RDF
▶ Standard for web data
▶ W3C Rec. since 1999
▶ RDF 1.0, 2004 https://www.w3.org/TR/rdf-primer/
▶ RDF 1.1, 2014 https://www.w3.org/TR/rdf11-concepts/
▶ W3C working draft for RDF 1.2, 2023 https://www.w3.org/TR/rdf12-concepts/
RDF Syntax
▶ IRIs to reference resources on web
▶ Statements as nodes and arcs in a graph, in the form of triples
"(Subject, Predicate, Object)". E.g.,
"Mona Lisa has a creator whose value is Leonardo da Vinci"
Subject: https://en.wikipedia.org/wiki/Mona_Lisa
Predicate: http://purl.org/dc/terms/creator
Object: https://en.wikipedia.org/wiki/Leonardo_da_Vinci
RDF Graph
▶ Composed of triples "(Subject, Predicate, Object)"
RDF: Syntactic shortcuts
Turtle Syntax:
BASE <http://example.org/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX dcterms: . . .
PREFIX wd: . . .
<bob#me>
    a foaf:Person ;
    foaf:knows <alice#me> ;
    schema:birthDate "1990-07-04"^^xsd:date ;
    foaf:topic_interest wd:Q12418 .
wd:Q12418 dcterms:title "Mona Lisa" ;
. . .
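The shortcuts above (BASE, PREFIX, and the keyword `a`) are only abbreviations for full IRIs. A minimal Python sketch of that expansion follows; the helper `expand` and the two sample triples are illustrative assumptions, and only the `foaf` prefix is included because the other prefix IRIs are elided on the slide.

```python
from urllib.parse import urljoin

BASE = "http://example.org/"
PREFIXES = {"foaf": "http://xmlns.com/foaf/0.1/"}  # other prefixes are elided on the slide
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def expand(term: str) -> str:
    """Expand one Turtle term (keyword `a`, <relative IRI>, or prefix:local) to a full IRI."""
    if term == "a":  # `a` abbreviates rdf:type
        return RDF_TYPE
    if term.startswith("<") and term.endswith(">"):  # relative IRI, resolved against BASE
        return urljoin(BASE, term[1:-1])
    prefix, _, local = term.partition(":")  # prefixed name
    return PREFIXES[prefix] + local


# Two of the triples about <bob#me>, written with the shortcuts.
triples = [
    ("<bob#me>", "a", "foaf:Person"),
    ("<bob#me>", "foaf:knows", "<alice#me>"),
]
expanded = [tuple(expand(t) for t in triple) for triple in triples]

for s, p, o in expanded:
    print(s, p, o)
```

Each printed line is one fully expanded (Subject, Predicate, Object) triple.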
RDF: Constraints?
W3C defines RDF as an "assertional logic," where each triple
expresses a simple proposition.
▶ This logical framework imposes a strict monotonic discipline
on the language, preventing the expression of closed-world
assumptions, local default preferences, and other commonly
used non-monotonic constructs.
SHACL
▶ Constraint language for RDF
▶ W3C Rec. since July 2017
Other constraint languages:
▶ SPIN - SPARQL Syntax, (2009) 2011
https://www.w3.org/submissions/2011/SUBM-spin-sparql-20110222/
▶ IBM Resource Shape 2.0, 2014 https://www.w3.org/submissions/shapes/
▶ Shape Expressions Language 2.0, 2017, http://shex.io/shex-semantics-20170713/
SHACL
▶ relies on the notion of "shapes"
e.g.,
:EmployeeNode a sh:NodeShape;
sh:targetClass :Employee;
sh:property [ sh:path :hasAddress;
sh:nodeKind sh:Literal;
sh:maxCount 1; sh:minCount 1;
sh:datatype xsd:string ];
sh:property [ sh:path :hasAddress;
dash:uniqueValueForClass
:Employee ].
SHACL Shape
▶ a shape consists of a shape name, a target definition, and constraint definitions
e.g.,
:EmployeeNode a sh:NodeShape ;             # shape name
    sh:targetClass :Employee ;             # target definition
    sh:property [ sh:path :hasAddress ;    # constraint definitions
        sh:nodeKind sh:Literal ;
        sh:maxCount 1 ;
        sh:minCount 1 ;
        sh:datatype xsd:string ] ;
    sh:property [ sh:path :hasAddress ;
        dash:uniqueValueForClass
        :Employee ] .
SHACL: Constraint Validation
Consider an RDF graph on the left and a SHACL shape on the right, written in
Turtle syntax:
:Ida a :Employee;
:hasID "001"^^xsd:int;
:hasAddress "Oslo".
:Ingrid a :Employee;
:hasID "002"^^xsd:int;
:hasAddress "Bergen".
:EmployeeNode a sh:NodeShape;
sh:targetClass :Employee;
sh:property [ sh:path :hasAddress;
sh:nodeKind sh:Literal;
sh:maxCount 1; sh:minCount 1;
sh:datatype xsd:string ];
sh:property [ sh:path :hasAddress;
dash:uniqueValueForClass
:Employee ].
Acquiring target nodes: sh:targetClass :Employee selects every instance of :Employee, here :Ida and :Ingrid.
Checking compliance of target nodes against constraints: VALID. Both :Ida and :Ingrid have exactly one :hasAddress value, that value is a literal of datatype xsd:string, and no two employees share the same address.
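The two validation steps (acquire target nodes, then check each against the constraints) can be sketched in plain Python. This is an illustrative sketch rather than a SHACL engine. The tuple encoding of literals, the function names, and the reading of dash:uniqueValueForClass as "no two employees share a value" are assumptions made for the example.

```python
from collections import Counter

TYPE = "rdf:type"
STRING = "xsd:string"

# Literals are modelled as (lexical form, datatype) pairs; IRIs as plain strings.
G = {
    (":Ida", TYPE, ":Employee"),
    (":Ida", ":hasID", ("001", "xsd:int")),
    (":Ida", ":hasAddress", ("Oslo", STRING)),
    (":Ingrid", TYPE, ":Employee"),
    (":Ingrid", ":hasID", ("002", "xsd:int")),
    (":Ingrid", ":hasAddress", ("Bergen", STRING)),
}


def target_nodes(graph, cls):
    """Step 1: acquire the target nodes selected by sh:targetClass."""
    return sorted(s for s, p, o in graph if p == TYPE and o == cls)


def values(graph, node, path):
    """All values reachable from `node` via property `path`."""
    return [o for s, p, o in graph if s == node and p == path]


def validate(graph):
    """Step 2: check every target node; return a list of (node, violated constraint)."""
    nodes = target_nodes(graph, ":Employee")
    # How many employees use each address value (for the uniqueness check).
    owners = Counter(v for n in nodes for v in values(graph, n, ":hasAddress"))
    violations = []
    for n in nodes:
        vals = values(graph, n, ":hasAddress")
        if len(vals) != 1:
            violations.append((n, "sh:minCount/sh:maxCount"))
        if any(not isinstance(v, tuple) or v[1] != STRING for v in vals):
            violations.append((n, "sh:nodeKind/sh:datatype"))
        if any(owners[v] > 1 for v in vals):
            violations.append((n, "dash:uniqueValueForClass"))
    return violations


print(target_nodes(G, ":Employee"))
print(validate(G))
```

An empty violation list corresponds to the VALID outcome on the slide. Adding a third employee who also lives in "Oslo" would make the uniqueness check fail for both employees with that address.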
SHACL: Propagated Constraint Validation
:EmployeeNode a sh:NodeShape;
sh:targetClass :Employee;
sh:property [ sh:path :hasAddress;
sh:nodeKind sh:Literal;
sh:maxCount 1; sh:minCount 1;
sh:node :AddressNode ].
:AddressNode a sh:NodeShape;
sh:property [ sh:path :telephone;
sh:maxCount 1; ];
sh:property [ sh:path :locatedIn;
sh:maxCount 1; sh:minCount 1;
sh:hasValue :NorthernNorway; ].
SHACL: Propagated Constraint - "Recursion"?
:EmployeeNode a sh:NodeShape;
sh:targetClass :Employee;
sh:property [ sh:path :hasAddress;
sh:nodeKind sh:Literal;
sh:maxCount 1; sh:minCount 1;
sh:node :AddressNode ];
sh:property [ sh:path :knows;
sh:minCount 1;
sh:node :EmployeeNode ].
:AddressNode a sh:NodeShape;
sh:property [ sh:path :telephone;
sh:maxCount 1; ];
sh:property [ sh:path :locatedIn;
sh:maxCount 1; sh:minCount 1;
sh:hasValue :NorthernNorway; ].
:EmployeeNode refers to itself through sh:node on the :knows property, so the shape is recursive.
From: https://www.w3.org/TR/shacl/
Recursion
▶ No fully formal semantics
▶ Validation of recursive shapes is explicitly left undefined
Contributions:
▶ Abstract syntax of SHACL core
▶ Semantics for recursive SHACL
▶ Validation algorithms and tractable fragments
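SHACL: Abstract Syntax
Let $\mathbf{S}$, $\mathbf{C}$ and $\mathbf{P}$ be countably infinite and mutually disjoint sets of shape, class and property names.
Shape target $\tau_s$ and constraint $\phi_s$ are expressions defined by the grammar
$\tau_s := \texttt{sh:targetClass}\ C \mid \texttt{sh:targetSubjectOf}\ P \mid \texttt{sh:targetObjectOf}\ P$
$\phi_s := \geq_n \alpha.\beta \mid \leq_n \alpha.\beta \mid \triangleright_{\tau_s} \alpha \mid \alpha_1 = \alpha_2 \mid \phi_s \wedge \phi_s$
$\beta := \top \mid C \mid s' \mid \neg\beta$
where $\alpha, \alpha_1, \alpha_2 \in \mathbf{P} \cup \{P^- \mid P \in \mathbf{P}\}$, $C \in \mathbf{C}$ and $s, s' \in \mathbf{S}$.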
In short syntax, shape targets are written as
$\tau_s := C \mid P \mid P^-$
A shape in abstract syntax:
$\langle \mathrm{Employee}, \tau_{\mathrm{Employee}}, \phi_{\mathrm{Employee}} \rangle$ with $\tau_{\mathrm{Employee}} = \texttt{:Employee}$ and
$\phi_{\mathrm{Employee}} = (=_1 \mathrm{hasAddress}.\top) \wedge (\triangleright_{\tau_{\mathrm{Employee}}} \mathrm{hasAddress})$,
where $=_1 \alpha.\beta$ abbreviates $(\geq_1 \alpha.\beta) \wedge (\leq_1 \alpha.\beta)$.
Once the context is clear, we simply write:
$\langle \mathrm{Employee}, \texttt{:Employee}, (=_1 \mathrm{hasAddress}.\top) \wedge (\triangleright_{\tau_{\mathrm{Employee}}} \mathrm{hasAddress}) \rangle$
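To make the abstract syntax concrete, here is a small Python sketch that encodes a non-recursive fragment of the grammar as data and checks a graph against it. The class names (`AtLeast`, `AtMost`, `Unique`, `Shape`) are hypothetical, only class targets and the filter β ∈ {⊤, C} are covered, and the reading of ▷ as "no two distinct target nodes share a value" is an assumption based on dash:uniqueValueForClass.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

TYPE = "rdf:type"


@dataclass(frozen=True)
class AtLeast:  # >=n alpha.beta
    n: int
    path: str
    cls: Optional[str] = None  # None encodes beta = top


@dataclass(frozen=True)
class AtMost:  # <=n alpha.beta
    n: int
    path: str
    cls: Optional[str] = None


@dataclass(frozen=True)
class Unique:  # triangle_tau alpha
    path: str


@dataclass(frozen=True)
class Shape:
    name: str
    target: str  # short syntax, class target C only
    constraints: Tuple[object, ...]  # read as a conjunction


def targets(graph, shape):
    return {s for s, p, o in graph if p == TYPE and o == shape.target}


def successors(graph, node, path, cls=None):
    vals = {o for s, p, o in graph if s == node and p == path}
    if cls is not None:  # keep only values that are instances of cls
        vals = {v for v in vals if (v, TYPE, cls) in graph}
    return vals


def holds(graph, shape, node, c):
    if isinstance(c, AtLeast):
        return len(successors(graph, node, c.path, c.cls)) >= c.n
    if isinstance(c, AtMost):
        return len(successors(graph, node, c.path, c.cls)) <= c.n
    if isinstance(c, Unique):
        mine = successors(graph, node, c.path)
        others = targets(graph, shape) - {node}
        return all(mine.isdisjoint(successors(graph, w, c.path)) for w in others)
    raise TypeError(f"unsupported constraint: {c!r}")


def conforms(graph, shape):
    return all(holds(graph, shape, v, c)
               for v in targets(graph, shape) for c in shape.constraints)


# <Employee, :Employee, (=1 hasAddress.T) and (triangle hasAddress)>, with =1 as >=1 and <=1
EMPLOYEE = Shape("Employee", ":Employee",
                 (AtLeast(1, ":hasAddress"), AtMost(1, ":hasAddress"), Unique(":hasAddress")))

G = {
    (":Ida", TYPE, ":Employee"), (":Ida", ":hasAddress", "Oslo"),
    (":Ingrid", TYPE, ":Employee"), (":Ingrid", ":hasAddress", "Bergen"),
}
print(conforms(G, EMPLOYEE))
```

Shape references (s′), negation, inverse paths and path equality are left out on purpose. Adding shape references is exactly where the recursion question discussed earlier arises.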
SPARQL
▶ Query language for RDF
▶ W3C Rec. since January 2008
▶ SPARQL 1.1, 2013 https://www.w3.org/TR/sparql11-query/
▶ W3C working draft for SPARQL 1.2, 2023 https://www.w3.org/TR/sparql12-update/
▶ W3C community draft for RDF* and SPARQL*, 2021 https://www.w3.org/2021/12/rdf-star.html
SPARQL: Query Variables?
▶ For queries we need variables; SPARQL variables are bound to RDF terms
▶ E.g., ?title, ?author, ?published
▶ As in SQL, a query for variables is performed via a SELECT statement
▶ E.g., SELECT ?title ?author ?published
▶ A SELECT statement returns the query result as a table

?title             | ?author               | ?published
Games of no chance | Richard J. Nowakowski | 1999
Calculated Bets    | Steven S. Skiena      | 2001

▶ Bag semantics
SPARQL: Evaluation?
▶ Consider a SPARQL query:
SELECT DISTINCT ?article ?author ?affiliation
WHERE {
    ?article rdf:type :Article ;
             dc:creator ?author .
    ?author dc:affiliated ?affiliation .
    FILTER (contains(?affiliation, "University of Oslo"))
}
SPARQL: Bottom-up
1. Basic Graph Pattern (BGP) matching:
   ?article rdf:type :Article; dc:creator ?author .
   ?author dc:affiliated ?affiliation .
2. Intermediate operator FILTER:
   contains(?affiliation, "University of Oslo")
3. Intermediate operator PROJECTION:
   ?article ?author ?affiliation
4. Intermediate operator DISTINCT
5. Final query result
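The bottom-up order above can be traced with a few lines of Python. This is a naive sketch, not how a real SPARQL engine is implemented. The sample graph (articles, authors, affiliations) and the helper `match_bgp` are invented for illustration.

```python
# Hypothetical sample data; every term is a plain string.
G = [
    (":a1", "rdf:type", ":Article"),
    (":a1", "dc:creator", ":ida"),
    (":a1", "dc:creator", ":nils"),
    (":a2", "rdf:type", ":Article"),
    (":a2", "dc:creator", ":ida"),
    (":ida", "dc:affiliated", "University of Oslo, IFI"),
    (":nils", "dc:affiliated", "University of Bergen"),
]


def match_bgp(graph, patterns):
    """Match triple patterns one by one, extending solution mappings (dicts)."""
    solutions = [{}]
    for pattern in patterns:
        extended = []
        for mu in solutions:
            for triple in graph:
                nu = dict(mu)
                ok = True
                for term, value in zip(pattern, triple):
                    if term.startswith("?"):
                        # Bind the variable, or check an existing binding.
                        if nu.setdefault(term, value) != value:
                            ok = False
                            break
                    elif term != value:
                        ok = False
                        break
                if ok:
                    extended.append(nu)
        solutions = extended
    return solutions


bgp = [
    ("?article", "rdf:type", ":Article"),
    ("?article", "dc:creator", "?author"),
    ("?author", "dc:affiliated", "?affiliation"),
]

step1 = match_bgp(G, bgp)                                                    # BGP matching
step2 = [mu for mu in step1 if "University of Oslo" in mu["?affiliation"]]  # FILTER
step3 = [(mu["?article"], mu["?author"], mu["?affiliation"]) for mu in step2]  # PROJECTION
step4 = list(dict.fromkeys(step3))                                           # DISTINCT

for row in step4:
    print(row)
```

Each `step` variable holds the intermediate result of one operator, so printing them in turn shows how the result shrinks on the way up.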
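SPARQL Algebra
A SPARQL query is a graph pattern $P$ defined by the grammar
$P := B \mid \mathrm{Filter}_F(P) \mid \mathrm{Union}(P_1, P_2) \mid \mathrm{Join}(P_1, P_2) \mid \mathrm{Minus}(P_1, P_2) \mid \mathrm{Diff}_F(P_1, P_2) \mid \mathrm{Opt}_F(P_1, P_2) \mid \mathrm{Proj}_L(P) \mid \mathrm{Dist}(P)$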
E.g., consider a nested SPARQL query that retrieves the names of employees and their office addresses:
SELECT ?y ?z WHERE { ?x :hasName ?y .
  { SELECT ?x ?z WHERE { ?x :hasOffice ?y . ?y :hasAddress ?z } } }
In SPARQL algebra:
$\mathrm{Proj}_{y,z}(\mathrm{Join}(\mathrm{hasName}(x, y), \mathrm{Proj}_{x,z}(\mathrm{Join}(\mathrm{hasOffice}(x, y), \mathrm{hasAddress}(y, z)))))$
Upon simplification of the algebra expression for the nested query (possible here, but never required), we get the following, where the inner, non-projected variable $y$ is renamed to $n$ and $\bowtie$ stands for Join:
$\mathrm{Proj}_{y,z}(\mathrm{hasName}(x, y) \bowtie \mathrm{hasOffice}(x, n) \bowtie \mathrm{hasAddress}(n, z))$
SPARQL Algebra: some notions on query evaluation?
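The semantics of graph patterns is defined in terms of (solution) mappings, i.e., partial functions
$\mu : \mathbf{V} \to \mathbf{T}$ with (possibly empty) domain $\mathrm{dom}(\mu)$,
where $\mathbf{T} = \mathbf{I} \cup \mathbf{B} \cup \mathbf{L}$ is the set of RDF terms, and $\mathbf{V}$ is a countably infinite set of variables disjoint from $\mathbf{T}$.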
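For a partial function $\mu : \mathbf{V} \to \mathbf{T}$, let
▶ $\mu|_L$ be the restriction of mapping $\mu$ to $L \subseteq \mathbf{V}$
▶ $\mu|_{\bar{L}}$ be the restriction of mapping $\mu$ to $\mathbf{V} \setminus L$
Evaluation of a SPARQL query $Q$ over an RDF graph $G$, denoted by $[\![Q]\!]_G$, returns a multiset (i.e., bag) of mappings; $|\mu, M|$ denotes the multiplicity of $\mu$ in a multiset $M$.
▶ $[\![Q]\!]_G|_{\bar{X}}$ is the multiset of mappings $\mu \in [\![Q]\!]_G$ restricted to $\mathbf{V} \setminus X$, i.e.,
$|\mu, [\![Q]\!]_G|_{\bar{X}}| = \sum_{\mu = \mu'|_{\bar{X}}} |\mu', [\![Q]\!]_G|$
▶ The support of the multiset $[\![Q]\!]_G$, denoted by $\mathrm{sup}([\![Q]\!]_G)$, is
$\mathrm{sup}([\![Q]\!]_G) = \{\mu \mid |\mu, [\![Q]\!]_G| > 0\}$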
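The restriction of a multiset of mappings and its support can be illustrated with Python's `Counter`. This is a sketch under assumptions: mappings are encoded as frozensets of (variable, term) pairs, and the helper names and sample bag are invented for the example.

```python
from collections import Counter


def m(**bindings):
    """Build a hashable solution mapping from keyword arguments."""
    return frozenset(bindings.items())


def restrict_away(bag, X):
    """Restrict every mapping to the variables outside X, summing multiplicities."""
    out = Counter()
    for mu, mult in bag.items():
        out[frozenset((v, t) for v, t in mu if v not in X)] += mult
    return out


def support(bag):
    """The set of mappings that occur at least once in the bag."""
    return {mu for mu, mult in bag.items() if mult > 0}


# A bag of mappings over variables x and y, with multiplicities.
bag = Counter({
    m(x=":ida", y=":office1"): 1,
    m(x=":ida", y=":office2"): 2,
    m(x=":nils", y=":office1"): 1,
})

restricted = restrict_away(bag, {"y"})
print(restricted[m(x=":ida")], restricted[m(x=":nils")])
print(len(support(restricted)))
```

Dropping `y` merges the two `:ida` mappings and adds their multiplicities, which is the sum in the definition above. The support then forgets the multiplicities altogether.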
Next
SPARQL query optimization with SHACL
Optimization: Problem Statement
Let $S$ be a set of SHACL shapes, and $Q$ a SPARQL query.
Our goal is to find optimal $S$-equivalent queries $Q'$ of the original query $Q$, s.t.
$Q \equiv_S Q'$ iff $\forall G.\ G \models S \Rightarrow [\![Q]\!]_G = [\![Q']\!]_G$
Optimization: Equivalences
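Let $U$ and $V$ be two graph patterns, and $S$ a set of SHACL shapes.
$U \equiv_S V$ iff $\forall G.\ G \models S \Rightarrow [\![U]\!]_G = [\![V]\!]_G$
$U \equiv_{S,y} V$ iff $\forall G.\ G \models S \Rightarrow [\![U]\!]_G|_{\bar{y}} = [\![V]\!]_G|_{\bar{y}}$
$U \cong_{S,y} V$ iff $\forall G.\ G \models S \Rightarrow \mathrm{sup}([\![U]\!]_G|_{\bar{y}}) = \mathrm{sup}([\![V]\!]_G|_{\bar{y}})$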
Optimization: Rewriting Rules
We then propose a set of query rewriting rules based on these equivalences that:
1. reduce OPTIONAL patterns to JOIN patterns
2. remove redundant JOIN patterns
3. eliminate the DIST operator, etc.
An Example of Query Rewriting
Consider a SPARQL query,
$\mathrm{Dist}(\mathrm{Proj}_{x,y}(\mathrm{Opt}_{\top}(\mathrm{Employee}(x), \mathrm{hasAddress}(x, y))))$
over graph $G$,
:Ida a :Employee;
:hasID "001"^^xsd:int;
:hasAddress "Oslo".
:Yacob a :Employee;
. . .
:Nils a :Employee;
. . .
:Ingrid a :Employee;
:hasID "002"^^xsd:int;
:hasAddress "Bergen".
. . .
Assume $G$ satisfies the shape
$\langle \mathrm{Employee}, \texttt{:Employee}, (=_1 \mathrm{hasAddress}.\top) \wedge (\triangleright_{\tau_{\mathrm{Employee}}} \mathrm{hasAddress}) \rangle$
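Consider the query over $G$,
$\mathrm{Dist}(\mathrm{Proj}_{x,y}(\mathrm{Opt}_{\top}(\mathrm{Employee}(x), \mathrm{hasAddress}(x, y))))$.
▶ Since $G$ satisfies the conjunct $(=_1 \mathrm{hasAddress}.\top)$ of $\phi_{\mathrm{Employee}}$, the "Opt pattern" can be reduced to a "Join pattern":
$\mathrm{Dist}(\mathrm{Proj}_{x,y}(\mathrm{Join}(\mathrm{Employee}(x), \mathrm{hasAddress}(x, y))))$.
▶ Since $G$ satisfies the conjunct $(\triangleright_{\tau_{\mathrm{Employee}}} \mathrm{hasAddress})$ of $\phi_{\mathrm{Employee}}$, "Dist" can be removed:
$\mathrm{Proj}_{x,y}(\mathrm{Join}(\mathrm{Employee}(x), \mathrm{hasAddress}(x, y)))$.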
These three queries are $\equiv_S$-equivalent.
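The claim that the three queries agree on every graph satisfying the shape can be spot-checked on the example data. Below is a small bag-semantics evaluator sketched in Python. The operator functions are simplified stand-ins for the algebra (Opt is implemented only for the trivial filter ⊤), and the elided employees from the slide are left out of `G`.

```python
from collections import Counter

TYPE = "rdf:type"
G = {
    (":Ida", TYPE, ":Employee"), (":Ida", ":hasAddress", "Oslo"),
    (":Ingrid", TYPE, ":Employee"), (":Ingrid", ":hasAddress", "Bergen"),
}


def cls_atom(g, cls, x):          # C(x)
    return Counter({frozenset({(x, s)}): 1 for s, p, o in g if p == TYPE and o == cls})


def prop_atom(g, prop, x, y):     # P(x, y)
    return Counter({frozenset({(x, s), (y, o)}): 1 for s, p, o in g if p == prop})


def compatible(m1, m2):
    d1, d2 = dict(m1), dict(m2)
    return all(d2[v] == t for v, t in d1.items() if v in d2)


def join(b1, b2):
    out = Counter()
    for m1, c1 in b1.items():
        for m2, c2 in b2.items():
            if compatible(m1, m2):
                out[m1 | m2] += c1 * c2
    return out


def opt(b1, b2):                  # Opt with the trivial filter
    out = join(b1, b2)
    for m1, c1 in b1.items():
        if not any(compatible(m1, m2) for m2 in b2):
            out[m1] += c1         # keep left mappings that have no partner
    return out


def proj(bag, keep):
    out = Counter()
    for mu, c in bag.items():
        out[frozenset(pair for pair in mu if pair[0] in keep)] += c
    return out


def dist(bag):
    return Counter({mu: 1 for mu in bag})


def queries(g):
    e, a = cls_atom(g, ":Employee", "x"), prop_atom(g, ":hasAddress", "x", "y")
    q0 = dist(proj(opt(e, a), {"x", "y"}))    # original query
    q1 = dist(proj(join(e, a), {"x", "y"}))   # Opt reduced to Join
    q2 = proj(join(e, a), {"x", "y"})         # Dist removed
    return q0, q1, q2


q0, q1, q2 = queries(G)
print(q0 == q1 == q2)
```

On a graph that violates the shape, for example one with an employee who has no address, `q0` and `q1` differ. That is why the rewriting is only valid relative to the shape.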
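Optimization: Example of Rewriting Rules
Below, $\mathcal{P}$ denotes a graph pattern and $P$ a property name.
Lemma
Let $\langle s, \tau_s, \phi_s \rangle \in S$ with $(\geq_n P.\top) \in \phi_s$ s.t. $n \geq 1$, and $\mathcal{P}$ a graph pattern s.t. $T \blacktriangleleft \mathcal{P}$. If $y \notin \mathrm{var}(\mathcal{P})$, then
1. $\mathrm{Opt}_F(\mathcal{P}, P(x, y)) \equiv_S \mathrm{Filter}_F(\mathrm{Join}(\mathcal{P}, P(x, y)))$
2. $\mathrm{Join}(\mathcal{P}, P(x, y)) \cong_{S,y} \mathcal{P}$
where
$T = \begin{cases} C(x), & \text{if } \tau_s = C, \\ R(x, z), & \text{if } \tau_s = \exists R, \\ R^-(x, z), & \text{if } \tau_s = \exists R^-. \end{cases}$
Corollary
Let $\langle s, \tau_s, \phi_s \rangle \in S$ with $(\geq_n P.\top) \in \phi_s$ s.t. $n \geq 1$, and $\mathcal{P}$ a graph pattern s.t. $T \blacktriangleleft \mathcal{P}$. If $y \notin \mathrm{var}(\mathcal{P} \cup F)$, then
1. $\mathrm{Filter}_F(\mathrm{Join}(\mathcal{P}, P(x, y))) \cong_{S,y} \mathrm{Filter}_F(\mathcal{P})$
2. $\mathrm{Opt}_F(\mathcal{P}, P(x, y)) \cong_{S,y} \mathrm{Filter}_F(\mathcal{P})$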
Consider the query
$Q = \mathrm{Dist}(\mathrm{Proj}_{x,y}(\mathrm{Filter}_{\mathrm{regex}(y, \text{"Smith"})}(\mathrm{Join}(\mathrm{Student}(x) \bowtie \mathrm{lastName}(x, y), \mathrm{hasAddress}(x, z)))))$
over $G \models \langle \mathrm{Student}, \texttt{:Student}, (\geq_n \mathrm{hasAddress}.\top) \rangle$ with $n \geq 1$.
Then, by the Corollary (with $z$ in the role of $y$, since $z$ occurs neither in the remaining pattern nor in the filter), we can reduce the query $Q$ to:
$\mathrm{Dist}(\mathrm{Proj}_{x,y}(\mathrm{Filter}_{\mathrm{regex}(y, \text{"Smith"})}(\mathrm{Student}(x) \bowtie \mathrm{lastName}(x, y))))$
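The join elimination above can be checked on a small graph. In this Python sketch, set comprehensions play the role of Dist(Proj(...)). The student data and the function names are invented for illustration, and every student has at least one address so that the shape holds.

```python
import re

TYPE = "rdf:type"
G = {
    (":s1", TYPE, ":Student"), (":s1", ":lastName", "Smith"),
    (":s1", ":hasAddress", "Oslo"), (":s1", ":hasAddress", "Bergen"),
    (":s2", TYPE, ":Student"), (":s2", ":lastName", "Smithson"),
    (":s2", ":hasAddress", "Trondheim"),
    (":s3", TYPE, ":Student"), (":s3", ":lastName", "Hansen"),
    (":s3", ":hasAddress", "Oslo"),
}


def q_original(g):
    """Dist(Proj_{x,y}(Filter(Join(Student(x) lastName(x,y), hasAddress(x,z)))))."""
    return {
        (x, y)
        for (x, p1, c) in g if p1 == TYPE and c == ":Student"
        for (x2, p2, y) in g if x2 == x and p2 == ":lastName" and re.search("Smith", y)
        for (x3, p3, z) in g if x3 == x and p3 == ":hasAddress"
    }


def q_rewritten(g):
    """The same query with the hasAddress(x, z) join removed."""
    return {
        (x, y)
        for (x, p1, c) in g if p1 == TYPE and c == ":Student"
        for (x2, p2, y) in g if x2 == x and p2 == ":lastName" and re.search("Smith", y)
    }


print(q_original(G) == q_rewritten(G))
```

Without Dist, `:s1` would appear twice in the original query (once per address) but only once in the rewritten one. The two patterns therefore agree on their support, not on multiplicities, which is why the Corollary states $\cong_{S,y}$ and not $\equiv_S$.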
Property of Query Rewriting Rules
▶ Propagation to Larger Queries
▶ Confluent Reduction
Property of Query Rewriting Rules: Propagation
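Definition
Let $Q$ be a SPARQL query, and let $P$ and $U$ be two graph patterns.
Then, we write $U \mathrel{\underset{\sim}{\triangleleft}} Q$ if $\mathrm{Dist}(\mathrm{Proj}_X(P)) \trianglelefteq Q$ and $U \trianglelefteq P$.
Theorem
Let $Q$ be a SPARQL query and $S$ a SHACL document. Let $U$ and $V$ be two graph patterns. Then,
1. $Q \equiv_S Q_{U \mapsto V}$ if $U \equiv_S V$
2. $\mathrm{Proj}_X(Q) \equiv_S \mathrm{Proj}_X(Q)_{U \mapsto V}$ if $U \trianglelefteq Q$, $U \equiv_{S,y} V$ and $y \notin \mathrm{var}(\mathrm{Proj}_X(Q) \setminus U)$
3. $Q \equiv_S Q_{U \mapsto V}$ if $U \mathrel{\underset{\sim}{\triangleleft}} Q$, $U \cong_{S,y} V$ and $y \notin \mathrm{var}(Q \setminus U)$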
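Property of Query Rewriting Rules: Confluent Reduction
Consider the SPARQL query,
$\mathrm{Dist}(\mathrm{Proj}_{x,y}(\mathrm{employeeID}(x, y) \bowtie \mathrm{hiredBy}(x, k) \bowtie \mathrm{insuredBy}(x, z)))$
over a graph $G$ s.t. $G \models \langle \exists\mathrm{employeeID}, \tau_{\exists\mathrm{employeeID}}, \phi_{\exists\mathrm{employeeID}} \rangle$ with
$\{(\triangleright_{\exists\mathrm{employeeID}} \mathrm{employeeID}), (=_1 \mathrm{insuredBy}.\top), (\mathrm{hiredBy} = \mathrm{insuredBy})\} \subseteq \phi_{\exists\mathrm{employeeID}}$.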
Subsequently, these constraints imply further constraints:
$(=_1 \mathrm{insuredBy}.\top) \wedge (\mathrm{hiredBy} = \mathrm{insuredBy}) \longrightarrow (=_1 \mathrm{hiredBy}.\top)$
$(=_1 \mathrm{hiredBy}.\top) \longrightarrow (\geq_1 \mathrm{hiredBy}.\top)$
$(=_1 \mathrm{insuredBy}.\top) \longrightarrow (\geq_1 \mathrm{insuredBy}.\top)$
We need to take care of all explicit and implicit rewriting rules.
Then, the query is subject to the following rewriting rules:
1. $\cong_{S,y}$ - "Join" optimization based on $(\geq_1 \mathrm{insuredBy}.\top)$
2. $\cong_{S,y}$ - "Join" optimization based on $(\geq_1 \mathrm{hiredBy}.\top)$
3. $\cong_{S,y}$ - "Join" optimization based on $(\mathrm{hiredBy} = \mathrm{insuredBy})$
4. $\equiv_{S,y}$ - "Join" optimization based on $(=_1 \mathrm{insuredBy}.\top)$
5. $\equiv_{S,y}$ - "Join" optimization based on $(=_1 \mathrm{hiredBy}.\top)$
6. $\equiv_S$ - "Dist" optimization based on $(\triangleright_{\exists\mathrm{employeeID}} \mathrm{employeeID})$
Regardless of the sequence in which these rewrites are applied, we get:
$\mathrm{Proj}_{x,y}(\mathrm{employeeID}(x, y))$
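The confluence claim for this example can be illustrated with a toy model in Python that explores every order of rule application and collects the resulting normal forms. This is a strong simplification and not the rewriting system of the paper. A query is reduced to a Dist flag plus a set of atoms, the support-based rules are assumed to need Dist on top while the multiset-based ones do not, and rule 3 is omitted because the slide does not say which atom it removes.

```python
from typing import FrozenSet, NamedTuple, Optional


class Query(NamedTuple):
    dist: bool                 # is there a Dist operator on top?
    atoms: FrozenSet[str]      # atoms joined under the projection on x, y


def drop_join(atom: str, needs_dist: bool):
    """Build a rule that removes a redundant join atom."""
    def rule(q: Query) -> Optional[Query]:
        if atom in q.atoms and (q.dist or not needs_dist):
            return Query(q.dist, q.atoms - {atom})
        return None            # rule not applicable
    return rule


def drop_dist(q: Query) -> Optional[Query]:
    return Query(False, q.atoms) if q.dist else None


RULES = [
    drop_join("insuredBy(x,z)", needs_dist=True),    # rule 1 (support equivalence)
    drop_join("hiredBy(x,k)", needs_dist=True),      # rule 2 (support equivalence)
    drop_join("insuredBy(x,z)", needs_dist=False),   # rule 4 (multiset equivalence)
    drop_join("hiredBy(x,k)", needs_dist=False),     # rule 5 (multiset equivalence)
    drop_dist,                                       # rule 6
]


def normal_forms(q: Query):
    """Explore every applicable rule at every step; return all irreducible results."""
    successors = {rule(q) for rule in RULES} - {None}
    if not successors:
        return {q}
    return set().union(*(normal_forms(s) for s in successors))


START = Query(True, frozenset({"employeeID(x,y)", "hiredBy(x,k)", "insuredBy(x,z)"}))
for nf in normal_forms(START):
    print(nf.dist, sorted(nf.atoms))
```

A single normal form, with no Dist and only the employeeID atom left, means that every rewriting order ends in the same query, which is what confluence asserts for this example.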
Property of Query Rewriting Rules: Confluent Reduction
... As rewriting optimizations are generalized in the form of lemmas and their consequences, we state the confluence results as follows:
Theorem
Query rewriting defined by Lemmas 1 to 6 is a confluent reduction.
Theorem
Query rewriting defined by Lemmas 1 to 7 is a confluent reduction iff
$\phi' = \begin{cases} \top, & \text{if } \mathcal{P} = T, \\ \bigwedge_{i=1}^{n} (=_1 P_i.\top), & \text{if } \mathcal{P} = (T \bowtie P_1(x, z_1) \bowtie \dots \bowtie P_i(x, z_i) \bowtie \dots \bowtie P_n(x, z_n)) \end{cases}$
in Lemma 7.
Other or future work?
▶ Extension to SPARQL Property Path Queries
▶ Optimization of Ontology-Mediated Query Answering

Optimizing SPARQL Queries with SHACL.pdf

  • 1. SIRIUS SEMINAR Ratan Bahadur Thapa PhD candidate at SIRIUS (IFI) University of Oslo October 19, 2023
  • 3. RDF ▶ Standard for web data ▶ W3C Rec. since 1999 ▶ RDF 1.0, 2004 https://www.w3.org/TR/rdf-primer/ ▶ RDF 1.1, 2014 https://www.w3.org/TR/rdf11-concepts/ ▶ W3C working draft for RDF 1.2, 2023 https://www.w3.org/TR/rdf12-concepts/
  • 4. RDF Syntax ▶ IRIs to reference resources on the web ▶ Statements as nodes and arcs in a graph, in the form of triples ”(Subject, Predicate, Object)”. E.g., ”Mona Lisa has a creator whose value is Leonardo Da Vinci”: Subject https://en.wikipedia.org/wiki/Mona_Lisa, Predicate http://purl.org/dc/terms/creator, Object https://en.wikipedia.org/wiki/Leonardo_da_Vinci
  • 5. RDF Graph ▶ Composed of triples ”(Subject, Predicate, Object)”
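As a minimal sketch of the triple model on the slides above, an RDF graph can be pictured as a plain set of (subject, predicate, object) tuples. The `wiki:` prefix and the `objects` helper are assumptions made for illustration; a real application would use an RDF library and full IRIs.

```python
# A toy RDF graph: a set of (subject, predicate, object) triples.
# Prefixed names are plain strings here (hypothetical "wiki:" prefix).
graph = {
    ("wiki:Mona_Lisa", "dcterms:creator", "wiki:Leonardo_da_Vinci"),
    ("wiki:Mona_Lisa", "dcterms:title", "Mona Lisa"),
}


def objects(graph, subject, predicate):
    """Return every object o such that (subject, predicate, o) is in the graph."""
    return {o for (s, p, o) in graph if s == subject and p == predicate}


creators = objects(graph, "wiki:Mona_Lisa", "dcterms:creator")
print(creators)
```

Because the graph is a set, asserting the same triple twice has no effect, which matches the set-based definition of an RDF graph.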
  • 6. RDF: Syntactic shortcuts Turtle Syntax: BASE <http://example.org> PREFIX foaf: <http://xmlns.com/foaf/0.1/> PREFIX dcterms: . . . PREFIX wd: . . . <bob#me> a foaf:Person; foaf:knows <alice#me>; schema:birthDate "1990-07-04"^^xsd:date; foaf:topic_interest wd:Q12418. wd:Q12418 dcterms:title "Mona Lisa"; . . .
  • 7. RDF: Constraints? W3C defines RDF as an ”assertional logic,” where each triple expresses a simple proposition. ▶ This logical framework imposes a strict monotonic discipline on the language, preventing the expression of closed-world assumptions, local default preferences, and other commonly used non-monotonic constructs.
  • 8. SHACL ▶ Constraint language for RDF ▶ W3C Rec. since July 2017 Other constraint languages: ▶ SPIN - SPARQL Syntax, (2009) 2011 https://www.w3.org/submissions/2011/SUBM-spin-sparql-20110222/ ▶ IBM Resource Shape 2.0, 2014 https://www.w3.org/submissions/shapes/ ▶ Shape Expressions Language 2.0, 2017, http://shex.io/shex-semantics-20170713/
  • 9. SHACL ▶ relies on the notion of ”shapes” e.g., :EmployeeNode a sh:NodeShape; sh:targetClass :Employee; sh:property [ sh:path :hasAddress; sh:nodeKind sh:Literal; sh:maxCount 1; sh:minCount 1; sh:datatype xsd:string ]; sh:property [ sh:path :hasAddress; dash:uniqueValueForClass :Employee ].
  • 10. SHACL Shape ▶ relies on the notion of ”shapes” e.g., :EmployeeNode a sh:NodeShape ; sh:targetClass :Employee ; sh:property [ sh:path :hasAddress ; sh:nodeKind sh:Literal ; sh:maxCount 1; sh:minCount 1; sh:datatype xsd:string ]; sh:property [ sh:path :hasAddress; dash:uniqueValueForClass :Employee ]. Annotated parts: shape name (:EmployeeNode), target definition (sh:targetClass), constraint definitions (sh:property).
  • 11. SHACL: Constraint Validation Consider an RDF graph on the left and a SHACL shape on the right, written in Turtle syntax: :Ida a :Employee; :hasID "001"^^xsd:int; :hasAddress "Oslo". :Ingrid a :Employee; :hasID "002"^^xsd:int; :hasAddress "Bergen". :EmployeeNode a sh:NodeShape; sh:targetClass :Employee; sh:property [ sh:path :hasAddress; sh:nodeKind sh:Literal; sh:maxCount 1; sh:minCount 1; sh:datatype xsd:string ]; sh:property [ sh:path :hasAddress; dash:uniqueValueForClass :Employee ].
  • 12. SHACL: Constraint Validation Acquiring Target nodes: :Ida a :Employee; :hasID "001"^^xsd:int; :hasAddress "Oslo". :Ingrid a :Employee; :hasID "002"^^xsd:int; :hasAddress "Bergen". :EmployeeNode a sh:NodeShape; sh:targetClass :Employee; sh:property [ sh:path :hasAddress; sh:nodeKind sh:Literal; sh:maxCount 1; sh:minCount 1; sh:datatype xsd:string ]; sh:property [ sh:path :hasAddress; dash:uniqueValueForClass :Employee ].
  • 13. SHACL: Constraint Validation Checking compliance of Target nodes against Constraints : VALID :Ida a :Employee; :hasID "001"^^xsd:int; :hasAddress "Oslo". :Ingrid a :Employee; :hasID "002"^^xsd:int; :hasAddress "Bergen". :EmployeeNode a sh:NodeShape; sh:targetClass :Employee; sh:property [ sh:path :hasAddress; sh:nodeKind sh:Literal; sh:maxCount 1; sh:minCount 1; sh:datatype xsd:string ]; sh:property [ sh:path :hasAddress; dash:uniqueValueForClass :Employee ].
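The two validation steps on the preceding slides (acquire the target nodes, then check them against the constraints) can be sketched as follows. This is a simplified illustration and not a SHACL engine: the graph is a set of string tuples, only the cardinality and unique-value constraints are modelled, the `sh:nodeKind` and `sh:datatype` checks are ignored, and the function names are assumptions.

```python
from collections import Counter

RDF_TYPE = "rdf:type"

graph = {
    (":Ida", RDF_TYPE, ":Employee"),
    (":Ida", ":hasID", "001"),
    (":Ida", ":hasAddress", "Oslo"),
    (":Ingrid", RDF_TYPE, ":Employee"),
    (":Ingrid", ":hasID", "002"),
    (":Ingrid", ":hasAddress", "Bergen"),
}


def target_nodes(graph, target_class):
    """Step 1: collect the instances of the target class."""
    return {s for (s, p, o) in graph if p == RDF_TYPE and o == target_class}


def validate(graph, target_class, path, min_count, max_count):
    """Step 2: check cardinality and value uniqueness of `path` per target node."""
    targets = sorted(target_nodes(graph, target_class))
    values = {n: {o for (s, p, o) in graph if s == n and p == path} for n in targets}
    # How many target nodes share each value of `path`.
    owners = Counter(v for vs in values.values() for v in vs)
    violations = []
    for node in targets:
        if not (min_count <= len(values[node]) <= max_count):
            violations.append((node, "cardinality"))
    for node in targets:
        if any(owners[v] > 1 for v in values[node]):
            violations.append((node, "uniqueness"))
    return violations


report = validate(graph, ":Employee", ":hasAddress", 1, 1)
print("VALID" if not report else report)
```

On the example graph the report is empty, in line with the VALID outcome on the slide. Adding a third employee with the address "Oslo" would make both that node and :Ida fail the uniqueness check.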
  • 14. SHACL: Propagated Constraint Validation :EmployeeNode a sh:NodeShape; sh:targetClass :Employee; sh:property [ sh:path :hasAddress; sh:nodeKind sh:Literal; sh:maxCount 1; sh:minCount 1; sh:node :AddressNode ]. :AddressNode a sh:NodeShape; sh:property [ sh:path :telephone; sh:maxCount 1; ]; sh:property [ sh:path :locatedIn; sh:maxCount 1; sh:minCount 1; sh:hasValue :NorthernNorway; ].
  • 15. SHACL: Propagated Constraint - ”Recursion”? :EmployeeNode a sh:NodeShape; sh:targetClass :Employee; sh:property [ sh:path :hasAddress; sh:nodeKind sh:Literal; sh:maxCount 1; sh:minCount 1; sh:node :AddressNode ]; sh:property [ sh:path :knows; sh:minCount 1; sh:node :EmployeeNode ]. :AddressNode a sh:NodeShape; sh:property [ sh:path :telephone; sh:maxCount 1; ]; sh:property [ sh:path :locatedIn; sh:maxCount 1; sh:minCount 1; sh:hasValue :NorthernNorway; ].
  • 16. SHACL: Propagated Constraint - ”Recursion”? :EmployeeNode a sh:NodeShape; sh:targetClass :Employee; sh:property [ sh:path :hasAddress; sh:nodeKind sh:Literal; sh:maxCount 1; sh:minCount 1; sh:node :AddressNode ]; sh:property [ sh:path :knows; sh:minCount 1; sh:node :EmployeeNode ]. :AddressNode a sh:NodeShape; sh:property [ sh:path :telephone; sh:maxCount 1; ]; sh:property [ sh:path :locatedIn; sh:maxCount 1; sh:minCount 1; sh:hasValue :NorthernNorway; ]. From https://www.w3.org/TR/shacl/ ▶ Recursion: the semantics is not 100% formal ▶ Validation of recursive shapes is explicitly left undefined
  • 17. SHACL: Propagated Constraint - ”Recursion” :EmployeeNode a sh:NodeShape; sh:targetClass :Employee; sh:property [ sh:path :hasAddress; sh:nodeKind sh:Literal; sh:maxCount 1; sh:minCount 1; sh:node :AddressNode ]; sh:property [ sh:path :knows; sh:minCount 1; sh:node :EmployeeNode ]. :AddressNode a sh:NodeShape; sh:property [ sh:path :telephone; sh:maxCount 1; ]; sh:property [ sh:path :locatedIn; sh:maxCount 1; sh:minCount 1; sh:hasValue :NorthernNorway; ]. Contributions: ▶ Abstract syntax of SHACL core ▶ Semantics for recursive SHACL ▶ Validation algorithms and tractable fragments
  • 18. SHACL: Abstract Syntax Let S, C and P be countably infinite and mutually disjoint sets of shape, class and property names. Shape target τs and constraint ϕs are expressions defined by the grammar τs := sh:targetClass C | sh:targetSubjectOf P | sh:targetObjectOf P ϕs := ≥n α. β | ≤n α. β | ▷τs α | α1 = α2 | ϕs ∧ ϕs β := ⊤ | C | s′ | ¬β where α, α1, α2 ∈ P ∪ {P− | P ∈ P}, C ∈ C and s, s′ ∈ S.
  • 19. SHACL: Abstract Syntax Shape target τs and constraint ϕs are expressions defined by the grammar: τs := sh:targetClass C | sh:targetSubjectOf P | sh:targetObjectOf P τs := C | P | P− (i.e., short syntax) ϕs := ≥n α. β | ≤n α. β | ▷τs α | α1 = α2 | ϕs ∧ ϕs β := ⊤ | C | s′ | ¬β A shape in abstract syntax: ⟨Employee, τEmployee, ϕEmployee⟩ with τEmployee = :Employee and ϕEmployee = (=1 hasAddress. ⊤) ∧ (▷τEmployee hasAddress).
  • 20. SHACL Shape target τs and constraint ϕs are expressions defined by the grammar: τs := C | P | P− (i.e., short syntax) ϕs := ≥n α. β | ≤n α. β | ▷τs α | α1 = α2 | ϕs ∧ ϕs β := ⊤ | C | s′ | ¬β A shape in abstract syntax: ⟨Employee, τEmployee, ϕEmployee⟩ with τEmployee = :Employee and ϕEmployee = (=1 hasAddress. ⊤) ∧ (▷τEmployee hasAddress). Once the context is clear, we simply write: ⟨Employee, :Employee, (=1 hasAddress. ⊤) ∧ (▷τEmployee hasAddress)⟩
  • 21. SPARQL ▶ Query language for RDF ▶ W3C Rec. since January 2008 ▶ SPARQL 1.1, 2013 https://www.w3.org/TR/sparql11-query/ ▶ W3C working draft for SPARQL 1.2, 2023 https://www.w3.org/TR/sparql12-update/ ▶ W3C community draft for RDF∗ and SPARQL∗ , 2021 https://www.w3.org/2021/12/rdf-star.html
  • 22. SPARQL: Query Variables? ▶ Queries need variables; SPARQL variables are bound to RDF terms ▶ E.g., ?title, ?author, ?published ▶ As in SQL, variables are queried via a SELECT statement ▶ E.g., SELECT ?title ?author ?published ▶ A SELECT statement returns the query result as a table with columns ?title | ?author | ?published and rows: Games of no chance | Richard J. Nowakowski | 1999; Calculated Bets | Steven S. Skiena | 2001 ▶ Bag semantics
  • 23. SPARQL: Evaluation? ▶ Consider a SPARQL query: SELECT DISTINCT ?article ?author ?affiliation WHERE { ?article rdf:type :Article; dc:creator ?author . ?author dc:affiliated ?affiliation . FILTER (contains (?affiliation, "University of Oslo")) }
  • 24. SPARQL: Bottom-up evaluation 1. Basic Graph Pattern (BGP) matching: ?article rdf:type :Article; dc:creator ?author . ?author dc:affiliated ?affiliation . 2. Intermediate operator FILTER: contains (?affiliation, "University of Oslo") 3. Intermediate operator PROJECTION: ?article ?author ?affiliation 4. Intermediate operator DISTINCT 5. Final query result
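The bottom-up pipeline above (BGP matching, then FILTER, PROJECTION and DISTINCT) can be sketched in a few lines. This is an illustrative toy evaluator over a hand-made graph, not how a SPARQL engine is implemented; the data (`:a1`, `:ann`, `:bob`) and all function names are assumptions.

```python
graph = {
    (":a1", "rdf:type", ":Article"),
    (":a1", "dc:creator", ":ann"),
    (":a1", "dc:creator", ":bob"),
    (":ann", "dc:affiliated", "University of Oslo"),
    (":bob", "dc:affiliated", "University of Bergen"),
}


def is_var(term):
    return term.startswith("?")


def match_triple(pattern, triple, binding):
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    extended = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if is_var(p_term):
            # Bind the variable, or reject if it is already bound differently.
            if extended.setdefault(p_term, t_term) != t_term:
                return None
        elif p_term != t_term:
            return None
    return extended


def match_bgp(patterns, graph):
    """Step 1: BGP matching, one triple pattern at a time."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [
            ext
            for b in bindings
            for triple in sorted(graph)
            if (ext := match_triple(pattern, triple, b)) is not None
        ]
    return bindings


bgp = [
    ("?article", "rdf:type", ":Article"),
    ("?article", "dc:creator", "?author"),
    ("?author", "dc:affiliated", "?affiliation"),
]
solutions = match_bgp(bgp, graph)
# Step 2: FILTER contains(?affiliation, "University of Oslo")
solutions = [m for m in solutions if "University of Oslo" in m["?affiliation"]]
# Step 3: PROJECTION onto ?article ?author ?affiliation
rows = [(m["?article"], m["?author"], m["?affiliation"]) for m in solutions]
# Step 4: DISTINCT (order-preserving duplicate removal)
result = list(dict.fromkeys(rows))
print(result)
```

Each operator consumes the full output of the previous one, which is the bottom-up order shown on the slide.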
  • 25. SPARQL Algebra A SPARQL query is a graph pattern P defined by the grammar P := B | FilterF (P) | Union(P1, P2) | Join(P1, P2) | Minus(P1, P2) | DiffF (P1, P2) | OptF (P1, P2) | ProjL(P) | Dist(P)
  • 26. SPARQL Algebra A SPARQL query is a graph pattern P defined by the grammar P := B | FilterF (P) | Union(P1, P2) | Join(P1, P2) | Minus(P1, P2) | DiffF (P1, P2) | OptF (P1, P2) | ProjL(P) | Dist(P) E.g., consider a nested SPARQL query that retrieves the names of employees and their office addresses: SELECT ?y ?z WHERE { ?x :hasName ?y . { SELECT ?x ?z WHERE { ?x :hasOffice ?y . ?y :hasAddress ?z } } } In SPARQL Algebra: Projyz (Join(hasName(x, y), Projxz (Join(hasOffice(x, y), hasAddress(y, z)))))
  • 27. SPARQL Algebra E.g., consider a nested SPARQL query that retrieves the names of employees and their office addresses: SELECT ?y ?z WHERE { ?x :hasName ?y . { SELECT ?x ?z WHERE { ?x :hasOffice ?y . ?y :hasAddress ?z } } } In SPARQL Algebra: Projyz (Join(hasName(x, y), Projxz (Join(hasOffice(x, y), hasAddress(y, z))))) Upon simplification (applied where possible, but not required), we get: Projyz (hasName(x, y) ⋈ hasOffice(x, n) ⋈ hasAddress(n, z)).
  • 28. SPARQL Algebra: some notions on query evaluation? The semantics of graph patterns is defined in terms of (solution) mappings, i.e., partial functions µ : V → T with (possibly empty) domain dom(µ), where T is the set of RDF terms I ∪ B ∪ L and V is a countably infinite set of variables disjoint from T.
  • 29. SPARQL Algebra: some notions on query evaluation? Mappings are partial functions $\mu : V \to T$. Let ▶ $\mu|_{L}$ be the restriction of mapping $\mu$ to $L \subseteq V$ ▶ $\mu|_{\bar{L}}$ be the restriction of mapping $\mu$ to $V \setminus L$. Evaluation of a SPARQL query $Q$ over an RDF graph $G$, denoted by $\llbracket Q \rrbracket_G$, returns a multiset (i.e., bag) of mappings. ▶ $\llbracket Q \rrbracket_G|_{\bar{X}}$ is the multiset of mappings $\mu \in \llbracket Q \rrbracket_G$ restricted to $V \setminus X$, i.e., $|\mu, \llbracket Q \rrbracket_G|_{\bar{X}}| = \sum_{\mu = \mu'|_{\bar{X}}} |\mu', \llbracket Q \rrbracket_G|$ ▶ The support of the multiset $\llbracket Q \rrbracket_G$, denoted by $\mathrm{sup}(\llbracket Q \rrbracket_G)$, is $\mathrm{sup}(\llbracket Q \rrbracket_G) = \{\mu \mid |\mu, \llbracket Q \rrbracket_G| > 0\}$
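The bag notions on this slide can be made concrete with `collections.Counter`, where a key is a mapping and its count is the multiplicity. This is a sketch under assumed names (`freeze`, `restrict_away`, `support`) and made-up data; mappings are frozen into sorted tuples so they can serve as dictionary keys.

```python
from collections import Counter


def freeze(mapping):
    """Turn a dict mapping into a hashable, order-independent key."""
    return tuple(sorted(mapping.items()))


def restrict_away(bag, hidden):
    """Restrict every mapping to the variables outside `hidden`.

    Multiplicities of mappings that coincide after restriction are summed.
    """
    out = Counter()
    for frozen, count in bag.items():
        kept = tuple((var, val) for var, val in frozen if var not in hidden)
        out[kept] += count
    return out


def support(bag):
    """The set of mappings with multiplicity greater than zero."""
    return {m for m, count in bag.items() if count > 0}


bag = Counter({
    freeze({"x": ":Ida", "y": "Oslo"}): 1,
    freeze({"x": ":Ida", "y": "Bergen"}): 1,
    freeze({"x": ":Ingrid", "y": "Bergen"}): 1,
})
restricted = restrict_away(bag, {"y"})
print(restricted)
print(support(restricted))
```

After hiding y, the mapping for :Ida has multiplicity 2, while the support only records that it occurs. The gap between a bag and its support is the gap between the two kinds of equivalence defined later.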
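  • 31. Optimization: Problem Statement Let $S$ be a set of SHACL shapes, and $Q$ a SPARQL query. Our goal is to find optimal $S$-equivalent queries $Q'$ of the original query $Q$, where $Q \equiv_S Q'$ iff, for all $G$ with $G \models S$, $\llbracket Q \rrbracket_G = \llbracket Q' \rrbracket_G$
  • 32. Optimization: Equivalences Let $U$ and $V$ be two graph patterns, and $S$ a set of SHACL shapes. ▶ $U \equiv_S V$ iff, for all $G$ with $G \models S$, $\llbracket U \rrbracket_G = \llbracket V \rrbracket_G$ ▶ $U \equiv_{S,y} V$ iff, for all $G$ with $G \models S$, $\llbracket U \rrbracket_G|_{\bar{y}} = \llbracket V \rrbracket_G|_{\bar{y}}$ ▶ $U \cong_{S,y} V$ iff, for all $G$ with $G \models S$, $\mathrm{sup}(\llbracket U \rrbracket_G|_{\bar{y}}) = \mathrm{sup}(\llbracket V \rrbracket_G|_{\bar{y}})$
  • 33. Optimization: Rewriting Rules Let $U$ and $V$ be two graph patterns, and $S$ a set of SHACL shapes. ▶ $U \equiv_S V$ iff, for all $G$ with $G \models S$, $\llbracket U \rrbracket_G = \llbracket V \rrbracket_G$ ▶ $U \equiv_{S,y} V$ iff, for all $G$ with $G \models S$, $\llbracket U \rrbracket_G|_{\bar{y}} = \llbracket V \rrbracket_G|_{\bar{y}}$ ▶ $U \cong_{S,y} V$ iff, for all $G$ with $G \models S$, $\mathrm{sup}(\llbracket U \rrbracket_G|_{\bar{y}}) = \mathrm{sup}(\llbracket V \rrbracket_G|_{\bar{y}})$ We then propose a set of query rewriting rules based on these equivalences that: 1. reduce OPTIONAL patterns to JOIN patterns 2. remove redundant JOIN patterns 3. eliminate the DIST operator, etc.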
  • 34. An Example of Query Rewriting Consider a SPARQL query, Dist(Projxy (Opt⊤(Employee(x), hasAddress(x, y)))) over graph G, :Ida a :Employee; :hasID "001"^^xsd:int; :hasAddress "Oslo". :Yacob a :Employee; . . . :Nils a :Employee; . . . :Ingrid a :Employee; :hasID "002"^^xsd:int; :hasAddress "Bergen". . . .
  • 35. An Example of Query Rewriting Consider a SPARQL query, Dist(Projxy (Opt⊤(Employee(x), hasAddress(x, y)))) over graph G, :Ida a :Employee; :hasID "001"^^xsd:int; :hasAddress "Oslo". :Yacob a :Employee; . . . :Ingrid a :Employee; :hasID "002"^^xsd:int; :hasAddress "Bergen". . . . Assume G satisfies shape, ⟨Employee, :Employee, (=1 hasAddress. ⊤)∧(▷τEmployee hasAddress)⟩
  • 36. An Example of Query Rewriting Consider the query over G, Dist(Projxy (Opt⊤(Employee(x), hasAddress(x, y)))) . ▶ Since G satisfies the conjunct (=1 hasAddress. ⊤) of ϕEmployee, the “Opt pattern” can be reduced to a “Join pattern”: Dist(Projxy (Join(Employee(x), hasAddress(x, y)))). ▶ Since G satisfies the conjunct (▷τEmployee hasAddress) of ϕEmployee, “Dist” can be removed: Projxy (Join(Employee(x), hasAddress(x, y))).
  • 37. An Example of Query Rewriting Consider the query over G, Dist(Projxy (Opt⊤(Employee(x), hasAddress(x, y)))) . ▶ Since G satisfies the conjunct (=1 hasAddress. ⊤) of ϕEmployee, the “Opt pattern” can be reduced to a “Join pattern”: Dist(Projxy (Join(Employee(x), hasAddress(x, y)))). ▶ Since G satisfies the conjunct (▷τEmployee hasAddress) of ϕEmployee, “Dist” can be removed: Projxy (Join(Employee(x), hasAddress(x, y))). The original and the rewritten query are “≡S equivalent queries”.
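The rewriting in this example can be checked on a toy graph by evaluating both queries and comparing the resulting bags. The sketch below hand-codes Join, Opt, Proj and Dist over lists of dict mappings; the operator implementations and helper names are assumptions for illustration and do not come from the paper.

```python
from collections import Counter

graph = {
    (":Ida", "rdf:type", ":Employee"), (":Ida", ":hasAddress", "Oslo"),
    (":Ingrid", "rdf:type", ":Employee"), (":Ingrid", ":hasAddress", "Bergen"),
}


def employee(graph):
    return [{"x": s} for (s, p, o) in sorted(graph) if p == "rdf:type" and o == ":Employee"]


def has_address(graph):
    return [{"x": s, "y": o} for (s, p, o) in sorted(graph) if p == ":hasAddress"]


def compatible(m1, m2):
    """Two mappings are compatible if they agree on shared variables."""
    return all(m2[v] == val for v, val in m1.items() if v in m2)


def join(left, right):
    return [{**m1, **m2} for m1 in left for m2 in right if compatible(m1, m2)]


def opt(left, right):
    """Left outer join: an unmatched left mapping is kept unchanged."""
    out = []
    for m1 in left:
        matches = [{**m1, **m2} for m2 in right if compatible(m1, m2)]
        out.extend(matches or [m1])
    return out


def proj(bag, variables):
    return [{v: m[v] for v in variables if v in m} for m in bag]


def dist(bag):
    return [dict(t) for t in dict.fromkeys(tuple(sorted(m.items())) for m in bag)]


def as_bag(mappings):
    return Counter(tuple(sorted(m.items())) for m in mappings)


def original_query(graph):
    return dist(proj(opt(employee(graph), has_address(graph)), ["x", "y"]))


def rewritten_query(graph):
    return proj(join(employee(graph), has_address(graph)), ["x", "y"])


same = as_bag(original_query(graph)) == as_bag(rewritten_query(graph))
print(same)
```

The equality depends on the shape being satisfied. If an employee without an address is added, the Opt version still returns that employee while the Join version drops it, so the two queries are no longer interchangeable.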
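  • 38. Optimization: Example of Rewriting Rules Lemma: Let $\langle s, \tau_s, \phi_s \rangle \in S$ with $(\geq_n P.\top) \in \phi_s$ s.t. $n \geq 1$, and $P$ a graph pattern s.t. $T \blacktriangleleft P$. If $y \notin \mathrm{var}(P)$, then 1. $\mathrm{Opt}_F(P, P(x, y)) \equiv_S \mathrm{Filter}_F(\mathrm{Join}(P, P(x, y)))$ 2. $\mathrm{Join}(P, P(x, y)) \cong_{S,y} P$, where $T = C(x)$ if $\tau_s = C$; $T = R(x, z)$ if $\tau_s = \exists R$; $T = R^{-}(x, z)$ if $\tau_s = \exists R^{-}$. Corollary: Let $\langle s, \tau_s, \phi_s \rangle \in S$ with $(\geq_n P.\top) \in \phi_s$ s.t. $n \geq 1$, and $P$ a graph pattern s.t. $T \blacktriangleleft P$. If $y \notin \mathrm{var}(P \cup F)$, then 1. $\mathrm{Filter}_F(\mathrm{Join}(P, P(x, y))) \cong_{S,y} \mathrm{Filter}_F(P)$ 2. $\mathrm{Opt}_F(P, P(x, y)) \cong_{S,y} \mathrm{Filter}_F(P)$
  • 39. Optimization: Example of Rewriting Rules $T = C(x)$ if $\tau_s = C$; $T = R(x, z)$ if $\tau_s = \exists R$; $T = R^{-}(x, z)$ if $\tau_s = \exists R^{-}$. Corollary: Let $\langle s, \tau_s, \phi_s \rangle \in S$ with $(\geq_n P.\top) \in \phi_s$ s.t. $n \geq 1$, and $P$ a graph pattern s.t. $T \blacktriangleleft P$. If $y \notin \mathrm{var}(P \cup F)$, then $\mathrm{Filter}_F(\mathrm{Join}(P, P(x, y))) \cong_{S,y} \mathrm{Filter}_F(P)$. Consider the query $Q = \mathrm{Dist}(\mathrm{Proj}_{xy}(\mathrm{Filter}_{\mathrm{regex}(y, \text{"Smith"})}(\mathrm{Join}(\mathrm{Student}(x) \bowtie \mathrm{lastName}(x, y), \mathrm{hasAddress}(x, z)))))$ over $G \models \langle \mathrm{Student}, \text{:Student}, (\geq_n \mathrm{hasAddress}.\top) \rangle$.
  • 40. Optimization: Example of Rewriting Rules Corollary: Let $\langle s, \tau_s, \phi_s \rangle \in S$ with $(\geq_n P.\top) \in \phi_s$ s.t. $n \geq 1$, and $P$ a graph pattern s.t. $T \blacktriangleleft P$. If $y \notin \mathrm{var}(P \cup F)$, then $\mathrm{Filter}_F(\mathrm{Join}(P, P(x, y))) \cong_{S,y} \mathrm{Filter}_F(P)$. Consider the query $Q = \mathrm{Dist}(\mathrm{Proj}_{xy}(\mathrm{Filter}_{\mathrm{regex}(y, \text{"Smith"})}(\mathrm{Join}(\mathrm{Student}(x) \bowtie \mathrm{lastName}(x, y), \mathrm{hasAddress}(x, z)))))$ over $G \models \langle \mathrm{Student}, \text{:Student}, (\geq_n \mathrm{hasAddress}.\top) \rangle$. Then, by the Corollary, $Q$ reduces to: $\mathrm{Dist}(\mathrm{Proj}_{xy}(\mathrm{Filter}_{\mathrm{regex}(y, \text{"Smith"})}(\mathrm{Student}(x) \bowtie \mathrm{lastName}(x, y))))$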
  • 41. Property of Query Rewriting Rules ▶ Propagation to Larger Queries ▶ Confluent Reduction
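  • 42. Property of Query Rewriting Rules: Propagation Definition: Let $Q$ be a SPARQL query, and let $P$ and $U$ be two graph patterns. Then, we write $U \mathrel{\underset{\sim}{\triangleleft}} Q$ if $\mathrm{Dist}(\mathrm{Proj}_X(P)) \trianglelefteq Q$ and $U \trianglelefteq P$. Theorem: Let $Q$ be a SPARQL query and $S$ a SHACL document. Let $U$ and $V$ be two graph patterns. Then, 1. $Q \equiv_S Q_{U \mapsto V}$ if $U \equiv_S V$ 2. $\mathrm{Proj}_X(Q) \equiv_S \mathrm{Proj}_X(Q)_{U \mapsto V}$ if $U \trianglelefteq Q$, $U \equiv_{S,y} V$ and $y \notin \mathrm{var}(\mathrm{Proj}_X(Q) \setminus U)$ 3. $Q \equiv_S Q_{U \mapsto V}$ if $U \mathrel{\underset{\sim}{\triangleleft}} Q$, $U \cong_{S,y} V$ and $y \notin \mathrm{var}(Q \setminus U)$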
  • 43. Property of Query Rewriting Rules: Confluent Reduction Consider the SPARQL query, Dist(Projxy (employeeID(x, y) ⋈ hiredBy(x, k) ⋈ insuredBy(x, z))) over a graph G s.t. G |= ⟨∃employeeID, τ∃employeeID, ϕ∃employeeID⟩ with {(▷∃employeeID employeeID), (=1 insuredBy. ⊤), (hiredBy = insuredBy)} ⊆ ϕ∃employeeID.
  • 44. Property of Query Rewriting Rules: Confluent Reduction Consider the SPARQL query, Dist(Projxy (employeeID(x, y) ⋈ hiredBy(x, k) ⋈ insuredBy(x, z))) over a graph G s.t. G |= ⟨∃employeeID, τ∃employeeID, ϕ∃employeeID⟩ with {(▷∃employeeID employeeID), (=1 insuredBy. ⊤), (hiredBy = insuredBy)} ⊆ ϕ∃employeeID. Subsequently, ▶ (=1 insuredBy. ⊤) ∧ (hiredBy = insuredBy) −→ (=1 hiredBy. ⊤) ▶ (=1 hiredBy. ⊤) −→ (≥1 hiredBy. ⊤) ▶ (=1 insuredBy. ⊤) −→ (≥1 insuredBy. ⊤) All explicit and implicit rewriting rules need to be taken into account.
  • 45. Property of Query Rewriting Rules: Confluent Reduction Consider the SPARQL query, Dist(Projxy (employeeID(x, y) ⋈ hiredBy(x, k) ⋈ insuredBy(x, z))) over a graph G s.t. G |= ⟨∃employeeID, τ∃employeeID, ϕ∃employeeID⟩ with {(▷∃employeeID employeeID), (=1 insuredBy. ⊤), (hiredBy = insuredBy)} ⊆ ϕ∃employeeID. Then, the query is subject to the following rewriting rules: 1. ≅S,y - ”Join” optimization based on (≥1 insuredBy. ⊤) 2. ≅S,y - ”Join” optimization based on (≥1 hiredBy. ⊤) 3. ≅S,y - ”Join” optimization based on (hiredBy = insuredBy) 4. ≡S,y - ”Join” optimization based on (=1 insuredBy. ⊤) 5. ≡S,y - ”Join” optimization based on (=1 hiredBy. ⊤) 6. ≡S - ”Dist” optimization based on (▷∃employeeID employeeID)
  • 46. Property of Query Rewriting Rules: Confluent Reduction Consider the SPARQL query, Dist(Projxy (employeeID(x, y) ⋈ hiredBy(x, k) ⋈ insuredBy(x, z))) over a graph G s.t. G |= ⟨∃employeeID, τ∃employeeID, ϕ∃employeeID⟩. Then, the query is subject to the following rewriting rules: 1. ≅S,y - ”Join” optimization based on (≥1 insuredBy. ⊤) 2. ≅S,y - ”Join” optimization based on (≥1 hiredBy. ⊤) 3. ≅S,y - ”Join” optimization based on (hiredBy = insuredBy) 4. ≡S,y - ”Join” optimization based on (=1 insuredBy. ⊤) 5. ≡S,y - ”Join” optimization based on (=1 hiredBy. ⊤) 6. ≡S - ”Dist” optimization based on (▷∃employeeID employeeID) Regardless of the sequence in which these rewrites are applied, we get: Projxy (employeeID(x, y))
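What confluence means operationally can be sketched by applying a set of rewrites in every possible order and checking that a single normal form results. The encoding below is a deliberately coarse abstraction: a query is a pair (has Dist, set of join atoms), the six rules on the slide are collapsed into three unconditional rewrites, and the side conditions of the lemmas are not modelled. All names are assumptions.

```python
from itertools import permutations

# Hypothetical query state: (query is wrapped in Dist, atoms joined under Proj_xy).
START = (True, frozenset({"employeeID(x,y)", "hiredBy(x,k)", "insuredBy(x,z)"}))


def drop_atom(atom):
    """Build a rewrite that removes one redundant join atom."""
    def rule(state):
        has_dist, atoms = state
        return (has_dist, atoms - {atom})
    return rule


def drop_dist(state):
    """Rewrite that removes the Dist operator."""
    _, atoms = state
    return (False, atoms)


RULES = {
    "join-elim insuredBy": drop_atom("insuredBy(x,z)"),
    "join-elim hiredBy": drop_atom("hiredBy(x,k)"),
    "dist-elim": drop_dist,
}


def normal_forms(start, rules):
    """Apply the rules in every order and collect the final states."""
    results = set()
    for order in permutations(rules):
        state = start
        for name in order:
            state = rules[name](state)
        results.add(state)
    return results


forms = normal_forms(START, RULES)
print(forms)
```

In this abstraction every order ends in the same state, a query without Dist over the single atom employeeID(x,y), mirroring the result stated on the slide. The toy rules commute trivially; the theorems on the next slide concern the actual rules with their side conditions.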
  • 47. Property of Query Rewriting Rules: Confluent Reduction ... As rewriting optimizations are generalized in the form of lemmas and their consequences, we state the confluence results as follows: Theorem: Query rewriting defined by Lemmas 1 to 6 is a confluent reduction. Theorem: Query rewriting defined by Lemmas 1 to 7 is a confluent reduction iff, in Lemma 7, $\phi' = \top$ if $P = T$, and $\phi' = \bigwedge_{i=1}^{n} (=_1 P_i.\top)$ if $P = (T \bowtie P_1(x, z_1) \bowtie \dots \bowtie P_i(x, z_i) \bowtie \dots \bowtie P_n(x, z_n))$.
  • 48. Other or future work? ▶ Extension to SPARQL Property Path Queries ▶ Optimization of Ontology-Mediated Query Answering