ShEx (Shape Expressions) is a language for expressing constraints on RDF graphs. We consider the problem of SPARQL query containment in the presence of ShEx constraints. We first propose a sound and complete procedure for the containment problem with ShEx, considering several SPARQL fragments. In particular, our procedure handles OPTIONAL query patterns, which turn out to be an important fragment to study in combination with schemas. We then establish the complexity bounds of our problem with respect to the fragments considered. To the best of our knowledge, this is the first work addressing SPARQL query containment in the presence of ShEx constraints.
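To illustrate the containment question, consider two hypothetical queries (these are not taken from the paper; the ex: vocabulary is invented for illustration):

```sparql
PREFIX ex: <http://example.org/>

# Q1 asks for people with both a name and an email:
SELECT ?x WHERE { ?x ex:name ?n . ?x ex:email ?e }

# Q2 asks only for a name:
SELECT ?x WHERE { ?x ex:name ?n }

# Every answer to Q1 is an answer to Q2, so Q1 is contained in Q2 on all
# graphs. Under a ShEx schema that forces every node with ex:name to also
# have an ex:email, the two queries become equivalent on all valid graphs,
# which is why containment must be reconsidered in the presence of schemas.
```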
The document describes the SHACL Test-Suite, which provides a framework for testing SHACL shape schemas and validators. The test-suite structure includes a main manifest file that includes other test folders, each with their own manifest. Manifest files follow the W3C standard practice and describe test entries that validate data against schemas, match nodes to shapes, and test schema formatting. The test-suite is available online and on GitHub, and the working group is seeking contributions to expand the included tests.
Towards an RDF Validation Language based on Regular Expression Derivatives
Author: Jose Emilio Labra Gayo
Slides presented at: Linked Web Data Management Workshop
Brussels, 27th March, 2015
This document discusses using ShEx and SHACL to validate and describe RDF data shapes. It summarizes the WebIndex linked data portal which uses ShEx expressions to define and validate its data model. It also outlines some applications of SHACL for validation and future work on standardizing ShEx and SHACL.
ShEx is a language for validating RDF data. It allows defining shapes that specify constraints on nodes and triples. ShEx expressions can be used to validate if RDF graphs conform to the defined shapes. The ShEx language is inspired by languages like RelaxNG and provides different serialization formats like ShExC, ShExJ, and ShExR. There are open-source implementations of ShEx validators in languages like JavaScript, Scala, Ruby, Python, and Java. ShEx provides a concise way to define RDF shapes and validate instance data against those shapes.
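A minimal shape in the ShExC compact syntax (the prefixes and vocabulary here are illustrative, not taken from the slides):

```shex
PREFIX schema: <http://schema.org/>
PREFIX xsd:    <http://www.w3.org/2001/XMLSchema#>

<PersonShape> {
  schema:name  xsd:string ;          # exactly one string-valued name
  schema:email IRI + ;               # one or more IRI-valued emails
  schema:knows @<PersonShape> *      # zero or more links to conforming nodes
}
```

A validator checks whether a given node of an RDF graph conforms to `<PersonShape>` by matching its outgoing triples against these constraints.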
- SPARQL is a query language for retrieving and manipulating data stored in RDF format. It is similar to SQL but for RDF data.
- SPARQL queries contain prefix declarations, specify a dataset using FROM, and include a graph pattern in the WHERE clause to match triples.
- The main types of SPARQL queries are SELECT, ASK, DESCRIBE, and CONSTRUCT. SELECT returns variable bindings, ASK returns a boolean, DESCRIBE returns a description of a resource, and CONSTRUCT generates an RDF graph.
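The anatomy described above can be seen in a minimal query (the prefix and dataset IRI are illustrative):

```sparql
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

SELECT ?name                          # SELECT returns variable bindings
FROM <http://example.org/people>      # dataset specification
WHERE {
  ?person foaf:name ?name .           # graph pattern matched against triples
}
```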
Although RDF is a cornerstone of the Semantic Web and knowledge graphs, it has not been embraced by everyday programmers and software architects who need to safely create and access well-structured data. There is a lack of the common tools and methodologies that are available in more conventional settings to improve data quality by defining schemas that can later be validated. Two technologies have recently been proposed for RDF validation: Shape Expressions (ShEx) and Shapes Constraint Language (SHACL). In the talk, we will review the history and motivation of both technologies. We will also enumerate some challenges and future work with regard to RDF validation.
Semantic Web technologies (such as RDF and SPARQL) excel at bringing together diverse data in a world of independent data publishers and consumers. Common ontologies help to arrive at a shared understanding of the intended meaning of data.
However, they don’t address one critically important issue: What does it mean for data to be complete and/or valid? Semantic knowledge graphs without a shared notion of completeness and validity quickly turn into a Big Ball of Data Mud.
The Shapes Constraint Language (SHACL), an upcoming W3C standard, promises to help solve this problem. By keeping semantics separate from validity, SHACL makes it possible to resolve a slew of data quality and data exchange issues.
Presented at the Lotico Berlin Semantic Web Meetup.
These slides are a brief update on the status of the work of the current SPARQL Working Group. "SPARQL 1.1" collectively refers to the upcoming versions of the SPARQL query language, SPARQL update language, and other deliverables of the 2nd (current) SPARQL Working Group.
This presentation was given at the International Workshop on Interacting with Linked Data (ILD 2012), co-located with the 9th Extended Semantic Web Conference 2012, Heraklion, and is related to the publication of the same title.
Much research has been done to combine the fields of Databases and Natural Language Processing. While many works focus on the problem of deriving a structured query for a given natural language question, the problem of query verbalization -- translating a structured query into natural language -- is less explored. In this work we describe our approach to verbalizing SPARQL queries in order to create natural language expressions that are readable and understandable by the everyday user. These expressions are helpful when search engines generate SPARQL queries for user-provided natural language questions or keywords: displaying verbalizations of the generated queries enables users to check whether the right question has been understood. While our approach enables verbalization of only a subset of SPARQL 1.1, this subset applies to 90% of the 209 queries in our training set. These observations are based on a corpus of SPARQL queries consisting of datasets from the QALD-1 challenge and the ILD2012 challenge.
The publication is available at http://www.aifb.kit.edu/images/b/b7/VerbalizingSparqlQueries.pdf
An introduction to the XPath XML query possibilities. In particular, there is a focus on the abbreviations that make XPath efficient to use. A larger section is allocated to explaining and illustrating the use of axes in XPath.
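Some of the standard abbreviations such an introduction typically covers, with their expanded axis forms (examples are illustrative, not taken from the slides):

```xpath
//book      abbreviates  /descendant-or-self::node()/child::book
book/@id    abbreviates  child::book/attribute::id
.           abbreviates  self::node()
..          abbreviates  parent::node()
```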
The document discusses the RDF data model. The key points are:
1. RDF represents data as a graph of triples consisting of a subject, predicate, and object. Triples can be combined to form an RDF graph.
2. The RDF data model has three types of nodes - URIs to identify resources, blank nodes to represent anonymous resources, and literals for values like text strings.
3. RDF graphs can be merged to integrate data from multiple sources in an automatic way due to RDF's compositional nature.
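The three node types can be seen together in a small Turtle example (the vocabulary and IRIs are illustrative):

```turtle
@prefix ex:   <http://example.org/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .

ex:alice  foaf:name  "Alice" ;               # object is a literal
          foaf:knows ex:bob ;                # object is a URI
          foaf:knows [ foaf:name "Carol" ] . # object is a blank node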
The document discusses RDF Shapes, which are used to describe and validate RDF data. It provides examples of using ShEx and SHACL to define shapes for RDF graphs and validate instance data against those shapes. Key points covered include the differences between ShEx and SHACL, such as ShEx focusing on defining structures while SHACL adds target declarations, and how both can be used to generate validation reports.
This document provides an introduction and examples for SHACL (Shapes Constraint Language), a W3C recommendation for validating RDF graphs. It defines key SHACL concepts like shapes, targets, and constraint components. An example shape validates nodes with a schema:name and schema:email property. Constraints like minCount, maxCount, datatype, nodeKind, and logical operators like and/or are demonstrated. The document is an informative tutorial for learning SHACL through examples.
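A sketch of the kind of shape the tutorial describes, in Turtle (prefix declarations for ex:, sh:, schema:, and xsd: are assumed; this is not copied from the document):

```turtle
ex:PersonShape a sh:NodeShape ;
  sh:targetClass schema:Person ;
  sh:property [
    sh:path schema:name ;
    sh:datatype xsd:string ;       # value must be a string literal
    sh:minCount 1 ;
    sh:maxCount 1 ;
  ] ;
  sh:property [
    sh:path schema:email ;
    sh:nodeKind sh:IRI ;           # value must be an IRI
    sh:minCount 1 ;
  ] .
```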
SPIN is a vocabulary that represents SPARQL queries and constraints as RDF triples. This allows SPARQL queries to be stored and shared on the semantic web. SPIN can be used to define SPARQL constraints, rules, functions and reusable query templates. Storing SPARQL queries as RDF triples provides benefits like referential integrity, managing namespaces centrally, and facilitating the easy sharing of queries on the semantic web.
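As a sketch of the idea (the spin: and sp: namespaces are the real SPIN vocabulary, but this particular constraint is hypothetical):

```turtle
# Attach a SPARQL ASK constraint to a class; the query is stored as RDF
# and ?this is bound to each instance being checked.
ex:Person spin:constraint [
  a sp:Ask ;
  sp:text """ASK WHERE { ?this ex:age ?age . FILTER (?age < 0) }""" ;
  rdfs:comment "Age must not be negative"
] .
```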
This document discusses and compares graph data structures represented as linked data/RDF and property graphs. It provides examples of linked data/RDF graphs using Turtle syntax and JSON-LD, and how they can be queried with SPARQL. It also demonstrates how to represent the same graph data as a property graph using TinkerPop and Gremlin, and how the graph can be queried using Gremlin and Cypher languages. Key graph concepts like nodes, edges, and properties are also introduced.
This document provides an overview of SHACL (Shapes Constraint Language), a W3C recommendation for defining constraints on RDF graphs. It defines key SHACL concepts like shapes, targets, node shapes, property shapes and constraint components. Examples are provided to illustrate shape definitions and how validation of an RDF graph works against the defined shapes. The document summarizes the motivation for SHACL and inputs that influenced its development.
The document discusses the W3C stack for representing metadata, with XML providing syntax but no semantics, RDF and RDF Schema defining a data model for relations between resources and a vocabulary definition language, and OWL adding more expressivity with concepts such as classes, properties, and cardinality restrictions. It also covers RDF syntaxes like Turtle and XML, and how RDF can represent implied claims from XML and facilitate interoperability between systems through its abstract model.
This document introduces Shape Expressions, a language for validating and transforming RDF data. Shape Expressions allows users to describe the topology of RDF data through shapes that define the structure and types of subjects, properties and objects. Shapes can then be used to validate if an RDF graph conforms to a given shape. The language is demonstrated through an example of representing issues and users in an issue tracking system using RDF, and defining shapes to validate the data and check for errors. Key features of Shape Expressions include the use of labels, conjunctions, references to other shapes, and cardinalities to describe RDF patterns to match against.
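A sketch of such shapes in ShEx compact syntax, showing value sets, shape references, and cardinalities (the vocabulary is illustrative, not the actual example from the document):

```shex
<IssueShape> {
  ex:state        [ex:open ex:closed] ; # value set: one of these IRIs
  ex:reportedBy   @<UserShape> ;        # reference to another shape
  ex:reproducedBy @<UserShape> *        # cardinality: zero or more
}

<UserShape> {
  foaf:name xsd:string
}
```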
This document provides an outline for a WWW 2012 tutorial on schema mapping with SPARQL 1.1. The outline includes sections on why data integration is important, schema mapping, translating RDF data with SPARQL 1.1, and common mapping patterns. Mapping patterns discussed include simple renaming, structural patterns like renaming based on property existence or value, value transformation using SPARQL functions, and aggregation. The tutorial aims to show how SPARQL 1.1 can be used to express executable mappings between different data schemas and representations.
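The simplest of these patterns, renaming a property, can be written as a CONSTRUCT query (vocabularies chosen for illustration):

```sparql
PREFIX foaf:   <http://xmlns.com/foaf/0.1/>
PREFIX schema: <http://schema.org/>

# Simple renaming: translate foaf:name triples into schema:name triples
CONSTRUCT { ?s schema:name ?n }
WHERE     { ?s foaf:name   ?n }
```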
Two graph data models: RDF and Property Graphs
Author: andyseaborne
This document provides an overview of two graph data models: RDF and Property Graphs. It describes the key components of each model, including triples for RDF and nodes/edges/properties for Property Graphs. It also discusses Apache projects that work with each model like Apache Jena for RDF and Apache TinkerPop, Spark, Giraph and Flink for Property Graphs. Finally, it notes that while the models have different focuses, they could potentially share technologies like storage and query capabilities.
This document provides an overview of software architectures for semantic web applications, including local access, mixed access, and remote access architectures. Local access architectures involve storing and querying RDF data locally using a triplestore and API. Remote access architectures involve querying RDF data owned by a third party using the SPARQL Protocol over HTTP or SOAP. The SPARQL Protocol is an abstract specification for remotely executing SPARQL queries in a standards-based way.
Twinkle is a GUI tool for writing and running SPARQL queries against local files, remote files, Jena databases, and SPARQL endpoints. It allows inferencing using RDFS, OWL, and Jena rules. Configuration is done declaratively using the Jena assembler API. Twinkle utilizes the ARQ SPARQL query engine and provides additional functions and property libraries. Future plans include improved documentation, access control, caching, and syntax highlighting.
SPARQL Query Verbalization for Explaining Semantic Search Engine Queries
Author: Basil Ell
This presentation was given at the 11th Extended Semantic Web Conference (ESWC '14), Anissaras/Heronissou, Crete, Greece, and is related to the publication of the same title.
In this paper we introduce Spartiqulation, a system that translates SPARQL queries into English text. Our aim is to allow casual end users of semantic applications with limited to no expertise in the SPARQL query language to interact with these applications in a more intuitive way. The verbalization approach exploits domain-independent template-based natural language generation techniques, as well as linguistic cues in labels and URIs.
"SPARQL Cheat Sheet" is a short collection of slides intended to act as a guide to SPARQL developers. It includes the syntax and structure of SPARQL queries, common SPARQL prefixes and functions, and help with RDF datasets.
The "SPARQL Cheat Sheet" is intended to accompany the SPARQL By Example slides available at http://www.cambridgesemantics.com/2008/09/sparql-by-example/ .
SPARQL is a query language for retrieving and manipulating data stored in RDF format. It allows users to write queries against remote SPARQL endpoints to query RDF triples stored in a database. SPARQL queries are composed of triple patterns, similar to RDF triples, that can include variables to retrieve variable bindings from the queried data. Query results are returned as solutions that assign values to the variables. Common queries include SELECT, ASK, CONSTRUCT, and DESCRIBE. SPARQL endpoints provide programmatic access to issue SPARQL queries against remote SPARQL-accessible stores.
Selectivity Estimation for SPARQL Triple Patterns with Shape Expressions
Author: Abdullah Abbas
We optimize the evaluation of conjunctive SPARQL queries on big RDF graphs by taking advantage of ShEx schema constraints. Our optimization is based on computing ranks for query triple patterns, which indicate their order of execution. We first define a set of well-formed ShEx schemas that possess interesting characteristics for SPARQL query optimization. We then define our optimization method by exploiting information extracted from a ShEx schema. The experiments performed show the advantages of applying our optimization on top of an existing state-of-the-art query evaluation system.
This document provides an overview of SPARQL, the query language for the Semantic Web. SPARQL allows querying RDF data by matching triple patterns and combining them with operations like optional and union patterns. Key features discussed include the anatomy of SPARQL queries, matching RDF literals and numerical values, filtering solutions, and defining datasets with the FROM clause. The document also covers SPARQL result forms and resources for learning more about SPARQL implementations and extensions.
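A small query combining the features mentioned, optional patterns and filtering (illustrative, not taken from the document):

```sparql
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

SELECT ?name ?mbox
WHERE {
  ?p foaf:name ?name .
  OPTIONAL { ?p foaf:mbox ?mbox }   # ?name is kept even without a mailbox
  FILTER (?name != "")              # discard empty names
}
```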
Comparison of features between ShEx (Shape Expressions) and SHACL (Shapes Constraint Language)
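The same simple constraint expressed in both languages, a minimal sketch assuming the schema.org vocabulary and standard prefixes:

```
# ShEx (compact syntax): no target declaration; nodes are matched to
# shapes at validation time.
<PersonShape> { schema:name xsd:string }

# SHACL (Turtle): the shape carries its own target declaration.
ex:PersonShape a sh:NodeShape ;
  sh:targetClass schema:Person ;
  sh:property [ sh:path schema:name ; sh:datatype xsd:string ] .
```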
Changelog:
11/06/17
- Removed slides about compositionality
31/May/2017
- Added slide 30 about validation report
- Added slide 32 about stems
- Changed slides 7 and 8 adapting compact syntax to new operator .
23/05/2017:
Slide 14: Repaired typos in sh:entailment, rdfs:range
21/05/2017:
- Slide 8. Changed the example to be an IRI and a datatype
- Added "typically" in slide 9
- Slide 10: Removed the phrase: "Target declarations can be problematic when reusing/importing shapes"
and created slide 27 to talk about reusability
- Added slide 11 to talk about the differences in triggering validation
- Created slide 14 to talk about inference
- Renamed slide 15 as "Inference and triggering mechanism"
- Added slides 27 and 28 to talk about reusability
- Added slide 29 to talk about annotations
18/05/2017
- Slide 9 now includes an example using the ShEx RDF vocabulary
- Slide 10 now says that target declarations are optional
- Slide 13 now says that some RDF Schema terms have special treatment in SHACL
- Example in slide 18 now uses sh:or instead of sh:and
- Added slides 22, 23 and 24 which show some features supported by SHACL but not supported by ShEx (property pair constraints, uniqueLang and owl:imports)
SPARQL introduction and training (130+ slides with exercises)
Author: Thomas Francart
Full SPARQL training
Covers all of SPARQL: basic graph patterns, FILTERs, functions, property paths, optional, negation, assignment, aggregation, subqueries, federated queries.
Does not cover SPARQL updates.
Includes exercises on DBpedia.
CC BY license
SPARQL is a standardized query language for retrieving and manipulating data stored in RDF format. It was created by the RDF Data Access Working Group to provide querying of RDF stores. SPARQL supports four query forms: SELECT, CONSTRUCT, DESCRIBE, and ASK. It also defines a protocol for executing queries over HTTP. SPARQL has become a key technology for working with semantic data on the web.
This document proposes a hypergraph-based approach for optimizing SPARQL queries on RDF data. It first transforms an RDF graph into a hypergraph by grouping subjects and objects connected by the same predicate under a hyperedge for that predicate. It then rearranges the patterns in a SPARQL query based on the size of corresponding hyperedges to build a query path for efficient processing. The query is executed by looping through the rearranged patterns and extracting required subjects and objects from the hypergraph representation to find matching triples. Experimental results show this approach performs better than existing systems like RDF-3x, Jena and AllegroGraph.
Re-using Media on the Web: Media fragment re-mixing and playoutMediaMixerCommunity
A number of novel application ideas will be introduced based on the media fragment creation, specification and rights management technologies. Semantic search and retrieval allows us to organize sets of fragments by topical or conceptual relevance. These fragment sets can then be played out in a non-linear fashion to create a new media re-mix. We look at a server-client implementation supporting Media Fragments, before allowing the participants to take the sets of media they have selected and create their own re-mix.
[Master Thesis]: SPARQL Query Rewriting with PathsAbdullah Abbas
The Semantic Web provides a common framework that allows data to be shared and reused across application, enterprise, and community boundaries. It involves publishing in languages specifically designed for data like (RDF) Resource Description Framework. In order to access the published data, it offers a query language named SPARQL.
The goal of this study is to transform SPARQL queries to other SPARQL queries which can be executed more efficiently. Our main goal of transformation is to eliminate non-distinguished variables, which are source of extra complexity, where such elimination is possible. We rewrite SPARQL queries with property paths, which was introduced in SPARQL 1.1.
Optimized index structures for querying rdf from the webMahdi Atawneh
The document describes a paper that presents a new index structure for optimizing queries of RDF data from the web. It implemented the index structure in a software called YARS. The index contains a lexicon indexing strings and quad indexes for subject-predicate-object-context patterns. An experiment evaluated YARS' performance against other RDF stores on a dataset of 2.8 million triples, finding that YARS had better query response times than the other stores tested. The paper introduced an approach for optimizing RDF query processing from web data.
We propose a set of optimizations that can be applied to a given SPARQL query, and that guarantee that the optimized query has the same answers under bag semantics as the original query, provided that the queried RDF graph validates certain SHACL constraints. We prove the correctness of these optimizations and show how they can be propagated to larger queries while preserving answers. Further, we prove the confluence of rewritings that employ these optimizations, guaranteeing convergence to the same optimized query regardless of the rewriting order.
The formulation of constraints and the validation of RDF data against these constraints is a common requirement and a much sought-after feature, particularly as this is taken for granted in the XML world. Recently, RDF validation as a research field gained speed due to shared needs of data practitioners from a variety of domains. For constraint formulation and RDF data validation, several languages exist or are currently developed. Yet, none of the languages is able to meet all requirements raised by data professionals.
We have published a set of constraint types that are required by diverse stakeholders for data applications. We use these constraint types to gain a better understanding of the expressiveness of solutions, investigate the role that reasoning plays in practical data validation, and give directions for the further development of constraint languages.
We introduce a validation framework that enables to consistently execute RDF-based constraint languages on RDF data and to formulate constraints of any type in a way that mappings from high-level constraint languages to an intermediate generic representation can be created straight-forwardly. The framework reduces the representation of constraints to the absolute minimum, is based on formal logics, and consists of a very simple conceptual model with a small lightweight vocabulary. We demonstrate that using another layer on top of SPARQL ensures consistency regarding validation results and enables constraint transformations for each constraint type across RDF-based constraint languages.
Doctoral Examination at the Karlsruhe Institute of Technology (08.07.2016)Dr.-Ing. Thomas Hartmann
In this thesis, a validation framework is introduced that enables to consistently execute RDF-based constraint languages on RDF data and to formulate constraints of any type. The framework reduces the representation of constraints to the absolute minimum, is based on formal logics, consists of a small lightweight vocabulary, and ensures consistency regarding validation results and enables constraint transformations for each constraint type across RDF-based constraint languages.
In this webinar Thomas Cook, Sales Director, AnzoGraph DB, provides a history lesson on the origins of SPARQL, including its roots in the Semantic Web, and how linked open data is used to create Knowledge Graphs. Then, he dives into "What is RDF?", "What is a URI?" and "What is SPARQL?", wrapping up with a real-world demonstration via a Zeppelin notebook.
Federation and Navigation in SPARQL 1.1net2-project
This document discusses new features in SPARQL 1.1, including federation using the SERVICE operator and navigation using property paths. It provides an overview of the basics of SPARQL and the syntax and semantics of SPARQL 1.0 queries before explaining federation, which allows querying multiple datasets, and navigation, which allows navigating RDF graphs using regular expressions to match properties. It also discusses the evaluation procedures and complexity of these new features.
Semantics and optimisation of the SPARQL1.1 federation extensionOscar Corcho
Presentation done at ESWC2011 for the paper "Semantics and optimisation of the SPARQL1.1 federation extension". Buil-Aranda C, Arenas M, Corcho O. ESWC2011, May 2011, Hersonissos, Greece
The document discusses faceted search over ontology-enhanced RDF data. It formalizes faceted interfaces for querying RDF graphs that capture ontological information. It studies the expressivity and complexity of queries represented by faceted interfaces, and algorithms for generating and updating interfaces based on the underlying RDF and ontology information. The goal is to provide rigorous theoretical foundations for faceted search in the context of RDF and OWL 2 ontologies.
Abstract:
An increasing number of applications rely on RDF, OWL 2, and SPARQL for storing and querying data. SPARQL, however, is not targeted towards end-users, and suitable query interfaces are needed. Faceted search is a prominent approach for end-user data access, and several RDF-based faceted search systems have been developed. There is, however, a lack of rigorous theoretical underpinning for faceted search in the context of RDF and OWL 2. In this paper, we provide such solid foundations. We formalise faceted interfaces for this context, identify a fragment of first-order logic capturing the underlying queries, and study the complexity of answering such queries for RDF and OWL 2 profiles. We then study interface generation and update, and devise efficiently implementable algorithms. Finally, we have implemented and tested our faceted search algorithms for scalability, with encouraging results.
This document introduces SPARQL, the SPARQL query language used to retrieve and manipulate RDF data. It provides an example SPARQL query to return full names from a sample RDF graph. It then describes what a SPARQL Service Description is, which is a vocabulary for discovering and describing SPARQL services and endpoints. It outlines several properties and classes used in SPARQL Service Descriptions.
This presentation looks in detail at SPARQL (SPARQL Protocol and RDF Query Language) and introduces approaches for querying and updating semantic data. It covers the SPARQL algebra, the SPARQL protocol, and provides examples for reasoning over Linked Data. We use examples from the music domain, which can be directly tried out and ran over the MusicBrainz dataset. This includes gaining some familiarity with the RDFS and OWL languages, which allow developers to formulate generic and conceptual knowledge that can be exploited by automatic reasoning services in order to enhance the power of querying.
Tutorial on SPARQL: SPARQL Protocol and RDF Query Language Biswanath Dutta
In this tutorial, we show how to query the RDF triple store. SPARQL is a RDF query language and also a protocol for SPARQL query on Web which enables the search on SPARQL endpoint.
Similar to SPARQL Query Containment with ShEx Constraints (20)
SPARQL Query Containment with ShEx Constraints

1. SPARQL Query Containment with ShEx Constraints
21st European Conference on Advances in Databases and Information Systems (ADBIS 2017)
Abdullah Abbas, Pierre Genevès, Cécile Roisin, Nabil Layaïda
24 - 27 September 2017
16. Motivation
SPARQL query containment is well studied in the literature.
Query containment is useful for query planning and optimization.
RDF documents in real-life applications are constrained.
ShEx is an emerging schema language (FHIR, WebIndex).
Containment without constraints → false negatives.
Purpose:
SPARQL query containment in the presence of ShEx constraints.
How?
We benefit from existing containment solvers and ShEx validators, and apply novel query transformations that do not increase the complexity of containment.
22. Containment Overview
Query 1 ⊑ Query 2:
For any dataset, the results of Query 1 are contained in the results of Query 2.
Containment Solver:
Input queries Q1 and Q2 → Containment Solver → Yes (Q1 ⊑ Q2) or No (Q1 ⋢ Q2)
Example:
Query 1
SELECT * {
  ?x name ?name.
  ?x role manager. }
Query 2
SELECT * {
  ?x name ?name.
  ?x role ?role. }
On any (big) RDF dataset, the results of Query 1 are a subset of the results of Query 2.
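The containment above can be checked concretely on a toy dataset. The following minimal Python sketch (our own illustration, not the authors' tooling) implements a naive BGP evaluator over an in-memory triple set and shows that, restricted to the shared variables, every solution of Query 1 is also a solution of Query 2 on a sample graph:

```python
# Toy BGP evaluator (illustration only, not the authors' tooling).
# Triples are (subject, predicate, object) tuples; variables start with "?".

DATA = {
    ("alice", "name", "Alice"),
    ("alice", "role", "manager"),
    ("bob", "name", "Bob"),
    ("bob", "role", "developer"),
}

def match(pattern, triple, binding):
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    binding = dict(binding)
    for p, t in zip(pattern, triple):
        if p.startswith("?"):
            if binding.get(p, t) != t:   # variable already bound differently
                return None
            binding[p] = t
        elif p != t:                     # constant mismatch
            return None
    return binding

def evaluate(bgp, data):
    """All solution mappings of a basic graph pattern over `data`."""
    bindings = [{}]
    for pattern in bgp:
        bindings = [b2 for b in bindings for triple in data
                    if (b2 := match(pattern, triple, b)) is not None]
    return bindings

q1 = [("?x", "name", "?name"), ("?x", "role", "manager")]
q2 = [("?x", "name", "?name"), ("?x", "role", "?role")]

r1, r2 = evaluate(q1, DATA), evaluate(q2, DATA)

def on_shared_vars(rows, shared=("?x", "?name")):
    return {tuple(r[v] for v in shared) for r in rows}

# Restricted to the shared variables, Query 1's results are contained in
# Query 2's results on this dataset.
contained = on_shared_vars(r1) <= on_shared_vars(r2)
```

A containment solver decides this inclusion for every possible dataset, not just one sample graph as above.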
24. ShEx (Shape Expressions)
ShEx is an RDF constraint language, used to validate RDF documents.
Definition (ShEx Expression)
Given a set of edge labels (Σ) and a set of types (Γ), Σ × Γ is a shape expression. ShEx uses logical operators to define constraints inductively:
e ::= ε | Σ × Γ | e* | e[m;n] | (e | e′) | (e ‖ e′)
Definition (ShEx Schema)
A ShEx schema is a tuple S = (Σ, Γ, δ), where δ is a type definition function that maps elements of Γ to shape expressions e over Σ × Γ.
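The inductive grammar above can be modeled directly as a small abstract syntax. The sketch below uses our own class names (the paper's implementation is in Java and not shown here), with ε read as the empty expression and ‖ as unordered grouping:

```python
# Sketch (our own naming, not the paper's implementation): the inductive
# grammar e ::= ε | Σ×Γ | e* | e[m;n] | (e|e') | (e‖e') as a Python AST.
from dataclasses import dataclass
from typing import Optional

class Expr: pass

@dataclass
class Empty(Expr):            # ε: the empty shape expression
    pass

@dataclass
class Atom(Expr):             # an element of Σ × Γ: edge label + target type
    label: str
    type_: str

@dataclass
class Star(Expr):             # e*
    inner: Expr

@dataclass
class Repeat(Expr):           # e[m;n], with n possibly unbounded (None)
    inner: Expr
    min: int
    max: Optional[int]

@dataclass
class Or(Expr):               # (e | e')
    left: Expr
    right: Expr

@dataclass
class Group(Expr):            # (e ‖ e'): unordered concatenation
    left: Expr
    right: Expr

# A schema S = (Σ, Γ, δ): δ maps each type in Γ to a shape expression.
delta = {
    "Person": Group(Atom(":named", "xsd:string"),
                    Atom(":plays", "xsd:string")),
}
```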
26. Containment with ShEx Definition
Containment with ShEx: Query 1 ⊑S Query 2
For any dataset valid with respect to the ShEx schema S, the results of Query 1 are contained in the results of Query 2.
Containment Solver:
Input queries Q1, Q2 and schema S → Containment Solver → Yes (Q1 ⊑S Q2) or No (Q1 ⋢S Q2)
Why query containment with ShEx?
More containment cases can be inferred.
29. Containment Examples with ShEx Constraints: SPARQL (Conjunctive)
Query 1 (no results under S)
SELECT * WHERE {
  ?p :named ?name .
  ?p :likes "tennis" . }
Query 2
SELECT * WHERE {
  ?p :named ?name .
  ?p :plays "soccer" . }
Without constraints: Query 1 ⋢ Query 2.
More containment with ShEx?
ShEx Schema (S)
<Person> {
  :named xsd:string ;
  :plays xsd:string }
Since S forbids :likes, Query 1 has no results on any valid dataset, so Query 1 ⊑S Query 2.
30. SPARQL (with OPTIONALs)
OPTIONAL patterns in queries are particularly interesting for query static analysis with constraints.
Semantics: only get extended results if available.
35. Well-designed OPTIONAL Fragment
Definition:
For every subpattern q′ = (q1 OPT q2) of q and every variable x occurring in q, it holds that: if x occurs inside q2 and outside q′, then x also occurs inside q1.
Containment of the well-designed OPT fragment is studied; a sound and complete procedure is available [Letelier et al., 2012].
+ UNION at top level.
Representation as pattern trees:
((P1 OPT (P11 OPT P111 OPT P112)) OPT P12) OPT P13
corresponds to the tree with root P1, children P11, P12, P13, and with P111 and P112 as children of P11.
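The well-designedness condition can be checked mechanically. Below is a minimal Python sketch over an assumed AND-OPT representation (a BGP is a list of triples; ("OPT", q1, q2) is an OPTIONAL node) — the encoding is our own, not the paper's:

```python
# Sketch of the well-designed check (our own pattern encoding, not the
# paper's code). A pattern is either a list of triples (a BGP) or a tuple
# ("OPT", q1, q2).

def variables(q):
    """All variables occurring anywhere in a pattern."""
    if isinstance(q, tuple) and q[0] == "OPT":
        return variables(q[1]) | variables(q[2])
    return {t for triple in q for t in triple if t.startswith("?")}

def _check(q, outside):
    """`outside` = variables occurring in the whole query outside q."""
    if isinstance(q, tuple) and q[0] == "OPT":
        _, q1, q2 = q
        v1, v2 = variables(q1), variables(q2)
        if not (v2 & outside) <= v1:     # x in q2 and outside q' ⇒ x in q1
            return False
        return _check(q1, outside | v2) and _check(q2, outside | v1)
    return True

def well_designed(q):
    return _check(q, set())

# Well-designed: ?x in the OPTIONAL part also occurs in the mandatory part.
good = ("OPT", [("?x", ":named", "?n")], [("?x", ":email", "?e")])
# Not well-designed: ?e occurs in an inner OPTIONAL and again outside it,
# but not in the corresponding mandatory part.
bad = ("OPT",
       ("OPT", [("?x", ":a", "?y")], [("?x", ":b", "?e")]),
       [("?e", ":c", "?z")])
```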
40. Query Transformation
3- Validate against schema S (Part 1):
{
  :x :producer :p1 .
  :x :feature "feature 1"
}
OPTIONAL
{
  :x :feature "feature 2" .
  :x :expiryDate :d
}
Not valid? → The query is equivalent to the empty query!
Valid? → Validate against schema S (Parts 1 and 2).
41. Query Transformation
3- Validate against schema S (Parts 1 and 2):
{
  :x :producer :p1 .
  :x :feature "feature 1" .
}
OPTIONAL
{
  :x :feature "feature 2" .
  :x :expiryDate :d
}
Not valid? → Eliminate Part 2 only!
Valid? → Keep the query as it is!
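The two validation steps above amount to pruning invalid nodes of the pattern tree. A minimal Python sketch follows; the schema validation itself is stubbed out as a predicate (in the actual procedure it is performed by a ShEx validator), and the toy `is_valid` below assumes a hypothetical schema with no :expiryDate property:

```python
# Sketch of the transformation step (our own encoding). A pattern-tree node
# is (bgp, [children]); validation is any predicate on a set of triples.

def transform(node, is_valid, ancestors=frozenset()):
    """Prune pattern-tree nodes whose accumulated triples are not valid.
    Returns None when the root itself is invalid (query ≡ empty query)."""
    bgp, children = node
    accumulated = ancestors | frozenset(bgp)
    if not is_valid(accumulated):
        return None                      # eliminate this whole subtree
    kept = [t for c in children
            if (t := transform(c, is_valid, accumulated)) is not None]
    return (bgp, kept)

# Toy predicate standing in for ShEx validation: the (hypothetical) schema
# has no :expiryDate property, so any pattern using it is invalid.
def is_valid(triples):
    return all(p != ":expiryDate" for (_s, p, _o) in triples)

mandatory = [(":x", ":producer", ":p1"), (":x", ":feature", '"feature 1"')]
optional = [(":x", ":feature", '"feature 2"'), (":x", ":expiryDate", ":d")]
query = (mandatory, [(optional, [])])

pruned = transform(query, is_valid)          # OPTIONAL part eliminated
empty = transform((optional, []), is_valid)  # invalid root → empty query
```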
42. ShEx Schema Transformation
Minimal cardinality constraints should be ignored, since the query is not a complete representation of the data.
Example
Assume a ShEx constraint: at least three skills for each person.
Assume a query pattern: {:p1 :skill ?x}
The presence of only one triple is not a violation; the RDF data may still have three skills for :p1.
Thus minimal cardinality constraints must be ignored.
MIN0(.) is a transformation function that takes a ShEx document and returns another ShEx document ignoring all minimal cardinalities.
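MIN0(.) is a simple recursive rewriting. A sketch over an assumed tuple encoding of shape expressions, where ("card", e, m, n) stands for e[m;n] (the encoding is ours, not the paper's):

```python
# Sketch of MIN0(.) over a tuple encoding of shape expressions (ours, not
# the paper's): ("card", e, m, n) stands for e[m;n], n = None is unbounded.

def min0(e):
    """Rewrite every cardinality interval [m;n] in a shape expression
    to [0;n], leaving everything else untouched."""
    if isinstance(e, tuple) and e and e[0] == "card":
        _tag, inner, _m, n = e
        return ("card", min0(inner), 0, n)
    if isinstance(e, tuple):
        return tuple(min0(x) for x in e)
    return e

# "At least three skills for each person" → the minimum is ignored:
person = ("group", ("atom", ":named", "xsd:string"),
                   ("card", ("atom", ":skill", "xsd:string"), 3, None))
lowered = min0(person)   # the [3;∞] interval becomes [0;∞]
```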
44. Containment Procedure Summary
Step 1 (Transformation):
Eliminate pattern tree nodes that are not valid; minimal cardinalities are not considered.
This yields transformed queries T(Query 1) and T(Query 2).
Step 2 (Containment):
Check the new modified queries for containment.
Result:
T(Query 1) ⊑ T(Query 2) ≡ Query 1 ⊑S Query 2
45. Implementation and Theoretical Complexity
Implementation:
Query and ShEx document transformations implemented in Java.
ShEx validator, available from [Boneva et al., 2014] (NP-complete).
Containment solver, available from [Pichler and Skritek, 2014] (Π2P-complete).
Complexity of SPARQL containment with ShEx:
SPARQL Fragment                  | Without ShEx | With ShEx
BGP [Chandra, 1977]              | NP-c         | NP-c
AND-OPT [Pichler, 2014]          | NP-c         | NP-c
AND-OPT-(UNION) [Pichler, 2014]  | Π2P-c        | Π2P-c
49. Alternative Method
We reduce query containment with ShEx to FOL (First-Order Logic) formula satisfiability.
FOL with only two variables is a decidable FOL fragment with NEXP-complete satisfiability complexity.
It supports SPARQL fragment extensions.
Encoding: a schema encoding function turns S into FOL axioms (ζ), and a query encoding function turns Q1 and Q2 into the conjecture A(Q1) ∧ ¬A(Q2); an FOL automated theorem prover then decides the problem:
FOL problem unsatisfiable → Q1 ⊑S Q2
FOL problem satisfiable → Q1 ⋢S Q2
Method 1 vs. Method 2 (FOL), without/with ShEx:
SPARQL Fragment                   | Method 1        | Method 2 (FOL)
BGP                               | NP-complete     | NEXP
AND-OPT                           | NP-complete     | NEXP
AND-OPT-(UNION)                   | Π2P-complete    | NEXP
AND-OPT-(UNION)-MINUS             | [not supported] | NEXP
AND-OPT-(UNION)-FILTER            | [not supported] | NEXP
AND-OPT-(UNION)-PP                | [not supported] | NEXP
AND-OPT-(UNION)-MINUS-FILTER-PP   | [not supported] | NEXP
Method 2 is more complex but supports more fragments.
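To give a flavor of Method 2, the sketch below assembles a TPTP-style problem string for an off-the-shelf FOL theorem prover (E, Vampire, ...). The actual encodings A(·) and ζ are defined in the paper; the predicate naming and the single toy axiom here are our own illustration, loosely matching the earlier <Person> example where :likes is forbidden:

```python
# Toy illustration of the FOL reduction (the real encodings A(·) and ζ are
# the paper's; the naming below is ours). Each triple (s, p, o) becomes a
# binary atom p(s, o), a BGP becomes an existential conjunction over the
# two variables X and Y, and the prover is asked to refute A(Q1) ∧ ¬A(Q2).

def term(t):
    """Map ?x/?y to FOL variables X/Y; other tokens become constants."""
    return {"?x": "X", "?y": "Y"}.get(t, t.replace(":", "").lower())

def encode(bgp):
    atoms = " & ".join(f"{term(p)}({term(s)}, {term(o)})" for s, p, o in bgp)
    return f"? [X, Y] : ({atoms})"

q1 = [("?x", ":named", "?y"), ("?x", ":likes", "tennis")]
q2 = [("?x", ":named", "?y"), ("?x", ":plays", "soccer")]

problem = "\n".join([
    # Schema axioms ζ would be emitted here; one toy axiom: no :likes edges.
    "fof(schema, axiom, ! [X, Y] : ~ likes(X, Y)).",
    # Proving this conjecture means A(Q1) ∧ ¬A(Q2) is unsatisfiable,
    # hence Q1 ⊑S Q2.
    f"fof(containment, conjecture, ~(({encode(q1)}) & ~({encode(q2)}))).",
])
```

Feeding `problem` to any TPTP-compliant prover and checking whether it reports a proof realizes the decision procedure pictured above.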
50. Implementation Tests
Two implementations:
Method 1 (Validation + Containment)
Method 2 (FOL)
Test queries from the Berlin SPARQL Benchmark; hand-crafted ShEx schemas.
Execution time (Method 1) ≈ 800 ms
Execution time (Method 2) ≈ 400 ms
Available implementations of FOL automated theorem provers are highly optimized and efficient!
56. Conclusion
Motivation: SPARQL query containment with ShEx constraints is important (due to constrained databases); with ShEx, more containment cases can be inferred.
Contribution 1: We defined a sound and complete procedure for SPARQL query containment with ShEx constraints.
Contribution 2: We provided the complexity bounds of the problem considered (Π2P-complete for AND-OPT-(UNION)).
Contribution 3: We implemented the procedure in a Java framework using an existing ShEx validator and an existing containment solver.
Contribution 4: An alternative method for extending the SPARQL fragment (reducing to FOL satisfiability).
57. Further Perspectives
In application:
There exist many FOL automated theorem provers that participate annually in competitions.
Only one containment solver for the AND-OPT-(UNION) fragment is available [Pichler and Skritek, 2014].
Using available FOL automated theorem provers allowed for more efficient problem-solving implementations (even for small fragments).
Perspective:
There is clearly room for better implementations of the containment solver and/or the ShEx validator.