The Odyssey Approach for Optimizing Federated SPARQL Queries
Gabriela Montoya¹, Hala Skaf-Molli², and Katja Hose¹
¹Aalborg University, Denmark, {gmontoya,khose}@cs.aau.dk
²Nantes University, France, hala.skaf@univ-nantes.fr
October 25th, 2017
The Odyssey Approach for Optimizing Federated SPARQL Queries, G. Montoya et al.
Optimizing Federated SPARQL Queries
SELECT DISTINCT * WHERE {
  ?film dbo:director ?director .               (tp1)
  ?film rdf:type dbo:Film .                    (tp2)
  ?movie owl:sameAs ?film .                    (tp3)
  ?movie dcterms:title ?title .                (tp4)
  ?movie movie:film_subject film_subject:444   (tp5)
}
DBpedia, Drugbank, LMDB, ChEBI, Geonames, NYTimes, Jamendo, SWDF, KEGG
Subquery                                      Relevant sources
?film dbo:director ?director                  DBpedia
?film rdf:type dbo:Film                       DBpedia, Drugbank, LMDB, ChEBI, Geonames, NYTimes, Jamendo, SWDF, KEGG
?movie owl:sameAs ?film                       DBpedia, Drugbank, LMDB, Geonames, NYTimes, Jamendo, SWDF, KEGG
?movie dcterms:title ?title                   LMDB, SWDF
?movie movie:film_subject film_subject:444    LMDB
Existing approaches
[Figure: FedX [1] and SemaGrow [2] query plans for tp1 to tp5. Both plans send subqueries to DBpedia; to DBpedia, Drugbank, Geonames, Jamendo, KEGG, LMDB, NYTimes, and SWDF; to LMDB and SWDF; and to LMDB.]
                          FedX [1]      SemaGrow [2]
Optimization technique    Heuristics    Dynamic programming
Optimization time (s)     0.74          4.75
Execution time (s)        142           6.93
[1] A. Schwarte et al. “FedX: Optimization Techniques for Federated Query Processing on Linked Data”. In: ISWC’11.
[2] A. Charalambidis et al. “SemaGrow: optimizing federated SPARQL queries”. In: SEMANTICS’15.
Considering joins between triple patterns improves the optimization
Subquery                                      Relevant sources
?movie owl:sameAs ?film .                     LMDB
?movie dcterms:title ?title .
?movie movie:film_subject film_subject:444
Only LMDB contains entities that have all three properties owl:sameAs, dcterms:title, and movie:film_subject.
Entities from the other sources that seem relevant to individual triple patterns will not actually contribute to the result.
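The difference between selecting sources per triple pattern and per star-shaped subquery can be sketched in a few lines of Python. This is a minimal illustration, not Odyssey's code: the per-source characteristic sets in `SOURCE_CSS` are hypothetical values invented to mirror the example, and `per_pattern_sources` / `star_sources` are illustrative helper names.

```python
# Hypothetical per-source statistics: the characteristic sets (predicate sets)
# each source contains. Invented to mirror the example, not real data.
SOURCE_CSS = {
    "LMDB": [frozenset({"movie:film_subject", "dcterms:title", "owl:sameAs"})],
    "SWDF": [frozenset({"dcterms:title", "rdf:type"})],
    "DBpedia": [frozenset({"dbo:director", "rdf:type", "owl:sameAs"})],
    "Drugbank": [frozenset({"owl:sameAs", "rdf:type"})],
}


def per_pattern_sources(predicate):
    """Triple-pattern-wise selection: any source that uses the predicate at all."""
    return sorted(src for src, css in SOURCE_CSS.items()
                  if any(predicate in cs for cs in css))


def star_sources(predicates):
    """Star-aware selection: sources with one CS containing *all* star predicates."""
    wanted = frozenset(predicates)
    return sorted(src for src, css in SOURCE_CSS.items()
                  if any(wanted <= cs for cs in css))


star = ["owl:sameAs", "dcterms:title", "movie:film_subject"]
for p in star:
    print(p, per_pattern_sources(p))
print("whole star:", star_sources(star))
```

With these toy statistics, owl:sameAs alone matches DBpedia, Drugbank, and LMDB, and dcterms:title matches LMDB and SWDF, but only LMDB has a single CS covering the whole star.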
Contributions
Concise statistics describing links among triples while guaranteeing result completeness.
A technique to compute such statistics in a federated setup.
Affordable optimization based on dynamic programming.
Statistics computation at one location
film:1129 movie:film_subject film_subject:444 .
film:1129 dcterms:title "Kate & Leopold" .
film:1129 owl:sameAs dbr:Kate_&_Leopold .
film:16189 movie:film_subject film_subject:444 .
film:16189 dcterms:title "Journey to the Center of Time" .
film:16189 owl:sameAs dbr:Journey_to_the_Center_of_Time .
dbr:Journey_to_the_Center_of_Time dbo:director dbr:David_L._Hewitt .
dbr:Journey_to_the_Center_of_Time rdf:type dbo:Film .
...
[3] T. Neumann and G. Moerkotte. “Characteristic Sets: Accurate Cardinality Estimation for RDF Queries with Multiple Joins”. In: ICDE’11.
[4] A. Gubichev and T. Neumann. “Exploiting the query structure for efficient join ordering in SPARQL queries”.
Characteristic Sets (CS) [3]
C_{D,1} = {movie:film_subject, dcterms:title, owl:sameAs}; count(C_{D,1}) = 2
C_{D,2} = {dbo:director, rdf:type}; count(C_{D,2}) = 1
Other basic statistics are also stored
occurrences(dcterms:title, C_{D,1}) = 2, i.e., the number of dcterms:title triples whose subject has CS C_{D,1}
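The CS statistics above (each subject's CS, count per CS, and predicate occurrences per CS) can be computed in one pass over a source's triples. The following is a minimal Python sketch using the example triples as input; the function name `characteristic_sets` and the in-memory triple list are illustrative assumptions, not Odyssey's implementation.

```python
from collections import Counter, defaultdict

# The example triples as (subject, predicate, object).
TRIPLES = [
    ("film:1129", "movie:film_subject", "film_subject:444"),
    ("film:1129", "dcterms:title", "Kate & Leopold"),
    ("film:1129", "owl:sameAs", "dbr:Kate_&_Leopold"),
    ("film:16189", "movie:film_subject", "film_subject:444"),
    ("film:16189", "dcterms:title", "Journey to the Center of Time"),
    ("film:16189", "owl:sameAs", "dbr:Journey_to_the_Center_of_Time"),
    ("dbr:Journey_to_the_Center_of_Time", "dbo:director", "dbr:David_L._Hewitt"),
    ("dbr:Journey_to_the_Center_of_Time", "rdf:type", "dbo:Film"),
]


def characteristic_sets(triples):
    """Return (cs_of_subject, count, occurrences).

    cs_of_subject maps each subject to its CS (the set of its predicates);
    count[C] is the number of subjects whose CS is C;
    occurrences[(p, C)] is the number of triples with predicate p whose subject has CS C.
    """
    preds = defaultdict(set)
    for s, p, _ in triples:
        preds[s].add(p)
    cs_of_subject = {s: frozenset(ps) for s, ps in preds.items()}
    count = Counter(cs_of_subject.values())
    occurrences = Counter((p, cs_of_subject[s]) for s, p, _ in triples)
    return cs_of_subject, count, occurrences


cs_of, count, occ = characteristic_sets(TRIPLES)
C1 = frozenset({"movie:film_subject", "dcterms:title", "owl:sameAs"})
C2 = frozenset({"dbo:director", "rdf:type"})
print(count[C1], count[C2], occ[("dcterms:title", C1)])  # 2 1 2
```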
CSs are connected to other CSs
film:16189 (CS C_{D,1}) → owl:sameAs → dbr:Journey_to_the_Center_of_Time (CS C_{D,2})
Characteristic Pairs (CP) [4]
(C_{D,1}, C_{D,2}, owl:sameAs); count((C_{D,1}, C_{D,2}, owl:sameAs)) = 1
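A characteristic pair counts the triples s p o where s has CS C_i and o is itself a subject with CS C_j. A minimal Python sketch of that counting, assuming the same in-memory example triples; `characteristic_pairs` is an illustrative name. In this snippet dbr:Kate_&_Leopold has no triples of its own, so its owl:sameAs link forms no pair.

```python
from collections import Counter, defaultdict

TRIPLES = [
    ("film:1129", "movie:film_subject", "film_subject:444"),
    ("film:1129", "dcterms:title", "Kate & Leopold"),
    ("film:1129", "owl:sameAs", "dbr:Kate_&_Leopold"),
    ("film:16189", "movie:film_subject", "film_subject:444"),
    ("film:16189", "dcterms:title", "Journey to the Center of Time"),
    ("film:16189", "owl:sameAs", "dbr:Journey_to_the_Center_of_Time"),
    ("dbr:Journey_to_the_Center_of_Time", "dbo:director", "dbr:David_L._Hewitt"),
    ("dbr:Journey_to_the_Center_of_Time", "rdf:type", "dbo:Film"),
]


def cs_of_subjects(triples):
    """Map every subject to its characteristic set."""
    preds = defaultdict(set)
    for s, p, _ in triples:
        preds[s].add(p)
    return {s: frozenset(ps) for s, ps in preds.items()}


def characteristic_pairs(triples):
    """Count (C_i, C_j, p) over triples s p o whose object o is also a subject."""
    cs_of = cs_of_subjects(triples)
    return Counter(
        (cs_of[s], cs_of[o], p)
        for s, p, o in triples
        if o in cs_of  # literals and IRIs without own triples form no pair
    )


cps = characteristic_pairs(TRIPLES)
C1 = frozenset({"movie:film_subject", "dcterms:title", "owl:sameAs"})
C2 = frozenset({"dbo:director", "rdf:type"})
print(cps[(C1, C2, "owl:sameAs")])  # 1
```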
Cardinality computation
SELECT DISTINCT ?film ?movie WHERE {
  ?film dbo:director ?director .         (tp1)
  ?film rdf:type ?type .                 (tp2)
  ?movie owl:sameAs ?film .              (tp3)
  ?movie dcterms:title ?title .          (tp4)
  ?movie movie:film_subject ?subject     (tp5)
}
$$\mathrm{cardinality}((P_k, P_l, p)) = \sum_{P_k \subseteq C_i \,\wedge\, P_l \subseteq C_j} \mathrm{count}((C_i, C_j, p)) \qquad (1)$$
P_k = {owl:sameAs, dcterms:title, movie:film_subject}
P_l = {dbo:director, rdf:type}
p = owl:sameAs
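Equation (1) sums the counts of every CP whose CSs contain the query's star predicates. A minimal Python sketch of that sum, assuming CP statistics are held in a `Counter` keyed by (C_i, C_j, p); the second CP, with the CS `C3`, is a hypothetical entry added only to show that CPs whose C_j lacks a query predicate are skipped.

```python
from collections import Counter

C1 = frozenset({"movie:film_subject", "dcterms:title", "owl:sameAs"})
C2 = frozenset({"dbo:director", "rdf:type"})
C3 = frozenset({"rdf:type"})  # hypothetical CS without dbo:director

cps = Counter({
    (C1, C2, "owl:sameAs"): 1,  # the pair from the example
    (C1, C3, "owl:sameAs"): 5,  # hypothetical: does not match P_l
})


def cardinality(cps, Pk, Pl, p):
    """Eq. (1): sum of count((C_i, C_j, p)) over CPs with P_k ⊆ C_i and P_l ⊆ C_j."""
    Pk, Pl = frozenset(Pk), frozenset(Pl)
    return sum(n for (ci, cj, q), n in cps.items()
               if q == p and Pk <= ci and Pl <= cj)


Pk = {"owl:sameAs", "dcterms:title", "movie:film_subject"}
Pl = {"dbo:director", "rdf:type"}
print(cardinality(cps, Pk, Pl, "owl:sameAs"))  # 1
```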
Cardinality estimation
SELECT DISTINCT * WHERE {
  ?film dbo:director ?director .         (tp1)
  ?film rdf:type ?type .                 (tp2)
  ?movie owl:sameAs ?film .              (tp3)
  ?movie dcterms:title ?title .          (tp4)
  ?movie movie:film_subject ?subject     (tp5)
}
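$$\mathrm{estimatedCardinality}((P_k, P_l, p)) = \sum_{P_k \subseteq C_i \,\wedge\, P_l \subseteq C_j} \mathrm{count}((C_i, C_j, p)) \cdot \prod_{p_k \in P_k - \{p\}} \frac{\mathrm{occurrences}(p_k, C_i)}{\mathrm{count}(C_i)} \cdot \prod_{p_l \in P_l} \frac{\mathrm{occurrences}(p_l, C_j)}{\mathrm{count}(C_j)} \qquad (2)$$
P_k = {owl:sameAs, dcterms:title, movie:film_subject}
P_l = {dbo:director, rdf:type}
p = owl:sameAs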
Cardinality estimation
SELECT DISTINCT * WHERE {
  ?film dbo:director ?director .               (tp1)
  ?film rdf:type dbo:Film .                    (tp2)
  ?movie owl:sameAs ?film .                    (tp3)
  ?movie dcterms:title ?title .                (tp4)
  ?movie movie:film_subject film_subject:444   (tp5)
}
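$$\mathrm{estimatedCardinality}((P_k, P_l, p)) = \max_{P_k \subseteq C_i \,\wedge\, P_l \subseteq C_j} \left( \mathrm{count}((C_i, C_j, p)) \cdot \prod_{p_k \in P_k - \{p\}} \frac{\mathrm{occurrences}(p_k, C_i)}{\mathrm{count}(C_i)} \cdot \prod_{p_l \in P_l} \frac{\mathrm{occurrences}(p_l, C_j)}{\mathrm{count}(C_j)} \right) \qquad (3)$$
P_k = {owl:sameAs, dcterms:title, movie:film_subject}
P_l = {dbo:director, rdf:type}
p = owl:sameAs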
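The estimators in Eqs. (2) and (3) weight each matching CP by the average multiplicity of the query predicates in C_i and C_j. A minimal Python sketch, reading the max in Eq. (3) as taken over the same per-CP terms that Eq. (2) sums. The CS `C3` and all of its statistics are hypothetical, chosen so that its subjects carry two titles each and the two aggregates differ; constants in the query (dbo:Film, film_subject:444) are not modeled here.

```python
import math
from collections import Counter

C1 = frozenset({"movie:film_subject", "dcterms:title", "owl:sameAs"})
C2 = frozenset({"dbo:director", "rdf:type"})
C3 = C1 | {"rdfs:label"}  # hypothetical CS whose subjects have two titles each

count = Counter({C1: 2, C2: 1, C3: 4})
occurrences = Counter({
    ("movie:film_subject", C1): 2, ("dcterms:title", C1): 2, ("owl:sameAs", C1): 2,
    ("dbo:director", C2): 1, ("rdf:type", C2): 1,
    ("movie:film_subject", C3): 4, ("dcterms:title", C3): 8,
    ("owl:sameAs", C3): 4, ("rdfs:label", C3): 4,
})
cps = Counter({(C1, C2, "owl:sameAs"): 1, (C3, C2, "owl:sameAs"): 3})


def cp_terms(Pk, Pl, p):
    """One term per CP (C_i, C_j, p) with P_k ⊆ C_i and P_l ⊆ C_j."""
    Pk, Pl = frozenset(Pk), frozenset(Pl)
    terms = []
    for (ci, cj, q), n in cps.items():
        if q != p or not (Pk <= ci and Pl <= cj):
            continue
        # Average number of values per subject (multiplicity) of each query predicate.
        mult_k = math.prod(occurrences[(pk, ci)] / count[ci] for pk in Pk - {p})
        mult_l = math.prod(occurrences[(pl, cj)] / count[cj] for pl in Pl)
        terms.append(n * mult_k * mult_l)
    return terms


def estimated_cardinality(Pk, Pl, p, aggregate=sum):
    """Eq. (2) with aggregate=sum; the Eq. (3) variant with aggregate=max."""
    terms = cp_terms(Pk, Pl, p)
    return aggregate(terms) if terms else 0


Pk = {"owl:sameAs", "dcterms:title", "movie:film_subject"}
Pl = {"dbo:director", "rdf:type"}
print(estimated_cardinality(Pk, Pl, "owl:sameAs"))       # 7.0
print(estimated_cardinality(Pk, Pl, "owl:sameAs", max))  # 6.0
```

The C_{D,1} pair contributes 1 * 1 * 1 = 1; the hypothetical C3 pair contributes 3 * 2 * 1 = 6 because of the doubled dcterms:title multiplicity.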
Federated computation of statistics
1. At each source the CSs are computed
film:1129 movie:film_subject film_subject:444 .
film:1129 dcterms:title "Kate & Leopold" .
film:1129 owl:sameAs dbr:Kate_&_Leopold .
film:16189 movie:film_subject film_subject:444 .
film:16189 dcterms:title "Journey to the Center of Time" .
film:16189 owl:sameAs dbr:Journey_to_the_Center_of_Time .
...
C_{LMDB,i} = {movie:film_subject, owl:sameAs, dcterms:title, ...}
...
2. At each source local statistics are computed
local_subjects_LMDB(C_{LMDB,i}) = {film:1129, film:16189, ...}
local_objects_LMDB(owl:sameAs, C_{LMDB,i}) = {dbr:Kate_&_Leopold, dbr:Journey_to_the_Center_of_Time, ...}
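Steps 1 and 2 run at each source. A minimal Python sketch of what a source could compute from its own triples, namely the subjects of each CS and the objects of each (predicate, CS) pair. The function name `local_statistics` is illustrative, and the statistics are kept as plain Python sets for clarity; how Odyssey actually stores them is not shown on the slide.

```python
from collections import defaultdict

# The LMDB triples from the example (subject, predicate, object).
LMDB_TRIPLES = [
    ("film:1129", "movie:film_subject", "film_subject:444"),
    ("film:1129", "dcterms:title", "Kate & Leopold"),
    ("film:1129", "owl:sameAs", "dbr:Kate_&_Leopold"),
    ("film:16189", "movie:film_subject", "film_subject:444"),
    ("film:16189", "dcterms:title", "Journey to the Center of Time"),
    ("film:16189", "owl:sameAs", "dbr:Journey_to_the_Center_of_Time"),
]


def local_statistics(triples):
    """Per-source statistics computed where the data lives.

    Returns local_subjects[C] (subjects whose CS is C) and
    local_objects[(p, C)] (objects of predicate p for subjects with CS C).
    """
    preds = defaultdict(set)
    for s, p, _ in triples:
        preds[s].add(p)
    cs_of = {s: frozenset(ps) for s, ps in preds.items()}  # step 1: CSs

    local_subjects = defaultdict(set)  # step 2: local statistics
    local_objects = defaultdict(set)
    for s, p, o in triples:
        local_subjects[cs_of[s]].add(s)
        local_objects[(p, cs_of[s])].add(o)
    return dict(local_subjects), dict(local_objects)


subjects, objects = local_statistics(LMDB_TRIPLES)
C_LMDB = frozenset({"movie:film_subject", "owl:sameAs", "dcterms:title"})
print(sorted(subjects[C_LMDB]))
print(sorted(objects[("owl:sameAs", C_LMDB)]))
```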
3. Local statistics and CSs are transferred to the federated query engine.
local_subjects_LMDB(C_{LMDB,i}) = {film:1129, film:16189, ...}
local_objects_LMDB(owl:sameAs, C_{LMDB,i}) = {..., dbr:Journey_to_the_Center_of_Time, ...}
local_subjects_DBpedia(C_{DBpedia,j}) = {..., dbr:Journey_to_the_Center_of_Time, ...}
local_objects_DBpedia(dbo:director, C_{DBpedia,j}) = {dbr:David_L._Hewitt, ...}
...
4. Set overlap is computed to estimate the statistics of federated CSs and CPs.
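Federated CP (C_{LMDB,i}, C_{DBpedia,j}, owl:sameAs): dbr:Journey_to_the_Center_of_Time is both an owl:sameAs object of C_{LMDB,i} in LMDB and a subject of C_{DBpedia,j} in DBpedia.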
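Step 4 can be sketched as intersecting each source's local objects with every other source's local subjects. This is a minimal Python sketch under two stated assumptions: the overlap size is used directly as the federated CP count, and intra-source CPs are taken from local statistics instead. The input values mirror the example; `federated_cps` is an illustrative name.

```python
def federated_cps(stats):
    """Estimate cross-source CPs from set overlap.

    stats maps source -> (local_subjects, local_objects), as computed per source.
    Returns {((src, C_i), (dst, C_j), p): n}, where n = |objects ∩ subjects|.
    """
    cps = {}
    for src, (_, objects_by_key) in stats.items():
        for (p, ci), objects in objects_by_key.items():
            for dst, (subjects_by_cs, _) in stats.items():
                if dst == src:
                    continue  # assumption: intra-source CPs come from local statistics
                for cj, subjects in subjects_by_cs.items():
                    n = len(objects & subjects)
                    if n:
                        cps[((src, ci), (dst, cj), p)] = n
    return cps


C_LMDB = frozenset({"movie:film_subject", "owl:sameAs", "dcterms:title"})
C_DBP = frozenset({"dbo:director", "rdf:type"})
stats = {
    "LMDB": (
        {C_LMDB: {"film:1129", "film:16189"}},
        {("owl:sameAs", C_LMDB): {"dbr:Kate_&_Leopold",
                                  "dbr:Journey_to_the_Center_of_Time"},
         ("movie:film_subject", C_LMDB): {"film_subject:444"}},
    ),
    "DBpedia": (
        {C_DBP: {"dbr:Journey_to_the_Center_of_Time"}},
        {("dbo:director", C_DBP): {"dbr:David_L._Hewitt"},
         ("rdf:type", C_DBP): {"dbo:Film"}},
    ),
}
print(federated_cps(stats))
```

Only the owl:sameAs link from LMDB into DBpedia survives, with an overlap of one entity.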
Odyssey’s Query Optimization
SELECT DISTINCT * WHERE {
  ?film dbo:director ?director .               (tp1)
  ?film rdf:type dbo:Film .                    (tp2)
  ?movie owl:sameAs ?film .                    (tp3)
  ?movie dcterms:title ?title .                (tp4)
  ?movie movie:film_subject film_subject:444   (tp5)
}
1. Identify the star-shaped subqueries
2. Identify the relevant CSs and estimate their cardinality.
C_{DBpedia,j} = {dbo:director, rdf:type, ...}
estimatedCardinality({dbo:director, rdf:type}) = 162
C_{LMDB,i} = {movie:film_subject, owl:sameAs, dcterms:title, ...}
estimatedCardinality({movie:film_subject, owl:sameAs, dcterms:title}) = 4
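Steps 1 and 2 can be sketched as grouping triple patterns by subject and keeping the CSs that contain every predicate of a star. A minimal Python sketch, assuming all predicates in the query are bound; the CS catalog `CSS_BY_SOURCE` is hypothetical (the rdfs:label predicate and the SWDF entry are invented for illustration).

```python
from collections import defaultdict

# tp1..tp5 of the example query as (subject, predicate, object).
QUERY = [
    ("?film", "dbo:director", "?director"),
    ("?film", "rdf:type", "dbo:Film"),
    ("?movie", "owl:sameAs", "?film"),
    ("?movie", "dcterms:title", "?title"),
    ("?movie", "movie:film_subject", "film_subject:444"),
]

# Hypothetical CS catalog: one CS per source, extra predicates invented.
CSS_BY_SOURCE = {
    "DBpedia": [frozenset({"dbo:director", "rdf:type", "rdfs:label"})],
    "LMDB": [frozenset({"movie:film_subject", "owl:sameAs", "dcterms:title"})],
    "SWDF": [frozenset({"dcterms:title", "rdf:type"})],
}


def star_subqueries(patterns):
    """Step 1: group triple patterns that share the same subject."""
    stars = defaultdict(list)
    for tp in patterns:
        stars[tp[0]].append(tp)
    return dict(stars)


def relevant_css(star):
    """Step 2: (source, CS) pairs whose CS contains every predicate of the star."""
    wanted = frozenset(p for _, p, _ in star)
    return [(src, cs) for src, css in CSS_BY_SOURCE.items()
            for cs in css if wanted <= cs]


for subject, star in star_subqueries(QUERY).items():
    print(subject, [src for src, _ in relevant_css(star)])
```

Here the ?film star (tp1, tp2) is answered only by DBpedia and the ?movie star (tp3, tp4, tp5) only by LMDB; the per-CS statistics would then give their estimated cardinalities.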
3. Identify the relevant CPs and estimate their cardinality.
(C_{LMDB,i}, C_{DBpedia,j}, owl:sameAs)
estimatedCardinality((C_{LMDB,i}, C_{DBpedia,j}, owl:sameAs)) = 1
4. A cost function is used to select the plan that transfers the fewest tuples.
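Plan A: star {tp3, tp4, tp5} @LMDB first, then star {tp1, tp2} @DBpedia; cost = 4 + 1 = 5
Plan B: star {tp1, tp2} @DBpedia first, then star {tp3, tp4, tp5} @LMDB; cost = 162 + 1 = 163
(4 and 162 are the estimated star cardinalities; 1 is the estimated cardinality of the join through the CP.)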
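The cost arithmetic above can be sketched as summing the estimated cardinality of every step of a plan and picking the minimum. A minimal Python sketch; `plan_cost` and the plan labels are illustrative, and the cardinalities are the estimates from the slide.

```python
def plan_cost(step_cardinalities):
    """Tuples transferred: sum of the estimated cardinality of every step."""
    return sum(step_cardinalities)


# Estimates from the slide: 4 (LMDB star), 162 (DBpedia star),
# and 1 for the join through (C_{LMDB,i}, C_{DBpedia,j}, owl:sameAs).
candidate_plans = {
    "LMDB star first": [4, 1],
    "DBpedia star first": [162, 1],
}

costs = {name: plan_cost(steps) for name, steps in candidate_plans.items()}
best = min(costs, key=costs.get)
print(costs, best)
# {'LMDB star first': 5, 'DBpedia star first': 163} LMDB star first
```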
5. If necessary, compute the join order using dynamic programming.
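When there are more than two subqueries, a join order can be chosen by dynamic programming over subsets. The following is a textbook, System R style left-deep enumeration in Python, not necessarily the exact enumeration Odyssey uses. The cost model (sum of all intermediate cardinalities) is an assumption chosen to reproduce the slide's 4 + 1 arithmetic, and `card` is a hypothetical estimator callback.

```python
from itertools import combinations


def dp_join_order(relations, card):
    """Cheapest left-deep join order by dynamic programming over subsets.

    relations: subquery names; card(frozenset) -> estimated cardinality.
    Cost of a plan = sum of the cardinalities of all its (intermediate) results.
    """
    best = {frozenset([r]): (card(frozenset([r])), r) for r in relations}
    for size in range(2, len(relations) + 1):
        for subset in map(frozenset, combinations(relations, size)):
            candidates = []
            for last in subset:  # relation joined last
                rest_cost, rest_plan = best[subset - {last}]
                candidates.append((rest_cost + card(subset), (rest_plan, last)))
            # Compare on cost only, so plans never need to be ordered.
            best[subset] = min(candidates, key=lambda c: c[0])
    return best[frozenset(relations)]


CARD = {
    frozenset({"LMDB"}): 4,
    frozenset({"DBpedia"}): 162,
    frozenset({"LMDB", "DBpedia"}): 1,
}
print(dp_join_order(["LMDB", "DBpedia"], CARD.__getitem__))
# (5, ('LMDB', 'DBpedia'))
```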
Odyssey provides a better plan
SELECT DISTINCT * WHERE {
  ?film dbo:director ?director .               (tp1)
  ?film rdf:type dbo:Film .                    (tp2)
  ?movie owl:sameAs ?film .                    (tp3)
  ?movie dcterms:title ?title .                (tp4)
  ?movie movie:film_subject film_subject:444   (tp5)
}
[5] A. Schwarte et al. “FedX: Optimization Techniques for Federated Query Processing on Linked Data”. In: ISWC’11.
[6] A. Charalambidis et al. “SemaGrow: optimizing federated SPARQL queries”. In: SEMANTICS’15.
[7] G. Montoya et al. “The Odyssey Approach for Optimizing Federated SPARQL Queries”. In: ISWC’17.
[Figure: query plans. FedX [5] and SemaGrow [6] send subqueries to DBpedia; to DBpedia, Drugbank, Geonames, Jamendo, KEGG, LMDB, NYTimes, and SWDF; to LMDB and SWDF; and to LMDB. Odyssey [7] sends only two subqueries: the star {tp3, tp4, tp5} to LMDB and the star {tp1, tp2} to DBpedia.]
                          FedX [5]      SemaGrow [6]   Odyssey [7]
Optimization time (OT)    0.74 s        4.75 s         0.22 s
Execution time (ET)       142 s         6.93 s         1.30 s
Experimental setup
Queries and datasets from FedBench [8].
Comparison with existing approaches: FedX [9], SemaGrow [10], SPLENDID [11], and HiBISCuS [12].
A Virtuoso 7.2.4.2 endpoint was deployed for each dataset.
Plots show the average over nine executions, with a timeout of 30 minutes.
[8] M. Schmidt et al. “FedBench: A Benchmark Suite for Federated Semantic Data Query Processing”. In: ISWC’11.
[9] A. Schwarte et al. “FedX: Optimization Techniques for Federated Query Processing on Linked Data”. In: ISWC’11.
[10] A. Charalambidis et al. “SemaGrow: optimizing federated SPARQL queries”. In: SEMANTICS’15.
[11] O. Görlitz and S. Staab. “SPLENDID: SPARQL Endpoint Federation Exploiting VOID Descriptions”. In: COLD’11.
[12] M. Saleem and A. N. Ngomo. “HiBISCuS: Hypergraph-Based Source Selection for SPARQL Endpoint Federation”. In: ESWC’14.
Odyssey’s optimization time is comparable to that of other approaches
[Figure: optimization time OT (ms, log scale from 10^0 to 10^4) per FedBench query, in three panels (LD1 to LD11, CD1 to CD7, LS1 to LS7), for Odyssey, HiBISCuS-Warm, HiBISCuS-Cold, FedX-Warm, FedX-Cold, SemaGrow, and SPLENDID.]
Odyssey’s plans have fewer subqueries than other approaches’ plans
[Figure: number of subqueries NSQ (0 to 20) in each approach's plan per FedBench query, in three panels (LD1 to LD11, CD1 to CD7, LS1 to LS7), for Odyssey, HiBISCuS-Warm, HiBISCuS-Cold, FedX-Warm, FedX-Cold, SemaGrow, and SPLENDID.]
Overall, Odyssey’s plans execute faster than other approaches’ plans
[Figure: execution time ET (ms, log scale from 10^0 to 10^6) per FedBench query, in three panels (LD1 to LD11, CD1 to CD7, LS1 to LS7), for Odyssey, HiBISCuS-Warm, HiBISCuS-Cold, FedX-Warm, FedX-Cold, SemaGrow, and SPLENDID; some runs are marked TIMEOUT, INCOMPLETE RESULT, or ABORT.]
Conclusions
Odyssey uses cardinality estimations that allow for better optimizations.
Local statistics make it possible to discover connections between datasets in a federated setup.
Odyssey’s plans generally have fewer subqueries and execute faster than those of existing approaches.
Future work
Odyssey’s optimization could be improved with runtime optimizations (ASK queries).
Reduce the computation time and size of the local statistics.
Provide efficient strategies to update the statistics.
Questions?
21

More Related Content

Similar to Talk odysseyiswc2017

Leveraging Linked Data to Infer Semantic Relations within Structured Sources
Leveraging Linked Data to Infer Semantic Relations within Structured SourcesLeveraging Linked Data to Infer Semantic Relations within Structured Sources
Leveraging Linked Data to Infer Semantic Relations within Structured Sources
Mohsen Taheriyan
 
Presentation final
Presentation finalPresentation final
Presentation final
tmra
 

Similar to Talk odysseyiswc2017 (13)

Summary of SIGIR 2011 Papers
Summary of SIGIR 2011 PapersSummary of SIGIR 2011 Papers
Summary of SIGIR 2011 Papers
 
Leveraging Linked Data to Infer Semantic Relations within Structured Sources
Leveraging Linked Data to Infer Semantic Relations within Structured SourcesLeveraging Linked Data to Infer Semantic Relations within Structured Sources
Leveraging Linked Data to Infer Semantic Relations within Structured Sources
 
Building generic data queries using python ast
Building generic data queries using python astBuilding generic data queries using python ast
Building generic data queries using python ast
 
Wi2015 - Clustering of Linked Open Data - the LODeX tool
Wi2015 - Clustering of Linked Open Data - the LODeX toolWi2015 - Clustering of Linked Open Data - the LODeX tool
Wi2015 - Clustering of Linked Open Data - the LODeX tool
 
Presentation final
Presentation finalPresentation final
Presentation final
 
Real-time Semantic Web with Twitter Annotations
Real-time Semantic Web with Twitter AnnotationsReal-time Semantic Web with Twitter Annotations
Real-time Semantic Web with Twitter Annotations
 
Text as Data: processing the Hebrew Bible
Text as Data: processing the Hebrew BibleText as Data: processing the Hebrew Bible
Text as Data: processing the Hebrew Bible
 
Entity Retrieval (WSDM 2014 tutorial)
Entity Retrieval (WSDM 2014 tutorial)Entity Retrieval (WSDM 2014 tutorial)
Entity Retrieval (WSDM 2014 tutorial)
 
Aidan's PhD Viva
Aidan's PhD VivaAidan's PhD Viva
Aidan's PhD Viva
 
EARL: Joint Entity and Relation Linking for Question Answering over Knowledge...
EARL: Joint Entity and Relation Linking for Question Answering over Knowledge...EARL: Joint Entity and Relation Linking for Question Answering over Knowledge...
EARL: Joint Entity and Relation Linking for Question Answering over Knowledge...
 
Searching and Querying Knowledge Graphs with Solr/SIREn - A Reference Archite...
Searching and Querying Knowledge Graphs with Solr/SIREn - A Reference Archite...Searching and Querying Knowledge Graphs with Solr/SIREn - A Reference Archite...
Searching and Querying Knowledge Graphs with Solr/SIREn - A Reference Archite...
 
Learning to assess Linked Data relationships using Genetic Programming
Learning to assess Linked Data relationships using Genetic ProgrammingLearning to assess Linked Data relationships using Genetic Programming
Learning to assess Linked Data relationships using Genetic Programming
 
seevl: Data-driven music discovery
seevl: Data-driven music discoveryseevl: Data-driven music discovery
seevl: Data-driven music discovery
 

Recently uploaded

Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
Joaquim Jorge
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
vu2urc
 

Recently uploaded (20)

Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of Brazil
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 

Talk odysseyiswc2017

  • 1. The Odyssey Approach for Optimizing Federated SPARQL Queries Gabriela Montoya1 , Hala Skaf-Molli2 , and Katja Hose1 1Aalborg University, Denmark {gmontoya,khose}@cs.aau.dk 2Nantes University, France hala.skaf@univ-nantes.fr October 25th, 2017
  • 2. The Odyssey Approach for Optimizing Federated SPARQL Queries,G. Montoya et al Optimizing Federated SPARQL Queries SELECT DISTINCT ∗ WHERE { ? f i l m dbo : d i r e c t o r ? d i r e c t o r . ( tp1 ) ? f i l m r d f : type dbo : Film . ( tp2 ) ? movie owl : sameAs ? f i l m . ( tp3 ) ? movie dcterms : t i t l e ? t i t l e . ( tp4 ) ? movie movie : f i l m s u b j e c t f i l m s u b j e c t :444 ( tp5 ) } DBpedia, Drugbank, LMDB, ChEBI, Geonames, NYTimes, Jamendo, SWDF, KEGG 2
  • 3. The Odyssey Approach for Optimizing Federated SPARQL Queries,G. Montoya et al Optimizing Federated SPARQL Queries SELECT DISTINCT ∗ WHERE { ? f i l m dbo : d i r e c t o r ? d i r e c t o r . ( tp1 ) ? f i l m r d f : type dbo : Film . ( tp2 ) ? movie owl : sameAs ? f i l m . ( tp3 ) ? movie dcterms : t i t l e ? t i t l e . ( tp4 ) ? movie movie : f i l m s u b j e c t f i l m s u b j e c t :444 ( tp5 ) } DBpedia, Drugbank, LMDB, ChEBI, Geonames, NYTimes, Jamendo, SWDF, KEGG Subquery Relevant Sources ?film dbo:director ?director . DBpedia ?film rdf:type dbo:Film DBpedia,Drugbank,LMDB,ChEBI Geonames,NYTimes,Jamendo,SWDF,KEGG ?movie owl:sameAs ?film DBpedia,Drugbank,LMDB Geonames,NYTimes,Jamendo,SWDF,KEGG ?movie dcterms:title ?title LMDB,SWDF ?movie movie:film subject film subject:444 LMDB 2
  • 4. Existing approaches
      FedX [1]: plan (figure) over tp1, tp2, tp3, tp5, tp4, with source annotations @DBpedia; @DBpedia, Drugbank, Geonames, Jamendo, KEGG, LMDB, NYTimes, SWDF; @LMDB, SWDF; @LMDB. Optimization technique: heuristics. Optimization time: 0.74 s. Execution time: 142 s.
      SemaGrow [2]: plan (figure) over tp5, tp3, tp4, tp2, tp1, with source annotations @DBpedia; @DBpedia, Drugbank, LMDB, Geonames, NYTimes, Jamendo, SWDF, KEGG; @LMDB, SWDF; @LMDB. Optimization technique: dynamic programming. Optimization time: 4.75 s. Execution time: 6.93 s.
      [1] A. Schwarte et al. "FedX: Optimization Techniques for Federated Query Processing on Linked Data". In: ISWC'11.
      [2] A. Charalambidis et al. "SemaGrow: optimizing federated SPARQL queries". In: SEMANTICS'15.
  • 5. Considering joins between triple patterns improves the optimization
      Subquery {?movie owl:sameAs ?film . ?movie dcterms:title ?title . ?movie movie:film_subject film_subject:444}  →  relevant source: LMDB
      Entities that have all three properties (owl:sameAs, dcterms:title, and movie:film_subject) exist only in LMDB! Entities from all other sources that seem relevant for some triple patterns will actually not contribute to the result!
  • 6. Contributions
      Concise statistics that describe links among triples while guaranteeing result completeness.
      A technique to compute such statistics in a federated setup.
      Affordable optimization based on dynamic programming.
  • 7. Statistics computation at one location
      film:1129 movie:film_subject film_subject:444 .
      film:1129 dcterms:title "Kate & Leopold" .
      film:1129 owl:sameAs dbr:Kate_&_Leopold .
      film:16189 movie:film_subject film_subject:444 .
      film:16189 dcterms:title "Journey to the Center of Time" .
      film:16189 owl:sameAs dbr:Journey_to_the_Center_of_Time .
      dbr:Journey_to_the_Center_of_Time dbo:director dbr:David_L._Hewitt .
      dbr:Journey_to_the_Center_of_Time rdf:type dbo:Film .
      ...
      [3] T. Neumann and G. Moerkotte. "Characteristic Sets: Accurate Cardinality Estimation for RDF Queries with Multiple Joins". In: ICDE'11.
      [4] A. Gubichev and T. Neumann. "Exploiting the query structure for efficient join ordering in SPARQL queries".
  • 9. Statistics computation at one location: Characteristic Sets (CS) [3]
      $C_{D,1}$ = {movie:film_subject, dcterms:title, owl:sameAs}, count($C_{D,1}$) = 1 (first subject, film:1129)
  • 10. Statistics computation at one location: Characteristic Sets (CS) [3]
      $C_{D,1}$ = {movie:film_subject, dcterms:title, owl:sameAs}, count($C_{D,1}$) = 2 (film:1129 and film:16189)
  • 11. Statistics computation at one location: Characteristic Sets (CS) [3]
      $C_{D,1}$ = {movie:film_subject, dcterms:title, owl:sameAs}, count($C_{D,1}$) = 2
      $C_{D,2}$ = {dbo:director, rdf:type}, count($C_{D,2}$) = 1
  • 12. Other basic statistics are also stored: Characteristic Sets (CS) [3]
      $C_{D,1}$ = {movie:film_subject, dcterms:title, owl:sameAs}, count($C_{D,1}$) = 2; occurrences(dcterms:title, $C_{D,1}$) = 2
      $C_{D,2}$ = {dbo:director, rdf:type}, count($C_{D,2}$) = 1
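The CS statistics above (count and occurrences) can be reproduced with a minimal Python sketch. It assumes triples are (subject, predicate, object) string tuples and uses only the eight example triples from slide 7 (the "..." is not filled in); the function name `characteristic_sets` is hypothetical and not part of Odyssey.

```python
from collections import Counter, defaultdict

# The example triples from slide 7, as (subject, predicate, object) strings.
TRIPLES = [
    ("film:1129", "movie:film_subject", "film_subject:444"),
    ("film:1129", "dcterms:title", '"Kate & Leopold"'),
    ("film:1129", "owl:sameAs", "dbr:Kate_&_Leopold"),
    ("film:16189", "movie:film_subject", "film_subject:444"),
    ("film:16189", "dcterms:title", '"Journey to the Center of Time"'),
    ("film:16189", "owl:sameAs", "dbr:Journey_to_the_Center_of_Time"),
    ("dbr:Journey_to_the_Center_of_Time", "dbo:director", "dbr:David_L._Hewitt"),
    ("dbr:Journey_to_the_Center_of_Time", "rdf:type", "dbo:Film"),
]


def characteristic_sets(triples):
    """Map each CS (frozenset of predicates) to (count, occurrences).

    count       = number of subjects whose predicate set is exactly this CS
    occurrences = number of triples per predicate over those subjects
    """
    per_subject = defaultdict(Counter)  # subject -> Counter of its predicates
    for s, p, _ in triples:
        per_subject[s][p] += 1

    stats = {}
    for predicate_counts in per_subject.values():
        cs = frozenset(predicate_counts)
        count, occurrences = stats.get(cs, (0, Counter()))
        occurrences.update(predicate_counts)  # add this subject's triples per predicate
        stats[cs] = (count + 1, occurrences)
    return stats


stats = characteristic_sets(TRIPLES)
C_D1 = frozenset({"movie:film_subject", "dcterms:title", "owl:sameAs"})
C_D2 = frozenset({"dbo:director", "rdf:type"})
print(stats[C_D1][0], stats[C_D1][1]["dcterms:title"])  # 2 2
print(stats[C_D2][0])  # 1
```

Keeping per-predicate occurrences (not only the subject count) is what lets a subject with two titles raise the estimate later on.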
  • 13. CSs are connected to other CSs: Characteristic Sets (CS) [3]
      $C_{D,1}$ = {movie:film_subject, dcterms:title, owl:sameAs}, count($C_{D,1}$) = 2; occurrences(dcterms:title, $C_{D,1}$) = 2
      $C_{D,2}$ = {dbo:director, rdf:type}, count($C_{D,2}$) = 1
      dbr:Journey_to_the_Center_of_Time → owl:sameAs → $C_{D,1}$
  • 14. CSs are connected to other CSs: Characteristic Sets (CS) [3]
      $C_{D,1}$ = {movie:film_subject, dcterms:title, owl:sameAs}, count($C_{D,1}$) = 2; occurrences(dcterms:title, $C_{D,1}$) = 2
      $C_{D,2}$ = {dbo:director, rdf:type}, count($C_{D,2}$) = 1
      dbr:Journey_to_the_Center_of_Time → owl:sameAs → $C_{D,1}$
      Characteristic Pairs (CP) [4]
  • 15. CSs are connected to other CSs: Characteristic Sets (CS) [3]
      $C_{D,1}$ = {movie:film_subject, dcterms:title, owl:sameAs}, count($C_{D,1}$) = 2; occurrences(dcterms:title, $C_{D,1}$) = 2
      $C_{D,2}$ = {dbo:director, rdf:type}, count($C_{D,2}$) = 1
      dbr:Journey_to_the_Center_of_Time → owl:sameAs → $C_{D,1}$
      Characteristic Pairs (CP) [4]: $(C_{D,1}, C_{D,2}, \text{owl:sameAs})$, count($(C_{D,1}, C_{D,2}, \text{owl:sameAs})$) = 1
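A characteristic pair links the CS of a subject to the CS of an object through a predicate. A minimal Python sketch over the same example triples (the helper names are hypothetical) counts one CP link per triple whose object is itself a subject in the data, since only such objects have a CS:

```python
from collections import Counter, defaultdict

# The example triples from slide 7, as (subject, predicate, object) strings.
TRIPLES = [
    ("film:1129", "movie:film_subject", "film_subject:444"),
    ("film:1129", "dcterms:title", '"Kate & Leopold"'),
    ("film:1129", "owl:sameAs", "dbr:Kate_&_Leopold"),
    ("film:16189", "movie:film_subject", "film_subject:444"),
    ("film:16189", "dcterms:title", '"Journey to the Center of Time"'),
    ("film:16189", "owl:sameAs", "dbr:Journey_to_the_Center_of_Time"),
    ("dbr:Journey_to_the_Center_of_Time", "dbo:director", "dbr:David_L._Hewitt"),
    ("dbr:Journey_to_the_Center_of_Time", "rdf:type", "dbo:Film"),
]


def characteristic_set_of(triples):
    """Map every subject to its characteristic set (the set of its predicates)."""
    predicates = defaultdict(set)
    for s, p, _ in triples:
        predicates[s].add(p)
    return {s: frozenset(ps) for s, ps in predicates.items()}


def characteristic_pairs(triples):
    """Count links (CS of subject, CS of object, predicate)."""
    cs_of = characteristic_set_of(triples)
    pairs = Counter()
    for s, p, o in triples:
        if o in cs_of:  # the object is also a subject, so it has a CS
            pairs[(cs_of[s], cs_of[o], p)] += 1
    return pairs


pairs = characteristic_pairs(TRIPLES)
C_D1 = frozenset({"movie:film_subject", "dcterms:title", "owl:sameAs"})
C_D2 = frozenset({"dbo:director", "rdf:type"})
print(pairs[(C_D1, C_D2, "owl:sameAs")])  # 1
```

dbr:Kate_&_Leopold does not appear as a subject in the excerpt, so its owl:sameAs link produces no CP here.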
  • 16. Cardinality computation
      SELECT DISTINCT ?film ?movie WHERE {
        ?film dbo:director ?director .                (tp1)
        ?film rdf:type ?type .                        (tp2)
        ?movie owl:sameAs ?film .                     (tp3)
        ?movie dcterms:title ?title .                 (tp4)
        ?movie movie:film_subject ?subject            (tp5)
      }
      $$\mathrm{cardinality}((P_k, P_l, p)) = \sum_{P_k \subseteq C_i \,\wedge\, P_l \subseteq C_j} \mathrm{count}((C_i, C_j, p)) \tag{1}$$
      where $P_k$ = {owl:sameAs, dcterms:title, movie:film_subject}, $P_l$ = {dbo:director, rdf:type}, and $p$ = owl:sameAs.
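Eq. (1) sums the counts of all CPs whose subject CS contains $P_k$ and whose object CS contains $P_l$. A Python sketch, assuming CP counts are kept in a dict keyed by (Ci, Cj, p). The count 1 comes from the example data. The second CP, built from a CS without dcterms:title, is hypothetical and only shows the subset filter at work.

```python
def cardinality(cp_counts, P_k, P_l, p):
    """Eq. (1): sum of count((Ci, Cj, p)) over all CPs with P_k ⊆ Ci and P_l ⊆ Cj."""
    return sum(
        n
        for (C_i, C_j, pred), n in cp_counts.items()
        if pred == p and P_k <= C_i and P_l <= C_j  # <= is the subset test on frozensets
    )


C_D1 = frozenset({"movie:film_subject", "dcterms:title", "owl:sameAs"})
C_D2 = frozenset({"dbo:director", "rdf:type"})
C_X = frozenset({"movie:film_subject", "owl:sameAs"})  # hypothetical CS without dcterms:title

cp_counts = {
    (C_D1, C_D2, "owl:sameAs"): 1,  # from the example data
    (C_X, C_D2, "owl:sameAs"): 7,   # hypothetical; skipped because P_k is not a subset of C_X
}

P_k = frozenset({"owl:sameAs", "dcterms:title", "movie:film_subject"})
P_l = frozenset({"dbo:director", "rdf:type"})
print(cardinality(cp_counts, P_k, P_l, "owl:sameAs"))  # 1
```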
  • 17. Cardinality estimation
      SELECT DISTINCT * WHERE {
        ?film dbo:director ?director .                (tp1)
        ?film rdf:type ?type .                        (tp2)
        ?movie owl:sameAs ?film .                     (tp3)
        ?movie dcterms:title ?title .                 (tp4)
        ?movie movie:film_subject ?subject            (tp5)
      }
      $$\mathrm{estimatedCardinality}((P_k, P_l, p)) = \sum_{P_k \subseteq C_i \,\wedge\, P_l \subseteq C_j} \mathrm{count}((C_i, C_j, p)) \cdot \prod_{p_k \in P_k - \{p\}} \frac{\mathrm{occurrences}(p_k, C_i)}{\mathrm{count}(C_i)} \cdot \prod_{p_l \in P_l} \frac{\mathrm{occurrences}(p_l, C_j)}{\mathrm{count}(C_j)} \tag{2}$$
      where $P_k$ = {owl:sameAs, dcterms:title, movie:film_subject}, $P_l$ = {dbo:director, rdf:type}, and $p$ = owl:sameAs.
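Eq. (2) scales each CP count by the average multiplicity of the other predicates on the subject side and of all predicates on the object side. A Python sketch, assuming each CS maps to (count, occurrences) as in the earlier statistics. The numbers are derived from the eight example triples, so every factor happens to be 1.

```python
from collections import Counter
from math import prod


def estimated_cardinality(cp_counts, cs_stats, P_k, P_l, p):
    """Eq. (2): CP counts scaled by average predicate multiplicities.

    cs_stats maps a CS to (count, occurrences), where occurrences is a Counter
    of triples per predicate over the subjects in that CS.
    """
    total = 0.0
    for (C_i, C_j, pred), n in cp_counts.items():
        if pred != p or not (P_k <= C_i and P_l <= C_j):
            continue
        count_i, occ_i = cs_stats[C_i]
        count_j, occ_j = cs_stats[C_j]
        # Subject side: every predicate of P_k except the linking predicate p.
        subject_factor = prod(occ_i[q] / count_i for q in P_k - {p})
        # Object side: every predicate of P_l.
        object_factor = prod(occ_j[q] / count_j for q in P_l)
        total += n * subject_factor * object_factor
    return total


C_D1 = frozenset({"movie:film_subject", "dcterms:title", "owl:sameAs"})
C_D2 = frozenset({"dbo:director", "rdf:type"})
cs_stats = {
    C_D1: (2, Counter({"movie:film_subject": 2, "dcterms:title": 2, "owl:sameAs": 2})),
    C_D2: (1, Counter({"dbo:director": 1, "rdf:type": 1})),
}
cp_counts = {(C_D1, C_D2, "owl:sameAs"): 1}

P_k = frozenset({"owl:sameAs", "dcterms:title", "movie:film_subject"})
P_l = frozenset({"dbo:director", "rdf:type"})
print(estimated_cardinality(cp_counts, cs_stats, P_k, P_l, "owl:sameAs"))  # 1.0
```

If, hypothetically, the two films had three titles in total, occurrences(dcterms:title, $C_{D,1}$) would be 3 and the estimate would rise to 1.5.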
  • 18. Cardinality estimation
      SELECT DISTINCT * WHERE {
        ?film dbo:director ?director .                (tp1)
        ?film rdf:type dbo:Film .                     (tp2)
        ?movie owl:sameAs ?film .                     (tp3)
        ?movie dcterms:title ?title .                 (tp4)
        ?movie movie:film_subject film_subject:444    (tp5)
      }
      $$\mathrm{estimatedCardinality}((P_k, P_l, p)) = \max_{P_k \subseteq C_i \,\wedge\, P_l \subseteq C_j} \left( \mathrm{count}((C_i, C_j, p)) \cdot \prod_{p_k \in P_k - \{p\}} \frac{\mathrm{occurrences}(p_k, C_i)}{\mathrm{count}(C_i)} \cdot \prod_{p_l \in P_l} \frac{\mathrm{occurrences}(p_l, C_j)}{\mathrm{count}(C_j)} \right) \tag{3}$$
      where $P_k$ = {owl:sameAs, dcterms:title, movie:film_subject}, $P_l$ = {dbo:director, rdf:type}, and $p$ = owl:sameAs.
  • 19. Federated computation of statistics
      1. At each source, the CSs are computed.
      film:1129 movie:film_subject film_subject:444 .
      film:1129 dcterms:title "Kate & Leopold" .
      film:1129 owl:sameAs dbr:Kate_&_Leopold .
      film:16189 movie:film_subject film_subject:444 .
      film:16189 dcterms:title "Journey to the Center of Time" .
      film:16189 owl:sameAs dbr:Journey_to_the_Center_of_Time .
      ...
      $C_{LMDB,i}$ = {movie:film_subject, owl:sameAs, dcterms:title, ...}
      ...
  • 20. Federated computation of statistics
      1. At each source, the CSs are computed.
      2. At each source, local statistics are computed.
      $C_{LMDB,i}$ = {movie:film_subject, owl:sameAs, dcterms:title, ...}
      local_subjects_LMDB($C_{LMDB,i}$) = {film:1129, film:16189, ...}
      local_objects_LMDB(owl:sameAs, $C_{LMDB,i}$) = {dbr:Kate_&_Leopold, dbr:Journey_to_the_Center_of_Time, ...}
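Step 2 can be sketched in Python for the six LMDB triples shown on slide 19. The sketch assumes local statistics are plain sets: the subjects of each CS, and the IRI objects of each (predicate, CS) pair. Literals are skipped because they can never be the subject of a triple in another source. The function name is hypothetical.

```python
from collections import defaultdict

# The six LMDB triples shown on slide 19.
LMDB_TRIPLES = [
    ("film:1129", "movie:film_subject", "film_subject:444"),
    ("film:1129", "dcterms:title", '"Kate & Leopold"'),
    ("film:1129", "owl:sameAs", "dbr:Kate_&_Leopold"),
    ("film:16189", "movie:film_subject", "film_subject:444"),
    ("film:16189", "dcterms:title", '"Journey to the Center of Time"'),
    ("film:16189", "owl:sameAs", "dbr:Journey_to_the_Center_of_Time"),
]


def local_statistics(triples):
    """Return (local_subjects, local_objects) for one source."""
    predicates = defaultdict(set)
    for s, p, _ in triples:
        predicates[s].add(p)
    cs_of = {s: frozenset(ps) for s, ps in predicates.items()}

    local_subjects = defaultdict(set)  # CS -> subjects with exactly that CS
    for s, cs in cs_of.items():
        local_subjects[cs].add(s)

    local_objects = defaultdict(set)  # (predicate, CS of subject) -> IRI objects
    for s, p, o in triples:
        if not o.startswith('"'):  # skip literals: they cannot be subjects elsewhere
            local_objects[(p, cs_of[s])].add(o)
    return local_subjects, local_objects


subjects, objects = local_statistics(LMDB_TRIPLES)
C_LMDB_i = frozenset({"movie:film_subject", "owl:sameAs", "dcterms:title"})
print(sorted(subjects[C_LMDB_i]))  # ['film:1129', 'film:16189']
print(sorted(objects[("owl:sameAs", C_LMDB_i)]))
# ['dbr:Journey_to_the_Center_of_Time', 'dbr:Kate_&_Leopold']
```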
  • 21. Federated computation of statistics
      3. Local statistics and CSs are transferred to the federated query engine.
      local_subjects_LMDB($C_{LMDB,i}$) = {film:1129, film:16189, ...}
      local_objects_LMDB(owl:sameAs, $C_{LMDB,i}$) = {..., dbr:Journey_to_the_Center_of_Time, ...}
      local_subjects_DBpedia($C_{DBpedia,j}$) = {..., dbr:Journey_to_the_Center_of_Time, ...}
      local_objects_DBpedia(dbo:director, $C_{DBpedia,j}$) = {dbr:David_L._Hewitt, ...}
      ...
  • 22. Federated computation of statistics
      3. Local statistics and CSs are transferred to the federated query engine.
      4. Set overlap is computed to estimate the statistics of federated CSs and CPs.
      local_subjects_LMDB($C_{LMDB,i}$) = {film:1129, film:16189, ...}
      local_objects_LMDB(owl:sameAs, $C_{LMDB,i}$) = {..., dbr:Journey_to_the_Center_of_Time, ...}
      local_subjects_DBpedia($C_{DBpedia,j}$) = {..., dbr:Journey_to_the_Center_of_Time, ...}
      local_objects_DBpedia(dbo:director, $C_{DBpedia,j}$) = {dbr:David_L._Hewitt, ...}
      ...
      Federated CP: $(C_{LMDB,i}, C_{DBpedia,j}, \text{owl:sameAs})$
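Step 4 can be illustrated with a Python sketch that builds federated CPs from overlapping local statistics. It assumes exact set intersection over the transferred sets. The slides only say that "set overlap is computed", and Odyssey may use a more compact representation. Only the elements visible on the slides are used, and the CS identifiers are plain labels.

```python
def federated_cps(local_objects, local_subjects):
    """Estimate federated characteristic pairs from overlapping local statistics.

    local_objects:  {(source, CS id, predicate): set of object IRIs}
    local_subjects: {(source, CS id): set of subject IRIs}
    Returns {((source_k, CS_k), (source_l, CS_l), predicate): overlap size}.
    """
    cps = {}
    for (src_k, cs_k, p), objects in local_objects.items():
        for (src_l, cs_l), subjects in local_subjects.items():
            overlap = len(objects & subjects)  # entities linking the two CSs
            if overlap:
                cps[((src_k, cs_k), (src_l, cs_l), p)] = overlap
    return cps


# Only the elements visible on the slides; the "..." parts are not filled in.
local_objects = {
    ("LMDB", "C_LMDB,i", "owl:sameAs"): {"dbr:Kate_&_Leopold", "dbr:Journey_to_the_Center_of_Time"},
    ("DBpedia", "C_DBpedia,j", "dbo:director"): {"dbr:David_L._Hewitt"},
}
local_subjects = {
    ("LMDB", "C_LMDB,i"): {"film:1129", "film:16189"},
    ("DBpedia", "C_DBpedia,j"): {"dbr:Journey_to_the_Center_of_Time"},
}
print(federated_cps(local_objects, local_subjects))
# {(('LMDB', 'C_LMDB,i'), ('DBpedia', 'C_DBpedia,j'), 'owl:sameAs'): 1}
```

The overlap of 1 matches the estimate of 1 that the slides give for $(C_{LMDB,i}, C_{DBpedia,j}, \text{owl:sameAs})$.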
  • 23. Odyssey's Query Optimization
      SELECT DISTINCT * WHERE {
        ?film dbo:director ?director .                (tp1)
        ?film rdf:type dbo:Film .                     (tp2)
        ?movie owl:sameAs ?film .                     (tp3)
        ?movie dcterms:title ?title .                 (tp4)
        ?movie movie:film_subject film_subject:444    (tp5)
      }
  • 24. Odyssey's Query Optimization
      1. Identify the star-shaped subqueries.
      SELECT DISTINCT * WHERE {
        ?film dbo:director ?director .                (tp1)
        ?film rdf:type dbo:Film .                     (tp2)
        ?movie owl:sameAs ?film .                     (tp3)
        ?movie dcterms:title ?title .                 (tp4)
        ?movie movie:film_subject film_subject:444    (tp5)
      }
  • 25. Odyssey's Query Optimization
      1. Identify the star-shaped subqueries.
      2. Identify the relevant CSs and estimate their cardinality.
      SELECT DISTINCT * WHERE {
        ?film dbo:director ?director .                (tp1)
        ?film rdf:type dbo:Film .                     (tp2)
        ?movie owl:sameAs ?film .                     (tp3)
        ?movie dcterms:title ?title .                 (tp4)
        ?movie movie:film_subject film_subject:444    (tp5)
      }
      $C_{DBpedia,j}$ = {dbo:director, rdf:type, ...}, estimatedCardinality({dbo:director, rdf:type}) = 162
      $C_{LMDB,i}$ = {movie:film_subject, owl:sameAs, dcterms:title, ...}, estimatedCardinality({movie:film_subject, owl:sameAs, dcterms:title}) = 4
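Steps 1 and 2 can be sketched in Python. Triple patterns are grouped by subject into star-shaped subqueries, and a CS counts as relevant for a star when it contains all of the star's predicates. The LMDB and DBpedia CSs use the predicates visible on the slides. The SWDF CS is hypothetical. It shows the point of slide 5: a source relevant for single triple patterns can be irrelevant for the whole star.

```python
from collections import defaultdict

# The five triple patterns of the example query as (subject, predicate, object).
QUERY = [
    ("?film", "dbo:director", "?director"),                # tp1
    ("?film", "rdf:type", "dbo:Film"),                     # tp2
    ("?movie", "owl:sameAs", "?film"),                     # tp3
    ("?movie", "dcterms:title", "?title"),                 # tp4
    ("?movie", "movie:film_subject", "film_subject:444"),  # tp5
]


def star_subqueries(patterns):
    """Group triple patterns by subject: each group is one star-shaped subquery."""
    stars = defaultdict(list)
    for i, (s, p, _) in enumerate(patterns, start=1):
        stars[s].append((f"tp{i}", p))
    return dict(stars)


def relevant_css(star_predicates, css_by_source):
    """A CS is relevant for a star if it contains all of the star's predicates."""
    return [
        (source, cs)
        for source, css in css_by_source.items()
        for cs in css
        if star_predicates <= cs
    ]


css_by_source = {
    "LMDB": [frozenset({"movie:film_subject", "owl:sameAs", "dcterms:title"})],
    "DBpedia": [frozenset({"dbo:director", "rdf:type"})],
    "SWDF": [frozenset({"dcterms:title", "rdf:type"})],  # hypothetical CS
}

stars = star_subqueries(QUERY)
for subject, tps in stars.items():
    preds = frozenset(p for _, p in tps)
    sources = [src for src, _ in relevant_css(preds, css_by_source)]
    print(subject, [tp for tp, _ in tps], sources)
# ?film ['tp1', 'tp2'] ['DBpedia']
# ?movie ['tp3', 'tp4', 'tp5'] ['LMDB']
```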
  • 26. Odyssey's Query Optimization
      3. Identify the relevant CPs and estimate their cardinality.
      $(C_{LMDB,i}, C_{DBpedia,j}, \text{owl:sameAs})$: estimatedCardinality($(C_{LMDB,i}, C_{DBpedia,j}, \text{owl:sameAs})$) = 1
  • 27. Odyssey's Query Optimization
      3. Identify the relevant CPs and estimate their cardinality.
      4. Use a cost function to select the plan that transfers the fewest tuples.
      $(C_{LMDB,i}, C_{DBpedia,j}, \text{owl:sameAs})$: estimatedCardinality($(C_{LMDB,i}, C_{DBpedia,j}, \text{owl:sameAs})$) = 1
      Candidate plans (figure): {tp5, tp3, tp4}@LMDB joined with {tp1, tp2}@DBpedia, cost = 4 + 1 = 5; {tp1, tp2}@DBpedia joined with {tp5, tp3, tp4}@LMDB, cost = 162 + 1 = 163
  • 28. Odyssey's Query Optimization
      3. Identify the relevant CPs and estimate their cardinality.
      4. Use a cost function to select the plan that transfers the fewest tuples.
      5. If necessary, compute the join order using dynamic programming.
      $(C_{LMDB,i}, C_{DBpedia,j}, \text{owl:sameAs})$: estimatedCardinality($(C_{LMDB,i}, C_{DBpedia,j}, \text{owl:sameAs})$) = 1
      Candidate plans (figure): {tp5, tp3, tp4}@LMDB joined with {tp1, tp2}@DBpedia, cost = 4 + 1 = 5; {tp1, tp2}@DBpedia joined with {tp5, tp3, tp4}@LMDB, cost = 162 + 1 = 163
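Steps 4 and 5 can be combined in a small dynamic-programming sketch. It assumes a plan's cost is the sum of the estimated cardinalities of its intermediate results, which reproduces cost = 4 + 1 = 5 and 162 + 1 = 163 above. It also assumes left-deep plans and a `card` table that gives an estimate for every subset of subqueries. This illustrates the idea and is not Odyssey's implementation. The subquery labels are hypothetical.

```python
from itertools import combinations


def best_left_deep_plan(subqueries, card):
    """Dynamic programming over subsets of subqueries.

    card[S] is the estimated number of tuples produced by joining the subqueries in S.
    best[S] holds the cheapest (cost, join order) that covers exactly S, where cost
    is the sum of the cardinalities of all intermediate results (tuples transferred).
    """
    best = {frozenset([q]): (card[frozenset([q])], (q,)) for q in subqueries}
    for size in range(2, len(subqueries) + 1):
        for subset in map(frozenset, combinations(subqueries, size)):
            # Extend the best plan for subset - {last} by joining `last` at the end.
            best[subset] = min(
                (best[subset - {last}][0] + card[subset], best[subset - {last}][1] + (last,))
                for last in subset
            )
    return best[frozenset(subqueries)]


card = {
    frozenset(["?movie@LMDB"]): 4,                   # star {tp3, tp4, tp5}
    frozenset(["?film@DBpedia"]): 162,               # star {tp1, tp2}
    frozenset(["?movie@LMDB", "?film@DBpedia"]): 1,  # the CP estimate
}
print(best_left_deep_plan(["?movie@LMDB", "?film@DBpedia"], card))
# (5, ('?movie@LMDB', '?film@DBpedia'))
```

With only two star subqueries the DP reduces to the comparison on the slide. The subset table matters once a query has three or more stars.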
  • 29. Odyssey provides a better plan
      SELECT DISTINCT * WHERE {
        ?film dbo:director ?director .                (tp1)
        ?film rdf:type dbo:Film .                     (tp2)
        ?movie owl:sameAs ?film .                     (tp3)
        ?movie dcterms:title ?title .                 (tp4)
        ?movie movie:film_subject film_subject:444    (tp5)
      }
      [5] A. Schwarte et al. "FedX: Optimization Techniques for Federated Query Processing on Linked Data". In: ISWC'11.
      [6] A. Charalambidis et al. "SemaGrow: optimizing federated SPARQL queries". In: SEMANTICS'15.
      [7] G. Montoya et al. "The Odyssey Approach for Optimizing Federated SPARQL Queries". In: ISWC'17.
  • 30. Odyssey provides a better plan
      SELECT DISTINCT * WHERE {
        ?film dbo:director ?director .                (tp1)
        ?film rdf:type dbo:Film .                     (tp2)
        ?movie owl:sameAs ?film .                     (tp3)
        ?movie dcterms:title ?title .                 (tp4)
        ?movie movie:film_subject film_subject:444    (tp5)
      }
      FedX [5]: plan (figure) over tp1, tp2, tp3, tp5, tp4, with source annotations @DBpedia; @DBpedia, Drugbank, Geonames, Jamendo, KEGG, LMDB, NYTimes, SWDF; @LMDB, SWDF; @LMDB. OT 0.74 s, ET 142 s.
      SemaGrow [6]: plan (figure) over tp5, tp3, tp4, tp2, tp1, with source annotations @DBpedia; @DBpedia, Drugbank, LMDB, Geonames, NYTimes, Jamendo, SWDF, KEGG; @LMDB, SWDF; @LMDB. OT 4.75 s, ET 6.93 s.
      Odyssey [7]: plan (figure) {tp5, tp3, tp4}@LMDB joined with {tp1, tp2}@DBpedia. OT 0.22 s, ET 1.30 s.
      (OT = optimization time, ET = execution time)
      [5] A. Schwarte et al. "FedX: Optimization Techniques for Federated Query Processing on Linked Data". In: ISWC'11.
      [6] A. Charalambidis et al. "SemaGrow: optimizing federated SPARQL queries". In: SEMANTICS'15.
      [7] G. Montoya et al. "The Odyssey Approach for Optimizing Federated SPARQL Queries". In: ISWC'17.
  • 31. Experimental setup
      Queries and datasets from FedBench [8].
      Comparison with the existing approaches FedX [9], SemaGrow [10], SPLENDID [11], and HiBISCuS [12].
      One Virtuoso 7.2.4.2 endpoint was deployed for each dataset.
      Plots show the average over nine executions, with a timeout of 30 min.
      [8] M. Schmidt et al. "FedBench: A Benchmark Suite for Federated Semantic Data Query Processing". In: ISWC'11.
      [9] A. Schwarte et al. "FedX: Optimization Techniques for Federated Query Processing on Linked Data". In: ISWC'11.
      [10] A. Charalambidis et al. "SemaGrow: optimizing federated SPARQL queries". In: SEMANTICS'15.
      [11] O. Görlitz and S. Staab. "SPLENDID: SPARQL Endpoint Federation Exploiting VOID Descriptions". In: COLD'11.
      [12] M. Saleem and A. N. Ngomo. "HiBISCuS: Hypergraph-Based Source Selection for SPARQL Endpoint Federation". In: ESWC'14.
  • 32. Odyssey's optimization time is comparable with that of other approaches
      [Figure: optimization time (OT, ms, log scale from 10^0 to 10^4) per FedBench query in three panels (LD1 to LD11, CD1 to CD7, LS1 to LS7) for Odyssey, HiBISCuS-Warm, HiBISCuS-Cold, FedX-Warm, FedX-Cold, SemaGrow, and SPLENDID.]
  • 33. Odyssey's plans have fewer subqueries than other approaches' plans
      [Figure: number of subqueries (NSQ, 0 to 20) per FedBench query in three panels (LD1 to LD11, CD1 to CD7, LS1 to LS7) for Odyssey, HiBISCuS-Warm, HiBISCuS-Cold, FedX-Warm, FedX-Cold, SemaGrow, and SPLENDID.]
  • 34. Overall, Odyssey's plans are executed faster than other approaches' plans
      [Figure: execution time (ET, ms, log scale from 10^0 to 10^6) per FedBench query in three panels (LD1 to LD11, CD1 to CD7, LS1 to LS7) for Odyssey, HiBISCuS-Warm, HiBISCuS-Cold, FedX-Warm, FedX-Cold, SemaGrow, and SPLENDID; some runs are marked TIMEOUT, INCOMPLETE RESULT, or ABORT.]
  • 35. Conclusions
      Odyssey uses cardinality estimations that allow for better optimizations.
      Local statistics make it possible to discover connections between datasets in a federated setup.
      Odyssey's plans are in general better than existing approaches' plans.
  • 36. Future work
      Odyssey's optimization can be improved with runtime optimizations (ASK queries).
      Reduce the computation times and sizes of the local statistics.
      Provide efficient strategies to update the statistics.
  • 37. Questions?