SlideShare a Scribd company logo
The Odyssey Approach for
Optimizing Federated SPARQL
Queries
Gabriela Montoya1
, Hala Skaf-Molli2
, and Katja Hose1
1Aalborg University, Denmark
{gmontoya,khose}@cs.aau.dk
2Nantes University, France
hala.skaf@univ-nantes.fr
October 25th, 2017
The Odyssey Approach for Optimizing Federated SPARQL Queries,G. Montoya et al
Optimizing Federated SPARQL Queries
SELECT DISTINCT ∗ WHERE {
? f i l m dbo : d i r e c t o r ? d i r e c t o r . ( tp1 )
? f i l m r d f : type dbo : Film . ( tp2 )
? movie owl : sameAs ? f i l m . ( tp3 )
? movie dcterms : t i t l e ? t i t l e . ( tp4 )
? movie movie : f i l m s u b j e c t f i l m s u b j e c t :444 ( tp5 )
}
DBpedia, Drugbank, LMDB, ChEBI, Geonames, NYTimes, Jamendo, SWDF, KEGG
2
The Odyssey Approach for Optimizing Federated SPARQL Queries,G. Montoya et al
Optimizing Federated SPARQL Queries
SELECT DISTINCT ∗ WHERE {
? f i l m dbo : d i r e c t o r ? d i r e c t o r . ( tp1 )
? f i l m r d f : type dbo : Film . ( tp2 )
? movie owl : sameAs ? f i l m . ( tp3 )
? movie dcterms : t i t l e ? t i t l e . ( tp4 )
? movie movie : f i l m s u b j e c t f i l m s u b j e c t :444 ( tp5 )
}
DBpedia, Drugbank, LMDB, ChEBI, Geonames, NYTimes, Jamendo, SWDF, KEGG
Subquery Relevant Sources
?film dbo:director ?director . DBpedia
?film rdf:type dbo:Film DBpedia,Drugbank,LMDB,ChEBI
Geonames,NYTimes,Jamendo,SWDF,KEGG
?movie owl:sameAs ?film DBpedia,Drugbank,LMDB
Geonames,NYTimes,Jamendo,SWDF,KEGG
?movie dcterms:title ?title LMDB,SWDF
?movie movie:film subject film subject:444 LMDB
2
The Odyssey Approach for Optimizing Federated SPARQL Queries,G. Montoya et al
Existing approaches
FedX1 SemaGrow2
Plan
tp1 tp2
tp3
tp5
tp4
@DBpedia
@DBpedia,Drugbank,
Geonames,Jamendo
KEGG,LMDB,
NYTimes,SWDF
@LMDB,SWDF
@LMDB
1
tp5 tp3
tp4 tp2 tp1
@DBpedia
@DBpedia,Drugbank,
LMDB,Geonames,
NYTimes,Jamendo,
SWDF,KEGG
@LMDB,
SWDF
@LMDB
1
Optimization Technique Heuristics Dynamic Programming
Optimization Time (s) 0.74 4.75
Execution Time (s) 142 6.93
1
A. Schwarte et al. “FedX: Optimization Techniques for Federated Query Processing on Linked Data”. In:
ISWC’11.
2
A. Charalambidis et al. “SemaGrow: optimizing federated SPARQL queries”. In: SEMANTICS’15.
3
The Odyssey Approach for Optimizing Federated SPARQL Queries,G. Montoya et al
Considering joins between triple patterns
improves the optimization
Subquery Relevant Sources
?movie owl:sameAs ?film . LMDB
?movie dcterms:title ?title .
?movie movie:film subject film subject:444
Only entities that satisfy owl:sameAs, dcterms:title,
and movie:film subject are part of LMDB!
Entities from all other sources that seem relevant
for some triple patterns, actually will not contribute
to the result!
4
The Odyssey Approach for Optimizing Federated SPARQL Queries,G. Montoya et al
Contributions
Concise statistics describing links among triples
while guaranteeing result completeness.
A technique to compute such statistics in a
federated setup.
Affordable optimization based on dynamic
programming.
5
The Odyssey Approach for Optimizing Federated SPARQL Queries,G. Montoya et al
Statistics computation at one location
film:1129 movie:film subject film subject:444 .
film:1129 dcterms:title ”Kate & Leopold” .
film:1129 owl:sameAs dbr:Kate & Leopold .
film:16189 movie:film subject film subject:444 .
film:16189 dcterms:title ”Journey to the Center of Time” .
film:16189 owl:sameAs dbr:Journey to the Center of Time .
dbr:Journey to the Center of Time dbo:director dbr:David L. Hewitt .
dbr:Journey to the Center of Time rdf:type dbo:Film .
...
3
T. Neumann and G. Moerkotte. “Characteristic Sets: Accurate Cardinality Estimation for RDF Queries with
Multiple Joins”. In: ICDE’11.
4
A. Gubichev and T. Neumann. “Exploiting the query structure for efficient join ordering in SPARQL queries”. 6
The Odyssey Approach for Optimizing Federated SPARQL Queries,G. Montoya et al
Statistics computation at one location
film:1129 movie:film subject film subject:444 .
film:1129 dcterms:title ”Kate & Leopold” .
film:1129 owl:sameAs dbr:Kate & Leopold .
film:16189 movie:film subject film subject:444 .
film:16189 dcterms:title ”Journey to the Center of Time” .
film:16189 owl:sameAs dbr:Journey to the Center of Time .
dbr:Journey to the Center of Time dbo:director dbr:David L. Hewitt .
dbr:Journey to the Center of Time rdf:type dbo:Film .
...
3
T. Neumann and G. Moerkotte. “Characteristic Sets: Accurate Cardinality Estimation for RDF Queries with
Multiple Joins”. In: ICDE’11.
4
A. Gubichev and T. Neumann. “Exploiting the query structure for efficient join ordering in SPARQL queries”. 6
The Odyssey Approach for Optimizing Federated SPARQL Queries,G. Montoya et al
Statistics computation at one location
Characteristic Sets (CS)3
CD,1 = { movie:film subject, dcterms:title, owl:sameAs }
count(CD,1)=1
film:1129 movie:film subject film subject:444 .
film:1129 dcterms:title ”Kate & Leopold” .
film:1129 owl:sameAs dbr:Kate & Leopold .
film:16189 movie:film subject film subject:444 .
film:16189 dcterms:title ”Journey to the Center of Time” .
film:16189 owl:sameAs dbr:Journey to the Center of Time .
dbr:Journey to the Center of Time dbo:director dbr:David L. Hewitt .
dbr:Journey to the Center of Time rdf:type dbo:Film .
...
3
T. Neumann and G. Moerkotte. “Characteristic Sets: Accurate Cardinality Estimation for RDF Queries with
Multiple Joins”. In: ICDE’11.
4
A. Gubichev and T. Neumann. “Exploiting the query structure for efficient join ordering in SPARQL queries”. 6
The Odyssey Approach for Optimizing Federated SPARQL Queries,G. Montoya et al
Statistics computation at one location
Characteristic Sets (CS)3
CD,1 = { movie:film subject, dcterms:title, owl:sameAs }
count(CD,1)=2
film:1129 movie:film subject film subject:444 .
film:1129 dcterms:title ”Kate & Leopold” .
film:1129 owl:sameAs dbr:Kate & Leopold .
film:16189 movie:film subject film subject:444 .
film:16189 dcterms:title ”Journey to the Center of Time” .
film:16189 owl:sameAs dbr:Journey to the Center of Time .
dbr:Journey to the Center of Time dbo:director dbr:David L. Hewitt .
dbr:Journey to the Center of Time rdf:type dbo:Film .
...
3
T. Neumann and G. Moerkotte. “Characteristic Sets: Accurate Cardinality Estimation for RDF Queries with
Multiple Joins”. In: ICDE’11.
4
A. Gubichev and T. Neumann. “Exploiting the query structure for efficient join ordering in SPARQL queries”. 6
The Odyssey Approach for Optimizing Federated SPARQL Queries,G. Montoya et al
Statistics computation at one location
Characteristic Sets (CS)3
CD,1 = { movie:film subject, dcterms:title, owl:sameAs }
count(CD,1)=2
CD,2 = { dbo:director, rdf:type } count(CD,2)=1
film:1129 movie:film subject film subject:444 .
film:1129 dcterms:title ”Kate & Leopold” .
film:1129 owl:sameAs dbr:Kate & Leopold .
film:16189 movie:film subject film subject:444 .
film:16189 dcterms:title ”Journey to the Center of Time” .
film:16189 owl:sameAs dbr:Journey to the Center of Time .
dbr:Journey to the Center of Time dbo:director dbr:David L. Hewitt .
dbr:Journey to the Center of Time rdf:type dbo:Film .
...
3
T. Neumann and G. Moerkotte. “Characteristic Sets: Accurate Cardinality Estimation for RDF Queries with
Multiple Joins”. In: ICDE’11.
4
A. Gubichev and T. Neumann. “Exploiting the query structure for efficient join ordering in SPARQL queries”. 6
The Odyssey Approach for Optimizing Federated SPARQL Queries,G. Montoya et al
Other basic statistics are also stored
Characteristic Sets (CS)3
CD,1 = { movie:film subject, dcterms:title, owl:sameAs }
count(CD,1)=2; ocurrences(dcterms:title, CD,1)=2
CD,2 = { dbo:director, rdf:type } count(CD,2)=1
film:1129 movie:film subject film subject:444 .
film:1129 dcterms:title ”Kate & Leopold” .
film:1129 owl:sameAs dbr:Kate & Leopold .
film:16189 movie:film subject film subject:444 .
film:16189 dcterms:title ”Journey to the Center of Time” .
film:16189 owl:sameAs dbr:Journey to the Center of Time .
dbr:Journey to the Center of Time dbo:director dbr:David L. Hewitt .
dbr:Journey to the Center of Time rdf:type dbo:Film .
...
3
T. Neumann and G. Moerkotte. “Characteristic Sets: Accurate Cardinality Estimation for RDF Queries with
Multiple Joins”. In: ICDE’11.
4
A. Gubichev and T. Neumann. “Exploiting the query structure for efficient join ordering in SPARQL queries”. 6
The Odyssey Approach for Optimizing Federated SPARQL Queries,G. Montoya et al
CSs are connected to other CSs
Characteristic Sets (CS)3
CD,1 = { movie:film subject, dcterms:title, owl:sameAs }
count(CD,1)=2; ocurrences(dcterms:title, CD,1)=2
CD,2 = { dbo:director, rdf:type } count(CD,2)=1
film:1129 movie:film subject film subject:444 .
film:1129 dcterms:title ”Kate & Leopold” .
film:1129 owl:sameAs dbr:Kate & Leopold .
film:16189 movie:film subject film subject:444 .
film:16189 dcterms:title ”Journey to the Center of Time” .
film:16189 owl:sameAs dbr:Journey to the Center of Time .
dbr:Journey to the Center of Time dbo:director dbr:David L. Hewitt .
dbr:Journey to the Center of Time rdf:type dbo:Film .
...
dbr:Journey to the Center of Time → owl:sameAs → CD,1
3
T. Neumann and G. Moerkotte. “Characteristic Sets: Accurate Cardinality Estimation for RDF Queries with
Multiple Joins”. In: ICDE’11.
4
A. Gubichev and T. Neumann. “Exploiting the query structure for efficient join ordering in SPARQL queries”. 6
The Odyssey Approach for Optimizing Federated SPARQL Queries,G. Montoya et al
CSs are connected to other CSs
Characteristic Sets (CS)3
CD,1 = { movie:film subject, dcterms:title, owl:sameAs }
count(CD,1)=2; ocurrences(dcterms:title, CD,1)=2
CD,2 = { dbo:director, rdf:type } count(CD,2)=1
film:1129 movie:film subject film subject:444 .
film:1129 dcterms:title ”Kate & Leopold” .
film:1129 owl:sameAs dbr:Kate & Leopold .
film:16189 movie:film subject film subject:444 .
film:16189 dcterms:title ”Journey to the Center of Time” .
film:16189 owl:sameAs dbr:Journey to the Center of Time .
dbr:Journey to the Center of Time dbo:director dbr:David L. Hewitt .
dbr:Journey to the Center of Time rdf:type dbo:Film .
...
dbr:Journey to the Center of Time → owl:sameAs → CD,1
Characteristic Pairs (CP)4
3
T. Neumann and G. Moerkotte. “Characteristic Sets: Accurate Cardinality Estimation for RDF Queries with
Multiple Joins”. In: ICDE’11.
4
A. Gubichev and T. Neumann. “Exploiting the query structure for efficient join ordering in SPARQL queries”. 6
The Odyssey Approach for Optimizing Federated SPARQL Queries,G. Montoya et al
CSs are connected to other CSs
Characteristic Sets (CS)3
CD,1 = { movie:film subject, dcterms:title, owl:sameAs }
count(CD,1)=2; ocurrences(dcterms:title, CD,1)=2
CD,2 = { dbo:director, rdf:type } count(CD,2)=1
film:1129 movie:film subject film subject:444 .
film:1129 dcterms:title ”Kate & Leopold” .
film:1129 owl:sameAs dbr:Kate & Leopold .
film:16189 movie:film subject film subject:444 .
film:16189 dcterms:title ”Journey to the Center of Time” .
film:16189 owl:sameAs dbr:Journey to the Center of Time .
dbr:Journey to the Center of Time dbo:director dbr:David L. Hewitt .
dbr:Journey to the Center of Time rdf:type dbo:Film .
...
dbr:Journey to the Center of Time → owl:sameAs → CD,1
Characteristic Pairs (CP)4
(CD,1, CD,2, owl:sameAs) count((CD,1, CD,2, owl:sameAs)) = 1
3
T. Neumann and G. Moerkotte. “Characteristic Sets: Accurate Cardinality Estimation for RDF Queries with
Multiple Joins”. In: ICDE’11.
4
A. Gubichev and T. Neumann. “Exploiting the query structure for efficient join ordering in SPARQL queries”. 6
The Odyssey Approach for Optimizing Federated SPARQL Queries,G. Montoya et al
Cardinality computation
SELECT DISTINCT ?film ?movie WHERE {
?film dbo:director ?director . ( tp1 )
?film rdf:type ?type . ( tp2 )
?movie owl:sameAs ?film . ( tp3 )
?movie dcterms:title ?title . ( tp4 )
?movie movie:film subject ?subject ( tp5 )
}
cardinality((Pk , Pl , p)) =
Pk ⊆Ci ∧Pl ⊆Cj
count((Ci , Cj , p))
(1)
Pk = {owl : sameAs, dcterms : title, movie : film subject}
Pl = {dbo : director, rdf : type}
p = owl : sameAs
7
The Odyssey Approach for Optimizing Federated SPARQL Queries,G. Montoya et al
Cardinality estimation
SELECT DISTINCT ∗ WHERE {
? f i l m dbo : d i r e c t o r ? d i r e c t o r . ( tp1 )
? f i l m r d f : type ? type . ( tp2 )
? movie owl : sameAs ? f i l m . ( tp3 )
? movie dcterms : t i t l e ? t i t l e . ( tp4 )
? movie movie : f i l m s u b j e c t ? s u b j e c t ( tp5 )
}
estimatedCardinality((Pk , Pl , p)) =
Pk ⊆Ci ∧Pl ⊆Cj
count((Ci , Cj , p))
∗
pk ∈Pk −{p}
ocurrences(pk , Ci )
count(Ci )
∗
pl ∈Pl
ocurrences(pl , Cj )
count(Cj )
(2)
Pk = {owl : sameAs, dcterms : title, movie : film subject}
Pl = {dbo : director, rdf : type}
p = owl : sameAs
8
The Odyssey Approach for Optimizing Federated SPARQL Queries,G. Montoya et al
Cardinality estimation
SELECT DISTINCT ∗ WHERE {
? f i l m dbo : d i r e c t o r ? d i r e c t o r . ( tp1 )
? f i l m r d f : type dbo : Film . ( tp2 )
? movie owl : sameAs ? f i l m . ( tp3 )
? movie dcterms : t i t l e ? t i t l e . ( tp4 )
? movie movie : f i l m s u b j e c t f i l m s u b j e c t :444 ( tp5 )
}
estimatedCardinality((Pk , Pl , p)) = max
Pk ⊆Ci ∧Pl ⊆Cj
count((Ci , Cj , p))
∗
pk ∈Pk −{p}
ocurrences(pk , Ci )
count(Ci )
∗
pl ∈Pl
ocurrences(pl , Cj )
count(Cj )
(3)
Pk = {owl : sameAs, dcterms : title, movie : film subject}
Pl = {dbo : director, rdf : type}
p = owl : sameAs
9
The Odyssey Approach for Optimizing Federated SPARQL Queries,G. Montoya et al
Federated computation of statistics
1. At each source the CSs are computed
film:1129 movie:film subject film subject:444 .
film:1129 dcterms:title ”Kate & Leopold” .
film:1129 owl:sameAs dbr:Kate & Leopold .
film:16189 movie:film subject film subject:444 .
film:16189 dcterms:title ”Journey to the Center of Time” .
film:16189 owl:sameAs dbr:Journey to the Center of Time .
...
CLMDB,i ={movie:film subject,owl:sameAs,dcterms:title,...}
...
10
The Odyssey Approach for Optimizing Federated SPARQL Queries,G. Montoya et al
Federated computation of statistics
1. At each source the CSs are computed
2. At each source local statistics are computed
film:1129 movie:film subject film subject:444 .
film:1129 dcterms:title ”Kate & Leopold” .
film:1129 owl:sameAs dbr:Kate & Leopold .
film:16189 movie:film subject film subject:444 .
film:16189 dcterms:title ”Journey to the Center of Time” .
film:16189 owl:sameAs dbr:Journey to the Center of Time .
...
CLMDB,i ={movie:film subject,owl:sameAs,dcterms:title,...}
...
local subjectsLMDB(CLMDB,i )={ film:1129, film:16189, ... }
local objectsLMDB(owl:sameAs, CLMDB,i )={
dbr:Kate & Leopold, dbr:Journey to the Center of Time, ...}
10
The Odyssey Approach for Optimizing Federated SPARQL Queries,G. Montoya et al
Federated computation of statistics
3. Local statistics and CSs are transferred to the
federated query engine
local subjectsLMDB(CLMDB,i )={ film:1129, film:16189, ... }
local objectsLMDB(owl:sameAs, CLMDB,i )={ ...,dbr:Journey to the Center of Time, ...}
local subjectsDBpedia(CDBpedia,j )={ ..., dbr:Journey to the Center of Time,...}
local objectsDBpedia(dbo:director,CDBpedia,j )={ dbr:David L. Hewitt, ...}
...
11
The Odyssey Approach for Optimizing Federated SPARQL Queries,G. Montoya et al
Federated computation of statistics
3. Local statistics and CSs are transferred to the
federated query engine
4. Set overlap is computed to estimate the statistics
of federated CSs and CPs
local subjectsLMDB(CLMDB,i )={ film:1129, film:16189, ... }
local objectsLMDB(owl:sameAs, CLMDB,i )={ ...,dbr:Journey to the Center of Time, ...}
local subjectsDBpedia(CDBpedia,j )={ ..., dbr:Journey to the Center of Time,...}
local objectsDBpedia(dbo:director,CDBpedia,j )={ dbr:David L. Hewitt, ...}
...
(CLMDB,j ,CDBpedia,j ,owl:sameAs)
11
The Odyssey Approach for Optimizing Federated SPARQL Queries,G. Montoya et al
Odyssey’s Query Optimization
SELECT DISTINCT ∗ WHERE {
?film dbo:director ?director . ( tp1 )
?film rdf:type dbo:Film . ( tp2 )
?movie owl:sameAs ?film . ( tp3 )
?movie dcterms:title ?title . ( tp4 )
?movie movie:film subject film subject:444 ( tp5 )
}
12
The Odyssey Approach for Optimizing Federated SPARQL Queries,G. Montoya et al
Odyssey’s Query Optimization
1. Identify the star-shaped subqueries
SELECT DISTINCT ∗ WHERE {
?film dbo:director ?director . ( tp1 )
?film rdf:type dbo:Film . ( tp2 )
?movie owl:sameAs ?film . ( tp3 )
?movie dcterms:title ?title . ( tp4 )
?movie movie:film subject film subject:444 ( tp5 )
}
12
The Odyssey Approach for Optimizing Federated SPARQL Queries,G. Montoya et al
Odyssey’s Query Optimization
1. Identify the star-shaped subqueries
2. Identify the relevant CSs and estimate cardinality
SELECT DISTINCT ∗ WHERE {
?film dbo:director ?director . ( tp1 )
?film rdf:type dbo:Film . ( tp2 )
?movie owl:sameAs ?film . ( tp3 )
?movie dcterms:title ?title . ( tp4 )
?movie movie:film subject film subject:444 ( tp5 )
}
CDBpedia,j ={dbo:director,rdf:type,...}
estimatedCardinality({dbo:director,rdf:type})=162
CLMDB,i ={movie:film subject,owl:sameAs,dcterms:title,...}
estimatedCardinality({movie:film subject,owl:sameAs,dcterms:title})=4
12
The Odyssey Approach for Optimizing Federated SPARQL Queries,G. Montoya et al
Odyssey’s Query Optimization
3. Identify the relevant CPs and estimate cardinality.
(CLMDB,i,CDBpedia,j,owl:sameAs)
estimatedCardinality((CLMDB,i,CDBpedia,j,owl:sameAs))=1
13
The Odyssey Approach for Optimizing Federated SPARQL Queries,G. Montoya et al
Odyssey’s Query Optimization
3. Identify the relevant CPs and estimate cardinality.
4. A cost function is used to select the plan that
leads to transfer the lower number of tuples.
(CLMDB,i,CDBpedia,j,owl:sameAs)
estimatedCardinality((CLMDB,i,CDBpedia,j,owl:sameAs))=1
tp5 tp3
tp4 tp1 tp2
@DBpedia
@LMDB
1
tp1 tp2
tp5 tp3
tp4
@DBpedia
@LMDB
1
cost=4+1=5 cost=162+1=163
13
The Odyssey Approach for Optimizing Federated SPARQL Queries,G. Montoya et al
Odyssey’s Query Optimization
3. Identify the relevant CPs and estimate cardinality.
4. A cost function is used to select the plan that
leads to transfer the lower number of tuples.
5. If necessary, compute join ordering using
Dynamic Programming.
(CLMDB,i,CDBpedia,j,owl:sameAs)
estimatedCardinality((CLMDB,i,CDBpedia,j,owl:sameAs))=1
tp5 tp3
tp4 tp1 tp2
@DBpedia
@LMDB
1
tp1 tp2
tp5 tp3
tp4
@DBpedia
@LMDB
1
cost=4+1=5 cost=162+1=163
13
The Odyssey Approach for Optimizing Federated SPARQL Queries,G. Montoya et al
Odyssey provides a better plan
SELECT DISTINCT ∗ WHERE {
? f i l m dbo : d i r e c t o r ? d i r e c t o r . ( tp1 )
? f i l m r d f : type dbo : Film . ( tp2 )
? movie owl : sameAs ? f i l m . ( tp3 )
? movie dcterms : t i t l e ? t i t l e . ( tp4 )
? movie movie : f i l m s u b j e c t f i l m s u b j e c t :444 ( tp5 )
}
5
A. Schwarte et al. “FedX: Optimization Techniques for Federated Query Processing on Linked Data”. In:
ISWC’11.
6
A. Charalambidis et al. “SemaGrow: optimizing federated SPARQL queries”. In: SEMANTICS’15.
7
G. Montoya et al. “The Odyssey Approach for Optimizing Federated SPARQL Queries”. In: ISWC’17.
14
The Odyssey Approach for Optimizing Federated SPARQL Queries,G. Montoya et al
Odyssey provides a better plan
SELECT DISTINCT ∗ WHERE {
? f i l m dbo : d i r e c t o r ? d i r e c t o r . ( tp1 )
? f i l m r d f : type dbo : Film . ( tp2 )
? movie owl : sameAs ? f i l m . ( tp3 )
? movie dcterms : t i t l e ? t i t l e . ( tp4 )
? movie movie : f i l m s u b j e c t f i l m s u b j e c t :444 ( tp5 )
}
FedX5 SemaGrow6 Odyssey7
Plan
tp1 tp2
tp3
tp5
tp4
@DBpedia
@DBpedia,Drugbank,
Geonames,Jamendo
KEGG,LMDB,
NYTimes,SWDF
@LMDB,SWDF
@LMDB
1
tp5 tp3
tp4 tp2 tp1
@DBpedia
@DBpedia,Drugbank,
LMDB,Geonames,
NYTimes,Jamendo,
SWDF,KEGG
@LMDB,
SWDF
@LMDB
1
tp5 tp3
tp4 tp1 tp2
@DBpedia
@LMDB
1
OT 0.74s 4.75s 0.22s
ET 142s 6.93s 1.30s
5
A. Schwarte et al. “FedX: Optimization Techniques for Federated Query Processing on Linked Data”. In:
ISWC’11.
6
A. Charalambidis et al. “SemaGrow: optimizing federated SPARQL queries”. In: SEMANTICS’15.
7
G. Montoya et al. “The Odyssey Approach for Optimizing Federated SPARQL Queries”. In: ISWC’17.
14
The Odyssey Approach for Optimizing Federated SPARQL Queries,G. Montoya et al
Experimental setup
Queries and datasets from FedBench8
.
Comparison with existing approaches FedX9
,
SemaGrow10
, SPLENDID11
, HiBISCuS12
.
A Virtuoso 7.2.4.2 endpoint was deployed for
each dataset.
Plots present the average over nine executions
with a timeout of 30m.
8
M. Schmidt et al. “FedBench: A Benchmark Suite for Federated Semantic Data Query Processing”. In:
ISWC’11.
9
A. Schwarte et al. “FedX: Optimization Techniques for Federated Query Processing on Linked Data”. In:
ISWC’11.
10
A. Charalambidis et al. “SemaGrow: optimizing federated SPARQL queries”. In: SEMANTICS’15.
11
O. G¨orlitz and S. Staab. “SPLENDID: SPARQL Endpoint Federation Exploiting VOID Descriptions”. In:
COLD’11.
12
M. Saleem and A. N. Ngomo. “HiBISCuS: Hypergraph-Based Source Selection for SPARQL Endpoint
Federation”. In: ESWC’14.
15
The Odyssey Approach for Optimizing Federated SPARQL Queries,G. Montoya et al
Odyssey’s Optimization Time is
comparable with other approaches’
100
102
104
LD1 LD2 LD3 LD4 LD5 LD6 LD7 LD8 LD9 LD10LD11
OT(ms)
Odyssey
HiBISCuS−Warm
HiBISCuS−Cold
FedX−Warm
FedX−Cold
SemaGrow
SPLENDID
100
102
104
CD1 CD2 CD3 CD4 CD5 CD6 CD7
OT(ms)
100
102
104
LS1 LS2 LS3 LS4 LS5 LS6 LS7
OT(ms)
16
The Odyssey Approach for Optimizing Federated SPARQL Queries,G. Montoya et al
Odyssey’s plans have less subqueries than
other approaches’ plans
0
5
10
15
20
LD1 LD2 LD3 LD4 LD5 LD6 LD7 LD8 LD9 LD10LD11
NSQ
Odyssey
HiBISCuS−Warm
HiBISCuS−Cold
FedX−Warm
FedX−Cold
SemaGrow
SPLENDID
0
5
10
15
20
CD1 CD2 CD3 CD4 CD5 CD6 CD7
NSQ
0
5
10
15
20
LS1 LS2 LS3 LS4 LS5 LS6 LS7
NSQ
17
The Odyssey Approach for Optimizing Federated SPARQL Queries,G. Montoya et al
Odyssey’s plans are overall executed faster
than other approaches’ plans
TIMEOUT
100
102
104
106
LD1 LD2 LD3 LD4 LD5 LD6 LD7 LD8 LD9 LD10LD11
ET(ms)
Odyssey
HiBISCuS−Warm
HiBISCuS−Cold
FedX−Warm
FedX−Cold
SemaGrow
SPLENDID
100
102
104
106
CD1 CD2 CD3 CD4 CD5 CD6 CD7
ET(ms)
TIMEOUT
TIMEOUT
TIMEOUT
INCOMPLETERESULT
ABORT
100
102
104
106
LS1 LS2 LS3 LS4 LS5 LS6 LS7
ET(ms)
18
The Odyssey Approach for Optimizing Federated SPARQL Queries,G. Montoya et al
Conclusions
Odyssey uses cardinality estimations that allow
for better optimizations.
Local statistics allow to discover connections
between datasets in a federated setup.
Odyssey’s plans are in general better than
existing approaches’ plans.
19
The Odyssey Approach for Optimizing Federated SPARQL Queries,G. Montoya et al
Future works
Odyssey’s optimization can be improved using
with runtime optimizations (ASK queries).
Reduce the local statistics computation times
and sizes.
Provide efficient strategies to update the
statistics.
20
The Odyssey Approach for Optimizing Federated SPARQL Queries,G. Montoya et al
Questions?
21

More Related Content

Similar to Talk odysseyiswc2017

Summary of SIGIR 2011 Papers
Summary of SIGIR 2011 PapersSummary of SIGIR 2011 Papers
Summary of SIGIR 2011 Papers
chetanagavankar
 
Leveraging Linked Data to Infer Semantic Relations within Structured Sources
Leveraging Linked Data to Infer Semantic Relations within Structured SourcesLeveraging Linked Data to Infer Semantic Relations within Structured Sources
Leveraging Linked Data to Infer Semantic Relations within Structured Sources
Mohsen Taheriyan
 
Building generic data queries using python ast
Building generic data queries using python astBuilding generic data queries using python ast
Building generic data queries using python ast
Adrien Chauve
 
Wi2015 - Clustering of Linked Open Data - the LODeX tool
Wi2015 - Clustering of Linked Open Data - the LODeX toolWi2015 - Clustering of Linked Open Data - the LODeX tool
Wi2015 - Clustering of Linked Open Data - the LODeX tool
Laura Po
 
Presentation final
Presentation finalPresentation final
Presentation final
tmra
 
Real-time Semantic Web with Twitter Annotations
Real-time Semantic Web with Twitter AnnotationsReal-time Semantic Web with Twitter Annotations
Real-time Semantic Web with Twitter Annotations
Joshua Shinavier
 
Text as Data: processing the Hebrew Bible
Text as Data: processing the Hebrew BibleText as Data: processing the Hebrew Bible
Text as Data: processing the Hebrew Bible
Dirk Roorda
 
Entity Retrieval (WSDM 2014 tutorial)
Entity Retrieval (WSDM 2014 tutorial)Entity Retrieval (WSDM 2014 tutorial)
Entity Retrieval (WSDM 2014 tutorial)
krisztianbalog
 
Aidan's PhD Viva
Aidan's PhD VivaAidan's PhD Viva
Aidan's PhD Viva
Aidan Hogan
 
EARL: Joint Entity and Relation Linking for Question Answering over Knowledge...
EARL: Joint Entity and Relation Linking for Question Answering over Knowledge...EARL: Joint Entity and Relation Linking for Question Answering over Knowledge...
EARL: Joint Entity and Relation Linking for Question Answering over Knowledge...
Holistic Benchmarking of Big Linked Data
 
Searching and Querying Knowledge Graphs with Solr/SIREn - A Reference Archite...
Searching and Querying Knowledge Graphs with Solr/SIREn - A Reference Archite...Searching and Querying Knowledge Graphs with Solr/SIREn - A Reference Archite...
Searching and Querying Knowledge Graphs with Solr/SIREn - A Reference Archite...
Lucidworks
 
Learning to assess Linked Data relationships using Genetic Programming
Learning to assess Linked Data relationships using Genetic ProgrammingLearning to assess Linked Data relationships using Genetic Programming
Learning to assess Linked Data relationships using Genetic Programming
Vrije Universiteit Amsterdam
 
seevl: Data-driven music discovery
seevl: Data-driven music discoveryseevl: Data-driven music discovery
seevl: Data-driven music discovery
Alexandre Passant
 

Similar to Talk odysseyiswc2017 (13)

Summary of SIGIR 2011 Papers
Summary of SIGIR 2011 PapersSummary of SIGIR 2011 Papers
Summary of SIGIR 2011 Papers
 
Leveraging Linked Data to Infer Semantic Relations within Structured Sources
Leveraging Linked Data to Infer Semantic Relations within Structured SourcesLeveraging Linked Data to Infer Semantic Relations within Structured Sources
Leveraging Linked Data to Infer Semantic Relations within Structured Sources
 
Building generic data queries using python ast
Building generic data queries using python astBuilding generic data queries using python ast
Building generic data queries using python ast
 
Wi2015 - Clustering of Linked Open Data - the LODeX tool
Wi2015 - Clustering of Linked Open Data - the LODeX toolWi2015 - Clustering of Linked Open Data - the LODeX tool
Wi2015 - Clustering of Linked Open Data - the LODeX tool
 
Presentation final
Presentation finalPresentation final
Presentation final
 
Real-time Semantic Web with Twitter Annotations
Real-time Semantic Web with Twitter AnnotationsReal-time Semantic Web with Twitter Annotations
Real-time Semantic Web with Twitter Annotations
 
Text as Data: processing the Hebrew Bible
Text as Data: processing the Hebrew BibleText as Data: processing the Hebrew Bible
Text as Data: processing the Hebrew Bible
 
Entity Retrieval (WSDM 2014 tutorial)
Entity Retrieval (WSDM 2014 tutorial)Entity Retrieval (WSDM 2014 tutorial)
Entity Retrieval (WSDM 2014 tutorial)
 
Aidan's PhD Viva
Aidan's PhD VivaAidan's PhD Viva
Aidan's PhD Viva
 
EARL: Joint Entity and Relation Linking for Question Answering over Knowledge...
EARL: Joint Entity and Relation Linking for Question Answering over Knowledge...EARL: Joint Entity and Relation Linking for Question Answering over Knowledge...
EARL: Joint Entity and Relation Linking for Question Answering over Knowledge...
 
Searching and Querying Knowledge Graphs with Solr/SIREn - A Reference Archite...
Searching and Querying Knowledge Graphs with Solr/SIREn - A Reference Archite...Searching and Querying Knowledge Graphs with Solr/SIREn - A Reference Archite...
Searching and Querying Knowledge Graphs with Solr/SIREn - A Reference Archite...
 
Learning to assess Linked Data relationships using Genetic Programming
Learning to assess Linked Data relationships using Genetic ProgrammingLearning to assess Linked Data relationships using Genetic Programming
Learning to assess Linked Data relationships using Genetic Programming
 
seevl: Data-driven music discovery
seevl: Data-driven music discoveryseevl: Data-driven music discovery
seevl: Data-driven music discovery
 

Recently uploaded

Demystifying Knowledge Management through Storytelling
Demystifying Knowledge Management through StorytellingDemystifying Knowledge Management through Storytelling
Demystifying Knowledge Management through Storytelling
Enterprise Knowledge
 
Introduction of Cybersecurity with OSS at Code Europe 2024
Introduction of Cybersecurity with OSS  at Code Europe 2024Introduction of Cybersecurity with OSS  at Code Europe 2024
Introduction of Cybersecurity with OSS at Code Europe 2024
Hiroshi SHIBATA
 
Leveraging the Graph for Clinical Trials and Standards
Leveraging the Graph for Clinical Trials and StandardsLeveraging the Graph for Clinical Trials and Standards
Leveraging the Graph for Clinical Trials and Standards
Neo4j
 
GraphRAG for LifeSciences Hands-On with the Clinical Knowledge Graph
GraphRAG for LifeSciences Hands-On with the Clinical Knowledge GraphGraphRAG for LifeSciences Hands-On with the Clinical Knowledge Graph
GraphRAG for LifeSciences Hands-On with the Clinical Knowledge Graph
Neo4j
 
High performance Serverless Java on AWS- GoTo Amsterdam 2024
High performance Serverless Java on AWS- GoTo Amsterdam 2024High performance Serverless Java on AWS- GoTo Amsterdam 2024
High performance Serverless Java on AWS- GoTo Amsterdam 2024
Vadym Kazulkin
 
"Scaling RAG Applications to serve millions of users", Kevin Goedecke
"Scaling RAG Applications to serve millions of users",  Kevin Goedecke"Scaling RAG Applications to serve millions of users",  Kevin Goedecke
"Scaling RAG Applications to serve millions of users", Kevin Goedecke
Fwdays
 
Christine's Supplier Sourcing Presentaion.pptx
Christine's Supplier Sourcing Presentaion.pptxChristine's Supplier Sourcing Presentaion.pptx
Christine's Supplier Sourcing Presentaion.pptx
christinelarrosa
 
"Choosing proper type of scaling", Olena Syrota
"Choosing proper type of scaling", Olena Syrota"Choosing proper type of scaling", Olena Syrota
"Choosing proper type of scaling", Olena Syrota
Fwdays
 
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
saastr
 
Fueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte WebinarFueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte Webinar
Zilliz
 
Astute Business Solutions | Oracle Cloud Partner |
Astute Business Solutions | Oracle Cloud Partner |Astute Business Solutions | Oracle Cloud Partner |
Astute Business Solutions | Oracle Cloud Partner |
AstuteBusiness
 
Y-Combinator seed pitch deck template PP
Y-Combinator seed pitch deck template PPY-Combinator seed pitch deck template PP
Y-Combinator seed pitch deck template PP
c5vrf27qcz
 
Biomedical Knowledge Graphs for Data Scientists and Bioinformaticians
Biomedical Knowledge Graphs for Data Scientists and BioinformaticiansBiomedical Knowledge Graphs for Data Scientists and Bioinformaticians
Biomedical Knowledge Graphs for Data Scientists and Bioinformaticians
Neo4j
 
The Microsoft 365 Migration Tutorial For Beginner.pptx
The Microsoft 365 Migration Tutorial For Beginner.pptxThe Microsoft 365 Migration Tutorial For Beginner.pptx
The Microsoft 365 Migration Tutorial For Beginner.pptx
operationspcvita
 
Session 1 - Intro to Robotic Process Automation.pdf
Session 1 - Intro to Robotic Process Automation.pdfSession 1 - Intro to Robotic Process Automation.pdf
Session 1 - Intro to Robotic Process Automation.pdf
UiPathCommunity
 
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development ProvidersYour One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
akankshawande
 
Mutation Testing for Task-Oriented Chatbots
Mutation Testing for Task-Oriented ChatbotsMutation Testing for Task-Oriented Chatbots
Mutation Testing for Task-Oriented Chatbots
Pablo Gómez Abajo
 
PRODUCT LISTING OPTIMIZATION PRESENTATION.pptx
PRODUCT LISTING OPTIMIZATION PRESENTATION.pptxPRODUCT LISTING OPTIMIZATION PRESENTATION.pptx
PRODUCT LISTING OPTIMIZATION PRESENTATION.pptx
christinelarrosa
 
LF Energy Webinar: Carbon Data Specifications: Mechanisms to Improve Data Acc...
LF Energy Webinar: Carbon Data Specifications: Mechanisms to Improve Data Acc...LF Energy Webinar: Carbon Data Specifications: Mechanisms to Improve Data Acc...
LF Energy Webinar: Carbon Data Specifications: Mechanisms to Improve Data Acc...
DanBrown980551
 
"Frontline Battles with DDoS: Best practices and Lessons Learned", Igor Ivaniuk
"Frontline Battles with DDoS: Best practices and Lessons Learned",  Igor Ivaniuk"Frontline Battles with DDoS: Best practices and Lessons Learned",  Igor Ivaniuk
"Frontline Battles with DDoS: Best practices and Lessons Learned", Igor Ivaniuk
Fwdays
 

Recently uploaded (20)

Demystifying Knowledge Management through Storytelling
Demystifying Knowledge Management through StorytellingDemystifying Knowledge Management through Storytelling
Demystifying Knowledge Management through Storytelling
 
Introduction of Cybersecurity with OSS at Code Europe 2024
Introduction of Cybersecurity with OSS  at Code Europe 2024Introduction of Cybersecurity with OSS  at Code Europe 2024
Introduction of Cybersecurity with OSS at Code Europe 2024
 
Leveraging the Graph for Clinical Trials and Standards
Leveraging the Graph for Clinical Trials and StandardsLeveraging the Graph for Clinical Trials and Standards
Leveraging the Graph for Clinical Trials and Standards
 
GraphRAG for LifeSciences Hands-On with the Clinical Knowledge Graph
GraphRAG for LifeSciences Hands-On with the Clinical Knowledge GraphGraphRAG for LifeSciences Hands-On with the Clinical Knowledge Graph
GraphRAG for LifeSciences Hands-On with the Clinical Knowledge Graph
 
High performance Serverless Java on AWS- GoTo Amsterdam 2024
High performance Serverless Java on AWS- GoTo Amsterdam 2024High performance Serverless Java on AWS- GoTo Amsterdam 2024
High performance Serverless Java on AWS- GoTo Amsterdam 2024
 
"Scaling RAG Applications to serve millions of users", Kevin Goedecke
"Scaling RAG Applications to serve millions of users",  Kevin Goedecke"Scaling RAG Applications to serve millions of users",  Kevin Goedecke
"Scaling RAG Applications to serve millions of users", Kevin Goedecke
 
Christine's Supplier Sourcing Presentaion.pptx
Christine's Supplier Sourcing Presentaion.pptxChristine's Supplier Sourcing Presentaion.pptx
Christine's Supplier Sourcing Presentaion.pptx
 
"Choosing proper type of scaling", Olena Syrota
"Choosing proper type of scaling", Olena Syrota"Choosing proper type of scaling", Olena Syrota
"Choosing proper type of scaling", Olena Syrota
 
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
 
Fueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte WebinarFueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte Webinar
 
Astute Business Solutions | Oracle Cloud Partner |
Astute Business Solutions | Oracle Cloud Partner |Astute Business Solutions | Oracle Cloud Partner |
Astute Business Solutions | Oracle Cloud Partner |
 
Y-Combinator seed pitch deck template PP
Y-Combinator seed pitch deck template PPY-Combinator seed pitch deck template PP
Y-Combinator seed pitch deck template PP
 
Biomedical Knowledge Graphs for Data Scientists and Bioinformaticians
Biomedical Knowledge Graphs for Data Scientists and BioinformaticiansBiomedical Knowledge Graphs for Data Scientists and Bioinformaticians
Biomedical Knowledge Graphs for Data Scientists and Bioinformaticians
 
The Microsoft 365 Migration Tutorial For Beginner.pptx
The Microsoft 365 Migration Tutorial For Beginner.pptxThe Microsoft 365 Migration Tutorial For Beginner.pptx
The Microsoft 365 Migration Tutorial For Beginner.pptx
 
Session 1 - Intro to Robotic Process Automation.pdf
Session 1 - Intro to Robotic Process Automation.pdfSession 1 - Intro to Robotic Process Automation.pdf
Session 1 - Intro to Robotic Process Automation.pdf
 
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development ProvidersYour One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
 
Mutation Testing for Task-Oriented Chatbots
Mutation Testing for Task-Oriented ChatbotsMutation Testing for Task-Oriented Chatbots
Mutation Testing for Task-Oriented Chatbots
 
PRODUCT LISTING OPTIMIZATION PRESENTATION.pptx
PRODUCT LISTING OPTIMIZATION PRESENTATION.pptxPRODUCT LISTING OPTIMIZATION PRESENTATION.pptx
PRODUCT LISTING OPTIMIZATION PRESENTATION.pptx
 
LF Energy Webinar: Carbon Data Specifications: Mechanisms to Improve Data Acc...
LF Energy Webinar: Carbon Data Specifications: Mechanisms to Improve Data Acc...LF Energy Webinar: Carbon Data Specifications: Mechanisms to Improve Data Acc...
LF Energy Webinar: Carbon Data Specifications: Mechanisms to Improve Data Acc...
 
"Frontline Battles with DDoS: Best practices and Lessons Learned", Igor Ivaniuk
"Frontline Battles with DDoS: Best practices and Lessons Learned",  Igor Ivaniuk"Frontline Battles with DDoS: Best practices and Lessons Learned",  Igor Ivaniuk
"Frontline Battles with DDoS: Best practices and Lessons Learned", Igor Ivaniuk
 

Talk odysseyiswc2017

  • 1. The Odyssey Approach for Optimizing Federated SPARQL Queries Gabriela Montoya1 , Hala Skaf-Molli2 , and Katja Hose1 1Aalborg University, Denmark {gmontoya,khose}@cs.aau.dk 2Nantes University, France hala.skaf@univ-nantes.fr October 25th, 2017
  • 2. The Odyssey Approach for Optimizing Federated SPARQL Queries,G. Montoya et al Optimizing Federated SPARQL Queries SELECT DISTINCT ∗ WHERE { ? f i l m dbo : d i r e c t o r ? d i r e c t o r . ( tp1 ) ? f i l m r d f : type dbo : Film . ( tp2 ) ? movie owl : sameAs ? f i l m . ( tp3 ) ? movie dcterms : t i t l e ? t i t l e . ( tp4 ) ? movie movie : f i l m s u b j e c t f i l m s u b j e c t :444 ( tp5 ) } DBpedia, Drugbank, LMDB, ChEBI, Geonames, NYTimes, Jamendo, SWDF, KEGG 2
  • 3. The Odyssey Approach for Optimizing Federated SPARQL Queries,G. Montoya et al Optimizing Federated SPARQL Queries SELECT DISTINCT ∗ WHERE { ? f i l m dbo : d i r e c t o r ? d i r e c t o r . ( tp1 ) ? f i l m r d f : type dbo : Film . ( tp2 ) ? movie owl : sameAs ? f i l m . ( tp3 ) ? movie dcterms : t i t l e ? t i t l e . ( tp4 ) ? movie movie : f i l m s u b j e c t f i l m s u b j e c t :444 ( tp5 ) } DBpedia, Drugbank, LMDB, ChEBI, Geonames, NYTimes, Jamendo, SWDF, KEGG Subquery Relevant Sources ?film dbo:director ?director . DBpedia ?film rdf:type dbo:Film DBpedia,Drugbank,LMDB,ChEBI Geonames,NYTimes,Jamendo,SWDF,KEGG ?movie owl:sameAs ?film DBpedia,Drugbank,LMDB Geonames,NYTimes,Jamendo,SWDF,KEGG ?movie dcterms:title ?title LMDB,SWDF ?movie movie:film subject film subject:444 LMDB 2
  • 4. The Odyssey Approach for Optimizing Federated SPARQL Queries,G. Montoya et al Existing approaches FedX1 SemaGrow2 Plan tp1 tp2 tp3 tp5 tp4 @DBpedia @DBpedia,Drugbank, Geonames,Jamendo KEGG,LMDB, NYTimes,SWDF @LMDB,SWDF @LMDB 1 tp5 tp3 tp4 tp2 tp1 @DBpedia @DBpedia,Drugbank, LMDB,Geonames, NYTimes,Jamendo, SWDF,KEGG @LMDB, SWDF @LMDB 1 Optimization Technique Heuristics Dynamic Programming Optimization Time (s) 0.74 4.75 Execution Time (s) 142 6.93 1 A. Schwarte et al. “FedX: Optimization Techniques for Federated Query Processing on Linked Data”. In: ISWC’11. 2 A. Charalambidis et al. “SemaGrow: optimizing federated SPARQL queries”. In: SEMANTICS’15. 3
  • 5. The Odyssey Approach for Optimizing Federated SPARQL Queries,G. Montoya et al Considering joins between triple patterns improves the optimization Subquery Relevant Sources ?movie owl:sameAs ?film . LMDB ?movie dcterms:title ?title . ?movie movie:film subject film subject:444 Only entities that satisfy owl:sameAs, dcterms:title, and movie:film subject are part of LMDB! Entities from all other sources that seem relevant for some triple patterns, actually will not contribute to the result! 4
  • 6. The Odyssey Approach for Optimizing Federated SPARQL Queries,G. Montoya et al Contributions Concise statistics describing links among triples while guaranteeing result completeness. A technique to compute such statistics in a federated setup. Affordable optimization based on dynamic programming. 5
  • 7. The Odyssey Approach for Optimizing Federated SPARQL Queries,G. Montoya et al Statistics computation at one location film:1129 movie:film subject film subject:444 . film:1129 dcterms:title ”Kate & Leopold” . film:1129 owl:sameAs dbr:Kate & Leopold . film:16189 movie:film subject film subject:444 . film:16189 dcterms:title ”Journey to the Center of Time” . film:16189 owl:sameAs dbr:Journey to the Center of Time . dbr:Journey to the Center of Time dbo:director dbr:David L. Hewitt . dbr:Journey to the Center of Time rdf:type dbo:Film . ... 3 T. Neumann and G. Moerkotte. “Characteristic Sets: Accurate Cardinality Estimation for RDF Queries with Multiple Joins”. In: ICDE’11. 4 A. Gubichev and T. Neumann. “Exploiting the query structure for efficient join ordering in SPARQL queries”. 6
  • 8. The Odyssey Approach for Optimizing Federated SPARQL Queries,G. Montoya et al Statistics computation at one location film:1129 movie:film subject film subject:444 . film:1129 dcterms:title ”Kate & Leopold” . film:1129 owl:sameAs dbr:Kate & Leopold . film:16189 movie:film subject film subject:444 . film:16189 dcterms:title ”Journey to the Center of Time” . film:16189 owl:sameAs dbr:Journey to the Center of Time . dbr:Journey to the Center of Time dbo:director dbr:David L. Hewitt . dbr:Journey to the Center of Time rdf:type dbo:Film . ... 3 T. Neumann and G. Moerkotte. “Characteristic Sets: Accurate Cardinality Estimation for RDF Queries with Multiple Joins”. In: ICDE’11. 4 A. Gubichev and T. Neumann. “Exploiting the query structure for efficient join ordering in SPARQL queries”. 6
  • 9. The Odyssey Approach for Optimizing Federated SPARQL Queries,G. Montoya et al Statistics computation at one location Characteristic Sets (CS)3 CD,1 = { movie:film subject, dcterms:title, owl:sameAs } count(CD,1)=1 film:1129 movie:film subject film subject:444 . film:1129 dcterms:title ”Kate & Leopold” . film:1129 owl:sameAs dbr:Kate & Leopold . film:16189 movie:film subject film subject:444 . film:16189 dcterms:title ”Journey to the Center of Time” . film:16189 owl:sameAs dbr:Journey to the Center of Time . dbr:Journey to the Center of Time dbo:director dbr:David L. Hewitt . dbr:Journey to the Center of Time rdf:type dbo:Film . ... 3 T. Neumann and G. Moerkotte. “Characteristic Sets: Accurate Cardinality Estimation for RDF Queries with Multiple Joins”. In: ICDE’11. 4 A. Gubichev and T. Neumann. “Exploiting the query structure for efficient join ordering in SPARQL queries”. 6
  • 10. The Odyssey Approach for Optimizing Federated SPARQL Queries,G. Montoya et al Statistics computation at one location Characteristic Sets (CS)3 CD,1 = { movie:film subject, dcterms:title, owl:sameAs } count(CD,1)=2 film:1129 movie:film subject film subject:444 . film:1129 dcterms:title ”Kate & Leopold” . film:1129 owl:sameAs dbr:Kate & Leopold . film:16189 movie:film subject film subject:444 . film:16189 dcterms:title ”Journey to the Center of Time” . film:16189 owl:sameAs dbr:Journey to the Center of Time . dbr:Journey to the Center of Time dbo:director dbr:David L. Hewitt . dbr:Journey to the Center of Time rdf:type dbo:Film . ... 3 T. Neumann and G. Moerkotte. “Characteristic Sets: Accurate Cardinality Estimation for RDF Queries with Multiple Joins”. In: ICDE’11. 4 A. Gubichev and T. Neumann. “Exploiting the query structure for efficient join ordering in SPARQL queries”. 6
  • 11. The Odyssey Approach for Optimizing Federated SPARQL Queries,G. Montoya et al Statistics computation at one location Characteristic Sets (CS)3 CD,1 = { movie:film subject, dcterms:title, owl:sameAs } count(CD,1)=2 CD,2 = { dbo:director, rdf:type } count(CD,2)=1 film:1129 movie:film subject film subject:444 . film:1129 dcterms:title ”Kate & Leopold” . film:1129 owl:sameAs dbr:Kate & Leopold . film:16189 movie:film subject film subject:444 . film:16189 dcterms:title ”Journey to the Center of Time” . film:16189 owl:sameAs dbr:Journey to the Center of Time . dbr:Journey to the Center of Time dbo:director dbr:David L. Hewitt . dbr:Journey to the Center of Time rdf:type dbo:Film . ... 3 T. Neumann and G. Moerkotte. “Characteristic Sets: Accurate Cardinality Estimation for RDF Queries with Multiple Joins”. In: ICDE’11. 4 A. Gubichev and T. Neumann. “Exploiting the query structure for efficient join ordering in SPARQL queries”. 6
  • 12. The Odyssey Approach for Optimizing Federated SPARQL Queries,G. Montoya et al Other basic statistics are also stored Characteristic Sets (CS)3 CD,1 = { movie:film subject, dcterms:title, owl:sameAs } count(CD,1)=2; ocurrences(dcterms:title, CD,1)=2 CD,2 = { dbo:director, rdf:type } count(CD,2)=1 film:1129 movie:film subject film subject:444 . film:1129 dcterms:title ”Kate & Leopold” . film:1129 owl:sameAs dbr:Kate & Leopold . film:16189 movie:film subject film subject:444 . film:16189 dcterms:title ”Journey to the Center of Time” . film:16189 owl:sameAs dbr:Journey to the Center of Time . dbr:Journey to the Center of Time dbo:director dbr:David L. Hewitt . dbr:Journey to the Center of Time rdf:type dbo:Film . ... 3 T. Neumann and G. Moerkotte. “Characteristic Sets: Accurate Cardinality Estimation for RDF Queries with Multiple Joins”. In: ICDE’11. 4 A. Gubichev and T. Neumann. “Exploiting the query structure for efficient join ordering in SPARQL queries”. 6
  • 13. The Odyssey Approach for Optimizing Federated SPARQL Queries,G. Montoya et al CSs are connected to other CSs Characteristic Sets (CS)3 CD,1 = { movie:film subject, dcterms:title, owl:sameAs } count(CD,1)=2; ocurrences(dcterms:title, CD,1)=2 CD,2 = { dbo:director, rdf:type } count(CD,2)=1 film:1129 movie:film subject film subject:444 . film:1129 dcterms:title ”Kate & Leopold” . film:1129 owl:sameAs dbr:Kate & Leopold . film:16189 movie:film subject film subject:444 . film:16189 dcterms:title ”Journey to the Center of Time” . film:16189 owl:sameAs dbr:Journey to the Center of Time . dbr:Journey to the Center of Time dbo:director dbr:David L. Hewitt . dbr:Journey to the Center of Time rdf:type dbo:Film . ... dbr:Journey to the Center of Time → owl:sameAs → CD,1 3 T. Neumann and G. Moerkotte. “Characteristic Sets: Accurate Cardinality Estimation for RDF Queries with Multiple Joins”. In: ICDE’11. 4 A. Gubichev and T. Neumann. “Exploiting the query structure for efficient join ordering in SPARQL queries”. 6
  • 14. The Odyssey Approach for Optimizing Federated SPARQL Queries,G. Montoya et al CSs are connected to other CSs Characteristic Sets (CS)3 CD,1 = { movie:film subject, dcterms:title, owl:sameAs } count(CD,1)=2; ocurrences(dcterms:title, CD,1)=2 CD,2 = { dbo:director, rdf:type } count(CD,2)=1 film:1129 movie:film subject film subject:444 . film:1129 dcterms:title ”Kate & Leopold” . film:1129 owl:sameAs dbr:Kate & Leopold . film:16189 movie:film subject film subject:444 . film:16189 dcterms:title ”Journey to the Center of Time” . film:16189 owl:sameAs dbr:Journey to the Center of Time . dbr:Journey to the Center of Time dbo:director dbr:David L. Hewitt . dbr:Journey to the Center of Time rdf:type dbo:Film . ... dbr:Journey to the Center of Time → owl:sameAs → CD,1 Characteristic Pairs (CP)4 3 T. Neumann and G. Moerkotte. “Characteristic Sets: Accurate Cardinality Estimation for RDF Queries with Multiple Joins”. In: ICDE’11. 4 A. Gubichev and T. Neumann. “Exploiting the query structure for efficient join ordering in SPARQL queries”. 6
  • 15. The Odyssey Approach for Optimizing Federated SPARQL Queries,G. Montoya et al CSs are connected to other CSs Characteristic Sets (CS)3 CD,1 = { movie:film subject, dcterms:title, owl:sameAs } count(CD,1)=2; ocurrences(dcterms:title, CD,1)=2 CD,2 = { dbo:director, rdf:type } count(CD,2)=1 film:1129 movie:film subject film subject:444 . film:1129 dcterms:title ”Kate & Leopold” . film:1129 owl:sameAs dbr:Kate & Leopold . film:16189 movie:film subject film subject:444 . film:16189 dcterms:title ”Journey to the Center of Time” . film:16189 owl:sameAs dbr:Journey to the Center of Time . dbr:Journey to the Center of Time dbo:director dbr:David L. Hewitt . dbr:Journey to the Center of Time rdf:type dbo:Film . ... dbr:Journey to the Center of Time → owl:sameAs → CD,1 Characteristic Pairs (CP)4 (CD,1, CD,2, owl:sameAs) count((CD,1, CD,2, owl:sameAs)) = 1 3 T. Neumann and G. Moerkotte. “Characteristic Sets: Accurate Cardinality Estimation for RDF Queries with Multiple Joins”. In: ICDE’11. 4 A. Gubichev and T. Neumann. “Exploiting the query structure for efficient join ordering in SPARQL queries”. 6
  • 16. The Odyssey Approach for Optimizing Federated SPARQL Queries,G. Montoya et al Cardinality computation SELECT DISTINCT ?film ?movie WHERE { ?film dbo:director ?director . ( tp1 ) ?film rdf:type ?type . ( tp2 ) ?movie owl:sameAs ?film . ( tp3 ) ?movie dcterms:title ?title . ( tp4 ) ?movie movie:film subject ?subject ( tp5 ) } cardinality((Pk , Pl , p)) = Pk ⊆Ci ∧Pl ⊆Cj count((Ci , Cj , p)) (1) Pk = {owl : sameAs, dcterms : title, movie : film subject} Pl = {dbo : director, rdf : type} p = owl : sameAs 7
  • 17. The Odyssey Approach for Optimizing Federated SPARQL Queries,G. Montoya et al Cardinality estimation SELECT DISTINCT ∗ WHERE { ? f i l m dbo : d i r e c t o r ? d i r e c t o r . ( tp1 ) ? f i l m r d f : type ? type . ( tp2 ) ? movie owl : sameAs ? f i l m . ( tp3 ) ? movie dcterms : t i t l e ? t i t l e . ( tp4 ) ? movie movie : f i l m s u b j e c t ? s u b j e c t ( tp5 ) } estimatedCardinality((Pk , Pl , p)) = Pk ⊆Ci ∧Pl ⊆Cj count((Ci , Cj , p)) ∗ pk ∈Pk −{p} ocurrences(pk , Ci ) count(Ci ) ∗ pl ∈Pl ocurrences(pl , Cj ) count(Cj ) (2) Pk = {owl : sameAs, dcterms : title, movie : film subject} Pl = {dbo : director, rdf : type} p = owl : sameAs 8
  • 18. The Odyssey Approach for Optimizing Federated SPARQL Queries,G. Montoya et al Cardinality estimation SELECT DISTINCT ∗ WHERE { ? f i l m dbo : d i r e c t o r ? d i r e c t o r . ( tp1 ) ? f i l m r d f : type dbo : Film . ( tp2 ) ? movie owl : sameAs ? f i l m . ( tp3 ) ? movie dcterms : t i t l e ? t i t l e . ( tp4 ) ? movie movie : f i l m s u b j e c t f i l m s u b j e c t :444 ( tp5 ) } estimatedCardinality((Pk , Pl , p)) = max Pk ⊆Ci ∧Pl ⊆Cj count((Ci , Cj , p)) ∗ pk ∈Pk −{p} ocurrences(pk , Ci ) count(Ci ) ∗ pl ∈Pl ocurrences(pl , Cj ) count(Cj ) (3) Pk = {owl : sameAs, dcterms : title, movie : film subject} Pl = {dbo : director, rdf : type} p = owl : sameAs 9
  • 19. The Odyssey Approach for Optimizing Federated SPARQL Queries,G. Montoya et al Federated computation of statistics 1. At each source the CSs are computed film:1129 movie:film subject film subject:444 . film:1129 dcterms:title ”Kate & Leopold” . film:1129 owl:sameAs dbr:Kate & Leopold . film:16189 movie:film subject film subject:444 . film:16189 dcterms:title ”Journey to the Center of Time” . film:16189 owl:sameAs dbr:Journey to the Center of Time . ... CLMDB,i ={movie:film subject,owl:sameAs,dcterms:title,...} ... 10
  • 20. The Odyssey Approach for Optimizing Federated SPARQL Queries,G. Montoya et al Federated computation of statistics 1. At each source the CSs are computed 2. At each source local statistics are computed film:1129 movie:film subject film subject:444 . film:1129 dcterms:title ”Kate & Leopold” . film:1129 owl:sameAs dbr:Kate & Leopold . film:16189 movie:film subject film subject:444 . film:16189 dcterms:title ”Journey to the Center of Time” . film:16189 owl:sameAs dbr:Journey to the Center of Time . ... CLMDB,i ={movie:film subject,owl:sameAs,dcterms:title,...} ... local subjectsLMDB(CLMDB,i )={ film:1129, film:16189, ... } local objectsLMDB(owl:sameAs, CLMDB,i )={ dbr:Kate & Leopold, dbr:Journey to the Center of Time, ...} 10
  • 21. The Odyssey Approach for Optimizing Federated SPARQL Queries,G. Montoya et al Federated computation of statistics 3. Local statistics and CSs are transferred to the federated query engine local subjectsLMDB(CLMDB,i )={ film:1129, film:16189, ... } local objectsLMDB(owl:sameAs, CLMDB,i )={ ...,dbr:Journey to the Center of Time, ...} local subjectsDBpedia(CDBpedia,j )={ ..., dbr:Journey to the Center of Time,...} local objectsDBpedia(dbo:director,CDBpedia,j )={ dbr:David L. Hewitt, ...} ... 11
  • 22. The Odyssey Approach for Optimizing Federated SPARQL Queries,G. Montoya et al Federated computation of statistics 3. Local statistics and CSs are transferred to the federated query engine 4. Set overlap is computed to estimate the statistics of federated CSs and CPs local subjectsLMDB(CLMDB,i )={ film:1129, film:16189, ... } local objectsLMDB(owl:sameAs, CLMDB,i )={ ...,dbr:Journey to the Center of Time, ...} local subjectsDBpedia(CDBpedia,j )={ ..., dbr:Journey to the Center of Time,...} local objectsDBpedia(dbo:director,CDBpedia,j )={ dbr:David L. Hewitt, ...} ... (CLMDB,j ,CDBpedia,j ,owl:sameAs) 11
  • 23. The Odyssey Approach for Optimizing Federated SPARQL Queries,G. Montoya et al Odyssey’s Query Optimization SELECT DISTINCT ∗ WHERE { ?film dbo:director ?director . ( tp1 ) ?film rdf:type dbo:Film . ( tp2 ) ?movie owl:sameAs ?film . ( tp3 ) ?movie dcterms:title ?title . ( tp4 ) ?movie movie:film subject film subject:444 ( tp5 ) } 12
  • 24. The Odyssey Approach for Optimizing Federated SPARQL Queries,G. Montoya et al Odyssey’s Query Optimization 1. Identify the star-shaped subqueries SELECT DISTINCT ∗ WHERE { ?film dbo:director ?director . ( tp1 ) ?film rdf:type dbo:Film . ( tp2 ) ?movie owl:sameAs ?film . ( tp3 ) ?movie dcterms:title ?title . ( tp4 ) ?movie movie:film subject film subject:444 ( tp5 ) } 12
  • 25. The Odyssey Approach for Optimizing Federated SPARQL Queries,G. Montoya et al Odyssey’s Query Optimization 1. Identify the star-shaped subqueries 2. Identify the relevant CSs and estimate cardinality SELECT DISTINCT ∗ WHERE { ?film dbo:director ?director . ( tp1 ) ?film rdf:type dbo:Film . ( tp2 ) ?movie owl:sameAs ?film . ( tp3 ) ?movie dcterms:title ?title . ( tp4 ) ?movie movie:film subject film subject:444 ( tp5 ) } CDBpedia,j ={dbo:director,rdf:type,...} estimatedCardinality({dbo:director,rdf:type})=162 CLMDB,i ={movie:film subject,owl:sameAs,dcterms:title,...} estimatedCardinality({movie:film subject,owl:sameAs,dcterms:title})=4 12
  • 26. The Odyssey Approach for Optimizing Federated SPARQL Queries,G. Montoya et al Odyssey’s Query Optimization 3. Identify the relevant CPs and estimate cardinality. (CLMDB,i,CDBpedia,j,owl:sameAs) estimatedCardinality((CLMDB,i,CDBpedia,j,owl:sameAs))=1 13
  • 27. The Odyssey Approach for Optimizing Federated SPARQL Queries,G. Montoya et al Odyssey’s Query Optimization 3. Identify the relevant CPs and estimate cardinality. 4. A cost function is used to select the plan that leads to transfer the lower number of tuples. (CLMDB,i,CDBpedia,j,owl:sameAs) estimatedCardinality((CLMDB,i,CDBpedia,j,owl:sameAs))=1 tp5 tp3 tp4 tp1 tp2 @DBpedia @LMDB 1 tp1 tp2 tp5 tp3 tp4 @DBpedia @LMDB 1 cost=4+1=5 cost=162+1=163 13
  • 28. The Odyssey Approach for Optimizing Federated SPARQL Queries,G. Montoya et al Odyssey’s Query Optimization 3. Identify the relevant CPs and estimate cardinality. 4. A cost function is used to select the plan that leads to transfer the lower number of tuples. 5. If necessary, compute join ordering using Dynamic Programming. (CLMDB,i,CDBpedia,j,owl:sameAs) estimatedCardinality((CLMDB,i,CDBpedia,j,owl:sameAs))=1 tp5 tp3 tp4 tp1 tp2 @DBpedia @LMDB 1 tp1 tp2 tp5 tp3 tp4 @DBpedia @LMDB 1 cost=4+1=5 cost=162+1=163 13
  • 29. The Odyssey Approach for Optimizing Federated SPARQL Queries,G. Montoya et al Odyssey provides a better plan SELECT DISTINCT ∗ WHERE { ? f i l m dbo : d i r e c t o r ? d i r e c t o r . ( tp1 ) ? f i l m r d f : type dbo : Film . ( tp2 ) ? movie owl : sameAs ? f i l m . ( tp3 ) ? movie dcterms : t i t l e ? t i t l e . ( tp4 ) ? movie movie : f i l m s u b j e c t f i l m s u b j e c t :444 ( tp5 ) } 5 A. Schwarte et al. “FedX: Optimization Techniques for Federated Query Processing on Linked Data”. In: ISWC’11. 6 A. Charalambidis et al. “SemaGrow: optimizing federated SPARQL queries”. In: SEMANTICS’15. 7 G. Montoya et al. “The Odyssey Approach for Optimizing Federated SPARQL Queries”. In: ISWC’17. 14
  • 30. The Odyssey Approach for Optimizing Federated SPARQL Queries,G. Montoya et al Odyssey provides a better plan SELECT DISTINCT ∗ WHERE { ? f i l m dbo : d i r e c t o r ? d i r e c t o r . ( tp1 ) ? f i l m r d f : type dbo : Film . ( tp2 ) ? movie owl : sameAs ? f i l m . ( tp3 ) ? movie dcterms : t i t l e ? t i t l e . ( tp4 ) ? movie movie : f i l m s u b j e c t f i l m s u b j e c t :444 ( tp5 ) } FedX5 SemaGrow6 Odyssey7 Plan tp1 tp2 tp3 tp5 tp4 @DBpedia @DBpedia,Drugbank, Geonames,Jamendo KEGG,LMDB, NYTimes,SWDF @LMDB,SWDF @LMDB 1 tp5 tp3 tp4 tp2 tp1 @DBpedia @DBpedia,Drugbank, LMDB,Geonames, NYTimes,Jamendo, SWDF,KEGG @LMDB, SWDF @LMDB 1 tp5 tp3 tp4 tp1 tp2 @DBpedia @LMDB 1 OT 0.74s 4.75s 0.22s ET 142s 6.93s 1.30s 5 A. Schwarte et al. “FedX: Optimization Techniques for Federated Query Processing on Linked Data”. In: ISWC’11. 6 A. Charalambidis et al. “SemaGrow: optimizing federated SPARQL queries”. In: SEMANTICS’15. 7 G. Montoya et al. “The Odyssey Approach for Optimizing Federated SPARQL Queries”. In: ISWC’17. 14
  • 31. The Odyssey Approach for Optimizing Federated SPARQL Queries,G. Montoya et al Experimental setup Queries and datasets from FedBench8 . Comparison with existing approaches FedX9 , SemaGrow10 , SPLENDID11 , HiBISCuS12 . A Virtuoso 7.2.4.2 endpoint was deployed for each dataset. Plots present the average over nine executions with a timeout of 30m. 8 M. Schmidt et al. “FedBench: A Benchmark Suite for Federated Semantic Data Query Processing”. In: ISWC’11. 9 A. Schwarte et al. “FedX: Optimization Techniques for Federated Query Processing on Linked Data”. In: ISWC’11. 10 A. Charalambidis et al. “SemaGrow: optimizing federated SPARQL queries”. In: SEMANTICS’15. 11 O. G¨orlitz and S. Staab. “SPLENDID: SPARQL Endpoint Federation Exploiting VOID Descriptions”. In: COLD’11. 12 M. Saleem and A. N. Ngomo. “HiBISCuS: Hypergraph-Based Source Selection for SPARQL Endpoint Federation”. In: ESWC’14. 15
  • 32. The Odyssey Approach for Optimizing Federated SPARQL Queries,G. Montoya et al Odyssey’s Optimization Time is comparable with other approaches’ 100 102 104 LD1 LD2 LD3 LD4 LD5 LD6 LD7 LD8 LD9 LD10LD11 OT(ms) Odyssey HiBISCuS−Warm HiBISCuS−Cold FedX−Warm FedX−Cold SemaGrow SPLENDID 100 102 104 CD1 CD2 CD3 CD4 CD5 CD6 CD7 OT(ms) 100 102 104 LS1 LS2 LS3 LS4 LS5 LS6 LS7 OT(ms) 16
  • 33. The Odyssey Approach for Optimizing Federated SPARQL Queries,G. Montoya et al Odyssey’s plans have less subqueries than other approaches’ plans 0 5 10 15 20 LD1 LD2 LD3 LD4 LD5 LD6 LD7 LD8 LD9 LD10LD11 NSQ Odyssey HiBISCuS−Warm HiBISCuS−Cold FedX−Warm FedX−Cold SemaGrow SPLENDID 0 5 10 15 20 CD1 CD2 CD3 CD4 CD5 CD6 CD7 NSQ 0 5 10 15 20 LS1 LS2 LS3 LS4 LS5 LS6 LS7 NSQ 17
  • 34. The Odyssey Approach for Optimizing Federated SPARQL Queries,G. Montoya et al Odyssey’s plans are overall executed faster than other approaches’ plans TIMEOUT 100 102 104 106 LD1 LD2 LD3 LD4 LD5 LD6 LD7 LD8 LD9 LD10LD11 ET(ms) Odyssey HiBISCuS−Warm HiBISCuS−Cold FedX−Warm FedX−Cold SemaGrow SPLENDID 100 102 104 106 CD1 CD2 CD3 CD4 CD5 CD6 CD7 ET(ms) TIMEOUT TIMEOUT TIMEOUT INCOMPLETERESULT ABORT 100 102 104 106 LS1 LS2 LS3 LS4 LS5 LS6 LS7 ET(ms) 18
  • 35. The Odyssey Approach for Optimizing Federated SPARQL Queries,G. Montoya et al Conclusions Odyssey uses cardinality estimations that allow for better optimizations. Local statistics allow to discover connections between datasets in a federated setup. Odyssey’s plans are in general better than existing approaches’ plans. 19
  • 36. The Odyssey Approach for Optimizing Federated SPARQL Queries,G. Montoya et al Future works Odyssey’s optimization can be improved using with runtime optimizations (ASK queries). Reduce the local statistics computation times and sizes. Provide efficient strategies to update the statistics. 20
  • 37. The Odyssey Approach for Optimizing Federated SPARQL Queries,G. Montoya et al Questions? 21