Federated SPARQL Query Processing 
Over the Web of Data 
Muhammad Saleem, Axel-Cyrille Ngonga Ngomo 
Agile Knowledge Engineering and Semantic Web (AKSW), University of Leipzig, Germany, 25/11/2014
Agenda 
• SPARQL Query Federation Approaches 
• SPARQL Query Federation Optimization 
– Query Rewriting 
– Source Selection 
– Data Integration Options 
– Join Order Selection 
– Join Order Optimization 
– Join Implementations 
• Performance Metrics and Discussion
SPARQL Query Federation Approaches 
• SPARQL Endpoint Federation (SEF) 
• Linked Data Federation (LDF) 
• Distributed Hash Tables (DHTs) 
• Hybrid of SEF+LDF
SPARQL Endpoint Federation Approaches 
• Most commonly used approaches 
• Make use of SPARQL endpoint URLs 
• Fast query execution 
• RDF data needs to be exposed via SPARQL endpoints 
• E.g., HiBISCuS, FedX, SPLENDID, ANAPSID, LHD, etc.
Linked Data Federation Approaches 
• Data need not be exposed via SPARQL endpoints 
• Uses URI lookups at runtime 
• Data should follow Linked Data principles 
• Slower than SPARQL endpoint federation approaches 
• E.g., LDQPS, SIHJoin, WoDQA, etc.
Query federation on top of Distributed Hash Tables 
• Uses DHT indexing to federate SPARQL queries 
• Space efficient 
• Cannot deal with the whole LOD cloud 
• E.g., ATLAS
Hybrid of SEF+LDF 
• Federation over SPARQL endpoints and Linked Data 
• Can potentially deal with the whole LOD cloud 
• E.g., ADERIS-Hybrid
SPARQL Endpoint Federation 
• Parsing/Rewriting: rewrite the query and get the individual triple patterns 
• Source Selection: identify the capable sources for the individual triple patterns 
• Federator Optimizer: generate an optimized sub-query execution plan 
• Integrator: execute the sub-queries and integrate the sub-query results 
• Data sources: SPARQL endpoints S1, S2, S3, S4, each exposing RDF data
SPARQL Query Rewriting
FedBench (LD3): Return for all US presidents their party 
membership and news pages about them. 
SELECT ?president ?party ?page 
WHERE { 
?president rdf:type dbpedia:President . 
?president dbpedia:nationality ?nationality. 
?president dbpedia:party ?party . 
?x nyt:topicPage ?page . 
?x owl:sameAs ?president . 
FILTER (?nationality = dbpedia:United_States) 
} 
Rewritten query, with the FILTER constant moved into the triple pattern: 
SELECT ?president ?party ?page 
WHERE { 
?president rdf:type dbpedia:President . 
?president dbpedia:nationality dbpedia:United_States . 
?president dbpedia:party ?party . 
?x nyt:topicPage ?page . 
?x owl:sameAs ?president . 
} 
Try to simplify/avoid SPARQL FILTER and REGEX expressions
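The rewriting step above can be sketched in a few lines. This is a minimal illustration, not the rewriting code of any particular engine: it assumes triple patterns are plain tuples of strings and that the FILTER is a simple equality between a variable and an IRI. The helper name `inline_equality_filters` is hypothetical. The substitution is only safe when the variable is not needed in the result and the constant is an IRI, since `=` on literals compares values rather than terms.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]


def inline_equality_filters(patterns: List[Triple], equalities: Dict[str, str]) -> List[Triple]:
    """Replace every variable that a FILTER pins to a constant by that constant."""
    return [
        (equalities.get(s, s), equalities.get(p, p), equalities.get(o, o))
        for s, p, o in patterns
    ]


# Triple patterns of the original LD3 query.
patterns = [
    ("?president", "rdf:type", "dbpedia:President"),
    ("?president", "dbpedia:nationality", "?nationality"),
    ("?president", "dbpedia:party", "?party"),
    ("?x", "nyt:topicPage", "?page"),
    ("?x", "owl:sameAs", "?president"),
]

# FILTER (?nationality = dbpedia:United_States), expressed as a substitution.
rewritten = inline_equality_filters(patterns, {"?nationality": "dbpedia:United_States"})

for triple in rewritten:
    print(" ".join(triple), ".")
```

After the substitution the second triple pattern has a bound object, so the FILTER no longer has to be evaluated separately.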
Source Selection
FedBench (LD3): Return for all US presidents their party membership and news pages about them. 
SELECT ?president ?party ?page 
WHERE { 
?president rdf:type dbpedia:President . # TP1 
?president dbpedia:nationality dbpedia:United_States . # TP2 
?president dbpedia:party ?party . # TP3 
?x nyt:topicPage ?page . # TP4 
?x owl:sameAs ?president . # TP5 
} 
Data sources (RDF datasets) S1 to S9: dbpedia, KEGG, ChEBI, NYT, SWDF, LMDB, Jamendo, GeoNames, DrugBank 
Triple pattern-wise source selection: 
• TP1 = S1 
• TP2 = S1 
• TP3 = S1 
• TP4 = S4 
• TP5 = S1, S2, S4-S9 
Total triple pattern-wise sources selected = 1+1+1+1+8 = 12
Types of Source Selection 
• Index-free 
– Uses SPARQL ASK queries 
– No index maintenance required 
– Potentially ensures result set completeness 
– SPARQL ASK queries can be expensive 
– Can use a cache to store recent SPARQL ASK query results 
– E.g., FedX 
• Index-only 
– Uses only an index/data summaries 
– Fast source selection, but less efficient because relevant sources are overestimated 
– Result set completeness is not ensured 
– E.g., DARQ, LHD 
• Hybrid 
– Uses an index plus SPARQL ASK queries 
– Most efficient 
– Result set completeness is not ensured 
– Can use a cache to store recent SPARQL ASK query results 
– E.g., HiBISCuS, ANAPSID, SPLENDID
Index-free Source Selection 
Input: SPARQL query Q, set of all data sources D 
Output: map M from each triple pattern to its relevant data sources 
M = {}; 
for each triple pattern ti in SPARQL query Q 
  Ri = {}; // set of relevant data sources for triple pattern ti 
  for each data source dj in D 
    if SPARQL ASK(dj, ti) = true 
      Ri = Ri U {dj}; 
    end if 
  end for 
  M = M U {(ti, Ri)}; 
end for 
return M 
What is the total number of SPARQL ASK requests used? 
total number of triple patterns * total number of data sources
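A runnable sketch of the index-free algorithm above, under stated assumptions: sources are just the labels S1 to S9, and `toy_ask` is a hypothetical stand-in for sending a SPARQL ASK request to an endpoint. Its answers are copied from the LD3 walkthrough in these slides rather than obtained from real endpoints.

```python
from typing import Callable, Dict, List, Set, Tuple

Triple = Tuple[str, str, str]


def index_free_source_selection(
    query: List[Triple],
    sources: List[str],
    ask: Callable[[str, Triple], bool],
) -> Tuple[Dict[Triple, Set[str]], int]:
    """Return the triple-pattern-to-sources map and the number of ASK requests."""
    relevant: Dict[Triple, Set[str]] = {}
    ask_requests = 0
    for tp in query:
        relevant[tp] = set()
        for source in sources:
            ask_requests += 1  # one ASK request per (triple pattern, source) pair
            if ask(source, tp):
                relevant[tp].add(source)
    return relevant, ask_requests


SOURCES = [f"S{i}" for i in range(1, 10)]
QUERY = [
    ("?president", "rdf:type", "dbpedia:President"),
    ("?president", "dbpedia:nationality", "dbpedia:United_States"),
    ("?president", "dbpedia:party", "?party"),
    ("?x", "nyt:topicPage", "?page"),
    ("?x", "owl:sameAs", "?president"),
]
# Toy answers taken from the slides; a real ask() would query each endpoint.
TOY_ANSWERS = {
    QUERY[0]: {"S1"},
    QUERY[1]: {"S1"},
    QUERY[2]: {"S1"},
    QUERY[3]: {"S4"},
    QUERY[4]: set(SOURCES) - {"S3"},
}


def toy_ask(source: str, tp: Triple) -> bool:
    return source in TOY_ANSWERS[tp]


selection, ask_requests = index_free_source_selection(QUERY, SOURCES, toy_ask)
total_selected = sum(len(sources) for sources in selection.values())
print(ask_requests, total_selected)
```

With 5 triple patterns and 9 sources the loop issues 5 * 9 ASK requests, which is the cost the slide's question points at.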
Index-free Source Selection 
FedBench (LD3): Return for all US presidents their party membership and news pages about them. 
SELECT ?president ?party ?page 
WHERE { 
?president rdf:type dbpedia:President . # TP1 
?president dbpedia:nationality dbpedia:United_States . # TP2 
?president dbpedia:party ?party . # TP3 
?x nyt:topicPage ?page . # TP4 
?x owl:sameAs ?president . # TP5 
} 
Data sources (RDF datasets) S1 to S9: dbpedia, KEGG, ChEBI, NYT, SWDF, LMDB, Jamendo, GeoNames, DrugBank 
Triple pattern-wise source selection: 
• TP1 = S1 
• TP2 = S1 
• TP3 = S1 
• TP4 = S4 
• TP5 = S1, S2, S4-S9 
Total number of SPARQL ASK requests used = 45 
Total triple pattern-wise sources selected = 12
Index-only Source Selection (LHD) 
Input: SPARQL query Q, set of all data sources D, data source index I storing all distinct predicates for all data sources in D 
Output: map M from each triple pattern to its relevant data sources 
M = {}; 
for each triple pattern ti in SPARQL query Q 
  Ri = {}; // set of relevant data sources for triple pattern ti 
  p = Pred(ti); // predicate of ti 
  if bound(p) 
    Ri = Lookup(I, p); // index lookup for predicate of ti 
  else 
    Ri = D; // all data sources are relevant 
  end if 
  M = M U {(ti, Ri)}; 
end for 
return M 
Why is it the less efficient approach (i.e., why does it greatly overestimate the relevant data sources)? 
• Source selection is based only on the predicate of each triple pattern 
• All data sources are simply selected for triple patterns with unbound predicates
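The index-only algorithm above can be sketched as follows. The contents of `PREDICATE_INDEX` are an assumption made to mirror the LD3 walkthrough in these slides (for example, every source is taken to contain rdf:type triples); the labels S1 to S9 stand for the nine data sources.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]


def is_bound(term: str) -> bool:
    """A term is bound unless it is a variable such as ?x."""
    return not term.startswith("?")


def index_only_source_selection(
    query: List[Triple],
    sources: List[str],
    index: Dict[str, Set[str]],
) -> Dict[Triple, Set[str]]:
    relevant: Dict[Triple, Set[str]] = {}
    for tp in query:
        predicate = tp[1]
        if is_bound(predicate):
            # Index lookup: every source that contains the predicate.
            relevant[tp] = set(index.get(predicate, set()))
        else:
            # Unbound predicate: all sources count as relevant.
            relevant[tp] = set(sources)
    return relevant


SOURCES = [f"S{i}" for i in range(1, 10)]
# Assumed index contents, chosen to match the slides' example.
PREDICATE_INDEX = {
    "rdf:type": set(SOURCES),
    "dbpedia:nationality": {"S1"},
    "dbpedia:party": {"S1"},
    "nyt:topicPage": {"S4"},
    "owl:sameAs": set(SOURCES) - {"S3"},
}
QUERY = [
    ("?president", "rdf:type", "dbpedia:President"),
    ("?president", "dbpedia:nationality", "dbpedia:United_States"),
    ("?president", "dbpedia:party", "?party"),
    ("?x", "nyt:topicPage", "?page"),
    ("?x", "owl:sameAs", "?president"),
]

selection = index_only_source_selection(QUERY, SOURCES, PREDICATE_INDEX)
total_selected = sum(len(sources) for sources in selection.values())
print(total_selected)
```

No ASK request is sent, but TP1 now selects all nine sources because the lookup ignores the bound object dbpedia:President.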
Index-only Source Selection 
FedBench (LD3): Return for all US presidents their party membership and news pages about them. 
SELECT ?president ?party ?page 
WHERE { 
?president rdf:type dbpedia:President . # TP1 
?president dbpedia:nationality dbpedia:United_States . # TP2 
?president dbpedia:party ?party . # TP3 
?x nyt:topicPage ?page . # TP4 
?x owl:sameAs ?president . # TP5 
} 
Data sources (RDF datasets) S1 to S9: dbpedia, KEGG, ChEBI, NYT, SWDF, LMDB, Jamendo, GeoNames, DrugBank 
Triple pattern-wise source selection: 
• TP1 = S1-S9 
• TP2 = S1 
• TP3 = S1 
• TP4 = S4 
• TP5 = S1, S2, S4-S9 
Total number of SPARQL ASK requests used = 0 
Total triple pattern-wise sources selected = 20
Hybrid Source Selection 
Input: SPARQL query Q, set of all data sources D, data source index I storing all distinct predicates for all data sources in D 
Output: map M from each triple pattern to its relevant data sources 
M = {}; 
for each triple pattern ti in SPARQL query Q 
  Ri = {}; // set of relevant data sources for triple pattern ti 
  s = Subj(ti), p = Pred(ti), o = Obj(ti); // subject, predicate, and object of ti 
  if (!bound(p) || bound(s) || bound(o)) 
    for each data source dj in D 
      if SPARQL ASK(dj, ti) = true 
        Ri = Ri U {dj}; 
      end if 
    end for 
  else 
    Ri = Lookup(I, p); // index lookup for predicate of ti 
  end if 
  M = M U {(ti, Ri)}; 
end for 
return M 
What is the total number of SPARQL ASK requests used? 
(number of triple patterns with a bound subject, a bound object, or an unbound predicate) * total number of data sources
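A sketch of the hybrid algorithm above, again under assumptions: `PREDICATE_INDEX` mirrors the slides' LD3 example, and `toy_ask` is a hypothetical stand-in for a SPARQL ASK request in which only S1 holds the two patterns that reach the ASK branch.

```python
from typing import Callable, Dict, List, Set, Tuple

Triple = Tuple[str, str, str]


def is_bound(term: str) -> bool:
    return not term.startswith("?")


def hybrid_source_selection(
    query: List[Triple],
    sources: List[str],
    index: Dict[str, Set[str]],
    ask: Callable[[str, Triple], bool],
) -> Tuple[Dict[Triple, Set[str]], int]:
    relevant: Dict[Triple, Set[str]] = {}
    ask_requests = 0
    for tp in query:
        s, p, o = tp
        if not is_bound(p) or is_bound(s) or is_bound(o):
            # The predicate index alone is too coarse here: ask every source.
            relevant[tp] = set()
            for source in sources:
                ask_requests += 1
                if ask(source, tp):
                    relevant[tp].add(source)
        else:
            relevant[tp] = set(index.get(p, set()))
    return relevant, ask_requests


SOURCES = [f"S{i}" for i in range(1, 10)]
PREDICATE_INDEX = {  # assumed contents, matching the slides' example
    "rdf:type": set(SOURCES),
    "dbpedia:nationality": {"S1"},
    "dbpedia:party": {"S1"},
    "nyt:topicPage": {"S4"},
    "owl:sameAs": set(SOURCES) - {"S3"},
}
QUERY = [
    ("?president", "rdf:type", "dbpedia:President"),
    ("?president", "dbpedia:nationality", "dbpedia:United_States"),
    ("?president", "dbpedia:party", "?party"),
    ("?x", "nyt:topicPage", "?page"),
    ("?x", "owl:sameAs", "?president"),
]


def toy_ask(source: str, tp: Triple) -> bool:
    # Toy assumption: only S1 (dbpedia) matches the patterns with bound objects.
    return source == "S1"


selection, ask_requests = hybrid_source_selection(QUERY, SOURCES, PREDICATE_INDEX, toy_ask)
total_selected = sum(len(sources) for sources in selection.values())
print(ask_requests, total_selected)
```

Only TP1 and TP2 have a bound object, so ASK requests are sent for two patterns and the other three are answered from the index.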
Hybrid Source Selection 
FedBench (LD3): Return for all US presidents their party membership and news pages about them. 
SELECT ?president ?party ?page 
WHERE { 
?president rdf:type dbpedia:President . # TP1 
?president dbpedia:nationality dbpedia:United_States . # TP2 
?president dbpedia:party ?party . # TP3 
?x nyt:topicPage ?page . # TP4 
?x owl:sameAs ?president . # TP5 
} 
Data sources (RDF datasets) S1 to S9: dbpedia, KEGG, ChEBI, NYT, SWDF, LMDB, Jamendo, GeoNames, DrugBank 
Triple pattern-wise source selection: 
• TP1 = S1 
• TP2 = S1 
• TP3 = S1 
• TP4 = S4 
• TP5 = S1, S2, S4-S9 
Total number of SPARQL ASK requests used = 18 (TP1 and TP2 have a bound object: 2 * 9 data sources) 
Total triple pattern-wise sources selected = 12 
Does anything still need to be improved?
Source Selection 
• Triple pattern-wise source selection 
– Ensures 100% recall 
– Can overestimate capable sources 
– Can be expensive, e.g., in the total number of SPARQL ASK requests used 
– Performed by FedX, SPLENDID, LHD, DARQ, ADERIS, etc. 
• Join-aware triple pattern-wise source selection 
– Ensures 100% recall 
– May select optimal or close-to-optimal capable sources 
– Can be expensive, e.g., in the total number of SPARQL ASK requests used 
– Can significantly reduce the query execution time 
– Performed by ANAPSID, HiBISCuS
HiBISCuS: Hypergraph-Based Source Selection for 
SPARQL Endpoint Federation 
• Hybrid source selection 
• Join-aware triple pattern-wise source selection 
• Makes use of the hypergraph representation of SPARQL queries 
• Makes use of URI authorities 
• Makes use of a cache to store recent SPARQL ASK query results
Motivation 
FedBench (LD3): Return for all US presidents their party membership and news pages about them. 
SELECT ?president ?party ?page 
WHERE { 
?president rdf:type dbpedia:President . # TP1 
?president dbpedia:nationality dbpedia:United_States . # TP2 
?president dbpedia:party ?party . # TP3 
?x nyt:topicPage ?page . # TP4 
?x owl:sameAs ?president . # TP5 
} 
Data sources (RDF datasets) S1 to S9: dbpedia, KEGG, ChEBI, NYT, SWDF, LMDB, Jamendo, GeoNames, DrugBank 
Triple pattern-wise source selection: 
• TP1 = S1 
• TP2 = S1 
• TP3 = S1 
• TP4 = S4 
• TP5 = S1, S2, S4, S5, S6, S7, S8, S9 
Total triple pattern-wise selected sources = 12 
Total SPARQL ASK queries: 9*5 = 45 
Optimal triple pattern-wise selected sources = 5
Problem Statement 
• Overestimation in triple pattern-wise source selection can be expensive 
– Resources are wasted 
– Query runtime is increased 
– Extra traffic is generated 
• How do we perform join-aware triple pattern-wise source selection in a time-efficient way?
HiBISCuS: Key Concept 
• Makes use of URI authorities 
http://dbpedia.org/ontology/party 
Scheme: http, Authority: dbpedia.org, Path: /ontology/party 
For URI details: http://tools.ietf.org/html/rfc3986
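As a small illustration of the scheme/authority/path split, Python's standard `urllib.parse.urlsplit` exposes the three parts directly. The helper `authority_prefix` is hypothetical; it returns scheme plus authority in the `http://dbpedia.org/` form that the data summaries in this deck use.

```python
from urllib.parse import urlsplit


def authority_prefix(uri: str) -> str:
    """Return scheme and authority of a URI, e.g. 'http://dbpedia.org/'."""
    parts = urlsplit(uri)
    return f"{parts.scheme}://{parts.netloc}/"


parts = urlsplit("http://dbpedia.org/ontology/party")
print(parts.scheme)   # scheme
print(parts.netloc)   # authority
print(parts.path)     # path
print(authority_prefix("http://dbpedia.org/ontology/party"))
```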
HiBISCuS: SPARQL Query as Hypergraph 
SELECT ?president ?party ?page 
WHERE { 
?president rdf:type dbpedia:President . 
?president dbpedia:nationality dbpedia:United_States . 
?president dbpedia:party ?party . 
?x nyt:topicPage ?page . 
?x owl:sameAs ?president . 
} 
Figure: hypergraph of the query. Each triple pattern contributes its subject, predicate, and object as nodes: 
• ?president, rdf:type, dbpedia:President 
• ?president, dbpedia:nationality, dbpedia:United_States 
• ?president, dbpedia:party, ?party 
• ?x, nyt:topicPage, ?page 
• ?x, owl:sameAs, ?president 
Legend: star, simple, hybrid, tail of hyperedge
HiBISCuS: Data Summaries 
[] a ds:Service ; 
  ds:endpointUrl <http://dbpedia.org/sparql> ; 
  ds:capability [ 
    ds:predicate dbpedia:party ; 
    ds:sbjAuthority <http://dbpedia.org/> ; 
    ds:objAuthority <http://dbpedia.org/> ; 
  ] ; 
  ds:capability [ 
    ds:predicate rdf:type ; 
    ds:sbjAuthority <http://dbpedia.org/> ; 
    ds:objAuthority owl:Thing, dbpedia:President ; # we store all distinct classes 
  ] ; 
  ds:capability [ 
    ds:predicate dbpedia:postalCode ; 
    ds:sbjAuthority <http://dbpedia.org/> ; 
    # No objAuthority, as the object value for dbpedia:postalCode is a string 
  ] .
HiBISCuS: Triple Pattern-wise Source Selection 
SELECT ?president ?party ?page 
WHERE { 
?president rdf:type dbpedia:President . 
?president dbpedia:nationality dbpedia:United_States . 
?president dbpedia:party ?party . 
?x nyt:topicPage ?page . 
?x owl:sameAs ?president . 
} 
Figure: query hypergraph annotated with selected sources. Sources shown: dbpedia, KEGG, NYT, SWDF, LMDB, GeoNames, DrugBank, Jamendo
HiBISCuS: Triple Pattern-wise Source Pruning 
SELECT ?president ?party ?page 
WHERE { 
?president rdf:type dbpedia:President . 
?president dbpedia:nationality dbpedia:United_States . 
?president dbpedia:party ?party . 
?x nyt:topicPage ?page . 
?x owl:sameAs ?president . 
} 
Figure: query hypergraph annotated with the candidate sources (dbpedia, KEGG, NYT, SWDF, DrugBank, LMDB, GeoNames, Jamendo), their subject authorities (Sbj. auth.), and the object authorities (Obj. auth.). After pruning, only NYT remains among these candidates. 
Total triple pattern-wise selected sources = 5 
Total SPARQL ASK queries: 0
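The idea behind authority-based pruning can be illustrated with a deliberately simplified sketch. This is not the HiBISCuS algorithm itself: it only shows one join, on ?x, which is the subject of both the nyt:topicPage and the owl:sameAs triple pattern. A candidate source for owl:sameAs is kept only if its subject authorities overlap those of the source selected for nyt:topicPage. The function name and all authority values below are hypothetical.

```python
from typing import Dict, Set


def prune_by_authority(candidates: Dict[str, Set[str]], required: Set[str]) -> Set[str]:
    """Keep the sources whose subject authorities overlap the required ones."""
    return {
        source
        for source, authorities in candidates.items()
        if authorities & required
    }


# Hypothetical subject authorities of owl:sameAs triples in three candidate sources.
SAMEAS_SUBJECT_AUTHORITIES = {
    "dbpedia": {"http://dbpedia.org/"},
    "KEGG": {"http://bio2rdf.org/"},
    "NYT": {"http://data.nytimes.com/"},
}
# Hypothetical subject authorities of nyt:topicPage triples in NYT.
TOPIC_PAGE_SUBJECT_AUTHORITIES = {"http://data.nytimes.com/"}

remaining = prune_by_authority(SAMEAS_SUBJECT_AUTHORITIES, TOPIC_PAGE_SUBJECT_AUTHORITIES)
print(remaining)
```

Because the comparison runs over stored data summaries, no SPARQL ASK request is needed for this step.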
Data Integration Options
Complete Local Integration 
• Triple patterns are individually and completely evaluated against every endpoint 
• Triple pattern results are integrated locally using different join techniques, e.g., nested loop join (NLJ), hash join, etc. 
• Less efficient if the query contains common predicates such as rdf:type and owl:sameAs 
• Retrieves a large amount of potentially irrelevant intermediate results
Iterative Integration 
• Evaluate the query iteratively, pattern by pattern
• Start with a single triple pattern
• Substitute the mappings from the previous triple pattern into the subsequent evaluation
• Evaluate the query in a nested loop join (NLJ) fashion
• NLJ can cause many remote requests
• A block NLJ reduces the number of remote requests
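The iterative scheme can be sketched as follows, assuming a hypothetical in-memory `Endpoint` class that counts every pattern lookup as one remote request (the class, the helper names, and the toy triples are assumptions for illustration; repeated variables inside one pattern are not handled):

```python
class Endpoint:
    """Hypothetical remote source: a list of (s, p, o) triples plus a request counter."""

    def __init__(self, triples):
        self.triples = triples
        self.requests = 0

    def match(self, pattern):
        """Answer one triple pattern; each call counts as one remote request."""
        self.requests += 1
        out = []
        for triple in self.triples:
            binding = {}
            ok = True
            for term, value in zip(pattern, triple):
                if term.startswith("?"):
                    binding[term] = value
                elif term != value:
                    ok = False
                    break
            if ok:
                out.append(binding)
        return out

def substitute(pattern, binding):
    """Replace already-bound variables in a pattern by their values."""
    return tuple(binding.get(term, term) for term in pattern)

def iterative_join(endpoint, patterns):
    """Evaluate patterns one by one, pushing earlier mappings into later patterns (NLJ)."""
    solutions = [{}]
    for pattern in patterns:
        next_solutions = []
        for solution in solutions:
            for binding in endpoint.match(substitute(pattern, solution)):
                next_solutions.append({**solution, **binding})
        solutions = next_solutions
    return solutions

endpoint = Endpoint([
    ("a", "type", "President"), ("b", "type", "President"), ("c", "type", "Senator"),
    ("a", "party", "Dem"), ("b", "party", "Rep"), ("c", "party", "Dem"),
])
solutions = iterative_join(endpoint, [("?p", "type", "President"), ("?p", "party", "?party")])
```

Here the first pattern costs one request and the second costs one request per intermediate mapping, which is why plain NLJ evaluation can generate many remote requests.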
Join Order Selection
Join Order Selection 
• Left-deep trees 
– Joins take place in a left-to-right sequential order 
– Result of the join is used as an outer input for the next join 
– Used in FedX, DARQ 
• Right-deep trees 
– Joins take place in a right-to-left sequential order 
– Result of the join is used as an inner input for the next join 
• Bushy trees 
– Joins take place in sub-trees on both the left and right sides
– Used in ANAPSID 
• Dynamic programming 
– Used in SPLENDID
Join Order Selection Example 
Compute Micronutrients using Drugbank and KEGG 
SELECT ?drug ?title WHERE { 
?drug drugbank:drugCategory drugbank-cat:micronutrient . # TP1
?drug drugbank:casRegistryNumber ?id . # TP2
?keggDrug rdf:type kegg:Drug . # TP3
?keggDrug bio2rdf:xRef ?id . # TP4
?keggDrug dc:title ?title . # TP5
} 
[Figure: three join trees over TP1 to TP5, each rooted at the projection π ?drug, ?title: a left-deep tree, a right-deep tree, and a bushy tree]
Goal: Execute smallest cardinality joins first
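One way to read the goal above is as a greedy left-deep ordering: start from the triple pattern with the smallest estimated cardinality, then repeatedly add the cheapest pattern that shares a variable with what has already been joined. The sketch below assumes invented cardinalities for TP1 to TP5 (the numbers and the names `PATTERNS`, `left_deep_order`, and `as_left_deep_tree` are hypothetical; real engines use their own cost models):

```python
# Variables per triple pattern follow the query above; the cardinalities are made up.
PATTERNS = {
    "TP1": {"vars": {"?drug"}, "card": 50},
    "TP2": {"vars": {"?drug", "?id"}, "card": 2000},
    "TP3": {"vars": {"?keggDrug"}, "card": 8000},
    "TP4": {"vars": {"?keggDrug", "?id"}, "card": 6000},
    "TP5": {"vars": {"?keggDrug", "?title"}, "card": 9000},
}

def left_deep_order(patterns):
    """Greedy join order: smallest cardinality first, preferring connected patterns."""
    remaining = dict(patterns)
    first = min(remaining, key=lambda name: remaining[name]["card"])
    order = [first]
    bound = set(remaining.pop(first)["vars"])
    while remaining:
        # Prefer patterns sharing a variable with the current result to avoid cross products.
        connected = [n for n in remaining if remaining[n]["vars"] & bound]
        candidates = connected or list(remaining)
        nxt = min(candidates, key=lambda name: remaining[name]["card"])
        order.append(nxt)
        bound |= remaining.pop(nxt)["vars"]
    return order

def as_left_deep_tree(order):
    """Render the order as a nested left-deep join expression."""
    tree = order[0]
    for name in order[1:]:
        tree = f"({tree} ⋈ {name})"
    return tree

order = left_deep_order(PATTERNS)
tree = as_left_deep_tree(order)
```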
Join Order Optimization
Join Order Optimization 
• Exclusive groups
– Group triple patterns that have the same single relevant data source
– Evaluate each group in a single (remote) sub-query
– Push the join to the data source, i.e., the endpoint
• Variable-count heuristic
– Iteratively determine the join order based on the free-variable count of triple patterns and groups
– Consider “resolved” variable mappings from earlier iterations
• Using selectivities
– Store the distinct predicates, avg. subject selectivities, and avg. object selectivities for each predicate in the index
– Use the predicate count, avg. subject selectivities, and avg. object selectivities to estimate the join cardinality
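A simplified sketch of the variable-count heuristic, assuming plain triple patterns only (no groups) and ties broken by input order; the function names are hypothetical and this is not the exact procedure of any particular engine. In each iteration the pattern with the fewest free variables is picked, and its variables count as resolved afterwards:

```python
def free_vars(pattern, resolved):
    """Variables of a pattern that no earlier pattern has resolved yet."""
    return {t for t in pattern if t.startswith("?") and t not in resolved}

def variable_count_order(named_patterns):
    """Order (name, pattern) pairs by iteratively picking the fewest free variables."""
    remaining = list(named_patterns)
    resolved = set()
    order = []
    while remaining:
        best = min(remaining, key=lambda item: len(free_vars(item[1], resolved)))
        remaining.remove(best)
        order.append(best[0])
        resolved |= {t for t in best[1] if t.startswith("?")}
    return order

# The Drugbank/KEGG query patterns, deliberately given in a shuffled order.
query = [
    ("TP2", ("?drug", "drugbank:casRegistryNumber", "?id")),
    ("TP1", ("?drug", "drugbank:drugCategory", "drugbank-cat:micronutrient")),
    ("TP4", ("?keggDrug", "bio2rdf:xRef", "?id")),
    ("TP5", ("?keggDrug", "dc:title", "?title")),
    ("TP3", ("?keggDrug", "rdf:type", "kegg:Drug")),
]
order = variable_count_order(query)
```

TP1 goes first because it has a single free variable; once ?drug and ?id are resolved, TP4 has only ?keggDrug left and is preferred over TP5.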
Exclusive Groups 
SELECT ?President ?Party ?TopicPage WHERE { 
?President rdf:type dbpedia-yago:PresidentsOfTheUnitedStates . 
?President dbpedia:party ?Party . 
?nytPresident owl:sameAs ?President . 
?nytPresident nytimes:topicPage ?TopicPage . 
} 
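Source selection (per triple pattern, in query order):
– Pattern 1 (rdf:type) @ DBpedia
– Pattern 2 (dbpedia:party) @ DBpedia
– Pattern 3 (owl:sameAs) @ DBpedia, NYTimes
– Pattern 4 (nytimes:topicPage) @ NYTimes
Exclusive group: patterns 1 and 2 (DBpedia is their only relevant source)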
Advantage: 
Delegate joins to the endpoint by forming exclusive groups (i.e. executing the 
respective patterns in a single subquery) 
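As a rough sketch, grouping can be derived directly from the source selection result: patterns with exactly one relevant source are collected per source, and everything else stays a separate pattern. The function names and the sub-query rendering below are assumptions for illustration:

```python
from collections import defaultdict

def exclusive_groups(sources_by_pattern):
    """Split patterns into per-source exclusive groups and multi-source patterns."""
    groups = defaultdict(list)
    remaining = []
    for pattern, sources in sources_by_pattern.items():
        if len(sources) == 1:
            groups[next(iter(sources))].append(pattern)
        else:
            remaining.append(pattern)
    return dict(groups), remaining

def group_subquery(patterns):
    """Render one exclusive group as a single sub-query body sent to its endpoint."""
    return "SELECT * WHERE { " + " ".join(p + " ." for p in patterns) + " }"

# Source selection result for the query above.
selection = {
    "?President rdf:type dbpedia-yago:PresidentsOfTheUnitedStates": {"DBpedia"},
    "?President dbpedia:party ?Party": {"DBpedia"},
    "?nytPresident owl:sameAs ?President": {"DBpedia", "NYTimes"},
    "?nytPresident nytimes:topicPage ?TopicPage": {"NYTimes"},
}
groups, remaining = exclusive_groups(selection)
dbpedia_query = group_subquery(groups["DBpedia"])
```

The two DBpedia-only patterns end up in one sub-query, so their join is computed by the DBpedia endpoint instead of the federation engine.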
Exclusive Groups Join Order Optimization 
1 SPARQL Query
Compute Micronutrients using Drugbank and KEGG
SELECT ?drug ?title WHERE {
?drug drugbank:drugCategory drugbank-cat:micronutrient .
?drug drugbank:casRegistryNumber ?id .
?keggDrug rdf:type kegg:Drug .
?keggDrug bio2rdf:xRef ?id .
?keggDrug dc:title ?title .
}
2 Unoptimized Internal Representation
3 Optimized Internal Representation
[Figure: internal query representations before and after optimization; annotations: 4x Local Join = 4x NLJ; Exclusive Group → Remote Join]
Selectivity Based Join Order Optimization 
[] a sd:Service ;
  sd:endpointUrl <http://localhost:8890/sparql> ;
  sd:capability [
    sd:predicate diseasome:name ;
    sd:totalTriples 147 ;      # total number of triples with this predicate
    sd:avgSbjSel "0.0068" ;    # 1 / number of distinct subjects with this predicate
    sd:avgObjSel "0.0069" ;    # 1 / number of distinct objects with this predicate
  ] ;
  sd:capability [
    sd:predicate diseasome:chromosomalLocation ;
    sd:totalTriples 160 ;
    sd:avgSbjSel "0.0062" ;
    sd:avgObjSel "0.0072" ;
  ] .
Example: for the four triples
S1 P O1 .
S1 P O2 .
S2 P O1 .
S3 P O2 .
totalTriples = 4
avgSbjSel(p) = 1/3 (three distinct subjects)
avgObjSel(p) = 1/2 (two distinct objects)
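The three statistics can be recomputed from the four example triples with a few lines of Python. This is only a sketch of the definitions given above; `predicate_stats` is a hypothetical helper name:

```python
from fractions import Fraction

def predicate_stats(triples, predicate):
    """Per-predicate index statistics: triple count and average selectivities."""
    matching = [(s, o) for s, p, o in triples if p == predicate]
    if not matching:
        raise ValueError(f"no triples with predicate {predicate!r}")
    subjects = {s for s, _ in matching}
    objects = {o for _, o in matching}
    return {
        "totalTriples": len(matching),
        "avgSbjSel": Fraction(1, len(subjects)),  # 1 / number of distinct subjects
        "avgObjSel": Fraction(1, len(objects)),   # 1 / number of distinct objects
    }

triples = [
    ("S1", "P", "O1"),
    ("S1", "P", "O2"),
    ("S2", "P", "O1"),
    ("S3", "P", "O2"),
]
stats = predicate_stats(triples, "P")
```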
Selectivity Based Join Order Optimization 
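• Triple pattern cardinality, with \(p = \mathrm{pred}(tp)\) and \(T\) = total number of triples having predicate \(p\):
\[
C(tp) =
\begin{cases}
T & \text{if neither subject nor object is bound} \\
T \times \mathit{avgSbjSel}(p) & \text{if subject is bound} \\
T \times \mathit{avgObjSel}(p) & \text{if object is bound}
\end{cases}
\]
• Join cardinality:
\[
C(J(tp_1, tp_2)) =
\begin{cases}
C(tp_1) \times C(tp_2) \times \mathit{avgPredJoinSel}(tp_1) \times \mathit{avgPredJoinSel}(tp_2) & \text{if p-p join} \\
C(tp_1) \times C(tp_2) \times \mathit{avgSbjJoinSel}(tp_1) \times \mathit{avgSbjJoinSel}(tp_2) & \text{if s-s join} \\
C(tp_1) \times C(tp_2) \times \mathit{avgSbjJoinSel}(tp_1) \times \mathit{avgObjJoinSel}(tp_2) & \text{if s-o join}
\end{cases}
\]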
How to calculate avgPredJoinSel, avgSbjJoinSel, and avgObjJoinSel? 
DARQ selected 0.5 as the avgJoinSel value for all joins
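The two estimates can be written down as small functions. This sketch makes two assumptions that the formulas above do not cover: when both subject and object are bound it multiplies both selectivities, and the join example plugs 0.5 into both join-selectivity factors purely for illustration. The function names are hypothetical:

```python
def triple_pattern_cardinality(total, avg_sbj_sel, avg_obj_sel,
                               subject_bound=False, object_bound=False):
    """Estimate C(tp) from the index statistics of the pattern's predicate."""
    cardinality = float(total)
    if subject_bound:
        cardinality *= avg_sbj_sel
    if object_bound:
        # Assumption: if both positions are bound, both selectivities apply.
        cardinality *= avg_obj_sel
    return cardinality

def join_cardinality(card1, card2, join_sel1, join_sel2):
    """Estimate C(J(tp1, tp2)) given the two join selectivities for the join type."""
    return card1 * card2 * join_sel1 * join_sel2

# diseasome:name with a bound subject: 147 triples, avgSbjSel = 0.0068.
name_card = triple_pattern_cardinality(147, 0.0068, 0.0069, subject_bound=True)

# Join of the two unbound diseasome patterns, with 0.5 for both selectivity factors.
joined = join_cardinality(147, 160, 0.5, 0.5)
```

With a bound subject the estimate for diseasome:name drops from 147 to roughly one triple, which is what makes such a pattern attractive as the first join input.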
Join Implementations
Join Implementations 
• Bound Joins 
– Start with a single triple pattern (lowest cardinality) 
– Substitute mappings from previous triple pattern in the 
subsequent evaluation 
– Bound Joins in NLJ fashion 
• Execute bound joins in nested loop join fashion 
• Too many remote requests 
– Bound Joins in Block NLJ fashion 
• Execute bound joins in block nested loop join fashion 
• Make use of SPARQL UNION construct 
• Remote requests are reduced by the block size 
• Other Join techniques 
– E.g., hash joins
Bound Joins in Block NLJ 
SELECT ?President ?Party ?TopicPage WHERE { 
?President rdf:type dbpedia:PresidentsOfTheUnitedStates . 
?President dbpedia:party ?Party . 
?nytPresident owl:sameAs ?President . 
?nytPresident nytimes:topicPage ?TopicPage . 
} 
Assume that the following intermediate results have been computed as input for the last triple pattern:
Block Input
"Barack Obama"
"George W. Bush"
…
Before (NLJ)
SELECT ?TopicPage WHERE { "Barack Obama" nytimes:topicPage ?TopicPage }
SELECT ?TopicPage WHERE { "George W. Bush" nytimes:topicPage ?TopicPage }
…
Now: Evaluation in a single remote request using a SPARQL UNION construct + local post-processing (SPARQL 1.0)
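A sketch of how one block of input mappings could be folded into a single UNION query. Giving each branch its own numbered variable, so that local post-processing can map results back to the input mapping that produced them, is an assumption made here for illustration; the helper names `chunk` and `bound_join_query` are hypothetical:

```python
def chunk(items, size):
    """Split the intermediate mappings into blocks of at most `size` entries."""
    return [items[i:i + size] for i in range(0, len(items), size)]

def bound_join_query(subjects, predicate, variable):
    """Build one UNION query that evaluates the pattern for a whole block of subjects."""
    branches = []
    for i, subject in enumerate(subjects):
        # One branch per input mapping; the numbered variable identifies the branch.
        branches.append("{ %s %s ?%s_%d }" % (subject, predicate, variable, i))
    projection = " ".join("?%s_%d" % (variable, i) for i in range(len(subjects)))
    return "SELECT %s WHERE { %s }" % (projection, " UNION ".join(branches))

block = ['"Barack Obama"', '"George W. Bush"']
query = bound_join_query(block, "nytimes:topicPage", "TopicPage")
print(query)
```

With a block size of b, n intermediate mappings need about n / b remote requests instead of n.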
Parallelization and Pipelining 
• Execute sub-queries concurrently on different data 
sources 
• Multithreaded worker pool to execute the joins 
and UNION operators in parallel 
• Pipelining approach for intermediate results 
• See FedX and LHD implementations
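Concurrent sub-query execution can be sketched with a standard worker pool. The `run_subquery` function below is a placeholder that only pretends to contact a source; a real engine would send the query over HTTP and stream results into the join operators. All names are hypothetical:

```python
from concurrent.futures import ThreadPoolExecutor

def run_subquery(source, query):
    """Placeholder for a remote call; returns a fake one-row result."""
    return [f"{source} answered: {query}"]

def execute_concurrently(subqueries, max_workers=4):
    """Submit one sub-query per data source and collect the results per source."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {
            source: pool.submit(run_subquery, source, query)
            for source, query in subqueries.items()
        }
        return {source: future.result() for source, future in futures.items()}

results = execute_concurrently({
    "DBpedia": "SELECT * WHERE { ?President dbpedia:party ?Party }",
    "NYTimes": "SELECT * WHERE { ?nytPresident nytimes:topicPage ?TopicPage }",
})
```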
Performance Metrics and Discussion
Performance Metrics 
• Efficient source selection in terms of 
– Total triple pattern-wise sources selected 
– Total number of SPARQL ASK requests used during source 
selection 
– Source selection time 
• Query execution time 
• Results completeness and correctness 
• Number of remote requests during query execution 
• Index compression ratio (1 - index size / data dump size)
• See https://code.google.com/p/bigrdfbench/
Evaluation Setup 
• Local dedicated network 
• Local SPARQL endpoints (One per machine) 
• Run each query 10 times and present the average results 
• Statistically analyze the results, e.g., with the Wilcoxon signed-rank test or Student's t-test
SPARQL Query Federation Engines 
• FedX 
• SPLENDID 
• HiBISCuS+FedX 
• HiBISCuS+SPLENDID 
• ANAPSID 
• LHD 
• DARQ 
AKSW SPARQL Federation Publications 
• HiBISCuS: Hypergraph-Based Source Selection for SPARQL Endpoint Federation, by Muhammad Saleem and Axel-Cyrille Ngonga Ngomo (ESWC, 2014)
• DAW: Duplicate-AWare Federated Query Processing over the Web of Data, by Muhammad Saleem, Axel-Cyrille Ngonga Ngomo, Josiane Xavier Parreira, Helena Deus, and Manfred Hauswirth (ISWC, 2013)
• TopFed: TCGA Tailored Federated Query Processing and Linking to LOD, by Muhammad Saleem, Shanmukha Sampath, Axel-Cyrille Ngonga Ngomo, Aftab Iqbal, Jonas Almeida, and Helena F. Deus (Journal of Biomedical Semantics, 2014)
• A Fine-Grained Evaluation of SPARQL Endpoint Federation Systems, by Muhammad Saleem, Yasar Khan, Ali Hasnain, Ivan Ermilov, and Axel-Cyrille Ngonga Ngomo (Semantic Web Journal, 2014)
• BigRDFBench: A Billion Triples Benchmark for SPARQL Query Federation, by Muhammad Saleem, Ali Hasnain, and Axel-Cyrille Ngonga Ngomo (submitted to WWW, 2015)
• SAFE: Policy-Aware SPARQL Query Federation Over RDF Data Cubes, by Yasar Khan, Muhammad Saleem, Aftab Iqbal, Muntazir Mehdi, Aidan Hogan, Panagiotis Hasapis, Axel-Cyrille Ngonga Ngomo, Stefan Decker, and Ratnesh Sahay (SWAT4LS, 2014)
• QFed: Query Set For Federated SPARQL Query Benchmark, by Nur Aini Rakhmawati, Sarasi Lalithsena, Muhammad Saleem, and Stefan Decker (iiWAS, 2014)
Thanks 
{saleem,ngonga}@informatik.uni-leipzig.de 
AKSW, University of Leipzig, Germany

PicPay - GenAI Finance Assistant - ChatGPT for Customer ServicePicPay - GenAI Finance Assistant - ChatGPT for Customer Service
PicPay - GenAI Finance Assistant - ChatGPT for Customer Service
 
Nanopower In Semiconductor Industry.pdf
Nanopower  In Semiconductor Industry.pdfNanopower  In Semiconductor Industry.pdf
Nanopower In Semiconductor Industry.pdf
 
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
 
Computer 10: Lesson 10 - Online Crimes and Hazards
Computer 10: Lesson 10 - Online Crimes and HazardsComputer 10: Lesson 10 - Online Crimes and Hazards
Computer 10: Lesson 10 - Online Crimes and Hazards
 

Federated SPARQL query processing over the Web of Data

  • 1. Federated SPARQL Query Processing Over the Web of Data Muhammad Saleem, Axel-Cyrille Ngonga Ngomo Agile Knowledge Engineering and Semantic Web (AKSW), University of Leipzig, Germany, 25/11/2014
  • 2. Agenda • SPARQL Query Federation Approaches • SPARQL Query Federation Optimization – Query Rewriting – Source Selection – Data Integration Options – Join Order Selection – Join Order Optimization – Join Implementations • Performance Metrics and Discussion
  • 3. SPARQL Query Federation Approaches • SPARQL Endpoint Federation (SEF) • Linked Data Federation (LDF) • Distributed Hash Tables (DHTs) • Hybrid of SEF+LDF
  • 4. SPARQL Endpoint Federation Approaches • Most commonly used approaches • Make use of SPARQL endpoint URLs • Fast query execution • RDF data needs to be exposed via SPARQL endpoints • E.g., HiBISCuS, FedX, SPLENDID, ANAPSID, LHD etc.
  • 5. Linked Data Federation Approaches • Data need not be exposed via SPARQL endpoints • Uses URI lookups at runtime • Data should follow Linked Data principles • Slower than the previous approaches • E.g., LDQPS, SIHJoin, WoDQA etc.
  • 6. Query federation on top of Distributed Hash Tables • Uses DHT indexing to federate SPARQL queries • Space efficient • Cannot deal with whole LOD • E.g., ATLAS
  • 7. Hybrid of SEF+LDF • Federation over SPARQL endpoints and Linked Data • Can potentially deal with whole LOD • E.g., ADERIS-Hybrid
  • 8. SPARQL Endpoint Federation • Components: Parsing/Rewriting, Source Selection, Federator Optimizer, Integrator • Data sources: S1, S2, S3, S4 (RDF) • Steps: rewrite the query and get individual triple patterns; identify capable sources for individual triple patterns; generate an optimized sub-query execution plan; execute sub-queries; integrate sub-query results
  • 10. SPARQL Query Rewriting FedBench (LD3): Return for all US presidents their party membership and news pages about them. Original query: SELECT ?president ?party ?page WHERE { ?president rdf:type dbpedia:President . ?president dbpedia:nationality ?nationality . ?president dbpedia:party ?party . ?x nyt:topicPage ?page . ?x owl:sameAs ?president . FILTER (?nationality = dbpedia:United_States) } Rewritten query: SELECT ?president ?party ?page WHERE { ?president rdf:type dbpedia:President . ?president dbpedia:nationality dbpedia:United_States . ?president dbpedia:party ?party . ?x nyt:topicPage ?page . ?x owl:sameAs ?president . } Try to simplify/avoid SPARQL FILTER and REGEX expressions
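The rewriting on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: triple patterns are modelled as plain tuples of strings, the helper name `inline_equality_filter` is hypothetical, and the substitution is only safe when the filtered variable is bound to an IRI and is not needed in the projection.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object); variables start with "?"

def inline_equality_filter(patterns: List[Triple], variable: str, constant: str) -> List[Triple]:
    """Replace every occurrence of `variable` with `constant`,
    which makes a FILTER (?variable = constant) unnecessary."""
    return [tuple(constant if term == variable else term for term in tp)
            for tp in patterns]

# Triple patterns of the original LD3 query (with the FILTER variable).
original = [
    ("?president", "rdf:type", "dbpedia:President"),
    ("?president", "dbpedia:nationality", "?nationality"),
    ("?president", "dbpedia:party", "?party"),
    ("?x", "nyt:topicPage", "?page"),
    ("?x", "owl:sameAs", "?president"),
]

rewritten = inline_equality_filter(original, "?nationality", "dbpedia:United_States")
```

After the rewrite the second triple pattern has a bound object, which lets source selection discriminate between endpoints more precisely.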
  • 12. Source Selection FedBench (LD3): Return for all US presidents their party membership and news pages about them. SELECT ?president ?party ?page WHERE { ?president rdf:type dbpedia:President . ?president dbpedia:nationality dbpedia:United_States . ?president dbpedia:party ?party . ?x nyt:topicPage ?page . ?x owl:sameAs ?president . } dbpedia RDF //TP3 //TP4 //TP5 Source Selection Algorithm Triple pattern-wise source selection TP1 = S1 KEGG RDF ChEBI RDF NYT RDF //TP1 SWDF RDF //TP2 LMDB RDF Jamendo RDF Geo Names RDF DrugBank RDF S1 S2 S3 S4 S5 S6 S7 S8 S9
  • 13. FedBench (LD3): Return for all US presidents their party membership and news pages about them. SELECT ?president ?party ?page WHERE { ?president rdf:type dbpedia:President . ?president dbpedia:nationality dbpedia:United_States . ?president dbpedia:party ?party . ?x nyt:topicPage ?page . ?x owl:sameAs ?president . } dbpedia RDF //TP3 //TP4 //TP5 Source Selection Algorithm Source Selection Triple pattern-wise source selection TP1 = S1 KEGG RDF ChEBI RDF NYT RDF //TP1 SWDF RDF //TP2 LMDB RDF Jamendo RDF TP2 = S1 Geo Names RDF DrugBank RDF S1 S2 S3 S4 S5 S6 S7 S8 S9
  • 14. FedBench (LD3): Return for all US presidents their party membership and news pages about them. SELECT ?president ?party ?page WHERE { ?president rdf:type dbpedia:President . ?president dbpedia:nationality dbpedia:United_States . ?president dbpedia:party ?party . ?x nyt:topicPage ?page . ?x owl:sameAs ?president . } dbpedia RDF //TP3 //TP4 //TP5 Source Selection Algorithm Source Selection Triple pattern-wise source selection TP1 = S1 KEGG RDF ChEBI RDF NYT RDF //TP1 SWDF RDF //TP2 LMDB RDF TP3 = S1 Jamendo RDF TP2 = S1 Geo Names RDF DrugBank RDF S1 S2 S3 S4 S5 S6 S7 S8 S9
  • 15. FedBench (LD3): Return for all US presidents their party membership and news pages about them. SELECT ?president ?party ?page WHERE { ?president rdf:type dbpedia:President . ?president dbpedia:nationality dbpedia:United_States . ?president dbpedia:party ?party . ?x nyt:topicPage ?page . ?x owl:sameAs ?president . } dbpedia RDF //TP3 //TP4 //TP5 Source Selection Algorithm Source Selection Triple pattern-wise source selection TP1 = S1 KEGG RDF ChEBI RDF NYT RDF //TP1 SWDF RDF //TP2 LMDB RDF TP3 = S1 TP4 = S4 Jamendo RDF TP2 = S1 Geo Names RDF DrugBank RDF S1 S2 S3 S4 S5 S6 S7 S8 S9
  • 16. Source Selection FedBench (LD3): Return for all US presidents their party membership and news pages about them. SELECT ?president ?party ?page WHERE { ?president rdf:type dbpedia:President . ?president dbpedia:nationality dbpedia:United_States . ?president dbpedia:party ?party . ?x nyt:topicPage ?page . ?x owl:sameAs ?president . } The triple patterns are labelled TP1 to TP5. Data sources shown (labelled S1 to S9 on the slide): dbpedia, KEGG, ChEBI, NYT, SWDF, LMDB, Jamendo, Geo Names, DrugBank. Triple pattern-wise source selection: TP1 = S1; TP2 = S1; TP3 = S1; TP4 = S4; TP5 = S1, S2, S4-S9. Total triple pattern-wise sources selected = 1+1+1+1+8 = 12
  • 17. Types of Source Selection • Index-free – Uses SPARQL ASK queries – No index maintenance required – Potentially ensures result set completeness – SPARQL ASK queries can be expensive – Can use a cache to store recent SPARQL ASK query results – E.g., FedX • Index-only – Only makes use of index/data summaries – Less efficient but fast source selection – Result set completeness is not ensured – E.g., DARQ, LHD • Hybrid – Makes use of index + SPARQL ASK – Most efficient – Result set completeness is not ensured – Can use a cache to store recent SPARQL ASK query results – E.g., HiBISCuS, ANAPSID, SPLENDID
  • 18. Index-free Source Selection
      Input: SPARQL query Q, set of all data sources D
      Output: Triple pattern to relevant data sources map M
      for each triple pattern ti in SPARQL query Q
        Ri = {}; // set of relevant data sources for triple pattern ti
        for each data source di in D
          if SPARQL ASK(di, ti) = true
            Ri = Ri U {di};
          end if
        end for
        M = M U {Ri};
      end for
      return M
      What is the total number of SPARQL ASK requests used? total number of triple patterns * total number of data sources
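A minimal Python sketch of the index-free algorithm follows. It makes no network calls: the `ask` function is a hypothetical stand-in for a remote SPARQL ASK request, and its answers (`ASK_TRUE`) are toy values chosen to mirror the LD3 walk-through in this deck, not real endpoint responses.

```python
from typing import Callable, Dict, List, Set, Tuple

SOURCES: List[str] = [f"S{i}" for i in range(1, 10)]  # S1 .. S9
PATTERN_IDS: List[str] = ["TP1", "TP2", "TP3", "TP4", "TP5"]

# Assumed ASK outcomes (hypothetical, mirroring the slides' example).
ASK_TRUE: Dict[str, Set[str]] = {
    "TP1": {"S1"},
    "TP2": {"S1"},
    "TP3": {"S1"},
    "TP4": {"S4"},
    "TP5": set(SOURCES) - {"S3"},
}

def ask(source: str, pattern_id: str) -> bool:
    """Stand-in for sending 'ASK { <triple pattern> }' to a source."""
    return source in ASK_TRUE[pattern_id]

def index_free_source_selection(
    pattern_ids: List[str],
    sources: List[str],
    ask_fn: Callable[[str, str], bool],
) -> Tuple[Dict[str, List[str]], int]:
    """Probe every source for every triple pattern; also count ASK requests."""
    selection: Dict[str, List[str]] = {}
    ask_requests = 0
    for tp in pattern_ids:
        relevant: List[str] = []
        for src in sources:
            ask_requests += 1
            if ask_fn(src, tp):
                relevant.append(src)
        selection[tp] = relevant
    return selection, ask_requests

selection, ask_requests = index_free_source_selection(PATTERN_IDS, SOURCES, ask)
total_selected = sum(len(srcs) for srcs in selection.values())
```

With five triple patterns and nine sources the loop issues 5 * 9 = 45 ASK requests, which is why caching recent ASK results matters in practice.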
  • 19. Index-free Source Selection FedBench (LD3): Return for all US presidents their party membership and news pages about them. SELECT ?president ?party ?page WHERE { ?president rdf:type dbpedia:President . ?president dbpedia:nationality dbpedia:United_States . ?president dbpedia:party ?party . ?x nyt:topicPage ?page . ?x owl:sameAs ?president . } dbpedia RDF //TP3 //TP4 //TP5 Source Selection Algorithm Triple pattern-wise source selection TP1 = S1 KEGG RDF ChEBI RDF NYT RDF //TP1 SWDF RDF //TP2 LMDB RDF Jamendo RDF Geo Names RDF DrugBank RDF S1 S2 S3 S4 S5 S6 S7 S8 S9
  • 20. FedBench (LD3): Return for all US presidents their party membership and news pages about them. SELECT ?president ?party ?page WHERE { ?president rdf:type dbpedia:President . ?president dbpedia:nationality dbpedia:United_States . ?president dbpedia:party ?party . ?x nyt:topicPage ?page . ?x owl:sameAs ?president . } dbpedia RDF //TP3 //TP4 //TP5 Source Selection Algorithm Index-free Source Selection Triple pattern-wise source selection TP1 = S1 KEGG RDF ChEBI RDF NYT RDF //TP1 SWDF RDF //TP2 LMDB RDF Jamendo RDF TP2 = S1 Geo Names RDF DrugBank RDF S1 S2 S3 S4 S5 S6 S7 S8 S9
  • 21. FedBench (LD3): Return for all US presidents their party membership and news pages about them. SELECT ?president ?party ?page WHERE { ?president rdf:type dbpedia:President . ?president dbpedia:nationality dbpedia:United_States . ?president dbpedia:party ?party . ?x nyt:topicPage ?page . ?x owl:sameAs ?president . } dbpedia RDF //TP3 //TP4 //TP5 Source Selection Algorithm Index-free Source Selection Triple pattern-wise source selection TP1 = S1 KEGG RDF ChEBI RDF NYT RDF //TP1 SWDF RDF //TP2 LMDB RDF TP3 = S1 Jamendo RDF TP2 = S1 Geo Names RDF DrugBank RDF S1 S2 S3 S4 S5 S6 S7 S8 S9
  • 22. FedBench (LD3): Return for all US presidents their party membership and news pages about them. SELECT ?president ?party ?page WHERE { ?president rdf:type dbpedia:President . ?president dbpedia:nationality dbpedia:United_States . ?president dbpedia:party ?party . ?x nyt:topicPage ?page . ?x owl:sameAs ?president . } dbpedia RDF //TP3 //TP4 //TP5 Source Selection Algorithm Index-free Source Selection Triple pattern-wise source selection TP1 = S1 KEGG RDF ChEBI RDF NYT RDF //TP1 SWDF RDF //TP2 LMDB RDF TP3 = S1 TP4 = S4 Jamendo RDF TP2 = S1 Geo Names RDF DrugBank RDF S1 S2 S3 S4 S5 S6 S7 S8 S9
  • 23. Index-free Source Selection FedBench (LD3): Return for all US presidents their party membership and news pages about them. SELECT ?president ?party ?page WHERE { ?president rdf:type dbpedia:President . ?president dbpedia:nationality dbpedia:United_States . ?president dbpedia:party ?party . ?x nyt:topicPage ?page . ?x owl:sameAs ?president . } The triple patterns are labelled TP1 to TP5. Data sources shown (labelled S1 to S9 on the slide): dbpedia, KEGG, ChEBI, NYT, SWDF, LMDB, Jamendo, Geo Names, DrugBank. Triple pattern-wise source selection: TP1 = S1; TP2 = S1; TP3 = S1; TP4 = S4; TP5 = S1, S2, S4-S9. Total number of SPARQL ASK requests used = 45. Total triple pattern-wise sources selected = 12
  • 24. Index-only Source Selection (LHD)
      Input: SPARQL query Q, set of all data sources D, data sources index I storing all distinct predicates for all data sources in D
      Output: Triple pattern to relevant data sources map M
      for each triple pattern ti in SPARQL query Q
        Ri = {}; // set of relevant data sources for triple pattern ti
        p = Pred(ti) // predicate of ti
        if (bound (p))
          Ri = Lookup (I, p) // index lookup for predicate of ti
        else
          Ri = D; // all data sources are relevant
        end if
        M = M U {Ri};
      end for
      return M
      Why is it the less efficient approach (i.e., why does it greatly overestimate relevant data sources)? • Source selection is based only on the predicate of triple patterns • It simply selects all data sources for triple patterns having unbound predicates
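The index-only lookup can be sketched as below. The predicate index is a hypothetical in-memory dictionary whose contents are chosen to reproduce the selections of the LD3 walk-through in this deck; a real engine such as LHD or DARQ would load its own data summaries instead.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object); variables start with "?"

SOURCES: List[str] = [f"S{i}" for i in range(1, 10)]  # S1 .. S9

# Assumed index: predicate -> sources that contain it (toy values).
PREDICATE_INDEX: Dict[str, Set[str]] = {
    "rdf:type": set(SOURCES),
    "dbpedia:nationality": {"S1"},
    "dbpedia:party": {"S1"},
    "nyt:topicPage": {"S4"},
    "owl:sameAs": set(SOURCES) - {"S3"},
}

PATTERNS: List[Triple] = [
    ("?president", "rdf:type", "dbpedia:President"),
    ("?president", "dbpedia:nationality", "dbpedia:United_States"),
    ("?president", "dbpedia:party", "?party"),
    ("?x", "nyt:topicPage", "?page"),
    ("?x", "owl:sameAs", "?president"),
]

def is_bound(term: str) -> bool:
    return not term.startswith("?")

def index_only_source_selection(
    patterns: List[Triple], sources: List[str], index: Dict[str, Set[str]]
) -> Dict[Triple, List[str]]:
    """Select sources by predicate lookup only; no ASK requests are sent."""
    selection: Dict[Triple, List[str]] = {}
    for tp in patterns:
        predicate = tp[1]
        if is_bound(predicate):
            selection[tp] = sorted(index.get(predicate, set()))
        else:
            selection[tp] = list(sources)  # unbound predicate: every source is relevant
    return selection

selection = index_only_source_selection(PATTERNS, SOURCES, PREDICATE_INDEX)
total_selected = sum(len(srcs) for srcs in selection.values())
```

Because `rdf:type` occurs in every source, the first triple pattern is assigned all nine sources even though its object is bound, which illustrates the overestimation discussed above.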
  • 25. Index-only Source Selection FedBench (LD3): Return for all US presidents their party membership and news pages about them. SELECT ?president ?party ?page WHERE { ?president rdf:type dbpedia:President . ?president dbpedia:nationality dbpedia:United_States . ?president dbpedia:party ?party . ?x nyt:topicPage ?page . ?x owl:sameAs ?president . } dbpedia RDF //TP3 //TP4 //TP5 Source Selection Algorithm Triple pattern-wise source selection TP1 = S1-S9 KEGG RDF ChEBI RDF NYT RDF //TP1 SWDF RDF //TP2 LMDB RDF Jamendo RDF Geo Names RDF DrugBank RDF S1 S2 S3 S4 S5 S6 S7 S8 S9
  • 26. FedBench (LD3): Return for all US presidents their party membership and news pages about them. SELECT ?president ?party ?page WHERE { ?president rdf:type dbpedia:President . ?president dbpedia:nationality dbpedia:United_States . ?president dbpedia:party ?party . ?x nyt:topicPage ?page . ?x owl:sameAs ?president . } dbpedia RDF //TP3 //TP4 //TP5 Source Selection Algorithm Index-only Source Selection Triple pattern-wise source selection TP1 = KEGG RDF ChEBI RDF NYT RDF //TP1 SWDF RDF //TP2 LMDB RDF S1-S9 TP2 = S1 Jamendo RDF Geo Names RDF DrugBank RDF S1 S2 S3 S4 S5 S6 S7 S8 S9
  • 27. FedBench (LD3): Return for all US presidents their party membership and news pages about them. SELECT ?president ?party ?page WHERE { ?president rdf:type dbpedia:President . ?president dbpedia:nationality dbpedia:United_States . ?president dbpedia:party ?party . ?x nyt:topicPage ?page . ?x owl:sameAs ?president . } dbpedia RDF //TP3 //TP4 //TP5 Source Selection Algorithm Index-only Source Selection Triple pattern-wise source selection TP1 = KEGG RDF ChEBI RDF NYT RDF //TP1 SWDF RDF //TP2 LMDB RDF S1-S9 TP3 = S1 Jamendo RDF TP2 = S1 Geo Names RDF DrugBank RDF S1 S2 S3 S4 S5 S6 S7 S8 S9
  • 28. FedBench (LD3): Return for all US presidents their party membership and news pages about them. SELECT ?president ?party ?page WHERE { ?president rdf:type dbpedia:President . ?president dbpedia:nationality dbpedia:United_States . ?president dbpedia:party ?party . ?x nyt:topicPage ?page . ?x owl:sameAs ?president . } dbpedia RDF //TP3 //TP4 //TP5 Source Selection Algorithm Index-only Source Selection Triple pattern-wise source selection TP1 = KEGG RDF ChEBI RDF NYT RDF //TP1 SWDF RDF //TP2 LMDB RDF S1-S9 TP3 = S1 TP4 = S4 Jamendo RDF TP2 = S1 Geo Names RDF DrugBank RDF S1 S2 S3 S4 S5 S6 S7 S8 S9
  • 29. Index-only Source Selection FedBench (LD3): Return for all US presidents their party membership and news pages about them. SELECT ?president ?party ?page WHERE { ?president rdf:type dbpedia:President . ?president dbpedia:nationality dbpedia:United_States . ?president dbpedia:party ?party . ?x nyt:topicPage ?page . ?x owl:sameAs ?president . } The triple patterns are labelled TP1 to TP5. Data sources shown (labelled S1 to S9 on the slide): dbpedia, KEGG, ChEBI, NYT, SWDF, LMDB, Jamendo, Geo Names, DrugBank. Triple pattern-wise source selection: TP1 = S1-S9; TP2 = S1; TP3 = S1; TP4 = S4; TP5 = S1, S2, S4-S9. Total number of SPARQL ASK requests used = 0. Total triple pattern-wise sources selected = 20
  • 30. Hybrid Source Selection
      Input: SPARQL query Q, set of all data sources D, data sources index I storing all distinct predicates for all data sources in D
      Output: Triple pattern to relevant data sources map M
      for each triple pattern ti in SPARQL query Q
        Ri = {}; // set of relevant data sources for triple pattern ti
        s = Subj(ti), p = Pred(ti), o = Obj(ti); // subject, predicate, and object of ti
        if (!bound (p) || bound (s) || bound (o))
          for each data source di in D
            if SPARQL ASK(di, ti) = true
              Ri = Ri U {di};
            end if
          end for
        else
          Ri = Lookup (I, p) // index lookup for predicate of ti
        end if
        M = M U {Ri}
      end for
      return M
      What is the total number of SPARQL ASK requests used? total number of triple patterns with bound subject or bound object or unbound predicate * total number of data sources
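The hybrid strategy combines the two previous ideas, and a compact sketch is given here. As before, both the predicate index and the ASK answers are hypothetical toy values chosen to match the LD3 walk-through in this deck; `ask` stands in for a remote SPARQL ASK request.

```python
from typing import Callable, Dict, List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object); variables start with "?"

SOURCES: List[str] = [f"S{i}" for i in range(1, 10)]  # S1 .. S9

PATTERNS: List[Triple] = [
    ("?president", "rdf:type", "dbpedia:President"),
    ("?president", "dbpedia:nationality", "dbpedia:United_States"),
    ("?president", "dbpedia:party", "?party"),
    ("?x", "nyt:topicPage", "?page"),
    ("?x", "owl:sameAs", "?president"),
]

# Assumed predicate index and ASK outcomes (toy values).
PREDICATE_INDEX: Dict[str, Set[str]] = {
    "rdf:type": set(SOURCES),
    "dbpedia:nationality": {"S1"},
    "dbpedia:party": {"S1"},
    "nyt:topicPage": {"S4"},
    "owl:sameAs": set(SOURCES) - {"S3"},
}
ASK_TRUE: Dict[Triple, Set[str]] = {PATTERNS[0]: {"S1"}, PATTERNS[1]: {"S1"}}

def ask(source: str, tp: Triple) -> bool:
    """Stand-in for a remote SPARQL ASK request."""
    return source in ASK_TRUE.get(tp, set())

def is_bound(term: str) -> bool:
    return not term.startswith("?")

def hybrid_source_selection(
    patterns: List[Triple],
    sources: List[str],
    index: Dict[str, Set[str]],
    ask_fn: Callable[[str, Triple], bool],
) -> Tuple[Dict[Triple, List[str]], int]:
    """Use ASK only where the predicate index alone is too coarse."""
    selection: Dict[Triple, List[str]] = {}
    ask_requests = 0
    for tp in patterns:
        s, p, o = tp
        if not is_bound(p) or is_bound(s) or is_bound(o):
            relevant = []
            for src in sources:
                ask_requests += 1
                if ask_fn(src, tp):
                    relevant.append(src)
            selection[tp] = relevant
        else:
            selection[tp] = sorted(index.get(p, set()))
    return selection, ask_requests

selection, ask_requests = hybrid_source_selection(PATTERNS, SOURCES, PREDICATE_INDEX, ask)
total_selected = sum(len(srcs) for srcs in selection.values())
```

Only the two patterns with a bound object trigger ASK probing, so the request count drops to 2 * 9 = 18 while the number of selected sources stays at 12.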
  • 31. Hybrid Source Selection FedBench (LD3): Return for all US presidents their party membership and news pages about them. SELECT ?president ?party ?page WHERE { ?president rdf:type dbpedia:President . ?president dbpedia:nationality dbpedia:United_States . ?president dbpedia:party ?party . ?x nyt:topicPage ?page . ?x owl:sameAs ?president . } dbpedia RDF //TP3 //TP4 //TP5 Source Selection Algorithm Triple pattern-wise source selection TP1 = S1 KEGG RDF ChEBI RDF NYT RDF //TP1 SWDF RDF //TP2 LMDB RDF Jamendo RDF Geo Names RDF DrugBank RDF S1 S2 S3 S4 S5 S6 S7 S8 S9
  • 32. FedBench (LD3): Return for all US presidents their party membership and news pages about them. SELECT ?president ?party ?page WHERE { ?president rdf:type dbpedia:President . ?president dbpedia:nationality dbpedia:United_States . ?president dbpedia:party ?party . ?x nyt:topicPage ?page . ?x owl:sameAs ?president . } dbpedia RDF //TP3 //TP4 //TP5 Source Selection Algorithm Hybrid Source Selection Triple pattern-wise source selection TP1 = S1 KEGG RDF ChEBI RDF NYT RDF //TP1 SWDF RDF //TP2 LMDB RDF Jamendo RDF TP2 = S1 Geo Names RDF DrugBank RDF S1 S2 S3 S4 S5 S6 S7 S8 S9
  • 33. FedBench (LD3): Return for all US presidents their party membership and news pages about them. SELECT ?president ?party ?page WHERE { ?president rdf:type dbpedia:President . ?president dbpedia:nationality dbpedia:United_States . ?president dbpedia:party ?party . ?x nyt:topicPage ?page . ?x owl:sameAs ?president . } dbpedia RDF //TP3 //TP4 //TP5 Source Selection Algorithm Hybrid Source Selection Triple pattern-wise source selection TP1 = S1 KEGG RDF ChEBI RDF NYT RDF //TP1 SWDF RDF //TP2 LMDB RDF TP3 = S1 Jamendo RDF TP2 = S1 Geo Names RDF DrugBank RDF S1 S2 S3 S4 S5 S6 S7 S8 S9
  • 34. FedBench (LD3): Return for all US presidents their party membership and news pages about them. SELECT ?president ?party ?page WHERE { ?president rdf:type dbpedia:President . ?president dbpedia:nationality dbpedia:United_States . ?president dbpedia:party ?party . ?x nyt:topicPage ?page . ?x owl:sameAs ?president . } dbpedia RDF //TP3 //TP4 //TP5 Source Selection Algorithm Hybrid Source Selection Triple pattern-wise source selection TP1 = S1 KEGG RDF ChEBI RDF NYT RDF //TP1 SWDF RDF //TP2 LMDB RDF TP3 = S1 TP4 = S4 Jamendo RDF TP2 = S1 Geo Names RDF DrugBank RDF S1 S2 S3 S4 S5 S6 S7 S8 S9
  • 35. Hybrid Source Selection FedBench (LD3): Return for all US presidents their party membership and news pages about them. SELECT ?president ?party ?page WHERE { ?president rdf:type dbpedia:President . ?president dbpedia:nationality dbpedia:United_States . ?president dbpedia:party ?party . ?x nyt:topicPage ?page . ?x owl:sameAs ?president . } The triple patterns are labelled TP1 to TP5. Data sources shown (labelled S1 to S9 on the slide): dbpedia, KEGG, ChEBI, NYT, SWDF, LMDB, Jamendo, Geo Names, DrugBank. Triple pattern-wise source selection: TP1 = S1; TP2 = S1; TP3 = S1; TP4 = S4; TP5 = S1, S2, S4-S9. Total number of SPARQL ASK requests used = 18. Total triple pattern-wise sources selected = 12. Does anything still need to be improved?
  • 36. Source Selection • Triple pattern-wise source selection – Ensures 100% recall – Can overestimate capable sources – Can be expensive, e.g., in the total number of SPARQL ASK requests used – Performed by FedX, SPLENDID, LHD, DARQ, ADERIS etc. • Join-aware triple pattern-wise source selection – Ensures 100% recall – May select optimal or close-to-optimal capable sources – Can be expensive, e.g., in the total number of SPARQL ASK requests used – Can significantly reduce the query execution time – Performed by ANAPSID, HiBISCuS
  • 37. HiBISCuS: Hypergraph-Based Source Selection for SPARQL Endpoint Federation • Hybrid source selection • Join-aware triple-pattern wise source selection • Makes use of the hypergraph representation of SPARQL queries • Makes use of the URI authorities • Makes use of the cache to store recent SPARQL ASK queries results
  • 38. Motivation FedBench (LD3): Return for all US presidents their party membership and news pages about them. SELECT ?president ?party ?page WHERE { ?president rdf:type dbpedia:President . ?president dbpedia:nationality dbpedia:United_States . ?president dbpedia:party ?party . ?x nyt:topicPage ?page . ?x owl:sameAs ?president . } dbpedia RDF //TP3 //TP4 //TP5 Source Selection Algorithm Triple pattern-wise source selection TP1 = S1 KEGG RDF ChEBI RDF NYT RDF //TP1 SWDF RDF //TP2 LMDB RDF Jamendo RDF Geo Names RDF DrugBank RDF S1 S2 S3 S4 S5 S6 S7 S8 S9
  • 39. Motivation FedBench (LD3): Return for all US presidents their party membership and news pages about them. SELECT ?president ?party ?page WHERE { ?president rdf:type dbpedia:President . ?president dbpedia:nationality dbpedia:United_States . ?president dbpedia:party ?party . ?x nyt:topicPage ?page . ?x owl:sameAs ?president . } dbpedia RDF //TP3 //TP4 //TP5 Source Selection Algorithm Triple pattern-wise source selection TP1 = S1 KEGG RDF ChEBI RDF NYT RDF //TP1 SWDF RDF //TP2 LMDB RDF Jamendo RDF TP2 = S1 Geo Names RDF DrugBank RDF S1 S2 S3 S4 S5 S6 S7 S8 S9
  • 40. Motivation FedBench (LD3): Return for all US presidents their party membership and news pages about them. SELECT ?president ?party ?page WHERE { ?president rdf:type dbpedia:President . ?president dbpedia:nationality dbpedia:United_States . ?president dbpedia:party ?party . ?x nyt:topicPage ?page . ?x owl:sameAs ?president . } dbpedia RDF //TP3 //TP4 //TP5 Source Selection Algorithm Triple pattern-wise source selection TP1 = S1 KEGG RDF ChEBI RDF NYT RDF //TP1 SWDF RDF //TP2 LMDB RDF TP3 = S1 Jamendo RDF TP2 = S1 Geo Names RDF DrugBank RDF S1 S2 S3 S4 S5 S6 S7 S8 S9
  • 41. Motivation FedBench (LD3): Return for all US presidents their party membership and news pages about them. SELECT ?president ?party ?page WHERE { ?president rdf:type dbpedia:President . ?president dbpedia:nationality dbpedia:United_States . ?president dbpedia:party ?party . ?x nyt:topicPage ?page . ?x owl:sameAs ?president . } dbpedia RDF //TP3 //TP4 //TP5 Source Selection Algorithm Triple pattern-wise source selection TP1 = S1 KEGG RDF ChEBI RDF NYT RDF //TP1 SWDF RDF //TP2 LMDB RDF TP3 = S1 TP4 = S4 Jamendo RDF TP2 = S1 Geo Names RDF DrugBank RDF S1 S2 S3 S4 S5 S6 S7 S8 S9
  • 42. Motivation FedBench (LD3): Return for all US presidents their party membership and news pages about them. SELECT ?president ?party ?page WHERE { ?president rdf:type dbpedia:President . ?president dbpedia:nationality dbpedia:United_States . ?president dbpedia:party ?party . ?x nyt:topicPage ?page . ?x owl:sameAs ?president . } dbpedia RDF //TP3 //TP4 //TP5 Source Selection Algorithm Triple pattern-wise source selection TP1 = S1 KEGG RDF ChEBI RDF NYT RDF //TP1 SWDF RDF //TP2 LMDB RDF TP3 = S1 TP4 = S4 TP5 = S1 S2 S4 S5 Jamendo RDF TP2 = S1 S6 S7 S8 S9 Geo Names RDF DrugBank RDF S1 S2 S3 S4 S5 S6 S7 S8 S9
  • 43. Motivation FedBench (LD3): Return for all US presidents their party membership and news pages about them. SELECT ?president ?party ?page WHERE { ?president rdf:type dbpedia:President . ?president dbpedia:nationality dbpedia:United_States . ?president dbpedia:party ?party . ?x nyt:topicPage ?page . ?x owl:sameAs ?president . } dbpedia RDF //TP3 //TP4 //TP5 Source Selection Algorithm Triple pattern-wise source selection TP1 = S1 KEGG RDF ChEBI RDF NYT RDF //TP1 SWDF RDF //TP2 LMDB RDF TP3 = S1 TP4 = S4 TP5 = S1 S2 S4 S5 Total triple pattern-wise selected sources = 12 Total SPARQL ASK queries : 9*5 = 45 Jamendo RDF TP2 = S1 S6 S7 S8 S9 Geo Names RDF DrugBank RDF S1 S2 S3 S4 S5 S6 S7 S8 S9
  • 44. Motivation FedBench (LD3): Return for all US presidents their party membership and news pages about them. SELECT ?president ?party ?page WHERE { ?president rdf:type dbpedia:President . ?president dbpedia:nationality dbpedia:United_States . ?president dbpedia:party ?party . ?x nyt:topicPage ?page . ?x owl:sameAs ?president . } dbpedia RDF //TP3 //TP4 //TP5 Source Selection Algorithm Triple pattern-wise source selection TP1 = S1 KEGG RDF ChEBI RDF NYT RDF //TP1 SWDF RDF //TP2 LMDB RDF TP3 = S1 TP4 = S4 TP5 = S1 S2 S4 S5 Total triple pattern-wise selected sources = 12 Total SPARQL ASK queries : 9*5 = 45 Jamendo RDF TP2 = S1 S6 S7 S8 S9 Geo Names RDF DrugBank RDF S1 S2 S3 S4 S5 S6 S7 S8 S9
  • 45. Motivation FedBench (LD3): Return for all US presidents their party membership and news pages about them. SELECT ?president ?party ?page WHERE { ?president rdf:type dbpedia:President . ?president dbpedia:nationality dbpedia:United_States . ?president dbpedia:party ?party . ?x nyt:topicPage ?page . ?x owl:sameAs ?president . } dbpedia RDF //TP3 //TP4 //TP5 Source Selection Algorithm Triple pattern-wise source selection TP1 = S1 TP3 = S1 TP2 = S1 TP4 = S4 TP5 = S1 S2 S4 S5 S6 S7 S8 S9 Optimal triple pattern-wise selected sources 5 KEGG RDF ChEBI RDF NYT RDF //TP1 SWDF RDF //TP2 LMDB RDF Jamendo RDF Geo Names RDF DrugBank RDF S1 S2 S3 S4 S5 S6 S7 S8 S9
  • 46. Problem Statement • An overestimation of triple pattern-wise source selection can be expensive – Resources are wasted – Query runtime is increased – Extra traffic is generated • How do we perform join-aware triple pattern-wise source selection in a time-efficient way?
  • 47. HiBISCuS: Key Concept • Makes use of the URIs' authorities • Example: http://dbpedia.org/ontology/party has scheme "http", authority "dbpedia.org", and path "/ontology/party" • For URI details: http://tools.ietf.org/html/rfc3986
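The scheme/authority/path split can be reproduced with Python's standard library. This is only a sketch of the decomposition itself; how HiBISCuS stores and compares authorities internally is not shown here, and the helper name `uri_authority` is hypothetical.

```python
from urllib.parse import urlsplit

def uri_authority(uri: str) -> str:
    """Return the authority component of a URI (RFC 3986)."""
    return urlsplit(uri).netloc

parts = urlsplit("http://dbpedia.org/ontology/party")
scheme, authority, path = parts.scheme, parts.netloc, parts.path

# Two resources share an authority when they are minted under the same host.
same_authority = (
    uri_authority("http://dbpedia.org/resource/Barack_Obama")
    == uri_authority("http://dbpedia.org/ontology/party")
)
```

Comparing authorities is much cheaper than probing endpoints, which is what makes them attractive as a pruning signal for joins between triple patterns.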
  • 48. HiBISCuS: SPARQL Query as Hypergraph SELECT ?president ?party ?page WHERE { ?president rdf:type dbpedia:President . ?president dbpedia:nationality dbpedia:United_States . ?president dbpedia:party ?party . ?x nyt:topicPage ?page . ?x owl:sameAs ?president . } ?president rdf:type dbpedia: President
  • 49. HiBISCuS: SPARQL Query as Hypergraph SELECT ?president ?party ?page WHERE { ?president rdf:type dbpedia:President . ?president dbpedia:nationality dbpedia:United_States . ?president dbpedia:party ?party . ?x nyt:topicPage ?page . ?x owl:sameAs ?president . } ?president rdf:type dbpedia: President dbpedia: United_S tates dbpedia: nationality
  • 50. HiBISCuS: SPARQL Query as Hypergraph SELECT ?president ?party ?page WHERE { ?president rdf:type dbpedia:President . ?president dbpedia:nationality dbpedia:United_States . ?president dbpedia:party ?party . ?x nyt:topicPage ?page . ?x owl:sameAs ?president . } ?president rdf:type dbpedia: President dbpedia: United_S tates dbpedia: party dbpedia: nationality ?party
  • 51. HiBISCuS: SPARQL Query as Hypergraph SELECT ?president ?party ?page WHERE { ?president rdf:type dbpedia:President . ?president dbpedia:nationality dbpedia:United_States . ?president dbpedia:party ?party . ?x nyt:topicPage ?page . ?x owl:sameAs ?president . } ?president rdf:type dbpedia: President dbpedia: United_S tates dbpedia: party dbpedia: nationality ?party ?x nyt:topi cPage ?page
  • 52. HiBISCuS: SPARQL Query as Hypergraph SELECT ?president ?party ?page WHERE { ?president rdf:type dbpedia:President . ?president dbpedia:nationality dbpedia:United_States . ?president dbpedia:party ?party . ?x nyt:topicPage ?page . ?x owl:sameAs ?president . } ?president rdf:type dbpedia: President dbpedia: United_S tates dbpedia: party dbpedia: nationality ?party ?x nyt:topi cPage ?page owl: SameAs
  • 53. HiBISCuS: SPARQL Query as Hypergraph SELECT ?president ?party ?page WHERE { ?president rdf:type dbpedia:President . ?president dbpedia:nationality dbpedia:United_States . ?president dbpedia:party ?party . ?x nyt:topicPage ?page . ?x owl:sameAs ?president . } ?president rdf:type dbpedia: President dbpedia: United_S tates dbpedia: nationality ?x owl: SameAs dbpedia: party ?party nyt:topi cPage ?page Star simple hybrid Tail of hyperedge
  • 54. HiBISCuS: Data Summaries
      [] a ds:Service ;
        ds:endpointUrl <http://dbpedia.org/sparql> ;
        ds:capability [
          ds:predicate dbpedia:party ;
          ds:sbjAuthority <http://dbpedia.org/> ;
          ds:objAuthority <http://dbpedia.org/> ;
        ] ;
        ds:capability [
          ds:predicate rdf:type ;
          ds:sbjAuthority <http://dbpedia.org/> ;
          ds:objAuthority owl:Thing, dbpedia:President ; # we store all distinct classes
        ] ;
        ds:capability [
          ds:predicate dbpedia:postalCode ;
          ds:sbjAuthority <http://dbpedia.org/> ;
          # No objAuthority, as the object value for dbpedia:postalCode is a string
        ] ;
  • 55. HiBISCuS: Triple Pattern-wise Source Selection SELECT ?president ?party ?page WHERE { ?president rdf:type dbpedia:President . ?president dbpedia:nationality dbpedia:United_States . ?president dbpedia:party ?party . ?x nyt:topicPage ?page . ?x owl:sameAs ?president . } ?president rdf:type dbpedia: President dbpedia: United_ States dbpedia: nationality ?x owl: SameAs dbpedia: party ?party nyt:topi cPage ?page dbpedia KEGG NYT SWDF LMDB Geo DrgBnk Jamendo
  • 56. HiBISCuS: Triple Pattern-wise Source Pruning
    SELECT ?president ?party ?page WHERE {
      ?president rdf:type dbpedia:President .
      ?president dbpedia:nationality dbpedia:United_States .
      ?president dbpedia:party ?party .
      ?x nyt:topicPage ?page .
      ?x owl:sameAs ?president .
    }
    [Figure: hypergraph of the query, labelled with the subject authorities (Sbj. auth.) and object authorities (Obj. auth.) of the candidate sources dbpedia, KEGG, NYT, SWDF, DrgBnk, LMDB, Geo, Jamendo]
  • 57. HiBISCuS: Triple Pattern-wise Source Pruning (same query and figure as slide 56)
  • 58. HiBISCuS: Triple Pattern-wise Source Pruning (same query; figure: only the object authority label listing dbpedia, KEGG, NYT, SWDF, DrgBnk, LMDB, Geo, Jamendo remains)
  • 59. HiBISCuS: Triple Pattern-wise Source Pruning (same query; figure: remaining label is NYT, Obj. auth.)
  • 60. HiBISCuS: Triple Pattern-wise Source Pruning (same query and figure as slide 59)
  • 61. HiBISCuS: Triple Pattern-wise Source Pruning (same query)
    Total triple pattern-wise selected sources = 5
    Total SPARQL ASK queries: 0
  • 63. Complete Local Integration
    • Triple patterns are individually and completely evaluated against every endpoint
    • Triple pattern results are locally integrated using different join techniques, e.g., NLJ, hash join
    • Less efficient if the query contains common predicates such as rdf:type and owl:sameAs
    • Retrieves a large amount of potentially irrelevant intermediate results
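The local integration step above can be sketched as a hash join over the variable bindings returned for two triple patterns. This is a minimal illustration, not code from any of the engines named in the slides; the function name `hash_join` and the example bindings are assumptions, and every row on one side is assumed to bind the same variables.

```python
from collections import defaultdict


def hash_join(left, right):
    """Join two lists of variable bindings (dicts) on their shared variables."""
    if not left or not right:
        return []
    # Assumption: all rows of one input bind the same set of variables.
    shared = sorted(set(left[0]) & set(right[0]))
    # Build phase: index the left input by the values of the shared variables.
    index = defaultdict(list)
    for row in left:
        index[tuple(row[v] for v in shared)].append(row)
    # Probe phase: look up each right row and merge compatible bindings.
    joined = []
    for row in right:
        for match in index.get(tuple(row[v] for v in shared), []):
            joined.append({**match, **row})
    return joined


# Hypothetical results fetched completely from two endpoints.
parties = [{"president": "dbpedia:Barack_Obama", "party": "dbpedia:Democratic_Party"}]
same_as = [
    {"x": "nyt:obama", "president": "dbpedia:Barack_Obama"},
    {"x": "nyt:unrelated", "president": "dbpedia:Someone_Else"},
]
result = hash_join(parties, same_as)
```

The second `same_as` row is fetched but never joins, which is the kind of irrelevant intermediate result the slide warns about.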
  • 64. Iterative Integration
    • Evaluate the query iteratively, pattern by pattern
    • Start with a single triple pattern
    • Substitute the mappings from the previous triple pattern in the subsequent evaluation
    • Evaluate the query in a NLJ fashion
    • NLJ can cause many remote requests
    • Evaluating in a block NLJ fashion reduces the number of remote requests
  • 66. Join Order Selection
    • Left-deep trees
      – Joins take place in a left-to-right sequential order
      – Result of the join is used as an outer input for the next join
      – Used in FedX, DARQ
    • Right-deep trees
      – Joins take place in a right-to-left sequential order
      – Result of the join is used as an inner input for the next join
    • Bushy trees
      – Joins take place in sub-trees on both the left and right sides
      – Used in ANAPSID
    • Dynamic programming
      – Used in SPLENDID
  • 67. Join Order Selection Example
    Compute Micronutrients using Drugbank and KEGG
    SELECT ?drug ?title WHERE {
      ?drug drugbank:drugCategory drugbank-cat:micronutrient .  # TP1
      ?drug drugbank:casRegistryNumber ?id .                    # TP2
      ?keggDrug rdf:type kegg:Drug .                            # TP3
      ?keggDrug bio2rdf:xRef ?id .                              # TP4
      ?keggDrug dc:title ?title .                               # TP5
    }
    [Figure: three join trees over TP1 to TP5, each topped by the projection π ?drug, ?title: a left-deep tree, a right-deep tree, and a bushy tree]
    Goal: Execute smallest cardinality joins first
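The goal stated on this slide can be illustrated with a greedy left-deep ordering: start with the pattern of smallest estimated cardinality, then keep appending the cheapest pattern that shares a variable with what has been joined so far. This is only a sketch of the idea; the cardinality figures below are invented for illustration and the function name `left_deep_order` is an assumption.

```python
def left_deep_order(patterns, cardinality):
    """Return a left-deep join order.

    patterns:    dict mapping a pattern name to the set of its variables
    cardinality: dict mapping a pattern name to its estimated result size
    """
    remaining = dict(patterns)
    first = min(remaining, key=cardinality.get)
    order = [first]
    bound = set(remaining.pop(first))
    while remaining:
        # Prefer patterns that join with the current result (avoid cross products).
        connected = [name for name, vs in remaining.items() if vs & bound]
        candidates = connected or list(remaining)
        nxt = min(candidates, key=cardinality.get)
        order.append(nxt)
        bound |= remaining.pop(nxt)
    return order


# Variables of TP1..TP5 from the Drugbank/KEGG query.
patterns = {
    "TP1": {"drug"},
    "TP2": {"drug", "id"},
    "TP3": {"keggDrug"},
    "TP4": {"keggDrug", "id"},
    "TP5": {"keggDrug", "title"},
}
# Hypothetical cardinality estimates, not measured values.
cardinality = {"TP1": 50, "TP2": 2000, "TP3": 8000, "TP4": 9000, "TP5": 8500}
order = left_deep_order(patterns, cardinality)
```

With these assumed estimates the selective category pattern TP1 comes first and TP4 is reached through the shared `?id` variable before the other KEGG patterns.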
  • 69. Join Order Optimization
    • Exclusive Groups
      – Group triple patterns with the same relevant data source
      – Evaluation in a single (remote) sub-query
      – Push join to the data source, i.e., endpoint
    • Variable count heuristic
      – Iteratively determine the join order based on the free variable count of triple patterns and groups
      – Consider "resolved" variable mappings from earlier iterations
    • Using Selectivities
      – Store distinct predicates, avg. subject selectivities, and avg. object selectivities for each predicate in the index
      – Use the predicate count, avg. subject selectivities, and avg. object selectivities to estimate the join cardinality
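The variable count heuristic can be sketched as follows: in each iteration, pick the triple pattern with the fewest variables that are not yet resolved by earlier picks. This is a simplified illustration that treats every triple pattern on its own and ignores groups; the function name `order_by_free_variables` and the tuple representation of patterns are assumptions.

```python
def is_var(term):
    """A term is a variable if it starts with '?'."""
    return term.startswith("?")


def order_by_free_variables(patterns):
    """Order (subject, predicate, object) tuples by their free variable count."""
    remaining = list(patterns)
    resolved = set()
    order = []
    while remaining:
        # Count only variables that no earlier pattern has resolved yet.
        nxt = min(
            remaining,
            key=lambda tp: sum(1 for t in tp if is_var(t) and t not in resolved),
        )
        remaining.remove(nxt)
        order.append(nxt)
        resolved.update(t for t in nxt if is_var(t))
    return order


tp1 = ("?President", "rdf:type", "dbpedia-yago:PresidentsOfTheUnitedStates")
tp2 = ("?President", "dbpedia:party", "?Party")
tp3 = ("?nytPresident", "owl:sameAs", "?President")
tp4 = ("?nytPresident", "nytimes:topicPage", "?TopicPage")
order = order_by_free_variables([tp4, tp3, tp2, tp1])
```

Ties are broken by input position here, which is an arbitrary choice made for this sketch.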
  • 70. Exclusive Groups
    SELECT ?President ?Party ?TopicPage WHERE {
      ?President rdf:type dbpedia-yago:PresidentsOfTheUnitedStates .  # @ DBpedia
      ?President dbpedia:party ?Party .                               # @ DBpedia
      ?nytPresident owl:sameAs ?President .                           # @ DBpedia, NYTimes
      ?nytPresident nytimes:topicPage ?TopicPage .                    # @ NYTimes
    }
    Source selection assigns the sources shown after each pattern; the first two patterns, both relevant only to DBpedia, form an exclusive group.
    Advantage: Delegate joins to the endpoint by forming exclusive groups (i.e., executing the respective patterns in a single sub-query)
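Forming exclusive groups from a source selection result can be sketched as below. The pattern labels and the function name `exclusive_groups` are assumptions made for illustration; the source assignments mirror the annotations on the slide.

```python
from collections import defaultdict


def exclusive_groups(sources_by_pattern):
    """Split patterns into per-source exclusive groups and the remaining patterns.

    sources_by_pattern: dict mapping a triple pattern label to its set of
    relevant sources, as produced by source selection.
    """
    groups = defaultdict(list)
    rest = []
    for pattern, sources in sources_by_pattern.items():
        if len(sources) == 1:
            # Exactly one relevant source: the pattern can be sent there
            # together with the other patterns exclusive to that source.
            groups[next(iter(sources))].append(pattern)
        else:
            rest.append(pattern)
    return dict(groups), rest


selection = {
    "?President rdf:type dbpedia-yago:PresidentsOfTheUnitedStates": {"DBpedia"},
    "?President dbpedia:party ?Party": {"DBpedia"},
    "?nytPresident owl:sameAs ?President": {"DBpedia", "NYTimes"},
    "?nytPresident nytimes:topicPage ?TopicPage": {"NYTimes"},
}
groups, rest = exclusive_groups(selection)
```

Only the DBpedia group contains more than one pattern, so only it yields a join that can be pushed to the endpoint; the single NYTimes pattern is sent on its own.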
  • 71. Exclusive Groups Join Order Optimization
    1 SPARQL Query: Compute Micronutrients using Drugbank and KEGG
    SELECT ?drug ?title WHERE {
      ?drug drugbank:drugCategory drugbank-cat:micronutrient .
      ?drug drugbank:casRegistryNumber ?id .
      ?keggDrug rdf:type kegg:Drug .
      ?keggDrug bio2rdf:xRef ?id .
      ?keggDrug dc:title ?title .
    }
    2 Unoptimized Internal Representation
    3 Optimized Internal Representation
    [Figure annotations: 4x Local Join = 4x NLJ; Exclusive Group → Remote Join]
  • 72. Selectivity Based Join Order Optimization
    [] a sd:Service ;
       sd:endpointUrl <http://localhost:8890/sparql> ;
       sd:capability [
         sd:predicate diseasome:name ;
         sd:totalTriples 147 ;     # total number of triples with this predicate
         sd:avgSbjSel "0.0068" ;   # 1 / number of distinct subjects with this predicate
         sd:avgObjSel "0.0069" ;   # 1 / number of distinct objects with this predicate
       ] ;
       sd:capability [
         sd:predicate diseasome:chromosomalLocation ;
         sd:totalTriples 160 ;
         sd:avgSbjSel "0.0062" ;
         sd:avgObjSel "0.0072" ;
       ] ;
    Example: for the four triples S1 P O1 . S1 P O2 . S2 P O1 . S3 P O2 .
    totalTriples = 4, avgSbjSel(p) = 1/3, avgObjSel(p) = 1/2
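The per-predicate statistics in the index can be derived from the data as sketched below, following the definitions on the slide (selectivity = 1 / number of distinct subjects or objects). The function name `predicate_stats` is an assumption; the four triples are the slide's own example.

```python
def predicate_stats(triples, predicate):
    """Compute totalTriples, avgSbjSel and avgObjSel for one predicate."""
    matching = [(s, o) for s, p, o in triples if p == predicate]
    if not matching:
        return None  # predicate does not occur in this source
    subjects = {s for s, _ in matching}
    objects = {o for _, o in matching}
    return {
        "totalTriples": len(matching),
        "avgSbjSel": 1 / len(subjects),
        "avgObjSel": 1 / len(objects),
    }


triples = [
    ("S1", "P", "O1"),
    ("S1", "P", "O2"),
    ("S2", "P", "O1"),
    ("S3", "P", "O2"),
]
stats = predicate_stats(triples, "P")
```

Three distinct subjects and two distinct objects give the values 1/3 and 1/2 shown on the slide.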
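  • 73. Selectivity Based Join Order Optimization
    • Triple pattern cardinality

    $$p = \mathrm{pred}(tp), \qquad T = \text{total number of triples having predicate } p$$

    $$C(tp) = \begin{cases}
    T & \text{if neither subject nor object is bound} \\
    T \times \mathit{avgSbjSel}(p) & \text{if subject is bound} \\
    T \times \mathit{avgObjSel}(p) & \text{if object is bound}
    \end{cases}$$

    • Join cardinality

    $$C(J(tp_1, tp_2)) = \begin{cases}
    C(tp_1) \times C(tp_2) \times \mathit{avgPredJoinSel}(tp_1) \times \mathit{avgPredJoinSel}(tp_2) & \text{if p-p join} \\
    C(tp_1) \times C(tp_2) \times \mathit{avgSbjJoinSel}(tp_1) \times \mathit{avgSbjJoinSel}(tp_2) & \text{if s-s join} \\
    C(tp_1) \times C(tp_2) \times \mathit{avgSbjJoinSel}(tp_1) \times \mathit{avgObjJoinSel}(tp_2) & \text{if s-o join}
    \end{cases}$$

    How to calculate avgPredJoinSel, avgSbjJoinSel, and avgObjJoinSel? DARQ selected 0.5 as the avgJoinSel value for all joins.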
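The two estimates can be written down directly, as in the sketch below. The function names are assumptions. The case where both subject and object are bound is not covered by the slide, so the sketch rejects it rather than guessing. Plugging 0.5 into both join selectivity factors is also an assumption about how the single DARQ value would be used in the two-factor formula.

```python
def triple_pattern_cardinality(total, avg_sbj_sel, avg_obj_sel,
                               subject_bound=False, object_bound=False):
    """Estimate C(tp) from the index statistics of the pattern's predicate."""
    if subject_bound and object_bound:
        raise ValueError("case not covered by the formula on the slide")
    if subject_bound:
        return total * avg_sbj_sel
    if object_bound:
        return total * avg_obj_sel
    return float(total)


def join_cardinality(card1, card2, join_sel1, join_sel2):
    """Estimate C(J(tp1, tp2)) given the two join selectivity factors."""
    return card1 * card2 * join_sel1 * join_sel2


# diseasome:name statistics from the service description on the previous slide.
bound_subject = triple_pattern_cardinality(147, 0.0068, 0.0069, subject_bound=True)
unbound = triple_pattern_cardinality(147, 0.0068, 0.0069)
# Join of the two unbound patterns, with 0.5 assumed for both factors.
joined = join_cardinality(147, 160, 0.5, 0.5)
```

A bound subject on diseasome:name gives an estimate close to one result, which matches the intuition that each subject has roughly one name.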
  • 75. Join Implementations
    • Bound Joins
      – Start with a single triple pattern (lowest cardinality)
      – Substitute mappings from the previous triple pattern in the subsequent evaluation
      – Bound joins in NLJ fashion
        • Execute bound joins in nested loop join fashion
        • Too many remote requests
      – Bound joins in block NLJ fashion
        • Execute bound joins in block nested loop join fashion
        • Make use of the SPARQL UNION construct
        • Remote requests are reduced by the block size
    • Other join techniques
      – E.g., hash joins
  • 76. Bound Joins in Block NLJ
    SELECT ?President ?Party ?TopicPage WHERE {
      ?President rdf:type dbpedia:PresidentsOfTheUnitedStates .
      ?President dbpedia:party ?Party .
      ?nytPresident owl:sameAs ?President .
      ?nytPresident nytimes:topicPage ?TopicPage .
    }
    Assume that the following intermediate results have been computed as input for the last triple pattern.
    Block input: "Barack Obama", "George W. Bush", …
    Before (NLJ): one remote request per input binding
      SELECT ?TopicPage WHERE { "Barack Obama" nytimes:topicPage ?TopicPage }
      SELECT ?TopicPage WHERE { "George W. Bush" nytimes:topicPage ?TopicPage }
      …
    Now: Evaluation in a single remote request using a SPARQL UNION construct + local post-processing (SPARQL 1.0)
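A block of input bindings can be turned into one UNION query as sketched below. The example IRIs, the helper names `chunks` and `union_query`, and the numbered variable naming scheme are assumptions made for illustration; prefix declarations are omitted. Numbering the variable per branch lets the local post-processing step tell which input binding each result row belongs to.

```python
def chunks(items, size):
    """Split the input bindings into blocks of at most `size` entries."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def union_query(subjects, predicate, var):
    """Build one UNION query that evaluates the pattern for a whole block."""
    branches = [
        f"{{ <{s}> {predicate} ?{var}_{i} }}" for i, s in enumerate(subjects)
    ]
    projection = " ".join(f"?{var}_{i}" for i in range(len(subjects)))
    return f"SELECT {projection} WHERE {{ " + " UNION ".join(branches) + " }"


# Hypothetical bindings of ?nytPresident from the previous join step.
bindings = [
    "http://example.org/nyt/obama",
    "http://example.org/nyt/bush",
    "http://example.org/nyt/clinton",
]
blocks = chunks(bindings, 2)
queries = [union_query(block, "nytimes:topicPage", "TopicPage") for block in blocks]
```

Three bindings with a block size of 2 need two remote requests instead of three; with realistic block sizes the saving is correspondingly larger.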
  • 77. Parallelization and Pipelining • Execute sub-queries concurrently on different data sources • Multithreaded worker pool to execute the joins and UNION operators in parallel • Pipelining approach for intermediate results • See FedX and LHD implementations
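Concurrent execution of sub-queries on different sources can be sketched with a worker pool from the Python standard library. This is not how FedX or LHD are implemented; `run_parallel` and the stand-in `fake_execute` function are assumptions, and a real `execute` would send the query to the endpoint over HTTP.

```python
from concurrent.futures import ThreadPoolExecutor


def run_parallel(subqueries, execute, max_workers=4):
    """Run one sub-query per source concurrently and collect the results.

    subqueries: dict mapping a source name to its sub-query string
    execute:    callable (source, query) -> result, supplied by the caller
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {
            source: pool.submit(execute, source, query)
            for source, query in subqueries.items()
        }
        # result() blocks until the corresponding sub-query has finished.
        return {source: future.result() for source, future in futures.items()}


def fake_execute(source, query):
    """Stand-in for a remote call; returns a string instead of bindings."""
    return f"{source}:{len(query)}"


results = run_parallel({"DBpedia": "ASK {}", "NYTimes": "SELECT *"}, fake_execute)
```

Threads suit this workload because the workers spend most of their time waiting on network responses rather than computing.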
  • 79. Performance Metrics
    • Efficient source selection in terms of
      – Total triple pattern-wise sources selected
      – Total number of SPARQL ASK requests used during source selection
      – Source selection time
    • Query execution time
    • Result completeness and correctness
    • Number of remote requests during query execution
    • Index compression ratio (1 - index size / data dump size)
    • See https://code.google.com/p/bigrdfbench/
  • 80. Evaluation Setup
    • Local dedicated network
    • Local SPARQL endpoints (one per machine)
    • Run each query 10 times and present the average results
    • Statistically analyze the results, e.g., Wilcoxon signed-rank test, Student's t-test
  • 81. SPARQL Query Federation Engines
    • FedX
    • SPLENDID
    • HiBISCuS+FedX
    • HiBISCuS+SPLENDID
    • ANAPSID
    • LHD
    • DARQ
  • 82. AKSW SPARQL Federation Publications
    • HiBISCuS: Hypergraph-Based Source Selection for SPARQL Endpoint Federation by Muhammad Saleem and Axel-Cyrille Ngonga Ngomo, in (ESWC, 2014)
    • DAW: Duplicate-AWare Federated Query Processing over the Web of Data by Muhammad Saleem, Axel-Cyrille Ngonga Ngomo, Josiane Xavier Parreira, Helena Deus, and Manfred Hauswirth, in (ISWC, 2013)
    • TopFed: TCGA Tailored Federated Query Processing and Linking to LOD by Muhammad Saleem, Shanmukha Sampath, Axel-Cyrille Ngonga Ngomo, Aftab Iqbal, Jonas Almeida, and Helena F. Deus, in (Journal of Biomedical Semantics, 2014)
    • A Fine-Grained Evaluation of SPARQL Endpoint Federation Systems by Muhammad Saleem, Yasar Khan, Ali Hasnain, Ivan Ermilov, and Axel-Cyrille Ngonga Ngomo, in (Semantic Web Journal, 2014)
    • BigRDFBench: A Billion Triples Benchmark for SPARQL Query Federation by Muhammad Saleem, Ali Hasnain, and Axel-Cyrille Ngonga Ngomo, in (submitted WWW, 2015)
    • SAFE: Policy-Aware SPARQL Query Federation Over RDF Data Cubes by Yasar Khan, Muhammad Saleem, Aftab Iqbal, Muntazir Mehdi, Aidan Hogan, Panagiotis Hasapis, Axel-Cyrille Ngonga Ngomo, Stefan Decker, and Ratnesh Sahay, in (SWAT4LS, 2014)
    • QFed: Query Set For Federated SPARQL Query Benchmark by Nur Aini Rakhmawati, Sarasi Lalithsena, Muhammad Saleem, and Stefan Decker, in (iiWAS, 2014)