1. HiBISCuS: Hypergraph-Based Source Selection for SPARQL Endpoint Federation
Muhammad Saleem, Axel-Cyrille Ngonga Ngomo
{lastname}@informatik.uni-leipzig.de
Agile Knowledge Engineering and Semantic Web (AKSW), University of Leipzig, Germany
11th Extended Semantic Web Conference (ESWC), Crete, Greece, 25th-29th of May 2014
2. SPARQL Endpoint Federation
Federation engine pipeline over the SPARQL endpoints S1, S2, S3, S4 (RDF):
• Parser: get the individual triple patterns
• Source Selection: identify the capable sources for each individual triple pattern
• Federator Optimizer: generate an optimized sub-query execution plan
• Execute the sub-queries
• Integrator: integrate the sub-query results
3. Our Contribution
Same pipeline (Parser, Source Selection, Federator Optimizer, Integrator over the endpoints S1, S2, S3, S4); our contribution is the Source Selection step:
• Efficiently identify the capable sources for the individual triple patterns of a SPARQL query
9. Motivation
FedBench (LD3): Return for all US presidents their party
membership and news pages about them.
SELECT ?president ?party ?page
WHERE {
?president rdf:type dbpedia:President .
?president dbpedia:nationality dbpedia:United_States .
?president dbpedia:party ?party .
?x nyt:topicPage ?page .
?x owl:sameAs ?president .
}
Source Selection Algorithm
Sources (all RDF): S1 = DBpedia, S2 = KEGG, S3 = ChEBI, S4 = NYT, S5 = SWDF, S6 = LMDB, S7 = Jamendo, S8 = GeoNames, S9 = DrugBank
Triple pattern-wise source selection (TP1 to TP5 are the triple patterns of the query, in order):
TP1 = S1
TP2 = S1
TP3 = S1
TP4 = S4
TP5 = S1 S2 S4 S5 S6 S7 S8 S9
Total triple pattern-wise selected sources = 12
Total SPARQL ASK queries: 9 sources × 5 triple patterns = 45
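The two totals above can be reproduced with a minimal Python sketch of ASK-based, triple pattern-wise source selection. This is an illustration under stated assumptions, not the code of any of the systems in the talk: the triples held by each source are made-up toy data, and `ask` is a hypothetical in-memory stand-in for sending a SPARQL ASK request to an endpoint.

```python
# Toy, made-up triples per source; real endpoints would be queried over HTTP.
SOURCES = {
    "S1 DBpedia": [
        ("dbr:Obama", "rdf:type", "dbpedia:President"),
        ("dbr:Obama", "dbpedia:nationality", "dbpedia:United_States"),
        ("dbr:Obama", "dbpedia:party", "dbr:Democratic_Party"),
        ("dbr:Obama", "owl:sameAs", "fb:obama"),
    ],
    "S2 KEGG": [("kegg:C1", "owl:sameAs", "chebi:1")],
    "S3 ChEBI": [("chebi:1", "rdf:type", "chebi:Compound")],
    "S4 NYT": [
        ("nyt:obama", "nyt:topicPage", "nytpage:obama"),
        ("nyt:obama", "owl:sameAs", "dbr:Obama"),
    ],
    "S5 SWDF": [("swdf:p1", "owl:sameAs", "dblp:p1")],
    "S6 LMDB": [("lmdb:f1", "owl:sameAs", "dbr:Film1")],
    "S7 Jamendo": [("jam:a1", "owl:sameAs", "mb:a1")],
    "S8 GeoNames": [("geo:1", "owl:sameAs", "dbr:Place1")],
    "S9 DrugBank": [("drug:d1", "owl:sameAs", "kegg:D1")],
}

# The five triple patterns of FedBench LD3 (TP1 to TP5).
QUERY = [
    ("?president", "rdf:type", "dbpedia:President"),
    ("?president", "dbpedia:nationality", "dbpedia:United_States"),
    ("?president", "dbpedia:party", "?party"),
    ("?x", "nyt:topicPage", "?page"),
    ("?x", "owl:sameAs", "?president"),
]


def is_var(term):
    return term.startswith("?")


def ask(triples, pattern):
    """Hypothetical stand-in for a SPARQL ASK: does any triple match?"""
    return any(
        all(is_var(p) or p == t for p, t in zip(pattern, triple))
        for triple in triples
    )


def select_sources(sources, query):
    """Ask every source about every triple pattern, one ASK per pair."""
    selected, ask_count = {}, 0
    for i, pattern in enumerate(query, start=1):
        selected[f"TP{i}"] = []
        for name, triples in sources.items():
            ask_count += 1
            if ask(triples, pattern):
                selected[f"TP{i}"].append(name)
    return selected, ask_count


selected, ask_count = select_sources(SOURCES, QUERY)
total_selected = sum(len(names) for names in selected.values())
print(total_selected, ask_count)
```

Because every source is asked about every triple pattern, the number of ASK requests grows with sources × patterns regardless of how many sources end up being relevant.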
11. Motivation
FedBench (LD3): Return for all US presidents their party
membership and news pages about them.
SELECT ?president ?party ?page
WHERE {
?president rdf:type dbpedia:President .
?president dbpedia:nationality dbpedia:United_States .
?president dbpedia:party ?party .
?x nyt:topicPage ?page .
?x owl:sameAs ?president .
}
Source Selection Algorithm
Sources (all RDF): S1 = DBpedia, S2 = KEGG, S3 = ChEBI, S4 = NYT, S5 = SWDF, S6 = LMDB, S7 = Jamendo, S8 = GeoNames, S9 = DrugBank
Triple pattern-wise source selection (TP1 to TP5 are the triple patterns of the query, in order):
TP1 = S1
TP2 = S1
TP3 = S1
TP4 = S4
TP5 = S1 S2 S4 S5 S6 S7 S8 S9
Optimal triple pattern-wise selected sources = 5 (one source per triple pattern, so seven of the eight sources selected for TP5 are redundant)
12. Problem Statement
• An overestimation of triple pattern-wise
source selection can be expensive
– Resources are wasted
– Query runtime is increased
– Extra traffic is generated
• How do we perform join-aware triple pattern-wise
source selection in a time-efficient way?
13. HiBISCuS: Key Concept
• Makes use of the authorities of URIs
http://dbpedia.org/ontology/party
– Scheme: http
– Authority: dbpedia.org
– Path: /ontology/party
For URI details: http://tools.ietf.org/html/rfc3986
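As a small illustration of the URI decomposition above, the following Python sketch splits a URI with the standard library. The helper name `authority` is an assumption made for this example; it returns scheme plus authority in the same form that the data summaries later in the deck use (e.g. `<http://dbpedia.org/>`).

```python
from urllib.parse import urlsplit


def authority(uri: str) -> str:
    """Return scheme plus authority, e.g. 'http://dbpedia.org/'."""
    parts = urlsplit(uri)
    return f"{parts.scheme}://{parts.netloc}/"


parts = urlsplit("http://dbpedia.org/ontology/party")
print(parts.scheme, parts.netloc, parts.path)
print(authority("http://dbpedia.org/ontology/party"))
```

`urlsplit` exposes the RFC 3986 authority component as `netloc`.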
19. HiBISCuS: SPARQL Query as Hypergraph
SELECT ?president ?party ?page
WHERE {
?president rdf:type dbpedia:President .
?president dbpedia:nationality dbpedia:United_States .
?president dbpedia:party ?party .
?x nyt:topicPage ?page .
?x owl:sameAs ?president .
}
[Figure: hypergraph of the query. Nodes: ?president, rdf:type, dbpedia:President, dbpedia:nationality, dbpedia:United_States, dbpedia:party, ?party, ?x, owl:sameAs, nyt:topicPage, ?page. Legend: star node, simple node, hybrid node, tail of hyperedge]
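One way to picture the hypergraph view of the query is the Python sketch below. It assumes that each triple pattern becomes one directed hyperedge whose tail is the subject and whose head is the (predicate, object) pair; the slide text itself only names the "tail of hyperedge", so this modelling choice and the variable names are assumptions of the example. The sketch counts, per node, how often it appears as a tail and as part of a head, and it does not attempt to reproduce the star/simple/hybrid classification from the legend.

```python
from collections import Counter

# The five triple patterns of the example query.
QUERY = [
    ("?president", "rdf:type", "dbpedia:President"),
    ("?president", "dbpedia:nationality", "dbpedia:United_States"),
    ("?president", "dbpedia:party", "?party"),
    ("?x", "nyt:topicPage", "?page"),
    ("?x", "owl:sameAs", "?president"),
]

# Assumption: subject = tail, (predicate, object) = head of each hyperedge.
hyperedges = [(s, (p, o)) for s, p, o in QUERY]

# How often each node is the tail of a hyperedge or part of a head.
tail_count = Counter(tail for tail, _ in hyperedges)
head_count = Counter(node for _, head in hyperedges for node in head)

for node in sorted(set(tail_count) | set(head_count)):
    print(node, "tail:", tail_count[node], "head:", head_count[node])
```

In this representation ?president is the tail of three hyperedges and appears in the head of a fourth, which makes it the node where the DBpedia patterns and the owl:sameAs pattern join.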
20. HiBISCuS: Data Summaries
[] a ds:Service ;
   ds:endpointUrl <http://dbpedia.org/sparql> ;
   ds:capability [
      ds:predicate dbpedia:party ;
      ds:sbjAuthority <http://dbpedia.org/> ;
      ds:objAuthority <http://dbpedia.org/> ;
   ] ;
   ds:capability [
      ds:predicate rdf:type ;
      ds:sbjAuthority <http://dbpedia.org/> ;
      ds:objAuthority owl:Thing, dbpedia:President ; # we store all distinct classes
   ] ;
   ds:capability [
      ds:predicate dbpedia:postalCode ;
      ds:sbjAuthority <http://dbpedia.org/> ;
      # no objAuthority, as the object value for dbpedia:postalCode is a string
   ] ;
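A rough Python sketch of how such a summary could answer source-selection lookups without contacting the endpoint is given below. The dictionary layout, the function `capable_sources`, and the second endpoint (`http://example.org/nyt/sparql` with its capabilities) are hypothetical and only serve the illustration; the DBpedia entries mirror the excerpt above.

```python
# In-memory stand-in for the data summaries: endpoint -> predicate -> capability.
SUMMARIES = {
    "http://dbpedia.org/sparql": {
        "dbpedia:party": {
            "sbj": {"http://dbpedia.org/"},
            "obj": {"http://dbpedia.org/"},
        },
        "rdf:type": {
            "sbj": {"http://dbpedia.org/"},
            "obj": {"owl:Thing", "dbpedia:President"},  # distinct classes
        },
        "dbpedia:postalCode": {"sbj": {"http://dbpedia.org/"}, "obj": set()},
    },
    # Hypothetical second endpoint, not taken from the slides.
    "http://example.org/nyt/sparql": {
        "nyt:topicPage": {"sbj": {"http://nyt.example.org/"}, "obj": set()},
        "rdf:type": {"sbj": {"http://nyt.example.org/"}, "obj": {"skos:Concept"}},
    },
}


def capable_sources(predicate, obj=None):
    """Endpoints whose summary lists the predicate (and class, for rdf:type)."""
    result = []
    for endpoint, capabilities in SUMMARIES.items():
        capability = capabilities.get(predicate)
        if capability is None:
            continue
        # For rdf:type with a bound class, the class must be in the summary.
        if predicate == "rdf:type" and obj is not None and obj not in capability["obj"]:
            continue
        result.append(endpoint)
    return result


print(capable_sources("dbpedia:party"))
print(capable_sources("rdf:type", "dbpedia:President"))
```

Storing the distinct classes for rdf:type is what lets a pattern such as `?president rdf:type dbpedia:President` be matched to a single endpoint even though nearly every dataset uses rdf:type.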
21. HiBISCuS: Triple Pattern-wise Source Selection
SELECT ?president ?party ?page
WHERE {
?president rdf:type dbpedia:President .
?president dbpedia:nationality dbpedia:United_States .
?president dbpedia:party ?party .
?x nyt:topicPage ?page .
?x owl:sameAs ?president .
}
[Figure: hypergraph of the query annotated with the selected sources. Sources selected for ?x owl:sameAs ?president: DBpedia, KEGG, NYT, SWDF, LMDB, GeoNames, DrugBank, Jamendo]
22. HiBISCuS: Triple Pattern-wise Source Pruning
SELECT ?president ?party ?page
WHERE {
?president rdf:type dbpedia:President .
?president dbpedia:nationality dbpedia:United_States .
?president dbpedia:party ?party .
?x nyt:topicPage ?page .
?x owl:sameAs ?president .
}
[Figure: the same hypergraph at the start of pruning. Each source selected for owl:sameAs (DBpedia, KEGG, NYT, SWDF, DrugBank, LMDB, GeoNames, Jamendo) is shown with its subject authorities (Sbj. auth.), alongside the object authorities (Obj. auth.)]
25. HiBISCuS: Triple Pattern-wise Source Pruning
SELECT ?president ?party ?page
WHERE {
?president rdf:type dbpedia:President .
?president dbpedia:nationality dbpedia:United_States .
?president dbpedia:party ?party .
?x nyt:topicPage ?page .
?x owl:sameAs ?president .
}
[Figure: the hypergraph after pruning. NYT, shown with its object authorities (Obj. auth.), is the source that remains for owl:sameAs]
27. HiBISCuS: Triple Pattern-wise Source Pruning
SELECT ?president ?party ?page
WHERE {
?president rdf:type dbpedia:President .
?president dbpedia:nationality dbpedia:United_States .
?president dbpedia:party ?party .
?x nyt:topicPage ?page .
?x owl:sameAs ?president .
}
[Figure: hypergraph of the query after pruning]
Total triple pattern-wise selected sources = 5
Total SPARQL ASK queries: 0
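The pruning idea can be sketched in Python as follows. This is a simplified illustration, not the HiBISCuS algorithm itself: it looks only at the join variable ?x, which is the subject of both the nyt:topicPage and the owl:sameAs pattern, and keeps a source only if its subject authorities overlap with the authorities that every pattern around ?x can supply. All authority values except `http://dbpedia.org/` are hypothetical placeholders, and the function name `prune` is an assumption of the example.

```python
# Subject authorities of ?x per triple pattern and candidate source.
# Values other than http://dbpedia.org/ are hypothetical placeholders.
X_AUTHORITIES = {
    "TP4": {"NYT": {"http://nyt.example.org/"}},
    "TP5": {
        "DBpedia": {"http://dbpedia.org/"},
        "KEGG": {"http://kegg.example.org/"},
        "NYT": {"http://nyt.example.org/"},
        "SWDF": {"http://swdf.example.org/"},
        "LMDB": {"http://lmdb.example.org/"},
        "GeoNames": {"http://geo.example.org/"},
        "DrugBank": {"http://drugbank.example.org/"},
        "Jamendo": {"http://jamendo.example.org/"},
    },
}


def prune(authorities_by_pattern):
    """Keep only sources whose authorities can take part in the join."""
    # Authorities each pattern can supply, over all of its candidate sources.
    unions = [
        set().union(*sources.values())
        for sources in authorities_by_pattern.values()
    ]
    # Authorities that every pattern around the join node can supply.
    common = set.intersection(*unions)
    return {
        pattern: [name for name, auths in sources.items() if auths & common]
        for pattern, sources in authorities_by_pattern.items()
    }


pruned = prune(X_AUTHORITIES)
print(pruned)
```

With the three DBpedia-only patterns added, this leaves one source per triple pattern, and no endpoint is contacted because everything is decided from the precomputed summaries.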
28. Experimental Setup
• Benchmark
– FedBench
– Collection of real-world datasets
– Real queries reflecting typical requests
– Used all 25 queries
• HiBISCuS Extensions
– FedX 2.0
– DARQ
– SPLENDID
• We also included ANAPSID (version of 12/2013)
without the HiBISCuS extension
29. Experimental Setup
• Metrics
– Index generation time
– Index size
– Total triple pattern-wise sources selected
– Total number of SPARQL ASK requests used
– Source selection time
– Query execution time
30. Index Generation Time and Compression Ratio
System     Index generation time (min)   Compression ratio (%)
FedX       NA                            NA
SPLENDID   75                            99.998
LHD        75                            99.998
DARQ       102                           99.997
ANAPSID    6                             99.999
ADERIS     6                             99.999
Qtree      -                             96.000
HiBISCuS   36                            99.997
Compression ratio = 1 - index size / total data dump size (reported as a percentage)
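The compression ratio formula can be written as a short Python helper. The sizes used in the call are hypothetical and chosen only to show the arithmetic; they are not the index or dump sizes measured in the evaluation.

```python
def compression_ratio(index_size: float, dump_size: float) -> float:
    """Compression ratio in percent: (1 - index size / dump size) * 100."""
    return (1 - index_size / dump_size) * 100


# Hypothetical sizes in the same unit (e.g. MB), not measured values.
ratio = compression_ratio(index_size=3.0, dump_size=100_000.0)
print(f"{ratio:.3f}")
```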
31. Efficient Source Selection
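Legend: #TP = total triple pattern-wise sources selected, #AR = total number of SPARQL ASK requests used, SST = source selection time
FedBench Cross Domain Queries
System #TP #AR SST(ms)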
FedX(cold) 78 252 317.8
FedX(warm) 78 0 7.3
SPLENDID 78 99 320.8
DARQ 84 0 7.2
ANAPSID 36 43 186
HiBISCuS(cold) 35 27 63
HiBISCuS(warm) 35 0 30.4
FedBench Life Sciences Queries
System #TP #AR SST(ms)
FedX(cold) 56 297 375.5
FedX(warm) 56 0 8.2
SPLENDID 56 90 307.2
DARQ 77 0 7.5
ANAPSID 44 63 477.4
HiBISCuS(cold) 41 18 31.8
HiBISCuS(warm) 41 0 23.1
FedBench Linked Data Queries
System #TP #AR SST(ms)
FedX(cold) 97 369 307.8
FedX(warm) 97 0 8
SPLENDID 97 126 279
DARQ 113 0 7.7
ANAPSID 54 37 803.5
HiBISCuS(cold) 47 9 20.6
HiBISCuS(warm) 47 0 16
Overall
System #TP #AR SST(ms)
FedX(cold) 231 918 330
FedX(warm) 231 0 8
SPLENDID 231 315 299
DARQ 274 0 7.5
ANAPSID 134 143 554
HiBISCuS(cold) 123 54 35.6
HiBISCuS(warm) 123 0 22
33. FedX Extension with HiBISCuS
Improvement in 20/25 queries with net performance improvement 24.61%
[Figure: bar chart of query execution time (ms) per FedBench query (CD1 to CD7, LS1 to LS7, LD1 to LD11, and the average) for FedX (warm) vs. FedX+HiBISCuS]
34. SPLENDID Extension with HiBISCuS
Improvement in 25/25 queries with net performance improvement 82.72%
[Figure: bar chart of query execution time (ms) per FedBench query (CD1 to CD7, LS1 to LS7, LD1 to LD11, and the average) for SPLENDID vs. SPLENDID+HiBISCuS]
35. DARQ Extension with HiBISCuS
Improvement in 21/21 queries with net performance improvement 92.22%
[Figure: bar chart of query execution time (ms, log scale, axis display unit: hundreds) per FedBench query (CD1 to CD7, LS1 to LS7, LD1 to LD11, and the average) for DARQ vs. DARQ+HiBISCuS. Some queries are marked "Not supported", "Runtime error", or "Timeout" for both systems]
36. SPLENDID+HiBISCuS vs ANAPSID
Improvement in 24/24 queries with net performance improvement 98%
[Figure: bar chart of query execution time (ms, log scale, axis display unit: hundreds) per FedBench query (CD1 to CD7, LS1 to LS7, LD1 to LD11, and the average) for ANAPSID vs. SPLENDID+HiBISCuS. One query is marked "Zero results"]
37. Conclusion and Future Work
• An overestimation of triple pattern-wise source
selection can be expensive
• Join-aware triple pattern-wise source selection is
more efficient than simple triple pattern-wise source
selection
• Performance improvements
– FedX: 24.61%
– SPLENDID: 82.72%
– DARQ: 92.22%
– SPLENDID+HiBISCuS is 98% faster than ANAPSID
• BigRDFBench is on the roadmap
38. Thanks!
homepage and source code:
https://code.google.com/p/hibiscusfederation/
{saleem,ngonga}@informatik.uni-leipzig.de
AKSW, University of Leipzig, Germany
39. Source Selection
• Triple pattern-wise source selection
– Ensures 100% recall
– Can overestimate the capable sources
– Can be expensive, e.g., in the total number of SPARQL ASK requests used
– Performed by FedX, SPLENDID, LHD, DARQ, ADERIS, etc.
• Join-aware triple pattern-wise source selection
– Ensures 100% recall
– May select optimal or close-to-optimal capable sources
– Can be expensive, e.g., in the total number of SPARQL ASK requests used
– Can significantly reduce the query execution time
– Performed by ANAPSID