HiBISCuS: Hypergraph-Based Source Selection for SPARQL Endpoint Federation
Muhammad Saleem, Axel-Cyrille Ngonga Ngomo
{lastname}@informatik.uni-leipzig.de
Agile Knowledge Engineering and Semantic Web (AKSW), University of Leipzig, Germany
11th Extended Semantic Web Conference (ESWC), Crete, Greece, 25th-29th of May 2014
SPARQL Endpoint Federation
• Parser: get the individual triple patterns
• Source Selection: identify the capable sources for each individual triple pattern
• Federator Optimizer: generate an optimized sub-query execution plan
• Integrator: integrate the sub-query results
• Sub-queries are executed against the SPARQL endpoints S1, S2, S3, S4 (each serving RDF data)
Our Contribution
• Source Selection: efficiently identify the capable sources for the individual triple patterns of a SPARQL query
Motivation
FedBench (LD3): Return for all US presidents their party membership and news pages about them.
SELECT ?president ?party ?page
WHERE {
  ?president rdf:type dbpedia:President .                  # TP1
  ?president dbpedia:nationality dbpedia:United_States .   # TP2
  ?president dbpedia:party ?party .                        # TP3
  ?x nyt:topicPage ?page .                                 # TP4
  ?x owl:sameAs ?president .                               # TP5
}
Sources (RDF): S1 = dbpedia, S2 = KEGG, S3 = ChEBI, S4 = NYT, S5 = SWDF, S6 = LMDB, S7 = Jamendo, S8 = GeoNames, S9 = DrugBank
Source Selection Algorithm: triple pattern-wise source selection
TP1 = S1
TP2 = S1
TP3 = S1
TP4 = S4
TP5 = S1 S2 S4 S5 S6 S7 S8 S9
Total triple pattern-wise selected sources = 12
Total SPARQL ASK queries: 9 * 5 = 45
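The two totals on the slide above can be reproduced with a small simulation. This is only a sketch: the `ASK_ANSWERS` table is a hypothetical stand-in for real endpoints, filled in from the per-pattern results shown on the slide, and `ask` merely looks the answer up instead of sending a SPARQL ASK request.

```python
# Hypothetical stand-in for nine SPARQL endpoints: for each source, the triple
# patterns for which an ASK query would return true (taken from the slide).
ASK_ANSWERS = {
    "S1": {"TP1", "TP2", "TP3", "TP5"},  # dbpedia
    "S2": {"TP5"},                       # KEGG
    "S3": set(),                         # ChEBI
    "S4": {"TP4", "TP5"},                # NYT
    "S5": {"TP5"},                       # SWDF
    "S6": {"TP5"},                       # LMDB
    "S7": {"TP5"},                       # Jamendo
    "S8": {"TP5"},                       # GeoNames
    "S9": {"TP5"},                       # DrugBank
}
TRIPLE_PATTERNS = ["TP1", "TP2", "TP3", "TP4", "TP5"]


def ask(source: str, pattern: str) -> bool:
    """Simulated ASK request: no network access, just a table lookup."""
    return pattern in ASK_ANSWERS[source]


def select_sources(patterns, sources):
    """Plain triple pattern-wise selection: one ASK per (pattern, source) pair."""
    selected = {pattern: [] for pattern in patterns}
    ask_count = 0
    for pattern in patterns:
        for source in sources:
            ask_count += 1
            if ask(source, pattern):
                selected[pattern].append(source)
    return selected, ask_count


selected, ask_count = select_sources(TRIPLE_PATTERNS, list(ASK_ANSWERS))
total_selected = sum(len(sources) for sources in selected.values())
print(total_selected, ask_count)  # prints: 12 45
```

Every pattern is checked against every source, which is why the number of ASK requests grows with the product of the two.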
Motivation
FedBench (LD3), same query and sources S1 to S9 as on the previous slide.
Triple pattern-wise source selection: TP1 = S1, TP2 = S1, TP3 = S1, TP4 = S4, TP5 = S1 S2 S4 S5 S6 S7 S8 S9
Optimal triple pattern-wise selected sources = 5, i.e., one source for each of the five triple patterns
Problem Statement
• An overestimation of triple pattern-wise
source selection can be expensive
– Resources are wasted
– Query runtime is increased
– Extra traffic is generated
• How do we perform join-aware triple pattern-wise
source selection in a time-efficient way?
HiBISCuS: Key Concept
• Makes use of the URI’s authorities
Example: http://dbpedia.org/ontology/party
– Scheme: http
– Authority: dbpedia.org
– Path: /ontology/party
For URI details: http://tools.ietf.org/html/rfc3986
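As a minimal illustration of the decomposition above, Python's standard `urllib.parse.urlsplit` separates a URI into the RFC 3986 components. The helper name `authority_of` is introduced here for illustration only and is not part of HiBISCuS.

```python
from urllib.parse import urlsplit


def authority_of(uri: str) -> str:
    """Return the authority component of a URI (empty string if it has none)."""
    return urlsplit(uri).netloc


parts = urlsplit("http://dbpedia.org/ontology/party")
print(parts.scheme)  # http
print(parts.netloc)  # dbpedia.org
print(parts.path)    # /ontology/party
```

Two URIs from the same dataset usually share an authority, which is what makes the authority a compact summary of where a resource comes from.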
HiBISCuS: SPARQL Query as Hypergraph
SELECT ?president ?party ?page
WHERE {
  ?president rdf:type dbpedia:President .
  ?president dbpedia:nationality dbpedia:United_States .
  ?president dbpedia:party ?party .
  ?x nyt:topicPage ?page .
  ?x owl:sameAs ?president .
}
[Figure: hypergraph of the query, built up one triple pattern at a time. Each triple pattern is drawn as a hyperedge over its subject, predicate, and object vertices. Vertices: ?president, rdf:type, dbpedia:President, dbpedia:nationality, dbpedia:United_States, dbpedia:party, ?party, ?x, owl:sameAs, nyt:topicPage, ?page. Legend: star vertex, simple vertex, hybrid vertex, tail of hyperedge.]
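The hypergraph view of the query can be sketched in a few lines. The representation below is an assumption made for illustration (the subject is stored as the tail of each hyperedge, the predicate and object as its head); the function name and dictionary layout are hypothetical and not taken from the HiBISCuS code base.

```python
from collections import defaultdict

# The five triple patterns of the FedBench LD3 query.
PATTERNS = [
    ("?president", "rdf:type", "dbpedia:President"),
    ("?president", "dbpedia:nationality", "dbpedia:United_States"),
    ("?president", "dbpedia:party", "?party"),
    ("?x", "nyt:topicPage", "?page"),
    ("?x", "owl:sameAs", "?president"),
]


def build_hypergraph(patterns):
    """Turn each triple pattern into one hyperedge over three vertices."""
    vertices = set()
    hyperedges = []
    tail_of = defaultdict(list)  # vertex -> indices of hyperedges it is the tail of
    for index, (subject, predicate, obj) in enumerate(patterns):
        vertices.update((subject, predicate, obj))
        hyperedges.append({"tail": subject, "head": (predicate, obj)})
        tail_of[subject].append(index)
    return vertices, hyperedges, dict(tail_of)


vertices, hyperedges, tail_of = build_hypergraph(PATTERNS)
print(len(vertices), len(hyperedges))  # prints: 11 5
print(tail_of)  # {'?president': [0, 1, 2], '?x': [3, 4]}
```

Vertices shared by several hyperedges, here `?president` and `?x`, are the join points that join-aware source selection can exploit.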
HiBISCuS: Data Summaries
[] a ds:Service ;
  ds:endpointUrl <http://dbpedia.org/sparql> ;
  ds:capability [
    ds:predicate dbpedia:party ;
    ds:sbjAuthority <http://dbpedia.org/> ;
    ds:objAuthority <http://dbpedia.org/> ;
  ] ;
  ds:capability [
    ds:predicate rdf:type ;
    ds:sbjAuthority <http://dbpedia.org/> ;
    ds:objAuthority owl:Thing, dbpedia:President ;  # we store all distinct classes
  ] ;
  ds:capability [
    ds:predicate dbpedia:postalCode ;
    ds:sbjAuthority <http://dbpedia.org/> ;
    # No objAuthority, as the object value for dbpedia:postalCode is a string
  ] ;
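A data summary of this shape lets a federation engine pick candidate sources by lookup rather than by sending ASK requests. The sketch below assumes the summary has already been loaded into a plain dictionary; the dictionary layout, the function `candidate_sources`, and the class check for `rdf:type` are illustrative assumptions, and only the dbpedia entries mirror the excerpt above.

```python
# Assumed in-memory form of the data summaries: endpoint -> predicate -> authorities.
SUMMARIES = {
    "http://dbpedia.org/sparql": {
        "dbpedia:party": {"sbj": {"http://dbpedia.org/"}, "obj": {"http://dbpedia.org/"}},
        # For rdf:type the summary stores the distinct classes instead of authorities.
        "rdf:type": {"sbj": {"http://dbpedia.org/"}, "obj": {"owl:Thing", "dbpedia:President"}},
        # Literal-valued predicate: no object authorities.
        "dbpedia:postalCode": {"sbj": {"http://dbpedia.org/"}, "obj": set()},
    },
}


def candidate_sources(predicate, obj=None):
    """Return the endpoints whose summary lists the predicate (no ASK request needed)."""
    result = []
    for endpoint, capabilities in SUMMARIES.items():
        capability = capabilities.get(predicate)
        if capability is None:
            continue
        # Assumed refinement: a bound class in an rdf:type pattern must be a stored class.
        bound_class = predicate == "rdf:type" and obj is not None and not obj.startswith("?")
        if bound_class and obj not in capability["obj"]:
            continue
        result.append(endpoint)
    return result


print(candidate_sources("dbpedia:party"))
print(candidate_sources("rdf:type", "dbpedia:President"))
```

Because the lookup is purely local, its cost does not depend on the number or latency of the remote endpoints.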
HiBISCuS: Triple Pattern-wise Source Selection
[Figure: hypergraph of the LD3 query shown above, annotated with the candidate sources dbpedia, KEGG, NYT, SWDF, LMDB, GeoNames, DrugBank, and Jamendo.]
HiBISCuS: Triple Pattern-wise Source Pruning
[Figure, shown in several build steps: hypergraph of the LD3 query shown above. The candidate sources dbpedia, KEGG, NYT, SWDF, LMDB, GeoNames, DrugBank, and Jamendo are first labelled with their subject authorities (Sbj. auth.) and object authorities (Obj. auth.); in the last steps only NYT, labelled with its object authorities, remains.]
HiBISCuS: Triple Pattern-wise Source Pruning
Result for the LD3 query after pruning:
Total triple pattern-wise selected sources = 5
Total SPARQL ASK queries: 0
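The pruning idea can be sketched as an intersection of authorities at each join variable. This is a simplified, single-pass illustration and not the authors' algorithm as implemented; every authority value below is made up for the example (in particular the `.example` hosts), and only the shape of the result, five selected sources with NYT alone left for the `owl:sameAs` pattern, follows the slides.

```python
PATTERNS = [
    ("?president", "rdf:type", "dbpedia:President"),
    ("?president", "dbpedia:nationality", "dbpedia:United_States"),
    ("?president", "dbpedia:party", "?party"),
    ("?x", "nyt:topicPage", "?page"),
    ("?x", "owl:sameAs", "?president"),
]


def auth(subject_authorities, object_authorities):
    return {"s": set(subject_authorities), "o": set(object_authorities)}


DBP, NYT = "dbpedia.org", "data.nytimes.com"
OTHERS = ["KEGG", "SWDF", "LMDB", "GeoNames", "DrugBank", "Jamendo"]
# Hypothetical candidates per triple pattern: source -> subject/object authorities.
CANDIDATES = [
    {"dbpedia": auth([DBP], [])},
    {"dbpedia": auth([DBP], [DBP])},
    {"dbpedia": auth([DBP], [DBP])},
    {"NYT": auth([NYT], ["topics.nytimes.com"])},
    {"dbpedia": auth([DBP], ["other.example"]),
     "NYT": auth([NYT], [DBP]),
     **{n: auth([n.lower() + ".example"], [n.lower() + "-links.example"]) for n in OTHERS}},
]


def prune(patterns, candidates):
    """Drop sources whose authorities cannot match at a shared join variable."""
    pruned = [dict(c) for c in candidates]
    variables = {t for s, _, o in patterns for t in (s, o) if t.startswith("?")}
    for var in sorted(variables):
        uses = [(i, pos) for i, (s, _, o) in enumerate(patterns)
                for pos, term in (("s", s), ("o", o)) if term == var]
        if len(uses) < 2:
            continue  # not a join variable
        per_edge = [set().union(*(a[pos] for a in pruned[i].values())) for i, pos in uses]
        common = set.intersection(*per_edge)
        for i, pos in uses:
            pruned[i] = {src: a for src, a in pruned[i].items() if a[pos] & common}
    return pruned


result = prune(PATTERNS, CANDIDATES)
print(sum(len(c) for c in result), sorted(result[4]))  # prints: 5 ['NYT']
```

No request is sent to any endpoint here: the decision relies only on authorities that would already be stored in the data summaries.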
Experimental Setup
• Benchmark
– FedBench
– Collection of real-world datasets
– Real queries reflecting typical requests
– Used all 25 queries
• HiBISCuS Extensions
– FedX 2.0
– DARQ
– SPLENDID
• We also included ANAPSID (version of 12/2013)
without the HiBISCuS extension
Experimental Setup
• Metrics
– Index generation time
– Index size
– Total triple pattern-wise sources selected
– Total number of SPARQL ASK requests used
– Source selection time
– Query execution time
Index Generation Time and Compression Ratio
System      Index generation time (min)   Compression ratio (%)
FedX        NA                            NA
SPLENDID    75                            99.998
LHD         75                            99.998
DARQ        102                           99.997
ANAPSID     6                             99.999
ADERIS      6                             99.999
Qtree       -                             96.000
HiBISCuS    36                            99.997
Compression ratio = 1 - index size / total data dump size (reported as a percentage)
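The compression ratio formula can be written as a one-line function. The sizes used in the example call are hypothetical round numbers chosen for illustration; they are not the measured index or dump sizes from the evaluation.

```python
def compression_ratio(index_size_bytes: float, dump_size_bytes: float) -> float:
    """Compression ratio in percent: (1 - index size / total data dump size) * 100."""
    if dump_size_bytes <= 0:
        raise ValueError("total data dump size must be positive")
    return (1 - index_size_bytes / dump_size_bytes) * 100


# Hypothetical sizes: a 3 MB index over a 100 GB data dump.
ratio = compression_ratio(3_000_000, 100_000_000_000)
print(round(ratio, 3))  # prints: 99.997
```

A ratio close to 100% means the index is tiny compared with the data it summarizes.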
Efficient Source Selection
#TP = total triple pattern-wise sources selected, #AR = total number of SPARQL ASK requests used, SST = source selection time
FedBench Cross Domain Queries
System             #TP    #AR    SST (ms)
FedX (cold)         78    252    317.8
FedX (warm)         78      0      7.3
SPLENDID            78     99    320.8
DARQ                84      0      7.2
ANAPSID             36     43    186
HiBISCuS (cold)     35     27     63
HiBISCuS (warm)     35      0     30.4
FedBench Life Sciences Queries
System             #TP    #AR    SST (ms)
FedX (cold)         56    297    375.5
FedX (warm)         56      0      8.2
SPLENDID            56     90    307.2
DARQ                77      0      7.5
ANAPSID             44     63    477.4
HiBISCuS (cold)     41     18     31.8
HiBISCuS (warm)     41      0     23.1
FedBench Linked Data Queries
System             #TP    #AR    SST (ms)
FedX (cold)         97    369    307.8
FedX (warm)         97      0      8
SPLENDID            97    126    279
DARQ               113      0      7.7
ANAPSID             54     37    803.5
HiBISCuS (cold)     47      9     20.6
HiBISCuS (warm)     47      0     16
Overall
System             #TP    #AR    SST (ms)
FedX (cold)        231    918    330
FedX (warm)        231      0      8
SPLENDID           231    315    299
DARQ               274      0      7.5
ANAPSID            134    143    554
HiBISCuS (cold)    123     54     35.6
HiBISCuS (warm)    123      0     22
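As a consistency check on the tables above, the "Overall" #TP and #AR values should equal the sums over the three query categories. The short script below copies the numbers from the tables and verifies this; the variable names are chosen here for illustration.

```python
# (#TP, #AR) per category: Cross Domain, Life Sciences, Linked Data.
PER_CATEGORY = {
    "FedX (cold)": [(78, 252), (56, 297), (97, 369)],
    "FedX (warm)": [(78, 0), (56, 0), (97, 0)],
    "SPLENDID": [(78, 99), (56, 90), (97, 126)],
    "DARQ": [(84, 0), (77, 0), (113, 0)],
    "ANAPSID": [(36, 43), (44, 63), (54, 37)],
    "HiBISCuS (cold)": [(35, 27), (41, 18), (47, 9)],
    "HiBISCuS (warm)": [(35, 0), (41, 0), (47, 0)],
}
# (#TP, #AR) from the "Overall" table.
OVERALL = {
    "FedX (cold)": (231, 918),
    "FedX (warm)": (231, 0),
    "SPLENDID": (231, 315),
    "DARQ": (274, 0),
    "ANAPSID": (134, 143),
    "HiBISCuS (cold)": (123, 54),
    "HiBISCuS (warm)": (123, 0),
}


def totals(rows):
    """Sum the #TP and #AR columns over all categories."""
    return sum(tp for tp, _ in rows), sum(ar for _, ar in rows)


mismatches = [name for name, rows in PER_CATEGORY.items() if totals(rows) != OVERALL[name]]
print(mismatches)  # prints: []
```

The empty list confirms that the overall counts are plain sums of the per-category counts.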
FedX Extension with HiBISCuS
Improvement in 20/25 queries with net performance improvement 24.61%
[Figure: query execution time (ms) for the FedBench queries CD1-CD7, LS1-LS7, LD1-LD11, and their average, comparing FedX (warm) with FedX+HiBISCuS.]
SPLENDID Extension with HiBISCuS
Improvement in 25/25 queries with net performance improvement 82.72%
[Figure: query execution time (ms) for the FedBench queries CD1-CD7, LS1-LS7, LD1-LD11, and their average, comparing SPLENDID with SPLENDID+HiBISCuS.]
DARQ Extension with HiBISCuS
Improvement in 21/21 queries with net performance improvement 92.22%
[Figure: query execution time (ms, log scale, axis display units: hundreds) for the FedBench queries CD1-CD7, LS1-LS7, LD1-LD11, and their average, comparing DARQ with DARQ+HiBISCuS. Some queries are marked "Not supported", "Runtime error", or "Timeout".]
SPLENDID+HiBISCuS vs ANAPSID
Improvement in 24/24 queries with net performance improvement 98%
[Figure: query execution time (ms, log scale, axis display units: hundreds) for the FedBench queries CD1-CD7, LS1-LS7, LD1-LD11, and their average, comparing ANAPSID with SPLENDID+HiBISCuS. One annotation reads "Zero results".]
Conclusion and Future Work
• An overestimation of triple pattern-wise source
selection can be expensive
• Join-aware triple pattern-wise source selection is
more efficient than simple triple pattern-wise source
selection
• Performance improvements
– FedX: 24.61%
– SPLENDID: 82.72%
– DARQ: 92.22%
– SPLENDID+HiBISCuS is 98% faster than ANAPSID
• BigRDFBench is on the roadmap
Thanks!
homepage and source code:
https://code.google.com/p/hibiscusfederation/
{saleem,ngonga}@informatik.uni-leipzig.de
AKSW, University of Leipzig, Germany
Source Selection
• Triple pattern-wise source selection
– Ensures 100% recall
– Can over-estimate capable sources
– Can be expensive, e.g., total number of SPARQL ASK
requests used
– Performed by FedX, SPLENDID, LHD, DARQ, ADERIS etc.
• Join-aware triple pattern-wise source selection
– Ensures 100% recall
– May select optimal or close-to-optimal capable sources
– Can be expensive, e.g., total number of SPARQL ASK
requests used
– Can significantly reduce the query execution time
– Performed by ANAPSID
