Federated SPARQL Queries
Processing with Replicated
Fragments
Gabriela Montoya (1)
Hala Skaf-Molli (1)
Pascal Molli (1)
Maria-Esther Vidal (2)
(1) LINA – Nantes University, France
({first.last}@univ-nantes.fr)
(2) Universidad Simón Bolívar, Venezuela
(mvidal@ldc.usb.ve)
October 13, 2015
ISWC2015
Federated Query Engines poorly support
replication
Federated query engines allow users to
consume linked data without
moving it.
Unfortunately, in the presence of
replication, the performance of
federated query engines
degrades.
Replicated data decreases federated query
engine performance
select distinct ?p ?m ?n ?d where {
  ?p dbprop:name ?m .
  ?p dbprop:nationality ?n .
  ?p dbprop:doctoralAdvisor ?d
}
#DBpedia replicas    FedX [1] execution time (ms)
1                    1,392
2                    215,907
[1] Schwarte et al. FedX: Optimization techniques for federated query processing on
linked data. In ISWC 2011.
Users may replicate only the fragments
relevant for their queries
A triple pattern fragment is defined by the
dataset it has been replicated from and a
CONSTRUCT query with a triple pattern.
Fragment with the doctoral advisors triples:
<http://dbpedia.org/sparql, CONSTRUCT
WHERE { ?p dbprop:doctoralAdvisor ?a }>
Replicating fragments from different datasets
provides new data localities and opens new
opportunities for optimization.
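The fragment definition above can be pictured as a small record. The sketch below is illustrative only: the class name `Fragment` and its field names are assumptions, not part of the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    """A triple pattern fragment: the source dataset plus one triple pattern."""
    source: str          # endpoint the fragment was replicated from
    triple_pattern: str  # e.g. "?p dbprop:doctoralAdvisor ?a"

    def construct_query(self) -> str:
        # CONSTRUCT WHERE is the SPARQL 1.1 short form in which the
        # template and the pattern are the same.
        return f"CONSTRUCT WHERE {{ {self.triple_pattern} }}"


# The doctoral advisors fragment from the slide.
advisors = Fragment("http://dbpedia.org/sparql", "?p dbprop:doctoralAdvisor ?a")
print(advisors.source)
print(advisors.construct_query())
```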
Existing public endpoints may not be the
best choice for federated queries
[Diagram: endpoints DBpedia, LinkedMDB, and C1; the client issues the query below]
select distinct * where {
  ?director dbo:nationality ?nat .
  ?film dbo:director ?director .
  ?movie owl:sameAs ?film .
  ?movie linkedmdb:genre ?genre }
Endpoints that replicate fragments give
rise to new data localities
C1 replicates the fragments:
?d dbo:nationality ?n
?f dbo:director ?d
?m owl:sameAs ?f
?m linkedmdb:genre ?g
Selecting all the sources leads to poor
engine performance
Triples to transfer per source, for source selections s1 to s5:
Source               s1        s2       s3       s4       s5
DBpedia              166,177   3,229    3,229    0        0
LinkedMDB            76,180    13,430   0        13,430   0
C1                   242,357   0        13,430   3,229    48
Execution time (s)   20.22     2.64     2.50     2.79     0.65
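To compare the selections at a glance, the per-source counts from the table can be summed per selection. This is a small arithmetic sketch over the slide's numbers; treating the column sum as the total transfer of a selection is an assumption.

```python
# Triples transferred per source for selections s1..s5 (numbers from the table).
transferred = {
    "DBpedia":   [166_177, 3_229, 3_229, 0, 0],
    "LinkedMDB": [76_180, 13_430, 0, 13_430, 0],
    "C1":        [242_357, 0, 13_430, 3_229, 48],
}
selections = ["s1", "s2", "s3", "s4", "s5"]

# Column-wise sum: total number of triples each selection transfers.
totals = [sum(column) for column in zip(*transferred.values())]
for name, total in zip(selections, totals):
    print(f"{name}: {total:,} triples")
```

Under that reading, s2, s3, and s4 transfer the same total, while s5, where C1 evaluates all the joins, transfers only 48 triples.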
Selecting non-overlapping data may not
be good enough
Selecting sources able to evaluate joins
reduces the number of transferred tuples
The best choice transfers fewer intermediate
results
???
[Diagram: endpoints DBpedia, LinkedMDB, C1, C2, and C3]
select distinct
?director ?nat ?genre where {
  ?director dbo:nationality ?nat .     (tp1)
  ?film dbo:director ?director .       (tp2)
  ?movie owl:sameAs ?film .            (tp3)
  ?movie linkedmdb:genre ?genre }      (tp4)
Endpoint   Replicated fragments   Relevant for
C1         f2, f6, f4             tp1, tp2, tp4
C2         f2, f7, f3, f5         tp1, tp2, tp3, tp4
C3         f3, f4, f2             tp2, tp3, tp4
Fragment   CONSTRUCT WHERE { %s% }
f2         ?film dbo:director ?director
f3         ?movie owl:sameAs ?film
f4         ?movie linkedmdb:genre ?genre
f5         ?movie linkedmdb:genre film_genre:14
f6         ?director dbo:nationality dbr:France
f7         ?director dbo:nationality dbr:United_Kingdom
Selecting fewer endpoints does not always
produce fewer intermediate results
Triple pattern                                  Fragment   Endpoints
?director dbo:nationality dbr:France            f5         C2, C4
?film dbo:director ?director                    f2         C3, C4, C5, C6
?director dbo:nationality dbr:United_Kingdom    f6         C2, C5
?film dbo:director ?director                    f2         C3, C4, C5, C6
?director dbo:nationality dbr:United_States     f7         C2, C6
?film dbo:director ?director                    f2         C3, C4, C5, C6
Triple pattern wise source selection misses
data localities
Triples to transfer per endpoint, for source selections s1 and s2:
Endpoint   s1        s2
C2         27,462    0
C3         238,077   0
C4         0         141
C5         0         103
C6         0         1,026
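The numbers in the table make the point concrete: the selection that contacts fewer endpoints transfers far more triples. The short sketch below recomputes this from the slide's figures; inferring the contacted endpoints from the non-zero entries is an assumption.

```python
# Triples transferred per endpoint for selections s1 and s2 (from the table).
per_endpoint = {
    "C2": [27_462, 0],
    "C3": [238_077, 0],
    "C4": [0, 141],
    "C5": [0, 103],
    "C6": [0, 1_026],
}

summary = {}
for index, selection in enumerate(["s1", "s2"]):
    # Endpoints with a non-zero count are taken to be the ones contacted.
    contacted = [ep for ep, counts in per_endpoint.items() if counts[index] > 0]
    total = sum(counts[index] for counts in per_endpoint.values())
    summary[selection] = (contacted, total)
    print(f"{selection}: {len(contacted)} endpoints, {total:,} triples")
```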
Selecting endpoints in a BGP wise fashion
reduces the intermediate results
Source Selection Problem with Fragment
Replication (SSP-FR)
Given a SPARQL query and a set of SPARQL
endpoints with replicated fragments, choose the
SPARQL endpoints to contact for each query triple
pattern in order to produce a complete query
answer and transfer the minimum amount of
data.
Fedra performs a BGP aware source
selection, and exploits fragment localities
to reduce intermediate results
1. Fedra selects relevant fragments per triple
pattern and prunes fragments using query
containment.
2. Multiple relevant fragments → UNION
Reduction: try to reduce to one fragment.
3. One relevant fragment → BGP Reduction:
reduce to the set covering problem, so that the BGP is
evaluated at as few endpoints as possible.
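Step 1 relies on query containment between fragments. For single triple patterns, containment can be checked position by position. The sketch below is one possible check, not Fedra's implementation, which the slides do not show; the function names and the tuple encoding of triple patterns are assumptions.

```python
def is_variable(term: str) -> bool:
    return term.startswith("?")


def contained_in(specific: tuple, general: tuple) -> bool:
    """True if every triple matching `specific` also matches `general`."""
    binding = {}
    for s_term, g_term in zip(specific, general):
        if is_variable(g_term):
            # A variable of the general pattern must map to one term only.
            if binding.setdefault(g_term, s_term) != s_term:
                return False
        elif g_term != s_term:
            # Constants of the general pattern must appear unchanged.
            return False
    return True


all_nationalities = ("?director", "dbo:nationality", "?nat")
french_only = ("?director", "dbo:nationality", "dbr:France")

print(contained_in(french_only, all_nationalities))
print(contained_in(all_nationalities, french_only))
```

Here the fragment restricted to dbr:France is contained in the fragment with a variable object, but not the other way around.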
BGP Reduction
BGP   Triple pattern                      Relevant fragments   Relevant endpoints
tp1   ?director dbo:nationality ?nat      f1                   { C1 }
tp2   ?film dbo:director ?director        f2                   { C1, C3 }
tp3   ?movie owl:sameAs ?film             f3                   { C1, C2 }
tp4   ?movie linkedmdb:genre ?genre       f4                   { C2, C4 }
f1: <dbpedia, ?director dbo:nationality ?nat>
f2: <dbpedia, ?film dbo:director ?director>
f3: <linkedmdb, ?movie owl:sameAs ?film>
f4: <linkedmdb, ?movie linkedmdb:genre ?genre>
fragments mapping = { (f1, {C1}), (f2, {C1, C3}),
(f3, {C1, C2}), (f4, {C2, C4}) }
S = { tp1, tp2, tp3, tp4 }
C_C1 = { tp1, tp2, tp3 }
C_C2 = { tp3, tp4 }
C_C3 = { tp2 }
C_C4 = { tp4 }
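The set covering instance above, with S as the universe and one subset of triple patterns per endpoint, can be solved approximately with the classic greedy heuristic. The slides only state that the problem is reduced to set covering; the greedy strategy, the alphabetical tie-break, and the name `greedy_set_cover` are assumptions made for this sketch.

```python
def greedy_set_cover(universe, subsets):
    """Pick endpoints until every triple pattern in `universe` is covered."""
    uncovered = set(universe)
    chosen = []
    while uncovered:
        # Endpoint covering the most uncovered triple patterns;
        # sorted() makes ties resolve alphabetically.
        best = max(sorted(subsets), key=lambda ep: len(subsets[ep] & uncovered))
        if not subsets[best] & uncovered:
            raise ValueError("some triple patterns cannot be covered")
        chosen.append(best)
        uncovered -= subsets[best]
    return chosen


S = {"tp1", "tp2", "tp3", "tp4"}
covers = {
    "C1": {"tp1", "tp2", "tp3"},
    "C2": {"tp3", "tp4"},
    "C3": {"tp2"},
    "C4": {"tp4"},
}
selected = greedy_set_cover(S, covers)
print(selected)
```

With these sets the heuristic first picks C1, which covers three triple patterns, and then one endpoint for tp4. C2 and C4 are equally good for that last step, and the tie-break picks C2.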
Union Reduction
BGP   Triple pattern                      Relevant fragments   Relevant endpoints
tp1   ?director dbo:nationality ?nat      f5                   { C2 }
                                          f6                   { C1 }
tp2   ?film dbo:director ?director        f2                   { C1, C3 }
tp3   ?movie owl:sameAs ?film             f3                   { C1, C2, C4 }
tp4   ?movie linkedmdb:genre ?genre       f4                   { C2 }
f2: <dbpedia, ?film dbo:director ?director>
f3: <linkedmdb, ?movie owl:sameAs ?film>
f4: <linkedmdb, ?movie linkedmdb:genre ?genre>
f5: <dbpedia, ?director dbo:nationality dbr:France>
f6: <dbpedia, ?director dbo:nationality dbr:United_Kingdom>
fragments mapping = { (f2, {C1, C2}), (f3, {C1}),
(f4, {C1}), (f5, {C2}), (f6, {C1}) }
BGP Reduction
S = { tp2, tp3, tp4 }
C_C1 = { tp2, tp3 }
C_C2 = { tp3, tp4 }
C_C3 = { tp2 }
C_C4 = { tp3 }
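The sets above can be derived mechanically from the table of relevant fragments and endpoints. The sketch below uses one rule that reproduces the slide's sets: a triple pattern enters S only if some endpoint holds all of its relevant fragments. This rule is an assumption inferred from the example, not a statement of how Fedra's Union Reduction is implemented, and the function name is hypothetical.

```python
def build_set_cover_instance(relevant):
    """relevant maps each triple pattern to {fragment: set of endpoints}."""
    universe = []
    covers = {}
    for tp, fragments in relevant.items():
        # Endpoints that hold every fragment relevant for this triple pattern.
        common = set.intersection(*fragments.values())
        if not common:
            continue  # no single endpoint can answer tp alone; handled elsewhere
        universe.append(tp)
        for endpoint in common:
            covers.setdefault(endpoint, set()).add(tp)
    return universe, covers


relevant = {
    "tp1": {"f5": {"C2"}, "f6": {"C1"}},
    "tp2": {"f2": {"C1", "C3"}},
    "tp3": {"f3": {"C1", "C2", "C4"}},
    "tp4": {"f4": {"C2"}},
}
universe, covers = build_set_cover_instance(relevant)
print(universe)
print(covers)
```

In this example tp1 is left out because f5 and f6 share no endpoint, which matches S = { tp2, tp3, tp4 }.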
SPARQL Endpoints Federations with
Random Distribution of Fragments
[Diagram: a Dataset served by an LDFServer, endpoints C1, C2, ..., C10, and a RandomQueryGenerator]
Datasets:
Diseasome
SWDF
LinkedMDB
GeoCoordinates
WatDiv (10^5 triples)
WatDiv (10^7 triples)
Random queries generated per endpoint:
q^1_1, ..., q^1_100    q^2_1, ..., q^2_100    ...    q^10_1, ..., q^10_100
SELECT * WHERE {
  ?x1 rdfs:label ?x2 .
  ?x1 diseasome:geneId ?x3 .
  ?x1 diseasome:hgncId hgnc:5208
}
CONSTRUCT WHERE {
  ?x1 rdfs:label ?x2 }
CONSTRUCT WHERE {
  ?x1 diseasome:geneId ?x3 }
CONSTRUCT WHERE {
  ?x1 diseasome:hgncId hgnc:5208 }
Replication Factor = 3
Proxies are used to count the number of
transferred tuples
[Diagram: endpoints C1, C2, ..., C10, one Proxy per endpoint, the Client, and the RandomQueryGenerator]
Client evaluates random queries
Fuseki 1.1.1 endpoints
Queries evaluated by the client: q^c_1, ..., q^c_100
Federated Query Engines are used to
perform query evaluation
ANAPSID
FEDRA + ANAPSID
DAW + ANAPSID
FedX
FEDRA + FedX
DAW + FedX
Selecting fewer sources transfers less
redundant data
FEDRA should select fewer
sources than the engines
and DAW.
FEDRA uses known replicated fragments
to effectively reduce the number of
selected sources
[Plot: Number of Selected Sources (axis 0 to 40) for FEDRA+ANAPSID, DAW+ANAPSID, and ANAPSID on Diseasome, Geocoordinates, LinkedMDB, SWDF, WatDiv1, and WatDiv100]
Replicated fragments give FEDRA a
perfect summary of endpoints data
[Plot: Number of Selected Sources (axis 0 to 40) for FEDRA+FedX, DAW+FedX, and FedX on Diseasome, Geocoordinates, LinkedMDB, SWDF, WatDiv1, and WatDiv100]
Number of Transferred Tuples matters
Using FEDRA for source
selection should reduce the
number of transferred tuples
during query evaluation.
FEDRA has delegated join evaluation to
the endpoints
[Plot: Number of Transferred Tuples (axis ticks 10^0 to 10^6) for FEDRA+ANAPSID, DAW+ANAPSID, and ANAPSID on Diseasome, Geocoordinates, LinkedMDB, SWDF, WatDiv1, and WatDiv100]
FEDRA achieves a great reduction on the
number of transferred tuples
[Plot: Number of Transferred Tuples (axis ticks 10^0 to 10^6) for FEDRA+FedX, DAW+FedX, and FedX on Diseasome, Geocoordinates, LinkedMDB, SWDF, WatDiv1, and WatDiv100]
Conclusions
We addressed the problem of partial replication
in Linked Data.
Fedra performs a BGP aware source
selection, and exploits fragment localities to
reduce intermediate results.
Experimental results demonstrated that
Fedra achieves a great reduction of the
number of selected sources and the number of
transferred tuples by ANAPSID and FedX.
Perspectives
Take into account replicated fragments that
diverge.
Take into account preferences about the
endpoints.
Take advantage of replicated data for parallel
query processing.
Questions?
Results in the next slides are from a different setup,
where Virtuoso 7.2.1 endpoints were used and each
endpoint was deployed on a different cluster machine.
ANAPSID Source Selection Time
[Plot: Source Selection Time in secs (axis ticks 2 to 6) for FEDRA+ANAPSID, DAW+ANAPSID, and ANAPSID on Diseasome, GeoCoordinates, LinkedMDB, SWDF, WatDiv1, and WatDiv100]
FedX Source Selection Time
[Plot: Source Selection Time in secs (axis ticks 1.0 to 2.0) for FEDRA+FedX, DAW+FedX, and FedX on Diseasome, GeoCoordinates, LinkedMDB, SWDF, WatDiv1, and WatDiv100]
ANAPSID Execution Time
[Plot: Execution Time in secs (axis ticks 10^0 to 10^2) for FEDRA+ANAPSID, DAW+ANAPSID, and ANAPSID on Diseasome, GeoCoordinates, LinkedMDB, SWDF, WatDiv1, and WatDiv100]
FedX Execution Time
[Plot: Execution Time in secs (axis ticks 10^1 to 10^3) for FEDRA+FedX, DAW+FedX, and FedX on Diseasome, GeoCoordinates, LinkedMDB, SWDF, WatDiv1, and WatDiv100]
FEDRA computes alternative sources per
fragment
[Diagram: endpoints DBpedia, LinkedMDB, and C1; C1 holds the triple patterns below]
?d dbo:nationality ?n
?f dbo:director ?d
?m owl:sameAs ?f
?m linkedmdb:genre ?g
f1: <dbpedia, ?director dbo:nationality ?nat>
f2: <dbpedia, ?film dbo:director ?director>
f3: <linkedmdb, ?movie owl:sameAs ?film>
f4: <linkedmdb, ?movie linkedmdb:genre ?genre>
fragments mapping = { (f1, {DBpedia, C1}), (f2, {DBpedia, C1}),
(f3, {LinkedMDB, C1}), (f4, {LinkedMDB, C1}) }
Alternative Endpoints per Fragment are
Considered
BGP   Triple pattern                      Relevant fragments   Relevant endpoints
tp1   ?director dbo:nationality ?nat      f1                   { DBpedia, C1 }
tp2   ?film dbo:director ?director        f2                   { DBpedia, C1 }
tp3   ?movie owl:sameAs ?film             f3                   { LinkedMDB, C1 }
tp4   ?movie linkedmdb:genre ?genre       f4                   { LinkedMDB, C1 }
SSP is reduced to the set covering problem
S = { tp1, tp2, tp3, tp4 }
C_C1 = { tp1, tp2, tp3, tp4 }
C_DBpedia = { tp1, tp2 }
C_LinkedMDB = { tp3, tp4 }
Endpoints that can evaluate more joins
are chosen
It may be necessary to simplify to get the
best selection
BGP   Triple pattern                      Relevant fragments   Relevant endpoints
tp1   ?director dbo:nationality ?nat      f5                   { C1, C2 }
                                          f6                   { C1 }
tp2   ?film dbo:director ?director        f2                   { C1, C3 }
tp3   ?movie owl:sameAs ?film             f3                   { C1, C2, C4 }
tp4   ?movie linkedmdb:genre ?genre       f4                   { C2 }
f2: <dbpedia, ?film dbo:director ?director>
f3: <linkedmdb, ?movie owl:sameAs ?film>
f4: <linkedmdb, ?movie linkedmdb:genre ?genre>
f5: <dbpedia, ?director dbo:nationality dbr:France>
f6: <dbpedia, ?director dbo:nationality dbr:United_Kingdom>
fragments mapping = { (f2, {C1, C2}), (f3, {C1}),
(f4, {C1}), (f5, {C1, C2}), (f6, {C1}) }
S = { tp1, tp2, tp3, tp4 }
C_C1 = { tp1, tp2, tp3 }
C_C2 = { tp3, tp4 }
C_C3 = { tp2 }
C_C4 = { tp3 }
Statistical Significance of Data Redundancy Minimization
H0: Fedra selects the same number of sources as DAW does
Ha: Fedra selects fewer sources than DAW
p-values per federation and engine:
Federation       ANAPSID     FedX
Diseasome        1.811e-08   8.371e-09
SWDF             2.28e-10    5.386e-11
LinkedMDB        5.082e-09   5.254e-11
Geocoordinates   1.301e-05   1.301e-05
WatDiv1          6.209e-07   1.006e-07
WatDiv100        1.563e-05   3.623e-07
For all the federations and engines, the obtained p-values
allow us to reject the null hypothesis (H0) in favor of the
alternative hypothesis (Ha).
The Wilcoxon signed rank test was computed using R.
Statistical Significance of Data Transfer Minimization
H0: using sources selected by Fedra leads to transferring the same
number of tuples as using sources selected by DAW
Ha: using sources selected by Fedra leads to transferring fewer
tuples than using sources selected by DAW
p-values per federation and engine:
Federation       ANAPSID     FedX
Diseasome        3.314e-12   2.821e-06
SWDF             1.472e-08   0.7621
LinkedMDB        2.368e-08   0.001274
Geocoordinates   1.921e-05   1.183e-06
WatDiv1          8.431e-05   7.246e-09
WatDiv100        9.986e-06   0.0001301
For all the federations and engines except SWDF+FedX, the
obtained p-values allow us to reject the null hypothesis (H0) in
favor of the alternative hypothesis (Ha).
The Wilcoxon signed rank test was computed using R.
Statistical Significance of Source Selection Time Reduction
H0: using sources selected by Fedra leads to the same source
selection time as using sources selected by DAW
Ha: using sources selected by Fedra leads to lower source
selection time than using sources selected by DAW
p-values per federation and engine:
Federation       ANAPSID     FedX
Diseasome        1           < 2.2e-16
SWDF             1           < 2.2e-16
LinkedMDB        1.284e-11   < 2.2e-16
Geocoordinates   < 2.2e-16   < 2.2e-16
WatDiv1          1           < 2.2e-16
WatDiv100        < 2.2e-16   < 2.2e-16
For all the federations and engines except Diseasome+ANAPSID, SWDF+ANAPSID
and WatDiv1+ANAPSID, the obtained p-values allow us to reject the null hypothesis
(H0) in favor of the alternative hypothesis (Ha).
The Wilcoxon signed rank test was computed using R.
Statistical Significance of Execution Time Reduction
H0: using sources selected by Fedra leads to the same
execution time as using sources selected by DAW
Ha: using sources selected by Fedra leads to lower execution
time than using sources selected by DAW
p-values per federation and engine:
Federation       ANAPSID     FedX
Diseasome        0.0001547   < 2.2e-16
SWDF             1           6.794e-06
LinkedMDB        < 2.2e-16   9.223e-15
Geocoordinates   < 2.2e-16   7.87e-13
WatDiv1          1           6.315e-16
WatDiv100        5.392e-09   1.384e-14
For all the federations and engines except SWDF+ANAPSID and
WatDiv1+ANAPSID, the obtained p-values allow us to reject the null hypothesis (H0)
in favor of the alternative hypothesis (Ha).
The Wilcoxon signed rank test was computed using R.
Source Selection may not be enough
?director dbo:nationality ?nat
?film dbo:director ?director
f2: <dbpedia, ?film dbo:director ?director>
f5: <dbpedia, ?director dbo:nationality dbr:France>
f6: <dbpedia, ?director dbo:nationality dbr:United_Kingdom>
fragments mapping = { (f2, {C1, C2}), (f5, {C1}),
(f6, {C2}) }
Relevant fragments and endpoints per triple pattern:
?director dbo:nationality ?nat    f5 {C1}, f6 {C2}
?film dbo:director ?director      f2 {C1, C2}

More Related Content

What's hot

2009 11 06 3gpp Ietf Ipv6 Shanghai Nat64
2009 11 06 3gpp Ietf Ipv6 Shanghai Nat642009 11 06 3gpp Ietf Ipv6 Shanghai Nat64
2009 11 06 3gpp Ietf Ipv6 Shanghai Nat64yacc2000
 
Text as Data: processing the Hebrew Bible
Text as Data: processing the Hebrew BibleText as Data: processing the Hebrew Bible
Text as Data: processing the Hebrew BibleDirk Roorda
 
2014.10 - Towards Description Set Profiles for RDF Using SPARQL as Intermedia...
2014.10 - Towards Description Set Profiles for RDF Using SPARQL as Intermedia...2014.10 - Towards Description Set Profiles for RDF Using SPARQL as Intermedia...
2014.10 - Towards Description Set Profiles for RDF Using SPARQL as Intermedia...Dr.-Ing. Thomas Hartmann
 
27.2.10 lab extract an executable from a pcap
27.2.10 lab   extract an executable from a pcap27.2.10 lab   extract an executable from a pcap
27.2.10 lab extract an executable from a pcapFreddy Buenaño
 
Tutorial of SF-TAP Flow Abstractor
Tutorial of SF-TAP Flow AbstractorTutorial of SF-TAP Flow Abstractor
Tutorial of SF-TAP Flow AbstractorYuuki Takano
 
The linux networking architecture
The linux networking architectureThe linux networking architecture
The linux networking architecturehugo lu
 
27.1.5 lab convert data into a universal format
27.1.5 lab   convert data into a universal format27.1.5 lab   convert data into a universal format
27.1.5 lab convert data into a universal formatFreddy Buenaño
 
SF-TAP: Scalable and Flexible Traffic Analysis Platform (USENIX LISA 2015)
SF-TAP: Scalable and Flexible Traffic Analysis Platform (USENIX LISA 2015)SF-TAP: Scalable and Flexible Traffic Analysis Platform (USENIX LISA 2015)
SF-TAP: Scalable and Flexible Traffic Analysis Platform (USENIX LISA 2015)Yuuki Takano
 
On unifying query languages for RDF streams
On unifying query languages for RDF streamsOn unifying query languages for RDF streams
On unifying query languages for RDF streamsDaniele Dell'Aglio
 
Dynamic Load-balancing On Graphics Processors
Dynamic Load-balancing On Graphics ProcessorsDynamic Load-balancing On Graphics Processors
Dynamic Load-balancing On Graphics Processorsdaced
 
Kernel Recipes 2013 - Nftables, what motivations and what solutions
Kernel Recipes 2013 - Nftables, what motivations and what solutionsKernel Recipes 2013 - Nftables, what motivations and what solutions
Kernel Recipes 2013 - Nftables, what motivations and what solutionsAnne Nicolas
 
Analysis of BSDL-based content adaptation for JPEG 2000 and HD Photo (JPEG XR)
Analysis of BSDL-based content adaptation for JPEG 2000 and HD Photo (JPEG XR)Analysis of BSDL-based content adaptation for JPEG 2000 and HD Photo (JPEG XR)
Analysis of BSDL-based content adaptation for JPEG 2000 and HD Photo (JPEG XR)Wesley De Neve
 
High-Performance Physics Solver Design for Next Generation Consoles
High-Performance Physics Solver Design for Next Generation ConsolesHigh-Performance Physics Solver Design for Next Generation Consoles
High-Performance Physics Solver Design for Next Generation ConsolesSlide_N
 

What's hot (17)

2009 11 06 3gpp Ietf Ipv6 Shanghai Nat64
2009 11 06 3gpp Ietf Ipv6 Shanghai Nat642009 11 06 3gpp Ietf Ipv6 Shanghai Nat64
2009 11 06 3gpp Ietf Ipv6 Shanghai Nat64
 
Text as Data: processing the Hebrew Bible
Text as Data: processing the Hebrew BibleText as Data: processing the Hebrew Bible
Text as Data: processing the Hebrew Bible
 
defense
defensedefense
defense
 
Serialization in Go
Serialization in GoSerialization in Go
Serialization in Go
 
2014.10 - Towards Description Set Profiles for RDF Using SPARQL as Intermedia...
2014.10 - Towards Description Set Profiles for RDF Using SPARQL as Intermedia...2014.10 - Towards Description Set Profiles for RDF Using SPARQL as Intermedia...
2014.10 - Towards Description Set Profiles for RDF Using SPARQL as Intermedia...
 
27.2.10 lab extract an executable from a pcap
27.2.10 lab   extract an executable from a pcap27.2.10 lab   extract an executable from a pcap
27.2.10 lab extract an executable from a pcap
 
Sockets and Socket-Buffer
Sockets and Socket-BufferSockets and Socket-Buffer
Sockets and Socket-Buffer
 
Tutorial of SF-TAP Flow Abstractor
Tutorial of SF-TAP Flow AbstractorTutorial of SF-TAP Flow Abstractor
Tutorial of SF-TAP Flow Abstractor
 
The linux networking architecture
The linux networking architectureThe linux networking architecture
The linux networking architecture
 
27.1.5 lab convert data into a universal format
27.1.5 lab   convert data into a universal format27.1.5 lab   convert data into a universal format
27.1.5 lab convert data into a universal format
 
SF-TAP: Scalable and Flexible Traffic Analysis Platform (USENIX LISA 2015)
SF-TAP: Scalable and Flexible Traffic Analysis Platform (USENIX LISA 2015)SF-TAP: Scalable and Flexible Traffic Analysis Platform (USENIX LISA 2015)
SF-TAP: Scalable and Flexible Traffic Analysis Platform (USENIX LISA 2015)
 
On unifying query languages for RDF streams
On unifying query languages for RDF streamsOn unifying query languages for RDF streams
On unifying query languages for RDF streams
 
Dynamic Load-balancing On Graphics Processors
Dynamic Load-balancing On Graphics ProcessorsDynamic Load-balancing On Graphics Processors
Dynamic Load-balancing On Graphics Processors
 
Kernel Recipes 2013 - Nftables, what motivations and what solutions
Kernel Recipes 2013 - Nftables, what motivations and what solutionsKernel Recipes 2013 - Nftables, what motivations and what solutions
Kernel Recipes 2013 - Nftables, what motivations and what solutions
 
Analysis of BSDL-based content adaptation for JPEG 2000 and HD Photo (JPEG XR)
Analysis of BSDL-based content adaptation for JPEG 2000 and HD Photo (JPEG XR)Analysis of BSDL-based content adaptation for JPEG 2000 and HD Photo (JPEG XR)
Analysis of BSDL-based content adaptation for JPEG 2000 and HD Photo (JPEG XR)
 
High-Performance Physics Solver Design for Next Generation Consoles
High-Performance Physics Solver Design for Next Generation ConsolesHigh-Performance Physics Solver Design for Next Generation Consoles
High-Performance Physics Solver Design for Next Generation Consoles
 
Nephele 2.0: How to get the most out of your Nephele results
Nephele 2.0: How to get the most out of your Nephele resultsNephele 2.0: How to get the most out of your Nephele results
Nephele 2.0: How to get the most out of your Nephele results
 

Federated SPARQL Query Processing With Replicated Fragment

  • 1. Federated SPARQL Queries Processing with Replicated Fragments. Gabriela Montoya (1), Hala Skaf-Molli (1), Pascal Molli (1), Maria-Esther Vidal (2). (1) LINA – Nantes University, France ({first.last}@univ-nantes.fr); (2) Universidad Simón Bolívar, Venezuela (mvidal@ldc.usb.ve). October 13, 2015. ISWC2015
  • 2. Federated query engines poorly support replication. Federated query engines make it possible to consume linked data without moving the data. Unfortunately, in the presence of replication, their performance degrades.
  • 3. Replicated data decreases federated query engine performance
      select distinct ?p ?m ?n ?d where {
        ?p dbprop:name ?m .
        ?p dbprop:nationality ?n .
        ?p dbprop:doctoralAdvisor ?d
      }
      #DBpedia Replicas    FedX [1] Execution Time (ms)
      1                    1,392
      2                    215,907
      [1] Schwarte et al. FedX: Optimization techniques for federated query processing on linked data. In ISWC2011
  • 4. Users may replicate only the fragments relevant for their queries. A triple pattern fragment is defined by the dataset it has been replicated from and a CONSTRUCT query with a triple pattern. Fragment with the doctoral advisor triples: <http://dbpedia.org/sparql, CONSTRUCT WHERE { ?p dbprop:doctoralAdvisor ?a }>. Replicating fragments from different datasets provides new data localities and opens new opportunities for optimization.
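The fragment definition on slide 4 (a source dataset paired with a single-triple-pattern CONSTRUCT query) can be sketched as a small data structure. This is only an illustration: the class name `Fragment` and its fields are assumptions, not names used by the authors.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    """A replicated triple pattern fragment (hypothetical helper type)."""
    source: str          # endpoint the triples were replicated from
    triple_pattern: str  # the single triple pattern that defines the fragment

    def construct_query(self) -> str:
        # Short-form CONSTRUCT query that materializes the fragment.
        return f"CONSTRUCT WHERE {{ {self.triple_pattern} }}"


# The doctoral advisor fragment from slide 4.
advisors = Fragment("http://dbpedia.org/sparql", "?p dbprop:doctoralAdvisor ?a")
print(advisors.construct_query())
```

Making the class frozen lets fragments be used as dictionary keys, which is convenient when mapping fragments to the endpoints that hold them.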
  • 5. Existing public endpoints may not be the best choice for federated queries
      [Figure: endpoints DBpedia, LinkedMDB, and C1; a client; and the query below.]
      select distinct * where {
        ?director dbo:nationality ?nat .
        ?film dbo:director ?director .
        ?movie owl:sameAs ?film .
        ?movie linkedmdb:genre ?genre
      }
  • 6. Endpoints that replicate fragments give rise to new data localities
      [Figure: same endpoints, client, and query as slide 5, annotated with the fragments below.]
      ?d dbo:nationality ?n
      ?f dbo:director ?d
      ?m owl:sameAs ?f
      ?m linkedmdb:genre ?g
  • 7. Selecting all the sources leads to poor engine performance
      [Figure: endpoints DBpedia, LinkedMDB, and C1; a client; the query and fragments below.]
      select distinct * where {
        ?director dbo:nationality ?nat .
        ?film dbo:director ?director .
        ?movie owl:sameAs ?film .
        ?movie linkedmdb:genre ?genre
      }
      Fragments: ?d dbo:nationality ?n; ?f dbo:director ?d; ?m owl:sameAs ?f; ?m linkedmdb:genre ?g
      Triples to transfer   s1        s2       s3       s4       s5
      DBpedia               166,177   3,229    3,229    0        0
      LinkedMDB             76,180    13,430   0        13,430   0
      C1                    242,357   0        13,430   3,229    48
      Execution Time (s)    20.22     2.64     2.50     2.79     0.65
  • 8-10. Selecting non-overlapping data may not be good enough (same query, fragments, and table as slide 7)
  • 11. Selecting sources able to evaluate joins reduces the number of transferred tuples (same query, fragments, and table as slide 7)
  • 12. The best choice transfers fewer intermediate results (same query and table as slide 7)
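To make the comparison in the "Triples to transfer" table concrete, the per-selection totals can be added up. The sketch below only copies the numbers from the slides; the variable names are illustrative.

```python
# Triples transferred from each endpoint under source selections s1..s5
# (values copied from the slide table).
strategies = ["s1", "s2", "s3", "s4", "s5"]
transfers = {
    "DBpedia":   [166_177, 3_229, 3_229, 0, 0],
    "LinkedMDB": [76_180, 13_430, 0, 13_430, 0],
    "C1":        [242_357, 0, 13_430, 3_229, 48],
}
exec_time_s = [20.22, 2.64, 2.50, 2.79, 0.65]

# Sum each column to get the total number of triples moved per selection.
totals = {
    s: sum(row[i] for row in transfers.values())
    for i, s in enumerate(strategies)
}
best = min(totals, key=totals.get)

for s, t in zip(strategies, exec_time_s):
    print(f"{s}: {totals[s]:>7,} triples, {t:.2f} s")
print("fewest transferred triples:", best)
```

In this table s2, s3, and s4 move the same total number of triples, while s5, which sends the whole query to C1, moves only 48.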
  • 13. ???
      [Figure: endpoints DBpedia, LinkedMDB, C1, C2, and C3, with the query, fragment placement, and fragment definitions below.]
      select distinct ?director ?nat ?genre where {
        ?director dbo:nationality ?nat .     # tp1
        ?film dbo:director ?director .       # tp2
        ?movie owl:sameAs ?film .            # tp3
        ?movie linkedmdb:genre ?genre        # tp4
      }
      Endpoint   Fragments        Relevant for
      C1         f2, f6, f4       tp1, tp2, tp4
      C2         f2, f7, f3, f5   tp1, tp2, tp3, tp4
      C3         f3, f4, f2       tp2, tp3, tp4
      Fragment   CONSTRUCT WHERE { %s% }
      f2         ?film dbo:director ?director
      f3         ?movie owl:sameAs ?film
      f4         ?movie linkedmdb:genre ?genre
      f5         ?movie linkedmdb:genre film_genre:14
      f6         ?director dbo:nationality dbr:France
      f7         ?director dbo:nationality dbr:United_Kingdom
  • 14. Selecting fewer endpoints does not always produce fewer intermediate results
      Triple pattern                                 Fragment   Endpoints
      ?director dbo:nationality dbr:France           f5         C2, C4
      ?film dbo:director ?director                   f2         C3, C4, C5, C6
      ?director dbo:nationality dbr:United_Kingdom   f6         C2, C5
      ?film dbo:director ?director                   f2         C3, C4, C5, C6
      ?director dbo:nationality dbr:United_States    f7         C2, C6
      ?film dbo:director ?director                   f2         C3, C4, C5, C6
  • 15. Triple-pattern-wise source selection misses data localities (same triple patterns as slide 14)
      Triples to transfer   s1        s2
      C2                    27,462    0
      C3                    238,077   0
      C4                    0         141
      C5                    0         103
      C6                    0         1,026
  • 16. Selecting endpoints in a BGP-wise fashion reduces the intermediate results (same triple patterns as slide 14 and same table as slide 15)
  • 17. Source Selection Problem with Fragment Replication (SSP-FR): Given a SPARQL query and a set of SPARQL endpoints with replicated fragments, choose the SPARQL endpoints to contact for each query triple pattern in order to produce a complete query answer and transfer the minimum amount of data.
  • 18. Fedra performs a BGP-aware source selection and exploits fragment localities to reduce intermediate results
      1. Fedra selects relevant fragments per triple pattern and prunes fragments using query containment.
      2. Multiple relevant fragments → UNION Reduction: try to reduce to one fragment.
      3. One relevant fragment → BGP Reduction: reduce to a set covering problem to evaluate at as few endpoints as possible.
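Step 1 above (selecting relevant fragments and pruning by containment) can be illustrated for single triple patterns, where containment reduces to a position-by-position check. The sketch below is a simplification under stated assumptions: the function names, the tuple encoding of patterns, and the specific pruning rule (drop a fragment that is strictly contained in another relevant fragment from the same source) are illustrative choices and are not taken from the slides.

```python
Triple = tuple[str, str, str]


def is_var(term: str) -> bool:
    return term.startswith("?")


def contains(general: Triple, specific: Triple) -> bool:
    """True if every triple matching `specific` also matches `general`."""
    binding: dict[str, str] = {}
    for g, s in zip(general, specific):
        if is_var(g):
            # A variable of `general` must map to one term consistently.
            if binding.setdefault(g, s) != s:
                return False
        elif g != s:
            return False
    return True


def relevant(tp: Triple, fragments: dict) -> dict:
    """Fragments whose pattern contains tp or is contained in tp."""
    return {name: f for name, f in fragments.items()
            if contains(f[1], tp) or contains(tp, f[1])}


def prune(frags: dict) -> dict:
    """Drop fragments strictly contained in another one from the same source."""
    kept = {}
    for name, (src, pat) in frags.items():
        dominated = any(
            other != name and osrc == src
            and contains(opat, pat) and not contains(pat, opat)
            for other, (osrc, opat) in frags.items()
        )
        if not dominated:
            kept[name] = (src, pat)
    return kept


# Illustrative fragments: name -> (source dataset, triple pattern).
fragments = {
    "f1": ("dbpedia", ("?director", "dbo:nationality", "?nat")),
    "f2": ("dbpedia", ("?film", "dbo:director", "?director")),
    "f5": ("dbpedia", ("?director", "dbo:nationality", "dbr:France")),
}
tp1 = ("?director", "dbo:nationality", "?nat")
print(sorted(relevant(tp1, fragments)))         # f1 and f5 are relevant
print(sorted(prune(relevant(tp1, fragments))))  # f5 is contained in f1
```

Here f5 is pruned because f1 already holds every nationality triple from the same dataset, so contacting endpoints for f5 would only transfer redundant data.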
  • 19. BGP Reduction
      BGP   Triple pattern                   Relevant fragments   Relevant endpoints
      tp1   ?director dbo:nationality ?nat   f1                   { C1 }
      tp2   ?film dbo:director ?director     f2                   { C1, C3 }
      tp3   ?movie owl:sameAs ?film          f3                   { C1, C2 }
      tp4   ?movie linkedmdb:genre ?genre    f4                   { C2, C4 }
      f1: <dbpedia, ?director dbo:nationality ?nat>
      f2: <dbpedia, ?film dbo:director ?director>
      f3: <linkedmdb, ?movie owl:sameAs ?film>
      f4: <linkedmdb, ?movie linkedmdb:genre ?genre>
      fragments mapping = {(f1, {C1}), (f2, {C1, C3}), (f3, {C1, C2}), (f4, {C2, C4})}
  • 20-23. BGP Reduction (same table as slide 19)
      S = { tp1, tp2, tp3, tp4 }
      CC1 = { tp1, tp2, tp3 }
      CC2 = { tp3, tp4 }
      CC3 = { tp2 }
      CC4 = { tp4 }
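The slides state that BGP Reduction is cast as a set covering problem over S and the per-endpoint sets CC1 to CC4, but they do not say which solver is used. As a sketch only, the standard greedy approximation for set cover can be applied to these sets; the function name and the tie-breaking rule (alphabetical by endpoint name) are assumptions.

```python
def greedy_set_cover(universe, candidates):
    """Greedy set cover: repeatedly pick the endpoint that covers the most
    still-uncovered triple patterns. Returns endpoint -> patterns assigned."""
    uncovered = set(universe)
    assignment = {}
    while uncovered:
        # max() keeps the first maximum, so ties go to the smallest name.
        best = max(sorted(candidates),
                   key=lambda c: len(candidates[c] & uncovered))
        gained = candidates[best] & uncovered
        if not gained:
            raise ValueError("some triple patterns cannot be covered")
        assignment[best] = gained
        uncovered -= gained
    return assignment


# S and CC1..CC4 from the BGP Reduction slides.
S = {"tp1", "tp2", "tp3", "tp4"}
CC = {
    "C1": {"tp1", "tp2", "tp3"},
    "C2": {"tp3", "tp4"},
    "C3": {"tp2"},
    "C4": {"tp4"},
}
for endpoint, patterns in greedy_set_cover(S, CC).items():
    print(endpoint, sorted(patterns))
```

With these inputs the greedy pass first picks C1 for tp1, tp2, and tp3, then needs one more endpoint for tp4. Both C2 and C4 qualify; the tie-break in this sketch picks C2.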
  • 24. Union Reduction
      BGP   Triple pattern                   Relevant fragments   Relevant endpoints
      tp1   ?director dbo:nationality ?nat   f5                   { C2 }
                                             f6                   { C1 }
      tp2   ?film dbo:director ?director     f2                   { C1, C3 }
      tp3   ?movie owl:sameAs ?film          f3                   { C1, C2, C4 }
      tp4   ?movie linkedmdb:genre ?genre    f4                   { C2 }
      f2: <dbpedia, ?film dbo:director ?director>
      f3: <linkedmdb, ?movie owl:sameAs ?film>
      f4: <linkedmdb, ?movie linkedmdb:genre ?genre>
      f5: <dbpedia, ?director dbo:nationality dbr:France>
      f6: <dbpedia, ?director dbo:nationality dbr:United_Kingdom>
      fragments mapping = {(f2, {C1, C2}), (f3, {C1}), (f4, {C1}), (f5, {C2}), (f6, {C1})}
  • 25. Union Reduction (same table as slide 24)
      S = { tp2, tp3, tp4 }
      CC1 = { tp2, tp3 }
      CC2 = { tp3, tp4 }
      CC3 = { tp2 }
      CC4 = { tp3 }
  • 26. BGP Reduction (same table as slide 24; same S and CC1 to CC4 as slide 25)
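The sets S and CC1 to CC4 on slides 25 and 26 can be derived from the relevant-fragments table on slide 24. The sketch below assumes, as the slides suggest by leaving tp1 out of S, that a triple pattern with more than one remaining relevant fragment is kept as a union and excluded from the set covering step; the dictionary layout and variable names are illustrative.

```python
# Relevant fragments per triple pattern and the endpoints holding them
# (values copied from the table on slide 24).
relevant = {
    "tp1": {"f5": {"C2"}, "f6": {"C1"}},
    "tp2": {"f2": {"C1", "C3"}},
    "tp3": {"f3": {"C1", "C2", "C4"}},
    "tp4": {"f4": {"C2"}},
}

# Patterns that still have several relevant fragments stay as unions.
union_patterns = {tp for tp, frags in relevant.items() if len(frags) > 1}

# Patterns with exactly one relevant fragment form the set-cover universe S.
S = {tp for tp, frags in relevant.items() if len(frags) == 1}

# CC[c] = triple patterns of S that endpoint c can evaluate.
CC = {}
for tp in S:
    (endpoints,) = relevant[tp].values()  # the single fragment's endpoints
    for c in endpoints:
        CC.setdefault(c, set()).add(tp)

print("union:", sorted(union_patterns))
print("S:", sorted(S))
for c in sorted(CC):
    print(c, sorted(CC[c]))
```

The resulting sets match the ones on the slides: tp1 is handled as a union of f5 and f6, and the remaining patterns are passed to the set covering step.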
  • 27. Fedra performs a BGP-aware source selection and exploits fragment localities to reduce intermediate results.
  • 28. SPARQL Endpoints Federations with Random Distribution of Fragments
      [Figure: LDFServer, Dataset, endpoints C1, C2, ..., C10, RandomQueryGenerator.]
  • 29. Same figure, adding the datasets: Diseasome, SWDF, LinkedMDB, GeoCoordinates, WatDiv (10^5 triples), WatDiv (10^7 triples).
  • 30. Same figure, adding the generated query sets: q^1_1, ..., q^1_100; q^2_1, ..., q^2_100; q^10_1, ..., q^10_100.
  • 31. Same figure, adding an example query and the CONSTRUCT queries of its fragments:
      SELECT * WHERE {
        ?x1 rdfs:label ?x2 .
        ?x1 diseasome:geneId ?x3 .
        ?x1 diseasome:hgncId hgnc:5208
      }
      CONSTRUCT WHERE { ?x1 rdfs:label ?x2 }
      CONSTRUCT WHERE { ?x1 diseasome:geneId ?x3 }
      CONSTRUCT WHERE { ?x1 diseasome:hgncId hgnc:5208 }
  • 32. Same figure, query, and CONSTRUCT queries as slide 31, adding the resulting fragments:
      ?x1 rdfs:label ?x2
      ?x1 diseasome:geneId ?x3
      ?x1 diseasome:hgncId hgnc:5208
  • 33. Same figure and query sets as slide 30. Replication Factor = 3
  • 34. Proxies are used to count the number of transferred tuples
      [Figure: endpoints C1, C2, ..., C10, each behind a Proxy; a Client; the RandomQueryGenerator.]
  • 35. Client evaluates random queries
      [Figure: same as slide 34, with Fuseki 1.1.1 endpoints and client queries q^c_1, ..., q^c_100.]
  • 36. Federated Query Engines are used to perform query evaluation
      [Figure: same as slide 34, with client queries q^c_1, ..., q^c_100.]
      Engines: ANAPSID, FEDRA + ANAPSID, DAW + ANAPSID, FedX, FEDRA + FedX, DAW + FedX
  • 37. Selecting fewer sources transfers less redundant data. FEDRA should select fewer sources than the engines alone and than DAW.
  • 38. FEDRA uses known replicated fragments to effectively reduce the number of selected sources. [Plot: number of selected sources (axis 0 to 40) per federation (Diseasome, Geocoordinates, LinkedMDB, SWDF, WatDiv1, WatDiv100) for FEDRA+ANAPSID, DAW+ANAPSID, and ANAPSID.]
  • 39. Replicated fragments give FEDRA a perfect summary of endpoints data. [Plot: number of selected sources (axis 0 to 40) per federation (Diseasome, Geocoordinates, LinkedMDB, SWDF, WatDiv1, WatDiv100) for FEDRA+FedX, DAW+FedX, and FedX.]
  • 40. The number of transferred tuples matters. Using FEDRA for source selection should reduce the number of tuples transferred during query evaluation.
  • 41. FEDRA has delegated join evaluation to the endpoints. [Plot: number of transferred tuples (log axis, 10^0 to 10^6) per federation (Diseasome, Geocoordinates, LinkedMDB, SWDF, WatDiv1, WatDiv100) for FEDRA+ANAPSID, DAW+ANAPSID, and ANAPSID.]
  • 42. FEDRA achieves a great reduction in the number of transferred tuples. [Plot: number of transferred tuples (log axis, 10^0 to 10^6) per federation (Diseasome, Geocoordinates, LinkedMDB, SWDF, WatDiv1, WatDiv100) for FEDRA+FedX, DAW+FedX, and FedX.]
  • 45. Conclusions. We addressed the problem of partial replication in Linked Data. Fedra performs BGP-aware source selection and exploits fragment localities to reduce intermediate results. Experimental results demonstrated that Fedra greatly reduces both the number of selected sources and the number of tuples transferred by ANAPSID and FedX.
  • 46. Perspectives. Take into account replicated fragments that diverge. Take into account preferences about the endpoints. Take advantage of replicated data for parallel query processing.
  • 48. Results in the next slides are from a different setup, in which Virtuoso 7.2.1 endpoints were used and each endpoint was deployed on a different cluster machine.
  • 49. ANAPSID Source Selection Time. [Plot: source selection time in seconds (axis 2 to 6) per federation (Diseasome, GeoCoordinates, LinkedMDB, SWDF, WatDiv1, WatDiv100) for FEDRA+ANAPSID, DAW+ANAPSID, and ANAPSID.]
  • 50. FedX Source Selection Time. [Plot: source selection time in seconds (axis 1.0 to 2.0) per federation (Diseasome, GeoCoordinates, LinkedMDB, SWDF, WatDiv1, WatDiv100) for FEDRA+FedX, DAW+FedX, and FedX.]
  • 52. FedX Execution Time. [Plot: execution time in seconds (log axis, 10^1 to 10^3) per federation (Diseasome, GeoCoordinates, LinkedMDB, SWDF, WatDiv1, WatDiv100) for FEDRA+FedX, DAW+FedX, and FedX.]
  • 53. FEDRA computes alternative sources per fragment. [Diagram: DBpedia, LinkedMDB, and C1; C1 replicates ?d dbo:nationality ?n, ?f dbo:director ?d, ?m owl:sameAs ?f, and ?m linkedmdb:genre ?g.] Fragments: f1: <dbpedia, ?director dbo:nationality ?nat>; f2: <dbpedia, ?film dbo:director ?director>; f3: <linkedmdb, ?movie owl:sameAs ?film>; f4: <linkedmdb, ?movie linkedmdb:genre ?genre>. fragments mapping = {(f1, {DBpedia, C1}), (f2, {DBpedia, C1}), (f3, {LinkedMDB, C1}), (f4, {LinkedMDB, C1})}.
  • 54. Alternative Endpoints per Fragment are Considered. BGP triple patterns with their relevant fragments and relevant endpoints: tp1 = ?director dbo:nationality ?nat, f1, {DBpedia, C1}; tp2 = ?film dbo:director ?director, f2, {DBpedia, C1}; tp3 = ?movie owl:sameAs ?film, f3, {LinkedMDB, C1}; tp4 = ?movie linkedmdb:genre ?genre, f4, {LinkedMDB, C1}. Fragments: f1: <dbpedia, ?director dbo:nationality ?nat>; f2: <dbpedia, ?film dbo:director ?director>; f3: <linkedmdb, ?movie owl:sameAs ?film>; f4: <linkedmdb, ?movie linkedmdb:genre ?genre>. fragments mapping = {(f1, {DBpedia, C1}), (f2, {DBpedia, C1}), (f3, {LinkedMDB, C1}), (f4, {LinkedMDB, C1})}.
  • 55. SSP is reduced to the set covering problem. BGP triple patterns with their relevant fragments and relevant endpoints: tp1 = ?director dbo:nationality ?nat, f1, {DBpedia, C1}; tp2 = ?film dbo:director ?director, f2, {DBpedia, C1}; tp3 = ?movie owl:sameAs ?film, f3, {LinkedMDB, C1}; tp4 = ?movie linkedmdb:genre ?genre, f4, {LinkedMDB, C1}. S = {tp1, tp2, tp3, tp4}; C_C1 = {tp1, tp2, tp3, tp4}; C_DBpedia = {tp1, tp2}; C_LinkedMDB = {tp3, tp4}.
  • 56. Endpoints that can evaluate more joins are chosen. BGP triple patterns with their relevant fragments and relevant endpoints: tp1 = ?director dbo:nationality ?nat, f1, {DBpedia, C1}; tp2 = ?film dbo:director ?director, f2, {DBpedia, C1}; tp3 = ?movie owl:sameAs ?film, f3, {LinkedMDB, C1}; tp4 = ?movie linkedmdb:genre ?genre, f4, {LinkedMDB, C1}. S = {tp1, tp2, tp3, tp4}; C_C1 = {tp1, tp2, tp3, tp4}; C_DBpedia = {tp1, tp2}; C_LinkedMDB = {tp3, tp4}.
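The set covering instance on these slides can be explored with a small greedy heuristic. The slides do not say which set covering algorithm FEDRA uses, so the greedy strategy here is an assumption, as is the function name `greedy_cover`; it only illustrates why an endpoint covering more triple patterns is preferred.

```python
def greedy_cover(universe, covers):
    """Greedy approximation of minimum set cover (assumed strategy, not FEDRA's)."""
    uncovered = set(universe)
    selected = []
    while uncovered:
        # Pick the endpoint covering the most still-uncovered triple patterns;
        # iterating over sorted names makes tie-breaking deterministic.
        best = max(sorted(covers), key=lambda e: len(covers[e] & uncovered))
        if not covers[best] & uncovered:
            raise ValueError("some triple patterns cannot be covered")
        selected.append(best)
        uncovered -= covers[best]
    return selected


# Instance from the slide: S and the cover sets C_C1, C_DBpedia, C_LinkedMDB.
S = {"tp1", "tp2", "tp3", "tp4"}
covers = {
    "C1": {"tp1", "tp2", "tp3", "tp4"},
    "DBpedia": {"tp1", "tp2"},
    "LinkedMDB": {"tp3", "tp4"},
}

selection = greedy_cover(S, covers)
print(selection)
```

On this input the heuristic selects only C1, the endpoint that can evaluate all the joins locally.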
  • 57. It may be necessary to simplify to get the best selection. BGP triple patterns with their relevant fragments and relevant endpoints: tp1 = ?director dbo:nationality ?nat, f5 {C1, C2} and f6 {C1}; tp2 = ?film dbo:director ?director, f2, {C1, C3}; tp3 = ?movie owl:sameAs ?film, f3, {C1, C2, C4}; tp4 = ?movie linkedmdb:genre ?genre, f4, {C2}. Fragments: f2: <dbpedia, ?film dbo:director ?director>; f3: <linkedmdb, ?movie owl:sameAs ?film>; f4: <linkedmdb, ?movie linkedmdb:genre ?genre>; f5: <dbpedia, ?director dbo:nationality dbr:France>; f6: <dbpedia, ?director dbo:nationality dbr:United_Kingdom>. fragments mapping = {(f2, {C1, C2}), (f3, {C1}), (f4, {C1}), (f5, {C1, C2}), (f6, {C1})}.
  • 58. It may be necessary to simplify to get the best selection. BGP triple patterns with their relevant fragments and relevant endpoints: tp1 = ?director dbo:nationality ?nat, f5 {C1, C2} and f6 {C1}; tp2 = ?film dbo:director ?director, f2, {C1, C3}; tp3 = ?movie owl:sameAs ?film, f3, {C1, C2, C4}; tp4 = ?movie linkedmdb:genre ?genre, f4, {C2}. S = {tp1, tp2, tp3, tp4}; C_C1 = {tp1, tp2, tp3}; C_C2 = {tp3, tp4}; C_C3 = {tp2}; C_C4 = {tp3}.
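The cover sets on this slide are consistent with a simple rule: an endpoint covers a triple pattern only if it holds every fragment relevant to that pattern. The sketch below applies that rule to the endpoint sets listed in the slide's table; the rule itself is inferred from the example, and the function name `cover_sets` is hypothetical.

```python
# Relevant fragments per triple pattern and endpoints per fragment,
# copied from the table on the slide.
relevant_fragments = {
    "tp1": ["f5", "f6"],
    "tp2": ["f2"],
    "tp3": ["f3"],
    "tp4": ["f4"],
}
fragment_endpoints = {
    "f5": {"C1", "C2"},
    "f6": {"C1"},
    "f2": {"C1", "C3"},
    "f3": {"C1", "C2", "C4"},
    "f4": {"C2"},
}


def cover_sets(relevant_fragments, fragment_endpoints):
    """Map each endpoint to the triple patterns it can answer completely."""
    covers = {}
    for tp, frags in relevant_fragments.items():
        # Endpoints that hold all fragments relevant to this triple pattern.
        holders = set.intersection(*(fragment_endpoints[f] for f in frags))
        for endpoint in holders:
            covers.setdefault(endpoint, set()).add(tp)
    return covers


covers = cover_sets(relevant_fragments, fragment_endpoints)
for endpoint in sorted(covers):
    print(endpoint, sorted(covers[endpoint]))
```

C2 holds f5 but not f6, so it is left out of the cover for tp1, which matches C_C2 = {tp3, tp4} on the slide.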
  • 60. Statistical Significance of Data Redundancy Minimization. H0: Fedra selects the same number of sources as DAW does. Ha: Fedra selects fewer sources than DAW. p-values per federation (ANAPSID / FedX): Diseasome 1.811e-08 / 8.371e-09; SWDF 2.28e-10 / 5.386e-11; LinkedMDB 5.082e-09 / 5.254e-11; Geocoordinates 1.301e-05 / 1.301e-05; WatDiv1 6.209e-07 / 1.006e-07; WatDiv100 1.563e-05 / 3.623e-07. For all the federations and engines, the obtained p-values allow rejecting the null hypothesis (H0) in favor of the alternative hypothesis (Ha). The Wilcoxon signed-rank test was computed using R.
  • 61. Statistical Significance of Data Transfer Minimization. H0: using sources selected by Fedra transfers the same number of tuples as using sources selected by DAW. Ha: using sources selected by Fedra transfers fewer tuples than using sources selected by DAW. p-values per federation (ANAPSID / FedX): Diseasome 3.314e-12 / 2.821e-06; SWDF 1.472e-08 / 0.7621; LinkedMDB 2.368e-08 / 0.001274; Geocoordinates 1.921e-05 / 1.183e-06; WatDiv1 8.431e-05 / 7.246e-09; WatDiv100 9.986e-06 / 0.0001301. For all the federations and engines except SWDF+FedX, the obtained p-values allow rejecting the null hypothesis (H0) in favor of the alternative hypothesis (Ha). The Wilcoxon signed-rank test was computed using R.
  • 62. Statistical Significance of Source Selection Time Reduction. H0: using sources selected by Fedra leads to the same source selection time as using sources selected by DAW. Ha: using sources selected by Fedra leads to a lower source selection time than using sources selected by DAW. p-values per federation (ANAPSID / FedX): Diseasome 1 / < 2.2e-16; SWDF 1 / < 2.2e-16; LinkedMDB 1.284e-11 / < 2.2e-16; Geocoordinates < 2.2e-16 / < 2.2e-16; WatDiv1 1 / < 2.2e-16; WatDiv100 < 2.2e-16 / < 2.2e-16. For all the federations and engines except Diseasome+ANAPSID, SWDF+ANAPSID and WatDiv1+ANAPSID, the obtained p-values allow rejecting the null hypothesis (H0) in favor of the alternative hypothesis (Ha). The Wilcoxon signed-rank test was computed using R.
  • 63. Statistical Significance of Execution Time Reduction. H0: using sources selected by Fedra leads to the same execution time as using sources selected by DAW. Ha: using sources selected by Fedra leads to a lower execution time than using sources selected by DAW. p-values per federation (ANAPSID / FedX): Diseasome 0.0001547 / < 2.2e-16; SWDF 1 / 6.794e-06; LinkedMDB < 2.2e-16 / 9.223e-15; Geocoordinates < 2.2e-16 / 7.87e-13; WatDiv1 1 / 6.315e-16; WatDiv100 5.392e-09 / 1.384e-14. For all the federations and engines except SWDF+ANAPSID and WatDiv1+ANAPSID, the obtained p-values allow rejecting the null hypothesis (H0) in favor of the alternative hypothesis (Ha). The Wilcoxon signed-rank test was computed using R.
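The p-values on the preceding slides come from one-sided Wilcoxon signed-rank tests computed in R. To make the test concrete, here is a small pure-Python sketch of the exact one-sided version for paired samples. It is an illustration, not the authors' R script: it assumes no zero differences and no tied absolute differences, the function name `wilcoxon_less` is hypothetical, and the sample values are made up for the example rather than taken from the experiments.

```python
def wilcoxon_less(x, y):
    """Exact one-sided Wilcoxon signed-rank test for paired samples.

    Alternative hypothesis: values in x tend to be smaller than those in y.
    Assumes no zero differences and no ties among absolute differences.
    Returns (W+, p-value), where W+ is the rank sum of positive differences.
    """
    diffs = sorted((a - b for a, b in zip(x, y)), key=abs)
    n = len(diffs)
    w_plus = sum(rank for rank, d in enumerate(diffs, start=1) if d > 0)
    # counts[s] = number of subsets of ranks {1..n} whose sum is s,
    # i.e. the null distribution of W+ over all 2^n sign assignments.
    max_sum = n * (n + 1) // 2
    counts = [0] * (max_sum + 1)
    counts[0] = 1
    for rank in range(1, n + 1):
        for s in range(max_sum, rank - 1, -1):
            counts[s] += counts[s - rank]
    p_value = sum(counts[: w_plus + 1]) / 2 ** n
    return w_plus, p_value


# Made-up paired measurements (e.g. selected sources per query), illustration only.
fedra_sources = [1, 2, 1, 2, 3]
daw_sources = [3, 5, 5, 7, 9]

w_plus, p_value = wilcoxon_less(fedra_sources, daw_sources)
print(w_plus, p_value)
```

With only five pairs the smallest attainable p-value is 1/32; the much smaller values on the slides require far more paired measurements per federation.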
  • 64. Source Selection may not be enough. Triple patterns: ?director dbo:nationality ?nat; ?film dbo:director ?director. Fragments: f2: <dbpedia, ?film dbo:director ?director>; f5: <dbpedia, ?director dbo:nationality dbr:France>; f6: <dbpedia, ?director dbo:nationality dbr:United_Kingdom>. fragments mapping = {(f2, {C1, C2}), (f5, {C1}), (f6, {C2})}.
  • 65. Source Selection may not be enough. Triple patterns with relevant fragments: ?director dbo:nationality ?nat, f5; ?director dbo:nationality ?nat, f6; ?film dbo:director ?director, f2. Fragments: f2: <dbpedia, ?film dbo:director ?director>; f5: <dbpedia, ?director dbo:nationality dbr:France>; f6: <dbpedia, ?director dbo:nationality dbr:United_Kingdom>. fragments mapping = {(f2, {C1, C2}), (f5, {C1}), (f6, {C2})}.
  • 66. Source Selection may not be enough. Triple patterns with relevant fragments and endpoints: ?director dbo:nationality ?nat, f5, {C1}; ?director dbo:nationality ?nat, f6, {C2}; ?film dbo:director ?director, f2, {C1, C2}. Fragments: f2: <dbpedia, ?film dbo:director ?director>; f5: <dbpedia, ?director dbo:nationality dbr:France>; f6: <dbpedia, ?director dbo:nationality dbr:United_Kingdom>. fragments mapping = {(f2, {C1, C2}), (f5, {C1}), (f6, {C2})}.
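In this example no single endpoint holds both fragments relevant to ?director dbo:nationality ?nat, so its answers have to be gathered from C1 and C2. A minimal check for this situation is sketched below; the labels `tp1` and `tp2` and the helper name `endpoints_with_all_fragments` are assumptions introduced for the illustration.

```python
# tp1 = ?director dbo:nationality ?nat, tp2 = ?film dbo:director ?director
relevant_fragments = {"tp1": ["f5", "f6"], "tp2": ["f2"]}
fragment_endpoints = {"f2": {"C1", "C2"}, "f5": {"C1"}, "f6": {"C2"}}


def endpoints_with_all_fragments(tp):
    """Endpoints that hold every fragment relevant to triple pattern tp."""
    return set.intersection(
        *(fragment_endpoints[f] for f in relevant_fragments[tp])
    )


# Triple patterns that no single endpoint can answer completely.
split_patterns = [
    tp for tp in relevant_fragments if not endpoints_with_all_fragments(tp)
]
print(split_patterns)
```

Here only tp1 is reported, while tp2 can be answered completely by either C1 or C2.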