SlideShare a Scribd company logo
Towards A Scalable Semantic-based Distributed
Approach for SPARQL query evaluation
Gezim Sejdiu, Damien Graux, Imran Khan, Ioanna Lytra, Hajira Jabeen, and Jens Lehmann
2 Towards A Scalable Semantic-based Distributed Approach for SPARQL query evaluation Gezim Sejdiu - University of Bonn
❖ Introduction
❖ Approach
➢ System Architecture Overview
➢ Distributed Algorithm Description
❖ Evaluation
❖ Conclusion and Future work
Outline
3 Towards A Scalable Semantic-based Distributed Approach for SPARQL query evaluation Gezim Sejdiu - University of Bonn
Introduction
❖ Over the last years, the size of the Semantic Web has increased and
several large-scale datasets were published
4 Towards A Scalable Semantic-based Distributed Approach for SPARQL query evaluation Gezim Sejdiu - University of Bonn
Introduction
Source: LOD-Cloud (http://lod-cloud.net/ )
❖ Over the last years, the size of the Semantic Web has increased and
several large-scale datasets were published
➢ As of August 2018
5 Towards A Scalable Semantic-based Distributed Approach for SPARQL query evaluation Gezim Sejdiu - University of Bonn
Introduction
Source: LOD-Cloud (http://lod-cloud.net/ )
❖ Over the last years, the size of the Semantic Web has increased and
several large-scale datasets were published
➢ Based on LOD Stats (http://lodstats.aksw.org/ )
~10, 000 datasets
Openly available online
using Semantic Web
standards
6 Towards A Scalable Semantic-based Distributed Approach for SPARQL query evaluation Gezim Sejdiu - University of Bonn
Introduction
Source: LOD-Cloud (http://lod-cloud.net/ )
❖ Over the last years, the size of the Semantic Web has increased and
several large-scale datasets were published
➢ Based on LOD Stats (http://lodstats.aksw.org/ )
~10, 000 datasets
Openly available online
using Semantic Web
standards
many datasets
RDFized and kept private
(e.g. Supply chain,
manufacture, ethereum
dataset, etc.)
7 Towards A Scalable Semantic-based Distributed Approach for SPARQL query evaluation Gezim Sejdiu - University of Bonn
Introduction
Source: LOD-Cloud (http://lod-cloud.net/ )
❖ Dealing with such amount of data makes many tasks hard to be
solved on single machines
8 Towards A Scalable Semantic-based Distributed Approach for SPARQL query evaluation Gezim Sejdiu - University of Bonn
Introduction
Source: LOD-Cloud (http://lod-cloud.net/ )
❖ Dealing with such amount of data makes many tasks hard to be
solved on single machines
Supply
Chain Mng.
Find relevant
information
about
motorcycle part
9 Towards A Scalable Semantic-based Distributed Approach for SPARQL query evaluation Gezim Sejdiu - University of Bonn
Introduction
Source: LOD-Cloud (http://lod-cloud.net/ )
❖ Dealing with such amount of data makes many tasks hard to be
solved on single machines
Media
Publishing
Discover what
others say
about news
stories
Supply
Chain Mng.
Find relevant
information
about
motorcycle part
10 Towards A Scalable Semantic-based Distributed Approach for SPARQL query evaluation Gezim Sejdiu - University of Bonn
Introduction
Source: LOD-Cloud (http://lod-cloud.net/ )
❖ Dealing with such amount of data makes many tasks hard to be
solved on single machines
Media
Publishing
Discover what
others say
about news
stories
Exploring
Tourism
Exploring the
neighborhood
(events, point
of interests,
etc)
Supply
Chain Mng.
Find relevant
information
about
motorcycle part
11 Towards A Scalable Semantic-based Distributed Approach for SPARQL query evaluation Gezim Sejdiu - University of Bonn
Introduction
Source: LOD-Cloud (http://lod-cloud.net/ )
❖ Dealing with such amount of data makes many tasks hard to be
solved on single machines
Media
Publishing
Discover what
others say
about news
stories
Exploring
Tourism
Exploring the
neighborhood
(events, point
of interests,
etc)
Entity
Linking
Which datasets
are good
candidates for
interlinking?
Supply
Chain Mng.
Find relevant
information
about
motorcycle part
12 Towards A Scalable Semantic-based Distributed Approach for SPARQL query evaluation Gezim Sejdiu - University of Bonn
Introduction
Source: LOD-Cloud (http://lod-cloud.net/ )
❖ Dealing with such amount of data makes many tasks hard to be
solved on single machines
Media
Publishing
Discover what
others say
about news
stories
Exploring
Tourism
Exploring the
neighborhood
(events, point
of interests,
etc)
Entity
Linking
Which datasets
are good
candidates for
interlinking?
Supply
Chain Mng.
Find relevant
information
about
motorcycle part
These processes require, both efficient storage
strategies and query processing engine.
13 Towards A Scalable Semantic-based Distributed Approach for SPARQL query evaluation Gezim Sejdiu - University of Bonn
SANSA Overview
❖ SANSA [1] is a processing data flow engine that provides data
distribution, and fault tolerance for distributed computations over
RDF large-scale datasets
❖ SANSA includes several libraries:
➢ Read / Write RDF / OWL library
➢ Querying library
➢ Inference library
➢ ML- Machine Learning core library
BigDataEurope
Inference
Knowledge Distribution &
Representation
DeployCoreAPIs&Libraries
Local Cluster
Standalone Resource manager
Querying
Machine Learning
14 Towards A Scalable Semantic-based Distributed Approach for SPARQL query evaluation Gezim Sejdiu - University of Bonn
Approach
SANSA Engine
RDF Layer
Data Ingestion
Partitioning
Query Layer
Semantic
map map
Distributed Data
Structures
Results
RDFData
SELECT ?p WHERE {
?p :owns ?c .
?c :madeIn ?Ingolstadt .
}
SPARQL
Joy :owns Car1
Joy :livesIn Bonn
Car1 :typeOf Car
Car1 :madeBy Audi
Car1 :madeIn Ingolstadt
Bonn :cityOf Germany
Audi :memeberOf Volkswagen
Ingolstadt :cityOf Germany
Joy :owns Car1 :livesIn Bonn
Car1 :typeOf Car :madeBy Audi :madeIn Ingolstadt
Bonn :cityOf Germany
Audi :memeberOf Volkswagen
Ingolstadt :cityOf Germany
System Architecture Overview
15 Towards A Scalable Semantic-based Distributed Approach for SPARQL query evaluation Gezim Sejdiu - University of Bonn
Approach
RDFData
Joy :owns Car1
Joy :livesIn Bonn
Car1 :typeOf Car
Car1 :madeBy Audi
Car1 :madeIn Ingolstadt
Bonn :cityOf Germany
Audi :memeberOf Volkswagen
Ingolstadt :cityOf Germany
System Architecture Overview (workflow)
16 Towards A Scalable Semantic-based Distributed Approach for SPARQL query evaluation Gezim Sejdiu - University of Bonn
Approach
System Architecture Overview (workflow)
SANSA Engine
RDF Layer
Data Ingestion
RDFData
Joy :owns Car1
Joy :livesIn Bonn
Car1 :typeOf Car
Car1 :madeBy Audi
Car1 :madeIn Ingolstadt
Bonn :cityOf Germany
Audi :memeberOf Volkswagen
Ingolstadt :cityOf Germany
17 Towards A Scalable Semantic-based Distributed Approach for SPARQL query evaluation Gezim Sejdiu - University of Bonn
Approach
SANSA Engine
RDF Layer
Data Ingestion
Partitioning
RDFData
Joy :owns Car1
Joy :livesIn Bonn
Car1 :typeOf Car
Car1 :madeBy Audi
Car1 :madeIn Ingolstadt
Bonn :cityOf Germany
Audi :memeberOf Volkswagen
Ingolstadt :cityOf Germany
System Architecture Overview (workflow)
18 Towards A Scalable Semantic-based Distributed Approach for SPARQL query evaluation Gezim Sejdiu - University of Bonn
Approach
SANSA Engine
RDF Layer
Data Ingestion
Partitioning
RDFData
Joy :owns Car1
Joy :livesIn Bonn
Car1 :typeOf Car
Car1 :madeBy Audi
Car1 :madeIn Ingolstadt
Bonn :cityOf Germany
Audi :memeberOf Volkswagen
Ingolstadt :cityOf Germany
Joy :owns Car1 :livesIn Bonn
System Architecture Overview (workflow)
19 Towards A Scalable Semantic-based Distributed Approach for SPARQL query evaluation Gezim Sejdiu - University of Bonn
Approach
SANSA Engine
RDF Layer
Data Ingestion
Partitioning
RDFData
Joy :owns Car1
Joy :livesIn Bonn
Car1 :typeOf Car
Car1 :madeBy Audi
Car1 :madeIn Ingolstadt
Bonn :cityOf Germany
Audi :memeberOf Volkswagen
Ingolstadt :cityOf Germany
Joy :owns Car1 :livesIn Bonn
Car1 :typeOf Car :madeBy Audi :madeIn Ingolstadt
System Architecture Overview (workflow)
20 Towards A Scalable Semantic-based Distributed Approach for SPARQL query evaluation Gezim Sejdiu - University of Bonn
Approach
SANSA Engine
RDF Layer
Data Ingestion
Partitioning
RDFData
Joy :owns Car1
Joy :livesIn Bonn
Car1 :typeOf Car
Car1 :madeBy Audi
Car1 :madeIn Ingolstadt
Bonn :cityOf Germany
Audi :memeberOf Volkswagen
Ingolstadt :cityOf Germany
Joy :owns Car1 :livesIn Bonn
Car1 :typeOf Car :madeBy Audi :madeIn Ingolstadt
Bonn :cityOf Germany
System Architecture Overview (workflow)
21 Towards A Scalable Semantic-based Distributed Approach for SPARQL query evaluation Gezim Sejdiu - University of Bonn
Approach
SANSA Engine
RDF Layer
Data Ingestion
Partitioning
RDFData
Joy :owns Car1
Joy :livesIn Bonn
Car1 :typeOf Car
Car1 :madeBy Audi
Car1 :madeIn Ingolstadt
Bonn :cityOf Germany
Audi :memeberOf Volkswagen
Ingolstadt :cityOf Germany
Joy :owns Car1 :livesIn Bonn
Car1 :typeOf Car :madeBy Audi :madeIn Ingolstadt
Bonn :cityOf Germany
Audi :memeberOf Volkswagen
System Architecture Overview (workflow)
22 Towards A Scalable Semantic-based Distributed Approach for SPARQL query evaluation Gezim Sejdiu - University of Bonn
Approach
SANSA Engine
RDF Layer
Data Ingestion
Partitioning
RDFData
Joy :owns Car1
Joy :livesIn Bonn
Car1 :typeOf Car
Car1 :madeBy Audi
Car1 :madeIn Ingolstadt
Bonn :cityOf Germany
Audi :memeberOf Volkswagen
Ingolstadt :cityOf Germany
Joy :owns Car1 :livesIn Bonn
Car1 :typeOf Car :madeBy Audi :madeIn Ingolstadt
Bonn :cityOf Germany
Audi :memeberOf Volkswagen
Ingolstadt :cityOf Germany
System Architecture Overview (workflow)
23 Towards A Scalable Semantic-based Distributed Approach for SPARQL query evaluation Gezim Sejdiu - University of Bonn
Approach
SANSA Engine
RDF Layer
Data Ingestion
Partitioning
RDFData
SELECT ?p WHERE {
?p :owns ?c .
?c :madeIn ?Ingolstadt .
}
SPARQL
Joy :owns Car1
Joy :livesIn Bonn
Car1 :typeOf Car
Car1 :madeBy Audi
Car1 :madeIn Ingolstadt
Bonn :cityOf Germany
Audi :memeberOf Volkswagen
Ingolstadt :cityOf Germany
Joy :owns Car1 :livesIn Bonn
Car1 :typeOf Car :madeBy Audi :madeIn Ingolstadt
Bonn :cityOf Germany
Audi :memeberOf Volkswagen
Ingolstadt :cityOf Germany
System Architecture Overview (workflow)
24 Towards A Scalable Semantic-based Distributed Approach for SPARQL query evaluation Gezim Sejdiu - University of Bonn
Approach
SANSA Engine
RDF Layer
Data Ingestion
Partitioning
Query Layer
Semantic
map map
RDFData
SELECT ?p WHERE {
?p :owns ?c .
?c :madeIn ?Ingolstadt .
}
SPARQL
Joy :owns Car1
Joy :livesIn Bonn
Car1 :typeOf Car
Car1 :madeBy Audi
Car1 :madeIn Ingolstadt
Bonn :cityOf Germany
Audi :memeberOf Volkswagen
Ingolstadt :cityOf Germany
Joy :owns Car1 :livesIn Bonn
Car1 :typeOf Car :madeBy Audi :madeIn Ingolstadt
Bonn :cityOf Germany
Audi :memeberOf Volkswagen
Ingolstadt :cityOf Germany
System Architecture Overview (workflow)
25 Towards A Scalable Semantic-based Distributed Approach for SPARQL query evaluation Gezim Sejdiu - University of Bonn
Approach
SANSA Engine
RDF Layer
Data Ingestion
Partitioning
Query Layer
Semantic
map map
RDFData
SELECT ?p WHERE {
?p :owns ?c .
?c :madeIn ?Ingolstadt .
}
SPARQL
Joy :owns Car1
Joy :livesIn Bonn
Car1 :typeOf Car
Car1 :madeBy Audi
Car1 :madeIn Ingolstadt
Bonn :cityOf Germany
Audi :memeberOf Volkswagen
Ingolstadt :cityOf Germany
Joy :owns Car1 :livesIn Bonn
Car1 :typeOf Car :madeBy Audi :madeIn Ingolstadt
Bonn :cityOf Germany
Audi :memeberOf Volkswagen
Ingolstadt :cityOf Germany
System Architecture Overview (workflow)
#1
select (p=:owns)
26 Towards A Scalable Semantic-based Distributed Approach for SPARQL query evaluation Gezim Sejdiu - University of Bonn
Approach
SANSA Engine
RDF Layer
Data Ingestion
Partitioning
Query Layer
Semantic
map map
RDFData
SELECT ?p WHERE {
?p :owns ?c .
?c :madeIn ?Ingolstadt .
}
SPARQL
Joy :owns Car1
Joy :livesIn Bonn
Car1 :typeOf Car
Car1 :madeBy Audi
Car1 :madeIn Ingolstadt
Bonn :cityOf Germany
Audi :memeberOf Volkswagen
Ingolstadt :cityOf Germany
Joy :owns Car1 :livesIn Bonn
Car1 :typeOf Car :madeBy Audi :madeIn Ingolstadt
Bonn :cityOf Germany
Audi :memeberOf Volkswagen
Ingolstadt :cityOf Germany
System Architecture Overview (workflow)
#1
select (p=:owns)
#2
select (p=:madeIn,
o=:Ingolstadt)
27 Towards A Scalable Semantic-based Distributed Approach for SPARQL query evaluation Gezim Sejdiu - University of Bonn
Approach
SANSA Engine
RDF Layer
Data Ingestion
Partitioning
Query Layer
Semantic
map map
RDFData
SELECT ?p WHERE {
?p :owns ?c .
?c :madeIn ?Ingolstadt .
}
SPARQL
Joy :owns Car1
Joy :livesIn Bonn
Car1 :typeOf Car
Car1 :madeBy Audi
Car1 :madeIn Ingolstadt
Bonn :cityOf Germany
Audi :memeberOf Volkswagen
Ingolstadt :cityOf Germany
Joy :owns Car1 :livesIn Bonn
Car1 :typeOf Car :madeBy Audi :madeIn Ingolstadt
Bonn :cityOf Germany
Audi :memeberOf Volkswagen
Ingolstadt :cityOf Germany
System Architecture Overview (workflow)
#1
select (p=:owns)
#2
select (p=:madeIn,
o=:Ingolstadt)
#3
join (?c)
28 Towards A Scalable Semantic-based Distributed Approach for SPARQL query evaluation Gezim Sejdiu - University of Bonn
Approach
SANSA Engine
RDF Layer
Data Ingestion
Partitioning
Query Layer
Semantic
map map
RDFData
SELECT ?p WHERE {
?p :owns ?c .
?c :madeIn ?Ingolstadt .
}
SPARQL
Joy :owns Car1
Joy :livesIn Bonn
Car1 :typeOf Car
Car1 :madeBy Audi
Car1 :madeIn Ingolstadt
Bonn :cityOf Germany
Audi :memeberOf Volkswagen
Ingolstadt :cityOf Germany
Joy :owns Car1 :livesIn Bonn
Car1 :typeOf Car :madeBy Audi :madeIn Ingolstadt
Bonn :cityOf Germany
Audi :memeberOf Volkswagen
Ingolstadt :cityOf Germany
System Architecture Overview (workflow)
#1
select (p=:owns)
#2
select (p=:madeIn,
o=:Ingolstadt)
#3
join (?c)
#4
project (?p)
29 Towards A Scalable Semantic-based Distributed Approach for SPARQL query evaluation Gezim Sejdiu - University of Bonn
Approach
SANSA Engine
RDF Layer
Data Ingestion
Partitioning
Query Layer
Semantic
map map
Distributed Data
Structures
RDFData
SELECT ?p WHERE {
?p :owns ?c .
?c :madeIn ?Ingolstadt .
}
SPARQL
Joy :owns Car1
Joy :livesIn Bonn
Car1 :typeOf Car
Car1 :madeBy Audi
Car1 :madeIn Ingolstadt
Bonn :cityOf Germany
Audi :memeberOf Volkswagen
Ingolstadt :cityOf Germany
Joy :owns Car1 :livesIn Bonn
Car1 :typeOf Car :madeBy Audi :madeIn Ingolstadt
Bonn :cityOf Germany
Audi :memeberOf Volkswagen
Ingolstadt :cityOf Germany
System Architecture Overview (workflow)
30 Towards A Scalable Semantic-based Distributed Approach for SPARQL query evaluation Gezim Sejdiu - University of Bonn
Approach
SANSA Engine
RDF Layer
Data Ingestion
Partitioning
Query Layer
Semantic
map map
Distributed Data
Structures
Results
RDFData
SELECT ?p WHERE {
?p :owns ?c .
?c :madeIn ?Ingolstadt .
}
SPARQL
Joy :owns Car1
Joy :livesIn Bonn
Car1 :typeOf Car
Car1 :madeBy Audi
Car1 :madeIn Ingolstadt
Bonn :cityOf Germany
Audi :memeberOf Volkswagen
Ingolstadt :cityOf Germany
Joy :owns Car1 :livesIn Bonn
Car1 :typeOf Car :madeBy Audi :madeIn Ingolstadt
Bonn :cityOf Germany
Audi :memeberOf Volkswagen
Ingolstadt :cityOf Germany
System Architecture Overview (workflow)
31 Towards A Scalable Semantic-based Distributed Approach for SPARQL query evaluation Gezim Sejdiu - University of Bonn
❖ Experimental Setup
➢ Cluster configuration
■ 6 machines (1 master, 5 workers): Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz (32 Cores),
128 GB RAM, 12 TB SATA RAID-5, Spark-2.4.0, Hadoop 2.8.0, Scala 2.11.11 and Java 8
➢ Datasets (all in nt format)
➢ Distributed SPARQL query evaluators we compare with:
■ SHARD [2], SPARQLGX-SDE [3], and Sparklify [4]
Evaluation
LUBM WatDiv
1K 2K 3K 10M 100M
#nr. of triples 138,280,374 276,349,040 414,493,296 10,916,457 108,997,714
size (GB) 24 49 70 1.5 15
32 Towards A Scalable Semantic-based Distributed Approach for SPARQL query evaluation Gezim Sejdiu - University of Bonn
Runtime (s) (mean)
Queries SHARD SPARQLGX-SDE SANSA.Sparklify SANSA.Semantic (ours)
C3 n/a 38.79 72.94 90.48
F3 n/a 38.41 74.69 n/a
L3 n/a 21.05 73.16 72.84
S3 n/a 26.27 70.1 79.7
C3 n/a 181.51 96.59 300.82
F3 n/a 162.86 91.2 n/a
L3 n/a 84.09 82.17 189.89
S3 n/a 123.6 93.02 176.2
Q1 774.93 103.74 103.57 226.21
Q2 fail fail 3348.51 329.69
Q3 772.55 126.31 107.25 235.31
Q4 988.28 182.52 111.89 294.8
Q5 771.69 101.05 100.37 226.21
Q6 fail 73.05 100.72 207.06
Q7 fail 160.94 113.03 277.08
Q8 fail 179.56 114.83 309.39
Q9 fail 204.62 114.25 326.29
Q10 780.05 106.26 110.18 232.72
Q11 783.2 112.23 105.13 231.36
Q12 fail 159.65 105.86 283.53
Q13 778.16 100.06 90.87 220.28
Q14 688.44 74.64 100.58 204.43
Performance analysis on large-scale RDF datasets
Evaluation
Watdiv-10MWatdiv-100MLUBM-1K
33 Towards A Scalable Semantic-based Distributed Approach for SPARQL query evaluation Gezim Sejdiu - University of Bonn
Runtime (s) (mean)
Queries SHARD SPARQLGX-SDE SANSA.Sparklify SANSA.Semantic (ours)
C3 n/a 38.79 72.94 90.48
F3 n/a 38.41 74.69 n/a
L3 n/a 21.05 73.16 72.84
S3 n/a 26.27 70.1 79.7
C3 n/a 181.51 96.59 300.82
F3 n/a 162.86 91.2 n/a
L3 n/a 84.09 82.17 189.89
S3 n/a 123.6 93.02 176.2
Q1 774.93 103.74 103.57 226.21
Q2 fail fail 3348.51 329.69
Q3 772.55 126.31 107.25 235.31
Q4 988.28 182.52 111.89 294.8
Q5 771.69 101.05 100.37 226.21
Q6 fail 73.05 100.72 207.06
Q7 fail 160.94 113.03 277.08
Q8 fail 179.56 114.83 309.39
Q9 fail 204.62 114.25 326.29
Q10 780.05 106.26 110.18 232.72
Q11 783.2 112.23 105.13 231.36
Q12 fail 159.65 105.86 283.53
Q13 778.16 100.06 90.87 220.28
Q14 688.44 74.64 100.58 204.43
Performance analysis on large-scale RDF datasets
Evaluation
Watdiv-10MWatdiv-100MLUBM-1K
34 Towards A Scalable Semantic-based Distributed Approach for SPARQL query evaluation Gezim Sejdiu - University of Bonn
Runtime (s) (mean)
Queries SHARD SPARQLGX-SDE SANSA.Sparklify SANSA.Semantic (ours)
C3 n/a 38.79 72.94 90.48
F3 n/a 38.41 74.69 n/a
L3 n/a 21.05 73.16 72.84
S3 n/a 26.27 70.1 79.7
C3 n/a 181.51 96.59 300.82
F3 n/a 162.86 91.2 n/a
L3 n/a 84.09 82.17 189.89
S3 n/a 123.6 93.02 176.2
Q1 774.93 103.74 103.57 226.21
Q2 fail fail 3348.51 329.69
Q3 772.55 126.31 107.25 235.31
Q4 988.28 182.52 111.89 294.8
Q5 771.69 101.05 100.37 226.21
Q6 fail 73.05 100.72 207.06
Q7 fail 160.94 113.03 277.08
Q8 fail 179.56 114.83 309.39
Q9 fail 204.62 114.25 326.29
Q10 780.05 106.26 110.18 232.72
Q11 783.2 112.23 105.13 231.36
Q12 fail 159.65 105.86 283.53
Q13 778.16 100.06 90.87 220.28
Q14 688.44 74.64 100.58 204.43
Performance analysis on large-scale RDF datasets
Evaluation
Watdiv-10MWatdiv-100MLUBM-1K
We can see that our
approach
(SANSA.Semantic)
evaluates most of the
queries as opposed to
SHARD. SHARD system
fails to evaluate most of
the LUBM queries and
its parser does not
support Watdiv queries.
35 Towards A Scalable Semantic-based Distributed Approach for SPARQL query evaluation Gezim Sejdiu - University of Bonn
Runtime (s) (mean)
Queries SHARD SPARQLGX-SDE SANSA.Sparklify SANSA.Semantic (ours)
C3 n/a 38.79 72.94 90.48
F3 n/a 38.41 74.69 n/a
L3 n/a 21.05 73.16 72.84
S3 n/a 26.27 70.1 79.7
C3 n/a 181.51 96.59 300.82
F3 n/a 162.86 91.2 n/a
L3 n/a 84.09 82.17 189.89
S3 n/a 123.6 93.02 176.2
Q1 774.93 103.74 103.57 226.21
Q2 fail fail 3348.51 329.69
Q3 772.55 126.31 107.25 235.31
Q4 988.28 182.52 111.89 294.8
Q5 771.69 101.05 100.37 226.21
Q6 fail 73.05 100.72 207.06
Q7 fail 160.94 113.03 277.08
Q8 fail 179.56 114.83 309.39
Q9 fail 204.62 114.25 326.29
Q10 780.05 106.26 110.18 232.72
Q11 783.2 112.23 105.13 231.36
Q12 fail 159.65 105.86 283.53
Q13 778.16 100.06 90.87 220.28
Q14 688.44 74.64 100.58 204.43
Performance analysis on large-scale RDF datasets
Evaluation
Watdiv-10MWatdiv-100MLUBM-1K
On the other hand,
SPARQLGX-SDE
performs better than
both Sparklify and our
approach, when the
size of the dataset is
considerably small (e.g.
less than 25GB). This
behavior is due to the
large partitioning
overhead for Sparklify
and our approach.
36 Towards A Scalable Semantic-based Distributed Approach for SPARQL query evaluation Gezim Sejdiu - University of Bonn
Runtime (s) (mean)
Queries SHARD SPARQLGX-SDE SANSA.Sparklify SANSA.Semantic (ours)
C3 n/a 38.79 72.94 90.48
F3 n/a 38.41 74.69 n/a
L3 n/a 21.05 73.16 72.84
S3 n/a 26.27 70.1 79.7
C3 n/a 181.51 96.59 300.82
F3 n/a 162.86 91.2 n/a
L3 n/a 84.09 82.17 189.89
S3 n/a 123.6 93.02 176.2
Q1 774.93 103.74 103.57 226.21
Q2 fail fail 3348.51 329.69
Q3 772.55 126.31 107.25 235.31
Q4 988.28 182.52 111.89 294.8
Q5 771.69 101.05 100.37 226.21
Q6 fail 73.05 100.72 207.06
Q7 fail 160.94 113.03 277.08
Q8 fail 179.56 114.83 309.39
Q9 fail 204.62 114.25 326.29
Q10 780.05 106.26 110.18 232.72
Q11 783.2 112.23 105.13 231.36
Q12 fail 159.65 105.86 283.53
Q13 778.16 100.06 90.87 220.28
Q14 688.44 74.64 100.58 204.43
Performance analysis on large-scale RDF datasets
Evaluation
Watdiv-10MWatdiv-100MLUBM-1K
However, Sparklify
performs better
compared to
SPARQLGX-SDE when
the size of the dataset
increases (see
Watdiv-100M results)
and the queries involve
more joins (see
LUBM-1K results). This
is due to the Spark SQL
optimizer and Sparqlify
self-joins optimizers.
37 Towards A Scalable Semantic-based Distributed Approach for SPARQL query evaluation Gezim Sejdiu - University of Bonn
Evaluation
Sizeup analysis (on LUBM dataset)
38 Towards A Scalable Semantic-based Distributed Approach for SPARQL query evaluation Gezim Sejdiu - University of Bonn
Evaluation
Sizeup analysis (on LUBM dataset)
Query execution
time for our
approach grows
linearly when the
size of the datasets
increases.
SHARD suffers from
the expensive
overhead of
MapReduce joins
which impact its
performance.
39 Towards A Scalable Semantic-based Distributed Approach for SPARQL query evaluation Gezim Sejdiu - University of Bonn
Evaluation
Sizeup analysis (on LUBM dataset)
Sparqlify and
SPARQLGX-SDE
performs better
when the size of the
datasets increases
as compared with
our approach and
SHARD.
40 Towards A Scalable Semantic-based Distributed Approach for SPARQL query evaluation Gezim Sejdiu - University of Bonn
Evaluation
Node scalability (on LUBM-1K)
41 Towards A Scalable Semantic-based Distributed Approach for SPARQL query evaluation Gezim Sejdiu - University of Bonn
Evaluation
Node scalability (on LUBM-1K)
We see that as the
number of nodes
increases, the
runtime cost of our
query engine
decreases linearly as
compared with the
SHARD, which keeps
staying constant.
42 Towards A Scalable Semantic-based Distributed Approach for SPARQL query evaluation Gezim Sejdiu - University of Bonn
Evaluation
Node scalability (on LUBM-1K)
SPARQLGX-SDE and
Sparklify perform
better when the
number of nodes
increases compared
to our approach and
SHARD
43 Towards A Scalable Semantic-based Distributed Approach for SPARQL query evaluation Gezim Sejdiu - University of Bonn
❖ Processing large-scale RDF datasets:
➢ data-intensive and computing-intensive
➢ challenge to develop fast and efficient engine that can handle large scale
RDF datasets
❖ A scalable semantic-based query engine (integrated into the larger
SANSA framework) for efficient evaluation of SPARQL queries over
distributed RDF datasets using the Spark framework
❖ As a future work we plan to expand our parser and add statistics to
the query engine while evaluating queries
Conclusion
THANK YOU!
SANSA Semantic Analytics Stack
https://github.com/SANSA-Stack
● SANSA-RDF
● SANSA-OWL
● SANSA-Query
● SANSA-Inference
● SANSA-ML
● SANSA-Examples
● SANSA-Notebooks
Gezim Sejdiu
PhD Student
Smart Data Analytics / University of Bonn
http://sda.tech
@Gezim_Sejdiu
https://gezimsejdiu.github.io/
<http://sansa-stack.net/>
45 Towards A Scalable Semantic-based Distributed Approach for SPARQL query evaluation Gezim Sejdiu - University of Bonn
[1]. J. Lehmann, G. Sejdiu, L. B¨uhmann, P. Westphal, C. Stadler, I. Ermilov, S. Bin, N. Chakraborty, M. Saleem,
A.-C. N. Ngonga, and H. Jabeen. Distributed Semantic Analytics using the SANSA Stack. In Proceedings of 16th
International Semantic Web Conference - Resources Track (ISWC’2017), 2017.
[2]. K. Rohloff and R. E. Schantz. High-performance, massively scalable distributed systems using the
mapreduce software framework: The shard triple-store. In Programming Support Innovations for Emerging
Distributed Applications, PSI EtA ’10, pages 4:1–4:5, New York, NY, USA, 2010. ACM.
[3]. D. Graux, L. Jachiet, P. Genev`es, and N. Layaida. Sparqlgx: Efficient distributed evaluation of sparql with
apache spark. In P. Groth, E. Simperl, A. Gray, M. Sabou, M. Kr¨otzsch, F. Lecue, F. Fl¨ock, and Y. Gil, editors,
The Semantic Web – ISWC 2016, pages 80–87, Cham, 2016.
[4]. C. Stadler, G. Sejdiu, D. Graux, and J. Lehmann. Sparklify: A Scalable Software Component for Efficient
evaluation of SPARQL queries over distributed RDF datasets. In Proceedings of 18th International Semantic
Web Conference (ISWC’2019), 2019.
References
46 Towards A Scalable Semantic-based Distributed Approach for SPARQL query evaluation Gezim Sejdiu - University of Bonn
Backup Slides
47 Towards A Scalable Semantic-based Distributed Approach for SPARQL query evaluation Gezim Sejdiu - University of Bonn
Evaluation
Overall analysis of queries on LUBM-1K dataset (cluser mode)

More Related Content

What's hot

Industry Ontologies: Case Studies in Creating and Extending Schema.org
Industry Ontologies: Case Studies in Creating and Extending Schema.org Industry Ontologies: Case Studies in Creating and Extending Schema.org
Industry Ontologies: Case Studies in Creating and Extending Schema.org sopekmir
 
Linking Open, Big Data Using Semantic Web Technologies - An Introduction
Linking Open, Big Data Using Semantic Web Technologies - An IntroductionLinking Open, Big Data Using Semantic Web Technologies - An Introduction
Linking Open, Big Data Using Semantic Web Technologies - An IntroductionRonald Ashri
 
Ethics & (Explainable) AI – Semantic AI & the Role of the Knowledge Scientist
Ethics & (Explainable) AI – Semantic AI & the Role of the Knowledge ScientistEthics & (Explainable) AI – Semantic AI & the Role of the Knowledge Scientist
Ethics & (Explainable) AI – Semantic AI & the Role of the Knowledge ScientistStratos Kontopoulos
 
Structured Data for the Financial Industry
Structured Data for the Financial Industry Structured Data for the Financial Industry
Structured Data for the Financial Industry sopekmir
 
Big Data and the Semantic Web: Challenges and Opportunities
Big Data and the Semantic Web: Challenges and OpportunitiesBig Data and the Semantic Web: Challenges and Opportunities
Big Data and the Semantic Web: Challenges and OpportunitiesSrinath Srinivasa
 
Using the Semantic Web Stack to Make Big Data Smarter
Using the Semantic Web Stack to Make  Big Data SmarterUsing the Semantic Web Stack to Make  Big Data Smarter
Using the Semantic Web Stack to Make Big Data SmarterMatheus Mota
 
The Power of Semantic Technologies to Explore Linked Open Data
The Power of Semantic Technologies to Explore Linked Open DataThe Power of Semantic Technologies to Explore Linked Open Data
The Power of Semantic Technologies to Explore Linked Open DataOntotext
 
A possible future role of schema.org for business reporting
A possible future role of schema.org for business reportingA possible future role of schema.org for business reporting
A possible future role of schema.org for business reportingsopekmir
 
Linked Data in Scholarly Communication
Linked Data in Scholarly CommunicationLinked Data in Scholarly Communication
Linked Data in Scholarly CommunicationBernhard Haslhofer
 
The Open Annotation Collaboration (OAC) Model
The Open Annotation Collaboration (OAC) ModelThe Open Annotation Collaboration (OAC) Model
The Open Annotation Collaboration (OAC) ModelBernhard Haslhofer
 
FOSDEM2014 - Social Network Benchmark (SNB) Graph Generator - Peter Boncz
FOSDEM2014 - Social Network Benchmark (SNB) Graph Generator - Peter BonczFOSDEM2014 - Social Network Benchmark (SNB) Graph Generator - Peter Boncz
FOSDEM2014 - Social Network Benchmark (SNB) Graph Generator - Peter BonczIoan Toma
 

What's hot (15)

Industry Ontologies: Case Studies in Creating and Extending Schema.org
Industry Ontologies: Case Studies in Creating and Extending Schema.org Industry Ontologies: Case Studies in Creating and Extending Schema.org
Industry Ontologies: Case Studies in Creating and Extending Schema.org
 
Linking Open, Big Data Using Semantic Web Technologies - An Introduction
Linking Open, Big Data Using Semantic Web Technologies - An IntroductionLinking Open, Big Data Using Semantic Web Technologies - An Introduction
Linking Open, Big Data Using Semantic Web Technologies - An Introduction
 
Ethics & (Explainable) AI – Semantic AI & the Role of the Knowledge Scientist
Ethics & (Explainable) AI – Semantic AI & the Role of the Knowledge ScientistEthics & (Explainable) AI – Semantic AI & the Role of the Knowledge Scientist
Ethics & (Explainable) AI – Semantic AI & the Role of the Knowledge Scientist
 
Flink Meetup Septmeber 2017 2018
Flink Meetup Septmeber 2017 2018Flink Meetup Septmeber 2017 2018
Flink Meetup Septmeber 2017 2018
 
Structured Data for the Financial Industry
Structured Data for the Financial Industry Structured Data for the Financial Industry
Structured Data for the Financial Industry
 
Assessing the performance of RDF Engines: Discussing RDF Benchmarks
Assessing the performance of RDF Engines: Discussing RDF Benchmarks Assessing the performance of RDF Engines: Discussing RDF Benchmarks
Assessing the performance of RDF Engines: Discussing RDF Benchmarks
 
euBusinessGraph Company and Economic Data
euBusinessGraph Company and Economic DataeuBusinessGraph Company and Economic Data
euBusinessGraph Company and Economic Data
 
FIBO & Schema.org
FIBO & Schema.orgFIBO & Schema.org
FIBO & Schema.org
 
Big Data and the Semantic Web: Challenges and Opportunities
Big Data and the Semantic Web: Challenges and OpportunitiesBig Data and the Semantic Web: Challenges and Opportunities
Big Data and the Semantic Web: Challenges and Opportunities
 
Using the Semantic Web Stack to Make Big Data Smarter
Using the Semantic Web Stack to Make  Big Data SmarterUsing the Semantic Web Stack to Make  Big Data Smarter
Using the Semantic Web Stack to Make Big Data Smarter
 
The Power of Semantic Technologies to Explore Linked Open Data
The Power of Semantic Technologies to Explore Linked Open DataThe Power of Semantic Technologies to Explore Linked Open Data
The Power of Semantic Technologies to Explore Linked Open Data
 
A possible future role of schema.org for business reporting
A possible future role of schema.org for business reportingA possible future role of schema.org for business reporting
A possible future role of schema.org for business reporting
 
Linked Data in Scholarly Communication
Linked Data in Scholarly CommunicationLinked Data in Scholarly Communication
Linked Data in Scholarly Communication
 
The Open Annotation Collaboration (OAC) Model
The Open Annotation Collaboration (OAC) ModelThe Open Annotation Collaboration (OAC) Model
The Open Annotation Collaboration (OAC) Model
 
FOSDEM2014 - Social Network Benchmark (SNB) Graph Generator - Peter Boncz
FOSDEM2014 - Social Network Benchmark (SNB) Graph Generator - Peter BonczFOSDEM2014 - Social Network Benchmark (SNB) Graph Generator - Peter Boncz
FOSDEM2014 - Social Network Benchmark (SNB) Graph Generator - Peter Boncz
 

Similar to Towards A Scalable Semantic-based Distributed Approach for SPARQL query evaluation - SEMANTiCS 2019 talk

DistLODStats: Distributed Computation of RDF Dataset Statistics - ISWC 2018 talk
DistLODStats: Distributed Computation of RDF Dataset Statistics - ISWC 2018 talkDistLODStats: Distributed Computation of RDF Dataset Statistics - ISWC 2018 talk
DistLODStats: Distributed Computation of RDF Dataset Statistics - ISWC 2018 talkGezim Sejdiu
 
Linked data for Enterprise Data Integration
Linked data for Enterprise Data IntegrationLinked data for Enterprise Data Integration
Linked data for Enterprise Data IntegrationSören Auer
 
Rank | Analyse | Lead | Search
Rank | Analyse | Lead | SearchRank | Analyse | Lead | Search
Rank | Analyse | Lead | Searchsopekmir
 
TFF2016, Rudi Studer, Smarte Dienstleistungen mit semantischen Technologien
TFF2016, Rudi Studer, Smarte Dienstleistungen mit semantischen TechnologienTFF2016, Rudi Studer, Smarte Dienstleistungen mit semantischen Technologien
TFF2016, Rudi Studer, Smarte Dienstleistungen mit semantischen TechnologienTourismFastForward
 
Hadoop and Friends as Key Enabler of the IoE - Continental's Dynamic eHorizon
Hadoop and Friends as Key Enabler of the IoE - Continental's Dynamic eHorizonHadoop and Friends as Key Enabler of the IoE - Continental's Dynamic eHorizon
Hadoop and Friends as Key Enabler of the IoE - Continental's Dynamic eHorizonDataWorks Summit/Hadoop Summit
 
Proof of Concept for Learning Analytics Interoperability
Proof of Concept for Learning Analytics InteroperabilityProof of Concept for Learning Analytics Interoperability
Proof of Concept for Learning Analytics InteroperabilityOpen Cyber University of Korea
 
Web Data Extraction: A Crash Course
Web Data Extraction: A Crash CourseWeb Data Extraction: A Crash Course
Web Data Extraction: A Crash CourseGiorgio Orsi
 
UGent Research Projects on Linked Data in Architecture and Construction
UGent Research Projects on Linked Data in Architecture and ConstructionUGent Research Projects on Linked Data in Architecture and Construction
UGent Research Projects on Linked Data in Architecture and ConstructionPieter Pauwels
 
Release webinar: Sansa and Ontario
Release webinar: Sansa and OntarioRelease webinar: Sansa and Ontario
Release webinar: Sansa and OntarioBigData_Europe
 
The Best of Both Worlds: Unlocking the Power of (big) Knowledge Graphs with S...
The Best of Both Worlds: Unlocking the Power of (big) Knowledge Graphs with S...The Best of Both Worlds: Unlocking the Power of (big) Knowledge Graphs with S...
The Best of Both Worlds: Unlocking the Power of (big) Knowledge Graphs with S...Gezim Sejdiu
 
Python time series analysis and visualization for self-driving cars
Python time series analysis and visualization for self-driving carsPython time series analysis and visualization for self-driving cars
Python time series analysis and visualization for self-driving carsAndreas Pawlik
 
Why I don't use Semantic Web technologies anymore, event if they still influe...
Why I don't use Semantic Web technologies anymore, event if they still influe...Why I don't use Semantic Web technologies anymore, event if they still influe...
Why I don't use Semantic Web technologies anymore, event if they still influe...Gautier Poupeau
 
SEMLIB Final Conference | DERI presentation
SEMLIB Final Conference | DERI presentationSEMLIB Final Conference | DERI presentation
SEMLIB Final Conference | DERI presentationSemLib Project
 
A Survey of Exploratory Search Systems Based on LOD Resources
A Survey of Exploratory Search Systems Based on LOD ResourcesA Survey of Exploratory Search Systems Based on LOD Resources
A Survey of Exploratory Search Systems Based on LOD ResourcesKarwan Jacksi
 
Web Intelligence 2013 - Characterizing concepts of interest leveraging Linked...
Web Intelligence 2013 - Characterizing concepts of interest leveraging Linked...Web Intelligence 2013 - Characterizing concepts of interest leveraging Linked...
Web Intelligence 2013 - Characterizing concepts of interest leveraging Linked...Fabrizio Orlandi
 
Linked Data Tutorial
Linked Data TutorialLinked Data Tutorial
Linked Data TutorialSören Auer
 
Linked data and semantic wikis
Linked data and semantic wikisLinked data and semantic wikis
Linked data and semantic wikisSören Auer
 

Similar to Towards A Scalable Semantic-based Distributed Approach for SPARQL query evaluation - SEMANTiCS 2019 talk (20)

DistLODStats: Distributed Computation of RDF Dataset Statistics - ISWC 2018 talk
DistLODStats: Distributed Computation of RDF Dataset Statistics - ISWC 2018 talkDistLODStats: Distributed Computation of RDF Dataset Statistics - ISWC 2018 talk
DistLODStats: Distributed Computation of RDF Dataset Statistics - ISWC 2018 talk
 
Linked data for Enterprise Data Integration
Linked data for Enterprise Data IntegrationLinked data for Enterprise Data Integration
Linked data for Enterprise Data Integration
 
The Semantic Data Web, Sören Auer, University of Leipzig
The Semantic Data Web, Sören Auer, University of LeipzigThe Semantic Data Web, Sören Auer, University of Leipzig
The Semantic Data Web, Sören Auer, University of Leipzig
 
Rank | Analyse | Lead | Search
Rank | Analyse | Lead | SearchRank | Analyse | Lead | Search
Rank | Analyse | Lead | Search
 
TFF2016, Rudi Studer, Smarte Dienstleistungen mit semantischen Technologien
TFF2016, Rudi Studer, Smarte Dienstleistungen mit semantischen TechnologienTFF2016, Rudi Studer, Smarte Dienstleistungen mit semantischen Technologien
TFF2016, Rudi Studer, Smarte Dienstleistungen mit semantischen Technologien
 
Hadoop and Friends as Key Enabler of the IoE - Continental's Dynamic eHorizon
Hadoop and Friends as Key Enabler of the IoE - Continental's Dynamic eHorizonHadoop and Friends as Key Enabler of the IoE - Continental's Dynamic eHorizon
Hadoop and Friends as Key Enabler of the IoE - Continental's Dynamic eHorizon
 
Proof of Concept for Learning Analytics Interoperability
Proof of Concept for Learning Analytics InteroperabilityProof of Concept for Learning Analytics Interoperability
Proof of Concept for Learning Analytics Interoperability
 
Web Data Extraction: A Crash Course
Web Data Extraction: A Crash CourseWeb Data Extraction: A Crash Course
Web Data Extraction: A Crash Course
 
UGent Research Projects on Linked Data in Architecture and Construction
UGent Research Projects on Linked Data in Architecture and ConstructionUGent Research Projects on Linked Data in Architecture and Construction
UGent Research Projects on Linked Data in Architecture and Construction
 
Release webinar: Sansa and Ontario
Release webinar: Sansa and OntarioRelease webinar: Sansa and Ontario
Release webinar: Sansa and Ontario
 
The Best of Both Worlds: Unlocking the Power of (big) Knowledge Graphs with S...
The Best of Both Worlds: Unlocking the Power of (big) Knowledge Graphs with S...The Best of Both Worlds: Unlocking the Power of (big) Knowledge Graphs with S...
The Best of Both Worlds: Unlocking the Power of (big) Knowledge Graphs with S...
 
Python time series analysis and visualization for self-driving cars
Python time series analysis and visualization for self-driving carsPython time series analysis and visualization for self-driving cars
Python time series analysis and visualization for self-driving cars
 
Why I don't use Semantic Web technologies anymore, event if they still influe...
Why I don't use Semantic Web technologies anymore, event if they still influe...Why I don't use Semantic Web technologies anymore, event if they still influe...
Why I don't use Semantic Web technologies anymore, event if they still influe...
 
SEMLIB Final Conference | DERI presentation
SEMLIB Final Conference | DERI presentationSEMLIB Final Conference | DERI presentation
SEMLIB Final Conference | DERI presentation
 
RDF data clustering
RDF data clusteringRDF data clustering
RDF data clustering
 
Linked Data and Semantic Web Application Development by Peter Haase
Linked Data and Semantic Web Application Development by Peter HaaseLinked Data and Semantic Web Application Development by Peter Haase
Linked Data and Semantic Web Application Development by Peter Haase
 
A Survey of Exploratory Search Systems Based on LOD Resources
A Survey of Exploratory Search Systems Based on LOD ResourcesA Survey of Exploratory Search Systems Based on LOD Resources
A Survey of Exploratory Search Systems Based on LOD Resources
 
Web Intelligence 2013 - Characterizing concepts of interest leveraging Linked...
Web Intelligence 2013 - Characterizing concepts of interest leveraging Linked...Web Intelligence 2013 - Characterizing concepts of interest leveraging Linked...
Web Intelligence 2013 - Characterizing concepts of interest leveraging Linked...
 
Linked Data Tutorial
Linked Data TutorialLinked Data Tutorial
Linked Data Tutorial
 
Linked data and semantic wikis
Linked data and semantic wikisLinked data and semantic wikis
Linked data and semantic wikis
 

Recently uploaded

THYROID-PARATHYROID medical surgical nursing
THYROID-PARATHYROID medical surgical nursingTHYROID-PARATHYROID medical surgical nursing
THYROID-PARATHYROID medical surgical nursingJocelyn Atis
 
GLOBAL AND LOCAL SCENARIO OF FOOD AND NUTRITION.pptx
GLOBAL AND LOCAL SCENARIO OF FOOD AND NUTRITION.pptxGLOBAL AND LOCAL SCENARIO OF FOOD AND NUTRITION.pptx
GLOBAL AND LOCAL SCENARIO OF FOOD AND NUTRITION.pptxSultanMuhammadGhauri
 
Hemoglobin metabolism_pathophysiology.pptx
Hemoglobin metabolism_pathophysiology.pptxHemoglobin metabolism_pathophysiology.pptx
Hemoglobin metabolism_pathophysiology.pptxmuralinath2
 
GBSN - Biochemistry (Unit 5) Chemistry of Lipids
GBSN - Biochemistry (Unit 5) Chemistry of LipidsGBSN - Biochemistry (Unit 5) Chemistry of Lipids
GBSN - Biochemistry (Unit 5) Chemistry of LipidsAreesha Ahmad
 
FAIR & AI Ready KGs for Explainable Predictions
FAIR & AI Ready KGs for Explainable PredictionsFAIR & AI Ready KGs for Explainable Predictions
FAIR & AI Ready KGs for Explainable PredictionsMichel Dumontier
 
Plant Biotechnology undergraduates note.pptx
Plant Biotechnology undergraduates note.pptxPlant Biotechnology undergraduates note.pptx
Plant Biotechnology undergraduates note.pptxyusufzako14
 
NuGOweek 2024 Ghent - programme - final version
NuGOweek 2024 Ghent - programme - final versionNuGOweek 2024 Ghent - programme - final version
NuGOweek 2024 Ghent - programme - final versionpablovgd
 
Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...
Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...
Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...Sérgio Sacani
 
Predicting property prices with machine learning algorithms.pdf
Predicting property prices with machine learning algorithms.pdfPredicting property prices with machine learning algorithms.pdf
Predicting property prices with machine learning algorithms.pdfbinhminhvu04
 
Gliese 12 b: A Temperate Earth-sized Planet at 12 pc Ideal for Atmospheric Tr...
Gliese 12 b: A Temperate Earth-sized Planet at 12 pc Ideal for Atmospheric Tr...Gliese 12 b: A Temperate Earth-sized Planet at 12 pc Ideal for Atmospheric Tr...
Gliese 12 b: A Temperate Earth-sized Planet at 12 pc Ideal for Atmospheric Tr...Sérgio Sacani
 
SAMPLING.pptx for analystical chemistry sample techniques
SAMPLING.pptx for analystical chemistry sample techniquesSAMPLING.pptx for analystical chemistry sample techniques
SAMPLING.pptx for analystical chemistry sample techniquesrodneykiptoo8
 
Viksit bharat till 2047 India@2047.pptx
Viksit bharat till 2047  India@2047.pptxViksit bharat till 2047  India@2047.pptx
Viksit bharat till 2047 India@2047.pptxrakeshsharma20142015
 
INSIGHT Partner Profile: Tampere University
INSIGHT Partner Profile: Tampere UniversityINSIGHT Partner Profile: Tampere University
INSIGHT Partner Profile: Tampere UniversitySteffi Friedrichs
 
Large scale production of streptomycin.pptx
Large scale production of streptomycin.pptxLarge scale production of streptomycin.pptx
Large scale production of streptomycin.pptxCherry
 
Anemia_ different types_causes_ conditions
Anemia_ different types_causes_ conditionsAnemia_ different types_causes_ conditions
Anemia_ different types_causes_ conditionsmuralinath2
 
(May 29th, 2024) Advancements in Intravital Microscopy- Insights for Preclini...
(May 29th, 2024) Advancements in Intravital Microscopy- Insights for Preclini...(May 29th, 2024) Advancements in Intravital Microscopy- Insights for Preclini...
(May 29th, 2024) Advancements in Intravital Microscopy- Insights for Preclini...Scintica Instrumentation
 
word2vec, node2vec, graph2vec, X2vec: Towards a Theory of Vector Embeddings o...
word2vec, node2vec, graph2vec, X2vec: Towards a Theory of Vector Embeddings o...word2vec, node2vec, graph2vec, X2vec: Towards a Theory of Vector Embeddings o...
word2vec, node2vec, graph2vec, X2vec: Towards a Theory of Vector Embeddings o...Subhajit Sahu
 
Structural Classification Of Protein (SCOP)
Structural Classification Of Protein  (SCOP)Structural Classification Of Protein  (SCOP)
Structural Classification Of Protein (SCOP)aishnasrivastava
 
insect morphology and physiology of insect
insect morphology and physiology of insectinsect morphology and physiology of insect
insect morphology and physiology of insectanitaento25
 
Cancer cell metabolism: special Reference to Lactate Pathway
Cancer cell metabolism: special Reference to Lactate PathwayCancer cell metabolism: special Reference to Lactate Pathway
Cancer cell metabolism: special Reference to Lactate PathwayAADYARAJPANDEY1
 

Recently uploaded (20)

THYROID-PARATHYROID medical surgical nursing
THYROID-PARATHYROID medical surgical nursingTHYROID-PARATHYROID medical surgical nursing
THYROID-PARATHYROID medical surgical nursing
 
GLOBAL AND LOCAL SCENARIO OF FOOD AND NUTRITION.pptx
GLOBAL AND LOCAL SCENARIO OF FOOD AND NUTRITION.pptxGLOBAL AND LOCAL SCENARIO OF FOOD AND NUTRITION.pptx
GLOBAL AND LOCAL SCENARIO OF FOOD AND NUTRITION.pptx
 
Hemoglobin metabolism_pathophysiology.pptx
Hemoglobin metabolism_pathophysiology.pptxHemoglobin metabolism_pathophysiology.pptx
Hemoglobin metabolism_pathophysiology.pptx
 
GBSN - Biochemistry (Unit 5) Chemistry of Lipids
GBSN - Biochemistry (Unit 5) Chemistry of LipidsGBSN - Biochemistry (Unit 5) Chemistry of Lipids
GBSN - Biochemistry (Unit 5) Chemistry of Lipids
 
FAIR & AI Ready KGs for Explainable Predictions
FAIR & AI Ready KGs for Explainable PredictionsFAIR & AI Ready KGs for Explainable Predictions
FAIR & AI Ready KGs for Explainable Predictions
 
Plant Biotechnology undergraduates note.pptx
Plant Biotechnology undergraduates note.pptxPlant Biotechnology undergraduates note.pptx
Plant Biotechnology undergraduates note.pptx
 
NuGOweek 2024 Ghent - programme - final version
NuGOweek 2024 Ghent - programme - final versionNuGOweek 2024 Ghent - programme - final version
NuGOweek 2024 Ghent - programme - final version
 
Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...
Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...
Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...
 
Predicting property prices with machine learning algorithms.pdf
Predicting property prices with machine learning algorithms.pdfPredicting property prices with machine learning algorithms.pdf
Predicting property prices with machine learning algorithms.pdf
 
Gliese 12 b: A Temperate Earth-sized Planet at 12 pc Ideal for Atmospheric Tr...
Gliese 12 b: A Temperate Earth-sized Planet at 12 pc Ideal for Atmospheric Tr...Gliese 12 b: A Temperate Earth-sized Planet at 12 pc Ideal for Atmospheric Tr...
Gliese 12 b: A Temperate Earth-sized Planet at 12 pc Ideal for Atmospheric Tr...
 
SAMPLING.pptx for analystical chemistry sample techniques
SAMPLING.pptx for analystical chemistry sample techniquesSAMPLING.pptx for analystical chemistry sample techniques
SAMPLING.pptx for analystical chemistry sample techniques
 
Viksit bharat till 2047 India@2047.pptx
Viksit bharat till 2047  India@2047.pptxViksit bharat till 2047  India@2047.pptx
Viksit bharat till 2047 India@2047.pptx
 
INSIGHT Partner Profile: Tampere University
INSIGHT Partner Profile: Tampere UniversityINSIGHT Partner Profile: Tampere University
INSIGHT Partner Profile: Tampere University
 
Large scale production of streptomycin.pptx
Large scale production of streptomycin.pptxLarge scale production of streptomycin.pptx
Large scale production of streptomycin.pptx
 
Anemia_ different types_causes_ conditions
Anemia_ different types_causes_ conditionsAnemia_ different types_causes_ conditions
Anemia_ different types_causes_ conditions
 
(May 29th, 2024) Advancements in Intravital Microscopy- Insights for Preclini...
(May 29th, 2024) Advancements in Intravital Microscopy- Insights for Preclini...(May 29th, 2024) Advancements in Intravital Microscopy- Insights for Preclini...
(May 29th, 2024) Advancements in Intravital Microscopy- Insights for Preclini...
 
word2vec, node2vec, graph2vec, X2vec: Towards a Theory of Vector Embeddings o...
word2vec, node2vec, graph2vec, X2vec: Towards a Theory of Vector Embeddings o...word2vec, node2vec, graph2vec, X2vec: Towards a Theory of Vector Embeddings o...
word2vec, node2vec, graph2vec, X2vec: Towards a Theory of Vector Embeddings o...
 
Structural Classification Of Protein (SCOP)
Structural Classification Of Protein  (SCOP)Structural Classification Of Protein  (SCOP)
Structural Classification Of Protein (SCOP)
 
insect morphology and physiology of insect
insect morphology and physiology of insectinsect morphology and physiology of insect
insect morphology and physiology of insect
 
Cancer cell metabolism: special Reference to Lactate Pathway
Cancer cell metabolism: special Reference to Lactate PathwayCancer cell metabolism: special Reference to Lactate Pathway
Cancer cell metabolism: special Reference to Lactate Pathway
 

Towards A Scalable Semantic-based Distributed Approach for SPARQL query evaluation - SEMANTiCS 2019 talk

  • 1. Towards A Scalable Semantic-based Distributed Approach for SPARQL query evaluation Gezim Sejdiu, Damien Graux, Imran Khan, Ioanna Lytra, Hajira Jabeen, and Jens Lehmann
  • 2. 2 Towards A Scalable Semantic-based Distributed Approach for SPARQL query evaluation Gezim Sejdiu - University of Bonn ❖ Introduction ❖ Approach ➢ System Architecture Overview ➢ Distributed Algorithm Description ❖ Evaluation ❖ Conclusion and Future work Outline
  • 3. 3 Towards A Scalable Semantic-based Distributed Approach for SPARQL query evaluation Gezim Sejdiu - University of Bonn Introduction ❖ Over the last years, the size of the Semantic Web has increased and several large-scale datasets were published
  • 4. 4 Towards A Scalable Semantic-based Distributed Approach for SPARQL query evaluation Gezim Sejdiu - University of Bonn Introduction Source: LOD-Cloud (http://lod-cloud.net/ ) ❖ Over the last years, the size of the Semantic Web has increased and several large-scale datasets were published ➢ As of August 2018
  • 5. 5 Towards A Scalable Semantic-based Distributed Approach for SPARQL query evaluation Gezim Sejdiu - University of Bonn Introduction Source: LOD-Cloud (http://lod-cloud.net/ ) ❖ Over the last years, the size of the Semantic Web has increased and several large-scale datasets were published ➢ Based on LOD Stats (http://lodstats.aksw.org/ ) ~10, 000 datasets Openly available online using Semantic Web standards
  • 6. 6 Towards A Scalable Semantic-based Distributed Approach for SPARQL query evaluation Gezim Sejdiu - University of Bonn Introduction Source: LOD-Cloud (http://lod-cloud.net/ ) ❖ Over the last years, the size of the Semantic Web has increased and several large-scale datasets were published ➢ Based on LOD Stats (http://lodstats.aksw.org/ ) ~10, 000 datasets Openly available online using Semantic Web standards many datasets RDFized and kept private (e.g. Supply chain, manufacture, ethereum dataset, etc.)
  • 7. 7 Towards A Scalable Semantic-based Distributed Approach for SPARQL query evaluation Gezim Sejdiu - University of Bonn Introduction Source: LOD-Cloud (http://lod-cloud.net/ ) ❖ Dealing with such amount of data makes many tasks hard to be solved on single machines
  • 8. 8 Towards A Scalable Semantic-based Distributed Approach for SPARQL query evaluation Gezim Sejdiu - University of Bonn Introduction Source: LOD-Cloud (http://lod-cloud.net/ ) ❖ Dealing with such amount of data makes many tasks hard to be solved on single machines Supply Chain Mng. Find relevant information about motorcycle part
  • 9. 9 Towards A Scalable Semantic-based Distributed Approach for SPARQL query evaluation Gezim Sejdiu - University of Bonn Introduction Source: LOD-Cloud (http://lod-cloud.net/ ) ❖ Dealing with such amount of data makes many tasks hard to be solved on single machines Media Publishing Discover what others say about news stories Supply Chain Mng. Find relevant information about motorcycle part
  • 10. 10 Towards A Scalable Semantic-based Distributed Approach for SPARQL query evaluation Gezim Sejdiu - University of Bonn Introduction Source: LOD-Cloud (http://lod-cloud.net/ ) ❖ Dealing with such amount of data makes many tasks hard to be solved on single machines Media Publishing Discover what others say about news stories Exploring Tourism Exploring the neighborhood (events, point of interests, etc) Supply Chain Mng. Find relevant information about motorcycle part
  • 11. 11 Towards A Scalable Semantic-based Distributed Approach for SPARQL query evaluation Gezim Sejdiu - University of Bonn Introduction Source: LOD-Cloud (http://lod-cloud.net/ ) ❖ Dealing with such amount of data makes many tasks hard to be solved on single machines Media Publishing Discover what others say about news stories Exploring Tourism Exploring the neighborhood (events, point of interests, etc) Entity Linking Which datasets are good candidates for interlinking? Supply Chain Mng. Find relevant information about motorcycle part
  • 12. 12 Towards A Scalable Semantic-based Distributed Approach for SPARQL query evaluation Gezim Sejdiu - University of Bonn Introduction Source: LOD-Cloud (http://lod-cloud.net/ ) ❖ Dealing with such amount of data makes many tasks hard to be solved on single machines Media Publishing Discover what others say about news stories Exploring Tourism Exploring the neighborhood (events, point of interests, etc) Entity Linking Which datasets are good candidates for interlinking? Supply Chain Mng. Find relevant information about motorcycle part These processes require, both efficient storage strategies and query processing engine.
  • 13. 13 Towards A Scalable Semantic-based Distributed Approach for SPARQL query evaluation Gezim Sejdiu - University of Bonn SANSA Overview ❖ SANSA [1] is a processing data flow engine that provides data distribution, and fault tolerance for distributed computations over RDF large-scale datasets ❖ SANSA includes several libraries: ➢ Read / Write RDF / OWL library ➢ Querying library ➢ Inference library ➢ ML- Machine Learning core library BigDataEurope Inference Knowledge Distribution & Representation DeployCoreAPIs&Libraries Local Cluster Standalone Resource manager Querying Machine Learning
  • 14. 14 Towards A Scalable Semantic-based Distributed Approach for SPARQL query evaluation Gezim Sejdiu - University of Bonn Approach SANSA Engine RDF Layer Data Ingestion Partitioning Query Layer Semantic map map Distributed Data Structures Results RDFData SELECT ?p WHERE { ?p :owns ?c . ?c :madeIn ?Ingolstadt . } SPARQL Joy :owns Car1 Joy :livesIn Bonn Car1 :typeOf Car Car1 :madeBy Audi Car1 :madeIn Ingolstadt Bonn :cityOf Germany Audi :memeberOf Volkswagen Ingolstadt :cityOf Germany Joy :owns Car1 :livesIn Bonn Car1 :typeOf Car :madeBy Audi :madeIn Ingolstadt Bonn :cityOf Germany Audi :memeberOf Volkswagen Ingolstadt :cityOf Germany System Architecture Overview
  • 15. 15 Towards A Scalable Semantic-based Distributed Approach for SPARQL query evaluation Gezim Sejdiu - University of Bonn Approach RDFData Joy :owns Car1 Joy :livesIn Bonn Car1 :typeOf Car Car1 :madeBy Audi Car1 :madeIn Ingolstadt Bonn :cityOf Germany Audi :memeberOf Volkswagen Ingolstadt :cityOf Germany System Architecture Overview (workflow)
  • 16. 16 Towards A Scalable Semantic-based Distributed Approach for SPARQL query evaluation Gezim Sejdiu - University of Bonn Approach System Architecture Overview (workflow) SANSA Engine RDF Layer Data Ingestion RDFData Joy :owns Car1 Joy :livesIn Bonn Car1 :typeOf Car Car1 :madeBy Audi Car1 :madeIn Ingolstadt Bonn :cityOf Germany Audi :memeberOf Volkswagen Ingolstadt :cityOf Germany
  • 17. 17 Towards A Scalable Semantic-based Distributed Approach for SPARQL query evaluation Gezim Sejdiu - University of Bonn Approach SANSA Engine RDF Layer Data Ingestion Partitioning RDFData Joy :owns Car1 Joy :livesIn Bonn Car1 :typeOf Car Car1 :madeBy Audi Car1 :madeIn Ingolstadt Bonn :cityOf Germany Audi :memeberOf Volkswagen Ingolstadt :cityOf Germany System Architecture Overview (workflow)
  • 18. 18 Towards A Scalable Semantic-based Distributed Approach for SPARQL query evaluation Gezim Sejdiu - University of Bonn Approach SANSA Engine RDF Layer Data Ingestion Partitioning RDFData Joy :owns Car1 Joy :livesIn Bonn Car1 :typeOf Car Car1 :madeBy Audi Car1 :madeIn Ingolstadt Bonn :cityOf Germany Audi :memeberOf Volkswagen Ingolstadt :cityOf Germany Joy :owns Car1 :livesIn Bonn System Architecture Overview (workflow)
  • 19. 19 Towards A Scalable Semantic-based Distributed Approach for SPARQL query evaluation Gezim Sejdiu - University of Bonn Approach SANSA Engine RDF Layer Data Ingestion Partitioning RDFData Joy :owns Car1 Joy :livesIn Bonn Car1 :typeOf Car Car1 :madeBy Audi Car1 :madeIn Ingolstadt Bonn :cityOf Germany Audi :memeberOf Volkswagen Ingolstadt :cityOf Germany Joy :owns Car1 :livesIn Bonn Car1 :typeOf Car :madeBy Audi :madeIn Ingolstadt System Architecture Overview (workflow)
  • 20. 20 Towards A Scalable Semantic-based Distributed Approach for SPARQL query evaluation Gezim Sejdiu - University of Bonn Approach SANSA Engine RDF Layer Data Ingestion Partitioning RDFData Joy :owns Car1 Joy :livesIn Bonn Car1 :typeOf Car Car1 :madeBy Audi Car1 :madeIn Ingolstadt Bonn :cityOf Germany Audi :memeberOf Volkswagen Ingolstadt :cityOf Germany Joy :owns Car1 :livesIn Bonn Car1 :typeOf Car :madeBy Audi :madeIn Ingolstadt Bonn :cityOf Germany System Architecture Overview (workflow)
  • 21. 21 Towards A Scalable Semantic-based Distributed Approach for SPARQL query evaluation Gezim Sejdiu - University of Bonn Approach SANSA Engine RDF Layer Data Ingestion Partitioning RDFData Joy :owns Car1 Joy :livesIn Bonn Car1 :typeOf Car Car1 :madeBy Audi Car1 :madeIn Ingolstadt Bonn :cityOf Germany Audi :memeberOf Volkswagen Ingolstadt :cityOf Germany Joy :owns Car1 :livesIn Bonn Car1 :typeOf Car :madeBy Audi :madeIn Ingolstadt Bonn :cityOf Germany Audi :memeberOf Volkswagen System Architecture Overview (workflow)
  • 22. 22 Towards A Scalable Semantic-based Distributed Approach for SPARQL query evaluation Gezim Sejdiu - University of Bonn Approach SANSA Engine RDF Layer Data Ingestion Partitioning RDFData Joy :owns Car1 Joy :livesIn Bonn Car1 :typeOf Car Car1 :madeBy Audi Car1 :madeIn Ingolstadt Bonn :cityOf Germany Audi :memeberOf Volkswagen Ingolstadt :cityOf Germany Joy :owns Car1 :livesIn Bonn Car1 :typeOf Car :madeBy Audi :madeIn Ingolstadt Bonn :cityOf Germany Audi :memeberOf Volkswagen Ingolstadt :cityOf Germany System Architecture Overview (workflow)
  • 23. 23 Towards A Scalable Semantic-based Distributed Approach for SPARQL query evaluation Gezim Sejdiu - University of Bonn Approach SANSA Engine RDF Layer Data Ingestion Partitioning RDFData SELECT ?p WHERE { ?p :owns ?c . ?c :madeIn ?Ingolstadt . } SPARQL Joy :owns Car1 Joy :livesIn Bonn Car1 :typeOf Car Car1 :madeBy Audi Car1 :madeIn Ingolstadt Bonn :cityOf Germany Audi :memeberOf Volkswagen Ingolstadt :cityOf Germany Joy :owns Car1 :livesIn Bonn Car1 :typeOf Car :madeBy Audi :madeIn Ingolstadt Bonn :cityOf Germany Audi :memeberOf Volkswagen Ingolstadt :cityOf Germany System Architecture Overview (workflow)
  • 24. 24 Towards A Scalable Semantic-based Distributed Approach for SPARQL query evaluation Gezim Sejdiu - University of Bonn Approach SANSA Engine RDF Layer Data Ingestion Partitioning Query Layer Semantic map map RDFData SELECT ?p WHERE { ?p :owns ?c . ?c :madeIn ?Ingolstadt . } SPARQL Joy :owns Car1 Joy :livesIn Bonn Car1 :typeOf Car Car1 :madeBy Audi Car1 :madeIn Ingolstadt Bonn :cityOf Germany Audi :memeberOf Volkswagen Ingolstadt :cityOf Germany Joy :owns Car1 :livesIn Bonn Car1 :typeOf Car :madeBy Audi :madeIn Ingolstadt Bonn :cityOf Germany Audi :memeberOf Volkswagen Ingolstadt :cityOf Germany System Architecture Overview (workflow)
  • 25. 25 Towards A Scalable Semantic-based Distributed Approach for SPARQL query evaluation Gezim Sejdiu - University of Bonn Approach SANSA Engine RDF Layer Data Ingestion Partitioning Query Layer Semantic map map RDFData SELECT ?p WHERE { ?p :owns ?c . ?c :madeIn ?Ingolstadt . } SPARQL Joy :owns Car1 Joy :livesIn Bonn Car1 :typeOf Car Car1 :madeBy Audi Car1 :madeIn Ingolstadt Bonn :cityOf Germany Audi :memeberOf Volkswagen Ingolstadt :cityOf Germany Joy :owns Car1 :livesIn Bonn Car1 :typeOf Car :madeBy Audi :madeIn Ingolstadt Bonn :cityOf Germany Audi :memeberOf Volkswagen Ingolstadt :cityOf Germany System Architecture Overview (workflow) #1 select (p=:owns)
  • 26. 26 Towards A Scalable Semantic-based Distributed Approach for SPARQL query evaluation Gezim Sejdiu - University of Bonn Approach SANSA Engine RDF Layer Data Ingestion Partitioning Query Layer Semantic map map RDFData SELECT ?p WHERE { ?p :owns ?c . ?c :madeIn ?Ingolstadt . } SPARQL Joy :owns Car1 Joy :livesIn Bonn Car1 :typeOf Car Car1 :madeBy Audi Car1 :madeIn Ingolstadt Bonn :cityOf Germany Audi :memeberOf Volkswagen Ingolstadt :cityOf Germany Joy :owns Car1 :livesIn Bonn Car1 :typeOf Car :madeBy Audi :madeIn Ingolstadt Bonn :cityOf Germany Audi :memeberOf Volkswagen Ingolstadt :cityOf Germany System Architecture Overview (workflow) #1 select (p=:owns) #2 select (p=:madeIn, o=:Ingolstadt)
  • 27. 27 Towards A Scalable Semantic-based Distributed Approach for SPARQL query evaluation Gezim Sejdiu - University of Bonn Approach SANSA Engine RDF Layer Data Ingestion Partitioning Query Layer Semantic map map RDFData SELECT ?p WHERE { ?p :owns ?c . ?c :madeIn ?Ingolstadt . } SPARQL Joy :owns Car1 Joy :livesIn Bonn Car1 :typeOf Car Car1 :madeBy Audi Car1 :madeIn Ingolstadt Bonn :cityOf Germany Audi :memeberOf Volkswagen Ingolstadt :cityOf Germany Joy :owns Car1 :livesIn Bonn Car1 :typeOf Car :madeBy Audi :madeIn Ingolstadt Bonn :cityOf Germany Audi :memeberOf Volkswagen Ingolstadt :cityOf Germany System Architecture Overview (workflow) #1 select (p=:owns) #2 select (p=:madeIn, o=:Ingolstadt) #3 join (?c)
  • 28. 28 Towards A Scalable Semantic-based Distributed Approach for SPARQL query evaluation Gezim Sejdiu - University of Bonn Approach SANSA Engine RDF Layer Data Ingestion Partitioning Query Layer Semantic map map RDFData SELECT ?p WHERE { ?p :owns ?c . ?c :madeIn ?Ingolstadt . } SPARQL Joy :owns Car1 Joy :livesIn Bonn Car1 :typeOf Car Car1 :madeBy Audi Car1 :madeIn Ingolstadt Bonn :cityOf Germany Audi :memeberOf Volkswagen Ingolstadt :cityOf Germany Joy :owns Car1 :livesIn Bonn Car1 :typeOf Car :madeBy Audi :madeIn Ingolstadt Bonn :cityOf Germany Audi :memeberOf Volkswagen Ingolstadt :cityOf Germany System Architecture Overview (workflow) #1 select (p=:owns) #2 select (p=:madeIn, o=:Ingolstadt) #3 join (?c) #4 project (?p)
  • 29. 29 Towards A Scalable Semantic-based Distributed Approach for SPARQL query evaluation Gezim Sejdiu - University of Bonn Approach SANSA Engine RDF Layer Data Ingestion Partitioning Query Layer Semantic map map Distributed Data Structures RDFData SELECT ?p WHERE { ?p :owns ?c . ?c :madeIn ?Ingolstadt . } SPARQL Joy :owns Car1 Joy :livesIn Bonn Car1 :typeOf Car Car1 :madeBy Audi Car1 :madeIn Ingolstadt Bonn :cityOf Germany Audi :memeberOf Volkswagen Ingolstadt :cityOf Germany Joy :owns Car1 :livesIn Bonn Car1 :typeOf Car :madeBy Audi :madeIn Ingolstadt Bonn :cityOf Germany Audi :memeberOf Volkswagen Ingolstadt :cityOf Germany System Architecture Overview (workflow)
  • 30. 30 Towards A Scalable Semantic-based Distributed Approach for SPARQL query evaluation Gezim Sejdiu - University of Bonn Approach SANSA Engine RDF Layer Data Ingestion Partitioning Query Layer Semantic map map Distributed Data Structures Results RDFData SELECT ?p WHERE { ?p :owns ?c . ?c :madeIn ?Ingolstadt . } SPARQL Joy :owns Car1 Joy :livesIn Bonn Car1 :typeOf Car Car1 :madeBy Audi Car1 :madeIn Ingolstadt Bonn :cityOf Germany Audi :memeberOf Volkswagen Ingolstadt :cityOf Germany Joy :owns Car1 :livesIn Bonn Car1 :typeOf Car :madeBy Audi :madeIn Ingolstadt Bonn :cityOf Germany Audi :memeberOf Volkswagen Ingolstadt :cityOf Germany System Architecture Overview (workflow)
  • 31. 31 Towards A Scalable Semantic-based Distributed Approach for SPARQL query evaluation Gezim Sejdiu - University of Bonn ❖ Experimental Setup ➢ Cluster configuration ■ 6 machines (1 master, 5 workers): Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz (32 Cores), 128 GB RAM, 12 TB SATA RAID-5, Spark-2.4.0, Hadoop 2.8.0, Scala 2.11.11 and Java 8 ➢ Datasets (all in nt format) ➢ Distributed SPARQL query evaluators we compare with: ■ SHARD [2], SPARQLGX-SDE [3], and Sparklify [4] Evaluation LUBM WatDiv 1K 2K 3K 10M 100M #nr. of triples 138,280,374 276,349,040 414,493,296 10,916,457 108,997,714 size (GB) 24 49 70 1.5 15
  • 32. 32 Towards A Scalable Semantic-based Distributed Approach for SPARQL query evaluation Gezim Sejdiu - University of Bonn Runtime (s) (mean) Queries SHARD SPARQLGX-SDE SANSA.Sparklify SANSA.Semantic (ours) C3 n/a 38.79 72.94 90.48 F3 n/a 38.41 74.69 n/a L3 n/a 21.05 73.16 72.84 S3 n/a 26.27 70.1 79.7 C3 n/a 181.51 96.59 300.82 F3 n/a 162.86 91.2 n/a L3 n/a 84.09 82.17 189.89 S3 n/a 123.6 93.02 176.2 Q1 774.93 103.74 103.57 226.21 Q2 fail fail 3348.51 329.69 Q3 772.55 126.31 107.25 235.31 Q4 988.28 182.52 111.89 294.8 Q5 771.69 101.05 100.37 226.21 Q6 fail 73.05 100.72 207.06 Q7 fail 160.94 113.03 277.08 Q8 fail 179.56 114.83 309.39 Q9 fail 204.62 114.25 326.29 Q10 780.05 106.26 110.18 232.72 Q11 783.2 112.23 105.13 231.36 Q12 fail 159.65 105.86 283.53 Q13 778.16 100.06 90.87 220.28 Q14 688.44 74.64 100.58 204.43 Performance analysis on large-scale RDF datasets Evaluation Watdiv-10MWatdiv-100MLUBM-1K
  • 33. 33 Towards A Scalable Semantic-based Distributed Approach for SPARQL query evaluation Gezim Sejdiu - University of Bonn Runtime (s) (mean) Queries SHARD SPARQLGX-SDE SANSA.Sparklify SANSA.Semantic (ours) C3 n/a 38.79 72.94 90.48 F3 n/a 38.41 74.69 n/a L3 n/a 21.05 73.16 72.84 S3 n/a 26.27 70.1 79.7 C3 n/a 181.51 96.59 300.82 F3 n/a 162.86 91.2 n/a L3 n/a 84.09 82.17 189.89 S3 n/a 123.6 93.02 176.2 Q1 774.93 103.74 103.57 226.21 Q2 fail fail 3348.51 329.69 Q3 772.55 126.31 107.25 235.31 Q4 988.28 182.52 111.89 294.8 Q5 771.69 101.05 100.37 226.21 Q6 fail 73.05 100.72 207.06 Q7 fail 160.94 113.03 277.08 Q8 fail 179.56 114.83 309.39 Q9 fail 204.62 114.25 326.29 Q10 780.05 106.26 110.18 232.72 Q11 783.2 112.23 105.13 231.36 Q12 fail 159.65 105.86 283.53 Q13 778.16 100.06 90.87 220.28 Q14 688.44 74.64 100.58 204.43 Performance analysis on large-scale RDF datasets Evaluation Watdiv-10MWatdiv-100MLUBM-1K
  • 34. 34 Towards A Scalable Semantic-based Distributed Approach for SPARQL query evaluation Gezim Sejdiu - University of Bonn Runtime (s) (mean) Queries SHARD SPARQLGX-SDE SANSA.Sparklify SANSA.Semantic (ours) C3 n/a 38.79 72.94 90.48 F3 n/a 38.41 74.69 n/a L3 n/a 21.05 73.16 72.84 S3 n/a 26.27 70.1 79.7 C3 n/a 181.51 96.59 300.82 F3 n/a 162.86 91.2 n/a L3 n/a 84.09 82.17 189.89 S3 n/a 123.6 93.02 176.2 Q1 774.93 103.74 103.57 226.21 Q2 fail fail 3348.51 329.69 Q3 772.55 126.31 107.25 235.31 Q4 988.28 182.52 111.89 294.8 Q5 771.69 101.05 100.37 226.21 Q6 fail 73.05 100.72 207.06 Q7 fail 160.94 113.03 277.08 Q8 fail 179.56 114.83 309.39 Q9 fail 204.62 114.25 326.29 Q10 780.05 106.26 110.18 232.72 Q11 783.2 112.23 105.13 231.36 Q12 fail 159.65 105.86 283.53 Q13 778.16 100.06 90.87 220.28 Q14 688.44 74.64 100.58 204.43 Performance analysis on large-scale RDF datasets Evaluation Watdiv-10MWatdiv-100MLUBM-1K We can see that our approach (SANSA.Semantic) evaluates most of the queries as opposed to SHARD. SHARD system fails to evaluate most of the LUBM queries and its parser does not support Watdiv queries.
  • 35. 35 Towards A Scalable Semantic-based Distributed Approach for SPARQL query evaluation Gezim Sejdiu - University of Bonn Runtime (s) (mean) Queries SHARD SPARQLGX-SDE SANSA.Sparklify SANSA.Semantic (ours) C3 n/a 38.79 72.94 90.48 F3 n/a 38.41 74.69 n/a L3 n/a 21.05 73.16 72.84 S3 n/a 26.27 70.1 79.7 C3 n/a 181.51 96.59 300.82 F3 n/a 162.86 91.2 n/a L3 n/a 84.09 82.17 189.89 S3 n/a 123.6 93.02 176.2 Q1 774.93 103.74 103.57 226.21 Q2 fail fail 3348.51 329.69 Q3 772.55 126.31 107.25 235.31 Q4 988.28 182.52 111.89 294.8 Q5 771.69 101.05 100.37 226.21 Q6 fail 73.05 100.72 207.06 Q7 fail 160.94 113.03 277.08 Q8 fail 179.56 114.83 309.39 Q9 fail 204.62 114.25 326.29 Q10 780.05 106.26 110.18 232.72 Q11 783.2 112.23 105.13 231.36 Q12 fail 159.65 105.86 283.53 Q13 778.16 100.06 90.87 220.28 Q14 688.44 74.64 100.58 204.43 Performance analysis on large-scale RDF datasets Evaluation Watdiv-10MWatdiv-100MLUBM-1K On the other hand, SPARQLGX-SDE performs better than both Sparklify and our approach, when the size of the dataset is considerably small (e.g. less than 25GB). This behavior is due to the large partitioning overhead for Sparklify and our approach.
  • 36. 36 Towards A Scalable Semantic-based Distributed Approach for SPARQL query evaluation Gezim Sejdiu - University of Bonn Runtime (s) (mean) Queries SHARD SPARQLGX-SDE SANSA.Sparklify SANSA.Semantic (ours) C3 n/a 38.79 72.94 90.48 F3 n/a 38.41 74.69 n/a L3 n/a 21.05 73.16 72.84 S3 n/a 26.27 70.1 79.7 C3 n/a 181.51 96.59 300.82 F3 n/a 162.86 91.2 n/a L3 n/a 84.09 82.17 189.89 S3 n/a 123.6 93.02 176.2 Q1 774.93 103.74 103.57 226.21 Q2 fail fail 3348.51 329.69 Q3 772.55 126.31 107.25 235.31 Q4 988.28 182.52 111.89 294.8 Q5 771.69 101.05 100.37 226.21 Q6 fail 73.05 100.72 207.06 Q7 fail 160.94 113.03 277.08 Q8 fail 179.56 114.83 309.39 Q9 fail 204.62 114.25 326.29 Q10 780.05 106.26 110.18 232.72 Q11 783.2 112.23 105.13 231.36 Q12 fail 159.65 105.86 283.53 Q13 778.16 100.06 90.87 220.28 Q14 688.44 74.64 100.58 204.43 Performance analysis on large-scale RDF datasets Evaluation Watdiv-10MWatdiv-100MLUBM-1K However, Sparklify performs better compared to SPARQLGX-SDE when the size of the dataset increases (see Watdiv-100M results) and the queries involve more joins (see LUBM-1K results). This is due to the Spark SQL optimizer and Sparqlify self-joins optimizers.
  • 37. 37 Towards A Scalable Semantic-based Distributed Approach for SPARQL query evaluation Gezim Sejdiu - University of Bonn Evaluation Sizeup analysis (on LUBM dataset)
  • 38. 38 Towards A Scalable Semantic-based Distributed Approach for SPARQL query evaluation Gezim Sejdiu - University of Bonn Evaluation Sizeup analysis (on LUBM dataset) Query execution time for our approach grows linearly when the size of the datasets increases. SHARD suffers from the expensive overhead of MapReduce joins which impact its performance.
  • 39. 39 Towards A Scalable Semantic-based Distributed Approach for SPARQL query evaluation Gezim Sejdiu - University of Bonn Evaluation Sizeup analysis (on LUBM dataset) Sparqlify and SPARQLGX-SDE performs better when the size of the datasets increases as compared with our approach and SHARD.
  • 40. 40 Towards A Scalable Semantic-based Distributed Approach for SPARQL query evaluation Gezim Sejdiu - University of Bonn Evaluation Node scalability (on LUBM-1K)
  • 41. 41 Towards A Scalable Semantic-based Distributed Approach for SPARQL query evaluation Gezim Sejdiu - University of Bonn Evaluation Node scalability (on LUBM-1K) We see that as the number of nodes increases, the runtime cost of our query engine decreases linearly as compared with the SHARD, which keeps staying constant.
  • 42. 42 Towards A Scalable Semantic-based Distributed Approach for SPARQL query evaluation Gezim Sejdiu - University of Bonn Evaluation Node scalability (on LUBM-1K) SPARQLGX-SDE and Sparklify perform better when the number of nodes increases compared to our approach and SHARD
  • 43. 43 Towards A Scalable Semantic-based Distributed Approach for SPARQL query evaluation Gezim Sejdiu - University of Bonn ❖ Processing large-scale RDF datasets: ➢ data-intensive and computing-intensive ➢ challenge to develop fast and efficient engine that can handle large scale RDF datasets ❖ A scalable semantic-based query engine (integrated into the larger SANSA framework) for efficient evaluation of SPARQL queries over distributed RDF datasets using the Spark framework ❖ As a future work we plan to expand our parser and add statistics to the query engine while evaluating queries Conclusion
  • 44. THANK YOU! SANSA Semantic Analytics Stack https://github.com/SANSA-Stack ● SANSA-RDF ● SANSA-OWL ● SANSA-Query ● SANSA-Inference ● SANSA-ML ● SANSA-Examples ● SANSA-Notebooks Gezim Sejdiu PhD Student Smart Data Analytics / University of Bonn http://sda.tech @Gezim_Sejdiu https://gezimsejdiu.github.io/ <http://sansa-stack.net/>
  • 45. 45 Towards A Scalable Semantic-based Distributed Approach for SPARQL query evaluation Gezim Sejdiu - University of Bonn [1]. J. Lehmann, G. Sejdiu, L. B¨uhmann, P. Westphal, C. Stadler, I. Ermilov, S. Bin, N. Chakraborty, M. Saleem, A.-C. N. Ngonga, and H. Jabeen. Distributed Semantic Analytics using the SANSA Stack. In Proceedings of 16th International Semantic Web Conference - Resources Track (ISWC’2017), 2017. [2]. K. Rohloff and R. E. Schantz. High-performance, massively scalable distributed systems using the mapreduce software framework: The shard triple-store. In Programming Support Innovations for Emerging Distributed Applications, PSI EtA ’10, pages 4:1–4:5, New York, NY, USA, 2010. ACM. [3]. D. Graux, L. Jachiet, P. Genev`es, and N. Layaida. Sparqlgx: Efficient distributed evaluation of sparql with apache spark. In P. Groth, E. Simperl, A. Gray, M. Sabou, M. Kr¨otzsch, F. Lecue, F. Fl¨ock, and Y. Gil, editors, The Semantic Web – ISWC 2016, pages 80–87, Cham, 2016. [4]. C. Stadler, G. Sejdiu, D. Graux, and J. Lehmann. Sparklify: A Scalable Software Component for Efficient evaluation of SPARQL queries over distributed RDF datasets. In Proceedings of 18th International Semantic Web Conference (ISWC’2019), 2019. References
  • 46. 46 Towards A Scalable Semantic-based Distributed Approach for SPARQL query evaluation Gezim Sejdiu - University of Bonn Backup Slides
  • 47. 47 Towards A Scalable Semantic-based Distributed Approach for SPARQL query evaluation Gezim Sejdiu - University of Bonn Evaluation Overall analysis of queries on LUBM-1K dataset (cluser mode)