SlideShare a Scribd company logo
Extending LargeRDFBench for Multi-Source
Data at Scale for SPARQL Endpoint Federation
Ali Hasnain, Muhammad Saleem, Axel-Cyrille Ngonga Ngomo,
Dietrich Rebholz-Schuhmann
Agenda
• SPARQL Endpoints Query Federation
• Federated Queries Benchmark
– Datasets
– Input queries
– Important query features
– Benchmark generation
• LargeRDFBench and extended LargeRDFBench
• Evaluation and results
• Conclusion
2
SPARQL Endpoints Query federation
3
Endpoint 1 Endpoint 2 Endpoint 3 Endpoint 4
RDF RDF RDF RDF
Parsing
Source Selection
Federator Optimzer
Integrator
Rewrite query and
get Individual Triple
Patterns
Identify capable
source against
Individual Triple
Patterns
Generate
optimized sub-
query Exe. Plan
Integrate sub-
queries results
Execute sub-
queries
Federation
Engine
Federated Benchmark Components
• Datasets
• Queries
• Performance metrics
• Execution rules
4
Important Datasets Features
RDF Datasets used in the federation benchmark should vary:
– Number of triples
– Number of classes
– Number of resources
– Number of properties
– Number of objects
– Average properties per class
– Average instances per class
– Average in-degree and out-degree
– Structuredness or coherence
5
Important Queries Features
SPARQL queries used in the federation benchmark should vary:
– Number of triple patterns
– Number of join vertices
– Mean join vertex degree,
– Number of sources span
– Query result set sizes
– Mean triple pattern selectivity
– BGP-restricted triple pattern selectivity
– Join-restricted triple pattern selectivity
– Join vertex types (`star', `path', `hybrid', `sink')
– SPARQL clauses used (e.g., LIMIT, UNION, OPTIONAL, FILTER etc.)
6
SPARQL Queries as Directed Hypergraphs
7
Important performance metrics
– Result set completeness and correctness
– Number of sources selected
– Number of SPARQL ASK requests used during source selection
– Source selection time
– Number of endpoint requests
– Number of intermediate results
– Overall query execution time
8
LargeRDFBench
• SPARQL query federation
benchmark
• 13 interconnect real datasets
– 4 Life sciences
– 6 Cross domain
– 3 Large data
• 32 queries of varying complexities
– 14 simple
– 10 complex
– 8 large data
• Multiple performance metrics
• Contains both SPARQL 1.0 and
SPARQL 1.1 versions of the
same queries
9
LargeRDFBench Datasets Statistics
10
Why Extended LargeRDFBench?
• 24 queries only span over 2
datasets, i.e. 2 SPARQL
SERVICES used in the SPARQL
1.1 version of the queries
• Federation engines that optimize
the ordering of SPARQL
SERVICES cannot be fully tested
• Two SERVICES means only Two
possible ordering
• Goal: add more queries which span
over more than two datasets
11
0
0.5
1
1.5
2
2.5
3
3.5
4
4.5
S1 S3 S5 S7 S9 S11 S13 C1 C3 C5 C7 C9 L1 L3 L5 L7#RelevantSources
Extended LargeRDFBench
• SPARQL query federation benchmark
• 13 interconnect real datasets
– 4 Life sciences
– 6 Cross domain
– 3 Large data
• 32 queries of varying complexities
– 14 simple
– 10 complex
– 8 large data
– 8 Complex+High sources (CH1-CH8)
• Multiple performance metrics
• Contains both SPARQL 1.0 and
SPARQL 1.1 versions of the same
queries
12
Complex+High Data Sources Queries
• CH1:
– 4 LargeRDFBench data sources involved
– Total triple patterns = 16, result size = 384
– High mean join vertex degree
– High number of incoming and outgoing edges of a
join node
– High number of triple patterns in a single SERVICE
– Runtime for this query ranges from 1 second
(costFed) to over 1 hour (SemaGrow)
13
Complex+High Data Sources Queries
• CH2:
– 4 LargeRDFBench data sources involved
– Total triple patterns = 10, result size = 840
– Low mean triple patterns selectivity (i.e., 0.00005) and high BGP-
restricted (i.e., 0.2595) and Join-restricted (i.e., 0.58115) triple
pattern selectivities
– Triple patterns of this query are selective, i.e., they can have
smaller result sizes
– Less selective when involved in joins with other triple patterns.
– Choosing the right join order which quick converges to smaller
result size is crucial
– The FILTER combined with REGEX made this query particularly
very selective
14
Complex+High Data Sources Queries
• CH3:
– 5 LargeRDFBench data sources involved
– Triple patterns = 11 and result size = 48
– High number of join vertices (i.e., 7 from 11 triple patterns),
moderate join vertex degree (i.e., 2.71), and low mean BGP-
restricted triple pattern selectivity (i.e., 0.0196)
– Mix values for the important query features that can challenge the
federation engines
15
Complex+High Data Sources Queries
• CH4:
– 6 LargeRDFBench data sources involved
– Total number of triple patterns = 12, result size = 1248
– High join vertices (i.e., 10 join vertices from 12 triple patterns)
– Join order optimisation is challenging due to more joins vs. less number
of triple patterns
16
Complex+High Data Sources Queries
• CH5:
– 7 LargeRDFBench data sources involved
– Triple patterns = 18, result size = 5 using LIMIT clause
– Number of bounds subjects and objects in the triple patterns
– 4 triple patterns with subject bound and 2 triple patterns with object
bound
– One triple pattern that contains unbound predicate
– Challenging to accurately estimate the triple patterns as well as the
joins cardinalities
17
• CH6:
– 8 LargeRDFBench data sources involved along with bound subjects
and objects
– Triple patterns = 24, result size = 16
– Low mean triple patterns selectivity (i.e., 0.00002) and high BGP-
restricted (i.e., 0.2522) and Join-restricted (i.e., 0.3186) triple pattern
selectivities
18
Complex+High Data Sources Queries
• CH7:
– 9 LargeRDFBench data sources involved
– Triple patterns = 21, result size = 775 using LIMIT clause
– There are a total of 14 join nodes in this query with 5 Star, 3 Path, 4
Sink, and 2 Hybrid join nodes
– more join nodes and join ordering could not be a trivial task
19
Complex+High Data Sources Queries
• CH8
– 10 LargeRDFBench data sources involved and contains OPTIONAL,
FILTER, and LIMIT
– Highest number of triple patterns = 33, result size = 1
– 19 join nodes in this query with 7 Star, 4 Path, 6 Sink, and 2 Hybrid join
nodes
– None of the federation engines is able to execute this query within the
limit of 1 hour
20
Complex+High Data Sources Queries
Evaluation Results
21
Evaluation Results
22
Hard to rank the federation engines due to many RE, ZR, TO
Unstability exposed
This work was supported by grants from the EU H2020 Framework Programme
provided for the project HOBBIT (GA no. 688227).
Thanks
24
Additional Slides
25
Key Findings
• Extended LargeRDF-Bench can be extremely costly or can be
executed extremely fast when proper optimised query plan is selected.
• However, the number of timeout and runtime errors suggesting that
choosing the optimised query plans for these queries is not a trivial
task.
• The results revealed FedX, CostFed, and ANAPSID can result in
incomplete or zero results.
26
Why extended LargeRDFBench
• In LargeRDFBench 24 out of total 32 queries which only require 2 data
sources to get the complete result set of the queries.
• Federation engines which optimise the ordering of the execution of
SPARQL SERVICES in federated SPARQL 1.1 queries (e.g. QuWeDa)
cannot be fully tested with existing LargeRDFBench queries.
• This is because if there are only two SERVICES used in the query,
there are only two possible orderings of the execution of these
SERVICES.
• The goal of this extension was to fill this gap by adding more federated
queries which require more data sources to get the complete result set
of the query.
28
Number of Triple patterns
29
0
5
10
15
20
25
30
35 S1
S2
S3
S4
S5
S6
S7
S8
S9
S10
S11
S12
S13
S14
C1
C2
C3
C4
C5
C6
C7
C8
C9
C10
L1
L2
L3
L4
L5
L6
L7
L8
CH1
CH2
CH3
CH4
CH5
CH6
CH7
CH8
#TriplePatterns
FedBench
FedBench-Mean
LargeRDFBench
LargeRDFBench-Mean
STD. ± 1.33 (FedBench) vs. ± 6.15 (LargeRDBFench)
Number of Join Vertices
30
STD. ± 1.33 (FedBench) vs. ± 3.63 (LargeRDBFench)
0
2
4
6
8
10
12
14
16
18
20
S1
S2
S3
S4
S5
S6
S7
S8
S9
S10
S11
S12
S13
S14
C1
C2
C3
C4
C5
C6
C7
C8
C9
C10
L1
L2
L3
L4
L5
L6
L7
L8
CH1
CH2
CH3
CH4
CH5
CH6
CH7
CH8
#JoinVertices
FedBench
FedBench-Mean
LargeRDFBench
LargeRDFBench-Mean
Mean Join Vertices degree
31
STD. ± 1.33 (FedBench) vs. ± 6.15 (LargeRDBFench)
0
1
2
3
4
5
6
7 S1
S2
S3
S4
S5
S6
S7
S8
S9
S10
S11
S12
S13
S14
C1
C2
C3
C4
C5
C6
C7
C8
C9
C10
L1
L2
L3
L4
L5
L6
L7
L8
CH1
CH2
CH3
CH4
CH5
CH6
CH7
CH8
MeanJoinVerticesDegree
FedBench FedBench-Mean
LargeRDFBench LargeRDFBench-Mean
Number of Results
32
STD. ± 2397 (FedBench) vs. ± 104236 (LargeRDBFench)
1.E+00
1.E+01
1.E+02
1.E+03
1.E+04
1.E+05
1.E+06
S1 S3 S5 S7 S9 S11 S13 C1 C3 C5 C7 C9 L1 L3 L5 L7 CH1 CH3 CH5 CH7
#Results(logscale)
FedBench FedBench-Mean
LargeRDFBench LargeRDFBench-Mean
Mean triple patterns selectivity
33
STD. ± 0.11 (FedBench) vs. ± 0.13 (LargeRDBFench)
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
S1
S2
S3
S4
S5
S6
S7
S8
S9
S10
S11
S12
S13
S14
C1
C2
C3
C4
C5
C6
C7
C8
C9
C10
L1
L2
L3
L4
L5
L6
L7
L8
CH1
CH2
CH3
CH4
CH5
CH6
CH7
CH8
MeanTriplePatternSelectivity
FedBench LargeRDFBench
Mean BGP-Rest. TP selectivity
34
STD. ± 0.31 (FedBench) vs. ± 0.22 (LargeRDBFench)
0
0.2
0.4
0.6
0.8
1
1.2
1.4
1.6
1.8
S1
S2
S3
S4
S5
S6
S7
S8
S9
S10
S11
S12
S13
S14
C1
C2
C3
C4
C5
C6
C7
C8
C9
C10
L1
L2
L3
L4
L5
L6
L7
L8
CH1
CH2
CH3
CH4
CH5
CH6
CH7
CH8
MeanBGP-restrictedTriplePatternSelectivity
FedBench LargeRDFBench
Mean Join-Rest. TP selectivity
35
STD. ± 0.13 (FedBench) vs. ± 0.15 (LargeRDBFench)
0.000001
0.00001
0.0001
0.001
0.01
0.1
1
S1
S2
S3
S4
S5
S6
S7
S8
S9
S10
S11
S12
S13
S14
C1
C2
C3
C4
C5
C6
C7
C8
C9
C10
L1
L2
L3
L4
L5
L6
L7
L8
CH1
CH2
CH3
CH4
CH5
CH6
CH7
CH8
MeanJoin-restrictedTriplePatternSelectivity(logscale)
FedBench LargeRDFBench

More Related Content

Similar to Extending LargeRDFBench for Multi-Source Data at Scale for SPARQL Endpoint Federation

Query processing-and-optimization
Query processing-and-optimizationQuery processing-and-optimization
Query processing-and-optimization
WBUTTUTORIALS
 
Strategies for Processing and Explaining Distributed Queries on Linked Data
Strategies for Processing and Explaining Distributed Queries on Linked DataStrategies for Processing and Explaining Distributed Queries on Linked Data
Strategies for Processing and Explaining Distributed Queries on Linked Data
Rakebul Hasan
 
Predicting query performance and explaining results to assist Linked Data con...
Predicting query performance and explaining results to assist Linked Data con...Predicting query performance and explaining results to assist Linked Data con...
Predicting query performance and explaining results to assist Linked Data con...
Rakebul Hasan
 
Preso-v0.1
Preso-v0.1Preso-v0.1
Preso-v0.1
Muhammad Arif
 
Deep Dive : Spark Data Frames, SQL and Catalyst Optimizer
Deep Dive : Spark Data Frames, SQL and Catalyst OptimizerDeep Dive : Spark Data Frames, SQL and Catalyst Optimizer
Deep Dive : Spark Data Frames, SQL and Catalyst Optimizer
Sachin Aggarwal
 
Can data virtualization uphold performance with complex queries?
Can data virtualization uphold performance with complex queries?Can data virtualization uphold performance with complex queries?
Can data virtualization uphold performance with complex queries?
Denodo
 
Effective and Efficient Entity Search in RDF data
Effective and Efficient Entity Search in RDF dataEffective and Efficient Entity Search in RDF data
Effective and Efficient Entity Search in RDF data
Roi Blanco
 
ParlBench: a SPARQL-benchmark for electronic publishing applications.
ParlBench: a SPARQL-benchmark for electronic publishing applications.ParlBench: a SPARQL-benchmark for electronic publishing applications.
ParlBench: a SPARQL-benchmark for electronic publishing applications.
Tatiana Tarasova
 
hbaseconasia2019 Phoenix Improvements and Practices on Cloud HBase at Alibaba
hbaseconasia2019 Phoenix Improvements and Practices on Cloud HBase at Alibabahbaseconasia2019 Phoenix Improvements and Practices on Cloud HBase at Alibaba
hbaseconasia2019 Phoenix Improvements and Practices on Cloud HBase at Alibaba
Michael Stack
 
Lessons for the optimizer from running the TPC-DS benchmark
Lessons for the optimizer from running the TPC-DS benchmarkLessons for the optimizer from running the TPC-DS benchmark
Lessons for the optimizer from running the TPC-DS benchmark
Sergey Petrunya
 
Guiding through a typical Machine Learning Pipeline
Guiding through a typical Machine Learning PipelineGuiding through a typical Machine Learning Pipeline
Guiding through a typical Machine Learning Pipeline
Michael Gerke
 
Webinar: Enterprise Data Management in the Era of MongoDB and Data Lakes
Webinar: Enterprise Data Management in the Era of MongoDB and Data LakesWebinar: Enterprise Data Management in the Era of MongoDB and Data Lakes
Webinar: Enterprise Data Management in the Era of MongoDB and Data Lakes
MongoDB
 
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
Databricks
 
It Takes Two to Tango: an Exploration of Domain Pairs for Cross-Domain Collab...
It Takes Two to Tango: an Exploration of Domain Pairs for Cross-Domain Collab...It Takes Two to Tango: an Exploration of Domain Pairs for Cross-Domain Collab...
It Takes Two to Tango: an Exploration of Domain Pairs for Cross-Domain Collab...
Shaghayegh (Sherry) Sahebi
 
Chromatography: Part 4 of 4 Pesticide Residue Analysis Webinar Series - Late...
Chromatography: Part 4 of 4 Pesticide Residue Analysis Webinar Series -  Late...Chromatography: Part 4 of 4 Pesticide Residue Analysis Webinar Series -  Late...
Chromatography: Part 4 of 4 Pesticide Residue Analysis Webinar Series - Late...
Chromatography & Mass Spectrometry Solutions
 
Spark Summit EU 2015: Combining the Strengths of MLlib, scikit-learn, and R
Spark Summit EU 2015: Combining the Strengths of MLlib, scikit-learn, and RSpark Summit EU 2015: Combining the Strengths of MLlib, scikit-learn, and R
Spark Summit EU 2015: Combining the Strengths of MLlib, scikit-learn, and R
Databricks
 
Lightning Talk: Get Even More Value from MongoDB Applications
Lightning Talk: Get Even More Value from MongoDB ApplicationsLightning Talk: Get Even More Value from MongoDB Applications
Lightning Talk: Get Even More Value from MongoDB Applications
MongoDB
 
Data Discovery and Metadata
Data Discovery and MetadataData Discovery and Metadata
Data Discovery and Metadata
markgrover
 
IEEE 2014 JAVA DATA MINING PROJECTS Scalable keyword search on large rdf data
IEEE 2014 JAVA DATA MINING PROJECTS Scalable keyword search on large rdf dataIEEE 2014 JAVA DATA MINING PROJECTS Scalable keyword search on large rdf data
IEEE 2014 JAVA DATA MINING PROJECTS Scalable keyword search on large rdf data
IEEEFINALYEARSTUDENTPROJECTS
 
New T-SQL Features in SQL Server 2012
New T-SQL Features in SQL Server 2012 New T-SQL Features in SQL Server 2012
New T-SQL Features in SQL Server 2012
Richie Rump
 

Similar to Extending LargeRDFBench for Multi-Source Data at Scale for SPARQL Endpoint Federation (20)

Query processing-and-optimization
Query processing-and-optimizationQuery processing-and-optimization
Query processing-and-optimization
 
Strategies for Processing and Explaining Distributed Queries on Linked Data
Strategies for Processing and Explaining Distributed Queries on Linked DataStrategies for Processing and Explaining Distributed Queries on Linked Data
Strategies for Processing and Explaining Distributed Queries on Linked Data
 
Predicting query performance and explaining results to assist Linked Data con...
Predicting query performance and explaining results to assist Linked Data con...Predicting query performance and explaining results to assist Linked Data con...
Predicting query performance and explaining results to assist Linked Data con...
 
Preso-v0.1
Preso-v0.1Preso-v0.1
Preso-v0.1
 
Deep Dive : Spark Data Frames, SQL and Catalyst Optimizer
Deep Dive : Spark Data Frames, SQL and Catalyst OptimizerDeep Dive : Spark Data Frames, SQL and Catalyst Optimizer
Deep Dive : Spark Data Frames, SQL and Catalyst Optimizer
 
Can data virtualization uphold performance with complex queries?
Can data virtualization uphold performance with complex queries?Can data virtualization uphold performance with complex queries?
Can data virtualization uphold performance with complex queries?
 
Effective and Efficient Entity Search in RDF data
Effective and Efficient Entity Search in RDF dataEffective and Efficient Entity Search in RDF data
Effective and Efficient Entity Search in RDF data
 
ParlBench: a SPARQL-benchmark for electronic publishing applications.
ParlBench: a SPARQL-benchmark for electronic publishing applications.ParlBench: a SPARQL-benchmark for electronic publishing applications.
ParlBench: a SPARQL-benchmark for electronic publishing applications.
 
hbaseconasia2019 Phoenix Improvements and Practices on Cloud HBase at Alibaba
hbaseconasia2019 Phoenix Improvements and Practices on Cloud HBase at Alibabahbaseconasia2019 Phoenix Improvements and Practices on Cloud HBase at Alibaba
hbaseconasia2019 Phoenix Improvements and Practices on Cloud HBase at Alibaba
 
Lessons for the optimizer from running the TPC-DS benchmark
Lessons for the optimizer from running the TPC-DS benchmarkLessons for the optimizer from running the TPC-DS benchmark
Lessons for the optimizer from running the TPC-DS benchmark
 
Guiding through a typical Machine Learning Pipeline
Guiding through a typical Machine Learning PipelineGuiding through a typical Machine Learning Pipeline
Guiding through a typical Machine Learning Pipeline
 
Webinar: Enterprise Data Management in the Era of MongoDB and Data Lakes
Webinar: Enterprise Data Management in the Era of MongoDB and Data LakesWebinar: Enterprise Data Management in the Era of MongoDB and Data Lakes
Webinar: Enterprise Data Management in the Era of MongoDB and Data Lakes
 
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
 
It Takes Two to Tango: an Exploration of Domain Pairs for Cross-Domain Collab...
It Takes Two to Tango: an Exploration of Domain Pairs for Cross-Domain Collab...It Takes Two to Tango: an Exploration of Domain Pairs for Cross-Domain Collab...
It Takes Two to Tango: an Exploration of Domain Pairs for Cross-Domain Collab...
 
Chromatography: Part 4 of 4 Pesticide Residue Analysis Webinar Series - Late...
Chromatography: Part 4 of 4 Pesticide Residue Analysis Webinar Series -  Late...Chromatography: Part 4 of 4 Pesticide Residue Analysis Webinar Series -  Late...
Chromatography: Part 4 of 4 Pesticide Residue Analysis Webinar Series - Late...
 
Spark Summit EU 2015: Combining the Strengths of MLlib, scikit-learn, and R
Spark Summit EU 2015: Combining the Strengths of MLlib, scikit-learn, and RSpark Summit EU 2015: Combining the Strengths of MLlib, scikit-learn, and R
Spark Summit EU 2015: Combining the Strengths of MLlib, scikit-learn, and R
 
Lightning Talk: Get Even More Value from MongoDB Applications
Lightning Talk: Get Even More Value from MongoDB ApplicationsLightning Talk: Get Even More Value from MongoDB Applications
Lightning Talk: Get Even More Value from MongoDB Applications
 
Data Discovery and Metadata
Data Discovery and MetadataData Discovery and Metadata
Data Discovery and Metadata
 
IEEE 2014 JAVA DATA MINING PROJECTS Scalable keyword search on large rdf data
IEEE 2014 JAVA DATA MINING PROJECTS Scalable keyword search on large rdf dataIEEE 2014 JAVA DATA MINING PROJECTS Scalable keyword search on large rdf data
IEEE 2014 JAVA DATA MINING PROJECTS Scalable keyword search on large rdf data
 
New T-SQL Features in SQL Server 2012
New T-SQL Features in SQL Server 2012 New T-SQL Features in SQL Server 2012
New T-SQL Features in SQL Server 2012
 

More from Holistic Benchmarking of Big Linked Data

EARL: Joint Entity and Relation Linking for Question Answering over Knowledge...
EARL: Joint Entity and Relation Linking for Question Answering over Knowledge...EARL: Joint Entity and Relation Linking for Question Answering over Knowledge...
EARL: Joint Entity and Relation Linking for Question Answering over Knowledge...
Holistic Benchmarking of Big Linked Data
 
Benchmarking Big Linked Data: The case of the HOBBIT Project
Benchmarking Big Linked Data: The case of the HOBBIT ProjectBenchmarking Big Linked Data: The case of the HOBBIT Project
Benchmarking Big Linked Data: The case of the HOBBIT Project
Holistic Benchmarking of Big Linked Data
 
Assessing Linked Data Versioning Systems: The Semantic Publishing Versioning ...
Assessing Linked Data Versioning Systems: The Semantic Publishing Versioning ...Assessing Linked Data Versioning Systems: The Semantic Publishing Versioning ...
Assessing Linked Data Versioning Systems: The Semantic Publishing Versioning ...
Holistic Benchmarking of Big Linked Data
 
The DEBS Grand Challenge 2018
The DEBS Grand Challenge 2018The DEBS Grand Challenge 2018
The DEBS Grand Challenge 2018
Holistic Benchmarking of Big Linked Data
 
Benchmarking of distributed linked data streaming systems
Benchmarking of distributed linked data streaming systemsBenchmarking of distributed linked data streaming systems
Benchmarking of distributed linked data streaming systems
Holistic Benchmarking of Big Linked Data
 
SQCFramework: SPARQL Query Containment Benchmarks Generation Framework
SQCFramework: SPARQL Query Containment Benchmarks Generation FrameworkSQCFramework: SPARQL Query Containment Benchmarks Generation Framework
SQCFramework: SPARQL Query Containment Benchmarks Generation Framework
Holistic Benchmarking of Big Linked Data
 
LargeRDFBench: A billion triples benchmark for SPARQL endpoint federation
LargeRDFBench: A billion triples benchmark for SPARQL endpoint federationLargeRDFBench: A billion triples benchmark for SPARQL endpoint federation
LargeRDFBench: A billion triples benchmark for SPARQL endpoint federation
Holistic Benchmarking of Big Linked Data
 
The DEBS Grand Challenge 2017
The DEBS Grand Challenge 2017The DEBS Grand Challenge 2017
The DEBS Grand Challenge 2017
Holistic Benchmarking of Big Linked Data
 
4th Natural Language Interface over the Web of Data (NLIWoD) workshop and QAL...
4th Natural Language Interface over the Web of Data (NLIWoD) workshop and QAL...4th Natural Language Interface over the Web of Data (NLIWoD) workshop and QAL...
4th Natural Language Interface over the Web of Data (NLIWoD) workshop and QAL...
Holistic Benchmarking of Big Linked Data
 
Scalable Link Discovery for Modern Data-Driven Applications (poster)
Scalable Link Discovery for Modern Data-Driven Applications (poster)Scalable Link Discovery for Modern Data-Driven Applications (poster)
Scalable Link Discovery for Modern Data-Driven Applications (poster)
Holistic Benchmarking of Big Linked Data
 
An Evaluation of Models for Runtime Approximation in Link Discovery
An Evaluation of Models for Runtime Approximation in Link DiscoveryAn Evaluation of Models for Runtime Approximation in Link Discovery
An Evaluation of Models for Runtime Approximation in Link Discovery
Holistic Benchmarking of Big Linked Data
 
Scalable Link Discovery for Modern Data-Driven Applications
Scalable Link Discovery for Modern Data-Driven ApplicationsScalable Link Discovery for Modern Data-Driven Applications
Scalable Link Discovery for Modern Data-Driven Applications
Holistic Benchmarking of Big Linked Data
 
SPgen: A Benchmark Generator for Spatial Link Discovery Tools
SPgen: A Benchmark Generator for Spatial Link Discovery ToolsSPgen: A Benchmark Generator for Spatial Link Discovery Tools
SPgen: A Benchmark Generator for Spatial Link Discovery Tools
Holistic Benchmarking of Big Linked Data
 
Introducing the HOBBIT platform into the Ontology Alignment Evaluation Campaign
Introducing the HOBBIT platform into the Ontology Alignment Evaluation CampaignIntroducing the HOBBIT platform into the Ontology Alignment Evaluation Campaign
Introducing the HOBBIT platform into the Ontology Alignment Evaluation Campaign
Holistic Benchmarking of Big Linked Data
 
OKE2018 Challenge @ ESWC2018
OKE2018 Challenge @ ESWC2018OKE2018 Challenge @ ESWC2018
OKE2018 Challenge @ ESWC2018
Holistic Benchmarking of Big Linked Data
 
MOCHA 2018 Challenge @ ESWC2018
MOCHA 2018 Challenge @ ESWC2018MOCHA 2018 Challenge @ ESWC2018
MOCHA 2018 Challenge @ ESWC2018
Holistic Benchmarking of Big Linked Data
 
Dynamic planning for link discovery - ESWC 2018
Dynamic planning for link discovery - ESWC 2018Dynamic planning for link discovery - ESWC 2018
Dynamic planning for link discovery - ESWC 2018
Holistic Benchmarking of Big Linked Data
 
Hobbit project overview presented at EBDVF 2017
Hobbit project overview presented at EBDVF 2017Hobbit project overview presented at EBDVF 2017
Hobbit project overview presented at EBDVF 2017
Holistic Benchmarking of Big Linked Data
 
Leopard ISWC Semantic Web Challenge 2017 (poster)
Leopard ISWC Semantic Web Challenge 2017 (poster)Leopard ISWC Semantic Web Challenge 2017 (poster)
Leopard ISWC Semantic Web Challenge 2017 (poster)
Holistic Benchmarking of Big Linked Data
 
Leopard ISWC Semantic Web Challenge 2017
Leopard ISWC Semantic Web Challenge 2017Leopard ISWC Semantic Web Challenge 2017
Leopard ISWC Semantic Web Challenge 2017
Holistic Benchmarking of Big Linked Data
 

More from Holistic Benchmarking of Big Linked Data (20)

EARL: Joint Entity and Relation Linking for Question Answering over Knowledge...
EARL: Joint Entity and Relation Linking for Question Answering over Knowledge...EARL: Joint Entity and Relation Linking for Question Answering over Knowledge...
EARL: Joint Entity and Relation Linking for Question Answering over Knowledge...
 
Benchmarking Big Linked Data: The case of the HOBBIT Project
Benchmarking Big Linked Data: The case of the HOBBIT ProjectBenchmarking Big Linked Data: The case of the HOBBIT Project
Benchmarking Big Linked Data: The case of the HOBBIT Project
 
Assessing Linked Data Versioning Systems: The Semantic Publishing Versioning ...
Assessing Linked Data Versioning Systems: The Semantic Publishing Versioning ...Assessing Linked Data Versioning Systems: The Semantic Publishing Versioning ...
Assessing Linked Data Versioning Systems: The Semantic Publishing Versioning ...
 
The DEBS Grand Challenge 2018
The DEBS Grand Challenge 2018The DEBS Grand Challenge 2018
The DEBS Grand Challenge 2018
 
Benchmarking of distributed linked data streaming systems
Benchmarking of distributed linked data streaming systemsBenchmarking of distributed linked data streaming systems
Benchmarking of distributed linked data streaming systems
 
SQCFramework: SPARQL Query Containment Benchmarks Generation Framework
SQCFramework: SPARQL Query Containment Benchmarks Generation FrameworkSQCFramework: SPARQL Query Containment Benchmarks Generation Framework
SQCFramework: SPARQL Query Containment Benchmarks Generation Framework
 
LargeRDFBench: A billion triples benchmark for SPARQL endpoint federation
LargeRDFBench: A billion triples benchmark for SPARQL endpoint federationLargeRDFBench: A billion triples benchmark for SPARQL endpoint federation
LargeRDFBench: A billion triples benchmark for SPARQL endpoint federation
 
The DEBS Grand Challenge 2017
The DEBS Grand Challenge 2017The DEBS Grand Challenge 2017
The DEBS Grand Challenge 2017
 
4th Natural Language Interface over the Web of Data (NLIWoD) workshop and QAL...
4th Natural Language Interface over the Web of Data (NLIWoD) workshop and QAL...4th Natural Language Interface over the Web of Data (NLIWoD) workshop and QAL...
4th Natural Language Interface over the Web of Data (NLIWoD) workshop and QAL...
 
Scalable Link Discovery for Modern Data-Driven Applications (poster)
Scalable Link Discovery for Modern Data-Driven Applications (poster)Scalable Link Discovery for Modern Data-Driven Applications (poster)
Scalable Link Discovery for Modern Data-Driven Applications (poster)
 
An Evaluation of Models for Runtime Approximation in Link Discovery
An Evaluation of Models for Runtime Approximation in Link DiscoveryAn Evaluation of Models for Runtime Approximation in Link Discovery
An Evaluation of Models for Runtime Approximation in Link Discovery
 
Scalable Link Discovery for Modern Data-Driven Applications
Scalable Link Discovery for Modern Data-Driven ApplicationsScalable Link Discovery for Modern Data-Driven Applications
Scalable Link Discovery for Modern Data-Driven Applications
 
SPgen: A Benchmark Generator for Spatial Link Discovery Tools
SPgen: A Benchmark Generator for Spatial Link Discovery ToolsSPgen: A Benchmark Generator for Spatial Link Discovery Tools
SPgen: A Benchmark Generator for Spatial Link Discovery Tools
 
Introducing the HOBBIT platform into the Ontology Alignment Evaluation Campaign
Introducing the HOBBIT platform into the Ontology Alignment Evaluation CampaignIntroducing the HOBBIT platform into the Ontology Alignment Evaluation Campaign
Introducing the HOBBIT platform into the Ontology Alignment Evaluation Campaign
 
OKE2018 Challenge @ ESWC2018
OKE2018 Challenge @ ESWC2018OKE2018 Challenge @ ESWC2018
OKE2018 Challenge @ ESWC2018
 
MOCHA 2018 Challenge @ ESWC2018
MOCHA 2018 Challenge @ ESWC2018MOCHA 2018 Challenge @ ESWC2018
MOCHA 2018 Challenge @ ESWC2018
 
Dynamic planning for link discovery - ESWC 2018
Dynamic planning for link discovery - ESWC 2018Dynamic planning for link discovery - ESWC 2018
Dynamic planning for link discovery - ESWC 2018
 
Hobbit project overview presented at EBDVF 2017
Hobbit project overview presented at EBDVF 2017Hobbit project overview presented at EBDVF 2017
Hobbit project overview presented at EBDVF 2017
 
Leopard ISWC Semantic Web Challenge 2017 (poster)
Leopard ISWC Semantic Web Challenge 2017 (poster)Leopard ISWC Semantic Web Challenge 2017 (poster)
Leopard ISWC Semantic Web Challenge 2017 (poster)
 
Leopard ISWC Semantic Web Challenge 2017
Leopard ISWC Semantic Web Challenge 2017Leopard ISWC Semantic Web Challenge 2017
Leopard ISWC Semantic Web Challenge 2017
 

Recently uploaded

20240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 202420240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 2024
Matthew Sinclair
 
Best 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERPBest 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERP
Pixlogix Infotech
 
Building Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and MilvusBuilding Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and Milvus
Zilliz
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
Safe Software
 
Removing Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software FuzzingRemoving Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software Fuzzing
Aftab Hussain
 
OpenID AuthZEN Interop Read Out - Authorization
OpenID AuthZEN Interop Read Out - AuthorizationOpenID AuthZEN Interop Read Out - Authorization
OpenID AuthZEN Interop Read Out - Authorization
David Brossard
 
HCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAUHCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAU
panagenda
 
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc
 
Fueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte WebinarFueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte Webinar
Zilliz
 
Choosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptxChoosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptx
Brandon Minnick, MBA
 
UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6
DianaGray10
 
Presentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of GermanyPresentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of Germany
innovationoecd
 
National Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practicesNational Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practices
Quotidiano Piemontese
 
CAKE: Sharing Slices of Confidential Data on Blockchain
CAKE: Sharing Slices of Confidential Data on BlockchainCAKE: Sharing Slices of Confidential Data on Blockchain
CAKE: Sharing Slices of Confidential Data on Blockchain
Claudio Di Ciccio
 
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfUnlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Malak Abu Hammad
 
Climate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing DaysClimate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing Days
Kari Kakkonen
 
Things to Consider When Choosing a Website Developer for your Website | FODUU
Things to Consider When Choosing a Website Developer for your Website | FODUUThings to Consider When Choosing a Website Developer for your Website | FODUU
Things to Consider When Choosing a Website Developer for your Website | FODUU
FODUU
 
Ocean lotus Threat actors project by John Sitima 2024 (1).pptx
Ocean lotus Threat actors project by John Sitima 2024 (1).pptxOcean lotus Threat actors project by John Sitima 2024 (1).pptx
Ocean lotus Threat actors project by John Sitima 2024 (1).pptx
SitimaJohn
 
How to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptxHow to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptx
danishmna97
 
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Speck&Tech
 

Recently uploaded (20)

20240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 202420240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 2024
 
Best 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERPBest 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERP
 
Building Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and MilvusBuilding Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and Milvus
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
 
Removing Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software FuzzingRemoving Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software Fuzzing
 
OpenID AuthZEN Interop Read Out - Authorization
OpenID AuthZEN Interop Read Out - AuthorizationOpenID AuthZEN Interop Read Out - Authorization
OpenID AuthZEN Interop Read Out - Authorization
 
HCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAUHCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAU
 
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy Survey
 
Fueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte WebinarFueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte Webinar
 
Choosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptxChoosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptx
 
UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6
 
Presentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of GermanyPresentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of Germany
 
National Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practicesNational Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practices
 
CAKE: Sharing Slices of Confidential Data on Blockchain
CAKE: Sharing Slices of Confidential Data on BlockchainCAKE: Sharing Slices of Confidential Data on Blockchain
CAKE: Sharing Slices of Confidential Data on Blockchain
 
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfUnlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
 
Climate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing DaysClimate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing Days
 
Things to Consider When Choosing a Website Developer for your Website | FODUU
Things to Consider When Choosing a Website Developer for your Website | FODUUThings to Consider When Choosing a Website Developer for your Website | FODUU
Things to Consider When Choosing a Website Developer for your Website | FODUU
 
Ocean lotus Threat actors project by John Sitima 2024 (1).pptx
Ocean lotus Threat actors project by John Sitima 2024 (1).pptxOcean lotus Threat actors project by John Sitima 2024 (1).pptx
Ocean lotus Threat actors project by John Sitima 2024 (1).pptx
 
How to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptxHow to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptx
 
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
 

Extending LargeRDFBench for Multi-Source Data at Scale for SPARQL Endpoint Federation

  • 1. Extending LargeRDFBench for Multi-Source Data at Scale for SPARQL Endpoint Federation Ali Hasnain, Muhammad Saleem, Axel-Cyrille Ngonga Ngomo, Dietrich Rebholz-Schuhmann
  • 2. Agenda • SPARQL Endpoints Query Federation • Federated Queries Benchmark – Datasets – Input queries – Important query features – Benchmark generation • LargeRDFBench and extended LargeRDFBench • Evaluation and results • Conclusion 2
  • 3. SPARQL Endpoints Query federation 3 Endpoint 1 Endpoint 2 Endpoint 3 Endpoint 4 RDF RDF RDF RDF Parsing Source Selection Federator Optimzer Integrator Rewrite query and get Individual Triple Patterns Identify capable source against Individual Triple Patterns Generate optimized sub- query Exe. Plan Integrate sub- queries results Execute sub- queries Federation Engine
  • 4. Federated Benchmark Components • Datasets • Queries • Performance metrics • Execution rules 4
  • 5. Important Datasets Features RDF Datasets used in the federation benchmark should vary: – Number of triples – Number of classes – Number of resources – Number of properties – Number of objects – Average properties per class – Average instances per class – Average in-degree and out-degree – Structuredness or coherence 5
  • 6. Important Queries Features SPARQL queries used in the federation benchmark should vary: – Number of triple patterns – Number of join vertices – Mean join vertex degree, – Number of sources span – Query result set sizes – Mean triple pattern selectivity – BGP-restricted triple pattern selectivity – Join-restricted triple pattern selectivity – Join vertex types (`star', `path', `hybrid', `sink') – SPARQL clauses used (e.g., LIMIT, UNION, OPTIONAL, FILTER etc.) 6
  • 7. SPARQL Queries as Directed Hypergraphs 7
  • 8. Important performance metrics – Result set completeness and correctness – Number of sources selected – Number of SPARQL ASK requests used during source selection – Source selection time – Number of endpoint requests – Number of intermediate results – Overall query execution time 8
  • 9. LargeRDFBench • SPARQL query federation benchmark • 13 interconnect real datasets – 4 Life sciences – 6 Cross domain – 3 Large data • 32 queries of varying complexities – 14 simple – 10 complex – 8 large data • Multiple performance metrics • Contains both SPARQL 1.0 and SPARQL 1.1 versions of the same queries 9
  • 11. Why Extended LargeRDFBench? • 24 queries only span over 2 datasets, i.e. 2 SPARQL SERVICES used in the SPARQL 1.1 version of the queries • Federation engines that optimize the ordering of SPARQL SERVICES cannot be fully tested • Two SERVICES means only Two possible ordering • Goal: add more queries which span over more than two datasets 11 0 0.5 1 1.5 2 2.5 3 3.5 4 4.5 S1 S3 S5 S7 S9 S11 S13 C1 C3 C5 C7 C9 L1 L3 L5 L7#RelevantSources
  • 12. Extended LargeRDFBench • SPARQL query federation benchmark • 13 interconnect real datasets – 4 Life sciences – 6 Cross domain – 3 Large data • 32 queries of varying complexities – 14 simple – 10 complex – 8 large data – 8 Complex+High sources (CH1-CH8) • Multiple performance metrics • Contains both SPARQL 1.0 and SPARQL 1.1 versions of the same queries 12
  • 13. Complex+High Data Sources Queries • CH1: – 4 LargeRDFBench data sources involved – Total triple patterns = 16, result size = 384 – High mean join vertex degree – High number of incoming and outgoing edges of a join node – High number of triple patterns in a single SERVICE – Runtime for this query ranges from 1 second (costFed) to over 1 hour (SemaGrow) 13
  • 14. Complex+High Data Sources Queries • CH2: – 4 LargeRDFBench data sources involved – Total triple patterns = 10, result size = 840 – Low mean triple patterns selectivity (i.e., 0.00005) and high BGP- restricted (i.e., 0.2595) and Join-restricted (i.e., 0.58115) triple pattern selectivities – Triple patterns of this query are selective, i.e., they can have smaller result sizes – Less selective when involved in joins with other triple patterns. – Choosing the right join order which quick converges to smaller result size is crucial – The FILTER combined with REGEX made this query particularly very selective 14
  • 15. Complex+High Data Sources Queries • CH3: – 5 LargeRDFBench data sources involved – Triple patterns = 11 and result size = 48 – High number of join vertices (i.e., 7 from 11 triple patterns), moderate join vertex degree (i.e., 2.71), and low mean BGP- restricted triple pattern selectivity (i.e., 0.0196) – Mix values for the important query features that can challenge the federation engines 15
  • 16. Complex+High Data Sources Queries • CH4: – 6 LargeRDFBench data sources involved – Total number of triple patterns = 12, result size = 1248 – High join vertices (i.e., 10 join vertices from 12 triple patterns) – Join order optimisation is challenging due to more joins vs. less number of triple patterns 16
  • 17. Complex+High Data Sources Queries • CH5: – 7 LargeRDFBench data sources involved – Triple patterns = 18, result size = 5 using LIMIT clause – Number of bounds subjects and objects in the triple patterns – 4 triple patterns with subject bound and 2 triple patterns with object bound – One triple pattern that contains unbound predicate – Challenging to accurately estimate the triple patterns as well as the joins cardinalities 17
  • 18. • CH6: – 8 LargeRDFBench data sources involved along with bound subjects and objects – Triple patterns = 24, result size = 16 – Low mean triple patterns selectivity (i.e., 0.00002) and high BGP- restricted (i.e., 0.2522) and Join-restricted (i.e., 0.3186) triple pattern selectivities 18 Complex+High Data Sources Queries
  • 19. • CH7: – 9 LargeRDFBench data sources involved – Triple patterns = 21, result size = 775 using LIMIT clause – There are a total of 14 join nodes in this query with 5 Star, 3 Path, 4 Sink, and 2 Hybrid join nodes – more join nodes and join ordering could not be a trivial task 19 Complex+High Data Sources Queries
  • 20. • CH8 – 10 LargeRDFBench data sources involved and contains OPTIONAL, FILTER, and LIMIT – Highest number of triple patterns = 33, result size = 1 – 19 join nodes in this query with 7 Star, 4 Path, 6 Sink, and 2 Hybrid join nodes – None of the federation engines is able to execute this query within the limit of 1 hour 20 Complex+High Data Sources Queries
  • 22. Evaluation Results 22 Hard to rank the federation engines due to many RE, ZR, TO Unstability exposed
  • 23. This work was supported by grants from the EU H2020 Framework Programme provided for the project HOBBIT (GA no. 688227).
  • 26. Key Findings • Extended LargeRDF-Bench can be extremely costly or can be executed extremely fast when proper optimised query plan is selected. • However, the number of timeout and runtime errors suggesting that choosing the optimised query plans for these queries is not a trivial task. • The results revealed FedX, CostFed, and ANAPSID can result in incomplete or zero results. 26
  • 27. Why extended LargeRDFBench • In LargeRDFBench 24 out of total 32 queries which only require 2 data sources to get the complete result set of the queries. • Federation engines which optimise the ordering of the execution of SPARQL SERVICES in federated SPARQL 1.1 queries (e.g. QuWeDa) cannot be fully tested with existing LargeRDFBench queries. • This is because if there are only two SERVICES used in the query, there are only two possible orderings of the execution of these SERVICES. • The goal of this extension was to fill this gap by adding more federated queries which require more data sources to get the complete result set of the query. 28
  • 28. Number of Triple patterns 29 0 5 10 15 20 25 30 35 S1 S2 S3 S4 S5 S6 S7 S8 S9 S10 S11 S12 S13 S14 C1 C2 C3 C4 C5 C6 C7 C8 C9 C10 L1 L2 L3 L4 L5 L6 L7 L8 CH1 CH2 CH3 CH4 CH5 CH6 CH7 CH8 #TriplePatterns FedBench FedBench-Mean LargeRDFBench LargeRDFBench-Mean STD. ± 1.33 (FedBench) vs. ± 6.15 (LargeRDBFench)
  • 29. Number of Join Vertices 30 STD. ± 1.33 (FedBench) vs. ± 3.63 (LargeRDBFench) 0 2 4 6 8 10 12 14 16 18 20 S1 S2 S3 S4 S5 S6 S7 S8 S9 S10 S11 S12 S13 S14 C1 C2 C3 C4 C5 C6 C7 C8 C9 C10 L1 L2 L3 L4 L5 L6 L7 L8 CH1 CH2 CH3 CH4 CH5 CH6 CH7 CH8 #JoinVertices FedBench FedBench-Mean LargeRDFBench LargeRDFBench-Mean
  • 30. Mean Join Vertices degree 31 STD. ± 1.33 (FedBench) vs. ± 6.15 (LargeRDBFench) 0 1 2 3 4 5 6 7 S1 S2 S3 S4 S5 S6 S7 S8 S9 S10 S11 S12 S13 S14 C1 C2 C3 C4 C5 C6 C7 C8 C9 C10 L1 L2 L3 L4 L5 L6 L7 L8 CH1 CH2 CH3 CH4 CH5 CH6 CH7 CH8 MeanJoinVerticesDegree FedBench FedBench-Mean LargeRDFBench LargeRDFBench-Mean
  • 31. Number of Results 32 STD. ± 2397 (FedBench) vs. ± 104236 (LargeRDBFench) 1.E+00 1.E+01 1.E+02 1.E+03 1.E+04 1.E+05 1.E+06 S1 S3 S5 S7 S9 S11 S13 C1 C3 C5 C7 C9 L1 L3 L5 L7 CH1 CH3 CH5 CH7 #Results(logscale) FedBench FedBench-Mean LargeRDFBench LargeRDFBench-Mean
  • 32. Mean triple patterns selectivity 33 STD. ± 0.11 (FedBench) vs. ± 0.13 (LargeRDBFench) 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 S1 S2 S3 S4 S5 S6 S7 S8 S9 S10 S11 S12 S13 S14 C1 C2 C3 C4 C5 C6 C7 C8 C9 C10 L1 L2 L3 L4 L5 L6 L7 L8 CH1 CH2 CH3 CH4 CH5 CH6 CH7 CH8 MeanTriplePatternSelectivity FedBench LargeRDFBench
  • 33. Mean BGP-Rest. TP selectivity 34 STD. ± 0.31 (FedBench) vs. ± 0.22 (LargeRDBFench) 0 0.2 0.4 0.6 0.8 1 1.2 1.4 1.6 1.8 S1 S2 S3 S4 S5 S6 S7 S8 S9 S10 S11 S12 S13 S14 C1 C2 C3 C4 C5 C6 C7 C8 C9 C10 L1 L2 L3 L4 L5 L6 L7 L8 CH1 CH2 CH3 CH4 CH5 CH6 CH7 CH8 MeanBGP-restrictedTriplePatternSelectivity FedBench LargeRDFBench
  • 34. Mean Join-Rest. TP selectivity 35 STD. ± 0.13 (FedBench) vs. ± 0.15 (LargeRDBFench) 0.000001 0.00001 0.0001 0.001 0.01 0.1 1 S1 S2 S3 S4 S5 S6 S7 S8 S9 S10 S11 S12 S13 S14 C1 C2 C3 C4 C5 C6 C7 C8 C9 C10 L1 L2 L3 L4 L5 L6 L7 L8 CH1 CH2 CH3 CH4 CH5 CH6 CH7 CH8 MeanJoin-restrictedTriplePatternSelectivity(logscale) FedBench LargeRDFBench