An Empirical Evaluation of RDF Graph Partitioning Techniques
Adnan Akhter, Axel-Cyrille Ngonga Ngomo and Muhammad Saleem
EKAW, Nancy, France
November 14th, 2018
1
Motivation: Handling Big Datasets
* Image Reference https://lod-cloud.net/clouds/lod-cloud.svg
- Linked Data has grown significantly:
  - UniProt (over 10 billion triples)
  - Linked TCGA (over 20 billion triples)
- Issues with bigger datasets:
  - Performance
  - Availability
  - Security
  - Scalability
  - Maintenance
- Partitioning is one possible solution
2
Motivation: Partitioning Techniques Used in RDF Clustered Triple Stores
System | Partitioning technique
--- | ---
AdPart | Subject hash + workload adaptive
AdPart-NA | Subject hash
CliqueSquare | Hybrid (hash + VP)
DREAM | No partitioning; full replication
EAGRE | METIS
gStoreD | Partitioning agnostic
H-RDF-3X | METIS
H2RDF+ | HBase partitioner (range)
HadoopRDF | VP + predicate files on HDFS
PigSparql | Hash + triple-based files
S2RDF | Extended vertical partitioning
Sedge | Subject hash
Sempala | VP
SHAPE | Semantic hash partitioning
SHARD | Hash
TriAD | Hash-based sharding
TriAD-SG | METIS + horizontal sharding
WARP | METIS on query workload
(VP = vertical partitioning)
* Table Reference https://bit.ly/2JUqH5H
3
Which partitioning technique leads to better performance?
Partitioning Techniques Used
- Horizontal Partitioning
- Subject-based Partitioning
- Predicate-based Partitioning
- Hierarchical Partitioning
- Minimal Edge-Cut (Min-Edgecut) Partitioning
- Recursive-Bisection Partitioning
- Total Communication Volume Minimization (TCV-Min) Partitioning
4
Image Reference: https://bit.ly/2D1W0KA
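The slides list the techniques without defining them. As a minimal sketch, assuming the usual definitions (horizontal partitioning splits the triple list into k contiguous, near-equal chunks; subject- and predicate-based partitioning assign each triple by hashing its subject or predicate modulo k), the three simplest techniques could look like this. The function names, the `example` triples, and the use of `zlib.crc32` as a stable hash are illustrative assumptions; hierarchical and the METIS-style techniques are not covered here.

```python
import zlib
from typing import Dict, List, Tuple

# A triple is (subject, predicate, object); terms are plain strings here.
Triple = Tuple[str, str, str]
Partitions = Dict[int, List[Triple]]


def stable_hash(term: str) -> int:
    # Python's built-in hash() is randomized per process for str, so CRC32 is
    # used to get the same partition assignment on every run.
    return zlib.crc32(term.encode("utf-8"))


def horizontal(triples: List[Triple], k: int) -> Partitions:
    # Partition i receives the contiguous slice [i*n//k, (i+1)*n//k),
    # so partition sizes differ by at most one triple.
    n = len(triples)
    return {i: triples[i * n // k:(i + 1) * n // k] for i in range(k)}


def hash_on(triples: List[Triple], k: int, position: int) -> Partitions:
    # position 0 hashes the subject, position 1 hashes the predicate.
    parts: Partitions = {i: [] for i in range(k)}
    for triple in triples:
        parts[stable_hash(triple[position]) % k].append(triple)
    return parts


def subject_based(triples: List[Triple], k: int) -> Partitions:
    return hash_on(triples, k, 0)


def predicate_based(triples: List[Triple], k: int) -> Partitions:
    return hash_on(triples, k, 1)


# Hypothetical toy data, not taken from the slides.
example: List[Triple] = [
    (":alice", "foaf:knows", ":bob"),
    (":alice", "foaf:name", '"Alice"'),
    (":bob", "foaf:name", '"Bob"'),
    (":bob", "foaf:knows", ":carol"),
    (":carol", "foaf:name", '"Carol"'),
]

for name, fn in [("horizontal", horizontal), ("subject", subject_based), ("predicate", predicate_based)]:
    print(name, {i: len(p) for i, p in fn(example, 3).items()})
```

By construction, subject-based partitioning keeps all triples of one subject in the same partition, so joins on a shared subject can be evaluated locally, whereas horizontal partitioning ignores the graph structure and only balances partition sizes.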
Example RDF Triples with Corresponding Techniques
5
* Three partitions were generated with each technique
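Min-Edgecut and TCV-Min are graph-partitioning objectives (in the METIS sense) rather than simple hash rules. The sketch below only evaluates the two objectives for a given assignment of vertices to three partitions; it does not search for an optimal partition. It assumes the RDF graph is treated as undirected, with subjects and objects as vertices and each triple as an edge; `edge_cut`, `total_communication_volume`, and the toy graph are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# An edge is a (subject, object) pair taken from one triple.
Edge = Tuple[str, str]


def edge_cut(edges: Iterable[Edge], part: Dict[str, int]) -> int:
    # Count edges whose two endpoints were assigned to different partitions.
    return sum(1 for u, v in edges if part[u] != part[v])


def total_communication_volume(edges: Iterable[Edge], part: Dict[str, int]) -> int:
    # For every vertex, count the distinct *other* partitions that hold at
    # least one of its neighbours, then sum over all vertices.
    neighbours: Dict[str, Set[str]] = defaultdict(set)
    for u, v in edges:
        neighbours[u].add(v)
        neighbours[v].add(u)
    volume = 0
    for vertex, nbrs in neighbours.items():
        foreign = {part[n] for n in nbrs} - {part[vertex]}
        volume += len(foreign)
    return volume


# Hypothetical toy graph split into three partitions.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"), ("d", "e"), ("e", "f")]
part = {"a": 0, "b": 0, "c": 1, "d": 1, "e": 2, "f": 2}

print(edge_cut(edges, part))                    # 3
print(total_communication_volume(edges, part))  # 5
```

The two numbers differ because edge cut counts each crossing edge once, while communication volume counts how many foreign partitions each vertex must be shipped to, so a vertex with many neighbours in the same foreign partition adds only 1.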
Evaluation Setup
6
7
Partitioning Environments Used
- Cluster-based
  - Koral
- Physically distributed (federated)
  - FedX (index-free, heuristic-based)
  - SemaGrow (index-assisted, cost-based)
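An index-free engine such as FedX has to discover at query time which sources can contribute to each triple pattern, typically by sending a SPARQL ASK query per pattern to every source. The sketch below imitates that idea with in-memory partitions standing in for endpoints; it is not FedX's API, and `select_sources`, `total_sources_selected`, the '?'-prefix convention for variables, and the toy data are all assumptions. It counts sources per triple pattern, which is one common way to report "sources selected".

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]


def matches(pattern: Triple, triple: Triple) -> bool:
    # A term starting with '?' is a variable and matches anything; constants
    # must be equal. Simplification: repeated variables are not checked.
    return all(p.startswith("?") or p == t for p, t in zip(pattern, triple))


def ask(source: Set[Triple], pattern: Triple) -> bool:
    # Stand-in for an ASK query: does the source hold at least one match?
    return any(matches(pattern, t) for t in source)


def select_sources(query: List[Triple], sources: Dict[str, Set[Triple]]) -> Dict[Triple, List[str]]:
    return {tp: sorted(name for name, data in sources.items() if ask(data, tp)) for tp in query}


def total_sources_selected(selection: Dict[Triple, List[str]]) -> int:
    # Triple-pattern-wise count: a source relevant to two patterns counts twice.
    return sum(len(srcs) for srcs in selection.values())


# Hypothetical partitions and query, not taken from the study.
sources = {
    "p1": {(":alice", "foaf:name", '"Alice"'), (":alice", "foaf:knows", ":bob")},
    "p2": {(":bob", "foaf:name", '"Bob"')},
    "p3": {(":bob", "foaf:knows", ":carol")},
}
query = [("?x", "foaf:knows", "?y"), ("?y", "foaf:name", "?n")]

selection = select_sources(query, sources)
print(selection)
print(total_sources_selected(selection))  # 4
```

Under predicate-based partitioning, every triple with a given predicate lands in the same partition, so a pattern with a constant predicate can match at most one source; other techniques may spread such triples across several partitions and raise this count.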
Other Evaluation Setups (1 / 2)
- Datasets
  - Semantic Web Dog Food (SWDF)
  - DBpedia
- Benchmark queries (generated by the FEASIBLE benchmark generator)
  - Basic Graph Pattern only (BGP-only)
  - Fully Featured (FF)
- Number of benchmark queries
  - 300 queries of each type (BGP-only and fully featured) per dataset
  - 1200 queries in total
8
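The query counts follow from simple multiplication, assuming 300 queries per query type and per dataset; the second product matches the "Combined (600 queries)" grouping used in the source-selection results:

```latex
\[
\underbrace{2}_{\text{datasets}} \times \underbrace{2}_{\text{query types}} \times 300 = 1200 \text{ queries},
\qquad
\underbrace{2}_{\text{datasets}} \times 300 = 600 \text{ queries per query type}
\]
```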
Other Evaluation Setups (2 / 2)
- Number of partitions
  - 10 partitions for each dataset (SWDF and DBpedia)
- Timeout
  - 3 minutes per query
- Performance metrics
  - Partition generation time
  - Overall benchmark query execution time
  - Average query execution time
  - Number of timed-out queries for each benchmark
  - Ranking score of the partitioning techniques
  - Total number of sources selected for the complete benchmark execution in a purely federated environment
  - Partitioning imbalance among the generated partitions
9
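The slides list partitioning imbalance as a metric but do not give its formula. Two standard measures are sketched below for illustration only; which one, if either, the study used is left open. `max_over_mean` is the load-imbalance ratio common in graph partitioning, and `coefficient_of_variation` is the population standard deviation divided by the mean; both names are hypothetical.

```python
from statistics import mean, pstdev
from typing import Sequence


def max_over_mean(sizes: Sequence[int]) -> float:
    # Largest partition relative to the average; 1.0 means perfectly balanced.
    return max(sizes) / mean(sizes)


def coefficient_of_variation(sizes: Sequence[int]) -> float:
    # Spread of partition sizes relative to the mean; 0.0 means perfectly balanced.
    return pstdev(sizes) / mean(sizes)


# Hypothetical partition sizes (number of triples per partition).
print(max_over_mean([5, 10, 15]))             # 1.5
print(coefficient_of_variation([10, 10, 10]))  # 0.0
```

The max/mean ratio only looks at the worst partition, which is what bounds parallel work; the coefficient of variation reflects the spread across all partitions.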
Evaluation Results
10
Partitioning Time
11
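[Figure: partitioning time in seconds (log scale) for SWDF and DBpedia per technique: PB = Predicate-based, SB = Subject-based, Hi = Hierarchical, Ho = Horizontal, TC = TCV-Min, ME = Min-Edgecut, RB = Recursive-Bisection]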
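Partitioning technique | Total time taken (s)
--- | ---
Horizontal | 21228
Subject-based | 35034
Predicate-based | 35152
Hierarchical-based | 36158
TCV-Min | 70260
Recursive-Bisection | 70316
Min-Edgecut | 70344
TCV-Min, Recursive-Bisection, and Min-Edgecut have higher complexity and take roughly twice as long as Subject-, Predicate-, and Hierarchical-based partitioning.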
Execution Time (FedX)
12
Partitioning technique | Rank
--- | ---
Horizontal | 1
Recursive-Bisection | 2
Subject-based | 3
TCV-Min | 4
Hierarchical-based | 5
Min-Edgecut | 6
Predicate-based | 7
Execution Time (SemaGrow)
13
Partitioning technique | Rank
--- | ---
Predicate-based | 1
TCV-Min | 2
Hierarchical-based | 3
Recursive-Bisection | 4
Subject-based | 5
Min-Edgecut | 6
Horizontal | 7
Execution Time (Koral)
14
Partitioning technique | Rank
--- | ---
Min-Edgecut | 1
Subject-based | 2
TCV-Min | 3
Predicate-based | 4
Horizontal | 5
Hierarchical-based | 6
Recursive-Bisection | 7
Total Distinct Sources Selected (Physically Distributed Environment)
[Figure: total number of sources selected (0 to 10,000) by Predicate-Based, Subject-Based, Hierarchical, Horizontal, TCV-Min, Min-Edgecut, and Recursive-Bisection partitioning, for BGP-only and fully featured queries on SWDF, DBpedia, both datasets combined (600 queries per query type), and overall (1200 queries)]
15
Spearman's Rank Correlation between Runtimes and Number of Sources Selected
Positive correlation between runtimes and the number of sources selected
16
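The slide reports the correlation only as a chart. As a sketch of the standard computation (not the authors' script), Spearman's rho is the Pearson correlation of the two rank vectors, with tied values given the average of the ranks they span; without ties this equals 1 - 6 Σd² / (n(n² - 1)). The function names and inputs are hypothetical, and no study data is reproduced.

```python
from math import sqrt
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    # 1-based ranks; a run of tied values shares the mean of its positions.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    # Pearson correlation of the rank vectors (undefined if either is constant).
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Hypothetical per-technique runtimes and source counts.
runtimes = [1, 2, 3, 4, 5]
sources_selected = [2, 1, 4, 3, 5]
print(round(spearman(runtimes, sources_selected), 3))  # 0.8
```

A rho close to +1 means techniques that select more sources also tend to run slower, which is the relationship the slide reports.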
Overall Rank-Wise Ranking of Partitioning Techniques (1 / 2)
17
Overall Rank-Wise Ranking of Partitioning Techniques (2 / 2)
18
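The ranking slides do not show how the rank-wise score is computed. The sketch below uses an assumed scoring, not confirmed by these slides: per query, techniques are ranked by runtime (1 = fastest); with t techniques and |Q| queries, a technique that obtains rank r a total of O_r times scores s = Σ_r O_r (t - r) / (|Q| (t - 1)), so 1 means fastest on every query and 0 means slowest on every query. `rank_scores` and the toy runtimes are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def rank_scores(runtimes: Dict[str, List[float]]) -> Dict[str, float]:
    # runtimes maps technique name -> one runtime per query (same query order).
    techniques = list(runtimes)
    t = len(techniques)
    if t < 2:
        raise ValueError("need at least two techniques")
    n_queries = len(runtimes[techniques[0]])
    if n_queries == 0 or any(len(v) != n_queries for v in runtimes.values()):
        raise ValueError("need one runtime per query for every technique")
    occurrences = {name: Counter() for name in techniques}
    for q in range(n_queries):
        # sorted() is stable, so equal runtimes keep the input order.
        ordered = sorted(techniques, key=lambda name: runtimes[name][q])
        for rank, name in enumerate(ordered, start=1):
            occurrences[name][rank] += 1
    return {
        name: sum(count * (t - r) for r, count in occ.items()) / (n_queries * (t - 1))
        for name, occ in occurrences.items()
    }


# Hypothetical runtimes (seconds) for three techniques on two queries.
toy = {"A": [1.0, 2.0], "B": [2.0, 1.0], "C": [3.0, 3.0]}
print(rank_scores(toy))  # {'A': 0.75, 'B': 0.75, 'C': 0.0}
```

Unlike averaging total runtimes, a rank-based score is not dominated by a few very slow queries, since every query contributes equally.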
Conclusion
- We presented an evaluation of seven RDF partitioning techniques.
- Overall, our query runtime results suggest that TCV-Min leads to the smallest query runtimes, followed by Predicate-based, Horizontal, Recursive-Bisection, Subject-based, Hierarchical-based, and Min-Edgecut, in that order.
- The number of sources selected is positively correlated with query runtimes.
- Thus, partitioning techniques that minimize the total number of sources selected generally lead to better runtime performance.
19
This work was supported by the EU H2020 Framework Program through the project HOBBIT (GA no. 688227).
20
Questions / Comments ???
Thanks!
Adnan Akhter
akhter@informatik.uni-leipzig.de
21

