SlideShare a Scribd company logo
An Empirical Evaluation of RDF Graph
Partitioning Techniques
Adnan Akhter, Axel-Cyrille Ngonga Ngomo and Muhammad Saleem
EKAW, Nancy, France
November 14th, 2018
1
Motivation: Handling Big Datasets
* Image Reference https://lod-cloud.net/clouds/lod-cloud.svg
 Linked Data has grown significantly
 UniProt (Over 10 billion triples)
 Linked TCGA (Over 20 billion triples)
 Issues with bigger datasets
 Performance
 Availability
 Security
 Scalability
 Maintenance
 One of the solutions is partitioning
2
Motivation: Partitioning Techniques Used in RDF Clustered Triple Stores
System Partitioning technique System Partitioning technique
AdPart Subject hash + workload adaptive PigSparql Hash + Triple-based files
AdPart-NA Subject hash S2RDF Extended vertical partitioning
CliqueSquare Hybrid (Hash + VP) Sedge Subject hash
DREAM No partitioning; full replication Sempala VP
EAGRE METIS SHAPE Semantic hash partitioning
gStoreD Partitioning agnostic SHARD Hash
H-RDF-3X METIS TriAD Hash-based sharding
H2RDF+ H-Base partitioner (range) TriAD-SG METIS + Horizontal sharding
HadoopRDF VP + predicate files on HDFS WARP METIS on query workload
* Table Reference https://bit.ly/2JUqH5H
3
Which partitioning technique leads to better performance?
Partitioning Techniques Used
 Horizontal Partitioning
 Subject-based Partitioning
 Predicate-based Partitioning
 Hierarchical Partitioning
 Minimal Edgecut Partitioning
 Recursive-Bisection Partitioning
 Total Communication Volume Minimization Partitioning
4
Image Reference: https://bit.ly/2D1W0KA
Example RDF Triples with Corresponding Techniques
5
* Total three partitions generated using each technique
Evaluation Setup
6
7
Partitioning Environments Used
 Clustered-based
 Koral
 Physically-distributed
 FedX (index-free heuristic-based)
 SemaGrow (index-assisted cost-based)
Other Evaluation Setups (1 / 2)
 Datasets
 Semantic Web Dog Food (SWDF)
 DBpedia
 Benchmark queries (generated by FEASIBLE benchmark generator)
 Basic Graph Pattern (BGP-only)
 Fully Featured (FF)
 Number of benchmark queries
 300 queries for each, i.e., BGP and fully featured
 Total 1200 queries
8
Other Evaluation Setups (2 / 2)
 Number of partitions
 Total 10 partitions for each dataset, i.e., SWDF and DBpedia
 Time out
 Three minutes for each query
 Performance metrix
 Partitions generation time
 Overall benchmark query execution time
 Average query execution time
 Number of timeout queries for each benchmark
 The ranking score of the partitioning techniques
 Total number of sources selected for the complete benchmark execution in a purely federated environment
 Partitioning imbalance among the generated partitions
9
Evaluation Results
10
Partitioning Time
11
1
10
100
1000
10000
100000
PB SB Hi Ho TC ME RB
Partitioningtimeinsec
(logscale)
SWDF DBpedia
Partititioning
Technique
Total Time Taken
(in seconds)
Horizontal 21228
Subject-based 35034
Predicate-based 35152
Hierarchical-based 36158
TCV-Min 70260
Recursive-Bisection 70316
Min-Edgecut 70344
Higher
complexity
Execution Time (FedX)
12
Partititioning
Technique
Rank
Horizontal 1
Recursive-Bisection 2
Subject-based 3
TCV-Min 4
Hierarchical-based 5
Min-Edgecut 6
Predicate-based 7
Execution Time (SemaGrow)
13
Partititioning
Technique
Rank
Predicate-based 1
TCV-Min 2
Hierarchical-based 3
Recursive-Bisection 4
Subject-based 5
Min-Edgecut 6
Horizontal 7
Execution Time (Koral)
14
Partititioning
Technique
Rank
Min-Edgecut 1
Subject-based 2
TCV-Min 3
Predicate-based 4
Horizontal 5
Hierarchical-based 6
Recursive-Bisection 7
Total Distinct Sources Selected (Physically Distributed Environment)
0
1000
2000
3000
4000
5000
6000
7000
8000
9000
10000
BGP-Only Fully Featured BGP-Only Fully Featured BGP-only Fully Featured
SWDF DBpedia Combined (600 queries) Overall (1200
queries)
Totalnumberofsourcesselected
Predicate-Based Subject-Based Hierarchical Horizontal TCV-Min Min-Edgecut Recursive-Bisection
15
Spearman's Rank Correlation b/w Runtimes and Number of Sources Selected
16Positive correlation between runtimes and number of sources selected
Overall Rank-Wise Ranking of Partitioning Techniques (1 / 2)
17
18
Overall Rank-Wise Ranking of Partitioning Techniques(2 / 2)
Conclusion
 We presented an evaluation of seven RDF partitioning techniques
 Our overall results of query runtime suggest that TCV-Min leads to smallest query runtimes
followed by Predicate-based, Horizontal, Recursive-Bisection, Subject-based, Hierarchical-based,
and Min-Edgecut, respectively
 Number of sources selected has a direct relation with query runtimes
 Thus, partitioning techniques which minimize the total number of sources selected generally lead
to better runtime performances
19
This work was supported by grants from the EU H2020 Framework Program
provided for the project HOBBIT (GA no. 688227).
20
Questions / Comments ???
Thanks!
Adnan Akhter
akhter@informatik.uni-leipzig.de
21

More Related Content

What's hot

Working with Scientific Data in MATLAB
Working with Scientific Data in MATLABWorking with Scientific Data in MATLAB
Working with Scientific Data in MATLAB
The HDF-EOS Tools and Information Center
 
search engine
search enginesearch engine
search engine
Musaib Khan
 
My Other Computer is a Data Center: The Sector Perspective on Big Data
My Other Computer is a Data Center: The Sector Perspective on Big DataMy Other Computer is a Data Center: The Sector Perspective on Big Data
My Other Computer is a Data Center: The Sector Perspective on Big Data
Robert Grossman
 
Many Task Applications for Grids and Supercomputers
Many Task Applications for Grids and SupercomputersMany Task Applications for Grids and Supercomputers
Many Task Applications for Grids and Supercomputers
Ian Foster
 
Incorporating ISO Metadata Using HDF Product Designer
Incorporating ISO Metadata Using HDF Product DesignerIncorporating ISO Metadata Using HDF Product Designer
Incorporating ISO Metadata Using HDF Product Designer
The HDF-EOS Tools and Information Center
 
Bionimbus - An Overview (2010-v6)
Bionimbus - An Overview (2010-v6)Bionimbus - An Overview (2010-v6)
Bionimbus - An Overview (2010-v6)
Robert Grossman
 
Open-source Scientific Computing and Data Analytics using HDF
Open-source Scientific Computing and Data Analytics using HDFOpen-source Scientific Computing and Data Analytics using HDF
Open-source Scientific Computing and Data Analytics using HDF
The HDF-EOS Tools and Information Center
 
rasdaman: from barebone Arrays to DataCubes
rasdaman: from barebone Arrays to DataCubesrasdaman: from barebone Arrays to DataCubes
rasdaman: from barebone Arrays to DataCubes
EUDAT
 
Mining and Managing Large-scale Linked Open Data
Mining and Managing Large-scale Linked Open DataMining and Managing Large-scale Linked Open Data
Mining and Managing Large-scale Linked Open Data
Ansgar Scherp
 
RDF-Gen: Generating RDF from streaming and archival data
RDF-Gen: Generating RDF from streaming and archival dataRDF-Gen: Generating RDF from streaming and archival data
RDF-Gen: Generating RDF from streaming and archival data
Giorgos Santipantakis
 
Knowledge Graph for Cybersecurity: An Introduction By Kabul Kurniawan
Knowledge Graph for Cybersecurity: An Introduction By  Kabul KurniawanKnowledge Graph for Cybersecurity: An Introduction By  Kabul Kurniawan
Knowledge Graph for Cybersecurity: An Introduction By Kabul Kurniawan
Kabul Kurniawan
 
Design Pattern of HBase Configuration
Design Pattern of HBase ConfigurationDesign Pattern of HBase Configuration
Design Pattern of HBase Configuration
Dan Han
 
MATLAB and Scientific Data: New Features and Capabilities
MATLAB and Scientific Data: New Features and CapabilitiesMATLAB and Scientific Data: New Features and Capabilities
MATLAB and Scientific Data: New Features and Capabilities
The HDF-EOS Tools and Information Center
 
Virtual Knowledge Graphs for Federated Log Analysis
Virtual Knowledge Graphs for Federated Log AnalysisVirtual Knowledge Graphs for Federated Log Analysis
Virtual Knowledge Graphs for Federated Log Analysis
Kabul Kurniawan
 
TCP connection management in SDN
TCP connection management in SDNTCP connection management in SDN
TCP connection management in SDN
Chao Chen
 
Benchmark MinHash+LSH algorithm on Spark
Benchmark MinHash+LSH algorithm on SparkBenchmark MinHash+LSH algorithm on Spark
Benchmark MinHash+LSH algorithm on Spark
Xiaoqian Liu
 
Accelerating the Experimental Feedback Loop: Data Streams and the Advanced Ph...
Accelerating the Experimental Feedback Loop: Data Streams and the Advanced Ph...Accelerating the Experimental Feedback Loop: Data Streams and the Advanced Ph...
Accelerating the Experimental Feedback Loop: Data Streams and the Advanced Ph...
Ian Foster
 
Learning Systems for Science
Learning Systems for ScienceLearning Systems for Science
Learning Systems for Science
Ian Foster
 
A Comprehensive Study of Clustering Algorithms for Big Data Mining with MapRe...
A Comprehensive Study of Clustering Algorithms for Big Data Mining with MapRe...A Comprehensive Study of Clustering Algorithms for Big Data Mining with MapRe...
A Comprehensive Study of Clustering Algorithms for Big Data Mining with MapRe...
KamleshKumar394
 
Large Scale On-Demand Image Processing For Disaster Relief
Large Scale On-Demand Image Processing For Disaster ReliefLarge Scale On-Demand Image Processing For Disaster Relief
Large Scale On-Demand Image Processing For Disaster Relief
Robert Grossman
 

What's hot (20)

Working with Scientific Data in MATLAB
Working with Scientific Data in MATLABWorking with Scientific Data in MATLAB
Working with Scientific Data in MATLAB
 
search engine
search enginesearch engine
search engine
 
My Other Computer is a Data Center: The Sector Perspective on Big Data
My Other Computer is a Data Center: The Sector Perspective on Big DataMy Other Computer is a Data Center: The Sector Perspective on Big Data
My Other Computer is a Data Center: The Sector Perspective on Big Data
 
Many Task Applications for Grids and Supercomputers
Many Task Applications for Grids and SupercomputersMany Task Applications for Grids and Supercomputers
Many Task Applications for Grids and Supercomputers
 
Incorporating ISO Metadata Using HDF Product Designer
Incorporating ISO Metadata Using HDF Product DesignerIncorporating ISO Metadata Using HDF Product Designer
Incorporating ISO Metadata Using HDF Product Designer
 
Bionimbus - An Overview (2010-v6)
Bionimbus - An Overview (2010-v6)Bionimbus - An Overview (2010-v6)
Bionimbus - An Overview (2010-v6)
 
Open-source Scientific Computing and Data Analytics using HDF
Open-source Scientific Computing and Data Analytics using HDFOpen-source Scientific Computing and Data Analytics using HDF
Open-source Scientific Computing and Data Analytics using HDF
 
rasdaman: from barebone Arrays to DataCubes
rasdaman: from barebone Arrays to DataCubesrasdaman: from barebone Arrays to DataCubes
rasdaman: from barebone Arrays to DataCubes
 
Mining and Managing Large-scale Linked Open Data
Mining and Managing Large-scale Linked Open DataMining and Managing Large-scale Linked Open Data
Mining and Managing Large-scale Linked Open Data
 
RDF-Gen: Generating RDF from streaming and archival data
RDF-Gen: Generating RDF from streaming and archival dataRDF-Gen: Generating RDF from streaming and archival data
RDF-Gen: Generating RDF from streaming and archival data
 
Knowledge Graph for Cybersecurity: An Introduction By Kabul Kurniawan
Knowledge Graph for Cybersecurity: An Introduction By  Kabul KurniawanKnowledge Graph for Cybersecurity: An Introduction By  Kabul Kurniawan
Knowledge Graph for Cybersecurity: An Introduction By Kabul Kurniawan
 
Design Pattern of HBase Configuration
Design Pattern of HBase ConfigurationDesign Pattern of HBase Configuration
Design Pattern of HBase Configuration
 
MATLAB and Scientific Data: New Features and Capabilities
MATLAB and Scientific Data: New Features and CapabilitiesMATLAB and Scientific Data: New Features and Capabilities
MATLAB and Scientific Data: New Features and Capabilities
 
Virtual Knowledge Graphs for Federated Log Analysis
Virtual Knowledge Graphs for Federated Log AnalysisVirtual Knowledge Graphs for Federated Log Analysis
Virtual Knowledge Graphs for Federated Log Analysis
 
TCP connection management in SDN
TCP connection management in SDNTCP connection management in SDN
TCP connection management in SDN
 
Benchmark MinHash+LSH algorithm on Spark
Benchmark MinHash+LSH algorithm on SparkBenchmark MinHash+LSH algorithm on Spark
Benchmark MinHash+LSH algorithm on Spark
 
Accelerating the Experimental Feedback Loop: Data Streams and the Advanced Ph...
Accelerating the Experimental Feedback Loop: Data Streams and the Advanced Ph...Accelerating the Experimental Feedback Loop: Data Streams and the Advanced Ph...
Accelerating the Experimental Feedback Loop: Data Streams and the Advanced Ph...
 
Learning Systems for Science
Learning Systems for ScienceLearning Systems for Science
Learning Systems for Science
 
A Comprehensive Study of Clustering Algorithms for Big Data Mining with MapRe...
A Comprehensive Study of Clustering Algorithms for Big Data Mining with MapRe...A Comprehensive Study of Clustering Algorithms for Big Data Mining with MapRe...
A Comprehensive Study of Clustering Algorithms for Big Data Mining with MapRe...
 
Large Scale On-Demand Image Processing For Disaster Relief
Large Scale On-Demand Image Processing For Disaster ReliefLarge Scale On-Demand Image Processing For Disaster Relief
Large Scale On-Demand Image Processing For Disaster Relief
 

Similar to An Empirical Evaluation of RDF Graph Partitioning Techniques

Maximizing Data Lake ROI with Data Virtualization: A Technical Demonstration
Maximizing Data Lake ROI with Data Virtualization: A Technical DemonstrationMaximizing Data Lake ROI with Data Virtualization: A Technical Demonstration
Maximizing Data Lake ROI with Data Virtualization: A Technical Demonstration
Denodo
 
Terark Product and Technology
Terark Product and TechnologyTerark Product and Technology
Terark Product and Technology
Xinyuan Fu
 
Druid at naver.com - part 1
Druid at naver.com - part 1Druid at naver.com - part 1
Druid at naver.com - part 1
Jungsu Heo
 
How to Achieve Fast Data Performance in Big Data, Logical Data Warehouse, and...
How to Achieve Fast Data Performance in Big Data, Logical Data Warehouse, and...How to Achieve Fast Data Performance in Big Data, Logical Data Warehouse, and...
How to Achieve Fast Data Performance in Big Data, Logical Data Warehouse, and...
Denodo
 
Databases Have Forgotten About Single Node Performance, A Wrongheaded Trade Off
Databases Have Forgotten About Single Node Performance, A Wrongheaded Trade OffDatabases Have Forgotten About Single Node Performance, A Wrongheaded Trade Off
Databases Have Forgotten About Single Node Performance, A Wrongheaded Trade Off
Timescale
 
IRJET- Generate Distributed Metadata using Blockchain Technology within HDFS ...
IRJET- Generate Distributed Metadata using Blockchain Technology within HDFS ...IRJET- Generate Distributed Metadata using Blockchain Technology within HDFS ...
IRJET- Generate Distributed Metadata using Blockchain Technology within HDFS ...
IRJET Journal
 
Performance Considerations in Logical Data Warehouse
Performance Considerations in Logical Data WarehousePerformance Considerations in Logical Data Warehouse
Performance Considerations in Logical Data Warehouse
Denodo
 
Lecture 2 part 1
Lecture 2 part 1Lecture 2 part 1
Lecture 2 part 1
Jazan University
 
Making sense of your data jug
Making sense of your data   jugMaking sense of your data   jug
Making sense of your data jug
Gerald Muecke
 
Introduction to hadoop and hdfs
Introduction to hadoop and hdfsIntroduction to hadoop and hdfs
Introduction to hadoop and hdfs
shrey mehrotra
 
BDSE 2015 Evaluation of Big Data Platforms with HiBench
BDSE 2015 Evaluation of Big Data Platforms with HiBenchBDSE 2015 Evaluation of Big Data Platforms with HiBench
BDSE 2015 Evaluation of Big Data Platforms with HiBench
t_ivanov
 
Hadoop training in bangalore-kellytechnologies
Hadoop training in bangalore-kellytechnologiesHadoop training in bangalore-kellytechnologies
Hadoop training in bangalore-kellytechnologies
appaji intelhunt
 
Webinar: Enterprise Data Management in the Era of MongoDB and Data Lakes
Webinar: Enterprise Data Management in the Era of MongoDB and Data LakesWebinar: Enterprise Data Management in the Era of MongoDB and Data Lakes
Webinar: Enterprise Data Management in the Era of MongoDB and Data Lakes
MongoDB
 
DataEngConf: Parquet at Datadog: Fast, Efficient, Portable Storage for Big Data
DataEngConf: Parquet at Datadog: Fast, Efficient, Portable Storage for Big DataDataEngConf: Parquet at Datadog: Fast, Efficient, Portable Storage for Big Data
DataEngConf: Parquet at Datadog: Fast, Efficient, Portable Storage for Big Data
Hakka Labs
 
Cloud Revolution: Exploring the New Wave of Serverless Spatial Data
Cloud Revolution: Exploring the New Wave of Serverless Spatial DataCloud Revolution: Exploring the New Wave of Serverless Spatial Data
Cloud Revolution: Exploring the New Wave of Serverless Spatial Data
Safe Software
 
Cloud Revolution: Exploring the New Wave of Serverless Spatial Data
Cloud Revolution: Exploring the New Wave of Serverless Spatial DataCloud Revolution: Exploring the New Wave of Serverless Spatial Data
Cloud Revolution: Exploring the New Wave of Serverless Spatial Data
Safe Software
 
Unlock Bigdata Analytic Efficiency with Ceph Data Lake - Zhang Jian, Fu Yong
Unlock Bigdata Analytic Efficiency with Ceph Data Lake - Zhang Jian, Fu YongUnlock Bigdata Analytic Efficiency with Ceph Data Lake - Zhang Jian, Fu Yong
Unlock Bigdata Analytic Efficiency with Ceph Data Lake - Zhang Jian, Fu Yong
Ceph Community
 
MongoDB Sharding Webinar 2014
MongoDB Sharding Webinar 2014MongoDB Sharding Webinar 2014
MongoDB Sharding Webinar 2014
Dylan Tong
 
ParlBench: a SPARQL-benchmark for electronic publishing applications.
ParlBench: a SPARQL-benchmark for electronic publishing applications.ParlBench: a SPARQL-benchmark for electronic publishing applications.
ParlBench: a SPARQL-benchmark for electronic publishing applications.
Tatiana Tarasova
 
Unified Big Data Processing with Apache Spark (QCON 2014)
Unified Big Data Processing with Apache Spark (QCON 2014)Unified Big Data Processing with Apache Spark (QCON 2014)
Unified Big Data Processing with Apache Spark (QCON 2014)
Databricks
 

Similar to An Empirical Evaluation of RDF Graph Partitioning Techniques (20)

Maximizing Data Lake ROI with Data Virtualization: A Technical Demonstration
Maximizing Data Lake ROI with Data Virtualization: A Technical DemonstrationMaximizing Data Lake ROI with Data Virtualization: A Technical Demonstration
Maximizing Data Lake ROI with Data Virtualization: A Technical Demonstration
 
Terark Product and Technology
Terark Product and TechnologyTerark Product and Technology
Terark Product and Technology
 
Druid at naver.com - part 1
Druid at naver.com - part 1Druid at naver.com - part 1
Druid at naver.com - part 1
 
How to Achieve Fast Data Performance in Big Data, Logical Data Warehouse, and...
How to Achieve Fast Data Performance in Big Data, Logical Data Warehouse, and...How to Achieve Fast Data Performance in Big Data, Logical Data Warehouse, and...
How to Achieve Fast Data Performance in Big Data, Logical Data Warehouse, and...
 
Databases Have Forgotten About Single Node Performance, A Wrongheaded Trade Off
Databases Have Forgotten About Single Node Performance, A Wrongheaded Trade OffDatabases Have Forgotten About Single Node Performance, A Wrongheaded Trade Off
Databases Have Forgotten About Single Node Performance, A Wrongheaded Trade Off
 
IRJET- Generate Distributed Metadata using Blockchain Technology within HDFS ...
IRJET- Generate Distributed Metadata using Blockchain Technology within HDFS ...IRJET- Generate Distributed Metadata using Blockchain Technology within HDFS ...
IRJET- Generate Distributed Metadata using Blockchain Technology within HDFS ...
 
Performance Considerations in Logical Data Warehouse
Performance Considerations in Logical Data WarehousePerformance Considerations in Logical Data Warehouse
Performance Considerations in Logical Data Warehouse
 
Lecture 2 part 1
Lecture 2 part 1Lecture 2 part 1
Lecture 2 part 1
 
Making sense of your data jug
Making sense of your data   jugMaking sense of your data   jug
Making sense of your data jug
 
Introduction to hadoop and hdfs
Introduction to hadoop and hdfsIntroduction to hadoop and hdfs
Introduction to hadoop and hdfs
 
BDSE 2015 Evaluation of Big Data Platforms with HiBench
BDSE 2015 Evaluation of Big Data Platforms with HiBenchBDSE 2015 Evaluation of Big Data Platforms with HiBench
BDSE 2015 Evaluation of Big Data Platforms with HiBench
 
Hadoop training in bangalore-kellytechnologies
Hadoop training in bangalore-kellytechnologiesHadoop training in bangalore-kellytechnologies
Hadoop training in bangalore-kellytechnologies
 
Webinar: Enterprise Data Management in the Era of MongoDB and Data Lakes
Webinar: Enterprise Data Management in the Era of MongoDB and Data LakesWebinar: Enterprise Data Management in the Era of MongoDB and Data Lakes
Webinar: Enterprise Data Management in the Era of MongoDB and Data Lakes
 
DataEngConf: Parquet at Datadog: Fast, Efficient, Portable Storage for Big Data
DataEngConf: Parquet at Datadog: Fast, Efficient, Portable Storage for Big DataDataEngConf: Parquet at Datadog: Fast, Efficient, Portable Storage for Big Data
DataEngConf: Parquet at Datadog: Fast, Efficient, Portable Storage for Big Data
 
Cloud Revolution: Exploring the New Wave of Serverless Spatial Data
Cloud Revolution: Exploring the New Wave of Serverless Spatial DataCloud Revolution: Exploring the New Wave of Serverless Spatial Data
Cloud Revolution: Exploring the New Wave of Serverless Spatial Data
 
Cloud Revolution: Exploring the New Wave of Serverless Spatial Data
Cloud Revolution: Exploring the New Wave of Serverless Spatial DataCloud Revolution: Exploring the New Wave of Serverless Spatial Data
Cloud Revolution: Exploring the New Wave of Serverless Spatial Data
 
Unlock Bigdata Analytic Efficiency with Ceph Data Lake - Zhang Jian, Fu Yong
Unlock Bigdata Analytic Efficiency with Ceph Data Lake - Zhang Jian, Fu YongUnlock Bigdata Analytic Efficiency with Ceph Data Lake - Zhang Jian, Fu Yong
Unlock Bigdata Analytic Efficiency with Ceph Data Lake - Zhang Jian, Fu Yong
 
MongoDB Sharding Webinar 2014
MongoDB Sharding Webinar 2014MongoDB Sharding Webinar 2014
MongoDB Sharding Webinar 2014
 
ParlBench: a SPARQL-benchmark for electronic publishing applications.
ParlBench: a SPARQL-benchmark for electronic publishing applications.ParlBench: a SPARQL-benchmark for electronic publishing applications.
ParlBench: a SPARQL-benchmark for electronic publishing applications.
 
Unified Big Data Processing with Apache Spark (QCON 2014)
Unified Big Data Processing with Apache Spark (QCON 2014)Unified Big Data Processing with Apache Spark (QCON 2014)
Unified Big Data Processing with Apache Spark (QCON 2014)
 

Recently uploaded

ESA/ACT Science Coffee: Diego Blas - Gravitational wave detection with orbita...
ESA/ACT Science Coffee: Diego Blas - Gravitational wave detection with orbita...ESA/ACT Science Coffee: Diego Blas - Gravitational wave detection with orbita...
ESA/ACT Science Coffee: Diego Blas - Gravitational wave detection with orbita...
Advanced-Concepts-Team
 
Compexometric titration/Chelatorphy titration/chelating titration
Compexometric titration/Chelatorphy titration/chelating titrationCompexometric titration/Chelatorphy titration/chelating titration
Compexometric titration/Chelatorphy titration/chelating titration
Vandana Devesh Sharma
 
aziz sancar nobel prize winner: from mardin to nobel
aziz sancar nobel prize winner: from mardin to nobelaziz sancar nobel prize winner: from mardin to nobel
aziz sancar nobel prize winner: from mardin to nobel
İsa Badur
 
11.1 Role of physical biological in deterioration of grains.pdf
11.1 Role of physical biological in deterioration of grains.pdf11.1 Role of physical biological in deterioration of grains.pdf
11.1 Role of physical biological in deterioration of grains.pdf
PirithiRaju
 
SAR of Medicinal Chemistry 1st by dk.pdf
SAR of Medicinal Chemistry 1st by dk.pdfSAR of Medicinal Chemistry 1st by dk.pdf
SAR of Medicinal Chemistry 1st by dk.pdf
KrushnaDarade1
 
The debris of the ‘last major merger’ is dynamically young
The debris of the ‘last major merger’ is dynamically youngThe debris of the ‘last major merger’ is dynamically young
The debris of the ‘last major merger’ is dynamically young
Sérgio Sacani
 
Direct Seeded Rice - Climate Smart Agriculture
Direct Seeded Rice - Climate Smart AgricultureDirect Seeded Rice - Climate Smart Agriculture
Direct Seeded Rice - Climate Smart Agriculture
International Food Policy Research Institute- South Asia Office
 
EWOCS-I: The catalog of X-ray sources in Westerlund 1 from the Extended Weste...
EWOCS-I: The catalog of X-ray sources in Westerlund 1 from the Extended Weste...EWOCS-I: The catalog of X-ray sources in Westerlund 1 from the Extended Weste...
EWOCS-I: The catalog of X-ray sources in Westerlund 1 from the Extended Weste...
Sérgio Sacani
 
Sciences of Europe journal No 142 (2024)
Sciences of Europe journal No 142 (2024)Sciences of Europe journal No 142 (2024)
Sciences of Europe journal No 142 (2024)
Sciences of Europe
 
HOW DO ORGANISMS REPRODUCE?reproduction part 1
HOW DO ORGANISMS REPRODUCE?reproduction part 1HOW DO ORGANISMS REPRODUCE?reproduction part 1
HOW DO ORGANISMS REPRODUCE?reproduction part 1
Shashank Shekhar Pandey
 
Micronuclei test.M.sc.zoology.fisheries.
Micronuclei test.M.sc.zoology.fisheries.Micronuclei test.M.sc.zoology.fisheries.
Micronuclei test.M.sc.zoology.fisheries.
Aditi Bajpai
 
Pests of Storage_Identification_Dr.UPR.pdf
Pests of Storage_Identification_Dr.UPR.pdfPests of Storage_Identification_Dr.UPR.pdf
Pests of Storage_Identification_Dr.UPR.pdf
PirithiRaju
 
Eukaryotic Transcription Presentation.pptx
Eukaryotic Transcription Presentation.pptxEukaryotic Transcription Presentation.pptx
Eukaryotic Transcription Presentation.pptx
RitabrataSarkar3
 
在线办理(salfor毕业证书)索尔福德大学毕业证毕业完成信一模一样
在线办理(salfor毕业证书)索尔福德大学毕业证毕业完成信一模一样在线办理(salfor毕业证书)索尔福德大学毕业证毕业完成信一模一样
在线办理(salfor毕业证书)索尔福德大学毕业证毕业完成信一模一样
vluwdy49
 
Authoring a personal GPT for your research and practice: How we created the Q...
Authoring a personal GPT for your research and practice: How we created the Q...Authoring a personal GPT for your research and practice: How we created the Q...
Authoring a personal GPT for your research and practice: How we created the Q...
Leonel Morgado
 
GBSN - Biochemistry (Unit 6) Chemistry of Proteins
GBSN - Biochemistry (Unit 6) Chemistry of ProteinsGBSN - Biochemistry (Unit 6) Chemistry of Proteins
GBSN - Biochemistry (Unit 6) Chemistry of Proteins
Areesha Ahmad
 
23PH301 - Optics - Optical Lenses.pptx
23PH301 - Optics  -  Optical Lenses.pptx23PH301 - Optics  -  Optical Lenses.pptx
23PH301 - Optics - Optical Lenses.pptx
RDhivya6
 
Shallowest Oil Discovery of Turkiye.pptx
Shallowest Oil Discovery of Turkiye.pptxShallowest Oil Discovery of Turkiye.pptx
Shallowest Oil Discovery of Turkiye.pptx
Gokturk Mehmet Dilci
 
20240520 Planning a Circuit Simulator in JavaScript.pptx
20240520 Planning a Circuit Simulator in JavaScript.pptx20240520 Planning a Circuit Simulator in JavaScript.pptx
20240520 Planning a Circuit Simulator in JavaScript.pptx
Sharon Liu
 
NuGOweek 2024 Ghent programme overview flyer
NuGOweek 2024 Ghent programme overview flyerNuGOweek 2024 Ghent programme overview flyer
NuGOweek 2024 Ghent programme overview flyer
pablovgd
 

Recently uploaded (20)

ESA/ACT Science Coffee: Diego Blas - Gravitational wave detection with orbita...
ESA/ACT Science Coffee: Diego Blas - Gravitational wave detection with orbita...ESA/ACT Science Coffee: Diego Blas - Gravitational wave detection with orbita...
ESA/ACT Science Coffee: Diego Blas - Gravitational wave detection with orbita...
 
Compexometric titration/Chelatorphy titration/chelating titration
Compexometric titration/Chelatorphy titration/chelating titrationCompexometric titration/Chelatorphy titration/chelating titration
Compexometric titration/Chelatorphy titration/chelating titration
 
aziz sancar nobel prize winner: from mardin to nobel
aziz sancar nobel prize winner: from mardin to nobelaziz sancar nobel prize winner: from mardin to nobel
aziz sancar nobel prize winner: from mardin to nobel
 
11.1 Role of physical biological in deterioration of grains.pdf
11.1 Role of physical biological in deterioration of grains.pdf11.1 Role of physical biological in deterioration of grains.pdf
11.1 Role of physical biological in deterioration of grains.pdf
 
SAR of Medicinal Chemistry 1st by dk.pdf
SAR of Medicinal Chemistry 1st by dk.pdfSAR of Medicinal Chemistry 1st by dk.pdf
SAR of Medicinal Chemistry 1st by dk.pdf
 
The debris of the ‘last major merger’ is dynamically young
The debris of the ‘last major merger’ is dynamically youngThe debris of the ‘last major merger’ is dynamically young
The debris of the ‘last major merger’ is dynamically young
 
Direct Seeded Rice - Climate Smart Agriculture
Direct Seeded Rice - Climate Smart AgricultureDirect Seeded Rice - Climate Smart Agriculture
Direct Seeded Rice - Climate Smart Agriculture
 
EWOCS-I: The catalog of X-ray sources in Westerlund 1 from the Extended Weste...
EWOCS-I: The catalog of X-ray sources in Westerlund 1 from the Extended Weste...EWOCS-I: The catalog of X-ray sources in Westerlund 1 from the Extended Weste...
EWOCS-I: The catalog of X-ray sources in Westerlund 1 from the Extended Weste...
 
Sciences of Europe journal No 142 (2024)
Sciences of Europe journal No 142 (2024)Sciences of Europe journal No 142 (2024)
Sciences of Europe journal No 142 (2024)
 
HOW DO ORGANISMS REPRODUCE?reproduction part 1
HOW DO ORGANISMS REPRODUCE?reproduction part 1HOW DO ORGANISMS REPRODUCE?reproduction part 1
HOW DO ORGANISMS REPRODUCE?reproduction part 1
 
Micronuclei test.M.sc.zoology.fisheries.
Micronuclei test.M.sc.zoology.fisheries.Micronuclei test.M.sc.zoology.fisheries.
Micronuclei test.M.sc.zoology.fisheries.
 
Pests of Storage_Identification_Dr.UPR.pdf
Pests of Storage_Identification_Dr.UPR.pdfPests of Storage_Identification_Dr.UPR.pdf
Pests of Storage_Identification_Dr.UPR.pdf
 
Eukaryotic Transcription Presentation.pptx
Eukaryotic Transcription Presentation.pptxEukaryotic Transcription Presentation.pptx
Eukaryotic Transcription Presentation.pptx
 
在线办理(salfor毕业证书)索尔福德大学毕业证毕业完成信一模一样
在线办理(salfor毕业证书)索尔福德大学毕业证毕业完成信一模一样在线办理(salfor毕业证书)索尔福德大学毕业证毕业完成信一模一样
在线办理(salfor毕业证书)索尔福德大学毕业证毕业完成信一模一样
 
Authoring a personal GPT for your research and practice: How we created the Q...
Authoring a personal GPT for your research and practice: How we created the Q...Authoring a personal GPT for your research and practice: How we created the Q...
Authoring a personal GPT for your research and practice: How we created the Q...
 
GBSN - Biochemistry (Unit 6) Chemistry of Proteins
GBSN - Biochemistry (Unit 6) Chemistry of ProteinsGBSN - Biochemistry (Unit 6) Chemistry of Proteins
GBSN - Biochemistry (Unit 6) Chemistry of Proteins
 
23PH301 - Optics - Optical Lenses.pptx
23PH301 - Optics  -  Optical Lenses.pptx23PH301 - Optics  -  Optical Lenses.pptx
23PH301 - Optics - Optical Lenses.pptx
 
Shallowest Oil Discovery of Turkiye.pptx
Shallowest Oil Discovery of Turkiye.pptxShallowest Oil Discovery of Turkiye.pptx
Shallowest Oil Discovery of Turkiye.pptx
 
20240520 Planning a Circuit Simulator in JavaScript.pptx
20240520 Planning a Circuit Simulator in JavaScript.pptx20240520 Planning a Circuit Simulator in JavaScript.pptx
20240520 Planning a Circuit Simulator in JavaScript.pptx
 
NuGOweek 2024 Ghent programme overview flyer
NuGOweek 2024 Ghent programme overview flyerNuGOweek 2024 Ghent programme overview flyer
NuGOweek 2024 Ghent programme overview flyer
 

An Empirical Evaluation of RDF Graph Partitioning Techniques

  • 1. An Empirical Evaluation of RDF Graph Partitioning Techniques Adnan Akhter, Axel-Cyrille Ngonga Ngomo and Muhammad Saleem EKAW, Nancy, France November 14th, 2018 1
  • 2. Motivation: Handling Big Datasets * Image Reference https://lod-cloud.net/clouds/lod-cloud.svg  Linked Data has grown significantly  UniProt (Over 10 billion triples)  Linked TCGA (Over 20 billion triples)  Issues with bigger datasets  Performance  Availability  Security  Scalability  Maintenance  One of the solutions is partitioning 2
  • 3. Motivation: Partitioning Techniques Used in RDF Clustered Triple Stores System Partitioning technique System Partitioning technique AdPart Subject hash + workload adaptive PigSparql Hash + Triple-based files AdPart-NA Subject hash S2RDF Extended vertical partitioning CliqueSquare Hybrid (Hash + VP) Sedge Subject hash DREAM No partitioning; full replication Sempala VP EAGRE METIS SHAPE Semantic hash partitioning gStoreD Partitioning agnostic SHARD Hash H-RDF-3X METIS TriAD Hash-based sharding H2RDF+ H-Base partitioner (range) TriAD-SG METIS + Horizontal sharding HadoopRDF VP + predicate files on HDFS WARP METIS on query workload * Table Reference https://bit.ly/2JUqH5H 3 Which partitioning technique leads to better performance?
  • 4. Partitioning Techniques Used  Horizontal Partitioning  Subject-based Partitioning  Predicate-based Partitioning  Hierarchical Partitioning  Minimal Edgecut Partitioning  Recursive-Bisection Partitioning  Total Communication Volume Minimization Partitioning 4 Image Reference: https://bit.ly/2D1W0KA
  • 5. Example RDF Triples with Corresponding Techniques 5 * Total three partitions generated using each technique
  • 7. 7 Partitioning Environments Used  Clustered-based  Koral  Physically-distributed  FedX (index-free heuristic-based)  SemaGrow (index-assisted cost-based)
  • 8. Other Evaluation Setups (1 / 2)  Datasets  Semantic Web Dog Food (SWDF)  DBpedia  Benchmark queries (generated by FEASIBLE benchmark generator)  Basic Graph Pattern (BGP-only)  Fully Featured (FF)  Number of benchmark queries  300 queries for each, i.e., BGP and fully featured  Total 1200 queries 8
  • 9. Other Evaluation Setups (2 / 2)  Number of partitions  Total 10 partitions for each dataset, i.e., SWDF and DBpedia  Time out  Three minutes for each query  Performance metrix  Partitions generation time  Overall benchmark query execution time  Average query execution time  Number of timeout queries for each benchmark  The ranking score of the partitioning techniques  Total number of sources selected for the complete benchmark execution in a purely federated environment  Partitioning imbalance among the generated partitions 9
  • 11. Partitioning Time 11 1 10 100 1000 10000 100000 PB SB Hi Ho TC ME RB Partitioningtimeinsec (logscale) SWDF DBpedia Partititioning Technique Total Time Taken (in seconds) Horizontal 21228 Subject-based 35034 Predicate-based 35152 Hierarchical-based 36158 TCV-Min 70260 Recursive-Bisection 70316 Min-Edgecut 70344 Higher complexity
  • 12. Execution Time (FedX) 12 Partititioning Technique Rank Horizontal 1 Recursive-Bisection 2 Subject-based 3 TCV-Min 4 Hierarchical-based 5 Min-Edgecut 6 Predicate-based 7
  • 13. Execution Time (SemaGrow) 13 Partititioning Technique Rank Predicate-based 1 TCV-Min 2 Hierarchical-based 3 Recursive-Bisection 4 Subject-based 5 Min-Edgecut 6 Horizontal 7
  • 14. Execution Time (Koral) 14 Partititioning Technique Rank Min-Edgecut 1 Subject-based 2 TCV-Min 3 Predicate-based 4 Horizontal 5 Hierarchical-based 6 Recursive-Bisection 7
  • 15. Total Distinct Sources Selected (Physically Distributed Environment) 0 1000 2000 3000 4000 5000 6000 7000 8000 9000 10000 BGP-Only Fully Featured BGP-Only Fully Featured BGP-only Fully Featured SWDF DBpedia Combined (600 queries) Overall (1200 queries) Totalnumberofsourcesselected Predicate-Based Subject-Based Hierarchical Horizontal TCV-Min Min-Edgecut Recursive-Bisection 15
  • 16. Spearman's Rank Correlation b/w Runtimes and Number of Sources Selected 16Positive correlation between runtimes and number of sources selected
  • 17. Overall Rank-Wise Ranking of Partitioning Techniques (1 / 2) 17
  • 18. 18 Overall Rank-Wise Ranking of Partitioning Techniques(2 / 2)
  • 19. Conclusion  We presented an evaluation of seven RDF partitioning techniques  Our overall results of query runtime suggest that TCV-Min leads to smallest query runtimes followed by Predicate-based, Horizontal, Recursive-Bisection, Subject-based, Hierarchical-based, and Min-Edgecut, respectively  Number of sources selected has a direct relation with query runtimes  Thus, partitioning techniques which minimize the total number of sources selected generally lead to better runtime performances 19
  • 20. This work was supported by grants from the EU H2020 Framework Program provided for the project HOBBIT (GA no. 688227). 20
  • 21. Questions / Comments ??? Thanks! Adnan Akhter akhter@informatik.uni-leipzig.de 21