STREAMING GRAPH PARTITIONING
An Experimental Study
Zainab Abbas¹, Vasiliki Kalavri², Paris Carbone¹, and Vladimir Vlassov¹
1. KTH Royal Institute of Technology, Stockholm, Sweden
2. ETH Zurich, Switzerland
Distributed Graph Analysis
[Figure: input graph, partitioning, computation]
Iterative Applications
[Figure labels: Loading, Streaming Partitioning]
Streaming Applications
[Figure labels: Loading, Streaming Partitioning]
Streaming Partitioning
• Ingest input as stream
• Partitioning on the fly
• Partial graph knowledge
• Single pass
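As a minimal sketch of the streaming partitioning properties listed above, the snippet below assigns each edge of a stream to a partition in one pass, looking only at the edge in hand. The names `stream_hash_partition` and `collect`, the choice of keying on the source vertex, and the use of a modulo in place of a real hash function are illustrative assumptions, not the setup used in the study.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Edge = Tuple[int, int]


def stream_hash_partition(edges: Iterable[Edge], k: int) -> Iterator[Tuple[int, Edge]]:
    """Yield (partition, edge) pairs, one edge at a time.

    Each decision uses only the current edge (no global graph knowledge),
    and the input stream is consumed exactly once.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    for src, dst in edges:
        # Assumption: modulo on the integer source id stands in for a hash.
        yield src % k, (src, dst)


def collect(assignments: Iterable[Tuple[int, Edge]]) -> Dict[int, List[Edge]]:
    """Group a stream of assignments by partition (for inspection only)."""
    parts: Dict[int, List[Edge]] = defaultdict(list)
    for partition, edge in assignments:
        parts[partition].append(edge)
    return dict(parts)
```

Because the partitioner is a generator, downstream operators can start working on the first assigned edge before the rest of the graph has arrived.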
Iterative Connected Components
[Figure: example graph on vertices 1 to 8, shown across three slides of the iterative computation]
Streaming Connected Components (union-find)
[Figure: example graph on vertices 1 to 8; edges are consumed as a stream while two component tables are updated, and a combined table appears on the last slide]

Each entry below reads "Component ID: Vertices".

Step  First table                   Second table
1     (empty)                       2: 2,3
2     4: 4,5                        2: 2,3
3     4: 4,5                        2: 2,3; 6: 6,7
4     4: 4,5; 7: 7,8                2: 2,3; 6: 6,7
5     4: 4,5; 7: 7,8                2: 2,3; 6: 6,7; 1: 1,4
6     4: 4,5; 7: 7,8; 1: 1,3        2: 2,3; 6: 6,7; 1: 1,4
7     4: 4,5; 7: 7,8; 1: 1,3        2: 2,3; 6: 6,7,8; 1: 1,4
8     4: 4,5; 7: 7,8; 1: 1,2,3      2: 2,3; 6: 6,7,8; 1: 1,4
9     4: 4,5; 7: 7,8; 1: 1,2,3      2: 2,3,5; 6: 6,7,8; 1: 1,4

Combined table (final slide):
Component ID  Vertices
1             5,4,1,2,3
6             6,7,8
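The union-find walkthrough above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the `UnionFind` class, the `summarize` helper, the rule that the smallest vertex id names a component, and the split of the nine edges between the two tables are all assumptions inferred from the component tables on the slides.

```python
from typing import Dict, Iterable, Set, Tuple


class UnionFind:
    """Disjoint sets whose representative is the smallest vertex id."""

    def __init__(self) -> None:
        self.parent: Dict[int, int] = {}

    def find(self, v: int) -> int:
        self.parent.setdefault(v, v)
        root = v
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every visited vertex directly at the root.
        while self.parent[v] != root:
            nxt = self.parent[v]
            self.parent[v] = root
            v = nxt
        return root

    def union(self, a: int, b: int) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            # Keep the smaller id as the component id.
            self.parent[max(ra, rb)] = min(ra, rb)

    def merge(self, other: "UnionFind") -> None:
        """Fold another summary into this one."""
        for v in list(other.parent):
            self.union(v, other.find(v))

    def components(self) -> Dict[int, Set[int]]:
        out: Dict[int, Set[int]] = {}
        for v in list(self.parent):
            out.setdefault(self.find(v), set()).add(v)
        return out


def summarize(edges: Iterable[Tuple[int, int]]) -> UnionFind:
    """Consume an edge stream once and return its component summary."""
    uf = UnionFind()
    for a, b in edges:
        uf.union(a, b)
    return uf


# Assumed edge split, chosen to reproduce the two tables on the slides.
part_a = summarize([(5, 4), (7, 8), (1, 3), (1, 2)])
part_b = summarize([(3, 2), (6, 7), (1, 4), (6, 8), (3, 5)])

merged = UnionFind()
merged.merge(part_a)
merged.merge(part_b)
```

Each summary only stores one parent pointer per vertex seen, so combining two summaries costs one `union` per vertex of the smaller table rather than a second pass over the edges.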
Contributions
• A unified comparison framework on Apache Flink
• Classification and experimental comparison
• Performance of partitioning algorithm
• Effect of algorithms on applications
PARTITIONING ALGORITHMS
Partitioning Algorithms
[Figure: the compared partitioning algorithms, labelled by publication venue and year: SIGOPS 2007; SIGKDD 2012; WSDM 2012; OSDI 2012; CIKM 2015; NIPS 2012; GRADES 2013 (two entries). Successive slides highlight subsets of these entries; the algorithm names did not survive extraction.]
EXPERIMENTS
Goals
Our work aims at identifying:
• 1) the benefits of using more complex partitioning methods
• 2) the partitioning overhead for an application
• 3) the effect of partitioning quality on the application performance.
Datasets and Setup
• On-premises cluster
• A virtualized environment at Amazon consisting of 17x r3.2xlarge EC2 instances
RESULTS
Preview
• Partitioning Performance
  • Throughput
• Partitioning Quality
  • Cuts and load balance
• Application Performance
Performance (throughput)
[Figure: throughput of the vertex partitioning and edge partitioning algorithms]
Hash outperforms others in terms of throughput
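A likely reason for the throughput gap is that a hash rule needs no state, while greedy methods must look up where a vertex's neighbours were placed. The sketch below contrasts the two. The `ldg_assign` function follows the commonly cited Linear Deterministic Greedy rule (neighbours already in a partition, weighted by its remaining capacity); the function names, the tie-breaking rule, and the modulo used as a hash are assumptions for illustration, not the code evaluated in the study.

```python
from typing import Dict, Iterable, List


def hash_assign(v: int, k: int) -> int:
    """Stateless rule: the result depends only on the vertex id."""
    # Assumption: modulo stands in for a hash function.
    return v % k


def ldg_assign(
    v: int,
    neighbors: Iterable[int],
    assignment: Dict[int, int],
    sizes: List[int],
    capacity: float,
) -> int:
    """Stateful greedy rule in the style of LDG.

    Scores partition i as (neighbours of v already in i) * (1 - size_i / capacity)
    and updates `assignment` and `sizes` in place.
    """
    k = len(sizes)
    counts = [0] * k
    for u in neighbors:
        p = assignment.get(u)  # state lookup that the hash rule avoids
        if p is not None:
            counts[p] += 1
    scores = [counts[i] * (1.0 - sizes[i] / capacity) for i in range(k)]
    # Assumed tie-break: least loaded partition, then lowest index.
    best = max(range(k), key=lambda i: (scores[i], -sizes[i], -i))
    assignment[v] = best
    sizes[best] += 1
    return best


# Tiny vertex stream: (vertex, neighbours known at arrival time).
stream = [(1, [2, 3]), (2, [1]), (3, [1]), (4, [3])]
assignment: Dict[int, int] = {}
sizes = [0, 0]
for vertex, nbrs in stream:
    ldg_assign(vertex, nbrs, assignment, sizes, capacity=2.0)
```

The greedy rule touches the shared `assignment` map once per neighbour, which is the per-record cost a hash partitioner does not pay.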
Partitioning Quality
• Edge-cuts
• Vertex-cuts (Replication factor)
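The two cut metrics named above can be computed as in the following sketch. It assumes the usual definitions: the edge-cut fraction is the share of edges whose endpoints land in different partitions, and the replication factor is the average number of partitions in which a vertex appears. The function names and input shapes are hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple

Edge = Tuple[int, int]


def edge_cut_fraction(edges: Iterable[Edge], vertex_part: Dict[int, int]) -> float:
    """Fraction of edges cut by a vertex partitioning."""
    edge_list = list(edges)
    if not edge_list:
        raise ValueError("need at least one edge")
    cut = sum(1 for u, v in edge_list if vertex_part[u] != vertex_part[v])
    return cut / len(edge_list)


def replication_factor(edge_part: Iterable[Tuple[Edge, int]]) -> float:
    """Average number of partitions holding a copy of each vertex."""
    replicas: Dict[int, Set[int]] = {}
    for (u, v), p in edge_part:
        replicas.setdefault(u, set()).add(p)
        replicas.setdefault(v, set()).add(p)
    if not replicas:
        raise ValueError("need at least one edge")
    return sum(len(ps) for ps in replicas.values()) / len(replicas)
```

For a four-vertex cycle split into two halves, half of the edges are cut under vertex partitioning, and two of the four vertices are replicated under the matching edge partitioning.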
Partitioning Quality
• Vertex partitioning load balance

Dataset   Hash  Fennel  LDG
Twitter   1.0   1.1     1.13
RMAT      1.0   1.1     1.5
Flickr    1.0   1.0     1.0

• Edge partitioning load balance

Dataset     Hash   DBH    Greedy  Grid  HDRF
Friendster  1.0    1.001  1.0     1.0   1.0
Twitter     1.0    1.001  1.0     1.0   1.0
Flickr      1.001  1.002  3.98    1.0   3.98

Hash provides high cuts but good balance
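The slides do not define the load balance figure in these tables. A common choice, assumed in the sketch below, is the largest partition load divided by the average load, so 1.0 means perfectly even partitions. The function name `normalized_max_load` is hypothetical.

```python
from typing import Iterable


def normalized_max_load(loads: Iterable[int]) -> float:
    """Largest partition load divided by the mean load (assumed metric)."""
    values = list(loads)
    total = sum(values)
    if not values or total == 0:
        raise ValueError("need at least one non-empty partition")
    return max(values) / (total / len(values))
```

Under this reading, a value such as 3.98 would mean the fullest partition holds close to four times its fair share.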
Iterative Application Performance
Input: Twitter graph
Streaming Application Performance
[Figure panels: Bipartiteness Check; Connected Components]
Conclusion
The trade-off between balancing and reducing cuts remains
Streaming and iterative applications behave differently
Open Questions
• Can we design partitioning algorithms with minimal state requirements
for modern stream processing engines?
• Can we adopt existing partitioning algorithms for continuous stream
processing?
• Can we design algorithms with fewer constraints or assumptions about
the input graph?
THANKS EVERYONE :)
Zainab Abbas
zainabab@kth.se
Vasiliki Kalavri
kalavriv@inf.ethz.ch
Paris Carbone
parisc@kth.se
Vladimir Vlassov
vladv@kth.se
VLDB 2018 presentation paper title: Streaming Graph Partitioning