This is a presentation of the paper titled "Application Driven Graph Partitioning" (SIGMOD 2020), given at the weekly reading group of the Systopia Lab at UBC.
Link: https://dl.acm.org/doi/abs/10.1145/3318464.3389745
Abstract:
Graph partitioning is crucial to parallel computations on
large graphs. The choice of partitioning strategies has strong
impact on not only the performance of graph algorithms,
but also the design of the algorithms. For an algorithm of
our interest, what partitioning strategy fits it the best and
improves its parallel execution? Is it possible to develop
graph algorithms with partition transparency, such that the
algorithms work under different partitions without changes?
This paper aims to answer these questions. We propose an
application-driven hybrid partitioning strategy that, given a
graph algorithm A, learns a cost model for A as polynomial
regression. We develop partitioners that given the learned
cost model, refine an edge-cut or vertex-cut partition to a
hybrid partition and reduce the parallel cost of A. Moreover,
we identify a general condition under which graph-centric
algorithms are partition transparent. We show that a number
of graph algorithms can be made partition transparent. Using
real-life and synthetic graphs, we experimentally verify that
our partitioning strategy improves the performance of a
variety of graph computations, up to 22.5 times.
1. Hadi Sinaee, University of British Columbia, June 2021
Application Driven Graph Partitioning
Wenfei Fan, et al. SIGMOD’20
For Systopia’s reading group by Hadi Sinaee
University of British Columbia, Canada
June 11th, 2021
How do we do it?
What are the metrics for it?
2. Hadi Sinaee, University of British Columbia, June 2021
Why do we partition a graph?!
[Figure: a graph is split into subgraphs Part-1, Part-2, Part-3, ..., Part-K]
● A Graph
● K Units of Computation
How do we do the partitioning?
Which partitioning is considered a good one?
3. Hadi Sinaee, University of British Columbia, June 2021
How do we do the partitioning?
Edge Partitioning (Vertex-Cut)
Vertex Partitioning (Edge-Cut)
Hybrid Partitioning
4. Hadi Sinaee, University of British Columbia, June 2021
Which partitioning is considered a good one?
1. Lower Replication
2. Well-Balanced Sub Graphs
5. Hadi Sinaee, University of British Columbia, June 2021
https://twitter.com/NetflixFilm/status/1291442611962560513?s=20
Vertex Partitioning
Edge Partitioning
Hybrid Partitioning
6. Hadi Sinaee, University of British Columbia, June 2021
Which partitioning is considered A GOOD ONE ??!
+
Workload?
Access to nodes?
...
vs.
7. Hadi Sinaee, University of British Columbia, June 2021
“For an algorithm of our interest, what
partitioning strategy fits it the best and
improves its parallel execution?”
8. Hadi Sinaee, University of British Columbia, June 2021
Example - Common Neighbors (CN)
[Figure: node u with incoming nodes v1 and v2]
Triples of the form (u, vi, vj): (u, v1, v2), (u, v1, v3), ..., (u, vi, vj)
How many pairs of this form exist?
Pick any two nodes from u’s incoming nodes.
In-degree n: the number of incoming nodes of u
C(n, 2) = n*(n-1)/2
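The counting argument on this slide can be checked with a few lines of Python. This is a minimal sketch, assuming a directed graph given as a list of (source, destination) edges; the function name `cn_pairs_per_node` and the toy edge list are hypothetical and not taken from the paper.

```python
from collections import defaultdict
from math import comb

def cn_pairs_per_node(edges):
    """Return {u: C(in_degree(u), 2)} for a list of directed (src, dst) edges."""
    in_neighbors = defaultdict(set)
    for src, dst in edges:
        in_neighbors[dst].add(src)
    # Any two distinct incoming nodes of u form one pair (vi, vj) for u.
    return {u: comb(len(srcs), 2) for u, srcs in in_neighbors.items()}

# Toy graph (illustrative only): u has three incoming nodes, w has one.
edges = [("v1", "u"), ("v2", "u"), ("v3", "u"), ("v1", "w")]
pairs = cn_pairs_per_node(edges)
print(pairs)
```

A node with in-degree 3 contributes 3 pairs, while a node with in-degree 1 contributes none, so the work grows quadratically with in-degree.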
9. Hadi Sinaee, University of British Columbia, June 2021
Example - Vertex Partitioning and CN
First partition: 5 vertices + 9 edges in each fragment (well-balanced), but CN workload F1(10) > F2(2)
Second partition: F1: 3 vertices + 6 edges, F2: 7 vertices + 11 edges, and CN workload F1(6) == F2(6)
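To see how a partition can be balanced in size and still skewed in CN workload, here is a small sketch. It assumes the CN work for a node u is C(in-degree(u), 2) and is charged to the fragment that owns u, which is a simplification. The graph, the `owner` map, and the function name are hypothetical and are not the example drawn on the slide.

```python
from collections import Counter
from math import comb

def cn_workload_per_fragment(edges, owner):
    """Sum C(in_degree(u), 2) over the nodes u owned by each fragment."""
    in_degree = Counter(dst for _, dst in edges)
    workload = Counter()
    for node, fragment in owner.items():
        workload[fragment] += comb(in_degree[node], 2)
    return dict(workload)

# Hypothetical graph: both fragments own 3 vertices, but "a" is a hub.
edges = [("b", "a"), ("c", "a"), ("d", "a"), ("e", "a"), ("a", "d"), ("b", "d")]
owner = {"a": "F1", "b": "F1", "c": "F1", "d": "F2", "e": "F2", "f": "F2"}
workload = cn_workload_per_fragment(edges, owner)
print(workload)
```

Counting vertices says the two fragments are equal, while the CN workload differs by a factor of six. That gap is what an application-specific cost model is meant to capture.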
10. Hadi Sinaee, University of British Columbia, June 2021
What Happened?!
Picked an algorithm (CN)
Defined a cost model for CN
Is this a good partitioning for CN?
- Yes: We’re good!
- No
11. Hadi Sinaee, University of British Columbia, June 2021
What Happened?!
Picked an algorithm (A)
Defined a cost model for CN
Is this a good partitioning for CN?
- Yes: We’re good!
- No: Repartition the graph
A graph partition
12. Hadi Sinaee, University of British Columbia, June 2021
What is the process for learning the cost model?
feature vector
cost function
training and testing
13. Hadi Sinaee, University of British Columbia, June 2021
What is the feature vector?
- In-degree & out-degree (each partition)
- In-degree & out-degree (graph)
- Number of mirrors across all partitions
*When there are multiple copies of a node, we consider one of them as the master and the rest as mirrors!
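The features listed on this slide can be computed directly from a partitioned edge list. The sketch below assumes each edge is stored in exactly one fragment and that a node has a copy in every fragment holding one of its edges. The names `vertex_features` and `fragment_edges` are hypothetical. The five values returned here are not claimed to be the exact variables the paper feeds to its model.

```python
def vertex_features(v, fragment_edges, fragment):
    """Features of the copy of v that lives in `fragment`.

    fragment_edges: {fragment id: list of directed (src, dst) edges}.
    Returns [local in-degree, local out-degree,
             global in-degree, global out-degree, number of mirrors].
    """
    def degrees(edges):
        in_deg = sum(1 for _, dst in edges if dst == v)
        out_deg = sum(1 for src, _ in edges if src == v)
        return in_deg, out_deg

    local_in, local_out = degrees(fragment_edges[fragment])
    all_edges = [e for edges in fragment_edges.values() for e in edges]
    global_in, global_out = degrees(all_edges)
    # One copy is the master; every other copy is a mirror.
    copies = sum(1 for edges in fragment_edges.values()
                 if any(v in e for e in edges))
    mirrors = max(copies - 1, 0)
    return [local_in, local_out, global_in, global_out, mirrors]

# Hypothetical hybrid partition: v has copies in both fragments.
fragment_edges = {
    "F1": [("a", "v"), ("b", "v"), ("v", "c")],
    "F2": [("d", "v"), ("v", "e")],
}
features_v = vertex_features("v", fragment_edges, "F1")
print(features_v)
```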
14. Hadi Sinaee, University of British Columbia, June 2021
What is the process for learning the cost model?
feature vector
cost function
training and testing
15. Hadi Sinaee, University of British Columbia, June 2021
What is the cost function?!
Cost function (algorithm, partition i) = computation cost + communication cost
16. Hadi Sinaee, University of British Columbia, June 2021
Computation and Communication costs!
17. Hadi Sinaee, University of British Columbia, June 2021
Computation and Communication costs?
18. Hadi Sinaee, University of British Columbia, June 2021
Computation and Communication costs?
Features: x1, x2, x3, x4, x5, x6
Polynomial function of order p = 2: the terms of (1 + x1 + x2 + x3 + x4 + x5 + x6)^p, i.e. x1, x1*x1, x1*x2, …, x6*x6
Weights: w1, w2, w3, …, wn
w1*x1 + w2*(x1*x1) + … + wn*(x6*x6)
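The polynomial expansion on this slide can be sketched as follows. The code enumerates every monomial of degree 1 up to the chosen order and takes a weighted sum. Two assumptions are mine: the constant term is left out, to match the weighted sum written on the slide, and the terms are ordered by degree (all linear terms first), which differs from the order drawn on the slide. The function names are hypothetical.

```python
from itertools import combinations_with_replacement
from math import prod

def polynomial_terms(x, order=2):
    """All monomials of degree 1..order over the features x (no constant)."""
    terms = []
    for degree in range(1, order + 1):
        for combo in combinations_with_replacement(range(len(x)), degree):
            terms.append(prod(x[i] for i in combo))
    return terms

def polynomial_cost(x, weights, order=2):
    """Weighted sum of the monomials: w1*t1 + w2*t2 + ... + wn*tn."""
    terms = polynomial_terms(x, order)
    if len(weights) != len(terms):
        raise ValueError(f"expected {len(terms)} weights, got {len(weights)}")
    return sum(w * t for w, t in zip(weights, terms))

x = [1, 2, 3, 4, 5, 6]          # illustrative feature values
terms = polynomial_terms(x)      # 6 linear terms + 21 quadratic terms
cost = polynomial_cost(x, [1] * len(terms))
print(len(terms), cost)
```

With six features and order 2 there are 27 weights to learn, so the model stays small enough to evaluate for every node during partitioning.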
19. Hadi Sinaee, University of British Columbia, June 2021
Computation and Communication costs?
dummy!
polynomial function of order P
20. Hadi Sinaee, University of British Columbia, June 2021
Computation and Communication costs?
*When there are multiple copies of a node, we consider one of them as the master and the rest as its mirrors!
21. Hadi Sinaee, University of British Columbia, June 2021
What is the process for learning the cost model?
feature vector
cost function
training and testing
polynomial function of order P
22. Hadi Sinaee, University of British Columbia, June 2021
How do we train our models?
- Training data set
- Computed costs for v
Run algorithm A on real-world graphs and synthetic graphs.
Prevent overfitting
Mean Squared Relative Error (MSRE)
Weights: w1, w2, w3, …, wn
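A minimal sketch of fitting weights under a mean squared relative error loss is shown below. It uses plain-Python gradient descent rather than the PyTorch setup mentioned in the experiments, and an L2 penalty as one possible way to prevent overfitting. The penalty, the learning rate, and the toy data are all assumptions of this sketch. Targets must be non-zero because the error is divided by the measured cost.

```python
def predict(weights, features):
    return sum(w * x for w, x in zip(weights, features))

def msre(weights, rows, targets):
    """Mean of ((prediction - target) / target)^2 over all samples."""
    errors = [((predict(weights, x) - y) / y) ** 2 for x, y in zip(rows, targets)]
    return sum(errors) / len(errors)

def fit(rows, targets, lr=1.0, steps=60, l2=0.0):
    """Gradient descent on MSRE + l2 * sum(w^2), starting from zero weights."""
    n, dim = len(rows), len(rows[0])
    weights = [0.0] * dim
    for _ in range(steps):
        grad = [2.0 * l2 * w for w in weights]
        for x, y in zip(rows, targets):
            scale = 2.0 * (predict(weights, x) - y) / (y * y) / n
            for j in range(dim):
                grad[j] += scale * x[j]
        weights = [w - lr * g for w, g in zip(weights, grad)]
    return weights

# Toy data (illustrative): the measured cost is exactly 2 * feature.
rows = [[1.0], [2.0], [4.0]]
targets = [2.0, 4.0, 8.0]
fitted = fit(rows, targets)
shrunk = fit(rows, targets, l2=0.5)
print(fitted, msre(fitted, rows, targets), shrunk)
```

The relative error treats a 10% miss on a cheap node the same as a 10% miss on an expensive one, which matters when node costs span several orders of magnitude.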
23. Hadi Sinaee, University of British Columbia, June 2021
What is the process for learning the cost model?
feature vector
cost function
training and testing
polynomial function of order P
24. Hadi Sinaee, University of British Columbia, June 2021
What Happened?!
Picked an algorithm (A)
Defined a cost model for CN
Is this a good partitioning for CN?
- Yes: We’re good!
- No: Repartition the graph
A graph partition
25. Hadi Sinaee, University of British Columbia, June 2021
What Happened?! - CN
Picked an algorithm (A)
Defined a cost model for CN
Is this a good partitioning for CN?
- Yes: We’re good!
- No: Repartition the graph
A graph partition
26. Hadi Sinaee, University of British Columbia, June 2021
How to use our trained models?
From Edge-Cut to Hybrid-Cut (E2H)
E2H Process
Given:
1. Edge-cut partition
2. Learned h and g
3. Budget B
Goal:
A hybrid partition that reduces the parallel cost of A
Balance computation
Balance communication
EMigrate:
1. Find candidates (using BFS) and move them (nodes and all their edges) from overloaded to underloaded partitions
2. Continue until no migration is needed!
27. Hadi Sinaee, University of British Columbia, June 2021
How to use our trained models?
From Edge-Cut to Hybrid-Cut (E2H)
E2H Process
Given:
1. Edge-cut partition
2. Learned h and g
3. Budget B
Goal:
A hybrid partition that reduces the parallel cost of A
Balance computation
Balance communication
EMigrate:
1. Find candidates (using BFS) and move them (e-cut nodes and all their edges) from overloaded to underloaded partitions
2. Continue until no migration is needed!
ESplit:
What happens if we can’t migrate anymore (e.g. high-degree nodes in power-law graphs)?
Select a node with a subset of its edges for the migration
28. Hadi Sinaee, University of British Columbia, June 2021
How to use our trained models?
From Edge-Cut to Hybrid-Cut (E2H)
E2H Process
Given:
1. Edge-cut partition
2. Learned h and g
3. Budget B
Goal:
A hybrid partition that reduces the parallel cost of A
Balance computation
Balance communication
EMigrate:
1. Find candidates (using BFS) and move them (e-cut nodes and all their edges) from overloaded to underloaded partitions
2. Continue until no migration is needed!
ESplit:
Select a node with a subset of its edges for the migration, i.e. cut a node into multiple v-cut nodes
MAssign:
Assign master nodes in the border node set
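The EMigrate step can be illustrated with a deliberately simplified sketch. It assumes a fixed per-node cost (standing in for the learned model), treats the budget B as a cap on each fragment's total cost, visits the nodes of an overloaded fragment in BFS order, and moves a node to the least-loaded other fragment only when the move fits the budget. ESplit and MAssign are not modelled, and all names and numbers are hypothetical rather than the paper's algorithm.

```python
from collections import deque

def bfs_order(adjacency, nodes):
    """BFS order over `nodes` only, restarting for each connected component."""
    allowed, seen, order = set(nodes), set(), []
    for start in sorted(allowed):
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        while queue:
            node = queue.popleft()
            order.append(node)
            for nxt in sorted(adjacency.get(node, ())):
                if nxt in allowed and nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
    return order

def emigrate(adjacency, owner, cost, budget):
    """Move whole nodes out of fragments whose load exceeds `budget`."""
    owner = dict(owner)
    load = {}
    for node, fragment in owner.items():
        load[fragment] = load.get(fragment, 0) + cost[node]
    for src in sorted(load):
        if load[src] <= budget:
            continue
        members = [n for n, f in owner.items() if f == src]
        for node in bfs_order(adjacency, members):
            others = [f for f in load if f != src]
            if load[src] <= budget or not others:
                break
            dst = min(others, key=lambda f: (load[f], f))
            if load[dst] + cost[node] > budget:
                continue  # aborted: the move would exceed the destination budget
            owner[node] = dst
            load[src] -= cost[node]
            load[dst] += cost[node]
    return owner, load

# Hypothetical path graph a-b-c-d-e with an overloaded fragment F1.
adjacency = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c", "e"], "e": ["d"]}
owner = {"a": "F1", "b": "F1", "c": "F1", "d": "F1", "e": "F2"}
cost = {"a": 6, "b": 3, "c": 2, "d": 1, "e": 1}
new_owner, new_load = emigrate(adjacency, owner, cost, budget=6)
print(new_owner, new_load)
```

In this toy run F1 is still one unit over budget at the end, because moving the expensive node "a" whole is never affordable. That is the kind of situation where splitting a node's edges, as ESplit does, becomes necessary.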
29. Hadi Sinaee, University of British Columbia, June 2021
Example!
marked for migration
add if it’s in our budget
BFS Order
30. Hadi Sinaee, University of British Columbia, June 2021
Example!
marked for migration
add if it’s in our budget
BFS Order
EMigrate
t3
EMigrate
t2
aborted since it exceeds F2 budget!
32. Hadi Sinaee, University of British Columbia, June 2021
Parallelized E2H
Workers: Worker 1, Worker 2, ..., Worker N (*shared-nothing distributed state)
Input: initial edge-cut with N partitions
33. Hadi Sinaee, University of British Columbia, June 2021
Parallelized E2H
Workers: Worker 1, Worker 2, ..., Worker N (*shared-nothing distributed state)
Input: initial edge-cut with N partitions
Underloaded partitions: Worker i, ...
Overloaded partition: Worker N
34. Hadi Sinaee, University of British Columbia, June 2021
Parallelized E2H
Workers: Worker 1, Worker 2, ..., Worker N (*shared-nothing distributed state)
Input: initial edge-cut with N partitions
Underloaded partitions: Worker i, ...
Overloaded partition: Worker N
EMigrate: candidates c1, c2, c3 -> ?
35. Hadi Sinaee, University of British Columbia, June 2021
Parallelized E2H
Workers: Worker 1, Worker 2, ..., Worker N (*shared-nothing distributed state)
Input: initial edge-cut with N partitions
Underloaded partitions: Worker i, ...
Overloaded partition: Worker N
EMigrate: candidates c1, c2, c3 are offered to the other workers
37. Hadi Sinaee, University of British Columbia, June 2021
Parallelized E2H
Workers: Worker 1, Worker 2, ..., Worker N (*shared-nothing distributed state)
Input: initial edge-cut with N partitions
Underloaded partitions: Worker i, ...
Overloaded partition: Worker N
EMigrate: candidate c3 is left
38. Hadi Sinaee, University of British Columbia, June 2021
Parallelized E2H
Workers: Worker 1, Worker 2, ..., Worker N (*shared-nothing distributed state)
Input: initial edge-cut with N partitions
Underloaded partitions: Worker i, ...
Overloaded partition: Worker N
EMigrate: candidate c3 is offered to the other workers
The process continues until all candidates are either accepted by some workers or rejected by all of them!
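The accept-or-reject loop described on this slide can be simulated in a single process. This sketch assumes each candidate carries a cost and each underloaded worker has a remaining budget. A candidate is accepted by the first worker that can fit it, trying the worker with the most spare room first, and is rejected once every worker has declined. The names, the ordering rule, and the numbers are hypothetical and do not model real message passing.

```python
def offer_candidates(candidates, spare):
    """Offer each candidate to the workers until one accepts or all reject.

    candidates: {candidate name: cost}
    spare:      {worker name: remaining budget}
    Returns (accepted {candidate: worker}, rejected [candidates]).
    """
    spare = dict(spare)  # do not mutate the caller's budgets
    accepted, rejected = {}, []
    for name, cost in candidates.items():
        # Ask the worker with the most spare budget first.
        for worker in sorted(spare, key=lambda w: (-spare[w], w)):
            if cost <= spare[worker]:
                accepted[name] = worker
                spare[worker] -= cost
                break
        else:
            rejected.append(name)  # every worker declined this candidate
    return accepted, rejected

# Illustrative numbers: c3 is too large for any underloaded worker.
candidates = {"c1": 3, "c2": 2, "c3": 5}
spare = {"W1": 4, "W2": 2}
accepted, rejected = offer_candidates(candidates, spare)
print(accepted, rejected)
```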
39. Hadi Sinaee, University of British Columbia, June 2021
Experiments - Setup
- LiveJournal (4.8M nodes, 68M edges)
- Twitter (42M nodes, 1.5B edges)
- UK Web (106M nodes, 3.7B edges)
- Common Neighbors (CN)
- Triangle Counting (TC)
- PageRank (PR)
- 80K training samples, 20K test samples
- PyTorch
- NVIDIA Tesla V100 GPU
- 32 machines in an HPC cluster
- 12-core Xeon 2.2GHz + 128GB RAM
- 10Gbps NIC
- Each partition is processed by 1 worker
- Each worker runs on 1 exclusive core
- Each experiment is repeated 5 times and averaged
40. Hadi Sinaee, University of British Columbia, June 2021
Experiments - Reading!
Edge-cut partitioners, vertex-cut partitioners, hybrid partitioner
Post-processed by the parallel version of E2H (ParE2H)
Post-processed by the parallel version of V2H (ParV2H)
41. Hadi Sinaee, University of British Columbia, June 2021
Experiments - Reading!
Edge-Cut Partitioners Vertex-Cut Partitioners Hybrid Partitioners
Post processed partitions
42. Hadi Sinaee, University of British Columbia, June 2021
Experiments - Application Speedup of CN
[Plot: x-axis is the number of partitions; annotations mark the better and worse directions]
43. Hadi Sinaee, University of British Columbia, June 2021
Experiments - Application Speedup of TC
[Plot: vertex-cut and edge-cut based partitioners; annotations mark the better and worse directions]
44. Hadi Sinaee, University of British Columbia, June 2021
Experiments - Application Speedup of PR
[Plot: vertex-cut and edge-cut based partitioners; annotations mark the better and worse directions]
45. Hadi Sinaee, University of British Columbia, June 2021
Experiments - Scalability
[Plot: CN algorithm, G = synthetic graph, x-axis is the number of partitions]
On average:
- ParE2H takes ~12% of the total run-time
- ParV2H takes:
1. 0.1% of the HNE run-time
2. ~23% of the HGrid run-time
47. Hadi Sinaee, University of British Columbia, June 2021
Experiments - Impact of different phases
EMigrate: CN(~68%), TC(~26%), PR(75%)
ESplit: CN(1.1 times), TC(2.7 times)
MAssign: CN, TC, PR ~ (20%-30%)