Date: 2021-06-13
Application driven graph partitioning - SIGMOD'20

This is a presentation of the paper titled "Application Driven Graph Partitioning", published at SIGMOD 2020. It was presented at the weekly reading group of the Systopia Lab at UBC.

Link: https://dl.acm.org/doi/abs/10.1145/3318464.3389745

Abstract:
Graph partitioning is crucial to parallel computations on
large graphs. The choice of partitioning strategies has strong
impact on not only the performance of graph algorithms,
but also the design of the algorithms. For an algorithm of
our interest, what partitioning strategy fits it the best and
improves its parallel execution? Is it possible to develop
graph algorithms with partition transparency, such that the
algorithms work under different partitions without changes?
This paper aims to answer these questions. We propose an
application-driven hybrid partitioning strategy that, given a
graph algorithm A, learns a cost model for A as polynomial
regression. We develop partitioners that given the learned
cost model, refine an edge-cut or vertex-cut partition to a
hybrid partition and reduce the parallel cost of A. Moreover,
we identify a general condition under which graph-centric
algorithms are partition transparent. We show that a number
of graph algorithms can be made partition transparent. Using
real-life and synthetic graphs, we experimentally verify that
our partitioning strategy improves the performance of a
variety of graph computations, up to 22.5 times.
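
The shape of the polynomial-regression cost model mentioned in the abstract can be illustrated with a minimal sketch. It assumes the order-2 expansion shown on slide 18 of the transcript (no constant term, as in the slide's weighted sum) and the textbook definition of mean squared relative error for the MSRE named on slide 22. The function names `expand`, `predict`, and `msre` are hypothetical, the term ordering differs from the slide, and nothing here is taken from the authors' implementation.

```python
from itertools import combinations_with_replacement
from math import prod


def expand(x, p=2):
    """Return all monomials of the features in x with degree 1..p.

    Assumption: the constant term is omitted, matching the weighted sum
    on slide 18, which starts at w1*x1.
    """
    terms = []
    for degree in range(1, p + 1):
        for idx in combinations_with_replacement(range(len(x)), degree):
            terms.append(prod(x[i] for i in idx))
    return terms


def predict(weights, x, p=2):
    """Weighted sum of the expanded terms: w1*t1 + w2*t2 + ... + wn*tn."""
    terms = expand(x, p)
    if len(weights) != len(terms):
        raise ValueError("need exactly one weight per expanded term")
    return sum(w * t for w, t in zip(weights, terms))


def msre(predicted, actual):
    """Mean squared relative error (assumed definition; actual values must be nonzero)."""
    errors = [((y_hat - y) / y) ** 2 for y_hat, y in zip(predicted, actual)]
    return sum(errors) / len(errors)
```

With six features and p = 2, `expand` yields 6 linear terms plus 21 quadratic terms, so a model of this form would carry 27 weights. Training would then search for the weights that minimize `msre` over the collected samples; that step is omitted here.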

License: CC Attribution-NonCommercial License


  1. Hadi Sinaee, University of British Columbia, June 2021. Application Driven Graph Partitioning, Wenfei Fan, et al., SIGMOD’20. For Systopia’s reading group, by Hadi Sinaee, University of British Columbia, Canada, June 11th, 2021. How do we do it? What are the metrics for it?
  2. Why do we partition a graph?! [Diagram: Graph split into Sub graph (Part-1), Sub graph (Part-2), Sub graph (Part-3), ..., Sub graph (Part-K)] ● A Graph ● K Units of Computation. How do we do the partitioning? Which partitioning is considered a good one?
  3. How do we do the partitioning? Edge Partitioning (Vertex-Cut), Vertex Partitioning (Edge-Cut), Hybrid Partitioning.
  4. Which partitioning is considered a good one? 1. Lower Replication 2. Well-Balanced Sub Graphs
  5. https://twitter.com/NetflixFilm/status/1291442611962560513?s=20 Vertex Partitioning, Edge Partitioning, Hybrid Partitioning
  6. Which partitioning is considered A GOOD ONE ??! + Workload? Access to nodes? ... vs.
  7. “For an algorithm of our interest, what partitioning strategy fits it the best and improves its parallel execution?”
  8. Example - Common Neighbors (CN). Triples: (u, v1, v2), (u, v1, v3), ..., (u, vi, vj), ... How many pairs of this form exist? Pick any two nodes from u’s incoming nodes. In-degree: number of incoming nodes. C(n,2) = n*(n-1)/2. [Figure: node u with incoming nodes v1 and v2]
  9. Example - Vertex Partitioning and CN. 5 Vertices + 9 Edges (Well-Balanced); F1(10) > F2(2) (workload); F1(6) == F2(6) (workload); F1: 3 Vertices + 6 Edges; F2: 7 Vertices + 11 Edges
  10. What Happened?! Picked An Algorithm (CN). Defined A Cost Model For CN. Is this a good partitioning for CN? (No / Yes: We’re good!)
  11. What Happened?! A Graph Partition. Picked An Algorithm (A). Defined A Cost Model For CN. Is this a good partitioning for CN? (No: Repartition the graph / Yes: We’re good!)
  12. What is the process for learning the cost model? feature vector; cost function; training and testing
  13. What is the feature vector? In-degree & out-degree (Each Partition); In-degree & out-degree (Graph); Number of mirrors across all partitions. *When there are multiple copies of a node, we consider one of them as the master and the rest as mirrors!
  14. What is the process for learning the cost model? feature vector; cost function; training and testing
  15. What is the cost function?! For an algorithm and partition i: Cost Function (Partition i) = Computation Cost + Communication Cost
  16. Computation and Communication costs!
  17. Computation and Communication costs?
  18. Computation and Communication costs? Features: x1, x2, x3, x4, x5, x6. Expand (1 + x1 + x2 + x3 + x4 + x5 + x6)^p with p = 2 into the terms x1, x1*x1, x1*x2, …, x6*x6. Weights: w1, w2, w3, …, wn. Polynomial function of order p: w1*x1 + w2*(x1*x1) + … + wn*(x6*x6)
  19. Computation and Communication costs? dummy! polynomial function of order P
  20. Computation and Communication costs? *When there are multiple copies of a node, we consider one of them as the master and the rest as its mirrors!
  21. What is the process for learning the cost model? feature vector; cost function (polynomial function of order P); training and testing
  22. How do we train our models? Run algorithm A on real-world graphs and synthetic graphs. - training data set - computed costs for v. Prevent overfitting. Mean Squared Relative Error (MSRE). Weights: w1, w2, w3, …, wn
  23. What is the process for learning the cost model? feature vector; cost function (polynomial function of order P); training and testing
  24. What Happened?! A Graph Partition. Picked An Algorithm (A). Defined A Cost Model For CN. Is this a good partitioning for CN? (No: Repartition the graph / Yes: We’re good!)
  25. What Happened?! - CN. A Graph Partition. Picked An Algorithm (A). Defined A Cost Model For CN. Is this a good partitioning for CN? (No: Repartition the graph / Yes: We’re good!)
  26. How to use our trained models? From Edge-Cut to Hybrid-Cut (E2H). E2H Process. Given: 1. Edge-Cut Part. 2. Learned h and g 3. Budget B. Goal: Hybrid Part. that reduces the parallel cost of A. Balance Computation; Balance Communication. EMigrate: 1. Find candidates (using BFS), move (nodes and all their edges) from overloaded to underloaded partitions. 2. Continue until no migration is needed!
  27. How to use our trained models? From Edge-Cut to Hybrid-Cut (E2H). E2H Process. Given: 1. Edge-Cut Part. 2. Learned h and g 3. Budget B. Goal: Hybrid Part. that reduces the parallel cost of A. Balance Computation; Balance Communication. EMigrate: 1. Find candidates (using BFS), move (e-cut nodes and all their edges) from overloaded to underloaded partitions. 2. Continue until no migration is needed! What happens if we can’t migrate anymore? E.g., high-degree nodes in power-law graphs. ESplit: selects a node with a subset of its edges for the migration.
  28. How to use our trained models? From Edge-Cut to Hybrid-Cut (E2H). E2H Process. Given: 1. Edge-Cut Part. 2. Learned h and g 3. Budget B. Goal: Hybrid Part. that reduces the parallel cost of A. Balance Computation; Balance Communication. EMigrate: 1. Find candidates (using BFS), move (e-cut nodes and all their edges) from overloaded to underloaded partitions. 2. Continue until no migration is needed! ESplit: selects a node with a subset of its edges for the migration, i.e., cuts a node into multiple v-cut nodes. MAssign: assign master nodes in the border node set.
  29. Example! marked for migration; add if it’s in our budget; BFS Order
  30. Example! marked for migration; add if it’s in our budget; BFS Order. EMigrate t3. EMigrate t2: aborted since it exceeds F2 budget!
  31. Example! ESplit t2. MAssign.
  32. Parallelized E2H. Worker 1, Worker 2, ..., Worker N. *Shared Nothing Distributed State. Initial Edge-Cut With N Partitions.
  33. Parallelized E2H. Worker 1, Worker 2, ..., Worker N. *Shared Nothing Distributed State. Initial Edge-Cut With N Partitions. Worker i, ...: Underloaded Partitions. Worker N: Overloaded Partition.
  34. Parallelized E2H. Worker 1, Worker 2, ..., Worker N. *Shared Nothing Distributed State. Initial Edge-Cut With N Partitions. Worker i, ...: Underloaded Partitions. Worker N: Overloaded Partition. EMigrate: c1, c2, c3?
  35. Parallelized E2H. Worker 1, Worker 2, ..., Worker N. *Shared Nothing Distributed State. Initial Edge-Cut With N Partitions. Worker i, ...: Underloaded Partitions. Worker N: Overloaded Partition. EMigrate: c1, c2, c3; c1, c2, c3
  36. Parallelized E2H. Worker 1, Worker 2, ..., Worker N. *Shared Nothing Distributed State. Initial Edge-Cut With N Partitions. Worker i, ...: Underloaded Partitions. Worker N: Overloaded Partition. EMigrate: c1, c2, c3; c1, c2, c3
  37. Parallelized E2H. Worker 1, Worker 2, ..., Worker N. *Shared Nothing Distributed State. Initial Edge-Cut With N Partitions. Worker i, ...: Underloaded Partitions. Worker N: Overloaded Partition. EMigrate: c3
  38. Parallelized E2H. Worker 1, Worker 2, ..., Worker N. *Shared Nothing Distributed State. Initial Edge-Cut With N Partitions. Worker i, ...: Underloaded Partitions. Worker N: Overloaded Partition. EMigrate: c3; c3. The process continues until all candidates are either accepted by some worker or rejected by all of them!
  39. Experiments - Setup. - LiveJournal (4.8M Nodes, 68M Edges) - Twitter (42M Nodes, 1.5B Edges) - UK Web (106M Nodes, 3.7B Edges) - Common Neighbours (CN) - Triangle Counting (TC) - PageRank (PR) - 80K Training Samples, 20K Tests - PyTorch - NVIDIA Tesla V100 GPU - 32 machines in an HPC cluster - 12-core Xeon 2.2GHz + 128GB RAM - 10Gbps NIC - Each partition is processed by 1 worker - Each worker runs on 1 exclusive core - Each experiment = avg (repeated 5 times)
  40. Experiments - Reading! Edge-Cut Partitioners; Vertex-Cut Partitioners; Hybrid Partitioner; Post-processed by the parallel version of E2H (ParE2H); Post-processed by the parallel version of V2H (ParV2H)
  41. Experiments - Reading! Edge-Cut Partitioners; Vertex-Cut Partitioners; Hybrid Partitioners; Post-processed partitions
  42. Experiments - Application Speedup of CN. [Plot labels: better, worse; x-axis: Number of partitions]
  43. Experiments - Application Speedup of TC. [Plot labels: better, worse, vertex-cut, edge-cut based]
  44. Experiments - Application Speedup of PR. [Plot labels: better, worse, vertex-cut, edge-cut based]
  45. Experiments - Scalability. On average: - ParE2H takes ~12% of total run-time - ParV2H takes: 1. 0.1% of HNE run-time 2. ~23% of HGrid run-time. [Plot: x-axis: Number of partitions; G = Synthetic Graph; CN algorithm]
  46. Experiments - Scalability. On average: - ParE2H takes ~12% of total run-time - ParV2H takes: 1. 0.1% of HNE run-time 2. ~23% of HGrid run-time. [Plot: x-axis: Number of partitions; G = Synthetic Graph; CN algorithm]
  47. Experiments - Impact of different phases. EMigrate: CN (~68%), TC (~26%), PR (75%); ESplit: CN (1.1 times), TC (2.7 times); MAssign: CN, TC, PR ~ (20%-30%)
  48. Experiments - Efficiency
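
Slides 8 and 9 argue that a partition balanced in vertices and edges can still be unbalanced in CN workload, because a vertex u with in-degree n contributes C(n,2) = n*(n-1)/2 pairs. The sketch below computes that workload per fragment. It assumes each vertex is owned by exactly one fragment and that its whole in-degree is counted there; the function name `cn_workload`, the edge-list format, and the toy graph are hypothetical and are not the graph drawn on slide 9.

```python
from collections import Counter
from math import comb


def cn_workload(edges, fragment_of):
    """Sum C(in_degree(u), 2) over the vertices u owned by each fragment.

    edges: iterable of directed (src, dst) pairs.
    fragment_of: dict mapping every vertex to the fragment that owns it.
    """
    in_degree = Counter(dst for _, dst in edges)
    load = Counter()
    for u, frag in fragment_of.items():
        # Counter returns 0 for vertices without incoming edges.
        load[frag] += comb(in_degree[u], 2)
    return dict(load)


# Toy graph (hypothetical): u has 5 incoming edges, w has 2.
edges = [("a", "u"), ("b", "u"), ("c", "u"), ("d", "u"), ("e", "u"),
         ("a", "w"), ("b", "w")]
fragment_of = {"u": "F1", "a": "F1", "b": "F1",
               "w": "F2", "c": "F2", "d": "F2", "e": "F2"}
workload = cn_workload(edges, fragment_of)
```

Here `workload` is `{"F1": 10, "F2": 1}`: the single high in-degree vertex u dominates F1's workload even though F2 owns more vertices.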

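The EMigrate step on slides 26 to 30 (find candidates in BFS order, move a candidate only if it fits the destination's budget) can be sketched as a greedy loop. This is a simplified illustration under stated assumptions, not the authors' algorithm: the per-vertex `cost` dictionary stands in for the learned cost functions h and g, a single budget applies to both fragments, and the names `bfs_order` and `emigrate` are hypothetical.

```python
from collections import deque


def bfs_order(adj, start, members):
    """Vertices of `members` reachable from `start`, listed in BFS order."""
    seen, order, queue = {start}, [], deque([start])
    while queue:
        v = queue.popleft()
        order.append(v)
        for w in adj.get(v, ()):
            if w in members and w not in seen:
                seen.add(w)
                queue.append(w)
    return order


def emigrate(candidates, cost, src_load, dst_load, budget):
    """Greedily move candidates from an overloaded source to a destination.

    A candidate is moved only while the source is still over budget and the
    move keeps the destination within budget; otherwise it is rejected.
    """
    moved, rejected = [], []
    for v in candidates:
        if src_load <= budget:
            break  # source is no longer overloaded
        if dst_load + cost[v] <= budget:
            src_load -= cost[v]
            dst_load += cost[v]
            moved.append(v)
        else:
            rejected.append(v)
    return moved, rejected, src_load, dst_load


# Hypothetical numbers, loosely mirroring the t3 / t2 example on slide 30.
result = emigrate(["t3", "t2"], {"t3": 1, "t2": 6},
                  src_load=12, dst_load=2, budget=6)
```

In this run `result` is `(["t3"], ["t2"], 11, 3)`: t3 fits, while moving t2 would push the destination past its budget. According to slides 27 and 28, a rejected high-degree vertex like this is what ESplit then handles by migrating only a subset of its edges.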

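Slides 34 to 38 show an overloaded worker offering candidates c1, c2, c3 to underloaded workers until each candidate is accepted by some worker or rejected by all of them. The sketch below simulates that accept/reject rule sequentially in one process. It is an assumption-laden illustration: the real ParE2H runs on shared-nothing workers, and the name `offer_candidates`, the scalar costs, and the capacities are all hypothetical.

```python
def offer_candidates(candidates, cost, free_capacity):
    """Offer each candidate to the workers in turn.

    Returns (placement, rejected): placement maps accepted candidates to the
    worker that took them; rejected lists candidates no worker could take.
    """
    capacity = dict(free_capacity)  # copy so the caller's dict is untouched
    placement, rejected = {}, []
    for c in candidates:
        for worker, room in capacity.items():
            if cost[c] <= room:
                capacity[worker] = room - cost[c]
                placement[c] = worker
                break
        else:
            # No worker had enough remaining capacity for this candidate.
            rejected.append(c)
    return placement, rejected


placement, rejected = offer_candidates(
    ["c1", "c2", "c3"],
    {"c1": 2, "c2": 2, "c3": 5},
    {"w1": 3, "w2": 4},
)
```

With these numbers, c1 goes to w1, c2 goes to w2, and c3 is rejected by both workers, so `placement` is `{"c1": "w1", "c2": "w2"}` and `rejected` is `["c3"]`.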