
Leopard: Lightweight Partitioning and Replication for Dynamic Graphs


This talk was given by Daniel Abadi at VLDB 2016



  1. Leopard: Lightweight Partitioning and Replication for Dynamic Graphs. Jiewen Huang and Daniel Abadi, Yale University.
  2. Facebook Social Graph
  3. Social Graphs
  4. Web Graphs
  5. Semantic Graphs
  6. Graph Partitioning. Given a graph G and an integer k, partition the vertices into k disjoint sets such that (1) as few edges as possible are cut and (2) the sets are as balanced as possible. This problem is NP-hard. Many systems use hash partitioning instead, which results in many edges being "cut".
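To make this concrete, here is a minimal Python sketch (mine, not from the talk): hash partitioning assigns vertices by hashing their ids and ignores the edge structure entirely, so neighbouring vertices frequently land in different partitions. The helper names `hash_partition` and `count_cut_edges` are hypothetical.

```python
def hash_partition(vertex, k):
    """Assign a vertex to one of k partitions by hashing its id."""
    return hash(vertex) % k

def count_cut_edges(edges, k):
    """Count edges whose endpoints fall in different partitions."""
    return sum(1 for u, v in edges
               if hash_partition(u, k) != hash_partition(v, k))

# A tiny 6-cycle: even though the graph is tightly connected, a random
# hash over k = 3 partitions cuts most of its edges in expectation.
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]
print(count_cut_edges(edges, k=3))
```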
  7. State of the Art. Multilevel scheme: coarsening phase.
  8. To Make the Problem More Complicated. "The only constant is change." (Heraclitus) Social graphs: new people and friendships. Semantic Web graphs: new knowledge. Web graphs: new websites and links.
  9. Dynamic Graphs. [Figure: vertex A with edges into Partition 1 and Partition 2.] Is partition 1 still the better partition for A?
  10. New Framework. Repartitioning the entire graph upon every change is far too expensive. Leopard instead: (1) locally reassesses the partitioning as changes arrive, without a full repartitioning; (2) integrates consideration of replication with partitioning.
  11. Outline: Background and Motivation; Leopard Overview; Computation Skipping; Replication; Experiments.
  12. Algorithm Overview. For each added/deleted edge <V1, V2>: compute the best partition for V1 using a heuristic and reassign V1 if needed; then do the same for V2.
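Below is a hedged, self-contained sketch of this per-edge loop. The data structures, the two-partition setup, and the `score` function (the simplified heuristic from slide 14) are illustrative assumptions, not the paper's implementation.

```python
from collections import defaultdict

K, CAPACITY = 2, 6                   # assumed toy parameters
partition_of = {}                    # vertex -> partition id
neighbors = defaultdict(set)         # adjacency list

def score(v, p):
    """Simplified heuristic from slide 14: neighbours in p, discounted by fullness."""
    nbrs = sum(1 for u in neighbors[v] if partition_of.get(u) == p)
    size = list(partition_of.values()).count(p)
    return nbrs * (1 - size / CAPACITY)

def on_edge(v1, v2, added=True):
    """Locally reassess both endpoints of an added/deleted edge."""
    if added:
        neighbors[v1].add(v2); neighbors[v2].add(v1)
    else:
        neighbors[v1].discard(v2); neighbors[v2].discard(v1)
    for v in (v1, v2):
        partition_of.setdefault(v, 0)        # naive initial placement
        best = max(range(K), key=lambda p: score(v, p))
        if best != partition_of[v]:
            partition_of[v] = best           # move v; no full repartitioning
```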
  13. Example: Adding an Edge. [Figure: a new edge between A in Partition 1 and B in Partition 2.]
  14. Compute the Partition for B. Goals: (1) few cuts and (2) balance. Heuristic: # neighbours * (1 - # vertices / capacity). Partition 1 (1 neighbour, 5 vertices): 1 * (1 - 5/6) = 0.17. Partition 2 (3 neighbours, 3 vertices): 3 * (1 - 3/6) = 1.5, the higher score. (This heuristic is kept simple for the sake of presentation; more advanced heuristics are discussed in the paper.)
  15. Compute the Partition for A. Partition 1 (1 neighbour, 4 vertices): 1 * (1 - 4/6) = 0.33. Partition 2 (2 neighbours, 4 vertices): 2 * (1 - 4/6) = 0.66, the higher score.
  16. Example: Adding an Edge. Result: (1) B stays put; (2) A moves to partition 2.
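The arithmetic on slides 14-16 can be checked directly. This snippet is an illustration using the simplified heuristic exactly as shown on the slides:

```python
CAPACITY = 6   # implied by the 1 - #vertices/6 terms on the slides

def score(num_neighbors, num_vertices):
    """score = # neighbours * (1 - # vertices / capacity)"""
    return num_neighbors * (1 - num_vertices / CAPACITY)

# Slide 14: B has 1 neighbour in partition 1 (5 vertices) and 3 in
# partition 2 (3 vertices), so B stays in partition 2.
print(round(score(1, 5), 2), round(score(3, 3), 2))   # 0.17 1.5

# Slide 15: A has 1 neighbour in partition 1 (4 vertices) and 2 in
# partition 2 (4 vertices), so A moves to partition 2.
print(round(score(1, 4), 2), round(score(2, 4), 2))   # 0.33 0.67 (slide truncates to 0.66)
```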
  17. Outline: Background and Motivation; Leopard Overview; Computation Skipping; Replication; Experiments.
  18. Computation Cost. For each new edge, for both vertices involved in the edge, the heuristic must be calculated for each partition (which may involve communication for remote vertex-location lookups).
  19. Computation Skipping. Observation: as the number of neighbors of a vertex increases, the influence of each new neighbor decreases.
  20. Computation Skipping. Basic idea: accumulate changes for a vertex and recompute its partition only when the ratio of accumulated changes to neighbors exceeds a threshold, e.g. 20%. (1) Compute the partition when V has 10 neighbors; after 2 new edges, 2/12 = 17% < 20%, so don't recompute. (2) After 1 more new edge, 3/13 = 23% > 20%, so recompute the partition for V and reset the accumulated-change count to 0.
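Here is a small self-contained sketch of this skipping rule with the slide's 20% threshold; the class and method names are hypothetical:

```python
THRESHOLD = 0.20   # recompute when accumulated changes exceed 20% of neighbors

class SkippingCounter:
    def __init__(self, num_neighbors):
        self.num_neighbors = num_neighbors   # neighbor count
        self.accumulated = 0                 # changes since last recompute

    def should_recompute(self):
        """Called once per new edge touching this vertex."""
        self.accumulated += 1
        self.num_neighbors += 1
        if self.accumulated / self.num_neighbors > THRESHOLD:
            self.accumulated = 0             # reset, as on the slide
            return True
        return False

# The slide's example: start at 10 neighbors; edges 11 and 12 are
# skipped (2/12 = 17%), edge 13 triggers a recompute (3/13 = 23%).
c = SkippingCounter(10)
print([c.should_recompute() for _ in range(3)])   # [False, False, True]
```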
  21. Outline: Background and Motivation; Leopard Overview; Computation Skipping; Replication; Experiments.
  22. Replication. Goals of replication: fault tolerance (k copies of each data point/block) and further cut reduction.
  23. Minimum-Average Replication. It takes two parameters: minimum (for fault tolerance) and average (for cut reduction).
  24. Example. [Figure: a replicated graph; the legend distinguishes first copies from replicas.] Copies per vertex: 2 copies: A, C, D, E, H, J, K, L; 3 copies: F, I; 4 copies: B, G. min = 2, average = 2.5.
  25. Example (continued). [Figure: the same copy counts as the previous slide.]
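As a quick sanity check of the example's average (a one-off calculation, not from the slides): eight vertices with 2 copies, two with 3, and two with 4 give (8*2 + 2*3 + 2*4) / 12 = 2.5.

```python
# A,C,D,E,H,J,K,L have 2 copies; F,I have 3; B,G have 4.
copies = [2] * 8 + [3] * 2 + [4] * 2
print(min(copies), sum(copies) / len(copies))   # 2 2.5
```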
  26. How Many Copies? [Figure: scores of each partition for vertex A: Partition 1 = 0.1, Partition 2 = 0.2, Partition 3 = 0.3, Partition 4 = 0.4.] minimum = 2, average = 3.
  27. How Many Copies? [Same figure.] The top-scoring partitions cover the minimum requirement; what about the remaining ones?
  28. Comparing against Past Scores. Always keep the last n computed scores. [Figure: past scores sorted from high (0.9, 0.87, 0.4, 0.3, 0.29, 0.22, ...) to low (..., 0.2, 0.11, 0.1).] minimum = 2, average = 3. Cutoff: the top (avg - 1)/(k - 1) percent of scores.
  29. Comparing against Past Scores. [Figure: the cutoff is the 30th highest past score, falling between the 30th and 31st entries.] minimum = 2, average = 3. This vertex gets 2 copies.
  30. Comparing against Past Scores. [Same setting.] This vertex gets 2 copies.
  31. Comparing against Past Scores. [Same setting.] This vertex gets 3 copies.
  32. Comparing against Past Scores. [Same setting.] This vertex gets 4 copies.
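Slides 28-32 suggest the following decision rule; this sketch is my reading of it, with the score history, tie handling, and helper names all assumptions: keep a window of past scores, take the 30th highest as the cutoff, always grant the `minimum` copies, and add one copy for each further partition whose score beats the cutoff.

```python
import heapq

MINIMUM = 2        # fault-tolerance floor from the slides
CUTOFF_RANK = 30   # the slides use the 30th highest past score

def num_copies(scores, past_scores):
    """scores: this vertex's heuristic score for each candidate partition."""
    cutoff = heapq.nlargest(CUTOFF_RANK, past_scores)[-1]
    ranked = sorted(scores, reverse=True)
    # Extra copies beyond the minimum, for partitions beating the cutoff.
    extra = sum(1 for s in ranked[MINIMUM:] if s > cutoff)
    return MINIMUM + extra

# Hypothetical history: scores 0.00 .. 0.99, so the cutoff is 0.70.
past = [0.01 * i for i in range(100)]
print(num_copies([0.9, 0.87, 0.8, 0.1], past))   # 3: one score beats the cutoff
```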
  33. Outline: Background and Motivation; Leopard; Experiments.
  34. Experiment Setup. Comparison points: Leopard with FENNEL heuristics; one-pass FENNEL (no vertex reassignment); METIS (static graphs); ParMETIS (repartitioning for dynamic graphs); hash partitioning. Graph datasets: social graphs, collaboration graphs, Web graphs, email graphs, and synthetic graphs, with sizes up to 66 million vertices and 1.8 billion edges.
  35. Edge Cut. [Results figure.]
  36. Computation Skipping. [Results figure.]
  37. Effect of Replication on Edge Cut. [Results figure.]
  38. Thanks! Q & A
