
# Leopard: Lightweight Partitioning and Replication for Dynamic Graphs

This talk was given by Daniel Abadi at VLDB 2016


1. Leopard: Lightweight Partitioning and Replication for Dynamic Graphs (Jiewen Huang and Daniel Abadi, Yale University)
3. Social Graphs
4. Web Graphs
5. Semantic Graphs
6. Graph Partitioning (NP-hard). Many systems use hash partitioning, which results in many edges being "cut". Given a graph G and an integer k, partition the vertices into k disjoint sets such that:
   ● there are as few cuts as possible
   ● the partitions are as balanced as possible
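The toy example below (a hypothetical two-cluster graph; all names and numbers are illustrative, not from the talk) sketches why hash partitioning cuts so many edges: vertex placement ignores graph structure, so edges routinely cross partitions.

```python
def hash_partition(v, k):
    # Place a vertex by hashing its id (here simply v mod k),
    # with no regard for graph structure.
    return v % k

def edge_cut(edges, assignment):
    # An edge is "cut" when its endpoints land in different partitions.
    return sum(1 for u, v in edges if assignment[u] != assignment[v])

# Toy graph: two dense 5-vertex cliques joined by the single edge (0, 5).
clique_a = [(i, j) for i in range(5) for j in range(i + 1, 5)]
clique_b = [(i, j) for i in range(5, 10) for j in range(i + 1, 10)]
edges = clique_a + clique_b + [(0, 5)]

k = 2
assignment = {v: hash_partition(v, k) for v in range(10)}
cut = edge_cut(edges, assignment)
print(cut)  # 13 of 21 edges cut; a structure-aware split would cut only 1
```

A partitioner that kept each clique together would cut a single edge; the hash placement cuts more than half of them.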
7. State of the Art: a multilevel scheme with a coarsening phase.
8. To Make the Problem More Complicated. "The only constant is change." (Heraclitus)
   ● Social graphs: new people and friendships
   ● Semantic Web graphs: new knowledge
   ● Web graphs: new websites and links
9. Dynamic Graphs. Vertex A sits in partition 1 of two partitions; after the graph changes, is partition 1 still the better partition for A?
10. New Framework. Repartitioning the entire graph upon every change is far too expensive. Leopard:
   ● locally reassesses partitioning as a result of changes, without a full re-partitioning
   ● integrates consideration of replication with partitioning
11. Outline: Background and Motivation; Leopard Overview; Computation Skipping; Replication; Experiments
12. Algorithm Overview. For each added/deleted edge <V1, V2>:
   ● compute the best partition for V1 using a heuristic
   ● re-assign V1 if needed
   ● do the same for V2
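As a rough sketch of this per-edge loop (function names and data structures are my own, not the paper's; the scoring uses the simple heuristic presented on the next slides):

```python
def best_partition(graph, assignment, v, capacity, k):
    # Score each partition with the slide's heuristic: neighbours of v
    # already there, discounted by how full the partition is.
    sizes = [0] * k
    for p in assignment.values():
        sizes[p] += 1
    def score(p):
        nbrs = sum(1 for u in graph.get(v, ()) if assignment.get(u) == p)
        return nbrs * (1 - sizes[p] / capacity)
    return max(range(k), key=score)

def on_edge_added(graph, assignment, v1, v2, capacity, k):
    # Leopard's local reassessment: only the edge's two endpoints are
    # reconsidered; the rest of the graph is left untouched.
    graph.setdefault(v1, set()).add(v2)
    graph.setdefault(v2, set()).add(v1)
    for v in (v1, v2):
        assignment[v] = best_partition(graph, assignment, v, capacity, k)

# New vertex 4 arrives with an edge to vertex 2 (currently in partition 1).
graph = {0: {1}, 1: {0}, 2: {3}, 3: {2}}
assignment = {0: 0, 1: 0, 2: 1, 3: 1}
on_edge_added(graph, assignment, 4, 2, capacity=6, k=2)
print(assignment[4], assignment[2])  # 1 1: vertex 4 joins its neighbour
```

Note how cheap each update is: two heuristic evaluations per edge, versus a full re-partition of the graph.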
13. Example: Adding an Edge. A new edge connects vertex A (in partition 1) to vertex B (in partition 2).
14. Compute the Partition for B. Goals: (1) few cuts and (2) balance. Heuristic: # neighbours * (1 - # vertices / capacity), with capacity 6.
   ● Partition 1: 1 neighbour, 5 vertices: 1 * (1 - 5/6) = 0.17
   ● Partition 2: 3 neighbours, 3 vertices: 3 * (1 - 3/6) = 1.5 (higher score)
   This heuristic is simplified for the sake of presentation; more advanced heuristics are discussed in the paper.
15. Compute the Partition for A. Same goals and heuristic.
   ● Partition 1: 1 neighbour, 4 vertices: 1 * (1 - 4/6) = 0.33
   ● Partition 2: 2 neighbours, 4 vertices: 2 * (1 - 4/6) = 0.67 (higher score)
16. Example: Adding an Edge (outcome). (1) B stays put; (2) A moves to partition 2.
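The arithmetic on the two heuristic slides can be checked directly (the capacity of 6 is inferred from the slides' numbers):

```python
def score(neighbours, vertices, capacity):
    # Slide heuristic: favour partitions holding more of the vertex's
    # neighbours, discounted by how full each partition already is.
    return neighbours * (1 - vertices / capacity)

capacity = 6
# Placing B: partition 2 wins, so B stays put.
b_scores = (score(1, 5, capacity), score(3, 3, capacity))  # ~0.17 vs 1.5
# Placing A: partition 2 wins, so A moves there.
a_scores = (score(1, 4, capacity), score(2, 4, capacity))  # ~0.33 vs ~0.67
print(b_scores, a_scores)
```

The first factor rewards co-locating a vertex with its neighbours (fewer cuts); the second factor drives scores toward emptier partitions (balance).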
17. Outline: Background and Motivation; Leopard Overview; Computation Skipping; Replication; Experiments
18. Computation Cost. For each new edge, for both vertices involved, we must calculate the heuristic for each partition (which may involve communication to look up remote vertex locations).
19. Computation Skipping. Observation: as the number of neighbours of a vertex increases, the influence of a new neighbour decreases.
20. Computation Skipping. Basic idea: accumulate changes for a vertex; when the accumulated changes exceed a certain threshold, recompute the partition for the vertex. For example, with threshold = # accumulated changes / # neighbours = 20%:
   (1) Compute the partition when V has 10 neighbours. Then 2 new edges are added for V: 2/12 = 17% < 20%, so don't recompute.
   (2) When 1 more new edge is added for V: 3/13 = 23% > 20%, so recompute the partition for V and reset # accumulated changes to 0.
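A minimal sketch of this bookkeeping (the class and method names are illustrative, not from the paper):

```python
class SkippingVertex:
    # Track accumulated changes for one vertex and signal when its
    # partition assignment should be recomputed.
    def __init__(self, neighbours, threshold=0.20):
        self.neighbours = neighbours
        self.changes = 0
        self.threshold = threshold

    def add_edge(self):
        self.neighbours += 1
        self.changes += 1
        if self.changes / self.neighbours > self.threshold:
            self.changes = 0   # recompute the partition here, then reset
            return True        # recompute
        return False           # skip

v = SkippingVertex(neighbours=10)
results = [v.add_edge() for _ in range(3)]
print(results)  # [False, False, True]: 1/11, 2/12, then 3/13 > 20%
```

This reproduces the slide's example: the first two new edges are absorbed without recomputation, and the third triggers one.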
21. Outline: Background and Motivation; Leopard Overview; Computation Skipping; Replication; Experiments
22. Replication. Goals of replication:
   ● fault tolerance (k copies of each data point/block)
   ● further cut reduction
23. Minimum-Average Replication. It takes two parameters:
   ● minimum: provides fault tolerance
   ● average: provides further cut reduction
24. Example (min = 2, average = 2.5), distinguishing each vertex's first copy from its replicas:
   ● 2 copies: A, C, D, E, H, J, K, L
   ● 3 copies: F, I
   ● 4 copies: B, G
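The slide's average can be verified from its copy counts:

```python
# Copy counts from the example slide: 8 vertices with 2 copies,
# 2 vertices with 3 copies, and 2 vertices with 4 copies.
copies = {2: 8, 3: 2, 4: 2}
total_copies = sum(c * n for c, n in copies.items())  # 16 + 6 + 8 = 30
total_vertices = sum(copies.values())                 # 12
minimum = min(copies)                                 # 2 (fault tolerance)
average = total_copies / total_vertices               # 30 / 12 = 2.5
print(minimum, average)  # 2 2.5
```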
26. How Many Copies? With minimum = 2 and average = 3, vertex A's scores for partitions 1 through 4 are 0.1, 0.2, 0.3 and 0.4 respectively.
27. How Many Copies? The two highest scores (0.4 and 0.3) cover the minimum requirement of 2 copies; what about the remaining partitions (0.2 and 0.1)?
28. Comparing against Past Scores. Always keep the last n computed scores, ranked from high (e.g. 0.9, 0.87, 0.4, 0.3, 0.29, 0.22, ...) to low (..., 0.2, ..., 0.11, 0.1). With minimum = 2 and average = 3, the cutoff is the top (average - 1) / (k - 1) fraction of scores.
29-32. Comparing against Past Scores (animation). Cutoff: the 30th highest past score. Partitions whose scores clear the cutoff receive replicas; across the frames the vertex's copy count goes from the minimum of 2 up to 3 and then 4 as more of its scores clear the cutoff.
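A sketch of how the copy count might be derived from the remembered scores (the exact bookkeeping, indexing, and tie-breaking here are my assumptions, not the paper's):

```python
def num_copies(scores, past_scores, minimum, average):
    # First copy goes to the best-scoring partition; extra replicas go to
    # partitions whose score clears a cutoff drawn from the remembered
    # scores, chosen so replicas average out to `average` copies overall.
    k = len(scores)
    fraction = (average - 1) / (k - 1)   # share of scores earning a replica
    ranked = sorted(past_scores, reverse=True)
    cutoff = ranked[max(0, int(len(ranked) * fraction) - 1)]
    replicas = sum(1 for s in scores if s > cutoff)
    return max(minimum, 1 + replicas)

# Vertex A's partition scores from the slides, against a stand-in history
# of 100 remembered scores (evenly spread for illustration).
past = [i / 100 for i in range(100)]
copies = num_copies([0.1, 0.2, 0.3, 0.4], past, minimum=2, average=3)
print(copies)  # 2: few scores clear the cutoff, so the minimum applies
```

With higher scores more partitions clear the cutoff and the copy count grows, as in the animation; the `minimum` floor guarantees fault tolerance regardless.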
33. Outline: Background and Motivation; Leopard; Experiments
34. Experiment Setup
   ● Comparison points:
     ○ Leopard with FENNEL heuristics
     ○ One-pass FENNEL (no vertex reassignment)
     ○ METIS (static graphs)
     ○ ParMETIS (repartitioning for dynamic graphs)
     ○ Hash partitioning
   ● Graph datasets:
     ○ Types: social graphs, collaboration graphs, Web graphs, email graphs, and synthetic graphs
     ○ Size: up to 66 million vertices and 1.8 billion edges
35. Edge Cut
36. Computation Skipping
37. Effect of Replication on Edge Cut
38. Thanks! Q & A