GraphLab under the hood

Transcript of "GraphLab under the hood"

  1. GraphLab under the hood. Zuhair Khayyat, 12/10/12
  2. GraphLab overview: GraphLab 1.0
     ● GraphLab: A New Framework for Parallel Machine Learning
        – High-level abstractions for machine learning problems
        – Shared-memory multiprocessor
        – Assumes no fault tolerance is needed
        – Concurrent-access processing models with sequential-consistency guarantees
  3. GraphLab overview: GraphLab 1.0
     ● How GraphLab 1.0 works:
        – Represents the user's data as a directed graph
        – Each block of data is represented by a vertex and a directed edge
        – Shared data table
        – User functions (a sketch of their contract follows this slide):
           ● Update: modifies the vertex and edge state; read-only access to the shared table
           ● Fold: sequential aggregation into a key entry in the shared table; can modify vertex data
           ● Merge: parallelizes the Fold function
           ● Apply: finalizes the key entry in the shared table
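To make the contract above concrete, here is a minimal Python sketch of the four user functions. It is illustrative only (GraphLab itself is a C++ library); the PageRank-style update rule, the dictionary-based vertex/edge fields, and the SharedTable class are assumptions made for the example.

```python
# Illustrative sketch of the GraphLab 1.0 user functions (not the real C++ API).

class SharedTable:
    """Global key/value table: filled by Fold/Merge, finalized by Apply."""
    def __init__(self):
        self.entries = {}

def update(vertex, in_edges, out_edges, shared):
    """Update: may modify the vertex and its edges; the shared table is read-only here."""
    incoming = sum(e["weight"] * e["src_value"] for e in in_edges)
    vertex["value"] = 0.15 + 0.85 * incoming      # assumed PageRank-style rule

def fold(key, accumulator, vertex):
    """Fold: sequentially aggregates vertex data into one shared-table key."""
    return accumulator + vertex["value"]

def merge(acc_a, acc_b):
    """Merge: combines two partial Fold results so the fold can run in parallel."""
    return acc_a + acc_b

def apply(key, accumulator, shared):
    """Apply: finalizes the key entry in the shared table."""
    shared.entries[key] = accumulator
```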
  4. GraphLab overview: GraphLab 1.0 [figure]
  5. GraphLab overview: Distributed GraphLab 1.0
     ● Distributed GraphLab: A Framework for Machine Learning and Data Mining in the Cloud
        – Fault tolerance using a snapshot algorithm
        – Improved distributed parallel processing
        – Two-stage partitioning (sketched below):
           ● Atoms generated by ParMETIS
           ● Ghosts generated by the intersection of the atoms
        – Finalize() function for vertex synchronization
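As a rough illustration of what the ghost stage of the two-stage partitioning produces, here is a minimal Python sketch: once the atoms (partitions) are fixed, every cut edge causes each endpoint to be replicated as a read-only ghost on the other atom. The build_ghosts helper and its data layout are assumptions for the example, not Distributed GraphLab's actual code.

```python
def build_ghosts(atoms, edges):
    """atoms: {atom_id: set of vertex ids}; edges: iterable of (u, v) pairs.
    Every edge that crosses two atoms causes each endpoint to be replicated
    as a read-only 'ghost' on the other atom; ghosts are what get synchronized."""
    owner = {v: a for a, vertices in atoms.items() for v in vertices}
    ghosts = {a: set() for a in atoms}
    for u, v in edges:
        if owner[u] != owner[v]:
            ghosts[owner[u]].add(v)   # v is ghosted on u's atom
            ghosts[owner[v]].add(u)   # u is ghosted on v's atom
    return ghosts

# Two atoms, one cut edge (2, 3): vertex 3 is ghosted on atom 0, vertex 2 on atom 1.
print(build_ghosts({0: {1, 2}, 1: {3, 4}}, [(1, 2), (2, 3), (3, 4)]))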
  6. GraphLab overview: Distributed GraphLab 1.0 [figure]
  7. GraphLab overview: Distributed GraphLab 1.0 [figure: Worker 1 and Worker 2 with ghost vertices]
  8. PowerGraph: Introduction
     ● GraphLab 2.1
     ● Problems of highly skewed power-law graphs:
        – Workload imbalance, leading to performance degradation
        – Limited scalability
        – Hard to partition if the graph is too large
        – Storage overhead
        – Non-parallel computation
  9. PowerGraph: New Abstraction
     ● Original functions:
        – Update
        – Finalize
        – Fold
        – Merge
        – Apply: the synchronization apply
     ● Introduces the GAS model (sketched below):
        – Gather: over in, out, or all neighbors
        – Apply: the GAS-model apply
        – Scatter
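A hedged sketch of what a vertex program looks like under the GAS model, using PageRank as the running example; the Python function names and signatures are illustrative assumptions, not PowerGraph's C++ vertex-program interface.

```python
# PageRank expressed in the Gather-Apply-Scatter pattern (illustrative only).

def gather(vertex, edge, neighbor):
    """Gather runs once per in-edge, possibly on different machines."""
    return neighbor["rank"] / neighbor["out_degree"]

def gather_sum(a, b):
    """Partial gather results are combined with a commutative, associative sum."""
    return a + b

def apply(vertex, gathered):
    """Apply runs once, on the master replica, with the combined gather result."""
    vertex["old_rank"] = vertex["rank"]
    vertex["rank"] = 0.15 + 0.85 * gathered

def scatter(vertex, edge, neighbor):
    """Scatter runs once per out-edge and may reactivate the neighbor."""
    if abs(vertex["rank"] - vertex["old_rank"]) > 1e-4:
        return "signal"   # schedule the neighbor for another round
    return None
```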
  10. PowerGraph: Gather [figure: Worker 1 and Worker 2]
  11. PowerGraph: Apply [figure: Worker 1 and Worker 2]
  12. PowerGraph: Scatter [figure: Worker 1 and Worker 2]
  13. PowerGraph: Vertex Cut [figure: example edge list placed across workers]
  14. PowerGraph: Vertex Cut [figure: vertex-cut placement of the example edges]
  15. PowerGraph: Vertex Cut (Greedy) [figure: greedy vertex-cut placement of the example edges]
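A minimal Python sketch of a greedy vertex-cut placement in the spirit of these slides: each edge is assigned to a worker that already holds a replica of one of its endpoints, preferring workers in the intersection and breaking ties by load, so vertices end up replicated wherever their edges land. The exact tie-breaking and data structures are assumptions, not PowerGraph's precise heuristic.

```python
def greedy_vertex_cut(edges, num_workers):
    """Assign whole edges to workers; vertices are replicated wherever their edges land."""
    replicas = {}                        # vertex -> set of workers holding a replica
    load = [0] * num_workers             # edges placed per worker
    placement = {}
    for u, v in edges:
        ru = replicas.get(u, set())
        rv = replicas.get(v, set())
        candidates = (ru & rv) or (ru | rv) or set(range(num_workers))
        w = min(candidates, key=lambda i: load[i])   # least-loaded candidate
        placement[(u, v)] = w
        replicas.setdefault(u, set()).add(w)
        replicas.setdefault(v, set()).add(w)
        load[w] += 1
    return placement, replicas

placement, replicas = greedy_vertex_cut(
    [("A", "B"), ("C", "D"), ("A", "C"), ("B", "D"), ("A", "D")], num_workers=2)
print(placement)   # which worker owns each edge
print(replicas)    # which vertices were replicated (i.e. cut)
```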
  16. PowerGraph: Experiment [figure]
  17. PowerGraph: Experiment [figure]
  18. PowerGraph: Discussion
     ● Isn't it similar to the Pregel model?
        – Partially process the vertex if a message exists
     ● Gather, Apply, and Scatter are commutative and associative operations. What if the computation is not commutative?
        – Sum up the message values in a specific order to get the same floating-point rounding error (see the example below)
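The floating-point remark in the last bullet comes from the fact that double addition is not associative, so an unordered gather can change the low-order bits from run to run; fixing the reduction order makes the rounding error reproducible.

```python
# Floating-point addition is not associative, so the order of the gather sum matters.
a, b, c = 1e16, 1.0, 1.0
print((a + b) + c)   # 1e16                  -- each 1.0 is absorbed separately
print(a + (b + c))   # 1.0000000000000002e16 -- a different rounding error
# Always reducing in the same order (e.g. by source vertex id) makes the result,
# including its rounding error, identical across runs.
```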
  19. PowerGraph and Mizan
     ● In Mizan we use partial replication
     [figure: vertices replicated across workers W0 and W1 in the compute phase and the communication phase]
  20. GraphChi: Introduction
     ● Asynchronous, disk-based version of GraphLab
     ● Utilizes parallel sliding windows:
        – Very small number of non-sequential disk accesses
     ● Support for graph updates:
        – Based on Kineograph, a distributed system for processing a continuous in-flow of graph updates while simultaneously running advanced graph-mining algorithms
  21. GraphChi: Graph Constraints
     ● The graph does not fit in memory
     ● A vertex, its edges, and their values fit in memory
  22. GraphChi: Disk Storage
     ● Compressed sparse row (CSR):
        – Compressed adjacency list with indexes of the edges
        – Fast access to a vertex's out-edges (sketched below)
     ● Compressed sparse column (CSC):
        – CSR of the transpose graph
        – Fast access to a vertex's in-edges
     ● Shard: stores the edge data
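A small, self-contained Python sketch of the CSR layout mentioned above (CSC is the same structure built on the transpose graph); the helper name and the example edge list are assumptions for illustration, not GraphChi's code.

```python
def build_csr(num_vertices, edges):
    """Compressed sparse row: an offsets array indexes a flat array of edge targets,
    so the out-edges of vertex v are targets[offsets[v]:offsets[v + 1]]."""
    offsets = [0] * (num_vertices + 1)
    for src, _ in edges:
        offsets[src + 1] += 1
    for v in range(num_vertices):                 # prefix sums -> row offsets
        offsets[v + 1] += offsets[v]
    targets = [0] * len(edges)
    cursor = offsets[:-1].copy()                  # next free slot per row
    for src, dst in edges:
        targets[cursor[src]] = dst
        cursor[src] += 1
    return offsets, targets

offsets, targets = build_csr(4, [(0, 1), (0, 2), (1, 3), (2, 3)])
print(targets[offsets[0]:offsets[1]])   # out-neighbors of vertex 0 -> [1, 2]
```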
  23. GraphChi: Loading the graph
     ● The input graph is split into P disjoint vertex intervals, chosen to balance the edges; each interval is associated with a shard
     ● A shard contains the data of the edges of its interval
     ● The subgraph of an interval is constructed as its interval is read
  24. GraphChi: Parallel Sliding Windows
     ● Each interval is processed in parallel
     ● P sequential disk accesses are required to process one interval
     ● The lengths of the intervals vary with the graph's degree distribution
     ● P * P disk accesses are required for one superstep (a toy sketch follows below)
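A toy, in-memory Python sketch of the parallel sliding windows access pattern: shard j holds the edges whose destination falls in interval j, sorted by source vertex, so the edges whose source lies in any given interval form one contiguous block in every shard. Processing an interval then needs one full read of its own shard plus one contiguous window from each other shard, i.e. about P sequential reads per interval and P * P per pass. Disk I/O is replaced by list slicing here, and the interval boundaries are assumptions for the example.

```python
from bisect import bisect_left, bisect_right

def build_shards(edges, intervals):
    """intervals: list of (lo, hi) vertex-id ranges; edges: (src, dst) pairs.
    Shard j holds edges with dst in interval j, sorted by src."""
    shards = [[] for _ in intervals]
    for src, dst in edges:
        for j, (lo, hi) in enumerate(intervals):
            if lo <= dst <= hi:
                shards[j].append((src, dst))
                break
    for shard in shards:
        shard.sort()
    return shards

def edges_for_interval(i, intervals, shards):
    """In-edges come from one full shard; out-edges come from one contiguous
    window (slice) per shard, because shards are sorted by source vertex."""
    lo, hi = intervals[i]
    in_edges = list(shards[i])
    out_edges = []
    for shard in shards:
        keys = [src for src, _ in shard]
        a, b = bisect_left(keys, lo), bisect_right(keys, hi)
        out_edges.extend(shard[a:b])
    return in_edges, out_edges

intervals = [(1, 2), (3, 4), (5, 6)]
edges = [(1, 3), (2, 5), (3, 1), (4, 6), (5, 2), (6, 4)]
shards = build_shards(edges, intervals)
print(edges_for_interval(0, intervals, shards))   # subgraph for interval (1, 2)
```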
  25. GraphChi: Example. Executing interval (1,2) [figure: intervals (1,2), (3,4), (5,6)]
  26. GraphChi: Example. Executing interval (3,4) [figure: intervals (1,2), (3,4), (5,6)]
  27. GraphChi: Example [figure]
  28. GraphChi: Evolving Graphs
     ● Adding an edge is reflected in the intervals and shards when they are read
     ● Deleting an edge causes that edge to be ignored
     ● Edge additions and deletions are applied after the current interval has been processed (a toy sketch follows below)
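A toy sketch of the buffering described in the last bullet: additions and deletions that arrive while an interval is executing are held aside and only merged into the edge set between intervals. The EdgeUpdateBuffer class and its method names are hypothetical, for illustration only.

```python
class EdgeUpdateBuffer:
    """Buffers graph mutations so they never change the interval being processed."""
    def __init__(self):
        self.added = []
        self.deleted = set()

    def add_edge(self, src, dst):
        self.added.append((src, dst))

    def delete_edge(self, src, dst):
        self.deleted.add((src, dst))          # the edge is simply ignored from now on

    def apply_after_interval(self, edges):
        """Called between intervals: drop deleted edges, append new ones, reset."""
        merged = [e for e in edges if e not in self.deleted] + self.added
        self.added, self.deleted = [], set()
        return merged

buf = EdgeUpdateBuffer()
buf.add_edge(2, 7)
buf.delete_edge(1, 3)
print(buf.apply_after_interval([(1, 3), (4, 5)]))   # [(4, 5), (2, 7)]
```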
  29. GraphChi: Preprocessing [figure]
  30. Thank you
  31. The Blog wants YOU: thegraphsblog.wordpress.com/