- 1. GraphLab under the hood. Zuhair Khayyat, 12/10/12
- 2. GraphLab overview: GraphLab 1.0
  - GraphLab: A New Framework For Parallel Machine Learning
    - High-level abstractions for machine learning problems
    - Shared-memory multiprocessor
    - Assumes no fault tolerance is needed
    - Concurrent-access processing models with sequential-consistency guarantees
- 3. GraphLab overview: GraphLab 1.0
  - How does GraphLab 1.0 work?
    - Represent the user's data as a directed graph
    - Each block of data is represented by a vertex or a directed edge
    - Shared data table
    - User functions:
      - Update: modify vertex and edge state; read-only access to the shared table
      - Fold: sequential aggregation of vertex data into a key entry in the shared table
      - Merge: parallelize the Fold function
      - Apply: finalize the key entry in the shared table
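To make the division of labour between the user functions concrete, here is a minimal single-process sketch, assuming vertex data is a plain list of floats and the shared table is a dict. The names `sync`, `center_update` and `shared_table` are hypothetical illustrations, not GraphLab's API: Fold runs sequentially over each partition, Merge combines the partial results, Apply finalizes the table entry, and Update only reads the table.

```python
from functools import reduce

def sync(vertex_data, fold, merge, apply, zero, num_partitions=2):
    # Fold runs sequentially over each partition of the vertices...
    parts = [vertex_data[i::num_partitions] for i in range(num_partitions)]
    partials = [reduce(fold, part, zero) for part in parts]
    # ...Merge combines the partial results (this is what lets Fold run in parallel)...
    merged = reduce(merge, partials)
    # ...and Apply finalizes the value stored under the key in the shared table.
    return apply(merged)

def center_update(v, vertex_data, shared_table):
    # Update: may modify vertex (and edge) data, but only reads the shared table.
    vertex_data[v] -= shared_table["mean"]

vertex_data = [1.0, 2.0, 3.0, 4.0]
shared_table = {}
shared_table["mean"] = sync(
    vertex_data,
    fold=lambda acc, x: (acc[0] + x, acc[1] + 1),   # running (sum, count)
    merge=lambda a, b: (a[0] + b[0], a[1] + b[1]),  # combine two partial (sum, count)
    apply=lambda acc: acc[0] / acc[1],              # finalize: mean
    zero=(0.0, 0),
)
for v in range(len(vertex_data)):
    center_update(v, vertex_data, shared_table)
print(shared_table["mean"], vertex_data)
```

Keeping Fold and Merge separate is what allows the aggregation to be split across processors: each processor folds its own vertices and only the small partial results need to be merged.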
- 4. GraphLab overview: GraphLab 1.0
- 5. GraphLab overview: Distributed GraphLab 1.0
  - Distributed GraphLab: A Framework for Machine Learning and Data Mining in the Cloud
    - Fault tolerance using a snapshot algorithm
    - Improved distributed parallel processing
    - Two-stage partitioning:
      - Atoms generated by ParMetis
      - Ghosts generated by the intersection of the atoms
    - Finalize() function for vertex synchronization
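The ghost stage of the partitioning can be sketched as follows, assuming the atom assignment (in practice produced by ParMetis) is already given as a dict from vertex to atom. The function name `ghost_vertices` is hypothetical, and this is a simplified reading of the slide: a ghost of an atom is a vertex owned by another atom that shares an edge with it, so its data must be mirrored locally.

```python
def ghost_vertices(edges, atom_of):
    """For each atom, the vertices owned by other atoms that are adjacent to it."""
    ghosts = {atom: set() for atom in set(atom_of.values())}
    for u, v in edges:
        a, b = atom_of[u], atom_of[v]
        if a != b:
            # A cut edge: each side needs a ghost copy of the other endpoint.
            ghosts[a].add(v)
            ghosts[b].add(u)
    return ghosts

edges = [(0, 1), (1, 2), (2, 3), (3, 0), (1, 3)]
atom_of = {0: 0, 1: 0, 2: 1, 3: 1}   # hand-written stand-in for a ParMetis result
print(ghost_vertices(edges, atom_of))
```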
- 6. GraphLab overview: Distributed GraphLab 1.0
- 7. GraphLab overview: Distributed GraphLab 1.0 (figure: Worker 1, Worker 2, ghosts)
- 8. PowerGraph: Introduction
  - GraphLab 2.1
  - Problems of highly skewed power-law graphs:
    - Workload imbalance, which degrades performance
    - Limited scalability
    - Hard to partition when the graph is very large
    - Storage
    - Non-parallel computation (a high-degree vertex is processed by a single machine)
- 9. PowerGraph: New Abstraction
  - Original functions:
    - Update
    - Finalize
    - Fold
    - Merge
    - Apply: the synchronization apply
  - Introduces the GAS model:
    - Gather: over in-, out-, or all neighbors
    - Apply: the GAS-model apply
    - Scatter
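PageRank is the usual illustration of the Gather, Apply and Scatter phases. The sketch below is a single-machine simplification: `gas_pagerank` is a hypothetical name, it uses the unnormalized formulation rank = (1 - d) + d * sum, and Scatter is used only to activate out-neighbors whose input changed by more than a tolerance.

```python
def gas_pagerank(num_vertices, edges, damping=0.85, tol=1e-6, max_iters=100):
    out_nbrs = [[] for _ in range(num_vertices)]
    in_nbrs = [[] for _ in range(num_vertices)]
    for src, dst in edges:
        out_nbrs[src].append(dst)
        in_nbrs[dst].append(src)
    rank = [1.0] * num_vertices
    active = set(range(num_vertices))
    for _ in range(max_iters):
        if not active:
            break
        updates = {}
        for v in active:
            # Gather: sum over in-neighbors (commutative and associative).
            acc = sum(rank[u] / len(out_nbrs[u]) for u in in_nbrs[v])
            # Apply: compute the new value of the central vertex.
            updates[v] = (1 - damping) + damping * acc
        next_active = set()
        for v, new in updates.items():
            # Scatter: if v changed noticeably, activate its out-neighbors.
            if abs(new - rank[v]) > tol:
                next_active.update(out_nbrs[v])
        for v, new in updates.items():
            rank[v] = new
        active = next_active
    return rank

print(gas_pagerank(3, [(1, 0), (2, 0), (0, 1)]))
```

All updates in a round read the previous ranks, so this behaves like a synchronous engine; only vertices activated during Scatter are recomputed in the next round.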
- 10. PowerGraph: Gather (figure: Worker 1, Worker 2)
- 11. PowerGraph: Apply (figure: Worker 1, Worker 2)
- 12. PowerGraph: Scatter (figure: Worker 1, Worker 2)
- 13. PowerGraph: Vertex Cut (figure: example graph with edges A-B, A-H, A-G, B-C, B-H, C-D, C-H, C-I, D-E, D-I, E-F, E-I, F-H, F-G)
- 14. PowerGraph: Vertex Cut (figure: the example edges grouped into partitions)
- 15. PowerGraph: Vertex Cut (Greedy) (figure: greedy placement of the example edges)
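A greedy vertex cut places each edge, in streaming order, on a machine that already holds replicas of its endpoints, so that vertices are replicated on as few machines as possible. The sketch below is a simplified variant, not PowerGraph's exact rules: the tie-breaking is by least load, and the per-machine capacity cap is an assumption added so that the small example from the slides does not collapse onto a single machine. `greedy_vertex_cut` and `replication_factor` are hypothetical names.

```python
import math
from collections import defaultdict

def greedy_vertex_cut(edges, num_machines, slack=1.0):
    replicas = defaultdict(set)   # vertex -> machines holding a replica of it
    load = [0] * num_machines     # number of edges placed on each machine
    capacity = math.ceil(len(edges) * slack / num_machines)  # assumed balance cap
    placement = {}

    def pick(candidates):
        # Least-loaded candidate with spare capacity; ties broken by machine id.
        open_machines = [m for m in candidates if load[m] < capacity]
        return min(open_machines, key=lambda m: (load[m], m)) if open_machines else None

    for u, v in edges:
        # Prefer a machine holding both endpoints, then one endpoint, then any.
        for candidates in (replicas[u] & replicas[v],
                           replicas[u] | replicas[v],
                           range(num_machines)):
            m = pick(candidates)
            if m is not None:
                break
        placement[(u, v)] = m
        load[m] += 1
        replicas[u].add(m)
        replicas[v].add(m)
    return placement, replicas, load

def replication_factor(replicas):
    # Average number of machines spanned by each vertex.
    return sum(len(ms) for ms in replicas.values()) / len(replicas)

EDGES = [("A", "B"), ("A", "H"), ("A", "G"), ("B", "C"), ("B", "H"), ("C", "D"),
         ("C", "H"), ("C", "I"), ("D", "E"), ("D", "I"), ("E", "F"), ("E", "I"),
         ("F", "H"), ("F", "G")]
placement, replicas, load = greedy_vertex_cut(EDGES, 2)
print(load, replication_factor(replicas))
```

With two machines the cap forces a 7/7 split, and only the vertices on the boundary of the two groups (C, D, G, H) end up replicated.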
- 16. PowerGraph: Experiment
- 17. PowerGraph: Experiment
- 18. PowerGraph: Discussion
  - Isn't it similar to the Pregel model?
    - The vertex is partially processed if a message exists
  - The GAS model assumes commutative and associative operations (the Gather sum in particular). What if the computation is not commutative?
    - Sum the message values in a fixed order so that the floating-point rounding error is the same on every run
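Floating-point addition is commutative but not associative, so summing the same messages in a different arrival order can change the result in the last bits. A minimal sketch of the fixed-order remedy, assuming all messages for a vertex are collected before summing (`deterministic_sum` is a hypothetical helper):

```python
import itertools

def deterministic_sum(values):
    # Sort first, so every arrival order is summed in the same order and
    # produces identical rounding on every run and every machine.
    return sum(sorted(values))

messages = [0.1, 0.2, 0.3]
orders = list(itertools.permutations(messages))
print((0.1 + 0.2) + 0.3, 0.1 + (0.2 + 0.3))
print({sum(p) for p in orders})
print({deterministic_sum(p) for p in orders})
```

Sorting makes the result reproducible, not more accurate; Python's `math.fsum`, which returns a correctly rounded sum, is an order-independent alternative at a somewhat higher cost.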
- 19. PowerGraph and Mizan
  - In Mizan we use partial replication (figure: workers W0 and W1 holding vertices a to g, in the compute phase and the communication phase)
- 20. GraphChi: Introduction
  - Asynchronous, disk-based version of GraphLab
  - Uses parallel sliding windows
    - Very small number of non-sequential disk accesses
  - Support for graph updates
    - Based on Kineograph, a distributed system for processing a continuous inflow of graph updates while simultaneously running advanced graph mining algorithms
- 21. GraphChi: Graph Constraints
  - The graph does not fit in memory
  - A single vertex, its edges and their values do fit in memory
- 22. GraphChi: Disk storage
  - Compressed sparse row (CSR):
    - Compressed adjacency list with indexes into the edge array
    - Fast access to a vertex's out-edges
  - Compressed sparse column (CSC):
    - CSR of the transposed graph
    - Fast access to a vertex's in-edges
  - Shard: stores the edge data
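The two storage layouts can be sketched in a few lines, assuming vertices are numbered 0..n-1 and edges are (source, destination) pairs. CSR keeps a row-pointer array of length n + 1 and a column-index array with one entry per edge; CSC is simply the CSR of the transposed graph. The function names are hypothetical.

```python
def to_csr(num_vertices, edges):
    """CSR arrays: the out-neighbors of v are col_idx[row_ptr[v]:row_ptr[v + 1]]."""
    row_ptr = [0] * (num_vertices + 1)
    for src, _ in edges:
        row_ptr[src + 1] += 1              # count the out-degree of src
    for v in range(num_vertices):
        row_ptr[v + 1] += row_ptr[v]       # prefix sum -> start offset of each row
    col_idx = [0] * len(edges)
    next_slot = row_ptr[:-1]               # slicing copies: next free slot per row
    for src, dst in edges:
        col_idx[next_slot[src]] = dst
        next_slot[src] += 1
    return row_ptr, col_idx

def to_csc(num_vertices, edges):
    """CSC of a graph is the CSR of its transpose, so it indexes in-neighbors."""
    return to_csr(num_vertices, [(dst, src) for src, dst in edges])

def neighbors(row_ptr, col_idx, v):
    return col_idx[row_ptr[v]:row_ptr[v + 1]]

edges = [(0, 1), (0, 2), (1, 2), (2, 0)]
csr = to_csr(3, edges)
csc = to_csc(3, edges)
print(neighbors(*csr, 0), neighbors(*csc, 2))
```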
- 23. GraphChi: Loading the graph
  - The vertices of the input graph are split into P disjoint intervals, chosen to balance the number of edges; each interval is associated with a shard
  - A shard holds the edges whose destination lies in its interval, sorted by source vertex
  - The subgraph of an interval is constructed while its interval is read
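A minimal sketch of this preprocessing step, assuming vertex ids are given in order and the whole edge list fits in memory for the purpose of illustration. Intervals are closed greedily once they have collected roughly |E| / P in-edges, and shard p receives the edges whose destination lies in interval p, sorted by source. `split_intervals` and `build_shards` are hypothetical names, not GraphChi's API.

```python
from collections import Counter

def split_intervals(edges, vertices, num_intervals):
    in_deg = Counter(dst for _, dst in edges)
    budget = len(edges) / num_intervals      # target number of edges per shard
    intervals, start, acc = [], vertices[0], 0
    for i, v in enumerate(vertices):
        acc += in_deg[v]
        # Close the interval once it holds about `budget` in-edges; the last
        # interval stays open for all remaining vertices.
        if acc >= budget and len(intervals) < num_intervals - 1 and i + 1 < len(vertices):
            intervals.append((start, v))
            start, acc = vertices[i + 1], 0
    intervals.append((start, vertices[-1]))
    return intervals

def build_shards(edges, intervals):
    # Shard p: every edge whose destination lies in interval p, sorted by source.
    return [sorted((s, d) for s, d in edges if lo <= d <= hi) for lo, hi in intervals]

edges = [(1, 2), (1, 3), (2, 3), (2, 4), (3, 4), (3, 5),
         (4, 5), (4, 6), (5, 6), (6, 1), (5, 1), (6, 2)]
intervals = split_intervals(edges, list(range(1, 7)), 3)
shards = build_shards(edges, intervals)
print(intervals)
print(shards)
```

Because intervals are sized by edge count rather than vertex count, a region with many high in-degree vertices gets a shorter interval, which is why interval lengths vary with the degree distribution.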
- 24. GraphChi: Parallel Sliding Windows
  - The vertices of each interval are processed in parallel
  - P sequential disk reads are needed to process each interval: the interval's own shard, plus one window from each of the other P - 1 shards
  - Interval lengths vary with the graph's degree distribution
  - P * P disk reads are required for one superstep
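The read pattern can be simulated in memory, assuming integer vertex ids and shards that are already sorted by source. For each interval, its own shard (all in-edges) is loaded completely, and because every other shard is sorted by source, the interval's out-edges form one contiguous window in each of them. `parallel_sliding_windows` and `record_degrees` are hypothetical names; the shards below use the three intervals (1,2), (3,4), (5,6) from the example slides with a made-up edge set.

```python
from bisect import bisect_left, bisect_right

def parallel_sliding_windows(intervals, shards, update):
    """One superstep over all intervals; returns the number of shard reads."""
    reads = 0
    for p, (lo, hi) in enumerate(intervals):
        # Memory shard: shard p holds every in-edge of interval p.
        in_edges = shards[p]
        reads += 1
        out_edges = [e for e in in_edges if lo <= e[0] <= hi]
        # Sliding windows: shards are sorted by source, so the out-edges of
        # interval p form one contiguous block in each other shard.
        for q, shard in enumerate(shards):
            if q == p:
                continue
            sources = [src for src, _ in shard]
            start, end = bisect_left(sources, lo), bisect_right(sources, hi)
            out_edges.extend(shard[start:end])
            reads += 1
        # GraphChi updates the interval's vertices in parallel; sequential here.
        for v in range(lo, hi + 1):
            update(v,
                   [e for e in in_edges if e[1] == v],
                   [e for e in out_edges if e[0] == v])
    return reads

intervals = [(1, 2), (3, 4), (5, 6)]
shards = [
    [(1, 2), (5, 1), (6, 1), (6, 2)],
    [(1, 3), (2, 3), (2, 4), (3, 4)],
    [(3, 5), (4, 5), (4, 6), (5, 6)],
]
degrees = {}

def record_degrees(v, in_edges, out_edges):
    degrees[v] = (len(in_edges), len(out_edges))

reads = parallel_sliding_windows(intervals, shards, record_degrees)
print(reads, degrees)
```

With P = 3 the simulation performs 3 reads per interval and 9 per superstep, matching the P * P count on the slide.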
- 25. GraphChi: Example. Executing interval (1,2) (figure: intervals (1,2), (3,4), (5,6))
- 26. GraphChi: Example. Executing interval (3,4) (figure: intervals (1,2), (3,4), (5,6))
- 27. GraphChi: Example
- 28. GraphChi: Evolving Graphs
  - An added edge is reflected in the intervals and shards when they are read
  - A deleted edge is ignored
  - Edge additions and deletions are applied after the current interval has been processed
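The buffering behaviour described above can be sketched as follows, loosely modeled on the slide rather than on GraphChi's actual implementation: additions and deletions are queued while an interval is being processed, additions go into the shard that owns the destination vertex (kept sorted by source), and deleted edges stay in the shard but are skipped. The class `EvolvingShards` and its methods are hypothetical.

```python
class EvolvingShards:
    """Hypothetical helper: buffers edge changes until the current interval is done."""

    def __init__(self, intervals, shards):
        self.intervals = intervals
        self.shards = [sorted(s) for s in shards]
        self.deleted = set()        # deleted edges stay in the shard but are skipped
        self.pending_adds = []
        self.pending_deletes = []

    def add_edge(self, src, dst):
        self.pending_adds.append((src, dst))

    def delete_edge(self, src, dst):
        self.pending_deletes.append((src, dst))

    def _shard_of(self, vertex):
        for p, (lo, hi) in enumerate(self.intervals):
            if lo <= vertex <= hi:
                return p
        raise ValueError(f"vertex {vertex} is not in any interval")

    def finish_interval(self):
        """Apply the buffered changes once the current interval has been processed."""
        for edge in self.pending_adds:
            if edge in self.deleted:
                self.deleted.discard(edge)   # re-adding revives the stored copy
            else:
                shard = self.shards[self._shard_of(edge[1])]
                shard.append(edge)
                shard.sort()                 # keep the shard sorted by source
        self.deleted.update(self.pending_deletes)
        self.pending_adds.clear()
        self.pending_deletes.clear()

    def live_edges(self, p):
        return [e for e in self.shards[p] if e not in self.deleted]

es = EvolvingShards([(1, 2), (3, 4)], [[(3, 1)], [(1, 3)]])
es.add_edge(2, 4)
es.delete_edge(1, 3)
before = es.live_edges(1)   # changes are not visible during the current interval
es.finish_interval()
after = es.live_edges(1)
print(before, after)
```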
- 29. GraphChi: Preprocessing
- 30. Thank you
- 31. The Blog wants YOU: thegraphsblog.wordpress.com/
