Slideshare uses cookies to improve functionality and performance, and to provide you with relevant advertising. If you continue browsing the site, you agree to the use of cookies on this website. See our User Agreement and Privacy Policy.

Slideshare uses cookies to improve functionality and performance, and to provide you with relevant advertising. If you continue browsing the site, you agree to the use of cookies on this website. See our Privacy Policy and User Agreement for details.

Successfully reported this slideshow.

Like this presentation? Why not share!

- Large Graph Processing by Zuhair khayyat 1752 views
- Comparing pregel related systems by Prashant Raaghav 601 views
- Pregel and giraph by Cao Manh Dat 534 views
- 2014.02.13 (Strata) Graph Analysis ... by Avery Ching 1720 views
- 2011.06.29. Giraph - Hadoop Summit ... by Avery Ching 16040 views
- Giraph at Hadoop Summit 2014 by Claudio Martella 12851 views

368 views

Published on

network data created by modern applications, a number of distributed graph processing systems have emerged, such as Pregel and GraphLab. All these systems divide input graphs into partitions, and employ a “think like a vertex” programming model to support iterative graph computation. This vertex-centric model is easy to program and has been proved useful for many graph algorithms. However, this model hides the partitioning information from the users, thus prevents many algorithm-specific optimizations. This often results in longer execution time due to excessive network messages (e.g. in Pregel) or heavy scheduling overhead to ensure data consistency (e.g. in GraphLab). To address this limitation, we

propose a new “think like a graph” programming paradigm. Under this graph-centric model, the partition structure is opened up to the users, and can be utilized so that communication within a partition can bypass the heavy message passing or scheduling machinery. We implemented this model in a new system, called Giraph++, based on Apache Giraph, an open source implementation of Pregel. We explore the applicability of the graph-centric model to three categories of graph algorithms, and demonstrate its flexibility and superior performance, especially on well-partitioned data.

Published in:
Data & Analytics

No Downloads

Total views

368

On SlideShare

0

From Embeds

0

Number of Embeds

2

Shares

0

Downloads

16

Comments

0

Likes

1

No embeds

No notes for slide

- 1. From “Think Like a Vertex” to “Think Like a Graph” Yuanyuan Tian, Andrey Balmin, Severin Andreas Corsten, Shirish Tatikonda, John McPherson
- 2. Large Scale Graph Processing Graph data is everywhere and growing rapidly! 126 million blogs 50 billion Web pages 90 trillion emails Analyzing graph data is increasingly important. Retail: customer segmentation and recommendation Consumer Insights: key influencer analysis Telco: churn prediction Biology & Health Care: disease propagation analysis Government: Terrorists detection What is the right programming model for large-scale graph processing?
- 3. Existing Graph Processing Systems Divide input graphs into partitions Employ vertex-centric programming model Programmers “think like a vertex” Operate on a vertex and its edges Communication to other vertices Message passing (e.g Pregel/Giraph) Scheduling of updates (e.g GraphLab)
- 4. “Think like a vertex” “Think like a graph” “Think like a vertex” “Think like a graph” Partition: A collection of vertices A proper subgraph Computation: A vertex and its edges A subgraph Communication: 1-hop at a time e.g ABD Multiple-hops at a time e.g. AD
- 5. Graph-Centric Programming Model Expose subgraphs to programmers Internal vertices vs boundary vertices Information exchange between internal vertices of a partition is immediate Messages are only sent from boundary vertices of a partition to internal vertices of a different partition internal vertex external vertex (primary copy) (local copy) message
- 6. Advantages of graph-centric model Any algorithm expressed in the vertex-centric model can be expressed in the graph-centric model Allow lower-level access for algorithm-specific optimizations Use of existing off-the-shell graph algorithms on subgraphs Local asynchronous computation Natural translation of existing graph-centric parallel algorithms Provide sufficiently high-level abstraction for easy of use
- 7. Example: Weakly Connected Component (WCC) compute() on each subgraph If 0th superstep Sequential WCC on subgraph Send labels of boundary vertices to their corresponding internal vertices Else Use the received messages to update the labels of vertices in the subgraph Merge connected components For boundary vertices with label change, send labels to their corresponding internal vertices vertex-centricmodelg
- 8. Hybrid Execution Model Can we keep the simple vertex-centric programming model but still improve performance? Key: Differentiate internal messages and external messages What messages can be used in local computation? Vertex-centric model Only messages (external and internal) from previous superstep Hybrid model External messages from previous superstep Internal messages from previous + current superstep (local asynchronous computation) Only apply to a limited set of graph algorithms
- 9. Giraph++: A New Graph Processing System Built on top of Apache Giraph Support vertex-centric model (VM), graph-centric model (GM) & hybrid model (HM) in the same Giraph++ system Contributed to Apache Giraph Project Planed in future Giraph release
- 10. A Peek of Performance Results 10-node cluster Per node: 32GB RAM, 8 cores, 7 workers Network: 1Gbit Dataset: 4 web graphs # nodes: 19million ~ 428million # edges: 298million ~ 1.0billion Significant improvement in execution time and network messages, especially with better graph partitioning strategy Up to 62X speedup in execution time! Up to 200X reduction in network messages! Data Set Time(sec) Networkmsgs VM: vertex-centric model GM: graph-centric model HM: hybrid model HP: hash partitioning GP: graph partitioningWCC Algorithm
- 11. Conclusion “Think like a vertex” “Think like a graph” Take advantage of local graph structure in a partition Enable complex and flexible graph algorithms Can be exploited to various graph applications Bring significant performance improvement, especially with better graph partitioning strategy A valuable complement to existing vertex-centric model

No public clipboards found for this slide

Be the first to comment