In Proceedings of the 2010 ACM SIGMOD International
Conference on Management of data (pp. 135-146). ACM

Grzegorz Malewicz...
Source: SIGMETRICS ’09 Tutorial – MapReduce: The Programming Model and Practice, by Jerry Zhao

Outline
• Introduction

• Computation Model
• Writing a Pregel Program
• System Implementation
• Applications
• Experiment...
The Problem
• Many practical computing problems concern large
graphs.
Large graph data

Graph algorithms

Web graph
Transp...
Want to Process a Large Scale Graph? The Options:
1. Crafting a custom distributed infrastructure.
Substantial engineerin...
Pregel
• To overcome these challenges, Google came up with
Pregel.
Provides scalability
Fault-tolerance
Flexibility to...
Bulk Synchronous Parallel
[Figure: input, a series of supersteps that ends when all vertices vote to halt, output]

Series of iterations (supersteps) .
Each vertex V invo...
Advantages of the Vertex-Centric Approach
• Users focus on a local action.
• Processing each item independently.
• Ensures tha...
Model of Computation
[Figure: supersteps run until all vertices vote to halt, then output]

A Directed Graph is given to Pregel.
It runs the computation at e...
Vertex State Machine

• Algorithm termination is based on every vertex
voting to halt.
• In superstep 0, every vertex is i...
[Figure: vertex values over successive supersteps. Blue arrows are messages. Blue vertices have voted to halt.]

Example...
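
The largest-value example can be sketched as a small single-process simulation. This is only an illustration of the superstep and vote-to-halt rules described in these slides, not Pregel's API: the function name `PropagateMax`, the adjacency-list representation, and the inbox vectors are all assumptions made for the sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a Pregel run (not the real Pregel API).
// values[v] is the value of vertex v; out_edges[v] lists the targets of
// v's outgoing edges. Returns the number of supersteps that were executed.
int PropagateMax(std::vector<int>& values,
                 const std::vector<std::vector<std::size_t>>& out_edges) {
  assert(values.size() == out_edges.size());
  const std::size_t n = values.size();
  std::vector<std::vector<int>> inbox(n);  // messages readable this superstep
  std::vector<bool> active(n, true);       // superstep 0: every vertex is active
  int superstep = 0;
  while (true) {
    std::vector<std::vector<int>> next_inbox(n);  // readable in superstep + 1
    bool any_ran = false;
    for (std::size_t v = 0; v < n; ++v) {
      // A halted vertex runs again only if a message has arrived for it.
      if (!active[v] && inbox[v].empty()) continue;
      active[v] = true;
      any_ran = true;
      bool changed = (superstep == 0);  // every vertex announces its value once
      for (int m : inbox[v]) {
        if (m > values[v]) {
          values[v] = m;
          changed = true;
        }
      }
      if (changed) {
        for (std::size_t t : out_edges[v]) next_inbox[t].push_back(values[v]);
      } else {
        active[v] = false;  // vote to halt
      }
    }
    if (!any_ran) break;  // every vertex has halted and no messages are pending
    inbox.swap(next_inbox);
    ++superstep;
  }
  return superstep;
}
```

Calling `PropagateMax` on a connected graph leaves every vertex holding the largest initial value; the loop ends only when all vertices have voted to halt and no messages remain in transit.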
The C++ API
• The user subclasses the predefined Vertex class and writes
a Compute method.
Compute() method: which will be execu...
The C++ API – Vertex Class
[Figure: annotated Vertex class declaration. Callouts: "3 value types" (Vertex, Edge), "Override this!", "in msgs", "out msg".]
The C++ API
Message passing:
• No guaranteed message delivery order.
• Messages are delivered exactly once.
• Can send mes...
The C++ API
Combiners (not active by default):
• Sending a message to another vertex that exists on a
different machine ha...
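
A combiner for shortest paths can be sketched as follows. This is a minimal illustration under stated assumptions: the `Message` alias and the `CombineMin` function are hypothetical names, not Pregel's `Combine()` signature. It uses `min` because that operation is commutative and associative, which is the requirement the slide places on a combiner.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical message type: (destination vertex id, candidate distance).
using Message = std::pair<std::string, int>;

// Collapses all messages bound for the same destination into one message
// carrying the minimum value. Because min is commutative and associative,
// the result does not depend on the order in which messages are combined.
std::vector<Message> CombineMin(const std::vector<Message>& outgoing) {
  std::map<std::string, int> best;
  for (const Message& m : outgoing) {
    auto it = best.find(m.first);
    if (it == best.end()) {
      best.emplace(m.first, m.second);
    } else {
      it->second = std::min(it->second, m.second);
    }
  }
  // One message per destination, ordered by destination id.
  return std::vector<Message>(best.begin(), best.end());
}
```

With this sketch, four outgoing messages addressed to two destinations shrink to two messages before they would cross the network.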
The C++ API
Aggregators:
• A mechanism for global communication, monitoring,
and data.
Each vertex can produce a value in...
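
The edge-counting use of an aggregator can be sketched like this. The `SumAggregator` class and its method names are assumptions for illustration, not Pregel's Aggregator interface; the point is only that values contributed in superstep S become visible after the barrier, in superstep S+1.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sum aggregator (not Pregel's actual Aggregator class).
class SumAggregator {
 public:
  // Called by each vertex during superstep S.
  void Accumulate(long long v) { pending_ += v; }
  // Called at the barrier between supersteps S and S+1.
  void EndSuperstep() {
    visible_ = pending_;
    pending_ = 0;
  }
  // Read by any vertex during superstep S+1.
  long long Value() const { return visible_; }

 private:
  long long pending_ = 0;
  long long visible_ = 0;
};

// Each vertex contributes its out-edge count; the aggregated sum is the
// total number of edges in the graph.
long long CountEdges(const std::vector<std::vector<std::size_t>>& out_edges) {
  SumAggregator agg;
  for (const auto& edges : out_edges) {
    agg.Accumulate(static_cast<long long>(edges.size()));
  }
  assert(agg.Value() == 0);  // nothing is visible before the barrier
  agg.EndSuperstep();
  return agg.Value();
}
```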
The C++ API
Topology mutations:
• Some graph algorithms need to change the graph's
topology.
E.g. A clustering algorithm ...
The C++ API
Input and output:
• It has Reader/Writer for common file formats:
Text file
Vertices in a relational DB
Row...
Implementation
• Pregel was designed for the Google cluster
architecture.
• Persistent data is stored as files on a distri...
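
The hash(ID) assignment of vertices to machines mentioned on this slide can be sketched as below. The function name `PartitionFor` and the modulo-by-partition-count step are assumptions for the sketch; the slide itself only says that placement is derived from a hash of the vertex ID.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>

// Hypothetical helper: maps a vertex ID to a partition index. Any machine
// can evaluate it locally, so no lookup table is needed to find a vertex.
std::size_t PartitionFor(const std::string& vertex_id,
                         std::size_t num_partitions) {
  assert(num_partitions > 0);
  return std::hash<std::string>{}(vertex_id) % num_partitions;
}
```

Because the mapping depends only on the ID and the partition count, a worker sending a message can compute the destination partition without asking the master.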
System Architecture
• Executable is copied to many machines.
• One machine becomes the Master.
Maintains worker.
Recover...
Pregel Execution
1. User programs are copied on machines.
2. One machine becomes the master.
 Other computer can find the...
Source: http://www.cnblogs.com/huangfox/archive/2013/01/03/2843103.html

Fault Tolerance
• Checkpointing
The master periodically instructs the workers to save the
state of their partitions to pe...
Application – Page Rank
• A = A given page
• T1 …. Tn = Pages that point to page A (citations)
• d = Damping factor betwee...
Application – Page Rank

Source: Wikipedia

Application – Page Rank
Store and carry PageRank
class PageRankVertex
: public Vertex<double, void, double> {
public:
virt...
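
To see the update rule from the PageRank vertex in action without a Pregel cluster, here is a single-process sketch. It is an illustration only: the function `PageRank`, the adjacency-list input, and the initial rank of 1/N are assumptions; the constants 0.15 and 0.85 are the ones used in the slide's code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical single-machine version of the per-vertex update.
// out_edges[v] lists the pages that page v links to.
std::vector<double> PageRank(
    const std::vector<std::vector<std::size_t>>& out_edges, int supersteps) {
  const std::size_t n = out_edges.size();
  assert(n > 0);
  std::vector<double> rank(n, 1.0 / n);  // assumed initial value
  for (int s = 0; s < supersteps; ++s) {
    // "Messages": each vertex sends rank / out-degree along every out-edge.
    std::vector<double> sum(n, 0.0);
    for (std::size_t v = 0; v < n; ++v) {
      for (std::size_t t : out_edges[v]) {
        sum[t] += rank[v] / out_edges[v].size();
      }
    }
    // Each vertex combines its incoming messages with the damping term.
    for (std::size_t v = 0; v < n; ++v) {
      rank[v] = 0.15 / n + 0.85 * sum[v];
    }
  }
  return rank;
}
```

In this sketch a vertex with no out-edges sends nothing, so its rank leaks out of the total; the ranks sum to 1 only when every vertex has at least one out-edge.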
Application – Shortest Path
class ShortestPathVertex
a constant larger than
: public Vertex<int, int, int> {
any feasible ...
Example: SSSP in Pregel
[Figure sequence, slides 32-40: a step-by-step run of single-source shortest paths on a small example graph.]
Experiments
• 300 multicore commodity PCs used.
• Only running time is counted.
Checkpointing disabled.

• Measures scala...
SSSP - 1 billion vertex binary tree:
# of Pregel workers varies from 50 to 800
174 s
16 times workers
↓
Speedup of 10

17....
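
As a quick check of the annotations on this plot, the speedup follows directly from the two runtimes. The parallel efficiency is a derived figure that the slide does not state; it is included here only to put the speedup in context.

```latex
\[
\text{speedup} = \frac{174\ \text{s}}{17.3\ \text{s}} \approx 10.1,
\qquad
\text{efficiency} = \frac{\text{speedup}}{800 / 50} \approx \frac{10.1}{16} \approx 0.63
\]
```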
SSSP – binary trees:
varying graph sizes on 800 worker tasks

702 s

17.3 s

Graph with a low average
outdegree the runtim...
SSSP – log-normal random graphs (mean outdegree = 127.1):
varying graph sizes on 800 worker tasks

The runtime
Increases l...
Related Work
• MapReduce
Pregel is similar in concept to MapReduce, but with a
natural graph API and much more efficient ...
Related Work
• The closest matches to Pregel are:
Parallel Boost Graph Library[22],[23]
 Pregel provides fault-tolerance...
Conclusion & Future Work
• Pregel is a scalable and fault-tolerant platform with
an API that is sufficiently flexible to e...
Comment
• No comparison with other systems.
• The user has to modify Pregel a lot in order to
personalize it to his/her ne...
Any questions?

THANK YOU

Pregel
Google's graph processing framework.

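  • What is locality? Simply put, it is the property that, when data or resources are accessed, frequently accessed or related data is kept together or close by.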
  • Each cluster consists of thousands of commodity PCs organized into racks with high intra-rack bandwidth. Clusters are interconnected but distributed geographically.
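  • With the (1 - d) form of the PageRank formula, the sum of all PageRanks = number of pages
  • With the 0.15 / NumVertices() form used in the code, the sum of all PageRanks = 1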
  • Pregel

    1. 1. Pregel: A System for Large-Scale Graph Processing. Grzegorz Malewicz, Matthew H. Austern, Aart J. C. Bik, James C. Dehnert, Ilan Horn, Naty Leiser, and Grzegorz Czajkowski. In Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data (pp. 135-146). ACM.
    2. 2. Source: SIGMETRICS ’09 Tutorial – MapReduce: The Programming Model and Practice, by Jerry Zhao 2
    3. 3. Outline • Introduction • Computation Model • Writing a Pregel Program • System Implementation • Applications • Experiments • Related Work • Conclusion & Future Work 3
    4. 4. The Problem • Many practical computing problems concern large graphs. Large graph data: web graph, transportation routes, citation relationships, social networks. Graph algorithms: PageRank, shortest path, connected components, clustering techniques. • Efficient processing of large graphs is challenging: poor locality of memory access, very little work per vertex, a changing degree of parallelism. Running over many machines makes the problem worse.
    5. 5. Want to Process a Large Scale Graph? The Options: 1. Crafting a custom distributed infrastructure: substantial engineering effort. 2. Relying on an existing distributed platform, e.g. MapReduce: inefficient, because graph state must be stored at each stage, which causes too much communication between stages. 3. Using a single-computer graph algorithm library: not scalable. 4. Using an existing parallel graph system: not fault tolerant.
    6. 6. Pregel • To overcome these challenges, Google came up with Pregel. It provides scalability, fault tolerance, and the flexibility to express arbitrary algorithms. • The high-level organization of Pregel programs is inspired by Valiant's Bulk Synchronous Parallel model [45]. [45] Leslie G. Valiant, A Bridging Model for Parallel Computation. Comm. ACM 33(8), 1990.
    7. 7. Bulk Synchronous Parallel [Figure: input, a series of supersteps that ends when all vertices vote to halt, output] • A computation is a series of iterations (supersteps). • In each superstep, every vertex V invokes a function in parallel. • It can read messages sent in the previous superstep (S-1). • It can send messages, to be read in the next superstep (S+1). • It can modify the state of its outgoing edges.
    8. 8. Advantages of the Vertex-Centric Approach • Users focus on a local action. • Each item is processed independently. • This ensures that Pregel programs are inherently free of the deadlocks and data races common in asynchronous systems.
    10. 10. Model of Computation [Figure: supersteps run until all vertices vote to halt, then output] • A directed graph is given to Pregel. • It runs the computation at each vertex. • This continues until all vertices vote to halt. • Pregel gives you a directed graph back.
    11. 11. Vertex State Machine • Algorithm termination is based on every vertex voting to halt. • In superstep 0, every vertex is in the active state. • A vertex deactivates itself by voting to halt. • It can be reactivated by receiving an (external) message. 11
    12. 12. Example: Finding the largest value in a graph [Figure: vertex values over successive supersteps. Blue arrows are messages. Blue vertices have voted to halt.]
    14. 14. The C++ API • The user subclasses the predefined Vertex class and writes a Compute() method, which is executed at each active vertex in every superstep. • Can get/set the vertex value: GetValue() / MutableValue(). • Can get/set outgoing edge values: GetOutEdgeIterator(). • Can send/receive messages: SendMessageTo() / Compute().
    15. 15. The C++ API – Vertex Class [Figure: annotated Vertex class declaration. Callouts: "3 value types" (Vertex, Edge), "Override this!", "in msgs", "out msg".]
    16. 16. The C++ API Message passing: • No guaranteed message delivery order. • Messages are delivered exactly once. • Can send messages to any node. • If dest_vertex doesn’t exist, user’s function is called. void SendMessageTo(const string& dest_vertex, const MessageValue& message); 16
    17. 17. The C++ API Combiners (not active by default): • Sending a message to a vertex that lives on a different machine has some overhead. • By overriding the Combine() method, the user specifies a way to reduce many messages into one value (à la Reduce in MapReduce). The operation must be commutative and associative. • Exceedingly useful in certain contexts (e.g., 4x speedup on shortest-path computation).
    18. 18. The C++ API Aggregators: • A mechanism for global communication, monitoring, and data. Each vertex can provide a value to an aggregator in superstep S. The aggregated value is available to all vertices in superstep S+1. • Aggregators can be used for statistics and for global communication. E.g., a sum aggregator applied to the out-edge count of each vertex generates the total number of edges in the graph and communicates it to all vertices.
    19. 19. The C++ API Topology mutations: • Some graph algorithms need to change the graph's topology. E.g., a clustering algorithm may need to replace a cluster with a single vertex. • Vertices can create or destroy vertices at will. • Conflicting requests are resolved in two ways. Partial ordering: edge removal, then vertex removal, then vertex addition, then edge addition. User-defined handlers: the user resolves the remaining conflicts.
    20. 20. The C++ API Input and output: • It has Reader/Writer for common file formats: Text file Vertices in a relational DB Rows in BigTable • User can customize Reader/Writer for new input/outputs. Subclassing Reader/Writer classes. 20
    22. 22. Implementation • Pregel was designed for the Google cluster architecture. • Persistent data is stored as files on a distributed storage system such as GFS or BigTable. • Temporary data is stored on local disk. • Vertices are assigned to machines based on their vertex ID ( hash(ID) ), so any machine can determine which one holds a given vertex.
    23. 23. System Architecture • The executable is copied to many machines. • One machine becomes the master: it maintains the workers, recovers from worker faults, and provides a web UI for monitoring job progress. • The other machines become workers: each processes its task and communicates with the other workers.
    24. 24. Pregel Execution 1. The user program is copied to many machines. 2. One machine becomes the master. The other machines find the master through a name service and register with it. The master determines how many partitions the graph has. 3. The master assigns one or more partitions and a portion of the user input to each worker. 4. The workers run the compute function for active vertices and send messages asynchronously. There is one thread per partition in each worker. When the superstep is finished, each worker tells the master how many vertices will be active in the next superstep.
    25. 25. Source: http://www.cnblogs.com/huangfox/archive/2013/01/03/2843103.html 25
    26. 26. Fault Tolerance • Checkpointing: the master periodically instructs the workers to save the state of their partitions to persistent storage, e.g., vertex values, edge values, incoming messages. • Failure detection: using regular "ping" messages. • Recovery: the master reassigns graph partitions to the currently available workers. The workers all reload their partition state from the most recent available checkpoint.
    28. 28. Application – Page Rank • A = a given page • T1 ... Tn = pages that point to page A (citations) • d = damping factor between 0 and 1 (usually kept as 0.85) • C(T) = number of links going out of T • PR(A) = the PageRank of page A $$PR(A) = (1 - d) + d \left( \frac{PR(T_1)}{C(T_1)} + \frac{PR(T_2)}{C(T_2)} + \cdots + \frac{PR(T_n)}{C(T_n)} \right)$$
    29. 29. Application – Page Rank Source: Wikipedia 29
    30. 30. Application – Page Rank

            // Vertex value and message type are both double:
            // they store and carry PageRank.
            class PageRankVertex : public Vertex<double, void, double> {
             public:
              virtual void Compute(MessageIterator* msgs) {
                if (superstep() >= 1) {
                  double sum = 0;
                  for (; !msgs->Done(); msgs->Next())
                    sum += msgs->Value();
                  *MutableValue() = 0.15 / NumVertices() + 0.85 * sum;
                }
                // For convergence, either there is a limit on the number of
                // supersteps or aggregators are used to detect convergence.
                if (superstep() < 30) {
                  const int64 n = GetOutEdgeIterator().size();
                  SendMessageToAllNeighbors(GetValue() / n);
                } else {
                  VoteToHalt();
                }
              }
            };
    31. 31. Application – Shortest Path

            class ShortestPathVertex : public Vertex<int, int, int> {
              void Compute(MessageIterator* msgs) {
                // INF is a constant larger than any feasible distance.
                int mindist = IsSource(vertex_id()) ? 0 : INF;
                for (; !msgs->Done(); msgs->Next())
                  mindist = min(mindist, msgs->Value());
                // In the 1st superstep, only the source vertex will update
                // its value (from INF to zero).
                if (mindist < GetValue()) {
                  *MutableValue() = mindist;
                  OutEdgeIterator iter = GetOutEdgeIterator();
                  for (; !iter.Done(); iter.Next())
                    SendMessageTo(iter.Target(), mindist + iter.GetValue());
                }
                VoteToHalt();
              }
            };
    32. 32. Example: SSSP in Pregel [Figure sequence, slides 32-40: a step-by-step run of single-source shortest paths on a small example graph.]
    42. 42. Experiments • 300 multicore commodity PCs were used. • Only running time is counted; checkpointing is disabled. • Measures scalability with the number of worker tasks. • Measures scalability with the number of vertices, in binary trees and log-normal random graphs. • Naïve single-source shortest paths (SSSP) implementation, with all edge weights equal to 1.
    43. 43. SSSP on a 1-billion-vertex binary tree, with the number of Pregel workers varied from 50 to 800: the runtime drops from 174 s to 17.3 s, a speedup of about 10 for 16 times the workers.
    44. 44. SSSP on binary trees, with varying graph sizes on 800 worker tasks: the runtime grows from 17.3 s to 702 s. For graphs with a low average outdegree, the runtime increases linearly in the graph size.
    45. 45. SSSP on log-normal random graphs (mean outdegree = 127.1), with varying graph sizes on 800 worker tasks: the runtime increases linearly in the graph size here, too.
    47. 47. Related Work • MapReduce: Pregel is similar in concept to MapReduce, but with a natural graph API and much more efficient support for iterative computations over the graph. • Bulk Synchronous Parallel model: the Oxford BSP Library [38], Green BSP library [21], BSPlib [26], and Paderborn University BSP library. Their scalability and fault tolerance have not been evaluated beyond several dozen machines, and none of them provides a graph-specific API.
    48. 48. Related Work • The closest matches to Pregel are the Parallel Boost Graph Library [22],[23] (unlike it, Pregel provides fault tolerance) and CGMgraph [8] (which uses an object-oriented programming style at some performance cost). • Few systems have reported experimental results for graphs at the scale of billions of vertices.
    50. 50. Conclusion & Future Work • Pregel is a scalable and fault-tolerant platform with an API that is sufficiently flexible to express arbitrary graph algorithms. • Future work: relaxing the synchronicity of the model, so as not to wait for slower workers at inter-superstep barriers; assigning vertices to machines to minimize inter-machine communication; handling dense graphs in which most vertices send messages to most other vertices.
    51. 51. Comment • No comparison with other systems. • The user has to modify Pregel a lot in order to personalize it to his/her needs. • No failure detection is mentioned for the master, making it a single point of failure. 51
    52. 52. Any questions? THANK YOU 52