Batch Graph Processing Frameworks
1. Comparison of Graph Processing Frameworks
Alex Averbuch
Swedish Institute of Computer Science
averbuch@sics.se
January 25, 2012
Alex Averbuch Big Graph Processing 1 / 36
2. Frameworks Compared
• Pregel: a system for large-scale graph processing. G. Malewicz, M.H. Austern, A.J. Bik, J.C. Dehnert, I. Horn, N. Leiser, and G. Czajkowski. PODC, 2009.
• Signal/Collect: graph algorithms for the (semantic) web. P. Stutz, A. Bernstein, and W. Cohen. The Semantic Web — ISWC, 2010.
5. Background — Big Graphs Everywhere
• Real-world web and social graphs continue to grow
• 2008 → Google estimates number of web pages at 1 trillion
• March 2011 → LinkedIn has over 120 million registered users
• September 2011 → Twitter has over 100 million active users
• September 2011 → Facebook has over 800 million active users
Data: The New Oil
• Relevant, personalized user information relies on graph algorithms
• Popularity rank → determine popular users, news, jobs, etc.
• Shortest paths → find how users, groups, etc. are connected
• Clustering → discover related people, groups, interests, etc.
6. Background — The Vertex Centric Model
Definition: Vertex Centric Graph Computing Model
• computations execute on a compute graph
• same topology as that of the data graph
• vertices are computational units
• edges are communication channels
• vertices interact with other vertices using messages
• computation proceeds in iterations; in each iteration, vertices:
1 perform some computation
2 communicate with other vertices
[Figure: worked SSSP example on a small weighted graph — the vertex states start at (0, –, –, –) and converge to the shortest-path distances (0, 3, 1, 4) over iterations 0–2]
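The worked example above can be reproduced in a few lines. This is a minimal single-process sketch of the vertex-centric model (the graph topology and variable names are invented for illustration; the weights are chosen so the vertex states converge to (0, 3, 1, 4) as in the figure):

```python
INF = float("inf")

# Hypothetical weighted digraph: adjacency list of (target, weight) pairs.
graph = {0: [(1, 4), (2, 1)], 1: [(3, 1)], 2: [(1, 2), (3, 5)], 3: []}
state = {v: INF for v in graph}   # per-vertex state: best known distance
inbox = {0: [0]}                  # seed the source vertex with distance 0

# One iteration = every vertex reads last iteration's messages, performs
# some computation, and communicates with its out-neighbours.
while inbox:
    outbox = {}
    for v, msgs in inbox.items():
        mindist = min(msgs)
        if mindist < state[v]:               # state improved: propagate
            state[v] = mindist
            for target, weight in graph[v]:
                outbox.setdefault(target, []).append(mindist + weight)
    inbox = outbox                           # messages feed the next iteration

print(state)   # {0: 0, 1: 3, 2: 1, 3: 4}
```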
15. Pregel — Contributions
• parallel programming model (for processing graphs)
• distributed execution model (for processing graphs)
• (limited) evaluation → using big data sets
16. Pregel — Overview
• vertex centric graph computing model
• in each iteration a compute function is invoked on each vertex
1 reads messages sent to it in previous iteration
2 modifies its state & local graph topology
3 sends messages to other vertices
4 votes to halt (to become inactive)
[Figure: Vertex State Machine — “vote to halt” moves a vertex from Active to Inactive; a received message moves it back to Active]
17. Pregel — Programming Model (Vertex & Edge)
• Vertex (v)
• v.id → unique identifier
• v.state → arbitrary vertex state
• v.outEdges : List[Edge] → list of edges that have v as source
• v.compute() : per iteration, calculates new state
1 reads incoming messages, from previous iteration
2 sends (unbounded number of) messages to other vertices
3 if destination non-existent, call handler (create vertex/remove edge)
4 modifies its state and that of its outgoing edges
5 adds/removes edges to/from outEdges
6 votes to halt
• Edge (e)
• e.targetId → identifier of target vertex
• e.state → arbitrary edge state
• no associated computation
19. Pregel — Programming Model (Combiner & Aggregator)
• Combiner
• combines multiple messages into one (like reducer in M/R)
• combined using commutative & associative function
• reduces network traffic & message buffer size
• e.g. in SSSP vertex only cares about length of shortest path
• Aggregator
• globally shared/aggregated state
1 vertices write to aggregator variable locally
2 globally aggregated value available to all vertices in next iteration
• aggregated using commutative & associative function
• pre-defined aggregators: min, max, sum
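The aggregator mechanism can be sketched as follows (the class and method names are invented, not Pregel’s API): values written during an iteration are folded with a commutative, associative function at the iteration barrier, and the aggregate only becomes visible in the next iteration.

```python
from functools import reduce

class MinAggregator:
    """Sketch of a Pregel-style 'min' aggregator (hypothetical structure)."""
    def __init__(self):
        self.pending = []   # values written during the current iteration
        self.value = None   # aggregate from the previous iteration

    def write(self, v):
        # called locally by vertices while they compute
        self.pending.append(v)

    def end_of_iteration(self):
        # min is commutative & associative, so write order is irrelevant
        self.value = reduce(min, self.pending)
        self.pending = []

agg = MinAggregator()
for v in [7, 3, 9]:        # three vertices write local values
    agg.write(v)
agg.end_of_iteration()
print(agg.value)           # 3 — visible to all vertices next iteration
```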
21. Pregel — Programming Model (Topology Mutations)
• determinism in the presence of conflicts is achieved by:
1 partial ordering
1 remove edges
2 remove vertices (implicitly removes edges)
3 add vertices
4 add edges
2 conflict handlers
• example conflict → vertices with same ID created simultaneously
• extend conflict handler() of Vertex class
• same handler called for all conflict types
• most topology changes are seen in next iteration
• self mutations (remove out edge, remove self) are immediate
22. Pregel — Programming Model Example — Vertex
Code: Vertex program for Single Source Shortest Path (SSSP)
class ShortestPathVertex : public Vertex<int, int, int> {
  void Compute(MessageIterator* msgs) {
    // initialization
    int mindist = IsSource(vertex_id()) ? 0 : INF;
    // read incoming messages & update mindist
    for (; !msgs->Done(); msgs->Next())
      mindist = min(mindist, msgs->Value());
    // send updated mindist to neighbors
    if (mindist < GetValue()) {
      *MutableValue() = mindist;
      OutEdgeIterator iter = GetOutEdgeIterator();
      for (; !iter.Done(); iter.Next())
        SendMessageTo(iter.Target(), mindist + iter.GetValue());
    }
    // deactivate unless/until another message arrives
    VoteToHalt();
  }
};
23. Pregel — Execution Model
• vertex scheduling: all active vertices, per iteration
• termination: no active vertices & no messages in transit
Scheduler: Pregel (Bulk Synchronous Parallel)
while (∃v ∈ V : v.active = true) do
for all v ∈ V parallel do
if (v.active = true) then
v.compute()
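The BSP scheduler and the vertex state machine combine into one runnable sketch (all structure here is invented for illustration, not Pregel’s actual API): inactive vertices are skipped, a vote to halt deactivates a vertex, and an incoming message reactivates it.

```python
def run_pregel(vertices, compute):
    """vertices: {id: state}; compute(vid, state, msgs) -> (state, out_msgs, halt)."""
    active = set(vertices)                    # all vertices start active
    inbox = {vid: [] for vid in vertices}
    iterations = 0
    while active:                             # terminate: no active vertices
        outbox = {vid: [] for vid in vertices}
        for vid in list(active):
            state, out, halt = compute(vid, vertices[vid], inbox[vid])
            vertices[vid] = state
            for target, msg in out:
                outbox[target].append(msg)
            if halt:
                active.discard(vid)           # vote to halt -> inactive
        inbox = outbox
        # a received message reactivates an inactive vertex
        active |= {vid for vid, msgs in inbox.items() if msgs}
        iterations += 1
    return iterations

# Demo: propagate the maximum value over a 3-vertex path graph.
graph = {0: [1], 1: [0, 2], 2: [1]}
values = {0: 5, 1: 9, 2: 2}

def compute(vid, state, msgs):
    new = max([state] + msgs)
    # message neighbours on the first iteration or when our value improved
    out = [(t, new) for t in graph[vid]] if (not msgs or new > state) else []
    return new, out, True                     # always vote to halt

iters = run_pregel(values, compute)
print(values)   # {0: 9, 1: 9, 2: 9}
```

In real Pregel, compute() is a Vertex method and workers run in parallel; this sequential loop only illustrates the scheduling and termination semantics.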
31. Pregel — Execution — Without Combiner
[Figure: SSSP execution trace on a small weighted graph — mode: pregel, total iterations: 5, total operations: 31, total messages: 23]
32. Pregel — Programming Model Example — Combiner
Code: Combiner program for Single Source Shortest Path (SSSP)
class MinIntCombiner : public Combiner<int> {
  virtual void Combine(MessageIterator* msgs) {
    // initialization
    int mindist = INF;
    // read messages & update mindist
    for (; !msgs->Done(); msgs->Next())
      mindist = min(mindist, msgs->Value());
    // only emit minimum message value (distance)
    Output("combined_source", mindist);
  }
};
47. Pregel — Typical Program
• Client
1 load input data into workers
2 notify master to “start processing”
3 wait for master to complete
4 extract result data from workers
• Master
1 repeat until no active workers
• signal workers to process
• wait for all workers to finish
• update active-worker count
2 notify client about completion
• Worker
1 repeat until inactive
• wait for “start iteration” from master
• read data from in-queue
• perform local processing
• write data to out-queue & transmit
• update active/inactive status
• notify master
50. Pregel — Fault Tolerance
• logging
1 checkpointing → state persisted at beginning of every n-th iteration
• master persists → progress of execution, aggregate values
• workers persist → vertex values, edge values, messages
2 confined recovery → workers log out-messages from their partitions
• failure detection
• heart beats
• worker gets no heartbeat from master → worker terminates
• master gets no heartbeat from worker → marks worker as failed
• failure recovery
• partition(s) belonging to failed worker(s) are reassigned
• lost partitions recovered from checkpoints
• missing iterations recomputed (using logged messages)
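The checkpointing half of this scheme can be sketched as follows (structure invented for illustration; the confined-recovery message log is omitted, so this restores and replays a whole deterministic computation rather than only a failed partition):

```python
import copy

def run(state, step, upto, checkpoint_every=2):
    """Run `upto` iterations, persisting the state every n-th iteration."""
    checkpoints = {}
    for i in range(upto):
        if i % checkpoint_every == 0:
            checkpoints[i] = copy.deepcopy(state)   # persisted snapshot
        state = step(state)
    return state, checkpoints

def recover(checkpoints, step, target):
    """Restore the latest checkpoint <= target, recompute missing iterations."""
    start = max(i for i in checkpoints if i <= target)
    state = copy.deepcopy(checkpoints[start])
    for _ in range(start, target):
        state = step(state)
    return state

step = lambda s: {v: d + 1 for v, d in s.items()}   # toy deterministic step
final, cps = run({"a": 0}, step, upto=5)
print(recover(cps, step, 5) == final)   # recovery reproduces the lost state
```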
51. Pregel — Evaluation — Scaling
• algorithm → Single Source Shortest Path (SSSP)
• hardware → 800 worker tasks scheduled on 300 multicore machines
• graph
• random, log-normal out-degree distribution (mean = 127.1)
• up to 1,000,000,000 vertices / 127,000,000,000 edges
Results: Scalability of Pregel (SSSP)
[Figure: runtime (seconds, 100–800) vs. number of vertices (millions, 200–1000) for the SSSP workload]
52. Signal/Collect — Contributions
• parallel programming model (for processing graphs)
• parallel execution model (for processing graphs)
• (limited) evaluation → benefits of various scheduling policies
53. Signal/Collect — Programming Model (Vertex)
• Vertex (v)
• v.id → unique identifier
• v.state → arbitrary vertex state
• v.lastSignalState → v.state at time of last signal()
• v.outEdges : List[Edge] → list of edges that have v as source
• v.signalMap : Map(vid,signal) → last received messages
• vid - identifier of sender vertex
• signal - last received signal from that vertex
• v.uncollectedSignals : List[signal] → list of signals
received since collect() was last executed
• v.collect() : calculates new vertex state
1 collect incoming signals
2 process those signals (possibly using v.state)
3 return new vertex state
54. Signal/Collect — Programming Model (Edge)
• Edge (e)
• e.source → source vertex
• e.sourceId → identifier of source vertex (e.source.id)
• e.targetId → identifier of target vertex
• e.state → arbitrary edge state
• e.signal() → calculates the signal to send, then sends it
• signals are sent along edges of the compute graph
55. Signal/Collect — Programming Model Example
Code: Single Source Shortest Path (SSSP)
class Location(id: Any, initialState: Int) extends Vertex {
  def collect: Int = min(state, min(uncollectedSignals))
}
class Path(sourceId: Any, targetId: Any) extends Edge {
  def signal: Int = source.state + weight
}
• vertex state → shortest known path length to vertex from source
• edge state (weight) → path length of that individual edge
• signal → shortest known path length from source through edge
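The Scala fragment above translates almost line for line into a runnable sketch (the graph, weights, and the fixed synchronous schedule here are invented for illustration):

```python
INF = float("inf")

# Hypothetical graph: vertex -> state; edges as (source, target, weight).
state = {"s": 0, "a": INF, "b": INF}
edges = [("s", "a", 1), ("s", "b", 4), ("a", "b", 2)]
uncollected = {v: [] for v in state}

def signal(source, weight):
    # shortest known path from the source, extended through this edge
    return state[source] + weight

def collect(v):
    # new state: min of the old state and every uncollected signal
    return min([state[v]] + uncollected[v])

# one synchronous iteration: all edges signal, then all vertices collect
for _ in range(len(state)):
    for src, tgt, w in edges:
        uncollected[tgt].append(signal(src, w))
    for v in state:
        state[v] = collect(v)
        uncollected[v] = []

print(state)   # {'s': 0, 'a': 1, 'b': 3}
```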
57. Signal/Collect — Execution Model
• different Execution Modes (scheduling policies)
1 synchronous
2 synchronous score-guided
3 asynchronous
4 asynchronous scheduled
• execution mode dictates when signal & collect are called
Definition: Internal methods used by execution engine
procedure v.executeSignalOperation()
lastSignalState ← state
for all e ∈ outGoingEdges do
e.target.uncollectedSignals.append(e.signal())
e.target.signalMap.put(e.sourceId,e.signal())
procedure v.executeCollectOperation()
state ← collect()
uncollectedSignals ← Nil
58. Signal/Collect — Execution — Synchronous
• vertex scheduling: all vertices (unordered) per iteration
• termination: all iterations
Scheduler: Synchronous
for i ← 1 to num iterations do
for all v ∈ V parallel do
v.executeSignalOperation()
for all v ∈ V parallel do
v.executeCollectOperation()
67. Signal/Collect — Execution — Synchronous Guided
• vertex scheduling: all active vertices (unordered) per iteration
• termination: all iterations or (no signal and no collect)
• extension v.signalScore() “importance for vertex to signal”
• may change ⇐⇒ v.state changes
• default → 1 if changed, 0 otherwise
• extension v.collectScore() “importance for vertex to collect”
• may change ⇐⇒ v.uncollectedSignals changes
• default → v.uncollectedSignals.size()
Scheduler: Synchronous Score-Guided
done ← false
while (iterations < num iterations) ∧ (done = false) do
done ← true
iterations ← iterations + 1
for all v ∈ V parallel do
if v.signalScore() > signal threshold then
done ← false
v.executeSignalOperation()
for all v ∈ V parallel do
if v.collectScore() > collect threshold then
done ← false
v.executeCollectOperation()
77. Signal/Collect — Execution — Asynchronous
• vertex scheduling: random
• termination: max operations or (no signal and no collect)
• no guarantee on order of execution
• some vertices may signal while others collect
• no guarantee that all vertices are executed same amount of times
• asynchronous mode not usable for every algorithm
• use when correctness not dependent on strict execution order
Scheduler: Asynchronous
ops ← 0
while (ops < num ops) ∧ ∃v ∈ V :
(v.signalScore > signal threshold) ∨ (v.collectScore > collect threshold) do
S ← random subset of V
for all v ∈ S do
next ← random(signal/collect)
if (next = signal) ∧ (v.signalScore > signal threshold) then
v.executeSignalOperation()
ops ← ops + 1
else if (next = collect) ∧ (v.collectScore > collect threshold) then
v.executeCollectOperation()
ops ← ops + 1
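A runnable sketch of the asynchronous mode (graph, score functions, and subset size are invented for illustration). Because SSSP folds signals with min, the converged result is schedule-independent, which is exactly what makes it safe to run asynchronously:

```python
import random

random.seed(0)                      # deterministic demo run
INF = float("inf")
graph = {"s": [("a", 1), ("b", 4)], "a": [("b", 2)], "b": []}
state = {"s": 0, "a": INF, "b": INF}
last_signalled = {v: None for v in graph}
uncollected = {v: [] for v in graph}

def signal_score(v):                # default: 1 if state changed, else 0
    return 0 if state[v] == last_signalled[v] else 1

def collect_score(v):               # default: number of uncollected signals
    return len(uncollected[v])

# terminate when no vertex wants to signal or collect
while any(signal_score(v) > 0 or collect_score(v) > 0 for v in graph):
    for v in random.sample(list(graph), 2):          # random subset of V
        if random.random() < 0.5 and signal_score(v) > 0:
            last_signalled[v] = state[v]
            for target, weight in graph[v]:
                uncollected[target].append(state[v] + weight)
        elif collect_score(v) > 0:
            state[v] = min([state[v]] + uncollected[v])
            uncollected[v] = []

print(state)   # {'s': 0, 'a': 1, 'b': 3} regardless of schedule
```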
80. Signal/Collect — Execution — Asynchronous Scheduled
• vertex scheduling: scheduler-dependent
• termination: scheduler-dependent
• scheduler: schedules vertices’ signal & collect operations
• e.g. “eager” scheduler → tries to signal right after collection
Scheduler: “Eager”
for all v ∈ V do
if (v.collectScore() > collect threshold) then
v.executeCollectOperation()
if (v.signalScore() > signal threshold) then
v.executeSignalOperation()
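The eager policy can be sketched as follows (graph and seed signal invented for illustration): because a vertex signals immediately after collecting, an update can travel several hops within a single sweep over the vertices.

```python
INF = float("inf")
graph = {"s": [("a", 1)], "a": [("b", 2)], "b": []}
state = {"s": 0, "a": INF, "b": INF}
uncollected = {"a": [1]}            # assume s has already signalled once

def eager_sweep():
    for v in graph:                             # one ordered pass
        if uncollected.get(v):                  # collectScore > 0
            state[v] = min([state[v]] + uncollected.pop(v))
            # state just changed (signalScore > 0): signal immediately
            for target, weight in graph[v]:
                uncollected.setdefault(target, []).append(state[v] + weight)

eager_sweep()
print(state)   # {'s': 0, 'a': 1, 'b': 3} — b updated in the same sweep
```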
• benefit of this execution mode depends on:
1 impact on convergence (number of operations)
2 cost of operations
3 cost of scheduling
82. Signal/Collect — Execution — Asynchronous Scheduled
Scheduler: “SSSP” — minimize messages & computation
Signal ← {v_source} // sorted set
while (Signal ≠ {}) do
for top k v ∈ Signal do
v.executeSignalOperation()
Signal.remove(v)
for all v ∈ V do
if (v.collectScore > collect threshold) then
v.executeCollectOperation()
if (v.signalScore > signal threshold) then
Signal.put(v)
if (v_destination.state ≤ min(Signal)) then
Signal ← {}
// returns distance of shortest next step
function signalSort(v)
Distances ← {∞}
for all e ∈ v.outEdges do
Distances.put(e.signal)
return min(Distances)