Batch Graph Processing Frameworks

A comparison of two distributed computation frameworks for the batch processing of graph datasets.



  1. Comparison of Graph Processing Frameworks
     Alex Averbuch, Swedish Institute of Computer Science, averbuch@sics.se
     January 25, 2012
     Alex Averbuch Big Graph Processing 1 / 36
  2. Frameworks Compared
     • Pregel: a system for large-scale graph processing. G. Malewicz, M. H. Austern, A. J. Bik, J. C. Dehnert, I. Horn, N. Leiser, and G. Czajkowski. PODC, 2009.
     • Signal/Collect: graph algorithms for the (semantic) web. P. Stutz, A. Bernstein, and W. Cohen. The Semantic Web - ISWC, 2010.
  3-5. Background — Big Graphs Everywhere
     • Real-world web and social graphs continue to grow
       • 2008 → Google estimates the number of web pages at 1 trillion
       • March 2011 → LinkedIn has over 120 million registered users
       • September 2011 → Twitter has over 100 million active users
       • September 2011 → Facebook has over 800 million active users
     • Data: The New Oil
     • Relevant, personalized user information relies on graph algorithms
       • Popularity rank → determine popular users, news, jobs, etc.
       • Shortest paths → find how users, groups, etc. are connected
       • Clustering → discover related people, groups, interests, etc.
  6-14. Background — The Vertex-Centric Model
     Definition: Vertex-Centric Graph Computing Model
     • computations execute on a compute graph
       • same topology as that of the data graph
       • vertices are computational units
       • edges are communication channels
     • vertices interact with other vertices using messages
     • computation proceeds in iterations; in each iteration, vertices:
       1. perform some computation
       2. communicate with other vertices
     (Slides 7-14 animate the model on a small example graph, one iteration at a time.)
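The iteration loop of the vertex-centric model above can be sketched in a few lines. This is a minimal single-process sketch, not either framework's implementation; the function name `sssp` and the edge representation as (source, target, weight) triples are illustrative assumptions.

```python
def sssp(edges, source, n):
    """Single-source shortest paths in the vertex-centric style:
    in each iteration, vertices fold incoming messages into their
    state and message their out-neighbours on any improvement."""
    INF = float("inf")
    out = {v: [] for v in range(n)}
    for u, v, w in edges:
        out[u].append((v, w))
    dist = {v: INF for v in range(n)}
    inbox = {source: [0]}          # iteration-0 message to the source
    while inbox:                   # terminate: no messages in transit
        next_inbox = {}
        for v, msgs in inbox.items():
            best = min(msgs)       # fold incoming messages
            if best < dist[v]:     # state improved -> notify neighbours
                dist[v] = best
                for t, w in out[v]:
                    next_inbox.setdefault(t, []).append(best + w)
        inbox = next_inbox
    return dist
```

Each `while` pass is one iteration: vertices first compute from their inbox, then communicate, matching the two-step structure on the slide.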
  15. Pregel — Contributions
     • parallel programming model (for processing graphs)
     • distributed execution model (for processing graphs)
     • (limited) evaluation → using big data sets
  16. Pregel — Overview
     • vertex-centric graph computing model
     • in each iteration a compute function is invoked on each vertex; the vertex:
       1. reads messages sent to it in the previous iteration
       2. modifies its state & local graph topology
       3. sends messages to other vertices
       4. votes to halt (to become inactive)
     Vertex state machine: Active → (vote to halt) → Inactive → (message received) → Active
  17. Pregel — Programming Model (Vertex & Edge)
     • Vertex (v)
       • v.id → unique identifier
       • v.state → arbitrary vertex state
       • v.outEdges : List[Edge] → list of edges that have v as source
       • v.compute() : per iteration, calculates new state
         1. reads incoming messages from the previous iteration
         2. sends (an unbounded number of) messages to other vertices
         3. if the destination is non-existent, calls a handler (create vertex / remove edge)
         4. modifies its state and that of its outgoing edges
         5. adds/removes edges to/from outEdges
         6. votes to halt
     • Edge (e)
       • e.targetId → identifier of target vertex
       • e.state → arbitrary edge state
       • no associated computation
  18-19. Pregel — Programming Model (Combiner & Aggregator)
     • Combiner
       • combines multiple messages into one (like a reducer in M/R)
       • combined using a commutative & associative function
       • reduces network traffic & message buffer size
       • e.g. in SSSP a vertex only cares about the length of the shortest path
     • Aggregator
       • globally shared/aggregated state
         1. vertices write to the aggregator variable locally
         2. the globally aggregated value is available to all vertices in the next iteration
       • aggregated using a commutative & associative function
       • pre-defined aggregators: min, max, sum
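The combiner idea above reduces to folding all messages addressed to the same vertex with one commutative, associative function before delivery. A minimal sketch (the function name `combine` and the (target, value) pair encoding are illustrative assumptions):

```python
def combine(outgoing, combiner=min):
    """outgoing: list of (target, value) messages produced in one
    iteration. Returns one combined value per target, e.g. the
    minimum candidate distance in SSSP."""
    combined = {}
    for target, value in outgoing:
        if target in combined:
            combined[target] = combiner(combined[target], value)
        else:
            combined[target] = value
    return combined
```

With `min` as the combiner, three messages to one vertex become a single message, which is exactly the network saving the slide describes for SSSP.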
  20-21. Pregel — Programming Model (Topology Mutations)
     • determinism in the presence of conflicts is achieved by:
       1. partial ordering
         1. remove edges
         2. remove vertices (implicitly removes edges)
         3. add vertices
         4. add edges
       2. conflict handlers
         • example conflict → vertices with the same ID created simultaneously
         • extend conflict_handler() of the Vertex class
         • the same handler is called for all conflict types
     • most topology changes are seen in the next iteration
       • self-mutations (remove out-edge, remove self) are immediate
  22. Pregel — Programming Model Example — Vertex
     Code: Vertex program for Single Source Shortest Path (SSSP)

       class ShortestPathVertex : public Vertex<int, int, int> {
         void Compute(MessageIterator* msgs) {
           // initialization
           int mindist = IsSource(vertex_id()) ? 0 : INF;
           // read incoming messages & update mindist
           for (; !msgs->Done(); msgs->Next())
             mindist = min(mindist, msgs->Value());
           // send updated mindist to neighbors
           if (mindist < GetValue()) {
             *MutableValue() = mindist;
             OutEdgeIterator iter = GetOutEdgeIterator();
             for (; !iter.Done(); iter.Next())
               SendMessageTo(iter.Target(), mindist + iter.GetValue());
           }
           // deactivate unless/until another message arrives
           VoteToHalt();
         }
       };
  23. Pregel — Execution Model
     • vertex scheduling: all active vertices, per iteration
     • termination: no active vertices & no messages in transit

     Scheduler: Pregel (Bulk Synchronous Parallel)
       while (∃v ∈ V : v.active = true) do
         for all v ∈ V parallel do
           if (v.active = true) then
             v.compute()
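The BSP scheduling rule and the vote-to-halt state machine can be sketched together. This is a toy single-process sketch under assumed names (`run_pregel`, a `compute` callback returning new state, outgoing messages, and a halt vote); it is not Pregel's actual API.

```python
def run_pregel(vertices, compute):
    """vertices: dict id -> state. compute(vid, state, msgs) returns
    (new_state, outgoing: list of (target, value), halt: bool).
    Runs until no vertex is active and no messages are in transit."""
    active = set(vertices)
    inbox = {v: [] for v in vertices}
    while active or any(inbox.values()):
        # an incoming message reactivates its target vertex
        active |= {v for v, msgs in inbox.items() if msgs}
        next_inbox = {v: [] for v in vertices}
        for v in list(active):
            state, outgoing, halt = compute(v, vertices[v], inbox[v])
            vertices[v] = state
            for target, value in outgoing:
                next_inbox[target].append(value)
            if halt:
                active.discard(v)    # vote to halt -> inactive
        inbox = next_inbox
    return vertices
```

A classic use is max-value propagation: every vertex broadcasts once, then only re-sends on improvement, and the run ends when the maximum has reached every vertex.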
  24-31. Pregel — Execution — Without Combiner
     (Slides animate SSSP on the example graph; per-iteration statistics:)

       iteration  computing vertices  messages  total operations  total messages
       0          13                  3         13                3
       1          3                   6         16                9
       2          6                   7         22                16
       3          5                   6         27                22
       4          3                   1         30                23
       5          1                   0         31                23

     Totals: 5 iterations, 31 operations, 23 messages.
  32. Pregel — Programming Model Example — Combiner
     Code: Combiner program for Single Source Shortest Path (SSSP)

       class MinIntCombiner : public Combiner<int> {
         virtual void Combine(MessageIterator* msgs) {
           // initialization
           int mindist = INF;
           // read messages & update mindist
           for (; !msgs->Done(); msgs->Next())
             mindist = min(mindist, msgs->Value());
           // only emit the minimum message value (distance)
           Output("combined_source", mindist);
         }
       };
  33-40. Pregel — Execution — With Combiner
     (Same SSSP example graph, with a min-combiner; per-iteration statistics:)

       iteration  computing vertices  messages  total operations  total messages
       0          13                  3         13                3
       1          3                   6         16                9
       2          6                   5         22                14
       3          5                   3         27                17
       4          3                   1         30                18
       5          1                   0         31                18

     Totals: 5 iterations, 31 operations, 18 messages.
  41. Pregel — Execution — Comparison
     • combiner vs no combiner
     • algorithm → Single Source Shortest Path (SSSP)
     • sample graph → 13 vertices / 19 edges
     • cost → iterations, operations, message buffers

     Results: Cost comparison of execution modes (SSSP)

                          Iterations  Operations  Messages
       Pregel             5           31          23
       Pregel + Combiner  5           31          18
  42. Pregel — Architecture
     (Diagram: the master handles synchronization & aggregation; each worker holds a partition with an in-queue and out-queue; the graph dataset is loaded/checkpointed, and (combined) messages flow between workers.)
  43-47. Pregel — Typical Program
     • Client
       1. load input data into workers
       2. notify master to "start processing"
       3. wait for master to complete
       4. extract result data from workers
     • Master
       1. repeat until no active workers
         • signal workers to process
         • wait for all workers to finish
         • update active-worker count
       2. notify client about completion
     • Worker
       1. repeat until inactive
         • wait for "start iteration" from master
         • read data from in-queue
         • perform local processing
         • write data to out-queue & transmit
         • update active/inactive status
         • notify master
  48-50. Pregel — Fault Tolerance
     • logging
       1. checkpointing → state persisted at the beginning of every n-th iteration
         • master persists → progress of execution, aggregate values
         • workers persist → vertex values, edge values, messages
       2. confined recovery → workers log out-messages from their partitions
     • failure detection
       • heartbeats
       • worker gets no heartbeat from master → worker terminates
       • master gets no heartbeat from worker → marks worker as failed
     • failure recovery
       • partition(s) belonging to failed worker(s) are reassigned
       • lost partitions are recovered from checkpoints
       • missing iterations are recomputed (using logged messages)
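The checkpoint-and-recover scheme above can be sketched in miniature: persist state every n-th iteration, and on failure resume from the newest checkpoint and recompute the missing iterations. A toy in-memory sketch with assumed names (`run_with_checkpoints`, `recover`, a `step` function standing in for one iteration):

```python
def run_with_checkpoints(state, step, iterations, n):
    """step(state) -> new state; snapshot a copy every n iterations."""
    checkpoints = {0: dict(state)}
    for i in range(1, iterations + 1):
        state = step(state)
        if i % n == 0:
            checkpoints[i] = dict(state)   # persist at iteration i
    return state, checkpoints

def recover(checkpoints, step, fail_at):
    """Resume from the newest checkpoint at or before fail_at and
    recompute the missing iterations."""
    base = max(i for i in checkpoints if i <= fail_at)
    state = dict(checkpoints[base])
    for _ in range(fail_at - base):
        state = step(state)
    return state
```

Confined recovery corresponds to replaying only the logged messages during the recompute loop instead of re-running every partition.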
  51. Pregel — Evaluation — Scaling
     • algorithm → Single Source Shortest Path (SSSP)
     • hardware → 800 worker tasks scheduled on 300 multicore machines
     • graph
       • random, log-normal out-degree distribution (mean = 127.1)
       • up to 1,000,000,000 vertices / 127,000,000,000 edges
     (Figure: runtime in seconds, axis 100-800, vs. number of vertices in millions, axis 200-1,000.)
  52. Signal/Collect — Contributions
     • parallel programming model (for processing graphs)
     • parallel execution model (for processing graphs)
     • (limited) evaluation → benefits of various scheduling policies
  53. Signal/Collect — Programming Model (Vertex)
     • Vertex (v)
       • v.id → unique identifier
       • v.state → arbitrary vertex state
       • v.lastSignalState → v.state at the time of the last signal()
       • v.outEdges : List[Edge] → list of edges that have v as source
       • v.signalMap : Map(vid, signal) → last received messages
         • vid - identifier of the sender vertex
         • signal - last received signal from that vertex
       • v.uncollectedSignals : List[signal] → signals received since collect() was last executed
       • v.collect() : calculates new vertex state
         1. collect incoming signals
         2. process those signals (possibly using v.state)
         3. return the new vertex state
  54. Signal/Collect — Programming Model (Edge)
     • Edge (e)
       • e.source → source vertex
       • e.sourceId → identifier of source vertex (e.source.id)
       • e.targetId → identifier of target vertex
       • e.state → arbitrary edge state
       • e.signal() → calculates the signal to send, then sends it
       • signals are sent along edges of the compute graph
  55. Signal/Collect — Programming Model Example
     Code: Single Source Shortest Path (SSSP)

       class Location(id: Any, initialState: Int) extends Vertex {
         def collect: Int = min(state, min(uncollectedSignals))
       }

       class Path(sourceId: Any, targetId: Any) extends Edge {
         def signal: Int = source.state + weight
       }

     • vertex state → shortest known path length to the vertex from the source
     • edge state (weight) → path length of that individual edge
     • signal → shortest known path length from the source through the edge
  56-57. Signal/Collect — Execution Model
     • different execution modes (scheduling policies)
       1. synchronous
       2. synchronous score-guided
       3. asynchronous
       4. asynchronous scheduled
     • the execution mode dictates when signal & collect are called

     Definition: Internal methods used by the execution engine
       procedure v.executeSignalOperation()
         lastSignalState ← state
         for all e ∈ outGoingEdges do
           e.target.uncollectedSignals.append(e.signal())
           e.target.signalMap.put(e.sourceId, e.signal())
       procedure v.executeCollectOperation()
         state ← collect()
         uncollectedSignals ← Nil
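The two internal methods above translate almost line for line into code. A minimal sketch (field names follow the slides; the class layout and the one-edge constructor are illustrative assumptions, not the Signal/Collect API):

```python
class Vertex:
    def __init__(self, vid, state):
        self.id, self.state = vid, state
        self.last_signal_state = None
        self.out_edges = []            # edges with this vertex as source
        self.signal_map = {}           # sender id -> last signal
        self.uncollected_signals = []  # signals since last collect()

    def collect(self):                 # overridden per algorithm
        raise NotImplementedError

    def execute_signal_operation(self):
        self.last_signal_state = self.state
        for e in self.out_edges:
            s = e.signal()
            e.target.uncollected_signals.append(s)
            e.target.signal_map[e.source.id] = s

    def execute_collect_operation(self):
        self.state = self.collect()
        self.uncollected_signals = []

class Edge:
    def __init__(self, source, target, weight):
        self.source, self.target, self.weight = source, target, weight
        source.out_edges.append(self)

    def signal(self):                  # SSSP signal from the example slide
        return self.source.state + self.weight
```

Subclassing `Vertex` with the SSSP `collect` (minimum of current state and uncollected signals) reproduces the Scala example on slide 55.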
  58. Signal/Collect — Execution — Synchronous
     • vertex scheduling: all vertices (unordered) per iteration
     • termination: all iterations

     Scheduler: Synchronous
       for i ← 1 to num_iterations do
         for all v ∈ V parallel do
           v.executeSignalOperation()
         for all v ∈ V parallel do
           v.executeCollectOperation()
  59-65. Signal/Collect — Execution — Synchronous
     (Slides animate SSSP on the example graph; per-iteration statistics:)

       iteration  signaling vertices  collecting vertices  messages  total operations  total messages
       0          13                  13                   19        26                19
       1          13                  13                   19        52                38
       2          13                  13                   19        78                57
       3          13                  13                   19        104               76
       4          13                  13                   19        130               95

     Totals: 4 iterations, 130 operations, 95 messages.
  66-68. Signal/Collect — Execution — Synchronous Score-Guided
     • vertex scheduling: all active vertices (unordered) per iteration
     • termination: all iterations or (no signal and no collect)
     • extension v.signalScore() "importance for the vertex to signal"
       • may change ⟺ v.state changes
       • default → 1 if changed, 0 otherwise
     • extension v.collectScore() "importance for the vertex to collect"
       • may change ⟺ v.uncollectedSignals changes
       • default → v.uncollectedSignals.size()

     Scheduler: Synchronous Score-Guided
       done ← false
       while (iterations < num_iterations) ∧ (done = false) do
         done ← true
         iterations ← iterations + 1
         for all v ∈ V parallel do
           if v.signalScore() > signal_threshold then
             done ← false
             v.executeSignalOperation()
         for all v ∈ V parallel do
           if v.collectScore() > collect_threshold then
             done ← false
             v.executeCollectOperation()
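The score-guided loop above can be sketched generically: each iteration, only vertices whose scores exceed the thresholds act, and the run stops early once no vertex does either. The function name and the callback-based signature are illustrative assumptions.

```python
def run_score_guided(vertices, signal_op, collect_op,
                     signal_score, collect_score, max_iterations,
                     signal_threshold=0, collect_threshold=0):
    """Synchronous score-guided scheduler: stop after max_iterations
    or as soon as an iteration performs no signal and no collect."""
    iterations, done = 0, False
    while iterations < max_iterations and not done:
        done = True
        iterations += 1
        for v in vertices:
            if signal_score(v) > signal_threshold:
                done = False
                signal_op(v)
        for v in vertices:
            if collect_score(v) > collect_threshold:
                done = False
                collect_op(v)
    return iterations
```

The final, empty iteration is what detects convergence, which is why the returned count is one higher than the number of productive iterations.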
  69-75. Signal/Collect — Execution — Synchronous Score-Guided
     (scoreSignal: state != lastCollectedState; scoreCollect: uncollectedSignals > 0; per-iteration statistics:)

       iteration  signaling vertices  collecting vertices  messages  total operations  total messages
       0          1                   3                    3         4                 3
       1          3                   6                    6         13                9
       2          6                   5                    7         24                16
       3          4                   3                    6         31                22
       4          2                   1                    1         34                23

     Totals: 4 iterations, 34 operations, 23 messages.
  76-78. Signal/Collect — Execution — Asynchronous
     • vertex scheduling: random
     • termination: max operations or (no signal and no collect)
     • no guarantee on the order of execution
       • some vertices may signal while others collect
       • no guarantee that all vertices are executed the same number of times
     • asynchronous mode is not usable for every algorithm
       • use when correctness does not depend on strict execution order

     Scheduler: Asynchronous
       ops ← 0
       while (ops < num_ops) ∧ ∃v ∈ V : (v.signalScore > signal_threshold) ∨ (v.collectScore > collect_threshold) do
         S ← random subset of V
         for all v ∈ S do
           next ← random(signal/collect)
           if (next = signal) ∧ (v.signalScore > signal_threshold) then
             v.executeSignalOperation()
             ops ← ops + 1
           else if (next = collect) ∧ (v.collectScore > collect_threshold) then
             v.executeCollectOperation()
             ops ← ops + 1
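The asynchronous scheduler above can be sketched the same way: repeatedly pick a random subset of vertices and, for each, randomly attempt either a signal or a collect, guarded by its score. A toy sketch with assumed names; the seeded RNG is only for reproducibility.

```python
import random

def run_async(vertices, signal_op, collect_op, signal_score,
              collect_score, max_ops,
              signal_threshold=0, collect_threshold=0, seed=0):
    """Asynchronous scheduler: terminate at max_ops or when no vertex
    can signal or collect. No ordering guarantees across vertices."""
    rng = random.Random(seed)
    ops = 0
    def eligible():
        return [v for v in vertices
                if signal_score(v) > signal_threshold
                or collect_score(v) > collect_threshold]
    while ops < max_ops and eligible():
        subset = rng.sample(vertices, rng.randint(1, len(vertices)))
        for v in subset:
            if ops >= max_ops:
                break
            if rng.random() < 0.5:       # randomly pick signal/collect
                if signal_score(v) > signal_threshold:
                    signal_op(v)
                    ops += 1
            else:
                if collect_score(v) > collect_threshold:
                    collect_op(v)
                    ops += 1
    return ops
```

Because only score-guarded operations count toward `ops`, the operation total reflects useful work, matching the cost metric used in the execution traces.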
  79-81. Signal/Collect — Execution — Asynchronous Scheduled
     • vertex scheduling: scheduler-dependent
     • termination: scheduler-dependent
     • scheduler: schedules vertices' signal & collect operations
     • e.g. an "eager" scheduler → tries to signal right after collection

     Scheduler: "Eager"
       for all v ∈ V do
         if (v.collectScore() > collect_threshold) then
           v.executeCollectOperation()
         if (v.signalScore() > signal_threshold) then
           v.executeSignalOperation()

     • the benefit of this execution mode depends on:
       1. impact on convergence (number of operations)
       2. cost of operations
       3. cost of scheduling
  82. Signal/Collect — Execution — Asynchronous Scheduled
     Scheduler: "SSSP" — minimize messages & computation
       Signal ← {v_source}                       // sorted set
       while (Signal ≠ {}) do
         for top k v ∈ Signal do
           v.executeSignalOperation()
           Signal.remove(v)
         for all v ∈ V do
           if (v.collectScore > collect_threshold) then
             v.executeCollectOperation()
           if (v.signalScore > signal_threshold) then
             Signal.put(v)
         if (v_destination.state ≤ min(Signal)) then
           Signal ← {}

       // returns the distance of the shortest next step
       function signalSort(v)
         Distances ← {∞}
         for all e ∈ v.outEdges do
           Distances.put(e.signal)
         return min(Distances)
  83-84. Signal/Collect — Execution — Asynchronous Scheduled
     (scoreSignal: state != lastCollectedState; scoreCollect: uncollectedSignals > 0; per-iteration statistics:)

       iteration  signaling vertices  collecting vertices  messages  total operations  total messages
       0          1                   3                    3         4                 3
