Comparison of Graph Processing Frameworks

                    Alex Averbuch

           Swedish Institute of Computer Science

                      averbuch@sics.se


                  January 25, 2012




                Alex Averbuch   Big Graph Processing   1 / 36
Frameworks Compared




   • Pregel: a system for large-scale graph processing. G. Malewicz,
     M.H. Austern, AJ Bik, J.C. Dehnert, I. Horn, N. Leiser, and G.
     Czajkowski. PODC, 2009.
   • Signal/collect: graph algorithms for the (semantic) web. P.
     Stutz, A. Bernstein, and W. Cohen. The Semantic Web - ISWC,
     2010.




Background — Big Graphs Everywhere
   • Real world web and social graphs continue to grow
       • 2008 → Google estimates number of web pages at 1 trillion
       • March 2011 → LinkedIn has over 120 million registered users
       • September 2011 → Twitter has over 100 million active users
       • September 2011 → Facebook has over 800 million active users

                   Data: The New Oil




   • Relevant, personalized user information relies on graph algorithms
       • Popularity rank → determine popular users, news, jobs, etc.
       • Shortest paths → find how users, groups, etc. are connected
       • Clustering → discover related people, groups, interests, etc.

Background — The Vertex Centric Model



  Definition: Vertex Centric Graph Computing Model
    • computations execute on a compute graph
        • same topology as that of data graph
        • vertices are computational units
        • edges are communication channels
        • vertices interact with other vertices using messages
    • computation proceeds in iterations; in each iteration, vertices:
        1 perform some computation
        2 communicate with other vertices
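
  The definition above can be sketched in C++ roughly as follows. This is only an illustration, not code from either framework: the `Edge` struct and the `run_sssp` function are hypothetical names, a shortest-path computation is assumed as the per-vertex computation, and edge weights are assumed to be non-negative.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical types for illustration; not taken from any framework.
struct Edge {
  std::size_t target;  // index of the destination vertex
  int weight;          // edge value, used here as a distance
};

constexpr int kInf = std::numeric_limits<int>::max();

// Runs iterations until no messages are produced. In each iteration every
// vertex (1) computes on the messages it received in the previous iteration
// and (2) communicates along its outgoing edges.
std::vector<int> run_sssp(const std::vector<std::vector<Edge>>& out_edges,
                          std::size_t source) {
  assert(source < out_edges.size());
  const std::size_t n = out_edges.size();
  std::vector<int> value(n, kInf);
  std::vector<std::vector<int>> inbox(n);
  inbox[source].push_back(0);  // seed: the source is at distance 0
  bool any_message = true;
  while (any_message) {
    any_message = false;
    std::vector<std::vector<int>> next_inbox(n);
    for (std::size_t v = 0; v < n; ++v) {
      // 1. perform some computation: keep the smallest proposed distance
      int best = value[v];
      for (int m : inbox[v]) best = (m < best) ? m : best;
      if (best < value[v]) {
        value[v] = best;
        // 2. communicate with other vertices along outgoing edges
        for (const Edge& e : out_edges[v]) {
          next_inbox[e.target].push_back(best + e.weight);
          any_message = true;
        }
      }
    }
    inbox.swap(next_inbox);  // messages become visible in the next iteration
  }
  return value;
}
```

  On an assumed four-vertex graph with edges 0→1 (weight 4), 0→2 (weight 1), 2→1 (weight 2) and 1→3 (weight 1), `run_sssp(graph, 0)` returns the distances 0, 3, 1, 4. The topology is a guess chosen to be consistent with the edge weights and vertex values shown in the animation on this slide.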




  [Figure: animation of the model on a four-vertex example graph with edge
  weights 4, 2, 1 and 1. Vertex values start as 0, -, -, -]

     iteration 0: messages sent 4, 1   → vertex values 0, 4, 1, -
     iteration 1: messages sent 5, 3   → vertex values 0, 3, 1, 5
     iteration 2: message sent 4       → vertex values 0, 3, 1, 4

Pregel — Contributions




    • parallel programming model (for processing graphs)
    • distributed execution model (for processing graphs)
    • (limited) evaluation → using big data sets




Pregel — Overview


   • vertex centric graph computing model
   • in each iteration a compute function is invoked on each vertex
        1 reads messages sent to it in previous iteration
        2 modifies its state & local graph topology
        3 sends messages to other vertices
        4 votes to halt (to become inactive)

                       Active   → (vote to halt)     → Inactive
                       Inactive → (message received) → Active

                           Vertex State Machine
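
   A minimal sketch of this state machine in C++ is given below. The enum and function names are hypothetical. The slide names only the two transitions, so the behaviour for the remaining cases (an active vertex that receives a message stays active, an inactive vertex that votes to halt stays inactive) is an assumption.

```cpp
#include <cassert>

// Hypothetical names; the slide only names two states and two transitions.
enum class VertexState { Active, Inactive };

enum class VertexEvent { VoteToHalt, MessageReceived };

// Returns the state that follows `s` when event `e` occurs.
inline VertexState next_state(VertexState s, VertexEvent e) {
  switch (e) {
    case VertexEvent::VoteToHalt:
      return VertexState::Inactive;  // Active -> Inactive
    case VertexEvent::MessageReceived:
      return VertexState::Active;    // Inactive -> Active
  }
  return s;  // not reached for the two events above
}
```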




Pregel — Programming Model (Vertex & Edge)


   • Vertex (v)
       • v.id → unique identifier
       • v.state → arbitrary vertex state
       • v.outEdges : List[Edge] → list of edges that have v as source
       • v.compute() : per iteration, calculates new state
            1   reads incoming messages, from previous iteration
            2   sends (unbounded number of) messages to other vertices
            3   if destination non-existent, call handler (create vertex/remove edge)
            4   modifies its state and that of its outgoing edges
            5   adds/removes edges to/from outEdges
            6   votes to halt
   • Edge (e)
       • e.targetId → identifier of target vertex
       • e.state → arbitrary edge state
       • no associated computation




Pregel — Programming Model (Combiner & Aggregator)



   • Combiner
       • combines multiple messages into one (like reducer in M/R)
       • combined using commutative & associative function
       • reduces network traffic & message buffer size
       • e.g. in SSSP vertex only cares about length of shortest path
   • Aggregator
       • globally shared/aggregated state
            1 vertices write to aggregator variable locally
            2 globally aggregated value available to all vertices in next iteration
        • aggregated using commutative & associative function
        • pre-defined aggregators: min, max, sum
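
   The two-step aggregator behaviour above can be sketched as follows. This is a single-process illustration with a hypothetical `Aggregator` class, not Pregel's interface. In a distributed setting partial values from different workers would be merged with the same function, which is why it needs to be commutative and associative.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical aggregator; names and interface are illustrative only.
template <typename T>
class Aggregator {
 public:
  Aggregator(T identity, std::function<T(T, T)> op)
      : identity_(identity), op_(std::move(op)),
        pending_(identity), visible_(identity) {}

  // Step 1: called by vertices during an iteration.
  void Write(T x) { pending_ = op_(pending_, x); }

  // Called once at the end of an iteration: publishes the aggregated
  // value and resets the accumulator for the next iteration.
  void EndIteration() {
    visible_ = pending_;
    pending_ = identity_;
  }

  // Step 2: the value aggregated during the previous iteration.
  T Value() const { return visible_; }

 private:
  T identity_;
  std::function<T(T, T)> op_;
  T pending_;
  T visible_;
};
```

   For example, a sum aggregator could be created as `Aggregator<int> sum(0, [](int a, int b) { return a + b; });`. Values written with `Write` only become visible through `Value` after `EndIteration` has been called.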




Pregel — Programming Model (Topology Mutations)



   • determinism in the presence of conflicts is achieved by:
       1 partial ordering
             1   remove edges
             2   remove vertices (implicitly removes edges)
             3   add vertices
             4   add edges
       2   conflict handlers
             • example conflict → vertices with same ID created simultaneously
             • extend conflict handler() of Vertex class
             • same handler called for all conflict types
   • most topology changes are seen in next iteration
       • self mutations (remove out edge, remove self) are immediate
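
   One way to picture the partial ordering is to sort a batch of mutation requests by kind before applying them. The sketch below does only that; the `Mutation` record and `order_mutations` function are hypothetical, and conflicts within one kind (which the slide assigns to conflict handlers) are not resolved here.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Kinds are numbered in the order in which they are applied (per the slide).
enum class MutationKind {
  RemoveEdge = 0,
  RemoveVertex = 1,
  AddVertex = 2,
  AddEdge = 3
};

// Hypothetical request record.
struct Mutation {
  MutationKind kind;
  int source;  // vertex id (vertex mutations) or edge source (edge mutations)
  int target;  // edge target; unused for vertex mutations
};

// Orders a batch of requests so that removals precede additions.
// std::stable_sort keeps the arrival order within one kind.
void order_mutations(std::vector<Mutation>& batch) {
  std::stable_sort(batch.begin(), batch.end(),
                   [](const Mutation& a, const Mutation& b) {
                     return static_cast<int>(a.kind) < static_cast<int>(b.kind);
                   });
}
```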




Pregel — Programming Model Example — Vertex

  Code: Vertex program for Single Source Shortest Path (SSSP)

  class ShortestPathVertex : public Vertex<int, int, int> {
    void Compute(MessageIterator* msgs) {
      // initialization
      int mindist = IsSource(vertex_id()) ? 0 : INF;

      // read incoming messages & update mindist
      for (; !msgs->Done(); msgs->Next())
        mindist = min(mindist, msgs->Value());

      // send updated mindist to neighbors
      if (mindist < GetValue()) {
        *MutableValue() = mindist;
        OutEdgeIterator iter = GetOutEdgeIterator();
        for (; !iter.Done(); iter.Next())
          SendMessageTo(iter.Target(),
                        mindist + iter.GetValue());
      }

      // deactivate unless/until another message arrives
      VoteToHalt();
    }
  };



Pregel — Execution Model




    • vertex scheduling: all active vertices, per iteration
    • termination: no active vertices & no messages in transit

  Scheduler: Pregel (Bulk Synchronous Parallel)
    while (∃v ∈ V : v.active = true) do
       for all v ∈ V parallel do
           if (v.active = true) then
               v.compute()
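
  The scheduler above might look roughly like this in a single process. All names (`Message`, `VertexContext`, `RunStats`, `run_bsp`) are hypothetical, the "parallel" loop is run sequentially, and the counters are an addition meant to mirror the "total operations" and "total messages" shown on the execution slides.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// All names below are hypothetical; this is a single-process sketch.
struct Message {
  std::size_t target;
  int value;
};

struct VertexContext {
  std::size_t id;
  const std::vector<int>& inbox;  // messages sent in the previous iteration
  std::vector<Message>& outbox;   // messages to deliver in the next iteration
  bool& active;                   // set to false to vote to halt
};

struct RunStats {
  int iterations = 0;
  int operations = 0;  // number of compute invocations
  int messages = 0;    // number of messages sent
};

RunStats run_bsp(std::size_t n,
                 const std::function<void(VertexContext&)>& compute) {
  std::vector<bool> active(n, true);
  std::vector<std::vector<int>> inbox(n);
  RunStats stats;
  while (true) {
    // A vertex computes if it is active or has been sent a message.
    std::vector<std::size_t> scheduled;
    for (std::size_t v = 0; v < n; ++v)
      if (active[v] || !inbox[v].empty()) scheduled.push_back(v);
    if (scheduled.empty()) break;  // no active vertices, no messages in transit

    std::vector<Message> outbox;
    for (std::size_t v : scheduled) {  // conceptually "for all ... parallel"
      bool is_active = true;
      VertexContext ctx{v, inbox[v], outbox, is_active};
      compute(ctx);
      active[v] = is_active;
      ++stats.operations;
    }

    // Barrier: deliver messages so they are visible in the next iteration.
    for (auto& box : inbox) box.clear();
    for (const Message& m : outbox) inbox[m.target].push_back(m.value);
    stats.messages += static_cast<int>(outbox.size());
    ++stats.iterations;
  }
  return stats;
}
```

  The loop ends only when no vertex is active and no messages are waiting, which is the termination condition stated on this slide.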




Pregel — Execution — Without Combiner


   [Figure: animation of a shortest-path computation on an example graph,
   mode: pregel. Counters shown per frame:]

     iteration   computing vertices   messages   total operations   total messages
         0               13               3             13                 3
         1                3               6             16                 9
         2                6               7             22                16
         3                5               6             27                22
         4                3               1             30                23
         5                1               0             31                23

   • final frame → total iterations: 5, total operations: 31, total messages: 23

Pregel — Programming Model Example — Combiner



  Code: Combiner program for Single Source Shortest Path (SSSP)

  class MinIntCombiner : public Combiner<int> {
    virtual void Combine(MessageIterator* msgs) {
      // initialization
      int mindist = INF;

      // read messages & update mindist
      for (; !msgs->Done(); msgs->Next())
        mindist = min(mindist, msgs->Value());

      // only emit minimum message value (distance)
      Output("combined_source", mindist);
    }
  };
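
  To see why such a combiner reduces message counts, the following standalone sketch folds all messages addressed to the same vertex into one. It does not use the Pregel API; `DistanceMessage` and `combine_min` are hypothetical names, and where the framework actually applies the combiner (sender side, receiver side, or both) is not modelled.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical message record.
struct DistanceMessage {
  std::size_t target;
  int value;
};

// Folds all messages addressed to the same target into one message that
// carries the minimum value. min is commutative and associative, so the
// result does not depend on the order in which messages are folded.
std::vector<DistanceMessage> combine_min(
    const std::vector<DistanceMessage>& outgoing) {
  std::map<std::size_t, int> best;  // target -> smallest value seen so far
  for (const DistanceMessage& m : outgoing) {
    auto it = best.find(m.target);
    if (it == best.end()) {
      best.emplace(m.target, m.value);
    } else if (m.value < it->second) {
      it->second = m.value;
    }
  }
  std::vector<DistanceMessage> combined;
  combined.reserve(best.size());
  for (const auto& entry : best)
    combined.push_back(DistanceMessage{entry.first, entry.second});
  return combined;
}
```

  Four messages with three of them addressed to the same vertex would leave this function as two messages, one per target.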




Pregel — Execution — With Combiner

  [Figure: step-by-step SSSP execution on the sample graph, mode: pregel (combined)]

  Results: Per-iteration cost (SSSP, with combiner)
         Iteration    Computing vertices    Messages    Total operations    Total messages
             0               13                3               13                  3
             1                3                6               16                  9
             2                6                5               22                 14
             3                5                3               27                 17
             4                3                1               30                 18
             5                1                0               31                 18

    • totals → iterations: 5, operations: 31, messages: 18

Pregel — Execution — Comparison



    • combiner vs no combiner
        • algorithm → Single Source Shortest Path (SSSP)
        • sample graph → 13 vertices / 19 edges
        • cost → iterations, operations, message buffers


  Results: Cost comparison of execution modes (SSSP)
                                 Iterations       Operations       Messages
         Pregel                       5              31              23
         Pregel + Combiner            5              31              18




Pregel — Architecture

  [Figure: Pregel architecture]

    • Master → synchronization & aggregation
    • Workers → each holds a graph partition, an in-queue (inQ) and an out-queue (outQ)
    • Graph Dataset → loading & checkpointing
    • workers exchange (combined) messages

Pregel — Typical Program

   • Client
       1 load input data into workers
       2 notify master to “start processing”
       3 wait for master to complete
       4 extract result data from workers
   • Master
       1 repeat until no active workers
             • signal workers to process
             • wait for all workers to finish
             • update active-worker count
       2   notify client about completion
   • Worker
      1 repeat until inactive
             •   wait for “start iteration” from master
             •   read data from in-queue
             •   perform local processing
             •   write data to out-queue & transmit
             •   update active/inactive status
             •   notify master
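
The master and worker loops above can be sketched in a single process. This is an illustration under stated assumptions, not Pregel code: `Worker`, `runMaster`, the ring-shaped message routing, and the "countdown token" workload are all hypothetical and exist only to give the loop something that terminates. Workers are processed sequentially here, whereas a real deployment would run them in parallel and synchronize at a barrier.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical worker: owns an in-queue and an out-queue of integer messages.
struct Worker {
  std::vector<int> inQ;
  std::vector<int> outQ;
  bool active = true;

  // One iteration of local processing: consume every message in the in-queue,
  // decrement it, and forward it if it is still positive.
  void process() {
    outQ.clear();
    for (int msg : inQ) {
      if (msg - 1 > 0) outQ.push_back(msg - 1);
    }
    inQ.clear();
    active = !outQ.empty();  // inactive once there is nothing left to send
  }
};

// Master loop: run iterations until no worker reports itself active.
// Returns the number of iterations executed.
int runMaster(std::vector<Worker>& workers) {
  assert(!workers.empty());
  int iterations = 0;
  std::size_t activeWorkers = workers.size();
  while (activeWorkers > 0) {
    // signal workers to process, then wait for all of them (sequential here)
    for (Worker& w : workers) w.process();
    // transmit: worker i delivers its out-queue to worker (i + 1) % n
    for (std::size_t i = 0; i < workers.size(); ++i) {
      Worker& next = workers[(i + 1) % workers.size()];
      next.inQ.insert(next.inQ.end(), workers[i].outQ.begin(),
                      workers[i].outQ.end());
    }
    // update active-worker count
    activeWorkers = 0;
    for (const Worker& w : workers) {
      if (w.active) ++activeWorkers;
    }
    ++iterations;
  }
  return iterations;
}
```

A worker counts as active exactly when it sent something, so "no active workers" also implies that no messages are still in transit.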


Pregel — Fault Tolerance


    • logging
        1 checkpointing → state persisted at beginning of every n-th iteration
               • master persists → progress of execution, aggregate values
               • workers persist → vertex values, edge values, messages
         2   confined recovery → workers log out-messages from their partitions
    • failure detection
         • heartbeats
         • worker gets no heartbeat from master → worker terminates
         • master gets no heartbeat from worker → marks worker as failed
    • failure recovery
         • partition(s) belonging to failed worker(s) are reassigned
         • lost partitions recovered from checkpoints
         • missing iterations recomputed (using logged messages)
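
Checkpoint-and-replay recovery can be sketched as follows. This is a simplified model, not Pregel's implementation: `State`, `Checkpoint`, `computeIteration`, and `runWithRecovery` are hypothetical names, the per-iteration computation is an arbitrary deterministic stand-in, and the message logging used by confined recovery is left out (the missing iterations are simply recomputed from the restored state).

```cpp
#include <cassert>
#include <vector>

// Hypothetical partition state: one integer value per vertex.
using State = std::vector<int>;

// Deterministic stand-in for one iteration of vertex computation.
void computeIteration(State& values, int iteration) {
  for (int& v : values) v = v * 2 + iteration;
}

struct Checkpoint {
  int iteration;  // iteration at whose beginning the state was saved
  State values;
};

// Runs `total` iterations, checkpointing at the beginning of every n-th one.
// If failAt >= 0, the state is lost just before iteration `failAt` and is
// rebuilt from the latest checkpoint by recomputing the missing iterations.
State runWithRecovery(State values, int total, int n, int failAt) {
  assert(n > 0);
  Checkpoint last{0, values};
  for (int it = 0; it < total; ++it) {
    if (it % n == 0) last = Checkpoint{it, values};  // persist state
    if (it == failAt) {
      values.assign(values.size(), 0);  // simulate losing the partition
      values = last.values;             // restore from the checkpoint
      for (int redo = last.iteration; redo < it; ++redo) {
        computeIteration(values, redo);  // recompute missing iterations
      }
    }
    computeIteration(values, it);
  }
  return values;
}
```

A run with a simulated failure should end in the same state as a failure-free run; a smaller `n` means less recomputation after a failure at the price of more frequent checkpoints.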




Pregel — Evaluation — Scaling

    • algorithm → Single Source Shortest Path (SSSP)
    • hardware → 800 worker tasks scheduled on 300 multicore machines
    • graph
         • random, log-normal out-degree distribution (mean = 127.1)
         • up to 1,000,000,000 vertices / 127,000,000,000 edges


  Results: Scalability of Pregel (SSSP)
     [Plot: y-axis → Runtime (seconds), 100 to 800; x-axis → Number of vertices (millions), 200 to 1000]

Signal/Collect — Contributions




    • parallel programming model (for processing graphs)
    • parallel execution model (for processing graphs)
    • (limited) evaluation → benefits of various scheduling policies




Signal/Collect — Programming Model (Vertex)


   • Vertex (v)
       • v.id → unique identifier
       • v.state → arbitrary vertex state
       • v.lastSignalState → v.state at time of last signal()
       • v.outEdges : List[edge] → list of edges that have v as source
       • v.signalMap : Map[vid, signal] → last received messages
            • vid → identifier of sender vertex
            • signal → last received signal from that vertex
       • v.uncollectedSignals : List[signal] → list of signals
         received since collect() was last executed
       • v.collect() : calculates new vertex state
            1 collect incoming signals
            2 process those signals (possibly using v.state)
            3 return new vertex state




Signal/Collect — Programming Model (Edge)




   • Edge (e)
       • e.source → source vertex
       • e.sourceId → identifier of source vertex (e.source.id)
       • e.targetId → identifier of target vertex
       • e.state → arbitrary edge state
       • e.signal() → calculates the signal to send, then sends it
            • signals are sent along edges of the compute graph




Signal/Collect — Programming Model Example



  Code: Single Source Shortest Path (SSSP)

  class Location(id: Any, initialState: Int) extends Vertex {
    def collect: Int = min(state, min(uncollectedSignals))
  }

  class Path(sourceId: Any, targetId: Int) extends Edge {
    def signal: Int = source.state + weight
  }



     • vertex state → shortest known path length to vertex from source
     • edge state (weight) → path length of that individual edge
     • signal → shortest known path length from source through edge
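
The two functions of the SSSP example can also be written as free functions. The block below is a C++ sketch of the same rules and not the Signal/Collect API (which is Scala): `collectState`, `signalAlongEdge`, and the `kInfinity` sentinel are assumptions made for this example. Unlike a plain minimum over the received signals, it also tolerates an empty signal list and an unreached source vertex.

```cpp
#include <algorithm>
#include <cassert>
#include <limits>
#include <vector>

// Assumed sentinel for "no path known yet".
const int kInfinity = std::numeric_limits<int>::max();

// collect: the new vertex state is the minimum of the current state and all
// signals received since the last collect.
int collectState(int state, const std::vector<int>& uncollectedSignals) {
  int best = state;
  for (int s : uncollectedSignals) best = std::min(best, s);
  return best;
}

// signal: path length from the source through this edge.
// An unreached source vertex keeps signalling "infinity" (avoids overflow).
int signalAlongEdge(int sourceState, int weight) {
  assert(weight >= 0);  // assumption: non-negative edge weights
  if (sourceState == kInfinity) return kInfinity;
  return sourceState + weight;
}
```

For example, a vertex with no known path that receives the signals 4, 2 and 7 would move to state 2, and an edge of weight 3 leaving it would then signal 5.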




Signal/Collect — Execution Model

    • different Execution Modes (scheduling policies)
        1 synchronous
        2 synchronous score-guided
        3 asynchronous
        4 asynchronous scheduled

    • execution mode dictates when signal & collect are called

  Definition: Internal methods used by execution engine
    procedure v.executeSignalOperation()
       lastSignalState ← state
       for all e ∈ outEdges do
           e.target.uncollectedSignals.append(e.signal())
           e.target.signalMap.put(e.sourceId,e.signal())

    procedure v.executeCollectOperation()
       state ← collect()
       uncollectedSignals ← Nil




Signal/Collect — Execution — Synchronous




    • vertex scheduling: all vertices (unordered) per iteration
    • termination: after a fixed number of iterations (num_iterations)

  Scheduler: Synchronous
   for i ← 1 to num_iterations do
       for all v ∈ V parallel do
           v.executeSignalOperation()
       for all v ∈ V parallel do
           v.executeCollectOperation()
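
The synchronous scheduler can be sketched for SSSP as below. This is a sequential illustration under stated assumptions, not the Signal/Collect engine: `Graph`, `Edge`, `Cost`, and `runSynchronous` are hypothetical names, the "parallel" loops are run one vertex at a time, and the cost counters follow the convention of the execution slides (one operation per signalling or collecting vertex, one message per edge).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

const int kInfinity = std::numeric_limits<int>::max();

struct Edge {
  std::size_t source;
  std::size_t target;
  int weight;
};

struct Graph {
  std::vector<int> state;                            // per-vertex distance
  std::vector<std::vector<int>> uncollectedSignals;  // per-vertex inbox
  std::vector<Edge> edges;
};

struct Cost {
  long operations = 0;  // signal + collect operations executed
  long messages = 0;    // signals sent
};

// Synchronous scheduler: every vertex signals, then every vertex collects,
// for a fixed number of iterations.
Cost runSynchronous(Graph& g, int numIterations) {
  assert(g.state.size() == g.uncollectedSignals.size());
  Cost cost;
  for (int i = 0; i < numIterations; ++i) {
    // signal phase: one signal per edge
    for (const Edge& e : g.edges) {
      int s = g.state[e.source];
      int sig = (s == kInfinity) ? kInfinity : s + e.weight;
      g.uncollectedSignals[e.target].push_back(sig);
      ++cost.messages;
    }
    cost.operations += static_cast<long>(g.state.size());
    // collect phase: each vertex takes the minimum of its state and inbox
    for (std::size_t v = 0; v < g.state.size(); ++v) {
      for (int sig : g.uncollectedSignals[v]) {
        g.state[v] = std::min(g.state[v], sig);
      }
      g.uncollectedSignals[v].clear();
    }
    cost.operations += static_cast<long>(g.state.size());
  }
  return cost;
}
```

Because every vertex signals and collects in every iteration, each iteration costs 2·|V| operations and |E| messages regardless of how many vertex states actually change.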




Signal/Collect — Execution — Synchronous

  [Figure: step-by-step SSSP execution on the sample graph, mode: synchronous]

  Results: Per-iteration cost (SSSP, synchronous)
         Iteration    Signaling vertices    Collecting vertices    Messages    Total operations    Total messages
             0               13                    13                 19              26                 19
             1               13                    13                 19              52                 38
             2               13                    13                 19              78                 57

Signal/Collect — Execution — Synchronous


                                    4                   6
                                                                                                             mode: synchronous
                        4                       2
                            3
                                                                    2
                                                                                      7
                    1                   1
                                                                                 5                          iteration: 3
        1                               2                                5                                  signaling vertices: 13
            1                                                                                               collecting vertices: 13
                                                    4                    1
                            1                                                                 5             messages: 19
                    2
    0                                       2                            6
                                                                     2                                      total operations: 104
                    3               2           4
                                                                                                            total messages: 76
                    3                                                                                 7
                1                                                            6                    1
                                                            6
            1                       2                           4
                        1           1
                                                                                          3
                                                2
                                                                    10 3              9
                                                                                                  7
                    5 4
                                                7

                                5                   2                                     1
                                                                         7
                                                                                              8




                                                        Alex Averbuch                Big Graph Processing                             26 / 36
Signal/Collect — Execution — Synchronous


                                    4                   6
                                                                                                             mode: synchronous
                        4                       2
                            3
                                                                    2
                                                                                      7
                    1                   1
                                                                                 5                          iteration: 4
        1                               2                                5                                  signaling vertices: 13
            1                                                                                               collecting vertices: 13
                                                    4                    1
                            1                                                                 5             messages: 19
                    2
    0                                       2                            6
                                                                     2                                      total operations: 130
                    3               2           4
                                                                                                            total messages: 95
                    3                                                                                 6
                1                                                            6                    1
                                                            6
            1                       2                           4
                        1           1
                                                                                          3
                                                2
                                                                    10 3              9
                                                                                                  6
                    5 4
                                                7

                                5                   2                                     1
                                                                         7
                                                                                              8




                                                        Alex Averbuch                Big Graph Processing                             26 / 36
Signal/Collect — Execution — Synchronous


                            4                                                               mode: synchronous
                                        2
                    3
                                                    2

            1                   1
                                                                5                          total iterations: 4
                                                                                           total operations: 130
        1                                                                                  total messages: 95
                                            4               1
                1                                                          5
    0                               2               2
            3               2

        1                                                   6                  1
                                                4
                1           1
                                                                       3
                                        2
                                                        3
                4                                                                6


                        5               2                               1
                                                            7




                                            Alex Averbuch           Big Graph Processing                           26 / 36
Signal/Collect — Execution — Synchronous Guided


   • vertex scheduling: all active vertices (unordered) per iteration
   • termination: all iterations or (no signal and no collect)




   • extension v.signalScore() “importance for vertex to signal”
       • may change ⇐⇒ v.state changes
       • default → 1 if changed, 0 otherwise
   • extension v.collectScore() “importance for vertex to collect”
       • may change ⇐⇒ v.uncollectedSignals changes
       • default → v.uncollectedSignals.size()
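
The two default scores can be illustrated with a small single-source shortest path (SSSP) vertex. This is only a sketch under stated assumptions: the `Vertex` class, its field names, and the `last_signaled_state` bookkeeping are hypothetical and are not the Signal/Collect framework's API.

```python
import math


class Vertex:
    """Hypothetical SSSP vertex; not the Signal/Collect framework API."""

    def __init__(self, name, state=math.inf):
        self.name = name
        self.state = state                    # best known distance from the source
        self.out_edges = []                   # (target vertex, edge weight) pairs
        self.uncollected_signals = []         # signals received but not yet collected
        self.last_signaled_state = math.inf   # assumption: state at the last signal

    def signal_score(self):
        # Default from the slide: 1 if the state changed, 0 otherwise.
        return 1 if self.state != self.last_signaled_state else 0

    def collect_score(self):
        # Default from the slide: number of uncollected signals.
        return len(self.uncollected_signals)

    def execute_signal_operation(self):
        # Send "my distance + edge weight" along every outgoing edge.
        for target, weight in self.out_edges:
            target.uncollected_signals.append(self.state + weight)
        self.last_signaled_state = self.state

    def execute_collect_operation(self):
        # Keep the shortest distance seen so far, then clear the inbox.
        self.state = min([self.state] + self.uncollected_signals)
        self.uncollected_signals.clear()


source, a = Vertex("source", state=0), Vertex("a")
source.out_edges.append((a, 3))

scores_before = (source.signal_score(), a.collect_score())
source.execute_signal_operation()
scores_after_signal = (source.signal_score(), a.collect_score())
a.execute_collect_operation()
```

Signaling resets the sender's signal score and raises the receiver's collect score. Collecting then changes the receiver's state, which in turn raises its own signal score.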






  Scheduler: Synchronous Score-Guided
   done ← false
   iterations ← 0
   while (iterations < num_iterations) ∧ (done = false) do
      done ← true
      iterations ← iterations + 1
      for all v ∈ V parallel do
          if v.signalScore() > signal_threshold then
              done ← false
              v.executeSignalOperation()
      for all v ∈ V parallel do
          if v.collectScore() > collect_threshold then
              done ← false
              v.executeCollectOperation()
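
The scheduler above can be sketched in Python as follows. The `Vertex` class and the four-vertex graph are hypothetical illustrations, and the two "parallel" loops are run sequentially here. For this vertex model that should not change the result, because signaling only appends to inboxes and collecting only reads a vertex's own inbox.

```python
import math


class Vertex:
    """Hypothetical SSSP vertex (illustration only)."""

    def __init__(self, state=math.inf):
        self.state = state
        self.out_edges = []            # (target, weight) pairs
        self.inbox = []                # uncollected signals
        self.last_signaled = math.inf  # state at the last signal operation

    def signal_score(self):
        return 1 if self.state != self.last_signaled else 0

    def collect_score(self):
        return len(self.inbox)

    def execute_signal_operation(self):
        for target, weight in self.out_edges:
            target.inbox.append(self.state + weight)
        self.last_signaled = self.state

    def execute_collect_operation(self):
        self.state = min([self.state] + self.inbox)
        self.inbox.clear()


def run_synchronous_guided(vertices, num_iterations,
                           signal_threshold=0, collect_threshold=0):
    """Alternate a signal phase and a collect phase until nothing is active."""
    iterations, operations, done = 0, 0, False
    while iterations < num_iterations and not done:
        done = True
        iterations += 1
        for v in vertices:  # signal phase
            if v.signal_score() > signal_threshold:
                done = False
                v.execute_signal_operation()
                operations += 1
        for v in vertices:  # collect phase
            if v.collect_score() > collect_threshold:
                done = False
                v.execute_collect_operation()
                operations += 1
    return iterations, operations


# Assumed example graph: s->a (1), s->b (4), a->b (2), b->c (1)
s, a, b, c = Vertex(state=0), Vertex(), Vertex(), Vertex()
s.out_edges = [(a, 1), (b, 4)]
a.out_edges = [(b, 2)]
b.out_edges = [(c, 1)]

iterations, operations = run_synchronous_guided([s, a, b, c], num_iterations=100)
distances = [v.state for v in (s, a, b, c)]
```

Only vertices whose score exceeds the threshold do any work, so the loop stops well before `num_iterations` once no vertex has anything left to signal or collect.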




Signal/Collect — Execution — Synchronous Guided

   [Figure: example graph annotated with vertex states after each iteration; mode: synchronous-guided]
   • scoreSignal: state != lastCollectedState
   • scoreCollect: v.uncollectedSignals() > 0

   iteration   signaling vertices   collecting vertices   messages   total operations   total messages
   0           1                    3                     3          4                  3
   1           3                    6                     6          13                 9
   2           6                    5                     7          24                 16
   3           4                    3                     6          31                 22
   4           2                    1                     1          34                 23

   • total iterations: 4
   • total operations: 34
   • total messages: 23
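
The totals reported on the slides allow a quick comparison of the two synchronous modes. The short calculation below assumes both runs use the same example graph, which the identical final vertex states suggest. The `reduction` helper is only illustrative.

```python
# Totals as reported on the slides for the same example run.
synchronous = {"operations": 130, "messages": 95}
synchronous_guided = {"operations": 34, "messages": 23}


def reduction(before, after):
    """Fraction of work saved when going from `before` to `after`."""
    return 1 - after / before


operation_saving = reduction(synchronous["operations"],
                             synchronous_guided["operations"])
message_saving = reduction(synchronous["messages"],
                           synchronous_guided["messages"])

print(f"operations saved: {operation_saving:.1%}")
print(f"messages saved:   {message_saving:.1%}")
```

On these numbers, score guidance removes roughly three quarters of both the operations and the messages.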
Signal/Collect — Execution — Asynchronous


   • vertex scheduling: random
   • termination: max operations or (no signal and no collect)





   • no guarantee on order of execution
       • some vertices may signal while others collect
   • no guarantee that all vertices are executed the same number of times
   • asynchronous mode is not usable for every algorithm
       • use it when correctness does not depend on strict execution order






  Scheduler: Asynchronous
   ops ← 0
   while (ops < num_ops) ∧ ∃v ∈ V :
         (v.signalScore() > signal_threshold) ∨ (v.collectScore() > collect_threshold) do
       S ← random subset of V
       for all v ∈ S do
           next ← random(signal/collect)
           if (next = signal) ∧ (v.signalScore() > signal_threshold) then
               v.executeSignalOperation()
               ops ← ops + 1
           else if (next = collect) ∧ (v.collectScore() > collect_threshold) then
               v.executeCollectOperation()
               ops ← ops + 1
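
A minimal Python sketch of this asynchronous scheduler follows. The `Vertex` class, the example graph, and the seeded `random.Random` generator are assumptions made for illustration. SSSP is used because its result does not depend on the order of operations, so the random schedule should still reach the same distances.

```python
import math
import random


class Vertex:
    """Hypothetical SSSP vertex (illustration only)."""

    def __init__(self, state=math.inf):
        self.state = state
        self.out_edges = []            # (target, weight) pairs
        self.inbox = []                # uncollected signals
        self.last_signaled = math.inf  # state at the last signal operation

    def signal_score(self):
        return 1 if self.state != self.last_signaled else 0

    def collect_score(self):
        return len(self.inbox)

    def execute_signal_operation(self):
        for target, weight in self.out_edges:
            target.inbox.append(self.state + weight)
        self.last_signaled = self.state

    def execute_collect_operation(self):
        self.state = min([self.state] + self.inbox)
        self.inbox.clear()


def run_asynchronous(vertices, num_ops, signal_threshold=0,
                     collect_threshold=0, seed=0):
    """Pick random vertices and random operations until nothing is pending."""
    rng = random.Random(seed)
    ops = 0

    def pending():
        return any(v.signal_score() > signal_threshold
                   or v.collect_score() > collect_threshold
                   for v in vertices)

    while ops < num_ops and pending():
        subset = rng.sample(vertices, rng.randint(1, len(vertices)))
        for v in subset:
            nxt = rng.choice(("signal", "collect"))
            if nxt == "signal" and v.signal_score() > signal_threshold:
                v.execute_signal_operation()
                ops += 1
            elif nxt == "collect" and v.collect_score() > collect_threshold:
                v.execute_collect_operation()
                ops += 1
    return ops


# Assumed example graph: s->a (1), s->b (4), a->b (2), b->c (1)
s, a, b, c = Vertex(state=0), Vertex(), Vertex(), Vertex()
s.out_edges = [(a, 1), (b, 4)]
a.out_edges = [(b, 2)]
b.out_edges = [(c, 1)]

ops = run_asynchronous([s, a, b, c], num_ops=10_000)
distances = [v.state for v in (s, a, b, c)]
```

As in the pseudocode, the `num_ops` bound is only checked between random subsets, so a run may overshoot it slightly.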




Signal/Collect — Execution — Asynchronous Scheduled



   • vertex scheduling: scheduler-dependent
   • termination: scheduler-dependent
   • scheduler: schedules vertices’ signal & collect operations




        • e.g. “eager” scheduler → tries to signal right after collection



  Scheduler: “Eager”
   for all v ∈ V do
       if (v.collectScore() > collect_threshold) then
           v.executeCollectOperation()
           if (v.signalScore() > signal_threshold) then
               v.executeSignalOperation()
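
The eager pass can be sketched as below. Two things here are assumptions rather than part of the slide: the source is signaled once up front (the pass itself only signals after a collect), and the pass is repeated until it performs no operations. The `Vertex` class and the graph are hypothetical.

```python
import math


class Vertex:
    """Hypothetical SSSP vertex (illustration only)."""

    def __init__(self, state=math.inf):
        self.state = state
        self.out_edges = []            # (target, weight) pairs
        self.inbox = []                # uncollected signals
        self.last_signaled = math.inf  # state at the last signal operation

    def signal_score(self):
        return 1 if self.state != self.last_signaled else 0

    def collect_score(self):
        return len(self.inbox)

    def execute_signal_operation(self):
        for target, weight in self.out_edges:
            target.inbox.append(self.state + weight)
        self.last_signaled = self.state

    def execute_collect_operation(self):
        self.state = min([self.state] + self.inbox)
        self.inbox.clear()


def eager_pass(vertices, signal_threshold=0, collect_threshold=0):
    """One pass of the eager scheduler: signal right after collecting."""
    ops = 0
    for v in vertices:
        if v.collect_score() > collect_threshold:
            v.execute_collect_operation()
            ops += 1
            if v.signal_score() > signal_threshold:
                v.execute_signal_operation()
                ops += 1
    return ops


# Assumed example graph: s->a (1), s->b (4), a->b (2), b->c (1)
s, a, b, c = Vertex(state=0), Vertex(), Vertex(), Vertex()
s.out_edges = [(a, 1), (b, 4)]
a.out_edges = [(b, 2)]
b.out_edges = [(c, 1)]
vertices = [s, a, b, c]

s.execute_signal_operation()   # assumption: kick off from the source
passes = 0
while eager_pass(vertices) > 0:  # assumption: repeat until a pass is idle
    passes += 1
distances = [v.state for v in vertices]
```

Because a vertex forwards its new state in the same step in which it collects, updates can travel several hops within a single pass when the iteration order happens to follow the edges.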







   • benefit of this execution mode depends on:
       1. impact on convergence (number of operations)
       2. cost of operations
       3. cost of scheduling




Signal/Collect — Execution — Asynchronous Scheduled

  Scheduler: “SSSP” — minimize messages & computation
   Signal ← {v_source} // sorted set, ordered by signalSort
   while (Signal ≠ {}) do
      for top k v ∈ Signal do
          v.executeSignalOperation()
          Signal.remove(v)
      for all v ∈ V do
          if (v.collectScore() > collect_threshold) then
              v.executeCollectOperation()
              if (v.signalScore() > signal_threshold) then
                  Signal.put(v)
      if (v_destination.state ≤ min(Signal)) then
          Signal ← {}

   // returns distance of shortest next step
   function signalSort(v)
      Distances ← {∞}
      for all e ∈ v.outEdges do
          Distances.put(e.signal)
      return min(Distances)
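
The SSSP scheduler can be sketched in Python under a few assumptions: edge weights are non-negative, a plain `set` sorted on demand stands in for the sorted set, and `e.signal` is taken to be the sender's state plus the edge weight. The `Vertex` class and the example graph are hypothetical.

```python
import math


class Vertex:
    """Hypothetical SSSP vertex (illustration only)."""

    def __init__(self, state=math.inf):
        self.state = state
        self.out_edges = []            # (target, weight) pairs
        self.inbox = []                # uncollected signals
        self.last_signaled = math.inf  # state at the last signal operation

    def signal_score(self):
        return 1 if self.state != self.last_signaled else 0

    def collect_score(self):
        return len(self.inbox)

    def execute_signal_operation(self):
        for target, weight in self.out_edges:
            target.inbox.append(self.state + weight)
        self.last_signaled = self.state

    def execute_collect_operation(self):
        self.state = min([self.state] + self.inbox)
        self.inbox.clear()


def signal_sort(v):
    """Distance of the shortest next step out of v (infinity if none)."""
    return min([math.inf] + [v.state + w for _, w in v.out_edges])


def run_sssp_scheduler(vertices, source, destination, k=1):
    signal = {source}
    while signal:
        # Signal only the k most promising vertices.
        for v in sorted(signal, key=signal_sort)[:k]:
            v.execute_signal_operation()
            signal.remove(v)
        # Collect everywhere; queue vertices whose state changed.
        for v in vertices:
            if v.collect_score() > 0:
                v.execute_collect_operation()
                if v.signal_score() > 0:
                    signal.add(v)
        # Stop once no queued vertex can still improve the destination.
        if signal and destination.state <= min(signal_sort(v) for v in signal):
            signal.clear()
    return destination.state


# Assumed graph: s->a (1), s->b (4), s->d (10), a->b (2), b->c (1), d->c (10)
s, a, b, c, d = Vertex(state=0), Vertex(), Vertex(), Vertex(), Vertex()
s.out_edges = [(a, 1), (b, 4), (d, 10)]
a.out_edges = [(b, 2)]
b.out_edges = [(c, 1)]
d.out_edges = [(c, 10)]

shortest = run_sssp_scheduler([s, a, b, c, d], source=s, destination=c)
```

In this run the detour through `d` is never signaled: the scheduler stops as soon as the destination's distance is no larger than the best remaining next step, which is where the savings in messages and computation come from.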



Signal/Collect — Execution — Asynchronous Scheduled

   [Figure: example graph annotated with vertex states after each iteration; mode: asynchronous-scheduled]
   • scoreSignal: state != lastCollectedState
   • scoreCollect: v.uncollectedSignals() > 0

   iteration   signaling vertices   collecting vertices   messages   total operations   total messages
   0           1                    3                     3          4                  3
   1           1                    2                     2          7                  5
Batch Graph Processing Frameworks

  • 1. Comparison of Graph Processing Frameworks
      Alex Averbuch, Swedish Institute of Computer Science, averbuch@sics.se
      January 25, 2012
  • 2. Frameworks Compared
      • Pregel: a system for large-scale graph processing. G. Malewicz, M.H. Austern, AJ Bik, J.C. Dehnert, I. Horn, N. Leiser, and G. Czajkowski. PODC, 2009.
      • Signal/collect: graph algorithms for the (semantic) web. P. Stutz, A. Bernstein, and W. Cohen. The Semantic Web - ISWC, 2010.
  • 3-5. Background — Big Graphs Everywhere
      • Real-world web and social graphs continue to grow
          • 2008 → Google estimates number of web pages at 1 trillion
          • March 2011 → LinkedIn has over 120 million registered users
          • September 2011 → Twitter has over 100 million active users
          • September 2011 → Facebook has over 800 million active users
      Data: The New Oil
      • Relevant, personalized user information relies on graph algorithms
          • Popularity rank → determine popular users, news, jobs, etc.
          • Shortest paths → find how users, groups, etc. are connected
          • Clustering → discover related people, groups, interests, etc.
  • 6-14. Background — The Vertex Centric Model
      Definition: Vertex Centric Graph Computing Model
      • computations execute on a compute graph
          • same topology as that of data graph
          • vertices are computational units
          • edges are communication channels
      • vertices interact with other vertices using messages
      • computation proceeds in iterations. In each iteration, vertices:
          1 perform some computation
          2 communicate with other vertices
      [Figure: a small example graph stepped through iteration 0, iteration 1, and iteration 2]
  • 15. Pregel — Contributions
      • parallel programming model (for processing graphs)
      • distributed execution model (for processing graphs)
      • (limited) evaluation → using big data sets
  • 16. Pregel — Overview
      • vertex centric graph computing model
      • in each iteration a compute function is invoked on each vertex
          1 reads messages sent to it in previous iteration
          2 modifies its state & local graph topology
          3 sends messages to other vertices
          4 votes to halt (to become inactive)
      Vertex State Machine: Active → Inactive on "Vote to halt"; Inactive → Active on "Message received"
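The vertex state machine on this slide can be sketched in a few lines. This is only an illustration, not Pregel's implementation: `VertexState`, `VertexActivity`, `vote_to_halt`, `on_message_received`, and `is_active` are hypothetical names, and the sketch assumes that a vertex starts out active.

```cpp
#include <cassert>

// Hypothetical names throughout; a sketch of the two-state machine only.
enum class VertexState { Active, Inactive };

struct VertexActivity {
    // Assumption: a vertex is active before its first iteration.
    VertexState state = VertexState::Active;

    // "Vote to halt": Active -> Inactive.
    void vote_to_halt() { state = VertexState::Inactive; }

    // "Message received": Inactive -> Active (an active vertex stays active).
    void on_message_received() { state = VertexState::Active; }

    bool is_active() const { return state == VertexState::Active; }
};
```

A halted vertex is skipped by the scheduler until a message arrives for it, which is why voting to halt is cheap to do at the end of every compute call.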
  • 17. Pregel — Programming Model (Vertex & Edge)
      • Vertex (v)
          • v.id → unique identifier
          • v.state → arbitrary vertex state
          • v.outEdges : List[Edge] → list of edges that have v as source
          • v.compute() : per iteration, calculates new state
              1 reads incoming messages, from previous iteration
              2 sends (unbounded number of) messages to other vertices
              3 if destination non-existent, call handler (create vertex/remove edge)
              4 modifies its state and that of its outgoing edges
              5 adds/removes edges to/from outEdges
              6 votes to halt
      • Edge (e)
          • e.targetId → identifier of target vertex
          • e.state → arbitrary edge state
          • no associated computation
  • 18-19. Pregel — Programming Model (Combiner & Aggregator)
      • Combiner
          • combines multiple messages into one (like a reducer in MapReduce)
          • combined using commutative & associative function
          • reduces network traffic & message buffer size
          • e.g. in SSSP vertex only cares about length of shortest path
      • Aggregator
          • globally shared/aggregated state
              1 vertices write to aggregator variable locally
              2 globally aggregated value available to all vertices in next iteration
          • aggregated using commutative & associative function
          • pre-defined aggregators: min, max, sum
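To make the two ideas concrete, here is a minimal sketch of a min combiner and a sum aggregator. It does not use Pregel's API: `combine_min` and `SumAggregator` are hypothetical names, and resetting the aggregator after every iteration is an assumption made for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <limits>
#include <vector>

// Hypothetical helper: fold all messages bound for one vertex into one value.
// min is commutative and associative, so message order does not matter.
int combine_min(const std::vector<int>& messages) {
    int combined = std::numeric_limits<int>::max();  // identity element for min
    for (int m : messages) combined = std::min(combined, m);
    return combined;
}

// Hypothetical sum aggregator: vertices contribute during iteration i and
// the reduced value becomes readable in iteration i + 1.
class SumAggregator {
public:
    void contribute(long value) { pending_ += value; }  // called by vertices
    void end_iteration() {                              // called at the barrier
        visible_ = pending_;
        pending_ = 0;  // assumption: the aggregator resets every iteration
    }
    long value() const { return visible_; }             // previous iteration's total
private:
    long pending_ = 0;
    long visible_ = 0;
};
```

In SSSP a vertex only needs the smallest incoming distance, so replacing the messages for one target with `combine_min` of those messages leaves the result unchanged while shrinking the message buffer.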
  • 20-21. Pregel — Programming Model (Topology Mutations)
      • determinism in the presence of conflicts is achieved by:
          1 partial ordering
              1 remove edges
              2 remove vertices (implicitly removes edges)
              3 add vertices
              4 add edges
          2 conflict handlers
              • example conflict → vertices with same ID created simultaneously
              • extend conflict handler() of Vertex class
              • same handler called for all conflict types
      • most topology changes are seen in next iteration
      • self mutations (remove out edge, remove self) are immediate
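The partial ordering of mutations can be illustrated by sorting one iteration's pending requests by kind before applying them. This is a sketch under stated assumptions, not Pregel's code: `MutationKind`, `Mutation`, and `order_mutations` are hypothetical names, and conflict handlers are left out.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Enumerator order encodes the ordering listed on the slide.
enum class MutationKind { RemoveEdge = 0, RemoveVertex = 1, AddVertex = 2, AddEdge = 3 };

struct Mutation {
    MutationKind kind;
    int source;  // vertex id (for edge mutations: the source vertex)
    int target;  // target vertex id; -1 for vertex mutations
};

// Hypothetical helper: order one iteration's pending mutations by kind.
// stable_sort keeps the request order within each kind.
void order_mutations(std::vector<Mutation>& pending) {
    std::stable_sort(pending.begin(), pending.end(),
                     [](const Mutation& a, const Mutation& b) {
                         return static_cast<int>(a.kind) < static_cast<int>(b.kind);
                     });
}
```

Applying removals before additions means, for example, that an edge added in the same iteration as a vertex removal is never attached to a vertex that is about to disappear.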
  • 22. Pregel — Programming Model Example — Vertex
      Code: Vertex program for Single Source Shortest Path (SSSP)

      class ShortestPathVertex : public Vertex<int, int, int> {
        void Compute(MessageIterator* msgs) {
          // initialization
          int mindist = IsSource(vertex_id()) ? 0 : INF;
          // read incoming messages & update mindist
          for (; !msgs->Done(); msgs->Next())
            mindist = min(mindist, msgs->Value());
          // send updated mindist to neighbors
          if (mindist < GetValue()) {
            *MutableValue() = mindist;
            OutEdgeIterator iter = GetOutEdgeIterator();
            for (; !iter.Done(); iter.Next())
              SendMessageTo(iter.Target(), mindist + iter.GetValue());
          }
          // deactivate unless/until another message arrives
          VoteToHalt();
        }
      };
  • 23. Pregel — Execution Model
      • vertex scheduling: all active vertices, per iteration
      • termination: no active vertices & no messages in transit
      Scheduler: Pregel (Bulk Synchronous Parallel)

      while (∃v ∈ V : v.active = true) do
        for all v ∈ V parallel do
          if (v.active = true) then
            v.compute()
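A sequential sketch of this scheduler loop, running SSSP, might look as follows. It is not Pregel's implementation: `Vertex`, `Edge`, `run_sssp`, and the inbox vectors are hypothetical names, the inner loop runs sequentially where a real system would run it in parallel, and message delivery at the end of each pass stands in for the synchronization barrier.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <utility>
#include <vector>

constexpr int INF = std::numeric_limits<int>::max();

struct Edge { int target; int weight; };

struct Vertex {
    int dist = INF;
    bool active = true;  // assumption: every vertex is active in iteration 0
    std::vector<Edge> out_edges;
};

// Runs SSSP from `source` and returns the number of iterations executed.
int run_sssp(std::vector<Vertex>& graph, int source) {
    std::vector<std::vector<int>> inbox(graph.size()), next_inbox(graph.size());
    int iterations = 0;
    bool any_active = true;
    while (any_active) {                       // while some vertex is active
        for (std::size_t id = 0; id < graph.size(); ++id) {
            Vertex& v = graph[id];
            if (!v.active) continue;           // only active vertices compute
            // compute(): same logic as the SSSP vertex program
            int mindist = (static_cast<int>(id) == source) ? 0 : INF;
            for (int m : inbox[id]) mindist = std::min(mindist, m);
            if (mindist < v.dist) {
                v.dist = mindist;
                for (const Edge& e : v.out_edges)
                    next_inbox[e.target].push_back(mindist + e.weight);
            }
            v.active = false;                  // vote to halt
        }
        // Barrier: deliver messages; a delivered message reactivates its target.
        any_active = false;
        for (std::size_t id = 0; id < graph.size(); ++id) {
            inbox[id] = std::move(next_inbox[id]);
            next_inbox[id].clear();
            if (!inbox[id].empty()) {
                graph[id].active = true;
                any_active = true;
            }
        }
        ++iterations;
    }
    return iterations;
}
```

Because messages are delivered before the loop condition is re-checked, "no active vertices" already implies "no messages in transit" in this sketch, so a single flag covers both termination conditions.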
  • 24-31. Pregel — Execution — Without Combiner (mode: pregel)
    [Figure: animated SSSP run on the sample graph, one frame per iteration; the per-iteration counters are listed below]

        Iteration   Computing vertices   Messages   Total operations   Total messages
        0           13                   3          13                 3
        1           3                    6          16                 9
        2           6                    7          22                 16
        3           5                    6          27                 22
        4           3                    1          30                 23
        5           1                    0          31                 23

    total iterations: 5, total operations: 31, total messages: 23
  • 32. Pregel — Programming Model Example — Combiner
    Code: Combiner program for Single Source Shortest Path (SSSP)

        class MinIntCombiner : public Combiner<int> {
          virtual void Combine(MessageIterator* msgs) {
            // initialization
            int mindist = INF;
            // read messages & update mindist
            for (; !msgs->Done(); msgs->Next())
              mindist = min(mindist, msgs->Value());
            // only emit minimum message value (distance)
            Output("combined_source", mindist);
          }
        };
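The effect of a minimum combiner can be sketched in a few lines of Python. `combine_min` and the sample messages are hypothetical and only show the idea: messages bound for the same target are reduced to one before they are transmitted.

```python
from collections import defaultdict

def combine_min(outgoing):
    """outgoing: list of (target, distance). Keep only the minimum per target."""
    best = defaultdict(lambda: float("inf"))
    for target, dist in outgoing:
        best[target] = min(best[target], dist)
    return sorted(best.items())

messages = [("c", 4), ("c", 3), ("d", 5), ("c", 7)]
combined = combine_min(messages)   # three messages for "c" collapse into one
```

This is safe for SSSP because the receiving vertex only ever uses the minimum of its incoming distances, so dropping the larger values cannot change the result.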
  • 33-40. Pregel — Execution — With Combiner (mode: pregel (combined))
    [Figure: animated SSSP run on the sample graph, one frame per iteration; the per-iteration counters are listed below]

        Iteration   Computing vertices   Messages   Total operations   Total messages
        0           13                   3          13                 3
        1           3                    6          16                 9
        2           6                    5          22                 14
        3           5                    3          27                 17
        4           3                    1          30                 18
        5           1                    0          31                 18

    total iterations: 5, total operations: 31, total messages: 18
  • 41. Pregel — Execution — Comparison
      • combiner vs no combiner
      • algorithm → Single Source Shortest Path (SSSP)
      • sample graph → 13 vertices / 19 edges
      • cost → iterations, operations, message buffers
    Results: Cost comparison of execution modes (SSSP)

                            Iterations   Operations   Messages
        Pregel              5            31           23
        Pregel + Combiner   5            31           18
  • 42. Pregel — Architecture
    [Figure: a Master handles synchronization & aggregation; three Workers each hold a graph Partition with an in-queue (inQ) and an out-queue (outQ); Workers exchange (combined) messages; the Graph Dataset is used for loading & checkpointing]
  • 47. Pregel — Typical Program
      • Client
          1 load input data into workers
          2 notify master to “start processing”
          3 wait for master to complete
          4 extract result data from workers
      • Master
          1 repeat until no active workers
              • signal workers to process
              • wait for all workers to finish
              • update active-worker count
          2 notify client about completion
      • Worker
          1 repeat until inactive
              • wait for “start iteration” from master
              • read data from in-queue
              • perform local processing
              • write data to out-queue & transmit
              • update active/inactive status
              • notify master
  • 50. Pregel — Fault Tolerance
      • logging
          1 checkpointing → state persisted at beginning of every n-th iteration
              • master persists → progress of execution, aggregate values
              • workers persist → vertex values, edge values, messages
          2 confined recovery → workers log out-messages from their partitions
      • failure detection
          • heartbeats
          • worker gets no heartbeat from master → worker terminates
          • master gets no heartbeat from worker → marks worker as failed
      • failure recovery
          • partition(s) belonging to failed worker(s) are reassigned
          • lost partitions recovered from checkpoints
          • missing iterations recomputed (using logged messages)
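The checkpoint-and-recompute part of this scheme can be sketched as a toy Python loop. Everything here is an assumption made for illustration (`run_with_checkpoints`, the `step` callback, the single simulated failure). It models plain rollback of the whole state, not confined recovery, where only the lost partitions would be replayed from logged messages.

```python
import copy

def run_with_checkpoints(step, state, iterations, every_n, fail_at=None):
    """Run `step` repeatedly, checkpointing at the start of every n-th iteration.

    One simulated failure at iteration `fail_at` rolls back to the latest
    checkpoint; the iterations since that checkpoint are then recomputed.
    """
    checkpoint = (0, copy.deepcopy(state))
    i, recomputed = 0, 0
    while i < iterations:
        if i % every_n == 0:
            checkpoint = (i, copy.deepcopy(state))    # persist before computing
        if i == fail_at:
            fail_at = None                            # fail only once
            recomputed = i - checkpoint[0]
            i, state = checkpoint[0], copy.deepcopy(checkpoint[1])
            continue
        state = step(state)
        i += 1
    return state, recomputed

final_state, recomputed = run_with_checkpoints(lambda s: s + 1, 0, 10, every_n=4, fail_at=6)
```

The trade-off it exposes is the choice of n: frequent checkpoints cost more during normal execution, while sparse ones mean more iterations to recompute after a failure.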
  • 51. Pregel — Evaluation — Scaling
      • algorithm → Single Source Shortest Path (SSSP)
      • hardware → 800 worker tasks scheduled on 300 multicore machines
      • graph
          • random, log-normal out-degree distribution (mean = 127.1)
          • up to 1,000,000,000 vertices / 127,000,000,000 edges
    Results: Scalability of Pregel (SSSP)
    [Figure: runtime (seconds, axis 100 to 800) vs. number of vertices (millions, axis 200 to 1000)]
  • 52. Signal/Collect — Contributions
      • parallel programming model (for processing graphs)
      • parallel execution model (for processing graphs)
      • (limited) evaluation → benefits of various scheduling policies
  • 53. Signal/Collect — Programming Model (Vertex)
      • Vertex (v)
          • v.id → unique identifier
          • v.state → arbitrary vertex state
          • v.lastSignalState → v.state at time of last signal()
          • v.outEdges : List[edge] → list of edges that have v as source
          • v.signalMap : Map(vid,signal) → last received messages
              • vid - identifier of sender vertex
              • signal - last received signal from that vertex
          • v.uncollectedSignals : List[signal] → list of signals received since collect() was last executed
          • v.collect() : calculates new vertex state
              1 collect incoming signals
              2 process those signals (possibly using v.state)
              3 return new vertex state
  • 54. Signal/Collect — Programming Model (Edge)
      • Edge (e)
          • e.source → source vertex
          • e.sourceId → identifier of source vertex (e.source.id)
          • e.targetId → identifier of target vertex
          • e.state → arbitrary edge state
          • e.signal() → calculates the signal to send, then sends it
      • signals are sent along edges of the compute graph
  • 55. Signal/Collect — Programming Model Example
    Code: Single Source Shortest Path (SSSP)

        class Location(id: Any, initialState: Int) extends Vertex {
          def collect: Int = min(state, min(uncollectedSignals))
        }
        class Path(sourceId: Any, targetId: Int) extends Edge {
          def signal: Int = source.state + weight
        }

      • vertex state → shortest known path length to vertex from source
      • edge state (weight) → path length of that individual edge
      • signal → shortest known path length from source through edge
  • 57. Signal/Collect — Execution Model
      • different Execution Modes (scheduling policies)
          1 synchronous
          2 synchronous score-guided
          3 asynchronous
          4 asynchronous scheduled
      • execution mode dictates when signal & collect are called
    Definition: Internal methods used by execution engine

        procedure v.executeSignalOperation()
            lastSignalState ← state
            for all e ∈ outGoingEdges do
                e.target.uncollectedSignals.append(e.signal())
                e.target.signalMap.put(e.sourceId, e.signal())

        procedure v.executeCollectOperation()
            state ← collect()
            uncollectedSignals ← Nil
  • 58. Signal/Collect — Execution — Synchronous
      • vertex scheduling: all vertices (unordered) per iteration
      • termination: all iterations
    Scheduler: Synchronous

        for i ← 1 to num_iterations do
            for all v ∈ V parallel do
                v.executeSignalOperation()
            for all v ∈ V parallel do
                v.executeCollectOperation()
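A minimal Python sketch of the synchronous scheduler, applied to SSSP, might look as follows. The `Vertex` class, its method names and the four-vertex graph are assumptions for illustration; this is not the Signal/Collect (Scala) API, and the signal map is omitted because SSSP only needs the uncollected signals.

```python
INF = float("inf")

class Vertex:
    def __init__(self, vid, state):
        self.id, self.state = vid, state
        self.out_edges = []                  # list of (target Vertex, weight)
        self.uncollected = []                # signals received since last collect

    def execute_signal(self):
        for target, weight in self.out_edges:
            target.uncollected.append(self.state + weight)    # e.signal()

    def execute_collect(self):
        self.state = min([self.state] + self.uncollected)     # collect()
        self.uncollected = []

def run_synchronous(vertices, num_iterations):
    for _ in range(num_iterations):
        for v in vertices:                   # signal phase (conceptually parallel)
            v.execute_signal()
        for v in vertices:                   # collect phase (conceptually parallel)
            v.execute_collect()

a, b, c, d = Vertex("a", 0), Vertex("b", INF), Vertex("c", INF), Vertex("d", INF)
a.out_edges = [(b, 1), (c, 4)]
b.out_edges = [(c, 2)]
c.out_edges = [(d, 1)]
run_synchronous([a, b, c, d], num_iterations=3)
states = {v.id: v.state for v in (a, b, c, d)}
```

Every vertex signals along every edge in every iteration, whether or not its state changed, which is why this mode sends far more messages than the score-guided variants.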
  • 59-65. Signal/Collect — Execution — Synchronous (mode: synchronous)
    [Figure: animated SSSP run on the sample graph, one frame per iteration; the per-iteration counters are listed below]

        Iteration   Signaling vertices   Collecting vertices   Messages   Total operations   Total messages
        0           13                   13                    19         26                 19
        1           13                   13                    19         52                 38
        2           13                   13                    19         78                 57
        3           13                   13                    19         104                76
        4           13                   13                    19         130                95

    total iterations: 4, total operations: 130, total messages: 95
  • 67. Signal/Collect — Execution — Synchronous Guided
      • vertex scheduling: all active vertices (unordered) per iteration
      • termination: all iterations or (no signal and no collect)
      • extension v.signalScore() “importance for vertex to signal”
          • may change ⇐⇒ v.state changes
          • default → 1 if changed, 0 otherwise
      • extension v.collectScore() “importance for vertex to collect”
          • may change ⇐⇒ v.uncollectedSignals changes
          • default → v.uncollectedSignals.size()
  • 68. Signal/Collect — Execution — Synchronous Guided
      • vertex scheduling: all active vertices (unordered) per iteration
      • termination: all iterations or (no signal and no collect)
    Scheduler: Synchronous Score-Guided

        done ← false
        while (iterations < num_iterations) ∧ (done = false) do
            done ← true
            iterations ← iterations + 1
            for all v ∈ V parallel do
                if v.signalScore() > signal_threshold then
                    done ← false
                    v.executeSignalOperation()
            for all v ∈ V parallel do
                if v.collectScore() > collect_threshold then
                    done ← false
                    v.executeCollectOperation()
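The score-guided loop can be sketched in Python with the default scores from the slides. The class and function names are hypothetical, and one detail is an assumption of this sketch: `last_signal_state` starts at infinity, so that initially only the source counts as "changed".

```python
INF = float("inf")

class Vertex:
    def __init__(self, vid, state):
        self.id, self.state = vid, state
        self.last_signal_state = INF         # assumption: INF means "nothing to signal yet"
        self.out_edges = []                  # list of (target Vertex, weight)
        self.uncollected = []

    def signal_score(self):                  # default: 1 if changed, 0 otherwise
        return 1 if self.state != self.last_signal_state else 0

    def collect_score(self):                 # default: number of uncollected signals
        return len(self.uncollected)

    def execute_signal(self):
        self.last_signal_state = self.state
        for target, weight in self.out_edges:
            target.uncollected.append(self.state + weight)

    def execute_collect(self):
        self.state = min([self.state] + self.uncollected)
        self.uncollected = []

def run_score_guided(vertices, num_iterations, signal_threshold=0, collect_threshold=0):
    iterations, operations, done = 0, 0, False
    while iterations < num_iterations and not done:
        done = True
        iterations += 1
        for v in vertices:
            if v.signal_score() > signal_threshold:
                done = False
                v.execute_signal()
                operations += 1
        for v in vertices:
            if v.collect_score() > collect_threshold:
                done = False
                v.execute_collect()
                operations += 1
    return iterations, operations

a, b, c, d = Vertex("a", 0), Vertex("b", INF), Vertex("c", INF), Vertex("d", INF)
a.out_edges = [(b, 1), (c, 4)]
b.out_edges = [(c, 2)]
c.out_edges = [(d, 1)]
iterations, operations = run_score_guided([a, b, c, d], num_iterations=100)
states = {v.id: v.state for v in (a, b, c, d)}
```

Compared with the plain synchronous mode, only vertices whose score exceeds the threshold do any work, and the loop stops as soon as one full iteration performs no operation.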
  • 69-75. Signal/Collect — Execution — Synchronous Guided (mode: synchronous-guided)
      • scoreSignal: state != lastCollectedState
      • scoreCollect: v.uncollectedSignals() > 0
    [Figure: animated SSSP run on the sample graph, one frame per iteration; the per-iteration counters are listed below]

        Iteration   Signaling vertices   Collecting vertices   Messages   Total operations   Total messages
        0           1                    3                     3          4                  3
        1           3                    6                     6          13                 9
        2           6                    5                     7          24                 16
        3           4                    3                     6          31                 22
        4           2                    1                     1          34                 23

    total iterations: 4, total operations: 34, total messages: 23
  • 77. Signal/Collect — Execution — Asynchronous
      • vertex scheduling: random
      • termination: max operations or (no signal and no collect)
      • no guarantee on order of execution
          • some vertices may signal while others collect
          • no guarantee that all vertices are executed the same number of times
      • asynchronous mode not usable for every algorithm
          • use when correctness does not depend on strict execution order
  • 78. Signal/Collect — Execution — Asynchronous
      • vertex scheduling: random
      • termination: max operations or (no signal and no collect)
    Scheduler: Asynchronous

        ops ← 0
        while (ops < num_ops) ∧ ∃v ∈ V : (v.signalScore > signal_threshold) ∨ (v.collectScore > collect_threshold) do
            S ← random subset of V
            for all v ∈ S do
                next ← random(signal/collect)
                if (next = signal) ∧ (v.signalScore > signal_threshold) then
                    v.executeSignalOperation()
                    ops ← ops + 1
                else if (next = collect) ∧ (v.collectScore > collect_threshold) then
                    v.executeCollectOperation()
                    ops ← ops + 1
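A seeded Python sketch of this scheduler shows why SSSP tolerates the random order. The names (`run_async`, `can_signal`, `can_collect`) and the dictionary-based graph are assumptions, and the random subset S is simplified to a single randomly chosen vertex per step.

```python
import random

INF = float("inf")

def run_async(edges, source, max_ops=10_000, seed=0):
    """edges: {vertex: [(target, weight), ...]}. Returns (state, ops)."""
    rng = random.Random(seed)
    state = {v: (0 if v == source else INF) for v in edges}
    last_signal = {v: INF for v in edges}    # assumption: INF means "nothing to signal yet"
    pending = {v: [] for v in edges}         # uncollected signals per vertex

    def can_signal(v):
        return state[v] != last_signal[v]

    def can_collect(v):
        return len(pending[v]) > 0

    ops = 0
    while ops < max_ops and any(can_signal(v) or can_collect(v) for v in edges):
        v = rng.choice(list(edges))          # random vertex ...
        if rng.random() < 0.5:               # ... and random operation
            if can_signal(v):
                last_signal[v] = state[v]
                for target, weight in edges[v]:
                    pending[target].append(state[v] + weight)
                ops += 1
        elif can_collect(v):
            state[v] = min([state[v]] + pending[v])
            pending[v] = []
            ops += 1
    return state, ops

graph = {"a": [("b", 1), ("c", 4)], "b": [("c", 2)], "c": [("d", 1)], "d": []}
state, ops = run_async(graph, "a")
```

When the loop ends without hitting `max_ops`, every vertex has signalled its final state and every signal has been collected, so the distances match the synchronous result even though the interleaving differs from run to run.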
  • 80. Signal/Collect — Execution — Asynchronous Scheduled
      • vertex scheduling: scheduler-dependent
      • termination: scheduler-dependent
      • scheduler: schedules vertices’ signal & collect operations
      • e.g. “eager” scheduler → tries to signal right after collection
    Scheduler: “Eager”

        for all v ∈ V do
            if (v.collectScore() > collect_threshold) then
                v.executeCollectOperation()
            if (v.signalScore() > signal_threshold) then
                v.executeSignalOperation()
  • 81. Signal/Collect — Execution — Asynchronous Scheduled
      • vertex scheduling: scheduler-dependent
      • termination: scheduler-dependent
      • scheduler: schedules vertices’ signal & collect operations
      • benefit of this execution mode depends on:
          1 impact on convergence (number of operations)
          2 cost of operations
          3 cost of scheduling
  • 82. Signal/Collect — Execution — Asynchronous Scheduled
    Scheduler: “SSSP” (minimize messages & computation)

        Signal ← {v_source} // sorted set
        while (Signal ≠ {}) do
            for top k v ∈ Signal do
                v.executeSignalOperation()
                Signal.remove(v)
            for all v ∈ V do
                if (v.collectScore > collect_threshold) then
                    v.executeCollectOperation()
                if (v.signalScore > signal_threshold) then
                    Signal.put(v)
            if (v_destination.state ≤ min(Signal)) then
                Signal ← {}

        // returns distance of shortest next step
        function signalSort(v)
            Distances ← {∞}
            for all e ∈ v.outEdges do
                Distances.put(e.signal)
            return min(Distances)
  • 83-85. Signal/Collect — Execution — Asynchronous Scheduled (mode: asynchronous-scheduled)
      • scoreSignal: state != lastCollectedState
      • scoreCollect: v.uncollectedSignals() > 0
    [Figure: animated SSSP run on the sample graph, one frame per iteration; the counters for the first frames are listed below]

        Iteration   Signaling vertices   Collecting vertices   Messages   Total operations   Total messages
        0           1                    3                     3          4                  3
        1           1                    2                     2          7                  5