9: MR+

                    Zubair Nabi

              zubair.nabi@itu.edu.pk


                  April 19, 2013




Outline



  1    Introduction


  2    MR+


  3    Implementation


  4    Code-base




Implicit MapReduce Assumptions




          The input data has no structure
          The distribution of intermediate data is balanced
          Results materialize when all the map and reduce tasks complete
          The number of values of each key is small enough to be processed by
          a single reduce task
          Processing at the reduce stage is usually a simple aggregation
          function




Zipf distributions are everywhere




Reduce-intensive applications




          Image and speech correlation
          Backpropagation in neural networks
          Co-clustering
          Tree learning
          Computation of node diameter and radii in Tera-scale graphs
          ...




Design Goals




          Negate skew in intermediate data
          Exploit structure in input data
          Estimate results
          Favour commodity clusters
          Maintain original functional model of MapReduce




Design

          Maintains the simple MapReduce programming model
          Instead of implementing MapReduce as a sequential two-stage
          architecture, MR+ allows the map and reduce stages to interleave
          and iterate over intermediate results
          This leads to a multi-level inverted tree of reduce workers




Architecture

  [Figure omitted: (a) MapReduce: a map phase, then a brick-wall, then a
  reduce phase. (b) MR+: a 5%-10% estimation cycle prioritizes data, after
  which the map and reduce phases overlap.]

                  Figure: Architectural comparison of MapReduce and MR+.

Architectural Flexibility




     1    Instead of waiting for all maps to finish before scheduling a reduce
          task, MR+ permits a model where a reduce task can be scheduled for
          every n invocations of the map function
     2    A densely populated key can be recursively reduced by repeated
          invocation of the reduce function at multiple reduce workers




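  A minimal sketch (not taken from the MR+ code-base) of point 2: a densely
  populated key is reduced recursively, with each chunk standing in for one
  reduce worker in the inverted tree. It assumes an associative reducer,
  summation here:

      def reduce_fn(key, values):
          # User-defined associative reducer: here, summation.
          return [sum(values)]

      def tree_reduce(key, values, fan_in=4):
          # Recursively reduce a dense key by splitting its values
          # across multiple (simulated) reduce workers.
          if len(values) <= fan_in:
              return reduce_fn(key, values)
          partials = []
          for i in range(0, len(values), fan_in):
              # Each chunk stands in for one worker at this level.
              partials.extend(reduce_fn(key, values[i:i + fan_in]))
          # Partial results feed the next level of the tree.
          return tree_reduce(key, partials, fan_in)

      print(tree_reduce("hot-key", list(range(1000))))  # [499500]
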
Advantages




          Resilient to TCP Incast by amortizing data copying over the course of
          the job
          Early materialization of partial results for queries with thresholds or
          confidence intervals
          Finds structure in the data by running a sampling cycle to learn
          the distribution of information, then prioritizes input data with
          respect to the user query




Programming Model




          Retains the 2-stage MapReduce API
          MR+ reducers can be likened to distributed combiners
          Repeated invocation of the reducer rules out non-associative
          functions by default
          But reducers can be designed so that the non-associative operation
          is applied only at the very last reduce




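  To make the last bullet concrete, here is a hypothetical reducer for
  computing a mean: intermediate levels merge (sum, count) pairs, which is
  associative, and the non-associative division is applied only at the
  final reduce:

      def mean_reducer(key, values, is_final=False):
          # Associative partial step: merge (sum, count) pairs.
          total = sum(s for s, _ in values)
          count = sum(c for _, c in values)
          if is_final:
              return [total / count]   # non-associative, applied once
          return [(total, count)]      # safe to re-reduce at any level

      # Map emits (value, 1) pairs; intermediate reducers can run any
      # number of times, in any grouping, without changing the result.
      level1 = mean_reducer("k", [(10, 1), (20, 1)])   # [(30, 2)]
      level2 = mean_reducer("k", level1 + [(12, 1)])   # [(42, 3)]
      print(mean_reducer("k", level2, is_final=True))  # [14.0]
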
Scheduling




          Tasks are scheduled according to a configurable
          map_to_reduce_schedule_ratio parameter
          For every map_to_reduce_schedule_ratio map tasks, 1
          reduce task is scheduled
          For instance, if map_to_reduce_schedule_ratio is 4, then the
          first reduce task is scheduled when 4 map tasks complete




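  The policy amounts to a simple trigger; a sketch (the real scheduler
  lives in the JobTracker, so this is only illustrative):

      map_to_reduce_schedule_ratio = 4

      def should_schedule_reduce(completed_maps, ratio):
          # One reduce task for every `ratio` completed map tasks.
          return completed_maps % ratio == 0

      for done in range(1, 13):
          if should_schedule_reduce(done, map_to_reduce_schedule_ratio):
              print(f"{done} maps done -> schedule a reduce task")
      # Prints at 4, 8 and 12 completed maps.
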
Level-1 reducers




          Each reduce is assigned the output of map_to_reduce_ratio
          number of maps
          The location of their inputs is communicated by the JobTracker
          Each reduce task pulls its input via HTTP
          After the reduce logic has been applied to all keys, the output is
          earmarked for L > 1 reducers




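  A sketch of a level-1 reducer's input path, assuming map outputs are
  served as JSON at URLs handed out by the JobTracker (the URL scheme and
  format are assumptions, not MR+'s actual wire protocol):

      import json
      from urllib.request import urlopen

      def fetch_map_output(url):
          # Pull one map task's intermediate output over HTTP.
          with urlopen(url) as resp:
              return json.load(resp)  # {"key": [v1, v2, ...], ...}

      def level1_reduce(map_output_urls, reduce_fn):
          # Merge the outputs of map_to_reduce_ratio map tasks, whose
          # locations the JobTracker communicated, then reduce.
          merged = {}
          for url in map_output_urls:
              for key, values in fetch_map_output(url).items():
                  merged.setdefault(key, []).extend(values)
          # The result is earmarked for a level > 1 reducer.
          return {k: reduce_fn(k, v) for k, v in merged.items()}
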
Level > 1 reducers




          Assigned the input of reduce_input_ratio number of reduce
          tasks
          Eventually all key/value pairs make their way to the final level, which
          has a single worker
          This final reduce can also be used to apply any non-associative
          operation




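  Following the ratios in the structural-comparison figure below
  (α = ω/mr, β = α/rr, ...), the width of each reduce level can be
  computed; rounding up is an assumption here:

      import math

      def level_widths(num_maps, mr, rr):
          # Level 1 has num_maps / mr reducers; every later level
          # shrinks by rr until a single worker remains.
          widths = [math.ceil(num_maps / mr)]
          while widths[-1] > 1:
              widths.append(math.ceil(widths[-1] / rr))
          return widths

      print(level_widths(1024, 4, 4))  # [256, 64, 16, 4, 1]
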
Structural comparison

  [Figure omitted: (a) MapReduce: map tasks Map1..Mapω feed a shuffler,
  then a brick-wall, then reduce tasks Reduce1..Reduceθ, each receiving all
  the values for its keys. (b) MR+: an inverted tree of reduce workers,
  with α = ω/mr reducers at level 1, β = α/rr at level 2, γ = β/rr at
  level 3, and so on down to a single final reducer.]

                        Figure: Structural comparison of MapReduce and MR+.

Reduce Locality




          MR+ does not rely on key/values for input assignment
          Reduce inputs are assigned on the basis of locality
                1 Node-local
                2 Rack-local
                3 Any




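  A sketch of the three-step locality preference (the records and fields
  are illustrative, not MR+'s actual data structures):

      from collections import namedtuple

      Worker = namedtuple("Worker", "node rack")
      Output = namedtuple("Output", "node rack path")

      def pick_input(worker, candidates):
          for o in candidates:
              if o.node == worker.node:
                  return o.path          # 1. node-local
          for o in candidates:
              if o.rack == worker.rack:
                  return o.path          # 2. rack-local
          return candidates[0].path      # 3. any
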
Fault Tolerance


          Deterministic input assignment simplifies failure recovery in
          MapReduce
          In case of MR+, if a map task or a level-1 reduce fails, it is simply
          re-executed
          For level > 1 reduce tasks, MR+ implements three strategies, which
          expose the trade-off between computation and storage
                1   Chain re-execution: The entire chain is re-executed
                2   Local replication: The output of each reduce is replicated on the local
                    file system of a rack-local neighbour




  Zubair Nabi                              9: MR+                              April 19, 2013   20 / 26
Fault Tolerance


          Deterministic input assignment simplifies failure recovery in
          MapReduce
          In case of MR+, if a map task or a level-1 reduce fails, it is simply
          re-executed
          For level > 1 reduce tasks, MR+ implements three strategies, which
          expose the trade-off between computation and storage
                1 Chain re-execution: The entire chain is re-executed
                2 Local replication: The output of each reduce is replicated on the local
                  file system of a rack-local neighbour
                3 Distributed replication: The output of each reduce is replicated on the
                  distributed file system




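  A toy illustration of the computation/storage trade-off behind the three
  strategies (the class and its fields are made up for the example):

      class ReduceTask:
          def __init__(self, name, parents=(), replica=None):
              self.name, self.parents, self.replica = name, parents, replica

          def recover(self, strategy):
              if strategy == "chain":
                  # No replicas kept: re-execute every ancestor.
                  return [p.name for p in self.parents] + [self.name]
              # "local" or "distributed": read a stored replica instead.
              return [f"restore {self.name} from {self.replica}"]

      r1, r2 = ReduceTask("R1,1"), ReduceTask("R1,2")
      r21 = ReduceTask("R2,1", parents=(r1, r2), replica="rack-local copy")
      print(r21.recover("chain"))  # ['R1,1', 'R1,2', 'R2,1']
      print(r21.recover("local"))  # ['restore R2,1 from rack-local copy']
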
Input Prioritization



          User-defined map and reduce functions are applied to a
          sample_percentage amount of input, taken at random
          This sampling cycle yields a representative distribution of data
          Used to exploit structure: data with semantic grouping or clusters of
          relevant information
          The distribution is used to generate a priority queue to assign to map
          tasks
          A full-fledged MR+ job is then run, in which map tasks read input from
          the priority queue




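  A sketch of the sampling cycle, where `relevance` stands in for the
  statistics learned by running the user's map and reduce functions on the
  sample (everything here is illustrative):

      import heapq, random

      def build_priority_queue(splits, relevance, sample_percentage=0.1):
          sample = random.sample(
              splits, max(1, int(len(splits) * sample_percentage)))
          scores = {s: relevance(s) for s in sample}   # learned on sample
          prior = sum(scores.values()) / len(scores)   # guess for the rest
          heap = [(-scores.get(s, prior), i, s)
                  for i, s in enumerate(splits)]
          heapq.heapify(heap)
          return heap  # map tasks pop the highest-priority split first

      heap = build_priority_queue(list(range(20)),
                                  relevance=lambda s: s % 7)
      first_split = heapq.heappop(heap)[2]  # processed first
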
Input Prioritization (2)




          Due to this prioritization, relevant clusters of information are processed
          first
          As a result, the computation can be stopped mid-way if a threshold
          condition is satisfied




Code-base




         Around 15,000 lines of Python code
         Code implements both vanilla MapReduce and MR+
         Written over the course of roughly 5 years at LUMS
         Publicly available at:
         https://code.google.com/p/mrplus/source/browse/?name=BRANCH_VER_0_0_0_4_PY2x




Storage




          Abstracts away the underlying storage system
          Currently supports HDFS and Amazon's S3
          Also supports the local OS file system (for unit testing)




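  Such a storage abstraction likely boils down to a small read/write
  interface; a minimal sketch (the class names are assumptions, not MR+'s
  actual classes):

      class Storage:
          def read(self, path):
              raise NotImplementedError
          def write(self, path, data):
              raise NotImplementedError

      class LocalStorage(Storage):
          # Local OS file system backend, handy for unit testing.
          def read(self, path):
              with open(path, "rb") as f:
                  return f.read()
          def write(self, path, data):
              with open(path, "wb") as f:
                  f.write(data)

      # HDFSStorage and S3Storage would implement the same two methods,
      # so job code never touches the underlying storage directly.
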
Structure




          Modular structure, so most of the code is reused across MapReduce
          and MR+
          Google Protocol Buffers and JSON are used for serialization
          All configuration options within two files: siteconf.xml (site-wide)
          and jobconf.xml (job-specific)




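  A sketch of how the two configuration files might be merged, assuming a
  flat <property><name>/<value> layout (the layout is an assumption; only
  the two file names come from the slides):

      import xml.etree.ElementTree as ET

      def load_conf(site_path="siteconf.xml", job_path="jobconf.xml"):
          conf = {}
          for path in (site_path, job_path):  # job-specific values win
              for prop in ET.parse(path).getroot():
                  conf[prop.findtext("name")] = prop.findtext("value")
          return conf

      # e.g. conf["map_to_reduce_schedule_ratio"], conf["sample_percentage"]
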
