Topic 9: MR+
Published on: Cloud Computing Workshop 2013, ITU

Transcript of "Topic 9: MR+"
  1. 9: MR+
     Zubair Nabi (zubair.nabi@itu.edu.pk), April 19, 2013
  2. Outline
     1 Introduction
     2 MR+
     3 Implementation
     4 Code-base
  8. Implicit MapReduce Assumptions
     - The input data has no structure
     - The distribution of intermediate data is balanced
     - Results materialize when all the map and reduce tasks complete
     - The number of values for each key is small enough to be processed by a single reduce task
     - Processing at the reduce stage is usually a simple aggregation function
  9. Zipf distributions are everywhere
  10. Reduce-intensive applications
     - Image and speech correlation
     - Backpropagation in neural networks
     - Co-clustering
     - Tree learning
     - Computation of node diameter and radii in Tera-scale graphs
     - ...
  16. Design Goals
     - Negate skew in intermediate data
     - Exploit structure in input data
     - Estimate results
     - Favour commodity clusters
     - Maintain the original functional model of MapReduce
  19. Design
     - Maintains the simple MapReduce programming model
     - Instead of implementing MapReduce as a sequential two-stage architecture, MR+ allows the map and reduce stages to interleave and iterate over intermediate results
     - This leads to a multi-level inverted tree of reduce workers
  20. Architecture
     [Figure: Architectural comparison of MapReduce and MR+. (a) MapReduce: a brick-wall barrier separates the map phase from the reduce phase, so the job ends only after both complete. (b) MR+: a 5%-10% estimation cycle first prioritizes the data, after which map and reduce tasks run interleaved.]
  22. Architectural Flexibility
     1 Instead of waiting for all maps to finish before scheduling a reduce task, MR+ permits a model where a reduce task can be scheduled for every n invocations of the map function
     2 A densely populated key can be recursively reduced by repeated invocation of the reduce function at multiple reduce workers
  25. Advantages
     - Resilient to TCP Incast, by amortizing data copying over the course of the job
     - Early materialization of partial results for queries with thresholds or confidence intervals
     - Finds structure in the data by running a sample cycle to learn the distribution of information, and prioritizes input data with respect to the user query
  29. Programming Model
     - Retains the two-stage MapReduce API
     - MR+ reducers can be likened to distributed combiners
     - Repeated invocation of the reducer rules out non-associative functions by default
     - But reducers can be designed so that the non-associative operation is applied only at the very last reduce
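Deferring the non-associative step to the final reduce can be sketched as follows. This is a hypothetical example, not code from the MR+ code-base: intermediate reducers merge (sum, count) pairs, which is associative and safe to repeat at every level of the tree, while only the root reducer divides to produce a mean.

```python
# Hypothetical sketch: computing a mean under repeated reduction.
# Intermediate levels combine (sum, count) pairs associatively; the
# non-associative division happens only once, at the final reduce.

def partial_reduce(pairs):
    """Associative combine: merge a list of (sum, count) pairs."""
    total = sum(s for s, _ in pairs)
    count = sum(c for _, c in pairs)
    return (total, count)

def final_reduce(pair):
    """Non-associative step, applied only at the very last reduce."""
    total, count = pair
    return total / count

# Partial (sum, count) outputs from three map tasks:
map_outputs = [(10, 2), (6, 3), (8, 1)]
level1 = partial_reduce(map_outputs[:2])        # a level-1 reduce
root = partial_reduce([level1, map_outputs[2]]) # the final-level reduce
print(final_reduce(root))  # mean of all six values: 24 / 6 = 4.0
```

Because `partial_reduce` is associative, the answer is the same no matter how the tree groups its inputs, which is exactly what the multi-level reducer design requires.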
  33. Scheduling
     - Tasks are scheduled according to a configurable map_to_reduce_schedule_ratio parameter
     - For every map_to_reduce_schedule_ratio map tasks, one reduce task is scheduled
     - For instance, if map_to_reduce_schedule_ratio is 4, the first reduce task is scheduled when 4 map tasks complete
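This schedule can be sketched in a few lines. Only the parameter name map_to_reduce_schedule_ratio comes from the slides; the helper function is made up for illustration.

```python
# Illustrative sketch (helper name assumed): with a schedule ratio of 4,
# a reduce task is dispatched each time 4 more map tasks complete.

def reduce_schedule_points(num_maps, map_to_reduce_schedule_ratio):
    """Map-completion counts at which a new reduce task is scheduled."""
    return list(range(map_to_reduce_schedule_ratio,
                      num_maps + 1,
                      map_to_reduce_schedule_ratio))

print(reduce_schedule_points(16, 4))  # [4, 8, 12, 16]: four level-1 reduces
```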
  37. Level-1 reducers
     - Each reduce is assigned the output of map_to_reduce_ratio maps
     - The location of its inputs is communicated by the JobTracker
     - Each reduce task pulls its input via HTTP
     - After the reduce logic has been applied to all keys, the output is earmarked for level > 1 reducers
  40. Level > 1 reducers
     - Each is assigned the input of reduce_input_ratio reduce tasks
     - Eventually all key/value pairs make their way to the final level, which has a single worker
     - This final reduce can also be used to apply any non-associative operation
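The shape of the resulting inverted tree follows directly from these two ratios. A small sketch (the helper is hypothetical; map_to_reduce_ratio and reduce_input_ratio are the parameters named on the slides):

```python
import math

# Sketch of the inverted reduce tree: level 1 has one reducer per
# map_to_reduce_ratio maps; each level above has one reducer per
# reduce_input_ratio reducers below it, down to a single root worker.

def tree_levels(num_maps, map_to_reduce_ratio, reduce_input_ratio):
    """Number of reduce workers at each level, top of the list = level 1."""
    levels = [math.ceil(num_maps / map_to_reduce_ratio)]
    while levels[-1] > 1:
        levels.append(math.ceil(levels[-1] / reduce_input_ratio))
    return levels

print(tree_levels(32, 4, 2))  # [8, 4, 2, 1]: a four-level inverted tree
```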
  41. Structural comparison
     [Figure: Structural comparison of MapReduce and MR+. (a) MapReduce: ω maps feed a shuffler across the brick-wall, which partitions all key/value pairs (k1...kn) among the reduce workers. (b) MR+: maps feed α = ω/mr level-1 reducers; each higher level shrinks by the reduce ratio (β = α/rr, γ = β/rr, ...) down to a single final reducer.]
  44. Reduce Locality
     - MR+ does not rely on key/values for input assignment
     - Reduce inputs are assigned on the basis of locality:
       1 Node-local
       2 Rack-local
       3 Any
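The three-tier preference can be sketched as a simple selection loop. This is a hypothetical illustration of the policy, not the actual assignment code.

```python
# Hypothetical sketch of locality-based input assignment: prefer a
# node-local map output, then a rack-local one, then anything pending.

def pick_input(pending, worker_node, worker_rack):
    """pending: list of (node, rack, path) tuples describing map outputs."""
    for node, rack, path in pending:
        if node == worker_node:              # 1. node-local
            return path
    for node, rack, path in pending:
        if rack == worker_rack:              # 2. rack-local
            return path
    return pending[0][2] if pending else None  # 3. any

pending = [("n7", "r2", "/out/m3"), ("n1", "r1", "/out/m1")]
print(pick_input(pending, "n1", "r1"))  # node-local match: /out/m1
```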
  50. Fault Tolerance
     - Deterministic input assignment simplifies failure recovery in MapReduce
     - In MR+, if a map task or a level-1 reduce fails, it is simply re-executed
     - For level > 1 reduce tasks, MR+ implements three strategies, which expose the trade-off between computation and storage:
       1 Chain re-execution: the entire chain is re-executed
       2 Local replication: the output of each reduce is replicated on the local file system of a rack-local neighbour
       3 Distributed replication: the output of each reduce is replicated on the distributed file system
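The cost of strategy 1 is the whole dependency chain. A sketch of what chain re-execution implies, under the assumption (mine, not the slides') that each reduce's input is identified only by the tasks that produced it:

```python
# Hypothetical sketch of chain re-execution: when a level > 1 reduce
# fails and no replica of its inputs exists, every task it transitively
# depends on must be re-run, deepest dependencies first.

def chain_to_reexecute(parents, failed):
    """parents: {reduce_id: [ids whose outputs feed it]}.
    Returns the tasks to re-run for `failed`, in execution order."""
    order = []
    def visit(task):
        for dep in parents.get(task, []):
            visit(dep)
        order.append(task)
    visit(failed)
    return order

# A three-level chain: R3 consumes R2's output, which consumed R1a and R1b.
parents = {"R3": ["R2"], "R2": ["R1a", "R1b"]}
print(chain_to_reexecute(parents, "R3"))  # ['R1a', 'R1b', 'R2', 'R3']
```

Local and distributed replication trade storage for cutting this chain short: recovery restarts from the nearest surviving replica instead of the leaves.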
  55. Input Prioritization
     - User-defined map and reduce functions are applied to a sample_percentage amount of the input, taken at random
     - This sampling cycle yields a representative distribution of the data
     - This is used to exploit structure: data with semantic groupings or clusters of relevant information
     - The distribution is used to generate a priority queue that is assigned to map tasks
     - A full-fledged MR+ job is then run, in which map tasks read input from the priority queue
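The sampling cycle can be sketched as follows. Everything here except the sample_percentage parameter is assumed for illustration: splits are scored by how often their sampled records are relevant to the query, and a priority queue orders the splits for the map tasks.

```python
import heapq
import random
from collections import Counter

# Illustrative sketch (not the actual code-base): sample each input
# split, score it by how many sampled records match the query, and
# build a priority queue so maps read the most promising splits first.

def prioritize(splits, relevance, sample_percentage=10, seed=0):
    """splits: {split_id: records}; relevance: record -> bool."""
    rng = random.Random(seed)
    scores = Counter()
    for sid, records in splits.items():
        k = max(1, len(records) * sample_percentage // 100)
        for rec in rng.sample(records, k):
            if relevance(rec):
                scores[sid] += 1
    # Max-heap via negated scores: highest-scoring splits pop first.
    heap = [(-scores[sid], sid) for sid in splits]
    heapq.heapify(heap)
    return [heapq.heappop(heap)[1] for _ in range(len(heap))]

splits = {"s1": list(range(100)), "s2": [x * 1000 for x in range(100)]}
order = prioritize(splits, relevance=lambda r: r > 500)
print(order)  # ['s2', 's1']: s2's sampled records match the query far more often
```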
  57. Input Prioritization (2)
     - Due to this prioritization, relevant clusters of information are processed first
     - As a result, the computation can be stopped mid-way if a threshold condition is satisfied
  62. Code-base
     - Around 15,000 lines of Python code
     - Implements both vanilla MapReduce and MR+
     - Written over the course of roughly 5 years at LUMS
     - Publicly available at: https://code.google.com/p/mrplus/source/browse/?name=BRANCH_VER_0_0_0_4_PY2x
  65. Storage
     - Abstracts away the underlying storage system
     - Currently supports HDFS and Amazon's S3
     - Also supports the local OS file system (for unit testing)
  68. Structure
     - Modular structure, so most of the code is re-used across MapReduce and MR+
     - Google Protobufs and JSON are used for serialization
     - All configuration options live in two files: siteconf.xml (site-wide) and jobconf.xml (job-specific)