Topic 9: MR+

Cloud Computing Workshop 2013, ITU
1. 9: MR+
   Zubair Nabi (zubair.nabi@itu.edu.pk)
   April 19, 2013
2. Outline
   1 Introduction
   2 MR+
   3 Implementation
   4 Code-base
3. Outline (current section: Introduction)
4. Implicit MapReduce Assumptions
   - The input data has no structure
   - The distribution of intermediate data is balanced
   - Results materialize only when all the map and reduce tasks complete
   - The number of values for each key is small enough to be processed by a single reduce task
   - Processing the data at the reduce stage is in most cases a simple aggregation function
5. Zipf distributions are everywhere
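The deck gives no example here, so the following is a minimal sketch on synthetic data of why Zipf-skewed keys break the balanced-intermediate-data assumption: a handful of keys carry most of the values, so the single reduce task that owns a heavy key becomes the straggler of the whole job.

```python
# Hypothetical illustration (not from the deck): key frequencies drawn
# from a Zipf-like distribution, as a word-count-style map phase produces.
import random
from collections import Counter

random.seed(42)

# Zipf-like sampling over 1,000 distinct keys with exponent s = 1.2.
s, n_keys = 1.2, 1000
weights = [1.0 / rank ** s for rank in range(1, n_keys + 1)]
keys = random.choices(range(n_keys), weights=weights, k=100_000)

counts = Counter(keys)
total = sum(counts.values())
for key, count in counts.most_common(5):
    print(f"key {key:4d}: {count:6d} values ({100 * count / total:.1f}% of all pairs)")
# The top few keys carry a disproportionate share of the intermediate
# data, so the reduce worker assigned to them dominates job runtime.
```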
6. Reduce-intensive applications
   - Image and speech correlation
   - Backpropagation in neural networks
   - Co-clustering
   - Tree learning
   - Computation of node diameter and radii in Tera-scale graphs
   - ...
7. Outline (current section: MR+)
8. Design Goals
   - Negate skew in intermediate data
   - Exploit structure in input data
   - Estimate results
   - Favour commodity clusters
   - Maintain the original functional model of MapReduce
9. Design
   - Maintains the simple MapReduce programming model
   - Instead of implementing MapReduce as a sequential two-stage architecture, MR+ allows the map and reduce stages to interleave and iterate over intermediate results
   - This leads to a multi-level inverted tree of reduce workers
10. Architecture
   [Figure: Architectural comparison of MapReduce and MR+. (a) MapReduce separates the map and reduce phases with a brick-wall; (b) MR+ interleaves map and reduce from start to end and runs a 5-10% estimation cycle that prioritizes data.]
11. Architectural Flexibility
   1 Instead of waiting for all maps to finish before scheduling a reduce task, MR+ permits a model where a reduce task can be scheduled for every n invocations of the map function
   2 A densely populated key can be recursively reduced by repeated invocation of the reduce function at multiple reduce workers
12. Advantages
   - Resilient to TCP Incast, by amortizing data copying over the course of the job
   - Early materialization of partial results for queries with thresholds or confidence intervals
   - Finds structure in the data by running a sample cycle to learn the distribution of information, prioritizing input data with respect to the user query
13. Programming Model
   - Retains the two-stage MapReduce API
   - MR+ reducers can be likened to distributed combiners
   - Repeated invocation of the reducer rules out non-associative functions by default
   - But reducers can be designed so that the non-associative operation is applied only at the very last reduce (see the sketch below)
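The deck shows no code for this pattern; here is a minimal sketch, assuming a per-key averaging job, of how a reducer can stay associative at intermediate levels and defer the non-associative step (here, a division) to the final reduce. Function names are illustrative, not the project's API.

```python
# Hypothetical sketch: computing a per-key mean under MR+'s repeated
# reduction. The mean itself is not associative, but merging
# (sum, count) pairs is, so intermediate reducers merge pairs and only
# the final reducer divides.

def map_fn(record):
    # Emit (key, (value, 1)) so that counts travel with partial sums.
    key, value = record
    yield key, (value, 1)

def reduce_fn(key, pairs):
    # Associative merge: safe to invoke at any level, any number of times.
    total = sum(v for v, _ in pairs)
    count = sum(c for _, c in pairs)
    return key, (total, count)

def final_reduce_fn(key, pairs):
    # Non-associative step, applied exactly once at the last level.
    _, (total, count) = reduce_fn(key, pairs)
    return key, total / count

# Two level-1 reducers each merge a slice; the final reducer divides:
left = reduce_fn("temp", [(10, 1), (20, 1)])         # ("temp", (30, 2))
right = reduce_fn("temp", [(30, 1)])                 # ("temp", (30, 1))
print(final_reduce_fn("temp", [left[1], right[1]]))  # ("temp", 20.0)
```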
14. Outline (current section: Implementation)
15. Scheduling
   - Tasks are scheduled according to a configurable map_to_reduce_schedule_ratio parameter
   - For every map_to_reduce_schedule_ratio map tasks, one reduce task is scheduled
   - For instance, if map_to_reduce_schedule_ratio is 4, then the first reduce task is scheduled when 4 map tasks complete (see the sketch below)
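A minimal scheduling sketch, not the project's actual scheduler: it launches one reduce task for every map_to_reduce_schedule_ratio completed map tasks instead of waiting for the whole map phase.

```python
# Hypothetical sketch of MR+-style interleaved scheduling.

map_to_reduce_schedule_ratio = 4

def run_job(map_task_count, schedule_reduce):
    completed_since_last_reduce = 0
    for map_id in range(map_task_count):
        # ... run map task `map_id`, write its output locally ...
        completed_since_last_reduce += 1
        if completed_since_last_reduce == map_to_reduce_schedule_ratio:
            # Hand the last `ratio` map outputs to a fresh level-1 reducer.
            schedule_reduce(range(map_id - map_to_reduce_schedule_ratio + 1,
                                  map_id + 1))
            completed_since_last_reduce = 0

run_job(12, lambda maps: print("reduce over map outputs", list(maps)))
# reduce over map outputs [0, 1, 2, 3]
# reduce over map outputs [4, 5, 6, 7]
# reduce over map outputs [8, 9, 10, 11]
```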
16. Level-1 reducers
   - Each reduce task is assigned the output of map_to_reduce_schedule_ratio maps
   - The location of their inputs is communicated by the JobTracker
   - Each reduce task pulls its input via HTTP
   - After the reduce logic has been applied to all keys, the output is earmarked for level > 1 reducers
17. Level > 1 reducers
   - Assigned the input of reduce_input_ratio reduce tasks
   - Eventually all key/value pairs make their way to the final level, which has a single worker
   - This final reduce can also be used to apply any non-associative operation (see the sketch below)
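A sketch of the resulting inverted reduce tree, using the symbols from the deck's structural-comparison figure (mr for the map-to-reduce ratio, rr for reduce_input_ratio); the function name is an assumption for illustration.

```python
import math

def reduce_tree_widths(num_maps, mr, rr):
    """Widths of each reduce level in MR+'s inverted tree.

    Level 1 has one reducer per `mr` maps; every later level merges the
    output of `rr` reducers, down to a single final worker.
    """
    widths = [math.ceil(num_maps / mr)]            # alpha = omega / mr
    while widths[-1] > 1:
        widths.append(math.ceil(widths[-1] / rr))  # beta = alpha / rr, ...
    return widths

print(reduce_tree_widths(num_maps=64, mr=4, rr=2))
# [16, 8, 4, 2, 1] -- the last level's single reducer may apply a
# non-associative operation exactly once.
```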
18. Structural comparison
   [Figure: Structural comparison of MapReduce and MR+. (a) MapReduce: ω map tasks feed a shuffler across the brick-wall into a single level of reducers. (b) MR+: reducers form a multi-level inverted tree, with α = ω/mr level-1 reducers, β = α/rr level-2 reducers, γ = β/rr at the next level, and so on down to a single final reducer.]
19. Reduce Locality
   - MR+ does not rely on key/values for input assignment
   - Reduce inputs are instead assigned on the basis of locality, in order of preference (see the sketch below):
     1 Node-local
     2 Rack-local
     3 Any
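A minimal sketch of the three-tier preference, under assumed data structures (a host-to-rack map and a list of candidate hosts holding the input); this is illustrative, not the project's assignment code.

```python
# Hypothetical sketch: pick a reduce input by MR+'s locality order
# (node-local, then rack-local, then any). `rack_of` maps hosts to racks.

def pick_input(worker_host, candidate_hosts, rack_of):
    node_local = [h for h in candidate_hosts if h == worker_host]
    rack_local = [h for h in candidate_hosts
                  if rack_of[h] == rack_of[worker_host]]
    for tier in (node_local, rack_local, list(candidate_hosts)):
        if tier:
            return tier[0]
    return None

rack_of = {"n1": "r1", "n2": "r1", "n3": "r2"}
print(pick_input("n1", ["n3", "n2"], rack_of))  # "n2": rack-local beats "any"
```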
20. Fault Tolerance
   - Deterministic input assignment simplifies failure recovery in MapReduce
   - In MR+, if a map task or a level-1 reduce fails, it is simply re-executed
   - For level > 1 reduce tasks, MR+ implements three strategies, which expose the trade-off between computation and storage (sketched below):
     1 Chain re-execution: the entire chain is re-executed
     2 Local replication: the output of each reduce is replicated on the local file system of a rack-local neighbour
     3 Distributed replication: the output of each reduce is replicated on the distributed file system
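A sketch of the trade-off the three strategies expose; all names and return shapes here are assumptions for illustration, not the project's recovery API.

```python
# Hypothetical sketch of recovering a failed level > 1 reduce task.

CHAIN_RE_EXECUTION = "chain"     # store nothing, recompute everything
LOCAL_REPLICATION = "local"      # outputs copied to a rack-local neighbour
DISTRIBUTED_REPLICATION = "dfs"  # outputs copied to the distributed FS

def upstream_chain(task):
    # Placeholder: a real tracker would walk the reduce tree back to
    # every map and reduce task whose output feeds `task`.
    return [f"upstream-of-{task}"]

def recover(task, strategy):
    if strategy == CHAIN_RE_EXECUTION:
        # Cheapest in storage, costliest in computation.
        return ("re-execute", upstream_chain(task) + [task])
    if strategy == LOCAL_REPLICATION:
        # One re-execution; inputs re-read from rack-local replicas.
        return ("re-execute", [task], "rack-local replicas")
    if strategy == DISTRIBUTED_REPLICATION:
        # Most storage, least recomputation; inputs survive on the DFS.
        return ("re-execute", [task], "DFS replicas")
    raise ValueError(f"unknown strategy: {strategy}")

print(recover("reduce-2-7", CHAIN_RE_EXECUTION))
```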
21. Input Prioritization
   - The user-defined map and reduce functions are applied to a sample_percentage fraction of the input, taken at random
   - This sampling cycle yields a representative distribution of the data
   - The distribution is used to exploit structure: data with semantic grouping, or clusters of relevant information
   - It is also used to generate a priority queue to assign to map tasks
   - A full-fledged MR+ job is then run, in which map tasks read input from the priority queue
22. Input Prioritization (2)
   - Due to this prioritization, relevant clusters of information are processed first
   - As a result, the computation can be stopped mid-way if a threshold condition is satisfied (see the sketch below)
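A minimal sketch of the estimation cycle, assuming a hypothetical per-split relevance score: sample each input split, score it against the query, and build a priority queue that the real job's map tasks drain in order.

```python
# Hypothetical sketch: sample-then-prioritize. `score_sample` stands in
# for whatever relevance measure the user query implies.
import heapq
import random

random.seed(7)

def prioritize_splits(splits, score_sample, sample_percentage=0.05):
    queue = []
    for split_id, records in splits.items():
        k = max(1, int(len(records) * sample_percentage))
        sample = random.sample(records, k)
        # Higher score = more relevant; heapq is a min-heap, so negate
        # the score to pop the most relevant split first.
        heapq.heappush(queue, (-score_sample(sample), split_id))
    return queue

splits = {"s1": list(range(100)), "s2": [90] * 100, "s3": [10] * 100}
queue = prioritize_splits(splits, score_sample=lambda s: sum(s) / len(s))
while queue:
    score, split_id = heapq.heappop(queue)
    print(split_id, "score", -score)
# Map tasks read splits in this order, so a thresholded query can stop
# as soon as enough relevant results have materialized.
```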
23. Outline (current section: Code-base)
24. Code-base
   - Around 15,000 lines of Python code
   - The code implements both vanilla MapReduce and MR+
   - Written over the course of roughly 5 years at LUMS
   - Publicly available at: https://code.google.com/p/mrplus/source/browse/?name=BRANCH_VER_0_0_0_4_PY2x
25. Storage
   - Abstracts away the underlying storage system
   - Currently supports HDFS and Amazon's S3
   - Also supports the local OS file system (for unit testing), as in the sketch below
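A sketch of what such a storage abstraction can look like; the class and method names are assumptions, not the project's actual API. Only the local-filesystem backend is implemented here, with HDFS left as a stub.

```python
# Hypothetical sketch: one storage interface, pluggable backends.
from abc import ABC, abstractmethod

class Store(ABC):
    @abstractmethod
    def read(self, path: str) -> bytes: ...
    @abstractmethod
    def write(self, path: str, data: bytes) -> None: ...

class LocalStore(Store):
    # Local OS file system backend, convenient for unit tests.
    def read(self, path):
        with open(path, "rb") as f:
            return f.read()
    def write(self, path, data):
        with open(path, "wb") as f:
            f.write(data)

class HDFSStore(Store):
    # Stub: a real backend would speak to the HDFS NameNode/DataNodes.
    def read(self, path): raise NotImplementedError
    def write(self, path, data): raise NotImplementedError

store = LocalStore()
store.write("/tmp/mrplus-test", b"partial reduce output")
print(store.read("/tmp/mrplus-test"))
```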
26. Structure
   - Modular structure, so most of the code is re-used across MapReduce and MR+
   - Google Protocol Buffers and JSON are used for serialization
   - All configuration options live within two files: siteconf.xml (site-wide) and jobconf.xml (job-specific); a loading sketch follows
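The deck does not show the schema of the two files, so this sketch assumes a Hadoop-style <property><name>/<value> layout, which may differ from the project's actual format; it illustrates the layering, with job-specific values overriding site-wide defaults.

```python
# Hypothetical sketch of layered configuration loading.
import xml.etree.ElementTree as ET

def load_conf(path):
    # Parse <property><name>...</name><value>...</value></property> pairs.
    root = ET.parse(path).getroot()
    return {p.findtext("name"): p.findtext("value")
            for p in root.iter("property")}

def effective_conf(site_path="siteconf.xml", job_path="jobconf.xml"):
    conf = load_conf(site_path)   # site-wide defaults
    conf.update(load_conf(job_path))  # job-specific values win
    return conf

# e.g. effective_conf()["map_to_reduce_schedule_ratio"]
```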