9: MR+

                    Zubair Nabi

              zubair.nabi@itu.edu.pk


                  April 19, 2013




Outline



  1    Introduction


  2    MR+


  3    Implementation


  4    Code-base




Implicit MapReduce Assumptions




          The input data has no structure
          The distribution of intermediate data is balanced
          Results materialize when all the map and reduce tasks complete
          The number of values of each key is small enough to be processed by
          a single reduce task
          Processing at the reduce stage is usually a simple aggregation
          function




Zipf distributions are everywhere




Reduce-intensive applications




          Image and speech correlation
          Backpropagation in neural networks
          Co-clustering
          Tree learning
          Computation of node diameter and radii in Tera-scale graphs
          ...




Design Goals




          Negate skew in intermediate data
          Exploit structure in input data
          Estimate results
          Favour commodity clusters
          Maintain original functional model of MapReduce




Design

          Maintains the simple MapReduce programming model
          Instead of implementing MapReduce as a sequential two-stage
          architecture, MR+ allows the map and reduce stages to interleave
          and iterate over intermediate results
          This leads to a multi-level inverted tree of reduce workers




Architecture

  [Figure omitted: (a) MapReduce: a map phase, then a brick-wall, then a
  reduce phase. (b) MR+: a 5%-10% estimation cycle prioritizes data, after
  which the map and reduce phases overlap.]

                  Figure: Architectural comparison of MapReduce and MR+.

Architectural Flexibility




     1    Instead of waiting for all maps to finish before scheduling a reduce
          task, MR+ permits a model where a reduce task can be scheduled for
          every n invocations of the map function
     2    A densely populated key can be recursively reduced by repeated
          invocation of the reduce function at multiple reduce workers




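  A minimal sketch (not taken from the MR+ code-base) of point 2: a densely
  populated key is reduced recursively, with each chunk standing in for one
  reduce worker in the inverted tree. It assumes an associative reducer,
  summation here:

      def reduce_fn(key, values):
          # User-defined associative reducer: here, summation.
          return [sum(values)]

      def tree_reduce(key, values, fan_in=4):
          # Recursively reduce a dense key by splitting its values
          # across multiple (simulated) reduce workers.
          if len(values) <= fan_in:
              return reduce_fn(key, values)
          partials = []
          for i in range(0, len(values), fan_in):
              # Each chunk stands in for one worker at this level.
              partials.extend(reduce_fn(key, values[i:i + fan_in]))
          # Partial results feed the next level of the tree.
          return tree_reduce(key, partials, fan_in)

      print(tree_reduce("hot-key", list(range(1000))))  # [499500]
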
Advantages




          Resilient to TCP Incast by amortizing data copying over the course of
          the job
          Early materialization of partial results for queries with thresholds or
          confidence intervals
          Finds structure in the data by running a sampling cycle to learn
          the distribution of information, then prioritizes input data with
          respect to the user query




Programming Model




          Retains the 2-stage MapReduce API
          MR+ reducers can be likened to distributed combiners
          Repeated invocation of the reducer rules out non-associative
          functions by default
          But reducers can be designed so that the non-associative operation
          is applied only at the very last reduce




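  To make the last bullet concrete, here is a hypothetical reducer for
  computing a mean: intermediate levels merge (sum, count) pairs, which is
  associative, and the non-associative division is applied only at the
  final reduce:

      def mean_reducer(key, values, is_final=False):
          # Associative partial step: merge (sum, count) pairs.
          total = sum(s for s, _ in values)
          count = sum(c for _, c in values)
          if is_final:
              return [total / count]   # non-associative, applied once
          return [(total, count)]      # safe to re-reduce at any level

      # Map emits (value, 1) pairs; intermediate reducers can run any
      # number of times, in any grouping, without changing the result.
      level1 = mean_reducer("k", [(10, 1), (20, 1)])   # [(30, 2)]
      level2 = mean_reducer("k", level1 + [(12, 1)])   # [(42, 3)]
      print(mean_reducer("k", level2, is_final=True))  # [14.0]
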
Scheduling




          Tasks are scheduled according to a configurable
          map_to_reduce_schedule_ratio parameter
          For every map_to_reduce_schedule_ratio map tasks, 1
          reduce task is scheduled
          For instance, if map_to_reduce_schedule_ratio is 4, then the
          first reduce task is scheduled when 4 map tasks complete




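  The policy amounts to a simple trigger; a sketch (the real scheduler
  lives in the JobTracker, so this is only illustrative):

      map_to_reduce_schedule_ratio = 4

      def should_schedule_reduce(completed_maps, ratio):
          # One reduce task for every `ratio` completed map tasks.
          return completed_maps % ratio == 0

      for done in range(1, 13):
          if should_schedule_reduce(done, map_to_reduce_schedule_ratio):
              print(f"{done} maps done -> schedule a reduce task")
      # Prints at 4, 8 and 12 completed maps.
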
Level-1 reducers




          Each reduce is assigned the output of map_to_reduce_ratio
          number of maps
          The location of their inputs is communicated by the JobTracker
          Each reduce task pulls its input via HTTP
          After the reduce logic has been applied to all keys, the output is
          earmarked for L > 1 reducers




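  A sketch of a level-1 reducer's input path, assuming map outputs are
  served as JSON at URLs handed out by the JobTracker (the URL scheme and
  format are assumptions, not MR+'s actual wire protocol):

      import json
      from urllib.request import urlopen

      def fetch_map_output(url):
          # Pull one map task's intermediate output over HTTP.
          with urlopen(url) as resp:
              return json.load(resp)  # {"key": [v1, v2, ...], ...}

      def level1_reduce(map_output_urls, reduce_fn):
          # Merge the outputs of map_to_reduce_ratio map tasks, whose
          # locations the JobTracker communicated, then reduce.
          merged = {}
          for url in map_output_urls:
              for key, values in fetch_map_output(url).items():
                  merged.setdefault(key, []).extend(values)
          # The result is earmarked for a level > 1 reducer.
          return {k: reduce_fn(k, v) for k, v in merged.items()}
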
Level > 1 reducers




          Assigned the input of reduce_input_ratio number of reduce
          tasks
          Eventually all key/value pairs make their way to the final level, which
          has a single worker
          This final reduce can also be used to apply any non-associative
          operation




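  Following the ratios in the structural-comparison figure below
  (α = ω/mr, β = α/rr, ...), the width of each reduce level can be
  computed; rounding up is an assumption here:

      import math

      def level_widths(num_maps, mr, rr):
          # Level 1 has num_maps / mr reducers; every later level
          # shrinks by rr until a single worker remains.
          widths = [math.ceil(num_maps / mr)]
          while widths[-1] > 1:
              widths.append(math.ceil(widths[-1] / rr))
          return widths

      print(level_widths(1024, 4, 4))  # [256, 64, 16, 4, 1]
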
Structural comparison

  [Figure omitted: (a) MapReduce: map tasks Map1..Mapω feed a shuffler,
  then a brick-wall, then reduce tasks Reduce1..Reduceθ, each receiving all
  the values for its keys. (b) MR+: an inverted tree of reduce workers,
  with α = ω/mr reducers at level 1, β = α/rr at level 2, γ = β/rr at
  level 3, and so on down to a single final reducer.]

                        Figure: Structural comparison of MapReduce and MR+.

Reduce Locality




          MR+ does not rely on key/values for input assignment
          Reduce inputs are assigned on the basis of locality
                1 Node-local
                2 Rack-local
                3 Any




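  A sketch of the three-step locality preference (the records and fields
  are illustrative, not MR+'s actual data structures):

      from collections import namedtuple

      Worker = namedtuple("Worker", "node rack")
      Output = namedtuple("Output", "node rack path")

      def pick_input(worker, candidates):
          for o in candidates:
              if o.node == worker.node:
                  return o.path          # 1. node-local
          for o in candidates:
              if o.rack == worker.rack:
                  return o.path          # 2. rack-local
          return candidates[0].path      # 3. any
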
Fault Tolerance


          Deterministic input assignment simplifies failure recovery in
          MapReduce
          In case of MR+, if a map task or a level-1 reduce fails, it is simply
          re-executed
          For level > 1 reduce tasks, MR+ implements three strategies, which
          expose the trade-off between computation and storage
                1   Chain re-execution: The entire chain is re-executed
                2   Local replication: The output of each reduce is replicated on the local
                    file system of a rack-local neighbour




  Zubair Nabi                              9: MR+                              April 19, 2013   20 / 26
Fault Tolerance


          Deterministic input assignment simplifies failure recovery in
          MapReduce
          In case of MR+, if a map task or a level-1 reduce fails, it is simply
          re-executed
          For level > 1 reduce tasks, MR+ implements three strategies, which
          expose the trade-off between computation and storage
                1 Chain re-execution: The entire chain is re-executed
                2 Local replication: The output of each reduce is replicated on the local
                  file system of a rack-local neighbour
                3 Distributed replication: The output of each reduce is replicated on the
                  distributed file system




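  A toy illustration of the computation/storage trade-off behind the three
  strategies (the class and its fields are made up for the example):

      class ReduceTask:
          def __init__(self, name, parents=(), replica=None):
              self.name, self.parents, self.replica = name, parents, replica

          def recover(self, strategy):
              if strategy == "chain":
                  # No replicas kept: re-execute every ancestor.
                  return [p.name for p in self.parents] + [self.name]
              # "local" or "distributed": read a stored replica instead.
              return [f"restore {self.name} from {self.replica}"]

      r1, r2 = ReduceTask("R1,1"), ReduceTask("R1,2")
      r21 = ReduceTask("R2,1", parents=(r1, r2), replica="rack-local copy")
      print(r21.recover("chain"))  # ['R1,1', 'R1,2', 'R2,1']
      print(r21.recover("local"))  # ['restore R2,1 from rack-local copy']
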
Input Prioritization



          User-defined map and reduce functions are applied to a
          sample_percentage amount of input, taken at random
          This sampling cycle yields a representative distribution of data
          Used to exploit structure: data with semantic grouping or clusters of
          relevant information
          The distribution is used to generate a priority queue to assign to map
          tasks
          A full-fledged MR+ job is then run, in which map tasks read input from
          the priority queue




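  A sketch of the sampling cycle, where `relevance` stands in for the
  statistics learned by running the user's map and reduce functions on the
  sample (everything here is illustrative):

      import heapq, random

      def build_priority_queue(splits, relevance, sample_percentage=0.1):
          sample = random.sample(
              splits, max(1, int(len(splits) * sample_percentage)))
          scores = {s: relevance(s) for s in sample}   # learned on sample
          prior = sum(scores.values()) / len(scores)   # guess for the rest
          heap = [(-scores.get(s, prior), i, s)
                  for i, s in enumerate(splits)]
          heapq.heapify(heap)
          return heap  # map tasks pop the highest-priority split first

      heap = build_priority_queue(list(range(20)),
                                  relevance=lambda s: s % 7)
      first_split = heapq.heappop(heap)[2]  # processed first
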
Input Prioritization (2)




          Due to this prioritization, relevant clusters of information are processed
          first
          As a result, the computation can be stopped mid-way if a threshold
          condition is satisfied




Code-base




         Around 15,000 lines of Python code
         Code implements both vanilla MapReduce and MR+
         Written over the course of roughly 5 years at LUMS
         Publicly available at:
         https://code.google.com/p/mrplus/source/browse/?name=BRANCH_VER_0_0_0_4_PY2x




Storage




          Abstracts away the underlying storage system
          Currently supports HDFS and Amazon's S3
          Also supports the local OS file system (for unit testing)




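  Such a storage abstraction likely boils down to a small read/write
  interface; a minimal sketch (the class names are assumptions, not MR+'s
  actual classes):

      class Storage:
          def read(self, path):
              raise NotImplementedError
          def write(self, path, data):
              raise NotImplementedError

      class LocalStorage(Storage):
          # Local OS file system backend, handy for unit testing.
          def read(self, path):
              with open(path, "rb") as f:
                  return f.read()
          def write(self, path, data):
              with open(path, "wb") as f:
                  f.write(data)

      # HDFSStorage and S3Storage would implement the same two methods,
      # so job code never touches the underlying storage directly.
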
Structure




          Modular structure, so most of the code is reused across MapReduce
          and MR+
          Google Protocol Buffers and JSON are used for serialization
          All configuration options within two files: siteconf.xml (site-wide)
          and jobconf.xml (job-specific)




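  A sketch of how the two configuration files might be merged, assuming a
  flat <property><name>/<value> layout (the layout is an assumption; only
  the two file names come from the slides):

      import xml.etree.ElementTree as ET

      def load_conf(site_path="siteconf.xml", job_path="jobconf.xml"):
          conf = {}
          for path in (site_path, job_path):  # job-specific values win
              for prop in ET.parse(path).getroot():
                  conf[prop.findtext("name")] = prop.findtext("value")
          return conf

      # e.g. conf["map_to_reduce_schedule_ratio"], conf["sample_percentage"]
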
