1. The document analyzes non-work-conserving effects in the MapReduce framework under heavy load conditions. It develops a diffusion approximation model and lower bound for queue lengths.
2. It studies three tie-breaking policies for scheduling jobs in the reduce queue and gives intuition for why Policy 1 and Policy 3 can achieve the lower bound under certain assumptions.
3. The key conclusions are that tie-breaking rules should avoid possible jumps in queue lengths. Policy 1 achieves the diffusion-approximation lower bound when the workload distribution is bounded, and Policy 3 does so when the distribution additionally satisfies a monotonicity condition on its hazard function.
2. How to Handle Large Data?
https://www.gtisoft.com/img/DistributedComputing.jpg
One solution is “distributed computing”
MapReduce is one implementation.
3. What is MapReduce?
• “Job”: a document with several words
1. “Map Task”: Word → (word, count) pair
2. “Copy/Shuffle”: distribute the pairs to the reduce machines
3. “Reduce Task”: count the word frequency, using the word as a key
Example: Finding word frequency
Input: “He is in the elite of the elite right now”
Map output: (He,1), (is,1), (now,1), …
Reduce output: (the,2), (elite,2), (now,1), …
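The three phases above can be sketched in a few lines of Python. This is a minimal single-machine illustration, not a distributed implementation; the function names `map_task`, `shuffle`, `reduce_task`, and `word_frequency` are hypothetical and chosen only to mirror the slide's terminology.

```python
from collections import defaultdict


def map_task(document):
    """Map: emit a (word, 1) pair for every word in the document."""
    return [(word, 1) for word in document.split()]


def shuffle(pairs):
    """Copy/Shuffle: group the emitted counts by word (the key)."""
    groups = defaultdict(list)
    for word, count in pairs:
        groups[word].append(count)
    return groups


def reduce_task(word, counts):
    """Reduce: sum the counts collected for one word."""
    return word, sum(counts)


def word_frequency(document):
    """Run map, shuffle, and reduce in sequence on one document."""
    groups = shuffle(map_task(document))
    return dict(reduce_task(word, counts) for word, counts in groups.items())


freq = word_frequency("He is in the elite of the elite right now")
print(freq["the"], freq["elite"], freq["now"])
```

On the slide's sentence, "the" and "elite" each appear twice and every other word once.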
4. Main Issue & Outline
Main Issue: Analyze the Map & Reduce queues and design a good scheduling policy [1]
Outline:
• Model of MapReduce
• Load conditions and 3 Tie-Breaking Policies
• Diffusion Approximation
• Finding a Lower Bound
• Sketch of the proof
5. Model of MapReduce
• Each job has multiple Map tasks and Reduce tasks.
• Map tasks are smaller than Reduce tasks.
• Workload Bi of job i follows B, with mean E[B] and variance Var[B].
• Jobs arrive over time ~Poi(λ).
• Map queue: M/G/1 processor sharing queue
• Reduce queue: K-server G/G/1 queue
[Figure: jobs arrive over time at the Map queue; intermediate data flows to the Reduce queue, which is served by K Reduce servers (R) labeled 1, 2, …, K.]
6. Model and Notations
• Qr(t): # of jobs in the reduce queue.
• Qm(t): # of jobs in the map queue.
• Ri(t): # of running Reduce tasks of job i
• Ri: total # of Reduce tasks of job i
• For task j of job i:
• $C_i^j$: copy/shuffle
• $D_i^j$: reduce phase
• $Z_i := \sum_j (C_i^j + D_i^j)$
[Figure: jobs pass through the Map queue, of length Qm(t), and then the Reduce queue, of length Qr(t), which is served by K Reduce servers (R) labeled 1, 2, …, K.]
7. Traffic Assumptions and Constraint
• Reduce queue:
1. max-min fairness
Ex: 10 servers with 3 jobs (4,3,3)
2. No “preemption”: jobs cannot be interrupted
• Dependence constraint:
Progress of Reduce tasks <= Progress of Map tasks, i.e. $C_i(t) \le B_i(t)$
Hence the Reduce queue is NOT work-conserving
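The max-min fairness example (10 servers, 3 jobs, allocation (4,3,3)) can be reproduced with a short sketch. The function `max_min_allocation` is a hypothetical helper, and two details are assumptions rather than part of the slides: each job is capped by the number of Reduce tasks it has, and leftover servers go to the jobs listed first.

```python
def max_min_allocation(num_servers, demands):
    """Hand out servers one at a time, round-robin, until the servers run
    out or every job has reached its demand (its number of Reduce tasks)."""
    allocation = [0] * len(demands)
    remaining = num_servers
    active = [i for i, demand in enumerate(demands) if demand > 0]
    while remaining > 0 and active:
        for i in list(active):
            if remaining == 0:
                break
            allocation[i] += 1
            remaining -= 1
            if allocation[i] == demands[i]:
                active.remove(i)  # this job needs no more servers
    return allocation


# Three jobs that could each use all 10 servers.
print(max_min_allocation(10, [10, 10, 10]))
# A job with only 2 Reduce tasks leaves more servers for the others.
print(max_min_allocation(10, [2, 10, 10]))
```

This sketch computes the fair share only. It does not model the no-preemption rule, under which a running Reduce task keeps its server even when the fair share changes.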
8. Lightly Loaded
Example: K=10
Current state: 2 jobs in Reduce queue, server allocation (5,5)
Suppose: A new job joins the Reduce queue, new allocation (?,?,?)
9. Heavily Loaded
Example: K=10
Current state: >>10 jobs in Reduce queue, server allocation (1,…,1)
Suppose: One job finishes and leaves the Reduce queue
How to break the tie?
10. Study on Tie-Breaking Rules
• Consider 3 rules for Reduce queue:
Policy 1: Choose the one with the smallest remaining
Map service
Policy 2: Choose one job randomly (uniform)
Policy 3: Choose the one which starts Map service first
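[Figure: timeline of Map service for Job 1, Job 2, and Job 3, with start times TS1, TS2, TS3, end times TE1, TE2, TE3, and the current time marked; the jobs selected by Policy 1 and Policy 3 are indicated.]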
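The three tie-breaking rules can be written as selection functions over the jobs waiting for a free Reduce server. This is a minimal sketch: the `WaitingJob` record and its field names are hypothetical, and it assumes the scheduler knows each job's Map start time and remaining Map service.

```python
import random
from dataclasses import dataclass


@dataclass
class WaitingJob:
    job_id: int
    map_start_time: float         # when the job started Map service
    remaining_map_service: float  # Map service still to be done


def policy_1(jobs):
    """Policy 1: the job with the smallest remaining Map service."""
    return min(jobs, key=lambda job: job.remaining_map_service)


def policy_2(jobs, rng=random):
    """Policy 2: a job chosen uniformly at random."""
    return rng.choice(jobs)


def policy_3(jobs):
    """Policy 3: the job that started Map service first."""
    return min(jobs, key=lambda job: job.map_start_time)


jobs = [
    WaitingJob(job_id=1, map_start_time=0.0, remaining_map_service=5.0),
    WaitingJob(job_id=2, map_start_time=1.0, remaining_map_service=2.0),
    WaitingJob(job_id=3, map_start_time=2.0, remaining_map_service=3.0),
]
print(policy_1(jobs).job_id, policy_3(jobs).job_id)
```

In this made-up example Policy 1 and Policy 3 disagree: the job that started first still has the most Map service left.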
11. Diffusion Approximation
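• Consider a sequence of MapReduce systems indexed by n
1. Primitive data: $(B^{(n)}, C^{(n)}, D^{(n)}, R^{(n)})$
2. Reduce workload: $Z_i^{(n)} := \sum_{j=1}^{R_i^{(n)}} (C_i^{j,(n)} + D_i^{j,(n)})$
3. Limits: $E[B^{(n)}]$ and $Var[B^{(n)}]$ converge to finite limits, and so do $E[R^{(n)}]$, $E[Z^{(n)}]$, and $Var[Z^{(n)}]$
4. Arrival rate: $\lambda^{(n)} \to \lambda$
• Heavy-traffic assumption:
Map service: $(1 - \lambda^{(n)} E[B^{(n)}]) \sqrt{n}$ converges to a finite limit
Reduce service: $(1 - \frac{1}{K} \lambda^{(n)} E[(C^{(n)} + D^{(n)}) R^{(n)}]) \sqrt{n}$ converges to a finite limit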
12. Diffusion Approximation (Cont.)
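• Queue length: $(Q_m^{(n)}, Q_r^{(n)})$
• Diffusion limits: $\hat{Q}_m^{(n)}(t) := Q_m^{(n)}(nt) / \sqrt{n}$, $\hat{Q}_r^{(n)}(t) := Q_r^{(n)}(nt) / \sqrt{n}$
Theorem 1 [4]: (Map queue)
(1) $\hat{Q}_m^{(n)}(t) \Rightarrow \hat{Q}_m(t) = \Phi(\hat{Q}_m(0) + W_m^*(t))$ (RBM), where $\Phi$ is the reflection map
(2) $W_m^*(t)$ is a BM whose drift and variance are determined by $\lambda$ and the limiting mean and variance of $B^{(n)}$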
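A reflected Brownian motion (RBM) can be visualised by simulating a discretised Brownian path and reflecting it at zero. The sketch below uses the standard one-dimensional reflection y(t) = x(t) + max(0, max over s ≤ t of −x(s)); the function names and the drift and variance values are illustrative assumptions, not the constants of Theorem 1.

```python
import random


def brownian_path(drift, variance, horizon, steps, start=0.0, seed=0):
    """Discretised Brownian motion with the given drift and variance."""
    rng = random.Random(seed)
    dt = horizon / steps
    path = [start]
    for _ in range(steps):
        increment = drift * dt + rng.gauss(0.0, (variance * dt) ** 0.5)
        path.append(path[-1] + increment)
    return path


def reflect(path):
    """Reflect a path at zero: add the smallest non-decreasing push
    that keeps the result non-negative."""
    reflected = []
    pushed = 0.0
    for x in path:
        pushed = max(pushed, -x)
        reflected.append(x + pushed)
    return reflected


# Illustrative parameters only: negative drift, unit variance.
rbm = reflect(brownian_path(drift=-1.0, variance=1.0, horizon=10.0, steps=1000))
print(min(rbm) >= 0.0, len(rbm))
```

With a negative drift the reflected path keeps returning to zero, which is the qualitative behaviour expected of a stable queue under diffusion scaling.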
13. Relaxed MapReduce & Lower Bound
• Construct a new queue:
1. Assume no dependence constraint: $C_i(t) > B_i(t)$ is possible
Reduce queue now becomes work-conserving
2. A job is always given at least as many servers as in the original queue, for all t.
Jobs are completed no later than in the original queue.
14. Lower Bound (Cont.)
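Theorem 2:
There exists a sequence $\{\hat{Q}_r^{L,(n)}(t)\}$ such that
(1) $\hat{Q}_r^{L,(n)}(t) \le \hat{Q}_r^{(n)}(t)$
(2) $\hat{Q}_r^{L,(n)}(t) \Rightarrow \Phi(W_r^*(t))$ (RBM), where $W_r^*$ is a BM with $W_r^*(0) = 0$ and with drift and variance determined by $\lambda$, $K$, and the limiting moments of the Reduce workload
(3) $\hat{Q}_r^{L,(n)}(t)$ is independent of $W_m^*(t)$
Is the lower bound achievable?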
15. Observation
A1: Map queue is M/G/1 and processor-sharing
The queue is reversible, i.e. the departure process is also Poisson. [2]
Past departures are independent of Qm(t)
16. Observation (Cont.)
A2: In heavy traffic, if the processing order of the Reduce queue is the same as the departure order of the Map queue, the Reduce queue will be very close to a FIFO multi-server queue with service time $\sum_{j=1}^{R_i} (C_i^j + D_i^j)$ for job i.
17. Intuition for Policy 1
From A1&A2: Arrival process of Qr(t) ~ departure process of Qm(t)
Policy 1: Choose the one with the smallest remaining Map service
Reduce queue can be approximated by RBM: $\hat{Q}_r^{(n)}(t) \approx \Phi(W_r^*(t))$
18. Intuition for Policy 2
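Policy 2: Choose one job randomly (uniform)
Diffusion approximation for Map queue: $Q_m^{(n)}(n t_0) \approx \sqrt{n} \, E[\hat{Q}_m(t_0)] = q_m \sqrt{n} + o(\sqrt{n})$
Suppose 1 vacancy in Reduce queue at $n t_0$
The chosen job has remaining Map workload = $B_e$, where
$P(B_e > x) = \int_x^{\infty} \frac{P(B > u)}{E[B]} \, du$ ($B_e$ is a distribution) [3]
[Figure: Qm(t) versus time between $n t_0$ and $n t_0 + \Delta t$; the queue length is of order $n^{1/2}$.]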
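The tail of the remaining-workload distribution, P(B_e > x) = (1/E[B]) ∫ from x to ∞ of P(B > u) du, can be checked numerically. The sketch below is a plain midpoint-rule approximation; `equilibrium_tail` and its parameters are hypothetical names, and `upper` is an assumed truncation point beyond which P(B > u) is negligible.

```python
import math


def equilibrium_tail(survival, mean, x, upper, steps=100_000):
    """Approximate P(B_e > x) = (1 / E[B]) * integral_x^upper P(B > u) du
    with the midpoint rule. `survival(u)` must return P(B > u)."""
    if x >= upper:
        return 0.0
    h = (upper - x) / steps
    total = sum(survival(x + (k + 0.5) * h) for k in range(steps))
    return total * h / mean


def constant_survival(u):
    """P(B > u) for a constant workload B = 1."""
    return 1.0 if u < 1.0 else 0.0


def exponential_survival(u):
    """P(B > u) for an exponential workload with mean 1."""
    return math.exp(-u)


# Constant B = 1: B_e is uniform on [0, 1], so P(B_e > 0.25) is about 0.75.
tail_constant = equilibrium_tail(constant_survival, 1.0, 0.25, upper=1.0)
# Exponential B: B_e has the same law as B, so P(B_e > 0.5) is about exp(-0.5).
tail_exponential = equilibrium_tail(exponential_survival, 1.0, 0.5, upper=50.0)
print(tail_constant, tail_exponential)
```

The two cases show why the randomly chosen job matters: even for a constant workload, the remaining Map workload of a uniformly chosen job is spread over the whole interval.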
19. Intuition for Policy 2 (Cont.)
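Let the remaining Map workload of that job = $b_e$
The remaining Map service time of that job ~ $b_e q_m \sqrt{n}$
Reduce queue will grow by ~ $\lambda b_e q_m \sqrt{n} / K$
$\hat{Q}_r(t_0^+) - \hat{Q}_r(t_0) \approx \lambda b_e q_m / K$, a jump in the diffusion limit
[Figure: Qr(t) versus time between $n t_0$ and $n t_0 + \Delta t$, showing the increase ΔQr(t).]
Why does it matter?
Little’s law! A longer queue means a longer mean delay for jobs.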
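The jump size and its effect on delay can be put into numbers. This is a back-of-the-envelope sketch under stated assumptions: the jump in the diffusion-scaled Reduce queue is taken to be λ·b_e·q_m/K, and Little's law (L = λW) is used to convert extra queue length into extra mean delay. The function names and the numeric inputs are hypothetical.

```python
def reduce_queue_jump(arrival_rate, remaining_workload, map_queue_level, num_servers):
    """Jump size lambda * b_e * q_m / K in the diffusion-scaled Reduce queue."""
    return arrival_rate * remaining_workload * map_queue_level / num_servers


def extra_mean_delay(extra_queue_length, arrival_rate):
    """Little's law, L = lambda * W, solved for the delay W."""
    return extra_queue_length / arrival_rate


# Made-up values: lambda = 2, b_e = 0.5, q_m = 4, K = 10.
jump = reduce_queue_jump(arrival_rate=2.0, remaining_workload=0.5,
                         map_queue_level=4.0, num_servers=10)
delay = extra_mean_delay(jump, arrival_rate=2.0)
print(jump, delay)
```

A larger Map backlog q_m or a larger remaining workload b_e makes the jump bigger, which is why choosing the job with the smallest remaining Map service avoids it.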
20. Intuition for Policy 3
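Policy 3: Choose the one which starts Map service first
Depends on the workload distribution B
Special case: B is constant, then Policy 1 = Policy 3
[Figure: two plots of P[B>x] versus x, each starting at 1 and with a point x* marked on the x-axis; one case is labeled "Like Policy 1" and the other "Like Policy 2".]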
21. Achieve the Lower Bound
Theorem 3.
(1) Under policy 1, if B is bounded, then
$\hat{Q}_r^{(n)}(t) \Rightarrow \Phi(W_r^*(t))$ (RBM)
(2) Under policy 2, if B is bounded, then
$\hat{Q}_r^{(n)}(t) \Rightarrow \Phi(W_r^{**}(t))$
where $\Phi(W_r^{**}(t))$ is modified from $\Phi(W_r^*(t))$ with jumps of random size when $\Phi(W_r^{**}(t))$ hits zero.
22. Achieve the Lower Bound (Cont.)
Theorem 3.
(3) Under policy 3, if B is bounded and has a decreasing hazard function, then
$\hat{Q}_r^{(n)}(t) \Rightarrow \Phi(W_r^*(t))$ (Like Policy 1)
If B has an increasing hazard function, then
$\hat{Q}_r^{(n)}(t) \Rightarrow \Phi(W_r^{**}(t))$ (Like Policy 2)
Remark: Hazard function (failure rate)
$H(x) = \lim_{\Delta x \to 0} \frac{P(x \le B \le x + \Delta x)}{\Delta x \, P(x \le B)}$
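The hazard function in the remark can be approximated by a finite difference, which makes it easy to see whether a given workload distribution has an increasing, constant, or decreasing hazard. The helper `hazard` below is a hypothetical sketch that assumes B is continuous, so that P(x ≤ B) = 1 − F(x) for the CDF F.

```python
import math


def hazard(cdf, x, dx=1e-6):
    """Finite-difference hazard rate:
    P(x <= B <= x + dx) / (dx * P(x <= B)), for a continuous B with CDF `cdf`."""
    survival = 1.0 - cdf(x)
    if survival <= 0.0:
        raise ValueError("hazard is undefined where P(x <= B) = 0")
    return (cdf(x + dx) - cdf(x)) / (dx * survival)


def exponential_cdf(x, rate=2.0):
    """CDF of an exponential workload: its hazard is constant (= rate)."""
    return 1.0 - math.exp(-rate * x) if x > 0 else 0.0


def uniform_cdf(x):
    """CDF of a workload uniform on [0, 1]: its hazard 1 / (1 - x) increases."""
    return min(max(x, 0.0), 1.0)


print(hazard(exponential_cdf, 0.3), hazard(exponential_cdf, 1.5))
print(hazard(uniform_cdf, 0.5), hazard(uniform_cdf, 0.8))
```

The uniform case is a bounded distribution whose hazard rises from 2 at x = 0.5 to 5 at x = 0.8, while the exponential case stays flat at its rate.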
23. Conclusion
• Non-work-conserving effect might occur in
the MapReduce system under heavy traffic.
• With heavy-traffic assumption, we obtain a
lower bound using diffusion approximation.
• Tie-breaking rule should be carefully
designed to avoid possible jumps in the
queue length.
24. References
• [1] J. Tan et al., “Non-work-conserving Effects in MapReduce: Diffusion Limit and Criticality,” in Proc. SIGMETRICS, 2014.
• [2] F. P. Kelly, Reversibility and Stochastic Networks. John Wiley & Sons, 1979.
• [3] H. C. Gromoll, “Diffusion approximation for a processor sharing queue in heavy traffic,” Annals of Applied Probability, 14:555–611, 2004.
• [4] A. Lambert, F. Simatos, and B. Zwart, “Scaling limits via excursion theory: Interplay between Crump-Mode-Jagers branching processes and processor sharing queues,” Annals of Applied Probability, 23:2161–2603, 2013.