Non-Work-Conserving Effects in MapReduce: Diffusion Approximation
Ping-Chun Hsieh
2015.04.27
How to Handle Large Data?
https://www.gtisoft.com/img/DistributedComputing.jpg
One solution is “distributed computing”
MapReduce is one implementation.
What is MapReduce?
Example: Finding word frequency
• “Job”: a document with several words
1. “Map Task”: word → (word, count) pair
2. “Copy/Shuffle”: distribute the pairs to the Reduce machines
3. “Reduce Task”: count the word frequency, using the word as a key
Input document: “He is in the elite of the elite right now”
Map output: (He,1), (is,1), (now,1), …
Reduce output: (the,2), (elite,2), (now,1), …
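The three phases of the word-frequency example can be sketched in a few lines of Python. This is a single-process illustration only, not a distributed implementation; the function names `map_task`, `shuffle`, and `reduce_task` are hypothetical and chosen to mirror the slide.

```python
from collections import defaultdict


def map_task(document):
    """Map: emit a (word, 1) pair for every word in the document."""
    return [(word, 1) for word in document.split()]


def shuffle(pairs):
    """Copy/Shuffle: group the emitted counts by word (the key)."""
    groups = defaultdict(list)
    for word, count in pairs:
        groups[word].append(count)
    return groups


def reduce_task(groups):
    """Reduce: sum the counts collected for each word."""
    return {word: sum(counts) for word, counts in groups.items()}


job = "He is in the elite of the elite right now"
word_freq = reduce_task(shuffle(map_task(job)))
print(word_freq)
```

In a real cluster the Map and Reduce functions would run on different machines, and the shuffle step would move the pairs across the network.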
Main Issue & Outline
Main Issue: Analyze the Map and Reduce queues and design a good scheduling policy [1]
Outline:
• Model of MapReduce
• Load conditions and 3 tie-breaking policies
• Diffusion approximation
• Finding a lower bound
• Sketch of the proof
Model of MapReduce
• Each job has multiple Map tasks and Reduce tasks.
• Map tasks are smaller than Reduce tasks.
• Job i has Map workload $B_i$, with mean $E[B]$ and variance $\mathrm{Var}[B]$.
[Figure: jobs arrive over time as a Poisson process, ~Poi($\lambda$), and job i carries workload $B_i$ distributed as $B$. Each job first enters the Map queue (M/G/1 processor-sharing queue); its intermediate data feeds the Reduce queue (K-server G/G/1 queue, servers 1, 2, …, K), which runs the Reduce tasks.]
Model and Notations
• $Q_r(t)$: number of jobs in the Reduce queue.
• $Q_m(t)$: number of jobs in the Map queue.
• $R_i(t)$: number of running Reduce tasks of job i
• $R_i$: total number of Reduce tasks of job i
• For task j of job i:
$C_i^j$: copy/shuffle
$D_i^j$: reduce phase
$Z_i := \sum_{j=1}^{R_i} (C_i^j + D_i^j)$
Traffic Assumptions and Constraint
• Reduce queue:
1. Max-min fairness. Ex: 10 servers shared by 3 jobs are allocated as (4,3,3)
2. No “preemption”: jobs cannot be interrupted
• Dependence constraint: progress of Reduce tasks <= progress of Map tasks, i.e. $C_i(t) \le B_i(t)$
Hence the Reduce queue is NOT work-conserving
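The max-min fairness example (10 servers, 3 jobs, allocation (4,3,3)) can be reproduced with a small sketch. It assumes servers are indivisible and are handed out one at a time in round-robin order; the function name `max_min_allocation` and the per-job `demands` (the number of servers each job could use) are hypothetical, and preemption and the dependence constraint are not modeled here.

```python
def max_min_allocation(num_servers, demands):
    """Hand out servers one at a time, round-robin, to jobs that still want more."""
    alloc = [0] * len(demands)
    remaining = num_servers
    while remaining > 0:
        progressed = False
        for i, demand in enumerate(demands):
            if remaining == 0:
                break
            if alloc[i] < demand:
                alloc[i] += 1
                remaining -= 1
                progressed = True
        if not progressed:
            # Every job is fully served; leftover servers stay idle.
            break
    return alloc


print(max_min_allocation(10, [10, 10, 10]))  # [4, 3, 3]
print(max_min_allocation(10, [2, 10, 10]))   # [2, 4, 4]
```

The second call shows the max-min property: a job with a small demand gets all it asks for, and the remaining servers are split evenly among the others.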
Lightly Loaded
Example: K=10
Current state: 2 jobs in the Reduce queue, holding (5,5) servers
Suppose a new job joins the Reduce queue: what should the allocation (?,?,?) be?
Heavily Loaded
Example: K=10
Current state: >>10 jobs in the Reduce queue, with server allocation (1,…,1)
Suppose one job finishes and leaves the Reduce queue.
How to break the tie?
Study on Tie-Breaking Rules
• Consider 3 rules for the Reduce queue:
Policy 1: Choose the job with the smallest remaining Map service
Policy 2: Choose one job uniformly at random
Policy 3: Choose the job that started Map service first
[Figure: Map service of Jobs 1-3 on a time axis, with start times TS1, TS2, TS3, end times TE1, TE2, TE3, and the current time marked; labels indicate the jobs chosen by Policy 1 and Policy 3.]
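The three tie-breaking rules can be written as one selection function. This is a minimal sketch: the `WaitingJob` record, its field names, and the numbers in the example are all hypothetical and are not taken from the slides.

```python
import random
from dataclasses import dataclass


@dataclass
class WaitingJob:
    name: str
    map_start_time: float      # time at which the job started Map service
    remaining_map_work: float  # Map service still to be done


def pick_job(jobs, policy, rng=random):
    """Return the job that receives the freed Reduce server."""
    if policy == 1:  # smallest remaining Map service
        return min(jobs, key=lambda j: j.remaining_map_work)
    if policy == 2:  # uniformly at random
        return rng.choice(jobs)
    if policy == 3:  # earliest Map start
        return min(jobs, key=lambda j: j.map_start_time)
    raise ValueError("policy must be 1, 2, or 3")


jobs = [
    WaitingJob("Job 1", map_start_time=0.0, remaining_map_work=5.0),
    WaitingJob("Job 2", map_start_time=1.0, remaining_map_work=0.5),
    WaitingJob("Job 3", map_start_time=2.0, remaining_map_work=3.0),
]
print(pick_job(jobs, 1).name)  # Job 2
print(pick_job(jobs, 3).name)  # Job 1
```

With these made-up numbers Policy 1 and Policy 3 disagree, which is the situation in which the choice of rule matters.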
Diffusion Approximation
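• Consider a sequence of MapReduce systems, indexed by n:
1. Primitive data: $(B^{(n)}, C^{(n)}, D^{(n)}, R^{(n)})$
2. Reduce workload: $Z_i^{(n)} := \sum_{j=1}^{R_i^{(n)}} \left(C_i^{(n),j} + D_i^{(n),j}\right)$
3. Limits: $(E[B^{(n)}], \mathrm{Var}[B^{(n)}])$ and $(E[R^{(n)}], E[Z^{(n)}], \mathrm{Var}[Z^{(n)}])$ each converge to a vector of constants as $n \to \infty$
4. Arrival rate: $\lambda^{(n)} \to \lambda$
• Heavy-traffic assumption: as $n \to \infty$, each of the following converges to a constant
$\left(1 - \lambda^{(n)} E[B^{(n)}]\right)\sqrt{n}$ (Map service)
$\left(1 - \frac{1}{K}\lambda^{(n)} E[(C^{(n)} + D^{(n)}) R^{(n)}]\right)\sqrt{n}$ (Reduce service)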
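To make the heavy-traffic assumption for the Map service concrete, the sketch below builds one possible sequence of arrival rates for which the scaled spare capacity $(1 - \lambda^{(n)} E[B])\sqrt{n}$ stays at a fixed constant. The constant is called `theta` here purely for illustration, and the mean workload of 2.0 is made up; neither comes from the slides.

```python
import math


def arrival_rate(n, mean_map_workload, theta):
    """Arrival rate lambda^(n) chosen so that (1 - lambda^(n) * E[B]) * sqrt(n) equals theta."""
    return (1.0 - theta / math.sqrt(n)) / mean_map_workload


def map_slack(n, lam_n, mean_map_workload):
    """Scaled spare Map capacity: (1 - lambda^(n) * E[B]) * sqrt(n)."""
    return (1.0 - lam_n * mean_map_workload) * math.sqrt(n)


for n in (100, 10_000, 1_000_000):
    lam_n = arrival_rate(n, mean_map_workload=2.0, theta=0.5)
    # The load lam_n * E[B] approaches 1 while the scaled slack stays near theta.
    print(n, lam_n * 2.0, map_slack(n, lam_n, 2.0))
```

The load approaches 1 at rate $1/\sqrt{n}$, which is the regime in which queue lengths of order $\sqrt{n}$ appear.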
Diffusion Approximation (Cont.)
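• Queue length: $(Q_m^{(n)}, Q_r^{(n)})$
• Diffusion limits: $\hat{Q}_m^{(n)}(t) := Q_m^{(n)}(nt)/\sqrt{n}$, $\hat{Q}_r^{(n)}(t) := Q_r^{(n)}(nt)/\sqrt{n}$
Theorem 1 [4]: (Map queue)
$\hat{Q}_m^{(n)}(t) \Rightarrow Q_m(t) = \Phi(Q_m(0) + W_m^*(t))$ (RBM; $\Phi$ denotes reflection at zero)
where $W_m^*(t)$ is a BM with:
(1) drift: a negative constant that depends on the limits of $E[B^{(n)}]$ and $\mathrm{Var}[B^{(n)}]$
(2) variance: a constant that depends on $\lambda$ and the same limits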
Relaxed MapReduce & Lower Bound
• Construct a new queue:
1. Assume no dependence constraint: $C_i(t) > B_i(t)$ is possible. The Reduce queue now becomes work-conserving.
2. A job is always given at least as many servers as in the original queue, for all t.
Jobs are completed no later than in the original queue.
Lower Bound (Cont.)
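Theorem 2:
There exists a sequence $\hat{Q}_r^{L,(n)}(t)$ such that
(1) $\hat{Q}_r^{L,(n)}(t) \le \hat{Q}_r^{(n)}(t)$
(2) $\hat{Q}_r^{L,(n)}(t) \Rightarrow \Phi(W_r^*(t))$ (RBM; $\Phi$ denotes reflection at zero), where $W_r^*$ is a BM with $W_r^*(0) = 0$, a drift that depends on $K$ and the limits of $E[R^{(n)}]$, $E[Z^{(n)}]$, $\mathrm{Var}[Z^{(n)}]$, and a variance that depends on $\lambda$ and the same limits
(3) $\hat{Q}_r^{L,(n)}(t)$ is independent of $W_m^*(t)$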
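The limits in the theorems are reflected Brownian motions (RBM). A discretized path can be simulated by generating a Brownian path with drift and applying the standard one-sided reflection map $x(t) - \min(0, \min_{s \le t} x(s))$. This is a sketch only: the drift, variance, horizon, and step count below are illustrative values and are not the constants of the theorems.

```python
import math
import random


def reflect(path):
    """One-sided reflection at zero: x(t) - min(0, min_{s<=t} x(s))."""
    reflected, running_min = [], 0.0
    for x in path:
        running_min = min(running_min, x)
        reflected.append(x - running_min)
    return reflected


def brownian_path(drift, variance, horizon, steps, start=0.0, seed=0):
    """Euler discretization of a Brownian motion with constant drift and variance."""
    rng = random.Random(seed)
    dt = horizon / steps
    path, x = [start], start
    for _ in range(steps):
        x += drift * dt + math.sqrt(variance * dt) * rng.gauss(0.0, 1.0)
        path.append(x)
    return path


free = brownian_path(drift=-1.0, variance=1.0, horizon=10.0, steps=10_000)
rbm = reflect(free)
print(min(rbm), max(rbm))
```

The reflected path never goes below zero, which is what makes it a sensible approximation of a scaled queue length.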
Is the lower bound achievable?
Observation
A1: Map queue is M/G/1 and processor-sharing
The queue is reversible, i.e. the departure process is also Poisson. [2]
Past departures are independent of $Q_m(t)$
Observation (Cont.)
A2: In heavy traffic, if the processing of the Reduce queue is the same as the departure of the Map queue, the Reduce queue will be very close to a FIFO multi-server queue with a service time $\sum_{j=1}^{R_i} (C_i^j + D_i^j)$ for job i.
Intuition for Policy 1
Policy 1: Choose the job with the smallest remaining Map service
From A1 & A2: the arrival process of $Q_r(t)$ ~ the departure process of $Q_m(t)$
The Reduce queue can be approximated by an RBM: $\hat{Q}_r^{(n)}(t) \Rightarrow \Phi(W_r^*(t))$
Intuition for Policy 2
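Policy 2: Choose one job uniformly at random
Diffusion approximation for the Map queue: $Q_m(nt_0) \approx \sqrt{n}\,E[\hat{Q}_m(t_0)] \approx q_m\sqrt{n} + o(\sqrt{n})$
Suppose there is 1 vacancy in the Reduce queue at time $nt_0$
The chosen job has remaining Map workload $= B^e$ [3], where
$P(B^e > x) = \int_x^{\infty} \frac{P(B > u)}{E[B]}\,du$ ($B^e$ is a distribution)
[Figure: $Q_m(t)$ versus time between $nt_0$ and $nt_0 + \Delta t$; the queue length is of order $n^{1/2}$.]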
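The remaining-workload distribution $B^e$ is the usual equilibrium (residual) distribution, with tail $P(B^e > x) = \int_x^\infty P(B > u)\,du / E[B]$. A minimal numerical sketch follows, using a midpoint rule and a finite upper limit in place of infinity; the helper name `equilibrium_tail` and both example workloads are assumptions made for illustration.

```python
import math


def equilibrium_tail(survival, mean, x, upper, steps=20_000):
    """P(B^e > x) = (1/E[B]) * integral from x to `upper` of P(B > u) du (midpoint rule)."""
    if x >= upper:
        return 0.0
    h = (upper - x) / steps
    total = sum(survival(x + (k + 0.5) * h) for k in range(steps))
    return total * h / mean


def const_survival(u):
    """Constant workload B = 1: P(B > u) = 1 for u < 1, so B^e is uniform on [0, 1]."""
    return 1.0 if u < 1.0 else 0.0


def exp_survival(u):
    """Exponential workload with mean 2: B^e is again exponential with mean 2."""
    return math.exp(-u / 2.0)


p_const = equilibrium_tail(const_survival, mean=1.0, x=0.25, upper=1.0)
p_exp = equilibrium_tail(exp_survival, mean=2.0, x=1.0, upper=60.0)
print(p_const, p_exp)
```

The two cases bracket the intuition: for a constant workload a randomly chosen job is, on average, halfway through its Map service, while for an exponential workload its remaining work looks like a fresh job.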
Intuition for Policy 2 (Cont.)
Let the remaining Map workload of that job be $b^e$
The remaining Map service time of that job ~ $b^e q_m \sqrt{n}$
The Reduce queue will grow by ~ $\lambda b^e q_m \sqrt{n} / K$
$\hat{Q}_r(t_0+) - \hat{Q}_r(t_0) = \lambda b^e q_m / K$, a jump in the diffusion limit
[Figure: $Q_r(t)$ versus time, rising by $\Delta Q_r(t)$ between $nt_0$ and $nt_0 + \Delta t$.]
Why does it matter? Little’s law!
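The arithmetic behind the jump can be sketched as follows, reading the slide's expressions as a Reduce-queue growth of $\lambda b^e q_m \sqrt{n}/K$ and a jump of $\lambda b^e q_m / K$ after dividing by $\sqrt{n}$. The numbers are made up, and the last helper simply restates Little's law ($L = \lambda W$) to show why a longer queue means a longer mean delay.

```python
import math


def reduce_queue_growth(lam, b_e, q_m, n, K):
    """Unscaled growth of the Reduce queue: lam * b_e * q_m * sqrt(n) / K."""
    return lam * b_e * q_m * math.sqrt(n) / K


def diffusion_jump(lam, b_e, q_m, K):
    """The same growth after dividing by sqrt(n): lam * b_e * q_m / K."""
    return lam * b_e * q_m / K


def mean_delay(mean_queue_length, lam):
    """Little's law: L = lam * W, so W = L / lam."""
    return mean_queue_length / lam


# Illustrative values only.
print(reduce_queue_growth(lam=1.0, b_e=0.5, q_m=2.0, n=10_000, K=10))  # about 10 jobs
print(diffusion_jump(lam=1.0, b_e=0.5, q_m=2.0, K=10))                 # about 0.1
print(mean_delay(mean_queue_length=12.0, lam=1.0))
```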
Intuition for Policy 3
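Policy 3: Choose the job that started Map service first
Depends on the workload distribution B
Special case: B is constant, so Policy 1 = Policy 3
[Figure: two plots of $P[B > x]$ against $x$, each with 1 marked on the vertical axis and $x^*$ on the horizontal axis; one is labeled “Like Policy 1”, the other “Like Policy 2”.]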
Achieve the Lower Bound
Theorem 3.
(1) Under policy 1, if B is bounded, then $\hat{Q}_r^{(n)}(t) \Rightarrow \Phi(W_r^*(t))$
(2) Under policy 2, if B is bounded, then $\hat{Q}_r^{(n)}(t) \Rightarrow \Phi(W_r^{**}(t))$,
where $\Phi(W_r^{**}(t))$ is modified from $\Phi(W_r^*(t))$ with jumps of random size when $\Phi(W_r^{**}(t))$ hits zero.
Achieve the Lower Bound (Cont.)
Theorem 3.
(3) Under policy 3, if B is bounded, the limit depends on whether B has a decreasing or an increasing hazard function: in one case $\hat{Q}_r^{(n)}(t) \Rightarrow \Phi(W_r^*(t))$ (Like Policy 1), in the other $\hat{Q}_r^{(n)}(t) \Rightarrow \Phi(W_r^{**}(t))$ (Like Policy 2).
Remark: Hazard function (failure rate)
$H(x) = \lim_{\Delta x \to 0} \frac{P(x < B \le x + \Delta x)}{\Delta x \, P(B > x)}$
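The hazard function can be evaluated numerically straight from its definition, replacing the limit with a small finite step. The sketch uses Weibull workloads only because their hazard is monotone and easy to check (shape below 1 gives a decreasing hazard, shape above 1 an increasing one); Weibull distributions are unbounded, so they illustrate the monotonicity condition but not the boundedness assumption of Theorem 3.

```python
import math


def hazard(survival, x, dx=1e-6):
    """Finite-difference version of H(x) = P(x < B <= x + dx) / (dx * P(B > x))."""
    return (survival(x) - survival(x + dx)) / (dx * survival(x))


def weibull_survival(shape, scale=1.0):
    """Return the function P(B > x) = exp(-(x / scale) ** shape)."""
    return lambda x: math.exp(-((x / scale) ** shape))


decreasing = weibull_survival(shape=0.5)  # hazard falls as x grows
increasing = weibull_survival(shape=2.0)  # hazard rises as x grows
constant = weibull_survival(shape=1.0)    # exponential: hazard stays at 1

for x in (0.5, 1.0, 2.0):
    print(x, hazard(decreasing, x), hazard(increasing, x), hazard(constant, x))
```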
Conclusion
• Non-work-conserving effect might occur in
the MapReduce system under heavy traffic.
• With heavy-traffic assumption, we obtain a
lower bound using diffusion approximation.
• Tie-breaking rule should be carefully
designed to avoid possible jumps in the
queue length.
References
• [1] J. Tan et al., “Non-work-conserving Effects in
MapReduce: Diffusion Limit and Criticality,” in Proc.
SIGMETRICS, 2014.
• [2] F. P. Kelly. Reversibility and Stochastic Networks.
John Wiley & Sons, 1979.
• [3] H. C. Gromoll, “Diffusion approximation for a
processor sharing queue in heavy traffic,” Annals of
Applied Probability, 14:555–611, 2004.
• [4] A. Lambert, F. Simatos, and B. Zwart. “Scaling
limits via excursion theory: Interplay between Crump
Mode-Jagers branching processes and processor sharing
queues,” The Annals of Applied Probability, 23:2161–
2603, 2013.

Final Project: Non-Work-Conserving Effects in MapReduce

  • 1. Non-Work-Conserving Effects in MapReduce: Diffusion Approximation Ping-Chun Hsieh 2015.04.27
  • 2. How to Handle Large Data? One solution is “distributed computing”; MapReduce is one implementation. (Image: https://www.gtisoft.com/img/DistributedComputing.jpg)
  • 3. What is MapReduce? • “Job”: a document with several words. 1. “Map Task”: word → (word, count) pair. 2. “Copy/Shuffle”: distribute the pairs to the Reduce machines. 3. “Reduce Task”: count the word frequency, using the word as the key. • Example: finding word frequency. Input: “He is in the elite of the elite right now”. Map output: (He,1) (is,1) (now,1) … Reduce output: (the,2) (elite,2) (now,1) …
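The three phases of the word-frequency example can be sketched in a few lines of Python. This is only a single-machine illustration, not how a real MapReduce cluster runs; the function names `map_task`, `shuffle` and `reduce_task` are made up here for illustration.

```python
from collections import defaultdict


def map_task(document):
    """Map phase: emit one (word, 1) pair per word in the document."""
    return [(word, 1) for word in document.split()]


def shuffle(pairs):
    """Copy/shuffle phase: group the emitted counts by word (the key)."""
    groups = defaultdict(list)
    for word, count in pairs:
        groups[word].append(count)
    return groups


def reduce_task(groups):
    """Reduce phase: sum the counts collected for each word."""
    return {word: sum(counts) for word, counts in groups.items()}


sentence = "He is in the elite of the elite right now"
word_counts = reduce_task(shuffle(map_task(sentence)))
print(word_counts)
```

In a real deployment the shuffle step would move each group to the machine responsible for that key; here it is just a dictionary.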
  • 4. Main Issue & Outline • Main issue: analyze the Map and Reduce queues and design a good scheduling policy [1]. • Outline: Model of MapReduce; Load conditions and 3 tie-breaking policies; Diffusion approximation; Finding a lower bound; Sketch of the proof.
  • 5. Model of MapReduce • Each job has multiple Map tasks and Reduce tasks. • Map tasks are smaller than Reduce tasks. • Workload $B_i$ with mean $E[B]$ and variance $Var[B]$. [Figure: jobs arrive over time as a Poisson($\lambda$) process, each with workload $B_i$ distributed as $B$, and enter the Map queue (M/G/1 processor-sharing queue); the intermediate data feeds the Reduce tasks in the Reduce queue (K-server G/G/1 queue) with Reduce servers 1, 2, …, K.]
  • 6. Model and Notations • $Q_r(t)$: # of jobs in the Reduce queue. • $Q_m(t)$: # of jobs in the Map queue. • $R_i(t)$: # of running Reduce tasks of job i. • $R_i$: total # of Reduce tasks of job i. • For task j of job i: $C_i^j$: copy/shuffle; $D_i^j$: reduce phase; $Z_i := \sum_j (C_i^j + D_i^j)$. [Figure: jobs pass through the Map queue ($Q_m(t)$) and then the Reduce queue ($Q_r(t)$), which is served by Reduce servers 1, 2, …, K.]
  • 7. Traffic Assumptions and Constraint • Reduce queue: 1. Max-min fairness. Ex: 10 servers shared by 3 jobs are split (4,3,3). 2. No “preemption”: jobs cannot be interrupted. • Dependence constraint: progress of Reduce tasks ≤ progress of Map tasks, i.e. $C_i(t) \le B_i(t)$. ⇒ The Reduce queue is NOT work-conserving.
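The max-min fairness example (10 servers, 3 jobs, split 4/3/3) can be reproduced with a small round-robin sketch. It assumes each job has an integer demand (its number of Reduce tasks) and hands out servers one at a time; the function name and the `demands` parameter are introduced here for illustration. The no-preemption rule is deliberately ignored, so this shows only the fair target, not the allocation a running system would actually hold.

```python
def max_min_allocation(num_servers, demands):
    """Hand out servers one at a time, round-robin, to jobs with unmet demand.

    Returns the number of servers given to each job, in job order.
    Preemption constraints are NOT modeled in this sketch.
    """
    allocation = [0] * len(demands)
    remaining = num_servers
    while remaining > 0:
        progressed = False
        for i, demand in enumerate(demands):
            if remaining == 0:
                break
            if allocation[i] < demand:
                allocation[i] += 1
                remaining -= 1
                progressed = True
        if not progressed:
            # Every job is fully served; the leftover servers stay idle.
            break
    return allocation


print(max_min_allocation(10, [10, 10, 10]))
print(max_min_allocation(10, [10, 10]))
```

With no preemption, a newly arrived job cannot immediately take servers away from running tasks, so the system may stay away from this fair split until some tasks finish.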
  • 8. Lightly Loaded • Example: K=10. Current state: 2 jobs in the Reduce queue, holding (5,5) servers. Suppose a new job joins the Reduce queue: the new allocation is (?,?,?). [Figure: Map queue feeding the Reduce queue with Reduce servers 1, …, K.]
  • 9. Heavily Loaded • Example: K=10. Current state: >>10 jobs in the Reduce queue, with allocation (1,…,1). Suppose one job finishes and leaves the Reduce queue. How to break the tie? [Figure: Map queue feeding the Reduce queue with Reduce servers 1, …, K.]
  • 10. Study on Tie-Breaking Rules • Consider 3 rules for the Reduce queue: Policy 1: choose the job with the smallest remaining Map service. Policy 2: choose one job uniformly at random. Policy 3: choose the job that started Map service first. [Figure: timeline of Map service for Jobs 1, 2, 3 with start times TS1, TS2, TS3 and end times TE1, TE2, TE3 around the current time, marking the jobs picked by Policy 1 and Policy 3.]
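The three tie-breaking rules can be written as one small selection function. This is a sketch under assumptions: the `WaitingJob` record, its field names and the sample numbers are hypothetical and only meant to mirror the quantities named on the slide (Map start time, remaining Map service).

```python
import random
from dataclasses import dataclass


@dataclass
class WaitingJob:
    name: str
    map_start: float      # time at which the job started Map service
    map_remaining: float  # Map service still to be done


def pick_job(jobs, policy, rng=random):
    """Pick the job that receives the freed Reduce server."""
    if policy == 1:
        # Policy 1: smallest remaining Map service.
        return min(jobs, key=lambda job: job.map_remaining)
    if policy == 2:
        # Policy 2: uniformly at random.
        return rng.choice(jobs)
    if policy == 3:
        # Policy 3: earliest start of Map service.
        return min(jobs, key=lambda job: job.map_start)
    raise ValueError("policy must be 1, 2 or 3")


# Hypothetical snapshot of three waiting jobs.
waiting = [
    WaitingJob("job1", map_start=0.0, map_remaining=4.0),
    WaitingJob("job2", map_start=1.0, map_remaining=0.5),
    WaitingJob("job3", map_start=2.0, map_remaining=2.0),
]
print(pick_job(waiting, 1).name, pick_job(waiting, 3).name)
```

In this snapshot Policy 1 and Policy 3 pick different jobs, which is the situation the timeline on the slide illustrates.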
  • 11. Diffusion Approximation • Consider a sequence of MapReduce systems indexed by n. 1. Primitive data: $(B^{(n)}, C^{(n)}, D^{(n)}, R^{(n)})$. 2. Reduce workload: $Z_i^{(n)} := \sum_{j=1}^{R_i^{(n)}} (C_i^{(n),j} + D_i^{(n),j})$. 3. Limits: $(E[B^{(n)}], Var[B^{(n)}])$ and $(E[R^{(n)}], E[Z^{(n)}], Var[Z^{(n)}])$ converge to finite limits. 4. Arrival rate: $\lambda^{(n)} \to \lambda$. • Heavy-traffic assumption: $(1 - \lambda^{(n)} E[B^{(n)}])\sqrt{n}$ (Map service) and $(1 - \lambda^{(n)} E[(C^{(n)} + D^{(n)}) R^{(n)}] / K)\sqrt{n}$ (Reduce service) each converge to a finite limit.
  • 12. Diffusion Approximation (Cont.) • Queue length: $(Q_m^{(n)}, Q_r^{(n)})$. • Diffusion limits: $\hat{Q}_m^{(n)}(t) := Q_m^{(n)}(nt)/\sqrt{n}$, $\hat{Q}_r^{(n)}(t) := Q_r^{(n)}(nt)/\sqrt{n}$. Theorem 1 [4] (Map queue): $\hat{Q}_m^{(n)}(t)$ converges to $Q_m(t)$, the reflected Brownian motion (RBM) obtained by reflecting $Q_m(0) + W_m^*(t)$ at zero, where $W_m^*(t)$ is a BM with (1) negative drift and (2) a variance that depends on $\lambda$; both are expressed through the Map-workload limits.
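To make the notion of a reflected Brownian motion concrete, the sketch below reflects a discrete path at zero using the standard one-sided reflection (Skorokhod) map, and builds a scaled random walk as a stand-in for a Brownian motion with drift. The drift and standard deviation are free parameters chosen by the caller, not the values from Theorem 1, and the function names are made up for this illustration.

```python
import random


def reflect_at_zero(path):
    """One-sided reflection map: subtract the running minimum of min(0, path)."""
    reflected = []
    lowest = 0.0
    for x in path:
        lowest = min(lowest, x)
        reflected.append(x - lowest)
    return reflected


def scaled_random_walk(n, drift, sigma, horizon=1.0, seed=0):
    """Random walk with n steps per unit time, approximating a BM on [0, horizon].

    Each step has mean drift/n and standard deviation sigma/sqrt(n).
    """
    rng = random.Random(seed)
    x = 0.0
    path = [x]
    for _ in range(int(n * horizon)):
        x += drift / n + sigma * rng.gauss(0.0, 1.0) / n ** 0.5
        path.append(x)
    return path


free_path = scaled_random_walk(n=1000, drift=-1.0, sigma=1.0)
queue_like_path = reflect_at_zero(free_path)
print(min(queue_like_path), max(queue_like_path))
```

The reflected path never goes below zero, which is why it is a natural model for a scaled queue length, while a negative drift keeps pulling it back toward the boundary.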
  • 13. Relaxed MapReduce & Lower Bound • Construct a new queue: 1. Assume no dependence constraint (drop $C_i(t) \le B_i(t)$) ⇒ a work-conserving Reduce queue now becomes possible. 2. A job is always given at least as many servers as in the original queue, for all t. ⇒ Jobs are completed no later than in the original queue. [Figure: Map queue feeding the Reduce queue with Reduce servers 1, …, K.]
  • 14. Lower Bound (Cont.) Theorem 2: There exists a sequence $Q_r^{L,(n)}(t) \le Q_r^{(n)}(t)$ such that (1) the diffusion-scaled $Q_r^{L,(n)}(t)$ converges to the RBM obtained by reflecting $W_r^*(t)$ at zero; (2) $W_r^*$ is a BM with $W_r^*(0) = 0$, whose drift and variance are expressed through $K$, $\lambda$ and the Reduce-side limits; (3) the limit of $Q_r^{L,(n)}(t)$ is independent of $W_m^*(t)$. Is the lower bound achievable?
  • 15. Observation A1: The Map queue is M/G/1 with processor sharing ⇒ past departures are independent of $Q_m(t)$. The queue is reversible, i.e. the departure process is also Poisson [2].
  • 16. Observation (Cont.) A2: In heavy traffic, if the processing of the Reduce queue is the same as the departure of the Map queue, then the Reduce queue will be very close to a FIFO multi-server queue with service time $\sum_{j=1}^{R_i} (C_i^j + D_i^j)$ for job i.
  • 17. Intuition for Policy 1 • Policy 1: choose the job with the smallest remaining Map service. • From A1 & A2: arrival process of $Q_r(t)$ ~ departure process of $Q_m(t)$. • The diffusion-scaled $Q_r^{(n)}(t)$ converges to the RBM obtained by reflecting $W_r^*(t)$ at zero, so the Reduce queue can be approximated by an RBM.
  • 18. Intuition for Policy 2 • Policy 2: choose one job uniformly at random. • Diffusion approximation for the Map queue: $Q_m(nt_0) \approx \sqrt{n}\,E[\hat{Q}_m(t_0)] = q_m\sqrt{n} + o(\sqrt{n})$. • Suppose there is 1 vacancy in the Reduce queue at time $nt_0$. The chosen job has remaining Map workload $B_e$, where $P(B_e > x) = \int_x^\infty \frac{P(B > u)}{E[B]}\,du$ ($B_e$ is a distribution) [3]. [Figure: $Q_m(t)$ versus time between $nt_0$ and $nt_0 + \Delta t$, at a level of order $n^{1/2}$.]
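The tail of the remaining workload, $P(B_e > x) = \int_x^\infty P(B > u)\,du / E[B]$, can be evaluated numerically once the tail of $B$ is known. The sketch below uses a plain trapezoidal rule with a finite cutoff; the function name, the cutoff and the two example distributions are assumptions made for illustration.

```python
import math


def excess_tail(tail, mean, x, upper=50.0, steps=100_000):
    """Approximate P(B_e > x) = (1 / E[B]) * integral_x^upper P(B > u) du.

    `tail(u)` must return P(B > u); `upper` truncates the infinite integral,
    so it should be large enough for the remaining tail to be negligible.
    """
    h = (upper - x) / steps
    total = 0.5 * (tail(x) + tail(upper))
    for k in range(1, steps):
        total += tail(x + k * h)
    return total * h / mean


# Exponential workload with mean 1: the remaining workload has the same tail.
exp_result = excess_tail(lambda u: math.exp(-u), mean=1.0, x=1.0)

# Constant workload B = 2: the remaining workload is uniform on [0, 2].
const_result = excess_tail(lambda u: 1.0 if u < 2.0 else 0.0, mean=2.0, x=0.5)

print(exp_result, const_result)
```

The two cases show the range of behavior: for an exponential workload a randomly chosen job looks like a fresh job, whereas for a constant workload its remaining work is spread uniformly over the job size.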
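  • 19. Intuition for Policy 2 (Cont.) • Let the remaining Map workload of that job be $b_e$. • The remaining Map service time of that job is ~ $b_e q_m \sqrt{n}$, because it shares the Map processor with about $q_m\sqrt{n}$ jobs. • The Reduce queue will grow by ~ $\lambda b_e q_m \sqrt{n} / K$. • This is a jump in the diffusion limit: $Q_r(t_0^+) - Q_r(t_0) \approx \lambda b_e q_m / K$. • Why does it matter? Little's law! [Figure: $Q_r(t)$ versus time, showing a jump $\Delta Q_r(t)$ between $nt_0$ and $nt_0 + \Delta t$.]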
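  • 20. Intuition for Policy 3 • Policy 3: choose the job that started Map service first. • Its behavior depends on the workload distribution B. • Special case: B is constant ⇒ Policy 1 = Policy 3. [Figure: two plots of the tail $P[B > x]$ against x, each marked with level 1 and a point $x^*$; one is labeled “Like Policy 1”, the other “Like Policy 2”.]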
  • 21. Achieve the Lower Bound Theorem 3. (1) Under Policy 1, if B is bounded, then the diffusion-scaled $Q_r^{(n)}(t)$ converges to the RBM obtained by reflecting $W_r^*(t)$ at zero. (2) Under Policy 2, if B is bounded, then the diffusion-scaled $Q_r^{(n)}(t)$ converges to a modified version of that RBM, with jumps of random size whenever the modified process hits zero.
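  • 22. Achieve the Lower Bound (Cont.) Theorem 3. (3) Under Policy 3, if B is bounded and has a decreasing hazard function, then the diffusion-scaled $Q_r^{(n)}(t)$ converges to the RBM obtained by reflecting $W_r^*(t)$ at zero (like Policy 1). If B has an increasing hazard function, then it converges to the modified process of part (2), which jumps by a random size when it hits zero (like Policy 2). Remark: the hazard function (failure rate) is $H(x) = \lim_{\Delta x \to 0} \frac{P(x \le B < x + \Delta x)}{\Delta x \, P(B \ge x)}$.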
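The hazard function in the remark can be approximated directly from its limit definition by using a small but finite $\Delta x$. The sketch below does this for a survival function $P(B \ge x)$ supplied by the caller; the exponential and Weibull examples are standard textbook distributions chosen here for illustration, not distributions used in the slides.

```python
import math


def hazard(survival, x, dx=1e-6):
    """Finite-difference hazard: P(x <= B < x + dx) / (dx * P(B >= x))."""
    s = survival(x)
    return (s - survival(x + dx)) / (dx * s)


def exponential_survival(rate):
    """P(B >= x) for an exponential distribution; its hazard is constant."""
    return lambda x: math.exp(-rate * x)


def weibull_survival(shape):
    """P(B >= x) for a unit-scale Weibull; hazard is shape * x**(shape - 1)."""
    return lambda x: math.exp(-(x ** shape))


print(hazard(exponential_survival(2.0), 1.0))   # constant hazard
print(hazard(weibull_survival(2.0), 0.5), hazard(weibull_survival(2.0), 1.5))
print(hazard(weibull_survival(0.5), 1.0), hazard(weibull_survival(0.5), 4.0))
```

A Weibull shape above 1 gives an increasing hazard and a shape below 1 a decreasing one, which are the two regimes the theorem separates.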
  • 23. Conclusion • Non-work-conserving effects can occur in the MapReduce system under heavy traffic. • Under the heavy-traffic assumption, we obtain a lower bound on the Reduce queue length using diffusion approximation. • The tie-breaking rule should be carefully designed to avoid possible jumps in the queue length.
  • 24. References • [1] J. Tan et al., “Non-work-conserving Effects in MapReduce: Diffusion Limit and Criticality,” in Proc. SIGMETRICS, 2014. • [2] F. P. Kelly, Reversibility and Stochastic Networks. John Wiley & Sons, 1979. • [3] H. C. Gromoll, “Diffusion approximation for a processor sharing queue in heavy traffic,” The Annals of Applied Probability, 14:555–611, 2004. • [4] A. Lambert, F. Simatos, and B. Zwart, “Scaling limits via excursion theory: Interplay between Crump-Mode-Jagers branching processes and processor sharing queues,” The Annals of Applied Probability, 23:2161–2603, 2013.