Non-Work-Conserving Effects
in MapReduce:
Diffusion Approximation
Ping-Chun Hsieh
2015.04.27
How to Handle Large Data?
https://www.gtisoft.com/img/DistributedComputing.jpg
One solution is “distributed computing”
MapReduce is one implementation.
What is MapReduce?
• “Job”: a document with several words
1. “Map Task”: word → (word, count) pair
2. “Copy/Shuffle”: distribute the pairs to the reduce machines
3. “Reduce Task”: count the word frequency, using the word as a key
• Example: finding word frequency
Input: “He is in the elite of the elite right now”
Map output: (He,1), (is,1), (now,1), …
Reduce output: (the,2), (elite,2), (now,1), …
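The three phases of the word-frequency example can be sketched in a few lines of Python. This is only an illustrative single-machine sketch: the function names `map_task`, `shuffle`, `reduce_task` and `word_frequency` are assumptions made for this example, and the Map step emits one (word, 1) pair per occurrence rather than pre-aggregated counts.

```python
from collections import defaultdict


def map_task(document):
    """Map: emit a (word, 1) pair for every word in the document."""
    return [(word, 1) for word in document.split()]


def shuffle(pairs):
    """Copy/Shuffle: group the emitted counts by key (the word)."""
    groups = defaultdict(list)
    for word, count in pairs:
        groups[word].append(count)
    return groups


def reduce_task(groups):
    """Reduce: sum the counts collected for each word."""
    return {word: sum(counts) for word, counts in groups.items()}


def word_frequency(document):
    """Run the three phases in sequence on a single document."""
    return reduce_task(shuffle(map_task(document)))


counts = word_frequency("He is in the elite of the elite right now")
print(counts["the"], counts["elite"], counts["now"])
```

In a real cluster the shuffle step moves data between machines; here it is just a dictionary grouping.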
Main Issue & Outline
• Main Issue: Analyze the Map & Reduce queues and design a good scheduling policy [1]
• Outline:
  • Model of MapReduce
  • Load conditions and 3 Tie-Breaking Policies
  • Diffusion Approximation
  • Finding a Lower Bound
  • Sketch of the proof
Model of MapReduce
• Each job has multiple Map tasks and Reduce tasks.
• A Map task is smaller than a Reduce task.
• Job i has workload Bi, which follows B with mean E[B] and variance Var[B].
[Figure: jobs arrive as ~Poi(λ) at the Map queue (M/G/1 processor sharing queue; workload Bi follows B). The intermediate data feed the Reduce tasks in the Reduce queue (K-server G/G/1 queue, servers 1, 2, …, K).]
Model and Notations
• Qr(t): # of jobs in the reduce queue.
• Qm(t): # of jobs in the map queue.
• Ri(t): # of running Reduce tasks of job i
• Ri: total # of Reduce tasks of job i
• For task j of job i:
  • $C_i^j$: copy/shuffle
  • $D_i^j$: reduce phase
  • $Z_i := \sum_j (C_i^j + D_i^j)$
[Figure: Map queue holding Qm(t) jobs, feeding the Reduce queue holding Qr(t) jobs.]
Traffic Assumptions and Constraint
• Reduce queue:
1. Max-min fairness
   Ex: 10 servers with 3 jobs → (4,3,3)
2. No “preemption”: jobs cannot be interrupted
• Dependence constraint:
  Progress of Reduce tasks ≤ Progress of Map tasks: $C_i(t) \le B_i(t)$
  ⇒ Reduce queue is NOT work-conserving
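The max-min fairness example (10 servers, 3 jobs, allocation (4,3,3)) can be reproduced with a small sketch. The function name `max_min_allocation` and the `demands` argument (the most servers each job can use, for instance its number of Reduce tasks) are assumptions for illustration. The sketch computes a static allocation only and does not model the no-preemption rule or the dependence constraint.

```python
def max_min_allocation(num_servers, demands):
    """Hand out servers one at a time, round-robin, never exceeding a job's demand.

    For whole servers this yields a max-min fair split: no job can get more
    without taking a server from a job that has the same number or fewer.
    """
    alloc = [0] * len(demands)
    remaining = num_servers
    while remaining > 0:
        progressed = False
        for i, demand in enumerate(demands):
            if remaining > 0 and alloc[i] < demand:
                alloc[i] += 1
                remaining -= 1
                progressed = True
        if not progressed:  # every job is satisfied; leftover servers stay idle
            break
    return alloc


print(max_min_allocation(10, [10, 10, 10]))
print(max_min_allocation(10, [10, 10]))
```

Earlier jobs in the list receive the leftover servers first, which is an arbitrary choice made for this sketch.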
Lightly Loaded
• Example: K=10
Current state: 2 jobs in the Reduce queue, with allocation (5,5)
Suppose: a new job joins the Reduce queue. What is the new allocation (?,?,?)?
[Figure: Map queue feeding the Reduce queue with servers 1, …, K.]
Heavily Loaded
• Example: K=10
Current state: >>10 jobs in the Reduce queue, with allocation (1,…,1)
Suppose: one job finishes and leaves the Reduce queue
How to break the tie?
[Figure: Map queue feeding the Reduce queue with servers 1, …, K.]
Study on Tie-Breaking Rules
• Consider 3 rules for the Reduce queue:
Policy 1: Choose the one with the smallest remaining Map service
Policy 2: Choose one job randomly (uniform)
Policy 3: Choose the one which starts Map service first
[Figure: timeline of the Map service of Jobs 1, 2 and 3, with start times TS1, TS2, TS3 and end times TE1, TE2, TE3 relative to the current time; the jobs picked by Policy 1 and Policy 3 are marked.]
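The three tie-breaking rules can be written as small selection functions. This is a sketch under assumptions: the `Job` record, its field names and the sample values are hypothetical and are not taken from the slides.

```python
import random
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    map_start: float      # time at which the job started Map service
    remaining_map: float  # remaining Map service at the current time


def policy_1(jobs):
    """Policy 1: the job with the smallest remaining Map service."""
    return min(jobs, key=lambda job: job.remaining_map)


def policy_2(jobs, rng=random):
    """Policy 2: one job chosen uniformly at random."""
    return rng.choice(jobs)


def policy_3(jobs):
    """Policy 3: the job that started Map service first."""
    return min(jobs, key=lambda job: job.map_start)


# Hypothetical snapshot of three jobs waiting for a free Reduce server.
jobs = [
    Job("Job 1", map_start=0.0, remaining_map=3.0),
    Job("Job 2", map_start=1.0, remaining_map=0.5),
    Job("Job 3", map_start=2.0, remaining_map=1.5),
]
print(policy_1(jobs).name, policy_3(jobs).name)
```

In this snapshot Policy 1 and Policy 3 disagree, which is the situation the timeline figure illustrates.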
Diffusion Approximation
• Consider a sequence of MapReduce systems, indexed by n:
1. Primitive data: $(B^{(n)}, C^{(n)}, D^{(n)}, R^{(n)})$
2. Reduce workload: $Z_i^{(n)} := \sum_{j=1}^{R_i^{(n)}} (C_i^{j,(n)} + D_i^{j,(n)})$
3. Limits: $(E[B^{(n)}], \mathrm{Var}[B^{(n)}])$ and $(E[R^{(n)}], E[Z^{(n)}], \mathrm{Var}[Z^{(n)}])$ converge to finite limits
4. Arrival rate: $\lambda^{(n)} \to \lambda$
• Heavy-traffic assumption: each of the following converges to a finite constant
  Map service: $(1 - \lambda^{(n)} E[B^{(n)}])\sqrt{n}$
  Reduce service: $\left(1 - \frac{\lambda^{(n)}}{K} E[(C^{(n)} + D^{(n)}) R^{(n)}]\right)\sqrt{n}$
Diffusion Approximation (Cont.)
• Queue length: $(Q_m^{(n)}, Q_r^{(n)})$
• Diffusion limits: $\hat{Q}_m^{(n)}(t) := Q_m^{(n)}(nt)/\sqrt{n}$, $\hat{Q}_r^{(n)}(t) := Q_r^{(n)}(nt)/\sqrt{n}$
Theorem 1 [4]: (Map queue)
$\hat{Q}_m^{(n)}(t) \Rightarrow Q_m(t) = \mathrm{RBM}(Q_m(0) + W_m^*(t))$
where $W_m^*(t)$ is a BM with:
2 2
b b
drift 2 ( ) 0b
     (1)
(2)
22 2 2
b b b
variance 4 ( ) ( )b b
    l   
Relaxed MapReduce & Lower Bound
• Construct a new queue:
1. Assume no dependence constraint: $C_i(t) > B_i(t)$ is possible
   ⇒ Reduce queue now becomes work-conserving
2. A job is always given at least an equal number of servers as in the original queue, for all t.
   ⇒ Jobs are completed no later than in the original queue.
[Figure: Map queue feeding the Reduce queue with servers 1, …, K.]
Lower Bound (Cont.)
Theorem 2:
There exists a sequence $Q_r^{L,(n)}$ such that
(1) $Q_r^{L,(n)}(t) \le Q_r^{(n)}(t)$
(2) $\hat{Q}_r^{L,(n)}(t) \Rightarrow \mathrm{RBM}(W_r^*(t))$
* *
is a BM with (0) 0,drift ,r r
r
K
W W

 

2
r
3
r
and variance  

l

(3) $Q_r^{L,(n)}(t)$ is independent of $W_m^*(t)$
Is the lower bound achievable?
Observation
A1: The Map queue is M/G/1 with processor sharing
⇒ The queue is reversible, i.e. the departure process is also Poisson [2]
⇒ Past departures are independent of Qm(t)
[Figure: Map queue holding Qm(t) jobs, with departures feeding the Reduce tasks.]
Observation (Cont.)
A2: In heavy traffic, if the processing of the Reduce queue is the same as the departure of the Map queue
⇒ The Reduce queue will be very close to a FIFO multi-server queue with service time $\sum_{j=1}^{R_i} (C_i^j + D_i^j)$ for job i.
[Figure: Map queue feeding the Reduce queue with servers 1, …, K.]
Intuition for Policy 1
Policy 1: Choose the one with the smallest remaining Map service
From A1 & A2: arrival process of Qr(t) ~ departure process of Qm(t)
⇒ Reduce queue can be approximated by RBM: $\hat{Q}_r^{(n)}(t) \Rightarrow \mathrm{RBM}(W_r^*(t))$
[Figure: Map queue with Qm(t) jobs feeding the Reduce queue with Qr(t) jobs and servers 1, …, K.]
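To get a feel for what a reflected Brownian motion (RBM) looks like, the following sketch samples a Brownian path on a grid and reflects it at zero with the one-sided Skorokhod map. The drift, variance, horizon and initial value are hypothetical illustration values, not the parameters of the theorems, and the function names are assumptions for this example.

```python
import math
import random


def brownian_path(drift, variance, horizon, steps, rng):
    """Sample W(0) = 0 and W on a uniform grid using Gaussian increments."""
    dt = horizon / steps
    path = [0.0]
    for _ in range(steps):
        increment = drift * dt + math.sqrt(variance * dt) * rng.gauss(0.0, 1.0)
        path.append(path[-1] + increment)
    return path


def reflect(path, initial=0.0):
    """Reflect initial + W at zero: subtract the running minimum of min(0, initial + W)."""
    reflected = []
    lowest = 0.0
    for w in path:
        free = initial + w          # unreflected process
        lowest = min(lowest, free)  # most negative value reached so far
        reflected.append(free - lowest)
    return reflected


rng = random.Random(0)
w = brownian_path(drift=-0.5, variance=1.0, horizon=10.0, steps=1000, rng=rng)
q = reflect(w, initial=1.0)
print(min(q) >= 0.0)
```

The reflected path follows the Brownian path while it is positive and is pushed up just enough to stay non-negative, which is how a queue length behaves when it empties.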
Intuition for Policy 2
Policy 2: Choose one job randomly (uniform)
Diffusion approximation for Map queue: $Q_m(nt_0) \approx \sqrt{n}\,E[\hat{Q}_m(t_0)] = q_m\sqrt{n} + o(\sqrt{n})$
Suppose 1 vacancy in the Reduce queue at nt0
The chosen job has remaining Map workload = $B^e$ ($B^e$ is a distribution) [3]:
$P(B^e > x) = \int_x^\infty \frac{P(B > u)}{E[B]}\,du$
[Figure: Qm(t) versus time between nt0 and nt0+Δt, with fluctuations of order n^{1/2}.]
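The tail of the remaining Map workload, $P(B^e > x) = \frac{1}{E[B]}\int_x^\infty P(B > u)\,du$, can be evaluated numerically. The sketch below uses the trapezoid rule; the function name `residual_tail`, the truncation point `upper` and the uniform workload on [0, 2] are assumptions chosen for illustration (a bounded B, for which the closed form is $(1 - x/2)^2$).

```python
import math


def residual_tail(tail, mean, x, upper, steps=10000):
    """Approximate P(B^e > x) = (1 / E[B]) * integral from x to upper of P(B > u) du.

    `tail(u)` returns P(B > u); `upper` is a point beyond which the tail is negligible.
    """
    if x >= upper:
        return 0.0
    h = (upper - x) / steps
    total = 0.5 * (tail(x) + tail(upper))
    for k in range(1, steps):
        total += tail(x + k * h)
    return total * h / mean


def uniform_tail(u):
    """Hypothetical workload: B uniform on [0, 2], so E[B] = 1 and P(B > u) = 1 - u/2."""
    return max(0.0, 1.0 - u / 2.0)


p = residual_tail(uniform_tail, mean=1.0, x=1.0, upper=2.0)
print(round(p, 6))
```

For an exponential workload the same function returns the exponential tail again, reflecting the memoryless property.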
Intuition for Policy 2 (Cont.)
Let the remaining Map workload of that job = $b^e$
The remaining Map service time of that job ~ $b^e q_m \sqrt{n}$
Reduce queue will grow by ~ $\frac{\lambda\, b^e q_m \sqrt{n}}{K}$
$\hat{Q}_r(t_0^+) - \hat{Q}_r(t_0^-) \approx \frac{\lambda\, b^e q_m}{K}$, jump in diffusion limit
Why does it matter? Little’s law!
[Figure: Qr(t) versus time between nt0 and nt0+Δt, showing the increase ΔQr(t).]
Intuition for Policy 3
Policy 3: Choose the one which starts Map service first
Depends on the workload distribution B
Special case: B is constant ⇒ Policy 1 = Policy 3
[Figure: two plots of P[B>x] versus x, each starting at 1 and marked at x*; one case is labeled “Like Policy 1”, the other “Like Policy 2”.]
Achieve the Lower Bound
Theorem 3.
(1) Under policy 1, if B is bounded, then $\hat{Q}_r^{(n)}(t) \Rightarrow \mathrm{RBM}(W_r^*(t))$
(2) Under policy 2, if B is bounded, then $\hat{Q}_r^{(n)}(t) \Rightarrow \mathrm{RBM}(W_r^{**}(t))$,
where $\mathrm{RBM}(W_r^{**}(t))$ is modified from $\mathrm{RBM}(W_r^*(t))$ with jumps of random size when $\mathrm{RBM}(W_r^{**}(t))$ hits zero.
Achieve the Lower Bound (Cont.)
Theorem 3.
(3) Under policy 3, if B is bounded and has a
decreasing hazard function, then
$\hat{Q}_r^{(n)}(t) \Rightarrow \mathrm{RBM}(W_r^*(t))$
$\hat{Q}_r^{(n)}(t) \Rightarrow \mathrm{RBM}(W_r^{**}(t))$
If B has an increasing hazard function, then
Remark: Hazard function (failure rate)
$H(x) = \lim_{\Delta x \to 0} \frac{P(x \le B < x + \Delta x)}{\Delta x \cdot P(B \ge x)}$
(Like Policy 1)
(Like Policy 2)
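Whether a given workload distribution has an increasing or decreasing hazard function can be checked numerically from its tail. The sketch below estimates $H(x)$ by a finite difference; the function names, the step `dx`, the tolerance and the two sample distributions are assumptions for illustration, and the tail is assumed continuous so that $P(B > x) = P(B \ge x)$.

```python
import math


def hazard(tail, x, dx=1e-6):
    """Finite-difference estimate of H(x) from the tail function P(B > x)."""
    survival = tail(x)
    if survival <= 0.0:
        raise ValueError("hazard is undefined where P(B >= x) = 0")
    return (survival - tail(x + dx)) / (dx * survival)


def is_monotone(tail, grid, increasing=True, tol=1e-6):
    """Check on a grid whether the estimated hazard is non-decreasing (or non-increasing)."""
    values = [hazard(tail, x) for x in grid]
    pairs = list(zip(values, values[1:]))
    if increasing:
        return all(b >= a - tol for a, b in pairs)
    return all(b <= a + tol for a, b in pairs)


def uniform_tail(u):
    """Hypothetical bounded workload: B uniform on [0, 2], hazard 1 / (2 - x)."""
    return max(0.0, 1.0 - u / 2.0)


def exponential_tail(u):
    """Hypothetical unbounded workload: B exponential with rate 1, constant hazard."""
    return math.exp(-u)


grid = [0.0, 0.5, 1.0, 1.5]
print(is_monotone(uniform_tail, grid, increasing=True))
```

The uniform case has an increasing hazard, while the exponential case sits on the boundary with a constant hazard.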
Conclusion
• Non-work-conserving effect might occur in
the MapReduce system under heavy traffic.
• With heavy-traffic assumption, we obtain a
lower bound using diffusion approximation.
• Tie-breaking rule should be carefully
designed to avoid possible jumps in the
queue length.
References
• [1] J. Tan et al., “Non-work-conserving Effects in MapReduce: Diffusion Limit and Criticality,” in Proc. SIGMETRICS, 2014.
• [2] F. P. Kelly, Reversibility and Stochastic Networks. John Wiley & Sons, 1979.
• [3] H. C. Gromoll, “Diffusion approximation for a processor sharing queue in heavy traffic,” Annals of Applied Probability, 14:555–611, 2004.
• [4] A. Lambert, F. Simatos, and B. Zwart, “Scaling limits via excursion theory: Interplay between Crump-Mode-Jagers branching processes and processor sharing queues,” The Annals of Applied Probability, 23:2161–2603, 2013.

Final Project: Non-Work-Conserving Effects in MapReduce

  • 1. Non-Work-Conserving Effects in MapReduce: Diffusion Approximation Ping-Chun Hsieh 2015.04.27
  • 2. How to Handle Large Data? One solution is “distributed computing”; MapReduce is one implementation. (Image: https://www.gtisoft.com/img/DistributedComputing.jpg)
  • 3. What is MapReduce? Example: finding word frequency
      • “Job”: a document with several words
      1. “Map Task”: word → (word, count) pair
      2. “Copy/Shuffle”: distribute the pairs to the Reduce machines
      3. “Reduce Task”: count the word frequency, using the word as a key
      Input: “He is in the elite of the elite right now”
      Map output: (He,1) (is,1) (now,1) …
      Reduce output: (the,2) (elite,2) (now,1) …
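The three steps of the word-frequency example can be sketched in a few lines of Python. This is a single-process illustration only, not a distributed implementation; the function names `map_task`, `shuffle`, `reduce_task`, and `word_frequency` are hypothetical and do not come from the slides.

```python
from collections import defaultdict


def map_task(document):
    """Map: emit one (word, 1) pair for every word in the document."""
    return [(word, 1) for word in document.split()]


def shuffle(pairs):
    """Copy/Shuffle: group the emitted counts by key (the word)."""
    groups = defaultdict(list)
    for word, count in pairs:
        groups[word].append(count)
    return groups


def reduce_task(groups):
    """Reduce: sum the counts collected for each word."""
    return {word: sum(counts) for word, counts in groups.items()}


def word_frequency(document):
    """Run the three phases in sequence on one document (one 'job')."""
    return reduce_task(shuffle(map_task(document)))


if __name__ == "__main__":
    print(word_frequency("He is in the elite of the elite right now"))
```

In a real cluster the Map and Reduce calls run on different machines and the shuffle step moves data across the network, which is why the Reduce side depends on the progress of the Map side.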
  • 4. Main Issue & Outline
      Main issue: analyze the Map and Reduce queues and design a good scheduling policy [1].
      Outline:
      • Model of MapReduce
      • Load conditions and 3 tie-breaking policies
      • Diffusion approximation
      • Finding a lower bound
      • Sketch of the proof
  • 5. Model of MapReduce
      • Each job has multiple Map tasks and Reduce tasks.
      • A Map task is smaller than a Reduce task.
      • Job i has Map workload $B_i$, distributed as $B$ with mean $E[B]$ and variance $\mathrm{Var}[B]$.
      [Figure: jobs arrive over time as a Poisson process with rate $\lambda$ at the Map queue (M/G/1 processor-sharing queue); intermediate data flows to the Reduce queue (K-server G/G/1 queue), whose servers 1, 2, …, K run the Reduce tasks.]
  • 6. Model and Notations
      • $Q_r(t)$: # of jobs in the Reduce queue.
      • $Q_m(t)$: # of jobs in the Map queue.
      • For task j of job i:
          $C_i^j$: copy/shuffle phase
          $D_i^j$: reduce phase
          $Z_i := \sum_j (C_i^j + D_i^j)$
      • $R_i(t)$: # of running Reduce tasks of job i
      • $R_i$: total # of Reduce tasks of job i
      [Figure: Map queue with queue length $Q_m(t)$ feeding the Reduce queue with queue length $Q_r(t)$ and servers 1, 2, …, K.]
  • 7. Traffic Assumptions and Constraint
      • Reduce queue:
          1. Max-min fairness. Ex: 10 servers shared by 3 jobs are split (4,3,3).
          2. No “preemption”: jobs cannot be interrupted.
      • Dependence constraint: progress of Reduce tasks ≤ progress of Map tasks, i.e. $C_i(t) \le B_i(t)$, with $C_i(t)$ and $B_i(t)$ tracking the copy/shuffle and Map progress of job i at time t.
      ⇒ Reduce queue is NOT work-conserving
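The max-min fairness example (10 servers, 3 jobs, split (4,3,3)) can be reproduced with a small sketch. It assumes integer servers, that each job can use at most as many servers as it has Reduce tasks, and that ties go to the job listed first; the function name `max_min_allocation` and the tie rule are illustrative assumptions, not taken from the slides. Preemption and the dependence constraint are not modeled here.

```python
def max_min_allocation(num_servers, demands):
    """Hand out servers one at a time to the job that currently holds the
    fewest servers and still has unmet demand (ties go to the lowest index).

    demands[i] is the number of Reduce tasks job i could run in parallel.
    """
    alloc = [0] * len(demands)
    free = num_servers
    while free > 0:
        # Jobs that could still use another server.
        candidates = [i for i, d in enumerate(demands) if alloc[i] < d]
        if not candidates:
            break  # every job is fully served; remaining servers stay idle
        poorest = min(candidates, key=lambda i: alloc[i])
        alloc[poorest] += 1
        free -= 1
    return alloc


if __name__ == "__main__":
    print(max_min_allocation(10, [10, 10, 10]))  # three large jobs
    print(max_min_allocation(10, [10, 10]))      # two large jobs
```

With three large jobs the sketch returns the (4,3,3) split from the slide, and with two large jobs it returns (5,5), the starting state of the lightly loaded example.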
  • 8. Lightly Loaded
      Example: K=10
      Current state: 2 jobs in the Reduce queue, with server allocation (5,5)
      Suppose a new job joins the Reduce queue: (?,?,?)
      [Figure: Map queue feeding the Reduce queue with servers 1, …, K.]
  • 9. Heavily Loaded
      Example: K=10
      Current state: >>10 jobs in the Reduce queue, with server allocation (1,…,1)
      Suppose one job finishes and leaves the Reduce queue.
      How to break the tie?
      [Figure: Map queue feeding the Reduce queue with servers 1, …, K.]
  • 10. Study on Tie-Breaking Rules
      • Consider 3 rules for the Reduce queue:
          Policy 1: Choose the job with the smallest remaining Map service
          Policy 2: Choose one job uniformly at random
          Policy 3: Choose the job that started Map service first
      [Figure: timeline of the Map service of Jobs 1, 2, 3, with start times TS1, TS2, TS3 and end times TE1, TE2, TE3 around the current time; the jobs picked by Policy 1 and Policy 3 are marked.]
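The three tie-breaking rules can be written as one selection function. This is a minimal sketch: the `MapJob` record, its field names, and `choose_job` are hypothetical names introduced for illustration, and the sketch only picks a job, it does not simulate the queues.

```python
import random
from dataclasses import dataclass


@dataclass
class MapJob:
    name: str
    map_start: float      # time at which the job started Map service
    remaining_map: float  # Map service the job still needs


def choose_job(jobs, policy, rng=random):
    """Pick which waiting job gets the freed Reduce server."""
    if policy == 1:
        # Policy 1: smallest remaining Map service.
        return min(jobs, key=lambda job: job.remaining_map)
    if policy == 2:
        # Policy 2: uniform random choice.
        return rng.choice(jobs)
    if policy == 3:
        # Policy 3: the job that started Map service first.
        return min(jobs, key=lambda job: job.map_start)
    raise ValueError("policy must be 1, 2, or 3")


if __name__ == "__main__":
    jobs = [
        MapJob("Job 1", map_start=0.0, remaining_map=5.0),
        MapJob("Job 2", map_start=1.0, remaining_map=0.5),
        MapJob("Job 3", map_start=2.0, remaining_map=2.0),
    ]
    for policy in (1, 2, 3):
        print(policy, choose_job(jobs, policy).name)
```

In this made-up instance Policy 1 and Policy 3 disagree, because the job that started first is not the one closest to finishing its Map service.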
  • 11. Diffusion Approximation
      • Consider a sequence of MapReduce systems indexed by n:
          1. Primitive data: $(B^{(n)}, C^{(n)}, D^{(n)}, R^{(n)})$
          2. Reduce workload: $Z_i^{(n)} := \sum_{j=1}^{R_i^{(n)}} (C_i^{(n),j} + D_i^{(n),j})$
          3. Limits: $(E[B^{(n)}], \mathrm{Var}[B^{(n)}])$ and $(E[R^{(n)}], E[Z^{(n)}], \mathrm{Var}[Z^{(n)}])$ converge to finite limits as $n \to \infty$
          4. Arrival rate: $\lambda^{(n)} \to \lambda$
      • Heavy-traffic assumption: each of the following converges to a finite constant as $n \to \infty$:
          Map service: $(1 - \lambda^{(n)} E[B^{(n)}])\sqrt{n}$
          Reduce service: $(1 - \lambda^{(n)} E[(C^{(n)} + D^{(n)}) R^{(n)}] / K)\sqrt{n}$
  • 12. Diffusion Approximation (Cont.)
      • Queue length: $(Q_m^{(n)}, Q_r^{(n)})$
      • Diffusion limits: $\hat{Q}_m^{(n)}(t) := Q_m^{(n)}(nt)/\sqrt{n}$, $\hat{Q}_r^{(n)}(t) := Q_r^{(n)}(nt)/\sqrt{n}$
      Theorem 1 [4] (Map queue): $\hat{Q}_m^{(n)}(t) \Rightarrow \hat{Q}_m(t)$, the reflected Brownian motion (RBM) obtained by reflecting $\hat{Q}_m(0) + W_m^*(t)$ at zero, where $W_m^*(t)$ is a BM whose drift (1) and variance (2) are expressed through the limiting mean and variance of $B^{(n)}$; the variance also involves $\lambda$.
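To make the RBM in Theorem 1 concrete, the sketch below simulates a Brownian path with drift on a time grid and reflects it at zero by subtracting its running minimum (the Skorokhod reflection). The drift and variance passed in are arbitrary illustrative numbers, not the constants of the theorem, and the function names are hypothetical.

```python
import math
import random


def reflect_at_zero(path):
    """Skorokhod reflection: subtract the running minimum (capped at 0)."""
    reflected = []
    running_min = 0.0
    for x in path:
        running_min = min(running_min, x)
        reflected.append(x - running_min)
    return reflected


def simulate_rbm(drift, variance, horizon, steps, seed=0):
    """Discretized BM with the given drift and variance, reflected at zero."""
    rng = random.Random(seed)
    dt = horizon / steps
    x = 0.0
    path = [x]
    for _ in range(steps):
        # Euler step: deterministic drift plus a Gaussian increment.
        x += drift * dt + math.sqrt(variance * dt) * rng.gauss(0.0, 1.0)
        path.append(x)
    return reflect_at_zero(path)


if __name__ == "__main__":
    # Illustrative parameters only (negative drift keeps the path near zero).
    rbm = simulate_rbm(drift=-0.5, variance=1.0, horizon=10.0, steps=1000)
    print(min(rbm), max(rbm))
```

The reflected path never goes below zero, which is what lets it stand in for a scaled queue length.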
  • 13. Relaxed MapReduce & Lower Bound
      • Construct a new queue:
          1. Assume no dependence constraint between Reduce progress and Map progress, so a work-conserving Reduce queue becomes possible.
          2. A job is always given at least as many servers as in the original queue, for all t.
      ⇒ Jobs are completed no later than in the original queue.
      [Figure: Map queue feeding the Reduce queue with servers 1, …, K.]
  • 14. Lower Bound (Cont.)
      Theorem 2: There exists a sequence $\hat{Q}_r^{L,(n)}(t)$ such that $\hat{Q}_r^{L,(n)}(t) \le \hat{Q}_r^{(n)}(t)$ and
          (1) $\hat{Q}_r^{L,(n)}(t)$ converges to the RBM driven by $W_r^*(t)$;
          (2) $W_r^*$ is a BM with $W_r^*(0) = 0$, whose drift and variance are expressed through $K$, $\lambda$, and the limiting Reduce-side constants;
          (3) $\hat{Q}_r^{L,(n)}(t)$ is independent of $W_m^*(t)$.
      Is the lower bound achievable?
  • 15. Observation
      A1: The Map queue is M/G/1 with processor sharing.
      ⇒ Past departures are independent of $Q_m(t)$.
      ⇒ The queue is reversible, i.e. the departure process is also Poisson. [2]
  • 16. Observation (Cont.)
      A2: In heavy traffic, if the Reduce queue processes jobs in the same order as they depart from the Map queue
      ⇒ the Reduce queue will be very close to a FIFO multi-server queue with service time $\sum_{j=1}^{R_i} (C_i^j + D_i^j)$ for job i.
  • 17. Intuition for Policy 1
      Policy 1: Choose the job with the smallest remaining Map service
      From A1 & A2: arrival process of $Q_r(t)$ ~ departure process of $Q_m(t)$
      ⇒ The Reduce queue can be approximated by an RBM: $\hat{Q}_r^{(n)}(t)$ converges to the RBM driven by $W_r^*(t)$.
  • 18. Intuition for Policy 2
      Policy 2: Choose one job uniformly at random
      Diffusion approximation for the Map queue: $Q_m(nt_0) \approx \sqrt{n}\,E[\hat{Q}_m(t_0)] = q_m\sqrt{n} + o(\sqrt{n})$
      Suppose one vacancy appears in the Reduce queue at time $nt_0$.
      The chosen job has remaining Map workload $B^e$, where $P(B^e > x) = \int_x^\infty \frac{P(B > u)}{E[B]}\,du$ ($B^e$ is a distribution) [3]
      [Figure: $Q_m(t)$ versus time between $nt_0$ and $nt_0 + \Delta t$, at a level of order $n^{1/2}$.]
  • 19. Intuition for Policy 2 (Cont.)
      Let the remaining Map workload of that job be $b^e$.
      The remaining Map service time of that job ~ $b^e q_m \sqrt{n}$
      The Reduce queue will grow by ~ $\lambda b^e q_m \sqrt{n} / K$
      ⇒ $\hat{Q}_r(t_0^+) - \hat{Q}_r(t_0^-) = \lambda b^e q_m / K$, a jump in the diffusion limit
      Why does it matter? Little’s law!
      [Figure: $Q_r(t)$ versus time, rising by $\Delta Q_r(t)$ between $nt_0$ and $nt_0 + \Delta t$.]
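The tail of the remaining Map workload used in the Policy 2 intuition, $P(B^e > x) = \int_x^\infty P(B > u)/E[B]\,du$, can be evaluated numerically. The sketch below uses a midpoint rule truncated at a finite upper limit; the function name `equilibrium_tail`, the truncation point, and the two example distributions are assumptions chosen for illustration.

```python
import math


def equilibrium_tail(tail, mean, x, upper, steps=100_000):
    """Approximate P(B^e > x) = (1 / E[B]) * integral_x^upper P(B > u) du.

    tail(u) must return P(B > u); `upper` truncates the integral and should
    be at (or well beyond) the point where the tail is negligible.
    """
    if x >= upper:
        return 0.0
    h = (upper - x) / steps
    total = sum(tail(x + (k + 0.5) * h) for k in range(steps))  # midpoint rule
    return total * h / mean


if __name__ == "__main__":
    # Constant workload B = 2: the remaining workload is uniform on [0, 2].
    print(equilibrium_tail(lambda u: 1.0 if u < 2.0 else 0.0, 2.0, 0.5, 2.0))
    # Exponential workload with mean 1: B^e has the same tail as B.
    print(equilibrium_tail(lambda u: math.exp(-u), 1.0, 1.0, 50.0))
```

For a constant workload the remaining workload is spread evenly over the whole job size, so a randomly chosen job can still need a substantial amount of Map service.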
  • 20. Intuition for Policy 3
      Policy 3: Choose the job that started Map service first
      Depends on the workload distribution B.
      Special case: B is constant ⇒ Policy 1 = Policy 3
      [Figure: two plots of $P[B > x]$ versus x, each with a marked point x*; one is labelled “Like Policy 1”, the other “Like Policy 2”.]
  • 21. Achieve the Lower Bound
      Theorem 3.
          (1) Under policy 1, if B is bounded, then $\hat{Q}_r^{(n)}(t)$ converges to the RBM driven by $W_r^*(t)$.
          (2) Under policy 2, if B is bounded, then $\hat{Q}_r^{(n)}(t)$ converges to the reflected process driven by $W_r^{**}(t)$, which is modified from the RBM driven by $W_r^*(t)$ by adding jumps of random size whenever it hits zero.
  • 22. Achieve the Lower Bound (Cont.)
      Theorem 3.
          (3) Under policy 3, if B is bounded and has a decreasing hazard function, then $\hat{Q}_r^{(n)}(t)$ converges to the RBM driven by $W_r^*(t)$ (like Policy 1). If B has an increasing hazard function, then $\hat{Q}_r^{(n)}(t)$ converges to the reflected process driven by $W_r^{**}(t)$ (like Policy 2).
      Remark: hazard function (failure rate): $H(x) := \lim_{\Delta x \to 0} \frac{P(x \le B < x + \Delta x)}{\Delta x \, P(B \ge x)}$
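The hazard function in the remark can be approximated by replacing the limit with a small finite step. The sketch below does this for a Weibull tail, chosen only because its shape parameter switches between decreasing, constant, and increasing hazard; the Weibull family, the step size, and the function names are assumptions, and the Weibull is unbounded, so it illustrates the hazard definition rather than the boundedness condition of Theorem 3.

```python
import math


def hazard(tail, x, dx=1e-6):
    """Finite-difference hazard: P(x <= B < x + dx) / (dx * P(B >= x)).

    tail(u) returns P(B > u) for a continuous distribution.
    """
    return (tail(x) - tail(x + dx)) / (dx * tail(x))


def weibull_tail(shape, scale=1.0):
    """Tail P(B > x) = exp(-(x / scale) ** shape) of a Weibull workload."""
    return lambda x: math.exp(-((x / scale) ** shape))


if __name__ == "__main__":
    for shape in (0.5, 1.0, 2.0):
        tail = weibull_tail(shape)
        print(shape, [round(hazard(tail, x), 3) for x in (0.5, 1.0, 2.0)])
```

A shape below 1 gives a decreasing hazard, a shape of 1 (the exponential case) gives a constant hazard, and a shape above 1 gives an increasing hazard.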
  • 23. Conclusion
      • Non-work-conserving effects may occur in the MapReduce system under heavy traffic.
      • Under the heavy-traffic assumption, we obtain a lower bound using diffusion approximation.
      • The tie-breaking rule should be carefully designed to avoid possible jumps in the queue length.
  • 24. References
      • [1] J. Tan et al., “Non-work-conserving Effects in MapReduce: Diffusion Limit and Criticality,” in Proc. SIGMETRICS, 2014.
      • [2] F. P. Kelly, Reversibility and Stochastic Networks. John Wiley & Sons, 1979.
      • [3] H. C. Gromoll, “Diffusion approximation for a processor sharing queue in heavy traffic,” Annals of Applied Probability, 14:555–611, 2004.
      • [4] A. Lambert, F. Simatos, and B. Zwart, “Scaling limits via excursion theory: Interplay between Crump-Mode-Jagers branching processes and processor sharing queues,” The Annals of Applied Probability, 23:2161–2603, 2013.