"HFSP: Size-based Scheduling for Hadoop" presentation for BigData 2014

HFSP: Size-based Scheduling for Hadoop
Mario Pastorelli∗ Antonio Barbuzzi∗ Matteo Dell’Amico∗
Damiano Carra† Pietro Michiardi∗
∗EURECOM, France
†University of Verona, Italy
IEEE BigData 2013

Why a new scheduler?
Focus on short system response times
heterogeneous workloads [VLDB12,VLDB13,SOCC13]
big differences in job sizes
data exploration, preliminary analyses, algorithm tuning, orchestration jobs, ...
Current schedulers need manual setup
fine-tuning of the scheduler parameters
configuration of pools of jobs
complex, error-prone, and difficult to adapt to workload/cluster changes

Size-based schedulers
Size-based schedulers are more efficient than other schedulers
job priority based on the job size
focus resources on a few jobs instead of splitting them among many jobs
... but the job size is required
MapReduce is suitable for size-based scheduling
we don’t have the job size but we have the time to estimate it
no perfect estimation is required ...
... as long as jobs of very different sizes are sorted correctly

Size-based schedulers: example
Job     Arrival time    Size
job1    0s              30s
job2    10s             10s
job3    15s             10s
Scheduler            Avg. sojourn time
Processor Sharing    35s
SRPT                 25s
[Figure: job timelines under Processor Sharing and SRPT]
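The two averages can be checked with a small single-server simulation. This is a minimal sketch rather than anything taken from HFSP: the names `simulate`, `ps_shares` and `srpt_shares` are made up for illustration, and it assumes one server of unit capacity, ideal processor sharing, and free preemption.

```python
def simulate(jobs, shares):
    """Simulate one server of unit capacity.

    jobs   -- list of (arrival_time, size) tuples, in seconds
    shares -- function mapping {job_id: remaining_work} to {job_id: service_rate}
    Returns the sojourn time of each job, in the order given.
    """
    pending = sorted(range(len(jobs)), key=lambda i: jobs[i][0])
    remaining = {}                      # active job -> remaining work
    finish = [None] * len(jobs)
    now = 0.0
    while pending or remaining:
        if not remaining:               # idle server: jump to the next arrival
            now = max(now, jobs[pending[0]][0])
        while pending and jobs[pending[0]][0] <= now:
            j = pending.pop(0)          # admit every job that has arrived
            remaining[j] = float(jobs[j][1])
        rates = shares(remaining)
        # Run until the first completion or the next arrival, whichever is first.
        step = min(remaining[j] / r for j, r in rates.items() if r > 0)
        if pending:
            step = min(step, jobs[pending[0]][0] - now)
        for j, r in rates.items():
            remaining[j] -= r * step
        now += step
        for j in [j for j, rem in remaining.items() if rem <= 1e-9]:
            finish[j] = now
            del remaining[j]
    return [finish[i] - jobs[i][0] for i in range(len(jobs))]


def ps_shares(remaining):
    """Processor Sharing: every active job gets an equal share."""
    return {j: 1.0 / len(remaining) for j in remaining}


def srpt_shares(remaining):
    """SRPT: the job with the least remaining work gets the whole server."""
    shortest = min(remaining, key=lambda j: (remaining[j], j))
    return {j: 1.0 if j == shortest else 0.0 for j in remaining}


JOBS = [(0, 30), (10, 10), (15, 10)]    # (arrival, size) for job1, job2, job3

ps = simulate(JOBS, ps_shares)
srpt = simulate(JOBS, srpt_shares)
print(f"Processor Sharing: {sum(ps) / len(ps):.1f}s")
print(f"SRPT:              {sum(srpt) / len(srpt):.1f}s")
```

Under these assumptions job1 finishes at 50s with both policies. The gain from SRPT comes entirely from job2 and job3, whose sojourn times drop from 27.5s each to 10s and 15s.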

Hadoop Fair Sojourn Protocol
Like SRPT, HFSP aims for efficiency, but unlike SRPT it avoids starvation
How: Shortest Remaining Virtual Time first (SRVT)
Each job has a virtual size based on the real one
Virtual size decreases with time
Jobs are scheduled by ascending virtual size

Hadoop Fair Sojourn Protocol: challenges
Job size estimation
Virtual size and aging
Task scheduling policy

Job size estimation (1/2)
Two ways to estimate a job size:
Offline: based on the information available a priori (number of tasks, block size, past history, ...):
available since job submission
not very precise
Online: based on the performance of a subset of tasks:
needs time for training
more precise
We need both:
Offline estimation for the initial size, because jobs need a size from the moment they are submitted
Online estimation because it is more precise: when it completes, the job size is updated

Job size estimation (2/2)
Implementation details:
Online estimation is done while the job progresses, so no work is wasted
Estimation technique: first-order statistics are good enough
The Map and Reduce phases of a job are treated as independent
Further details in the paper ...
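A minimal sketch of how the two estimates could be combined for one phase of a job. It is illustrative only and not the HFSP code: the class name `PhaseSizeEstimator`, the `prior_task_seconds` input, and the `sample_size` parameter are assumptions, and "first-order statistics" is read here as the mean duration of the sampled tasks.

```python
from statistics import mean


class PhaseSizeEstimator:
    """Size estimate for one phase (Map or Reduce) of a job, in slot-seconds."""

    def __init__(self, num_tasks, prior_task_seconds, sample_size=5):
        self.num_tasks = num_tasks
        self.prior_task_seconds = prior_task_seconds  # offline guess (assumed input)
        self.sample_size = sample_size                # training-set size (assumed)
        self.samples = []                             # durations of finished sample tasks

    def record(self, task_seconds):
        """Record the duration of a completed sample task."""
        if len(self.samples) < self.sample_size:
            self.samples.append(task_seconds)

    @property
    def trained(self):
        """True once enough sample tasks have finished."""
        return len(self.samples) >= min(self.sample_size, self.num_tasks)

    def size(self):
        """Offline estimate until training completes, online estimate afterwards."""
        if self.num_tasks == 0:
            return 0.0
        per_task = mean(self.samples) if self.trained else self.prior_task_seconds
        return per_task * self.num_tasks


# Hypothetical job: 100 map tasks, a priori guess of 30s per task.
maps = PhaseSizeEstimator(num_tasks=100, prior_task_seconds=30, sample_size=3)
initial = maps.size()            # offline estimate, available at submission
for seconds in (10, 20, 30):     # the sample tasks are real tasks of the job
    maps.record(seconds)
updated = maps.size()            # online estimate replaces the initial one
```

Because the phases are treated as independent, a job would hold one such estimator for Map and another for Reduce. The sample tasks are ordinary tasks of the job, so their output is kept.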

Virtual size and aging
Like SRPT, HFSP aims for efficiency, but unlike SRPT it avoids starvation
How:
Each job has a “virtual” size
A “virtual” Fair Scheduler lets each job make virtual progress
We use virtual job sizes to make scheduling decisions in the real cluster
→ Priority to small jobs
→ Every job eventually gets small, hence no starvation
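The aging idea can be sketched as a simulated fair-share cluster that only exists on paper. This is a simplified illustration under stated assumptions, not the HFSP implementation: the functions `age` and `priority_order` are hypothetical, sizes are in slot-seconds, and every active job is assumed to be able to use any share of the virtual slots.

```python
def age(virtual, elapsed, slots):
    """Advance the virtual fair-share cluster by `elapsed` seconds.

    virtual -- dict: job name -> remaining virtual size (slot-seconds),
               updated in place
    slots   -- number of slots in the virtual cluster
    """
    while elapsed > 0:
        active = [j for j, v in virtual.items() if v > 0]
        if not active:
            break
        rate = slots / len(active)          # equal share for each active job
        # Stop at the first virtual completion so shares can be recomputed.
        step = min(elapsed, min(virtual[j] for j in active) / rate)
        for j in active:
            left = virtual[j] - rate * step
            virtual[j] = left if left > 1e-9 else 0.0
        elapsed -= step


def priority_order(virtual):
    """Jobs sorted by ascending virtual size (ties broken by name)."""
    return sorted(virtual, key=lambda j: (virtual[j], j))


# Hypothetical jobs with estimated sizes of 100 and 40 slot-seconds.
virtual = {"a": 100.0, "b": 40.0}
age(virtual, elapsed=10, slots=2)
order_after_10s = priority_order(virtual)   # "b" is ahead: it is smaller
snapshot_after_10s = dict(virtual)
age(virtual, elapsed=100, slots=2)          # both reach virtual size 0
```

Real resources go to the job at the head of `priority_order`, while all jobs keep aging in the virtual cluster. A large job that is not running still shrinks virtually, which is why it cannot wait forever behind a stream of small jobs.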

Task scheduling policy
When a task slot becomes free:
Schedule a task for online estimation, if any
Otherwise, schedule a task from the highest-priority job
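The rule above can be sketched as a single function called whenever a slot frees up. This is an illustration, not the HFSP code: the `Job` fields and `next_task` are hypothetical names, "highest priority" is taken to mean smallest virtual size, and serving training tasks in list order is an assumption the slides do not specify.

```python
from dataclasses import dataclass, field


@dataclass
class Job:
    name: str
    virtual_size: float
    training: list = field(default_factory=list)  # pending tasks for online estimation
    regular: list = field(default_factory=list)   # all other pending tasks


def next_task(jobs):
    """Pick the (job name, task) to run on a slot that just became free."""
    # 1. Tasks needed for online size estimation go first.
    for job in jobs:
        if job.training:
            return job.name, job.training.pop(0)
    # 2. Otherwise serve the job with the smallest virtual size.
    for job in sorted(jobs, key=lambda j: j.virtual_size):
        if job.regular:
            return job.name, job.regular.pop(0)
    return None  # nothing left to schedule


# Hypothetical state: a large job still in training and a small trained job.
jobs = [
    Job("big", virtual_size=500.0, training=["t0"], regular=["t1", "t2"]),
    Job("small", virtual_size=20.0, regular=["s0"]),
]
picks = [next_task(jobs) for _ in range(4)]
```

In this example the large job's training task runs first, then the small job takes over, and the large job's remaining tasks run only once the small job has nothing left to schedule.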

Experimental Setup
Task Trackers            36
CPUs per Task Tracker    4
RAM per Task Tracker     8 GB
Map slots                72
Reduce slots             36
Network speed            1 Gbps
Using PigMix jobs
Two kinds of workloads, inspired by existing traces
Dataset size    Map tasks    SMALL workload    LARGE workload
1 GB            < 5          65%               0%
10 GB           10–50        20%               10%
40 GB           50–150       10%               60%
100 GB          > 150        5%                30%

Results
SMALL workload
[Figure: ECDF of sojourn time (s), log scale from 10^1 to 10^3, HFSP vs. FAIR]
Same performance for tiny jobs
Large difference for other jobs
Mean sojourn time decreased by 16% using HFSP
LARGE workload
[Figure: ECDF of sojourn time (s), log scale from 10^1 to 10^4, HFSP vs. FAIR]
Jobs completed within 100 seconds: FAIR 2%, HFSP 30%
Jobs completed within 1000 seconds: FAIR 15%, HFSP 90%

Experiments: task times and estimation errors
Task times are skewed
10% of the Reducers are much longer than other tasks
[Figure: ECDF of task time, log scale from 10^0 to 10^4, MAP vs. REDUCE]
error = estimated size / real size
[Figure: ECDF of estimation error, log scale from 0.25 to 4, MAP vs. REDUCE]
∼60% of jobs are overestimated
impact of the overestimation is mitigated by the aging function
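A short sketch of the error metric as a ratio, with made-up numbers that are not taken from the experiments. A ratio above 1 means the size was overestimated. Taking the base-2 logarithm shows why the error axis runs from 0.25 to 4: an estimate 4 times too large and one 4 times too small sit at the same distance from a perfect estimate.

```python
import math


def estimation_error(estimated, real):
    """Ratio of estimated to real size; 1.0 is a perfect estimate."""
    return estimated / real


# Made-up (estimated, real) sizes in seconds, for illustration only.
pairs = [(120.0, 100.0), (50.0, 100.0), (400.0, 100.0), (100.0, 100.0)]

errors = [estimation_error(est, real) for est, real in pairs]
overestimated = sum(1 for e in errors if e > 1) / len(errors)
log_errors = [math.log2(e) for e in errors]   # symmetric around 0
```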

Conclusions
HFSP strives for efficiency and avoids starvation
Particularly suitable for loaded clusters
Requires no manual, per-job priorities
→ heterogeneous workloads can coexist in the same cluster
HFSP was developed within the BigFoot project
Available at: https://github.com/bigfootproject/HFSP

Thank you!
@mariopastorelli @BigFoot project