HFSP: Size-based Scheduling for Hadoop
Mario Pastorelli∗ Antonio Barbuzzi∗ Matteo Dell’Amico∗
Damiano Carra† Pietro Michiardi∗
∗EURECOM, France
†University of Verona, Italy
IEEE BigData 2013
Mario Pastorelli et al. (EURECOM) HFSP: Size-based Scheduling for Hadoop IEEE BigData 2013 1 / 15
Why a new scheduler?
Focus on short system response times
heterogeneous workloads [VLDB12,VLDB13,SOCC13]
big differences in job sizes
data exploration, preliminary analyses, algorithm tuning, orchestration jobs, ...
Current schedulers need manual setup
fine-tuning of the scheduler parameters
configuration of pools of jobs
complex, error-prone, and difficult to adapt to workload/cluster changes
Size-based schedulers
Size-based schedulers are more efficient than other schedulers
job priority based on the job size
focus resources on a few jobs instead of splitting them among many jobs
... but the job size is required
MapReduce is suitable for size-based scheduling
we don’t have the job size, but we have the time to estimate it
no perfect estimation is required ...
... as long as jobs of very different sizes are sorted correctly
Size-based schedulers: example
Job     Arrival Time    Size
job1    0s              30s
job2    10s             10s
job3    15s             10s

Scheduler          AVG sojourn time
Processor Share    35s
SRPT               25s

[Figure: job execution timelines under Processor Share and SRPT]
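The two averages in the table can be reproduced with a small simulation. The sketch below assumes a single server of unit speed (one second of work per second), which matches the numbers on the slide. The function name `simulate` and the constant `JOBS` are illustrative and not part of HFSP.

```python
from typing import Dict, List, Tuple

# (name, arrival time in seconds, size in seconds of work), from the table above
JOBS: List[Tuple[str, float, float]] = [
    ("job1", 0, 30),
    ("job2", 10, 10),
    ("job3", 15, 10),
]


def simulate(jobs: List[Tuple[str, float, float]], policy: str) -> float:
    """Return the mean sojourn time on a single unit-speed server.

    policy "PS" shares the server equally among active jobs;
    policy "SRPT" gives the whole server to the job with the least remaining work.
    """
    pending = sorted(jobs, key=lambda j: j[1])
    arrival = {name: a for name, a, _ in jobs}
    remaining: Dict[str, float] = {}
    finish: Dict[str, float] = {}
    now = 0.0
    while pending or remaining:
        if not remaining:
            now = max(now, pending[0][1])  # server idle until the next arrival
        while pending and pending[0][1] <= now:
            name, _, size = pending.pop(0)
            remaining[name] = float(size)
        if policy == "PS":
            rates = {n: 1.0 / len(remaining) for n in remaining}
        else:
            shortest = min(remaining, key=remaining.get)
            rates = {n: (1.0 if n == shortest else 0.0) for n in remaining}
        # Advance to the next event: a completion or an arrival.
        step = min(remaining[n] / r for n, r in rates.items() if r > 0)
        if pending:
            step = min(step, pending[0][1] - now)
        for n, r in rates.items():
            remaining[n] -= r * step
        now += step
        for n in [n for n, left in remaining.items() if left <= 1e-9]:
            finish[n] = now
            del remaining[n]
    return sum(finish[n] - arrival[n] for n in finish) / len(finish)


ps_mean = simulate(JOBS, "PS")      # about 35 seconds
srpt_mean = simulate(JOBS, "SRPT")  # about 25 seconds
```

Under Processor Share the three jobs finish at 50s, 37.5s and 42.5s; under SRPT they finish at 50s, 20s and 30s. The large job finishes at the same time in both cases, while the two small jobs finish much earlier under SRPT.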
Hadoop Fair Sojourn Protocol
Like SRPT, HFSP aims to be efficient, but unlike SRPT it avoids starvation
How: Shortest Remaining Virtual Time first (SRVT)
Each job has a virtual size based on the real one
Virtual size decreases with time
Jobs are scheduled by ascending virtual size
Hadoop Fair Sojourn Protocol: challenges
Job size estimation
Virtual size and aging
Task scheduling policy
Job size estimation (1/2)
Two ways to estimate a job size:
Offline: based on the information available a priori (number of tasks, block size, past history, ...):
available since job submission
not very precise
Online: based on the performance of a subset of tasks:
needs time for training
more precise
We need both:
Offline estimation for the initial size, because jobs need a size from the moment they are submitted
Online estimation because it is more precise: when it is completed, the job size is updated
Job size estimation (2/2)
Implementation details:
Online estimation is done while the job progresses, so no work is wasted
Estimation technique: first-order statistics are good enough
The Map and Reduce phases of a job are treated as independent
Further details in the paper . . .
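The two-stage estimation can be sketched as follows. This is a minimal illustration, not the estimator from the paper: it assumes the "first-order statistic" is the sample mean of task durations, and the class name `PhaseSizeEstimator` and the parameters `assumed_task_seconds` and `sample_size` are hypothetical. One estimator instance would be kept per phase, since Map and Reduce are treated as independent.

```python
from statistics import mean
from typing import List


class PhaseSizeEstimator:
    """Size estimate for one phase (Map or Reduce) of a job, in seconds of work."""

    def __init__(self, num_tasks: int, assumed_task_seconds: float, sample_size: int):
        self.num_tasks = num_tasks
        self.sample_size = sample_size  # hypothetical: number of training tasks
        self.samples: List[float] = []
        # Offline estimate: available at submission time, not very precise.
        self.size = num_tasks * assumed_task_seconds
        self.online_done = False

    def record_task(self, duration: float) -> None:
        """Feed the duration of a finished training task into the estimator."""
        if self.online_done:
            return
        self.samples.append(duration)
        if len(self.samples) >= min(self.sample_size, self.num_tasks):
            # Online estimate: replaces the offline one once training completes.
            self.size = self.num_tasks * mean(self.samples)
            self.online_done = True


map_phase = PhaseSizeEstimator(num_tasks=100, assumed_task_seconds=10.0, sample_size=3)
initial_size = map_phase.size  # offline estimate: 100 tasks * 10 s
for seconds in (20.0, 30.0, 40.0):
    map_phase.record_task(seconds)
updated_size = map_phase.size  # online estimate: 100 tasks * mean of the samples
```

The training tasks are ordinary tasks of the job, so their output counts toward the job's progress, which is how the online stage avoids wasting work.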
Virtual size and aging
Like SRPT, HFSP aims to be efficient, but unlike SRPT it avoids starvation
How:
Each job has a “virtual” size
A “virtual” Fair Scheduler lets each job make virtual progress
We use virtual job sizes to make scheduling decisions in the real cluster
→ Priority to small jobs
→ Every job eventually gets small, hence no starvation
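The aging idea can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the HFSP implementation: the virtual cluster's slots are split evenly among jobs that still have virtual work left, and leftover share from a job that reaches zero mid-step is not redistributed. The function names `age` and `priority_order` are hypothetical.

```python
from typing import Dict, List


def age(virtual_size: Dict[str, float], elapsed: float, slots: int) -> None:
    """Advance the virtual fair scheduler by `elapsed` seconds, in place."""
    active = [job for job, size in virtual_size.items() if size > 0]
    if not active:
        return
    share = slots * elapsed / len(active)  # equal virtual progress per active job
    for job in active:
        virtual_size[job] = max(0.0, virtual_size[job] - share)


def priority_order(virtual_size: Dict[str, float]) -> List[str]:
    """Jobs sorted by ascending virtual size: the first one has the highest priority."""
    return sorted(virtual_size, key=virtual_size.get)


sizes = {"big": 100.0, "small": 10.0}
age(sizes, elapsed=5.0, slots=2)  # each job makes 5 seconds of virtual progress
order = priority_order(sizes)     # the small job comes first
```

Because virtual progress does not depend on what actually runs in the real cluster, the virtual size of a large job keeps shrinking even while it waits, so it eventually moves to the front of the order.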
Task scheduling policy
When a task slot becomes free:
Schedule a task for online estimation, if any
otherwise, schedule a task from the highest priority job
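The two-step policy above can be sketched as a single function. The `Job` data class, its fields, and the task-name strings are hypothetical, introduced only for illustration. "Highest priority" is taken to mean the smallest virtual size, as stated on the Hadoop Fair Sojourn Protocol slide.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Job:
    name: str
    virtual_size: float
    training_tasks: List[str] = field(default_factory=list)  # tasks used for online estimation
    regular_tasks: List[str] = field(default_factory=list)


def next_task(jobs: List[Job]) -> Optional[str]:
    """Pick the task to run when a task slot becomes free."""
    # Step 1: a task needed for online size estimation, if any job has one.
    for job in jobs:
        if job.training_tasks:
            return job.training_tasks.pop(0)
    # Step 2: otherwise, a task from the job with the smallest virtual size.
    for job in sorted(jobs, key=lambda j: j.virtual_size):
        if job.regular_tasks:
            return job.regular_tasks.pop(0)
    return None  # nothing left to schedule


jobs = [
    Job("a", virtual_size=100.0, training_tasks=["a-train-1"], regular_tasks=["a-task-1"]),
    Job("b", virtual_size=10.0, regular_tasks=["b-task-1"]),
]
picked = [next_task(jobs) for _ in range(4)]
```

In this example the training task of job `a` is served first even though `a` is the larger job; after that, job `b` runs before the remaining task of job `a`.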
Experimental Setup
Task Trackers: 36
CPUs per Task Tracker: 4
RAM per Task Tracker: 8 GB
Map slots: 72
Reduce slots: 36
Network speed: 1 Gbps
Using PigMix jobs
Two kinds of workloads, inspired by existing traces

Dataset size    Map tasks    SMALL workload    LARGE workload
1 GB            < 5          65%               0%
10 GB           10-50        20%               10%
40 GB           50-150       10%               60%
100 GB          > 150        5%                30%
Results
SMALL workload
[Figure: ECDF of sojourn time (s), logarithmic time axis, HFSP vs. FAIR]
Same performance for tiny jobs
Large difference for other jobs
Mean sojourn time decreased by 16% using HFSP
LARGE workload
[Figure: ECDF of sojourn time (s), logarithmic time axis, HFSP vs. FAIR]
Jobs completed after 100 seconds: Fair: 2% of jobs, HFSP: 30% of jobs
Jobs completed after 1000 seconds: Fair: 15% of jobs, HFSP: 90% of jobs
Experiments: task times and estimation errors
Task times are skewed
10% of the Reducers are much longer than other tasks
[Figure: ECDF of task time, logarithmic time axis, MAP vs. REDUCE]
[Figure: ECDF of estimation error, from 0.25 to 4, MAP vs. REDUCE]
error = est. size / real size
∼60% of jobs are overestimated
impact of the overestimation is mitigated by the aging function
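The error metric is a plain ratio, so a value above 1 means the size was overestimated and a value below 1 means it was underestimated. The sketch below computes it for a handful of jobs; the sizes are invented purely for illustration and are not measurements from the experiments.

```python
from typing import List


def estimation_errors(estimated: List[float], real: List[float]) -> List[float]:
    """error = estimated size / real size, one value per job."""
    return [e / r for e, r in zip(estimated, real)]


def fraction_overestimated(errors: List[float]) -> float:
    """Share of jobs whose estimated size exceeds the real size."""
    return sum(1 for err in errors if err > 1) / len(errors)


# Invented example values, in seconds of work.
estimated_sizes = [20.0, 5.0, 30.0, 8.0, 12.0]
real_sizes = [10.0, 10.0, 20.0, 8.0, 6.0]

errors = estimation_errors(estimated_sizes, real_sizes)
overestimated = fraction_overestimated(errors)
```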
Conclusions
HFSP strives for efficiency and avoids starvation
Particularly suitable for loaded clusters
Requires no manual, per-job priorities
→ heterogeneous workloads can coexist in the same cluster
HFSP developed within the BigFoot project
Available at: https://github.com/bigfootproject/HFSP
Thank you!
@mariopastorelli @BigFoot project