Hadoop MapReduce
Scheduling Algorithms
Presented By,
Leila Panahi
Fatemeh Sheykh Mohammadi
Under the Guidance of
Dr. Leila Safari
December 2016
Agenda
• Introduction
• Algorithms
• Evaluation
• Summary
2/30
Introduction
• Job scheduling in multi-user environments is a challenge in MapReduce.
• Each node is a physical machine with computational and storage capabilities.
• Hadoop gives each node a number of slots to cap the maximum number of tasks that can execute concurrently on that node.
• Each slot can execute only one task at a time.
• There are two types of slot: map slots and reduce slots.
3/30
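The slot model from the Introduction slide can be sketched in a few lines of Python. This is only an illustration: the `Node` class, its field names, and the slot counts are assumptions made for the example, not Hadoop's actual data structures.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A worker node with a fixed number of map and reduce slots (hypothetical model)."""
    name: str
    map_slots: int
    reduce_slots: int
    running: dict = field(default_factory=lambda: {"map": [], "reduce": []})

    def free_slots(self, kind: str) -> int:
        """Number of unoccupied slots of the given kind ("map" or "reduce")."""
        capacity = self.map_slots if kind == "map" else self.reduce_slots
        return capacity - len(self.running[kind])

    def launch(self, kind: str, task_id: str) -> bool:
        """Occupy one slot of the given kind; refuse if none is free."""
        if self.free_slots(kind) <= 0:
            return False
        self.running[kind].append(task_id)
        return True


node = Node("worker-1", map_slots=2, reduce_slots=1)
# Three map tasks compete for two map slots: the third launch is refused.
results = [node.launch("map", f"m{i}") for i in range(3)]
print(results)
```

Map and reduce slots are counted separately here, so a full set of map slots does not stop a reduce task from starting on the same node.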
Scheduling Algorithms
• Locality-Aware
• Replica-Aware (Maestro)
• Center-of-Gravity (CoGRS)
• Context-Aware (CASH)
• LATE
• SAMR
• ESAMR
• LA
• HAT
• FIFO
• LSCHED
• RAS
• SARS
• COSHH
• Load-Driven
• Job Aware
4/30
Quality Metrics for MapReduce scheduling algorithms
• Fairness
• Throughput
• Response Time
• Availability
• Energy Efficiency
• Resource Utilization
• Scalability
• Overheads
5/30
FIFO
• Default
• First in, first out
• The main objective: schedule jobs based on their priorities, in first-come, first-served order
• Limitations:
  - poor response times for short jobs compared to large jobs
  - low performance when running multiple types of jobs; it gives good results only for a single type of job
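A minimal sketch of priority-then-arrival ordering, as described on the FIFO slide. The `FifoQueue` class and the numeric priority table are assumptions for illustration, not Hadoop's scheduler code.

```python
import heapq
from itertools import count

# Illustrative priority levels (an assumption): a lower number is served first.
PRIORITY = {"VERY_HIGH": 0, "HIGH": 1, "NORMAL": 2, "LOW": 3, "VERY_LOW": 4}


class FifoQueue:
    """Jobs are ordered by priority, then by submission order."""

    def __init__(self):
        self._heap = []
        self._arrival = count()  # tie-breaker: earlier submissions come first

    def submit(self, job_name, priority="NORMAL"):
        heapq.heappush(self._heap, (PRIORITY[priority], next(self._arrival), job_name))

    def next_job(self):
        """Return the next job to run, or None if the queue is empty."""
        return heapq.heappop(self._heap)[2] if self._heap else None


q = FifoQueue()
q.submit("long-etl")
q.submit("short-query")
q.submit("urgent-report", priority="HIGH")
order = [q.next_job() for _ in range(3)]
print(order)
```

The short query waits behind the long job submitted before it, which is the response-time limitation the slide mentions.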
6/30
Fair Scheduling
• All jobs get, on average, an equal share of resources over time.
• The objective: distribute compute resources equally among the users/jobs in the system.
• Covers some limitations of FIFO: it works well in both small and large clusters and is less complex.
• Disadvantage: it does not consider the job weight of each node.
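The equal-share idea can be sketched as follows: whenever a slot frees up, it goes to the job furthest below its equal share. The function names and the slot counts are hypothetical, and real fair schedulers add pools, weights, and minimum shares that this sketch leaves out.

```python
def pick_job(running_tasks, total_slots):
    """Return the job that is furthest below its equal share of slots.

    running_tasks maps job name -> number of slots the job currently holds.
    """
    fair_share = total_slots / len(running_tasks)
    # Deficit = how many slots a job is still owed; the largest deficit goes first.
    return max(running_tasks, key=lambda job: fair_share - running_tasks[job])


def simulate(jobs, total_slots):
    """Hand out every slot one at a time using pick_job."""
    running = {job: 0 for job in jobs}
    for _ in range(total_slots):
        running[pick_job(running, total_slots)] += 1
    return running


allocation = simulate(["A", "B", "C"], total_slots=9)
print(allocation)
```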
7/30
Capacity scheduler
• Similar to fair scheduling, but uses queues instead of pools
• Queues and sub-queues
• Capacity guarantee with elasticity
• ACLs for security
• Runtime changes/draining apps
• Resource-based scheduling
8/30
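Speculative Execution
• Identify slow tasks
• The task progress score in Hadoop:
  $PS = \dfrac{M}{N}$ for map tasks
  $PS = \dfrac{1}{3} \times \left(k + \dfrac{M}{N}\right)$ for reduce tasks
  where $M$ is the number of key/value pairs processed so far, $N$ is the total number of key/value pairs, and $k$ is the number of reduce stages already finished
• The average progress score over $K$ tasks: $PS_{avg} = \dfrac{\sum_{i=1}^{K} PS[i]}{K}$
• A task $T_i$ needs a backup when $PS[i] < PS_{avg} - 20\%$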
9/30
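The progress-score rule from the speculative execution slide can be tried out with a short sketch. It assumes the reduce score is (k + M/N) / 3 and treats the 20% margin as an absolute difference of 0.20 in score; the function names and sample numbers are made up for the example.

```python
def progress_score(processed, total, is_reduce=False, finished_stages=0):
    """PS = M/N for map tasks, (k + M/N) / 3 for reduce tasks."""
    fraction = processed / total
    return (finished_stages + fraction) / 3 if is_reduce else fraction


def tasks_needing_backup(scores, margin=0.20):
    """Return ids of tasks whose score is below the average minus the margin."""
    average = sum(scores.values()) / len(scores)
    return [task for task, ps in scores.items() if ps < average - margin]


scores = {
    "map-1": progress_score(90, 100),
    "map-2": progress_score(80, 100),
    "map-3": progress_score(20, 100),
}
# Average is about 0.63, so the cut-off is about 0.43 and only map-3 falls below it.
slow = tasks_needing_backup(scores)
print(slow)
```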
Longest Approximate Time to End (LATE)
• A scheduler that robustly improves performance by reducing the overhead of speculative execution tasks
• Designed for heterogeneous environments
• Finds the truly slow tasks by computing the remaining time of all tasks
• Ranks tasks by estimated time remaining and starts a copy of the highest-ranked task whose progress rate is lower than the Slow Task Threshold
• Progress rate: $PR = \dfrac{PS}{T_r}$, where $T_r$ is the time the task has been running
• Time to end: $TTE = \dfrac{1 - PS}{PR}$
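A small sketch of the ranking step, using the two formulas above. As a simplification it takes the Slow Task Threshold as an absolute progress rate passed in by the caller; the function name, the task tuples, and the numbers are assumptions for illustration.

```python
def late_candidate(tasks, now, slow_task_threshold):
    """Pick the task to speculate: the longest time-to-end among slow tasks.

    tasks maps task id -> (progress_score, start_time).
    slow_task_threshold is a progress rate; only tasks below it qualify.
    """
    ranked = []
    for task_id, (ps, start) in tasks.items():
        running_time = now - start
        if running_time <= 0 or ps <= 0:
            continue  # no usable estimate yet
        rate = ps / running_time        # PR = PS / Tr
        time_to_end = (1 - ps) / rate   # TTE = (1 - PS) / PR
        if rate < slow_task_threshold:
            ranked.append((time_to_end, task_id))
    return max(ranked)[1] if ranked else None


tasks = {
    "t1": (0.9, 0.0),   # fast: progress rate 0.009, above the threshold
    "t2": (0.3, 0.0),   # slow: progress rate 0.003, time to end about 233
    "t3": (0.2, 50.0),  # started later: progress rate 0.004, time to end 200
}
choice = late_candidate(tasks, now=100.0, slow_task_threshold=0.005)
print(choice)
```

Task t3 has the lowest progress score, yet t2 is chosen because its estimated time to end is longer; ranking by remaining time rather than by raw progress is the point of LATE.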
10/30
Longest Approximate Time to End (LATE)
The advantage:
robustness to node heterogeneity, since only some of the slowest tasks are speculatively restarted.
This method does not break the synchronization between the map and reduce phases; it only takes action on the appropriate slow tasks.
11/30
Self-Adaptive MapReduce (SAMR)
• Uses historical information about nodes and jobs
• Aims to save execution time and system resources
• Node classification:
  - Fast: finishes a task in a shorter time
  - Slow: finishes a task in a longer time
12/30
Self-Adaptive MapReduce (SAMR)
SAMR decreases execution time by up to 25% compared with Hadoop's scheduler and by up to 14% compared with the LATE scheduler.
13/30
Enhanced Self-Adaptive MapReduce (ESAMR)
• SAMR does not consider that the size of datasets and the type of jobs may lead to different weights for the map and reduce stages.
• ESAMR classifies the historical information stored on every node into k clusters using a machine learning technique.
• If a running job has completed some map tasks on a node, ESAMR calculates a temporary map phase weight (M1) on that node from the job's map tasks completed there.
14/30
Enhanced Self-Adaptive MapReduce (ESAMR)
• The temporary M1 weight is used to find the cluster whose M1 weight is the closest.
• ESAMR uses that cluster's stage weights to estimate the TimeToEnd of the job's map tasks on the node and to identify slow tasks that need to be re-executed.
• Reduce phase: similar procedure.
• After a job has finished, ESAMR calculates the job's stage weights on every node and saves these new weights as part of the historical information.
• It then applies k-means to re-classify the historical information stored on every worker node into k clusters and saves the updated average stage weights for each of the k clusters.
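The cluster lookup step can be sketched as a nearest-value search. The dictionary layout, the `M2` key, and the sample weights below are hypothetical; only the idea of matching on the closest M1 weight comes from the slides.

```python
def nearest_cluster(temp_m1, clusters):
    """Return the cluster whose stored M1 weight is closest to temp_m1."""
    return min(clusters, key=lambda c: abs(c["M1"] - temp_m1))


# Hypothetical average stage weights for k = 3 clusters on one node.
clusters = [
    {"M1": 0.55, "M2": 0.45},
    {"M1": 0.70, "M2": 0.30},
    {"M1": 0.90, "M2": 0.10},
]

# Temporary M1 weight measured from the map tasks the job has finished so far.
best = nearest_cluster(0.74, clusters)
print(best)
```

The stage weights of the returned cluster would then feed the TimeToEnd estimate for the job's remaining map tasks on that node.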
15/30
Delay Scheduling
• Addresses the conflict between locality and fairness
• When a node requests a task:
  - if the head-of-line job cannot launch a local task, skip it and look at subsequent jobs
  - if a job has been skipped long enough, start allowing it to launch non-local tasks, to avoid starvation
• Temporarily relaxes fairness to improve locality by asking jobs to wait for a scheduling opportunity on a node with local data
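The skip-and-wait rule above can be sketched as follows. This is a simplified illustration: the `Job` class, the `max_skips` parameter, and counting skips instead of elapsed time are assumptions for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Job:
    name: str
    # pending maps task id -> set of nodes holding that task's input data
    pending: dict = field(default_factory=dict)
    skip_count: int = 0


def assign_task(node, jobs, max_skips):
    """Handle one task request from `node`; `jobs` is already in fairness order."""
    for job in jobs:
        if not job.pending:
            continue
        local = [t for t, nodes in job.pending.items() if node in nodes]
        if local:
            job.skip_count = 0
            task = local[0]
        elif job.skip_count >= max_skips:
            task = next(iter(job.pending))  # waited long enough: give up on locality
        else:
            job.skip_count += 1  # skipped: wait for a better opportunity
            continue
        del job.pending[task]
        return job.name, task
    return None


jobs = [
    Job("A", {"a1": {"n2"}}),
    Job("B", {"b1": {"n1"}}),
]
first = assign_task("n1", jobs, max_skips=1)   # A is skipped, B runs locally
second = assign_task("n1", jobs, max_skips=1)  # A has now waited long enough
print(first, second)
```

Job A is first in fairness order but loses one scheduling opportunity to job B, which is the temporary relaxation of fairness the slide describes.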
16/30
Maestro
• Avoids the problem of non-local map task execution by relying on replica-aware execution of map tasks
• Keeps track of the chunks and replica locations, along with the number of other chunks hosted by each node
• Efficiently schedules each map task on a data-local node in a way that causes minimal impact on the local map task executions of other nodes
17/30
Maestro
• Schedules map tasks in two waves:
  - first, it fills the empty slots of each data node based on the number of hosted map tasks and on the replication scheme for their input data
  - second, runtime scheduling takes into account the probability of scheduling a map task on a given machine, depending on the replicas of the task's input data
• Provides higher locality in the execution of map tasks
• Gives a more balanced intermediate data distribution for the shuffling phase
18/30
Context-aware Scheduler
• Exploits the existing heterogeneity of most clusters and the workload mix, proposing optimizations for jobs that use the same dataset
• The design is based on two key insights:
  - first, a large percentage of MapReduce jobs are run periodically and have roughly the same characteristics regarding CPU, network, and disk requirements
  - second, the nodes in a Hadoop cluster become heterogeneous over time due to failures, as newer nodes replace old ones
19/30
Context-aware Scheduler
• The scheduler uses three steps to accomplish its objective:
  1. classify jobs as CPU-bound or I/O-bound
  2. classify nodes as computational or I/O
  3. map the tasks of a job with different demands to the nodes that can fulfill those demands
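The three steps can be sketched with simple comparisons. The classification rules, the score tuples, and the function names here are assumptions chosen to keep the example small; the slides do not say how CASH actually measures jobs or nodes.

```python
def classify_job(cpu_time, io_time):
    """Step 1: label a job by whichever resource dominates its run time."""
    return "cpu" if cpu_time >= io_time else "io"


def classify_node(cpu_score, io_score):
    """Step 2: label a node by its relatively stronger capability."""
    return "cpu" if cpu_score >= io_score else "io"


def match(jobs, nodes):
    """Step 3: send each job's tasks to the nodes of the matching class."""
    by_class = {"cpu": [], "io": []}
    for name, (cpu_score, io_score) in nodes.items():
        by_class[classify_node(cpu_score, io_score)].append(name)
    return {
        job: by_class[classify_job(cpu_time, io_time)]
        for job, (cpu_time, io_time) in jobs.items()
    }


# Hypothetical measurements: (cpu_time, io_time) per job, (cpu_score, io_score) per node.
jobs = {"wordcount": (120, 40), "sort": (30, 200)}
nodes = {"n1": (0.9, 0.4), "n2": (0.3, 0.8), "n3": (0.7, 0.6)}
plan = match(jobs, nodes)
print(plan)
```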
20/30
Locality-Aware Reduce Task Scheduler (LARTS)
• Reduce phase scheduling is modified to become aware of partitions, their locations, and their sizes, in order to decrease network traffic
• Seeks a balance among:
  - scheduling delay
  - scheduling skew
  - system utilization
  - parallelism
21/30
Center-of-Gravity Reduce Scheduler (CoGRS)
• A locality-aware and skew-aware reduce task scheduler for saving MapReduce network traffic
• Attempts to schedule every reduce task at its center-of-gravity node, determined by the network locations
• Lower network traffic may allow more MapReduce jobs to co-exist on the same system
22/30
COSHH
• Considers heterogeneity at both the application and cluster levels
• The main approach: use system information to make better scheduling decisions, which improves performance
• Two main processes:
  - New job (from a user): the queuing process stores the incoming job in an appropriate queue
  - Heartbeat (from a free resource): triggers the routing process, which assigns a job to the currently free resource
23/30
Self-Adaptive Scheduling Algorithm for Reduce Start Time (SARS)
• An optimal reduce scheduling policy for the start time of reduce tasks
• Works by delaying the reduce processes
• Effects:
  - shortens the copy duration of the reduce process
  - decreases task completion time
  - saves reduce slot resources
• Limitation: focuses only on the reduce process
24/30
Summary
Scheduling Algorithm | Idea to Implementation
FIFO | Schedule jobs based on their priorities, in first-come, first-served order.
Fair Scheduling | Distribute compute resources equally among the users/jobs in the system.
Capacity | Maximize resource utilization and throughput in a multi-tenant cluster environment.
Hybrid scheduler based on dynamic priority | Designed for data-intensive workloads; tries to maintain data locality during job execution.
LATE | Fault tolerance.
25/30
Summary
Scheduling Algorithm | Idea to Implementation
SAMR | Improve MapReduce by saving execution time and system resources.
Delay scheduling | Address the conflict between locality and fairness.
Maestro | Proposed for map tasks, to improve the overall performance of the MapReduce computation.
CREST | Re-execute a combination of tasks on a group of computing nodes.
Context-aware scheduler | Optimize jobs that use the same dataset.
LARTS | Decrease network traffic.
26/30
Summary
Scheduling Algorithm | Idea to Implementation
CoGRS | Attempts to schedule every reduce task at its center-of-gravity node, determined by the network locations.
MaRCO | Achieves nearly full overlap via the novel idea of including the reduce in the overlap.
COSHH | Proposed to improve the mean completion time of jobs.
SARS | Shortens the copy duration of the reduce process, decreases task completion time, and saves reduce slot resources.
27/30
References
1. Varma, Rakesh. "Survey on MapReduce and Scheduling Algorithms in Hadoop." International Journal of Science and
Research 4.2 (2015).
2. Zaharia, Matei, et al. "Job scheduling for multi-user mapreduce clusters." EECS Department, University of California,
Berkeley, Tech. Rep. UCB/EECS-2009-55 (2009).
3. Tiwari, Nidhi, et al. "Classification framework of MapReduce scheduling algorithms." ACM Computing Surveys
(CSUR) 47.3 (2015): 49.
4. Zaharia, Matei, et al. "Improving MapReduce Performance in Heterogeneous Environments." OSDI. Vol. 8. No. 4.
2008.
5. Kumar, K. Arun, et al. "CASH: context aware scheduler for Hadoop." Proceedings of the International Conference on
Advances in Computing, Communications and Informatics. ACM, 2012.
6. Hammoud, Mohammad, M. Suhail Rehman, and Majd F. Sakr. "Center-of-gravity reduce task scheduling to lower
mapreduce network traffic." Cloud Computing (CLOUD), 2012 IEEE 5th International Conference on. IEEE, 2012.
7. Rasooli, Aysan, and Douglas G. Down. "COSHH: A classification and optimization based scheduler for heterogeneous
Hadoop systems." Future Generation Computer Systems 36 (2014): 1-15.
8. Lei, Lei, Tianyu Wo, and Chunming Hu. "CREST: Towards fast speculation of straggler tasks in MapReduce." e-
Business Engineering (ICEBE), 2011 IEEE 8th International Conference on. IEEE, 2011.
9. Zaharia, Matei, et al. "Delay scheduling: a simple technique for achieving locality and fairness in cluster scheduling."
Proceedings of the 5th European conference on Computer systems. ACM, 2010.
10. Sun, Xiaoyu, Chen He, and Ying Lu. "ESAMR: an enhanced self-adaptive MapReduce scheduling algorithm." Parallel
and Distributed Systems (ICPADS), 2012 IEEE 18th International Conference on. IEEE, 2012.
11. Nguyen, Phuong, et al. "A hybrid scheduling algorithm for data intensive workloads in a mapreduce environment."
Proceedings of the 2012 IEEE/ACM Fifth International Conference on Utility and Cloud Computing. IEEE Computer
Society, 2012.
12. Hammoud, Mohammad, and Majd F. Sakr. "Locality-aware reduce task scheduling for MapReduce." Cloud
Computing Technology and Science (CloudCom), 2011 IEEE Third International Conference on. IEEE, 2011.
13. Ibrahim, Shadi, et al. "Maestro: Replica-aware map scheduling for mapreduce." Cluster, Cloud and Grid Computing
(CCGrid), 2012 12th IEEE/ACM International Symposium on. IEEE, 2012.
14. Chen, Quan, et al. "Samr: A self-adaptive mapreduce scheduling algorithm in heterogeneous environment."
Computer and Information Technology (CIT), 2010 IEEE 10th International Conference on. IEEE, 2010.
15. Tang, Zhuo, et al. "A self-adaptive scheduling algorithm for reduce start time." Future Generation Computer Systems
43 (2015): 51-60.

More Related Content

What's hot

Introduction to Hadoop
Introduction to HadoopIntroduction to Hadoop
Introduction to HadoopApache Apex
 
Hadoop introduction , Why and What is Hadoop ?
Hadoop introduction , Why and What is  Hadoop ?Hadoop introduction , Why and What is  Hadoop ?
Hadoop introduction , Why and What is Hadoop ?sudhakara st
 
Intro to Big Data and NoSQL
Intro to Big Data and NoSQLIntro to Big Data and NoSQL
Intro to Big Data and NoSQLDon Demcsak
 
Mapreduce Algorithms
Mapreduce AlgorithmsMapreduce Algorithms
Mapreduce AlgorithmsAmund Tveit
 
Hadoop Overview & Architecture
Hadoop Overview & Architecture  Hadoop Overview & Architecture
Hadoop Overview & Architecture EMC
 
distributed Computing system model
distributed Computing system modeldistributed Computing system model
distributed Computing system modelHarshad Umredkar
 
Design of Hadoop Distributed File System
Design of Hadoop Distributed File SystemDesign of Hadoop Distributed File System
Design of Hadoop Distributed File SystemDr. C.V. Suresh Babu
 
Introduction to Hadoop
Introduction to HadoopIntroduction to Hadoop
Introduction to Hadoopjoelcrabb
 
Big Data and Hadoop
Big Data and HadoopBig Data and Hadoop
Big Data and HadoopFlavio Vit
 
Hadoop And Their Ecosystem ppt
 Hadoop And Their Ecosystem ppt Hadoop And Their Ecosystem ppt
Hadoop And Their Ecosystem pptsunera pathan
 
Mapreduce by examples
Mapreduce by examplesMapreduce by examples
Mapreduce by examplesAndrea Iacono
 

What's hot (20)

Introduction to Hadoop
Introduction to HadoopIntroduction to Hadoop
Introduction to Hadoop
 
The CAP Theorem
The CAP Theorem The CAP Theorem
The CAP Theorem
 
Map Reduce
Map ReduceMap Reduce
Map Reduce
 
Apache hadoop hbase
Apache hadoop hbaseApache hadoop hbase
Apache hadoop hbase
 
Hadoop Map Reduce
Hadoop Map ReduceHadoop Map Reduce
Hadoop Map Reduce
 
Hadoop introduction , Why and What is Hadoop ?
Hadoop introduction , Why and What is  Hadoop ?Hadoop introduction , Why and What is  Hadoop ?
Hadoop introduction , Why and What is Hadoop ?
 
Intro to Big Data and NoSQL
Intro to Big Data and NoSQLIntro to Big Data and NoSQL
Intro to Big Data and NoSQL
 
Mapreduce Algorithms
Mapreduce AlgorithmsMapreduce Algorithms
Mapreduce Algorithms
 
PPT on Hadoop
PPT on HadoopPPT on Hadoop
PPT on Hadoop
 
Introduction to MapReduce
Introduction to MapReduceIntroduction to MapReduce
Introduction to MapReduce
 
Hadoop Overview & Architecture
Hadoop Overview & Architecture  Hadoop Overview & Architecture
Hadoop Overview & Architecture
 
distributed Computing system model
distributed Computing system modeldistributed Computing system model
distributed Computing system model
 
Design of Hadoop Distributed File System
Design of Hadoop Distributed File SystemDesign of Hadoop Distributed File System
Design of Hadoop Distributed File System
 
Introduction to Hadoop
Introduction to HadoopIntroduction to Hadoop
Introduction to Hadoop
 
Apache hive introduction
Apache hive introductionApache hive introduction
Apache hive introduction
 
Hadoop Oozie
Hadoop OozieHadoop Oozie
Hadoop Oozie
 
Big Data and Hadoop
Big Data and HadoopBig Data and Hadoop
Big Data and Hadoop
 
Hadoop And Their Ecosystem ppt
 Hadoop And Their Ecosystem ppt Hadoop And Their Ecosystem ppt
Hadoop And Their Ecosystem ppt
 
Mapreduce by examples
Mapreduce by examplesMapreduce by examples
Mapreduce by examples
 
Big data and Hadoop
Big data and HadoopBig data and Hadoop
Big data and Hadoop
 

Similar to MapReduce Scheduling Algorithms

Novel Scheduling Algorithms for Efficient Deployment of Map Reduce Applicatio...
Novel Scheduling Algorithms for Efficient Deployment of Map Reduce Applicatio...Novel Scheduling Algorithms for Efficient Deployment of Map Reduce Applicatio...
Novel Scheduling Algorithms for Efficient Deployment of Map Reduce Applicatio...IRJET Journal
 
Enhancing Performance and Fault Tolerance of Hadoop Cluster
Enhancing Performance and Fault Tolerance of Hadoop ClusterEnhancing Performance and Fault Tolerance of Hadoop Cluster
Enhancing Performance and Fault Tolerance of Hadoop ClusterIRJET Journal
 
Parallel Data Processing with MapReduce: A Survey
Parallel Data Processing with MapReduce: A SurveyParallel Data Processing with MapReduce: A Survey
Parallel Data Processing with MapReduce: A SurveyKyong-Ha Lee
 
module3part-1-bigdata-230301002404-3db4f2a4 (1).pdf
module3part-1-bigdata-230301002404-3db4f2a4 (1).pdfmodule3part-1-bigdata-230301002404-3db4f2a4 (1).pdf
module3part-1-bigdata-230301002404-3db4f2a4 (1).pdfTSANKARARAO
 
Sharing of cluster resources among multiple Workflow Applications
Sharing of cluster resources among multiple Workflow ApplicationsSharing of cluster resources among multiple Workflow Applications
Sharing of cluster resources among multiple Workflow Applicationsijcsit
 
Optimized Assignment of Independent Task for Improving Resources Performance ...
Optimized Assignment of Independent Task for Improving Resources Performance ...Optimized Assignment of Independent Task for Improving Resources Performance ...
Optimized Assignment of Independent Task for Improving Resources Performance ...ijgca
 
Optimized Assignment of Independent Task for Improving Resources Performance ...
Optimized Assignment of Independent Task for Improving Resources Performance ...Optimized Assignment of Independent Task for Improving Resources Performance ...
Optimized Assignment of Independent Task for Improving Resources Performance ...Ricardo014
 
Optimized Assignment of Independent Task for Improving Resources Performance ...
Optimized Assignment of Independent Task for Improving Resources Performance ...Optimized Assignment of Independent Task for Improving Resources Performance ...
Optimized Assignment of Independent Task for Improving Resources Performance ...ijgca
 
MULTIPLE DAG APPLICATIONS SCHEDULING ON A CLUSTER OF PROCESSORS
MULTIPLE DAG APPLICATIONS SCHEDULING ON A CLUSTER OF PROCESSORSMULTIPLE DAG APPLICATIONS SCHEDULING ON A CLUSTER OF PROCESSORS
MULTIPLE DAG APPLICATIONS SCHEDULING ON A CLUSTER OF PROCESSORScscpconf
 
Multiple dag applications
Multiple dag applicationsMultiple dag applications
Multiple dag applicationscsandit
 
Wei's notes on MapReduce Scheduling
Wei's notes on MapReduce SchedulingWei's notes on MapReduce Scheduling
Wei's notes on MapReduce SchedulingLu Wei
 
Presented by Ahmed Abdulhakim Al-Absi - Scaling map reduce applications acro...
Presented by Ahmed Abdulhakim Al-Absi -  Scaling map reduce applications acro...Presented by Ahmed Abdulhakim Al-Absi -  Scaling map reduce applications acro...
Presented by Ahmed Abdulhakim Al-Absi - Scaling map reduce applications acro...Absi Ahmed
 
Comparative Analysis of Various Grid Based Scheduling Algorithms
Comparative Analysis of Various Grid Based Scheduling AlgorithmsComparative Analysis of Various Grid Based Scheduling Algorithms
Comparative Analysis of Various Grid Based Scheduling Algorithmsiosrjce
 
Hadoop scheduler with deadline constraint
Hadoop scheduler with deadline constraintHadoop scheduler with deadline constraint
Hadoop scheduler with deadline constraintijccsa
 
A bi objective workflow application
A bi objective workflow applicationA bi objective workflow application
A bi objective workflow applicationIJITE
 
Max Min Fair Scheduling Algorithm using In Grid Scheduling with Load Balancing
Max Min Fair Scheduling Algorithm using In Grid Scheduling with Load Balancing Max Min Fair Scheduling Algorithm using In Grid Scheduling with Load Balancing
Max Min Fair Scheduling Algorithm using In Grid Scheduling with Load Balancing IJORCS
 

Similar to MapReduce Scheduling Algorithms (20)

E031201032036
E031201032036E031201032036
E031201032036
 
Novel Scheduling Algorithms for Efficient Deployment of Map Reduce Applicatio...
Novel Scheduling Algorithms for Efficient Deployment of Map Reduce Applicatio...Novel Scheduling Algorithms for Efficient Deployment of Map Reduce Applicatio...
Novel Scheduling Algorithms for Efficient Deployment of Map Reduce Applicatio...
 
Enhancing Performance and Fault Tolerance of Hadoop Cluster
Enhancing Performance and Fault Tolerance of Hadoop ClusterEnhancing Performance and Fault Tolerance of Hadoop Cluster
Enhancing Performance and Fault Tolerance of Hadoop Cluster
 
Parallel Data Processing with MapReduce: A Survey
Parallel Data Processing with MapReduce: A SurveyParallel Data Processing with MapReduce: A Survey
Parallel Data Processing with MapReduce: A Survey
 
module3part-1-bigdata-230301002404-3db4f2a4 (1).pdf
module3part-1-bigdata-230301002404-3db4f2a4 (1).pdfmodule3part-1-bigdata-230301002404-3db4f2a4 (1).pdf
module3part-1-bigdata-230301002404-3db4f2a4 (1).pdf
 
Big Data.pptx
Big Data.pptxBig Data.pptx
Big Data.pptx
 
Sharing of cluster resources among multiple Workflow Applications
Sharing of cluster resources among multiple Workflow ApplicationsSharing of cluster resources among multiple Workflow Applications
Sharing of cluster resources among multiple Workflow Applications
 
Optimized Assignment of Independent Task for Improving Resources Performance ...
Optimized Assignment of Independent Task for Improving Resources Performance ...Optimized Assignment of Independent Task for Improving Resources Performance ...
Optimized Assignment of Independent Task for Improving Resources Performance ...
 
Optimized Assignment of Independent Task for Improving Resources Performance ...
Optimized Assignment of Independent Task for Improving Resources Performance ...Optimized Assignment of Independent Task for Improving Resources Performance ...
Optimized Assignment of Independent Task for Improving Resources Performance ...
 
Optimized Assignment of Independent Task for Improving Resources Performance ...
Optimized Assignment of Independent Task for Improving Resources Performance ...Optimized Assignment of Independent Task for Improving Resources Performance ...
Optimized Assignment of Independent Task for Improving Resources Performance ...
 
MULTIPLE DAG APPLICATIONS SCHEDULING ON A CLUSTER OF PROCESSORS
MULTIPLE DAG APPLICATIONS SCHEDULING ON A CLUSTER OF PROCESSORSMULTIPLE DAG APPLICATIONS SCHEDULING ON A CLUSTER OF PROCESSORS
MULTIPLE DAG APPLICATIONS SCHEDULING ON A CLUSTER OF PROCESSORS
 
Multiple dag applications
Multiple dag applicationsMultiple dag applications
Multiple dag applications
 
Wei's notes on MapReduce Scheduling
Wei's notes on MapReduce SchedulingWei's notes on MapReduce Scheduling
Wei's notes on MapReduce Scheduling
 
Presented by Ahmed Abdulhakim Al-Absi - Scaling map reduce applications acro...
Presented by Ahmed Abdulhakim Al-Absi -  Scaling map reduce applications acro...Presented by Ahmed Abdulhakim Al-Absi -  Scaling map reduce applications acro...
Presented by Ahmed Abdulhakim Al-Absi - Scaling map reduce applications acro...
 
C017241316
C017241316C017241316
C017241316
 
Comparative Analysis of Various Grid Based Scheduling Algorithms
Comparative Analysis of Various Grid Based Scheduling AlgorithmsComparative Analysis of Various Grid Based Scheduling Algorithms
Comparative Analysis of Various Grid Based Scheduling Algorithms
 
Hadoop scheduler with deadline constraint
Hadoop scheduler with deadline constraintHadoop scheduler with deadline constraint
Hadoop scheduler with deadline constraint
 
Main map reduce
Main map reduceMain map reduce
Main map reduce
 
A bi objective workflow application
A bi objective workflow applicationA bi objective workflow application
A bi objective workflow application
 
Max Min Fair Scheduling Algorithm using In Grid Scheduling with Load Balancing
Max Min Fair Scheduling Algorithm using In Grid Scheduling with Load Balancing Max Min Fair Scheduling Algorithm using In Grid Scheduling with Load Balancing
Max Min Fair Scheduling Algorithm using In Grid Scheduling with Load Balancing
 

Recently uploaded

Past, Present and Future of Generative AI
Past, Present and Future of Generative AIPast, Present and Future of Generative AI
Past, Present and Future of Generative AIabhishek36461
 
EduAI - E learning Platform integrated with AI
EduAI - E learning Platform integrated with AIEduAI - E learning Platform integrated with AI
EduAI - E learning Platform integrated with AIkoyaldeepu123
 
Introduction to Machine Learning Unit-3 for II MECH
Introduction to Machine Learning Unit-3 for II MECHIntroduction to Machine Learning Unit-3 for II MECH
Introduction to Machine Learning Unit-3 for II MECHC Sai Kiran
 
IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024Mark Billinghurst
 
Application of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptxApplication of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptx959SahilShah
 
Artificial-Intelligence-in-Electronics (K).pptx
Artificial-Intelligence-in-Electronics (K).pptxArtificial-Intelligence-in-Electronics (K).pptx
Artificial-Intelligence-in-Electronics (K).pptxbritheesh05
 
Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)
Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)
Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)dollysharma2066
 
main PPT.pptx of girls hostel security using rfid
main PPT.pptx of girls hostel security using rfidmain PPT.pptx of girls hostel security using rfid
main PPT.pptx of girls hostel security using rfidNikhilNagaraju
 
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdfCCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdfAsst.prof M.Gokilavani
 
complete construction, environmental and economics information of biomass com...
complete construction, environmental and economics information of biomass com...complete construction, environmental and economics information of biomass com...
complete construction, environmental and economics information of biomass com...asadnawaz62
 
VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...
VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...
VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...VICTOR MAESTRE RAMIREZ
 
pipeline in computer architecture design
pipeline in computer architecture  designpipeline in computer architecture  design
pipeline in computer architecture designssuser87fa0c1
 
INFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETE
INFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETEINFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETE
INFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETEroselinkalist12
 
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort serviceGurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort servicejennyeacort
 
Electronically Controlled suspensions system .pdf
Electronically Controlled suspensions system .pdfElectronically Controlled suspensions system .pdf
Electronically Controlled suspensions system .pdfme23b1001
 
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionSachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionDr.Costas Sachpazis
 
HARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IVHARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IVRajaP95
 

Recently uploaded (20)

Past, Present and Future of Generative AI
Past, Present and Future of Generative AIPast, Present and Future of Generative AI
Past, Present and Future of Generative AI
 
POWER SYSTEMS-1 Complete notes examples
POWER SYSTEMS-1 Complete notes  examplesPOWER SYSTEMS-1 Complete notes  examples
POWER SYSTEMS-1 Complete notes examples
 
EduAI - E learning Platform integrated with AI
EduAI - E learning Platform integrated with AIEduAI - E learning Platform integrated with AI
EduAI - E learning Platform integrated with AI
 
Introduction to Machine Learning Unit-3 for II MECH
Introduction to Machine Learning Unit-3 for II MECHIntroduction to Machine Learning Unit-3 for II MECH
Introduction to Machine Learning Unit-3 for II MECH
 
IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024
 
young call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Service
young call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Serviceyoung call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Service
young call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Service
 
Application of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptxApplication of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptx
 
Artificial-Intelligence-in-Electronics (K).pptx
Artificial-Intelligence-in-Electronics (K).pptxArtificial-Intelligence-in-Electronics (K).pptx
Artificial-Intelligence-in-Electronics (K).pptx
 
Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)
Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)
Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)
 
main PPT.pptx of girls hostel security using rfid
main PPT.pptx of girls hostel security using rfidmain PPT.pptx of girls hostel security using rfid
main PPT.pptx of girls hostel security using rfid
 

MapReduce Scheduling Algorithms

  • 1. Hadoop MapReduce Scheduling Algorithms. Presented by Leila Panahi and Fatemeh Sheykh Mohammadi, under the guidance of Dr. Leila Safari. December 2016.
  • 3. Introduction: Job scheduling in multi-user environments is a challenge in MapReduce. Each node is a physical machine with computational and storage capabilities. Hadoop uses the concept of slots on each node to control the maximum number of tasks that can execute concurrently on that node. Each slot can execute only one task at a time. There are two types of slot: map slots and reduce slots.
  • 4. Scheduling Algorithms: Locality-Aware; Replica-Aware (Maestro); Center-of-Gravity (CoGRS); Context-Aware (CASH); LATE; SAMR; ESAMR; LA; HAT; FIFO; LSCHED; RAS; SARS; COSHH; Load-Driven; Job-Aware.
  • 5. Quality metrics for MapReduce scheduling algorithms: fairness, throughput, response time, availability, energy efficiency, resource utilization, scalability, overheads.
  • 6. FIFO: The default scheduler (first in, first out). Main objective: schedule jobs based on their priorities in first-come, first-served order. Limitations: poor response times for short jobs compared to large jobs; low performance when running multiple types of jobs, since it gives good results only for a single type of job.
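The FIFO policy on this slide can be illustrated with a small queue that orders jobs by priority first and by arrival order second. This is a minimal sketch and not Hadoop's implementation: the class name `FifoQueue`, the convention that a larger number means higher priority, and the job names are all assumptions made for illustration.

```python
import heapq
from itertools import count
from typing import List, Optional, Tuple


class FifoQueue:
    """Hypothetical job queue: higher priority first, then first-come, first-served."""

    def __init__(self) -> None:
        self._heap: List[Tuple[int, int, str]] = []
        self._arrival = count()  # monotonically increasing arrival number

    def submit(self, name: str, priority: int = 0) -> None:
        # Negate the priority so that the largest priority is popped first;
        # the arrival number breaks ties in submission order.
        heapq.heappush(self._heap, (-priority, next(self._arrival), name))

    def next_job(self) -> Optional[str]:
        """Return the next job to run, or None if the queue is empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]
```

Because a job is only handed out after everything queued ahead of it, a short job submitted behind a long one has to wait, which is the response-time limitation the slide mentions.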
  • 7. Fair Scheduling: All jobs get, on average, an equal share of resources over time. Objective: distribute compute resources equally among the users/jobs in the system. It covers some limitations of FIFO: it works well in both small and large clusters and is less complex. Disadvantage: it does not consider the job weight of each node.
  • 8. Capacity Scheduler: Similar to fair scheduling, but uses queues instead of pools. Features: queues and sub-queues; capacity guarantee with elasticity; ACLs for security; runtime changes and draining of applications; resource-based scheduling.
  • 9. Speculative Execution: Identify slow tasks. Progress score in Hadoop: $ps = M/N$ for map tasks and $ps = \frac{1}{3}\left(k + \frac{M}{N}\right)$ for reduce tasks. Average progress score: $ps_{avg} = \frac{1}{K}\sum_{i=1}^{K} ps[i]$. Tasks that need a backup: task $T_i$ with $ps[i] < ps_{avg} - 20\%$.
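The progress-score rule on the speculative execution slide can be sketched as follows. The slide does not define its symbols, so this sketch assumes the usual reading: M is the number of key/value pairs a task has processed, N is the number it must process, and k is the number of reduce stages already finished (0, 1, or 2). It also assumes the 20% gap is an absolute difference of 0.20 on the progress scale. The function names are hypothetical.

```python
from typing import List


def map_progress(processed: int, total: int) -> float:
    """Progress score of a map task: ps = M / N."""
    return processed / total


def reduce_progress(stage: int, processed: int, total: int) -> float:
    """Progress score of a reduce task: ps = (k + M / N) / 3."""
    return (stage + processed / total) / 3.0


def tasks_needing_backup(scores: List[float], gap: float = 0.20) -> List[int]:
    """Indices i of tasks whose score satisfies ps[i] < ps_avg - gap."""
    average = sum(scores) / len(scores)
    return [i for i, ps in enumerate(scores) if ps < average - gap]
```

For example, with scores `[0.9, 0.8, 0.85, 0.3]` the average is 0.7125, so only the last task falls more than 0.20 below it and is flagged for a backup copy.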
  • 10. Longest Approximate Time to End (LATE): A scheduler that robustly improves performance in heterogeneous environments by reducing the overhead of speculative execution tasks. It finds the truly slow tasks by computing the remaining time of all tasks: it ranks tasks by estimated time remaining and starts a copy of the highest-ranked task whose progress rate is lower than the Slow Task Threshold. Progress rate: $PR = PS / T_r$. Time to end: $TTE = (1 - PS) / PR$.
  • 11. Longest Approximate Time to End (LATE), advantage: robustness to node heterogeneity, since only some of the slowest tasks are speculatively re-executed. The method does not break the synchronization between the map and reduce phases; it only takes action on appropriately slow tasks.
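The LATE selection rule can be sketched with the two quantities from the slides, PR = PS / Tr and TTE = (1 - PS) / PR. The sketch assumes Tr is the time a task has been running, and it treats the Slow Task Threshold as a fixed progress rate passed in by the caller; how that threshold is derived in a real cluster is not covered here. The `Task` class and the function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Task:
    name: str
    progress: float  # PS, in [0, 1]
    elapsed: float   # Tr, seconds the task has been running (assumed > 0)


def progress_rate(task: Task) -> float:
    """PR = PS / Tr."""
    return task.progress / task.elapsed


def time_to_end(task: Task) -> float:
    """TTE = (1 - PS) / PR."""
    return (1.0 - task.progress) / progress_rate(task)


def pick_speculative(tasks: List[Task], slow_task_threshold: float) -> Optional[Task]:
    """Among tasks slower than the threshold, return the one with the longest TTE."""
    # Tasks with zero progress are skipped because their TTE is undefined.
    slow = [t for t in tasks
            if t.progress > 0 and progress_rate(t) < slow_task_threshold]
    return max(slow, key=time_to_end, default=None)
```

Ranking by time to end rather than by raw progress means the backup copy goes to the task that would hurt the job's finish time most, not simply to the task that started last.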
  • 12. Self-Adaptive MapReduce (SAMR): Uses historical information. Nodes are classified as fast (finish a task in a shorter time) or slow (finish a task in a longer time). For jobs, the aim is to save execution time and system resources.
  • 13. Self-Adaptive MapReduce (SAMR): SAMR decreases execution time by up to 25% compared with Hadoop's scheduler and by up to 14% compared with the LATE scheduler.
  • 14. Enhanced Self-Adaptive MapReduce (ESAMR): SAMR does not consider that the size of the dataset and the type of job may lead to different weights for the map and reduce stages. ESAMR classifies the historical information stored on every node into k clusters using a machine learning technique. If a running job has completed some map tasks on a node, ESAMR computes a temporary map-phase weight (M1) on that node from the job's map tasks completed there.
  • 15. Enhanced Self-Adaptive MapReduce (ESAMR): The temporary M1 weight is used to find the cluster whose M1 weight is closest. ESAMR uses that cluster's stage weights to estimate the TimeToEnd of the job's map tasks on the node and to identify slow tasks that need to be re-executed. The reduce phase follows a similar procedure. After a job has finished, ESAMR calculates the job's stage weights on every node and saves these new weights as part of the historical information. It then applies k-means to re-classify the historical information stored on every worker node into k clusters and saves the updated average stage weights for each of the k clusters.
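The cluster lookup described on the ESAMR slides (find the cluster whose M1 weight is closest to the temporary M1 weight) can be sketched as a nearest-value search. The sketch assumes each cluster is summarized by a pair of average map-stage weights, called `(m1, m2)` here; the second weight and the function name are assumptions for illustration, and the k-means re-clustering step is not shown.

```python
from typing import List, Tuple

# Hypothetical representation: one (m1, m2) pair of average map-stage weights per cluster.
StageWeights = Tuple[float, float]


def closest_cluster(temp_m1: float, clusters: List[StageWeights]) -> StageWeights:
    """Return the cluster whose m1 weight is nearest to the temporary m1 weight."""
    return min(clusters, key=lambda weights: abs(weights[0] - temp_m1))
```

With clusters `[(0.6, 0.4), (0.8, 0.2), (0.9, 0.1)]` and a temporary M1 weight of 0.78, the lookup returns `(0.8, 0.2)`, and those stage weights would then feed the TimeToEnd estimate.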
  • 16. Delay Scheduling: Addresses the conflict between locality and fairness. When a node requests a task, if the head-of-line job cannot launch a local task, the scheduler skips it and looks at subsequent jobs. If a job has been skipped long enough, it is allowed to launch non-local tasks, to avoid starvation. The scheduler thus temporarily relaxes fairness to improve locality, by asking jobs to wait for a scheduling opportunity on a node with local data.
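The skip-and-wait rule of delay scheduling can be sketched as a single assignment function called whenever a node asks for work. This is a simplified sketch under stated assumptions: "skipped long enough" is modeled as a skip counter compared against a caller-supplied `max_skips`, each job is reduced to the set of nodes that hold data for its pending tasks, and the names `Job` and `assign` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple


@dataclass
class Job:
    name: str
    local_nodes: Set[str]  # nodes holding input data for a pending task
    skip_count: int = 0    # how many times the job has been passed over


def assign(jobs: List[Job], node: str, max_skips: int) -> Optional[Tuple[str, bool]]:
    """Pick a job for `node`; returns (job name, launched_locally) or None.

    `jobs` is assumed to be sorted in fairness order, head-of-line job first.
    """
    for job in jobs:
        if node in job.local_nodes:
            job.skip_count = 0          # a local launch resets the wait
            return job.name, True
        if job.skip_count >= max_skips:
            return job.name, False      # waited long enough: allow non-local
        job.skip_count += 1             # skip this job and look at the next
    return None
```

With two jobs where only the second has data on the requesting node, the first job is skipped twice when `max_skips` is 2 and is allowed a non-local launch on the third request, which is how starvation is avoided.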
  • 17. Maestro: Avoids the problem of non-local map task execution through replica-aware execution of map tasks. It keeps track of the chunks and replica locations, along with the number of other chunks hosted by each node. This lets it efficiently schedule a map task on a data-local node in a way that causes minimal impact on the local map task executions of other nodes.
  • 18. Maestro: Schedules map tasks in two waves. First, it fills the empty slots of each data node based on the number of hosted map tasks and on the replication scheme of their input data. Second, runtime scheduling takes into account the probability of scheduling a map task on a given machine, depending on the replicas of the task's input data. Benefits: higher locality in the execution of map tasks and a more balanced intermediate data distribution for the shuffling phase.
  • 19. Context-Aware Scheduler: Exploits the existing heterogeneity of most clusters and the workload mix, proposing optimizations for jobs that use the same dataset. The design is based on two key insights. First, a large percentage of MapReduce jobs are run periodically and have roughly the same characteristics regarding CPU, network, and disk requirements. Second, the nodes in a Hadoop cluster become heterogeneous over time due to failures, when newer nodes replace old ones.
  • 20. Context-Aware Scheduler: The scheduler accomplishes its objective in three steps: (1) classify jobs as CPU-bound or I/O-bound; (2) classify nodes as computational or I/O; (3) map the tasks of a job with different demands to the nodes that can fulfill those demands.
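The three steps of the context-aware scheduler can be sketched as two classifiers and a matching function. The slides do not say how jobs or nodes are measured, so everything here is an assumption made for illustration: jobs are described by CPU time versus I/O time, nodes by a CPU score versus an I/O score, and the larger of the two decides the class.

```python
from typing import Dict, List, Tuple


def classify_job(cpu_time: float, io_time: float) -> str:
    """Step 1: label a job 'cpu' or 'io' by where it spends more time."""
    return "cpu" if cpu_time >= io_time else "io"


def classify_node(cpu_score: float, io_score: float) -> str:
    """Step 2: label a node 'computational' or 'io' by its stronger resource."""
    return "computational" if cpu_score >= io_score else "io"


def eligible_nodes(job_kind: str, nodes: Dict[str, Tuple[float, float]]) -> List[str]:
    """Step 3: list the nodes whose class fits the job's demand."""
    wanted = "computational" if job_kind == "cpu" else "io"
    return [name for name, (cpu, io) in nodes.items()
            if classify_node(cpu, io) == wanted]
```

For instance, with `nodes = {"n1": (8.0, 2.0), "n2": (3.0, 9.0)}`, a CPU-bound job is matched to `n1` and an I/O-bound job to `n2`.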
  • 21. Locality-Aware Reduce Task Scheduler: Reduce-phase scheduling is modified to become aware of partitions, their locations, and their sizes, in order to decrease network traffic. It seeks a balance among scheduling delay, scheduling skew, system utilization, and parallelism.
  • 22. Center-of-Gravity Reduce Scheduler: A locality-aware and skew-aware reduce task scheduler. It attempts to schedule every reduce task at its center-of-gravity node, determined by the network locations. Saving MapReduce network traffic allows more MapReduce jobs to co-exist on the same system.
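One way to read "center-of-gravity node" is as the candidate node that minimizes the total network distance to a reduce task's input partitions, weighted by partition size so that skew is taken into account. The sketch below follows that reading. The cost function, the distance values (0 for the same node, 2 for the same rack, 4 otherwise), and all names are assumptions for illustration and are not taken from the slides.

```python
from typing import Dict, Iterable


def distance(src: str, dst: str, rack_of: Dict[str, str]) -> int:
    """Hypothetical network distance: 0 same node, 2 same rack, 4 otherwise."""
    if src == dst:
        return 0
    return 2 if rack_of[src] == rack_of[dst] else 4


def center_of_gravity(candidates: Iterable[str],
                      partition_sizes: Dict[str, int],
                      rack_of: Dict[str, str]) -> str:
    """Return the candidate with the lowest size-weighted total distance.

    `partition_sizes` maps each feeding node to the size of the partition
    it holds for this reduce task.
    """
    def cost(node: str) -> int:
        return sum(size * distance(src, node, rack_of)
                   for src, size in partition_sizes.items())

    return min(candidates, key=cost)
```

With nodes `a` and `b` on one rack, `c` on another, and partitions of size 100 on `a` and 10 on `c`, the costs are 40, 240, and 400, so the reduce task is placed on `a`, next to the largest partition.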
  • 23. COSHH: Considers heterogeneity at both the application and cluster levels. Main approach: use system information to make better scheduling decisions, which improves performance. It has two main processes. When a new job arrives (from a user), the queuing process stores the incoming job in an appropriate queue. When a heartbeat arrives (a free resource), it triggers the routing process, which assigns a job to the currently free resource.
  • 24. Self-Adaptive Scheduling Algorithm for Reduce Start Time (SARS): An optimal reduce scheduling policy for the start time of reduce tasks. It works by delaying the reduce processes, in order to shorten the copy duration of the reduce process, decrease task completion time, and save reduce slot resources. Limitation: it focuses only on the reduce process.
  • 25. Summary (scheduling algorithm: idea to implementation). FIFO: schedule jobs based on their priorities in first-come, first-served order. Fair Scheduling: distribute compute resources equally among the users/jobs in the system. Capacity: maximize resource utilization and throughput in a multi-tenant cluster environment. Hybrid scheduler based on dynamic priority: designed for data-intensive workloads; tries to maintain data locality during job execution. LATE: fault tolerance.
  • 26. Summary (scheduling algorithm: idea to implementation). SAMR: improve MapReduce by saving execution time and system resources. Delay scheduling: address the conflict between locality and fairness. Maestro: proposed for map tasks, to improve the overall performance of the MapReduce computation. CREST: re-execute a combination of tasks on a group of computing nodes. Context-aware scheduler: optimizations for jobs using the same dataset. LARTS: decrease network traffic.
  • 27. Summary (scheduling algorithm: idea to implementation). CoGRS: attempts to schedule every reduce task at its center-of-gravity node, determined by the network locations. MaRCO: achieves nearly full overlap via the novel idea of including the reduce in the overlap. COSHH: proposed to improve the mean completion time of jobs. SARS: shorten the copy duration of the reduce process, decrease task completion time, and save reduce slot resources.
  • 28. References 1. Varma, Rakesh. "Survey on MapReduce and Scheduling Algorithms in Hadoop." International Journal of Science and Research 4.2 (2015). 2. Zaharia, Matei, et al. "Job scheduling for multi-user mapreduce clusters." EECS Department, University of California, Berkeley, Tech. Rep. UCB/EECS-2009-55 (2009). 3. Tiwari, Nidhi, et al. "Classification framework of MapReduce scheduling algorithms." ACM Computing Surveys (CSUR) 47.3 (2015): 49. 4. Zaharia, Matei, et al. "Improving MapReduce Performance in Heterogeneous Environments." OSDI. Vol. 8. No. 4. 2008. 5. Kumar, K. Arun, et al. "CASH: context aware scheduler for Hadoop." Proceedings of the International Conference on Advances in Computing, Communications and Informatics. ACM, 2012. 6. Hammoud, Mohammad, M. Suhail Rehman, and Majd F. Sakr. "Center-of-gravity reduce task scheduling to lower mapreduce network traffic." Cloud Computing (CLOUD), 2012 IEEE 5th International Conference on. IEEE, 2012. 7. Rasooli, Aysan, and Douglas G. Down. "COSHH: A classification and optimization based scheduler for heterogeneous Hadoop systems." Future Generation Computer Systems 36 (2014): 1-15. 8. Lei, Lei, Tianyu Wo, and Chunming Hu. "CREST: Towards fast speculation of straggler tasks in MapReduce." e-Business Engineering (ICEBE), 2011 IEEE 8th International Conference on. IEEE, 2011.
  • 29. References 9. Zaharia, Matei, et al. "Delay scheduling: a simple technique for achieving locality and fairness in cluster scheduling." Proceedings of the 5th European conference on Computer systems. ACM, 2010. 10. Sun, Xiaoyu, Chen He, and Ying Lu. "ESAMR: an enhanced self-adaptive MapReduce scheduling algorithm." Parallel and Distributed Systems (ICPADS), 2012 IEEE 18th International Conference on. IEEE, 2012. 11. Nguyen, Phuong, et al. "A hybrid scheduling algorithm for data intensive workloads in a mapreduce environment." Proceedings of the 2012 IEEE/ACM Fifth International Conference on Utility and Cloud Computing. IEEE Computer Society, 2012. 12. Hammoud, Mohammad, and Majd F. Sakr. "Locality-aware reduce task scheduling for MapReduce." Cloud Computing Technology and Science (CloudCom), 2011 IEEE Third International Conference on. IEEE, 2011. 13. Ibrahim, Shadi, et al. "Maestro: Replica-aware map scheduling for mapreduce." Cluster, Cloud and Grid Computing (CCGrid), 2012 12th IEEE/ACM International Symposium on. IEEE, 2012. 14. Chen, Quan, et al. "SAMR: A self-adaptive mapreduce scheduling algorithm in heterogeneous environment." Computer and Information Technology (CIT), 2010 IEEE 10th International Conference on. IEEE, 2010. 15. Tang, Zhuo, et al. "A self-adaptive scheduling algorithm for reduce start time." Future Generation Computer Systems 43 (2015): 51-60.