Apache Hadoop India Summit 2011 talk "An Extension of Fairshare-Scheduler and a Novel SLA based Learning Scheduler in Hadoop" by G Sudha Sadhasivam and Priya N
An Extension of Fairshare Scheduler and a Novel SLA Based Learning Scheduler in Hadoop
By Dr G Sudha Sadhasivam (Professor) and Priya N (Student), PSG College of Technology, Coimbatore
Agenda
Introduction: metascheduler in the Fairshare scheduler
Features
Extended Fairscheduler architecture
Workflow
Experimental results
Learning scheduler with SLA
Design of proposed system
Workflow
Fairshare scheduler
Existing System:
Jobs in a pool are executed in a fair-share manner.
Proposed System:
Fair-share execution of jobs from a pool, with large jobs run first and small jobs backfilled.
Features
Jobs in pools
Guaranteed capacity
Minimum shares
Job limits
Job priorities
Pool weights
Architecture
[Diagram: Users 1 to 4 submit jobs to a pool; the Fairshare scheduler, extended with large-job-first and small-job backfilling, dispatches them to Nodes 1 to 4.]
User estimated time = (no. of maps × map time) + (no. of reduces × reduce time)
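The estimate above can be written as a one-line function. This is a minimal sketch: the function and parameter names are illustrative, and it assumes map time and reduce time are per-task durations in the same unit (seconds here), which the slide does not state.

```python
def user_estimated_time(num_maps, map_time, num_reduces, reduce_time):
    """User estimated time as given on the slide.

    map_time and reduce_time are assumed to be per-task durations
    in seconds (an assumption; the slide gives no units).
    """
    return num_maps * map_time + num_reduces * reduce_time


# Hypothetical job: 10 map tasks of 2 s each and 2 reduce tasks of 5 s each.
example_estimate = user_estimated_time(10, 2.0, 2, 5.0)
```

With these made-up numbers the estimate is 10 × 2 + 2 × 5 = 30 seconds.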
Workflow
[Flowchart boxes: fairscheduler.start(); get jobs in pool; create a list of jobs; get run state of job in progress; finished/running; remove from list; calculate no. of maps and reduces; find user estimated time = no. of maps × map time + no. of reduces × reduce time; categorize jobs as small and large; bring large job first and backfill small jobs; backfill if exe_time < delay; job finish time < user estimated time; update weight, task count, min. slots, runnability, fair share.]
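The categorize and backfill steps of the workflow can be sketched as follows. This is a simplified illustration, not the scheduler's actual code. The `Job` class, the `large_threshold` cut-off and the reading of `delay` as the idle window before the next large job can start are all assumptions, since the slides do not define them.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    est_time: float  # user estimated time in seconds


def plan_large_first_with_backfill(jobs, large_threshold, delay):
    """Split jobs into large, backfilled-small and deferred-small lists.

    large_threshold (hypothetical): jobs at or above it count as large.
    delay (assumed meaning): idle window in which a small job may be
    backfilled, following the slide's rule "backfill if exe_time < delay".
    """
    # Large jobs go first, longest estimated time first.
    large = sorted(
        (j for j in jobs if j.est_time >= large_threshold),
        key=lambda j: j.est_time,
        reverse=True,
    )
    small = [j for j in jobs if j.est_time < large_threshold]
    # A small job is backfilled only if it fits inside the delay window.
    backfilled = [j for j in small if j.est_time < delay]
    deferred = [j for j in small if j.est_time >= delay]
    return large, backfilled, deferred


# Hypothetical workload.
jobs = [Job("A", 500), Job("B", 20), Job("C", 300), Job("D", 80)]
large, backfilled, deferred = plan_large_first_with_backfill(
    jobs, large_threshold=100, delay=50
)
```

Here jobs A and C are treated as large and run first, B fits the 50-second window and is backfilled, and D waits because its estimate exceeds the window.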
Result (LFSB): different jobs [chart]
Result (LFSB): more small jobs [chart]
A Novel SLA Based Learning Scheduler
Schedulers in Hadoop
Hadoop on Demand: FIFO with Torque; no data locality.
Fairshare: shares resources fairly among jobs in pools; excess resources are shared between pools.
Capacity: fair sharing among organisations; inter-queue priority is maintained manually (not dynamic).
Dynamic priority scheduler: priority is adjusted dynamically according to the demand/budget of the user; smaller jobs get more priority; large jobs have to be broken up into smaller ones.
Patches
Security features to isolate users
Launching multiple tasks per heartbeat
Parallelise jobs and launch smaller jobs faster
Prevent oversubscribing nodes (only after job submission): RAM / HD
Task assignment to the right node.
No policies and limited user-level responsiveness.
Proposed System:
SLA: the user specifies requirements.
Jobs execute on the right node.
Classify jobs as I/O bound or CPU bound, then prioritise and assign them.
Proposed methodology
SLA: user details, job requirements and charge sheet.
Scheduler:
Classifies new jobs based on (SLA + job features) and node features.
Classification based on job trace history (learning).
Creation of queues for I/O jobs and CPU jobs.
Assignment to queues based on a utility function.
Gather all node details and check for SLA approval. If yes, allow the user to submit jobs.
[Diagram: the user submits an SLA (owner, description, user details and requirements); the learning scheduler dispatches jobs to Nodes 1 to 5.]
Workflow of Scheduler
[Flowchart boxes: node features; job features + SLA; job trace history; classifier; right node and job type; (MIS+MOS)/MTCT > avg. disk I/O rate; I/O or CPU; calculate and compare utility; change priority; I/O queue; CPU queue.]
CPU or I/O bound job
I/O rate: 10 Mbytes/sec
MTCT: 10 sec
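The test (MIS+MOS)/MTCT > avg. disk I/O rate can be sketched as a small function. The slides do not expand the abbreviations; this sketch assumes MIS and MOS are map input size and map output size in Mbytes, and MTCT is map task completion time in seconds. The function name and the job sizes are hypothetical. Only the 10 Mbytes/sec rate and the 10 sec MTCT come from the slide.

```python
def job_type(mis, mos, mtct, avg_disk_io_rate):
    """Return "io" if (MIS + MOS) / MTCT exceeds the average disk I/O rate,
    otherwise "cpu".

    Assumed units: mis and mos in Mbytes, mtct in seconds,
    avg_disk_io_rate in Mbytes/sec.
    """
    if mtct <= 0:
        raise ValueError("mtct must be positive")
    return "io" if (mis + mos) / mtct > avg_disk_io_rate else "cpu"


# Slide values: I/O rate 10 Mbytes/sec, MTCT 10 sec.
# The input/output sizes below are made up for illustration.
heavy = job_type(mis=80, mos=40, mtct=10, avg_disk_io_rate=10)  # 12 > 10
light = job_type(mis=40, mos=20, mtct=10, avg_disk_io_rate=10)  # 6 <= 10
```

Under these assumptions, a job with a 10 sec MTCT is classed as I/O bound only when its combined map input and output exceed 100 Mbytes.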
Find the right node for the job using a classifier (Naïve Bayes classifier).
Find the job type, whether I/O or CPU bound:
(MIS+MOS)/MTCT > avg. disk I/O rate
Calculate the utility function value.
Pass the job to the corresponding queue.
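The node-selection and queueing steps above can be sketched with a small hand-written Naïve Bayes classifier. This is an illustration under stated assumptions, not the authors' implementation: the features (job type, node load), the "good"/"bad" labels, the trace history and the node names are all hypothetical. The utility function is omitted because the slides do not define it.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Categorical Naive Bayes with Laplace (add-one) smoothing."""

    def __init__(self):
        self.label_counts = Counter()
        self.feature_counts = defaultdict(Counter)  # (index, label) -> value counts
        self.values = defaultdict(set)              # index -> values seen

    def fit(self, rows, labels):
        for row, label in zip(rows, labels):
            self.label_counts[label] += 1
            for i, value in enumerate(row):
                self.feature_counts[(i, label)][value] += 1
                self.values[i].add(value)
        return self

    def _joint(self, row, label):
        # P(label) * product of smoothed P(feature value | label).
        n_label = self.label_counts[label]
        p = n_label / sum(self.label_counts.values())
        for i, value in enumerate(row):
            count = self.feature_counts[(i, label)][value]
            p *= (count + 1) / (n_label + len(self.values[i]))
        return p

    def prob(self, row, label):
        # Posterior P(label | row), normalised over all labels.
        total = sum(self._joint(row, l) for l in self.label_counts)
        return self._joint(row, label) / total

    def predict(self, row):
        return max(self.label_counts, key=lambda l: self.prob(row, l))


# Hypothetical job trace history: (job type, node load) -> outcome.
history = [
    (("io", "low"), "good"), (("io", "high"), "bad"),
    (("cpu", "low"), "good"), (("cpu", "high"), "bad"),
    (("io", "low"), "good"), (("cpu", "high"), "bad"),
]
model = NaiveBayes().fit([r for r, _ in history], [l for _, l in history])


def choose_node(model, kind, node_loads):
    """Pick the node with the highest predicted probability of a good run."""
    return max(node_loads, key=lambda n: model.prob((kind, node_loads[n]), "good"))


# Hypothetical cluster state and one incoming I/O bound job.
node_loads = {"node1": "high", "node2": "low"}
queues = {"io": [], "cpu": []}
chosen = choose_node(model, "io", node_loads)
queues["io"].append(("job-1", chosen))
```

With this made-up history the lightly loaded node is preferred, and the job is placed in the I/O queue together with its chosen node.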
The Fair scheduler with backfilling reduces the waiting time for large jobs. It follows a “no starvation” policy and improves response time.
The SLA based scheduler offers better user-level responsiveness and better utilization of resources.