Apache Hadoop India Summit 2011 talk "An Extension of Fairshare-Scheduler and a Novel SLA based Learning Scheduler in Hadoop" by G Sudha Sadhasivam and Priya N

Transcript

  • 1. AN EXTENSION OF FAIRSHARESCHEDULER AND A NOVEL SLA BASED LEARNING SCHEDULER IN HADOOP
    BY
    Dr G SUDHA SADHASIVAM
    PROFESSOR
    &
    PRIYA N
    STUDENT
    PSG COLLEGE OF TECHNOLOGY, COIMBATORE
  • 2. Agenda
    Introduction
    - Metascheduler in Fairsharescheduler.
    Features.
    Extended Fairscheduler Architecture.
    Work Flow.
    Experimental results.
    Learning Scheduler with SLA.
    Design of Proposed System.
    Work Flow
  • 3. Fairshare scheduler
    Existing system:
    • Jobs in a pool are executed in a fair-share manner.
    Proposed system:
    • Fair-share execution of jobs from a pool, with large jobs scheduled first and small jobs backfilled.
  • Features
    Jobs in pools
    Guaranteed capacity
    Minimum Shares
    Job Limits
    Job Priorities
    Pool Weights
  • 4. ARCHITECTURE
    Node 1
    USER 1
    Node 2
    Pool
    USER 2
    FAIRSHARE SCHEDULER
    Node 3
    USER 3
    LARGE JOB FIRST+ SMALL JOB BACKFILLING
    Node 4
    USER 4
  • 5. Calculate
    • User estimated time = (no. of maps * map time) + (no. of reduces * reduce time)
    Update
    • Runnability
    • 6. Task count = total_tasks - running_tasks - finished_tasks + needed_tasks_for_job
    • 7. Weight = weight * priority factor
    • 8. Fair share = (weight * old slots) / total weight
    • 9. Deficit (MR_Deficit) = (fair share - running) * time delta
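
The quantities on this slide can be written as small functions. This is a minimal sketch, not the scheduler's actual code: the `JobStats` class, its field names and the sample numbers are all assumptions made for illustration, and only the arithmetic comes from the slide.

```python
from dataclasses import dataclass


@dataclass
class JobStats:
    """Per-job counters. Field names are illustrative, not Hadoop's own."""
    num_maps: int
    num_reduces: int
    map_time: float        # estimated seconds per map task
    reduce_time: float     # estimated seconds per reduce task
    total_tasks: int
    running_tasks: int
    finished_tasks: int
    needed_tasks: int
    weight: float
    priority_factor: float


def user_estimated_time(job):
    # (no. of maps * map time) + (no. of reduces * reduce time)
    return job.num_maps * job.map_time + job.num_reduces * job.reduce_time


def task_count(job):
    # total - running - finished + needed
    return job.total_tasks - job.running_tasks - job.finished_tasks + job.needed_tasks


def adjusted_weight(job):
    # weight * priority factor
    return job.weight * job.priority_factor


def fair_share(weight, old_slots, total_weight):
    # (weight * old slots) / total weight
    return weight * old_slots / total_weight


def deficit(fair, running, time_delta):
    # (fair share - running) * time delta
    return (fair - running) * time_delta


# Hypothetical job used only to exercise the formulas.
job = JobStats(num_maps=6, num_reduces=2, map_time=10.0, reduce_time=20.0,
               total_tasks=8, running_tasks=2, finished_tasks=1, needed_tasks=0,
               weight=1.0, priority_factor=2.0)
```

A positive deficit means the job ran fewer tasks than its fair share over the interval, so a deficit-based policy would favour it next.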
  • WORKFLOW
    Calculate no. of maps and reduces
    Find user estimated time = (no. of maps * map time) + (no. of reduces * reduce time)
    Create a list of jobs
    Get Jobs in pool
    Finished/running
    fairscheduler.start()
    Get runstate of job in progress
    Remove from list
    Categorize jobs as small and large
    Update: weight, task count, min. slots, runnability, fair share
    Job finish time < user estimated time
    Bring large job first and backfill small jobs
    Backfill if exe_time < delay
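
The last two workflow steps can be sketched as a simple ordering rule. The slides do not define how jobs are split into small and large or what `delay` measures, so this sketch assumes a size `threshold` on the estimated execution time and treats `delay` as the time the first large job must wait for free slots. The function name and sample jobs are hypothetical.

```python
def plan_order(jobs, threshold, delay):
    """Return job names in launch order.

    jobs:      list of (name, exe_time) pairs
    threshold: exe_time at or above which a job counts as large (assumed)
    delay:     time the first large job waits for free slots (assumed)
    """
    # Large jobs go first, longest first.
    large = sorted((j for j in jobs if j[1] >= threshold), key=lambda j: -j[1])
    # Small jobs are considered shortest first.
    small = sorted((j for j in jobs if j[1] < threshold), key=lambda j: j[1])

    # "Backfill if exe_time < delay": such jobs can run in the gap
    # without holding up the waiting large job.
    backfilled = [j for j in small if j[1] < delay]
    remaining = [j for j in small if j[1] >= delay]

    return [name for name, _ in backfilled + large + remaining]


jobs = [("a", 5), ("b", 120), ("c", 30), ("d", 300)]
order = plan_order(jobs, threshold=100, delay=10)
```

Here job `a` fits in the gap and is backfilled, the large jobs `d` and `b` follow, and `c` waits because it would outlast the delay. The check is applied per job, as on the slide; a fuller version would also track how much of the gap earlier backfilled jobs have used.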
  • 10.
  • 11. RESULT (LFSB): Different Jobs
  • 12. More small jobs
  • 13. A Novel SLA-Based Learning Scheduler
  • 14. Schedulers in Hadoop
    Hadoop on Demand
    • FIFO with Torque
    • No data locality
    Fairshare
    • Fair-shares resources among jobs in pools
    • Excess resources are shared between pools
    Capacity
    • Fair sharing among organisations
    • Inter-queue priority is maintained manually (not dynamic)
    Dynamic priority scheduler
    • Priority can be adjusted dynamically
    • Demand / budget of the user
    • More priority for smaller jobs
    • Large jobs have to be broken up into smaller ones
  • 15. PATCHES
    Security features to isolate users
    Launching multiple tasks per heartbeat
    Parallelise jobs and launch smaller jobs faster
    Prevent oversubscribing nodes (only after job submission) – RAM / HD
  • 16.
    • Existing System:-
    • 17. Task assignment right node.
    • 18. No policies and less user level response.
    • 19. Proposed system:
    • 20. SLA: the user specifies requirements.
    • 21. Job executes at the right node.
    • 22. Classify jobs as I/O bound or CPU bound, then prioritise and assign them.
  • Proposed methodology
    SLA – user details, job requirements and charge sheet.
    Scheduler:
    • Classifies new jobs based on (SLA + job features) and node features.
    • 23. Classification based on job trace history (learning).
    • 24. Creation of queues for I/O and CPU jobs.
    • 25. Assignment to queues based on a utility function.
  • Gather all node details and check for SLA approval. If approved, allow the user to submit jobs.
    Owner, description, user details and requirements
    Node 1
    Node 2
    SLA
    USER
    Node 3
    LEARNING SCHEDULER
    Node 4
    Node 5
  • 26.
  • 27.
  • 28.
  • 29. Workflow of Scheduler
    Node features
    CLASSIFIER
    Job Features+SLA
    (MIS + MOS) / MTCT > avg. disk I/O rate
    Job traces history
    RIGHT NODE & job type
    Calculate & compare utility
    Change priority
    I/O or CPU
    I/O queue
    CPU queue
  • 30. Example
    Node Feature value
  • 31. Job Submitted (Job Features)
    Job 1: RAM = 400 MB, HD = 100 GB, M = 6, R = 2
    Job 2: RAM = 500 MB, HD = 120 GB, M = 8, R = 0
    P(node) = {no. of job features + no. of node features * (P(F1) + P(F2) + … + P(Fn))} / total features
    P(J1M1) = 1, P(J1M2) = 0.875, P(J1M3) = 0.8, P(J1M4) = 1, P(J1M5) = 1, P(J1M6) = 0.625
    P(J2M1) = 1, P(J2M2) = 0.857, P(J2M3) = 0.857, P(J2M4) = 0.514, P(J2M5) = 0.857, P(J2M6) = 0.514
    JOB 1 = M1, M4, M5. M4 satisfies.
    JOB 2 = M1.
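
The node lists for the two jobs match a rule that keeps every node with the highest score. The sketch below applies that rule to the probabilities from the slide, taking the second row as Job 2's scores. The slide does not say why M4 is then chosen for Job 1, so the sketch stops at the candidate list. The dictionary layout and the function name are assumptions.

```python
# Scores copied from the slide: one row per job, one entry per node.
probabilities = {
    "J1": {"M1": 1.0, "M2": 0.875, "M3": 0.8, "M4": 1.0, "M5": 1.0, "M6": 0.625},
    "J2": {"M1": 1.0, "M2": 0.857, "M3": 0.857, "M4": 0.514, "M5": 0.857, "M6": 0.514},
}


def candidate_nodes(scores):
    """Return every node that shares the highest score, in listing order."""
    best = max(scores.values())
    return [node for node, p in scores.items() if p == best]


job1_candidates = candidate_nodes(probabilities["J1"])
job2_candidates = candidate_nodes(probabilities["J2"])
```

Job 1 ends up with three tied candidates and Job 2 with one, so the final choice for Job 1 must come from some further criterion.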
  • 32. CPU or I/O bound JOB
    I/O rate: 10 MB/s
    MTCT: 10 s
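
The I/O-versus-CPU test used in the deck, (MIS + MOS) / MTCT > average disk I/O rate, can be sketched as follows. The slides do not expand the abbreviations. The sketch reads MIS, MOS and MTCT as map input size, map output size and map task completion time, which is an assumption. The function names and the sample data sizes are hypothetical; the 10 MB/s rate and 10 s MTCT are the values from this slide.

```python
def is_io_bound(mis_mb, mos_mb, mtct_sec, avg_disk_io_rate_mb_s):
    """Apply (MIS + MOS) / MTCT > avg. disk I/O rate.

    mis_mb, mos_mb: data sizes in MB (assumed meaning: map input / output)
    mtct_sec:       task completion time in seconds (assumed meaning)
    """
    if mtct_sec <= 0:
        raise ValueError("MTCT must be positive")
    return (mis_mb + mos_mb) / mtct_sec > avg_disk_io_rate_mb_s


def queue_for(mis_mb, mos_mb, mtct_sec, avg_disk_io_rate_mb_s):
    """Name the queue a job would be routed to under this test."""
    if is_io_bound(mis_mb, mos_mb, mtct_sec, avg_disk_io_rate_mb_s):
        return "io"
    return "cpu"


# Slide values: I/O rate 10 MB/s, MTCT 10 s. Data sizes are made up.
heavy = queue_for(mis_mb=64, mos_mb=64, mtct_sec=10, avg_disk_io_rate_mb_s=10)
light = queue_for(mis_mb=64, mos_mb=16, mtct_sec=10, avg_disk_io_rate_mb_s=10)
```

With these slide values, a job crosses into the I/O queue once it moves more than 100 MB within its 10-second task time.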
  • 33. Scheduler
    • Find the right node for the job using a classifier: Naïve Bayes classifier.
    • Find the job type, whether I/O or CPU bound: (MIS + MOS) / MTCT > avg. disk I/O rate
    • Calculate the utility function value: FIFO, Deficit, SJF.
    • Pass the jobs to the queue.
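
The slide names FIFO, Deficit and SJF as utility functions but does not give their formulas. One plausible reading, sketched below purely as an assumption, is that each policy maps a job to a score and the queue is ordered by that score, highest first. The job fields, the score definitions and the sample values are all hypothetical.

```python
# Each policy maps a job to a utility score; a higher score runs earlier.
UTILITY = {
    "FIFO": lambda job: -job["submit_time"],      # earliest submission first
    "SJF": lambda job: -job["estimated_time"],    # shortest job first
    "Deficit": lambda job: job["deficit"],        # most under-served first
}


def order_queue(jobs, policy):
    """Return the jobs sorted by the chosen utility, highest first."""
    return sorted(jobs, key=UTILITY[policy], reverse=True)


# Made-up jobs to show how the three policies differ.
queue = [
    {"name": "a", "submit_time": 0, "estimated_time": 50, "deficit": 1.0},
    {"name": "b", "submit_time": 5, "estimated_time": 10, "deficit": 4.0},
    {"name": "c", "submit_time": 9, "estimated_time": 30, "deficit": 5.0},
]

fifo = [j["name"] for j in order_queue(queue, "FIFO")]
sjf = [j["name"] for j in order_queue(queue, "SJF")]
by_deficit = [j["name"] for j in order_queue(queue, "Deficit")]
```

Keeping the policies in one table means a different utility can be compared or swapped in without changing the queueing code.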
  • Advantages
    • Fair scheduler with backfilling reduces waiting time for large jobs. It adopts a “no starvation” policy and improves response time.
    • 34. The SLA-based scheduler improves user-level responsiveness and resource utilization.
  • References
    • Saeed Iqbal, Rinku Gupta, Yung-Chin Fang. “Job Scheduling in HPC Clusters”. Dell Power Solutions, 2005.
    • 35. Juan Wang, Wenming Guo. “The Application of Backfilling in Cluster Systems”. 2009 IEEE International Conference on Communication and Mobile Computing.
    • 36. Jaideep Dhok and Vasudeva Varma. “Using Pattern Classification for Task Assignment in Map Reduce”. 10th IEEE/ACM International Conference CCGrid 2010.
    • 37. Amy W. Apon, Thomas D. Wagner, and Lawrence Dowdy. “A learning approach to processor allocation in parallel systems”. In CIKM ’99: Proceedings of the Eighth International Conference on Information and Knowledge Management, pages 531–537, New York, NY, USA, 1999.
    • 38. Harry Zhang. “The Optimality of Naive Bayes”. In Valerie Barr and Zdravko Markov, editors, FLAIRS Conference. AAAI Press, 2004.
  • THANK YOU