Economic Scheduling of Hadoop Jobs

A presentation on The Dynamic Priority MapReduce Scheduler by Thomas Sandholm and Kevin Lai of HP Labs, Palo Alto.

This scheduler is a contribution to Hadoop 0.21+

Published in: Technology

  1. Economic Scheduling of Hadoop Jobs: The Dynamic Priority MapReduce Scheduler
     Thomas Sandholm, HP Labs, Palo Alto
     Kevin Lai, HP Labs, Palo Alto
  2. The Problem
     » Allocate slots on compute nodes for job tasks
     » Classic approach: throughput optimization
     » Cross-user priorities inferred from heuristics
     » Social scheduling
     » Our approach: user value optimization
     » Users are given an incentive to scale up or down
     » Automate demand conflict resolution
  3. Other Hadoop Schedulers
     » FIFO
     » HOD
     » Fairshare
     » Capacity
     » Designed for no queues or a few static, fixed QoS queues
     » Work well in corporate clusters
  4. Dynamic Priority Scheduler Requirements
     » Users may come and go frequently
     » Users may be unknown to providers
     » Users may want to schedule jobs across data centers and Hadoop installations
     Manual, social scheduling of users (assumed to be cooperating) breaks down
  5. Architecture (diagram; includes the worked example 4/(4+1.5+2)*15 = 8)
  6. Our Solution: Automated Resource Allocation (diagram; labels: budget remaining, spending rate, share, running tasks, pending tasks)
  7. Proportional-Share Scheduling
     » $q_i = b_i / (b_i + p)$
     » $p = \sum_{j \neq i} b_j$ (written $\sum b_{-i}$ on the slide)
     » Huberman et al., Spawn '92
     » Waldspurger et al., Lottery Scheduling '95
     » Lai et al., Tycoon '05
  8. Key Design Principles
     » Pay-per-use: the spending rate is deducted from the budget only if a job performed work
     » Work-conserving: users are never charged more than their spending rates but can get more slots if other users are idle
     » Preemptive: higher-spending users may cause tasks from lower-spending users to be killed
     » Scalable: no memory or history-based fair-share smoothing
  9. Implementation
     » Standalone Hadoop MapReduce JobTracker scheduler plugin
     » HTTP/XML/REST servlet providing secure management and monitoring of queues
     » Generic queue allocation/accounting classes (could move into mapred core)
     » Pluggable scheduler that enforces shares when scheduling jobs (could be replaced by capacity/fairshare enforcers)
  10. Configuration Option Examples
     mapred.jobtracker.taskScheduler          org.apache.hadoop.mapred.DynamicPriorityScheduler
     mapred.priority-scheduler.kill-interval  0
     mapred.dynamic-scheduler.alloc-interval  20
     mapred.dynamic-scheduler.budget-file     /etc/hadoop.budget
     mapred.priority-scheduler.acl-file       /etc/hadoop.acl
  11. Experiment
     » Fairshare vs Capacity vs FIFO vs DP
     » 2-80 simulated users/queues
     » 2 clusters
     » PiEstimator
     » Simulation
  12. Budget Dynamics (plots; panels: DynPrio preempt, DynPrio no preempt, FIFO scheduler, Capacity scheduler; annotations: funding runs out, budget replenished)
  13. Service Differentiation (plots: DynPrio, FIFO)
  14. Dynamic Adjustment
  15. More info
     » Papers
       › SIGMETRICS 2009
       › Workshop on Job Scheduling for Parallel Processing (JSSPP'10)
       › International Conference on Cloud Computing and Virtualization (CCV'10)
     » HADOOP-4768 JIRA
     » Source: http://svn.apache.org/viewvc/hadoop/mapreduce/trunk/src/contrib/dynamic-scheduler/
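
The proportional-share rule on slide 7 and the arithmetic on slide 5 can be sketched in a few lines of Python. This is a minimal illustration, not the scheduler's actual code. The function names and user names are hypothetical, and reading slide 5 as three users with spending rates 4, 1.5 and 2 competing for 15 slots is an interpretation of the diagram. The pay-per-use step assumes one deduction of the spending rate per allocation interval, which the slides do not spell out.

```python
def proportional_shares(spending_rates):
    """Share q_i = b_i / (b_i + p), where p is the sum of the other users' rates.

    Because b_i + p is the total of all rates, each share is rate / total.
    """
    total = sum(spending_rates.values())
    if total == 0:
        # Nobody is bidding, so nobody holds a guaranteed share.
        return {user: 0.0 for user in spending_rates}
    return {user: rate / total for user, rate in spending_rates.items()}


def allocate_slots(spending_rates, total_slots):
    """Split total_slots among users in proportion to their spending rates."""
    shares = proportional_shares(spending_rates)
    return {user: share * total_slots for user, share in shares.items()}


def charge_for_interval(budgets, spending_rates, users_that_worked):
    """Pay-per-use (assumed form): deduct one spending rate per interval,
    but only from users whose jobs performed work; budgets never go below zero.
    """
    updated = {}
    for user, budget in budgets.items():
        if user in users_that_worked:
            rate = spending_rates.get(user, 0.0)
            updated[user] = max(budget - rate, 0.0)
        else:
            updated[user] = budget
    return updated


if __name__ == "__main__":
    # Hypothetical users; the rates and slot count are taken from slide 5.
    rates = {"alice": 4.0, "bob": 1.5, "carol": 2.0}
    print(allocate_slots(rates, 15))
    print(charge_for_interval({"alice": 10.0, "bob": 1.0}, rates, {"alice"}))
```

With these inputs the first user receives about 8 of the 15 slots, which agrees with the figure on slide 5, and only the user whose job ran is charged.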
