Economic Scheduling of Hadoop Jobs

A presentation on the Dynamic Priority MapReduce Scheduler by Thomas Sandholm and Kevin Lai of HP Labs, Palo Alto.

This scheduler is a contribution to Hadoop 0.21 and later.

  1. Economic Scheduling of Hadoop Jobs: The Dynamic Priority MapReduce Scheduler
     Thomas Sandholm, HP Labs, Palo Alto
     Kevin Lai, HP Labs, Palo Alto
  2. The Problem
     » Allocate slots on compute nodes for job tasks
     » Classic approach: throughput optimization
     » Cross-user priorities inferred from heuristics
     » Social scheduling
     » Our approach: user value optimization
     » Users are given an incentive to scale up or down
     » Automate demand conflict resolution
  3. Other Hadoop Schedulers
     » FIFO
     » HOD
     » Fairshare
     » Capacity
     » Designed for no queues or a few static, fixed-QoS queues
     » Work well in corporate clusters
  4. Dynamic Priority Scheduler Requirements
     » Users may come and go frequently
     » Users may be unknown to providers
     » Users may want to schedule jobs across data centers and Hadoop installations
     Manual, social scheduling of users (assumed to be cooperating) breaks down.
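  5. Architecture
     [Diagram. Its worked example, 4/(4 + 1.5 + 2) × 15 = 8, presumably shows a user spending 4 out of a combined spending rate of 7.5 receiving 8 of 15 slots.]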
  6. Our Solution: Automated Resource Allocation
     [Diagram labels: Budget, Remaining, Spending Rate, Share, Running Tasks, Pending Tasks]
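  7. Proportional-Share Scheduling
     » $q_i = b_i/(b_i + p)$: the share $q_i$ that user $i$ receives for spending rate $b_i$
     » $p = \sum_{j \neq i} b_j$: the combined spending rate of all other users
     » Huberman et al., Spawn (’92)
     » Waldspurger et al., Lottery Scheduling (’95)
     » Lai et al., Tycoon (’05)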
  8. Key Design Principles
     » Pay-per-use: the spending rate is deducted from the budget only if a job performed work
     » Work-conserving: users are never charged more than their spending rates, but can get more slots if other users are idle
     » Preemptive: higher-spending users may cause tasks from lower-spending users to be killed
     » Scalable: no memory or history-based fair-share smoothing
  9. Implementation
     » Standalone Hadoop MapReduce JobTracker scheduler plugin
     » HTTP/XML/REST servlet providing secure management and monitoring of queues
     » Generic queue allocation/accounting classes (could move into mapred core)
     » Pluggable scheduler enforcing shares when scheduling jobs (could be replaced by Capacity/Fairshare enforcers)
  10. Configuration Option Examples
     » mapred.jobtracker.taskScheduler: org.apache.hadoop.mapred.DynamicPriorityScheduler
     » mapred.priority-scheduler.kill-interval: 0
     » mapred.dynamic-scheduler.alloc-interval: 20
     » mapred.dynamic-scheduler.budget-file: /etc/hadoop.budget
     » mapred.priority-scheduler.acl-file: /etc/hadoop.acl
  11. Experiment
     » Fairshare vs. Capacity vs. FIFO vs. DP (Dynamic Priority)
     » 2 to 80 simulated users/queues
     » 2 clusters
     » PiEstimator
     » Simulation
  12. Budget Dynamics
     [Plot comparing DynPrio with preemption, DynPrio without preemption, the FIFO scheduler, and the Capacity scheduler; annotations mark where funding runs out and where the budget is replenished.]
  13. Service Differentiation
     [Plot comparing DynPrio and FIFO.]
  14. Dynamic Adjustment
  15. More Info
     » Papers
       › SIGMETRICS 2009
       › Workshop on Job Scheduling Strategies for Parallel Processing (JSSPP’10)
       › International Conference on Cloud Computing and Virtualization (CCV’10)
     » HADOOP-4768 (JIRA)
     » Source: http://svn.apache.org/viewvc/hadoop/mapreduce/trunk/src/contrib/dynamic-scheduler/
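The proportional-share rule on slide 7, applied to the worked example on slide 5, can be sketched in a few lines of Python. This is a minimal illustration, not the scheduler's code: it assumes each queue's spending rate is $b_i$, that the cluster has a fixed number of slots, and that fractional slots are acceptable. The function and queue names are hypothetical.

```python
from __future__ import annotations


def proportional_shares(rates: dict[str, float]) -> dict[str, float]:
    """Return q_i = b_i / (b_i + p) for every queue i, where p is the sum of the other queues' rates."""
    total = sum(rates.values())
    shares: dict[str, float] = {}
    for queue, b_i in rates.items():
        p = total - b_i  # combined spending rate of all other queues
        # Guard against an idle cluster where nobody is spending anything.
        shares[queue] = b_i / (b_i + p) if b_i + p > 0 else 0.0
    return shares


def slot_allocation(rates: dict[str, float], total_slots: int) -> dict[str, float]:
    """Scale each share by the number of slots in the cluster (fractional slots allowed)."""
    return {queue: q_i * total_slots for queue, q_i in proportional_shares(rates).items()}


# Slide 5's example: spending rates 4, 1.5 and 2 competing for 15 slots.
allocation = slot_allocation({"alice": 4.0, "bob": 1.5, "carol": 2.0}, 15)
print(allocation)
```

Because $b_i + p$ is simply the sum of all spending rates, each share is a queue's fraction of total spending; with the slide-5 numbers the first queue ends up with 8 of the 15 slots (up to floating-point rounding), and the others with 3 and 4.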
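To make the design principles on slide 8 more concrete, here is a hypothetical sketch of a single allocation interval. It rests on a simplified model that is an assumption, not taken from the Hadoop source: each queue has a budget, a spending rate, and counts of running and pending tasks; a queue is charged its full spending rate for any interval in which it receives slots; slots are fractional; and preemption targets running tasks beyond a queue's new allocation. The class and function names are illustrative only.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class QueueState:
    """Hypothetical per-queue state for one allocation interval."""
    name: str
    budget: float         # funds left in the queue's account
    spending_rate: float  # amount the queue bids (and may be charged) per interval
    running: int          # tasks currently holding slots
    pending: int          # tasks waiting for a slot

    @property
    def demand(self) -> int:
        return self.running + self.pending


def allocate(queues: list[QueueState], total_slots: float) -> dict[str, float]:
    """Proportional share, made work-conserving by handing unused share to queues with unmet demand."""
    alloc = {q.name: 0.0 for q in queues}
    # Only queues that can afford this interval and have tasks to run compete.
    active = [q for q in queues if q.budget >= q.spending_rate > 0 and q.demand > 0]
    remaining = float(total_slots)
    while active and remaining > 1e-9:
        total_rate = sum(q.spending_rate for q in active)
        handed_out = 0.0
        still_hungry = []
        for q in active:
            offer = remaining * q.spending_rate / total_rate  # proportional share of what is left
            unmet = q.demand - alloc[q.name]
            grant = min(offer, unmet)  # never give a queue more slots than it can use
            alloc[q.name] += grant
            handed_out += grant
            if grant < unmet:
                still_hungry.append(q)
        remaining -= handed_out
        active = still_hungry  # capped queues drop out; the rest split any leftover slots
    return alloc


def charge(queues: list[QueueState], alloc: dict[str, float]) -> None:
    """Pay-per-use: deduct the spending rate only from queues that actually received slots."""
    for q in queues:
        if alloc[q.name] > 0:
            q.budget -= q.spending_rate


def tasks_to_preempt(queues: list[QueueState], alloc: dict[str, float]) -> dict[str, int]:
    """Preemption: running tasks a queue holds beyond its (rounded-down) new allocation."""
    return {q.name: max(0, q.running - math.floor(alloc[q.name] + 1e-9)) for q in queues}


queues = [
    QueueState("alice", budget=100.0, spending_rate=4.0, running=0, pending=20),
    QueueState("bob", budget=100.0, spending_rate=1.5, running=0, pending=1),
    QueueState("carol", budget=100.0, spending_rate=2.0, running=6, pending=14),
    QueueState("dave", budget=100.0, spending_rate=3.0, running=0, pending=0),
]
alloc = allocate(queues, 15)
charge(queues, alloc)
print(alloc, tasks_to_preempt(queues, alloc))
```

With the slide-5 spending rates, the second queue can use only one slot, so the first and third split the leftover slots 4:2 and end up with roughly 9.33 and 4.67 slots instead of 8 and 4; that is the work-conserving behaviour. The queue with no tasks receives nothing and is not charged, and the third queue, holding 6 running tasks against a share of about 4.67, would lose 2 of them to preemption.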
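Hadoop reads options like those on slide 10 from its XML configuration files (for MapReduce in this era, typically mapred-site.xml). The helper below is a hypothetical convenience that renders the slide-10 examples in Hadoop's configuration/property/name/value layout. The property names and values are copied verbatim from the slide; whether they match a particular release should be checked against the dynamic-scheduler source listed on slide 15.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# Property names and values copied verbatim from slide 10.
DYNAMIC_PRIORITY_SETTINGS = {
    "mapred.jobtracker.taskScheduler": "org.apache.hadoop.mapred.DynamicPriorityScheduler",
    "mapred.priority-scheduler.kill-interval": "0",
    "mapred.dynamic-scheduler.alloc-interval": "20",
    "mapred.dynamic-scheduler.budget-file": "/etc/hadoop.budget",
    "mapred.priority-scheduler.acl-file": "/etc/hadoop.acl",
}


def to_hadoop_configuration(settings: dict[str, str]) -> str:
    """Render name/value pairs as a Hadoop <configuration> document with one <property> each."""
    root = ET.Element("configuration")
    for name, value in settings.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


print(to_hadoop_configuration(DYNAMIC_PRIORITY_SETTINGS))
```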
