Job Scheduling in Hadoop
an exposé

Joydeep Sen Sarma
Talk at Flipkart's SlashN conference, 2014
About Me
c 2007

Facebook: Ran/Managed Hadoop ~ 3 years
Wrote Hive
Mentor/PM Hadoop Fair-Scheduler
Used Hadoop/Hive (as Warehouse/ETL dev)
Re-wrote significant chunks of Hadoop job scheduling (incl. Corona)

c 2014
Qubole: Running the world's largest Hadoop clusters on AWS
The Crime
Shared Hadoop Clusters

Statistical Multiplexing
Largest jobs only fit on pooled hardware
Data Locality
Easier to manage
… and the Punishment
• “Have you no Hadoop Etiquettes?” (c 2007)
(reducer count capped in response)

• User takes down entire cluster (OOM) (c 2007-09)
• Bad job slows down entire cluster (c 2009)
• Steady-state latencies become intolerable (c 2010-)
• “How do I know I am getting my fair share?” (c 2011)
• “Too few reducer slots, cluster idle” (c 2013)
The Perfect Weapon
Scheduler

• Efficient
• Scalable

• Strong Isolation
• Fair
• Fault Tolerant
• Low Latency
Quick Review
• Fair Scheduler (Fairness/Isolation)
• Speculation (Fault Tolerance/Latency)
• Preemption (Fairness)
• Usage Monitoring/Limits (Isolation)
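
As a concrete (and simplified) illustration of the fairness piece, here is a minimal Java sketch of weighted fair sharing across pools. The Pool class, pool names, weights and demands are all made up for illustration; this is not the Hadoop Fair Scheduler's actual code.

import java.util.*;

// Toy weighted fair sharing: repeatedly hand the next slot to the pool that is
// furthest below its weighted share and still has unmet demand.
public class FairShareSketch {

    static final class Pool {
        final String name;
        final double weight;   // relative priority of the pool
        final int demand;      // runnable tasks the pool could use right now
        int allocated = 0;

        Pool(String name, double weight, int demand) {
            this.name = name; this.weight = weight; this.demand = demand;
        }
    }

    static void allocate(List<Pool> pools, int totalSlots) {
        for (int slot = 0; slot < totalSlots; slot++) {
            Pool best = null;
            for (Pool p : pools) {
                if (p.allocated >= p.demand) continue;          // demand satisfied
                if (best == null
                        || p.allocated / p.weight < best.allocated / best.weight) {
                    best = p;
                }
            }
            if (best == null) break;                            // cluster has spare slots
            best.allocated++;
        }
    }

    public static void main(String[] args) {
        // Hypothetical pools: names, weights and demands are invented.
        List<Pool> pools = List.of(
                new Pool("adhoc", 1.0, 40),
                new Pool("etl",   2.0, 100),
                new Pool("hive",  1.0, 10));
        allocate(pools, 100);
        pools.forEach(p -> System.out.println(p.name + " -> " + p.allocated));
    }
}

With 100 slots the small "hive" pool is fully satisfied and the rest split roughly 1:2 between "adhoc" and "etl", which is the behaviour a fair scheduler with weights aims for.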
And then there’s Hadoop (1.x) …
• Single JobTracker for all Jobs
– Does not scale, SPOF

• Pull Based Architecture
– Scalability and low latency at permanent war
– Inefficient: leaves idle time

• Slot Based Scheduling
– Inefficient

• Pessimistic Locking in the JobTracker
– Scalability bottleneck

• Long Running Tasks
– Fairness and efficiency at permanent war
Poll Driven Scheduling
[Diagram: a Hive query (insert overwrite table dest select … from ads join campaigns on … group by …;) is compiled into Map and Reduce tasks; the JobTracker (master) hands tasks to each TaskTracker (slave) only in response to its periodic heartbeat, and the TaskTracker runs them in child JVMs.]
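
The diagram describes pull-based scheduling: the TaskTracker reports its free slots in a heartbeat and the JobTracker answers with tasks. A minimal sketch of that loop follows, using invented class and method names rather than the real Hadoop APIs.

import java.util.*;

// Sketch of pull (heartbeat) based scheduling: the worker asks, the master answers.
// Latency and scalability fight each other through the heartbeat interval.
public class HeartbeatSketch {

    record Task(String jobId, int taskId) {}

    static class JobTracker {
        private final Deque<Task> pending = new ArrayDeque<>();

        synchronized void submit(Task t) { pending.add(t); }

        // Called once per heartbeat; hands back at most `freeSlots` tasks.
        synchronized List<Task> heartbeat(String trackerName, int freeSlots) {
            List<Task> assigned = new ArrayList<>();
            while (assigned.size() < freeSlots && !pending.isEmpty()) {
                assigned.add(pending.poll());
            }
            return assigned;
        }
    }

    static class TaskTracker {
        private final String name;
        private int freeSlots;
        TaskTracker(String name, int slots) { this.name = name; this.freeSlots = slots; }

        // A real TaskTracker sleeps for a heartbeat interval (seconds) between
        // polls; that gap is exactly where idle time creeps in.
        void heartbeatOnce(JobTracker jt) {
            List<Task> tasks = jt.heartbeat(name, freeSlots);
            freeSlots -= tasks.size();
            tasks.forEach(t -> System.out.println(name + " launching " + t));
        }
    }

    public static void main(String[] args) {
        JobTracker jt = new JobTracker();
        for (int i = 0; i < 5; i++) jt.submit(new Task("job_1", i));
        TaskTracker tt = new TaskTracker("tt-01", 2);
        tt.heartbeatOnce(jt);   // picks up 2 tasks; the rest wait for the next poll
    }
}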
Pessimistic Locking
getBestTask():
  for pool: sortedPools
    for job: pool.sortedJobs()
      for task: job.tasks()
        if betterMatch(task) …

processHeartbeat():
  synchronized(world):
    return getBestTask()
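
In Java terms, the pseudocode above amounts to holding one global lock for the entire pool/job/task walk, so concurrent heartbeats are processed one at a time. A rough sketch, with Pool, Job and Task as stand-in classes rather than Hadoop's own:

import java.util.List;

// Pessimistic version: the whole pool/job/task walk happens inside one big lock,
// so concurrent heartbeats serialize on `world`.
public class PessimisticSchedulerSketch {
    record Task(String id, boolean local) {}
    record Job(List<Task> tasks) {}
    record Pool(List<Job> sortedJobs) {}

    private final Object world = new Object();
    private final List<Pool> sortedPools;

    PessimisticSchedulerSketch(List<Pool> pools) { this.sortedPools = pools; }

    Task processHeartbeat(String tracker) {
        synchronized (world) {                 // one heartbeat at a time
            return getBestTask(tracker);
        }
    }

    private Task getBestTask(String tracker) {
        Task best = null;
        for (Pool pool : sortedPools)
            for (Job job : pool.sortedJobs())
                for (Task task : job.tasks())
                    if (betterMatch(task, best)) best = task;
        return best;
    }

    // Placeholder: real match logic would weigh locality, fairness deficit, etc.
    private boolean betterMatch(Task candidate, Task currentBest) {
        return currentBest == null || (candidate.local() && !currentBest.local());
    }
}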
Slot Based Scheduling
• N cpus, M map slots, R reduce slots
– Memory cannot be oversubscribed!

• How to divide?
– M < N → not enough mappers at times
– R < N → not enough reducers at times
– N = M = R → enough memory to run 2N tasks?

• Reduce Tasks Problematic
– Network-intensive to start, CPU wasted
– Memory-intensive later
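
A small worked example of the slot arithmetic, with made-up numbers: any static split either oversubscribes memory or strands CPUs.

// Made-up numbers to illustrate the static map/reduce slot trade-off.
public class SlotMathSketch {
    public static void main(String[] args) {
        int n = 16;               // CPUs per node
        int ramMb = 48_000;       // memory budget for task JVMs
        int taskHeapMb = 2_000;   // heap per task

        int memoryLimit = ramMb / taskHeapMb;   // 24 concurrent tasks fit in memory

        // Option A: M = R = N  ->  worst case 2N concurrent tasks.
        int optionA = 2 * n;                    // 32 tasks * 2 GB = 64 GB: oversubscribed
        System.out.println("M=R=N can demand " + optionA
                + " concurrent tasks, memory fits only " + memoryLimit);

        // Option B: M + R = N (say M = 10, R = 6)  ->  memory is safe, but a
        // map-only phase can use at most 10 of the 16 CPUs.
        int m = 10, r = 6;
        System.out.println("Map-heavy phase leaves " + (n - m) + " CPUs idle; "
                + "reduce-heavy phase leaves " + (n - r) + " CPUs idle");
    }
}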
Long Running Reducers
• Online Scheduling
– No advance information of future workload

• Greedy + Fair Scheduling
– Schedule ASAP
– Preempt if the future workload disagrees

• Long Running Reducers
– Preemption causes restarts and wasted work
– No effective way to use short bursts of idle CPU
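
Roughly what a greedy-plus-fair preemption check might look like; the timeout and field names below are illustrative, not the Fair Scheduler's actual configuration. The slide's pain point is the cost of the kill: a reducer that has run for an hour loses the whole hour.

import java.util.List;

// Illustrative preemption check: if a pool has been below its fair share for
// longer than a timeout, kill tasks from over-share pools. Killing a reducer
// discards all of its completed shuffle/sort work.
public class PreemptionSketch {
    static final long PREEMPTION_TIMEOUT_MS = 5 * 60 * 1000;  // made-up threshold

    record PoolUsage(String name, int running, int fairShare, long belowShareSinceMs) {}

    static int tasksToPreempt(List<PoolUsage> pools, long nowMs) {
        int deficit = 0;
        for (PoolUsage p : pools) {
            boolean starvedLongEnough =
                    p.running() < p.fairShare()
                    && nowMs - p.belowShareSinceMs() > PREEMPTION_TIMEOUT_MS;
            if (starvedLongEnough) deficit += p.fairShare() - p.running();
        }
        return deficit;  // number of tasks to reclaim from pools above their share
    }
}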
Optimistic Locking
Task[] getBestTaskCandidates():
  for pool: sortedPools
    for job: pool.sortedJobs.clone()
      for task: job.tasks.clone()
        synchronized(task): …

processHeartbeat():
  tasks = getBestTaskCandidates()
  synchronized(world):
    return acquireTasks(tasks)
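
The same walk with the locking inverted, again with stand-in classes: candidates are gathered from cloned snapshots outside the global lock, and only a short acquire step is synchronized. A compare-and-set here stands in for the per-task lock in the pseudocode; if another heartbeat won the race for a task, it is simply skipped.

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

// Optimistic version: most of the work happens in parallel across heartbeats;
// only the final acquire step takes the global lock.
public class OptimisticSchedulerSketch {
    static final class Task {
        final String id;
        final AtomicBoolean assigned = new AtomicBoolean(false);
        Task(String id) { this.id = id; }
    }
    record Job(List<Task> tasks) {}
    record Pool(List<Job> sortedJobs) {}

    private final Object world = new Object();
    private volatile List<Pool> sortedPools = new ArrayList<>();

    List<Task> processHeartbeat(int freeSlots) {
        List<Task> candidates = getBestTaskCandidates(freeSlots);  // lock-free walk
        synchronized (world) {                                     // short critical section
            return acquireTasks(candidates, freeSlots);
        }
    }

    private List<Task> getBestTaskCandidates(int wanted) {
        List<Task> out = new ArrayList<>();
        for (Pool pool : sortedPools)
            for (Job job : List.copyOf(pool.sortedJobs()))      // snapshot of jobs
                for (Task task : List.copyOf(job.tasks()))      // snapshot of tasks
                    if (!task.assigned.get() && out.size() < wanted) out.add(task);
        return out;
    }

    private List<Task> acquireTasks(List<Task> candidates, int freeSlots) {
        List<Task> won = new ArrayList<>();
        for (Task t : candidates)
            if (won.size() < freeSlots && t.assigned.compareAndSet(false, true)) won.add(t);
        return won;
    }
}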
Corona: Push Scheduling
1. JT subscribes for M maps and R reduces
– Receives availability from the Cluster Manager (CM)

2. CM publishes availability ASAP
– Pushes events to the JT

3. JT pushes tasks to available TTs
– In parallel
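
A sketch of those three steps with invented interfaces (not the real Corona or YARN APIs): the per-job JobTracker registers its needs once, the Cluster Manager pushes grants as soon as slots free up, and tasks are fanned out to TaskTrackers in parallel.

import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Consumer;

// Sketch of push scheduling: invented interfaces, not the real Corona/YARN APIs.
public class PushSchedulingSketch {

    record Grant(String trackerHost, String slotType) {}   // e.g. ("tt-01", "MAP")

    // Step 1: the per-job JobTracker subscribes for M maps and R reduces.
    interface ClusterManager {
        void subscribe(String jobId, int maps, int reduces, Consumer<List<Grant>> onGrant);
    }

    static class JobTracker {
        private final ExecutorService pushPool = Executors.newFixedThreadPool(8);

        void start(ClusterManager cm, String jobId) {
            // Step 2: the CM pushes availability events to us as soon as it has them.
            cm.subscribe(jobId, /*maps=*/100, /*reduces=*/10, this::onGrants);
        }

        // Step 3: push tasks to the granted TaskTrackers, in parallel.
        private void onGrants(List<Grant> grants) {
            for (Grant g : grants) {
                pushPool.submit(() ->
                        System.out.println("launching " + g.slotType() + " task on " + g.trackerHost()));
            }
        }
    }
}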
Corona/YARN: Scalability
1. JobTracker for each Job now Independent
– More fault tolerant and isolated as well

2. Centralized Cluster/Resource Manager
– Must be super-efficient!

3. Fundamental Differences
– Corona ~ latency
– YARN ~ heterogeneous workloads
Pesky Reducers
• Hadoop 2 removes the distinction between M and R slots
• Not Enough
– Reduce tasks don't use much CPU during shuffle
– Still long-running and bad to preempt
→ Re-architect to run millions of small reducers
The Future is Cloudy
• Data Center Assumption:
– Cluster characteristics known
– Job spec fits to cluster

• In Cloud:
– Cluster can grow/shrink, change node type
– Job spec must be dynamic
– Uniform task configuration is untenable
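
To make the last point concrete, a toy example (entirely illustrative, with invented record types and formulas) of deriving the job spec from the cluster as it currently looks, instead of hard-coding it: reducer count and task memory get recomputed when the cluster grows, shrinks, or changes node type.

// Toy illustration of a "dynamic job spec": derive task settings from whatever
// the (elastic) cluster looks like at submit time instead of hard-coding them.
public class DynamicJobSpecSketch {
    record ClusterState(int nodes, int coresPerNode, int ramMbPerNode) {}
    record JobSpec(int reducers, int taskHeapMb) {}

    static JobSpec specFor(ClusterState c) {
        int totalCores = c.nodes() * c.coresPerNode();
        int reducers = Math.max(1, totalCores);              // one wave of reducers
        int taskHeapMb = c.ramMbPerNode() / c.coresPerNode(); // fill memory, no oversubscription
        return new JobSpec(reducers, taskHeapMb);
    }

    public static void main(String[] args) {
        System.out.println(specFor(new ClusterState(10, 8, 32_000)));  // small cluster
        System.out.println(specFor(new ClusterState(40, 16, 64_000))); // after scale-up
    }
}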
Questions?

joydeep@qubole.com
http://www.linkedin.com/in/joydeeps