Scheduling on Large Clusters
Based on Google’s Omega Paper
Sameer Tiwari
Hadoop Architect, Pivotal Inc.
stiwari@gopivotal....

Scheduling on large clusters - Google's Borg and Omega, YARN, Mesos

My read on Google's Omega paper

  • Users can ask for colocation, or for a particular rack or machine
    Efficiency means fast allocation
  • Works well with small jobs (far smaller than the cluster's resources) and short-lived jobs that give up resources frequently
  • Addresses two issues of the two-level scheduler approach:
    - limited parallelism due to pessimistic concurrency control
    - restricted visibility of resources in a scheduler framework
    * Also avoids head-of-line blocking
    * Potential cost: redoing work when the optimistic concurrency assumptions turn out to be incorrect
    * Resource hoarding is not possible with all-or-nothing resource allocation
    * To prevent starvation, use incremental transactions, which accept all changes except the conflicting ones
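The last two notes contrast two ways of committing a transaction. The following is a minimal sketch under stated assumptions. Resources are reduced to free CPUs per machine, and a claim "conflicts" when its machine no longer has enough free CPUs. The `commit` function and its `all_or_nothing` flag are hypothetical names and not Omega's interface.

With all-or-nothing semantics, a gang request gets every machine or nothing, so it never sits on a partial allocation. With incremental semantics, the non-conflicting claims go through, so the scheduler keeps making progress.

```python
from typing import Dict, List


def commit(free: Dict[str, int], claims: Dict[str, int], all_or_nothing: bool) -> List[str]:
    """Try to apply claims (machine -> CPUs) to a shared free-CPU map.

    Hypothetical model: a claim conflicts if its machine no longer has
    enough free CPUs. Returns the machines whose claims were accepted.
    """
    conflicting = [m for m, cpus in claims.items() if free.get(m, 0) < cpus]
    if all_or_nothing and conflicting:
        # Gang semantics: reject the whole request, so nothing is held partially.
        return []
    # Incremental semantics: accept every claim except the conflicting ones.
    accepted = [m for m in claims if m not in conflicting]
    for m in accepted:
        free[m] -= claims[m]
    return accepted


free = {"m1": 4, "m2": 1, "m3": 4}
gang = {"m1": 2, "m2": 2, "m3": 2}  # needs all three machines at once

print(commit(dict(free), gang, all_or_nothing=True))   # []
print(commit(dict(free), gang, all_or_nothing=False))  # ['m1', 'm3']
```

Passing a copy (`dict(free)`) to each call keeps the two runs independent, so both start from the same state.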

    1. Scheduling on Large Clusters
       Based on Google’s Omega Paper
       Sameer Tiwari
       Hadoop Architect, Pivotal Inc.
       stiwari@gopivotal.com, @sameertech
    2. Scheduling on Large Clusters
       ● Goals
         o High Utilization
         o Honor user-defined constraints
         o Maintain High Efficiency
       ● Issues
         o Unpredictable load
         o Varying types of load
         o Increasing load and cluster size
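Honoring user-defined constraints (for example, colocation with another job, or a particular rack or machine) amounts to filtering candidate machines before placing work. The following is a minimal sketch under assumptions. The `Machine` and `Request` types, their fields, and the `feasible_machines` helper are hypothetical and are not taken from Borg or Omega. Resources are reduced to a single free-CPU count.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Machine:
    name: str
    rack: str
    free_cpus: int
    # Names of jobs already running here (hypothetical field).
    running_jobs: Set[str] = field(default_factory=set)


@dataclass
class Request:
    cpus: int
    rack: Optional[str] = None           # must run on this rack
    machine: Optional[str] = None        # must run on this machine
    colocate_with: Optional[str] = None  # must share a machine with this job


def feasible_machines(req: Request, machines: List[Machine]) -> List[Machine]:
    """Return the machines with enough free CPUs that satisfy every constraint."""
    result = []
    for m in machines:
        if m.free_cpus < req.cpus:
            continue
        if req.rack is not None and m.rack != req.rack:
            continue
        if req.machine is not None and m.name != req.machine:
            continue
        if req.colocate_with is not None and req.colocate_with not in m.running_jobs:
            continue
        result.append(m)
    return result


cluster = [
    Machine("m1", "rackA", 8, {"db"}),
    Machine("m2", "rackA", 2),
    Machine("m3", "rackB", 16),
]
print([m.name for m in feasible_machines(Request(cpus=4, rack="rackA"), cluster)])        # ['m1']
print([m.name for m in feasible_machines(Request(cpus=1, colocate_with="db"), cluster)])  # ['m1']
```

Treating every constraint as a hard filter keeps the sketch simple. Real schedulers often also support soft preferences that only affect ranking among feasible machines.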
    3. Types of Schedulers
       ● Monolithic
         o Single Resource Manager and Scheduler
         o Google Borg
       ● Two-level
         o Single Resource Manager and multiple schedulers
         o Mesos, Hadoop on Demand (HOD project)
       ● Shared state
         o Multiple schedulers with access to all resources
         o Google Omega
    4. Monolithic Schedulers
       ● Stable; in use since the 1990s
       ● Issues
         o Head-of-line blocking
         o Scalability is limited
         o Popular with the HPC community
           - Maui -> Moab(R), Platform LSF (IBM)
         o Multi-path scheduling addresses some of these problems
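Head-of-line blocking can be illustrated with a single FIFO queue. If the job at the head cannot fit, every job behind it waits, even jobs that would fit. The following is a toy model, not Borg's algorithm. The `fifo_schedule` function and its `strict_fifo` flag are hypothetical, and resources are reduced to one CPU count.

```python
from collections import deque
from typing import List, Tuple


def fifo_schedule(jobs: List[Tuple[str, int]], free_cpus: int,
                  strict_fifo: bool = True) -> Tuple[List[str], List[str]]:
    """Place jobs from a single queue onto a pool of free CPUs.

    jobs: (name, cpus) pairs in arrival order.
    strict_fifo=True stops at the first job that does not fit (head-of-line
    blocking); strict_fifo=False skips such a job and keeps going.
    Returns (placed job names, waiting job names).
    """
    queue = deque(jobs)
    placed, waiting = [], []
    while queue:
        name, cpus = queue.popleft()
        if cpus <= free_cpus:
            free_cpus -= cpus
            placed.append(name)
        elif strict_fifo:
            # The blocked head job holds up everything queued behind it.
            waiting.append(name)
            waiting.extend(n for n, _ in queue)
            break
        else:
            waiting.append(name)
    return placed, waiting


jobs = [("big-batch", 100), ("small-1", 2), ("small-2", 2)]
print(fifo_schedule(jobs, free_cpus=10))                     # ([], ['big-batch', 'small-1', 'small-2'])
print(fifo_schedule(jobs, free_cpus=10, strict_fifo=False))  # (['small-1', 'small-2'], ['big-batch'])
```

Skipping the blocked job removes the blocking, but it can starve large jobs indefinitely. That trade-off is one reason a single queue in a single scheduler is hard to tune for mixed workloads.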
    5. Statically Partitioned Schedulers
       ● Common with Hadoop deployments
         o Assumes full control of resources
         o Dedicated or statically partitioned clusters
       ● Issues
         o Low utilization
         o Data fragmentation
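The low-utilization issue can be made concrete with a small calculation. When each group may only use its own partition, one partition sits idle while another has more work than machines. The partition sizes and demands below are made-up numbers for illustration, and the two helper functions are hypothetical.

```python
from typing import Dict


def utilization_static(partitions: Dict[str, int], demand: Dict[str, int]) -> float:
    """Each group may only use machines in its own partition."""
    used = sum(min(size, demand.get(group, 0)) for group, size in partitions.items())
    return used / sum(partitions.values())


def utilization_shared(partitions: Dict[str, int], demand: Dict[str, int]) -> float:
    """All groups draw from one pool of the same total size."""
    total = sum(partitions.values())
    return min(total, sum(demand.values())) / total


partitions = {"etl": 50, "adhoc": 50}  # machines per partition (hypothetical)
demand = {"etl": 80, "adhoc": 10}      # machines each group wants right now

print(utilization_static(partitions, demand))  # 0.6
print(utilization_shared(partitions, demand))  # 0.9
```

With the same 100 machines and the same 90 machines' worth of demand, the static split leaves 40 machines idle while 30 machines' worth of "etl" work waits.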
    6. Dynamic Schedulers
       ● AKA: Two-level Schedulers
         o Resource Manager dynamically partitions a cluster
         o Resources presented to partitions as “offers”
         o Partitions request resources as needed
         o e.g. Mesos and Hadoop on Demand (HOD)
       ● Issues
         o Pessimistic locking is used during allocation
         o Not suitable for “long running” jobs
         o Gang scheduling (e.g. MPI jobs) can cause deadlocks
         o Each scheduler has no idea about any other scheduler
           - Pre-emption is tricky
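Offers and pessimistic locking can be sketched as follows. The resource manager offers the free pool to one framework at a time, and that pool stays locked until the framework answers. Meanwhile, no other framework can see or claim the pool. This is a simplified model loosely inspired by offer-based two-level designs such as Mesos. `ResourceManager`, `offer_round`, and the `decide` callbacks are hypothetical names, not the Mesos API.

```python
import threading
from typing import Callable, Dict, List, Tuple


class ResourceManager:
    """Toy two-level RM that offers all free CPUs to one framework at a time."""

    def __init__(self, free_cpus: int):
        self.free_cpus = free_cpus
        # Pessimistic concurrency: the lock is held for the whole offer,
        # so only one framework can be deciding at any moment.
        self._lock = threading.Lock()

    def offer_round(self, frameworks: List[Tuple[str, Callable[[int], int]]]) -> Dict[str, int]:
        """frameworks: (name, decide) pairs; decide(offered_cpus) returns CPUs to accept."""
        allocations = {}
        for name, decide in frameworks:
            with self._lock:
                offered = self.free_cpus  # the framework sees only this offer
                accepted = max(0, min(decide(offered), offered))
                self.free_cpus -= accepted  # whatever is not accepted is declined
            allocations[name] = accepted
        return allocations


rm = ResourceManager(free_cpus=10)
frameworks = [
    ("batch", lambda offered: 6),                          # takes 6 CPUs
    ("service", lambda offered: 8 if offered >= 8 else 0),  # gang job: all 8 or nothing
]
print(rm.offer_round(frameworks))  # {'batch': 6, 'service': 0}
print(rm.free_cpus)                # 4
```

The gang-scheduled "service" framework declines the partial offer and must wait for a later round. If it had instead accepted and held partial offers while waiting for the rest, two such frameworks could each hold part of what the other needs. That is the deadlock risk the slide mentions.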
    7. Trivia
       ● What type of scheduler is Hadoop YARN?
         o There is one App Master per job, and it requests resources from the single RM
         o But the App Master provides a job-management service, not scheduling
         o Effectively, it's a Monolithic Scheduler
    8. Shared State Schedulers
       ● No external Resource Manager
       ● Each scheduler has full access to the cluster
       ● A copy of the cluster state is kept at each scheduler
       ● Optimistic concurrency control
         o Updates are made atomically in a transaction
         o Of any conflicting commits, only one will succeed
         o Failed transactions will try again
       ● Gang scheduling will not result in resource hoarding
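Optimistic concurrency control over shared state can be sketched as follows. The shared cell state carries a version number per machine, and each scheduler plans against its own private copy. A scheduler's commit succeeds only if none of the machines it touches have changed since it took its copy. Otherwise, the scheduler refreshes its copy and retries. The class and function names and the per-machine version scheme are illustrative assumptions, not Omega's actual interfaces.

```python
import threading
from typing import Dict, Optional, Tuple


class CellState:
    """Shared cluster state: free CPUs and a version counter per machine."""

    def __init__(self, free: Dict[str, int]):
        self._lock = threading.Lock()        # guards snapshot and the atomic commit step
        self.free = dict(free)               # machine -> free CPUs
        self.version = {m: 0 for m in free}  # machine -> version

    def snapshot(self) -> Tuple[Dict[str, int], Dict[str, int]]:
        """Give a scheduler its own private copy of the cell state."""
        with self._lock:
            return dict(self.free), dict(self.version)

    def commit(self, claims: Dict[str, int], seen_versions: Dict[str, int]) -> bool:
        """Atomically apply claims (machine -> CPUs) if no touched machine changed."""
        with self._lock:
            if any(self.version[m] != seen_versions[m] for m in claims):
                return False  # conflict: another scheduler committed first
            if any(self.free[m] < cpus for m, cpus in claims.items()):
                return False
            for m, cpus in claims.items():
                self.free[m] -= cpus
                self.version[m] += 1
            return True


def schedule(cell: CellState, cpus: int, max_attempts: int = 5) -> Tuple[Optional[str], int]:
    """Pick the machine with the most free CPUs in a fresh copy; retry on conflict."""
    for attempt in range(1, max_attempts + 1):
        free, versions = cell.snapshot()
        machine = max(free, key=free.get)
        if free[machine] < cpus:
            return None, attempt
        if cell.commit({machine: cpus}, versions):
            return machine, attempt
    return None, max_attempts


cell = CellState({"m1": 8, "m2": 4})
_, ver_a = cell.snapshot()             # scheduler A's copy
_, ver_b = cell.snapshot()             # scheduler B's copy, taken at the same time
print(cell.commit({"m1": 6}, ver_a))   # True: A commits first
print(cell.commit({"m1": 4}, ver_b))   # False: m1 changed since B's copy
print(schedule(cell, 4))               # B retries on fresh state: ('m2', 1)
```

Schedulers never block each other while planning. The lock covers only the short snapshot and commit steps, and the price is the redone work when a commit fails.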
    9. Shared State Schedulers
       ● Each scheduler is free to choose a policy
       ● Requires a common understanding of
         o Resources
         o Precedence
       ● Relies on post-facto enforcement
       ● Results in high utilization and efficiency
    10. Questions?
