Scheduling on large clusters - Google's Borg and Omega, YARN, Mesos




  1. Scheduling on Large Clusters
     Based on Google’s Omega Paper
     Sameer Tiwari, Hadoop Architect, Pivotal Inc., @sameertech
  2. Scheduling on Large Clusters
     ● Goals
       o High utilization
       o Honor user-defined constraints
       o Maintain high efficiency
     ● Issues
       o Unpredictable load
       o Varying types of load
       o Increasing load and cluster size
  3. Types of Schedulers
     ● Monolithic
       o Single resource manager and scheduler
       o Google Borg
     ● Two-level
       o Single resource manager and multiple schedulers
       o Mesos, Hadoop-on-Demand (HOD project)
     ● Shared state
       o Multiple schedulers with access to all resources
       o Google Omega
  4. Monolithic Schedulers
     ● Stable, been around since the 1990s
     ● Issues
       o Head-of-line blocking
       o Scalability is limited
       o Popular with the HPC community
         - Maui -> Moab(R), Platform LSF (IBM)
       o Multi-path scheduling addresses some of these problems
  5. Statically Partitioned Schedulers
     ● Common with Hadoop deployments
       o Assumes full control of resources
       o Dedicated or statically partitioned clusters
     ● Issues
       o Low utilization
       o Data fragmentation
  6. Dynamic Schedulers
     ● AKA: Two-level schedulers
       o Resource manager dynamically partitions a cluster
       o Resources are presented to partitions as “offers”
       o Partitions request resources as needed
       o e.g. Mesos and Hadoop on Demand (HOD)
     ● Issues
       o Pessimistic locking is used during allocation
       o Not suitable for “long running” jobs
       o Gang scheduling (e.g. MPI jobs) can cause deadlocks
       o Each scheduler has no idea about any other scheduler
         - Pre-emption is tricky
  7. Trivia
     ● What type of scheduler is Hadoop YARN?
       o The App Master requests resources from a single RM, per job
       o But the App Master provides a job-management service, not scheduling
       o Effectively, it's a monolithic scheduler
  8. Shared State Schedulers
     ● No external resource manager
     ● Each scheduler has full access to the cluster
     ● Each scheduler holds a copy of the cluster state
     ● Optimistic concurrency control
       o Updates are made atomically in a transaction
       o Of conflicting transactions, only one commit will succeed
       o Failed transactions will try again
     ● Gang scheduling will not result in resource hoarding
  9. Shared State Schedulers
     ● Each scheduler is free to choose its own policy
     ● Requires a common understanding of
       o Resources
       o Precedence
     ● Relies on post-facto enforcement
     ● Results in high utilization and efficiency
  10. Questions?
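
The optimistic concurrency control described on the Shared State Schedulers slides can be sketched roughly as follows. This is a minimal illustration, not Omega's implementation: `CellState`, `snapshot`, `commit`, and `schedule` are hypothetical names, resources are reduced to a single CPU count per machine, and a per-machine version counter is an assumed way of detecting that a machine changed after a scheduler took its copy of the state.

```python
from dataclasses import dataclass, field


@dataclass
class CellState:
    """Hypothetical shared cluster state: free CPU and a version counter per machine."""
    free_cpu: dict
    version: dict = field(default_factory=dict)

    def __post_init__(self):
        for machine in self.free_cpu:
            self.version.setdefault(machine, 0)

    def snapshot(self):
        """Give a scheduler its own private copy of the cluster state."""
        return dict(self.free_cpu), dict(self.version)

    def commit(self, claims, seen_versions):
        """Atomically apply claims {machine: cpu}.

        The commit fails if any claimed machine changed since the snapshot
        or no longer has enough free CPU; in that case nothing is applied.
        """
        for machine, cpu in claims.items():
            if self.version[machine] != seen_versions[machine]:
                return False
            if self.free_cpu[machine] < cpu:
                return False
        for machine, cpu in claims.items():
            self.free_cpu[machine] -= cpu
            self.version[machine] += 1
        return True


def schedule(cell, cpu_needed, max_attempts=3):
    """Place one task: plan against a snapshot, try to commit, retry on conflict."""
    for attempt in range(1, max_attempts + 1):
        free, versions = cell.snapshot()
        choice = next((m for m, c in sorted(free.items()) if c >= cpu_needed), None)
        if choice is None:
            return None, attempt
        if cell.commit({choice: cpu_needed}, versions):
            return choice, attempt
    return None, max_attempts


cell = CellState({"m1": 4, "m2": 4})

# Two schedulers take a snapshot of the same state and both decide on m1.
_, versions_a = cell.snapshot()
_, versions_b = cell.snapshot()
first = cell.commit({"m1": 3}, versions_a)   # succeeds
second = cell.commit({"m1": 3}, versions_b)  # fails: m1 changed since the snapshot

# The losing scheduler retries against fresh state and lands on m2.
retry = schedule(cell, 3)
```

No scheduler holds a lock while it is deciding; the conflict is only detected at commit time, which is why the losing scheduler has to redo its placement.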

Editor's Notes

  • Users can ask for colocation, or for a particular rack or machine
    Efficiency here means fast allocation
  • Works well with small jobs (much smaller than the cluster's resources) and short-lived jobs that give up resources frequently
  • * Addresses two issues of the two-level scheduler approach:
    - limited parallelism due to pessimistic concurrency control
    - restricted visibility of resources in a scheduler framework
    * Also avoids head-of-line blocking
    * Potential cost: redoing work when the optimistic concurrency assumptions turn out to be incorrect
    * Resource hoarding is not possible with all-or-nothing resource allocation
    * To prevent starvation: incremental transactions, i.e. accept all but the conflicting changes
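
The difference between the two commit modes in the note above (all-or-nothing versus incremental) can be sketched as below. This is an assumption-laden illustration: `commit_claims` is a hypothetical helper, and a "conflict" is simplified to "the machine no longer has enough free CPU" rather than a real staleness check against shared state.

```python
def commit_claims(free_cpu, claims, all_or_nothing):
    """Apply claims [(machine, cpu), ...] to free_cpu in place.

    Returns (accepted, rejected). With all_or_nothing=True a single
    conflicting claim rejects the whole transaction and free_cpu is left
    untouched. With all_or_nothing=False (incremental) every
    non-conflicting claim is applied and only the conflicts are rejected.
    """
    remaining = dict(free_cpu)
    accepted, rejected = [], []
    for machine, cpu in claims:
        if remaining.get(machine, 0) >= cpu:
            remaining[machine] -= cpu
            accepted.append((machine, cpu))
        else:
            rejected.append((machine, cpu))
    if all_or_nothing and rejected:
        # Gang-scheduled job: claim nothing, so no resources are hoarded.
        return [], list(claims)
    free_cpu.update(remaining)
    return accepted, rejected


cluster = {"m1": 2, "m2": 1}
tasks = [("m1", 2), ("m2", 2)]  # the second claim cannot fit on m2

# Gang scheduling: the whole job is rejected and the cluster is unchanged.
gang_result = commit_claims(cluster, tasks, all_or_nothing=True)
after_gang = dict(cluster)

# Incremental: the claim on m1 goes through, only the claim on m2 is rejected.
incremental_result = commit_claims(cluster, tasks, all_or_nothing=False)
```

In the incremental case the scheduler only has to retry the rejected claim, which is what keeps a large job from being starved by repeated conflicts on a few machines.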