
Scheduling Policies in YARN


  1. Scheduling Policies in YARN. Wangda Tan, Varun Vasudev. San Jose, June 2016
  2. Who we are
     ⬢ Wangda Tan – Apache Hadoop PMC member
     ⬢ Varun Vasudev – Apache Hadoop committer
  3. Agenda
     ⬢ Existing scheduling in YARN
     ⬢ Adding resource types and resource profiles
     ⬢ Resource scheduling for services
     ⬢ GUTS (Grand Unified Theory of Scheduling) API
     ⬢ Q & A
  4. Existing scheduling in YARN
  5. Current resource types
     ⬢ Scheduling is currently supported only on memory and CPU
     ⬢ Depending on the resource calculator configured, the scheduler may or may not take CPU into account
     ⬢ Most applications are unaware of the resources being used for scheduling
       – Applications may not get the containers they expect due to a mismatch
     ⬢ No support for resources like GPU, disk, or network
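     For context, a minimal sketch of how an application expresses today's memory-and-CPU ask through the public YARN client API; the sizes and priority are illustrative:

     import org.apache.hadoop.yarn.api.records.Priority;
     import org.apache.hadoop.yarn.api.records.Resource;
     import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;

     public class MemoryCpuRequest {
       public static ContainerRequest buildRequest() {
         // Memory (MB) and virtual cores are the only resource types the
         // scheduler understands today; CPU is honored only when the
         // DominantResourceCalculator is configured on the scheduler.
         Resource capability = Resource.newInstance(2048, 2);
         Priority priority = Priority.newInstance(1);
         // No locality constraints: any node, any rack.
         return new ContainerRequest(capability, null, null, priority);
       }
     }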
  6. Locality for containers
     ⬢ Applications can request host or rack locality
       – If the request can't be satisfied within a certain number of tries, the container is allocated on the next node to heartbeat
       – Good for MapReduce-type applications
     ⬢ Insufficient for services
       – Services need support for affinity, anti-affinity, and gang scheduling
       – Need support for fallback strategies
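     A minimal sketch of a locality-aware container request through AMRMClient; the host and rack names are made up:

     import org.apache.hadoop.yarn.api.records.Priority;
     import org.apache.hadoop.yarn.api.records.Resource;
     import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;

     public class LocalityRequest {
       public static ContainerRequest buildRequest() {
         Resource capability = Resource.newInstance(1024, 1);
         // Preferred nodes and racks (illustrative names).
         String[] nodes = {"node1.example.com"};
         String[] racks = {"/rack1"};
         // relaxLocality = true lets the scheduler fall back to rack-local
         // and then off-switch containers after the delay-scheduling
         // threshold; there is no way to specify a custom fallback order.
         return new ContainerRequest(capability, nodes, racks,
             Priority.newInstance(1), true /* relaxLocality */);
       }
     }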
  7. Placement and capacity options
     ⬢ Node partitions
       – End up partitioning the cluster, akin to sub-clusters
       – Support for non-exclusive partitions is available
     ⬢ Reservations
       – Let you plan for capacity in advance
       – Help you guarantee capacity for high-priority, large jobs
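     A minimal sketch of asking for containers on a node partition via a node-label expression, assuming the admin has already defined a partition (the "gpu" label and sizes are illustrative):

     import org.apache.hadoop.yarn.api.records.Priority;
     import org.apache.hadoop.yarn.api.records.Resource;
     import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;

     public class PartitionRequest {
       public static ContainerRequest buildRequest() {
         Resource capability = Resource.newInstance(4096, 4);
         // Ask for containers on nodes labeled "gpu"; the label must match
         // a partition configured by the cluster admin, and node/rack lists
         // must be null when a label expression is used.
         return new ContainerRequest(capability, null, null,
             Priority.newInstance(1), true /* relaxLocality */, "gpu");
       }
     }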
  8. Resource types and resource profiles
  9. Extending resource types in YARN
     ⬢ Add support for generalized resource types
     ⬢ Users can use configuration to add and remove resource types from the scheduler
     ⬢ Allows users to experiment with resource types
       – For resources like network, modeling is hard: should you use ops or bandwidth?
       – No need to touch the code
     ⬢ Current work covers countable resource types
       – Support for exclusive resource types (like ports) is future work
  10. Resource profiles
     ⬢ Analogous to instance types in EC2
     ⬢ It is hard for users to conceptualize resources like disk bandwidth
       – A profile is a collection of resource types
       – Allows admins to define a set of profiles that users can use to request containers
       – Users don't need to worry about resource types like disk bandwidth
       – New resource types can be added and removed without users needing to change their job submissions
     ⬢ Profiles are stored on the RM
       – Users just pass the name of the profile they want ("small", "medium", "large")
     ⬢ YARN-3926 is the umbrella JIRA for the feature
  11. Resource profile examples
     resource-profiles.json (base):
       {
         "minimum": { "yarn.io/memory": 1024, "yarn.io/cpu": 1 },
         "maximum": { "yarn.io/memory": 8192, "yarn.io/cpu": 8 },
         "default": { "yarn.io/memory": 2048, "yarn.io/cpu": 2 }
       }
     resource-profiles.json (with admin-defined profiles):
       {
         "minimum": { "yarn.io/memory": 1024, "yarn.io/cpu": 1 },
         "maximum": { "yarn.io/memory": 8192, "yarn.io/cpu": 8 },
         "default": { "yarn.io/memory": 2048, "yarn.io/cpu": 2 },
         "small":   { "yarn.io/memory": 1024, "yarn.io/cpu": 1 },
         "medium":  { "yarn.io/memory": 3072, "yarn.io/cpu": 3 },
         "large":   { "yarn.io/memory": 8192, "yarn.io/cpu": 8 }
       }
  12. Resource Scheduling for Services
  13. Affinity and Anti-affinity – Overview
     ⬢ Anti-affinity
       – Some services don't want their daemons running on the same host/rack, for better fault recovery or performance
       – For example, don't run more than one HBase region server in the same fault zone
  14. Affinity and Anti-affinity – Overview
     ⬢ Affinity
       – Some services want to run their daemons close to each other, e.g. for performance
       – For example, run Storm workers (SW) as close together as possible for better data-exchange performance
  15. Affinity and Anti-affinity – Requirements
     ⬢ Be able to specify affinity/anti-affinity within and across applications
       • Intra-application
       • Inter-application
       • Example: inter-application anti-affinity
     ⬢ Hard and soft affinity/anti-affinity
       • Hard: reject allocations that don't satisfy the constraint
       • Soft: best effort
       • Example: inter-application soft anti-affinity
  16. Affinity and Anti-affinity
     ⬢ YARN-1042 is the umbrella JIRA
     ⬢ Demo
  17. Affinity/Anti-affinity Demo
  18. Container Resizing – Overview
     ⬢ Use cases
       – Services can modify the size of their running containers as workload changes
       – For example, HBase region servers can return excess resources to the RM when workload drops, improving utilization
     ⬢ Before this feature
       – The application has to re-request a container with a different size from YARN
       – Context held in task memory is lost
     ⬢ Status
       – An alpha version of the feature will be included in Hadoop 2.8
       – YARN-1197 is the umbrella JIRA
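     A minimal sketch of in-place resizing from the application master's side. The slide refers to the Hadoop 2.8 alpha; the class and method names below follow the container-update API as it later stabilized (Hadoop 2.9+), so treat them as an assumption rather than the exact 2.8 surface:

     import org.apache.hadoop.yarn.api.records.Container;
     import org.apache.hadoop.yarn.api.records.ContainerUpdateType;
     import org.apache.hadoop.yarn.api.records.ExecutionType;
     import org.apache.hadoop.yarn.api.records.Resource;
     import org.apache.hadoop.yarn.api.records.UpdateContainerRequest;
     import org.apache.hadoop.yarn.client.api.AMRMClient;

     public class ResizeExample {
       // Ask the RM to grow a running container in place, keeping its state
       // (no re-request, no loss of in-memory context). API names assumed
       // from the later, stabilized container-update interface.
       public static void grow(AMRMClient<AMRMClient.ContainerRequest> amrmClient,
                               Container running) {
         Resource bigger = Resource.newInstance(4096, 4);
         UpdateContainerRequest update = UpdateContainerRequest.newInstance(
             running.getVersion(), running.getId(),
             ContainerUpdateType.INCREASE_RESOURCE, bigger,
             ExecutionType.GUARANTEED);
         amrmClient.requestContainerUpdate(running, update);
       }
     }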
  19. GUTS (Grand Unified Theory of Scheduling) API
  20. Requirements
     ⬢ We have more and more new scheduling requirements:
       – Scheduling fallbacks
         • Try plan A first; fall back to plan B if plan A cannot be satisfied within X seconds
         • Currently YARN supports only one scheduling fallback (node/rack/off-switch via delay scheduling), and users cannot specify the order of fallbacks
       – Affinity / anti-affinity
  21. Requirements
     – Node partitions
       • Already supported via YARN-796: divide a big cluster into several smaller clusters according to hardware and purpose; capacities and ACLs can be specified per partition
     – Node constraints
       • A way to tag nodes without complexities like ACLs and capacity configuration (YARN-3409)
  22. Requirements
     – Gang scheduling
       • Give me N containers at once, or nothing
     – Resource reservation
       • Give me resources at time T. Supported since YARN-1051 (Hadoop 2.6); we need to consider unifying the APIs
     – Combinations of the above
       • Gang scheduling + anti-affinity: give me 10 containers at once, but avoid nodes that have containers from application X
       • Scheduling fallbacks + node partition: give me 10 containers from partition X; if I cannot get them within 5 minutes, any hosts are fine
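     Since time-based reservations already exist (YARN-1051), a minimal sketch of submitting one through YarnClient; the 10-container ask, queue name, and time window are illustrative, and the factory signatures follow the original Hadoop 2.6 API (later releases add a separate ReservationId step):

     import java.util.Collections;

     import org.apache.hadoop.yarn.api.protocolrecords.ReservationSubmissionRequest;
     import org.apache.hadoop.yarn.api.records.ReservationDefinition;
     import org.apache.hadoop.yarn.api.records.ReservationRequest;
     import org.apache.hadoop.yarn.api.records.ReservationRequestInterpreter;
     import org.apache.hadoop.yarn.api.records.ReservationRequests;
     import org.apache.hadoop.yarn.api.records.Resource;
     import org.apache.hadoop.yarn.client.api.YarnClient;
     import org.apache.hadoop.yarn.exceptions.YarnException;

     public class ReservationExample {
       public static void reserve(YarnClient yarnClient, long startMs, long deadlineMs)
           throws YarnException, java.io.IOException {
         // 10 containers of 2 GB / 1 vcore, all needed concurrently
         // (gang-like), each for up to one hour.
         ReservationRequest ask = ReservationRequest.newInstance(
             Resource.newInstance(2048, 1), 10, 10, 60 * 60 * 1000L);
         ReservationRequests asks = ReservationRequests.newInstance(
             Collections.singletonList(ask), ReservationRequestInterpreter.R_ALL);
         ReservationDefinition definition = ReservationDefinition.newInstance(
             startMs, deadlineMs, asks, "nightly-batch");
         // Queue name is illustrative; it must be a reservable queue.
         ReservationSubmissionRequest request =
             ReservationSubmissionRequest.newInstance(definition, "reservations");
         yarnClient.submitReservation(request);
       }
     }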
  23. Problems with the existing ResourceRequest API
     ⬢ The existing ResourceRequest API is not extensible
       – Cannot specify relationships between ResourceRequests
       – Fragmentation of resource request APIs
         • We have ResourceRequest (what I want now), BlacklistRequest (what I dislike), and ReservationRequest (what I want in the future) for different purposes
  24. Proposal
     ⬢ We need a unified API to specify resource requirements; the following requirements will be considered:
       – Allocation tag
         • Tag the purpose of an allocated container (e.g. hbase_regionserver)
       – Quantities of the request
         • Total number of containers
         • Minimum concurrency (give me at least N containers at once)
         • Maximum concurrency (don't give me more than N containers at once)
       – Relationships between placement requests
         • And/Or/Not: give me resources according to the specified conditions
         • Order and delay of fallbacks: try to allocate request #1 first; fall back to request #2 after waiting X seconds
       – Time
         • Give me resources between [T1, T2]
  25. In simple words...
     ⬢ Applications can use a unified API to request resources with different constraints/conditions
     ⬢ Easier to understand, and combinations of resource requests can be supported
     ⬢ Let's see some examples:
  26. Examples
     ⬢ Gang scheduling: I want 8 containers allocated to me at once.
       "12345": {                    // allocation_id
         // other fields...
         // quantity conditions
         allocation_size: 2G,
         maximum_allocations: 8,
         minimum_concurrency: 8
       }
     ⬢ Reservation + anti-affinity: give me 5 containers tomorrow, and not on the same hosts as application_..._0005.
       "12345": {                    // allocation_id
         allocation_size: 1G,
         maximum_allocations: 5,
         placement_strategy: {
           NOT {                     // do not place me with this application
             target_app_id: application_123456789_0015
           }
         },
         time_conditions: {
           allocation_start_time: [10:50 pm tomorrow - *]
         }
       }
  27. Examples
     ⬢ Request with fallbacks: try to allocate on the GPU partition first, then fall back to any hosts after 5 minutes.
       "567890": {                   // allocation_id
         allocation_size: 2G,
         maximum_allocations: 10,
         placement_strategy: {
           ORDERED_OR [
             { node_partition: GPU, delay_to_next: 5 min },
             { host: * }
           ]
         }
       }
  28. Status & Plan
     ⬢ Working on the API definition to make sure it covers all target scenarios
     ⬢ Will start a POC soon
     ⬢ This is intended to replace the existing ResourceRequest API; the old API will be kept and automatically converted to the new request form, so existing applications will not be affected
     ⬢ For more details, please take a look at the design doc and discussions on YARN-4902
  29. Q & A
     ⬢ Thank you!
