1 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Apache Hadoop YARN:
Capacity Scheduler Improvements
June 2017
Sunil Govindan and Junping Du
About us
Sunil G
Hortonworks
Apache Hadoop Committer
Junping Du
Hortonworks
Apache Hadoop PMC and Committer
junping_du@apache.org sunilg@apache.org
Agenda
⬢ Overview: Capacity Scheduler
⬢ Current features in Capacity Scheduler
⬢ Ongoing work in Capacity Scheduler
⬢ Q & A
Apache Hadoop YARN Scheduler
[Diagram: active ResourceManager running the Capacity Scheduler — Queue A (50%, FIFO ordering), Queue B (25%), Queue C (25%); node label "SAS" (exclusive); a user submits an application to Queue A requesting 4G / 1 vcore; inter/intra-queue preemption and per-application reservations are illustrated.]
Capacity Scheduler : Feature Overview (Available)
⬢ Priority support for Queue and Application
– Application Priority
– Queue Priority
⬢ Preemption Support
– Improvement in Inter Queue preemption model
– Intra Queue preemption support based on application priority and user-limit
Capacity Scheduler : Application/Queue Priority
⬢ Overview
– Available as part of Hadoop 2.8 release
– Done in YARN-1963 and YARN-5864
⬢ Application Priority
– Run selected YARN applications at higher priority, regardless of other applications already running
– Avoids creating multiple queues just to model priority
– Priority can be set per application and changed dynamically at runtime
– Queue-level ACLs control which users may request higher priorities
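The priority knobs above map to a couple of configuration properties. A minimal sketch (queue name "prod" is hypothetical; values are illustrative):

```xml
<!-- yarn-site.xml: cluster-wide ceiling for application priority -->
<property>
  <name>yarn.cluster.max-application-priority</name>
  <value>10</value>
</property>

<!-- capacity-scheduler.xml: per-queue ACL limiting who may use high
     priorities (hypothetical queue "prod") -->
<property>
  <name>yarn.scheduler.capacity.root.prod.acl_application_max_priority</name>
  <value>[user={alice} group={admins} max_priority=8 default_priority=3]</value>
</property>
```

Priority can also be changed at runtime, e.g. with `yarn application -appId <app-id> -updatePriority <priority>`.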
Capacity Scheduler : Application/Queue Priority
⬢ Queue Priority
– Today queues are ordered according to their relative used-capacities
– Scarce resources may therefore be allocated to less-important apps first
• hurts latency-sensitive workloads
• causes resource fragmentation for large-container apps
– Queue priority lets admins assign a higher integer value to critical queues, such as a long-running service queue
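A sketch of the queue-priority configuration introduced by YARN-5864 (queue name "services" is hypothetical):

```xml
<!-- capacity-scheduler.xml: order the "services" queue ahead of queues
     left at the default priority (0) when allocating resources -->
<property>
  <name>yarn.scheduler.capacity.root.services.priority</name>
  <value>10</value>
</property>
```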
Capacity Scheduler : Preemption Support
⬢ Resource Preemption
– Resource Preemption in YARN allows businesses to
• maximize the use of their cluster compute power
• decrease compute time for most applications
• ensure that resources are available in a timely manner for critical applications.
⬢ Inter-Queue Preemption
– helps in scenarios such as
• an over-committed cluster where queue elasticity is used extensively
• applications starving for resources in under-served queues
– addresses queue starvation by finding the best-suited resources for under-served queues
– Improved support for reservations and queue-priority (available as part of the Hadoop 2.8 release)
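Inter-queue preemption is driven by the scheduler's monitor policy; a minimal sketch of enabling it in yarn-site.xml (values are illustrative, not recommendations):

```xml
<!-- enable the preemption monitor policy -->
<property>
  <name>yarn.resourcemanager.scheduler.monitor.enable</name>
  <value>true</value>
</property>
<!-- at most 10% of cluster resources may be preempted per round -->
<property>
  <name>yarn.resourcemanager.monitor.capacity.preemption.total_preemption_per_round</name>
  <value>0.1</value>
</property>
<!-- grace period (ms) before a container marked for preemption is killed -->
<property>
  <name>yarn.resourcemanager.monitor.capacity.preemption.max_wait_before_kill</name>
  <value>15000</value>
</property>
```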
Capacity Scheduler : Preemption Support
⬢ Intra-Queue Preemption
– Available as part of Hadoop 2.8; done in YARN-2009 and YARN-2113
– helps in scenarios such as when
• lower-priority applications consume a queue's entire quota, starving higher-priority apps
• a few users consume the entire user quota, starving other users' applications
– In short: "normalize resources based on application priority and user-limit within the queue"
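The `intra-queue-preemption.enabled` flag shown in the worked examples on the next slides expands to this yarn-site.xml property (a sketch, assuming the inter-queue preemption monitor is already enabled):

```xml
<!-- yarn-site.xml: enable preemption between apps inside a single queue -->
<property>
  <name>yarn.resourcemanager.monitor.capacity.preemption.intra-queue-preemption.enabled</name>
  <value>true</value>
</property>
```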
Capacity Scheduler : Preemption Support
⬢ Intra-Queue Preemption based on Application Priority
Before
Input:
app1, p1, u1 <pending=20, used=50>
app2, p1, u1 <pending=20, used=20>
app3, p3, u1 <pending=50, used=0>
Configuration:
intra-queue-preemption.enabled = true
root.qA.capacity = 70%
root.qB.capacity = 30%
Cluster resource = 100 (qA.used=70, qB.used=30)
After
Preempted:
app1, p1, u1 <preempted=31, used=19>
app2, p1, u1 <preempted=19, used=1>
app3, p3, u1 <pending=0, used=50>
Analysis:
⬢ 50 resources in total were preempted from app1 and app2 (31 + 19)
⬢ app2's AM container was spared
[Chart: pending and used resources for app1–app3 before and after preemption, with the share preempted for the p3 app highlighted.]
Capacity Scheduler : Preemption Support
⬢ Intra-Queue Preemption based on User Limit
Before
Input:
app1, p1, u1 <pending=20, used=25>
app2, p1, u2 <pending=20, used=25>
app3, p1, u3 <pending=30, used=50>
Configuration:
intra-queue-preemption.enabled = true
root.qA.capacity = 10%
User-limit = 33%
Cluster resource = 100 (qA.used=100)
After
Preempted:
app1, p1, u1 <preempted=0, used=33>
app2, p1, u2 <preempted=0, used=33>
app3, p1, u3 <preempted=16, used=34>
Analysis:
⬢ 16 resources were preempted from app3
⬢ App1 and app2 shared these preempted resources.
[Chart: pending and used resources for app1–app3 (users u1–u3) before and after preemption, with u3's preempted share highlighted.]
Capacity Scheduler : Feature Overview (Ongoing)
⬢ Global Scheduling Support
– Better resource-placement support
– 8x performance improvement
⬢ Absolute Resource Configuration support
– Capacity Scheduler queue planning has so far been percentage-based
– Introduce absolute resource configuration at the queue level
– Interaction with the Resource Profiles feature
Capacity Scheduler : Global Scheduling Support
⬢ Global Scheduling Support
– Available as part of Hadoop 3.0 Alpha-3 release and done in YARN-5139
– Better support for placing resource requests for applications
– 8x performance improvement
⬢ Overview
– The current one-node-at-a-time allocation cycle can lead to suboptimal decisions.
– Considering future complex resource-placement requirements, such as node constraints (give me "a && b || c") or anti-affinity ("do not allocate HBase regionservers and Storm workers on the same host"), the YARN scheduler is moving towards global scheduling.
– With global scheduling, the scheduler can examine many nodes and select the best ones for an application's requirements, instead of deciding one node at a time.
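In the YARN-5139 framework, the multi-node lookup is exposed through the Capacity Scheduler's asynchronous-scheduling settings; a sketch (thread count is illustrative):

```xml
<!-- capacity-scheduler.xml: decouple allocation from node heartbeats and
     let several threads scan the cluster to build allocation proposals -->
<property>
  <name>yarn.scheduler.capacity.schedule-asynchronously.enable</name>
  <value>true</value>
</property>
<property>
  <name>yarn.scheduler.capacity.schedule-asynchronously.maximum-threads</name>
  <value>4</value>
</property>
```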
Capacity Scheduler : Global Scheduling Support
⬢ Design Overview
– Allocation decisions consider multiple nodes instead of a single node.
– An improved locking mechanism lets multiple allocation threads examine cluster state and create allocation proposals.
– Each allocation proposal is sent to the scheduler, which commits or rejects it.
⬢ The design is not yet finalized; a few more options are under consideration (YARN-6592)
Capacity Scheduler : Absolute Resource Configuration
⬢ Current Model
– Queues manage resource usage by team/department/BU and by workload
– Capacities are expressed as percentages
– Hierarchical queues help with better queue management
– The absolute-resource work is being done as part of YARN-5881
root (100%)
├─ Sales (30%)
├─ Engineering (65%)
│   ├─ Dev (50%)
│   └─ QE (50%)
└─ Default (5%)
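The hierarchy above corresponds to a percentage-based capacity-scheduler.xml along these lines (queue names lower-cased for illustration):

```xml
<property>
  <name>yarn.scheduler.capacity.root.queues</name>
  <value>sales,engineering,default</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.sales.capacity</name>
  <value>30</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.engineering.capacity</name>
  <value>65</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.default.capacity</name>
  <value>5</value>
</property>
<!-- children of a parent must sum to exactly 100% -->
<property>
  <name>yarn.scheduler.capacity.root.engineering.queues</name>
  <value>dev,qe</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.engineering.dev.capacity</name>
  <value>50</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.engineering.qe.capacity</name>
  <value>50</value>
</property>
```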
Capacity Scheduler : Absolute Resource Configuration
⬢ Issues in current model
– Resource management through percentages is not easy
• works fine when the ratios between queues are fixed
• hard for admins who want fine-grained control over queue resources
• as nodes are added or removed, it is hard to hold a specific queue to a specific resource limit
– A single percentage value covers all resource types
– The min-resources of all queues must sum to exactly 100%
Capacity Scheduler : Absolute Resource Configuration
⬢ New Approach
– Absolute resource values can be specified as a queue's min-resource
– For better elasticity, absolute resource values can also be specified as a queue's max-resource
– Invariant: parent.min-resource >= Σ(child.min-resource)
– "relax the exactly-sum-to-100% requirement of today"
⬢ Challenges of Absolute Configuration
– Ensuring SLAs when the cluster scales down
– Handling min-resources when the cluster scales up
root [memory=100Gi, vcores=100]
├─ Sales [memory=30Gi, vcores=30]
├─ Engineering [memory=65Gi, vcores=60]
│   ├─ Dev [memory=30Gi, vcores=50]
│   └─ QE [memory=35Gi, vcores=10]
└─ Default [memory=1Gi, vcores=1]
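Under the YARN-5881 proposal, the same tree might be configured with absolute values instead of percentages. A sketch only — the feature is ongoing and the exact syntax may change:

```xml
<!-- min-resource and max-resource as absolute values (proposed syntax;
     max value shown is illustrative) -->
<property>
  <name>yarn.scheduler.capacity.root.sales.capacity</name>
  <value>[memory=30Gi,vcores=30]</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.sales.maximum-capacity</name>
  <value>[memory=60Gi,vcores=60]</value>
</property>
```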
Ongoing Features in YARN and CS
YARN features : Capacity scheduler
⬢ New YARN UI (YARN-3368)
– Available as part of the Hadoop 3.0 Alpha-3 release
– Support for launching services via YARN native services
– Makes Hadoop YARN much easier to manage
⬢ Resource Profiles (YARN-3926)
– Ongoing effort to support additional resource types
– Performance improvement
⬢ Application Timeout (YARN-3813)
– Lets YARN control the lifetime of an application
– Helps kill applications that run past their allotted time
⬢ Opportunistic Containers
– Two approaches: distributed scheduling (YARN-2877) and centralized (YARN-5220)
Ongoing work
⬢ GPU Scheduling (YARN-3926)
– Ongoing effort to support GPUs in scheduling
– Per-core GPU isolation
⬢ Distributed Scheduling
– YARN-2877, YARN-4742
– NMs run a local scheduler
– Allows faster scheduling turnaround
⬢ Better support for disk and network isolation (YARN-2619, YARN-2140)
– Tied to supporting arbitrary resource types
YARN features : New UI (Dashboard)
YARN features : New UI (Queues)
YARN features : New UI (Applications)
Thank you!

Jun 2017 HUG: YARN Scheduling – A Step Beyond
