3. Optimize The Jobs!
● Internal Downsizer tool quantifies job waste
● Application framework limitations cap how much tuning alone can recover
● Even an optimally tuned container still leaves underutilized resources
[Figure: container utilization over time, showing underutilized resources within the allocation]
4. What about Static Overcommit?
● Configure YARN to advertise more memory than the node physically provides (example below)
● Tried with some success
● Performs very poorly once the node is fully utilized
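A minimal sketch (not from the talk) of what static overcommit amounts to, using the standard yarn.nodemanager.resource.memory-mb setting; the node sizes are illustrative only:

import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class StaticOvercommitSketch {
  public static void main(String[] args) {
    YarnConfiguration conf = new YarnConfiguration();
    // Node physically has 96 GB of RAM, but advertise 128 GB to the scheduler.
    // This helps while containers under-use their memory, and fails badly
    // once the node is genuinely full.
    conf.setInt("yarn.nodemanager.resource.memory-mb", 128 * 1024);
    System.out.println("Advertised memory (MB): "
        + conf.getInt("yarn.nodemanager.resource.memory-mb", -1));
  }
}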
5. Overcommit Prototype Design Goals
● No changes to applications
● Minimize changes to YARN protocols
● Minimize changes to scheduler internals
● Overcommit on memory only
● Conservative growth
● Rapid correction (sketch below)
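A hypothetical sketch of the asymmetry behind the last two goals: memory grows in small, bounded increments while the node is below a low water mark, and snaps back toward the physical size as soon as utilization crosses a high water mark. The class and method names are invented for illustration; the real prototype lives in the ResourceManager scheduler and NodeManager monitor described on the following slides.

public class OvercommitPolicySketch {
  static final double LOW_WATER_MARK = 0.6;   // may grow only below this utilization
  static final double HIGH_WATER_MARK = 0.8;  // never grow above this utilization
  static final long INCREMENT_MB = 16_384;    // bounded growth step per period
  static final double MAX_FACTOR = 1.5;       // absolute overcommit ceiling

  /** Returns the node size (MB) to advertise for the next period. */
  static long adjust(long physicalMb, long advertisedMb, double utilization) {
    if (utilization > HIGH_WATER_MARK) {
      // Rapid correction: drop straight back to the physical size
      // (the NodeManager additionally preempts, see the later slides).
      return physicalMb;
    }
    if (utilization < LOW_WATER_MARK) {
      // Conservative growth: at most one small increment, never past the ceiling.
      long ceiling = (long) (physicalMb * MAX_FACTOR);
      return Math.min(ceiling, advertisedMb + INCREMENT_MB);
    }
    return advertisedMb; // between the marks: hold steady
  }

  public static void main(String[] args) {
    System.out.println(adjust(96 * 1024L, 96 * 1024L, 0.4)); // grows by 16 GB
  }
}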
8. ResourceManager Overcommit Tunables
Parameter | Description | Value
memory.max-factor | Maximum amount a node will be overcommitted | 1.5
memory.low-water-mark | Maximum overcommit below this node utilization | 0.6
memory.high-water-mark | No overcommit above this node utilization | 0.8
memory.increment-mb | Maximum increment above node allocation | 16384
increment-period-ms | Delay between overcommit increments if node container state does not change | 0
Parameters use the yarn.resourcemanager.scheduler.overcommit. prefix (example below)
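As a sketch, the fully qualified property names these rows imply, set through the ordinary Hadoop Configuration API with the values from the table:

import org.apache.hadoop.conf.Configuration;

public class RmOvercommitConfigSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    final String prefix = "yarn.resourcemanager.scheduler.overcommit.";
    conf.setFloat(prefix + "memory.max-factor", 1.5f);       // overcommit ceiling
    conf.setFloat(prefix + "memory.low-water-mark", 0.6f);   // max overcommit below this utilization
    conf.setFloat(prefix + "memory.high-water-mark", 0.8f);  // no overcommit above this utilization
    conf.setInt(prefix + "memory.increment-mb", 16384);      // max increment above node allocation
    conf.setLong(prefix + "increment-period-ms", 0L);        // delay between increments
    System.out.println(conf.get(prefix + "memory.max-factor"));
  }
}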
9. NodeManager Self-Preservation Preemption
[Figure: node utilization gauge from 0% to 100% with high and low water marks]
● Utilization above high mark triggers preemption
● Preempts enough to reach low mark utilization
● Does not preempt below the node's original (non-overcommitted) size
● Containers preempted in group order (sketch below)
○ Tasks from preemptable queue
○ ApplicationMasters from preemptable queue
○ Tasks from non-preemptable queue
○ ApplicationMasters from non-preemptable queue
● Youngest containers preempted first within a group
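A sketch of the ordering rules above, assuming a simple container record with a preemptable-queue flag, an ApplicationMaster flag, and a start time; none of these names come from the actual NodeManager code:

import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch of the self-preservation preemption order. */
public class PreemptionOrderSketch {
  record RunningContainer(boolean preemptableQueue, boolean isAppMaster,
                          long startTimeMs) {}

  /** Lower group number is preempted first. */
  static int group(RunningContainer c) {
    if (c.preemptableQueue() && !c.isAppMaster()) return 0; // tasks, preemptable queue
    if (c.preemptableQueue())                     return 1; // AMs, preemptable queue
    if (!c.isAppMaster())                         return 2; // tasks, non-preemptable queue
    return 3;                                               // AMs, non-preemptable queue
  }

  /** Within a group, the youngest (most recently started) container goes first. */
  static final Comparator<RunningContainer> PREEMPTION_ORDER =
      Comparator.comparingInt(PreemptionOrderSketch::group)
                .thenComparing(Comparator.comparingLong(RunningContainer::startTimeMs).reversed());

  public static void main(String[] args) {
    List<RunningContainer> running = new ArrayList<>(List.of(
        new RunningContainer(false, false, 1_000),    // old task, non-preemptable queue
        new RunningContainer(true,  false, 5_000),    // young task, preemptable queue
        new RunningContainer(true,  true,  2_000)));  // AM, preemptable queue
    running.sort(PREEMPTION_ORDER);                   // preempt from the front until low mark reached
    running.forEach(System.out::println);
  }
}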
10. NodeManager Overcommit Tunables
Parameter | Description | Value
memory.high-water-mark | Preemption when above this utilization | 0.95
memory.low-water-mark | Target utilization after preemption | 0.92
Parameters use the yarn.nodemanager.resource-monitor.overcommit. prefix (example below)
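The corresponding NodeManager-side settings as a sketch, again using the Hadoop Configuration API and the values from the table:

import org.apache.hadoop.conf.Configuration;

public class NmOvercommitConfigSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    final String prefix = "yarn.nodemanager.resource-monitor.overcommit.";
    conf.setFloat(prefix + "memory.high-water-mark", 0.95f); // start preempting above this
    conf.setFloat(prefix + "memory.low-water-mark", 0.92f);  // preempt down to this target
    System.out.println(conf.get(prefix + "memory.high-water-mark"));
  }
}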
13. Lessons Learned
● Significant overcommit achievable on real workloads
● Far less preemption than expected
● Container reservations can drive overcommit growth
● Coordinated reducers can be a problem
● Cluster capacity totals fluctuate as overcommit grows and shrinks, which can be confusing at first
14. Future Work
● YARN-5202 (upstream JIRA for this overcommit work)
● Only grows the cluster as a whole, not individual queues
● Some nodes can overcommit while others sit relatively idle
● CPU overcommit
● Predict growth based on past behavior
● Relinquish nodes during quiet periods
● Integration with YARN-1011
15. YARN-1011
● Explicit GUARANTEED vs. OPPORTUNISTIC container distinction (sketch below)
● Promotion of containers once resources are available
● SLA guarantees along with best-effort load
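A sketch of how the GUARANTEED vs. OPPORTUNISTIC distinction looks from an application's side, using the ExecutionType records that already exist in YARN; the exact request and promotion flow under YARN-1011 may differ:

import org.apache.hadoop.yarn.api.records.ExecutionType;
import org.apache.hadoop.yarn.api.records.ExecutionTypeRequest;
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;

public class OpportunisticRequestSketch {
  public static void main(String[] args) {
    // A 4 GB / 1 vcore container that may run as best-effort (OPPORTUNISTIC)
    // load and be promoted to GUARANTEED once real capacity frees up.
    Resource capability = Resource.newInstance(4096, 1);
    ExecutionTypeRequest execType =
        ExecutionTypeRequest.newInstance(ExecutionType.OPPORTUNISTIC, false); // false: do not strictly enforce the type
    ContainerRequest request = new ContainerRequest(
        capability, null, null, Priority.newInstance(1), true, null, execType);
    System.out.println(request);
  }
}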
16. Acknowledgements
● Nathan Roberts for co-developing overcommit POC
● Inigo Goiri for nodemanager utilization collection and reporting
● Giovanni Matteo Fumarola for nodemanager AM container detection
● YARN-1011 contributors for helping to shape the long-term solution