Never late again! Job-Level deadline SLOs in YARN

Subru Krishnan, Carlo Curino
OSDI 2016 paper: https://aka.ms/morpheus-osdi-2016
JIRA: https://issues.apache.org/jira/browse/YARN-5326

Context
Team
- CISL: Carlo, Subru, Sriram
- MSR: Ishai, Inigo, Jana
- Students: Sangeetha, Alexey, Jonathan, Ruslan
CISL resource-management research agenda
- Rayon/Morpheus: support SLOs via resource reservations
  JIRA: YARN-1051; publications: SoCC 2014, OSDI 2016
- Mercury/Yaq: boost utilization via container types and node-level queueing
  JIRA: YARN-2877; publications: ATC 2015, EuroSys 2016
- Federation: scale out YARN by “federating” multiple clusters
  JIRA: YARN-2915

Notation
Graphical notation:
[Figure: graphical notation for jobs and cluster capacity (100%)]

Operators’ goal
Maximize ROI by:
1. Run as many jobs as possible
2. Work-preserving sharing: fairness, bonus tokens, over-capacity, …
[Figure: 100% utilization]

Users want/expect “predictability”
Periodic production jobs should “run today like they did yesterday”

Users’ pain point
Periodic production jobs should “run today like they did yesterday”…
But they don’t!
Sources of unpredictability:
1. Sharing-induced unpredictability (opportunistic resources, queueing)
2. Inherent unpredictability (failures, skew, stragglers, hardware changes, …)
[Figure: TPC-H Q1 running isolated]
(Unpredictability accounts for 25% of resource-management escalations!)

Users’ reaction
Practical workarounds:
1. Heavy over-provisioning
2. Manual policing
[Figure: Cosmo11, Nov. 2015]

Problem & Opportunity
PROBLEM
- Existing systems (must?) trade utilization for predictability
- A zero-sum trade-off between operators and users
OPPORTUNITY
- Focus on periodic jobs* and use their rich history to:
  - Model user expectations (target SLO)
  - Model job resource demands (resources required to meet the SLO)
- We control the scheduler, so we can:
  - Leverage the notion of a “reservation” to enforce SLOs
  - Retain/boost cluster utilization
* We support non-periodic and best-effort jobs as well.

Overview
- Make implicit user expectations explicit
  - By mining historical logs for candidate SLOs and job resource models
- Eliminate sharing-induced unpredictability
  - By reserving capacity at the right time in the “cluster agenda” → periodic reservations
- Mitigate inherent unpredictability
  - By dynamically re-provisioning the reservation
- Retain high utilization
  - Tight skylines (in the job resource model) + cost-based planning algorithms

Extracting target SLOs
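- Isolate periodic jobs (shared normalized name, script similarity, submission regularity)
- Extract SLOs from the Provenance Graph (PG)
- Estimate the arrival a and the deadline d using the random variables $T_{\mathrm{inAvail}}$, $T_{\mathrm{start}}$, $T_{\mathrm{end}}$, $T_{\mathrm{outRead}}$
[Figure: jobs A and B in the provenance graph (with X, Y, Z), and a timeline marking a, d, the job completion time of A, and the output consumption time of B]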
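A minimal sketch of how a and d could be estimated from past instances, assuming each instance's timestamps are recorded as offsets from the start of its period. The `Instance` fields mirror the random variables above, but the estimator itself (a late quantile of input availability for a, an early quantile of the first output read for d) and the quantile levels are illustrative assumptions, not the method from the paper.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Instance:
    # Offsets (e.g. minutes) from the start of the instance's period.
    t_in_avail: float  # inputs became available
    t_start: float     # job started (unused by this simple estimator)
    t_end: float       # job finished
    t_out_read: float  # output first read by a downstream job


def quantile(values: List[float], q: float) -> float:
    """Nearest-rank quantile for q in [0, 1]."""
    ordered = sorted(values)
    idx = max(0, math.ceil(q * len(ordered)) - 1)
    return ordered[idx]


def estimate_slo(history: List[Instance],
                 q_arrival: float = 0.95,
                 q_deadline: float = 0.05) -> Tuple[float, float]:
    # a: a late quantile of input availability, so inputs are almost always ready by a.
    a = quantile([i.t_in_avail for i in history], q_arrival)
    # d: an early quantile of the first output read, so d is rarely later
    # than the moment the consumer actually needs the data.
    d = quantile([i.t_out_read for i in history], q_deadline)
    return a, d


history = [Instance(10, 12, 40, 90), Instance(12, 13, 45, 95),
           Instance(11, 15, 50, 100), Instance(30, 31, 60, 85)]
print(estimate_slo(history))  # (30, 85)
```

Choosing conservative quantiles on both ends widens the chance that a reservation placed inside [a, d] both has its inputs and finishes before the consumer reads.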

Validation of SLO extraction
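- The deadlines we derive are “real”
  - Given job pairs A → B, where B is the first consumer of A's output:
    $P(B\ \mathrm{fail} \mid A\ \mathrm{missSLO}) \approx P(B\ \mathrm{fail} \mid A\ \mathrm{fail}) > 4 \times P(B\ \mathrm{fail} \mid A\ \mathrm{meetSLO})$
  - That is, A missing its derived deadline hurts B about as much as A failing outright
- The deadlines are “useful”
  - 70% of the derived deadlines are actionable (non-trivial slack between job end and deadline)
[Figure: job pair A → B]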
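The validation above compares empirical conditional failure rates. A minimal sketch of that computation, assuming each A → B pair is summarized as a hypothetical `PairRecord` of three booleans; the records below are made up solely to exercise the functions, and `actionable_fraction` with its `min_slack` threshold is an illustrative stand-in for "non-trivial slack".

```python
from typing import Callable, Dict, List, NamedTuple, Optional


class PairRecord(NamedTuple):
    a_failed: bool      # producer A failed
    a_missed_slo: bool  # A succeeded but finished after its derived deadline d
    b_failed: bool      # B, the first consumer of A's output, failed


def p_b_fail(records: List[PairRecord],
             cond: Callable[[PairRecord], bool]) -> Optional[float]:
    """Empirical P(B fail | cond); None if no record satisfies cond."""
    subset = [r for r in records if cond(r)]
    if not subset:
        return None
    return sum(r.b_failed for r in subset) / len(subset)


def conditional_failure_rates(records: List[PairRecord]) -> Dict[str, Optional[float]]:
    return {
        "A fail": p_b_fail(records, lambda r: r.a_failed),
        "A missSLO": p_b_fail(records, lambda r: not r.a_failed and r.a_missed_slo),
        "A meetSLO": p_b_fail(records, lambda r: not r.a_failed and not r.a_missed_slo),
    }


def actionable_fraction(slacks: List[float], min_slack: float) -> float:
    """Share of deadlines whose slack (d minus typical job end) is at least min_slack."""
    return sum(s >= min_slack for s in slacks) / len(slacks)


# Made-up records, only to exercise the functions.
records = ([PairRecord(True, False, True)] * 3 + [PairRecord(True, False, False)]
           + [PairRecord(False, True, True)] * 3 + [PairRecord(False, True, False)]
           + [PairRecord(False, False, True)] + [PairRecord(False, False, False)] * 9)
rates = conditional_failure_rates(records)
print(rates)
```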

Job Resource Modeling (1/2)
- Collect usage patterns, or skylines, of periodic job instances that ran in the past
- Find a skyline that best fits the collected skylines
  - Balances cost and runtime variation via a parameter
[Figure: past instances of the job, and the best-fitting pattern as inferred by Acetone]
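To make "skyline" concrete: it is the amount of resources a job instance holds at each point in time. A minimal sketch, assuming container records of the hypothetical form (start step, exclusive end step, memory in GB); real telemetry would come from the cluster's logs in its own schema.

```python
from typing import List, Tuple

# (start_step, end_step_exclusive, memory_gb): an assumed record format.
Container = Tuple[int, int, int]


def skyline(containers: List[Container], horizon: int) -> List[int]:
    """Resources held by one job instance at each time step."""
    usage = [0] * horizon
    for start, end, mem in containers:
        for t in range(max(0, start), min(end, horizon)):
            usage[t] += mem
    return usage


def align(skylines: List[List[int]]) -> List[List[int]]:
    """Pad instances with zeros so they all have the same length."""
    n = max(len(s) for s in skylines)
    return [s + [0] * (n - len(s)) for s in skylines]


print(skyline([(0, 3, 2), (1, 2, 4), (2, 5, 1)], horizon=5))  # [2, 6, 3, 1, 1]
```

Aligning instances to a common length is one simple way to make past skylines comparable step by step before fitting a single pattern to them.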

Job Resource Modeling (2/2)
- Solve an LP
  - Penalize over-/under-allocation, weighted by a configurable parameter α
- Varying α:
  - Cheapest: large potential runtime variation
  - Most expensive: lower risk of runtime variation
- Can be exploited by the operator/designer
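A simplified sketch of the α trade-off, not the Morpheus LP itself. It assumes the past skylines are already aligned, that α weights over-allocation and (1 − α) weights under-allocation (the paper's convention may differ), and that no constraint couples different time steps. Under those assumptions the problem splits into one piecewise-linear problem per step, whose optimum is a (1 − α) quantile of the observed values; the code finds it by checking each observed value.

```python
from typing import List, Sequence


def step_penalty(s: float, observed: Sequence[float], alpha: float) -> float:
    over = sum(max(s - x, 0) for x in observed)   # allocated but unused
    under = sum(max(x - s, 0) for x in observed)  # needed but not allocated
    return alpha * over + (1 - alpha) * under


def fit_skyline(skylines: List[List[float]], alpha: float) -> List[float]:
    fitted = []
    for observed in zip(*skylines):  # values of all instances at one time step
        # A convex piecewise-linear objective attains its minimum at one of its
        # breakpoints, i.e. at one of the observed values.
        candidates = sorted(set(observed))
        fitted.append(min(candidates, key=lambda s: step_penalty(s, observed, alpha)))
    return fitted


past = [[2, 4, 4], [3, 5, 2], [4, 6, 3]]
print(fit_skyline(past, alpha=0.9))  # [2, 4, 2]  cheap, more risk
print(fit_skyline(past, alpha=0.1))  # [4, 6, 4]  expensive, less risk
```

Summing the fitted skyline gives its cost, so sweeping α traces the cost/risk curve an operator could choose from.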

Packing multiple Periodic jobs
- Storing a separate reservation for each instance is prohibitive (tens of millions of jobs)
- Fixing an offset for each job produces a repeating pattern
  - Identify and store only the smallest repeating unit (the least common multiple, LCM, of the periods)
- Pack jobs with a cost-based greedy online algorithm (see paper)
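A worked example of the smallest repeating unit: once every periodic job has a fixed offset, the whole agenda repeats after the LCM of all periods, so only that unit needs to be stored. The function names are illustrative, and periods are assumed to be integers in a common unit (minutes here).

```python
from functools import reduce
from math import gcd
from typing import Dict, List


def lcm(a: int, b: int) -> int:
    return a * b // gcd(a, b)


def repeating_unit(periods: List[int]) -> int:
    """Length of the smallest unit after which the whole schedule repeats."""
    return reduce(lcm, periods)


def reservations_per_unit(periods: List[int]) -> Dict[int, int]:
    """How many instances of a job with each period fall inside one unit."""
    unit = repeating_unit(periods)
    return {p: unit // p for p in periods}


# Periods in minutes: hourly, every 90 minutes, daily.
print(repeating_unit([60, 90, 1440]))         # 1440
print(reservations_per_unit([60, 90, 1440]))  # {60: 24, 90: 16, 1440: 1}
```

A period sharing few factors with the others stretches the unit: a 7-minute job alongside a daily one gives a unit of 10,080 minutes, i.e. one week.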
[Figure: portion of total (%) of periodic jobs and of job instances by periodicity, from 1 minute to 1 week]
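A sketch of what a cost-based greedy online placement could look like, under assumptions that are ours rather than the paper's: time is discretized, each job has a fixed per-instance skyline and a window [arrival, deadline] inside its period, the agenda is cyclic with length equal to the LCM of the periods, and the cost is the sum of squared load (which favours flat agendas). Jobs are placed one at a time at the cheapest feasible offset and never moved; the algorithm in the paper differs in its details.

```python
from functools import reduce
from math import gcd
from typing import List, NamedTuple, Optional, Tuple


class PeriodicJob(NamedTuple):
    period: int         # steps between consecutive instances
    skyline: List[int]  # resources one instance needs at each of its steps
    arrival: int        # earliest start, as an offset within the period
    deadline: int       # finish-by offset within the period (assumed <= period)


def _apply(agenda: List[int], job: PeriodicJob, offset: int, sign: int) -> None:
    """Add (sign=+1) or remove (sign=-1) every instance of job that starts at offset."""
    unit = len(agenda)
    for base in range(0, unit, job.period):
        for i, r in enumerate(job.skyline):
            agenda[(base + offset + i) % unit] += sign * r


def pack(jobs: List[PeriodicJob], capacity: int) -> Tuple[List[Optional[int]], List[int]]:
    unit = reduce(lambda x, y: x * y // gcd(x, y), (j.period for j in jobs))
    agenda = [0] * unit
    offsets: List[Optional[int]] = []
    for job in jobs:  # online: jobs are placed in arrival order, never moved
        best, best_cost = None, None
        for off in range(job.arrival, job.deadline - len(job.skyline) + 1):
            _apply(agenda, job, off, +1)
            if max(agenda) <= capacity:
                cost = sum(x * x for x in agenda)  # squared load favours flat agendas
                if best_cost is None or cost < best_cost:
                    best, best_cost = off, cost
            _apply(agenda, job, off, -1)
        if best is not None:
            _apply(agenda, job, best, +1)
        offsets.append(best)  # None: no feasible offset, the job is rejected
    return offsets, agenda


jobs = [PeriodicJob(4, [2, 2], 0, 4), PeriodicJob(8, [2, 2], 0, 4), PeriodicJob(8, [3], 0, 2)]
print(pack(jobs, capacity=3))
```

Here the second job is pushed to offset 2 to avoid overlapping the first, and the third has no feasible slot under the capacity, so it is rejected.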

Dynamic Re-provisioning
- Reservations eliminate sharing-induced unpredictability, but
  provide little resistance to inherent unpredictability
- Lots of opportunities for white-box models (e.g., Jockey, PerfOrator)
- A simple reactive mechanism works well: “stretch out the skyline in time”
  - Extend each stage (a little at a time)
  - Validate via the packing algorithm that the extended reservation still fits
  - (practical bounds on the maximum extension)
[Figure: TPC-H Q1 (100 runs)]
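A minimal sketch of the reactive "stretch out the skyline in time" step. It assumes a skyline is a list of per-step allocations, that the lagging stage ends at index `stage_end`, and that each extension step repeats that stage's last allocation level. The `fits` callback is a hypothetical stand-in for the check against the packing algorithm, and `max_extension` is the practical bound on how far a reservation may grow.

```python
from typing import Callable, List


def stretch(skyline: List[int], stage_end: int, steps_behind: int,
            fits: Callable[[List[int]], bool], max_extension: int) -> List[int]:
    """Extend the stage ending at index stage_end, one step at a time.

    Each added step repeats the stage's last allocation level. Stops when the
    job has caught up (steps_behind), when the cap (max_extension) is reached,
    or when fits() rejects the stretched skyline.
    """
    current = list(skyline)
    level = skyline[stage_end]
    for extended in range(min(steps_behind, max_extension)):
        pos = stage_end + 1 + extended
        candidate = current[:pos] + [level] + current[pos:]
        if not fits(candidate):
            break
        current = candidate
    return current


# Stand-in fit check: the agenda has room for at most 6 steps.
print(stretch([1, 4, 4, 2], stage_end=2, steps_behind=3,
              fits=lambda s: len(s) <= 6, max_extension=5))  # [1, 4, 4, 4, 4, 2]
```

Growing a little at a time keeps each change small enough to re-validate cheaply, and the cap keeps one straggling job from consuming the slack of others.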

Experiments & Results:
Job Modeling, Packing, Dynamic Re-provisioning
- Simulation over a production trace
  - 50K-node cluster, 1 month, several million job instances, ~100K periodic jobs
[Figure: job resource model]

Experiments & Results: (good) trade-offs

Experiments: Testing at Scale
- Scale experiments
  - 2700-node cluster
  - Utilization plotted as total memory*
  - ~200 concurrent reservations
  - (took some massaging to convince the scheduler)
- Further scaling achieved via federation
  - (separate talk)
* To account for varying container sizes

Conclusions
- We presented Morpheus, a system designed to resolve the tension between predictability and utilization
- Three key ideas:
  - Automatically derive SLOs from historical data
  - Rely on recurring reservations and packing algorithms to meet SLOs
  - Dynamically re-provision resources to mitigate inherent execution variance
- We show a potential 5x-13x reduction in SLO violations while also reducing the cluster size by 14-28%
- Overall, Morpheus enables predictable performance without compromising utilization
  - A win-win for both operators and users

Constructing Provenance+Telemetry Graph
[Figure: Job A --Writes--> File 1]
- Example properties for jobs: GUID, StartTime, EndTime, Priority, User, Status, ErrorCode, TotalSlotHours, TotalCpuTime, TotalVertices, …
- Example properties for files: GUID, StartTime, EndTime, Blocks Num, …
- Relationships: Read, Write, Part Of, Located In, …
- Relationship properties (for Writes): StartTime, EndTime, WriteSize
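A minimal sketch of how such a graph can answer the question behind deadline extraction: who is B, the first consumer of A's output, and when did it first read? The property names (GUID, StartTime, EndTime) follow the slide; the in-memory edge-list structure and the `first_consumer` helper are hypothetical illustrations, not the production graph store.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Edge:
    kind: str     # relationship type, e.g. "Read" or "Write"
    job: str      # job GUID
    file: str     # file GUID
    start: float  # relationship StartTime
    end: float    # relationship EndTime


@dataclass
class ProvenanceGraph:
    edges: List[Edge] = field(default_factory=list)

    def add(self, kind: str, job: str, file: str, start: float, end: float) -> None:
        self.edges.append(Edge(kind, job, file, start, end))

    def outputs(self, job: str) -> List[str]:
        """Files written by the given job."""
        return [e.file for e in self.edges if e.kind == "Write" and e.job == job]

    def first_consumer(self, job: str) -> Optional[Tuple[str, float]]:
        """(GUID of B, time of its first read) for the earliest reader of job's output."""
        files = set(self.outputs(job))
        reads = [e for e in self.edges
                 if e.kind == "Read" and e.file in files and e.job != job]
        if not reads:
            return None
        first = min(reads, key=lambda e: e.start)
        return first.job, first.start


g = ProvenanceGraph()
g.add("Write", "A", "File1", 0, 10)
g.add("Read", "C", "File1", 30, 35)
g.add("Read", "B", "File1", 20, 25)
print(g.first_consumer("A"))  # ('B', 20)
```

The returned read time is one observation of the output-read variable for that instance of A; collecting it across instances gives the history a deadline estimate can be drawn from.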
