YARN Federation
(YARN-2915)
Subru Krishnan, Kishore Chaliparambil,
Carlo Curino, and Giovanni Fumarola
Microsoft
Who are we?
Large team:
• Cloud and Information Services Lab (CISL)
• Applied research group in large-scale systems and machine learning
• BigData Resource Management team
• Design, build and operate Microsoft’s big data infrastructure
Agenda
• YARN @MS
• Federation Architecture
• Policy space
• Demo
YARN @MS
Familiar Challenges:
• Diverse workloads (batch, interactive, services,…)
• Support for production SLAs
• ROI on cluster investments (utilization)
Special Challenges:
• Leverage existing strong infrastructure (Cosmos/Scope/REEF/Azure)
• Enable all OSS technologies
• Scale of first-party clusters (each can exceed 50k nodes)
• Public Cloud (security, number of tenants, service integration…)
Big Bet: Unified Resource Management through YARN (OSS) + Azure
YARN @MS: Innovate and Contribute
Problems
• Lack of SLAs for production jobs
• High utilization for a broad range of workloads
• YARN scalability
• Private cloud (from disjoint clusters)
• Cross-DC?
Our Solution…
• Rayon: resource reservation framework (YARN-1051)
• Mercury: introduce container types and node-level queueing (YARN-2877)
• Federation: “federate” multiple YARN clusters (YARN-2915)
YARN Federation in Apache
• Umbrella JIRA: YARN-2915
• Includes detailed design proposal and e2e patch
• Federation branch created and API patches posted
• You are welcome to join and contribute 
• Thanks: Wangda, Karthik, Vinod, Jian….
Next
YARN Federation Architecture
by Kishore Chaliparambil
YARN Federation
• Enables applications to scale to hundreds of thousands of nodes
• The YARN Resource Manager (RM) is a single instance
• Scalability of the RM is affected by
  • Cardinality: |nodes|, |apps|, |tasks|
  • Frequency: NM and AM heartbeat intervals, task duration (see the back-of-envelope sketch below)
• YARN is battle-tested on 4-8k nodes
• @Microsoft: >50k node clusters, short-lived tasks
• So how does federation work?
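A back-of-envelope sketch of the heartbeat load a single RM must absorb; the node count is from this deck, while the heartbeat intervals and concurrent-app count are illustrative assumptions:

```java
// Minimal sketch: heartbeat traffic a single RM must handle at this scale.
// Node count matches the deck (>50k); heartbeat intervals and the number of
// concurrent applications are illustrative assumptions, not measured values.
public class RmLoadEstimate {
    public static void main(String[] args) {
        long nodes = 50_000;           // first-party clusters can exceed 50k nodes
        long runningApps = 5_000;      // hypothetical number of concurrent AMs
        double nmHeartbeatSec = 1.0;   // assumed NM -> RM heartbeat interval
        double amHeartbeatSec = 1.0;   // assumed AM -> RM allocate interval

        double heartbeatsPerSec = nodes / nmHeartbeatSec + runningApps / amHeartbeatSec;
        System.out.printf("A single RM would see ~%.0f heartbeats/sec "
            + "before doing any scheduling work.%n", heartbeatsPerSec);
    }
}
```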
Federation Architecture
[Architecture diagram: YARN clients and AMs talk to the Federation Services (a Router Service and a per-node AM RM Proxy Service, backed by Policy and State stores), which front multiple YARN sub-clusters (#1, #2, #3), each with its own RM running tasks on servers in the datacenter; flows shown: Submit App, Start Containers.]
• Router Service
  • Implements the Client-RM protocol
  • Stateless, scalable service; multiple instances behind a load balancer
• AM RM Proxy Service
  • Implements the AM-RM protocol
  • Hosted in the NM; intercepts all AM-RM communications
• State Store
  • Centralized, highly-available repository
  • RDBMS, Zookeeper, HDFS,…
• Sub-clusters are unmodified standalone YARN clusters of about 6K nodes
• Voila! Applications can transparently span multiple YARN sub-clusters and scale to datacenter level
• No code change in any application
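A minimal sketch of what that centralized repository tracks, with hypothetical names (the real state-store interface may differ): sub-cluster membership plus the application-to-home-sub-cluster mapping that the Router writes at submission time and the AM RM Proxy reads.

```java
import java.util.List;

// Hypothetical sketch, not the Hadoop API: the two pieces of state the
// federation services share through the central store.
interface FederationStateSketch {

    // Sub-cluster membership, kept fresh via RM heartbeats.
    void registerSubCluster(String subClusterId, String rmAddress);
    void heartbeat(String subClusterId, String capabilityReport);
    List<String> getActiveSubClusters();

    // Where each application "lives" (its home sub-cluster).
    void setApplicationHomeSubCluster(String applicationId, String subClusterId);
    String getApplicationHomeSubCluster(String applicationId);
}
```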
AM RM Proxy Service Internals
[Diagram: inside the Node Manager, the AM RM Proxy Service runs a per-application pipeline (interceptor chain): Security/Throttling Interceptor, Federation Interceptor, …, Home RM Proxy. The Federation Interceptor reaches the home RM (SC #1) directly and the other sub-cluster RMs (SC #2, SC #3) through Unmanaged AMs.]
• Hosted in NM
• Extensible design (see the interceptor-chain sketch below)
• DDoS prevention
• Unmanaged AMs are used for container negotiation; they are created on demand based on policy
• Code committed to 2.8
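A minimal sketch of the interceptor-chain idea, with illustrative names rather than the actual Hadoop classes:

```java
// Every AM allocate call flows through a per-application pipeline that ends at
// the proxy for the home RM; the federation step can fan out to other
// sub-clusters via on-demand Unmanaged AMs. All names here are hypothetical.
interface AllocateInterceptorSketch {
    void setNext(AllocateInterceptorSketch next);
    String allocate(String allocateRequest); // request/response kept as strings for brevity
}

class ThrottlingInterceptorSketch implements AllocateInterceptorSketch {
    private AllocateInterceptorSketch next;
    public void setNext(AllocateInterceptorSketch next) { this.next = next; }
    public String allocate(String request) {
        // DDoS prevention: reject or delay misbehaving AMs here, then forward.
        return next.allocate(request);
    }
}

class FederationInterceptorSketch implements AllocateInterceptorSketch {
    private AllocateInterceptorSketch next; // chain ends at the home-RM proxy
    public void setNext(AllocateInterceptorSketch next) { this.next = next; }
    public String allocate(String request) {
        // Policy decides which ResourceRequests stay on the home RM and which
        // are sent to other sub-clusters through lazily created Unmanaged AMs;
        // the partial responses are merged into a single answer for the AM.
        return next.allocate(request);
    }
}
```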
Next
Federation Policies
by Carlo Curino
Federation: Policy Engine
[Diagram: the Policy Engine is driven through the Federation Admin APIs.]
Flexible policies
• Manually curated (to start)
• Automatically generated (later)
General enforcement mechanisms:
• Router
• AMRMProxy
• RM Schedulers
Federation Policies
Goal: efficiently operate a federated cluster
• Complex trade-offs: load balancing, scaling, global invariants (fairness), tenant isolation, fault-tolerance,…
Policies
• Input: user, reservation, queue, node labels, ResourceRequest, …
• State information: sub-cluster load, planned maintenance,…
• Output: routing/scheduling decisions (that determine all container allocations); see the interface sketch below
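As a rough sketch of these inputs and outputs, two hypothetical interfaces (not the Hadoop API) for the Router-side and AMRMProxy-side decisions:

```java
import java.util.List;
import java.util.Map;

// The Router picks a home sub-cluster at submission time; the AM RM Proxy
// decides which sub-cluster(s) receive each ResourceRequest while the app runs.
interface RouterPolicySketch {
    // Inputs: submission context plus sub-cluster state (e.g. a load score);
    // output: the home sub-cluster id.
    String chooseHomeSubCluster(String user, String queue,
                                Map<String, Double> subClusterLoad);
}

interface AmRmProxyPolicySketch {
    // Inputs: one resource ask (location + size) plus sub-cluster state;
    // output: the sub-cluster(s) to forward it to (possibly several).
    List<String> splitResourceRequest(String resourceName, int numContainers,
                                      Map<String, Double> subClusterLoad);
}
```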
Tackling hard problems with policies
A hard problem:
How to transparently enable “global queues” via “local enforcement”?
[Diagram: a global queue structure (root R at 100%; queues A, B, C, D at 25% each; A split into A1 at 40% and A2 at 60%) must be enforced locally across sub-clusters SC1–SC4, each marked with a “?”.]
Spectrum of options: Full Partitioning
Policies: Router and AMRMProxy direct each application to a single RM
Pros: perfect scale-out, isolation
Cons: fragmentation/utilization issues, limits maximum job size, uneven impact of SC failures,…
[Diagram: the global queue tree (shown for reference) is partitioned across SC1–SC4, each sub-cluster hosting a different subset of the queues (A with A1/A2, B, C, D), rescaled to 100% of that sub-cluster.]
Spectrum of options: Full Replication
Policies: Router (round-robin/random), and AMRMProxy forwards to RMs based on locality of the ResourceRequest
Pros: simple, symmetric, fair (if all jobs broadcast demand), resilient
Cons: scalability in #jobs, … → heuristic improvements
[Diagram: the full global queue tree (R at 100%; A, B, C, D at 25% each; A1 at 40% and A2 at 60% under A) is replicated identically on SC1–SC4.]
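As a rough illustration of how the AMRMProxy could forward asks under these replication policies, a hypothetical helper (names and inventory mapping are assumptions, not the actual implementation):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Node/rack-local asks go to the sub-cluster that owns that location, while
// location-agnostic (ANY, "*") asks are replicated to a configured set of
// sub-clusters so any of them can satisfy the demand.
class LocalityAwareForwarderSketch {
    private final Map<String, String> locationToSubCluster; // host/rack -> sub-cluster id
    private final List<String> broadcastTargets;            // sub-clusters that receive ANY asks

    LocalityAwareForwarderSketch(Map<String, String> locationToSubCluster,
                                 List<String> broadcastTargets) {
        this.locationToSubCluster = locationToSubCluster;
        this.broadcastTargets = broadcastTargets;
    }

    // Returns the sub-clusters that should receive this ResourceRequest.
    List<String> forward(String resourceName) {
        String owner = locationToSubCluster.get(resourceName);
        if (owner != null) {
            return List.of(owner);                 // honor locality: single sub-cluster
        }
        return new ArrayList<>(broadcastTargets);  // ANY ask: replicate to the subset
    }
}
```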
Spectrum of options: Dynamic Partial Replication
Policies: Router (round-robin/random on a subset of RMs), and AMRMProxy forwards to RMs based on locality of the ResourceRequest (on a subset of RMs)
Pros: trade-off between the advantages of replication and partitioning
Cons: complexity / rebalancing → could use a dynamic approach
[Diagram: the global queue tree (shown for reference) is only partially replicated; each of SC1–SC4 hosts a different subset of queues A, B, C, D, A1, A2 with rescaled shares.]
Demo
• Show a basic job running across sub-clusters
• Show some UIs and ops commands
• Showcase a user-based, partially replicated routing policy
  • Router: random-weighted among a set of sub-clusters (see the sketch below)…
  • AMRMProxy: broadcast request to a set of sub-clusters…
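A minimal sketch of that random-weighted routing choice; the sub-cluster names and weights below are illustrative and would normally come from the policy configuration:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Random;

// Picks a home sub-cluster with probability proportional to its weight.
public class WeightedRandomRouterSketch {
    private final Map<String, Double> weights;
    private final Random rnd = new Random();

    public WeightedRandomRouterSketch(Map<String, Double> subClusterWeights) {
        this.weights = new LinkedHashMap<>(subClusterWeights);
    }

    public String chooseHomeSubCluster() {
        double total = weights.values().stream().mapToDouble(Double::doubleValue).sum();
        double draw = rnd.nextDouble() * total;
        String last = null;
        for (Map.Entry<String, Double> e : weights.entrySet()) {
            last = e.getKey();
            draw -= e.getValue();
            if (draw <= 0) {
                return last;
            }
        }
        return last; // floating-point edge case: fall back to the last entry
    }

    public static void main(String[] args) {
        Map<String, Double> w = new LinkedHashMap<>();
        w.put("SC-1", 0.5);  // half of this user's apps land here (illustrative)
        w.put("SC-2", 0.3);
        w.put("SC-3", 0.2);
        System.out.println(new WeightedRandomRouterSketch(w).chooseHomeSubCluster());
    }
}
```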
Next
YARN Federation Demo
by Giovanni Fumarola