YARN Federation

  1. YARN Federation (YARN-2915)
     Subru Krishnan, Kishore Chaliparambil, Carlo Curino, and Giovanni Fumarola, Microsoft
  2. Who are we? Large team:
     • Cloud and Information Services Lab (CISL)
       • Applied research group in large-scale systems and machine learning
     • BigData Resource Management team
       • Design, build, and operate Microsoft’s big data infrastructure
  3. Agenda
     • YARN @MS
     • Federation Architecture
     • Policy space
     • Demo
  4. YARN @MS
     Familiar Challenges:
     • Diverse workloads (batch, interactive, services, …)
     • Support for production SLAs
     • ROI on cluster investments (utilization)
     Special Challenges:
     • Leverage existing strong infrastructure (Cosmos/Scope/REEF/Azure)
     • Enable all OSS technologies
     • Scale of first-party clusters (each can exceed 50k nodes)
     • Public Cloud (security, number of tenants, service integration, …)
     Big Bet: Unified Resource Management through YARN (OSS) + Azure +
  5. YARN @MS: Innovate and Contribute
     Problems:
     • Lack of SLAs for production jobs
     • High utilization for a broad range of workloads
     • YARN scalability
     • Private cloud (from disjoint clusters)
     • Cross-DC?
     Our Solution…
     • Rayon: resource reservation framework (YARN-1051)
     • Mercury: introduces container types and node-level queueing (YARN-2877)
     • Federation: “federate” multiple YARN clusters (YARN-2915)
  6. YARN Federation in Apache
     • Umbrella JIRA: YARN-2915
     • Includes detailed design proposal and end-to-end patch
     • Federation branch created and API patches posted
     • You are welcome to join and contribute
     • Thanks: Wangda, Karthik, Vinod, Jian, …
  7. Next: YARN Federation Architecture, by Kishore Chaliparambil
  8. YARN Federation
     • Enables applications to scale to hundreds of thousands of nodes
     • The YARN ResourceManager (RM) is a single instance; its scalability is affected by:
       • Cardinality: |nodes|, |apps|, |tasks|
       • Frequency: NM and AM heartbeat intervals, task duration
     • YARN is battle-tested on 4-8k nodes
     • @Microsoft: >50k-node clusters, short-lived tasks
     • So how does federation work?
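To make the cardinality/frequency pressure concrete, here is a back-of-envelope sketch (illustrative numbers only, not from the deck) of the event rate a single RM has to absorb from heartbeats alone:

    // Back-of-envelope estimate of the event load on a single ResourceManager
    // from NodeManager and ApplicationMaster heartbeats.
    // All numbers are illustrative; real intervals are configurable per cluster.
    public class RmLoadEstimate {
        public static void main(String[] args) {
            int nodes = 50_000;          // NodeManagers heartbeating to one RM
            int runningApps = 5_000;     // ApplicationMasters calling allocate()
            double nmIntervalSec = 1.0;  // assumed NM heartbeat interval
            double amIntervalSec = 1.0;  // assumed AM allocate interval

            double nmEventsPerSec = nodes / nmIntervalSec;
            double amEventsPerSec = runningApps / amIntervalSec;

            System.out.printf("NM heartbeats/sec:     %.0f%n", nmEventsPerSec);
            System.out.printf("AM allocate calls/sec: %.0f%n", amEventsPerSec);
            System.out.printf("Scheduler events/sec:  %.0f%n",
                    nmEventsPerSec + amEventsPerSec);
        }
    }

At 50k nodes the RM sees tens of thousands of scheduler events per second, which is why cardinality and heartbeat frequency bound single-RM scalability.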
  9. Federation Architecture
     (diagram: a YARN client talks to the Router Service, which consults the Federation Services (Policy, State) and forwards submissions to one of several YARN sub-clusters; an AM-RM Proxy Service runs on every node, and AMs start containers across sub-clusters)
     • Router Service: implements the Client-RM protocol; stateless, scalable service; multiple instances behind a load balancer
     • AM-RM Proxy Service: implements the AM-RM protocol; hosted in the NM; intercepts all AM-RM communications
     • State Store: centralized, highly-available repository (RDBMS, ZooKeeper, HDFS, …)
     • Sub-clusters are unmodified, standalone YARN clusters of about 6k nodes each
     • Voila! Applications can transparently span multiple YARN sub-clusters and scale to datacenter level, with no code change in any application
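The routing flow on this slide can be summarized in a short sketch. Every class and method name below is a hypothetical stand-in, not the actual Hadoop Router implementation; it only illustrates how a stateless Router uses a policy plus state-store data to pick a home sub-cluster:

    import java.util.Map;

    // Hypothetical sketch of the Router's role: pick a "home" sub-cluster RM
    // for each application submission based on policy and state-store data.
    public class RouterSketch {

        interface RouterPolicy {
            String chooseHomeSubCluster(String user, String queue,
                                        Map<String, Integer> subClusterLoad);
        }

        // Client-RM protocol entry point: pick a home RM and forward the submission.
        static String submitApplication(RouterPolicy policy, String user, String queue,
                                        String appId, Map<String, Integer> load) {
            String home = policy.chooseHomeSubCluster(user, queue, load);
            System.out.printf("Routing %s (user=%s, queue=%s) to %s%n",
                    appId, user, queue, home);
            return home; // in reality: proxy the submission to that sub-cluster's RM
        }

        public static void main(String[] args) {
            // Toy policy: pick the least-loaded sub-cluster as seen in the state store.
            RouterPolicy leastLoaded = (user, queue, load) ->
                load.entrySet().stream()
                    .min(Map.Entry.comparingByValue())
                    .get().getKey();

            Map<String, Integer> load = Map.of("SC1", 80, "SC2", 55, "SC3", 70);
            submitApplication(leastLoaded, "alice", "root.A", "app_0001", load);
        }
    }

Because the Router keeps no state of its own, any number of instances can sit behind a load balancer, as the slide notes.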
  10. AM-RM Proxy Service Internals
      (diagram: inside each NodeManager, the AM-RM Proxy Service hosts a per-application pipeline (interceptor chain): Federation Interceptor, Security/Throttling Interceptor, …, Home RM Proxy; the Federation Interceptor drives Unmanaged AMs in sub-clusters #2 and #3 as directed by policy, while the Home RM Proxy talks to sub-cluster #1's RM)
      • Hosted in the NM
      • Extensible design
      • DDoS prevention
      • Unmanaged AMs are used for container negotiation; they are created on demand based on policy
      • Code committed to 2.8
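A conceptual sketch of that interceptor chain follows. The interfaces and names are simplified stand-ins for illustration, not the real Hadoop AMRMProxy API, and the split decision shown is an arbitrary placeholder for a real policy:

    import java.util.ArrayList;
    import java.util.List;

    // Conceptual sketch of the per-application interceptor chain hosted in the
    // AM-RM Proxy (simplified, hypothetical interfaces).
    public class AmRmProxyChainSketch {

        interface RequestInterceptor {
            List<String> allocate(List<String> resourceRequests);
        }

        // Upstream interceptor: throttling/security checks, then delegate.
        static class ThrottlingInterceptor implements RequestInterceptor {
            private final RequestInterceptor next;
            private final int maxAsksPerHeartbeat;

            ThrottlingInterceptor(RequestInterceptor next, int maxAsksPerHeartbeat) {
                this.next = next;
                this.maxAsksPerHeartbeat = maxAsksPerHeartbeat;
            }

            public List<String> allocate(List<String> reqs) {
                // DDoS prevention: cap how much a single AM can ask per heartbeat.
                List<String> capped =
                    reqs.subList(0, Math.min(reqs.size(), maxAsksPerHeartbeat));
                return next.allocate(capped);
            }
        }

        // Terminal interceptor: splits asks between the home RM and on-demand
        // Unmanaged AMs in secondary sub-clusters, then merges the responses.
        static class FederationInterceptor implements RequestInterceptor {
            public List<String> allocate(List<String> reqs) {
                List<String> responses = new ArrayList<>();
                for (int i = 0; i < reqs.size(); i++) {
                    // A real policy decides the split; alternating is illustration only.
                    String target = (i % 2 == 0) ? "home RM (SC#1)" : "unmanaged AM (SC#2)";
                    responses.add(reqs.get(i) + " -> " + target);
                }
                return responses;
            }
        }

        public static void main(String[] args) {
            RequestInterceptor chain =
                new ThrottlingInterceptor(new FederationInterceptor(), 100);
            chain.allocate(List.of("1 x <4GB,2vcores>", "1 x <8GB,4vcores>"))
                 .forEach(System.out::println);
        }
    }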
  11. Next: Federation Policies, by Carlo Curino
  12. Federation: Policy Engine
      (diagram: the same architecture as slide 9, with a Policy Engine inside the Federation Services, configured through Federation Admin APIs)
      Flexible policies:
      • Manually curated (to start)
      • Automatically generated (later)
      General enforcement mechanisms:
      • Router
      • AMRMProxy
      • RM schedulers
  13. Federation Policies
      Goal: efficiently operate a federated cluster
      • Complex trade-offs: load balancing, scaling, global invariants (fairness), tenant isolation, fault tolerance, …
      Policies:
      • Input: user, reservation, queue, node labels, ResourceRequest, …
      • State information: sub-cluster load, planned maintenance, …
      • Output: routing/scheduling decisions (which determine all container allocations)
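Read as code, the slide's input/state/output description suggests a policy contract roughly like the one below. These interfaces are illustrative only and do not mirror the actual Hadoop federation policy classes:

    import java.util.Map;

    // Illustrative policy contract: inputs describe the job and cluster state,
    // the output is a routing or splitting decision.
    public interface FederationPolicySketch {

        /** Router side: choose the home sub-cluster for a new application. */
        String route(String user, String queue, Map<String, SubClusterInfo> state);

        /** AMRMProxy side: decide how many containers to ask of each sub-cluster. */
        Map<String, Integer> splitAsk(int requestedContainers,
                                      Map<String, SubClusterInfo> state);

        /** Per-sub-cluster state as read from the federation state store. */
        final class SubClusterInfo {
            public final int activeNodes;
            public final double utilization;        // 0.0 - 1.0
            public final boolean plannedMaintenance;

            public SubClusterInfo(int activeNodes, double utilization,
                                  boolean plannedMaintenance) {
                this.activeNodes = activeNodes;
                this.utilization = utilization;
                this.plannedMaintenance = plannedMaintenance;
            }
        }
    }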
  14. Tackling hard problems with policies
      A hard problem: how to transparently enable “global queues” via “local enforcement”?
      (diagram: a global queue tree rooted at R with children A, B, C, D at 25% each, A further split into A1 at 40% and A2 at 60%, to be mapped onto sub-clusters SC1-SC4, each marked with a question mark for its local queue structure)
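Purely to illustrate the problem (the deck deliberately leaves the actual mapping open), one naive projection replicates the same capacity fractions in every sub-cluster and lets the local schedulers enforce them, so each queue's absolute share scales with sub-cluster size. The sub-cluster sizes below are made up:

    import java.util.LinkedHashMap;
    import java.util.Map;

    // Naive projection of the diagram's global queue tree (A=B=C=D=25%,
    // A1=40% of A, A2=60% of A) onto per-sub-cluster capacities.
    public class GlobalQueueProjection {
        public static void main(String[] args) {
            Map<String, Double> globalQueues = new LinkedHashMap<>();
            globalQueues.put("root.A.A1", 0.25 * 0.40);
            globalQueues.put("root.A.A2", 0.25 * 0.60);
            globalQueues.put("root.B", 0.25);
            globalQueues.put("root.C", 0.25);
            globalQueues.put("root.D", 0.25);

            // Hypothetical sub-cluster sizes in nodes.
            Map<String, Integer> subClusterNodes =
                Map.of("SC1", 6000, "SC2", 6000, "SC3", 4000, "SC4", 4000);

            // Absolute capacity (in nodes) each queue gets inside each sub-cluster.
            subClusterNodes.forEach((sc, nodes) ->
                globalQueues.forEach((q, frac) ->
                    System.out.printf("%s %-10s %.0f nodes%n", sc, q, frac * nodes)));
        }
    }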
  15. Spectrum of options: Full Partitioning
      Policies: Router and AMRMProxy direct each job to a single RM
      Pros: perfect scale-out, isolation
      Cons: fragmentation/utilization issues, caps the maximum job size, uneven impact of sub-cluster failures, …
      (diagram: the global queue tree is partitioned so each sub-cluster hosts a disjoint subset of the queues at 100% local capacity, e.g. A with A1/A2 on one sub-cluster and B, C, D each on their own)
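A sketch of such a policy, assuming (hypothetically) that the partitioning key is the queue name, so the Router and the AMRMProxy deterministically agree on the single target sub-cluster:

    import java.util.List;

    // Full partitioning: every request for a given queue is pinned to exactly
    // one sub-cluster, so a job never spans sub-clusters. Illustrative only.
    public class FullPartitioningPolicy {
        private final List<String> subClusters;

        public FullPartitioningPolicy(List<String> subClusters) {
            this.subClusters = subClusters;
        }

        // Both Router and AMRMProxy use the same deterministic mapping, so
        // submission and container negotiation land on the same RM.
        public String subClusterFor(String queue) {
            int idx = Math.floorMod(queue.hashCode(), subClusters.size());
            return subClusters.get(idx);
        }

        public static void main(String[] args) {
            FullPartitioningPolicy p =
                new FullPartitioningPolicy(List.of("SC1", "SC2", "SC3", "SC4"));
            System.out.println("root.A -> " + p.subClusterFor("root.A"));
            System.out.println("root.B -> " + p.subClusterFor("root.B"));
        }
    }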
  16. Spectrum of options: Full Replication
      Policies: Router (round-robin/random); AMRMProxy forwards to RMs based on the locality of each ResourceRequest
      Pros: simple, symmetric, fair (if all jobs broadcast demand), resilient
      Cons: scalability in #jobs, … (heuristic improvements possible)
      (diagram: the full global queue tree, R with A/A1/A2, B, C, D, is replicated identically in every sub-cluster)
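A sketch of this option, with hypothetical names: the Router spreads home sub-clusters round-robin, while the AMRMProxy forwards each ResourceRequest to the sub-cluster owning the requested rack and broadcasts location-agnostic asks:

    import java.util.List;
    import java.util.Map;

    // Full replication: any RM can host any job; asks follow request locality.
    // Illustrative only.
    public class FullReplicationPolicy {
        private final List<String> subClusters;
        private final Map<String, String> rackToSubCluster; // e.g. "rack-7" -> "SC2"
        private int next = 0;

        public FullReplicationPolicy(List<String> subClusters,
                                     Map<String, String> rackToSubCluster) {
            this.subClusters = subClusters;
            this.rackToSubCluster = rackToSubCluster;
        }

        // Router side: round-robin over all sub-clusters.
        public synchronized String routeNextApp() {
            String sc = subClusters.get(next);
            next = (next + 1) % subClusters.size();
            return sc;
        }

        // AMRMProxy side: pick target sub-clusters for one ResourceRequest.
        public List<String> forwardAsk(String requestedLocation) {
            if ("ANY".equals(requestedLocation)) {
                return subClusters;                      // broadcast the demand
            }
            String owner = rackToSubCluster.get(requestedLocation);
            return owner != null ? List.of(owner) : subClusters;
        }

        public static void main(String[] args) {
            FullReplicationPolicy p = new FullReplicationPolicy(
                List.of("SC1", "SC2", "SC3", "SC4"), Map.of("rack-7", "SC2"));
            System.out.println("home for next app:  " + p.routeNextApp());
            System.out.println("rack-7 ask goes to: " + p.forwardAsk("rack-7"));
            System.out.println("ANY ask goes to:    " + p.forwardAsk("ANY"));
        }
    }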
  17. Spectrum of options: Dynamic Partial Replication
      Policies: Router (round-robin/random over a subset of RMs); AMRMProxy forwards to RMs based on ResourceRequest locality (over the same subset of RMs)
      Pros: a trade-off between the advantages of replication and partitioning
      Cons: complexity / rebalancing; could use a dynamic approach
      (diagram: each sub-cluster hosts a different, partially overlapping subset of the global queues, with capacities rescaled locally)
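A sketch of a partial-replication policy, again with hypothetical names: the policy engine assigns each queue a weighted subset of sub-clusters, the Router picks the home sub-cluster at random by weight, and the AMRMProxy only fans out within that subset:

    import java.util.List;
    import java.util.Map;
    import java.util.Random;

    // Partial replication: each queue maps to a weighted subset of sub-clusters.
    // Illustrative only.
    public class PartialReplicationPolicy {
        private final Map<String, Map<String, Double>> queueWeights; // queue -> (SC -> weight)
        private final Random rnd = new Random();

        public PartialReplicationPolicy(Map<String, Map<String, Double>> queueWeights) {
            this.queueWeights = queueWeights;
        }

        // Router side: weighted-random pick within the queue's subset.
        public String route(String queue) {
            Map<String, Double> weights = queueWeights.get(queue);
            double total = weights.values().stream().mapToDouble(Double::doubleValue).sum();
            double r = rnd.nextDouble() * total;
            for (Map.Entry<String, Double> e : weights.entrySet()) {
                r -= e.getValue();
                if (r <= 0) return e.getKey();
            }
            return weights.keySet().iterator().next(); // numerical fallback
        }

        // AMRMProxy side: the job may only span the queue's subset.
        public List<String> subsetFor(String queue) {
            return List.copyOf(queueWeights.get(queue).keySet());
        }

        public static void main(String[] args) {
            PartialReplicationPolicy p = new PartialReplicationPolicy(
                Map.of("root.A", Map.of("SC1", 0.8, "SC4", 0.2)));
            System.out.println("home for a root.A job: " + p.route("root.A"));
            System.out.println("root.A may span:       " + p.subsetFor("root.A"));
        }
    }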
  18. Demo
      • Show a basic job running across sub-clusters
      • Show some UIs and ops commands
      • Showcase a user-based, partially-replicated routing policy
        • Router: random-weighted among a set of sub-clusters …
        • AMRMProxy: broadcast requests to a set of sub-clusters …
  19. Next: YARN Federation Demo, by Giovanni Fumarola
