Multi-Tenant Data Cloud with YARN & Helix

Building applications on YARN with Helix


  1. Multi-Tenant Data Cloud with YARN & Helix
     Kishore Gopalakrishna (@kishore_b_g)
     - LinkedIn - Data infra: Helix, Espresso
     - Yahoo - Ads infra: S4
  2. What is YARN: Next-Generation Compute Platform
     - Hadoop 1.0: MapReduce running directly on HDFS
     - Hadoop 2.0: YARN (cluster resource management) on HDFS, with MapReduce and others (batch, interactive, online, streaming) running on top
  3. What is YARN: Next-Generation Compute Platform (continued)
     - YARN enables containers from multiple applications (A1-A3, B1-B5, C1-C5) to share the same cluster
  4. YARN Architecture
     - The client submits a job to the Resource Manager; the app package is placed in the HDFS/common area
     - The Resource Manager launches an Application Master, which sends container requests
     - Node Managers report node status and launch containers on behalf of the Application Master
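
     The client-side half of this flow is the standard YarnClient API. A minimal sketch of submitting an application, assuming Hadoop 2.x: the application name and AM launch command are placeholders, and localization of the AM package via local resources is omitted.

     import java.util.Collections;

     import org.apache.hadoop.yarn.api.ApplicationConstants;
     import org.apache.hadoop.yarn.api.records.*;
     import org.apache.hadoop.yarn.client.api.YarnClient;
     import org.apache.hadoop.yarn.client.api.YarnClientApplication;
     import org.apache.hadoop.yarn.conf.YarnConfiguration;
     import org.apache.hadoop.yarn.util.Records;

     public class SubmitApp {
       public static void main(String[] args) throws Exception {
         YarnClient yarnClient = YarnClient.createYarnClient();
         yarnClient.init(new YarnConfiguration());
         yarnClient.start();

         // Ask the Resource Manager for a new application id.
         YarnClientApplication app = yarnClient.createApplication();
         ApplicationSubmissionContext appContext = app.getApplicationSubmissionContext();
         appContext.setApplicationName("helix-yarn-demo"); // placeholder name

         // Describe how to launch the Application Master container
         // (localization of the AM package is omitted in this sketch).
         ContainerLaunchContext amContainer = Records.newRecord(ContainerLaunchContext.class);
         amContainer.setCommands(Collections.singletonList(
             "java -jar app-master.jar" // placeholder AM launch command
                 + " 1>" + ApplicationConstants.LOG_DIR_EXPANSION_VAR + "/stdout"
                 + " 2>" + ApplicationConstants.LOG_DIR_EXPANSION_VAR + "/stderr"));
         appContext.setAMContainerSpec(amContainer);

         // Resources and queue for the AM container itself.
         appContext.setResource(Resource.newInstance(1024 /* MB */, 1 /* vcore */));
         appContext.setQueue("default");

         // Hand the application to the Resource Manager.
         ApplicationId appId = yarnClient.submitApplication(appContext);
         System.out.println("Submitted " + appId);
       }
     }
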
  5. So, let’s build something
  6. Example System
     - Generate data in Hadoop (M/R writing to HDFS)
     - Use it for serving (Redis servers)
  7. Example System (diagram: an M/R job generates data on HDFS, a set of servers serves it)
  8. Example System: Requirements
     - Big Data :-) Partitioned, replicated
     - Fault tolerant, scalable
     - Efficient resource utilization
  9. Example System: Application Master
     - Request containers
     - Assign work
     - Handle failures
     - Handle workload changes
  10. Allocation + Assignment
      - The M/R job generates partitioned data (p1-p6) on HDFS; multiple servers serve the partitioned data
      - Container allocation: data affinity, rack-aware placement
      - Partition assignment: affinity, even distribution
      - Replica placement: replicas on different physical machines
      (a sketch of a simple assignment strategy follows)
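
     As an illustration of the even-distribution and replica-placement goals above (not the algorithm used in the talk), a minimal sketch that spreads partitions round-robin and keeps replicas of the same partition on different servers:

     import java.util.*;

     public class SimpleAssignment {
       /**
        * Assign numPartitions partitions, each with numReplicas replicas, across servers
        * so load is spread evenly and no two replicas of a partition share a server
        * (requires numReplicas <= servers.size()).
        */
       static Map<String, List<Integer>> assign(List<String> servers, int numPartitions, int numReplicas) {
         Map<String, List<Integer>> assignment = new LinkedHashMap<>();
         for (String s : servers) {
           assignment.put(s, new ArrayList<>());
         }
         for (int p = 0; p < numPartitions; p++) {
           for (int r = 0; r < numReplicas; r++) {
             // Offset by the replica index so replicas of the same partition land on different servers.
             String server = servers.get((p + r) % servers.size());
             assignment.get(server).add(p);
           }
         }
         return assignment;
       }

       public static void main(String[] args) {
         System.out.println(assign(Arrays.asList("server1", "server2", "server3"), 6, 2));
         // prints {server1=[0, 2, 3, 5], server2=[0, 1, 3, 4], server3=[1, 2, 4, 5]}
       }
     }
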
  11. Failure Handling
      - On failure: evenly distribute the failed partitions across surviving containers while waiting for a new container
      - Acquire a new container, close to the data if possible
      - Assign the failed partitions to the new container
  12. Failure Handling (build: a server fails; its partitions are temporarily picked up by the surviving servers)
  13. Failure Handling (build: a new container, Server 4, is acquired and the orphaned partitions are moved onto it)
      (a sketch of the interim redistribution step follows)
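
     A similarly hedged sketch (not the deck's implementation) of the interim failover step: when a server dies, hand each of its partitions to the least-loaded survivor so load stays even until a replacement container arrives.

     import java.util.*;

     public class FailoverRedistribution {
       /** Move the failed server's partitions to the least-loaded surviving servers. */
       static void redistribute(Map<String, List<Integer>> assignment, String failedServer) {
         List<Integer> orphaned = assignment.remove(failedServer);
         if (orphaned == null) {
           return;
         }
         for (int partition : orphaned) {
           // Pick the surviving server that currently holds the fewest partitions.
           String target = Collections.min(assignment.keySet(),
               Comparator.comparingInt((String s) -> assignment.get(s).size()));
           assignment.get(target).add(partition);
         }
       }

       public static void main(String[] args) {
         Map<String, List<Integer>> assignment = new LinkedHashMap<>();
         assignment.put("server1", new ArrayList<>(Arrays.asList(1, 2, 4, 5)));
         assignment.put("server2", new ArrayList<>(Arrays.asList(1, 3, 4, 6)));
         assignment.put("server3", new ArrayList<>(Arrays.asList(2, 3, 5, 6)));
         redistribute(assignment, "server1");
         System.out.println(assignment); // server1's partitions are spread over server2 and server3
       }
     }
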
  14. Workload Changes
      - Monitor CPU, memory, latency, TPS
      - On workload change: acquire or release containers
      - On container change: redistribute work
  15. Workload Changes (build: a new container is acquired as load grows)
  16. Workload Changes (build: after redistribution, each server holds three partitions instead of four)
  17. Service Discovery
      - Discover everything: what is running where
      - Dynamically updated on changes
  18. Service Discovery (build: clients look up the partition-to-server mapping through a service-discovery layer)
      (a sketch using Helix's spectator API follows)
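
     Helix exposes this lookup to clients through its spectator role. A minimal sketch using RoutingTableProvider; the cluster name, ZooKeeper address, resource/partition names, and the SERVE state are placeholders for this example, not the talk's code.

     import java.util.List;
     import org.apache.helix.HelixManager;
     import org.apache.helix.HelixManagerFactory;
     import org.apache.helix.InstanceType;
     import org.apache.helix.model.InstanceConfig;
     import org.apache.helix.spectator.RoutingTableProvider;

     public class ServiceDiscoveryClient {
       public static void main(String[] args) throws Exception {
         // Connect as a SPECTATOR: reads cluster state, never hosts partitions itself.
         HelixManager manager = HelixManagerFactory.getZKHelixManager(
             "MyServiceCluster", "client-1", InstanceType.SPECTATOR, "zk-host:2181");
         manager.connect();

         // RoutingTableProvider keeps an in-memory routing table updated from the external view.
         RoutingTableProvider routingTable = new RoutingTableProvider();
         manager.addExternalViewChangeListener(routingTable);

         // Find the live instances currently serving partition 1 of the resource in the SERVE state.
         List<InstanceConfig> instances =
             routingTable.getInstances("MyDataResource", "MyDataResource_1", "SERVE");
         for (InstanceConfig instance : instances) {
           System.out.println("p1 is served by " + instance.getHostName() + ":" + instance.getPort());
         }

         manager.disconnect();
       }
     }
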
  19. Building a YARN Application
      - Writing an AM is hard and error prone; handling faults and workload changes is non-trivial and often overlooked
      - Requesting containers: how many, and where
      - Assigning work: placing partitions and replicas, affinity
      - Workload changes: acquiring/releasing containers, minimizing movement
      - Fault handling: detecting non-trivial failures, new vs. reused containers
      - Other: service discovery, monitoring
  20. Building a YARN Application (continued)
      - Is there something that can make this easy?
  21. Apache Helix
  22. What is Helix?
      - A generic cluster management framework that decouples cluster management from core functionality
      - Built at LinkedIn, 2+ years in production
      - Contributed to Apache, now a TLP: helix.apache.org
  23. Helix at LinkedIn (in production)
      - User writes go to the databases (Oracle); change capture feeds change consumers, search index builds, a data replicator, and ETL into HDFS for analytics
  24. Helix at LinkedIn (in production)
      - Over 1,000 instances covering over 30,000 partitions
      - Over 1,000 instances for change capture consumers
      - As many as 500 instances in a single Helix cluster
      - (all numbers are per data center)
  25. Others Using Helix
  26. Helix Concepts: a Resource (database, index, topic, task)
  27. Helix Concepts: a resource is divided into Partitions (p1-p6)
  28. Helix Concepts: each partition has Replicas (r1-r3)
  29. Helix Concepts: replicas run inside Container Processes
  30. Helix Concepts: the Assignment of replicas to container processes is the question Helix answers
      (a sketch of modelling this with Helix's admin API follows)
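
     These concepts map directly onto Helix's admin API. A minimal sketch that registers three participants and declares a resource with 6 partitions and 2 replicas, assuming the OnlineOffline state model definition has already been registered in the cluster; cluster, resource, host names, and ports are placeholders.

     import org.apache.helix.HelixAdmin;
     import org.apache.helix.manager.zk.ZKHelixAdmin;
     import org.apache.helix.model.InstanceConfig;

     public class ClusterSetupSketch {
       public static void main(String[] args) {
         HelixAdmin admin = new ZKHelixAdmin("zk-host:2181");

         // Create the cluster and register the participants (one per container process).
         admin.addCluster("MyServiceCluster");
         for (int i = 1; i <= 3; i++) {
           InstanceConfig config = new InstanceConfig("server" + i + "_12000");
           config.setHostName("server" + i);
           config.setPort("12000");
           admin.addInstance("MyServiceCluster", config);
         }

         // Declare the resource: 6 partitions, managed by the OnlineOffline state model.
         admin.addResource("MyServiceCluster", "MyDataResource", 6, "OnlineOffline");

         // Ask Helix to compute an assignment with 2 replicas per partition.
         admin.rebalance("MyServiceCluster", "MyDataResource", 2);
       }
     }
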
  31. Helix Concepts: State Model and Constraints
  32. Helix Concepts: State Model and Constraints (build: the example's state machine - bootstrap, serve, stop)
  33. Helix Concepts: State Model and Constraints (build)
      - Per partition: state constraint "Serve: 3, Bootstrap: 0"; transition constraint "max T1 transitions in parallel"
      - Per resource: transition constraint "max T2 transitions in parallel"
      - Per node: state constraint "no more than 10 replicas"; transition constraint "max T3 transitions in parallel"
      - Per cluster: transition constraint "max T4 transitions in parallel"
  34. Helix Concepts: State Model and Constraints (build: the state count for Serve equals the replication factor, 3)
      (a sketch of defining such a state model follows)
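
     A sketch of how such a Bootstrap/Serve/Stop model could be declared with Helix's StateModelDefinition.Builder. The state names and the dynamic bound "R" (the replication factor, 3 in the slide) mirror the deck, but this is an illustrative definition rather than the one used in the talk.

     import org.apache.helix.HelixAdmin;
     import org.apache.helix.manager.zk.ZKHelixAdmin;
     import org.apache.helix.model.StateModelDefinition;

     public class BootstrapServeModelDef {
       public static StateModelDefinition build() {
         StateModelDefinition.Builder builder = new StateModelDefinition.Builder("BootstrapServe");

         // States in priority order; OFFLINE is where every replica starts, DROPPED is how it leaves.
         builder.addState("SERVE", 1);
         builder.addState("BOOTSTRAP", 2);
         builder.addState("OFFLINE", 3);
         builder.addState("DROPPED", 4);
         builder.initialState("OFFLINE");

         // Legal transitions (the "stop" arrow on the slide maps to SERVE -> OFFLINE here).
         builder.addTransition("OFFLINE", "BOOTSTRAP");
         builder.addTransition("BOOTSTRAP", "SERVE");
         builder.addTransition("SERVE", "OFFLINE");
         builder.addTransition("OFFLINE", "DROPPED");

         // Partition-level state constraint: "R" replicas (the replication factor) should be in SERVE,
         // which corresponds to the slide's "Serve: 3" when the replication factor is 3.
         builder.dynamicUpperBound("SERVE", "R");

         return builder.build();
       }

       public static void main(String[] args) {
         HelixAdmin admin = new ZKHelixAdmin("zk-host:2181");
         admin.addStateModelDef("MyServiceCluster", "BootstrapServe", build());
       }
     }
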
  35. Helix Architecture
      - Participants host the partitions (P1-P8) and implement the state machine callbacks (bootstrap, serve, stop)
      - The Controller (Target Provider, Provisioner, Rebalancer) assigns work to participants via callbacks
      - Spectators (clients) use the cluster state for service discovery; metrics feed back into the controller
      (a sketch of the participant side follows)
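
     On the participant side, the callbacks Helix invokes are methods named after the transitions of the state model, in the style of the Helix recipes. What the methods do here (loading from HDFS, serving, stopping) is illustrative; a participant registers a factory for this model with its HelixManager's state machine engine before connecting.

     import org.apache.helix.NotificationContext;
     import org.apache.helix.model.Message;
     import org.apache.helix.participant.statemachine.StateModel;
     import org.apache.helix.participant.statemachine.StateModelInfo;
     import org.apache.helix.participant.statemachine.Transition;

     @StateModelInfo(initialState = "OFFLINE", states = { "BOOTSTRAP", "SERVE", "OFFLINE" })
     public class BootstrapServeStateModel extends StateModel {

       @Transition(from = "OFFLINE", to = "BOOTSTRAP")
       public void onBecomeBootstrapFromOffline(Message message, NotificationContext context) {
         // Illustrative: pull this partition's data from HDFS into the local store.
         System.out.println("Bootstrapping " + message.getPartitionName() + " from HDFS");
       }

       @Transition(from = "BOOTSTRAP", to = "SERVE")
       public void onBecomeServeFromBootstrap(Message message, NotificationContext context) {
         // Illustrative: start serving reads for this partition.
         System.out.println("Serving " + message.getPartitionName());
       }

       @Transition(from = "SERVE", to = "OFFLINE")
       public void onBecomeOfflineFromServe(Message message, NotificationContext context) {
         // Illustrative: stop serving and release local resources.
         System.out.println("Stopped " + message.getPartitionName());
       }
     }
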
  36. Helix Controller: High-Level Overview
      - Inputs: resource config, constraints, objectives
      - The Target Provider decides the number of containers, the Provisioner talks to the YARN RM, and the Rebalancer computes the task-to-container mapping
  37. Helix Controller: Target Provider
      - Determines how many containers are required, along with their spec
      - Fixed, CPU, and Memory default implementations are provided; Bin Packing can be used to customize further; a monitoring system provides usage information
      - Inputs: resources (p1..pn), existing containers (c1..cn), health of tasks and containers (CPU, memory, health), allocation constraints (affinity, rack locality), SLA (e.g. fixed: 10 containers, CPU headroom: 30%, memory usage: 70%, time: 5h)
      - Outputs: number of containers (acquire list, release list) and the container spec (CPU: x, memory: y, location: L)
      (a sketch of a TPS-driven sizing decision follows)
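
     The TargetProvider interface itself is not shown in the deck, so the following is a hypothetical, simplified sketch of the decision a TPS-driven target provider makes: scale the container count from the throughput the current containers actually sustain toward the TPS goal, within min/max bounds. All names and numbers are illustrative.

     public class TpsTargetProviderSketch {
       /**
        * Hypothetical sizing logic: how many containers are needed to hit targetTps,
        * given the throughput each current container is sustaining.
        */
       static int requiredContainers(double targetTps, double measuredTpsPerContainer,
                                     int minContainers, int maxContainers) {
         if (measuredTpsPerContainer <= 0) {
           return minContainers; // no signal yet, start small
         }
         int needed = (int) Math.ceil(targetTps / measuredTpsPerContainer);
         return Math.max(minContainers, Math.min(maxContainers, needed));
       }

       public static void main(String[] args) {
         // e.g. a goal of 1M TPS, each container currently sustains ~50k TPS, bounds 1..25
         System.out.println(requiredContainers(1_000_000, 50_000, 1, 25)); // prints 20
       }
     }
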
  38. Helix Controller: Provisioner
      - Given the container spec, interacts with the YARN RM to acquire/release containers and with the NMs to start/stop them
      - Subscribes to YARN RM notifications
      (a sketch of the underlying YARN client calls follows)
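
     Under the hood this is the AM-side YARN client API. A minimal sketch of requesting a container and reacting to allocations with AMRMClientAsync; the memory/vcore numbers, host name, and callback bodies are placeholders.

     import java.util.List;
     import org.apache.hadoop.yarn.api.records.*;
     import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;
     import org.apache.hadoop.yarn.client.api.async.AMRMClientAsync;
     import org.apache.hadoop.yarn.conf.YarnConfiguration;

     public class ProvisionerSketch {
       public static void main(String[] args) throws Exception {
         AMRMClientAsync.CallbackHandler handler = new AMRMClientAsync.CallbackHandler() {
           @Override public void onContainersAllocated(List<Container> containers) {
             // Placeholder: hand newly allocated containers to an NMClient to start participants.
             containers.forEach(c -> System.out.println("Allocated " + c.getId() + " on " + c.getNodeId()));
           }
           @Override public void onContainersCompleted(List<ContainerStatus> statuses) {
             // Placeholder: report failed containers so the rebalancer can move their partitions.
             statuses.forEach(s -> System.out.println("Completed " + s.getContainerId()));
           }
           @Override public void onNodesUpdated(List<NodeReport> updatedNodes) { }
           @Override public void onShutdownRequest() { }
           @Override public void onError(Throwable e) { }
           @Override public float getProgress() { return 0; }
         };

         AMRMClientAsync<ContainerRequest> amRmClient =
             AMRMClientAsync.createAMRMClientAsync(1000, handler);
         amRmClient.init(new YarnConfiguration());
         amRmClient.start();
         amRmClient.registerApplicationMaster("app-master-host", 0, ""); // placeholder host and tracking URL

         // Ask the RM for one container matching the spec from the target provider.
         Resource spec = Resource.newInstance(2048 /* MB */, 2 /* vcores */);
         amRmClient.addContainerRequest(
             new ContainerRequest(spec, null /* nodes */, null /* racks */, Priority.newInstance(0)));
       }
     }
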
  39. Helix Controller: Rebalancer
      - Based on the current nodes in the cluster and the constraints, finds an assignment of tasks to nodes
      - Modes: Auto, Semi-Auto, Static, User-defined
      - Inputs: tasks (t1..tn), existing containers (c1..cn), allocation constraints and objectives (affinity, rack locality, even distribution of tasks, minimize movement while expanding)
      - Output: an assignment (e.g. C1: t1, t2; C2: t3, t4)
      - Based on the FSM, computes and fires the transitions to the participants
  40. Example System: Helix-Based Solution
      - Configure the app, the target provider, the provisioner, and the rebalancer
  41. Example System: Helix-Based Solution (app_config_spec.yaml)
      - Configure App: app name "Partitioned Data Server"; App Master package /path/to/GenericHelixAppMaster.tar; app package /path/to/RedisServerLauncher.tar; app config DataDirectory: hdfs:/path/to/data
      - Configure Target Provider: TargetProvider RedisTargetProvider; goal: target TPS 1 million; min containers 1; max containers 25
      - Configure Provisioner: YARN RM host:port
      - Configure Rebalancer: partitions 6; replicas 2; max partitions per container 4; Rebalancer.Mode AUTO; placement: data affinity; failure handling: even distribution; scaling: minimize movement
  42. Launch Application: yarn_app_launcher.sh app_config_spec.yaml
  43. Helix + YARN (build: an empty cluster with two server machines)
  44. Helix + YARN (build: the client submits the job to the YARN Resource Manager)
  45. Helix + YARN (build: the Resource Manager launches the Application Master)
  46. Helix + YARN (build: the Application Master runs the Helix Controller - Target Provider, Provisioner, Rebalancer)
  47. Helix + YARN (build: the controller requests containers from the Resource Manager)
  48. Helix + YARN (build: the Node Managers launch the containers, each running a Helix participant)
  49. Helix + YARN (build: the controller assigns work - partitions p1-p6, two replicas each, spread across the participants)
  50. Auto Scaling: non-linear scaling from 0 to 1M TPS and back
  51. Failure Handling: Random Faults - recovering from faults at 1M TPS (5%, 10%, 20% failures/min)
  52. Summary
      - Stack: HDFS, YARN (cluster resource management), Helix (container + task management), with batch, interactive, online, and streaming apps on top
      - Fault tolerance and expansion handled transparently
      - Generic Application Master
      - Efficient resource utilization through the task model
  53. Questions? We love helping & being helped
      - Website: helix.apache.org, #apachehelix
      - Twitter: @apachehelix, @kishore_b_g
      - Mail: user@helix.apache.org
      - Team: Kanak Biscuitwala, Zhen Zhang
