Efficient cluster resource management by using Cook and Mesos / Li Jin (Two Sigma)

Managing resources (CPU, memory, network I/O) in compute clusters is difficult. Whether we run Hadoop, Spark, or custom workloads, we face the challenge of scheduling a mixture of long-running and short-running workloads with different resource requirements and deadlines. The difficulty arises when we try to maximize cluster utilization and share resources properly among workloads at the same time.

This talk presents a solution to this problem using two cutting-edge open source technologies: Cook (https://github.com/twosigma/cook) and Apache Mesos (http://mesos.apache.org). At Two Sigma, we use Cook and Mesos to manage our compute clusters and run tens of thousands of compute workloads every day. With Cook and Mesos, we are able to utilize the compute cluster efficiently and achieve high user satisfaction.

In this talk, we will discuss the idea behind our algorithm and the design of the system, and show how others can use Cook and Mesos to solve their own cluster resource sharing problems.

Published in: Engineering

  1. 1. Efficient cluster resource management using Mesos and Cook Li Jin
  2. 2. About Me • Software Engineer @ Two Sigma
  3. 3. Outline • Introduction: Mesos and Cook
  4. 4. What is Mesos • Open Source Apache Project • 2010: AMPLab, University of California Berkeley • 2012: Twitter, Airbnb • 2015: Twitter, Airbnb, Apple, Bloomberg, Cisco, eBay, Yelp…
  5. 5. What is Mesos • Tool to build distributed applications – Hadoop, Spark… – Cassandra, Kafka, Riak…
  6. 6. What is Mesos • Distributed applications commonality: – Manages resources (cpu, memory, disk…) on worker hosts – Manages life cycle of remote processes – Manages communication between masters and workers
  7. 7. What is Mesos
  8. 8. What is Mesos
  9. 9. What is Mesos
  10. 10. What is Mesos • Distributed applications commonality: – Manages resources (cpu, memory, disk…) on worker hosts – Manages life cycle of remote processes – Manages communication between masters and workers
  11. 11. Mesos Primitives
  12. 12. Mesos @ Two Sigma Cook Mesos
  13. 13. What is Cook • Two Sigma’s Simulation Platform • Manages tens of thousands of simulations • Shares compute resources among users
  14. 14. What is Simulation • Idempotent, distributed, resource intensive computations • Simulation set • A handful ~ thousands of simulations • Simulation • Multiple Mesos tasks
  15. 15. What is Simulation • Simulation task footprint • 10 ~ 100 GB RAM • 1 ~ 20 CPUs • 15 minutes ~ a few hours • Simulation use cases • Interactive • Batch processing
  16. 16. Problem • High resource demand • 5 x capacity during peak hours • Optimize • Utilization • Process workloads as fast as possible • Fairness • Allocate resources fairly to users
  17. 17. What is Fairness • FIFO • Time sharing • Roll a die • …
  18. 18. What is Fairness • A story…
  19. 19. What is Fairness Resource Allocation
  20. 20. What is Fairness, Really • Fairness is not about ‘fair’ • Fairness is about user experience • User should get their share of the cluster whenever they need it
  21. 21. Outline • Introduction: Mesos and Cook • Problem: Utilization and Fairness • Fairness: How do we do it
  22. 22. Static Quota • Quota = Max percentage of the cluster allowed for single user • Static • 100 % / # Max concurrent users • Pros: • Fairness • Cons: • Poor Utilization
  23. 23. Dynamic Quota • Dynamic • Quota * Utilization Adjustment • Pros: • Higher Utilization • Cons: • Poor Fairness
  24. 24. Dynamic Quota Unfair Resource Allocation Fair Resource Allocation Hours…
  25. 25. Can we do better? Static Quota Dynamic Quota ? Fairness Utilization
  26. 26. Preemption • Kill a Simulation task and reschedule later • Reclaim resource faster! Unfair Resource Allocation Minutes! Fair Resource Allocation
  27. 27. Outline • Introduction: Mesos and Cook • Problem: Utilization and Fairness • Fairness: How do we do it • Preemption: How do we do it
  28. 28. Preemption: Intuition Running Waiting
  29. 29. Preemption: Intuition Running Waiting
  30. 30. Preemption: Intuition Running Waiting
  31. 31. Preemption: Intuition ?? ? ?? ? Running Waiting
  32. 32. Preemption: Intuition Running Waiting
  33. 33. Preemption: Intuition Running Waiting
  34. 34. Preemption: Intuition Running Waiting
  35. 35. Preemption: Intuition Running Waiting
  36. 36. Preemption: Intuition Running Waiting
  37. 37. Preemption: Intuition Running Waiting
  38. 38. Problem • Not all tasks are equal • We just preempted some important tasks! Bad User Experience
  39. 39. Score Function • Score Function: Reflects a task’s value • Fairness • Importance • Preemption principle: • Preempt a low-score task for a high-score task
  40. 40. Preemption: Intuition €€€ €€ € Running Waiting £££££ ££££ £££ ¥¥¥¥ ¥¥¥ ¥¥
  41. 41. Preemption: Intuition ₽₽₽ ₽₽ ₽ Running Waiting ₽₽₽ ₽₽ ₽ ₽₽₽ ₽₽ ₽
  42. 42. Preemption: Intuition ₽₽₽ ₽₽ ₽ Running Waiting ₽₽₽ ₽₽ ₽ ₽₽₽ ₽₽ ₽
  43. 43. Preemption: Intuition ₽₽₽ ₽₽ ₽ Running Waiting ₽₽₽ ₽₽ ₽ ₽₽₽ ₽₽ ₽
  44. 44. Preemption: Intuition ₽₽₽ ₽₽ ₽ Running Waiting ₽₽₽ ₽₽ ₽ ₽₽₽ ₽₽ ₽
  45. 45. Preemption: Intuition ₽₽₽ ₽₽ ₽ Running Waiting ₽₽₽ ₽₽ ₽ ₽₽₽ ₽₽ ₽
  46. 46. Preemption: Intuition ₽₽₽ ₽₽ ₽ Running Waiting ₽₽₽ ₽₽ ₽ ₽₽₽ ₽₽ ₽
  47. 47. Preemption: Intuition €€€ €€ € Running Waiting £££££ ££££ £££ ¥¥¥¥ ¥¥¥ ¥¥
  48. 48. Outline • Introduction: Mesos and Cook • Problem: Utilization and Fairness • Fairness: How do we do it • Preemption: How do we do it • Intuition • Formalization
  49. 49. Cumulative Resource Share (CRS) • Assume there is a total order of tasks for each user, where > means ‘more important than’. – The CRS of task t is the sum of the resources of all tasks of the same user that are greater than or equal to t, divided by the total cluster resource. • $CRS(t) = \frac{1}{R_{Total}} \sum_{t' \ge t} R_{t'}$
  50. 50. Cumulative Resource Share (CRS) • $R_a = R_b = R_c = 1$ cpu, $R_{Total} = 6$ cpus • $a > b > c$ • $CRS(a) = \frac{R_a}{R_{Total}} = \frac{1}{6}$ • $CRS(b) = \frac{R_a + R_b}{R_{Total}} = \frac{2}{6}$ • $CRS(c) = \frac{R_a + R_b + R_c}{R_{Total}} = \frac{3}{6}$
  51. 51. Preemption: Formalization €€€ €€ € Running Waiting £££££ ££££ £££ ¥¥¥¥ ¥¥¥ ¥¥
  52. 52. Preemption: Formalization 1/6 2/6 3/6 Running Waiting 1/6 2/6 3/6 1/6 2/6 3/6
  53. 53. Preemption: Formalization 1/6 2/6 3/6 Running Waiting 1/6 2/6 3/6 1/6 2/6 3/6
  54. 54. Preemption: Formalization 1/6 2/6 3/6 Running Waiting 1/6 2/6 3/6 1/6 2/6 3/6
  55. 55. Preemption: Formalization 1/6 2/6 3/6 Running Waiting 1/6 2/6 3/6 1/6 2/6 3/6
  56. 56. Multiple Resources? • Dominant Resource Fairness: Fair Allocation of Multiple Resource Types • Published by UC Berkeley in 2011
  57. 57. Dominant Cumulative Resource Share • $CRS(t) = \frac{1}{R_{Total}} \sum_{t' \ge t} R_{t'}$ • $DCRS(t) = \max_{R} \frac{1}{R_{Total}} \sum_{t' \ge t} R_{t'}$ • $Score(t) = -DCRS(t)$
  58. 58. Outline • Introduction: Mesos and Cook • Problem: Utilization and Fairness • Fairness: How do we do it • Preemption: How do we do it • Intuition • Formalization • Put things together: Mesos and Cook
  59. 59. Cook: Architecture
  60. 60. Are we doing better? Static Quota Dynamic Quota Preemption? Fairness Utilization
  61. 61. Outline • Introduction: Mesos and Cook • Problem: Utilization and Fairness • Fairness: How do we do it • Preemption: How do we do it • Intuition • Formalization • Put things together: Mesos and Cook • Benchmark
  62. 62. Benchmark • Simulated • 7 day production workload trace
  63. 63. Benchmark • Chart: Simulation Set Speed Up Distribution (y-axis: SpeedUp; series: Dynamic Quota, Preemption)
  64. 64. Benchmark • Chart: Effective Utilization (x-axis: Day 1 to Day 7; y-axis: Utilization, 0 to 1; series: Dynamic Quota, Preemption)
  65. 65. It works!
  66. 66. Open Source • https://github.com/apache/mesos • https://github.com/twosigma/cook • @icexelloss
  67. 67. Questions?
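
The Cumulative Resource Share, Dominant Cumulative Resource Share, and score definitions on slides 49 to 57 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Cook's actual implementation: the function names (`cumulative_shares`, `dcrs`, `scores`, `should_preempt`) and the resource names (`cpus`, `mem_gb`) are hypothetical, the maximum in DCRS is assumed to be taken over resource types (following the Dominant Resource Fairness idea on slide 56), and the preemption check is simply the slide 39 principle of preempting a lower-score task for a higher-score one.

```python
from fractions import Fraction


def cumulative_shares(tasks, total):
    """Per-resource CRS for one user's tasks.

    tasks: list of resource dicts for ONE user, most important task first
           (hypothetical layout, e.g. {"cpus": 1, "mem_gb": 20}).
    total: dict of total cluster capacity per resource.
    Returns one dict per task mapping resource name to its CRS.
    """
    running_sum = {name: Fraction(0) for name in total}
    shares = []
    for demand in tasks:
        # Add this task's demand to everything more important than it.
        for name in total:
            running_sum[name] += Fraction(demand.get(name, 0))
        shares.append({name: running_sum[name] / total[name] for name in total})
    return shares


def dcrs(tasks, total):
    """Dominant CRS: the largest per-resource CRS of each task (assumed reading)."""
    return [max(share.values()) for share in cumulative_shares(tasks, total)]


def scores(tasks, total):
    """Score(t) = -DCRS(t): the more a user already holds, the lower the score."""
    return [-value for value in dcrs(tasks, total)]


def should_preempt(running_score, waiting_score):
    """Slide 39 principle: preempt a low-score task for a high-score task."""
    return waiting_score > running_score


# Slide 50 example: three 1-cpu tasks a > b > c on a 6-cpu cluster.
slide_50 = dcrs([{"cpus": 1}, {"cpus": 1}, {"cpus": 1}], {"cpus": 6})
```

With the slide 50 inputs, `slide_50` holds the shares 1/6, 2/6, and 3/6 (stored as reduced fractions). A waiting task with score -1/6 would displace a running task with score -3/6 under `should_preempt`, which matches the intuition that a user's first task outranks another user's third.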
