Managing resources (CPU, memory, network I/O) in compute clusters is difficult. Whether we run Hadoop, Spark, or custom workloads, we face the challenge of scheduling a mixture of long-running and short-running workloads with different resource requirements and deadlines. The difficulty often arises when we try to maximize cluster utilization and share resources properly among workloads at the same time.
This talk presents a solution to this problem that uses two cutting-edge open source technologies: Cook (https://github.com/twosigma/cook) and Apache Mesos (http://mesos.apache.org). At Two Sigma, we use Cook and Mesos to manage our compute clusters and run tens of thousands of compute workloads every day. With Cook and Mesos, we are able to utilize the compute cluster efficiently and achieve high user satisfaction.
In this talk, we will discuss the idea behind our algorithm and the design of the system, and show how others can use Cook and Mesos to solve their own cluster resource-sharing problems.