Orchestration: Fancy Buzzword, or the Inevitable Fate of Docker Containers?
Hi!
Connor Doyle
Software Engineer
Mesosphere, Inc.
connor@mesosphere.io
@nor0101
Agenda
• What problem are we solving?
• Prior art
• Axes of choice
• The allure of two-level scheduling
• To infinity and beyond
• Oversubscription
• Maintenance
The problem space
“Container orchestration” implies horizontal scalability.
Why you need scale varies, and your workload profile has bearing on how you should run your clusters (e.g. the needs of HPC/HTC are different from those of a consumer retail website).
Mo’ scale, mo’ problems: failure (and cascading failure), fault zones, maintaining SLOs, maintenance windows, monitoring/alerts.
We want:
- Stability
- Performance
- Flexibility
- Abstractions we can grasp and explain
Orchestration starts with a good scheduler.
We have options :)
• Centralized
  • Batch schedulers (HTCondor, Slurm, Torque)
  • Monolithic schedulers (Borg)
  • Process schedulers (systemd, fleet, Kubernetes)
  • Two-level schedulers (Mesos, Ω)
• Decentralized
  • Completely: Sparrow
  • Hybrid: Mercury
This is a HUGE opportunity:
- To get the abstractions right
- To mitigate the next software crisis
- To do better!
Two-level scheduling is a nice model.
Two-level scheduling
Let the cluster manager:
- Keep track of resources
- Offer resources to applications fairly
- Implement low-level isolation
Let the application-specific scheduler:
- Track its own job queue
- Think about task constraints
- Define task semantics
- Choose appropriate containerization
- Respond to failures
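To make the split concrete, here is a minimal Python sketch of the two levels interacting. The class and method names are hypothetical illustrations, not the actual Mesos API: the cluster manager tracks per-node resources and makes offers, and each application scheduler decides whether an offer fits the next task in its queue.

```python
# Minimal two-level scheduling sketch (hypothetical names, not the Mesos API).
# Level 1: the cluster manager tracks resources and offers them to frameworks.
# Level 2: each framework scheduler decides what to run on the offered resources.
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Offer:
    node: str
    cpus: float
    mem_mb: int


@dataclass
class Task:
    name: str
    cpus: float
    mem_mb: int


class FrameworkScheduler:
    """Application-level scheduler: owns its job queue and task semantics."""

    def __init__(self, name: str, queue: List[Task]):
        self.name = name
        self.queue = queue

    def resource_offer(self, offer: Offer) -> Optional[Task]:
        # Accept the offer only if the next queued task fits; otherwise decline.
        if self.queue and self.queue[0].cpus <= offer.cpus and self.queue[0].mem_mb <= offer.mem_mb:
            return self.queue.pop(0)
        return None


class ClusterManager:
    """Cluster-level scheduler: tracks resources and offers them out."""

    def __init__(self, nodes: dict):
        # nodes: {"node1": {"cpus": 4.0, "mem_mb": 8192}, ...}
        self.free = {n: dict(r) for n, r in nodes.items()}
        self.frameworks: List[FrameworkScheduler] = []

    def register(self, fw: FrameworkScheduler) -> None:
        self.frameworks.append(fw)

    def offer_round(self) -> None:
        # Round-robin offers as a stand-in for a real fairness policy (e.g. DRF).
        for i, (node, res) in enumerate(self.free.items()):
            fw = self.frameworks[i % len(self.frameworks)]
            task = fw.resource_offer(Offer(node, res["cpus"], res["mem_mb"]))
            if task:  # framework accepted: account for the launched task
                res["cpus"] -= task.cpus
                res["mem_mb"] -= task.mem_mb
                print(f"{fw.name} launched {task.name} on {node}")


if __name__ == "__main__":
    cm = ClusterManager({"node1": {"cpus": 4.0, "mem_mb": 8192},
                         "node2": {"cpus": 2.0, "mem_mb": 4096}})
    cm.register(FrameworkScheduler("web", [Task("nginx", 1.0, 512)]))
    cm.register(FrameworkScheduler("batch", [Task("etl", 2.0, 2048)]))
    cm.offer_round()
```

The point of the split: the cluster manager never needs to know what an "nginx" or "etl" task means, and the framework never needs a global view of the cluster.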
Hey, looks like a managed runtime!
These have been popular lately:
• JVM
• HHVM
• V8
• ...
Why? They allow high-level, general-purpose programs to benefit from:
- Portable units of execution
- Architecture-dependent optimizations
- Dynamic (de)optimizations based on insights learned at execution time
...and it gets better over time for free!
A goal: maximize utilization
...safely!
Jobs like to run on underutilized hardware!
Contention for shared resources can negatively impact other goals (such as tail latency or throughput).
Besides estimating oversubscribable resources, we need to revise the estimates over time!
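As a sketch of the estimation side only, assuming hypothetical names rather than the actual Mesos oversubscription module: treat slack as allocated-minus-used resources and smooth it over time (here with an exponential moving average) so the oversubscribable estimate can be revised as usage changes.

```python
# Sketch of a revisable oversubscription estimate (hypothetical, not Mesos code).
# Slack = resources allocated to reserved workloads but not actually used;
# an EMA dampens spikes so the oversubscribable estimate is revised gradually.
class SlackEstimator:
    def __init__(self, alpha: float = 0.2):
        self.alpha = alpha      # smoothing factor: higher reacts faster
        self.estimate = 0.0     # current estimate of oversubscribable CPUs

    def update(self, allocated_cpus: float, used_cpus: float) -> float:
        observed_slack = max(allocated_cpus - used_cpus, 0.0)
        # Revise the estimate over time instead of trusting a single sample.
        self.estimate = (1 - self.alpha) * self.estimate + self.alpha * observed_slack
        return self.estimate


estimator = SlackEstimator()
for used in [1.0, 1.5, 6.0, 1.2]:   # observed usage on an 8-CPU allocation
    print(round(estimator.update(allocated_cpus=8.0, used_cpus=used), 2))
```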
More challenges (opportunities!)
Choose victims wisely!
Is killing the only option?
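A purely illustrative sketch of those two questions (hypothetical names, not a real Mesos QoS controller): when contention is detected, pick the most expendable, heaviest best-effort tasks first, and try throttling them before resorting to killing.

```python
# Hypothetical victim-selection sketch: throttle before killing, and prefer
# lower-priority, heavier CPU users as revocation victims.
from dataclasses import dataclass
from typing import List


@dataclass
class RunningTask:
    name: str
    priority: int        # lower = more expendable
    cpu_usage: float
    throttled: bool = False


def relieve_contention(tasks: List[RunningTask], cpus_to_reclaim: float) -> List[str]:
    actions = []
    # Candidates sorted: lowest priority first, then heaviest CPU users.
    for t in sorted(tasks, key=lambda t: (t.priority, -t.cpu_usage)):
        if cpus_to_reclaim <= 0:
            break
        if not t.throttled:
            t.throttled = True                    # first try to throttle, not kill
            actions.append(f"throttle {t.name}")
            cpus_to_reclaim -= t.cpu_usage * 0.5  # assume throttling halves usage
        else:
            actions.append(f"kill {t.name}")      # killing is the last resort
            cpus_to_reclaim -= t.cpu_usage
    return actions


print(relieve_contention(
    [RunningTask("etl", 0, 3.0), RunningTask("report", 1, 1.0)],
    cpus_to_reclaim=2.0))
```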
Another goal: orderly downtime
“I’m removing this node from the cluster NOW.”
“I’m going to take this node offline in three hours.”
Tag resource offers with a time horizon.
Give application schedulers a chance to relocate affected tasks.
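As a sketch of what a time horizon on offers could enable, using hypothetical types loosely inspired by the maintenance primitives proposed in MESOS-1474 (not the actual API): the offer carries the node's upcoming unavailability, and the application scheduler only places work that can finish before the node goes down.

```python
# Maintenance-aware placement sketch (hypothetical names, not the Mesos API).
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class MaintenanceWindow:
    starts_at: float      # unix timestamp when the node goes offline
    duration_s: float


@dataclass
class Offer:
    node: str
    cpus: float
    unavailability: Optional[MaintenanceWindow] = None  # the "time horizon"


def should_accept(offer: Offer, task_runtime_s: float, now: Optional[float] = None) -> bool:
    """Application-scheduler policy: only place a task on a node that will stay
    up long enough for it to finish; otherwise decline (or relocate existing
    tasks ahead of the window)."""
    if offer.unavailability is None:
        return True
    now = time.time() if now is None else now
    return now + task_runtime_s < offer.unavailability.starts_at


# Example: a three-hour horizon, a one-hour task fits, a four-hour task does not.
horizon = MaintenanceWindow(starts_at=time.time() + 3 * 3600, duration_s=3600)
offer = Offer(node="node7", cpus=4.0, unavailability=horizon)
print(should_accept(offer, task_runtime_s=3600))      # True
print(should_accept(offer, task_runtime_s=4 * 3600))  # False
```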
References
1. Apollo: Scalable and Coordinated Scheduling for Cloud-Scale Computing
2. Distributed Computing in Practice: The Condor Experience
3. Heracles: Improving Resource Efficiency at Scale
4. Large-scale cluster management at Google with Borg
5. Mercury: Hybrid Centralized and Distributed Scheduling in Large Shared Clusters
6. Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center
7. Mesos Oversubscription Design Document
8. MESOS-1474: Provide cluster maintenance primitives for operators
9. Omega: flexible, scalable schedulers for large compute clusters
10. Quasar: Resource-Efficient and QoS-Aware Cluster Management
11. Reliable Cron across the Planet
12. Sparrow: Distributed, Low Latency Scheduling