Orchestration: Fancy Buzzword, or the Inevitable Fate of Docker Containers?
Hi!
Connor Doyle
Software Engineer
Mesosphere, Inc.
connor@mesosphere.io
@nor0101
Agenda
• What problem are we solving?
• Prior art
• Axes of choice
• The allure of two-level scheduling
• To infinity and beyond
• Oversubscription
• Maintenance
The problem space
“Container orchestration” implies horizontal scalability.
Why you need scale varies, and your workload profile has bearing on how you should run your clusters (e.g. the needs of HPC/HTC are different from those of a consumer retail website).
Mo’ scale, mo’ problems: failure (and cascading failure), fault zones, maintaining SLOs, maintenance windows, monitoring/alerts.
We want:
- Stability
- Performance
- Flexibility
- Abstractions we can grasp and explain
Orchestration starts with a good scheduler.
We have options :)
• Centralized
  • Batch schedulers (HTCondor, Slurm, Torque)
  • Monolithic schedulers (Borg)
  • Process schedulers (systemd, fleet, Kubernetes)
  • Two-level schedulers (Mesos, Ω)
• Decentralized
  • Completely: Sparrow
  • Hybrid: Mercury
This is a HUGE opportunity:
- To get the abstractions right
- To mitigate the next software crisis
- To do better!
Two-level scheduling is a nice model.
Two-level scheduling
Let the cluster manager:
- Keep track of resources
- Offer resources to applications fairly
- Implement low-level isolation
Let the application-specific scheduler:
- Track its own job queue
- Think about task constraints
- Define task semantics
- Choose appropriate containerization
- Respond to failures
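To make the split concrete, here is a minimal Python sketch of the two levels interacting. The class and method names are hypothetical illustrations, not the actual Mesos API: the cluster manager tracks per-node resources and makes offers, and each application scheduler decides whether an offer fits the next task in its queue.

```python
# Minimal two-level scheduling sketch (hypothetical names, not the Mesos API).
# Level 1: the cluster manager tracks resources and offers them to frameworks.
# Level 2: each framework scheduler decides what to run on the offered resources.
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Offer:
    node: str
    cpus: float
    mem_mb: int


@dataclass
class Task:
    name: str
    cpus: float
    mem_mb: int


class FrameworkScheduler:
    """Application-level scheduler: owns its job queue and task semantics."""

    def __init__(self, name: str, queue: List[Task]):
        self.name = name
        self.queue = queue

    def resource_offer(self, offer: Offer) -> Optional[Task]:
        # Accept the offer only if the next queued task fits; otherwise decline.
        if self.queue and self.queue[0].cpus <= offer.cpus and self.queue[0].mem_mb <= offer.mem_mb:
            return self.queue.pop(0)
        return None


class ClusterManager:
    """Cluster-level scheduler: tracks resources and offers them out."""

    def __init__(self, nodes: dict):
        # nodes: {"node1": {"cpus": 4.0, "mem_mb": 8192}, ...}
        self.free = {n: dict(r) for n, r in nodes.items()}
        self.frameworks: List[FrameworkScheduler] = []

    def register(self, fw: FrameworkScheduler) -> None:
        self.frameworks.append(fw)

    def offer_round(self) -> None:
        # Round-robin offers as a stand-in for a real fairness policy (e.g. DRF).
        for i, (node, res) in enumerate(self.free.items()):
            fw = self.frameworks[i % len(self.frameworks)]
            task = fw.resource_offer(Offer(node, res["cpus"], res["mem_mb"]))
            if task:  # framework accepted: account for the launched task
                res["cpus"] -= task.cpus
                res["mem_mb"] -= task.mem_mb
                print(f"{fw.name} launched {task.name} on {node}")


if __name__ == "__main__":
    cm = ClusterManager({"node1": {"cpus": 4.0, "mem_mb": 8192},
                         "node2": {"cpus": 2.0, "mem_mb": 4096}})
    cm.register(FrameworkScheduler("web", [Task("nginx", 1.0, 512)]))
    cm.register(FrameworkScheduler("batch", [Task("etl", 2.0, 2048)]))
    cm.offer_round()
```

The point of the split: the cluster manager never needs to know what an "nginx" or "etl" task means, and the framework never needs a global view of the cluster.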
Hey, looks like a managed runtime!
These have been popular lately:
• JVM
• HHVM
• V8
• ...
Why? They allow high-level, general-purpose programs to benefit from:
- Portable units of execution
- Architecture-dependent optimizations
- Dynamic (de)optimizations based on insights learned at execution time
...and it gets better over time for free!
A goal: maximize utilization
...safely!
Jobs like to run on underutilized hardware!
Contention for shared resources can negatively impact other goals (such as tail latency or throughput).
Besides estimating oversubscribable resources, we need to revise the estimates over time!
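As a sketch of the estimation side only, assuming hypothetical names rather than the actual Mesos oversubscription module: treat slack as allocated-minus-used resources and smooth it over time (here with an exponential moving average) so the oversubscribable estimate can be revised as usage changes.

```python
# Sketch of a revisable oversubscription estimate (hypothetical, not Mesos code).
# Slack = resources allocated to reserved workloads but not actually used;
# an EMA dampens spikes so the oversubscribable estimate is revised gradually.
class SlackEstimator:
    def __init__(self, alpha: float = 0.2):
        self.alpha = alpha      # smoothing factor: higher reacts faster
        self.estimate = 0.0     # current estimate of oversubscribable CPUs

    def update(self, allocated_cpus: float, used_cpus: float) -> float:
        observed_slack = max(allocated_cpus - used_cpus, 0.0)
        # Revise the estimate over time instead of trusting a single sample.
        self.estimate = (1 - self.alpha) * self.estimate + self.alpha * observed_slack
        return self.estimate


estimator = SlackEstimator()
for used in [1.0, 1.5, 6.0, 1.2]:   # observed usage on an 8-CPU allocation
    print(round(estimator.update(allocated_cpus=8.0, used_cpus=used), 2))
```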
More challenges (opportunities!)
Choose victims wisely!
Is killing the only option?
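A purely illustrative sketch of those two questions (hypothetical names, not a real Mesos QoS controller): when contention is detected, pick the most expendable, heaviest best-effort tasks first, and try throttling them before resorting to killing.

```python
# Hypothetical victim-selection sketch: throttle before killing, and prefer
# lower-priority, heavier CPU users as revocation victims.
from dataclasses import dataclass
from typing import List


@dataclass
class RunningTask:
    name: str
    priority: int        # lower = more expendable
    cpu_usage: float
    throttled: bool = False


def relieve_contention(tasks: List[RunningTask], cpus_to_reclaim: float) -> List[str]:
    actions = []
    # Candidates sorted: lowest priority first, then heaviest CPU users.
    for t in sorted(tasks, key=lambda t: (t.priority, -t.cpu_usage)):
        if cpus_to_reclaim <= 0:
            break
        if not t.throttled:
            t.throttled = True                    # first try to throttle, not kill
            actions.append(f"throttle {t.name}")
            cpus_to_reclaim -= t.cpu_usage * 0.5  # assume throttling halves usage
        else:
            actions.append(f"kill {t.name}")      # killing is the last resort
            cpus_to_reclaim -= t.cpu_usage
    return actions


print(relieve_contention(
    [RunningTask("etl", 0, 3.0), RunningTask("report", 1, 1.0)],
    cpus_to_reclaim=2.0))
```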
Another goal: orderly downtime
“I’m removing this node from the cluster NOW.”
“I’m going to take this node offline in three hours.”
Tag resource offers with a time horizon.
Give application schedulers a chance to relocate affected tasks.
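As a sketch of what a time horizon on offers could enable, using hypothetical types loosely inspired by the maintenance primitives proposed in MESOS-1474 (not the actual API): the offer carries the node's upcoming unavailability, and the application scheduler only places work that can finish before the node goes down.

```python
# Maintenance-aware placement sketch (hypothetical names, not the Mesos API).
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class MaintenanceWindow:
    starts_at: float      # unix timestamp when the node goes offline
    duration_s: float


@dataclass
class Offer:
    node: str
    cpus: float
    unavailability: Optional[MaintenanceWindow] = None  # the "time horizon"


def should_accept(offer: Offer, task_runtime_s: float, now: Optional[float] = None) -> bool:
    """Application-scheduler policy: only place a task on a node that will stay
    up long enough for it to finish; otherwise decline (or relocate existing
    tasks ahead of the window)."""
    if offer.unavailability is None:
        return True
    now = time.time() if now is None else now
    return now + task_runtime_s < offer.unavailability.starts_at


# Example: a three-hour horizon, a one-hour task fits, a four-hour task does not.
horizon = MaintenanceWindow(starts_at=time.time() + 3 * 3600, duration_s=3600)
offer = Offer(node="node7", cpus=4.0, unavailability=horizon)
print(should_accept(offer, task_runtime_s=3600))      # True
print(should_accept(offer, task_runtime_s=4 * 3600))  # False
```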
References
1. Apollo: Scalable and Coordinated Scheduling for Cloud-Scale Computing
2. Distributed Computing in Practice: The Condor Experience
3. Heracles: Improving Resource Efficiency at Scale
4. Large-scale cluster management at Google with Borg
5. Mercury: Hybrid Centralized and Distributed Scheduling in Large Shared Clusters
6. Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center
7. Mesos Oversubscription Design Document
8. MESOS-1474: Provide cluster maintenance primitives for operators
9. Omega: flexible, scalable schedulers for large compute clusters
10. Quasar: Resource-Efficient and QoS-Aware Cluster Management
11. Reliable Cron across the Planet
12. Sparrow: Distributed, Low Latency Scheduling