Placing a container on a train at 200mph
Casper S. Jensen
Software Engineer, Uber
About Me
● Joined Uber January 2015,
Compute Platform
Aarhus office, Denmark
● PhD, CS
On a completely unrelated topic
● Linux aficionado
● Docker “user” since February
About UBER
Why all the fuss?
The UBER app
339 Cities
61 Countries
2,000,000+ Trips/day
4000+ Employees
Not that hard...
You just have to handle
● 24/7 availability across the globe
● Very different markets
● 1000s of developers and teams
● Adding new features like there’s no tomorrow
UberPOOL, UberKITTEN, UberICECREAM, UberEATS,
UberWHATEVERYOUCANIMAGINE
● Hypergrowth in all dimensions
● Datacenters, servers, infrastructure, etc
Basically, you have to make magic happen every time a user
opens the application
Software
Development
The old UBER way
A fair amount of frustration
1) Write service RFC
2) Wait for feedback
3) Do all necessary scaffolding by hand
4) Start developing your service
5) Wait for infra team to write service scaffolding
6) Wait for IT to allocate servers
7) Wait for infra team to provision servers
8) Deploy to development servers and test
9) Deploy to production
10) Monitor and iterate
Steps 5–7 could take days or weeks...
It's just not scalable
But you have to start somewhere
“Make it easier for service
owners to manage their local
service environments.”
—Internal e-mail, February 2015
New development process
1) Write service RFC
2) Wait for feedback
3) Do all necessary scaffolding using tools
4) Start developing your service
5) Deploy to development servers and test
6) Deploy to production
7) Monitor and iterate
No silver bullets
All the things you did not consider
● Routing
● Dynamic service discovery
● Deployment
● Placement engine
● Logging and tracing
● Dual build environments
● Handling of secrets
● Security updates
● Private repositories
● Replicating images across multiple datacenters
Also, how much freedom do you really want to give your developers?
Change all the things!
Let's go through some examples
uDeploy
● Rolling upgrades
● Automatic rollbacks on failure
● Health checks, stats, exceptions
○ Load and system tests
● Service building
● Build replication
● 4,000+ upgrades/week
● 3,000+ builds/week
● 300+ rollbacks/week
● 600+ managed services
Our in-house deployment/cluster
management system (the rolling-upgrade loop is sketched below)
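uDeploy's internals are not public, so the following is only a minimal sketch of the pattern the bullets above describe: upgrade in batches, check health, and roll everything back on failure. The helper functions are stand-ins, not Uber APIs.

"""Minimal sketch of a rolling upgrade with automatic rollback on failed
health checks. deploy_instance, health_check and rollback_instance are
illustrative stand-ins, not uDeploy functions."""

def deploy_instance(instance, build):
    print(f"deploying {build} to {instance}")

def rollback_instance(instance):
    print(f"rolling back {instance}")

def health_check(instance):
    # Stand-in: a real check would hit the service's health endpoint.
    return True

def rolling_upgrade(instances, new_build, batch_size=2):
    upgraded = []
    for i in range(0, len(instances), batch_size):
        batch = instances[i:i + batch_size]
        for instance in batch:
            deploy_instance(instance, new_build)
            upgraded.append(instance)
        if not all(health_check(inst) for inst in batch):
            # Any unhealthy instance aborts the rollout and reverts
            # everything that was already upgraded.
            for inst in upgraded:
                rollback_instance(inst)
            return False
    return True

rolling_upgrade(["host1", "host2", "host3", "host4"], "test-uber-service:42")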
Moving to docker with zero downtime
Build multiplexing
We want to keep on trucking while migrating to docker
(one possible shape of the build fan-out is sketched below)
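The slides don't spell out how build multiplexing works, so this is only a sketch of the general idea under one assumption: during the migration every build is fanned out to both the legacy pipeline and the Docker pipeline, so a service can be deployed either way with no downtime. The function names and registry host are illustrative.

"""Sketch of build multiplexing during the docker migration. The pipeline
functions and "registry.local" are hypothetical, not uDeploy internals."""

def build_legacy(service, revision):
    print(f"legacy build of {service}@{revision}")
    return f"{service}-{revision}.tar.gz"

def build_docker(service, revision):
    print(f"docker build of {service}@{revision}")
    return f"registry.local/{service}:{revision}"

def multiplexed_build(service, revision):
    # Produce both artifacts; deploys pick whichever flavour the service
    # is currently configured to run as.
    return {
        "legacy": build_legacy(service, revision),
        "docker": build_docker(service, revision),
    }

print(multiplexed_build("test-uber-service", "abc123"))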
Build process & scaffolding
Declarative build scripts
● Service configuration in git
● Preset service frameworks
● Many options
● Generator creating (a toy version is sketched after the example config)
○ Dockerfile
○ Health checks
○ Entry point scripts inside container
○ In general, all glue between host and service
● Possible to supply custom Dockerfile
service_name: test-uber-service
owning_team: udeploy
backend_port: 123
frontend_port: 456
service_type: clay_wheel
clay_wheel:
  celeries:
    - queue: test-uber-service
  has_celerybeat: true
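The real generator is not public; this toy version only illustrates the idea of turning the declarative config above into a Dockerfile. The base image naming scheme (uber/<service_type>-base) and the entry point path are assumptions.

"""Toy sketch of the declarative-build generator: read the service config
and emit a Dockerfile. Everything not shown on the slide is illustrative."""

import yaml  # pip install pyyaml

CONFIG = """
service_name: test-uber-service
owning_team: udeploy
backend_port: 123
frontend_port: 456
service_type: clay_wheel
"""

def generate_dockerfile(cfg):
    # "uber/clay_wheel-base" is a hypothetical per-service_type base image.
    return "\n".join([
        f"FROM uber/{cfg['service_type']}-base",
        "COPY . /app",
        "WORKDIR /app",
        f"EXPOSE {cfg['backend_port']}",
        'ENTRYPOINT ["/app/entrypoint.sh"]',
    ])

cfg = yaml.safe_load(CONFIG)
print(generate_dockerfile(cfg))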
Image replication
● Multiple datacenters
● Images must be stored within DCs
● Build once, replicate everywhere
● Traffic restrictions, push but not pull
Current setup
● Stock docker registry
● File back-end
● Docker-mover
● Syncing images using pull/push
● Use notification API to speed up replication (a minimal mover loop is sketched below)
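Uber's docker-mover is not public; this is a minimal sketch of the approach the bullets describe: react to push notifications from the source registry, then replicate by pulling, re-tagging and pushing to registries in the other datacenters. The webhook payload layout and the registry hostnames are assumptions.

"""Minimal docker-mover sketch: replicate pushed images across DCs using
pull/tag/push, triggered by registry notifications. Hostnames and the
notification payload shape are assumptions, not Uber's setup."""

import subprocess
from flask import Flask, request  # pip install flask

SOURCE_REGISTRY = "registry.dc1.local:5000"      # hypothetical
REMOTE_REGISTRIES = ["registry.dc2.local:5000"]  # hypothetical

app = Flask(__name__)

def replicate(repository, tag):
    src = f"{SOURCE_REGISTRY}/{repository}:{tag}"
    subprocess.check_call(["docker", "pull", src])
    for remote in REMOTE_REGISTRIES:
        dst = f"{remote}/{repository}:{tag}"
        subprocess.check_call(["docker", "tag", src, dst])
        subprocess.check_call(["docker", "push", dst])

@app.route("/registry-events", methods=["POST"])
def registry_events():
    # Assumes each push event carries the repository and tag of the
    # pushed manifest.
    payload = request.get_json() or {}
    for event in payload.get("events", []):
        if event.get("action") == "push" and "tag" in event.get("target", {}):
            replicate(event["target"]["repository"], event["target"]["tag"])
    return "", 200

if __name__ == "__main__":
    app.run(port=8000)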
Service discovery & routing
● Previously, we used HAProxy + scripts to do this
● Now, we use Hyperbahn + TChannel RPC
https://github.com/uber/{hyperbahn|tchannel}
○ Used for docker and legacy services
○ Required in order to move containers around in seconds
○ Dynamic routing, circuit breaking, retries, rate limiting, load balancing
○ Completely dynamic, no fixed ports
(a toy circuit-breaker illustration follows below)
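Hyperbahn and TChannel provide these behaviours inside the routing mesh; the snippet below is not their API, just a toy illustration of one feature from the list (circuit breaking): stop sending traffic to an unhealthy backend for a cool-down period after repeated failures.

"""Toy circuit breaker, illustrating one routing-layer feature listed above.
Not the TChannel/Hyperbahn API."""

import time

class CircuitBreaker:
    def __init__(self, max_failures=3, reset_after=30.0):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at = None

    def allow(self):
        if self.opened_at is None:
            return True
        if time.time() - self.opened_at >= self.reset_after:
            # Half-open: let one request through to probe the backend.
            return True
        return False

    def record(self, success):
        if success:
            self.failures = 0
            self.opened_at = None
        else:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.time()

breaker = CircuitBreaker()
for ok in [False, False, False, True]:
    if breaker.allow():
        breaker.record(ok)
print("circuit open:", not breaker.allow())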
Key Take-Aways
The good & the bad
The good:
● Remove team dependencies
● More freedom
● Not tied to specific frameworks or versions (hi, Python 3)
● Easy to experiment with new technologies
The bad:
● Too much freedom
● Non-trivial integration with a large running system
● Infrastructure must be dynamic throughout
● Containers are only a minor part of the infrastructure, don't forget that
Current and future wins
● Today, 30% of all services in docker
● Soon-ish, 100%
● Great improvements in provisioning time (done)
● Framework and service owners can manage their own
environment (done)
● Faster and automatic scaling of capacity (in progress)
Thank you!
Casper S. Jensen
caspersj@uber.com
