Docker - Powering RA at Zalando
Docker Meetup - Dortmund 7.6.2016 | jan.mussler@zalando.de | @JanMussler
15 countries
3 fulfillment centers
18+ million active customers
3.0+ billion € revenue
135+ million visits per month
1,000+ employees in tech
Europe's Leading Fashion Platform
Visit us: tech.zalando.com
Zalando’s Technology History
Platform
80+ Engineering teams
Platform team
deploy
Server needs
Storage requests
RADICAL AGILITY
AUTONOMY
Compliance Innovation
STUPS
AWS
STUPS
DOCKER DEPLOY
SSH ACCESS
AUDIT REPORTS
FULL AWS ACCESS
STUPS: A PLATFORM ON TOP OF AMAZON WEB SERVICES
Internet
*.abc.example.org *.xyz.example.org
Team ABC Team XYZ
ISOLATED AWS ACCOUNTS
EC2 EC2
ELB ELB
EC2
DEPLOYMENT
IMMUTABLE STACKS
ELB myapp-1
myapp.example.org
EC2 + Docker
EC2 + Docker
EC2 + Docker
IMMUTABLE STACKS
ELB myapp-1
EC2 + Docker
EC2 + Docker
EC2 + Docker
ELB myapp-2
EC2 + Docker
EC2 + Docker
myapp.example.org
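With two immutable stacks behind one domain, a release comes down to moving traffic share from the old stack to the new one. The following is a minimal sketch of that bookkeeping, assuming weighted DNS records per stack. The function name `shift_traffic` and the weight dictionary are hypothetical and are not taken from the STUPS tooling.

```python
def shift_traffic(weights, target, percentage):
    """Give `target` the requested share and scale the other stacks proportionally.

    `weights` maps stack name -> share of traffic in percent (hypothetical structure).
    """
    if target not in weights:
        raise KeyError(f"unknown stack: {target}")
    if not 0 <= percentage <= 100:
        raise ValueError("percentage must be between 0 and 100")

    others = {name: w for name, w in weights.items() if name != target}
    remaining = 100 - percentage
    total_others = sum(others.values())

    new_weights = {target: percentage}
    for name, w in others.items():
        if total_others > 0:
            new_weights[name] = remaining * w / total_others
        else:
            # The other stacks had no traffic before: split the remainder evenly.
            new_weights[name] = remaining / len(others)
    return new_weights


weights = {"myapp-1": 100, "myapp-2": 0}
weights = shift_traffic(weights, "myapp-2", 10)   # canary: 10% to the new stack
weights = shift_traffic(weights, "myapp-2", 100)  # full switch to the new stack
```

Because the old stack is never modified, rolling back is the same call with the old stack as target. DNS clients cache records, so a weight change takes some time to reach all callers.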
● Immutable AMI
● YAML user data
● Docker runtime
● Application logging: Logentries, Scalyr, CloudWatch Logs
● Prometheus Node Agent for metrics
● KMS encrypted env vars
TAUPAGE AMI
Taupage AMI
SENZA: DEFINITION YAML
SenzaInfo:
  StackName: hello-world
  Parameters:
    - ImageVersion:
        Description: "Docker image version of Hello World."
SenzaComponents:
  - Configuration:
      Type: Senza::StupsAutoConfiguration # auto-detect network setup
  - AppServer: # will create a launch configuration and ASG with scaling triggers
      Type: Senza::TaupageAutoScalingGroup
      InstanceType: t2.micro
      SecurityGroups: [app-hello-world]
      ElasticLoadBalancer: AppLoadBalancer
      TaupageConfig:
        runtime: Docker
        source: "stups/hello-world:{{Arguments.ImageVersion}}"
        ports:
          8080: 8080
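The `{{Arguments.ImageVersion}}` placeholder in the definition is filled from a parameter supplied when the stack is created. A rough sketch of that kind of substitution follows, assuming a simple dotted-path lookup. The `render` helper is hypothetical and is not Senza's actual templating code.

```python
import re

# Matches placeholders such as {{Arguments.ImageVersion}}
PLACEHOLDER = re.compile(r"\{\{\s*([\w.]+)\s*\}\}")


def render(template, context):
    """Replace {{Dotted.Path}} placeholders with values looked up in `context`."""
    def lookup(match):
        value = context
        for key in match.group(1).split("."):
            value = value[key]  # raises KeyError for an undefined parameter
        return str(value)

    return PLACEHOLDER.sub(lookup, template)


source = render(
    "stups/hello-world:{{Arguments.ImageVersion}}",
    {"Arguments": {"ImageVersion": "0.2"}},
)
print(source)  # stups/hello-world:0.2
```

Keeping the image version as a parameter means the same definition file can describe every release of the application.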
SENZA: STACK DEPLOYMENT
$ senza create hello-world.yaml 1 0.2
Generating Cloud Formation template.. OK
Creating Cloud Formation stack hello-world-1.. OK
$ senza events hello-world.yaml 1
Stack Name │Ver.│Resource Type        │Resource ID  │Status            │Status Reason │Event Time
hello-world 1    CloudFormation::Stack hello-world-1 CREATE_IN_PROGRESS User Initiated 10m ago
...
hello-world 1    CloudFormation::Stack hello-world-1 CREATE_COMPLETE                   6m ago
SENZA: MANAGE STACKS
SSH ACCESS
SSH ACCESS: TIME-LIMITED ACCESS TO ANY TEAM SERVER
LOGGING
Automation
GOCD
ThoughtWorks' GoCD in action
GOCD - Pipeline example - configuration overlay
Plan - B
"The OAuth 2.0 authorization framework enables a third-party application to obtain limited access to an HTTP service." - oauth.net
OAUTH 2.0?
● Robustness & resilience
⇒ Cassandra, no SPOF
● Low latency for token validation
⇒ Token Info next to application
● Horizontal scalability
⇒ Cassandra, “stateless” Token Info
PLAN B: GOALS - Build an open source OAuth2 provider
PLAN B: COMPLETE PICTURE
bob alice
create token
Token Info
validate
Provider
credential storage
Revocation
poll public keys
poll revocation lists
S3
call with Bearer token
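The picture above implies that Token Info validates a Bearer token locally, using public keys and revocation lists it has polled, rather than calling the Provider on every request. Here is a minimal sketch of that idea using only the standard library. Everything in it is an assumption for illustration: HMAC-SHA256 stands in for the EC signatures, revocation is keyed on the token signature, and the names `create_token` and `TokenInfo` are hypothetical.

```python
import base64
import hashlib
import hmac
import json
import time


def b64url(data):
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def b64url_decode(text):
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def sign(key, header, payload):
    # Stand-in only: HMAC replaces the EC signature a real provider would produce.
    return b64url(hmac.new(key, f"{header}.{payload}".encode(), hashlib.sha256).digest())


def create_token(claims, key_id, key):
    """Provider side: issue a signed token of the form header.payload.signature."""
    header = b64url(json.dumps({"kid": key_id}).encode())
    payload = b64url(json.dumps(claims).encode())
    return f"{header}.{payload}.{sign(key, header, payload)}"


class TokenInfo:
    """Validates tokens from locally cached state, without a call to the provider."""

    def __init__(self):
        self.keys = {}        # key id -> key, would be refreshed by polling
        self.revoked = set()  # revoked signatures, would be refreshed by polling

    def validate(self, token, now=None):
        now = time.time() if now is None else now
        try:
            header, payload, signature = token.split(".")
            key = self.keys[json.loads(b64url_decode(header))["kid"]]
            expected = sign(key, header, payload)
            if not hmac.compare_digest(expected.encode(), signature.encode()):
                return None
            claims = json.loads(b64url_decode(payload))
            if claims["exp"] <= now or signature in self.revoked:
                return None
            return claims
        except (ValueError, KeyError, TypeError):
            return None  # malformed token or unknown key


key = b"demo-secret"
token = create_token({"sub": "alice", "exp": 2000}, "k1", key)
info = TokenInfo()
info.keys["k1"] = key
claims = info.validate(token, now=1000)
```

Since validation touches only in-memory state, such a component can sit next to each application and scale horizontally, which matches the latency and scalability goals listed earlier.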
Written in Go
~16 MB Docker image
Stateless application
CPU bound; Go 1.6 brought a ~40x speedup for EC verify
EC2 instance start to healthy: 45 sec
Scaling Token Info example
ZMON
Flexible and extendable: Checks & Alerts in Python
Integrate: REST APIs, OAUTH2, AWS Auto Discovery
Fully configurable via UI / API: no restarts required!
Great for teams: team dashboards, alerts inheritance
Fast/scaling metrics: Redis, KairosDB + Grafana 3
Hackweek 2015 - iOS app and Android app ;-)
ZMON - Highlights ;-)
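The split between checks and alerts can be pictured in plain Python: a check produces a value per entity, and an alert is a condition evaluated on that value. This is only an illustration of the concept. The function names, the entity fields, and the evaluation loop are hypothetical and do not use ZMON's actual check functions or API.

```python
def check_heap_usage(entity):
    """Check: return a value for one entity (read from sample data, not a real probe)."""
    return {"heap_used_percent": entity["heap_used"] * 100 / entity["heap_max"]}


def alert_heap_high(value, threshold=90):
    """Alert condition: True means the alert is active for this entity."""
    return value["heap_used_percent"] > threshold


# Hypothetical entities; in practice these would come from auto discovery.
entities = [
    {"id": "myapp-1-a", "team": "abc", "heap_used": 950, "heap_max": 1000},
    {"id": "myapp-1-b", "team": "abc", "heap_used": 400, "heap_max": 1000},
]

active = [e["id"] for e in entities if alert_heap_high(check_heap_usage(e))]
print(active)  # ['myapp-1-a']
```

Keeping the condition separate from the check lets several alerts with different thresholds reuse one check result.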
Continued ...
Instance Metrics
● Memory usage
● Disk space usage
● CPU usage
● Application logs
● Application metrics
Monitoring instances on AWS
Scalyr Agent
Log shipping
Prometheus Node Agent
:9100/metrics
Taupage AMI (Ubuntu base)
Application Container
Go / Spring Boot / Cassandra
Docker runtime
:8080 -> app
:7979 -> metrics
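The node agent exposes its metrics as plain text on :9100/metrics, so a collector only needs an HTTP GET and a small parser. A sketch of the parsing step follows, assuming the Prometheus text format without trailing timestamps. The sample payload and metric names are illustrative and hard-coded so the example runs without a live endpoint.

```python
def parse_metrics(text):
    """Parse `name{labels} value` lines; comment and blank lines are skipped."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # The value is the last whitespace-separated field (no timestamp assumed).
        name, _, value = line.rpartition(" ")
        samples[name] = float(value)
    return samples


# Illustrative payload, in the shape a GET on :9100/metrics would return.
SAMPLE = """\
# HELP node_load1 1m load average.
# TYPE node_load1 gauge
node_load1 0.42
node_filesystem_free{mountpoint="/"} 1.5e+09
"""

samples = parse_metrics(SAMPLE)
print(samples["node_load1"])  # 0.42
```

The same parser would work for the application metrics on :7979 if they are exposed in the same text format, which the slide does not state.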
Annotated Metric Data in Grafana
Running same Docker Image everywhere
CLAIR - SQS
CoreOS’ Clair with PierOne - Static vulnerability analysis of images
Learnings?
● AWS terminology and behavior
● OAuth2 + Security + Security Groups
● Ops can be hard -> SaaS?
● CF deployment takes time
● DNS load balancing and switching :-(
○ Remember timeout config …!!
○ ELB so-so ...
● Great flexibility and power though
A lot of input to cover ...
Zalando on Github:
https://github.com/zalando
STUPS online:
https://stups.io
ZMON Demo:
https://demo.zmon.io
Zalando Tech:
https://tech.zalando.com

Powering Radical Agility with Docker