Cloud Native Night August 2016, Munich: Talk by Thomas Schneider (Lead Engineer at zooplus)
Join our Meetup: www.meetup.com/cloud-native-muc
Abstract: This talk covers experiences with running the cloud native stack in production using CoreOS, Docker, Mesos & nginx.
Agenda
Ι Intro
Ι Pets vs. cattle
Ι Where we started ...
Ι Learnings & dead-ends
Ι ... where we ended up
Ι Technology stack
Ι Showtime, baby!
Intro
Ι Background
§ ~10 years experience in IT
§ Mainly software development
§ Some system engineering, automation & operations
§ @zooplus:
§ Lead engineer, build & runtime platform
§ Trying to encourage DevOps culture
Ι Get in touch
§ thomas.schneider@zooplus.com
§ github.com/schneidexe
Where we started …
Build
Ι Monolithic code-base
Ι CI partly automated
Ι Manually configured TeamCity
Ι 10+ all (=one) purpose agents
Ι Up to 10+ builds in parallel
Ι Maven binary repository (Nexus)
Ι Source repository (Subversion)
Deploy
Ι Done via shell
Ι .sh scripts, scp, rsync
Ι Mainly .jar/.war files
Ι 2 out of 3 apps deployed differently
Ι Locally != dev != prod
Ι Devs create & test, but have no idea how it runs
Ι Ops deploy & run, but have no idea what
Run
Ι 300+ apps/jobs
Ι Lots of bare-metal
Ι Static environment
Ι Servers used by multiple apps, upgrades blocked
Ι Outdated machines
Ι Monitoring: ELK, collectd (hard to get all)
Ι High error noise level
Ι Ops == Firefighters
Ι Devs blocked by infra
Ι Biz scaled, errors $$$
Application stack is limited and tailored to the environment. Build and release process is slow, error-prone and inflexible. Infrastructure is opaque; extending it is slow & painful.
Learnings & dead-ends
Ι Handle diversity
§ Horizontal scaling is not our primary issue
§ Handling tons of different apps is!
Ι Immutability & automation
§ Everything is (made from) code, build immutable artifacts
§ Automation is not only for speed but for knowledge
§ Re-creation over re-configuration
Ι Bring the pain forward
§ Do not do it all by yourself, get infra and devs (and biz) on-board
§ Think reverse: from prod to dev
Ι Keep it simple and fast
§ Get buy-in from devs, they have to understand it
§ Teams had/have to adapt several times (after all those years of stable infra!)
§ Simple is fun, Speed is fun - if it’s fun people will use it
Learnings & dead-ends
Ι Docker helps, but does not solve all of your problems
§ It’s not just about ‘docker build’ & ‘docker run’
§ Think about scaling, monitoring and management of your apps on prod
Ι Stay focused
§ What do YOU need? (not Google, Netflix & co)
§ New products appear every week: do 2-3 POCs, then stick with your decision
§ Keep it modular, have a plan in mind how to migrate/replace parts
§ Do not be afraid to throw a few things away
Ι Persistence
§ Try not to mix stateful and stateless things, externalize data
§ A database might be more a pet than cattle
Learnings & dead-ends
Ι Puppet
§ Not the best tool for deploying your app
§ Re-configuration can be tricky (systems end up only almost identical)
§ Not immutable (unless you really nail every dependency)
Ι Fleet (0.9.x)
§ Nice features like side-kicks, low overhead
§ No resource management, too low level
§ Stability issues
Ι Mixing frameworks on Mesos agent nodes
§ Isolation can get tough if patterns are too different e.g. jobs and services
§ Not enough resources for big jobs on service nodes
§ Spike utilization of batch jobs or builds can impact overall host performance
Ι Graphite and containers
§ Cannot handle highly dynamic sets of metrics (30k different containers in 1 week)
... where we ended up
Build
Ι Distributed code-base
Ι Full CI/CD life-cycle can be automated
Ι Pre-configured Jenkins master
Ι Disposable, customized Jenkins slaves
Ι Scalable builds
Ι Multi-format binary repository (Artifactory)
Ι Source repository (git)
Deploy
Ι GUIs and REST APIs
Ι Deploy
§ Services
§ Jobs
§ Containers
§ Machines
Ι Unified deployment & management
Ι Cloud-agnostic
Ι You build it, you run it!*
Run
Ι Flexible resource management
Ι Health-checks & self-healing
Ι Environment config
Ι Service discovery & routing
Ι Horizontal scaling
Ι House-keeping
Ι Out-of-the-box monitoring (metrics, logs)
Build, deploy and run any application with high flexibility & low effort (Jenkinsfile, Dockerfile,
Deployment .json). Same release process for all applications. High transparency on infrastructure*.
Technology stack: Deploy API
Ι Fast cloud-like provisioning of VMs (resources: cpu, mem, disk, net)
Ι Lightweight bootstrapping with cloud-init
Ι Focus on cattle machines
Ι Re-create over re-configure
$ curl -X POST \
    --form "image=coreos-1081.3.0" \
    --form "application=docker" \
    --form "env=dev112" \
    --form "cpu=8" \
    --form "mem=16" \
    --form "disk=100" \
    --form "cloudinit_file=@docker.yml" \
    "http://deploy.zooplus.de/api/v1/machines"
coreos:
  units:
    - name: docker.service
      drop-ins:
        - name: docker-opts.conf
          content: |
            [Service]
            Environment='DOCKER_OPTS=--host=tcp://0.0.0.0:2375'
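The machine request above could also be assembled programmatically, for example from a CI job. The following is a minimal sketch that only builds the curl argument list and sends nothing. The endpoint and form fields are taken from the slide; the names `MachineSpec` and `machine_request_argv` are hypothetical and not part of the Deploy API.

```python
import shlex
from dataclasses import dataclass
from typing import List

# Endpoint as shown in the curl example on the slide.
DEPLOY_API = "http://deploy.zooplus.de/api/v1/machines"


@dataclass
class MachineSpec:
    """Hypothetical container for the form fields used in the curl example."""
    image: str
    application: str
    env: str
    cpu: int
    mem: int    # unit as expected by the API (not stated on the slide)
    disk: int   # unit as expected by the API (not stated on the slide)
    cloudinit_file: str  # path to a local cloud-config file


def machine_request_argv(spec: MachineSpec, url: str = DEPLOY_API) -> List[str]:
    """Build the curl argument list for one machine request (nothing is sent)."""
    fields = {
        "image": spec.image,
        "application": spec.application,
        "env": spec.env,
        "cpu": spec.cpu,
        "mem": spec.mem,
        "disk": spec.disk,
        # The leading '@' makes curl upload the file contents.
        "cloudinit_file": "@" + spec.cloudinit_file,
    }
    argv = ["curl", "-X", "POST"]
    for key, value in fields.items():
        argv += ["--form", f"{key}={value}"]
    argv.append(url)
    return argv


if __name__ == "__main__":
    spec = MachineSpec("coreos-1081.3.0", "docker", "dev112", 8, 16, 100, "docker.yml")
    # Print a shell-safe command line that can be reviewed before running it.
    print(shlex.join(machine_request_argv(spec)))
```

Keeping the machine description in a small typed structure fits the "re-create over re-configure" idea: the same spec can be replayed to get an identical machine.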
Technology stack: Docker
Ι Automated reproducible builds with Dockerfile
Ι Immutable images
Ι Bundles app and dependencies
Ι Common artifact format
Ι Standardized way of deployment, monitoring, etc.
Ι Isolation of applications
Ι Resource allocation
$ cat Dockerfile
FROM repo.zooplus.de/centos:7
RUN yum install -y java-1.8.0_47 && \
    yum clean all
ADD shop.jar /shop.jar
CMD java -jar shop.jar
$ docker build -t shop .
Building image shop
Step 1 : FROM repo.zooplus.de/centos:7
 ---> 9b92a6d1f7de
...
$ docker run shop
Starting shop...
Technology stack: Mesos
Ι Resource manager
Ι Task distribution
Ι “Whole DC as single machine”
Ι All tasks run in Docker containers
Ι Web UI for status/utilization and debugging (logs, task state)
Ι Usually no direct interaction
Technology stack: Jenkins
Ι Automation engine
Ι Jenkins 2.x
Ι Jenkinsfile and multi-branch support
Ι Post-commit hooks
Ι Immutable slaves
§ Running on Mesos/Docker
§ Customized
§ Highly scalable
Ι Jenkins master Docker image
§ Spawn test instance in <1min (builds should run on prod)
§ Bootstrap with DSL
folder('catalog') { }
multibranchWorkflowJob('catalog/app') {
    branchSources {
        git {
            remote('ssh://git@stash.zooplus.de:22/cat/app.git')
            credentialsId('ef406810-be3c-4f2a-ad65-6239706d1766')
        }
    }
}
Technology stack: Marathon
Ι Task scheduler for Mesos
Ι Distributed init system
Ι Long-running apps/services
Ι REST API: submit apps via .json
Ι GUI: manage apps & manual config
Ι Health checks & self-healing
Ι Multi-app deployments
Ι Rolling updates
Ι Horizontal scaling
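To make "submit apps via .json" more concrete, here is a minimal sketch of an app definition with a health check, a rolling-update strategy and an instance count for horizontal scaling. The field names follow Marathon's v2 app definition format as used by releases from around the time of the talk (newer releases changed the networking fields). The app id, image name, port and health path are hypothetical examples, not values from zooplus.

```python
import json
from typing import Any, Dict


def marathon_app(app_id: str, image: str, instances: int = 2) -> Dict[str, Any]:
    """Return a Marathon app definition for a Docker-based HTTP service."""
    return {
        "id": app_id,
        "instances": instances,          # horizontal scaling
        "cpus": 0.5,
        "mem": 512,                      # MiB
        "container": {
            "type": "DOCKER",
            "docker": {
                "image": image,
                "network": "BRIDGE",
                # hostPort 0 lets Marathon pick a free port on the agent
                "portMappings": [{"containerPort": 8080, "hostPort": 0}],
            },
        },
        # Failing tasks are killed and restarted (self-healing).
        "healthChecks": [{
            "protocol": "HTTP",
            "path": "/health",           # hypothetical endpoint of the app
            "portIndex": 0,
            "gracePeriodSeconds": 60,
            "intervalSeconds": 10,
            "maxConsecutiveFailures": 3,
        }],
        # Rolling update: keep at least half of the instances healthy.
        "upgradeStrategy": {
            "minimumHealthCapacity": 0.5,
            "maximumOverCapacity": 0.2,
        },
    }


if __name__ == "__main__":
    app = marathon_app("/shop/frontend", "repo.zooplus.de/shop:1.0", instances=3)
    print(json.dumps(app, indent=2))
```

The resulting JSON document would be sent with a POST to Marathon's `/v2/apps` endpoint; changing `instances` and re-submitting the definition is how scaling is expressed.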
Technology stack: Chronos
Ι Task scheduler for Mesos
Ι Distributed cron system
Ι Batch jobs
Ι REST API: submit jobs via .json
Ι GUI: job details & status, manual execution
Ι Scheduling
§ Time-based
§ Dependency-based
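The two scheduling modes map to two kinds of job definitions. The sketch below builds one of each: a time-based job uses an ISO 8601 repeating interval in `schedule`, while a dependency-based job lists the jobs it waits for in `parents`. Job names and commands are hypothetical, and the field names are those of the Chronos job JSON as I understand it, so check them against the Chronos version in use.

```python
import json
from typing import Any, Dict, List


def scheduled_job(name: str, command: str, schedule: str) -> Dict[str, Any]:
    """Time-based job. `schedule` has the form R[n]/<start>/<period>."""
    return {
        "name": name,
        "command": command,
        "schedule": schedule,
        # Still run the job if the scheduled start was missed by up to 30 minutes.
        "epsilon": "PT30M",
    }


def dependent_job(name: str, command: str, parents: List[str]) -> Dict[str, Any]:
    """Dependency-based job: runs after all parent jobs have succeeded."""
    if not parents:
        raise ValueError("a dependent job needs at least one parent")
    return {"name": name, "command": command, "parents": list(parents)}


if __name__ == "__main__":
    # Repeat forever, starting 2016-08-01 02:00 UTC, once every 24 hours.
    export = scheduled_job("export-orders", "./export.sh", "R/2016-08-01T02:00:00Z/PT24H")
    report = dependent_job("build-report", "./report.sh", parents=[export["name"]])
    print(json.dumps([export, report], indent=2))
```

In Chronos releases of that period, scheduled jobs were posted to `/scheduler/iso8601` and dependent jobs to `/scheduler/dependency`; later versions prefix these paths with `/v1`.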
Technology stack: Nixy/Nginx/Mesos-DNS
Ι Nixy
§ Service catalog from Marathon
§ REST-like API
§ Event-based
§ Configures nginx based on templates
Ι Nginx
§ State-of-the-art web server
§ Used as service router
§ SSL termination
§ Proxy for HTTP, TCP and UDP
§ Access control & public exposure
Ι Mesos-DNS
§ Service catalog from Mesos
§ Convention-over-configuration naming pattern
§ Used for “internal” services
"Apps": {
  "/finance/jenkins": {
    "Tasks": [
      ["ops85-150.web.zooplus.de:20357"],
      ["ops85-150.web.zooplus.de:20358"]
    ],
    "Frontends": [
      {
        "Type": "http",
        "Data": ["finance-jenkins"]
      }
    ]
  }
}
$ host jenkins-finance.marathon.prod.zooplus.net
jenkins-finance.marathon.prod.zooplus.net has address 192.168.85.150
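The convention-over-configuration naming can be illustrated with the one example shown here: the Marathon app `/finance/jenkins` appears as frontend `finance-jenkins` in Nixy and as `jenkins-finance.marathon.prod.zooplus.net` in Mesos-DNS. The sketch below reproduces exactly that mapping. It is inferred from this single example, the function names are hypothetical, and the real Nixy templates or Mesos-DNS settings at zooplus may apply further rules.

```python
from typing import List


def app_path_parts(app_id: str) -> List[str]:
    """Split a Marathon app id such as '/finance/jenkins' into its path segments."""
    parts = [p for p in app_id.split("/") if p]
    if not parts:
        raise ValueError("empty app id: %r" % app_id)
    return parts


def frontend_name(app_id: str) -> str:
    """Frontend name as in the Nixy example: path segments joined with '-'."""
    return "-".join(app_path_parts(app_id))


def mesos_dns_name(app_id: str, domain: str, framework: str = "marathon") -> str:
    """DNS name as in the `host` example: reversed segments, framework, domain."""
    task = "-".join(reversed(app_path_parts(app_id)))
    return ".".join([task, framework, domain])


if __name__ == "__main__":
    print(frontend_name("/finance/jenkins"))
    print(mesos_dns_name("/finance/jenkins", "prod.zooplus.net"))
```

With a fixed pattern like this, a service address can be derived from the app id alone, without a lookup table or per-service configuration.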
Technology stack: journal & beats
Ι Hostbeat
§ Ships host metrics in beats format
§ Like collectd
Ι Dockerbeat
§ Ships container metrics in beats format
§ Metadata: env, labels
Ι Journal/Filebeat
§ Ships every single log line from journald to ELK
§ Docker uses journal log-driver to ship stdout/stderr
§ Apps should log in JSON-lines
Ι ELK/Graphite
§ Elasticsearch: event data
§ Graphite: time-series data/metrics
Ι Nagios
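"Apps should log in JSON-lines" means one self-contained JSON object per line on stdout, so that the journal log-driver and the shipper can forward each event without multi-line parsing. The following is a minimal sketch using only the Python standard library. The field names (`timestamp`, `level`, `logger`, `message`) are an assumption, not a schema prescribed in the talk.

```python
import json
import logging
import sys
from datetime import datetime, timezone


class JsonLineFormatter(logging.Formatter):
    """Format every log record as a single-line JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "timestamp": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        if record.exc_info:
            # json.dumps escapes the newlines of the traceback, so the
            # whole event still fits on one line.
            entry["exception"] = self.formatException(record.exc_info)
        return json.dumps(entry)


def make_logger(name: str, stream=sys.stdout) -> logging.Logger:
    """Create a logger that writes JSON lines to the given stream."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonLineFormatter())
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    logger.propagate = False  # avoid duplicate plain-text output via the root logger
    return logger


if __name__ == "__main__":
    log = make_logger("shop")
    log.info("order %s placed", 42)
```

Because stack traces are embedded as an escaped string, an exception arrives in Elasticsearch as one event instead of dozens of unrelated log lines.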