Running Docker in Production Successfully
John Fiedler
Sr. Director of Engineering @ SalesforceIQ
About me
● I work for SalesforceIQ (formerly RelateIQ)
● I’ve used Docker for over 2 years
● I’ve done a couple of talks on Docker
o http://blog.heavybit.com/blog/2015/3/23/dockermeetup
o https://engineering.twitter.com/university/videos/chef-versus-docker-at-relateiq
o https://www.youtube.com/watch?v=z9yNq-IjCcM
● I co-authored this book:
o http://bleedingedgepress.com/docker-in-the-trenches/
Docker Book
● 50% off for everyone!
● Click here!
https://gum.co/lQGH/dockerconeu
● Only $11.50
● 200 pages
Agenda
Docker Journey with SalesforceIQ
Lessons Learned
PaaS/CaaS
Docker Journey with SalesforceIQ
Two years in production...
What is production?
Production != test/dev
Isolation, Security, Performance, Monitoring, Logging…
Scale, templates, automation…
What is successful?
>99% uptime or low # of outages?
Fast code deployment?
0 Security Incidents?
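For scale, even a 99% uptime target leaves a sizeable downtime budget. Assuming a 365-day year:

```latex
(1 - 0.99) \times 365 \times 24\,\text{h} = 87.6\,\text{h} \approx 3.65\ \text{days of downtime per year}
```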
100% of our web infrastructure running with Docker
Boom
SalesforceIQ journey into production
(Timeline, 2013 through 2015+. Milestones: Dev Environment; Continuous Deployment in Teamcity; Web Zero Downtime Deployments; Full Stack Container; Azkaban; DockerMe; Integrations; Batch Jobs; Mesos; Kafka; Dev/Ops CLI; Craft CMS Main Website; Beanstalk; Devenv 2.0; PaaS (Now, 2015))
What we've put in containers
Database, CI/CD Server, Dev or Ops Environment, Web Server, Api Server, Batch Jobs, Integrations
(Chart axes: Rate of Change, Dependencies)
Rate of Change
Dependencies
What we've put in containers
Database, CI/CD Server, Dev or Ops Environment, Web Server, Api Server, Batch Jobs, Integrations
(Chart spectrum: Stateful / Long-Life to Stateless / Short-Life)
Zoom in a little
(Matrix: how Dockerized each area is, rated Fully / Somewhat / No, across Create, Deploy, Run, Operate)
Areas: Persistent Storage; Middleware / Integrations / Internal Tools / Scripts / Jobs; Web; Monitoring; Logging; Security; Dev Environment; Ops Environment; CI / CD; Batch & Stream processing
Lessons Learned
A lot...
Lots of tidbits
● Docker is prod ready, but many surrounding solutions are not (they are still alpha or beta)
o Caution with the new toys is required
● Don't go straight to a PaaS if you're just starting out
o Kubernetes, Mesos, CoreOS, Swarm, ECS
● Keep it simple
o Know what works and what doesn't
● Old tools still work great, and I'll show you how
o Know how to scale what you're doing
● You're going to have to roll your own at some point (orchestration)
o As of version 1.5.11, HAProxy does not support zero-downtime restarts or reloads of configuration.
● Learn from others: tons of people are in production now
o Read the whole internet
● You can secure running containers
o Twistlock, Conjur, Banyanops
● Get creative
o Docker is golden and mobile
You can docker with Chef, Ansible, SaltStack...
• You can use the tools you have today if you're not dockerized already
• What…
• But those are the tools I'm already using...
• Yes they still work and work great
Our current prod web server
● Worked with all our existing tools!
○ Chef, Monitoring, Logging
● Security didn't change
○ Security keys
○ Firewall
● Super easy to scale
○ Could bake an AMI with Packer
○ Shell script was super easy
● Zero downtime
● Rollbacks
(Diagram: an Amazon AMI set up with Chef, running a Hipache/Redis container in front of Web Container v1 and Web Container v2; a cron job runs a shell script to orchestrate the containers)
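The slide does not show the cron-driven shell script itself. The Python below is a hypothetical sketch of the same blue/green idea: it computes the Redis commands that would move a Hipache frontend from the v1 web container to v2. It assumes Hipache's documented layout, a Redis list `frontend:<hostname>` whose first element is an identifier and whose remaining elements are backend URLs. The hostname and backend addresses are made up. Adding v2 before removing v1 means the frontend never has zero backends, and that ordering is what makes the swap zero-downtime.

```python
"""Hypothetical sketch: plan a zero-downtime Hipache backend swap (v1 -> v2)."""
from typing import List, Tuple

Command = Tuple[str, ...]


def plan_swap(hostname: str, old_backend: str, new_backend: str) -> List[Command]:
    """Redis commands that add the new backend before removing the old one.

    Hipache reads the Redis list ``frontend:<hostname>``: element 0 is an
    identifier and the remaining elements are the backend URLs it balances across.
    """
    key = f"frontend:{hostname}"
    return [
        ("RPUSH", key, new_backend),      # v1 and v2 both serve traffic briefly
        ("LREM", key, "0", old_backend),  # count 0 removes every copy of v1
    ]


def simulate(entries: List[str], commands: List[Command]) -> List[str]:
    """Apply the planned commands to an in-memory copy of the Redis list.

    Only the two command shapes produced by plan_swap are modeled
    (RPUSH, and LREM with count 0).
    """
    result = list(entries)
    for cmd in commands:
        if cmd[0] == "RPUSH":
            result.append(cmd[2])
        elif cmd[0] == "LREM":
            result = [e for e in result if e != cmd[3]]
    return result


if __name__ == "__main__":
    v1, v2 = "http://10.0.0.5:8001", "http://10.0.0.5:8002"
    commands = plan_swap("www.example.com", v1, v2)
    for command in commands:
        print("redis-cli", " ".join(command))
    print(simulate(["mywebsite", v1], commands))  # ['mywebsite', 'http://10.0.0.5:8002']
```

A rollback is the same call with the two backend arguments swapped. In a real script, each tuple would be sent with `redis-cli` or a Redis client.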
Demo
It’s time
#1 thing we found!!!!
You WILL have disk/file system issues...
File system...
Volumes not unmounting
Long deletion times on device mapper
--storage-opt dm.blkdiscard=false
Kernel version matters!
Great visual deep dive
http://merrigrove.blogspot.com/2015/10/visualizing-docker-containers-and-images.html?m=1
What we used over time
1. Started with AUFS: hit the 42-layer limit
2. Then moved to device mapper
a. Device/Volume not found
b. NNOOOOOOOOOO
3. Back to AUFS again after bug fixes and removal of the 42-layer limit
a. Continued to fight layer issues, mount issues
4. Back to device mapper with Docker 1.7 dynamic binaries!
What we’ve landed on
Ubuntu = AUFS
Amazon Linux = Device mapper
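The "What we've landed on" mapping can be encoded once and applied when hosts are provisioned. The sketch below is hypothetical. It picks a storage driver per distribution, builds the Docker daemon flags, and adds the `dm.blkdiscard=false` option from the slide only for device mapper. `--storage-driver` and `--storage-opt` are real daemon flags. The distro labels and the function name are assumptions, and where the flags end up depends on each distro's init setup.

```python
"""Hypothetical sketch: pick Docker daemon storage flags per host distribution."""
from typing import List

# Mapping from the slide: Ubuntu = AUFS, Amazon Linux = device mapper.
# The dictionary keys are illustrative labels, not official distro IDs.
STORAGE_DRIVER_BY_DISTRO = {
    "ubuntu": "aufs",
    "amazonlinux": "devicemapper",
}


def daemon_storage_flags(distro: str) -> List[str]:
    """Return the daemon flags that select (and tune) the storage driver."""
    driver = STORAGE_DRIVER_BY_DISTRO.get(distro.lower())
    if driver is None:
        raise ValueError(f"no storage driver chosen for {distro!r}")
    flags = [f"--storage-driver={driver}"]
    if driver == "devicemapper":
        # Don't issue block discards when devices are removed; this is the
        # slide's fix for long deletion times on device mapper.
        flags += ["--storage-opt", "dm.blkdiscard=false"]
    return flags


if __name__ == "__main__":
    print(daemon_storage_flags("ubuntu"))  # ['--storage-driver=aufs']
    print(daemon_storage_flags("AmazonLinux"))
```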
Get a good registry
Great options
• Hub.docker.com
• Quay.io
• Trusted registry
• Google
• Azure
• AWS
• S3 with no registry… docker save/load
1. We started with a private registry
a. Went insane with buggy releases, failed pulls/pushes
2. Went to quay.io
a. Happy, but slow, and it costs $$
3. Back to private registry 0.9 release… now stable
4. Scaled it and it's working great
5. Now working on upgrading to Docker Registry 2.1
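The "S3, no registry" option listed above ships images as tarballs instead of pulling them from a registry. Here is a hypothetical sketch of the commands involved. It uses the real `docker save -o`, `docker load -i`, and `aws s3 cp` CLIs; the bucket name and object-key layout are made up. The functions only build argument lists and execute nothing.

```python
"""Hypothetical sketch: ship an image between hosts through S3 with docker save/load."""
from typing import List


def s3_key(image: str) -> str:
    """Map 'repo/name:tag' to an object key such as 'images/repo_name_tag.tar'."""
    return "images/" + image.replace("/", "_").replace(":", "_") + ".tar"


def local_tarball(image: str) -> str:
    """Local path used for the exported tarball."""
    return "/tmp/" + s3_key(image).rsplit("/", 1)[-1]


def save_and_upload(image: str, bucket: str) -> List[List[str]]:
    """Commands for the build host: export the image, then copy it to S3."""
    tarball = local_tarball(image)
    return [
        ["docker", "save", "-o", tarball, image],
        ["aws", "s3", "cp", tarball, f"s3://{bucket}/{s3_key(image)}"],
    ]


def download_and_load(image: str, bucket: str) -> List[List[str]]:
    """Commands for each server: fetch the tarball, then import it into Docker."""
    tarball = local_tarball(image)
    return [
        ["aws", "s3", "cp", f"s3://{bucket}/{s3_key(image)}", tarball],
        ["docker", "load", "-i", tarball],
    ]
```

Each argument list can be passed to `subprocess.run(cmd, check=True)` in order. The trade-off is that every host downloads the full tarball, with no layer sharing between images.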
Scaling our registry
• 100% AWS
• Beanstalk
ELB
Auto Scaling Group
Docker web service
• Redis Cache
Elasticache
Had issues when a node failed
• S3 Backend
Had huge issues with layer corruption
(Diagram: ELB, Docker Registry, Cache, S3. Storage: unlimited, cheap; Elasticache: Redis; Beanstalk: autoscale)
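For the move to Docker Registry 2.1, the registry's own configuration file can express the same S3-backed, Redis-cached layout. The sketch below builds that configuration as a Python dict and prints it as JSON, which is also valid YAML. The key names follow the distribution project's configuration format to the best of our knowledge. The bucket, region, and Redis endpoint are placeholders, and credential settings are left out, so check the registry docs for your exact version.

```python
"""Hypothetical sketch: Docker Registry 2.x config with S3 storage and a Redis cache."""
import json


def registry_config(bucket: str, region: str, redis_addr: str) -> dict:
    """Build a minimal registry config: S3 for layers, Redis for blob metadata."""
    return {
        "version": 0.1,
        "http": {"addr": ":5000"},
        "storage": {
            # Layers live in S3 (cheap, effectively unlimited storage).
            "s3": {"region": region, "bucket": bucket},
            # Cache blob descriptors in Redis (e.g. an Elasticache node).
            "cache": {"blobdescriptor": "redis"},
        },
        "redis": {"addr": redis_addr},
    }


if __name__ == "__main__":
    cfg = registry_config("example-registry-bucket", "us-east-1", "cache.internal:6379")
    print(json.dumps(cfg, indent=2))
```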
Isolation is your friend
Low container-to-host ratio
• Compute
Spiky processing… no problem
• Storage
Out of disk… no problem
• Networking
Shared bandwidth… no problem
• RAM
Swapping issue… no problem
• Security Groups
Least privilege… no problem
(Diagram: the same web server setup as before: Amazon AMI set up with Chef, Hipache/Redis container, Web Container v1 and v2, and a cron job running a shell script to orchestrate the containers)
CI/CD with Docker
• The biggest ROI with Docker
• Teamcity
• Used to use Docker in Docker
https://jpetazzo.github.io/2015/09/03/do-not-use-docker-in-docker-for-ci/
• Agents used to run in a Docker container
Now built with Chef and Packer
• Autoscaling with Docker?
(Diagram: Github.com with a Dockerfile, Teamcity Server, three Agents, Registry)
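One common piece of glue in a GitHub, TeamCity, and registry pipeline turns a branch name into a valid image tag. The same step underpins a "container per branch" setup like the Beanstalk one later in this deck. The sketch is hypothetical: the registry host and the `<branch>-<short commit>` naming scheme are made up. It relies on Docker's tag rule, which allows up to 128 characters from `[A-Za-z0-9_.-]` and forbids a leading `.` or `-`.

```python
"""Hypothetical sketch: per-branch image tags and build/push commands for CI."""
import re
from typing import List

# Characters Docker allows in a tag (the first character may not be '.' or '-').
_INVALID_TAG_CHARS = re.compile(r"[^A-Za-z0-9_.-]")
_MAX_TAG_LEN = 128


def branch_tag(branch: str, commit: str) -> str:
    """E.g. ('feature/login-fix', 'a1b2c3d4e5f6') -> 'feature-login-fix-a1b2c3d'."""
    safe = _INVALID_TAG_CHARS.sub("-", branch).lstrip(".-") or "branch"
    suffix = "-" + commit[:7]
    # Truncate the branch part so the commit suffix always survives.
    return safe[: _MAX_TAG_LEN - len(suffix)] + suffix


def build_and_push(registry: str, repo: str, branch: str, commit: str) -> List[List[str]]:
    """Commands a CI agent would run from the checked-out repo's Dockerfile."""
    image = f"{registry}/{repo}:{branch_tag(branch, commit)}"
    return [
        ["docker", "build", "-t", image, "."],
        ["docker", "push", image],
    ]
```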
Many PaaS/CaaS utilize sidekicks
• Amazon ECS
https://github.com/aws/amazon-ecs-agent
• Amazon Beanstalk
https://github.com/aws/aws-eb-python-dockerfiles
• Netflix Prana
• Smartstack
• Docker Ambassador
http://www.slideshare.net/Docker/slideshare-burns
• CoreOS - Sidekick
• Rancher
• Logging
(Diagram: a container host running several containers plus a sidekick container that provides a REST API, service discovery, health checks, and orchestration)
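Most of the sidekicks listed above share one core loop: probe the main container's health endpoint, then keep its service-discovery registration in sync. Below is a minimal, hypothetical sketch of that loop. The probe and the `register`/`deregister` callbacks are injected, so the same loop could sit in front of whatever discovery backend you use. It acts only when health changes, which avoids hammering the discovery service.

```python
"""Hypothetical sketch of a health-checking sidekick's core loop."""
import time
import urllib.request
from typing import Callable, Iterable


def http_probe(url: str, timeout_s: float = 2.0) -> Callable[[], bool]:
    """Build a probe that treats any 2xx response from `url` as healthy."""
    def probe() -> bool:
        try:
            with urllib.request.urlopen(url, timeout=timeout_s) as resp:
                return 200 <= resp.status < 300
        except OSError:  # URLError, HTTPError and timeouts are OSError subclasses
            return False
    return probe


def run_sidekick(
    probe: Callable[[], bool],
    register: Callable[[], None],
    deregister: Callable[[], None],
    ticks: Iterable[int],
    interval_s: float = 0.0,
) -> None:
    """Probe once per tick; call register/deregister only when health changes."""
    registered = False
    for _ in ticks:
        healthy = probe()
        if healthy and not registered:
            register()
            registered = True
        elif not healthy and registered:
            deregister()
            registered = False
        time.sleep(interval_s)
```

In a long-running sidekick you would pass `itertools.count()` as `ticks`, a real interval such as 10 seconds, `http_probe("http://127.0.0.1:8080/health")` as the probe (the port and path are assumptions), and your own discovery calls as the callbacks. A finite `ticks` keeps the loop easy to test.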
PaaS/CaaS
How you’ll scale a single service
Beanstalk architecture
• We run 50+ services on Beanstalk today
• A web container is automagically built per branch of code
• Corp site/Help site
• 100% automated!!
• Great for web services SOA
• You will have disk issues
(Diagram: Beanstalk, built on CloudFormation: EC2 server, autoscaling, isolation, security groups, environment variables; ELB with SSL termination as load balancer; DNS service discovery; easy to spin up; container; storage; RDS)
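Single-container Docker environments on Elastic Beanstalk are described by a `Dockerrun.aws.json` file. The function below is a hypothetical sketch that generates a version 1 file for one branch's image. The image name, port, and log directory are placeholders. In this sketch, environment variables are assumed to be set on the Beanstalk environment itself rather than in the file.

```python
"""Hypothetical sketch: generate a version 1 Dockerrun.aws.json for one container."""
import json


def dockerrun_v1(image: str, container_port: int, log_dir: str = "") -> str:
    """Return the JSON text Beanstalk reads to pull and run a single image."""
    doc = {
        "AWSEBDockerrunVersion": "1",
        # "Update": "true" asks Beanstalk to pull the image on each deployment.
        "Image": {"Name": image, "Update": "true"},
        "Ports": [{"ContainerPort": str(container_port)}],
    }
    if log_dir:
        # Directory inside the container whose logs Beanstalk should collect.
        doc["Logging"] = log_dir
    return json.dumps(doc, indent=2)


if __name__ == "__main__":
    print(dockerrun_v1("registry.example.com/web:master-0123456", 8080))
```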
Demo
Beanstalk
One year ago
• CoreOS... so cool
• Mesos… cool with scale
• Beanstalk… with docker support
• Swarm… beta
• Deis… oooo saas
• ECS… ok now we're getting somewhere
• Kubernetes… where did that come from… looks cool too
Now…..
• Kubernetes on top of DCOS, on top of Mesos, on top of CoreOS…
facepalm
PaaS/CaaS Overview
(Table comparing CoreOS, DCOS, Kubernetes, and ECS on: Orchestration; Scheduler; Resource Allocation; Service Discovery; More than Containers; Health Check; Storage clustering...; Live Migration...; Affinity rules...)
Being successful with a PaaS/CaaS
Our DCOS Architecture
Built an edge router
Built a Brain router
Infra CLI
This will run all of our stateless services
(Diagram labels: DCOS; Mesos Master with Marathon, Health Check API, Change Event Bus, InfraIQ; Mesos Private Slave: auto scaling, health checks, intelligence, running services; Mesos Public Slave: auto scaling, service discovery, public <> private DNS, can be internal as well, running edge routers; DNS, ELB, SSL termination; storage: DB1, DB2, DB3)
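Stateless services on a DCOS cluster like this are deployed through Marathon. As a hypothetical sketch, the function below builds a Marathon app definition for a Dockerized service with an HTTP health check. It can optionally pin the app to public slaves with `acceptedResourceRoles: ["slave_public"]`, as an edge router would be. Field names follow Marathon's app JSON format. The app ID, image, resource sizes, and `/health` path are placeholders. The resulting JSON would be POSTed to Marathon's `/v2/apps` endpoint.

```python
"""Hypothetical sketch: a Marathon app definition for a Dockerized stateless service."""
import json


def marathon_app(app_id: str, image: str, port: int, instances: int = 2,
                 public: bool = False) -> dict:
    """Build the app JSON; `public=True` restricts placement to public slaves."""
    app = {
        "id": app_id,
        "instances": instances,
        "cpus": 0.5,
        "mem": 512,
        "container": {
            "type": "DOCKER",
            "docker": {
                "image": image,
                "network": "BRIDGE",
                # hostPort 0 lets Marathon pick a free port on the slave.
                "portMappings": [{"containerPort": port, "hostPort": 0, "protocol": "tcp"}],
            },
        },
        "healthChecks": [{
            "protocol": "HTTP",
            "path": "/health",
            "portIndex": 0,
            "gracePeriodSeconds": 60,
            "intervalSeconds": 10,
            "timeoutSeconds": 5,
            "maxConsecutiveFailures": 3,
        }],
    }
    if public:
        # Only accept offers from slaves carrying the public role (e.g. edge routers).
        app["acceptedResourceRoles"] = ["slave_public"]
    return app


if __name__ == "__main__":
    print(json.dumps(marathon_app("/edge-router", "registry.example.com/edge:1", 80,
                                  public=True), indent=2))
```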
Demo
InfraIQ
Summary
• Starting out? Just use the same tools you have
• You’ll need to roll up your sleeves
• Security is not hard but you need to think about it
• Many vendors are entering the container space
• Build towards a PaaS
• There are many PaaS solutions
• Know what you're trying to solve
• Have fun!
Thank you!
John Fiedler @johnfiedler
johnfiedler@gmail.com

DockerCon EU 2015
