ContainerDays NYC 2015: "Easing Your Way Into Docker: Lessons From a Journey to Production" (Patrick Mizer & Steve Woodruff)
Slides from Patrick Mizer & Steve Woodruff's talk "Easing Your Way Into Docker: Lessons From a Journey to Production" at ContainerDays NYC 2015: http://dynamicinfradays.org/events/2015-nyc/programme.html#sparefoot
Easing Your Way Into Docker
Lessons From a Journey to Production
ContainerDays NYC
October 30, 2015
Who are we?
Steve Woodruff
❏ DevOps Engineer at SpareFoot implementing CI/CD
❏ Spent 10+ years at Motorola doing embedded development (C, C++)
❏ Spent 5 years at IBM as a sys admin in a large server farm (Linux, AIX, Solaris)
swoodruff@sparefoot.com
Twitter: @sjwoodr
GitHub: sjwoodr
Patrick Mizer
❏ Software Engineer at SpareFoot (6 years)
❏ 12 years as a developer in consumer web
❏ Convinced Steve to keep letting us play w/ containers even after we messed it up countless times
patrick@sparefoot.com
GitHub: maximizer
SpareFoot
● Think Hotels.com for self storage*
● All infrastructure in AWS
● 40 Developers on 7 Teams
○ Continuous Delivery
● Docker in production since 2014
*This kind of storage:
What We Will Talk About
● We solved some problems with Docker containers
● We started small and eventually got to production
● We ffff...messed up a lot along the way
This is the talk that we would have liked to see before we learned these lessons the hard way...
The Beginning: SpareFoot + Docker
Hackathon! Docker + Fig (now Compose) allowed us to run production architecture locally.
The Development Environment
● We want to be as close as possible to production
● We want the development environment to be as fast as possible for interactive use
Aha: Vagrant + Docker?
[Diagram: % vagrant up boots a virtual machine running App 1, App 2, Redis, Solr Search, and a MySQL DB as containers]
Putting it together with Compose
[Diagram: the same % vagrant up virtual machine, with the App 1, App 2, Redis, Solr Search, and MySQL DB containers started via % fig up]
Lessons Learned
● Docker creates application isolation in a super lightweight way
○ We can model our production infrastructure locally
● Compose is fantastic for the local dev environment
● Vagrant + Docker gets us an interactive local dev environment via synced folders and volumes (see the sketch below)
● We got to cut our teeth on Docker
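As a rough illustration only (not our actual config), the fig.yml / docker-compose.yml behind a setup like this might look something like the following; the image names, ports, and the synced ./app1 path are hypothetical:

app1:
  build: ./app1
  ports:
    - "3000:3000"
  volumes:
    - ./app1:/var/www/app1    # synced folder from the Vagrant VM, so edits show up immediately
  links:
    - mysql
    - redis
    - solr
mysql:
  image: mysql:5.6
  environment:
    MYSQL_ROOT_PASSWORD: example
redis:
  image: redis:2.8
solr:
  image: makuk66/docker-solr    # placeholder Solr image

A single fig up (later docker-compose up) then brings the whole stack up inside the VM.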
Ok, so Docker feels like a solution
… and we kind of know how to do this. But....
● Continuous Integration / Delivery?
○ Docker Registry
○ Bamboo
○ Deployments
● Host Volumes and Port Forwarding rules?
○ Not saved with the source code
● Get Docker to run in local, dev, staging, and production environments?
○ Configuration?
CI and deployments
Janky shell scripts… slow builds, etc…
● Used Bamboo to build images
○ feature branches were built/deployed to Dev
○ master branch was built/deployed to Staging
● Dynamically created custom container start script
● Tried to auto-detect when the containers had started in order to begin post-deploy tests
● Build times were rather long
● Spent an awfully long time doing docker push (to our registry) and docker pull (on the target hosts)
Host Volumes and port forwarding rules
● Exposed / Published ports were handled via a text file we parsed at build time
● Tried to accommodate the future when we’d have more apps/containers
● Host volumes that had to be mounted were hard-coded in the Bamboo build plan for the app so they could be added to that dynamically created container start script
Get Docker running
Supporting multiple environments
● Bamboo would deploy rather well to DEV and STAGE using these dynamically created scripts.
● Felt rather fragile and could break at any time
● Production deploys were scripts that would do a docker pull on several hosts and then kill the old containers on a host, start the new containers on that host, and then move on to the next host.
● Wasn’t always a zero-downtime deployment
Docker in Production (technically)!
We had 2 load balanced EC2 instances running a node app.
Now we have 2 load balanced EC2 instances running docker containers that run a node app!
[Diagram: an ELB terminating 443 and forwarding to port 3000 on each of the two instances, shown before and after App 1 was moved into containers]
Yim: Trouble in Docker Paradise
● Hosting our own Docker registry was a bad idea
○ Stability was a problem
○ No level of access control on the registry itself
● Mimicking servers - 1 container per host. Need orchestration please!
● Amazon Linux AMI -> old version of Docker… doh!
● Docker push/pull to/from registry was very slow
○ build - push to registry
○ deploy - pull from registry to each host, serially
● Performance was fine…
○ But stability was the issue
○ This internal-facing nodejs app was moved to a pair of EC2 instances and out of Docker after about 4 months of pain and suffering
Yim: Lessons Learned
● We need orchestration
○ Rolling our own docker deployments was confusing to OPS and to the Dev team
● Our own docker registry is a bad idea
○ Stability was a problem
○ No level of access control on the registry itself
○ Our S3 backend would grow by several GB per month with no automated cleanup
● No easy way to rollback failed deploys
○ Just fix it and deploy again...
● All this culminated in a poor build process and affected CI velocity
○ Longer builds, longer deploys, no real gain
Like everyone else....
...we are “deconstructing the monolith”
[Diagram: an application built on a monolithic library with its data; one piece has been split out as a microservice with its own REST API and data]
Like everyone else....
...we are “deconstructing the monolith”
[Diagram: the application now talks to an API gateway, which fronts several microservices, each with its own REST API and data]
Revisiting The Development Environment
● We want to be as close as possible to production
● We want the development environment to be as fast as possible for interactive use
● We want our microservices isolated
We’ve learned some things...
● Easier than we thought
● Quay was the glue we needed
○ Use an off-the-shelf solution.
○ We like Quay.io
● Bolting on to our existing CI pipeline worked really well.
○ Developers didn’t have to learn a new process
○ Microservice consumers can pull tagged versions
○ We can automate tests against all versions
Now we talk containers from local -> dev -> staging but NOT in production.
Production - What is still needed
● Orchestration
○ Yim sucked because we tried to do this ourselves
● Better Deployments
○ With rollbacks
● Configuration Management
○ We have things to hide
Production - Software Selection
● Choosing orchestration software / container service
○ StackEngine
■ Lacked docker-compose support
○ Kubernetes
■ PhD Required
○ Mesosphere
■ Nice, but slow to deploy
○ EC2 Container Service
■ Lacked docker-compose support and custom AMIs
○ Tutum
○ Rancher
Production - Enter Rancher
After running proof-of-concepts of both Tutum and Rancher, we decided to continue down our path to production deploys with Rancher.
● Had more mature support for docker-compose files.
○ Tutum added this after our evaluation had ended
● Did not require us to orchestrate the deployments through their remote endpoint
○ Rancher server runs on our EC2 instances and we are in full control of all the things
● Had a full API we can work with in addition to the custom rancher-compose cli
● Had a very active user community and a beta forum where the Rancher development team was active in answering questions and even troubleshooting configuration problems.
Overlaying Docker on AWS
● ELB as a front-end to each service
● ELB load balances to haproxy containers
● HAProxy containers load balance to the service containers
Overlaying Docker on AWS
● Why the extra HAProxy layer?
○ Allows us to create the ELBs once and leave them alone
○ When we deploy new versioned services we update the service alias / haproxy links (sketched below)
○ Allows for fast rollback to the previous version of the service
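To make the layering concrete, here is a deliberately simplified compose-v1-style sketch of the haproxy piece; the service names, port, and config path are hypothetical, and in practice Rancher managed these links/aliases for us rather than a hand-written file:

bookingservice-lb:
  image: haproxy:1.5
  ports:
    - "8080:8080"
  volumes:
    - ./haproxy.cfg:/usr/local/etc/haproxy/haproxy.cfg    # backends point at the linked alias below
  external_links:
    # Re-pointing this link (the service alias) at a different versioned service is
    # all a deploy or rollback has to change; no new app containers are started.
    - bookingservice-prod-20151030-b123:bookingservice

The ELB in front only ever knows about the haproxy containers, which is why it can be created once and left alone.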
Deployments and Rollbacks
● Developers can deploy to production whenever they want
○ HipChat bot commands to deploy and rollback/revert
● Deployments to each of the 3 environments use rancher-compose to
○ Deploy new versioned services / containers
○ Create or update service aliases / haproxy links
○ Delete previous versioned services except for current and previous
● When things go haywire…
○ We simply rollback
○ Production deploy creates a docker-compose-rollback.yml file
■ Query Rancher API to get list of running services
■ Allows us to change haproxy and service alias links back to the previous version
■ Super fast to rollback, no containers need to be spun up!
Technical Challenge - docker-compose
● We needed to support a single docker-compose.yml file, maintained by developers of an app or service
○ They don’t want to maintain local, dev, stage, and prod versions of this file
○ Changes to multiple files would be error-prone
○ Must support differences in the architecture or configuration of services across environments
○ Secret Secret, I’ve got a Secret
Secret Management
We’re already using SaltStack to manage our EC2 minions (VMs)
● Salt Grains are used for some common variables used in salt states
● Salt Pillar Data exists, which is configuration data available only to certain minions
● This Salt Pillar Data is already broken down by environment (dev/stage/prod)
● We should just use this data to dynamically create the docker-compose and rancher-compose files!
A templated rancher-compose file
{% set sf_env = grains['bookingservice-env'] %}
{% set version = grains['bookingservice-version'] %}
bookingservice-{{ sf_env }}-{{ version }}:
  scale: 1
We use a scale of 1 because we use global host scheduling combined with host affinity so that one container of this service is deployed to each VM of the specified environment (dev/stage/prod). This allows us to spin up a new Rancher host and easily deploy to the new host VM.
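The docker-compose side gets templated the same way. As a sketch only (the image path, environment variables, and pillar keys here are made up for illustration), it might look like:

{% set sf_env = grains['bookingservice-env'] %}
{% set version = grains['bookingservice-version'] %}
bookingservice-{{ sf_env }}-{{ version }}:
  image: quay.io/sparefoot/bookingservice:{{ version }}    # hypothetical Quay repository
  environment:
    NODE_ENV: {{ sf_env }}
    DB_PASSWORD: {{ pillar['bookingservice'][sf_env]['db_password'] }}    # secret pulled from per-environment Salt pillar data
  labels:
    io.rancher.scheduler.global: 'true'    # Rancher label for the global host scheduling described above

Rendering this through Salt yields an environment-specific docker-compose-salt.yml without developers having to maintain one file per environment.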
Deployments with rancher-compose
● Deployments to Dev and Staging are done via Bamboo
● Deployments to Production are done by developers via HipChat commands
● In the end, everything is invoking our salt-deploy.py script
○ Set some salt grains for target env, version, buildid, image tag in quay.io
○ Services get versioned with a timestamp and bamboo build id
○ Render jinja2 / Inject Salt grains and pillar data via salt minion python code
■ caller.sminion.functions['cp.get_template'](cwd + '/docker-compose.yml', cwd + '/docker-compose-salt.yml')
■ caller.sminion.functions['cp.get_template'](cwd + '/rancher-compose.yml', cwd + '/rancher-compose-salt.yml')
○ Invokes rancher-compose create / up
○ Cleanup to keep the live version of a service and the live-1 version. The rest are purged.
Surprise! Rancher Adds Variable Support
Does the support for interpolating variables, added in Rancher 0.41, deprecate the work we've done with Salt and rendering jinja2 templates?
● No. We already maintain data in grains and pillars, so we just reuse that data.
● Rancher's implementation uses the environment variables on the host running rancher-compose to fill in the blanks (example below)
● It would require logic to load those env variables based on the target env (dev/stage/prod), so we might as well get the data out of salt pillar, which has separate pillars for each service broken down by target environment.
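For reference, the interpolation style Rancher added looks roughly like this; the variable names and image path are illustrative, and the values would have to be exported on the machine running rancher-compose before each run:

# docker-compose.yml relying on rancher-compose variable interpolation
bookingservice:
  image: quay.io/sparefoot/bookingservice:${VERSION}    # filled from the VERSION env var on the deploying host
  environment:
    NODE_ENV: ${SF_ENV}    # filled from the SF_ENV env var on the deploying host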
Where are we now?
● 10 Microservices in production with Rancher + Docker
○ 5-10 Deployments per day on average
○ Busiest services handling around 50 requests / second
● Consumer facing applications being containerized in
development
○ New teams cutting their teeth
○ Keep on “Strangling”*
* DO NOT: google image search for “strangling hands”