Presentation by Steve Woodruff
The story of how SpareFoot broke up its monolithic application into microservices, deployed Docker into production, and established a "contract" between Dev and Ops.
2. Who am I?
Steve Woodruff
❏ Director of DevOps at SpareFoot
implementing CI/CD
❏ Spent 10+ years at Motorola doing
embedded development (C, C++)
❏ Spent 5 years at IBM as a sys admin in a
large server farm (Linux, AIX, Solaris)
swoodruff@sparefoot.com
Twitter: @sjwoodr
GitHub: sjwoodr
3. ● Think Hotels.com for self storage*
● All infrastructure in AWS
● 40 Developers on 7 Teams
○ Continuous Delivery
● Docker in production since 2014
*This kind of storage: (photo of physical self-storage units)
4. The Beginning: SpareFoot + Docker
Hackathon! Docker + Fig (now Compose) allowed us to run our production architecture locally.
5. Yim - Call Center Application
Used exclusively by our call center
Chrome ONLY
Node version n+1
React + Flux
[Diagram: Yim components running at version n and version n+1]
7. CI and deployments
Janky shell scripts… slow builds, etc…
Used Bamboo to build images
feature branches were built/deployed to Dev
master branch was built/deployed to Staging
Dynamically created a custom container start script
Tried to auto-detect when the containers had started before kicking off post-deploy tests
Build times were rather long
Spent an awfully long time on docker push (to our registry) and docker pull (on the target hosts)
8. OK, so Docker feels like a solution
… and we kind of know how to do this. But....
Continuous Integration / Delivery?
○ Docker Registry
○ Bamboo
○ Deployments
● Host Volumes and Port Forwarding rules?
○ Not saved with the source code (see the sketch below)
● Get Docker to run in local, dev, staging, and production environments?
○ Configuration?
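On that host-volumes and port-forwarding question: one thing a compose file buys you is that those rules live in source control next to the code. A minimal, hypothetical sketch (service name, image, ports, and paths are all illustrative):
yim:
  image: sparefoot/yim:latest
  ports:
    - "3000:3000"            # host:container port forwarding, versioned with the code
  volumes:
    - /var/log/yim:/app/log  # host volume mount, also versioned with the code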
9. Docker in Production (technically)!
We had 2 load balanced EC2 instances running a node app.
[Diagram: ELB listening on 443, forwarding to port 3000 on each of the two instances]
10. Docker in Production (technically)!
We had 2 load balanced EC2 instances running a node app.
Now we have 2 load balanced EC2 instances running Docker containers that run a node app!
[Diagram: before and after, in both cases an ELB on 443 forwards to port 3000 on each of the two instances; in the new setup the node app runs inside a Docker container]
11. Docker in Production (technically)!
[Diagram: the old stack and the new containerized stack side by side, each behind an ELB (443 to 3000), with the new ELB fronting the instances that run containers]
12. Yim: Trouble in Docker Paradise
Hosting our own Docker registry was a bad idea
Stability was a problem
No level of access control on the registry itself
Mimicking servers - 1 container per host. Need orchestration please!
Amazon Linux AMI -> old version of Docker… doh!
Docker push/pull to/from registry was very slow
build - push to registry
deploy - pull from registry to each host, serially
Performance was fine… but stability was the issue
This internal-facing nodejs app was moved to a pair of EC2 instances and out of Docker after about 4 months of pain and suffering
13. Yim: Lessons Learned
We need orchestration
Rolling our own docker deployments was confusing to OPS and to the Dev team
Our own docker registry is a bad idea
Stability was a problem
No level of access control on the registry itself
Our S3 backend would grow by several GB per month with no automated cleanup
No easy way to rollback failed deploys
Just fix it and deploy again...
All this culminated in a poor build process and affected CI velocity
Longer builds, longer deploys, no real gain
15. Like everyone else....
...we were “deconstructing the monolith”
[Diagram: a monolithic application (REST API + data) broken into four microservices, each with its own REST API and data store, fronted by an API Gateway]
16. A Better Docker Registry
With Yim we learned that rolling our own Registry was a bad idea.
Limited Access Control
We have to maintain it
18. We’ve learned some things...
● Easier than we thought
● Quay was the glue we needed
○ Use an off the shelf solution.
○ We like Quay.io
● Bolting on to our existing CI pipeline worked really well.
○ Developers didn’t have to learn new process
○ Microservice consumers can pull tagged versions
○ We can automate tests against all versions
Now we talk containers from local -> dev -> staging but NOT in production.
20. Production - What is still needed
Orchestration
Yim sucked because we tried to do this ourselves
Better Deployments
With rollbacks
Configuration Management
We have things to hide
23. Production - Software Selection
Choosing orchestration software / container service in early 2015
StackEngine
Lacked docker-compose support
Kubernetes
PhD Required
Mesosphere
Nice, but slow to deploy
EC2 Container Service
Lacked docker-compose support and custom AMIs
Tutum (now Docker Cloud)
Rancher
24. Production - Enter Rancher
After running proof-of-concepts of both Tutum and Rancher, we decided to continue down our path to production deploys with Rancher.
Had more mature support for docker-compose files.
Tutum added this after our evaluation had ended
Did not require us to orchestrate the deployments through their remote endpoint
Rancher server runs on our EC2 instances and we are in full control of all the things
Had a full API we can work with in addition to the custom rancher-compose cli
Had a very active user community and a beta forum where the Rancher development team was active in answering questions and even troubleshooting configuration problems.
26. Overlaying Docker on AWS
Why the extra HAProxy layer?
Allows us to create the ELBs once and leave them alone
When we deploy new versioned services we update the service alias / haproxy links
Allows for fast rollback to previous version of the service
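Roughly, the idea looks like this in compose terms. This is a simplified sketch with illustrative names and images; in practice the alias and load balancing are handled by Rancher's service alias / load balancer features rather than a hand-rolled haproxy service:
haproxy-bookingservice:
  image: haproxy:1.6
  ports:
    - "443:443"
  links:
    # the stable alias "bookingservice" points at whatever versioned service is live;
    # a deploy (or rollback) just repoints this link, the ELB never changes
    - bookingservice-prod-20160115-b42:bookingservice

bookingservice-prod-20160115-b42:
  image: quay.io/sparefoot/bookingservice:20160115-b42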
27. Deployments and Rollbacks
Developers can deploy to production whenever they want
HipChat bot commands to deploy and rollback/revert
Deployments to each of the 3 environments use rancher-compose to
Deploy new versioned services / containers
Create or update service aliases / haproxy links
Delete old versioned services, keeping only the current and previous versions
When things go haywire…
We simply rollback
Production deploy creates a docker-compose-rollback.yml file
Query Rancher API to get list of running services
Allows us to change haproxy and service alias links back to the previous version
Super fast to rollback, no containers need to be spun up!
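As a rough illustration (names hypothetical, the real file is generated from the Rancher API at deploy time), the docker-compose-rollback.yml mostly just repoints the stable alias at the previous version, which is still running:
haproxy-bookingservice:
  image: haproxy:1.6
  ports:
    - "443:443"
  links:
    # the previous versioned service is still up, so flipping the link back is instant
    - bookingservice-prod-20160114-b41:bookingservice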
32. Secret Management
We’re already using SaltStack to manage our EC2 minions (VMs)
Salt Grains are used for some common variables used in salt states
Salt Pillar Data exists, which is configuration data available only to certain minions
This Salt Pillar Data is already broken down by environment (dev/stage/prod)
We should just use this data to dynamically create the docker-compose and
rancher-compose files!
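As a sketch, that per-environment pillar data might be laid out something like this (keys and values are made up for illustration; access is restricted by the pillar top file so prod secrets are only served to prod minions):
bookingservice:
  dev:
    db_host: dev-db.internal
    db_password: dev-password
  staging:
    db_host: staging-db.internal
    db_password: staging-password
  prod:
    db_host: prod-db.internal
    db_password: real-secret-only-served-to-prod-minions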
33. Technical Challenge - docker-compose
We needed to support a single docker-compose.yml file, maintained by the developers of an app or service
They don’t want to maintain local, dev, stage, and prod versions of this file
Changes to multiple files would be error-prone
Must support differences in the architecture or configuration of services across environments
Secret Secret, I’ve got a Secret
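A simplified sketch of what that single templated docker-compose.yml can look like. The grain names match the ones shown on the next slide; everything else (image path, ports, pillar keys) is illustrative rather than our actual file:
{% set sf_env = grains['bookingservice-env'] %}
{% set version = grains['bookingservice-version'] %}
bookingservice-{{ sf_env }}-{{ version }}:
  image: quay.io/sparefoot/bookingservice:{{ version }}
  ports:
    - "3000:3000"
  environment:
    NODE_ENV: {{ sf_env }}
    # secrets come from environment-specific pillar data, never from the repo
    DB_PASSWORD: {{ pillar['bookingservice'][sf_env]['db_password'] }}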
34. A templated rancher-compose file
{% set sf_env = grains['bookingservice-env'] %}
{% set version = grains['bookingservice-version'] %}
bookingservice-{{ sf_env }}-{{ version }}:
  scale: 1
We use a scale of 1 because we use global host scheduling combined with host affinity, so that one container of this service is deployed to each VM of the specified environment (dev/stage/prod). This allows us to spin up a new Rancher host and easily deploy to the new host VM.
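For reference, that global scheduling plus host affinity is expressed with Rancher scheduling labels on the service in docker-compose.yml. A rough sketch (the image path and the host label key/value are illustrative):
bookingservice-{{ sf_env }}-{{ version }}:
  image: quay.io/sparefoot/bookingservice:{{ version }}
  labels:
    # schedule one container on every eligible host
    io.rancher.scheduler.global: 'true'
    # eligible = hosts carrying the label for this environment
    io.rancher.scheduler.affinity:host_label: sf_env={{ sf_env }}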
37. Deployments with rancher-compose
Deployments to Dev and Staging are done via Bamboo
Deployments to Production are done by developers via HipChat commands
In the end, everything is invoking our salt-deploy.py script
Set some salt grains for target env, version, buildid, image tag in quay.io
Services get versioned with a timestamp and bamboo build id
Render jinja2 / Inject Salt grains and pillar data via salt minion python code
caller.sminion.functions['cp.get_template'](cwd + '/docker-compose.yml', cwd + '/docker-compose-salt.yml')
caller.sminion.functions['cp.get_template'](cwd + '/rancher-compose.yml', cwd + '/rancher-compose-salt.yml')
Invokes rancher-compose create / up
Cleanup to keep the live version of a service and the live-1 version. The rest are purged.
38. Surprise! Rancher Adds Variable Support
Does the support for interpolating variables, added in Rancher 0.41, deprecate the
work we've done with Salt and rendering jinja2 templates?
No. We already maintain data in grains and pillars so we just reuse that data.
The Rancher implementation uses environment variables on the host running rancher-compose to fill in the blanks.
That would require logic to load those variables for each target env (dev/stage/prod), so we might as well get the data out of Salt pillar, which already has separate pillars for each service, broken down by target environment.
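For comparison, Rancher's interpolation substitutes environment variables from the host running rancher-compose directly into the compose file, roughly like this (variable names illustrative):
bookingservice-${SF_ENV}-${VERSION}:
  image: quay.io/sparefoot/bookingservice:${VERSION}
Exporting SF_ENV and VERSION correctly for each target environment before every run is exactly the extra logic we avoid by rendering from Salt data instead.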
45. Where are we now?
52 Microservices in production with Rancher + Docker
5-10 Deployments per day on average
Busiest services handling around 50 requests / second
Consumer facing applications being containerized in development
New teams cutting their teeth
Keep on "Strangling" the monolith*
* DO NOT: google image search for “strangling hands”