Sean Schofield & Richard Lister, Spree Commerce: Fearless Deployment @ Open Commerce Conference 2016


Presented at the Open Commerce Conference on June 28-29, 2016 in New York City

  1. Fearless Deployment: Sean Schofield (@uberzealot), Richard Lister (@bnzmnzhnz)
  2. Background ● Open Source ● Consulting company ● VC Backed ● Acquired by First Data in 2015
  3. What are we afraid of? (1) The “Real World” (2) Instability (3) Going Slow
  4. The “Real World” ● Differences between staging and production ● Volume of data ● Nature of data ● Missing configuration
  5. Instability ● Deployments cause most of the problems that impact customers ● Code being deployed as well as the deployment itself ● Risk increases over time ● External sources of instability
  6. Going Slow ● Speed of development ○ We don’t want stability at the expense of speed ○ Whatever solution we come up with, it will just slow us down ● Intervals between deployments ○ The longer we go between deploys, the more worried we are about the next one ○ Migrations are more likely to fail ○ We’re only making the problem worse by delaying our deployments
  7. Goal #1: Embrace the Real World
  8. Embracing the “Real World” ● Two things keep us separated from the “Real World” ○ Application behavior ○ User behavior ● Let’s figure out a way to eliminate those differences ● No more surprises when we deploy!
  9. Replace Staging Environment with Stacks
  10. Use the stacks to go live ● Each release is done as a self-contained “stack” ● No more staging environment ● No more RAILS_ENV ● Think release candidate for your infrastructure ● No more surprises based on real world data
  11. Stop separating the test data ● DynamoDB is designed for massive amounts of data ● Test data and live customer data can peacefully co-exist ● Use a test attribute to identify our test records ● Everything lives together in a single database!
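The following is a minimal sketch of keeping test and live records in one table. It is written in Python, although the team worked in Ruby. It assumes records are plain dicts and that the test marker is a boolean attribute called `test`. The slide only says "a test attribute", so that name is hypothetical. Customer-facing reads filter test records out, and tests can select only their own data.

```python
from typing import Iterable, Iterator

# Hypothetical attribute name: the slide only says "a test attribute".
TEST_FLAG = "test"


def live_records(records: Iterable[dict]) -> Iterator[dict]:
    """Yield real customer records, skipping anything flagged as test data."""
    for record in records:
        if not record.get(TEST_FLAG, False):
            yield record


def only_test_records(records: Iterable[dict]) -> Iterator[dict]:
    """Yield only the records that tests created."""
    for record in records:
        if record.get(TEST_FLAG, False):
            yield record


# Live and test records share one table.
table = [
    {"store_id": "s1", "object_id": "order-1", "test": False},
    {"store_id": "s1", "object_id": "order-2", "test": True},
    {"store_id": "s1", "object_id": "order-3"},  # no flag: treated as live data
]

print([r["object_id"] for r in live_records(table)])       # ['order-1', 'order-3']
print([r["object_id"] for r in only_test_records(table)])  # ['order-2']
```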
  12. Stop using ActiveRecord ● Learned things the hard way with Spree ● Really slow when doing a lot of writes ● Use Plain Old Ruby Objects (PORO) instead ● All of our tables have the same structure ○ store_id ○ object_id ○ object_value
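The three-column layout can be sketched as a pair of functions. One flattens an object into `store_id`, `object_id` and `object_value`, and the other rebuilds it. This Python analogue of the Ruby PORO approach assumes that `object_value` holds a JSON string. The `Order` model and the JSON encoding are illustrative assumptions, because the slide gives only the column names.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Order:
    """A plain object standing in for a Ruby PORO (hypothetical model)."""
    store_id: str
    object_id: str
    email: str
    total_cents: int


def to_row(obj: Order) -> dict:
    """Flatten an object into the shared three-column layout."""
    fields = asdict(obj)
    return {
        "store_id": fields.pop("store_id"),
        "object_id": fields.pop("object_id"),
        # Assumption: all remaining fields are JSON-encoded into one value column.
        "object_value": json.dumps(fields, sort_keys=True),
    }


def from_row(row: dict) -> Order:
    """Rebuild the object from a row in the shared layout."""
    return Order(
        store_id=row["store_id"],
        object_id=row["object_id"],
        **json.loads(row["object_value"]),
    )


order = Order("store-1", "order-42", "a@example.com", 1999)
row = to_row(order)
print(row["object_value"])     # {"email": "a@example.com", "total_cents": 1999}
print(from_row(row) == order)  # True
```

Every model goes through the same two functions, so a table for orders and a table for products have an identical structure.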
  13. Protect the real world data ● No database write access for developers ● Only store owners can change their own data ● No super admin ● Impossible for developers to change data while testing ● Ensure no real world side effects whenever we write data
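The sketch below shows one way to enforce these rules in the write path. It rests on several assumptions. Ownership is modeled as a set of store IDs per user. The confirmation email is a hypothetical example of a real-world side effect. Test records carry a hypothetical boolean `test` attribute. There is deliberately no admin role that bypasses the ownership check.

```python
from dataclasses import dataclass, field


class WriteDenied(Exception):
    """Raised when someone tries to write another store's data."""


@dataclass(frozen=True)
class Actor:
    user_id: str
    owned_store_ids: frozenset  # the only stores this user may change; no admin override


@dataclass
class Database:
    rows: dict = field(default_factory=dict)
    emails_sent: list = field(default_factory=list)  # stands in for actually sending email

    def write(self, actor: Actor, record: dict) -> None:
        # Ownership is the only authorization rule: there is no super-admin bypass.
        if record["store_id"] not in actor.owned_store_ids:
            raise WriteDenied(f"{actor.user_id} does not own {record['store_id']}")
        self.rows[(record["store_id"], record["object_id"])] = record
        # Real-world side effects (here, a confirmation email) are skipped for test data.
        if not record.get("test", False):
            self.emails_sent.append(record["object_id"])


db = Database()
owner = Actor("u1", frozenset({"store-1"}))
db.write(owner, {"store_id": "store-1", "object_id": "order-1"})
db.write(owner, {"store_id": "store-1", "object_id": "order-2", "test": True})
print(db.emails_sent)  # ['order-1']

try:
    db.write(owner, {"store_id": "store-2", "object_id": "order-3"})
except WriteDenied as err:
    print(err)  # u1 does not own store-2
```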
  14. Complete copy of the database ● Every stack has a complete database copy ● Migrations are performed at the same time as the copy ● Shoryuken workers for multi-threaded processing ● We can copy 500,000 records in under ten minutes
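Copying 500,000 records in under ten minutes (600 seconds) means sustaining more than 833 records per second, which is why the work is spread across workers. The sketch below uses a Python thread pool as a stand-in for Shoryuken workers and an in-memory table as a stand-in for DynamoDB. The `migrate` function is a hypothetical schema change applied to each record as it is copied.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def migrate(record):
    """Hypothetical migration: rename 'total' to 'total_cents' during the copy."""
    migrated = dict(record)
    if "total" in migrated:
        migrated["total_cents"] = migrated.pop("total")
    return migrated


class InMemoryTable:
    """Thread-safe stand-in for the new stack's DynamoDB table."""

    def __init__(self):
        self.rows = {}
        self._lock = threading.Lock()

    def put(self, record):
        with self._lock:
            self.rows[(record["store_id"], record["object_id"])] = record


def copy_table(source, target, batch_size=100, workers=8):
    """Copy all records into target, migrating each one on the way; returns the count."""

    def copy_batch(batch):
        for record in batch:
            target.put(migrate(record))
        return len(batch)

    batches = [source[i:i + batch_size] for i in range(0, len(source), batch_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(copy_batch, batches))


source = [{"store_id": "s1", "object_id": f"o{i}", "total": i} for i in range(1000)]
target = InMemoryTable()
print(copy_table(source, target))  # 1000
print(target.rows[("s1", "o7")])   # {'store_id': 's1', 'object_id': 'o7', 'total_cents': 7}
```

Migrating during the copy means the new stack's table is already in the new shape when testing starts, so release day has nothing left to migrate.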
  15. Sync changes after the copy ● Track changes since our bulk copy ● DynamoDB Streams to monitor these changes ● New data is continuously migrated ● Same migration logic as with bulk copy ● No more migrations on release day!
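Syncing after the copy means replaying each captured change through the migration logic the bulk copy uses. The events below are a simplified form of DynamoDB Streams records. `eventName` is `INSERT`, `MODIFY` or `REMOVE` as in the real stream, but images are plain dicts instead of typed attribute values. `NewImage` is present only when the stream's view type includes new images. The `migrate` function is a hypothetical schema change. The point is that the sync calls exactly the same function as the bulk copy.

```python
def migrate(record):
    """Hypothetical migration shared with the bulk copy: rename 'total' to 'total_cents'."""
    migrated = dict(record)
    if "total" in migrated:
        migrated["total_cents"] = migrated.pop("total")
    return migrated


def apply_stream_event(event, target):
    """Replay one change captured after the bulk copy onto the new stack's table.

    Events are simplified DynamoDB Streams records: images are plain dicts.
    """
    change = event["dynamodb"]
    if event["eventName"] in ("INSERT", "MODIFY"):
        record = migrate(change["NewImage"])
        target[(record["store_id"], record["object_id"])] = record
    elif event["eventName"] == "REMOVE":
        keys = change["Keys"]
        target.pop((keys["store_id"], keys["object_id"]), None)


target = {("s1", "o1"): {"store_id": "s1", "object_id": "o1", "total_cents": 5}}
events = [
    {"eventName": "MODIFY",
     "dynamodb": {"Keys": {"store_id": "s1", "object_id": "o1"},
                  "NewImage": {"store_id": "s1", "object_id": "o1", "total": 9}}},
    {"eventName": "INSERT",
     "dynamodb": {"Keys": {"store_id": "s1", "object_id": "o2"},
                  "NewImage": {"store_id": "s1", "object_id": "o2", "total": 3}}},
    {"eventName": "REMOVE",
     "dynamodb": {"Keys": {"store_id": "s1", "object_id": "o2"}}},
]
for event in events:
    apply_stream_event(event, target)
print(target)  # {('s1', 'o1'): {'store_id': 's1', 'object_id': 'o1', 'total_cents': 9}}
```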
  16. Goal #2: Stability
  17. Ops Code as First-Class Citizen ● Infrastructure must be change-controlled and repeatable ● Operations source code is in the same git repo as application code ● Every release is tracked as a single SHA in GitHub ● Check out a SHA to get a fully self-contained ops+app setup ● We use AWS CloudFormation templates to describe all resources
  18. CloudFormation Top Tip [figure: “Don’t do this” vs. “Do this” comparison]
  19. The stack contains everything we need ● Networking ● Load balancers ● Auto Scaling groups ● Instance config ● Permissions ● Database
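The sketch below illustrates how one of these resources might be described in a template by building the database portion of a CloudFormation template as JSON. The team generated templates with a Ruby DSL (listed under Resources), whereas this Python version writes the JSON directly. Several details are assumptions: the key schema (`store_id` as partition key, `object_id` as sort key), the logical name `Objects` and the capacity numbers. Only key attributes belong in `AttributeDefinitions`, so `object_value` does not appear. The stream specification is what feeds the post-copy sync.

```python
import json


def dynamodb_table(logical_name="Objects"):
    """The shared three-column table as a CloudFormation resource.

    Assumed key schema: store_id is the partition key, object_id the sort key.
    """
    return {
        logical_name: {
            "Type": "AWS::DynamoDB::Table",
            "Properties": {
                "AttributeDefinitions": [
                    {"AttributeName": "store_id", "AttributeType": "S"},
                    {"AttributeName": "object_id", "AttributeType": "S"},
                ],
                "KeySchema": [
                    {"AttributeName": "store_id", "KeyType": "HASH"},
                    {"AttributeName": "object_id", "KeyType": "RANGE"},
                ],
                "ProvisionedThroughput": {
                    "ReadCapacityUnits": 5,  # placeholder capacity
                    "WriteCapacityUnits": 5,
                },
                # Emits every change so a new stack can sync after its bulk copy.
                "StreamSpecification": {"StreamViewType": "NEW_AND_OLD_IMAGES"},
            },
        }
    }


template = {
    "AWSTemplateFormatVersion": "2010-09-09",
    "Description": "Hypothetical release stack (database portion only)",
    "Resources": dynamodb_table(),
}
print(json.dumps(template, indent=2))
```

Networking, load balancers, Auto Scaling groups and IAM resources would sit alongside this entry in the same `Resources` map, which is what makes each stack self-contained.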
  20. Docker Containers ● Provide a runnable application artifact ● Dependency management ○ System libraries ○ Ruby + gems ○ Application code
  21. Docker Decouples Application from OS ● Protect against changes in the underlying OS, which just provides: ○ Kernel ○ Docker daemon ○ Systemd, to start containers ● We are safer making OS updates ○ Updates to system libraries do not affect the application
  22. Amazon Machine Image ● AMI provides a runnable server artifact ○ We get the same artifact every time ● What if the Docker repository goes down? ○ Create AMI with Packer and bake in all Docker images ○ We’re happy to trade AMI build time for stability ● What if GitHub or RubyGems are down? ○ Instance needs no external information to start app
  23. The Dreaded AWS Degradation Email
  24. Cattle vs Pets [figure: “Don’t do this” vs. “Do this” comparison]
  25. Auto Scaling ● Stop caring about individual instances ● Auto Scaling replaces failed instances ● We trust replacement because we do it all the time ● Cope easily with changing load
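The replacement behavior can be illustrated with an in-memory simulation of what an Auto Scaling group does. It discards unhealthy instances and launches identical ones until the group matches its desired size. This is not the AWS API, and `launch` and the instance IDs are hypothetical.

```python
from dataclasses import dataclass
from itertools import count
from typing import List


@dataclass
class Instance:
    instance_id: str
    healthy: bool = True


_next_id = count(1)


def launch() -> Instance:
    """Hypothetical launch from the release AMI: every new instance is identical."""
    return Instance(f"i-{next(_next_id):04d}")


def reconcile(fleet: List[Instance], desired: int) -> List[Instance]:
    """Replace unhealthy instances and match the desired size (a simulation, not the AWS API)."""
    survivors = [i for i in fleet if i.healthy]  # failed instances are discarded, never repaired
    while len(survivors) < desired:
        survivors.append(launch())
    # Scale in by retiring the newest instances; real Auto Scaling termination policies differ.
    return survivors[:desired]


fleet = reconcile([], desired=3)
print([i.instance_id for i in fleet])  # ['i-0001', 'i-0002', 'i-0003']

fleet[1].healthy = False               # i-0002 fails
fleet = reconcile(fleet, desired=3)
print([i.instance_id for i in fleet])  # ['i-0001', 'i-0003', 'i-0004']

fleet = reconcile(fleet, desired=5)    # load goes up
print(len(fleet))                      # 5
```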
  26. Production Deployment
  27. Release Procedure ● Tag branch in git ● Build Docker container ● Build AMI ● Create stack ● Copy data from production ● Sync new data from production ● Test, test, test ● Update DNS ● Delete old stack
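The release procedure is an ordered pipeline that stops at the first failure. In this sketch each step is a hypothetical placeholder that only records its name. What matters is the ordering. Production moves only at "Update DNS", and the old stack is deleted last, so a failed test leaves customers on the old stack.

```python
from typing import Callable, Dict, List, Tuple

Step = Tuple[str, Callable[[Dict], None]]


def run_release(steps: List[Step], ctx: Dict) -> List[str]:
    """Run release steps in order and stop at the first failure.

    The old stack is deleted only in the last step, so a failure anywhere
    earlier leaves production serving from the old stack.
    """
    completed = []
    for name, action in steps:
        try:
            action(ctx)
        except Exception as err:
            raise RuntimeError(f"release stopped at '{name}': {err}") from err
        completed.append(name)
    return completed


def placeholder(name: str) -> Step:
    """Hypothetical stand-in for a real step: records its name instead of calling AWS."""
    def action(ctx: Dict) -> None:
        ctx.setdefault("log", []).append(name)
    return (name, action)


RELEASE_STEPS = [placeholder(n) for n in (
    "tag branch", "build docker container", "build AMI", "create stack",
    "copy data from production", "sync new data from production",
    "test", "update DNS", "delete old stack",
)]

print(run_release(RELEASE_STEPS, {})[-2:])  # ['update DNS', 'delete old stack']


def failing_tests(ctx: Dict) -> None:
    raise AssertionError("checkout smoke test failed")


broken = RELEASE_STEPS[:6] + [("test", failing_tests)] + RELEASE_STEPS[7:]
try:
    run_release(broken, {})
except RuntimeError as err:
    print(err)  # release stopped at 'test': checkout smoke test failed
```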
  28. Immutable once we go live ● New releases require a new stack ● Emergency hotfixes require a new AMI ● Instances are replaced, not modified ● Once deployed, nothing can be changed ● There is no SSH
  29. Goal #3: Go Fast
  30. Continuous Deployment for Developers ● We deploy many times a day, just not to production ○ Devs get a stack for each feature branch, with a full copy of production data ○ Go crazy, break things: it will be entirely deleted when done ● Docker lets us build images fast ○ We don’t want to wait for a brand new AMI with each commit ○ Write the Dockerfile to use caching in a smart way ● Dev stacks can be deployed by just replacing the Docker image
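Giving every feature branch its own stack requires a stack name derived from the branch name. CloudFormation stack names may contain only letters, digits and hyphens, must start with a letter, and are limited to 128 characters. The `dev-` prefix and the lowercasing in this sketch are hypothetical conventions.

```python
import re

MAX_STACK_NAME = 128  # CloudFormation's limit on stack name length


def dev_stack_name(branch: str, prefix: str = "dev") -> str:
    """Derive a CloudFormation-safe stack name for a feature branch.

    Runs of anything other than letters and digits become a single hyphen;
    the hypothetical 'dev' prefix guarantees the name starts with a letter.
    """
    slug = re.sub(r"[^A-Za-z0-9]+", "-", branch).strip("-").lower()
    return f"{prefix}-{slug}"[:MAX_STACK_NAME].rstrip("-")


print(dev_stack_name("feature/new_checkout"))  # dev-feature-new-checkout
print(dev_stack_name("fix/PROD-123 hotfix!"))  # dev-fix-prod-123-hotfix
```

Because the name is a pure function of the branch, the same branch always maps to the same stack, which makes it easy to find and delete when the branch is merged.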
  31. Argus for Fast Docker Builds ● Enqueue Docker builds using SQS ● Distributed workers for fast builds ● Workers pre-pull existing image layers ● This means all workers can use the Docker cache ● Pushes image to AWS EC2 Container Registry
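The Argus pattern is to queue the build, let any free worker take it, pre-pull the previous image, then build and push. It can be sketched with an in-process `queue.Queue` standing in for SQS. The job format, repository name and registry URL are hypothetical. The workers here only record the `docker` commands they would run and do not execute them.

```python
import queue
import threading


def build_commands(job):
    """Shell commands one worker would run for a build job (hypothetical job format)."""
    image = f"{job['registry']}/{job['repo']}"
    return [
        f"docker pull {image}:latest",  # pre-pull existing layers, per the slide
        f"docker build -t {image}:{job['sha']} {job['path']}",
        f"docker push {image}:{job['sha']}",
    ]


def worker(jobs, results):
    """Consume jobs until a None sentinel arrives."""
    while True:
        job = jobs.get()
        if job is None:
            return
        # A real worker would execute these commands; the sketch only records them.
        results.append((job["sha"], build_commands(job)))


def run_builds(job_list, n_workers=3):
    jobs = queue.Queue()  # in-process stand-in for the SQS queue
    results = []          # list.append is atomic in CPython
    threads = [threading.Thread(target=worker, args=(jobs, results)) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for job in job_list:
        jobs.put(job)
    for _ in threads:
        jobs.put(None)    # one sentinel per worker, queued after all real jobs
    for t in threads:
        t.join()
    return results


registry = "123456789012.dkr.ecr.us-east-1.amazonaws.com"  # example ECR registry URL
build_jobs = [{"registry": registry, "repo": "shop", "sha": sha, "path": "."}
              for sha in ("a1b2c3d", "e4f5a6b")]
for sha, commands in sorted(run_builds(build_jobs)):
    print(sha, commands[1])
```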
  32. Developer Deploys
  33. Developer Deploys Are Fast ● If the bundle is cached, docker build takes about 15 seconds ● AWS SSM Run Command runs a canned script ● Simply pulls latest Docker image and restarts container ● Access is controlled with IAM ● Logs are in Logstash
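The canned deploy script and the Run Command call can be sketched with boto3's SSM client, using the AWS-provided `AWS-RunShellScript` document. The systemd unit name `app` is hypothetical. The script assumes that unit runs the same image reference that is pulled, so restarting it picks up the new image. Only `deploy_commands` runs without AWS credentials.

```python
from typing import Iterable, List


def deploy_commands(image: str, service: str = "app") -> List[str]:
    """The canned script: pull the new image, then restart the container via systemd.

    Assumes a hypothetical systemd unit named `service` that runs `image`.
    """
    return [
        f"docker pull {image}",
        f"systemctl restart {service}",
    ]


def send_deploy(instance_ids: Iterable[str], image: str, service: str = "app") -> dict:
    """Ask SSM Run Command to execute the canned script on a dev stack's instances."""
    import boto3  # imported here so deploy_commands works without boto3 or credentials

    ssm = boto3.client("ssm")
    return ssm.send_command(
        InstanceIds=list(instance_ids),
        DocumentName="AWS-RunShellScript",
        Parameters={"commands": deploy_commands(image, service)},
    )


print(deploy_commands("123456789012.dkr.ecr.us-east-1.amazonaws.com/shop:latest"))
```

Whether a developer may call `send_command` against a given stack is decided by IAM policy, matching the slide's point that access is controlled with IAM rather than SSH keys.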
  34. Summary ● All infrastructure and code is in the stack ● The stack is immutable ● We use stacks instead of having a special staging environment ● We use a complete copy of real world data in our stacks ● We’re constantly deploying, just not to production ● Production deploys just update the DNS to point at the new stack
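The slides do not say which DNS service or record type is used. Assuming Route 53 and a CNAME pointing the public hostname at the new stack's load balancer, the cutover is a single `UPSERT`. Because the old stack is deleted only after the switch, rolling back before that point is the same call pointed at the old load balancer. The hostname, hosted zone ID and load balancer name are hypothetical.

```python
def cutover_change_batch(record_name, new_stack_lb_dns, ttl=60):
    """Route 53 change batch pointing the public name at the new stack (CNAME assumed)."""
    return {
        "Comment": f"Cut over {record_name} to {new_stack_lb_dns}",
        "Changes": [{
            "Action": "UPSERT",
            "ResourceRecordSet": {
                "Name": record_name,
                "Type": "CNAME",
                "TTL": ttl,  # a short TTL lets clients follow the switch quickly
                "ResourceRecords": [{"Value": new_stack_lb_dns}],
            },
        }],
    }


def cut_over(hosted_zone_id, record_name, new_stack_lb_dns):
    """Apply the change; only this function needs AWS access."""
    import boto3  # lazy import: building the batch needs no AWS access

    route53 = boto3.client("route53")
    return route53.change_resource_record_sets(
        HostedZoneId=hosted_zone_id,
        ChangeBatch=cutover_change_batch(record_name, new_stack_lb_dns),
    )


batch = cutover_change_batch("www.example.com.", "new-stack-lb-1234567890.us-east-1.elb.amazonaws.com")
print(batch["Changes"][0]["ResourceRecordSet"]["ResourceRecords"])
```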
  35. Resources ● - Ruby library for PORO ● - asynchronous Ruby workers with SQS ● - fast Docker build and push to ECR ● - Ruby library for common stack operations ● - Ruby DSL for CloudFormation templates ● - guidelines for stateless software as a service
  36. Questions?