
DevOpsDays Galway 2019 - Zero-downtime deployments



Applications built over the years carry historical design assumptions, such as: it is acceptable to take a system out for upgrade maintenance for a few hours every 6 months.

In today’s world, embracing continuous delivery practices means more frequent releases, which means more downtime. Besides, finding a good maintenance window becomes a struggle, both for worldwide users and for the operators managing the upgrade outside business hours.

In this talk, I demonstrate that by mapping out complex deployment processes, it becomes possible to prioritise work and progressively reduce the deployment impact. I will also give practical advice on how to tackle blockers to zero-downtime deployments, such as:

  • Migrating database schemas while keeping an application running
  • Ensuring backward compatibility of messages and APIs
  • Dealing with long-running background jobs
  • Mitigating user session loss

Deploying without the comfort of a maintenance window also means that stability during the upgrade is a critical concern. I will go through how it can be achieved through systematic pipeline automation and good system visibility to help operators during the upgrade.

The trick is: zero-downtime doesn’t mean everything is up or running the latest version, it only means nobody notices!
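
To make the first blocker concrete (migrating database schemas while keeping an application running), here is a minimal sketch of the expand/contract sequence listed on slide 14 of the transcript, using Python's built-in sqlite3 module. The `users` table, the `name` and `display_name` columns and all function names are hypothetical. The grouping of steps into releases N+1 to N+3 is also an assumption, since the transcript does not preserve which step belongs to which release. The contract step rebuilds the table instead of dropping the column, purely so the sketch also runs on older SQLite versions.

```python
import sqlite3


def create_schema_n(conn):
    # Schema [N]: the starting point, with only the old column.
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")


def write_user_n(conn, user_id, name):
    # Application [N]: only knows about the old column.
    conn.execute("INSERT INTO users (id, name) VALUES (?, ?)", (user_id, name))


def expand(conn):
    # Assumed release N+1: create the new column. It is nullable, so
    # application [N] can keep inserting rows without knowing about it.
    conn.execute("ALTER TABLE users ADD COLUMN display_name TEXT")


def write_user_dual(conn, user_id, name):
    # Assumed release N+1: write to both columns.
    conn.execute(
        "INSERT INTO users (id, name, display_name) VALUES (?, ?, ?)",
        (user_id, name, name),
    )


def backfill(conn):
    # Assumed release N+2: migrate historical records to the new column.
    conn.execute("UPDATE users SET display_name = name WHERE display_name IS NULL")


def read_user(conn, user_id):
    # Assumed release N+2: read from the new column only.
    row = conn.execute(
        "SELECT display_name FROM users WHERE id = ?", (user_id,)
    ).fetchone()
    return row[0] if row else None


def contract(conn):
    # Assumed release N+3: remove the old column once nothing reads or
    # writes it. Done here as a table rebuild for portability of the sketch.
    conn.executescript(
        """
        CREATE TABLE users_new (id INTEGER PRIMARY KEY, display_name TEXT);
        INSERT INTO users_new (id, display_name)
            SELECT id, display_name FROM users;
        DROP TABLE users;
        ALTER TABLE users_new RENAME TO users;
        """
    )


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_schema_n(conn)
    write_user_n(conn, 1, "Ada")
    expand(conn)
    write_user_n(conn, 2, "Grace")     # application [N] on schema [N+1]
    write_user_dual(conn, 3, "Linus")  # application [N+1]
    backfill(conn)
    contract(conn)
    print([read_user(conn, i) for i in (1, 2, 3)])
```

Between `expand` and `contract`, the old writer `write_user_n` keeps working against the newer schema, which is the "Application [N] must work with schema [N+1]" property from slide 13.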

Published in: Software


  1. Changing tyres on a moving car: Our journey to zero-downtime deployments. November 18th, 2019, Galway. @PierreVincent
  2. Pierre Vincent, Infra. & Reliability Manager. @PierreVincent
  3. (no text on slide)
  4. “There has been a massive earthquake in New Zealand and I need to use Poppulo for regular updates. Please can you advise when it will be back online.” (Poppulo customer)
  5. 2009: deploying every 3 to 6 months, 4 hours downtime, on Sunday at 5PM. 2015: deploying every 4 weeks, 2 hours downtime, on Sunday at 8PM. 2018: How do we go faster without impacting users?
  6. High-performing technology organisations deploy 208x more frequently than low-performing ones (and the gap is getting wider!). Sources: Accelerate (Dr. Forsgren, Humble, Kim); Accelerate State of DevOps 2019.
  7. How can we hope to achieve Continuous Delivery, when more frequent deploys mean more downtime?
  8. The “zero-downtime” elephant in the room... Why are we so afraid to try?
  9. Mapping the deployment process, and its impact on users
  10. Zero-downtime deployments don’t mean everything stays up or that everything is immediately running the latest version. They simply mean users don’t notice a thing while all this is happening.
  11. Deployment steps (in transcript order): Run database migrations; Enable maintenance mode; Shut down services; Upgrade services; Start services; Disable maintenance mode; Wait for queued jobs to complete; Wait for services startup. Durations shown: 15-60 mins, 5-30 mins, 15 mins. User impact: limited functionality, downtime.
  12. Keeping the application up and running while applying database schema migrations
  13. Online database migration: Use expand/contract to split breaking changes; Application [N] must work with schema [N+1]; Decouple schema version from application version; No destructive operations to tables/columns in use; Ensure backward compatibility with non-breaking changes only; Detect changes likely to cause locking problems; Limit impact to live traffic.
  14. Expand/Contract example: Create new column; Write to both columns; Migrate historical records; Read from new column; Remove old column. Releases: N+1, N+2, N+3.
  15. More on schema migrations: Baron Schwartz, DevOps for the database (chapter: Loosening the Application/Database coupling); Michiel Rook, Database Schema Migrations with Zero Downtime e-continuous-lifecycle-london-2019
  16. Keeping the application up and running with rolling upgrades
  17. Full upgrade: instances 1 and 2 both go through Up [N], Drain, Stop, Upgrade, Start, Up [N+1] at the same time, causing feature downtime. Rolling upgrade: instances 1 and 2 go through the same steps one after the other, so the feature is continuously available.
  18. But what about rollbacks?
  19. Focusing on operability to confidently run upgrades while serving live traffic
  20. Entire deployment pipeline in source control: ✓ Consistent and repeatable deployments; ✓ No more manual operations; ✓ Any change is code-reviewed
  21. Observable deployments: ✓ Rolling-upgrade progress; ✓ Core healthchecks; ✓ Synthetic journey monitoring; ✓ Error rates & queue saturation
  22. Limiting risk throughout the transition. Deploy time: Sunday night, then Monday 8am, then on-demand. Live traffic: none, then limited, then full. Customer notice: planned 3h maintenance for upgrade (7 days email notice), then planned maintenance with no expected impact (in-app message), then none. Deployment ownership: System Operations, Dev. Timeline markers: Oct 2018, Jan 2019.
  23. Zero-downtime deployments don’t mean everything stays up or that everything is immediately running the latest version. They simply mean users don’t notice a thing while all this is happening. Thank you! @PierreVincent
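
The rolling upgrade from slide 17 can be sketched as a small in-memory simulation in Python. This is an illustration under stated assumptions, not the speaker's pipeline: the `Instance` class, the `rolling_upgrade` function and the `is_healthy` callback are hypothetical names. Using a healthcheck (one of the signals listed on slide 21) as a gate, and halting the rollout at the first failure, are assumed policies.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Instance:
    name: str
    version: str
    in_rotation: bool = True  # receiving live traffic
    running: bool = True


def rolling_upgrade(
    instances: List[Instance],
    new_version: str,
    is_healthy: Callable[[Instance], bool],
) -> bool:
    """Upgrade instances one at a time; return False if the rollout halts."""
    for inst in instances:
        # Refuse to drain the last instance that is serving traffic,
        # otherwise the feature would no longer be continuously available.
        others_serving = [
            i for i in instances
            if i is not inst and i.running and i.in_rotation
        ]
        if not others_serving:
            raise RuntimeError("no other instance available to serve traffic")

        inst.in_rotation = False    # Drain: stop sending new requests
        inst.running = False        # Stop
        inst.version = new_version  # Upgrade
        inst.running = True         # Start

        if not is_healthy(inst):
            # Assumed policy: keep the failing instance out of rotation and
            # halt, leaving the remaining instances on the old version.
            return False

        inst.in_rotation = True     # Up [N+1]: back in rotation
    return True


if __name__ == "__main__":
    fleet = [Instance("1", "N"), Instance("2", "N")]
    completed = rolling_upgrade(fleet, "N+1", is_healthy=lambda inst: True)
    print(completed, [(i.name, i.version, i.in_rotation) for i in fleet])
```

Because at most one instance is out of rotation at any time, a failed healthcheck leaves the rest of the fleet serving traffic on the previous version, which is what makes it possible to stop mid-upgrade without users noticing.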