Continuous Deployment



A lightning talk I gave to our development team

Published in: Technology, Business

  • This had the single biggest effect on releasing more frequently at Etsy
  • “Any developer can deploy their own code, whenever they want, to whatever fleet they want.” -Jon Jenkins, Director of Platform Analysis
  • 5.7 million members. 6.5 million items listed. 775 million page views per month, ~$200 million in sales

    1. Continuous Deployment
    2. Continuous Deployment is the practice of shipping your code as frequently as possible.
    3. Roots
        • Agile & Lean software development
        • Development teams work in iterations with incremental delivery
        • Operations has not traditionally managed to keep up with the demands of 'the business'
        • Cloud Computing, aka Utility Computing
    4. Secret Sauce
        • Operations is the new “Secret Sauce” of many successful businesses (2007 – Adam Jacob, co-founder Opscode)
        • What is in the Sauce?
        • “A collection of techniques to avoid sucking.” (@Kellanem – VP Engineering, Etsy)
        • I'll get to these techniques in a bit...
    5. Who does this?
        • You might have heard of...
            • IMVU
            • WordPress
            • Etsy
            • Flickr
            • Wealthfront
            • 37signals
            • Smart Frog
            • Prezi
            • Outbrain
            • Digg4
            • Heyo
            • Atlassian
            • Quora
            • Countless others...
    6. So you might be thinking this is just for small companies or startups...
    7. Oh, and also these guys: Google, Amazon, Facebook, LinkedIn
    8. So now the WHY?
        • Why would companies deploy to Production multiple times every day?
            • Smaller changes = less risk
            • More frequent changes = a tighter feedback loop for knowing when you've introduced a bug
            • Closer feedback from customers
            • Competitive advantage
    9. Stop!
        • That's crazy! How can you deploy to Production multiple times every day?!
        • What about release management, risk management, project management, service management, etc.?
        • What about QA?
        • What about all of our policies and best practices?
    10. Continuous Deployment != Deploying new features without coordination and planning
    11. What's Lean got to do with it?
        • Lean Principles:
            • Optimize the Whole
            • Build Quality In
            • Learn Constantly
            • Deliver Fast
            • Engage Everyone
            • Keep Getting Better
    12. Lean into it...
        • “Commit to frequent deploys and the tooling to support it”
    13. Step 1: Make Deployment Simple [Lean principles: Deliver Fast, Build Quality In, Optimize the Whole]
    14. Step 2: Developers Do Deployments [Lean principles: Deliver Fast, Optimize the Whole, Engage Everyone]
    15. Step 3: Measure Everything [Lean principles: Learn Constantly, Keep Getting Better]
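One way to start measuring everything is to emit a counter each time something interesting happens, such as a deploy. The sketch below is not from the talk: it assumes a StatsD-style collector (StatsD is one of the tools the deck lists as open-sourced by Etsy) listening on UDP, and it uses the plain-text StatsD counter format `<name>:<value>|c`. The metric name, host, and port are hypothetical.

```python
import socket


def format_counter(name: str, value: int = 1) -> bytes:
    """Encode a counter increment in the StatsD line format: <name>:<value>|c"""
    return f"{name}:{value}|c".encode("ascii")


def send_counter(name: str, value: int = 1,
                 host: str = "127.0.0.1", port: int = 8125) -> bytes:
    """Fire-and-forget a counter over UDP; host and port are assumptions."""
    payload = format_counter(name, value)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))
    return payload


if __name__ == "__main__":
    # Hypothetical metric name: count one production deploy.
    send_counter("deploys.production")
```

UDP keeps the instrumentation cheap: the application does not wait for the collector, so a missing or slow collector should not slow the code being measured.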
    16. Step 4: Automate Everything [Lean principles: Optimize the Whole, Build Quality In, Deliver Fast, Keep Getting Better]
    17. What does the workstream look like?
        • Old way: Write code -> Commit -> CI tests -> Manual deploy to test/load environment -> Manual load tests -> Manual regression tests -> Deploy to production -> Observe metrics
        • New way: Write code -> Commit -> CI tests -> Auto deploy to testing/load environment -> Automated tests -> Deploy to production -> Observe metrics
        • -LonelyPlanet
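The "new way" workstream above can be read as a chain of automated gates, where a failure at any stage stops the change before it reaches production. The following is a minimal sketch of that idea, not LonelyPlanet's actual tooling: the stage names mirror the slide, and the lambdas are hypothetical stand-ins for real CI and deploy steps.

```python
from typing import Callable, List, Tuple

# A stage is a name plus a step that returns True on success.
Stage = Tuple[str, Callable[[], bool]]


def run_pipeline(stages: List[Stage]) -> List[str]:
    """Run stages in order and stop at the first failure.

    Returns the names of the stages that passed.
    """
    passed = []
    for name, step in stages:
        if not step():
            break  # later stages, including the production deploy, never run
        passed.append(name)
    return passed


# Hypothetical stand-ins for the automated steps on the slide.
stages: List[Stage] = [
    ("ci_tests", lambda: True),
    ("auto_deploy_to_test_env", lambda: True),
    ("automated_tests", lambda: True),
    ("deploy_to_production", lambda: True),
]

if __name__ == "__main__":
    print(run_pipeline(stages))
```

Observing metrics is left out of the chain on purpose: it happens after the deploy and continues for as long as the code is live.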
    18. What about QA?
        • While some companies claim not to have QA, more often they have embedded QA.
        • Most companies have dedicated software development teams that write automated tests above the unit/component level.
            • Facebook: Test Engineering Team
        • IMVU calls this their “Immune System”
    19. What about Process?
        • “Once you add a process, it never goes away”
        • “Bad process is about fear”
    20. Metrics of CD in action:
        • Mean time between deployments: 11.6s
        • Max # of deploys in a single hour: 1079
        • Mean # of hosts simultaneously receiving a deployment: 10,000
        • Max # of hosts receiving a deployment: 30,000
    21. Metrics of this in action:
        • Etsy
            • 729 deploys to Production, Nov 2010
            • 6 change-related incidents in 2010 (3500 deploys)
            • Deployed a big new feature on Dec 1st, the 4th busiest day of the year, without FEAR
    22. Etsy cont.
        • CI System – 10-machine Jenkins cluster
        • Types of tests: unit, integration, smoke, functional
        • 7000 “trunk tests”
            • 30 minutes to execute sequentially, so they are not run that way
            • 11 minutes in parallel
        • Functional tests = expensive to write and maintain, so mission-critical parts of the site only
        • Deploy takes 20 minutes to reach Prod
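The trunk-test figures above (30 minutes sequentially, 11 minutes in parallel) work out to a speedup of roughly 2.7x. The sketch below shows the general idea of sharding a test suite across workers. It is an illustration only, not Etsy's Jenkins setup: the helper names are hypothetical, and it uses threads in one process where a real CI system would spread the shards across machines.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence

Test = Callable[[], bool]  # a test returns True when it passes


def shard(items: Sequence, n: int) -> List[list]:
    """Split items into n shards, round-robin, so shards stay similar in size."""
    return [list(items[i::n]) for i in range(n)]


def run_shard(tests: List[Test]) -> int:
    """Run one shard sequentially and return its number of failures."""
    return sum(0 if test() else 1 for test in tests)


def run_parallel(tests: Sequence[Test], workers: int = 4) -> int:
    """Run all shards concurrently and return the total number of failures."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(run_shard, shard(tests, workers)))


if __name__ == "__main__":
    suite = [lambda: True] * 10 + [lambda: False] * 2
    print(run_parallel(suite, workers=4))
```

Round-robin sharding only balances well when tests take similar time; with very uneven tests, a work queue that hands out one test at a time tends to finish sooner.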
    23. Even more Etsy
        • Engineers deploy to Prod on Day 1 to get over deployment fear
        • Use config flags for dark launches
        • “Core platform team”
            • Deployment tools
            • Front end and back end site performance
            • Large scale data migrations
            • Caching architecture
    24. More Etsy. Really?
        • Yes. They've now open-sourced
            • Logster
            • StatsD
            • Deployinator
        • Two books by John Allspaw:
            • Web Operations (Allspaw & Jesse Robbins)
            • The Art of Capacity Planning
        • Awesome engineering blog:
    25. Other techniques
        • Work in Trunk, Branch in Code
            • Set true by config, cookie, IP address, probability
            • Dark launch
            • A/B test
        • Schema changes -> only on Thursdays :)
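"Branch in code" usually means every developer commits to trunk and unfinished features sit behind a flag. Here is a minimal sketch of a flag that can be switched on by the four mechanisms the slide names (config, cookie, IP address, probability). The `Flag` class, its field names, and the hashing scheme are assumptions made for illustration, not Etsy's implementation.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Collection, Optional, Set


@dataclass
class Flag:
    enabled: bool = False          # config: on for everyone
    cookie: Optional[str] = None   # on when the request carries this cookie name
    ips: Set[str] = field(default_factory=set)  # on for these client addresses
    probability: float = 0.0       # fraction of users in the rollout, 0.0 to 1.0


def bucket(flag_name: str, user_id: str) -> float:
    """Map (flag, user) to a stable value in [0, 1) so a user keeps their bucket."""
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode()).hexdigest()
    return int(digest[:8], 16) / 0x100000000


def is_enabled(name: str, flag: Flag, user_id: str,
               cookies: Collection[str] = (), ip: str = "") -> bool:
    """Check config, then cookie, then IP address, then the probability rollout."""
    if flag.enabled:
        return True
    if flag.cookie is not None and flag.cookie in cookies:
        return True
    if ip in flag.ips:
        return True
    return bucket(name, user_id) < flag.probability


if __name__ == "__main__":
    # Hypothetical dark launch: staff opt in by cookie, plus 5% of users.
    new_checkout = Flag(cookie="staff_preview", probability=0.05)
    print(is_enabled("new_checkout", new_checkout, "user-42", cookies={"staff_preview"}))
```

Hashing the flag name together with the user ID keeps each user's experience stable between requests while giving different flags independent rollout groups, which is also what an A/B test needs.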
    26. Other techniques
        • Operability Review
            • What could possibly go wrong?
            • What is our communication strategy?
            • What is our contingency plan?
        • Blame-free post-mortems
    27. What to measure?
        • Mean Time Between Failures
        • Mean Time To Detect
        • Mean Time To Repair
        • Changes per week
        • Change success rate
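These measures can be computed from a simple incident log. The sketch below is one possible reading, with assumptions stated loudly because definitions vary between teams: time to detect runs from failure start to detection, time to repair runs from detection to resolution, and time between failures is the gap between consecutive failure starts. The `Incident` record and its fields are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List


@dataclass
class Incident:
    started: float   # when the failure began, in minutes on a shared clock
    detected: float  # when someone or something noticed it
    resolved: float  # when service was restored


def mttd(incidents: List[Incident]) -> float:
    """Mean Time To Detect: failure start to detection (assumed definition)."""
    return mean(i.detected - i.started for i in incidents)


def mttr(incidents: List[Incident]) -> float:
    """Mean Time To Repair: detection to resolution (assumed definition)."""
    return mean(i.resolved - i.detected for i in incidents)


def mtbf(incidents: List[Incident]) -> float:
    """Mean Time Between Failures: gap between consecutive failure starts.

    Needs at least two incidents; statistics.mean raises otherwise.
    """
    starts = sorted(i.started for i in incidents)
    return mean(later - earlier for earlier, later in zip(starts, starts[1:]))


def change_success_rate(changes: int, failed: int) -> float:
    """Fraction of changes that did not cause an incident."""
    return 1 - failed / changes


if __name__ == "__main__":
    log = [Incident(0, 5, 20), Incident(100, 110, 130)]
    print(mttd(log), mttr(log), mtbf(log))
    # Etsy figures quoted in the deck: 6 change-related incidents in 3500 deploys.
    print(change_success_rate(3500, 6))
```

With the Etsy figures quoted in the deck, the change success rate comes out at about 0.9983, so roughly 99.8% of deploys caused no incident.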
    28. Ugly (legacy) metrics
        • Change management overhead
        • Number of people to provision env
        • Number of people to deploy
        • Number of people to access Prod logs
        • Number of manual steps to deploy
        • Number of manual steps to provision env
        • Number of manual steps to access Prod logs
        • How long to do any of the above?
    29. The person who says it cannot be done should not interrupt the person doing it. --Chinese Proverb