From Code to the Monkeys: Continuous Delivery at Netflix


At Netflix, we continue to improve our continuous delivery process. We thrive in a hybrid environment where every developer is able to deploy code, and with that freedom comes the responsibility for ensuring that our customers are not negatively impacted. We have built open source tools that work toward a continuous delivery solution. In this presentation, from QConSF 2013, you will learn about our tool chain so that you can determine which tools make sense in your environment.

  • Overview. Build, Bake, and Deploy. Testing. Monkeys: resilient to behaviors inherent in the cloud. Leave with an understanding of the tools that we’ve built and open sourced. How you might be able to modify, augment or create.
  • Innovate quickly: think outside of the box, deploy solutions. Keep promise of availability. Encourage best practices: recommendations, not limitations.
  • Deploy to Production. Balance innovation with risk. Self-service is scalable. Don’t fix build configs, deploy. Gareth Bowles, Agile 2013.
  • Teams have unique flows. Let developers write code. Justin Ryan. Jenkins Job DSL plugin. Java Posse Roundup. Foundation for our build configurations at Netflix.
  • Amazon Machine Images (AMIs). Aminate: source component is combined with another component to make something new.
  • Base AMI: common to all of our microservices. Deploy same image to test, prod, all regions. Other cloud platforms. NetflixOSS logo.
  • Self-service. Groovy app. Red/Black. Go through example.
  • Don’t replace cluster. Spin up a new one. Canary/ACA. Find problem or continue. Cloud native. Use the cloud.
  • Scale up. Leave old cluster. Run through peak? Developer knows best.
  • Groovy library that sits on top of SWF. Clay McCoy. Start with a G. Allow flow to shine. Activity: element of reuse for our deployments. Builds on lessons from manual red/black deployments.
  • Stop now? Complete picture with runtime resiliency. Automate all the things.
  • Danger. Chaos ensues. Instances disappear. Latency happens. Litter. Find problems/build resiliency. Introduce a few. More ideas, need staff to build them! Look at vulnerabilities.
  • Should we push to everywhere at once?
  • Multiple regions. Errors sometimes make it to production. Limit impact. Cost: innovation and speed. Drift. Increase cognitive load.
  • Scheduled deployments. The button push signifies the scheduling, not necessarily the actual push. Providing visibility of what is deployed where, tied back to a Jenkins build, reduces that cognitive load.
  • We can do better. Look across regions: drift. Don’t nag. Use meaningful thresholds. Ask monkeys to help us test our runtime. Balance regional consistency with regional isolation.
  • Pay for only those instances we need. Don’t bother developers with what automation can do better.
  • Full circle. Code checkin to monkeys. Balanced priorities.
  • Can you use any of these elements? Share Cloud Infrastructure. Solve our business problems!
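
The red/black push described in the notes above (spin up a new cluster, run a canary check, move traffic, leave the old cluster in place) can be illustrated with a small in-memory model. This is only a sketch under stated assumptions: `Asg`, `Cluster`, `red_black_push` and the `canary_healthy` callback are hypothetical names invented for this example, not Asgard, Glisten or AWS APIs, and no cloud calls are made.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Asg:
    """Hypothetical stand-in for an auto scaling group (not an Asgard or AWS type)."""
    name: str
    version: str
    instances: int = 0
    traffic_enabled: bool = False


@dataclass
class Cluster:
    """Hypothetical container for the ASGs that make up one service."""
    asgs: List[Asg] = field(default_factory=list)


def red_black_push(cluster: Cluster, old: Asg, new_version: str,
                   canary_healthy: Callable[[Asg], bool]) -> Asg:
    """Bring up a new ASG beside the old one; switch traffic only if the canary check passes."""
    new = Asg(name=f"{old.name}-next", version=new_version)
    cluster.asgs.append(new)          # don't replace the cluster, spin up a new one
    new.instances = old.instances     # scale up new instances
    new.traffic_enabled = True        # turn on traffic to the new ASG

    if not canary_healthy(new):
        new.traffic_enabled = False   # problem found: the old ASG never stopped serving
        return old

    old.traffic_enabled = False       # turn off traffic to the old ASG
    return new                        # the old ASG stays around for a quick rollback


cluster = Cluster()
v1 = Asg("api-v001", "1.0", instances=3, traffic_enabled=True)
cluster.asgs.append(v1)
active = red_black_push(cluster, v1, "1.1", canary_healthy=lambda asg: True)
print(active.version, v1.traffic_enabled, len(cluster.asgs))  # prints: 1.1 False 2
```

Because the old ASG is never deleted or scaled down here, rolling back in this model is just a matter of turning its traffic back on.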

    1. BUILD, BAKE, DEPLOY. Continuous Delivery at Netflix: From Code to the Monkeys. Dianne Marsh, QCon San Francisco, November 2013
    3. Netflix Goals: High Availability, But … Move fast. Tools encourage Best Practices, But … Freedom to do the right thing
    4. Teams Deploy Their Own Code. Run What You Wrote: Rapid Innovation; Rapid Detection; Rapid Response. = Freedom + Responsibility
    5. BUILD. Jenkins Job DSL: Configuration as Code; Groovy Script; Scripts go in Version Control
    6. BUILD, BAKE
    7. BAKE. Aminator: Create AMI from Base AMI; Image contains service and everything needed to run it; Unit of Deployment for Test and Prod; Abstracts Cloud Details
    9. DEPLOY. Asgard: AWS Deployment Tool. Deploys Netflix to the Cloud. Red/Black push
    10. CANARY ANALYSIS. Test, Int, Prod. Choose where to deploy; Run canary analysis; Scale up new instances; Turn on traffic to new ASG; Turn off traffic to old ASG; Wait … analyze … continue
    11. Asgard Developer Portal
    12. GLISTEN. Extending Asgard’s Workflow: Automated Red/Black Push. Test, Int, Prod stacks. Run canary/analysis; Scale up new instances; Turn on traffic to new ASG; Run more tests; Turn off traffic to old ASG; Wait … analyze … continue
    14. Simian Army: Chaos Monkey; Latency Monkey; Janitor Monkey; Conformity Monkey; (and more!). Test resiliency at runtime
    15. One Button Deployment?
    16. Regional Isolation. Limit Impact of Human Error: Stagger deployments; Canary testing per region
    17. Multi-Region Consistency. Build Tooling to: Schedule Deployments; Prefer off peak; Choose next available region automatically; Provide high visibility per region
    18. Send in the Conformity Monkey. Have deployments diverged? Balance regional consistency with regional isolation; Provide meaningful thresholds; Build best practices into tooling and reporting
    19. Clean up with Janitor Monkey: Disassociate unused EIPs; Delete unassociated Amazon EBS volumes; Delete older Amazon EBS snapshots; Leverage Amazon S3 Object Expiration
    20. Key Elements for Netflix: Value Self-service; Test Everywhere; Build Awareness of Multiple Regions; Avoid peak times; Roll back quickly and easily; Be Cloud Native
    21. Put NetflixOSS to Work for You. Netflix Platform, AMINATOR ** And 30+ more projects at
    22. Keep the Conversation Going. Continuous Delivery Open Space, Ballroom B/C (here!), 1:35-2:25, immediately following lunch
    23. Thanks! We’re always hiring! Dianne Marsh (@dmarsh)
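
The multi-region slides above mention preferring off-peak deployments, choosing the next available region automatically, and checking whether deployments have diverged against a meaningful threshold. A minimal sketch of those two ideas follows. Everything in it is an assumption made for illustration: the function names, the peak window, the build numbers and the threshold are hypothetical and do not come from Netflix tooling or the Simian Army code.

```python
from typing import Dict, Optional, Tuple

# Hypothetical peak window in local hours (24h clock); a real one would come from traffic data.
PEAK_HOURS: Tuple[int, int] = (18, 23)


def is_off_peak(local_hour: int, peak: Tuple[int, int] = PEAK_HOURS) -> bool:
    """True when the local hour falls outside the inclusive peak window."""
    start, end = peak
    return not (start <= local_hour <= end)


def next_region(pending: Dict[str, int]) -> Optional[str]:
    """Pick the next region to deploy to.

    `pending` maps each region still awaiting the push to its current local hour.
    Returns the first off-peak region in name order, or None if all are at peak.
    """
    for region in sorted(pending):
        if is_off_peak(pending[region]):
            return region
    return None


def drifted_regions(builds: Dict[str, int], threshold: int = 2) -> Dict[str, int]:
    """Report regions whose deployed build number trails the newest by more than `threshold`."""
    if not builds:
        return {}
    newest = max(builds.values())
    return {region: newest - build
            for region, build in builds.items()
            if newest - build > threshold}


# Illustrative data only: region names are examples, hours and build numbers are made up.
print(next_region({"us-east-1": 20, "us-west-2": 17, "eu-west-1": 2}))           # prints: eu-west-1
print(drifted_regions({"us-east-1": 120, "us-west-2": 119, "eu-west-1": 115}))  # prints: {'eu-west-1': 5}
```

Using a threshold rather than flagging any difference reflects the "don't nag" point in the notes: a region that is one build behind during a staggered rollout is expected, while a larger gap is worth reporting.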