Why Everyone Needs DevOps Now: 15 Year Study Of High Performing Technology Orgs

29,738 views

This presentation describes my interpretation of the why and how of DevOps, along with the key findings from my 15-year study of high-performing IT organizations: how they simultaneously deliver stellar service levels and rapid implementation of new features into the production environment.

Organizations employing DevOps practices, such as Google, Amazon, Facebook, Etsy and Twitter, are routinely deploying code into production hundreds, or even thousands, of times per day, while providing world-class availability, reliability and security. In contrast, most organizations struggle to do releases more than once every nine months.

This talk presents how these high-performing organizations achieve this fast flow of work through Product Management and Development, through QA and Infosec, and into IT Operations. By adopting the same practices, other organizations can replicate the extraordinary culture and outcomes that enable them to win in the marketplace.

Published in: Business, Technology

  1. 1. @RealGeneKim Why Everyone Needs DevOps Now: My Fifteen Year Journey Studying High Performing IT Organizations Gene Kim
  2. 2. IT Operations @RealGeneKim
  3. 3. @RealGeneKim
  4. 4. The Product Managers @RealGeneKim
  5. 5. The Developers @RealGeneKim
  6. 6. @RealGeneKim
  7. 7. @RealGeneKim
  8. 8. @RealGeneKim IT Ops And Dev At War
  9. 9. @RealGeneKim
  10. 10. @RealGeneKim
  11. 11. @RealGeneKim The Downward Spiral…
  12. 12. There Is A Better Way… @RealGeneKim
  13. 13. @RealGeneKim Google, Amazon, Netflix, Spotify, Etsy, Twitter, Facebook…
  14. 14. @RealGeneKim 10 deploys per day Dev & ops cooperation at Flickr John Allspaw & Paul Hammond Velocity 2009 Source: John Allspaw (@allspaw) and Paul Hammond (@ph)
  15. 15. @RealGeneKim
  16. 16. Little bit weird Sits closer to the boss Thinks too hard Pulls levers & turns knobs Easily excited Yells a lot in emergencies Source: John Allspaw (@allspaw) and Paul Hammond (@ph)
  17. 17. Ops who think like devs Devs who think like ops @RealGeneKim Source: John Allspaw (@allspaw) and Paul Hammond (@ph)
  18. 18. @RealGeneKim Dev and Ops Source: John Allspaw (@allspaw) and Paul Hammond (@ph)
  19. 19. DevOps is incomplete, is interpreted wrong, and is too isolated Source: Theo Schlossnagle (@postwait) @RealGeneKim
  20. 20. @RealGeneKim .*Ops Source: Theo Schlossnagle (@postwait)
  21. 21. ^(?<dept>.+)Ops$ @RealGeneKim Source: Theo Schlossnagle (@postwait)
  22. 22. Source: John Jenkins, Amazon.com @RealGeneKim
  23. 23. @RealGeneKim Making Changes When It Matters Most “By installing a rampant innovation culture, we performed 165 experiments in the peak three months of tax season.” “Our business result? Conversion rate of the website is up 50 percent. Employee result? Everyone loves it, because now their ideas can make it to market.” –Scott Cook, Intuit Founder
  24. 24. @RealGeneKim Who Is Doing DevOps? • Google, Amazon, Netflix, Etsy, Spotify, Twitter, Facebook … • Dynatrace, CSC, IBM, CA, SAP, HP, Microsoft, Red Hat, … • GE Capital, Nationwide, BNP Paribas, BNY Mellon, World Bank, Paychex, Intuit … • The Gap, Nordstrom, Macy’s, Williams-Sonoma, Target … • General Motors, Raytheon, LEGO, Bosch … • UK Government, US Department of Homeland Security … • Kansas State University… Who else?
  25. 25. @RealGeneKim High Performers Are More Agile: 30x more frequent deployments; 8,000x faster lead times than their peers Source: Puppet Labs 2013 State Of DevOps: http://puppetlabs.com/2013-state-of-devops-infographic
  26. 26. @RealGeneKim High Performers Are More Reliable: 2x the change success rate; 12x faster mean time to recover (MTTR) Source: Puppet Labs 2013 State Of DevOps: http://puppetlabs.com/2013-state-of-devops-infographic
  27. 27. @RealGeneKim High Performers Win In The Marketplace: 2x more likely to exceed profitability, market share & productivity goals; 50% higher market capitalization growth over 3 years* Source: Puppet Labs 2014 State Of DevOps
  28. 28. @RealGeneKim Source: Darren Hague (@dhague)
  29. 29. “This book will have a profound effect on IT, just as The Goal did for manufacturing.” –Jez Humble, co-author Continuous Delivery “This is the IT swamp draining manual for anyone who is neck deep in alligators.” –Adrian Cockcroft, Cloud Architect at Netflix “This is The Goal for our decade, and is for any IT professional who wants their life back.” –Charles Betz, IT architect, author “Architecture and Patterns for IT” @RealGeneKim
  30. 30. @RealGeneKim The First Way: Flow
  31. 31. @RealGeneKim “deploys per day” vs. “lead time”
  32. 32. @RealGeneKim “What is your lead time for changes?” “How long does it take to go from code committed to code successfully running in production?”
  33. 33. IT’S A TRAP
  34. 34. @RealGeneKim
  35. 35. @RealGeneKim Create One Step Environment Creation Process • Make environments available early in the Development process • Make sure Dev builds the code and environment at the same time • Create a common Dev, QA and Production environment creation process
  36. 36. @RealGeneKim If I had a magic wand, I’d change the Agile sprints and definition of “done”: “At the end of each sprint, we must have working and shippable code… demonstrated in an environment that resembles production.”
  37. 37. Deploy Smaller Changes, More Frequently * @RealGeneKim Source: http://www.facebook.com/note.php?note_id=14218138919
  38. 38. @RealGeneKim Deploy Smaller Changes, More Frequently * • Decouple feature releases from code deployments • Deploy features in a disabled state, using feature flags • Require all developers check code into trunk daily (at least) • Practice deploying smaller changes, which dramatically reduces risk and improves MTTR
  39. 39. @RealGeneKim Experiment: Reducing Batch Size By 50% And the customer got the feature in half the time! Source: Scott Prugh, Chief Architect, CSG, Inc.
  40. 40. @RealGeneKim “As a lifelong Ops practitioner, I know we need DevOps to make our work humane. In the past, I’ve worked every holiday, on my birthday, my spouse’s birthday, and even on the day my son was born.” Nathan Shimek Engineering Manager, New Context @nathan_shimek
  41. 41. @RealGeneKim Breaking The Bottlenecks In The Flow • Environment creation • Code deployment • Test setup and run (mention @rohansingh) • Overly tight architecture • Development • Product management
  42. 42. @RealGeneKim “In November 2011, running even the most minimal test for CloudFoundry required deploying to 45 virtual machines, which took a half hour. This was way too long, and also prevented developers from testing on their own workstations. By using containers, within months, we got it down to 18 virtual machines so that any developer can deploy the entire system to single VM in six minutes.” — Elisabeth Hendrickson, Director of Quality Engineering, Pivotal Labs @testobsessed
  43. 43. @RealGeneKim The Problem: Blackboard Learn, 2005-Present [chart: LoC and commits] Source: David Ashman, Chief Architect, Blackboard, Inc. (@davidbashman)
  44. 44. @RealGeneKim Blackboard Learn Building Blocks Source: David Ashman, Chief Architect, Blackboard, Inc. (@davidbashman)
  45. 45. @RealGeneKim Top Predictors Of IT Performance (2014) • Version control of all production artifacts • Continuous integration and deployment • Automated acceptance testing • Peer-review of production changes (vs. external change approval) • High trust culture • Proactive monitoring of the production environment • Win-win relationship between Dev and Ops Source: Puppet Labs 2014 State Of DevOps
  46. 46. @RealGeneKim The First Way: Outcomes • Creating single repository for code and environments • Determinism in the release process • Consistent Dev, Test and Production environments, all properly built before deployment begins • Features being deployed daily without catastrophic failures • Decreased lead time • Faster cycle time and release cadence
  47. 47. @RealGeneKim The Second Way: Feedback
  48. 48. @RealGeneKim
  49. 49. @RealGeneKim How many times per day is the andon cord pulled in a typical day at a Toyota manufacturing plant? 3,500 times per day Source: http://www.gembapantarei.com/2008/04/how_many_times_do_you_pull_the_andon_cord_each_day.html
  50. 50. Why would Toyota do something so disruptive as stopping production thousands of times per day? @RealGeneKim “It’s the only way we can build 2,000 vehicles per day – that’s one completed vehicle every 55 seconds.”
  51. 51. @RealGeneKim Google Dev And Ops (2013) • 15,000 engineers, working on 4,000+ projects • All code is checked into one source tree (billions of files!) • 5,500 code commits/day • 75 million test cases are run daily "Automated tests transform fear into boredom." -- Eran Messeri, Google
  52. 52. @RealGeneKim Developers Carry Pagers “We found that when we woke up developers at 2am, defects got fixed faster than ever” – Patrick Lightbody, CEO, BrowserMob “You build it, you run it.” – Werner Vogels CTO, Amazon
  53. 53. @RealGeneKim Developers Carry Pagers “As a developer, there has never been a more satisfying point in my career than when I wrote the code, I pushed the button to deploy it, I watched the metrics to see if it actually worked in production, and fixed it if it broke.” – Tim Tischler Director of Operations Engr, Nike, Inc.
  54. 54. @RealGeneKim Devs Initially Self-Manage Their Own Code Source: Tom Limoncelli (@yesthattom)
  55. 55. @RealGeneKim Return Fragile Services Back To Dev Source: Tom Limoncelli (@yesthattom)
  56. 56. @RealGeneKim Pervasive Production Telemetry “Having a developer add a monitoring metric shouldn’t feel like a schema change.” – John Allspaw, SVP Tech Ops, Etsy
  57. 57. @RealGeneKim
  58. 58. @RealGeneKim People actually look at the logs! (Mention Verizon PCI Data Breach Study)
  59. 59. @RealGeneKim
  60. 60. @RealGeneKim One Of The Highest Predictors Of Performance
  61. 61. @RealGeneKim One Of The Highest Predictors Of Performance
  62. 62. @RealGeneKim Top Predictors Of IT Performance (2014) • Version control of all production artifacts • Continuous integration and deployment • Automated acceptance testing • Peer-review of production changes (vs. external change approval) • High trust culture • Proactive monitoring of the production environment • Win-win relationship between Dev and Ops Source: Puppet Labs 2014 State Of DevOps
  63. 63. @RealGeneKim The Second Way: Outcomes • Defects and security issues getting fixed faster than ever • Disciplined automated testing enabling many simultaneous small, agile teams to work productively • All groups communicating and coordinating better • Everybody is getting more work done
  64. 64. The Third Way: Continual Experimentation And Learning @RealGeneKim
  65. 65. @RealGeneKim Break Things Early And Often “Do painful things more frequently, so you can make it less painful… We don’t get pushback from Dev, because they know it makes rollouts smoother.” – Adrian Cockcroft, Former Architect, Netflix (Now Technology Fellow, Battery Ventures)
  66. 66. @RealGeneKim
  67. 67. @RealGeneKim Inject Failures Often
  68. 68. @RealGeneKim You Don’t Choose Chaos Monkey… Chaos Monkey Chooses You
  69. 69. @RealGeneKim The 2014 AWS Reboot “When we got the news about the emergency EC2 reboots, our jaws dropped. When we got the list of how many Cassandra nodes would be affected, I felt ill. “Then I remembered all the Chaos Monkey exercises we’ve gone through. My reaction was, ‘Bring it on!’” – Christos Kalantzis Netflix Cloud DB Engineering Source: http://techblog.netflix.com/2014/10/a-state-of-xen-chaos-monkey-cassandra.html
  70. 70. @RealGeneKim The 2014 AWS Reboot “Out of our 2700+ production Cassandra nodes, 218 were rebooted. 22 Cassandra nodes did not reboot successfully. “Netflix customers experienced no downtime that weekend.” – Bruce Wong Netflix Chaos Engineering
  71. 71. @RealGeneKim Allocate 20% Of Cycles To Technical Debt Reduction
  72. 72. @RealGeneKim “By November 2011, Kevin Scott, LinkedIn’s top engineer, had had enough. The system was taxed as LinkedIn attracted more users, and engineers were burnt out. “To fix the problems, Scott, who’d arrived from Google that February, launched Operation InVersion. “He froze development on new features so engineers could overhaul the computing architecture. “‘We had to tell management we’re not going to deliver anything new while all of engineering works on this project for the next two months,’ Scott says. ‘It was a scary thing.’”
  73. 73. @RealGeneKim
  74. 74. @RealGeneKim
  75. 75. Source: Pingdom
  76. 76. @RealGeneKim Why Do I Think This Is Important?
  77. 77. @RealGeneKim The Downward Spiral…
  78. 78. @RealGeneKim
  79. 79. @RealGeneKim Opportunity Cost Of Wasted IT Spending? $2,600,000,000,000.00 per year ($2.6 Trillion US)
  80. 80. @RealGeneKim Our Mission Positively influence the lives of one million IT professionals by 2017.
  81. 81. @RealGeneKim DevOps Enterprise: Lessons Learned • On Oct 21-23, we held the DevOps Enterprise Summit, a conference for horses, by horses • Macy’s, Disney, GE Capital, Blackboard, Telstra, US Department of Homeland Security, CSG, Raytheon, Ticketmaster, Union Bank of California • Leaders driving DevOps transformations talked about • The business problem they set out to solve • The obstacles they had to overcome • The business value they created
  82. 82. @RealGeneKim Want More Learn More? To receive the following: • A copy of this presentation • A free 140 page excerpt of The Phoenix Project • Information on the DevOps Enterprise: Lessons Learned • My recommended reading list for enterprise DevOps adoption • See early drafts of our upcoming DevOps Cookbook Just pick up your phone, and send an email: To: realgenekim@SendYourSlides.com Subject: lisa
  83. 83. @RealGeneKim Can Large Orgs Be High Performers? Yes. But orgs with 10,000+ employees are 40% less likely to be high performing vs. 500 employee orgs… Source: Puppet Labs 2014 State Of DevOps
  84. 84. @RealGeneKim Other Side Of Innovation
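
Slides 31 and 32 contrast "deploys per day" with "lead time", where lead time is how long it takes to go from code committed to code successfully running in production. The two measures can be sketched in Python as below. This is only an illustration: the `CHANGES` records, the timestamps, and the function names are hypothetical and do not come from the talk or from any particular tool.

```python
from datetime import datetime, timedelta
from statistics import median

# Hypothetical change records (assumption, not data from the talk):
# (time the code was committed, time it was successfully running in production)
CHANGES = [
    (datetime(2014, 11, 3, 9, 0), datetime(2014, 11, 3, 15, 0)),
    (datetime(2014, 11, 3, 11, 0), datetime(2014, 11, 4, 11, 0)),
    (datetime(2014, 11, 4, 10, 0), datetime(2014, 11, 4, 12, 0)),
    (datetime(2014, 11, 5, 8, 0), datetime(2014, 11, 5, 9, 0)),
]


def lead_times(changes):
    """Lead time per change: commit to running in production."""
    return [deployed - committed for committed, deployed in changes]


def median_lead_time(changes):
    """Median lead time; less sensitive to one slow change than the mean."""
    return median(lead_times(changes))


def deploys_per_day(changes):
    """Deployments divided by the number of calendar days they span."""
    days = {deployed.date() for _, deployed in changes}
    span_in_days = (max(days) - min(days)).days + 1
    return len(changes) / span_in_days


if __name__ == "__main__":
    print("median lead time:", median_lead_time(CHANGES))
    print("deploys per day:", round(deploys_per_day(CHANGES), 2))
```

The median is used here as one reasonable summary; a team could just as well track a percentile or the full distribution. The point of the contrast on the slides is that deployment frequency says how often changes land, while lead time says how long each change waits.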
