
ServiceNow ITIL at Ludicrous Speeds - Rugged DevOps



  1. ITIL At Ludicrous Speeds: Rugged DevOps and More… Gene Kim (@RealGeneKim), Author, Visible Ops Handbook. Knowledge12, May 16, 2012.
  2. Where Did The High Performers Come From?
  3. Visible Ops: Playbook of High Performers
      • The IT Process Institute has been studying high-performing organizations since 1999
      • What is common to all the high performers?
      • What is different between them and average and low performers?
      • How did they become great?
      • Answers have been codified in the Visible Ops Methodology
  4. DevOps: Engage Ludicrous Speed!
  5. Source: John Allspaw
  6. Source: John Allspaw
  7. [no text on slide]
  8. Source: John Allspaw
  9. Source: John Allspaw
  10. Source: Theo Schlossnagle
  11. Source: Theo Schlossnagle
  12. Source: Theo Schlossnagle
  13. Source: John Jenkins
  14. Ludicrous Speed!
  15. Ludicrous Fail?!
  16. [no text on slide]
  17. Source: James Wickett
  18. Why DevOps Is So Important To Me
  19. Since 1999, We’ve Benchmarked 1500+ IT Organizations. Source: EMA (2009). Source: IT Process Institute (2008)
  20. High Performing IT Organizations
      • High performers maintain a posture of compliance
        • Fewest number of repeat audit findings
        • One-third amount of audit preparation effort
      • High performers find and fix security breaches faster
        • 5 times more likely to detect breaches by automated control
        • 5 times less likely to have breaches result in a loss event
      • When high performers implement changes…
        • 14 times more changes
        • One-half the change failure rate
        • One-quarter the first fix failure rate
        • 10x faster MTTR for Sev 1 outages
      • When high performers manage IT resources…
        • One-third the amount of unplanned work
        • 8 times more projects and IT services
        • 6 times more applications
      Source: IT Process Institute, 2008
  21. 2007: Three Controls Predict 60% Of Performance
      • To what extent does an organization define, monitor and enforce the following?
        • Standardized configuration strategy
        • Process discipline
        • Controlled access to production systems
      Source: IT Process Institute, 2008
  22. Tough Love From Ari Balogh
  23. The Downward Spiral
      Operations Sees…
      • Too many fragile and insecure applications in production
      • Too much time required to restore service
      • Too much firefighting and unplanned work
      • Planned project work cannot complete
      • Frustrated customers leave
      • Market share goes down
      • Business misses Wall Street commitments
      • Business makes even larger promises to Wall Street
      Dev Sees…
      • More urgent, date-driven projects put into the queue
      • Even more fragile code (less secure) put into production
      • More releases have increasingly “turbulent installs”
      • Release cycles lengthen to amortize “cost of deployments”
      • Bigger deployment failures
      • More time spent on firefighting
      • Ever increasing backlog of work that could help the business win
      • Ever increasing amount of tension between IT Ops, Development, Design…
      These aren’t ITSM or IT Operations problems… These are business problems!
  24. My Mission: Figure Out How To Break The IT Core Chronic Conflict
      • Every IT organization is pressured to simultaneously:
        • Respond more quickly to urgent business needs
        • Provide stable, secure and predictable IT service
      Words often used to describe process improvement: “hysterical, irrelevant, bureaucratic, bottleneck, difficult to understand, not aligned with the business, immature, shrill, perpetually focused on irrelevant technical minutiae…”
      Source: The authors acknowledge that Dr. Eliyahu Goldratt, creator of the Theory of Constraints and author of The Goal, has written extensively on the theory and practice of identifying and resolving core, chronic conflicts.
  25. Good News: It Can Be Done. Bad News: You Can’t Do It Alone
  26. Ops
  27. QA And Test. Source: Flickr: vandyll
  28. Development
  29. Process And Controls
  30. Product Management And Design. Source: Flickr: birdsandanchors
  31. DevOps: It’s A Real Movement
      • I would never do another startup that didn’t employ DevOps-like principles
      • It’s not just startups – it’s happening in the enterprise and in public sector, too
      • I believe working in DevOps environments will be a necessary skillset 5 years from now
      • Agile helped Dev regain trust with the business; DevOps will help all of IT
      • IT becoming more automated relies on DevOps practices (especially PaaS)
  32. If I Could Wave A Magic Wand, Everyone Will…
      • Become conversant with DevOps and recognize the practices when you see them
      • Be energized about how ITSM practitioners can contribute in this organizational journey
      • Leave with some concrete steps to get some great outcomes
      • Become a part of a team that starts putting DevOps practices into place
  33. How Do You Do DevOps?
  34. The Prescriptive DevOps Cookbook
      • “DevOps Cookbook” Authors
        • Patrick Debois, Mike Orzen, John Willis
      • Goals
        • Codify how to start and finish DevOps transformations
        • How do Development, IT Operations and Infosec become dependable partners?
        • Describe in detail how to replicate the transformations described in “When IT Fails: The Novel”
  35. “The Goal” by Dr. Eliyahu Goldratt
  36. [no text on slide]
  37. [no text on slide]
  38. The First Way: Systems Thinking
  39. The First Way: Systems Thinking (Business) (Customer)
  40. The First Way: Systems Thinking (Left To Right)
      • Don’t pass defects downstream
      • Don’t optimize locally
      • Always increase flow: elevate bottlenecks, reduce WIP, throttle release of work, reduce batch sizes
      • Understanding where reliance is placed
  41. Phase 1: Extend the Agile CI/CR Processes
      • Create one-step Dev, Test and Production environment creation procedure in Sprint 0
      • Create the one-step automated code deployment procedure
      • Properly integrate release, configuration and change into the value stream (as well as QA and infosec)
      • Ensure developers don’t leave until production change is successful
      • Assign Ops person into Dev team
  42. Definition: Kanban Board
      • Signaling tool to reduce WIP and increase flow
  43. The First Way: Systems Thinking: ITSM Insurgency
      • Have someone attend the daily Agile standups
      • Gain awareness of what the team is working on
      • Find the automated infrastructure project team (e.g., puppet, chef)
      • Release managers can provide hardening guidance
      • Integrate and extend their production configuration monitoring
      • Find where code packaging is performed
      • Integrate security testing pre- and post-deployment
      • Integrate testing into continuous integration and release process
      • Add security test scripts to automated test library
      • Define what changes/deploys cannot be made without triggering full retest
  44. The First Way: Outcomes
      • Determinism in the release process
      • Creating single repository for code and environments
      • Consistent Dev, QA, Int, and Staging environments, all properly built before deployment begins
      • Decreased cycle time
      • Reduce deployment times from 6 hours to 45 minutes
      • Refactor deployment process that had 1300+ steps spanning 4 weeks
      • Faster release cadence
  45. The Second Way: Amplify Feedback Loops
  46. The Second Way: Amplify Feedback Loops (Right to Left)
      • Expose visual data so everyone can see how their decisions affect the entire system
      • Get Development closer to Operations and customers
      • Create a reliable system of work that improves itself
  47. Phase 2: Extend Release Process And Create Right -> Left Feedback Loops
      • Embed Dev into Ops escalation process
      • Invite Dev to post-mortems/root cause analysis meetings
      • Have Dev cross-train IT Operations
      • Ensure application monitoring/metrics to aid in Ops and Infosec work (e.g., incident/problem management)
  48. The Second Way: Amplify Feedback Loops: ITSM Insurgency
      • Find areas in the incident and problem management processes where Development knowledge could help
      • Ensure that countermeasures are captured in the Agile backlog
      • Find that developer who really cares about the production environment
  49. The Second Way: Outcomes
      • Defects and security issues getting fixed faster than ever
      • Reusable Ops and Infosec user stories now part of the Agile process
      • All groups communicating and coordinating better
      • Everybody is getting more work done
  50. The Third Way: Culture Of Continual Experimentation And Learning
  51. The Third Way: Culture Of Continual Experimentation And Learning
      • Foster a culture that rewards:
        • Experimentation (taking risks) and learning from failure
        • Repetition is the prerequisite to mastery
      • Why?
        • You need a culture that keeps pushing into the danger zone
        • And have the habits that enable you to survive in the danger zone
  52. You Don’t Choose Chaos Monkey… Chaos Monkey Chooses You
  53. Phase 3: Organize Dev and Ops To Achieve Organizational Goals
      • Allocate 20% of Dev cycles to non-functional requirements
      • Integrate fault injection and resilience into design, development and production (e.g., Chaos Monkey)
  54. The Third Way: Culture Of Continual Experimentation And Learning: ITSM
      • Ensure that process improvement projects are in the Agile backlog
      • Make technical debt visible
      • Help prioritize work against features and other non-functional requirements
      • Release your Chaos Monkey
      • Rehearse cleaning up after the Chaos Monkey
      • Find processes that waste everyone’s time
  55. [no text on slide]
  56. The Third Way: Outcomes
      • Technical debt is being paid off
      • Exploitable attack surface area decreases
      • Continual reduction of unplanned work
      • More cycles for planned work
      • More resilient code and environments
      • Balancing nimbleness and practiced repetition
      • Enabling wider range of risk/reward balance
  57. What Does Transformation Feel Like?
  58. Find What’s Most Important First
  59. Quickly Find What Is Different…
  60. Before Something Bad Happens…
  61. Find Risk Early…
  62. Communicate It Effectively To Peers…
  63. Hold People Accountable…
  64. Based On Objective Evidence…
  65. Answer Important Questions…
  66. Recognize Compounding Technical Debt…
  67. That Gets Worse…
  68. And Fixing It… Source: Pingdom
  69. Have What We Need, When We Need It…
  70. Big Things Get Done Quickly…
  71. Ever Increasing Situational Mastery…
  72. Help The Business Win…
  73. With Support From Your Peers…
  74. And Do More With Less Effort…
  75. This Is An Important Problem
      Operations Sees…
      • Fragile applications are prone to failure
      • Long time required to figure out “which bit got flipped”
      • Detective control is a salesperson
      • Too much time required to restore service
      • Too much firefighting and unplanned work
      • Urgent security rework and remediation
      • Planned project work cannot complete
      • Frustrated customers leave
      • Market share goes down
      • Business misses Wall Street commitments
      • Business makes even larger promises to Wall Street
      Dev Sees…
      • More urgent, date-driven projects put into the queue
      • Even more fragile code (less secure) put into production
      • More releases have increasingly “turbulent installs”
      • Release cycles lengthen to amortize “cost of deployments”
      • Failing bigger deployments more difficult to diagnose
      • Most senior and constrained IT ops resources have less time to fix underlying process problems
      • Ever increasing backlog of work that could help the business win
      • Ever increasing amount of tension between IT Ops, Development, Design…
  76. [no text on slide]
  77. [no text on slide]
  78. If I Could Wave A Magic Wand, Everyone Will…
      • Become conversant with DevOps and recognize the practices when you see them
      • Be energized about how ITSM practitioners can contribute in this organizational journey
      • Leave with some concrete steps to get some great outcomes
      • Become a part of a team that starts putting DevOps practices into place
      …And fill out the survey forms!
  79. When IT Fails: The Novel and The DevOps Cookbook
      • Coming in July 2012
      • “In the tradition of the best MBA case studies, this book should be mandatory reading for business and IT graduates alike.” Paul Muller, VP Software Marketing, Hewlett-Packard
      • “The greatest IT management book of our generation.” Branden Williams, CTO Marketing, RSA
      Gene Kim, Tripwire founder, Visible Ops co-author
  80. When IT Fails: The Novel and The DevOps Cookbook
      • Our mission is to positively affect the lives of 1 million IT workers by 2017
      • If you would like the “Top 10 Things Infosec Needs To Know About DevOps,” sample chapters and updates on the book:
        • Sign up at
        • Email
        • Hand me a business card
      Gene Kim, Tripwire founder, Visible Ops co-author
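
Slide 42 defines a Kanban board as a signaling tool to reduce WIP and increase flow. The Python sketch below is one minimal way to picture that signal, assuming a three-column board with a per-column WIP limit. The class names, column names and limits are hypothetical and do not come from the talk. The board refuses to pull a card into a column that is already full, which is the cue to finish work before starting more.

```python
from dataclasses import dataclass, field
from typing import Dict, List


class WipLimitExceeded(Exception):
    """Raised when a move would push a column past its WIP limit."""


@dataclass
class Column:
    name: str
    wip_limit: int  # hypothetical limit; a real team would tune this
    cards: List[str] = field(default_factory=list)

    def has_capacity(self) -> bool:
        return len(self.cards) < self.wip_limit


@dataclass
class KanbanBoard:
    columns: Dict[str, Column]

    def add(self, card: str, column: str) -> None:
        """Place a new card into a column, honoring that column's limit."""
        target = self.columns[column]
        if not target.has_capacity():
            raise WipLimitExceeded(f"{column} is at its limit of {target.wip_limit}")
        target.cards.append(card)

    def pull(self, card: str, source: str, target: str) -> None:
        """Move a card downstream only if the target column has room."""
        src, dst = self.columns[source], self.columns[target]
        if card not in src.cards:
            raise KeyError(f"{card!r} is not in {source}")
        if not dst.has_capacity():
            raise WipLimitExceeded(f"{target} is at its limit of {dst.wip_limit}")
        src.cards.remove(card)
        dst.cards.append(card)


def make_board() -> KanbanBoard:
    """Build an example board; names and limits are illustrative assumptions."""
    return KanbanBoard(
        columns={
            "Ready": Column("Ready", wip_limit=10),
            "Doing": Column("Doing", wip_limit=2),
            "Done": Column("Done", wip_limit=1000),
        }
    )
```

With a limit of 2 on "Doing", a third pull raises WipLimitExceeded until one of the two cards in progress moves to "Done". That is the throttling of work release that slide 40 lists under increasing flow.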

Editor's Notes

  • There are many ways to react to this: like, fear, horror, trying to become invisible… All understandable, given the circumstances…
  • Tell story of Amazon, Netflix: they care about availability, security. It’s not a push, it’s a pull – they’re looking for our help (#1 concern: fear of disintermediation and being marginalized)
  • How each side actively impedes the achievement of each other’s goals.
  • Who are they auditing? IT operations. I love IT operations. Why? Because when the developers screw up, the only people who can save the day are the IT operations people. Memory leak? No problem, we’ll do hourly reboots until you figure that out. Who here is from IT operations? Bad day: Not as prepared for the audit as they thought. Spending 30% of their time scrambling, generating presentations for auditors. Or an outage, and the developer is adamant that they didn’t make the change – they’re saying, “it must be the security guys – they’re always causing outages”. Or there are 50 systems behind the load balancer, and six systems are acting funny – what’s different, and who made them different? Or every server is like a snowflake, each having its own personality. We as Tripwire practitioners can help them make sure changes are made visible, authorized, deployed completely and accurately, and find differences. Create and enforce a culture of change management and causality.
  • Who’s introducing variance? Well, it’s often these guys. Show me a developer who isn’t causing an outage, I’ll show you one who is on vacation. Primary measurement is deploying features quickly – get to market. I’ve worked with two of the five largest Internet companies (Google, Microsoft, Yahoo, AOL, Amazon), and I now believe that the biggest differentiator to great time to market is great operations. Bad day: We do 6 weeks of testing, but deployment still fails. Why? QA environment doesn’t match production. Or there’s a failure in testing, and no one can agree whether it’s a code failure or an environment failure. Or changes are made in QA, but no one wrote them down, so they didn’t get replicated downstream in production. Believe it or not, we as Tripwire practitioners can even help them – make sure environments are available when we need them, that they’re configured correctly the first time, document all the changes, replicate them downstream.
  • So who are all these constituencies that we can help, and increase our relevance as Tripwire practitioners and champions? How many people here are in infosec? Goal: protect critical systems and data. Safeguard organizational commitments. Prevent security breaches, help quickly detect and recover from them. Bad day: no security standards. No one is complying. Yes, we’re 3 years behind. “Whaddyagonna do about it?” Vs. we (Tripwire owner) can become more relevant and add value by helping infosec leverage all the configuration guidance out there. Measure variance between production and those known good states. Trust and verify that when management says we’ve trued up the configurations, they’ve actually done it. Why? Now, more than ever, there is an ever increasing number of regulatory and contractual requirements to protect systems and data.
  • [ text ] My personal goal is to prescriptively define 1) what does Dev need to do to become a reliable partner, 2) what does IT Operations need to do to become a reliable partner, and then 3) how do they work together to deliver unbelievable value to the business. Of course, the goal is more than happy coexistence. It’s to replicate the Etsy and LinkedIn stories: increase the rate of features that we can put into production, while simultaneously maintaining the reliability, stability, security and survivability of the production environment.
  • [ picture of messy data center ] Ten minutes into Bill’s first day on the job, he has to deal with a payroll run failure. Tomorrow is payday, and finance just found out that while all the salaried employees are going to get paid, none of the hourly factory employees will. All their records from the factory timekeeping systems were zeroed out. Was it a SAN failure? A database failure? An application failure? Interface failure? Cabling error?
  • How each side actively impedes the achievement of each other’s goals.
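
The notes above describe 50 systems behind a load balancer where six are acting funny, and the need to measure variance between production and known good states. The Python sketch below shows one simple way to find which hosts differ and how. It is not Tripwire's method. It assumes each host's configuration can be flattened into string key/value pairs and treats the most common configuration as the known good state. The host names, settings and values are hypothetical.

```python
import hashlib
import json
from collections import Counter
from typing import Dict, Optional, Tuple

Config = Dict[str, str]
Diff = Dict[str, Tuple[Optional[str], Optional[str]]]


def fingerprint(config: Config) -> str:
    """Hash a config in canonical (sorted-key) form so equal configs match."""
    canonical = json.dumps(config, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


def find_drift(fleet: Dict[str, Config]) -> Dict[str, Diff]:
    """Map each drifting host to {key: (baseline value, host value)}.

    Assumption: the most common configuration is the known good state.
    On a tie, the configuration seen first wins.
    """
    if not fleet:
        return {}
    prints = {host: fingerprint(cfg) for host, cfg in fleet.items()}
    baseline_print, _ = Counter(prints.values()).most_common(1)[0]
    baseline = next(cfg for host, cfg in fleet.items()
                    if prints[host] == baseline_print)

    drift: Dict[str, Diff] = {}
    for host, cfg in fleet.items():
        if prints[host] == baseline_print:
            continue  # host matches the baseline, nothing to report
        keys = sorted(set(baseline) | set(cfg))
        drift[host] = {k: (baseline.get(k), cfg.get(k))
                       for k in keys if baseline.get(k) != cfg.get(k)}
    return drift


def example_fleet() -> Dict[str, Config]:
    """Fifty hypothetical web hosts, six of which carry an older library."""
    good = {"openssl": "1.0.1", "max_conns": "512"}
    fleet = {f"web{n:02d}": dict(good) for n in range(1, 51)}
    for n in (3, 7, 19, 22, 31, 44):
        fleet[f"web{n:02d}"]["openssl"] = "0.9.8"
    return fleet
```

Calling find_drift(example_fleet()) reports only the six altered hosts, each with the one setting that differs. A majority baseline is a shortcut. An explicitly approved configuration is a safer reference when more than half the fleet may have drifted.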