2011 09 19 LSPE Dev Ops Cookbook 1a
Published in: Business, Technology

  • How each side actively impedes the achievement of the other’s goals.
  • http://www.flickr.com/photos/keenepubliclibrary/2435790649/
  • My personal goal is to prescriptively define 1) what Dev needs to do to become a reliable partner, 2) what IT Operations needs to do to become a reliable partner, and 3) how they work together to deliver unbelievable value to the business. Of course, the goal is more than happy coexistence. It’s to replicate the Etsy and LinkedIn stories: increase the rate of features that we can put into production, while simultaneously maintaining the reliability, stability, security and survivability of the production environment.
  • Since 1986, I’ve worked as a QA engineer writing filesystem QA tests, a system administrator and a developer, and in infosec, process design, operations research and audit. Incidentally, I almost moved to Seattle to be on the Microsoft NT network test team in 1991 (TCP/IP stack). For 13 years, I was the founder/CTO of Tripwire, but my primary passion is studying high performing IT operations and security organizations. When I met Chris 3 years ago, he helped me see clearly one of the primary obstacles for successful transformations. I’ll describe this later. First, let me talk about what I meant by “high performers” back in 1999.
  • Transcript

    • 1. The DevOps Cookbook: Codifying Kick-Ass Business Practices That Matter
      Gene Kim, CISA, TOCICO Jonah
      #lspe
      September 19, 2011
    • 2. Where Did The High Performers Come From?
    • 3. Higher Performing IT Organizations Are More Stable, Nimble, Compliant And Secure
      • High performers maintain a posture of compliance
        • Fewest number of repeat audit findings
        • One-third the amount of audit preparation effort
      • High performers find and fix security breaches faster
        • 5 times more likely to detect breaches by automated control
        • 5 times less likely to have breaches result in a loss event
      • When high performers implement changes…
        • 14 times more changes
        • One-half the change failure rate
        • One-quarter the first fix failure rate
        • 10x faster MTTR for Sev 1 outages
      • When high performers manage IT resources…
        • One-third the amount of unplanned work
        • 8 times more projects and IT services
        • 6 times more applications
      Source: IT Process Institute, 2008
    • 18. Common Traits of High Performers
      Culture of…
      Change management
      • Integration of IT operations/security via problem/change management
      • Processes that serve both organizational needs and business objectives
      • Highest rate of effective change
      Causality
      • Highest service levels (MTTR, MTBF)
      • Highest first fix rate (unneeded rework)
      Compliance and continual reduction of operational variance
      • Production configurations
      • Highest level of pre-production staffing
      • Effective pre-production controls
      • Effective pairing of preventive and detective controls
      Source: IT Process Institute
    • 25. Visible Ops: Playbook of High Performers
      The IT Process Institute has been studying high-performing organizations since 1999
      What is common to all the high performers?
      What is different between them and average and low performers?
      How did they become great?
      Answers have been codified in the Visible Ops Methodology
      The “Visible Ops Handbook” is now available from the ITPI
      www.ITPI.org
    • 26. 2007: Three Controls Predict 60% Of Performance
      To what extent does an organization define, monitor and enforce the following?
      Standardized configuration strategy
      Process discipline
      Controlled access to production systems
      Source: IT Process Institute, 2008
    • 27. The Darkest Moment In My Journey
    • 28. Tough Love From Ari Balogh
    • 29. Why Was I So Unsatisfied With The State Of IT Practice?
      IT operations work continued to be viewed as tactical
      Information security and compliance programs were sucking all the air out of the room (due to scoping problems)
      The activation energy for successful improvement programs was still too high
      IT operations issues were overshadowed by development
      Issues are amplified 10x in production: outages, findings, lawsuits
      Technical debt builds up over time
      IT operations is often the constraint in the organization
      Linkage of IT performance to business performance not obvious enough
      “Why doesn’t the business care? I found the pump handle!”
    • 30. Seeing The Bigger Problem
      Operations Sees…
      Fragile applications are prone to failure
      Long time required to figure out “which bit got flipped”
      Detective control is a salesperson
      Too much time required to restore service
      Too much firefighting and unplanned work
      Planned project work cannot complete
      Frustrated customers leave
      Market share goes down
      Business misses Wall Street commitments
      Business makes even larger promises to Wall Street
      Dev Sees…
      More urgent, date-driven projects put into the queue
      Even more fragile code put into production
      More releases have increasingly “turbulent installs”
      Release cycles lengthen to amortize “cost of deployments”
      Failing bigger deployments more difficult to diagnose
      Most senior and constrained IT ops resources have less time to fix underlying process problems
      Ever increasing backlog of infrastructure projects that could fix root cause and reduce costs
      Ever increasing amount of tension between IT Ops and Development
      These aren’t IT Operations problems…These are business problems!
    • 31. The Dreaded Disease
      IT Operations Constipatus (noun)
      Occurs when IT Operations creates fatal blockages in project flow. Creates blinding pain in Dev organization. Blockage worsens with chronic break/fix and security/compliance work, and when technical debt is never paid off. Causes host to lose energy, become unable to achieve organizational goals. Dangerous to CEOs.
      Photo credit: http://www.flickr.com/photos/keenepubliclibrary/2435790649/
    • 32. DevOps Can Break A Core Chronic Conflict In IT *
      Every IT organization is pressured to simultaneously:
      Respond more quickly to urgent business needs
      Provide stable, secure and predictable IT service
      Words often used to describe ITIL process owners: “hysterical, irrelevant, bureaucratic, bottleneck, difficult to understand, not aligned with the business, immature, shrill, perpetually focused on irrelevant technical minutiae…”
      Source: The authors acknowledge Dr. Eliyahu Goldratt, creator of the Theory of Constraints and author of The Goal, has written extensively on the theory and practice of identifying and resolving core, chronic conflicts.
    • 33. Framed This Way, Help Can Come From A Surprising Place
      The VP Application Development will often have the following complaints:
      IT Operations is the bottleneck
      We complete the code, but it takes too long for IT Operations to get the code into production
      Environments are never available when we need them
      Releases often cause chaos and disruption to all the other production services
      Turbulent installs have become the norm: 30 min installs take 3 days
      Due to slow OS upgrades, applications delayed by 2 quarters
      We are always late getting features to market
    • 34. A Reframed IT Operations Problem Statement
      Increase flow from Dev to Production
      Increase throughput
      Decrease WIP
      Our goal is to create a system of operations that allows
      Planned work to quickly move to production
      Service to be quickly restored when things go wrong
      How does this relate to Visible Ops?
      We focused much on “unplanned work”
      What’s happening to all the planned work?
      At any given time, what should IT Ops be working on?
      Now we are focusing on the flow of planned work
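The two levers on this slide (throughput and WIP) can be tied to release cycle time with Little's Law, a standard queueing result that the slides do not cite by name. The sketch below is only an illustration under that assumption; the function name and all numbers are hypothetical, not figures from the talk.

```python
def average_cycle_time(wip_items: float, throughput_per_week: float) -> float:
    """Little's Law: average cycle time = WIP / throughput.

    wip_items: average number of work items in progress (hypothetical unit).
    throughput_per_week: average number of items completed per week.
    Returns the average cycle time in weeks.
    """
    if throughput_per_week <= 0:
        raise ValueError("throughput must be positive")
    return wip_items / throughput_per_week


# Hypothetical numbers, chosen only to show the direction of each lever.
baseline = average_cycle_time(wip_items=40, throughput_per_week=5)
less_wip = average_cycle_time(wip_items=20, throughput_per_week=5)
less_wip_more_flow = average_cycle_time(wip_items=20, throughput_per_week=10)

print(baseline, less_wip, less_wip_more_flow)
```

Halving WIP or doubling throughput each halves the average cycle time in this model, which is why the slide treats both as ways to increase flow from Dev to Production.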
    • 35. What These Breakthroughs Look Like
    • 36. Goal #1: Decrease Cycle Time Of Releases
      Create determinism in the release process
      Move packaging responsibility to development
      Release early and often
      Decrease cycle time
      Reduce deployment times from 6 hours to 45 minutes
      Refactor deployment process that had 1300+ steps spanning 4 weeks
      Never again “fix forward,” instead “roll back,” escalating any deviation from plan to Dev
      Verify all handoffs (e.g., correctness, accuracy, timeliness)
      Ensure environments are properly built before deployment begins
      Control code and environments down the preproduction runways
      Hold Dev, QA, Int, and Staging owners accountable for integrity
    • 37. Goal #2: Increase Production Rigor
      Define what work is and where work can come from
      Protect the integrity of the work queue (e.g., are checks being written that won’t clear?)
      To preserve and increase throughput, elevate preventive projects and maintenance tasks
      Document all work, changes and outcomes so that it is repeatable
      Ops builds Agile standardized deployment stories, to be completed after Dev sprints are complete
      Maintain adequate situational awareness so that incidents can be quickly detected and corrected
      Standardize unplanned work and escalations
      Always seek to eradicate unplanned work and increase throughput
      Lean Principle: “Better -> Faster -> Cheaper”
    • 38. Some Principles
      Because operations is constrained, it is always better to prevent than recover
      Operations work must be planned
      We strive to have continual situational awareness
      We will strive to control as many dimensions of our work as possible
      We ruthlessly pursue an understanding of any deviation from normal
      We expect systems in operations to never stop working
      We never do one-offs (they must be exceptions, not the rule)
      We require determinism to enable resiliency
      We strive for the improvement and mastery of the environment
    • 39. Creating A System Of Operations
      Inj: 1. Projects: ensure rapid project releases from Development
      Inj: 1.1. Create an effective centralized work demand queue
      Inj: 1.2. Protect integrity of work queue (e.g., write only checks that will clear)
      Inj: 1.3. Release early and often: freeze projects if necessary, choking the release of materials to reduce WIP and allow longer runways of work
      Inj: 1.4. Elevate any deviations or incidents that stop flow of work
      Inj: 1.5. Standardize product deployments with Development
      Inj: 1.6. Continually seek ways to increase flow
      Inj: 2. Ensure reliable IT operations
      Inj: 2.1. When failures occur, detect and correct them quickly inside the plant (e.g., production)
      Inj: 2.2. Prevent failures (e.g., maintenance)
      Inj: 2.3. Study and create projects to reduce/eradicate unplanned work
      Inj: 2.4. Seek ways to increase production
      Inj: 3. Subordinate infosec/PMO/etc. to enable Inj 1 & 2
    • 40. The Prescriptive DevOps Cookbook
      Capture and codify how to start and finish successful DevOps transformations
      Create isomorphic mapping between plant floors and IT shops
      Co-authoring with Patrick Debois, Mike Orzen, John Willis
      Describe in detail how to replicate the transformations described in “When IT Fails: The Novel”
      Goals
      How do Development, IT Operations and Infosec become dependable partners
      How do they work together to solve business problems (and Infosec, too)
    • 41. Goal Statement
      Build a system of work where Dev and Ops can be relied upon so that they work together to simultaneously achieve:
      A fast flow of features into production
      Services in production that have the attributes of Rugged DevOps:
      Scalability, availability, survivability, sustainability, security, supportability, defensibility
    • 42. Underpinning Principles
      Agile: increase velocity
      Lean: reduce WIP
      Systems thinking: Dev, Test, IT Operations, Project Management, Information Security
      Lean: implementing effective countermeasures
    • 43. Cookbook Outline
      Part 1: Enable IT Operations to become a dependable partner
      Part 2: Enable Dev to become a dependable partner
      Part 3: Enable Dev and IT Operations to create breakthrough results
    • 44. Part 1: IT Ops
      Enable fast, repeatable and predictable flow of planned work
      Create single work queue, master list of commitments, master production schedule
      Create catalog of acceptable work: bill of materials, work centers, routings
      Runners, repeaters and strangers
      Create job release function
    • 45. Part 1: IT Ops
      Minimize disruption from unplanned work
      Standardize unplanned work: make it repeatable
      Modify first response: ensure constrained resources have all data at hand to diagnose
      Elevate preventive activities to reduce incidents
      Stories about reducing reliance on Brent
    • 46. Part 2: Dev
      Continuous deployment and integration in place
      Working through some assumptions about Agile methods in place
    • 47. Part 3: DevOps
      Pick a pilot project
      Baseline current performance
      Create organization
      Someone needs to see the end-to-end flow from Dev to Production to Incident
      Enable correct feedback loops
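As one way to make "baseline current performance" concrete, the sketch below computes two of the measures the earlier slides use to compare high performers: change failure rate and MTTR. It is a minimal illustration, not the authors' method; the `Change` and `Incident` record types and the sample data are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Change:
    """Hypothetical record of one production change."""
    deployed_at: datetime
    failed: bool


@dataclass
class Incident:
    """Hypothetical record of one outage."""
    started_at: datetime
    restored_at: datetime


def change_failure_rate(changes: list[Change]) -> float:
    """Fraction of changes that failed (0.0 when there are no changes)."""
    if not changes:
        return 0.0
    return sum(1 for c in changes if c.failed) / len(changes)


def mean_time_to_restore(incidents: list[Incident]) -> timedelta:
    """Average outage duration (zero when there are no incidents)."""
    if not incidents:
        return timedelta(0)
    total = sum((i.restored_at - i.started_at for i in incidents), timedelta(0))
    return total / len(incidents)


# Made-up pilot-project data, used only to exercise the functions.
day = datetime(2011, 9, 1)
changes = [Change(day, False), Change(day, True), Change(day, False), Change(day, False)]
incidents = [
    Incident(day, day + timedelta(hours=2)),
    Incident(day, day + timedelta(hours=4)),
]

print(change_failure_rate(changes))     # fraction of failed changes
print(mean_time_to_restore(incidents))  # average time to restore service
```

Recording the same two numbers before and after the pilot gives the team a simple comparison point for whether the new way of working is helping.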
    • 48. Part 3: DevOps
      Dev and Ops work together in Sprint 0 and 1 to create code and environments
      Create environment that Dev deploys into
      Create downstream environments: QA, Staging, Production
      Create the Agile information radiator
      Integrate infosec and QA into daily sprint activities
    • 49. Part 3: DevOps
      Embed Ops person into Dev structure
      Describes non-functional requirements, use cases and stories from Ops
      Has a vote like other team members
      Responsible for bringing Ops experiences into “quality at the source”
      Has special responsibility for pulling the Andon cord
    • 50. Part 3: DevOps
      Potentially decouple production releases from Sprint boundaries
      Issue: how to enable deployments that are more frequent than the typical 1 or 2 week intervals
      Sprints vs. Kanbans
    • 51. Part 3: DevOps
      Put Dev into Ops escalation chain
      MobBrowser case study: “Waking up developers at 3am is a great feedback loop: defects get fixed very quickly”
      Determine when separation of duties (SOD) is a control being relied upon
    • 52. The Prescriptive DevOps Cookbook
      • I am seeking fellow travelers who want to capture and codify the best known methods, patterns/anti-patterns, recipes and case studies of how to implement successful DevOps-style transformations.
    • Questions
      Are there areas that we’ve neglected to mention?
      What are the largest barriers to implementing what’s been covered?
      Do you have any tricks/tips/cookbooks you’d like to share?
    • 53.
    • 54. The Theory of Constraints Approach To Visible Ops
      Dr. Goldratt wrote The Goal in 1984, describing Alex’s challenge to fix his plant’s cost and due date issues within 90 days
      Some tenets that went against common wisdom:
      Every flow of work has a constraint/bottleneck
      Any improvement not made at the bottleneck is merely an illusion
      Fallacy of cost accounting as operational management tool
    • 55. When IT Fails: The Novel, Day 1
      Steve Masters, CEO
      Dick Landry, CFO
      Parts Unlimited: $4B revenue/year
    • 56. When IT Fails: The Novel, Day 2
      Bill Palmer, VP IT Operations (promoted)
      Wes Davis, Director, Distributed Systems
      Patty McKee, Director, IT Service Support Services
      The payroll outage
      All salaried employees will get paid, but not the hourlies
      CISO put in tokenization application in the factories, breaking database query that uses SSN
      IT Ops thought it was a SAN firmware upgrade failure
      All HR apps go down
      CFO is on front page of news, apologizing to community
    • 57. When IT Fails: The Novel, Day 4
      Chris Allers, VP Application Development
      Sarah Moulton, SVP Retail Products
      “We can deploy by next week by cutting some corners, but IT Ops is in the way… again…”
      “Bill, your team lacks a sense of urgency. We must go. We’ve already bought the newspaper ads – they’re bought, paid for and being printed…”
    • 58. When IT Fails: The Novel, Day 3
      Nancy Mailer, Chief Audit Executive
      John Pesche, CISO
      IT Operations has 980 IT general control deficiencies on critical financial systems, potentially dooming the financial statements to carry a footnote. Needs management response in 1 week.
      Bill grapples with who to put on the project. 1 yr of work, just to fix issues, even without Phoenix.
    • 59. The Goal For IT: Day 10
      The Deployment
      Database conversion, the point of no return, taking 1000x longer.
      In store POS won’t come up by Sat 8am, maybe by next Tuesday
      Emptying shopping cart shows last successful order credit card #
    • 60. Resources
      From the IT Process Institute www.itpi.org
      Both Visible Ops Handbooks
      ITPI IT Controls Performance Study
      “Lean IT” by Orzen and Bell
      Winner of the Shingo Prize 2011
      “Inspired: How To Create Products That Customers Love” by Cagan
      “Continuous Delivery: Reliable Software Releases through Build, Test, and Deployment Automation” by Humble, Farley
      Follow Gene Kim
      @RealGeneKim
      genek@realgenekim.me
      http://realgenekim.me/blog
    • 61. Call To Action
      If you’re interested in reviewing early versions of “When IT Fails: The Novel,” email me.
      If you’re interested in helping build or review the DevOps Cookbook, email me.
      I’m genek@realgenekim.me
      Thank you for allowing me to join your tribe!
    • 62.
    • 63. About Gene Kim
      I’ve spent the last 12 years studying high performing IT organizations, trying to understand:
      What do they have in common?
      What is present in successful transformations, absent in unsuccessful transformations?
      How do we lower the activation energy required to create the transformations?
      Founder and former CTO of Tripwire, Inc.
      Co-author of Visible Ops Handbook, Security Visible Ops Handbook
      Active researcher
      Co-founder of IT Process Institute
      Committee member of Institute of Internal Auditors
      Leader of PCI Security Standards Council Scoping SIG