24. Effective pairing of preventive and detective controls Source: IT Process Institute
25. Visible Ops: Playbook of High Performers The IT Process Institute has been studying high-performing organizations since 1999 What is common to all the high performers? What is different between them and average and low performers? How did they become great? Answers have been codified in the Visible Ops Methodology The “Visible Ops Handbook” is available from the ITPI www.ITPI.org
26. 2007: Three Controls Predict 60% Of Performance To what extent does an organization define, monitor and enforce the following? Standardized configuration strategy Process discipline Controlled access to production systems Source: IT Process Institute, 2008
30. More Platitudes “Speak in the language of the business” “Help foster the right tone at the top” “Build a genuine relationship with your fellow stakeholders” “Be savvy and take advantage of compelling events” “Create real security programs, so that compliance will be free” “Because security is everyone's responsibility” “Don't let the auditors create your compliance program for you” “Assess, plan, design, execute, monitor” “Build the right control environment, and security and compliance will come”
32. Why Was I So Unsatisfied With The State Of IT Practice? IT operations work continued to be viewed as tactical Information security and compliance programs were sucking all the air out of the room (due to scoping problems) The activation energy for successful improvement programs was still too high IT operations and Information Security issues were overshadowed by development Issues are amplified 10x in production: outages, findings, lawsuits Technical debt builds up over time IT operations is often the constraint in the organization Linkage of IT performance to business performance not obvious enough “Why doesn’t the business care? I found the pump handle!”
33. Seeing A Bigger Problem Operations Sees… Fragile applications are prone to failure Long time required to figure out “which bit got flipped” Detective control is a salesperson Too much time required to restore service Too much firefighting and unplanned work Urgent security rework and remediation Planned project work cannot complete Frustrated customers leave Market share goes down Business misses Wall Street commitments Business makes even larger promises to Wall Street Dev Sees… More urgent, date-driven projects put into the queue Even more fragile code (less secure) put into production More releases have increasingly “turbulent installs” Release cycles lengthen to amortize “cost of deployments” Failing bigger deployments more difficult to diagnose Most senior and constrained IT ops resources have less time to fix underlying process problems Ever increasing backlog of infrastructure and security projects that could fix root cause and reduce costs Ever increasing amount of tension between IT Ops and Development These aren’t IT Operations or Infosec problems… These are business problems!
34. Infosec Can Break A Core Chronic Conflict In IT * Every IT organization is pressured to simultaneously: Respond more quickly to urgent business needs Provide stable, secure and predictable IT service Words often used to describe ITIL process owners: “hysterical, irrelevant, bureaucratic, bottleneck, difficult to understand, not aligned with the business, immature, shrill, perpetually focused on irrelevant technical minutiae…” Source: The authors acknowledge Dr. Eliyahu Goldratt, creator of the Theory of Constraints and author of The Goal, has written extensively on the theory and practice of identifying and resolving core, chronic conflicts.
35. Framed This Way, Help Can Come From A Surprising Place The VP Application Development will often have the following complaints: IT Operations is the bottleneck We complete the code, but it takes too long for IT Operations to get the code into production Environments are never available when we need them Security changes break the production environment on rollout Releases often cause chaos and disruption to all the other production services Turbulent installs have become the norm: 30 min installs take 3 days Due to slow OS upgrades, applications delayed by 2 quarters We are always late getting features to market
37. A Reframed IT Operations Problem Statement Increase flow from Dev to Production Increase throughput Decrease WIP Our goal is to create a system of operations that allows Planned work to quickly move to production Ensure service is quickly restored when things go wrong Information security built in every stage of Development, Project Management, and IT Operations How does this relate to Visible Ops? We focused much on “unplanned work” What’s happening to all the planned work? At any given time, what should IT Ops be working on? Now we are focusing on the flow of planned work
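The pairing of "increase throughput" and "decrease WIP" on this slide can be read through Little's Law, a standard queueing result that the deck itself does not cite: average cycle time equals average work in progress divided by average throughput. The sketch below is illustrative only; the function name and all numbers are assumptions, not figures from the presentation.

```python
def average_cycle_time(wip_items: float, throughput_per_week: float) -> float:
    """Little's Law: average weeks a work item spends in the system.

    wip_items           -- average number of items in progress (assumed input)
    throughput_per_week -- average items completed per week (assumed input)
    """
    if throughput_per_week <= 0:
        raise ValueError("throughput must be positive")
    return wip_items / throughput_per_week


# Hypothetical figures, chosen only to show the direction of each effect.
baseline = average_cycle_time(wip_items=40, throughput_per_week=5)
less_wip = average_cycle_time(wip_items=20, throughput_per_week=5)
more_flow = average_cycle_time(wip_items=40, throughput_per_week=10)

print(baseline, less_wip, more_flow)
```

Under these assumed numbers, halving WIP or doubling throughput each halves the average cycle time, which is why the slide lists both levers side by side.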
38. Goal #1: Decrease Cycle Time Of Releases Create determinism in the release process Move packaging responsibility to development Release early and often Decrease cycle time Reduce deployment times from 6 hours to 45 minutes Refactor deployment process that had 1300+ steps spanning 4 weeks Never again “fix forward,” instead “roll back,” escalating any deviation from plan to Dev Verify for all handoffs (e.g., correctness, accuracy, timeliness, etc…) Ensure environments are properly built before deployment begins Control code and environments down the preproduction runways Hold Dev, QA, Int, and Staging owners accountable for integrity
39. Goal #2: Increase Production Rigor Define what work is and where work can come from Protect the integrity of the work queue (e.g., are checks being written that won’t clear?) To preserve and increase throughput, elevate preventive projects and maintenance tasks Document all work, changes and outcomes so that it is repeatable Ops builds Agile standardized deployment stories, to be completed after Dev sprints are complete Maintain adequate situational awareness so that incidents can be quickly detected and corrected Standardize unplanned work and escalations Always seek to eradicate unplanned work and increase throughput Lean Principle: “Better -> Faster -> Cheaper”
41. The Prescriptive DevOps Cookbook Capture and codify how to start and finish successful DevOps transformations Create isomorphic mapping between plant floors and IT shops Co-authoring with Patrick Debois, Mike Orzen, John Willis Describe in detail how to replicate the transformations described in “When IT Fails: The Novel” Goals How do Development, IT Operations and Infosec become dependable partners? How do they work together to solve business problems (and Infosec, too)?
42. By The Visible Ops Team: Gene Kim, Kevin Behr, George Spafford
43. The Theory of Constraints Approach To Visible Ops Dr. Goldratt wrote The Goal in 1984, describing Alex’s challenge to fix his plant’s cost and due date issues within 90 days Some tenets that went against common wisdom: Every flow of work has a constraint/bottleneck Any improvement not made at the bottleneck is merely an illusion Fallacy of cost accounting as operational management tool
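The tenet that any improvement away from the bottleneck is an illusion can be shown with a toy model. This is a minimal sketch assuming a strictly serial flow in which each stage has a fixed weekly capacity; the stage names and capacities are hypothetical and do not come from the deck or from The Goal.

```python
from typing import Mapping


def line_throughput(stage_capacity: Mapping[str, float]) -> float:
    """Throughput of a serial flow is capped by its slowest stage."""
    return min(stage_capacity.values())


def bottleneck(stage_capacity: Mapping[str, float]) -> str:
    """Name of the stage with the lowest capacity (the constraint)."""
    return min(stage_capacity, key=lambda name: stage_capacity[name])


# Hypothetical capacities in releases per week.
stages = {"dev": 10.0, "qa": 8.0, "it_ops": 3.0}

# Doubling Dev capacity, which is not the constraint, changes nothing.
faster_dev = {**stages, "dev": 20.0}

# Doubling capacity at the constraint raises throughput of the whole flow.
faster_ops = {**stages, "it_ops": 6.0}

print(bottleneck(stages), line_throughput(stages))
print(line_throughput(faster_dev), line_throughput(faster_ops))
```

In this model the Dev improvement leaves end-to-end throughput unchanged, while the same effort spent at the constraint doubles it. Real flows have variability and rework that the sketch ignores.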
44. When IT Fails: The Novel Day 1 Steve Masters, CEO Bill Palmer, VP IT Operations Parts Unlimited $4B revenue/year “We’re not Google. IT isn’t a core competency”
45. Bill’s First Month On The Job Day 1: CEO loses chairmanship of the company, due to inability to deliver critical project that will “close the gap” Day 2: The payroll outage, due to a tokenization rollout Day 3: VP IT Operations thrown under the bus by Marketing and Development: deployment in 9 days Day 4: 900 IT general control deficiencies in SOX-404 audit Day 12: The launch…
48. John Pesche, CISO CISO for 12 years 39 years old Aggressive career climber Ex-Big Four auditor
54. Assumption #1 Infosec wins based on how much work it can put into the IT system How much budget can we get? How many of the vulnerabilities can we get closed? How much can we push line managers and workers to close security holes?
55. Breakthrough #1 He realizes that excess control complexity continually adds entropy to the rest of the system He becomes the “forebrain of the organization:” what data really needs to be protected, where controls reliance really resides, and where you don't have sole reliance on a technical control Shrinks the scope of the SOX-404 and PCI audit, doubling the capacity of the IT Operations organization
57. Breakthrough #2 He shifts his focus from the work center level, to the plant line level He realizes that he can help design a control environment and build a system of work where Dev and Ops can be relied upon so that they work together to simultaneously achieve: fast flow of features into production deliver services in production that are: Attributes of Rugged DevOps Scalability, availability, survivability, sustainability, security, supportability, defensibility
58. Assumption #3 Cycles for Infosec come at the expense of Development and IT Operations
59. Breakthrough #3 IT Operations constraint capacity quadruples Development release rate goes from quarterly to three times daily 10% of all Dev and Ops cycles go to security requirements Security mean time to Find/Fix goes from quarters to days to hours
77. Interested? If you’re interested in When IT Fails: The Novel, sign up for the newsletter at http://whenitfails.org Or: # mail genek@realgenekim.me Subject: novel # mail genek@realgenekim.me Subject: cookbook
78. Resources From the IT Process Institute www.itpi.org Both Visible Ops Handbooks ITPI IT Controls Performance Study “Lean IT” by Orzen and Bell Winner of the Shingo Prize 2011 Rugged Software by Corman, et al: http://ruggedsoftware.org “Continuous Delivery: Reliable Software Releases through Build, Test, and Deployment Automation” by Humble, Farley Follow Gene Kim @RealGeneKim mailto:genek@realgenekim.me http://realgenekim.me/blog
Editor's Notes
How each side actively impedes the achievement of the other’s goals.
[ text ] My personal goal is to prescriptively define 1) what does Dev need to do to become a reliable partner, 2) what does IT Operations need to do to become a reliable partner, and then 3) how do they work together to deliver unbelievable value to the business. Of course, the goal is more than happy coexistence. It’s to replicate the Etsy and LinkedIn stories: increase the rate of features that we can put into production, while simultaneously maintaining the reliability, stability, security and survivability of the production environment.
[ picture of When IT Fails ] But how do we make this an issue that CEOs actually care about, instead of strictly a grass-roots movement? For five years, I’ve been working on a book called “When IT Fails: The Novel,” which I think can help. The goal of the book is to help bridge the dysfunctional marriage that often exists between the CIO and the CEO. When I told the CIO of Columbia Sportswear about it, he said, “When you finish that book, not only will everyone on my team need to read this, but my boss will need to read this, and my boss’s boss will need to read this.” I was so moved by it that it was one of the main reasons I left Tripwire: to make completion of the book my sole focus.
[ picture of firefighters ] At 5pm, the rollout starts, and disaster ensues. All of Ops is standing by, but Dev is still making changes and there isn’t anything to deploy. By 7pm, QA has the code, but can’t get Phoenix to run in the test environments. A developer is heard saying, “That’s funny. It works on my laptop,” and is almost killed by angry QA and Ops people. And if we can’t get it to run in QA, how are we going to run this in Prod?
[ picture of Deepwater Horizon fire ] At 11pm, we hit the point of no return: converting the order entry database. But the database script runs a thousand times slower than expected. So, instead of taking minutes, it will take days. As a result, all the in-store POS systems won’t come up until Tuesday, so stores open with manual card swipes and carbon paper. All weekend long, Ops is doing hourly reboots because of memory leaks. And credit card numbers are leaked every time you empty the shopping cart.
But it’s not just about effectiveness and efficiency. Or just about being efficiently effective, or effectively efficient. Which brings us to the second theme of this conference, which is relevance. The work has to mean something to someone. In my journey of studying high performing IT organizations, I’ve run into many non-high performers. And in those organizations, controls functions and information security are often viewed as the shrill, hysterical people who are trying to create bureaucratic processes, which suck the will to live out of everyone they touch. These are the functions that tend to get marginalized, or worse, totally avoided. “We have an urgent project that needs to get done. Make sure you don’t invite Gene, because he’ll guarantee that it won’t get done.” Our job is to make money for the business, and I’m not sure what Gene’s job is…
This story is about Bill, the thoughtful and methodical VP IT Operations, who solves some of the largest problems of the company. It’s a story about a Visible Ops and DevOps style transformation. It’s how Bill saves the company, helping it achieve its project goals, operational goals, and security and compliance goals. And Steve the CEO realizes that Bill, the lowly VP of IT Operations, is the person who saved the company.