Turning Human Capital into High Performance Organizational Capital

  1. 1. Devops: Turning Human Capital into High Performance Organizational Capital John Willis @botchagalupe
  2. 2. About Me • One of the founding members of “Devopsdays” • Co-author of the “Devops Handbook” • Author of the “Introduction to Devops” on Linux Foundation edX • Podcaster at • Devops Enterprise Summit - Cofounder • Ninth person in at Chef (VP of Customer Enablement) • Formerly Director of Devops at Dell • Founder of Socketplane (Acquired by Docker) • 10 Startups over 25 years
  3. 3. How would I describe Devops to a CEO?
  4. 4. How would you describe Devops to a CEO?
  5. 5. Exercise Time (Deep Breath)
  6. 6. The consequences of failure have never been greater…
  7. 7. Wanna know how?
  8. 8. Devops Practices and Patterns • Continuous Delivery • Everything in version control • Small batch principle • Trunk based deployments • Manage flow (WIP) • Automate everything
 • Culture • Everyone is responsible • Done means released • Stop the line when it breaks • Remove silos
  9. 9. Human Capital and High Performance Organizations
  10. 10. Recent IT Performance Data is Compelling. High performers compared to their peers: • 30x more frequent deployments • 200x faster lead times • 60x the change success rate • 168x faster mean time to recover (MTTR) • 2x more likely to exceed profitability, market share & productivity goals • 50% higher market capitalization growth over 3 years* (Data from 2014/2015 State of DevOps Report)
  11. 11. Recent IT Performance Data is Compelling (same figures as the previous slide), with added callouts: Faster • Higher Quality • More Effective • 2555x
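The four IT performance measures named on these slides (deployment frequency, lead time, change success rate, and MTTR) can be computed from a simple deployment log. The following is a minimal sketch under stated assumptions, not the State of DevOps Report's methodology: the `Deploy` record, its field names, the `summarize` helper, and the sample data are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean
from typing import List, Optional


@dataclass
class Deploy:
    """One deployment record (hypothetical shape, for illustration only)."""
    committed_at: datetime                   # when the change was committed
    deployed_at: datetime                    # when it reached production
    succeeded: bool                          # False if the change caused a failure
    restored_at: Optional[datetime] = None   # when service was restored after a failure


def summarize(deploys: List[Deploy], period_days: float) -> dict:
    """Compute the four measures over a log covering `period_days` days."""
    lead_hours = [
        (d.deployed_at - d.committed_at).total_seconds() / 3600 for d in deploys
    ]
    # Time to recover is measured from the failed deploy to service restoration.
    recover_hours = [
        (d.restored_at - d.deployed_at).total_seconds() / 3600
        for d in deploys
        if not d.succeeded and d.restored_at is not None
    ]
    return {
        "deploys_per_day": len(deploys) / period_days,
        "mean_lead_time_hours": mean(lead_hours),
        "change_success_rate": sum(d.succeeded for d in deploys) / len(deploys),
        "mttr_hours": mean(recover_hours) if recover_hours else None,
    }


# Made-up sample log: four deploys over two days, one of which failed.
t0 = datetime(2015, 1, 1, 9, 0)
h = timedelta(hours=1)
deploys = [
    Deploy(t0, t0 + 2 * h, True),
    Deploy(t0 + 1 * h, t0 + 5 * h, True),
    Deploy(t0 + 24 * h, t0 + 26 * h, False, restored_at=t0 + 27 * h),
    Deploy(t0 + 30 * h, t0 + 34 * h, True),
]
summary = summarize(deploys, period_days=2)
print(summary)
```

Comparing two such summaries, one for a high performer and one for a peer, gives ratios of the kind quoted on the slide (for example, how many times more frequent the deployments are).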
  12. 12. Conventional Wisdom: Fast, Cheap, Good. “Pick Two!”
  13. 13. Faster, Better, and Cheaper?
  14. 14. Organizational culture was one of the strongest predictors of both IT performance and the overall performance of the organization
  15. 15. Devops is about Humans: Devops is a set of practices and patterns that turn human capital into high performance organizational capital.
  16. 16. Google • Over 15,000 engineers in over 40 offices • 4,000+ projects under active development • 5500+ code submissions per day (20+ p/m) • Over 75M test cases run daily • 50% of code changes monthly • Single source tree
  17. 17. Amazon • 11.6 second mean time between deploys. • 1079 max deploys in a single hour. • 10,000 mean number of hosts simultaneously receiving a deploy. • 30,000 max number of hosts simultaneously receiving a deploy
  18. 18. Unicorns and Horses (Enterprises) • Unicorns • Enterprise (Shamelessly stolen and repurposed from: Pete Cheslock)
  19. 19. Enterprise Organizations • Ticketmaster - 98% reduction in MTTR • Nordstrom - 20% shorter Lead Time • Target - Full Stack Deploy 3 months to minutes • USAA - Release from 28 days to 7 days • ING - 500 application teams doing devops • CSG - From 200 incidents per release to 18
  20. 20. Faster, Better, and Cheaper. How?
  21. 21. Lean Safety Culture Learning Organization
  22. 22. Lean
  23. 23. [Figure: value stream map of Parts Unlimited "Major Release 6" (early 2014). Stage labels include Initiation and Planning, Steering, Design, Dev Breakdown, Server Requirements Gathering, Server Approval and Assignment, Provisioning, Dev / Test, Staging, and Production Release. Each stage is annotated with lead times, head counts, and pain points, such as 9+ months of planning before implementation starts, one delivery-team queue with ~1,000 tickets at once, procurement of physical servers that can take months, and mostly manual testing.] Improvement ideas marked on the map: 1. Fully Automated Environment Provisioning 2. Visualization of flow of work and expected upcoming work 3. Standard product catalog ("Environments on Demand") 4. Shorten from Design to Implementation 5. New "white glove" engagement model 7. Small Batches 8. Write end-to-end customer func. tests 9. Service Verification test writing: shift left to Dev (test early) 10. Test data setup automation 11. Resolve interface to legacy 12. Remove Bottleneck and Environment Contention (test more) 13. Dev Deploy to Prod for legacy 14. Unify change management tools 15. Tool • Make the work visible for all • Manage flow and eliminate waste • Build alignment and consensus across team boundaries • Empower teams to find and fix what is getting in the way
  24. 24. • Small Batch • Reduce Work in Process (WIP) • 1x1 Flow • Reduce Bottlenecks (TOC) • Optimize Globally
  25. 25. Where does lean come from?
  26. 26. Let’s talk Kata
  27. 27. I fear not the man who has practiced 10,000 kicks once, but I fear the man who has practiced one kick 10,000 times - Bruce Lee
  28. 28. Toyota is not a story about techniques. It’s an organization defined primarily by the unique behavior routines it continually teaches to all its members. - Mike Rother (Page 262-263)
  29. 29. Wanna see what Kata looks like in Devops?
  30. 30. I have no idea how to answer that question. It would literally never occur to me not to do it! KATA
  31. 31. We are what we repeatedly do. Excellence, then, is not an act, but a habit. The Dude
  32. 32. Improvement Kata Coaching Kata
  33. 33. • Capability 1: Seeing problems as they occur • Complex work is managed so that problems in design are revealed • They see problems as they occur, through relentless testing of assumptions
 • Capability 2: Swarming and solving problems as they are seen to build new knowledge • Problems that are seen are solved so that new knowledge is built quickly • Improvement of daily work is prioritized above daily work
 • Capability 3: Spreading new knowledge throughout the organization • The new discovery of local knowledge and improvements are turned into global improvements, shared throughout the organization • Learning is fed back to prevent future failures
 • Capability 4: Leading by developing • The job of leaders is not to command and control, but to create other capable leaders who can perpetuate this system of work
  34. 34. Safety Culture
  35. 35. Wanna See Another Video?
  36. 36. Views on Human Error
  37. 37. ▪ Views on Human Error ▪ The old view of human error (First Story) ▪ Human error is the cause of accidents ▪ To explain failure, you must seek failure ▪ You must find people’s inaccurate assessments, wrong decisions, bad judgments
  38. 38. ▪ Views on Human Error ▪ The new view of human error (Second Story) ▪ Human error is a symptom of trouble deeper inside a system ▪ To explain failure, do not try to find where people went wrong ▪ Instead, find how people’s assessments and actions made sense at the time, given the circumstances that surrounded them
  39. 39. ▪ Bad Apple Theory - Throw away the bad apples ▪ Complex systems are basically safe, they need to be protected from unreliable people (bad apples) ▪ Human errors cause accidents: humans are the dominant contributor to more than two thirds of mishaps ▪ Errors occur because of human loss of situation awareness, complacency, negligence ▪ Errors are introduced to the system only through the inherent unreliability of people.
  40. 40. Murphy’s Law is Wrong! What can go wrong usually goes right, but then we draw the wrong conclusion. - Sidney Dekker, The Field Guide to Human Error
  41. 41. Blameless Culture A blameless culture believes that systems are NOT inherently safe and humans do the best they can to keep them running.
  42. 42. Thematic Vagabonding People jump from one topic to the next, treating all superficially, in certain cases picking up topics dealt with earlier at a later time; they don’t go beyond the surface with any topic and seldom finish with any. (Dörner, 1980)
  43. 43. Your organization must continually affirm that individuals are NEVER the ‘root cause’ of outages.
  44. 44. ▪ Awesome Postmortems - Mindweather LLC ▪ in complex systems, there is no root cause, except… ▪ there are (multiple) conditions, some of which are unknowable, unfixable, outside our control ▪ people did what made sense at the time, given the information they had (no counterfactuals) ▪ failure and success are both normal in complex systems ▪ getting the full account* of what happened is more important than blame/punishment
  45. 45. ▪ Hindsight bias: ▪ the knew-it-all-along effect: seeing the event as having been predictable (counterfactuals) ▪ Outcome bias: ▪ evaluating the quality of a decision when the outcome of that decision is already known ▪ Availability bias: ▪ preference by decision makers for information and events that are more recent ▪ Fundamental attribution error: ▪ explaining behavior in terms of internal disposition, such as personality traits, abilities, motives, etc., as opposed to external situational factors
  46. 46. ▪ Just Culture at Etsy (John Allspaw) ▪ Encourage learning by having these blameless Post-Mortems on outages and accidents ▪ Understand how an accident happens, in order to better equip ourselves to prevent it from happening in the future ▪ Gather details from multiple perspectives on failures, and we don’t punish people for making mistakes ▪ Enable and encourage people who do make mistakes to be the experts on educating the rest of the organization how not to make them in the future
  47. 47. ▪ Just Culture at Etsy (John Allspaw) ▪ Accept that there is always a discretionary space where humans can decide to take actions or not, and that the judgement of those decisions lies in hindsight ▪ Accept that the Hindsight Bias will continue to cloud our assessment of past events, and work hard to eliminate it ▪ Accept that the Fundamental Attribution Error is also difficult to escape, so we focus on the environment and circumstances people are working in when investigating accidents
  48. 48. Learning Organization
  49. 49. That’s how it’s always been done around here!
  50. 50. You are either building a learning organization… or you will be losing to someone who is - Walter Sobchak- Andrew Clay Shafer
  51. 51. ▪ Dr. Deming
  52. 52. A learning organization is a place where people are continually discovering how they create their reality. - Peter Senge
  53. 53. ▪ Five Disciplines must be adopted to become a learning organization ▪ Systems Thinking ▪ Personal Mastery ▪ Mental Models ▪ Shared Vision ▪ Team Learning
  54. 54. Ladder of Inference Chris Argyris • Action • Beliefs • Conclusions • Assumptions • Meanings • Select • Observe
  55. 55. Ladder of Inference ▪ Can create bad judgement ▪ Our assumptions can lead us to bad conclusions ▪ Question your assumptions and conclusions ▪ Seek contrary data ▪ Make your assumptions visible to others ▪ Invite others to test your assumptions and conclusions ▪ Inquire into other people’s assumptions and conclusions ▪ Move down the ladder instead of up
  56. 56. Ladder of Inference - Bad Judgement ▪ Observe - I notice people in the first row ▪ Select - Person in the front row keeps looking at their phone ▪ Meaning - Not listening to my presentation ▪ Assumption - They are not interested ▪ Conclusion - Doesn’t like my new idea ▪ Beliefs - Their team always blocks new ideas ▪ Action - I send a nasty email to their boss
  57. 57. Ladder of Inference - Alternative Assumption ▪ Observe - I notice people in the first row ▪ Select - Person in the front row keeps looking at their phone ▪ Meaning - Not listening to my presentation ▪ Assumption - Try and engage with a question (safely) ▪ Conclusion - Might find out that they are late for another meeting and they really don’t want to miss this one, so they sent an email notifying the next meeting’s team that they will be late ▪ Beliefs - They are very excited about this new idea ▪ Action - Both teams set up another meeting to engage
  58. 58. Lean Safety Culture Learning Organization Psychology
  59. 59. ▪ Very interesting research… ▪ Christina Maslach - Organizational Burnout ▪ Geri Puleo - Burnout (BDOC) ▪ Carol Dweck - Mindsets ▪ Kelly McGonigal - Stress
  60. 60. Bonus
  61. 61. ▪ Anomaly Response ▪ Computers do not resolve outages; people do ▪ Trade-offs under pressure ▪ Cognition in the wild ▪ An outage is not a detective story ▪ With each step the story changes ▪ Need to see what’s happening with incomplete information ▪ Tools don’t always make things better
  62. 62. ▪ Anomaly Response - Internet Services are Opaque ▪ Network layer abstractions ▪ Variability in network performance ▪ Interdependent and decoupled services ▪ Internet based distributed computing ▪ Geographically distributed communication ▪ Open internet facing interactions
  63. 63. ▪ Anomaly Response - Challenges ▪ Teamwork ▪ Communication ▪ Diagnosis ▪ Decision Making ▪ Coordination ▪ Improvisation ▪ Tooling
  64. 64. ▪ Anomaly Response - Dynamic Fault Management ▪ Cascading effects ▪ Tempo changes and time pressure ▪ Multiple interleaved tasks ▪ Multiple interacting goals ▪ Need to revise assessments as new evidence comes in
  65. 65. "In dynamic fault management, intervention precedes or is interwoven with diagnosis" - Woods (1994)
  66. 66. Source: (Woods) John Allspaw -