Ops Happens: Improving Incident Response Using DevOps and SRE Practices

Damon Edwards talk at Interop ITX in Las Vegas on May 3, 2018

http://rundeck.com

Published in: Technology

Ops Happens: Improving Incident Response Using DevOps and SRE Practices

  1. 1. Ops Happens: Improving Incident Response Using DevOps and SRE Practices Damon Edwards @damonedwards 2018
  2. 2. Ops Improvement DevOps Ops Tools Community Damon Edwards
  3. 3. Deployment doesn’t make us money. Operations does.
  5. 5. Deployment doesn’t make us money. Operations does. Deployment isn’t the goal. But we treat it like it is.
  7. 7. Deployment doesn’t make us money. Operations does. Deployment isn’t the goal. But we treat it like it is. Operations rarely gets to transform. Time to start.
  9. 9. Let’s start with a true (but all too familiar) story…
  10. 10. Digital Agile DevOps SRE Cloud Docker Kubernetes Microservices CHANGE Bravo! Wow! Impressive
  11. 11. But what happens after deployment?
  12. 12. It was just another Tuesday…
  13. 13. NOC NOC Biz Manager Escalate! NOC NOC NOC (Bob) Open Incident Ticket 9:30am 10:00am NOC (Bob) Biz Manager Ticket Context Wagon Yes, but this looks different Haven’t there been some intermittent errors this week? v3 ?!
  14. 14. NOC (Bob) Open Incident Ticket Ticket Biz Manager App-specific SREs “Try this.” “Try that.” SRE SysAdmin with Prod Access (Steve) SRE SRE SRE SRE SRE SRE Bridge Call Biz Manager fixed? fixed? NOC (Bob) Biz Manager NOC (Bob) Biz Manager SysAdmin (Steve) 7 x SRE Ticket Context Wagon Ticket Context Wagon
  15. 15. SRE “It’s a problem with the Foo service” SRE SRE Foo SRE SRE SRE SRE Bridge Call Biz Manager Foo Service No. NOC (Bob) Update Ticket Ticket Foo Lead Dev + add 12:00pm NOC (Bob) Biz Manager Foo SRE Ticket Context Wagon Can you fix it?
  16. 16. Foo Lead Dev (Karen) ding! Ignore. App Manager Hey did you see that ticket? Foo Lead Dev (Karen) sigh. I’ll take a look NOC (Bob) Biz Manager App Manager Lead Dev (Karen) Foo SRE Scrum Ticket Context Wagon
  17. 17. Foo Lead Dev (Karen) I’m going to need more log files Ticket SysAdmin Team + add Update Ticket Chat “Can someone with access to Foo Service in Prod01 help me with ticket #42516?” SysAdmin (Lee) Ticket “logs attached” Foo Lead Dev (Karen) Ticket “no the other ones” NOC (Bob) Biz Manager App Manager Lead Dev (Karen) Foo SRE SysAdmin (Lee) Ticket Context Wagon
  18. 18. Foo Lead Dev (Karen) Logs - Who restarted these services? (and why?) - They didn’t use the correct environment variables! - This entire service pool needs to be restarted! Ticket Update Ticket NOC (Bob) Update Ticket Ticket Middleware Team + add “Middleware, please urgent restart this entire app pool with the correct environment variable” 2:00pm Ticket Context Wagon
  19. 19. NOC (Bob) Middleware Manager (Melissa) No way. It’s the middle of the day! You need business approval. NOC (Bob) Update Ticket Ticket SVP for Line of Business + add 2:30pm NOC (Bob) Biz Manager App Manager Lead Dev (Karen) Foo SRE SysAdmin (Lee) Middleware Manager Ticket Context Wagon
  20. 20. Update Ticket Ticket SVP for Line of Business + add SVP (Susan) Chief of Staff Tech VP Tech VP Update Ticket Ticket “Restart approved” Customer impact? 5:00pm NOC (Bob) Biz Manager App Manager Lead Dev (Karen) Foo SRE SysAdmin (Lee) Middleware Manager SVP Chief of Staff 2 x Tech VP Ticket Context Wagon
  21. 21. SharePoint Ticket Middleware Manager (Melissa) Who knows these production services the best? Ellen! Middleware Middleware (Scott) Ellen to Europe office Middleware (Scott) Trial and error .doc 5:00pm NOC (Bob) Biz Manager App Manager Lead Dev (Karen) Foo SRE SysAdmin (Lee) Middleware Manager SVP Chief of Staff 2 x Tech VP Middleware (Scott) Ticket Context Wagon
  22. 22. SharePoint Middleware (Scott) Trial and error .doc Middleware (Scott) Bar Service 10 min Middleware (Scott) Waiting for Acme Service Acme startup failed Bar Service 6:00pm NOC (Bob) Biz Manager App Manager Lead Dev (Karen) Foo SRE SysAdmin (Lee) Middleware Manager SVP Chief of Staff 2 x Tech VP Middleware (Scott) Ticket Context Wagon
  23. 23. Come on.. no.no.no. What? Why? Middleware (Scott)
  26. 26. - Bar app startup timed out. Error says can’t connect to Acme service. - I looked at Acme but it seems to be running - Is this error message correct? Why can’t Bar connect? Ticket Update Ticket Middleware (Scott) Bar SRE + add Bar SRE (Linda) Middleware (Scott) - URGENT: Network connection issue between Bar and Acme Ticket Update Ticket Network SRE Team + add 6:45pm NOC (Bob) Biz Manager App Manager Lead Dev (Karen) Foo SRE SysAdmin (Lee) Middleware Manager SVP Chief of Staff 2 x Tech VP Middleware (Scott) Bar SRE (Linda) Ticket Context Wagon The new environment pre-flight check is preventing startup. Looks like Bar’s connection to Acme is being blocked.
  27. 27. Bar SRE (Linda) Middleware (Scott) - URGENT: Network connection issue between Bar and Acme Ticket Update Ticket Network SRE Team + add Bar Lead Dev 6:45pm NOC (Bob) Biz Manager App Manager Lead Dev (Karen) Foo SRE SysAdmin (Lee) Middleware Manager SVP Chief of Staff 2 x Tech VP Middleware (Scott) Bar SRE (Linda) Customers are calling. What is going on? The new environment pre-flight check is preventing startup. Looks like Bar’s connection to Acme is being blocked. Bar Lead Dev (Liu) Business Managers I can comment out the test… But the CD pipeline only goes to QA ENV!
  28. 28. Network Dir (Carlos) Middleware (Scott) Carlos, I need a favor. Can you escalate? Middleware Manager (Melissa) Customers are calling. What is going on? Last week.. Net SRE VP VP Priority! Different Incident! Net SRE Net SRE Net SRE It’s the network! Business Managers Your network is broken! Business Managers We are already working on it! Network VPs
  29. 29. Network SRE (Hari) The firewall is blocking the traffic You’ll have to take it up with the Firewall Team - URGENT: Firewall is blocking connection between Bar and Acme Ticket Open Firewall Ticket Firewall Team + add Firewall Engineer (Freddie) Middleware (Scott) Paging on-call… Open bridge… Can’t be the firewall, it hasn’t changed since last Thursday. No, it’s the firewall. 8:00pm NOC (Bob) Biz Manager App Manager Lead Dev (Karen) Foo SRE SysAdmin (Lee) Middleware Manager SVP Chief of Staff 2 x Tech VP Middleware (Scott) Bar SRE (Linda) Network PM (Carlos) Network SRE (Bob) Ticket Context Wagon
  30. 30. Firewall Engineer (Freddie) Middleware (Scott) Can’t be the firewall, it hasn’t changed since last Thursday. No, it’s the firewall. There was a rule change last Thursday that would stop Bar from talking to Acme. Can you change it back? Sure we make changes on Thursday… Chief of Staff SVP and VPs are livid… this was supposed to be a safe change!! Freddie, we’ve got customers calling. Update Firewall Ticket Firewall Engineer (Freddie) 8:00pm
  31. 31. ESCALATE: Emergency production firewall rule change review Ticket Update Firewall Ticket NetSec + add Firewall Engineer (Freddie) Paging on-call… NetSec (Nicole) This is production so I’ll have to get others on the Network CAB… Chief of Staff Firewall (Freddie) Middleware (Scott) Customer outage! … I’ll call SVP Susan Middleware Manager VP VP Bar Lead Dev 9:00pm NOC (Bob) Biz Manager App Manager Lead Dev (Karen) Foo SRE SysAdmin (Lee) Middleware Manager SVP Chief of Staff 2 x Tech VP Ticket Context Wagon
  32. 32. Chief of Staff Firewall (Freddie) Middleware (Scott) Customer outage! APPROVE: Emergency firewall rule change Ticket Update Firewall Ticket NetSec (Nicole) … I’ll call SVP Susan Middleware Manager VP VP Bar Lead Dev Firewall (Freddie) Net L2 (Bob) Middleware (Scott) Firewall change Restart Bar 9:30pm NOC (Bob) Biz Manager App Manager Lead Dev (Karen) Foo SRE SysAdmin (Lee) Middleware Manager SVP Chief of Staff 2 x Tech VP Middleware (Scott) Bar SRE (Linda) Network PM (Carlos) Network SRE (Bob) Firewall (Freddie) NetSec (Nicole) Ticket Context Wagon
  33. 33. Middleware (Scott) Update Ticket Ticket Customer Engagement Manager + add Policy !! “Ready for API tests” 9:45pm Firewall (Freddie) Net L2 (Bob) Middleware (Scott) Firewall change Restart Bar I think we are good! Middleware Manager VP VP Bar Lead Dev You “think?”
  34. 34. “Ready for API tests” Customer Engagement Manager (Varsha) NOC (Bob) Customer Engagement Manager (Varsha) Update Ticket Ticket “APIs OK” 11:00pm Ticket Context Wagon
  35. 35. Middleware (Scott) Update Ticket Ticket “Services restarted OK” NOC NOC Lights are green… I guess it is fixed. Close Ticket NOC (Bob) Zzz 11:30pm NOC (Bob) Biz Manager App Manager Lead Dev (Karen) Foo SRE SysAdmin (Lee) Middleware Manager SVP Chief of Staff 2 x Tech VP Middleware (Scott) Bar SRE (Linda) Network PM (Carlos) Network SRE (Bob) Firewall (Freddie) NetSec (Nicole) Cust. Engmt. (Varsha) Ticket Context Wagon
  37. 37. NOC Lights are green… I guess it is fixed. Close Ticket NOC (Bob) Zzz Next Day SVP (Susan) Whose fault is this?! Why are we so bad at change? What additional processes and approvals are you adding to never let this happen again?! VP VP Dir Dir VP Dir VP Middleware (Scott) Bar SRE (Linda) Network PM (Carlos) Network SRE (Bob) Firewall (Freddie) NetSec (Nicole) Cust. Engmt. (Varsha)
  38. 38. Later…
  39. 39. We’ve invested in Cloud, Agile, DevOps, Containers… Why does everything still take too long and cost too much? Executive Team Our transformation has largely ignored Ops
  40. 40. Waiting (label repeated many times)
  41. 41. Waiting, Defects (each label repeated many times)
  42. 42. Waiting, Defects, Manual / Motion (each label repeated many times)
  43. 43. Waiting, Defects, Manual / Motion, Task Switching (each label repeated many times)
  44. 44. Waiting, Defects, Manual / Motion, Task Switching, Partially Done (each label repeated many times)
  45. 45. Waiting, Defects, Manual / Motion, Task Switching, Partially Done, Extra Process (each label repeated many times)
  46. 46. “We need better tools”
  47. 47. “We need better tools” “We need more people”
  48. 48. “We need better tools” “We need more people” “We need more discipline and attention to detail”
  49. 49. “We need better tools” “We need more people” “We need more discipline and attention to detail” “We need more change reviews/approvals”
  51. 51. Challenge the principles and practices we take for granted
  52. 52. 1 2 3 Feedback loops
  53. 53. Where are decisions made? Who can take action? escalate 1° 2° 3° 4° escalate escalate or
  54. 54. All work is contextual John Allspaw DevOps Enterprise Summit San Francisco 2017
  55. 55. All work is contextual rm -rf $PATHNAME John Allspaw DevOps Enterprise Summit San Francisco 2017
  56. 56. All work is contextual rm -rf $PATHNAME Is this dangerous? John Allspaw DevOps Enterprise Summit San Francisco 2017
  61. 61. All work is contextual rm -rf $PATHNAME Answer is always “it depends” John Allspaw DevOps Enterprise Summit San Francisco 2017
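
The "it depends" answer can be made concrete with a minimal sketch. The same `rm -rf $PATHNAME` is rated differently depending on what the variable holds at the moment it runs. The function name `assess_delete` and the lists of protected locations are hypothetical, chosen only for illustration; they are not from the talk.

```python
from pathlib import PurePosixPath

# Hypothetical examples of locations where a recursive delete is destructive.
PROTECTED_EXACT = {PurePosixPath("/")}
PROTECTED_TREES = {PurePosixPath("/etc"), PurePosixPath("/var/lib")}


def assess_delete(pathname: str) -> str:
    """Rate `rm -rf $PATHNAME` using only the context the variable carries."""
    if not pathname.strip():
        # An unset or empty variable means the command no longer targets
        # what its author intended.
        return "dangerous: PATHNAME is empty"
    path = PurePosixPath(pathname)
    if not path.is_absolute():
        return "unknown: depends on the current working directory"
    in_protected_tree = path in PROTECTED_TREES or any(
        parent in PROTECTED_TREES for parent in path.parents
    )
    if path in PROTECTED_EXACT or in_protected_tree:
        return "dangerous: protected location"
    if path.parts[:2] == ("/", "tmp") and len(path.parts) > 2:
        return "probably safe: scratch space under /tmp"
    return "unknown: need more context"
```

Even this toy version returns "unknown" for most inputs, which is the point of the slide: the person closest to the work holds the context needed to decide.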
  62. 62. escalate 1° 2° 3° 4° escalate escalate or Context Where are decisions made? Who can take action? They have all of the context. But decisions are made here?
  63. 63. “Shift Left” the ability to take action Push the ability to take action this direction escalate 1° 2° 3° 4° escalate escalate or
  64. 64. Not Here!
  67. 67. Here?
  68. 68. Even Better!
  69. 69. What gets in the way? Silos Queues Toil Backlog Information I need X Priorities Tools Backlog I do X Requests for X Silo A Information Priorities Silo B Tools ?? Silo A Silo B Ticket Queue
  70. 70. Silos Backlog Information Priorities Tools
  71. 71. Backlog Information I need X Priorities Tools Silos
  72. 72. Backlog Information I need X Priorities Tools Silos Backlog I do X Requests for X Silo A Information Priorities Silo B Tools
  73. 73. Silos cause disconnects and mismatches Backlog Information I need X Priorities Tools Backlog I do X Requests for X Silo A Information Priorities Silo B Tools Context Context Process Process Tooling Tooling Capacity Capacity
  74. 74. Function A Function B Function C Becomes siloed labor pools of functional specialists Requests fulfilled by semi-manual or manual effort Primary management focus is on protecting team capacity
  75. 75. How do we cover for our silos’ disconnects and mismatches? Silo A Silo B
  76. 76. How do we cover for our silos’ disconnects and mismatches? Silo A Silo B Ticket Queue
  77. 77. ?? Silo A Silo B We all know how well that works Ticket Queue
  78. 78. Ticket queues are an expensive way to manage work Ticket Queue Queues Create… Longer Cycle Time Increased Risk More Variability More Overhead Lower Quality Less Motivation Adapted from Donald G. Reinertsen, The Principles of Product Development Flow: Second Generation Lean Product Development
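
The "Longer Cycle Time" cost can be illustrated with two standard queueing relationships. This is a rough sketch, not a model from the slides: it assumes requests arrive at random and are handled one at a time (an M/M/1 queue), and the function names are made up for this example.

```python
def mm1_time_in_system(service_hours: float, utilization: float) -> float:
    """Average hours a request spends waiting plus being worked (M/M/1)."""
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be in [0, 1)")
    return service_hours / (1 - utilization)


def littles_law_cycle_time(open_tickets: float, closed_per_day: float) -> float:
    """Little's Law: average cycle time = work in progress / throughput."""
    if closed_per_day <= 0:
        raise ValueError("closed_per_day must be positive")
    return open_tickets / closed_per_day
```

Under these assumptions a one-hour task takes about two hours end to end when the team is 50% utilized and about ten hours at 90%. Likewise, a queue holding 60 open tickets that closes 20 per day implies an average three-day wait, however small each request is.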
  79. 79. Ticket queues become “snowflake makers” ?? Silo A Silo B Ticket Queue
  80. 80. Ticket queues become “snowflake makers” ?? Silo A Silo B Ticket Queue Snowflakes (each unique, technically acceptable but unreproducible and brittle)
  81. 81. Silos + Ticket Queues = Locked into a broken system Unreproducible Snowflakes More Outages Longer Lead Times / Unpredictability Siloed Labor Pools Ticket Queues Diminished Labor Capacity Brittle Environments Errors Error-Prone Manual Tasks Interrupt project work Information Mismatches / Miscommunication Silo-Specific Optimization Handoffs With Capacity, Pace, Priority Mismatches “Managing to the Queue” Working out of Context “Align by Functional Capability” Management Strategy
  82. 82. Excessive toil prevents fixing the system
  83. 83. Excessive toil prevents fixing the system “Toil is the kind of work tied to running a production service that tends to be manual, repetitive, automatable, tactical, devoid of enduring value, and that scales linearly as a service grows.” -Vivek Rau Google
  84. 84. Excessive Toil prevents fixing the system Toil Engineering Work E.W. Toil Reduce toil Improve the business No capacity to reduce toil No capacity to improve business Toil at manageable percentage of capacity Toil at unmanageable percentage of capacity (“Engineering Bankruptcy”)
  86. 86. So what can we do differently?
  87. 87. Obvious: Get rid of as many silos as possible Old Silo A Old Silo B Old Silo C Old Silo D
  88. 88. Old Silo A Old Silo B Old Silo C Old Silo D Cross-Functional Team 1 Cross-Functional Team 2 Cross-Functional Team n Obvious: Get rid of as many silos as possible “Horizontal” shared responsibility is key feature!
  89. 89. But what about the cross-cutting concerns? Old Silo A Old Silo B Old Silo C Old Silo D Cross-Functional Team 1 Cross-Functional Team 2 Cross-Functional Team n Specialist Capabilities Specialist Capabilities Specialist Capabilities
  90. 90. But what about the cross-cutting concerns? Old Silo A Old Silo B Old Silo C Old Silo D Cross-Functional Team 1 Cross-Functional Team 2 Cross-Functional Team n Specialist Capabilities Specialist Capabilities Specialist Capabilities Ticket Queue Ticket Queue Ticket Queue
  91. 91. But what about the cross-cutting concerns? Old Silo A Old Silo B Old Silo C Old Silo D Cross-Functional Team 1 Cross-Functional Team 2 Cross-Functional Team n Specialist Capabilities Specialist Capabilities Specialist Capabilities Ticket Queue Ticket Queue Ticket Queue Ticket Queue Ticket Queue Ticket Queue
  92. 92. Operations as a Service: Turn handoffs into self-service Operations as a Service On Demand On Demand On Demand On Demand Ops (embedded) Cross-Functional Product Team 1 Cross-Functional Product Team n Ops (embedded) Ops (builds & operates) Cross-Functional Product Team 2 Ops (embedded) Ops Capability SRE, Dev, or Specialist Ops Capability SRE, Dev, or Specialist Ops Capability SRE, Dev, or Specialist
  93. 93. Development Team 1 Development Team 2 Development Team n Ops/SRE Team Operations as a Service On Demand On Demand On Demand On Demand Ops (builds & operates) Ops Capability SRE, Dev, or Specialist Ops Capability SRE, Dev, or Specialist Ops Capability SRE, Dev, or Specialist Operations as a Service: Works with any org model
  94. 94. Use tickets only for what they are good for Ticket System
  95. 95. Use tickets only for what they are good for 1. Documenting true problems/issues/exceptions Ticket System
  96. 96. Use tickets only for what they are good for 1. Documenting true problems/issues/exceptions 2. Routing for necessary approvals Ticket System
  97. 97. Use tickets only for what they are good for 1. Documenting true problems/issues/exceptions 2. Routing for necessary approvals Not as a general purpose work management system! Ticket System
  98. 98. Security or compliance “in the way”? Operations as a Service On Demand On Demand On Demand On Demand Ops (embedded) Cross-Functional Product Team 1 Cross-Functional Product Team n Ops (embedded) Ops (builds & operates) Cross-Functional Product Team 2 Ops (embedded) Ops Capability SRE, Dev, or Specialist Ops Capability SRE, Dev, or Specialist Ops Capability SRE, Dev, or Specialist Build-in Security Here Build-in Compliance Here
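
One way to picture security and compliance being built into a self-service operation is the sketch below. It is an illustration under assumptions, not the talk's or any product's implementation: the class names, role names, and the `restart_service` placeholder are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable


@dataclass(frozen=True)
class Operation:
    """A vetted procedure that Ops builds once and others run on demand."""
    name: str
    allowed_roles: frozenset  # roles pre-approved by security (assumed model)
    action: Callable[[str], str]


@dataclass
class SelfServiceCatalog:
    operations: dict = field(default_factory=dict)
    audit_log: list = field(default_factory=list)

    def register(self, op: Operation) -> None:
        self.operations[op.name] = op

    def run(self, op_name: str, user: str, roles: set, target: str) -> str:
        op = self.operations[op_name]
        allowed = bool(op.allowed_roles & roles)
        # Compliance is built in: every attempt is recorded, allowed or not.
        self.audit_log.append({
            "when": datetime.now(timezone.utc).isoformat(),
            "user": user,
            "operation": op_name,
            "target": target,
            "allowed": allowed,
        })
        if not allowed:
            # The exception path is where a ticket for approval still fits.
            raise PermissionError(f"{user} may not run {op_name}; request approval")
        return op.action(target)


def restart_service(target: str) -> str:
    """Placeholder action; a real one would call the actual restart procedure."""
    return f"restarted {target}"
```

In the story above, a catalog like this would have let the Foo lead developer run the vetted restart herself, with the access decision and audit trail made ahead of time rather than negotiated through tickets during the outage.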
  99. 99. Reduce Toil
  100. 100. Reduce Toil 1. Track toil levels for each team
  101. 101. Reduce Toil 1. Track toil levels for each team 2. Set toil limits for each team
  102. 102. Reduce Toil 1. Track toil levels for each team 2. Set toil limits for each team 3. Fund efforts to reduce toil (with emphasis on teams over toil limits)
  103. 103. Reduce Toil 1. Track toil levels for each team 2. Set toil limits for each team 3. Fund efforts to reduce toil (with emphasis on teams over toil limits) Bonus: Use Service Level Objectives, Error Budgets, and other lessons from SRE
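
The three steps and the error-budget bonus can be sketched in a few lines. Everything here is an assumption for illustration: the slides give no numbers, so the 50% default limit, the team figures in the usage note, and the names `TeamToil`, `funding_priority`, and `error_budget_minutes` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TeamToil:
    team: str
    toil_hours: float    # step 1: tracked toil for the period
    total_hours: float   # total capacity for the same period
    limit: float = 0.5   # step 2: toil limit (assumed default, not from the talk)

    @property
    def toil_fraction(self) -> float:
        return self.toil_hours / self.total_hours

    @property
    def over_limit(self) -> bool:
        return self.toil_fraction > self.limit


def funding_priority(teams: list) -> list:
    """Step 3: teams over their limit, ordered by how far over they are."""
    over = [t for t in teams if t.over_limit]
    return sorted(over, key=lambda t: t.toil_fraction - t.limit, reverse=True)


def error_budget_minutes(slo: float, window_days: int = 30) -> float:
    """Minutes of unavailability an availability SLO allows over the window."""
    if not 0 < slo <= 1:
        raise ValueError("slo must be in (0, 1]")
    return (1 - slo) * window_days * 24 * 60
```

With made-up weekly figures of 30, 22, and 10 toil hours out of 40 for three teams, the first two exceed a 50% limit and would be funded first, in that order. A 99.9% availability objective over 30 days leaves roughly 43 minutes of error budget.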
  104. 104. Recap Don’t forget about Ops. Challenge conventional wisdom. Leverage the Operations as a Service design pattern. “Shift-Left” control and decision making. Understand the cost of silos and ticket-driven request queues. Focus on removing silos and queues. Learn from SRE: Reduce toil to create capacity to change.
  105. 105. Let’s talk… @damonedwards damon@rundeck.com Dive Deeper Into Operations as a Service: https://www.rundeck.com/oaas
