Blameless system design - annotated

Director of IT Operations at Vast.com
May 4, 2016

Editor's Notes

  1. • ❑ name • ❑ title • ❑ company • ❑ about the talk
  2. Intro: name, occupation. Things I've broken: ALL OF auth. Syslog, by... using it. All Chef runs, innumerable times. The FE, by turning back up some old nodes that were not properly decommissioned. Our ambassador setup, with some bad template logic. https://i.ytimg.com/vi/GTkcjjt2TBY/maxresdefault.jpg
  3. I ship 90% code, which sometimes makes it into production. I hide a LOT of things behind config management that shouldn't be handled at that level. I decided to deploy our private cloud with no shared storage. I decided to attack service discovery with Chef rather than making devs register applications. Sometimes we make decisions we know are mistakes in the name of moving forward. http://paragondsi.com/wp-content/uploads/2015/06/office-space.jpg
  4. What were people thinking? Why did they leave all this technical debt behind?
  5. We are all constantly talking about, and trying to quantify, technical debt. Application debt: debt that resides in the software package. Infrastructure debt: debt that resides in the operating environments. Architecture debt: debt that resides in the design of the entire system. Ways of measuring technical debt: size of the code base, code coverage, coupling and cohesion reports, cyclomatic complexity, Halstead complexity measures. https://upload.wikimedia.org/wikipedia/commons/thumb/c/c7/William_Hogarth_018.jpg/1239px-William_Hogarth_018.jpg
  6. Rather, there is ONLY technical debt. Kellan Elliott-McCrea, former CTO of Etsy, writes in "Towards an Understanding of Technical Debt": "Technical debt is the choices we made in our code, intentionally, to speed up development today, knowing we’d have to change them later." Many things ascribed to technical debt are just facets of creating software, such as maintenance and changes in understanding. Instead of treating it like an exception, we should just embrace it. http://cattype.deviantart.com/art/Tsunami-Relief-Fund-216541678
  7. No one *wants* to do their job badly. We’ve all had to make trade-offs to balance priorities. Fast, cheap, good: anyone who can beat the good, fast, cheap triangle probably isn't running a business. As Erik Hollnagel stated, "The ETTO [Efficiency-Thoroughness Trade-Off] fallacy is that people are required to be both efficient and thorough at the same time – or rather to be thorough when with hindsight it was wrong to be efficient!" The more complex a system, the higher the likelihood of failure. Shouldn't we stop blaming people for making the trade-offs they're forced to make? https://www.flickr.com/photos/cafuego/12575046354
  8. Etsy has done a great job bringing 'just culture' to postmortems, but that approach can be expanded beyond the scope of incidents. There are trade-offs in EVERY system design. Restorative vs. punitive model. If we remove fear, we will have a more honest conversation about those trade-offs, and if we're honest about those trade-offs, crisis might be averted altogether. If we understand our history, we won't be destined to repeat it. https://upload.wikimedia.org/wikipedia/commons/8/8c/Tumbeasts_servers.png https://upload.wikimedia.org/wikipedia/commons/4/49/Smurf_Zombies_-_Flickr_-_SoulStealer.co.uk.jpg
  9. blameless system design is a beech
  10. Most people aren’t trying to bring about computergeddon. Bring empathy to the table when you’re discussing someone’s design. Has tooling improved? Did that shiny OSS project that will fix all of this ‘mess’ even exist in a production-ready state when this was implemented? What logic might have led to this design choice? Put yourself in their shoes. https://upload.wikimedia.org/wikipedia/commons/thumb/a/a2/Goodwill_Industries_Logo.svg/341px-Goodwill_Industries_Logo.svg.png
  11. While blameless system design is not error-focused, it's important to have a constructive framework in place for when there are problems. Such a framework ensures balanced accountability for both individuals and the organization. It analyzes errors rather than judging people, and it removes fear from the process. It also encourages people to improve the system instead of seeking retribution. https://upload.wikimedia.org/wikipedia/commons/a/af/Aachen_Allegory.jpg
  12. You might be sitting next to the person who had to make the tough call you’re critiquing. Someday, that person might be you. Reject 'contempt culture' and the trading of condescension for prestige. Try to understand how someone might have arrived at their self-taught narrative, and how that might have shaped their decisions. Focus on the good qualities of a design and see whether those can be extended or applied in other places. https://upload.wikimedia.org/wikipedia/commons/8/85/Mother's_love.jpg
  13. No system lives in isolation. Without experiments, we have no way to qualify our assumptions about how systems interact. Measure, measure, measure, and record! We deal with complex system interactions that can cause some very unexpected behavior. Record metrics at every step and with every change to qualify your work. Design your experiments; don't be a victim of them. https://upload.wikimedia.org/wikipedia/commons/e/e7/Atomic_Laboratory_Experiment_on_Atomic_Materials_-_GPN-2000-000663.jpg
  14. Publish all your experimentation results, whether they bore fruit or not. Document your decisions somewhere so future reviewers will understand them. Save future reviewers and architects some time by being explicit about the issues you came across and how you addressed them. Be honest about trade-offs; this is not the place to be shy about the skeletons in the closet. Track mitigation responses, at least in a backlog, so they don't get buried over time only to re-emerge from their graves later. https://www.flickr.com/photos/rosengrant/3929869118
  15. Sharing broadcasts cultural expectations throughout the organization. Reinforce the organization with respect or a sense of achievement. Provide easy-to-find, easy-to-access information about all systems. Open up meetings and discussions to anyone who wants to participate, since they just might provide unexpected insight. Establish both positive and negative feedback channels. https://upload.wikimedia.org/wikipedia/commons/thumb/3/37/Communication_shannon-weaver2.svg/2000px-Communication_shannon-weaver2.svg.png
  16. If some of this sounds familiar, it's because it is: blameless system design includes many of the skills of the DevOps movement. We've got the CMS in CAMS: Culture, Measurement, Sharing. Together they create feedback loops. http://www.bouwkennisblog.nl/wp-content/uploads/2014/04/luisteren.jpg
  17. It's hard to change a retribution culture and the RCA mentality, and it's hard to get over hindsight bias. It's a lot of work: championing efforts, encouraging openness, and defining what is broadcast. Everyone will need to get over their impostor syndrome and/or contempt culture. The organization must be willing to accept risk. That includes risk from new system design and complexity, from choosing to leave old systems in place, and from updating old systems. Once risk has caused failure, organizations must be willing to try restorative measures (and not break trust). Organizations must be willing to be honest and frank about both the good and the bad aspects of their systems. https://pixabay.com/static/uploads/photo/2013/07/13/10/32/bad-157437_960_720.png
  18. Why do this? It removes fear as an obstacle to innovation. It encourages people to take risks, which could lead to differentiation as a business. It creates good feedback loops that increase iterations, and good data that prevents 'retracing each other's steps'. It improves the working environment and relationships. https://pixabay.com/static/uploads/photo/2013/07/13/10/32/good-157436_960_720.png
  19. https://upload.wikimedia.org/wikipedia/commons/c/c8/Thank_you_001.jpg