Operating Human Systems:
MTBF v. MTTR
Aaron Aldrich (@crayzeigh) — (@elastic)
1
Mean Time Between Failures
Mean Time To Recovery
Aaron Aldrich - @crayzeigh 2
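A small calculation makes the two metrics in the title concrete. The Python below is a minimal sketch that assumes a hypothetical incident log of `(failed_at, recovered_at)` pairs measured in hours. The data and the helper names `mttr` and `mtbf` are illustrative and do not come from the talk.

```python
from statistics import mean

# Hypothetical incident log (illustrative data, not from the talk):
# (failed_at, recovered_at) in hours since observation began,
# sorted by failure time.
incidents = [(10.0, 12.0), (40.0, 41.0), (100.0, 103.0)]


def mttr(log):
    """Mean Time To Recovery: average length of each outage."""
    return mean(recovered - failed for failed, recovered in log)


def mtbf(log):
    """Mean Time Between Failures: average uptime from one recovery to
    the next failure. This is one common definition; some teams measure
    from failure start to failure start instead."""
    uptimes = [
        next_failed - prev_recovered
        for (_, prev_recovered), (next_failed, _) in zip(log, log[1:])
    ]
    return mean(uptimes)


print(f"MTTR: {mttr(incidents):.1f} h")  # (2 + 1 + 3) / 3 -> 2.0 h
print(f"MTBF: {mtbf(incidents):.1f} h")  # (28 + 59) / 2 -> 43.5 h
```

Read against the rest of the deck, MTBF rewards avoiding failure, while MTTR rewards getting good at recovering from it. The resilience goals listed later ("recover swiftly & smoothly from failures") lean toward the latter.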
complex systems
Aaron Aldrich - @crayzeigh 3
we design for reliability:
- stiff boundaries, layers, formalisms
- defense in depth
- redundancy
- interference protection
- assurance & accountability
Aaron Aldrich - @crayzeigh 4
we want resilience:
- withstand transients
- recover swiftly & smoothly from failures
- prioritize to serve high level goals
- recognize & respond to abnormal situations
- adapt to change
Aaron Aldrich - @crayzeigh 5
failure aversion leads to kludge and tech debt
relational debt is a real thing
Aaron Aldrich - @crayzeigh 6
unpaid debt leads to catastrophic failure
Aaron Aldrich - @crayzeigh 7
there is no root cause to failure in
{ complex systems | relationships }
Aaron Aldrich - @crayzeigh 8
blamelessness (just culture) is required for improvement
Aaron Aldrich - @crayzeigh 9
experience with failure is necessary
Aaron Aldrich - @crayzeigh 10
if we do not experience failure, we are not living up to our potential
Aaron Aldrich - @crayzeigh 11
lessons from networking:
TCP > UDP
avoid feelings of not being heard
Aaron Aldrich - @crayzeigh 12
"What you're saying is 'X',
what I'm hearing is 'Y'."
Aaron Aldrich - @crayzeigh 13
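The networking analogy can also be read as a protocol. TCP acknowledges the data it receives and retransmits anything left unacknowledged. UDP sends datagrams with no delivery guarantee. The read-back phrase plays the role of the acknowledgement. The Python below is a sketch of that analogy and nothing more. It models a simplified stop-and-wait loop rather than real TCP, and the "channel", its mishearing, and every function name are hypothetical.

```python
from typing import Callable

# A "channel" maps what was said to what was heard. Hypothetical stand-in
# for a conversation; nothing here is a real networking API.
Channel = Callable[[str], str]


def fire_and_forget(message: str, channel: Channel) -> None:
    """UDP-style: send once, never learn what the other side heard."""
    channel(message)


def send_with_readback(message: str, channel: Channel, max_tries: int = 3) -> int:
    """Stop-and-wait style: the listener reads back what they heard
    ("what I'm hearing is Y"); the speaker compares it with what they
    meant ("what you're saying is X") and resends on a mismatch.
    Returns the number of attempts needed to reach agreement."""
    for attempt in range(1, max_tries + 1):
        heard = channel(message)
        if heard == message:  # read-back matches: acknowledged
            return attempt
        # mismatch: no acknowledgement, so try again
    raise RuntimeError("no shared understanding after retries")


def make_flaky_channel(failures: int) -> Channel:
    """Hypothetical channel that mishears the first `failures` sends."""
    remaining = failures

    def channel(message: str) -> str:
        nonlocal remaining
        if remaining > 0:
            remaining -= 1
            return message.replace("soon", "never")  # misheard
        return message

    return channel


attempts = send_with_readback("I'll review your PR soon", make_flaky_channel(1))
print(attempts)  # 2: the first read-back exposed the misunderstanding
```

With `fire_and_forget`, the sender never finds out that "soon" arrived as "never". The read-back surfaces the mismatch on the first exchange.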
lessons from distributed systems:
translations are hard
Aaron Aldrich - @crayzeigh 14
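Text encoding is one concrete case of "translations are hard" in distributed systems. The example is our own choice and does not come from the talk. Two services can each do their part "correctly" and still lose meaning when they assume different encodings. The helper name `translate` in this minimal Python sketch is hypothetical.

```python
def translate(text: str, sender_codec: str, receiver_codec: str) -> str:
    """Serialize with the sender's encoding, deserialize with the receiver's."""
    return text.encode(sender_codec).decode(receiver_codec)


# Both sides agree on the encoding: the meaning survives the round trip.
print(translate("café", "utf-8", "utf-8"))    # café

# The receiver assumes Latin-1. No error is raised, because every byte is
# valid Latin-1, yet the message has quietly changed.
print(translate("café", "utf-8", "latin-1"))  # cafÃ©
```

The second call never fails loudly, and that makes it the interesting case. Each side believes the exchange worked, just as two people can each believe they understood one another.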
assume good intent
Aaron Aldrich - @crayzeigh 15
/zoom
Aaron Aldrich - @crayzeigh 16
tooling:
Non-Violent Communication
Observation, Feelings, Needs, Requests
Aaron Aldrich - @crayzeigh 17
NVC Framework
1. Observation: !== evaluation || judgement
2. Feelings: !== thinking, [SASHET]
3. Needs: connection, well-being, honesty, play, peace, autonomy, meaning
4. Requests: what we DO want, !== demand
Aaron Aldrich - @crayzeigh 18
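The four NVC steps fit together as one structured message. The Python below is a minimal sketch. It uses one common NVC phrasing ("When ..., I feel ... because I need .... Would you be willing to ...?"). The class name `NVCStatement` and its "thought, not a feeling" check are hypothetical illustrations. They are not a tool from the talk or from CNVC.

```python
from dataclasses import dataclass


@dataclass
class NVCStatement:
    observation: str  # what happened, stated without evaluation or judgement
    feeling: str      # an emotion, not a thought
    need: str         # the underlying need, e.g. connection or autonomy
    request: str      # a concrete action we DO want, not a demand

    def __post_init__(self) -> None:
        # Rough heuristic: "I feel that..." / "I feel like..." usually
        # introduces a thought or judgement rather than a feeling.
        if self.feeling.lower().startswith(("that ", "like ")):
            raise ValueError(f"{self.feeling!r} reads as a thought, not a feeling")

    def render(self) -> str:
        # One common NVC phrasing; real conversations need not follow it verbatim.
        return (
            f"When {self.observation}, I feel {self.feeling} "
            f"because I need {self.need}. "
            f"Would you be willing to {self.request}?"
        )


msg = NVCStatement(
    observation="the deploy went out without a review",
    feeling="anxious",
    need="safety",
    request="add me as a reviewer on the next deploy",
)
print(msg.render())
```

The heuristic only catches the most obvious "thinking dressed as feeling" phrasings. It illustrates the `!== thinking` rule and does not replace judgement in a real conversation.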
2: Richard Cook, "How Complex Systems Fail" (PDF) (http://bit.ly/2mKO8UL)
3-4: Richard Cook, "How Complex Systems Fail", Velocity 2012 (https://youtu.be/2S0k12uZR14)
9: Philip G. Boysen II, MD, MBA, FACP, FCCP, FCCM, "Just Culture: A Foundation for Balanced Accountability and Patient Safety" (http://bit.ly/2DgJM1Z)
13: Certified Fresh Events, "Oh No You Didn't: Conflict Management in Today's Tech Industry" (https://certifiedfreshevents.com/events/conflict-management/)
17: The Center for Non-Violent Communication (https://www.cnvc.org/)
thanks!
Aaron Aldrich - @crayzeigh 20

2018-01 DevOpsDays NYC: Operating Human Systems: MTBF v. MTTR
