Mistakes were made
                             Selena Deckelmann
                         selena@primeradiant.com
                         Twitter/IRC: @selenamarie
LCA 2012
Failure
“Prevention”
         “Risk management”
          “Risk mitigation”
           “MTBF, MTTR”
        “Success Engineering”
Plan for the worst.
        Minimize risk.
        Fail.
        Recover, gracefully.
“We don’t need a risk
      management plan,” he
      emphatically stated, “because this
      project can’t be allowed to fail.”
                                                   - Jim Highsmith,
     http://jimhighsmith.com/2012/01/09/can-do-thinking-makes-risk-management-impossible/
Failure is an option.
SCIENCE
Dr. Jerker Denrell 
"I think getting two accidents
        of this type at the same time
            is a freak occurrence."
             -David Cunliffe, NZ Communications Minister
“Further damage was incurred
            on Tuesday afternoon and our
            engineers returned to repair
            the damage,” said Virgin Media.
Plan for when things fail.
Tales of failure to...
                      Document
                      Test
                      Verify
                      Imagine
                      Implement
Failure to document.
Moving Day




                    Thanks, David Prior!
Prevent documentation
                             failures.
                      • Write documentation.
                      • Update documentation.
                      • Make documenting a step in your written
                        process.
                      • Assign a fixed amount of time to that step.
Documentation tools

                      • Graphic designers. (Pretty wikis. Pretty
                        docs. (Sphinx?) Diagrams.)
                      • Timelines.
                      • Bug tracking.
                      • Ordered todo lists.
Failure to test.
“My first day posing as a sysadmin
        (~1990, no previous training....) I
        deleted all zero length files on a Sun
        workstation.”
Prevent testing failures.

                      • Verify success criteria.
                      • Write tests.
                      • Test with a buddy.
                      • Have a plan.
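
One way to apply these points to a destructive cleanup like the one in the story above: list first, delete second, and rehearse in a throwaway directory. This is a minimal sketch only; the function names and the `dry_run` flag are illustrative assumptions, not something from the talk.

```python
import os
import tempfile


def find_zero_length_files(root):
    """Return sorted paths of regular (non-symlink) files under root with size 0."""
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                if os.path.getsize(path) == 0:
                    matches.append(path)
    return sorted(matches)


def remove_files(paths, dry_run=True):
    """Delete paths only when dry_run is False; return the list either way."""
    for path in paths:
        if not dry_run:
            os.remove(path)
    return list(paths)


def demo():
    """Rehearse in a temporary directory (a stand-in for a staging environment)."""
    with tempfile.TemporaryDirectory() as root:
        empty = os.path.join(root, "lockfile")   # hypothetical zero-length file
        full = os.path.join(root, "notes.txt")   # hypothetical file with content
        open(empty, "w").close()
        with open(full, "w") as handle:
            handle.write("keep me")

        candidates = find_zero_length_files(root)
        remove_files(candidates)                  # dry run: nothing is deleted
        survived_dry_run = os.path.exists(empty)
        remove_files(candidates, dry_run=False)   # the real run
        return (candidates == [empty], survived_dry_run,
                os.path.exists(empty), os.path.exists(full))
```

Reading the dry-run list with a buddy before the real run is the point: a zero-length file may be a lock or marker file that something depends on.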
Testing tools

                      • Your favorite test framework
                      • Repeatable shell scripts
                      • Staging environments
Failure to verify.
“What does ‘-d’ actually do?”
Prevent verification
                              failures.

                      • Have a plan for things going wrong.
                      • Have a staging environment.
                      • Test your rollback plan, not just your
                        implementation plan.
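
A rollback rehearsal can be as small as the sketch below: apply the change to a copy, roll it back, and check that the result matches the starting state. The config keys and both functions are hypothetical placeholders for whatever the real change and rollback are.

```python
def apply_change(config):
    """Hypothetical implementation step: return a copy with the pool size raised."""
    updated = dict(config)
    updated["pool_size"] = 50
    return updated


def rollback_change(config, backup):
    """Hypothetical rollback step: discard the changed config, restore the backup."""
    return dict(backup)


def rehearse(config):
    """Run the change and then the rollback, verifying both did what was planned."""
    backup = dict(config)
    changed = apply_change(config)
    assert changed != backup, "implementation plan did nothing"
    restored = rollback_change(changed, backup)
    assert restored == backup, "rollback did not restore the original state"
    return restored
```

Run against staging, a rehearsal like this fails loudly when the rollback leaves anything behind, which is cheaper to learn there than in production.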
Verification tools


                      • Staging environments
                      • Your buddy
Failure to imagine.
For my group the
          bottom line was
        "don't trust anyone".

                     Thanks, Maggie!
Recover from failures
                          to imagine.
                      • Share your stories of failure.
                      • Talk with people who are different from
                        you.
                      • Act out implementation scenarios.
Failure to implement.
Re-implement.


                      • Learn from mistakes.
Reflection.
        (or, the Post-Mortem)
Before

                      • Plan to do a post-mortem.
                      • Document the plan with numbered steps
                        and a timeline.
                      • Test the plan and the rollback plan.
                      • Identify a “point of no return”.
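
The "numbered steps and a timeline" idea could be captured in something as simple as the sketch below. The step descriptions, durations, and start time are made up for illustration; the only claim is that each step gets a number, a planned start time, and a flag for whether rollback is still possible once it completes.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Step:
    description: str
    minutes: int                      # planned duration of the step
    point_of_no_return: bool = False  # True if this step cannot be undone


def build_timeline(steps, start):
    """Return (number, planned_start, description, rollback_possible_after) rows."""
    rows = []
    clock = start
    rollback_possible = True
    for number, step in enumerate(steps, start=1):
        if step.point_of_no_return:
            rollback_possible = False  # stays False for every later step
        rows.append((number, clock, step.description, rollback_possible))
        clock += timedelta(minutes=step.minutes)
    return rows


# Hypothetical plan, for illustration only.
EXAMPLE_STEPS = [
    Step("Announce maintenance window", 5),
    Step("Take and verify backup", 20),
    Step("Run migration", 30, point_of_no_return=True),
    Step("Verify and reopen", 15),
]
```

Printing the rows gives the time-keeper a schedule to call out, and makes the point of no return an explicit line in the plan instead of a judgment call made mid-change.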
During

                      • Screen sharing: UNIX screen, VNC, etc.
                      • Chatroom: IRC, AIM, Campfire (scrollback!)
                      • Voice: Campfire, Skype, VOIP, POTS call line
                      • Headsets!
                      • Designated time-keeper.
After

                      • Documentation updates
                      • Post-mortem to identify areas of success
                        and areas for improvement.
                      • Limit improvements to 1-2 things.
Plan for the worst.
        Minimize risk.
        Fail.
        Recover, gracefully.
Thanks!
Mistakes were made
                             Selena Deckelmann
                         selena@primeradiant.com
                         Twitter/IRC: @selenamarie
Photo credits


                      • Flickr: sheepguardingllama

Mistakes were made - LCA 2012