Mistakes were made
                           Selena Deckelmann
                             @selenamarie
So
 O mS
     eCC
       O on
         N
            fer
             20
               en
                11
                e c
This goes out to all the sysadmins.
You can never think about failure too much.
Some goals around pessimism.
Plan for the worst.
Minimize risk.
Recover, gracefully.
Plan for the worst.
                         Minimize risk.
                      Recover, gracefully.
Tales of failure to...
                      Document
                      Test
                      Verify
                      Imagine
                      Implement
Failure to document.
Moving Day




                              Thanks, David Prior!
Prevent documentation failures.
                      • Write documentation.
                      • Update documentation.
                      • Make documenting a step in your written
                        process.
                      • Assign a fixed amount of time to that step.
Documentation tools

                      • Graphic designers. (Pretty wikis. Pretty
                        docs. (Sphinx?) Diagrams.)
                      • Timelines.
                      • Bug tracking.
                      • Ordered todo lists.
Failure to test.
“My first day posing as a sysadmin
                  (~1990, no previous training....) I
                  deleted all zero length files on a Sun
                  workstation.”
Prevent testing failures.

                      • Verify success criteria.
                      • Write tests.
                      • Test with a buddy.
                      • Have a plan.
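One way to apply "verify success criteria" and "have a plan" to the zero-length-files story is to rehearse a destructive cleanup as a dry run first. The following is a minimal sketch under assumptions: the function names, the `dry_run` flag, and the demo file names are hypothetical and are not from the talk.

```python
import os
import tempfile
from pathlib import Path


def find_zero_length_files(root):
    """Return regular files under root whose size is 0 bytes, sorted."""
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            # Skip symlinks; only plain files are candidates.
            if path.is_file() and not path.is_symlink() and path.stat().st_size == 0:
                matches.append(path)
    return sorted(matches)


def delete_zero_length_files(root, dry_run=True):
    """List candidates; delete them only when dry_run is False."""
    candidates = find_zero_length_files(root)
    for path in candidates:
        if dry_run:
            print(f"would delete: {path}")
        else:
            path.unlink()
    return candidates


def demo():
    """Rehearse in a throwaway directory before touching a real system."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "lockfile").touch()                # zero length, but may matter
        (root / "notes.txt").write_text("keep me")
        planned = delete_zero_length_files(root, dry_run=True)
        still_there = (root / "lockfile").exists()
        delete_zero_length_files(root, dry_run=False)
        gone = not (root / "lockfile").exists()
        kept = (root / "notes.txt").exists()
        return [p.name for p in planned], still_there, gone, kept
```

Reading the dry-run list with a buddy before passing `dry_run=False` gives someone a chance to notice that an empty file, such as a lock or flag file, is there on purpose.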
Testing tools

                      • Your favorite test framework
                      • Repeatable shell scripts
                      • Staging environments
Failure to verify.
“What does ‘-d’ actually do?”
Prevent verification failures.

                      • Have a plan for things going wrong.
                      • Have a staging environment.
                      • Test your rollback plan, not just your
                        implementation plan.
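Testing the rollback plan can be as simple as forcing a failure in a scratch directory and checking that the original state comes back. This is a rough sketch, not a prescribed method: the config file name, the `.bak` suffix, and the `has_port_setting` success criterion are all made up for illustration.

```python
import shutil
import tempfile
from pathlib import Path


def apply_change(config, new_text, verify):
    """Back up config, write new_text, and roll back if verify() fails.

    Returns True if the change was kept, False if it was rolled back.
    """
    backup = config.with_name(config.name + ".bak")
    shutil.copy2(config, backup)          # rollback material, taken first
    config.write_text(new_text)
    if verify(config):
        return True
    shutil.copy2(backup, config)          # rollback path
    return False


def has_port_setting(config):
    """Hypothetical success criterion: the file must define a port."""
    return any(line.startswith("port=") for line in config.read_text().splitlines())


def rehearse():
    """Exercise both the success path and the rollback path in a scratch dir."""
    with tempfile.TemporaryDirectory() as tmp:
        config = Path(tmp) / "service.conf"
        config.write_text("port=5432\n")

        kept = apply_change(config, "port=5433\n", has_port_setting)
        after_good = config.read_text()

        # Deliberately break the change to prove the rollback restores the file.
        rolled_back = not apply_change(config, "prot=5434\n", has_port_setting)
        after_bad = config.read_text()
        return kept, after_good, rolled_back, after_bad
```

The deliberately broken second change is the point of the rehearsal: the rollback path runs at least once before it is needed for real.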
Verification tools


                      • Staging environments
                      • Your buddy
Failure to imagine.
For my group the bottom line was "don't trust anyone".
                                   Thanks, Maggie!
Recover from failures to imagine.
                      • Share your stories of failure.
                      • Talk with people who are different from
                        you.
                      • Act out implementation scenarios.
Failure to implement.
Re-implement.


                      • Learn from mistakes.
Reflection.
                      (or, the Post-Mortem)
Before

                      • Document the plan with numbered steps
                        and a timeline.
                      • Test the plan and the rollback plan.
                      • Identify a “point of no return”.
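A plan with numbered steps, a timeline, and a marked "point of no return" could be kept as data and rendered on demand. The sketch below is one possible shape, assuming a `Step` record with a `reversible` flag. The step names, durations, and start time are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Step:
    description: str
    minutes: int
    reversible: bool = True   # False marks a step that cannot be rolled back


def build_timeline(steps, start):
    """Return (number, start_time, step) tuples with cumulative start times."""
    timeline = []
    current = start
    for number, step in enumerate(steps, start=1):
        timeline.append((number, current, step))
        current += timedelta(minutes=step.minutes)
    return timeline


def point_of_no_return(steps):
    """Return the 1-based number of the first irreversible step, or None."""
    for number, step in enumerate(steps, start=1):
        if not step.reversible:
            return number
    return None


def render(steps, start):
    """Format the plan as numbered lines, flagging the point of no return."""
    ponr = point_of_no_return(steps)
    lines = []
    for number, when, step in build_timeline(steps, start):
        flag = "  <-- point of no return" if number == ponr else ""
        lines.append(f"{number}. {when:%H:%M} ({step.minutes} min) {step.description}{flag}")
    return lines


# Hypothetical plan, for illustration only.
PLAN = [
    Step("Announce maintenance window", 5),
    Step("Take and verify backup", 30),
    Step("Drop old tables", 10, reversible=False),
    Step("Update documentation", 15),
]
```

Calling `render(PLAN, datetime(2011, 1, 1, 22, 0))` yields one line per step with its planned start time, which also gives the designated time-keeper something concrete to check against during the change.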
During

                      • Screen sharing: UNIX screen, VNC, etc.
                      • Chatroom: AIM, Campfire (scrollback!)
                      • Voice: Campfire, Skype, VoIP, POTS call line
                      • Headsets!
                      • Designated time-keeper.
After

                      • Documentation updates
                      • Post-mortems to identify areas of success
                        and areas for improvement.
                      • Limit improvements to 1-2 things.
Plan for the worst.
                         Minimize risk.
                      Recover, gracefully.
Thanks!
Photo credits


                      • Flickr: sheepguardingllama
