http://gapingvoid.com/
Sunday, June 20, 2010
The Upside of Downtime
         Turning disaster into opportunity




Who’s had a site go down?




Who hasn’t had a site go down?



There’s always that one guy!




Downtime sucks



Source: http://www.motivatedphotos.com/?id=8080
Why downtime sucks
               Business
               [Chart: Sales, vertical axis $0 to $3,000, horizontal axis 0 to 22]




Why downtime sucks
               Business
               Brand
               You
               Users




Downtime = Bad! (Duh)




Approach #1
                          Don’t fail



Source: http://kansansforlife.files.wordpress.com/2009/12/titanic.jpg
“Everything fails all the time”
                        -- Werner Vogels (Amazon, CTO)








Your site will fail
                        -- Werner Vogels (Amazon, CTO)
Why?!?




Why Failure Happens
                            Risk Homeostasis
                            Black Swan
                            Unknown unknowns
                            Change
                            Many small failures
                            Humans

Source (Risk Homeostasis): http://joshuahind.files.wordpress.com/2009/09/bicycle-crash.jpg
Source (Black Swan): Amazon.com
Source (Unknown unknowns): http://www.apoliticus.com/wp-content/uploads/2009/01/6_21_080306_rumsfeld.jpg
Source (Change): http://bozark.net/wordpress/wp-content/uploads/2008/09/barack_obama_change_fairey.jpg
Source (Many small failures): http://www.biojobblog.com/uploads/image/dominos.jpg
Source (Humans): http://www.librarian.net/talks/clc/CLC.key/SJ_Shoulder_Shrug.jpg
Polisher blocked                  Not unusual
Moisture leaks into air system    Not expected
Flow of cold water stopped        Not good
Backup disabled
Indicator blocked                 Doh!
Relief valve broken               Dammit
Gauge broken                      WTF
Meltdown

Source: http://www.gladwell.com/1996/1996_01_22_a_blowup.htm
Source: http://support.rightscale.com/09-Clouds/AWS/02-Amazon_EC2/Designing_Failover_Architectures_on_EC2/03-Advanced_Failover_Architecture
“accidental power failure”



Source: http://www.datacenterknowledge.com/archives/2010/06/16/power-failure-kos-intuit-sites-for-24-hours/
“traffic accident damaged a nearby utility transformer”
Source: http://www.datacenterknowledge.com/archives/2007/11/13/truck-crash-knocks-rackspace-offline/
“unfortunate code change”
Source: http://www.datacenterknowledge.com/archives/2010/06/11/errant-code-change-crashes-10-million-blogs/
“Unhappy customers may get some
             attention, but unhappy networked
             customers can quickly impact your
             business”
             -- Clay Shirky

Source: http://happenupon.files.wordpress.com/2009/02/technology-guru-clay-shir-001.jpg, http://scholarlykitchen.sspnet.org/2010/03/02/shirky-at-nfais-how-abundance-breaks-everything/
http://labs.webmetrics.com/crowdsourceduptime
Recap




Your site will fail
          +
          Downtime is bad
          +
          Everyone will find out
          =
          Screw it, I’ll become a lumberjack
Source: http://sbadrinath.files.wordpress.com/2009/03/different26rqcu3.jpg
“Embrace fear of outages and
               degradation. Use it to guide your
               architecture, your code, your
               infrastructure. So lean into it.”
                              -- John Allspaw, VP Tech. Ops at Etsy

Approach #2
                        Prepare for downtime



Disclaimer:
         Try hard to avoid downtime



Learning by example...




Case Study #1
                          Facebook



“The larger issue here isn't just that a portion of
         Facebook's platform has gone down - numerous web
         services have issues from time to time, including
         everything from Gmail to Twitter. An outage of this
         length, however, with no official communication
         from the company itself is disturbing.”
                                                     -- N.Y. Times




Facebook



         Downtime             Disturbing




Case Study #2
                        Google App Engine



Google App Engine



                        Downtime     Kudos




Case Study #3
                          Atlassian



Atlassian



                 Downtime           Bravo




http://atlassian.com/

Downtime:
         Opportunity to Build Trust



Downtime:
         Opportunity to Destroy Trust



How To:
         Prepare for Downtime



Something > Nothing




Upside of Downtime Framework 1.0




               Life is good     Oh crap     That sucked
         Time




Upside of Downtime Framework 1.0




                        Prepare   Communicate   Explain
         Time




Prepare   Communicate          Explain

         1. Communication channel


      Something is wrong  ->  Can’t tell if it’s me or you  ->  I’ll assume it’s you  ->  You suck

Prepare   Communicate          Explain

         1. Communication channel


      Something is wrong  ->  Can’t tell if it’s me or you  ->  I’ll assume it’s you

      I know it’s you  ->  Tell me when you’re back  ->  You suck a lot less


Prepare       Communicate   Explain

         1. Communication channel
                        Easy to find
                        Hosted off-site
                        Real-time / automated




7 keys for public health dashboards

          1. Must show current status for each “service”
          2. Data must be accurate and timely
          3. Must be easy to find
          4. Must provide details for events in real time
          5. Provide historical uptime and performance data
          6. Provide a way to be notified of status changes
          7. Provide details on how the data is gathered


 Source: http://www.transparentuptime.com/2008/11/rules-for-successful-public-health.html
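
A minimal sketch of how keys 1 and 5 might be backed by data, assuming each service keeps a list of non-overlapping outage events. The names Event, Service, current_status and uptime_percent are hypothetical and are not taken from the talk or from any dashboard product.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class Event:
    """One outage for a service (hypothetical record shape)."""
    started: datetime
    ended: Optional[datetime]  # None means the outage is still open
    detail: str


@dataclass
class Service:
    name: str
    events: List[Event] = field(default_factory=list)

    def current_status(self, now: datetime) -> str:
        """Key 1: current status, 'down' if any outage covers `now`."""
        for event in self.events:
            if event.started <= now and (event.ended is None or now < event.ended):
                return "down"
        return "up"

    def uptime_percent(self, window_start: datetime, now: datetime) -> float:
        """Key 5: historical uptime over the window [window_start, now].

        Assumes outages do not overlap; overlapping outages would be
        counted twice.
        """
        window = (now - window_start).total_seconds()
        down = 0.0
        for event in self.events:
            start = max(event.started, window_start)
            end = min(event.ended or now, now)
            if end > start:
                down += (end - start).total_seconds()
        return 100.0 * (1 - down / window)
```

The other keys (being easy to find, notifications, explaining how data is gathered) are about placement and process rather than data, so they are left out of the sketch.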

Prepare        Communicate        Explain

         1. Communication channel
                        Easy to find
                        Hosted off-site
                        Real-time / automated

         2. Process
                        Authority
                        Mean-Time-To-Communicate (MTTC)
                        On-call/drills/escalations/etc.
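
One way to track MTTC, sketched under an assumed definition: the average delay between the start of an incident and the first public update about it. The slides name the metric but do not define it, so the definition and the function name mean_time_to_communicate are assumptions.

```python
from datetime import datetime, timedelta
from typing import List, Tuple


def mean_time_to_communicate(
    incidents: List[Tuple[datetime, datetime]]
) -> timedelta:
    """Average delay between incident start and first public update.

    Each tuple is (incident_started, first_public_update). This pairing is
    an assumed definition of MTTC, not one given in the slides.
    """
    if not incidents:
        raise ValueError("need at least one incident")
    # Sum the per-incident delays, starting from a zero timedelta.
    total = sum((update - start for start, update in incidents), timedelta())
    return total / len(incidents)
```

Tracking this number per quarter would show whether drills and escalation changes are actually shortening the time to the first update.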
Your servers




Prepare      Communicate    Explain

         1. Communicate
                        Use communication channel
                        MTTC
                        Who/what is affected
                        When the incident started
                        ETA
                        Update regularly

         2. Fix it!
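
The items in the Communicate list can be sketched as a small message template. The function name format_incident_update, its parameters, and the choice to label times as UTC are all illustrative assumptions, not something prescribed by the talk.

```python
from datetime import datetime
from typing import Optional


def format_incident_update(
    affected: str,
    started: datetime,
    eta: Optional[datetime],
    next_update_minutes: int,
) -> str:
    """Build a short status update: who/what is affected, when it
    started, the ETA, and when the next update will come.

    Times are assumed to be UTC already; no conversion is done here.
    """
    eta_text = eta.strftime("%H:%M UTC") if eta else "unknown, still investigating"
    return (
        f"Affected: {affected}\n"
        f"Started: {started.strftime('%Y-%m-%d %H:%M UTC')}\n"
        f"ETA to resolution: {eta_text}\n"
        f"Next update in {next_update_minutes} minutes"
    )
```

Stating when the next update will arrive, even without an ETA, is one way to honor "Update regularly" while the fix is still unknown.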
Phew, close one!




Prepare   Communicate   Explain

          1. Postmortem
                        Admit failure
                        Sound like a human

Source (Admit failure): http://en.blog.wordpress.com/2010/02/19/wp-com-downtime-summary/
Source (Sound like a human): http://www.bureauofcommunication.com/compose/apology
Prepare   Communicate   Explain




                         “We apologize for any
                        inconvenience this may
                             have caused”


Prepare   Communicate   Explain

          1. Postmortem
                        Admit failure
                        Sound like a human
                        Start time and end time
                        Who/what was impacted
                        What went wrong
                        Lessons learned

Source (Start time and end time): https://groups.google.com/group/google-appengine/browse_thread/thread/a7640a2743922dcf
Source (Who/what was impacted): http://techcrunch.com/2009/11/02/large-scale-downtime-at-rackspace-cloud/
Source (What went wrong): http://www.zendesk.com/2010/03/tuesday-double-whammy.html
Source (Lessons learned): http://graysky.org/2010/02/downtime-postmortem/
Prepare   Communicate   Explain




                “I was completely overwhelmed by
                the amount of positive feedback and
                support I received.”
Prepare         Communicate   Explain

         1. Postmortem
                        Admit failure
                        Sound like a human
                        Start time and end time
                        Who/what was impacted
                        What went wrong
                        Lessons learned

          2. Improve for the future
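
The postmortem checklist above could be enforced with a simple completeness check before publishing. This is a sketch only: the Postmortem fields and the missing_sections helper are hypothetical names, and "Sound like a human" is deliberately absent because it is a judgment about tone that code cannot verify.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class Postmortem:
    """Draft postmortem; every field mirrors one item on the checklist."""
    admission: str = ""         # Admit failure
    start_time: str = ""        # Start time
    end_time: str = ""          # End time
    impact: str = ""            # Who/what was impacted
    what_went_wrong: str = ""   # What went wrong
    lessons_learned: str = ""   # Lessons learned
    improvements: str = ""      # Improve for the future


def missing_sections(pm: Postmortem) -> List[str]:
    """Return the names of checklist sections that are still empty."""
    return [f.name for f in fields(pm) if not getattr(pm, f.name).strip()]
```

A draft would be ready for a human read-through once missing_sections returns an empty list.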
Prepare                       Communicate   Explain




               “Google is not just saying sorry, they are
               actually implementing serious changes which
               probably represents millions of dollars of
               development to help make sure this doesn't
               happen again.”




Source: http://news.ycombinator.com/item?id=1168493

Prepare                                  Communicate                     Explain




Source: https://groups.google.com/group/google-appengine/browse_thread/thread/a7640a2743922dcf
Prepare   Communicate   Explain

                                  Be human
                                  Be authentic
                                  Be transparent
                                  Accept responsibility
                                  Learn and improve
                                  Trust

Upside of Downtime Framework 1.0

Prepare                       Communicate               Explain
1. Communication channel      1. Communicate            1. Post-mortem
   - Easy to find                - Use channel             - Admit failure
   - Off-site                    - M.T.T.C.                - Sound like a human
   - Real-time                   - Who/what affected       - Start time and end time
2. Process                       - When started            - Who/what was impacted
   - Give authority              - ETA to resolution       - What went wrong
   - M.T.T.C.                    - Update regularly        - Lessons learned
   - On-call/escalations      2. Fix it!                2. Learn and improve

Be Prepared + Be Transparent + Be Human = Trust

Disclaimer:
         Don’t screw up too often



Downtime Prisoner’s Dilemma

                    Transparent   Not Transparent
     Caught         Big Win       Big Loss
     Not Caught     Win           Win
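
The matrix above can be turned into a rough expected-value comparison. The numeric scores below are invented for illustration (the slide gives only labels), and p_caught is assumed to mean the probability that users notice the downtime on their own.

```python
# Illustrative scores only: the slide gives labels, not numbers.
PAYOFF = {
    ("transparent", "caught"): 2,           # Big Win
    ("transparent", "not caught"): 1,       # Win
    ("not transparent", "caught"): -2,      # Big Loss
    ("not transparent", "not caught"): 1,   # Win
}


def expected_payoff(strategy: str, p_caught: float) -> float:
    """Expected score of a strategy given the chance of being caught."""
    return (
        p_caught * PAYOFF[(strategy, "caught")]
        + (1 - p_caught) * PAYOFF[(strategy, "not caught")]
    )
```

Under these assumed scores, being transparent never does worse than hiding the problem, and the gap widens as the chance of being caught grows.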

Benefits
               Gain trust
               Reduce churn, increase loyalty
               Reduce support costs
               Ability to control the message
               Competitive advantage
               More time to focus on the actual problem
               Reduce stress


Change != Easy




Change != Impossible




Keys to Adoption
               Getting past a culture of “hide the problem”
               Overriding commitment to want to improve
               Available resources to improve
               Pain
               Buy-in




Product Management
          Default: Let’s wait for complaints
          Reality: Proactiveness => Forgiveness

Support
          Default: Too much work
          Reality: More upfront, less when it matters

Engineering/Operations
          Default: Don’t want to look bad
          Reality: Opportunity to learn/improve

Sales/Marketing
          Default: I don’t want my customers to know
          Reality: They’ll find out, better from us

Source: http://delicious.com/lennysan/healthdashboard

Simple as that!




Your site will still fail!




“The measure of a society is how
     well it transforms pain and suffering
     into something worthwhile.”
                           -- Friedrich Nietzsche

“The measure of a company is how
 well it transforms pain of downtime
 into something worthwhile.”
                        -- Lenny Rachitsky

Source: Original quote inspired by Friedrich Nietzsche

Bare minimum:
         Register a Twitter account



Thank You

             Slides: http://bit.ly/upside-of-downtime

             Lenny Rachitsky
             @lennysan
             http://www.transparentuptime.com/

                        Webmetrics/Neustar
                        @webmetrics
                        http://www.webmetrics.com/
Bonus




Upside of Downtime Framework 1.0

Prepare                     Communicate               Explain
1. Communication channel    1. Communicate            1. Post-mortem
   - Easy to find              - Use channel             - Admit failure
   - Off-site                  - M.T.T.C.                - Sound like a human
   - Real-time                 - Who/what affected       - Start time and end time
                               - When started            - Who/what was impacted
2. Process                     - ETA to resolution       - What went wrong
   - Give authority            - Update regularly        - Lessons learned
   - M.T.T.C.
   - On-call/escalations    2. Fix it!                2. Learn and improve
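
The "Communicate" column above can be sketched as a small data structure. This is only an illustration, not something from the talk: the names `StatusUpdate`, `time_to_first_update`, and `overdue`, the 30-minute update interval, and the assumption that all timestamps are UTC are hypothetical choices made for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class StatusUpdate:
    """One public update posted during an incident (hypothetical structure)."""
    posted_at: datetime          # when this update went out (assumed UTC)
    affected: str                # who/what is affected
    started_at: datetime         # when the problem started (assumed UTC)
    eta: Optional[str] = None    # ETA to resolution, if known

    def render(self) -> str:
        """Format the update as a short plain-text message."""
        eta = self.eta if self.eta else "unknown, next update to follow"
        return (
            f"[{self.posted_at:%H:%M} UTC] Affected: {self.affected}. "
            f"Started: {self.started_at:%H:%M} UTC. ETA: {eta}."
        )


def time_to_first_update(updates: List[StatusUpdate]) -> Optional[timedelta]:
    """Delay between the incident start and the first public update."""
    if not updates:
        return None
    first = min(updates, key=lambda u: u.posted_at)
    return first.posted_at - first.started_at


def overdue(updates: List[StatusUpdate], now: datetime,
            interval: timedelta = timedelta(minutes=30)) -> bool:
    """True if no update was posted within `interval` (an assumed cadence)."""
    if not updates:
        return True
    last = max(u.posted_at for u in updates)
    return now - last > interval
```

A team could call `overdue(...)` from whatever alerting it already has to remind the on-call person to "update regularly", and track `time_to_first_update(...)` across incidents to see how quickly users hear about a problem.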

"Unlikely that an accidental surface or subsurface
 oil spill would occur from the proposed activities"
                        -- Exploration and environmental impact plan


Source: http://en.wikipedia.org/wiki/Deepwater_Horizon_drilling_rig_explosion

“Be not afraid of transparency;
 some are born transparent,
 some achieve transparency,
 and others have transparency
 thrust upon them.”
                        -- Borrowed from William Shakespeare




Making change
         1. Find the bright spots - (this presentation has a bunch)
         2. Script the critical moves - (framework)
         3. Point to the destination - (W.W.G.D.)
         4. Find the feeling - (how would you feel?)
         5. Shrink the change - (start small)
         6. Grow your people - (everyone is learning as they go)
         7. Tweak the environment - (create a simple process)
         8. Build habits - (build process organically)
         9. Rally the herd - (get buy-in, rest will follow)

The Upside of Downtime (Velocity 2010)

  • 2. The Upside of Downtime Turning disaster into opportunity Sunday, June 20, 2010
  • 3. Who’s had a site go down? Sunday, June 20, 2010
  • 4. Who hasn’t had a site go down? Sunday, June 20, 2010
  • 5. There’s always that one guy! Sunday, June 20, 2010
  • 15. Downtime sucks Source: http://www.motivatedphotos.com/?id=8080 Sunday, June 20, 2010
  • 16. Why downtime sucks Business $3,000 $2,250 $1,500 Sales $750 $0 0 2 4 6 8 10 12 14 16 18 20 22 Sunday, June 20, 2010
  • 17. Why downtime sucks Business Brand Sunday, June 20, 2010
  • 18. Why downtime sucks Business Brand You Sunday, June 20, 2010
  • 19. Why downtime sucks Business Brand You Users Sunday, June 20, 2010
  • 20. Downtime = Bad! (Duh) Sunday, June 20, 2010
  • 21. Approach #1 Don’t fail Sunday, June 20, 2010
  • 23. “Everything fails all the time” -- Werner Vogels (Amazon, CTO) Sunday, June 20, 2010
  • 24. “Everything fails all the time” -- Werner Vogels (Amazon, CTO) Sunday, June 20, 2010
  • 25. Your site will fail Werner Vogels (Amazon, CTO) Sunday, June 20, 2010
  • 27. Why Failure Happens Risk Homeostasis Source: http://joshuahind.files.wordpress.com/2009/09/bicycle-crash.jpg Sunday, June 20, 2010
  • 28. Why Failure Happens Risk Homeostasis Black Swan Source: Amazon.com Sunday, June 20, 2010
  • 29. Why Failure Happens Risk Homeostasis Black Swan Unknown unknowns Source: http://www.apoliticus.com/wp-content/uploads/2009/01/6_21_080306_rumsfeld.jpg Sunday, June 20, 2010
  • 30. Why Failure Happens Risk Homeostasis Black Swan Unknown unknowns Change Source: http://bozark.net/wordpress/wp-content/uploads/2008/09/barack_obama_change_fairey.jpg Sunday, June 20, 2010
  • 31. Why Failure Happens Risk Homeostasis Black Swan Unknown unknowns Change Many small failures Source: http://www.biojobblog.com/uploads/image/dominos.jpg Sunday, June 20, 2010
  • 32. Why Failure Happens Risk Homeostasis Black Swan Unknown unknowns Change Many small failures Humans Source: http://www.librarian.net/talks/clc/CLC.key/SJ_Shoulder_Shrug.jpg Sunday, June 20, 2010
  • 35. Polisher blocked Not unusual Source: http://www.gladwell.com/1996/1996_01_22_a_blowup.htm Sunday, June 20, 2010
  • 36. Polisher Moisture leaks into blocked air system Not unusual Not expected Source: http://www.gladwell.com/1996/1996_01_22_a_blowup.htm Sunday, June 20, 2010
  • 37. Polisher Moisture leaks into Flow of cold water blocked air system stopped Not unusual Not expected Not good Source: http://www.gladwell.com/1996/1996_01_22_a_blowup.htm Sunday, June 20, 2010
  • 38. Polisher Moisture leaks into Flow of cold water blocked air system stopped Not unusual Not expected Backup disabled Source: http://www.gladwell.com/1996/1996_01_22_a_blowup.htm Sunday, June 20, 2010
  • 39. Polisher Moisture leaks into Flow of cold water blocked air system stopped Not unusual Not expected Backup disabled Doh! Indicator blocked Source: http://www.gladwell.com/1996/1996_01_22_a_blowup.htm Sunday, June 20, 2010
  • 40. Polisher Moisture leaks into Flow of cold water blocked air system stopped Not unusual Not expected Backup disabled Doh! Indicator blocked Dammit Relief valve broken Source: http://www.gladwell.com/1996/1996_01_22_a_blowup.htm Sunday, June 20, 2010
  • 41. Polisher Moisture leaks into Flow of cold water blocked air system stopped Not unusual Not expected Backup disabled Doh! Indicator blocked Dammit Relief valve broken WTF Gauge broken Source: http://www.gladwell.com/1996/1996_01_22_a_blowup.htm Sunday, June 20, 2010
  • 42. Polisher Moisture leaks into Flow of cold water blocked air system stopped Not unusual Not expected Backup disabled Doh! Indicator blocked Dammit Relief valve broken Meltdown Gauge broken Source: http://www.gladwell.com/1996/1996_01_22_a_blowup.htm Sunday, June 20, 2010
  • 45. “accidental power failure” Source: http://www.datacenterknowledge.com/archives/2010/06/16/power-failure-kos-intuit-sites-for-24-hours/ Sunday, June 20, 2010
  • 46. “traffic accident damaged a nearby utility transformer” Source: http://www.datacenterknowledge.com/archives/2007/11/13/truck-crash-knocks-rackspace-offline/ Sunday, June 20, 2010
  • 47. “unfortunate code change” Source: http://www.datacenterknowledge.com/archives/2010/06/11/errant-code-change-crashes-10-million-blogs/ Sunday, June 20, 2010
  • 49. “Unhappy customers may get some attention, but unhappy networked customers can quickly impact your business” -- Clay Shirky Source: http://happenupon.files.wordpress.com/2009/02/technology-guru-clay-shir-001.jpg, http://scholarlykitchen.sspnet.org/2010/03/02/shirky-at-nfais-how-abundance-breaks-everything/ Sunday, June 20, 2010
  • 62. Your site will fail Sunday, June 20, 2010
  • 63. Your site will fail + Downtime is bad Sunday, June 20, 2010
  • 64. Your site will fail + Downtime is bad + Everyone will find out Sunday, June 20, 2010
  • 65. Your site will fail + Downtime is bad + Everyone will find out = Screw it, I’ll become a lumberjack Source: http://sbadrinath.files.wordpress.com/2009/03/different26rqcu3.jpg Sunday, June 20, 2010
  • 66. “Embrace fear of outages and degradation. Use it to guide your architecture, your code, your infrastructure. So lean into it.” -- John Allspaw, VP Tech. Ops at Etsy Sunday, June 20, 2010
  • 67. Approach #2 Prepare for downtime Sunday, June 20, 2010
  • 68. Disclaimer: Try hard to avoid downtime Sunday, June 20, 2010
  • 70. Case Study #1 Facebook Sunday, June 20, 2010
  • 77. “The larger issue here isn't just that a portion of Facebook's platform has gone down - numerous web services have issues from time to time, including everything from Gmail to Twitter. An outage of this length, however, with no official communication from the company itself is disturbing.” -- N.Y. Times Sunday, June 20, 2010
  • 78. Facebook Downtime Disturbing Sunday, June 20, 2010
  • 80. Case Study #2 Google App Engine Sunday, June 20, 2010
  • 95. Google App Engine Downtime Kudos Sunday, June 20, 2010
  • 96. Case Study #3 Atlassian Sunday, June 20, 2010
  • 108. Atlassian Downtime Bravo Sunday, June 20, 2010
  • 110. Downtime: Opportunity to Build Trust Sunday, June 20, 2010
  • 111. Downtime: Opportunity to Destroy Trust Sunday, June 20, 2010
  • 112. How To: Prepare for Downtime Sunday, June 20, 2010
  • 113. Something > Nothing Sunday, June 20, 2010
  • 114. Upside of Downtime Framework 1.0: Life is good → Oh crap → That sucked (axis: Time) Sunday, June 20, 2010
  • 115. Upside of Downtime Framework 1.0: Prepare → Communicate → Explain (axis: Time) Sunday, June 20, 2010
  • 119. Prepare Communicate Explain Sunday, June 20, 2010
  • 120. Prepare Communicate Explain 1. Communication channel Sunday, June 20, 2010
  • 121. Prepare Communicate Explain 1. Communication channel: Something is wrong → Can’t tell if it’s me or you → I’ll assume it’s you → You suck Sunday, June 20, 2010
  • 122. Prepare Communicate Explain 1. Communication channel: Something is wrong → Can’t tell if it’s me or you → I’ll assume it’s you → You suck; I know it’s you → Tell me when you’re back → You suck a lot less Sunday, June 20, 2010
  • 131. Prepare Communicate Explain 1. Communication channel Easy to find Sunday, June 20, 2010
  • 132. Prepare Communicate Explain 1. Communication channel Easy to find Hosted off-site Sunday, June 20, 2010
  • 133. Prepare Communicate Explain 1. Communication channel Easy to find Hosted off-site Real-time / automated Sunday, June 20, 2010
  • 134. 7 keys for public health dashboards 1. Must show current status for each “service” 2. Data must be accurate and timely 3. Must be easy to find 4. Must provide details for events in real time 5. Provide historical uptime and performance data 6. Provide a way to be notified of status changes 7. Provide details on how the data is gathered Source: http://www.transparentuptime.com/2008/11/rules-for-successful-public-health.html Sunday, June 20, 2010
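Keys 1 and 5 in the list above (current status per service, historical uptime) can be illustrated with a small sketch. This is not from the talk: the `Outage` record, the function names, and the sample dates are hypothetical. The sketch assumes that outages for a given service do not overlap and that the reporting window has a positive length.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class Outage:
    """Hypothetical outage record for one service."""
    service: str
    start: datetime
    end: Optional[datetime] = None  # None means the outage is still ongoing


def current_status(service: str, outages: List[Outage], now: datetime) -> str:
    """Return "down" if any outage for the service covers `now`, else "up"."""
    for o in outages:
        if o.service == service and o.start <= now and (o.end is None or now < o.end):
            return "down"
    return "up"


def uptime_percent(service: str, outages: List[Outage],
                   window_start: datetime, window_end: datetime) -> float:
    """Percentage of the window during which the service was not in an outage.

    Assumes outages for a service do not overlap and window_end > window_start.
    """
    window = (window_end - window_start).total_seconds()
    down = 0.0
    for o in outages:
        if o.service != service:
            continue
        # Clip each outage to the reporting window before counting it.
        start = max(o.start, window_start)
        end = min(o.end if o.end is not None else window_end, window_end)
        if end > start:
            down += (end - start).total_seconds()
    return 100.0 * (window - down) / window


# Sample data (made up): one 24-hour outage of a service called "api".
outages = [Outage("api", datetime(2010, 6, 2), datetime(2010, 6, 3))]
print(current_status("api", outages, datetime(2010, 6, 2, 12)))
print(uptime_percent("api", outages, datetime(2010, 6, 1), datetime(2010, 6, 11)))
```

A real dashboard would feed these functions from monitoring data and, per key 7, document where that data comes from.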
  • 135. Prepare Communicate Explain 1. Communication channel Easy to find Hosted off-site Real-time / automated 2. Process Sunday, June 20, 2010
  • 136. Prepare Communicate Explain 1. Communication channel Easy to find Hosted off-site Real-time / automated 2. Process Authority Sunday, June 20, 2010
  • 137. Prepare Communicate Explain 1. Communication channel Easy to find Hosted off-site Real-time / automated 2. Process Authority Mean-Time-To-Communicate (MTTC) Sunday, June 20, 2010
  • 138. Prepare Communicate Explain 1. Communication channel Easy to find Hosted off-site Real-time / automated 2. Process Authority Mean-Time-To-Communicate (MTTC) On-call/drills/escalations/etc. Sunday, June 20, 2010
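The slides name Mean-Time-To-Communicate (MTTC) without defining how to compute it. One possible reading, assumed here, is the average delay between detecting an incident and posting the first public update. The function name and the sample timestamps below are hypothetical.

```python
from datetime import datetime, timedelta
from typing import List, Tuple


def mean_time_to_communicate(
    incidents: List[Tuple[datetime, datetime]]
) -> timedelta:
    """Average delay between detection and the first public update.

    Each incident is a (detected_at, first_public_update_at) pair.
    This definition of MTTC is an assumption, not taken from the slides.
    """
    if not incidents:
        raise ValueError("need at least one incident")
    delays = [posted - detected for detected, posted in incidents]
    # sum() needs a timedelta start value; timedelta / int is supported.
    return sum(delays, timedelta()) / len(delays)


# Made-up history: first updates went out after 5 and 15 minutes.
history = [
    (datetime(2010, 6, 1, 9, 0), datetime(2010, 6, 1, 9, 5)),
    (datetime(2010, 6, 8, 14, 0), datetime(2010, 6, 8, 14, 15)),
]
print(mean_time_to_communicate(history))
```

Tracking this number over drills and real incidents gives the "Process" step something concrete to improve.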
  • 140. Prepare Communicate Explain 1. Communicate Sunday, June 20, 2010
  • 141. Prepare Communicate Explain 1. Communicate Use communication channel Sunday, June 20, 2010
  • 142. Prepare Communicate Explain 1. Communicate Use communication channel MTTC Sunday, June 20, 2010
  • 143. Prepare Communicate Explain 1. Communicate Use communication channel MTTC Who/what is affected Sunday, June 20, 2010
  • 144. Prepare Communicate Explain 1. Communicate Use communication channel MTTC Who/what is affected When the incident started Sunday, June 20, 2010
  • 145. Prepare Communicate Explain 1. Communicate Use communication channel MTTC Who/what is affected When the incident started ETA Sunday, June 20, 2010
  • 146. Prepare Communicate Explain 1. Communicate Use communication channel MTTC Who/what is affected When the incident started ETA Update regularly Sunday, June 20, 2010
  • 147. Prepare Communicate Explain 1. Communicate Use communication channel MTTC Who/what is affected When the incident started ETA Update regularly 2. Fix it! Sunday, June 20, 2010
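The checklist for the Communicate phase (who/what is affected, when the incident started, ETA, regular updates) maps naturally onto a fixed message template. A minimal sketch, assuming all times are kept in UTC; the `IncidentUpdate` class and its field names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

TIME_FORMAT = "%Y-%m-%d %H:%M UTC"  # assumption: timestamps are stored in UTC


@dataclass
class IncidentUpdate:
    """Hypothetical status update covering the checklist items."""
    affected: str                # who/what is affected
    started_at: datetime         # when the incident started
    eta: Optional[datetime]      # None if no estimate is available yet
    next_update_in_minutes: int  # promise for the next regular update


def format_update(update: IncidentUpdate) -> str:
    """Render the update as plain text suitable for a status page post."""
    started = update.started_at.strftime(TIME_FORMAT)
    if update.eta is not None:
        eta = update.eta.strftime(TIME_FORMAT)
    else:
        eta = "unknown, still investigating"
    lines = [
        f"Affected: {update.affected}",
        f"Started: {started}",
        f"ETA: {eta}",
        f"Next update in {update.next_update_in_minutes} minutes",
    ]
    return "\n".join(lines)


print(format_update(IncidentUpdate(
    affected="Logins for all users",
    started_at=datetime(2010, 6, 20, 9, 30),
    eta=None,
    next_update_in_minutes=30,
)))
```

Saying "unknown, still investigating" when there is no ETA keeps the update honest, which fits the deck's "Something > Nothing" slide.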
  • 148. Phew, close one! Sunday, June 20, 2010
  • 149. Prepare Communicate Explain 1. Postmortem Sunday, June 20, 2010
  • 150. Prepare Communicate Explain 1. Postmortem Admit failure Source: http://en.blog.wordpress.com/2010/02/19/wp-com-downtime-summary/ Sunday, June 20, 2010
  • 151. Prepare Communicate Explain 1. Postmortem Admit failure Sound like a human Source: http://www.bureauofcommunication.com/compose/apology Sunday, June 20, 2010
  • 152. Prepare Communicate Explain “We apologize for any inconvenience this may have caused” Sunday, June 20, 2010
  • 153. Prepare Communicate Explain 1. Postmortem Admit failure Sound like a human Start time and end time Source: https://groups.google.com/group/google-appengine/browse_thread/thread/a7640a2743922dcf Sunday, June 20, 2010
  • 154. Prepare Communicate Explain 1. Postmortem Admit failure Sound like a human Start time and end time Who/what was impacted Source: http://techcrunch.com/2009/11/02/large-scale-downtime-at-rackspace-cloud/ Sunday, June 20, 2010
  • 155. Prepare Communicate Explain 1. Postmortem Admit failure Sound like a human Start time and end time Who/what was impacted What went wrong Source: http://www.zendesk.com/2010/03/tuesday-double-whammy.html Sunday, June 20, 2010
  • 156. Prepare Communicate Explain 1. Postmortem Admit failure Sound like a human Start time and end time Who/what was impacted What went wrong Lessons learned Source: http://graysky.org/2010/02/downtime-postmortem/ Sunday, June 20, 2010
  • 157. Prepare Communicate Explain 1. Postmortem Admit failure Sound like a human Start time and end time Who/what was impacted What went wrong Lessons learned Sunday, June 20, 2010
  • 158. Prepare Communicate Explain “I was completely overwhelmed by the amount of positive feedback and support I received.” Sunday, June 20, 2010
  • 159. Prepare Communicate Explain 1. Postmortem Admit failure Sound like a human Start time and end time Who/what was impacted What went wrong Lessons learned 2. Improve for the future Sunday, June 20, 2010
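The postmortem checklist above can be turned into a simple pre-publication check. The sketch below is illustrative only: the section keys and the `review_postmortem` helper are hypothetical, and "sound like a human" is approximated crudely by flagging the canned apology quoted on slide 152.

```python
from typing import Dict, List

# Section names are hypothetical keys mirroring the checklist on the slides.
REQUIRED_SECTIONS = [
    "admit_failure",
    "start_time",
    "end_time",
    "impact",            # who/what was impacted
    "what_went_wrong",
    "lessons_learned",
    "improvements",      # improve for the future
]

# Crude stand-in for "sound like a human": flag boilerplate apologies.
CANNED_PHRASES = ["we apologize for any inconvenience"]


def review_postmortem(sections: Dict[str, str]) -> List[str]:
    """Return a list of problems; an empty list means the draft passes."""
    problems = [
        f"missing: {name}"
        for name in REQUIRED_SECTIONS
        if not sections.get(name, "").strip()
    ]
    text = " ".join(sections.values()).lower()
    problems += [f"canned phrase: {p}" for p in CANNED_PHRASES if p in text]
    return problems


draft = {
    "admit_failure": "We apologize for any inconvenience this may have caused.",
    "start_time": "2010-06-20 09:30 UTC",
    "end_time": "2010-06-20 11:00 UTC",
    "impact": "Logins failed for all users.",
    "what_went_wrong": "",
}
for problem in review_postmortem(draft):
    print(problem)
```

A check like this cannot judge tone or honesty; it only catches omissions before a human reviews the text.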
  • 160. Prepare Communicate Explain “Google is not just saying sorry, they are actually implementing serious changes which probably represents millions of dollars of development to help make sure this doesn't happen again.” Source: http://news.ycombinator.com/item?id=1168493 Sunday, June 20, 2010
  • 161. Prepare Communicate Explain Source: https://groups.google.com/group/google-appengine/browse_thread/thread/a7640a2743922dcf Sunday, June 20, 2010
  • 162. Prepare Communicate Explain Be human Sunday, June 20, 2010
  • 163. Prepare Communicate Explain Be authentic Sunday, June 20, 2010
  • 164. Prepare Communicate Explain Be transparent Sunday, June 20, 2010
  • 165. Prepare Communicate Explain Accept responsibility Sunday, June 20, 2010
  • 166. Prepare Communicate Explain Learn and improve Sunday, June 20, 2010
  • 167. Prepare Communicate Explain Trust Sunday, June 20, 2010
  • 168. Upside of Downtime Framework 1.0. Prepare: 1. Communication channel (Easy to find; Off-site; Real-time), 2. Process (Give authority; M.T.T.C.; On-call/escalations). Communicate: 1. Communicate (Use channel; M.T.T.C.; Who/what affected; When started; ETA to resolution; Update regularly), 2. Fix it! Explain: 1. Post-mortem (Admit failure; Sound like a human; Start time and end time; Who/what was impacted; What went wrong; Lessons learned), 2. Learn and improve. Sunday, June 20, 2010
  • 169. Upside of Downtime Framework 1.0. Prepare: 1. Communication channel (Easy to find; Off-site; Real-time), 2. Process (Give authority; M.T.T.C.; On-call/escalations). Communicate: 1. Communicate (Use channel; M.T.T.C.; Who/what affected; When started; ETA to resolution; Update regularly), 2. Fix it! Explain: 1. Post-mortem (Admit failure; Sound like a human; Start time and end time; Who/what was impacted; What went wrong; Lessons learned), 2. Learn and improve. Be Prepared + Be Transparent + Be Human Sunday, June 20, 2010
  • 170. Upside of Downtime Framework 1.0. Prepare: 1. Communication channel (Easy to find; Off-site; Real-time), 2. Process (Give authority; M.T.T.C.; On-call/escalations). Communicate: 1. Communicate (Use channel; M.T.T.C.; Who/what affected; When started; ETA to resolution; Update regularly), 2. Fix it! Explain: 1. Post-mortem (Admit failure; Sound like a human; Start time and end time; Who/what was impacted; What went wrong; Lessons learned), 2. Learn and improve. Be Prepared + Be Transparent + Be Human = Trust Sunday, June 20, 2010
  • 171. Disclaimer: Don’t screw up too often Sunday, June 20, 2010
  • 173. Downtime Prisoner’s Dilemma Transparent Not Transparent Caught Not Caught Sunday, June 20, 2010
  • 174. Downtime Prisoner’s Dilemma Transparent Not Transparent Caught Not Caught Win Sunday, June 20, 2010
  • 175. Downtime Prisoner’s Dilemma Transparent Not Transparent Caught Big Loss Not Caught Win Sunday, June 20, 2010
  • 176. Downtime Prisoner’s Dilemma Transparent Not Transparent Caught Big Win Big Loss Not Caught Win Sunday, June 20, 2010
  • 177. Downtime Prisoner’s Dilemma. Caught: Transparent = Big Win, Not Transparent = Big Loss. Not Caught: Transparent = Win, Not Transparent = Win. Sunday, June 20, 2010
  • 178. Downtime Prisoner’s Dilemma. Caught: Transparent = Big Win, Not Transparent = Big Loss. Not Caught: Transparent = Win, Not Transparent = Win. Sunday, June 20, 2010
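The completed grid can be read as a payoff table: rows are whether the outage gets noticed (Caught / Not Caught), columns are the company's choice (Transparent / Not Transparent). The sketch below checks that being transparent is never worse and sometimes better. The numeric scores are hypothetical stand-ins for the slide's labels; only their ordering matters.

```python
# Hypothetical numeric scores for the labels used on the slide.
PAYOFF = {"Big Win": 2, "Win": 1, "Big Loss": -2}

# Outcome grid as read from the final build of the slide.
MATRIX = {
    "Caught": {"Transparent": "Big Win", "Not Transparent": "Big Loss"},
    "Not Caught": {"Transparent": "Win", "Not Transparent": "Win"},
}


def weakly_dominates(choice: str, other: str) -> bool:
    """True if `choice` is at least as good in every row and better in one."""
    diffs = [PAYOFF[row[choice]] - PAYOFF[row[other]] for row in MATRIX.values()]
    return all(d >= 0 for d in diffs) and any(d > 0 for d in diffs)


print(weakly_dominates("Transparent", "Not Transparent"))
print(weakly_dominates("Not Transparent", "Transparent"))
```

Under this reading, hiding an outage only ties with transparency when nobody notices, and loses badly when somebody does.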
  • 179. Benefits: Gain trust; Reduce churn, increase loyalty; Reduce support costs; Ability to control the message; Competitive advantage; More time to focus on the actual problem; Reduce stress Sunday, June 20, 2010
  • 180. Change != Easy Sunday, June 20, 2010
  • 182. Keys to Adoption Getting past a culture of “hide the problem” Sunday, June 20, 2010
  • 183. Keys to Adoption Getting past a culture of “hide the problem” Overriding commitment to want to improve Sunday, June 20, 2010
  • 184. Keys to Adoption Getting past a culture of “hide the problem” Overriding commitment to want to improve Available resources to improve Sunday, June 20, 2010
  • 185. Keys to Adoption Getting past a culture of “hide the problem” Overriding commitment to want to improve Available resources to improve Pain Sunday, June 20, 2010
  • 186. Keys to Adoption Getting past a culture of “hide the problem” Overriding commitment to want to improve Available resources to improve Pain Buy-in Sunday, June 20, 2010
  • 187. Product Management; Support; Engineering/Operations; Sales/Marketing Sunday, June 20, 2010
  • 188. Product Management (Default: Let’s wait for complaints); Support; Engineering/Operations; Sales/Marketing Sunday, June 20, 2010
  • 189. Product Management (Default: Let’s wait for complaints. Reality: Proactiveness => Forgiveness); Support; Engineering/Operations; Sales/Marketing Sunday, June 20, 2010
  • 190. Product Management (Default: Let’s wait for complaints. Reality: Proactiveness => Forgiveness); Support (Default: Too much work); Engineering/Operations; Sales/Marketing Sunday, June 20, 2010
  • 191. Product Management (Default: Let’s wait for complaints. Reality: Proactiveness => Forgiveness); Support (Default: Too much work. Reality: More upfront, less when it matters); Engineering/Operations; Sales/Marketing Sunday, June 20, 2010
  • 192. Product Management (Default: Let’s wait for complaints. Reality: Proactiveness => Forgiveness); Support (Default: Too much work. Reality: More upfront, less when it matters); Engineering/Operations (Default: Don’t want to look bad); Sales/Marketing Sunday, June 20, 2010
  • 193. Product Management (Default: Let’s wait for complaints. Reality: Proactiveness => Forgiveness); Support (Default: Too much work. Reality: More upfront, less when it matters); Engineering/Operations (Default: Don’t want to look bad. Reality: Opportunity to learn/improve); Sales/Marketing Sunday, June 20, 2010
  • 194. Product Management (Default: Let’s wait for complaints. Reality: Proactiveness => Forgiveness); Support (Default: Too much work. Reality: More upfront, less when it matters); Engineering/Operations (Default: Don’t want to look bad. Reality: Opportunity to learn/improve); Sales/Marketing (Default: I don’t want my customers to know) Sunday, June 20, 2010
  • 195. Product Management (Default: Let’s wait for complaints. Reality: Proactiveness => Forgiveness); Support (Default: Too much work. Reality: More upfront, less when it matters); Engineering/Operations (Default: Don’t want to look bad. Reality: Opportunity to learn/improve); Sales/Marketing (Default: I don’t want my customers to know. Reality: They’ll find out, better from us) Sunday, June 20, 2010
  • 196. Product Management (Default: Let’s wait for complaints. Reality: Proactiveness => Forgiveness); Support (Default: Too much work. Reality: More upfront, less when it matters); Engineering/Operations (Default: Don’t want to look bad. Reality: Opportunity to learn/improve); Sales/Marketing (Default: I don’t want my customers to know. Reality: They’ll find out, better from us) Sunday, June 20, 2010
  • 198. Simple as that! Sunday, June 20, 2010
  • 199. Your site will still fail! Sunday, June 20, 2010
  • 200. “The measure of a society is how well it transforms pain and suffering into something worthwhile.” -- Friedrich Nietzsche Sunday, June 20, 2010
  • 201. “The measure of a company is how well it transforms pain of downtime into something worthwhile.” -- Lenny Rachitsky Source: Original quote inspired by Friedrich Nietzsche Sunday, June 20, 2010
  • 202. Bare minimum: Register a Twitter account Sunday, June 20, 2010
  • 203. Thank You Slides: http://bit.ly/upside-of-downtime Lenny Rachitsky @lennysan http://www.transparentuptime.com/ Webmetrics/Neustar @webmetrics http://www.webmetrics.com/ Sunday, June 20, 2010
  • 210. Upside of Downtime Framework 1.0. Prepare: 1. Communication channel (Easy to find; Off-site; Real-time), 2. Process (Give authority; M.T.T.C.; On-call/escalations). Communicate: 1. Communicate (Use channel; M.T.T.C.; Who/what affected; When started; ETA to resolution; Update regularly), 2. Fix it! Explain: 1. Post-mortem (Admit failure; Sound like a human; Start time and end time; Who/what was impacted; What went wrong; Lessons learned), 2. Learn and improve. "Unlikely that an accidental surface or subsurface oil spill would occur from the proposed activities" -- Exploration and environmental impact plan Source: http://en.wikipedia.org/wiki/Deepwater_Horizon_drilling_rig_explosion Sunday, June 20, 2010
  • 219. “Be not afraid of transparency; some are born transparent, some achieve transparency, and others have transparency thrust upon them.” -- Borrowed from William Shakespeare Sunday, June 20, 2010
  • 221. Making change 1. Find the bright spots - (this presentation has a bunch) Sunday, June 20, 2010
  • 222. Making change 1. Find the bright spots - (this presentation has a bunch) 2. Script the critical moves - (framework) Sunday, June 20, 2010
  • 223. Making change 1. Find the bright spots - (this presentation has a bunch) 2. Script the critical moves - (framework) 3. Point to the destination - (W.W.G.D.) Sunday, June 20, 2010
  • 224. Making change 1. Find the bright spots - (this presentation has a bunch) 2. Script the critical moves - (framework) 3. Point to the destination - (W.W.G.D.) 4. Find the feeling - (how would you feel?) Sunday, June 20, 2010
  • 225. Making change 1. Find the bright spots - (this presentation has a bunch) 2. Script the critical moves - (framework) 3. Point to the destination - (W.W.G.D.) 4. Find the feeling - (how would you feel?) 5. Shrink the change - (start small) Sunday, June 20, 2010
  • 226. Making change 1. Find the bright spots - (this presentation has a bunch) 2. Script the critical moves - (framework) 3. Point to the destination - (W.W.G.D.) 4. Find the feeling - (how would you feel?) 5. Shrink the change - (start small) 6. Grow your people - (everyone is learning as they go) Sunday, June 20, 2010
  • 227. Making change 1. Find the bright spots - (this presentation has a bunch) 2. Script the critical moves - (framework) 3. Point to the destination - (W.W.G.D.) 4. Find the feeling - (how would you feel?) 5. Shrink the change - (start small) 6. Grow your people - (everyone is learning as they go) 7. Tweak the environment - (create a simple process) Sunday, June 20, 2010
  • 228. Making change 1. Find the bright spots - (this presentation has a bunch) 2. Script the critical moves - (framework) 3. Point to the destination - (W.W.G.D.) 4. Find the feeling - (how would you feel?) 5. Shrink the change - (start small) 6. Grow your people - (everyone is learning as they go) 7. Tweak the environment - (create a simple process) 8. Build habits - (build process organically) Sunday, June 20, 2010
  • 229. Making change 1. Find the bright spots - (this presentation has a bunch) 2. Script the critical moves - (framework) 3. Point to the destination - (W.W.G.D.) 4. Find the feeling - (how would you feel?) 5. Shrink the change - (start small) 6. Grow your people - (everyone is learning as they go) 7. Tweak the environment - (create a simple process) 8. Build habits - (build process organically) 9. Rally the herd - (get buy-in, rest will follow) Sunday, June 20, 2010