The Upside of Downtime (Velocity 2010)

Your site will inevitably go down, and thanks to services like Twitter there's a good chance that all of your customers will find out about it. This presentation aims to help you turn that downtime into an opportunity to build trust with your users.

Comments
  • Excellent presentation
  • This was a terrific presentation. We are planning some downtime to change over our AMS and CMS. This will definitely assist us with preparing for it in advance. I loved the idea about a postmortem. People are always curious as to why you weren't available even if it didn't impact them. I also like the health dashboard idea. Now to get buy-in from the rest of the org. Thanks for the fabulous suggestions, Lenny!
  • Video of the talk: http://www.youtube.com/watch?v=6MF2Pu6IW3Q
  • Really thought that was good. And very practical.
  • Thanks!

Transcript

  • 1. http://gapingvoid.com/ Sunday, June 20, 2010
  • 2. The Upside of Downtime: Turning disaster into opportunity
  • 3. Who’s had a site go down?
  • 4. Who hasn’t had a site go down?
  • 5. There’s always that one guy!
  • 6-14. (image-only slides)
  • 15. Downtime sucks Source: http://www.motivatedphotos.com/?id=8080
  • 16-19. Why downtime sucks: Business; Brand; You; Users (slide 16 chart: Sales, $0 to $3,000, x-axis 0 to 22)
  • 20. Downtime = Bad! (Duh)
  • 21. Approach #1: Don’t fail
  • 22. Source: http://kansansforlife.files.wordpress.com/2009/12/titanic.jpg
  • 23-24. “Everything fails all the time” -- Werner Vogels (Amazon, CTO)
  • 25. Your site will fail Werner Vogels (Amazon, CTO)
  • 26. Why?!?
  • 27-32. Why Failure Happens: Risk Homeostasis; Black Swan; Unknown unknowns; Change; Many small failures; Humans
    Image sources, in slide order: http://joshuahind.files.wordpress.com/2009/09/bicycle-crash.jpg; Amazon.com; http://www.apoliticus.com/wp-content/uploads/2009/01/6_21_080306_rumsfeld.jpg; http://bozark.net/wordpress/wp-content/uploads/2008/09/barack_obama_change_fairey.jpg; http://www.biojobblog.com/uploads/image/dominos.jpg; http://www.librarian.net/talks/clc/CLC.key/SJ_Shoulder_Shrug.jpg
  • 33-34. (image-only slides)
  • 35-42. A chain of small failures, built up one step per slide: Polisher blocked (not unusual) -> Moisture leaks into air system (not expected) -> Flow of cold water stopped (not good) -> Backup disabled -> Indicator blocked -> Relief valve broken -> Gauge broken. The slide annotations escalate from “Doh!” and “Dammit” to “WTF” and finally “Meltdown”. Source: http://www.gladwell.com/1996/1996_01_22_a_blowup.htm
  • 43. (image-only slide)
  • 44. Source: http://support.rightscale.com/09-Clouds/AWS/02-Amazon_EC2/Designing_Failover_Architectures_on_EC2/03-Advanced_Failover_Architecture
  • 45. “accidental power failure” Source: http://www.datacenterknowledge.com/archives/2010/06/16/power-failure-kos-intuit-sites-for-24-hours/
  • 46. “traffic accident damaged a nearby utility transformer” Source: http://www.datacenterknowledge.com/archives/2007/11/13/truck-crash-knocks-rackspace-offline/
  • 47. “unfortunate code change” Source: http://www.datacenterknowledge.com/archives/2010/06/11/errant-code-change-crashes-10-million-blogs/
  • 48. (image-only slide)
  • 49. “Unhappy customers may get some attention, but unhappy networked customers can quickly impact your business” -- Clay Shirky Source: http://happenupon.files.wordpress.com/2009/02/technology-guru-clay-shir-001.jpg, http://scholarlykitchen.sspnet.org/2010/03/02/shirky-at-nfais-how-abundance-breaks-everything/
  • 50-55. (image-only slides)
  • 56. http://labs.webmetrics.com/crowdsourceduptime
  • 57-60. (image-only slides)
  • 61. Recap
  • 62-65. Your site will fail + Downtime is bad + Everyone will find out = Screw it, I’ll become a lumberjack Source: http://sbadrinath.files.wordpress.com/2009/03/different26rqcu3.jpg
  • 66. “Embrace fear of outages and degradation. Use it to guide your architecture, your code, your infrastructure. So lean into it.” -- John Allspaw, VP Tech. Ops at Etsy
  • 67. Approach #2: Prepare for downtime
  • 68. Disclaimer: Try hard to avoid downtime
  • 69. Learning by example...
  • 70. Case Study #1: Facebook
  • 71-76. (image-only slides)
  • 77. “The larger issue here isn't just that a portion of Facebook's platform has gone down - numerous web services have issues from time to time, including everything from Gmail to Twitter. An outage of this length, however, with no official communication from the company itself is disturbing.” -- N.Y. Times
  • 78. Facebook Downtime: Disturbing
  • 79. (image-only slide)
  • 80. Case Study #2: Google App Engine
  • 81-94. (image-only slides)
  • 95. Google App Engine Downtime: Kudos
  • 96. Case Study #3: Atlassian
  • 97-107. (image-only slides)
  • 108. Atlassian Downtime: Bravo
  • 109. http://atlassian.com/
  • 110. Downtime: Opportunity to Build Trust
  • 111. Downtime: Opportunity to Destroy Trust
  • 112. How To: Prepare for Downtime
  • 113. Something > Nothing
  • 114. Upside of Downtime Framework 1.0. Timeline: Life is good -> Oh crap -> That sucked
  • 115-118. Upside of Downtime Framework 1.0. Timeline: Prepare -> Communicate -> Explain
  • 119. Prepare / Communicate / Explain
  • 120. Prepare: 1. Communication channel
  • 121-122. Prepare: 1. Communication channel. Without one: Something is wrong -> Can’t tell if it’s me or you -> I’ll assume it’s you -> You suck. With one: Something is wrong -> I know it’s you -> Tell me when you’re back -> You suck a lot less.
  • 123-130. (image-only slides)
  • 131-133. Prepare: 1. Communication channel (Easy to find; Hosted off-site; Real-time / automated)
  • 134. 7 keys for public health dashboards
    1. Must show current status for each “service”
    2. Data must be accurate and timely
    3. Must be easy to find
    4. Must provide details for events in real time
    5. Provide historical uptime and performance data
    6. Provide a way to be notified of status changes
    7. Provide details on how the data is gathered
    Source: http://www.transparentuptime.com/2008/11/rules-for-successful-public-health.html
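As a rough illustration of keys 1, 4 and 5 (current status per service, event details, historical uptime), a dashboard's data model might look something like the sketch below. Every name here (`Status`, `Event`, `ServiceHealth`, `report`, `uptime_percent`) is hypothetical and not taken from the talk, and treating degraded time as lost uptime is an assumption made only to keep the example short.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from enum import Enum
from typing import List, Optional


class Status(Enum):
    UP = "up"
    DEGRADED = "degraded"
    DOWN = "down"


@dataclass
class Event:
    started_at: datetime
    details: str
    resolved_at: Optional[datetime] = None  # None while the event is still open


@dataclass
class ServiceHealth:
    name: str
    status: Status = Status.UP
    events: List[Event] = field(default_factory=list)

    def report(self, status: Status, details: str, now: datetime) -> None:
        # Close any open event, then open a new one unless the service is back up.
        for event in self.events:
            if event.resolved_at is None:
                event.resolved_at = now
        if status is not Status.UP:
            self.events.append(Event(started_at=now, details=details))
        self.status = status

    def uptime_percent(self, window_start: datetime, window_end: datetime) -> float:
        # Share of the window with no open event (assumption: degraded counts as impaired).
        window = (window_end - window_start).total_seconds()
        impaired = 0.0
        for event in self.events:
            start = max(event.started_at, window_start)
            end = min(event.resolved_at or window_end, window_end)
            if end > start:
                impaired += (end - start).total_seconds()
        return 100.0 * (1.0 - impaired / window)


day_start = datetime(2010, 6, 20, 0, 0, tzinfo=timezone.utc)
api = ServiceHealth("api")
api.report(Status.DOWN, "Database failover in progress", day_start + timedelta(hours=1))
api.report(Status.UP, "Recovered", day_start + timedelta(hours=2))
print(api.status.value, api.uptime_percent(day_start, day_start + timedelta(hours=10)))
```

Keys 3, 6 and 7 (findability, notifications, and documenting how the data is gathered) are about hosting and process rather than data, so they are left out of the sketch.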
  • 135-138. Prepare: 1. Communication channel (Easy to find; Hosted off-site; Real-time / automated). 2. Process (Authority; Mean-Time-To-Communicate (MTTC); On-call/drills/escalations/etc.)
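The slides name Mean-Time-To-Communicate without giving a formula. One plausible reading, assumed here, is the average delay between the start of an incident and the first public update about it. The function name and the input shape are hypothetical.

```python
from datetime import datetime, timedelta
from statistics import mean


def mean_time_to_communicate(incidents):
    """Average delay between incident start and first public update.

    `incidents` is an iterable of (started_at, first_update_at) datetime pairs.
    This definition of MTTC is an assumption; the talk does not spell one out.
    """
    delays = [(posted - started).total_seconds() for started, posted in incidents]
    if not delays:
        raise ValueError("need at least one incident")
    return timedelta(seconds=mean(delays))


history = [
    (datetime(2010, 6, 1, 9, 0), datetime(2010, 6, 1, 9, 5)),     # 5 minutes to first update
    (datetime(2010, 6, 11, 22, 30), datetime(2010, 6, 11, 22, 45)),  # 15 minutes to first update
]
print(mean_time_to_communicate(history))
```

Tracking this number alongside time-to-repair gives the process an explicit target to drill against.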
  • 139. Your servers
  • 140-147. Communicate: 1. Communicate (Use communication channel; MTTC; Who/what is affected; When the incident started; ETA; Update regularly). 2. Fix it!
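The checklist above maps naturally onto a small, fixed message template, so that nobody has to decide what to write in the middle of an incident. This is a minimal sketch under stated assumptions: the class name, field names, wording and the UTC time format are all hypothetical, not something the talk prescribes.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

TIME_FORMAT = "%Y-%m-%d %H:%M UTC"  # assumes all datetimes passed in are already UTC


@dataclass
class IncidentUpdate:
    affected: str                    # who/what is affected
    started_at: datetime             # when the incident started
    next_update_at: datetime         # commits the team to updating regularly
    eta: Optional[datetime] = None   # ETA to resolution, if one exists yet

    def render(self) -> str:
        eta = self.eta.strftime(TIME_FORMAT) if self.eta else "not known yet"
        return "\n".join([
            f"Affected: {self.affected}",
            f"Started: {self.started_at.strftime(TIME_FORMAT)}",
            f"Estimated resolution: {eta}",
            f"Next update by: {self.next_update_at.strftime(TIME_FORMAT)}",
        ])


update = IncidentUpdate(
    affected="Logins for all users",
    started_at=datetime(2010, 6, 20, 14, 5, tzinfo=timezone.utc),
    next_update_at=datetime(2010, 6, 20, 14, 50, tzinfo=timezone.utc),
)
print(update.render())
```

Saying "not known yet" when there is no ETA keeps the update honest while still giving readers a time to check back.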
  • 148. Phew, close one!
  • 149. Explain: 1. Postmortem
  • 150. Postmortem: Admit failure Source: http://en.blog.wordpress.com/2010/02/19/wp-com-downtime-summary/
  • 151. Postmortem: Sound like a human Source: http://www.bureauofcommunication.com/compose/apology
  • 152. “We apologize for any inconvenience this may have caused”
  • 153. Postmortem: Start time and end time Source: https://groups.google.com/group/google-appengine/browse_thread/thread/a7640a2743922dcf
  • 154. Postmortem: Who/what was impacted Source: http://techcrunch.com/2009/11/02/large-scale-downtime-at-rackspace-cloud/
  • 155. Postmortem: What went wrong Source: http://www.zendesk.com/2010/03/tuesday-double-whammy.html
  • 156-157. Postmortem: Lessons learned Source: http://graysky.org/2010/02/downtime-postmortem/
  • 158. “I was completely overwhelmed by the amount of positive feedback and support I received.”
  • 159. Explain: 1. Postmortem (Admit failure; Sound like a human; Start time and end time; Who/what was impacted; What went wrong; Lessons learned). 2. Improve for the future
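One way to make the postmortem checklist stick is a pre-publish check that flags missing sections. The sketch below assumes a draft is stored as a plain dictionary; the key names and the `missing_sections` helper are hypothetical. "Sound like a human" cannot be checked mechanically, so it is deliberately left to the author.

```python
# Hypothetical section keys, one per checklist item that can be checked for presence.
REQUIRED_SECTIONS = (
    "admit_failure",
    "start_time",
    "end_time",
    "impact",            # who/what was impacted
    "what_went_wrong",
    "lessons_learned",
    "improvements",      # what will change for the future
)


def missing_sections(postmortem):
    """Return the required sections that are absent or blank in a draft."""
    missing = []
    for key in REQUIRED_SECTIONS:
        value = postmortem.get(key)
        if value is None or not str(value).strip():
            missing.append(key)
    return missing


draft = {
    "admit_failure": "We took the site down for 40 minutes. That is on us.",
    "start_time": "2010-06-20 14:05 UTC",
    "end_time": "2010-06-20 14:45 UTC",
    "impact": "Logins failed for all users.",
    "what_went_wrong": "   ",  # blank sections count as missing
}
print(missing_sections(draft))
```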
  • 160. “Google is not just saying sorry, they are actually implementing serious changes which probably represents millions of dollars of development to help make sure this doesn't happen again.” Source: http://news.ycombinator.com/item?id=1168493
  • 161. Source: https://groups.google.com/group/google-appengine/browse_thread/thread/a7640a2743922dcf
  • 162. Be human
  • 163. Be authentic
  • 164. Be transparent
  • 165. Accept responsibility
  • 166. Learn and improve
  • 167. Trust
  • 168-170. Upside of Downtime Framework 1.0
    Prepare: 1. Communication channel (Easy to find; Off-site; Real-time). 2. Process (Give authority; M.T.T.C.; On-call/escalations).
    Communicate: 1. Communicate (Use channel; M.T.T.C.; Who/what affected; When started; ETA to resolution; Update regularly). 2. Fix it!
    Explain: 1. Post-mortem (Admit failure; Sound like a human; Start time and end time; Who/what was impacted; What went wrong; Lessons learned). 2. Learn and improve.
    Be Prepared + Be Transparent + Be Human = Trust
  • 171. Disclaimer: Don’t screw up too often
  • 172. (image-only slide)
  • 173-178. Downtime Prisoner’s Dilemma, filled in one cell per slide:
    Caught: Transparent = Big Win; Not Transparent = Big Loss
    Not Caught: Transparent = Win; Not Transparent = Win
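The point of the matrix is that transparency is never worse than hiding the problem and is strictly better when users notice. A short sketch can make that check explicit. The numeric scores are a hypothetical scale chosen only to order the slide's labels, and reading "caught" as "users notice the outage" is an assumption.

```python
# Hypothetical numeric scale; only the ordering Big Loss < Win < Big Win matters.
SCORE = {"Big Loss": -2, "Win": 1, "Big Win": 2}

# Outcome labels as they appear on the slide.
MATRIX = {
    ("transparent", "caught"): "Big Win",
    ("not transparent", "caught"): "Big Loss",
    ("transparent", "not caught"): "Win",
    ("not transparent", "not caught"): "Win",
}

SCENARIOS = ("caught", "not caught")


def weakly_dominates(choice, other):
    """True if `choice` is never worse than `other` and better in at least one scenario."""
    pairs = [
        (SCORE[MATRIX[(choice, s)]], SCORE[MATRIX[(other, s)]])
        for s in SCENARIOS
    ]
    never_worse = all(a >= b for a, b in pairs)
    sometimes_better = any(a > b for a, b in pairs)
    return never_worse and sometimes_better


print(weakly_dominates("transparent", "not transparent"))
```

Because both "not caught" cells are a plain Win, the dominance is weak rather than strict: hiding the problem only ties when nobody notices.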
  • 179. Benefits: Gain trust; Reduce churn, increase loyalty; Reduce support costs; Ability to control the message; Competitive advantage; More time to focus on the actual problem; Reduce stress
  • 180. Change != Easy
  • 181. Change != Impossible
  • 182-186. Keys to Adoption: Getting past a culture of “hide the problem”; Overriding commitment to want to improve; Available resources to improve; Pain; Buy-in
  • 187-196. Buy-in by team, built up one row per slide:
    Product Management. Default: Let’s wait for complaints. Reality: Proactiveness => Forgiveness.
    Support. Default: Too much work. Reality: More upfront, less when it matters.
    Engineering/Operations. Default: Don’t want to look bad. Reality: Opportunity to learn/improve.
    Sales/Marketing. Default: I don’t want my customers to know. Reality: They’ll find out, better from us.
  • 197. Source: http://delicious.com/lennysan/healthdashboard
  • 198. Simple as that!
  • 199. Your site will still fail!
  • 200. “The measure of a society is how well it transforms pain and suffering into something worthwhile.” -- Friedrich Nietzsche
  • 201. “The measure of a company is how well it transforms pain of downtime into something worthwhile.” -- Lenny Rachitsky Source: Original quote inspired by Friedrich Nietzsche
  • 202. Bare minimum: Register a Twitter account
  • 203. Thank You. Slides: http://bit.ly/upside-of-downtime Lenny Rachitsky @lennysan http://www.transparentuptime.com/ Webmetrics/Neustar @webmetrics http://www.webmetrics.com/
  • 204. Bonus
  • 205-206. (image-only slides)
  • 207-218. Upside of Downtime Framework 1.0 (the framework summary from slide 168, repeated on each slide). Slide 210 adds: "Unlikely that an accidental surface or subsurface oil spill would occur from the proposed activities" -- Exploration and environmental impact plan Source: http://en.wikipedia.org/wiki/Deepwater_Horizon_drilling_rig_explosion
  • 219. “Be not afraid of transparency; some are born transparent, some achieve transparency, and others have transparency thrust upon them.” -- Borrowed from William Shakespeare
  • 220. (image-only slide)
  • 221-229. Making change, built up one step per slide:
    1. Find the bright spots - (this presentation has a bunch)
    2. Script the critical moves - (framework)
    3. Point to the destination - (W.W.G.D.)
    4. Find the feeling - (how would you feel?)
    5. Shrink the change - (start small)
    6. Grow your people - (everyone is learning as they go)
    7. Tweak the environment - (create a simple process)
    8. Build habits - (build process organically)
    9. Rally the herd - (get buy-in, rest will follow)