Upside of Downtime - Lenny Rachitsky

  1. 1. http://gapingvoid.com/ Sunday, June 20, 2010
  2. 2. The Upside of Downtime: Turning disaster into opportunity Sunday, June 20, 2010
  3. 3. Who’s had a site go down? Sunday, June 20, 2010
  4. 4. Who hasn’t had a site go down? Sunday, June 20, 2010
  5. 5. There’s always that one guy! Sunday, June 20, 2010
  15. 15. Downtime sucks Source: http://www.motivatedphotos.com/?id=8080 Sunday, June 20, 2010
  16. 16. Why downtime sucks Business [Chart: Sales, y-axis $0 to $3,000, x-axis 0 to 22] Sunday, June 20, 2010
  17. 17. Why downtime sucks Business Brand Sunday, June 20, 2010
  18. 18. Why downtime sucks Business Brand You Sunday, June 20, 2010
  19. 19. Why downtime sucks Business Brand You Users Sunday, June 20, 2010
  20. 20. Downtime = Bad! (Duh) Sunday, June 20, 2010
  21. 21. Approach #1 Don’t fail Sunday, June 20, 2010
  22. 22. Source: http://kansansforlife.files.wordpress.com/2009/12/titanic.jpg Sunday, June 20, 2010
  23. 23. “Everything fails all the time” -- Werner Vogels (Amazon, CTO) Sunday, June 20, 2010
  25. 25. Your site will fail Werner Vogels (Amazon, CTO) Sunday, June 20, 2010
  26. 26. Why?!? Sunday, June 20, 2010
  27. 27. Risk Homeostasis Why Failure Happens Source: http://joshuahind.files.wordpress.com/2009/09/bicycle-crash.jpg Sunday, June 20, 2010
  28. 28. Risk Homeostasis Black Swan Why Failure Happens Source: Amazon.com Sunday, June 20, 2010
  29. 29. Risk Homeostasis Black Swan Unknown unknowns Why Failure Happens Source: http://www.apoliticus.com/wp-content/uploads/2009/01/6_21_080306_rumsfeld.jpg Sunday, June 20, 2010
  30. 30. Risk Homeostasis Black Swan Unknown unknowns Change Why Failure Happens Source: http://bozark.net/wordpress/wp-content/uploads/2008/09/barack_obama_change_fairey.jpg Sunday, June 20, 2010
  31. 31. Risk Homeostasis Black Swan Unknown unknowns Change Many small failures Why Failure Happens Source: http://www.biojobblog.com/uploads/image/dominos.jpg Sunday, June 20, 2010
  32. 32. Risk Homeostasis Black Swan Unknown unknowns Change Many small failures Humans Why Failure Happens Source: http://www.librarian.net/talks/clc/CLC.key/SJ_Shoulder_Shrug.jpg Sunday, June 20, 2010
  35. 35. Not unusual Polisher blocked Source: http://www.gladwell.com/1996/1996_01_22_a_blowup.htm Sunday, June 20, 2010
  36. 36. Not unusual Not expected Polisher blocked Moisture leaks into air system Source: http://www.gladwell.com/1996/1996_01_22_a_blowup.htm Sunday, June 20, 2010
  37. 37. Not unusual Polisher blocked Moisture leaks into air system Flow of cold water stopped Not expected Not good Source: http://www.gladwell.com/1996/1996_01_22_a_blowup.htm Sunday, June 20, 2010
  38. 38. Not unusual Polisher blocked Moisture leaks into air system Flow of cold water stopped Not expected Backup disabled Source: http://www.gladwell.com/1996/1996_01_22_a_blowup.htm Sunday, June 20, 2010
  39. 39. Not unusual Polisher blocked Moisture leaks into air system Flow of cold water stopped Not expected Backup disabled Indicator blocked Doh! Source: http://www.gladwell.com/1996/1996_01_22_a_blowup.htm Sunday, June 20, 2010
  40. 40. Not unusual Polisher blocked Moisture leaks into air system Flow of cold water stopped Not expected Backup disabled Indicator blocked Relief valve broken Doh! Dammit Source: http://www.gladwell.com/1996/1996_01_22_a_blowup.htm Sunday, June 20, 2010
  41. 41. Not unusual Polisher blocked Moisture leaks into air system Flow of cold water stopped Not expected Backup disabled Indicator blocked Relief valve broken Gauge broken Doh! Dammit WTF Source: http://www.gladwell.com/1996/1996_01_22_a_blowup.htm Sunday, June 20, 2010
  42. 42. Not unusual Polisher blocked Moisture leaks into air system Flow of cold water stopped Meltdown Not expected Backup disabled Indicator blocked Relief valve broken Gauge broken Doh! Dammit Source: http://www.gladwell.com/1996/1996_01_22_a_blowup.htm Sunday, June 20, 2010
  44. 44. Source: http://support.rightscale.com/09-Clouds/AWS/02-Amazon_EC2/Designing_Failover_Architectures_on_EC2/03-Advanced_Failover_Architecture Sunday, June 20, 2010
  45. 45. “accidental power failure” Source: http://www.datacenterknowledge.com/archives/2010/06/16/power-failure-kos-intuit-sites-for-24-hours/ Sunday, June 20, 2010
  46. 46. “traffic accident damaged a nearby utility transformer” Source: http://www.datacenterknowledge.com/archives/2007/11/13/truck-crash-knocks-rackspace-offline/ Sunday, June 20, 2010
  47. 47. “unfortunate code change” Source: http://www.datacenterknowledge.com/archives/2010/06/11/errant-code-change-crashes-10-million-blogs/ Sunday, June 20, 2010
  49. 49. “Unhappy customers may get some attention, but unhappy networked customers can quickly impact your business” -- Clay Shirky Source: http://happenupon.files.wordpress.com/2009/02/technology-guru-clay-shir-001.jpg, http://scholarlykitchen.sspnet.org/2010/03/02/shirky-at-nfais-how-abundance-breaks-everything/ Sunday, June 20, 2010
  56. 56. http://labs.webmetrics.com/crowdsourceduptime Sunday, June 20, 2010
  61. 61. Recap Sunday, June 20, 2010
  62. 62. Your site will fail Sunday, June 20, 2010
  63. 63. Your site will fail + Downtime is bad Sunday, June 20, 2010
  64. 64. Your site will fail + Downtime is bad + Everyone will find out Sunday, June 20, 2010
  65. 65. Your site will fail + Downtime is bad + Everyone will find out = Screw it, I’ll become a lumberjack Source: http://sbadrinath.files.wordpress.com/2009/03/different26rqcu3.jpg Sunday, June 20, 2010
  66. 66. “Embrace fear of outages and degradation. Use it to guide your architecture, your code, your infrastructure. So lean into it.” -- John Allspaw, VP Tech. Ops at Etsy Sunday, June 20, 2010
  67. 67. Approach #2 Prepare for downtime Sunday, June 20, 2010
  68. 68. Disclaimer: Try hard to avoid downtime Sunday, June 20, 2010
  69. 69. Learning by example... Sunday, June 20, 2010
  70. 70. Case Study #1 Facebook Sunday, June 20, 2010
  77. 77. “The larger issue here isn't just that a portion of Facebook's platform has gone down - numerous web services have issues from time to time, including everything from Gmail to Twitter. An outage of this length, however, with no official communication from the company itself is disturbing.” -- N.Y. Times Sunday, June 20, 2010
  78. 78. Downtime Disturbing Facebook Sunday, June 20, 2010
  80. 80. Case Study #2 Google App Engine Sunday, June 20, 2010
  95. 95. Downtime Kudos Google App Engine Sunday, June 20, 2010
  96. 96. Case Study #3 Atlassian Sunday, June 20, 2010
  108. 108. Downtime Atlassian Bravo Sunday, June 20, 2010
  109. 109. http://atlassian.com/ Sunday, June 20, 2010
  110. 110. Downtime: Opportunity to Build Trust Sunday, June 20, 2010
  111. 111. Downtime: Opportunity to Destroy Trust Sunday, June 20, 2010
  112. 112. How To: Prepare for Downtime Sunday, June 20, 2010
  113. 113. Something > Nothing Sunday, June 20, 2010
  114. 114. Upside of Downtime Framework 1.0 Time: Life is good / Oh crap / That sucked Sunday, June 20, 2010
  115. 115. Upside of Downtime Framework 1.0 Time: Prepare / Communicate / Explain Sunday, June 20, 2010
  119. 119. Prepare / Communicate / Explain Sunday, June 20, 2010
  120. 120. Prepare / Communicate / Explain 1. Communication channel Sunday, June 20, 2010
  121. 121. Prepare / Communicate / Explain 1. Communication channel Something is wrong Can’t tell if it’s me or you I’ll assume it’s you You suck Sunday, June 20, 2010
  122. 122. Prepare / Communicate / Explain 1. Communication channel Something is wrong Can’t tell if it’s me or you I’ll assume it’s you I know it’s you Tell me when you’re back You suck a lot less Sunday, June 20, 2010
  131. 131. Prepare / Communicate / Explain 1. Communication channel Easy to find Sunday, June 20, 2010
  132. 132. Prepare / Communicate / Explain 1. Communication channel Easy to find Hosted off-site Sunday, June 20, 2010
  133. 133. Prepare / Communicate / Explain 1. Communication channel Easy to find Hosted off-site Real-time / automated Sunday, June 20, 2010
  134. 134. 7 keys for public health dashboards 1. Must show current status for each “service” 2. Data must be accurate and timely 3. Must be easy to find 4. Must provide details for events in real time 5. Provide historical uptime and performance data 6. Provide a way to be notified of status changes 7. Provide details on how the data is gathered Source: http://www.transparentuptime.com/2008/11/rules-for-successful-public-health.html Sunday, June 20, 2010
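A minimal sketch of the first, second and fourth keys (current status per service, timely data, event details), assuming a hypothetical `ServiceStatus` record and a hypothetical five-minute freshness threshold. Neither the names nor the state values come from the talk.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List


@dataclass
class ServiceStatus:
    """One row of a public health dashboard (hypothetical structure)."""
    name: str
    state: str            # assumed values: "up", "degraded", "down"
    checked_at: datetime  # when this status was last measured
    detail: str = ""      # optional real-time detail about an event


def render_dashboard(services: List[ServiceStatus], now: datetime,
                     max_age_s: int = 300) -> List[str]:
    """Return one text line per service, flagging data older than max_age_s."""
    lines = []
    for s in services:
        age = (now - s.checked_at).total_seconds()
        stale = " (stale data)" if age > max_age_s else ""
        line = f"{s.name}: {s.state}{stale}"
        if s.detail:
            line += f" - {s.detail}"
        lines.append(line)
    return lines
```

Flagging stale readings is one way to keep the dashboard honest when the monitoring itself has stopped reporting.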
  135. 135. Prepare / Communicate / Explain 1. Communication channel Easy to find Hosted off-site Real-time / automated 2. Process Sunday, June 20, 2010
  136. 136. Prepare / Communicate / Explain 1. Communication channel Easy to find Hosted off-site Real-time / automated 2. Process Authority Sunday, June 20, 2010
  137. 137. Prepare / Communicate / Explain 1. Communication channel Easy to find Hosted off-site Real-time / automated 2. Process Authority Mean-Time-To-Communicate (MTTC) Sunday, June 20, 2010
  138. 138. Prepare / Communicate / Explain 1. Communication channel Easy to find Hosted off-site Real-time / automated 2. Process Authority Mean-Time-To-Communicate (MTTC) On-call/drills/escalations/etc. Sunday, June 20, 2010
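The slides name Mean-Time-To-Communicate (MTTC) without defining a formula. A rough sketch, assuming MTTC means the average delay between an incident starting and the first public update about it; the function name and input shape are hypothetical.

```python
from datetime import datetime, timedelta
from typing import List, Tuple


def mean_time_to_communicate(
    incidents: List[Tuple[datetime, datetime]]
) -> timedelta:
    """Average of (first_public_update - incident_start) over all incidents.

    Each tuple is (incident_start, first_public_update). This definition
    is an assumption; the talk only names the metric.
    """
    if not incidents:
        raise ValueError("need at least one incident")
    total = sum(
        (first_update - started for started, first_update in incidents),
        timedelta(),
    )
    return total / len(incidents)
```

Tracking this number across incidents and drills gives the process something concrete to improve.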
  139. 139. Your servers Sunday, June 20, 2010
  140. 140. Prepare / Communicate / Explain 1. Communicate Sunday, June 20, 2010
  141. 141. Prepare / Communicate / Explain 1. Communicate Use communication channel Sunday, June 20, 2010
  142. 142. Prepare / Communicate / Explain 1. Communicate Use communication channel MTTC Sunday, June 20, 2010
  143. 143. Prepare / Communicate / Explain 1. Communicate Use communication channel MTTC Who/what is affected Sunday, June 20, 2010
  144. 144. Prepare / Communicate / Explain 1. Communicate Use communication channel MTTC Who/what is affected When the incident started Sunday, June 20, 2010
  145. 145. Prepare / Communicate / Explain 1. Communicate Use communication channel MTTC Who/what is affected When the incident started ETA Sunday, June 20, 2010
  146. 146. Prepare / Communicate / Explain 1. Communicate Use communication channel MTTC Who/what is affected When the incident started ETA Update regularly Sunday, June 20, 2010
  147. 147. Prepare / Communicate / Explain 1. Communicate Use communication channel MTTC Who/what is affected When the incident started ETA Update regularly 2. Fix it! Sunday, June 20, 2010
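The items an incident update should carry (who/what is affected, when it started, ETA, a promise of regular updates) can be sketched as a small message template. The `IncidentUpdate` type and the wording of the message are illustrative assumptions, not something shown in the talk.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class IncidentUpdate:
    """Fields from the Communicate checklist (hypothetical structure)."""
    affected: str              # who/what is affected
    started_at: datetime       # when the incident started
    eta: Optional[datetime]    # expected resolution time, if known
    next_update_minutes: int   # commitment to update regularly


def format_update(u: IncidentUpdate) -> str:
    """Render a plain-text status message for the communication channel."""
    eta = u.eta.strftime("%H:%M UTC") if u.eta else "unknown, still investigating"
    return (
        f"Affected: {u.affected}\n"
        f"Started: {u.started_at.strftime('%Y-%m-%d %H:%M UTC')}\n"
        f"ETA to resolution: {eta}\n"
        f"Next update in {u.next_update_minutes} minutes"
    )
```

Saying the ETA is unknown is still an update; the point of the checklist is that silence is the worst option.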
  148. 148. Phew, close one! Sunday, June 20, 2010
  149. 149. Prepare / Communicate / Explain 1. Postmortem Sunday, June 20, 2010
  150. 150. Prepare / Communicate / Explain 1. Postmortem Admit failure Source: http://en.blog.wordpress.com/2010/02/19/wp-com-downtime-summary/ Sunday, June 20, 2010
  151. 151. Prepare / Communicate / Explain 1. Postmortem Admit failure Sound like a human Source: http://www.bureauofcommunication.com/compose/apology Sunday, June 20, 2010
  152. 152. Prepare / Communicate / Explain “We apologize for any inconvenience this may have caused” Sunday, June 20, 2010
  153. 153. Prepare / Communicate / Explain 1. Postmortem Admit failure Sound like a human Start time and end time Source: https://groups.google.com/group/google-appengine/browse_thread/thread/a7640a2743922dcf Sunday, June 20, 2010
  154. 154. Prepare / Communicate / Explain 1. Postmortem Admit failure Sound like a human Start time and end time Who/what was impacted Source: http://techcrunch.com/2009/11/02/large-scale-downtime-at-rackspace-cloud/ Sunday, June 20, 2010
  155. 155. Prepare / Communicate / Explain 1. Postmortem Admit failure Sound like a human Start time and end time Who/what was impacted What went wrong Source: http://www.zendesk.com/2010/03/tuesday-double-whammy.html Sunday, June 20, 2010
  156. 156. Prepare / Communicate / Explain 1. Postmortem Admit failure Sound like a human Start time and end time Who/what was impacted What went wrong Lessons learned Source: http://graysky.org/2010/02/downtime-postmortem/ Sunday, June 20, 2010
  157. 157. Prepare / Communicate / Explain 1. Postmortem Admit failure Sound like a human Start time and end time Who/what was impacted What went wrong Lessons learned Sunday, June 20, 2010
  158. 158. Prepare / Communicate / Explain “I was completely overwhelmed by the amount of positive feedback and support I received.” Sunday, June 20, 2010
  159. 159. Prepare / Communicate / Explain 1. Postmortem Admit failure Sound like a human Start time and end time Who/what was impacted What went wrong Lessons learned 2. Improve for the future Sunday, June 20, 2010
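The postmortem checklist can be used as a simple pre-publish check. The sketch below assumes a draft postmortem stored as a dictionary with hypothetical key names. "Sound like a human" cannot really be verified by code, so the example only flags the canned apology quoted on slide 152.

```python
from typing import Dict, List

# Hypothetical key names mirroring the postmortem checklist items.
REQUIRED_SECTIONS = [
    "admit_failure",
    "start_time",
    "end_time",
    "impact",
    "what_went_wrong",
    "lessons_learned",
]

# The boilerplate apology the talk holds up as what not to write.
CANNED_PHRASES = ["we apologize for any inconvenience"]


def missing_sections(postmortem: Dict[str, str]) -> List[str]:
    """Return checklist items that are absent or blank in the draft."""
    return [k for k in REQUIRED_SECTIONS
            if not str(postmortem.get(k, "")).strip()]


def sounds_canned(text: str) -> bool:
    """Crude heuristic: True if the text contains a known canned phrase."""
    lowered = text.lower()
    return any(phrase in lowered for phrase in CANNED_PHRASES)
```

A draft that passes both checks is not automatically a good postmortem, but one that fails them is probably not ready to publish.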
  160. 160. Prepare / Communicate / Explain “Google is not just saying sorry, they are actually implementing serious changes which probably represents millions of dollars of development to help make sure this doesn't happen again.” Source: http://news.ycombinator.com/item?id=1168493 Sunday, June 20, 2010
  161. 161. Prepare / Communicate / Explain Source: https://groups.google.com/group/google-appengine/browse_thread/thread/a7640a2743922dcf Sunday, June 20, 2010
  162. 162. Prepare / Communicate / Explain Be human Sunday, June 20, 2010
  163. 163. Prepare / Communicate / Explain Be authentic Sunday, June 20, 2010
  164. 164. Prepare / Communicate / Explain Be transparent Sunday, June 20, 2010
  165. 165. Prepare / Communicate / Explain Accept responsibility Sunday, June 20, 2010
  166. 166. Prepare / Communicate / Explain Learn and improve Sunday, June 20, 2010
  167. 167. Prepare / Communicate / Explain Trust Sunday, June 20, 2010
  168. 168. Upside of Downtime Framework 1.0 Prepare: 1. Communication channel - Easy to find - Off-site - Real-time 2. Process - Give authority - M.T.T.C. - On-call/escalations Communicate: 1. Communicate - Use channel - M.T.T.C. - Who/what affected - When started - ETA to resolution - Update regularly 2. Fix it! Explain: 1. Post-mortem - Admit failure - Sound like a human - Start time and end time - Who/what was impacted - What went wrong - Lessons learned 2. Learn and improve Sunday, June 20, 2010
  169. 169. Upside of Downtime Framework 1.0 Prepare: 1. Communication channel - Easy to find - Off-site - Real-time 2. Process - Give authority - M.T.T.C. - On-call/escalations Communicate: 1. Communicate - Use channel - M.T.T.C. - Who/what affected - When started - ETA to resolution - Update regularly 2. Fix it! Explain: 1. Post-mortem - Admit failure - Sound like a human - Start time and end time - Who/what was impacted - What went wrong - Lessons learned 2. Learn and improve Be Prepared + Be Transparent + Be Human Sunday, June 20, 2010
  170. 170. Upside of Downtime Framework 1.0 Prepare: 1. Communication channel - Easy to find - Off-site - Real-time 2. Process - Give authority - M.T.T.C. - On-call/escalations Communicate: 1. Communicate - Use channel - M.T.T.C. - Who/what affected - When started - ETA to resolution - Update regularly 2. Fix it! Explain: 1. Post-mortem - Admit failure - Sound like a human - Start time and end time - Who/what was impacted - What went wrong - Lessons learned 2. Learn and improve Be Prepared + Be Transparent + Be Human = Trust Sunday, June 20, 2010
  171. 171. Disclaimer: Don’t screw up too often Sunday, June 20, 2010
  173. 173. Transparent Not Transparent Caught Not Caught Downtime Prisoner’s Dilemma Sunday, June 20, 2010
  174. 174. Transparent Not Transparent Caught Not Caught Win Downtime Prisoner’s Dilemma Sunday, June 20, 2010
  175. 175. Transparent Not Transparent Caught Not Caught Big Loss Win Downtime Prisoner’s Dilemma Sunday, June 20, 2010
  176. 176. Transparent Not Transparent Caught Not Caught Big Win Big Loss Win Downtime Prisoner’s Dilemma Sunday, June 20, 2010
  177. 177. Transparent Not Transparent Caught Not Caught Big Win Big Loss Win Win Downtime Prisoner’s Dilemma Sunday, June 20, 2010
  179. 179. Benefits Gain trust Reduce churn, increase loyalty Reduce support costs Ability to control the message Competitive advantage More time to focus on the actual problem Reduce stress Sunday, June 20, 2010
  180. 180. Change != Easy Sunday, June 20, 2010
  181. 181. Change != Impossible Sunday, June 20, 2010
  182. 182. Keys to Adoption Getting past a culture of “hide the problem” Sunday, June 20, 2010
  183. 183. Keys to Adoption Getting past a culture of “hide the problem” Overriding commitment to want to improve Sunday, June 20, 2010
  184. 184. Keys to Adoption Getting past a culture of “hide the problem” Overriding commitment to want to improve Available resources to improve Sunday, June 20, 2010
  185. 185. Keys to Adoption Getting past a culture of “hide the problem” Overriding commitment to want to improve Available resources to improve Pain Sunday, June 20, 2010
  186. 186. Keys to Adoption Getting past a culture of “hide the problem” Overriding commitment to want to improve Available resources to improve Pain Buy-in Sunday, June 20, 2010
  187. 187. Product Management Support Sales/Marketing Engineering/Operations Sunday, June 20, 2010
  188. 188. Product Management Support Sales/Marketing Engineering/Operations Default: Let’s wait for complaints Sunday, June 20, 2010
  189. 189. Product Management Support Sales/Marketing Engineering/Operations Default: Let’s wait for complaints Reality: Proactiveness => Forgiveness Sunday, June 20, 2010
  190. 190. Product Management Support Sales/Marketing Engineering/Operations Default: Let’s wait for complaints Reality: Proactiveness => Forgiveness Default: Too much work Sunday, June 20, 2010
  191. 191. Product Management Support Sales/Marketing Engineering/Operations Default: Let’s wait for complaints Reality: Proactiveness => Forgiveness Default: Too much work Reality: More upfront, less when it matters Sunday, June 20, 2010
  192. 192. Product Management Support Sales/Marketing Engineering/Operations Default: Let’s wait for complaints Reality: Proactiveness => Forgiveness Default: Too much work Reality: More upfront, less when it matters Default: Don’t want to look bad Sunday, June 20, 2010
  193. 193. Product Management Support Sales/Marketing Engineering/Operations Default: Let’s wait for complaints Reality: Proactiveness => Forgiveness Default: Too much work Reality: More upfront, less when it matters Default: Don’t want to look bad Reality: Opportunity to learn/improve Sunday, June 20, 2010
  194. 194. Product Management Support Sales/Marketing Engineering/Operations Default: Let’s wait for complaints Reality: Proactiveness => Forgiveness Default: Too much work Reality: More upfront, less when it matters Default: Don’t want to look bad Reality: Opportunity to learn/improve Default: I don’t want my customers to know Sunday, June 20, 2010
  195. 195. Product Management Support Sales/Marketing Engineering/Operations Default: Let’s wait for complaints Reality: Proactiveness => Forgiveness Default: Too much work Reality: More upfront, less when it matters Default: Don’t want to look bad Reality: Opportunity to learn/improve Default: I don’t want my customers to know Reality: They’ll find out, better from us Sunday, June 20, 2010
  197. 197. Source: http://delicious.com/lennysan/healthdashboard Sunday, June 20, 2010
  198. 198. Simple as that! Sunday, June 20, 2010
  199. 199. Your site will still fail! Sunday, June 20, 2010
  200. 200. “The measure of a society is how well it transforms pain and suffering into something worthwhile.” -- Friedrich Nietzsche Sunday, June 20, 2010
  201. 201. “The measure of a company is how well it transforms pain of downtime into something worthwhile.” -- Lenny Rachitsky Source: Original quote inspired by Friedrich Nietzsche Sunday, June 20, 2010
  202. 202. Bare minimum: Register a Twitter account Sunday, June 20, 2010
  203. 203. Lenny Rachitsky @lennysan http://www.transparentuptime.com/ Webmetrics/Neustar @webmetrics http://www.webmetrics.com/ Slides: http://bit.ly/upside-of-downtime Thank You Sunday, June 20, 2010
  204. 204. Bonus Sunday, June 20, 2010
  210. 210. Upside of Downtime Framework 1.0 Prepare: 1. Communication channel - Easy to find - Off-site - Real-time 2. Process - Give authority - M.T.T.C. - On-call/escalations Communicate: 1. Communicate - Use channel - M.T.T.C. - Who/what affected - When started - ETA to resolution - Update regularly 2. Fix it! Explain: 1. Post-mortem - Admit failure - Sound like a human - Start time and end time - Who/what was impacted - What went wrong - Lessons learned 2. Learn and improve "Unlikely that an accidental surface or subsurface oil spill would occur from the proposed activities" -- Exploration and environmental impact plan Source: http://en.wikipedia.org/wiki/Deepwater_Horizon_drilling_rig_explosion Sunday, June 20, 2010
  219. 219. “Be not afraid of transparency; some are born transparent, some achieve transparency, and others have transparency thrust upon them.” -- Borrowed from William Shakespeare Sunday, June 20, 2010
  221. 221. Making change 1. Find the bright spots - (this presentation has a bunch) Sunday, June 20, 2010
  222. 222. Making change 1. Find the bright spots - (this presentation has a bunch) 2. Script the critical moves - (framework) Sunday, June 20, 2010
  223. 223. Making change 1. Find the bright spots - (this presentation has a bunch) 2. Script the critical moves - (framework) 3. Point to the destination - (W.W.G.D.) Sunday, June 20, 2010
  224. 224. Making change 1. Find the bright spots - (this presentation has a bunch) 2. Script the critical moves - (framework) 3. Point to the destination - (W.W.G.D.) 4. Find the feeling - (how would you feel?) Sunday, June 20, 2010
  225. 225. Making change 1. Find the bright spots - (this presentation has a bunch) 2. Script the critical moves - (framework) 3. Point to the destination - (W.W.G.D.) 4. Find the feeling - (how would you feel?) 5. Shrink the change - (start small) Sunday, June 20, 2010
  226. 226. Making change 1. Find the bright spots - (this presentation has a bunch) 2. Script the critical moves - (framework) 3. Point to the destination - (W.W.G.D.) 4. Find the feeling - (how would you feel?) 5. Shrink the change - (start small) 6. Grow your people - (everyone is learning as they go) Sunday, June 20, 2010
  227. 227. Making change 1. Find the bright spots - (this presentation has a bunch) 2. Script the critical moves - (framework) 3. Point to the destination - (W.W.G.D.) 4. Find the feeling - (how would you feel?) 5. Shrink the change - (start small) 6. Grow your people - (everyone is learning as they go) 7. Tweak the environment - (create a simple process) Sunday, June 20, 2010
  228. 228. Making change 1. Find the bright spots - (this presentation has a bunch) 2. Script the critical moves - (framework) 3. Point to the destination - (W.W.G.D.) 4. Find the feeling - (how would you feel?) 5. Shrink the change - (start small) 6. Grow your people - (everyone is learning as they go) 7. Tweak the environment - (create a simple process) 8. Build habits - (build process organically) Sunday, June 20, 2010
  229. 229. Making change 1. Find the bright spots - (this presentation has a bunch) 2. Script the critical moves - (framework) 3. Point to the destination - (W.W.G.D.) 4. Find the feeling - (how would you feel?) 5. Shrink the change - (start small) 6. Grow your people - (everyone is learning as they go) 7. Tweak the environment - (create a simple process) 8. Build habits - (build process organically) 9. Rally the herd - (get buy in, rest will follow) Sunday, June 20, 2010
