Three Days in September

Published in: Technology, News & Politics

  1. THREE DAYS IN SEPTEMBER: “Houston, We Have a Problem.” by Steve Feldman, @PerfForensics
  2. The Agenda: I. This is a True Story... It Really Did Happen; II. Houston, We Have a Problem; III. The Really Good Vendors Care; IV. Getting to Zero; V. The Damage Was Already Done; VI. Where We Are Today
  3. Houston, We Have a Problem...
  4. Our Outage Affected Our Most Important Asset
  5. Our Outage Was Caused By Human Error
  6. NEVER REBOOT A UNIX MACHINE!
  7. The Monitoring “Cameras” Should Always Be On
  8. 24
  9. 25
  10. 26
  11. Keep Everyone Informed
  12. Who Wants Their Users to Report the Problem First?
  13. Not All of the Data is Believable
  14. Crises Are the Best Time to Determine the Strength of the Team
  15. Keep Your Boss Informed
  16. Keep Your Users Informed
  17. Keep Your Users Updated
  18. Continue to Keep Your Users Updated
  19. Getting to Zero
  20. Log Consolidation
  21. Continue to Keep Your Users Updated
  22. It is Not Just About Restoring Service
  23. It is OK to Admit Mistakes
  24. Let Your Boss Take Credit
  25. Your Boss Did Not Build a Fragile System
  26. Do a Post-Mortem
  27. The Problem Started Long Before
  28. Where We Are Today
  29. Practice Really Matters
  30. Practice Failure
  31. Look at Your Manuals
  32. Practice Routines and Roles
  33. Practice Every Day
  34. NEVER REBOOT A UNIX MACHINE!
  35. Thanks for Listening @PerfForensics
