MTBF / MTTR - Energized Work TekTalk, Mar 2012

Published in: Technology, Business

Transcript

  • 1. MTBF / MTTR: Availability or recoverability?
      Presented by Michael Richardson, Energized Work, 21 March 2012
      ENERGIZED WORK, 25 MACKLIN STREET, LONDON WC2B 5NN, +44 (0)20 7691 8933, WWW.ENERGIZEDWORK.COM
  • 2. Michael Richardson
      Twitter: @mr_spb
      Email: michael@energizedwork.com
      #ewtektalk
      © 2012 Energized Work - www.energizedwork.com
  • 3. So what is high availability?
      • Five nines?
      • No single points of failure?
      • Multiple data centres?
      • Fault tolerance?
      • Load balancing?
      • Uptime?
  • 4. Nines of availability
  • 5. Nines of availability
      Availability            Downtime per year
      One nine (90%)          36.5 days
      Two nines (99%)         3.65 days
      Three nines (99.9%)     8.76 hours
      Four nines (99.99%)     52.56 minutes
      Five nines (99.999%)    5.26 minutes
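The downtime figures on slide 5 follow directly from the availability percentage. A minimal sketch of the arithmetic, assuming a 365-day year; the function name `downtime_hours_per_year` is illustrative and not from the talk:

```python
HOURS_PER_YEAR = 365 * 24  # assumes a non-leap year (8760 hours)


def downtime_hours_per_year(availability_percent: float) -> float:
    """Return the downtime allowed per year, in hours, at the given availability."""
    unavailable_fraction = 1 - availability_percent / 100
    return unavailable_fraction * HOURS_PER_YEAR


if __name__ == "__main__":
    for pct in (90, 99, 99.9, 99.99, 99.999):
        hours = downtime_hours_per_year(pct)
        print(f"{pct}% available: {hours / 24:.2f} days "
              f"= {hours:.2f} hours = {hours * 60:.2f} minutes of downtime per year")
```

Each extra nine divides the allowed downtime by ten, which is why the table moves from days to hours to minutes.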
  • 6. Problem with the nines
      • What do they mean?
      • Guaranteed or just an SLA?
      • Multiplicity (99.9% * 99.9% * 99.9% = 99.7%)
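The multiplicity point on slide 6 can be checked in a few lines. This sketch assumes the components sit in series (a request needs all of them) and fail independently; `serial_availability` is a hypothetical helper name:

```python
from math import prod


def serial_availability(component_availabilities):
    """Availability of a chain where every component must be up.

    Assumes failures are independent, so the probabilities multiply.
    """
    return prod(component_availabilities)


if __name__ == "__main__":
    chain = [0.999, 0.999, 0.999]  # three components at three nines each
    print(f"Combined availability: {serial_availability(chain):.4%}")
```

Three components at 99.9% each combine to roughly 99.7%, so the chain as a whole is worse than any single link.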
  • 7. SLA availability numbers just aim to provide a level of confidence in a website’s service
  • 8. No single point of failure (SPOF)
  • 9. Two of everything?
  • 10. Start with this
      (Diagram: Users, Index.html)
  • 11. End with this
      (Diagram: Users, Firewall 1, Firewall 2, Switch 1, Switch 2, WEB1, WEB2, APP1, APP2, DB1, DB2)
  • 12. Problems with eliminating SPOF
      • It’s expensive
      • Where do you draw the line?
      • Are failures independent?
      • Can you guarantee no SPOF?
      • Increased complexity
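To put rough numbers on "two of everything", the sketch below compares a single chain of five tiers with the same chain built from redundant pairs. The 99% per-component figure and the shared-dependency figure are made-up inputs, and the whole calculation assumes independent failures, which is exactly what slide 12 questions:

```python
def parallel_availability(component_availability: float, copies: int) -> float:
    """Availability of a tier that is up while at least one copy is up.

    Assumes the copies fail independently of each other.
    """
    return 1 - (1 - component_availability) ** copies


def stack_availability(tier_availabilities) -> float:
    """Availability of tiers in series: every tier must be up."""
    result = 1.0
    for tier in tier_availabilities:
        result *= tier
    return result


if __name__ == "__main__":
    component = 0.99  # hypothetical availability of one firewall, switch, or server
    tiers = 5         # firewall, switch, web, app, DB, as on slide 11

    single = stack_availability([component] * tiers)
    paired = stack_availability([parallel_availability(component, 2)] * tiers)
    print(f"One of everything: {single:.4%}")
    print(f"Two of everything: {paired:.4%}")

    # A dependency shared by both copies (hypothetical 99.9%) caps the benefit.
    shared_dependency = 0.999
    print(f"Two of everything, one shared dependency: {paired * shared_dependency:.4%}")
```

If failures are correlated, for example both copies sharing a power feed or the same bad deployment, the real figure sits closer to the last line than the second.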
  • 13. Problem: Data centres fail
  • 14. Solution: Get a second data centre
  • 15. Hot – Hot multisite
      • Full range of services available in multiple locations
      • Easy to automate failover of sites
      • Data consistency is hard
      • Capacity planning concerns
  • 16. Hot – Warm multisite
      • Simpler than hot – hot
      • Read / Write ratio dependent
      • Synchronously or asynchronously replicate data?
  • 17. Hot – Cold multisite
      • Easy to set up
      • Will it work?
      • Can it be trusted?
      • Cold site rapidly becomes stale
      • Is it actually valuable?
  • 18. DR multisite
      • Fingers crossed you never need it
      • How can / should you test it?
      • Cloud?
  • 19. Problems with multiple sites
      • It’s expensive
      • Managing more systems
      • Managing data consistency
      • Managing capacity
      • Is it still fail proof?
      • Unless you test it, it’s just a plan
  • 20. We now have a complex system
  • 21. Complex systems
      • More redundancy and automation leads to more complexity
      • More complexity often adds more points of failure
  • 22. How complex systems fail - Dr. Richard Cook
      • Catastrophe is always just around the corner
      • Human operators have dual roles
      • Change introduces new forms of failure
  • 23. Failure and recovery
  • 24. Questions for the business
      • What is the cost of downtime?
      • What are the Recovery Time Objectives (RTO)?
      • What are the Recovery Point Objectives (RPO)?
  • 25. Aggressive RTO and RPO are expensive and have a performance impact
  • 26. RTO / RPO example
      Problem:
      • Simple DB
      • Business can tolerate up to 15 minutes downtime
      • 10-minute window of data loss
  • 27. RTO / RPO example
      Possible solution:
      • Continuously replicate data to second host
      • Continue with nightly backups and also copy DB transaction logs from the primary host to another system
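The example on slides 26 and 27 amounts to comparing each candidate strategy's worst case against the two objectives. A minimal sketch: the 15-minute RTO and 10-minute RPO come from the slide, while the per-strategy recovery and data-loss figures are invented placeholders that would need to be measured on a real system:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Objectives:
    rto_minutes: float  # longest tolerable downtime
    rpo_minutes: float  # largest tolerable window of lost data


@dataclass(frozen=True)
class Strategy:
    name: str
    worst_case_recovery_minutes: float
    worst_case_data_loss_minutes: float


def meets_objectives(strategy: Strategy, objectives: Objectives) -> bool:
    """True when both worst cases fit inside the RTO and the RPO."""
    return (strategy.worst_case_recovery_minutes <= objectives.rto_minutes
            and strategy.worst_case_data_loss_minutes <= objectives.rpo_minutes)


if __name__ == "__main__":
    business = Objectives(rto_minutes=15, rpo_minutes=10)  # from slide 26

    # Hypothetical figures for illustration only.
    candidates = [
        Strategy("nightly backup only", 240, 24 * 60),
        Strategy("nightly backup + log copy every 5 min", 120, 5),
        Strategy("continuous replication to second host", 10, 1),
    ]
    for candidate in candidates:
        verdict = "meets" if meets_objectives(candidate, business) else "misses"
        print(f"{candidate.name}: {verdict} RTO/RPO")
```

With these placeholder numbers, copying transaction logs satisfies the RPO but not the RTO, which is one reason the slide pairs it with replication to a second host rather than treating either as sufficient alone.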
  • 28. So what is more important – increasing availability or reducing recovery time?
  • 29. MTBF or MTTR? What about MTTD?
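One way to frame the question on slides 28 and 29 is the standard steady-state formula, availability = MTBF / (MTBF + MTTR). The sketch below uses it with made-up numbers; treating MTTD (mean time to detect) as a separate term added to the downtime is an assumption here, since some definitions already fold detection into MTTR:

```python
def availability(mtbf_hours: float, mttr_hours: float, mttd_hours: float = 0.0) -> float:
    """Steady-state availability from mean time between failures and mean time to repair.

    mttd_hours is modelled as extra downtime before repair starts (an assumption).
    """
    downtime = mttd_hours + mttr_hours
    return mtbf_hours / (mtbf_hours + downtime)


if __name__ == "__main__":
    # Hypothetical baseline: one failure every 1000 hours, one hour to repair.
    print(f"Baseline:           {availability(1000, 1):.5%}")
    print(f"Double the MTBF:    {availability(2000, 1):.5%}")
    print(f"Halve the MTTR:     {availability(1000, 0.5):.5%}")
    print(f"30 min undetected:  {availability(1000, 1, mttd_hours=0.5):.5%}")
```

Under this model, doubling MTBF and halving MTTR yield the same availability, so the choice between them comes down to which is cheaper to achieve; slow detection erodes the figure in the same way as slow repair.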
  • 30. The answer is: It depends
  • 31. Failure is inevitable
  • 32. Ask anyone
  • 33. License
      This presentation is provided under the Creative Commons Attribution Share Alike 3.0 Unported License.
      You are free:
      • To share – to copy, distribute and transmit the work
      • To remix – to adapt the work
      Under the following conditions:
      • Attribution – You must attribute the work in the manner specified by Energized Work (but not in any way that suggests that Energized Work endorse you or your use of the work).
      • Share Alike – If you alter, transform, or build upon this work, you may distribute the resulting work only under the same or similar license to this one.