MTBF / MTTR
Availability or recoverability?

Presented by Michael Richardson, Energized Work
21 March 2012




ENERGIZED WORK
25 MACKLIN STREET
LONDON WC2B 5NN
+44 (0)20 7691 8933
WWW.ENERGIZEDWORK.COM
Michael Richardson
Twitter: @mr_spb
Email: michael@energizedwork.com
#ewtektalk

© 2012 Energized Work - www.energizedwork.com
So what is 
high availability?

•      Five nines?
•      No single point of failure?
•      Multiple data centres?
•      Fault tolerance?
•      Load balancing?
•      Uptime?




Nines
of availability

Availability            Downtime per year
One nine (90%)          36.5 days
Two nines (99%)         3.65 days
Three nines (99.9%)     8.76 hours
Four nines (99.99%)     52.56 minutes
Five nines (99.999%)    5.26 minutes
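
The figures in the table can be reproduced with a few lines of arithmetic. The sketch below assumes a 365-day year (which is what the table's numbers imply); the function name `downtime_hours_per_year` is made up for illustration.

```python
# Hypothetical helper: converts an availability percentage into yearly downtime.
HOURS_PER_YEAR = 365 * 24  # assumption: 365-day year, consistent with the table


def downtime_hours_per_year(availability_percent: float) -> float:
    """Return the expected downtime in hours per year for a given availability."""
    return (1 - availability_percent / 100) * HOURS_PER_YEAR


levels = [
    ("One nine", 90.0),
    ("Two nines", 99.0),
    ("Three nines", 99.9),
    ("Four nines", 99.99),
    ("Five nines", 99.999),
]

for label, pct in levels:
    hours = downtime_hours_per_year(pct)
    # Print the same quantity in days, hours and minutes for comparison.
    print(f"{label:<12} {pct}%: {hours / 24:.2f} days = "
          f"{hours:.2f} hours = {hours * 60:.2f} minutes")
```

Each extra nine cuts the allowed downtime by a factor of ten, which is why the step from four to five nines leaves only about five minutes a year.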




Problem with
the nines

•  What do they mean?
•  Guaranteed or just an SLA?
•  Multiplicity (99.9% * 99.9% * 99.9% = 99.7%)
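
The multiplicity point can be checked directly. This is a minimal sketch that assumes every component sits in series (a request needs all of them) and that failures are independent; `serial_availability` is an illustrative name, not something from the talk.

```python
from math import prod


def serial_availability(components: list[float]) -> float:
    """Availability of a chain where every component must be up.

    Assumes independent failures, so the individual availabilities multiply.
    """
    return prod(components)


# Three components, each at three nines, as in the slide.
chain = [0.999, 0.999, 0.999]
combined = serial_availability(chain)
print(f"Combined availability: {combined:.4%}")  # 99.7003%
```

Three "three nines" components in a row deliver closer to two and a half nines, so the end-to-end figure is always worse than the weakest link.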




SLA availability numbers
just aim to provide a level of
confidence in a website’s service




No single point of failure
(SPOF)




Two of everything?




Start with this

[Diagram: Users -> Index.html]




End with this
[Diagram: Users; Firewall 1, Firewall 2; Switch 1, Switch 2; WEB1, WEB2, APP1, APP2, DB1, DB2]

Problems with
eliminating SPOF

•      It’s expensive
•      Where do you draw the line?
•      Are failures independent?
•      Can you guarantee no SPOF?
•      Increased complexity
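
Why the independence question matters can be sketched numerically. The example below is an illustration under stated assumptions, not a model from the talk: redundant components are treated as fully independent, the availability figures (99% per web server, 99.9% for a shared switch) are invented, and the function names are hypothetical.

```python
from math import prod


def parallel_availability(components: list[float]) -> float:
    """Availability of a redundant group where any one component suffices.

    Only valid if failures are independent: the group is down when all
    components are down at the same time.
    """
    return 1 - prod(1 - a for a in components)


def serial_availability(components: list[float]) -> float:
    """Availability of a chain where every component must be up."""
    return prod(components)


# Two web servers at 99% each (illustrative figure).
web_pair = parallel_availability([0.99, 0.99])
print(f"Redundant pair, independent failures: {web_pair:.4%}")

# The same pair behind one shared switch at 99.9% (illustrative figure).
# The shared dependency caps what the redundancy can deliver.
with_shared_switch = serial_availability([web_pair, 0.999])
print(f"Same pair behind a shared switch:     {with_shared_switch:.4%}")
```

On paper the pair reaches four nines, but a single shared dependency pulls the result back below three nines. Correlated failures (shared power, shared configuration, the same bad deploy) behave the same way.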




Problem:
Data centres fail




Solution:
Get a second data centre




Hot – Hot
multisite

•      Full range of services available in multiple locations
•      Easy to automate failover of sites
•      Data consistency is hard
•      Capacity planning concerns



Hot – Warm
multisite

•  Simpler than hot – hot
•  Read / Write ratio dependent
•  Synchronously or asynchronously replicate data?




Hot – Cold
multisite

•      Easy to set up
•      Will it work?
•      Can it be trusted?
•      Cold site rapidly becomes stale
•      Is it actually valuable?


DR multisite


•  Fingers crossed you never need it
•  How can / should you test it?
•  Cloud?




Problems
with multiple sites

•      It’s expensive
•      Managing more systems
•      Managing data consistency
•      Managing capacity
•      Is it still fail-proof?
•      Unless you test it, it’s just a plan





We now have
a complex system




Complex systems


•  More redundancy and automation lead to more complexity
•  More complexity often adds more points of failure





How complex systems fail
 - Dr. Richard Cook


•  Catastrophe is always just around the corner
•  Human operators have dual roles
•  Change introduces new forms of failure





Failure and recovery




Questions
for the business

•  What is the cost of downtime?
•  What are the Recovery Time Objectives (RTO)?
•  What are the Recovery Point Objectives (RPO)?




Aggressive RTO and RPO
are expensive and have a
performance impact




RTO / RPO
example

Problem:
•  Simple DB
•  Business can tolerate up to 15 minutes of downtime
•  10-minute window of data loss




RTO / RPO
example

Possible solution:
•  Continuously replicate data to second host
•  Continue with nightly backups and also copy DB transaction logs
   from the primary host to another system
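
One way to reason about options like these is to compare each plan's worst-case data loss and restore time against the objectives. The sketch below takes the 15-minute RTO and 10-minute RPO from the example; everything else (the `RecoveryPlan` type, the plan names, and all the worst-case figures) is a made-up assumption for illustration and would need to be measured in a real system.

```python
from dataclasses import dataclass

RTO_MINUTES = 15  # from the example: up to 15 minutes of downtime
RPO_MINUTES = 10  # from the example: 10-minute window of data loss


@dataclass
class RecoveryPlan:
    """Hypothetical description of a recovery approach."""
    name: str
    worst_case_data_loss_min: float  # age of the newest data held off the primary
    worst_case_restore_min: float    # time to detect the failure and restore service


def meets_objectives(plan: RecoveryPlan) -> bool:
    """True if the plan's worst cases fit inside both the RTO and the RPO."""
    return (plan.worst_case_restore_min <= RTO_MINUTES
            and plan.worst_case_data_loss_min <= RPO_MINUTES)


# All figures below are invented for illustration only.
plans = [
    RecoveryPlan("Nightly backup only", 24 * 60, 120),
    RecoveryPlan("Nightly backup + logs copied every 5 minutes", 5, 45),
    RecoveryPlan("Continuous replication to second host", 1, 10),
]

for plan in plans:
    verdict = "meets" if meets_objectives(plan) else "misses"
    print(f"{plan.name}: {verdict} RTO/RPO")
```

With these assumed numbers, copying transaction logs satisfies the RPO but not the RTO, because replaying a backup plus logs takes too long. That gap is one reason a plan might pair replication (for recovery time) with backups and logs (for recovery point and protection against replicated mistakes).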




So what is more important –
increasing availability
or reducing recovery time?





MTBF or MTTR?


What about MTTD?
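
The trade-off can be made concrete with the standard steady-state formula, availability = MTBF / (MTBF + MTTR), where MTBF is taken here as the mean uptime between failures. The sketch below is an illustration with invented figures; adding MTTD (mean time to detect) on top of MTTR is an assumption of this sketch, since some definitions already count detection time inside MTTR.

```python
def availability(mtbf_hours: float, mttr_hours: float, mttd_hours: float = 0.0) -> float:
    """Steady-state availability from mean uptime and mean outage length.

    Assumption: an outage lasts detection time plus repair time.
    """
    outage_hours = mttd_hours + mttr_hours
    return mtbf_hours / (mtbf_hours + outage_hours)


# Illustrative figures only: one failure per 1000 hours, one hour to repair.
baseline = availability(mtbf_hours=1000, mttr_hours=1)
double_mtbf = availability(mtbf_hours=2000, mttr_hours=1)
half_mttr = availability(mtbf_hours=1000, mttr_hours=0.5)
slow_detection = availability(mtbf_hours=1000, mttr_hours=0.5, mttd_hours=0.5)

print(f"Baseline:                  {baseline:.5%}")
print(f"Failing half as often:     {double_mtbf:.5%}")
print(f"Recovering twice as fast:  {half_mttr:.5%}")
print(f"Fast repair, slow detect:  {slow_detection:.5%}")
```

Under this model, halving recovery time buys the same availability as doubling the time between failures, and slow detection can cancel out a faster repair. Which of the two is cheaper to achieve is a separate question.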




The answer is:
It depends




Failure
is inevitable




Ask anyone




License
This presentation is provided under the Creative Commons 
Attribution Share Alike 3.0 Unported License.

               You are free:
                 
               To share – to copy, distribute and transmit the work
               
               To remix – to adapt the work
               
               
               Under the following conditions:
               
               Attribution – You must attribute the work in the manner specified by 
               Energized Work (but not in any way that suggests that Energized Work 
               endorse you or your use of the work).
               
               Share Alike – If you alter, transform, or build upon this work, you may 
               distribute the resulting work only under the same or similar license to this 
               one. 


