SlideShare a Scribd company logo
1 of 14
MTTR “>>” MTTF
Armando Fox, June 2002 ROC Retreat
© 2002 Armando Fox
Low MTTR Beats High MTTF
 Previous ROC gospel:
 A = MTTF / (MTTF+MTTR)
 10x decrease MTTR just as good as 10x increase MTTF
 New ROC gospel?:
 10x decrease MTTR better than 10x increase MTTF
 In fact, decreasing MTTR may even beat a proportionally larger
increase in MTTF (ie less improvement in A)
© 2002 Armando Fox
Why Focus on MTTR?
1. Today’s MTTF’s cannot be directly verified by most
customers. MTTR’s can, thus MTTR claims are verifiable.
 “For better or worse, benchmarks shape a field”
2. For end-user-interactive services, lowering MTTR directly
improves user experience of a specific outage, and directly
reduces impact to operator ($$ and customer loyalty).
Increasing MTTF does neither, as long as MTTF is greater
than the length of one user session.
© 2002 Armando Fox
MTTF Can’t Be Directly Verified
 Today’s availabilities for data-center-based Internet sites:
between 0.99 and 0.999 [Gray and others, 2001]
 Recall A is defined as MTTF/(MTTF+MTTR)
 A=0.99 to 0.999 implies MTTF is 100x to 1000x MTTR
 Hardware: Today’s disk MTTF’s >100 years, but MTTR’s for complex
software ~ hours or tens of hours
 Software: ~30-year MTTF, based on latent software bugs [Gray,
HDCC01]
 Result: verifying MTTF requires observing many system-
years of operation; beyond the reach of most customers
© 2002 Armando Fox
MTTF Can’t Be Directly Verified (cont.)
 Vendor MTTF’s don’t capture environmental/operator errors
 MS’s 2001 Web properties outage was due to operator error
 “Five nines” as advertised implies sites will be up for next 250yrs
 Result: high MTTF can’t guarantee a failure-free interval - only tells
you the chance something will happen (under best circumstances)
 But downtime cost is incurred by impact of specific outages - not by
the likelihood of outages
 So what are the costs of outages?
 (Direct) dollar cost in lost revenue during downtime?
 (Indirect) temporary/permanent loss of customers?
 (Indirect?) effect on company’s credibility -> investor confidence
© 2002 Armando Fox
A Motivational Anecdote about Ebay
 Recent software-related outages: 4.5 hours in Apr02, 22
hours in Jun99, 7 hours in May99, 9 hours in Dec98
 Assume two 4-hour (“newsworthy”) outages/year
 A=(182*24 hours)/(182*24 + 4 hours) = 99.9%
 Dollar cost: Ebay policy for >2 hour outage, fees credited to all
affected users (US$3-5M for Jun99)
 Customer loyalty: after Jun99 outage, Yahoo Auctions reported
statistically significant increase in users
 Stock: Ebay’s market cap dropped US$4B after Jun99 outage
 What about a 10-minute outage once per week?
 A=(7*24 hours)/(7*24 + 1/6 hours) = 99.9% - the same
 Can we quantify “savings” over the previous scenario?
© 2002 Armando Fox
End-user Impact of MTTR
 Thresholds from HCI on user impatience (Miller, 1968)
 Miller, 1968: >1sec “sluggish”, >10sec “distracted” (user moves on
to another task)
 2001 Web user study: Tok~5 sec “acceptable”, Tstop ~10 sec
“excessively slow”
 much more forgiving on both if incremental page views used
 Note, the above thresholds appear to be technology-independent
 If S is steady-state latency of site response, then:
 MTTR ≤ Tok– S: failure effectively masked (weak motivation to
reduce MTTR further)
 Tok – S ≤ MTTR ≤ Tstop– S: user annoyed but unlikely to give up
(individual judgment of users will prevail)
 MTTR ≥ Tstop – S: most users will likely give up, maybe click over to
competitor
© 2002 Armando Fox
Outages: how long is too long?
 Ebay user tasks = auction browsing and bidding
 Number of auctions affected is proportional to duration of outage
 Assuming auction end-times are approx. uniformly distributed
 Assuming # of active auctions is correlated with # of active users,
duration of a single outage is proportional to # affected users
 another (fictitious) example: failure of dynamic content
generation for a news site. What is critical outage duration?
 Fallback = serve cached (stale) content
 Theadline: how quickly updates to “headline” news must be visible
 Tother: same, for “second claass” news
 Suggests different MTTR requirements for front-ends (Tstop), small
content-gen for headline news (Theadline), larger content-gen for “old”
news (Tother)
© 2002 Armando Fox
MTTR as a utility function
 When an outage occurs during normal operation, what is
“usefulness” to each affected end-user of application as a
function of MTTR?
 We can consider 2 things:
 Length of recovery time
 Level of service available during recovery
 A generic utility curve for recovery time
 Threshold points and shape of curved
part may differ widely for different apps
 Interactive vs. noninteractive may be
a key distinction
Tok-S Tstop-S
© 2002 Armando Fox
Level of service during recovery
 Many “server farm” systems allow a subset of nodes to fail
and redistribute work among remaining good nodes
 Assume N nodes, k simultaneous failures, similar offered load
 Option 1 - k/N spare capacity on each node, or k standbys
 no perceptible performance degradation, but cost of idle resources
 Option 2 - turn away k/N work using admission control
 Will those users come back? What’s their “utility threshold” for
suffering inconvenience? (eg Ebay example)
 If cost of admission control is reflected in latency of requests that
are served, must ensure S+f(k/N) < Tstop (or admission control is for
naught)
© 2002 Armando Fox
Level of service during recovery, cont.
 Option 3 - keep latency and throughput, degrade quality of
service
 E.g. harvest/yield - can trade data per query vs. number of queries
 E.g. CNN.com front page - can adopt “above-the-fold” format to
reduce amount of work per user (also “minimal” format)
 E.g. dynamic content service - use caching and regenerate less
content (more staleness)
 In all cases, can use technology-independent thresholds for
length of the degraded service
© 2002 Armando Fox
Some questions that arise
 If users are accustomed to some steady-state latency…
 for how long will they tolerate temporary degradation?
 how much degradation?
 Do they show a preference for increased latency vs. worse QOS vs.
being turned away and incentivized to return?
 For a given app, which tradeoffs are proportionally better
than others?
 Ebay: can’t afford to show “stale” auction prices
 vs CNN: “above-the-fold” lead story may be better than all stories
slowly
© 2002 Armando Fox
Motivation to focus on reducing MTTR
 Stateful components often have long recovery times
 Database: minutes to hours
 Oracle “fast recovery” trades frequency of checkpointing (hence
steady-state throughput) for fast recovery
 What about building state from multiple redundant copies of
stateless components?
 Can we reduce recovery time by settling for probabilistic (bounded-
lifetime) durability and probabilistic consistency (with detectable
inconsistency)? (RAINS)
 For what limited-lifetime state is this a good idea? “Shopping cart”?
Session? User profile?
© 2002 Armando Fox
Summary
 MTTR can be directly measured, verified
 Costs of downtime often arise not from too low Availability
(whatever that is…) but too high MTTR
 Technology-independent thresholds for user satisfaction can
be used as a guideline for system response time and target
for MTTR

More Related Content

Similar to fox.ppt

Enterprise applications in the cloud - are providers ready?
Enterprise applications in the cloud - are providers ready?Enterprise applications in the cloud - are providers ready?
Enterprise applications in the cloud - are providers ready?Leonid Grinshpan, Ph.D.
 
Service Primitives for Internet Scale Applications
Service Primitives for Internet Scale ApplicationsService Primitives for Internet Scale Applications
Service Primitives for Internet Scale ApplicationsAmr Awadallah
 
SOA, Microservices and Event Driven Architecture
SOA, Microservices and Event Driven ArchitectureSOA, Microservices and Event Driven Architecture
SOA, Microservices and Event Driven ArchitectureJeppe Cramon
 
[Solace] Open Data Movement for Connected Vehicles
[Solace] Open Data Movement for Connected Vehicles[Solace] Open Data Movement for Connected Vehicles
[Solace] Open Data Movement for Connected VehiclesTomo Yamaguchi
 
Disaster recovery - What, Why, and How
Disaster recovery - What, Why, and HowDisaster recovery - What, Why, and How
Disaster recovery - What, Why, and HowManish Pandit
 
What does performance mean in the cloud
What does performance mean in the cloudWhat does performance mean in the cloud
What does performance mean in the cloudMichael Kopp
 
Starting Your DevOps Journey – Practical Tips for Ops
Starting Your DevOps Journey – Practical Tips for OpsStarting Your DevOps Journey – Practical Tips for Ops
Starting Your DevOps Journey – Practical Tips for OpsDynatrace
 
How to Build High-Volume, Scalable, and Resilient APIs (EXP18038)
How to Build High-Volume, Scalable, and Resilient APIs (EXP18038)How to Build High-Volume, Scalable, and Resilient APIs (EXP18038)
How to Build High-Volume, Scalable, and Resilient APIs (EXP18038)Jan Penninkhof
 
Nfr testing(performance)
Nfr testing(performance)Nfr testing(performance)
Nfr testing(performance)Dilip Sharma
 
Supercharge Your Applications
Supercharge Your ApplicationsSupercharge Your Applications
Supercharge Your ApplicationsSean Boiling
 
Economics of Cloud Computing (Jazoon'11)
Economics of Cloud Computing (Jazoon'11)Economics of Cloud Computing (Jazoon'11)
Economics of Cloud Computing (Jazoon'11)Netcetera
 
Effektives Consulting - Performance Engineering
Effektives Consulting - Performance EngineeringEffektives Consulting - Performance Engineering
Effektives Consulting - Performance Engineeringhitdhits
 
Public Sector Virtual Town Hall: High Availability for PostgreSQL
Public Sector Virtual Town Hall: High Availability for PostgreSQLPublic Sector Virtual Town Hall: High Availability for PostgreSQL
Public Sector Virtual Town Hall: High Availability for PostgreSQLEDB
 
MGT3342BUS - Architecting Data Protection with Rubrik - VMworld 2017
MGT3342BUS - Architecting Data Protection with Rubrik - VMworld 2017MGT3342BUS - Architecting Data Protection with Rubrik - VMworld 2017
MGT3342BUS - Architecting Data Protection with Rubrik - VMworld 2017Andrew Miller
 
Top Java Performance Problems and Metrics To Check in Your Pipeline
Top Java Performance Problems and Metrics To Check in Your PipelineTop Java Performance Problems and Metrics To Check in Your Pipeline
Top Java Performance Problems and Metrics To Check in Your PipelineAndreas Grabner
 
Full Consistency Lag and its Applications
Full Consistency Lag and its ApplicationsFull Consistency Lag and its Applications
Full Consistency Lag and its ApplicationsCassandra Austin
 
Metrics Driven DevOps - Automate Scalability and Performance Into your Pipeline
Metrics Driven DevOps - Automate Scalability and Performance Into your PipelineMetrics Driven DevOps - Automate Scalability and Performance Into your Pipeline
Metrics Driven DevOps - Automate Scalability and Performance Into your PipelineAndreas Grabner
 
Cloud Economics for Java at Java2Days
Cloud Economics for Java at Java2DaysCloud Economics for Java at Java2Days
Cloud Economics for Java at Java2DaysSteve Poole
 
Anurag Gupta's talk on DevOps at AWS. Nov 17 at the Palo Alto AWS Big Data Me...
Anurag Gupta's talk on DevOps at AWS. Nov 17 at the Palo Alto AWS Big Data Me...Anurag Gupta's talk on DevOps at AWS. Nov 17 at the Palo Alto AWS Big Data Me...
Anurag Gupta's talk on DevOps at AWS. Nov 17 at the Palo Alto AWS Big Data Me...stevemcpherson
 

Similar to fox.ppt (20)

Enterprise applications in the cloud - are providers ready?
Enterprise applications in the cloud - are providers ready?Enterprise applications in the cloud - are providers ready?
Enterprise applications in the cloud - are providers ready?
 
Performance Testing
Performance Testing Performance Testing
Performance Testing
 
Service Primitives for Internet Scale Applications
Service Primitives for Internet Scale ApplicationsService Primitives for Internet Scale Applications
Service Primitives for Internet Scale Applications
 
SOA, Microservices and Event Driven Architecture
SOA, Microservices and Event Driven ArchitectureSOA, Microservices and Event Driven Architecture
SOA, Microservices and Event Driven Architecture
 
[Solace] Open Data Movement for Connected Vehicles
[Solace] Open Data Movement for Connected Vehicles[Solace] Open Data Movement for Connected Vehicles
[Solace] Open Data Movement for Connected Vehicles
 
Disaster recovery - What, Why, and How
Disaster recovery - What, Why, and HowDisaster recovery - What, Why, and How
Disaster recovery - What, Why, and How
 
What does performance mean in the cloud
What does performance mean in the cloudWhat does performance mean in the cloud
What does performance mean in the cloud
 
Starting Your DevOps Journey – Practical Tips for Ops
Starting Your DevOps Journey – Practical Tips for OpsStarting Your DevOps Journey – Practical Tips for Ops
Starting Your DevOps Journey – Practical Tips for Ops
 
How to Build High-Volume, Scalable, and Resilient APIs (EXP18038)
How to Build High-Volume, Scalable, and Resilient APIs (EXP18038)How to Build High-Volume, Scalable, and Resilient APIs (EXP18038)
How to Build High-Volume, Scalable, and Resilient APIs (EXP18038)
 
Nfr testing(performance)
Nfr testing(performance)Nfr testing(performance)
Nfr testing(performance)
 
Supercharge Your Applications
Supercharge Your ApplicationsSupercharge Your Applications
Supercharge Your Applications
 
Economics of Cloud Computing (Jazoon'11)
Economics of Cloud Computing (Jazoon'11)Economics of Cloud Computing (Jazoon'11)
Economics of Cloud Computing (Jazoon'11)
 
Effektives Consulting - Performance Engineering
Effektives Consulting - Performance EngineeringEffektives Consulting - Performance Engineering
Effektives Consulting - Performance Engineering
 
Public Sector Virtual Town Hall: High Availability for PostgreSQL
Public Sector Virtual Town Hall: High Availability for PostgreSQLPublic Sector Virtual Town Hall: High Availability for PostgreSQL
Public Sector Virtual Town Hall: High Availability for PostgreSQL
 
MGT3342BUS - Architecting Data Protection with Rubrik - VMworld 2017
MGT3342BUS - Architecting Data Protection with Rubrik - VMworld 2017MGT3342BUS - Architecting Data Protection with Rubrik - VMworld 2017
MGT3342BUS - Architecting Data Protection with Rubrik - VMworld 2017
 
Top Java Performance Problems and Metrics To Check in Your Pipeline
Top Java Performance Problems and Metrics To Check in Your PipelineTop Java Performance Problems and Metrics To Check in Your Pipeline
Top Java Performance Problems and Metrics To Check in Your Pipeline
 
Full Consistency Lag and its Applications
Full Consistency Lag and its ApplicationsFull Consistency Lag and its Applications
Full Consistency Lag and its Applications
 
Metrics Driven DevOps - Automate Scalability and Performance Into your Pipeline
Metrics Driven DevOps - Automate Scalability and Performance Into your PipelineMetrics Driven DevOps - Automate Scalability and Performance Into your Pipeline
Metrics Driven DevOps - Automate Scalability and Performance Into your Pipeline
 
Cloud Economics for Java at Java2Days
Cloud Economics for Java at Java2DaysCloud Economics for Java at Java2Days
Cloud Economics for Java at Java2Days
 
Anurag Gupta's talk on DevOps at AWS. Nov 17 at the Palo Alto AWS Big Data Me...
Anurag Gupta's talk on DevOps at AWS. Nov 17 at the Palo Alto AWS Big Data Me...Anurag Gupta's talk on DevOps at AWS. Nov 17 at the Palo Alto AWS Big Data Me...
Anurag Gupta's talk on DevOps at AWS. Nov 17 at the Palo Alto AWS Big Data Me...
 

More from indirakumar86

More from indirakumar86 (6)

Mainteance Program
Mainteance ProgramMainteance Program
Mainteance Program
 
Inventory System
Inventory SystemInventory System
Inventory System
 
3C & 5S.ppt
3C & 5S.ppt3C & 5S.ppt
3C & 5S.ppt
 
Paint training material.pptx
Paint training material.pptxPaint training material.pptx
Paint training material.pptx
 
Ik
IkIk
Ik
 
Lab6 rtos
Lab6 rtosLab6 rtos
Lab6 rtos
 

Recently uploaded

一比一原版(UVic学位证书)维多利亚大学毕业证学历认证买留学回国
一比一原版(UVic学位证书)维多利亚大学毕业证学历认证买留学回国一比一原版(UVic学位证书)维多利亚大学毕业证学历认证买留学回国
一比一原版(UVic学位证书)维多利亚大学毕业证学历认证买留学回国ezgenuh
 
Business Bay Escorts $#$ O56521286O $#$ Escort Service In Business Bay Dubai
Business Bay Escorts $#$ O56521286O $#$ Escort Service In Business Bay DubaiBusiness Bay Escorts $#$ O56521286O $#$ Escort Service In Business Bay Dubai
Business Bay Escorts $#$ O56521286O $#$ Escort Service In Business Bay DubaiAroojKhan71
 
Vip Hot Call Girls 🫤 Mahipalpur ➡️ 9711199171 ➡️ Delhi 🫦 Whatsapp Number
Vip Hot Call Girls 🫤 Mahipalpur ➡️ 9711199171 ➡️ Delhi 🫦 Whatsapp NumberVip Hot Call Girls 🫤 Mahipalpur ➡️ 9711199171 ➡️ Delhi 🫦 Whatsapp Number
Vip Hot Call Girls 🫤 Mahipalpur ➡️ 9711199171 ➡️ Delhi 🫦 Whatsapp Numberkumarajju5765
 
ENJOY Call Girls In Okhla Vihar Delhi Call 9654467111
ENJOY Call Girls In Okhla Vihar Delhi Call 9654467111ENJOY Call Girls In Okhla Vihar Delhi Call 9654467111
ENJOY Call Girls In Okhla Vihar Delhi Call 9654467111Sapana Sha
 
如何办理女王大学毕业证(QU毕业证书)成绩单原版一比一
如何办理女王大学毕业证(QU毕业证书)成绩单原版一比一如何办理女王大学毕业证(QU毕业证书)成绩单原版一比一
如何办理女王大学毕业证(QU毕业证书)成绩单原版一比一opyff
 
Call me @ 9892124323 Call Girl in Andheri East With Free Home Delivery
Call me @ 9892124323 Call Girl in Andheri East With Free Home DeliveryCall me @ 9892124323 Call Girl in Andheri East With Free Home Delivery
Call me @ 9892124323 Call Girl in Andheri East With Free Home DeliveryPooja Nehwal
 
Delhi Call Girls Mayur Vihar 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Mayur Vihar 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Mayur Vihar 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Mayur Vihar 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
83778-77756 ( HER.SELF ) Brings Call Girls In Laxmi Nagar
83778-77756 ( HER.SELF ) Brings Call Girls In Laxmi Nagar83778-77756 ( HER.SELF ) Brings Call Girls In Laxmi Nagar
83778-77756 ( HER.SELF ) Brings Call Girls In Laxmi Nagardollysharma2066
 
Delhi Call Girls East Of Kailash 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls East Of Kailash 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls East Of Kailash 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls East Of Kailash 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
Greenery-Palette Pitch Deck by Slidesgo.pptx
Greenery-Palette Pitch Deck by Slidesgo.pptxGreenery-Palette Pitch Deck by Slidesgo.pptx
Greenery-Palette Pitch Deck by Slidesgo.pptxzohiiimughal286
 
Hot Modals Call Girls (Delhi) Dwarka9711199171✔️ High Class Service 100% Saf...
Hot Modals Call Girls (Delhi) Dwarka9711199171✔️ High Class  Service 100% Saf...Hot Modals Call Girls (Delhi) Dwarka9711199171✔️ High Class  Service 100% Saf...
Hot Modals Call Girls (Delhi) Dwarka9711199171✔️ High Class Service 100% Saf...shivangimorya083
 
How To Fix Mercedes Benz Anti-Theft Protection Activation Issue
How To Fix Mercedes Benz Anti-Theft Protection Activation IssueHow To Fix Mercedes Benz Anti-Theft Protection Activation Issue
How To Fix Mercedes Benz Anti-Theft Protection Activation IssueTerry Sayther Automotive
 
How To Troubleshoot Mercedes Blind Spot Assist Inoperative Error
How To Troubleshoot Mercedes Blind Spot Assist Inoperative ErrorHow To Troubleshoot Mercedes Blind Spot Assist Inoperative Error
How To Troubleshoot Mercedes Blind Spot Assist Inoperative ErrorAndres Auto Service
 
Vip Mumbai Call Girls Mumbai Call On 9920725232 With Body to body massage wit...
Vip Mumbai Call Girls Mumbai Call On 9920725232 With Body to body massage wit...Vip Mumbai Call Girls Mumbai Call On 9920725232 With Body to body massage wit...
Vip Mumbai Call Girls Mumbai Call On 9920725232 With Body to body massage wit...amitlee9823
 
9990611130 Find & Book Russian Call Girls In Vijay Nagar
9990611130 Find & Book Russian Call Girls In Vijay Nagar9990611130 Find & Book Russian Call Girls In Vijay Nagar
9990611130 Find & Book Russian Call Girls In Vijay NagarGenuineGirls
 
Sales & Marketing Alignment_ How to Synergize for Success.pptx.pdf
Sales & Marketing Alignment_ How to Synergize for Success.pptx.pdfSales & Marketing Alignment_ How to Synergize for Success.pptx.pdf
Sales & Marketing Alignment_ How to Synergize for Success.pptx.pdfAggregage
 
Lucknow 💋 (Genuine) Escort Service Lucknow | Service-oriented sexy call girls...
Lucknow 💋 (Genuine) Escort Service Lucknow | Service-oriented sexy call girls...Lucknow 💋 (Genuine) Escort Service Lucknow | Service-oriented sexy call girls...
Lucknow 💋 (Genuine) Escort Service Lucknow | Service-oriented sexy call girls...anilsa9823
 
Why Won't Your Subaru Key Come Out Of The Ignition Find Out Here!
Why Won't Your Subaru Key Come Out Of The Ignition Find Out Here!Why Won't Your Subaru Key Come Out Of The Ignition Find Out Here!
Why Won't Your Subaru Key Come Out Of The Ignition Find Out Here!AutoScandia
 
Call Girls Kadugodi Just Call 👗 7737669865 👗 Top Class Call Girl Service Bang...
Call Girls Kadugodi Just Call 👗 7737669865 👗 Top Class Call Girl Service Bang...Call Girls Kadugodi Just Call 👗 7737669865 👗 Top Class Call Girl Service Bang...
Call Girls Kadugodi Just Call 👗 7737669865 👗 Top Class Call Girl Service Bang...amitlee9823
 

Recently uploaded (20)

一比一原版(UVic学位证书)维多利亚大学毕业证学历认证买留学回国
一比一原版(UVic学位证书)维多利亚大学毕业证学历认证买留学回国一比一原版(UVic学位证书)维多利亚大学毕业证学历认证买留学回国
一比一原版(UVic学位证书)维多利亚大学毕业证学历认证买留学回国
 
(ISHITA) Call Girls Service Jammu Call Now 8617697112 Jammu Escorts 24x7
(ISHITA) Call Girls Service Jammu Call Now 8617697112 Jammu Escorts 24x7(ISHITA) Call Girls Service Jammu Call Now 8617697112 Jammu Escorts 24x7
(ISHITA) Call Girls Service Jammu Call Now 8617697112 Jammu Escorts 24x7
 
Business Bay Escorts $#$ O56521286O $#$ Escort Service In Business Bay Dubai
Business Bay Escorts $#$ O56521286O $#$ Escort Service In Business Bay DubaiBusiness Bay Escorts $#$ O56521286O $#$ Escort Service In Business Bay Dubai
Business Bay Escorts $#$ O56521286O $#$ Escort Service In Business Bay Dubai
 
Vip Hot Call Girls 🫤 Mahipalpur ➡️ 9711199171 ➡️ Delhi 🫦 Whatsapp Number
Vip Hot Call Girls 🫤 Mahipalpur ➡️ 9711199171 ➡️ Delhi 🫦 Whatsapp NumberVip Hot Call Girls 🫤 Mahipalpur ➡️ 9711199171 ➡️ Delhi 🫦 Whatsapp Number
Vip Hot Call Girls 🫤 Mahipalpur ➡️ 9711199171 ➡️ Delhi 🫦 Whatsapp Number
 
ENJOY Call Girls In Okhla Vihar Delhi Call 9654467111
ENJOY Call Girls In Okhla Vihar Delhi Call 9654467111ENJOY Call Girls In Okhla Vihar Delhi Call 9654467111
ENJOY Call Girls In Okhla Vihar Delhi Call 9654467111
 
如何办理女王大学毕业证(QU毕业证书)成绩单原版一比一
如何办理女王大学毕业证(QU毕业证书)成绩单原版一比一如何办理女王大学毕业证(QU毕业证书)成绩单原版一比一
如何办理女王大学毕业证(QU毕业证书)成绩单原版一比一
 
Call me @ 9892124323 Call Girl in Andheri East With Free Home Delivery
Call me @ 9892124323 Call Girl in Andheri East With Free Home DeliveryCall me @ 9892124323 Call Girl in Andheri East With Free Home Delivery
Call me @ 9892124323 Call Girl in Andheri East With Free Home Delivery
 
Delhi Call Girls Mayur Vihar 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Mayur Vihar 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Mayur Vihar 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Mayur Vihar 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
83778-77756 ( HER.SELF ) Brings Call Girls In Laxmi Nagar
83778-77756 ( HER.SELF ) Brings Call Girls In Laxmi Nagar83778-77756 ( HER.SELF ) Brings Call Girls In Laxmi Nagar
83778-77756 ( HER.SELF ) Brings Call Girls In Laxmi Nagar
 
Delhi Call Girls East Of Kailash 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls East Of Kailash 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls East Of Kailash 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls East Of Kailash 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
Greenery-Palette Pitch Deck by Slidesgo.pptx
Greenery-Palette Pitch Deck by Slidesgo.pptxGreenery-Palette Pitch Deck by Slidesgo.pptx
Greenery-Palette Pitch Deck by Slidesgo.pptx
 
Hot Modals Call Girls (Delhi) Dwarka9711199171✔️ High Class Service 100% Saf...
Hot Modals Call Girls (Delhi) Dwarka9711199171✔️ High Class  Service 100% Saf...Hot Modals Call Girls (Delhi) Dwarka9711199171✔️ High Class  Service 100% Saf...
Hot Modals Call Girls (Delhi) Dwarka9711199171✔️ High Class Service 100% Saf...
 
How To Fix Mercedes Benz Anti-Theft Protection Activation Issue
How To Fix Mercedes Benz Anti-Theft Protection Activation IssueHow To Fix Mercedes Benz Anti-Theft Protection Activation Issue
How To Fix Mercedes Benz Anti-Theft Protection Activation Issue
 
How To Troubleshoot Mercedes Blind Spot Assist Inoperative Error
How To Troubleshoot Mercedes Blind Spot Assist Inoperative ErrorHow To Troubleshoot Mercedes Blind Spot Assist Inoperative Error
How To Troubleshoot Mercedes Blind Spot Assist Inoperative Error
 
Vip Mumbai Call Girls Mumbai Call On 9920725232 With Body to body massage wit...
Vip Mumbai Call Girls Mumbai Call On 9920725232 With Body to body massage wit...Vip Mumbai Call Girls Mumbai Call On 9920725232 With Body to body massage wit...
Vip Mumbai Call Girls Mumbai Call On 9920725232 With Body to body massage wit...
 
9990611130 Find & Book Russian Call Girls In Vijay Nagar
9990611130 Find & Book Russian Call Girls In Vijay Nagar9990611130 Find & Book Russian Call Girls In Vijay Nagar
9990611130 Find & Book Russian Call Girls In Vijay Nagar
 
Sales & Marketing Alignment_ How to Synergize for Success.pptx.pdf
Sales & Marketing Alignment_ How to Synergize for Success.pptx.pdfSales & Marketing Alignment_ How to Synergize for Success.pptx.pdf
Sales & Marketing Alignment_ How to Synergize for Success.pptx.pdf
 
Lucknow 💋 (Genuine) Escort Service Lucknow | Service-oriented sexy call girls...
Lucknow 💋 (Genuine) Escort Service Lucknow | Service-oriented sexy call girls...Lucknow 💋 (Genuine) Escort Service Lucknow | Service-oriented sexy call girls...
Lucknow 💋 (Genuine) Escort Service Lucknow | Service-oriented sexy call girls...
 
Why Won't Your Subaru Key Come Out Of The Ignition Find Out Here!
Why Won't Your Subaru Key Come Out Of The Ignition Find Out Here!Why Won't Your Subaru Key Come Out Of The Ignition Find Out Here!
Why Won't Your Subaru Key Come Out Of The Ignition Find Out Here!
 
Call Girls Kadugodi Just Call 👗 7737669865 👗 Top Class Call Girl Service Bang...
Call Girls Kadugodi Just Call 👗 7737669865 👗 Top Class Call Girl Service Bang...Call Girls Kadugodi Just Call 👗 7737669865 👗 Top Class Call Girl Service Bang...
Call Girls Kadugodi Just Call 👗 7737669865 👗 Top Class Call Girl Service Bang...
 

fox.ppt

  • 1. MTTR “>>” MTTF Armando Fox, June 2002 ROC Retreat
  • 2. © 2002 Armando Fox Low MTTR Beats High MTTF  Previous ROC gospel:  A = MTTF / (MTTF+MTTR)  10x decrease MTTR just as good as 10x increase MTTF  New ROC gospel?:  10x decrease MTTR better than 10x increase MTTF  In fact, decreasing MTTR may even beat a proportionally larger increase in MTTF (ie less improvement in A)
  • 3. © 2002 Armando Fox Why Focus on MTTR? 1. Today’s MTTF’s cannot be directly verified by most customers. MTTR’s can, thus MTTR claims are verifiable.  “For better or worse, benchmarks shape a field” 2. For end-user-interactive services, lowering MTTR directly improves user experience of a specific outage, and directly reduces impact to operator ($$ and customer loyalty). Increasing MTTF does neither, as long as MTTF is greater than the length of one user session.
  • 4. © 2002 Armando Fox MTTF Can’t Be Directly Verified  Today’s availabilities for data-center-based Internet sites: between 0.99 and 0.999 [Gray and others, 2001]  Recall A is defined as MTTF/(MTTF+MTTR)  A=0.99 to 0.999 implies MTTF is 100x to 1000x MTTR  Hardware: Today’s disk MTTF’s >100 years, but MTTR’s for complex software ~ hours or tens of hours  Software: ~30-year MTTF, based on latent software bugs [Gray, HDCC01]  Result: verifying MTTF requires observing many system- years of operation; beyond the reach of most customers
  • 5. © 2002 Armando Fox MTTF Can’t Be Directly Verified (cont.)  Vendor MTTF’s don’t capture environmental/operator errors  MS’s 2001 Web properties outage was due to operator error  “Five nines” as advertised implies sites will be up for next 250yrs  Result: high MTTF can’t guarantee a failure-free interval - only tells you the chance something will happen (under best circumstances)  But downtime cost is incurred by impact of specific outages - not by the likelihood of outages  So what are the costs of outages?  (Direct) dollar cost in lost revenue during downtime?  (Indirect) temporary/permanent loss of customers?  (Indirect?) effect on company’s credibility -> investor confidence
  • 6. © 2002 Armando Fox A Motivational Anecdote about Ebay  Recent software-related outages: 4.5 hours in Apr02, 22 hours in Jun99, 7 hours in May99, 9 hours in Dec98  Assume two 4-hour (“newsworthy”) outages/year  A=(182*24 hours)/(182*24 + 4 hours) = 99.9%  Dollar cost: Ebay policy for >2 hour outage, fees credited to all affected users (US$3-5M for Jun99)  Customer loyalty: after Jun99 outage, Yahoo Auctions reported statistically significant increase in users  Stock: Ebay’s market cap dropped US$4B after Jun99 outage  What about a 10-minute outage once per week?  A=(7*24 hours)/(7*24 + 1/6 hours) = 99.9% - the same  Can we quantify “savings” over the previous scenario?
  • 7. © 2002 Armando Fox End-user Impact of MTTR  Thresholds from HCI on user impatience (Miller, 1968)  Miller, 1968: >1sec “sluggish”, >10sec “distracted” (user moves on to another task)  2001 Web user study: Tok~5 sec “acceptable”, Tstop ~10 sec “excessively slow”  much more forgiving on both if incremental page views used  Note, the above thresholds appear to be technology-independent  If S is steady-state latency of site response, then:  MTTR ≤ Tok– S: failure effectively masked (weak motivation to reduce MTTR further)  Tok – S ≤ MTTR ≤ Tstop– S: user annoyed but unlikely to give up (individual judgment of users will prevail)  MTTR ≥ Tstop – S: most users will likely give up, maybe click over to competitor
  • 8. © 2002 Armando Fox Outages: how long is too long?  Ebay user tasks = auction browsing and bidding  Number of auctions affected is proportional to duration of outage  Assuming auction end-times are approx. uniformly distributed  Assuming # of active auctions is correlated with # of active users, duration of a single outage is proportional to # affected users  another (fictitious) example: failure of dynamic content generation for a news site. What is critical outage duration?  Fallback = serve cached (stale) content  Theadline: how quickly updates to “headline” news must be visible  Tother: same, for “second claass” news  Suggests different MTTR requirements for front-ends (Tstop), small content-gen for headline news (Theadline), larger content-gen for “old” news (Tother)
  • 9. © 2002 Armando Fox MTTR as a utility function  When an outage occurs during normal operation, what is “usefulness” to each affected end-user of application as a function of MTTR?  We can consider 2 things:  Length of recovery time  Level of service available during recovery  A generic utility curve for recovery time  Threshold points and shape of curved part may differ widely for different apps  Interactive vs. noninteractive may be a key distinction Tok-S Tstop-S
  • 10. © 2002 Armando Fox Level of service during recovery  Many “server farm” systems allow a subset of nodes to fail and redistribute work among remaining good nodes  Assume N nodes, k simultaneous failures, similar offered load  Option 1 - k/N spare capacity on each node, or k standbys  no perceptible performance degradation, but cost of idle resources  Option 2 - turn away k/N work using admission control  Will those users come back? What’s their “utility threshold” for suffering inconvenience? (eg Ebay example)  If cost of admission control is reflected in latency of requests that are served, must ensure S+f(k/N) < Tstop (or admission control is for naught)
  • 11. © 2002 Armando Fox Level of service during recovery, cont.  Option 3 - keep latency and throughput, degrade quality of service  E.g. harvest/yield - can trade data per query vs. number of queries  E.g. CNN.com front page - can adopt “above-the-fold” format to reduce amount of work per user (also “minimal” format)  E.g. dynamic content service - use caching and regenerate less content (more staleness)  In all cases, can use technology-independent thresholds for length of the degraded service
  • 12. © 2002 Armando Fox Some questions that arise  If users are accustomed to some steady-state latency…  for how long will they tolerate temporary degradation?  how much degradation?  Do they show a preference for increased latency vs. worse QOS vs. being turned away and incentivized to return?  For a given app, which tradeoffs are proportionally better than others?  Ebay: can’t afford to show “stale” auction prices  vs CNN: “above-the-fold” lead story may be better than all stories slowly
  • 13. © 2002 Armando Fox Motivation to focus on reducing MTTR  Stateful components often have long recovery times  Database: minutes to hours  Oracle “fast recovery” trades frequency of checkpointing (hence steady-state throughput) for fast recovery  What about building state from multiple redundant copies of stateless components?  Can we reduce recovery time by settling for probabilistic (bounded- lifetime) durability and probabilistic consistency (with detectable inconsistency)? (RAINS)  For what limited-lifetime state is this a good idea? “Shopping cart”? Session? User profile?
  • 14. © 2002 Armando Fox Summary  MTTR can be directly measured, verified  Costs of downtime often arise not from too low Availability (whatever that is…) but too high MTTR  Technology-independent thresholds for user satisfaction can be used as a guideline for system response time and target for MTTR