ECE/CS 541: Computer System Analysis, Instructor William H. Sanders. ©2006 William H. Sanders. All rights reserved. Do not duplicate without
permission of the author.
Module 1, Slide 1
Dependability
• Dependability is the ability of a system to deliver a specified service.
• System service is classified as proper if it is delivered as specified; otherwise it
is improper.
• System failure is a transition from proper to improper service.
• System restoration is a transition from improper to proper service.
⇒ The “properness” of service depends on the user’s viewpoint!
Reference: J.C. Laprie (ed.), Dependability: Basic Concepts and Terminology,
Springer-Verlag, 1992.
[Figure: two-state diagram. A “failure” transition leads from proper service to improper service; a “restoration” transition leads back.]
Module 1, Slide 2
Examples of Specifications of Proper Service
• k out of N components are functioning.
• Every working processor can communicate with every other working processor.
• Every message is delivered within t milliseconds from the time it is sent.
• All messages are delivered in the same order to all working processors.
• The system does not reach an unsafe state.
• 90% of all remote procedure calls return within x seconds with a correct result.
• 99.999% of all telephone calls are correctly routed.
⇒ The notion of “proper service” provides a specification by which to evaluate a
system’s dependability.
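A specification of proper service can be read as a predicate over observed system behavior. The sketch below encodes two of the examples above, reading "k out of N" as "at least k"; the function names, parameters, and the treatment of an empty call log are hypothetical choices made only for illustration.

```python
from typing import Sequence, Tuple


def k_of_n_proper(component_up: Sequence[bool], k: int) -> bool:
    """Proper service if at least k of the N components are functioning."""
    return sum(component_up) >= k


def rpc_spec_proper(calls: Sequence[Tuple[float, bool]],
                    deadline_s: float,
                    required_fraction: float = 0.90) -> bool:
    """Proper service if the required fraction of calls returned a correct
    result within the deadline. Each call is (latency_seconds, correct)."""
    if not calls:
        return True  # assumption: with no calls, nothing was violated
    good = sum(1 for latency, correct in calls
               if correct and latency <= deadline_s)
    return good / len(calls) >= required_fraction
```

The same pattern extends to the other examples: each one becomes a function from an observation of the system to a proper/improper verdict.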
Module 1, Slide 3
Dependability Concepts
• Measures - properties expected from a dependable system
– Availability
– Reliability
– Safety
– Confidentiality
– Integrity
– Maintainability
– Coverage
• Means - methods to achieve dependability
– Fault Avoidance
– Fault Tolerance
– Fault Removal
– Dependability Assessment
• Impairments - causes of undependable operation
– Faults
– Errors
– Failures
Module 1, Slide 4
Faults, Errors, and Failures can Cause Improper Service
• Failure - transition from proper to improper service
• Error - that part of system state that is liable to lead to subsequent failure
• Fault - the hypothesized cause of error(s)
[Figure: repeated Fault → Error → Failure chains across Module 1, Module 2, and Module 3.]
Module 1, Slide 5
Dependability Measures: Availability
Availability - quantifies the alternation between deliveries of proper and improper
service.
– A(t) is 1 if service is proper at time t, 0 otherwise.
– E[A(t)] (Expected value of A(t)) is the probability that service is proper at
time t.
– A(0,t) is the fraction of time the system delivers proper service during [0,t].
– E[A(0,t)] is the expected fraction of time service is proper during [0,t].
– P[A(0,t) > t*] (0 ≤ t* ≤ 1) is the probability that service is proper more than
100t*% of the time during [0,t].
– lim_{t→∞} A(0,t) is the fraction of time that service is proper in steady state.
– E[lim_{t→∞} A(0,t)], P[lim_{t→∞} A(0,t) > t*] as above.
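The definitions above can be made concrete for a single observed trajectory. The sketch below assumes the service history is given as a list of non-overlapping (start, end) intervals of improper service; that representation and the names `instantaneous_availability` and `interval_availability` are illustrative assumptions, not part of the course material. Expected values such as E[A(t)] would be estimated by averaging these quantities over many trajectories.

```python
from typing import List, Tuple

Outage = Tuple[float, float]  # (start, end) of improper service, end exclusive


def instantaneous_availability(outages: List[Outage], t: float) -> int:
    """A(t): 1 if service is proper at time t, 0 otherwise."""
    return 0 if any(start <= t < end for start, end in outages) else 1


def interval_availability(outages: List[Outage], t: float) -> float:
    """A(0,t): fraction of [0,t] during which service is proper."""
    if t <= 0:
        raise ValueError("t must be positive")
    # Clip each outage to [0, t] before summing the downtime.
    downtime = sum(max(0.0, min(end, t) - max(start, 0.0))
                   for start, end in outages)
    return 1.0 - downtime / t


# Hypothetical trajectory: improper service during [10, 12) and [50, 53).
outages = [(10.0, 12.0), (50.0, 53.0)]
```

For this hypothetical trajectory, `interval_availability(outages, 100.0)` corresponds to 5 time units of downtime out of 100.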
Module 1, Slide 6
Other Dependability Measures
• Reliability - a measure of the continuous delivery of service
– R(t) is the probability that a system delivers proper service throughout [0,t].
• Safety - a measure of the time to catastrophic failure
– S(t) is the probability that no catastrophic failures occur during [0,t].
– Analogous to reliability, but concerned with catastrophic failures.
• Time to Failure - measure of the time to failure from last restoration. (Expected
value of this measure is referred to as MTTF - Mean time to failure.)
• Maintainability - measure of the time to restoration from last experienced
failure. (Expected value of this measure is referred to as MTTR - Mean time to
repair.)
• Coverage - the probability that, given a fault, the system can tolerate the fault
and continue to deliver proper service.
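As a worked illustration of how R(t) and MTTF relate, the sketch below assumes a constant failure rate λ (an exponentially distributed time to failure), which the definitions above do not require. Under that assumption R(t) = exp(-λt), and MTTF, being the integral of R(t) over [0, ∞), equals 1/λ. The rate value and function names are hypothetical.

```python
import math


def reliability(t: float, failure_rate: float) -> float:
    """R(t) under the assumed exponential time-to-failure model."""
    return math.exp(-failure_rate * t)


def mttf_numeric(failure_rate: float, steps: int = 200_000) -> float:
    """Approximate MTTF as the integral of R(t) using the trapezoidal rule.

    The integral is truncated at 40 mean lifetimes, where R(t) is negligible.
    """
    horizon = 40.0 / failure_rate
    dt = horizon / steps
    total = 0.5 * (reliability(0.0, failure_rate)
                   + reliability(horizon, failure_rate))
    total += sum(reliability(i * dt, failure_rate) for i in range(1, steps))
    return total * dt


failure_rate = 1e-3  # hypothetical: one failure per 1000 hours on average
mttf_closed_form = 1.0 / failure_rate
```

The numerical integral and the closed form 1/λ should agree closely, which is a quick sanity check on the relation between reliability and MTTF under this model.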
Module 1, Slide 7
Illustration of the Impact of Coverage on Dependability
• Consider two well-known architectures: simplex and duplex.
• The Markov models for the two architectures are:
• The analytical expression of the MTTF can be calculated for each architecture
using these Markov models.
[Figure: Markov models. Duplex System: states 2 and 1, with transition labels 2cλ, µ, 2(1-c)λ, and λ. Simplex System: state 1, with transition label λ.]
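One possible derivation of the two MTTF expressions is sketched below. It assumes the transition labels are read in the usual way: from state 2 (both units working) a covered failure leads to state 1 at rate 2cλ and an uncovered failure leads directly to system failure at rate 2(1-c)λ; from state 1 the failed unit is repaired at rate µ, or the remaining unit fails at rate λ. The symbols T_1 and T_2 (mean time to system failure starting from state 1 or state 2) are introduced here for illustration.

```latex
\begin{align*}
\text{Simplex:}\quad & \mathrm{MTTF}_{\text{simplex}} = \frac{1}{\lambda} \\[4pt]
\text{Duplex:}\quad & T_2 = \frac{1}{2\lambda} + c\,T_1, \qquad
  T_1 = \frac{1}{\lambda+\mu} + \frac{\mu}{\lambda+\mu}\,T_2 \\[4pt]
& \mathrm{MTTF}_{\text{duplex}} = T_2
  = \frac{(1+2c)\lambda + \mu}{2\lambda\,\bigl(\lambda + (1-c)\mu\bigr)} \\[4pt]
& \frac{\mathrm{MTTF}_{\text{duplex}}}{\mathrm{MTTF}_{\text{simplex}}}
  = \frac{(1+2c)\lambda + \mu}{2\bigl(\lambda + (1-c)\mu\bigr)}
\end{align*}
```

Under these assumptions, with perfect coverage (c = 1) and µ much larger than λ, the ratio is approximately µ/(2λ); with c < 1 the term (1-c)µ in the denominator quickly dominates λ.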
Module 1, Slide 8
Illustration of the Impact of Coverage, cont.
• The following plot shows the ratio MTTF(duplex)/MTTF(simplex) for
different values of coverage (all other parameter values being the same).
• The ratio shows the dependability gain achieved by the duplex architecture.
• We observe that the coverage of the detection mechanism has a significant
impact on the gain: a change in coverage of only 10^-3 reduces the gain in
dependability of the duplex system by a full order of magnitude.
[Figure: plot of the ratio MTTF(duplex)/MTTF(simplex) (log scale, 1E+00 to 1E+04) against the ratio of failure rate to repair rate (log scale, 1E-04 to 1E-02), with one curve for each of c = 1, c = 0.999, c = 0.99, and c = 0.95.]
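To make the sensitivity to coverage concrete, the sketch below evaluates the MTTF ratio under an assumed reading of the duplex model: covered failures (rate 2cλ) lead to a single-unit state, uncovered failures (rate 2(1-c)λ) lead directly to system failure, repair occurs at rate µ, and the remaining unit fails at rate λ. The closed form follows from that assumption and is not quoted from the slides; the function name and the parameter grid are hypothetical.

```python
def mttf_ratio(failure_rate: float, repair_rate: float, coverage: float) -> float:
    """MTTF(duplex) / MTTF(simplex) under the assumed duplex model."""
    lam, mu, c = failure_rate, repair_rate, coverage
    mttf_simplex = 1.0 / lam
    mttf_duplex = ((1.0 + 2.0 * c) * lam + mu) / (2.0 * lam * (lam + (1.0 - c) * mu))
    return mttf_duplex / mttf_simplex


if __name__ == "__main__":
    repair_rate = 1.0  # only the ratio lambda/mu matters for the result
    for lam_over_mu in (1e-4, 1e-3, 1e-2):
        row = ", ".join(
            f"c={c}: {mttf_ratio(lam_over_mu * repair_rate, repair_rate, c):.1f}"
            for c in (1.0, 0.999, 0.99, 0.95)
        )
        print(f"lambda/mu={lam_over_mu:g} -> {row}")
```

Under these assumptions, with a failure-to-repair rate ratio of 10^-4, lowering c from 1 to 0.999 takes the ratio from about 5000 to about 455, roughly an order of magnitude, in line with the observation above.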
Module 1, Slide 9
Failure Sources and Frequencies
Non-Fault-Tolerant Systems
– Japan, 1383 organizations (Watanabe 1986, Siewiorek & Swarz 1992)
– USA, 450 companies (FIND/SVP 1993)
Mean time to failure: 6 to 12 weeks
Average outage duration after failure: 1 to 4 hours
Fault-Tolerant Systems
– Tandem Computers (Gray 1990)
– Bell Northern Research (Cramp et al. 1992)
Mean time to failure: 21 years (Tandem)
Failure Sources:
[Figure: two pie charts of failure sources. Categories: Hardware, Software, Communications, Environment, Operations-Procedures. Slice labels: 50%, 25%, 15%, 10% and 65%, 10%, 8%, 7%.]
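As a rough worked example, the figures quoted for non-fault-tolerant systems can be turned into a steady-state availability using the standard relation availability = MTTF / (MTTF + MTTR). This treats the average outage duration as the MTTR and assumes the system alternates between proper and improper service in steady state; both are assumptions added here, and the function name is hypothetical.

```python
HOURS_PER_WEEK = 7 * 24


def steady_state_availability(mttf_hours: float, mttr_hours: float) -> float:
    """Long-run fraction of time in proper service: MTTF / (MTTF + MTTR)."""
    return mttf_hours / (mttf_hours + mttr_hours)


# Pessimistic and optimistic corners of the quoted ranges.
worst = steady_state_availability(6 * HOURS_PER_WEEK, 4.0)   # MTTF 6 weeks, outage 4 h
best = steady_state_availability(12 * HOURS_PER_WEEK, 1.0)   # MTTF 12 weeks, outage 1 h
```

Under these assumptions the availability falls between roughly 99.6% and 99.95%.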
