Oct. 2007 Terminology, Models, and Measures Slide 1
Fault-Tolerant Computing
Basic Concepts
and Tools
Oct. 2007 Terminology, Models, and Measures Slide 2
About This Presentation
Edition    Released     Revised      Revised
First      Oct. 2006    Oct. 2007
This presentation has been prepared for the graduate
course ECE 257A (Fault-Tolerant Computing) by
Behrooz Parhami, Professor of Electrical and Computer
Engineering at University of California, Santa Barbara.
The material contained herein can be used freely in
classroom teaching or any other educational setting.
Unauthorized uses are prohibited. © Behrooz Parhami
Oct. 2007 Terminology, Models, and Measures Slide 3
Terminology, Models, and Measures
for Dependability
Oct. 2007 Terminology, Models, and Measures Slide 4
Oct. 2007 Terminology, Models, and Measures Slide 5
Impairments to Dependability
Oct. 2007 Terminology, Models, and Measures Slide 6
The Fault-Error-Failure Cycle
Schematic diagram of the Newcastle hierarchical model
and the impairments within one level.
Aspect       Impairment
Structure    Fault      (structure includes both components and design)
             ↓
State        Error
             ↓
Behavior     Failure
[Figure: logic-circuit example with labels “Fault”, “Correct signal”, and “Replaced with NAND?”]
Oct. 2007 Terminology, Models, and Measures Slide 7
The Four-Universe Model
Cause-effect diagram for Avižienis’ four-universe
model of impairments to dependability.
Universe         Impairment
Physical         Failure
                 ↓
Logical          Fault
                 ↓
Informational    Error
                 ↓
External         Crash
Oct. 2007 Terminology, Models, and Measures Slide 8
Unrolling the Fault-Error-Failure Cycle
Cause-effect diagram for an extended six-level
view of impairments to dependability.
Level         Abstraction    Impairment
Low-Level     Component      Defect
              Logic          Fault
Mid-Level     Information    Error
              System         Malfunction
High-Level    Service        Degradation
              Result         Failure
Each impairment can cause the one listed below it.
[Figure annotations: “First Cycle” and “Second Cycle” mark the unrolled fault-error-failure cycles; an inset repeats the one-level table: Structure/Fault, State/Error, Behavior/Failure]
Oct. 2007 Terminology, Models, and Measures Slide 9
Multilevel Model
[Figure: multilevel model of system states]
Abstraction levels: Component, Logic, Information, System, Service, Result
System states: Ideal; Defective, Faulty (low-level impaired); Erroneous, Malfunctioning (mid-level impaired); Degraded, Failed (high-level impaired)
Legend: Initial, Entry, Deviation, Remedy, Tolerance
Oct. 2007 Terminology, Models, and Measures Slide 10
Analogy for the Multilevel Model
An analogy for our multilevel model of dependable computing.
Defects, faults, errors, malfunctions, degradations, and failures are represented by pouring water from above.
Valves represent avoidance and tolerance techniques.
The goal is to avoid overflow.
Oct. 2007 Terminology, Models, and Measures Slide 11
Why Our Concern with Dependability?
Reliability of an n-transistor system, each transistor having failure rate $\lambda$:
$R(t) = e^{-n\lambda t}$
There are only 3 ways of making systems more reliable:
Reduce $\lambda$
Reduce $n$
Reduce $t$
[Plot: $e^{-n\lambda t}$ (vertical axis, 0.0 to 1.0) versus $nt$ (ticks at powers of 10); marked values .9999, .9990, .9900, .9048, .3679]
Alternative:
Change the reliability
formula by introducing
redundancy in system
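A minimal sketch of the exponential reliability formula from this slide, written in Python. The function name `reliability` and the sample values of n, the failure rate, and t are assumptions chosen for illustration; they are not from the slides. The second part reproduces the five values marked on the plot (.9999 through .3679), which correspond to exponents of 0.0001, 0.001, 0.01, 0.1, and 1.

```python
import math

def reliability(n, lam, t):
    """R(t) = exp(-n * lam * t) for n components, each with failure rate lam."""
    return math.exp(-n * lam * t)

# Hypothetical system: one million transistors, failure rate 1e-9 per hour,
# mission time of 100 hours, so n * lam * t = 0.1.
r_example = reliability(n=10**6, lam=1e-9, t=100)

# Values of exp(-x) for x = n * lam * t ranging from 1e-4 up to 1.
products = [1e-4, 1e-3, 1e-2, 1e-1, 1.0]
plot_values = [round(math.exp(-x), 4) for x in products]

print(round(r_example, 4))
print(plot_values)
```

Reducing any one of the three factors by a factor of 10 moves the system one step to the left in `plot_values`, which is why the slide lists only three direct ways to improve reliability.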
Oct. 2007 Terminology, Models, and Measures Slide 12
Highly Dependable Computer Systems
Long-life systems: Fail-slow, Rugged, High-reliability
Spacecraft with multiyear missions, systems in inaccessible locations
Methods: Replication (spares), error coding, monitoring, shielding
Safety-critical systems: Fail-safe, Sound, High-integrity
Flight control computers, nuclear-plant shutdown, medical monitoring
Methods: Replication with voting, time redundancy, design diversity
Non-stop systems: Fail-soft, Robust, High-availability
Telephone switching centers, transaction processing, e-commerce
Methods: HW/info redundancy, backup schemes, hot-swap, recovery
Just as performance enhancement techniques gradually migrate from
supercomputers to desktops, so too dependability enhancement
methods find their way from exotic systems into personal computers
Oct. 2007 Terminology, Models, and Measures Slide 13
Aspects of Dependability
Oct. 2007 Terminology, Models, and Measures Slide 14
Concepts from Probability Theory
Cumulative distribution function: CDF
$F(t) = \mathrm{prob}[x \le t] = \int_0^t f(x)\,dx$
Probability density function: pdf
$f(t) = \mathrm{prob}[t \le x \le t + dt] / dt = dF(t) / dt$
[Figure: lifetimes of 20 identical systems on a time axis from 0 to 50, with the corresponding CDF F(t) (0.0 to 1.0) and pdf f(t) (0.00 to 0.05)]
Expected value of x
$E_x = \int_{-\infty}^{+\infty} x f(x)\,dx = \sum_k x_k f(x_k)$
Variance of x
$\sigma_x^2 = \int_{-\infty}^{+\infty} (x - E_x)^2 f(x)\,dx = \sum_k (x_k - E_x)^2 f(x_k)$
Covariance of x and y
$\psi_{x,y} = E[(x - E_x)(y - E_y)] = E[xy] - E_x E_y$
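The discrete forms of these definitions can be illustrated with a small Python sketch. The 20 lifetimes below are made-up numbers (the slide's actual data points are not recoverable from the text), and all function names are hypothetical. Each sample is given equal probability 1/20, which is an assumption of this sketch.

```python
# 20 hypothetical system lifetimes (arbitrary time units, all within 0..50).
lifetimes = [3, 7, 9, 12, 14, 15, 18, 20, 21, 23,
             25, 26, 28, 30, 33, 35, 38, 41, 45, 49]
# A second made-up variable, used only to illustrate covariance.
other = [50 - x for x in lifetimes]

def empirical_cdf(samples, t):
    """Fraction of samples with value <= t, an estimate of F(t)."""
    return sum(1 for x in samples if x <= t) / len(samples)

def mean(samples):
    """Expected value with each sample weighted 1/len(samples)."""
    return sum(samples) / len(samples)

def variance(samples):
    """Mean squared deviation from the expected value."""
    m = mean(samples)
    return sum((x - m) ** 2 for x in samples) / len(samples)

def covariance(xs, ys):
    """E[(x - Ex)(y - Ey)] computed directly from the definition."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

def covariance_shortcut(xs, ys):
    """E[xy] - Ex * Ey, the equivalent form given on the slide."""
    exy = sum(x * y for x, y in zip(xs, ys)) / len(xs)
    return exy - mean(xs) * mean(ys)

print(empirical_cdf(lifetimes, 25), mean(lifetimes), variance(lifetimes))
print(covariance(lifetimes, other), covariance_shortcut(lifetimes, other))
```

The two covariance functions agree up to floating-point rounding, and the covariance of a variable with itself equals its variance.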
Oct. 2007 Terminology, Models, and Measures Slide 15
Some Simple Probability Distributions
[Figure: CDF F(x) and pdf f(x) plots for the uniform, exponential, normal, and binomial distributions]
Oct. 2007 Terminology, Models, and Measures Slide 16
Reliability and MTTF
Reliability: R(t)
Probability that system remains in the “Good” state through the interval [0, t]
[State diagram: two-state nonrepairable system; start state Up, transition “Failure” to Down]
$R(t + dt) = R(t)\,[1 - z(t)\,dt]$, where $z(t)$ is the hazard function
Constant hazard function $z(t) = \lambda \Rightarrow R(t) = e^{-\lambda t}$ (exponential reliability law)
(system failure rate is independent of its age)
$R(t) = 1 - F(t)$, where $F(t)$ is the CDF of the system lifetime, or its unreliability
Mean time to failure: MTTF
$\mathrm{MTTF} = \int_0^{+\infty} t f(t)\,dt = \int_0^{+\infty} R(t)\,dt$
The first integral is the expected value of the lifetime; the second is the area under the reliability curve (the equality is easily provable).
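As a numerical sanity check of the "area under the reliability curve" form of MTTF, the sketch below integrates an exponential reliability function with the trapezoidal rule and compares the result with 1 divided by the failure rate. The failure rate, the finite integration horizon, and the function name `mttf_numeric` are assumptions of this sketch; the horizon stands in for infinity and is chosen so that the neglected tail is negligible.

```python
import math

def mttf_numeric(reliability, horizon, steps=100_000):
    """Approximate the integral of R(t) over [0, horizon] (trapezoidal rule)."""
    dt = horizon / steps
    total = 0.5 * (reliability(0.0) + reliability(horizon))
    for i in range(1, steps):
        total += reliability(i * dt)
    return total * dt

lam = 0.01  # hypothetical constant failure rate (failures per hour)

# Exponential reliability law: R(t) = exp(-lam * t).
mttf = mttf_numeric(lambda t: math.exp(-lam * t), horizon=5000.0)

print(mttf, 1 / lam)
```

The same function can be applied to any other reliability curve, provided the horizon is long enough for R(t) to have decayed to nearly zero.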
Oct. 2007 Terminology, Models, and Measures Slide 17
Failure Distributions of Interest
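Exponential: $z(t) = \lambda$, $R(t) = e^{-\lambda t}$, $\mathrm{MTTF} = 1/\lambda$
Weibull: $z(t) = \alpha\lambda(\lambda t)^{\alpha-1}$, $R(t) = e^{-(\lambda t)^{\alpha}}$, $\mathrm{MTTF} = (1/\lambda)\,\Gamma(1 + 1/\alpha)$
Erlang: $\mathrm{MTTF} = k/\lambda$
Gamma: Erlang and exponential are special cases
Normal: Reliability and MTTF formulas are complicated
Rayleigh: $z(t) = 2\lambda(\lambda t)$, $R(t) = e^{-(\lambda t)^2}$, $\mathrm{MTTF} = (1/\lambda)\sqrt{\pi}/2$
Discrete versions: Geometric ($R(k) = q^k$), Binomial, Discrete Weibull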
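The Weibull entries can be checked against the exponential and Rayleigh entries, which are its special cases for shape parameter 1 and 2. The sketch below uses Python's standard `math.gamma`; the function names and the sample failure rate are assumptions made for illustration.

```python
import math

def weibull_reliability(t, lam, alpha):
    """R(t) = exp(-(lam * t) ** alpha)."""
    return math.exp(-((lam * t) ** alpha))

def weibull_mttf(lam, alpha):
    """MTTF = (1 / lam) * Gamma(1 + 1 / alpha)."""
    return math.gamma(1.0 + 1.0 / alpha) / lam

lam = 0.001  # hypothetical scale parameter (per hour)

mttf_exponential = weibull_mttf(lam, alpha=1.0)  # should equal 1 / lam
mttf_rayleigh = weibull_mttf(lam, alpha=2.0)     # should equal sqrt(pi) / (2 * lam)

print(mttf_exponential, mttf_rayleigh)
```

With a shape parameter of 1 the hazard function is constant, and with 2 it grows linearly with time, which matches the z(t) expressions listed for the exponential and Rayleigh cases.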
Oct. 2007 Terminology, Models, and Measures Slide 18
Comparing Reliabilities
[Figure: reliability functions for Systems 1 and 2; vertical axis: system reliability (R)]
Reliability difference: $R_2 - R_1$
Reliability gain: $R_2 / R_1$
Reliability improvement factor: $\mathrm{RIF}_{2/1} = [1 - R_1(t_M)] / [1 - R_2(t_M)]$
Example: $[1 - 0.9] / [1 - 0.99] = 10$
Reliability improvement index: $\mathrm{RII} = \log R_1(t_M) / \log R_2(t_M)$
Mission time extension: $\mathrm{MTE}_{2/1}(r_G) = T_2(r_G) - T_1(r_G)$
Mission time improvement factor: $\mathrm{MTIF}_{2/1}(r_G) = T_2(r_G) / T_1(r_G)$
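The comparison measures can be sketched in a few lines of Python. The slide's own example (reliabilities 0.9 and 0.99) is reused for RIF and RII. For the mission-time measures, this sketch assumes that T(r_G) means the longest mission time for which reliability stays at or above the goal r_G, and it assumes both systems follow the exponential law; the failure rates and function names are hypothetical.

```python
import math

def rif(r1, r2):
    """Reliability improvement factor of system 2 over system 1."""
    return (1 - r1) / (1 - r2)

def rii(r1, r2):
    """Reliability improvement index: log R1 / log R2."""
    return math.log(r1) / math.log(r2)

def mission_time(lam, r_goal):
    """Time at which exp(-lam * t) drops to r_goal (exponential law assumed)."""
    return -math.log(r_goal) / lam

# Slide example: R1(tM) = 0.9, R2(tM) = 0.99.
rif_example = rif(0.9, 0.99)
rii_example = rii(0.9, 0.99)

# Hypothetical failure rates for two exponential systems, goal reliability 0.9.
lam1, lam2, r_goal = 2e-3, 1e-3, 0.9
t1, t2 = mission_time(lam1, r_goal), mission_time(lam2, r_goal)
mte = t2 - t1    # mission time extension
mtif = t2 / t1   # mission time improvement factor

print(rif_example, rii_example, mte, mtif)
```

For two exponential systems the mission time improvement factor reduces to the ratio of the failure rates, independent of the goal reliability.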
Oct. 2007 Terminology, Models, and Measures Slide 19
Availability, MTTR, and MTBF
(Interval) Availability: A(t)
Fraction of time that system is in the “Up” state during the interval [0, t]
[State diagram: two-state repairable system; start state Up, transition “Failure” to Down, transition “Repair” back to Up]
Availability = Reliability, when there is no repair
Availability is a function not only of how rarely a system fails (reliability)
but also of how quickly it can be repaired (time to repair)
Pointwise availability: a(t)
Probability that system is available at time t
$A(t) = (1/t) \int_0^t a(x)\,dx$
Steady-state availability: $A = \lim_{t \to \infty} A(t)$
$A = \mathrm{MTTF} / (\mathrm{MTTF} + \mathrm{MTTR}) = \mathrm{MTTF} / \mathrm{MTBF} = \mu / (\lambda + \mu)$
(Will justify this equation later)
Repair rate $\mu$: $1/\mu = \mathrm{MTTR}$
In general, $\mu \gg \lambda$, leading to $A \approx 1$
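The two forms of the steady-state availability formula (in terms of mean times and in terms of rates) can be compared with a short Python sketch. The MTTF and MTTR values and the function names are assumptions for illustration; the rates are taken as the reciprocals of the mean times, as the slide states for the repair rate.

```python
def availability_from_times(mttf, mttr):
    """Steady-state availability A = MTTF / (MTTF + MTTR)."""
    return mttf / (mttf + mttr)

def availability_from_rates(failure_rate, repair_rate):
    """Steady-state availability A = mu / (lambda + mu)."""
    return repair_rate / (failure_rate + repair_rate)

# Hypothetical system: fails every 1000 hours on average, repaired in 2 hours.
mttf, mttr = 1000.0, 2.0
a_times = availability_from_times(mttf, mttr)
a_rates = availability_from_rates(1.0 / mttf, 1.0 / mttr)

print(a_times, a_rates)
```

Because the repair rate here is 500 times the failure rate, both forms give an availability close to 1, in line with the slide's remark.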
Oct. 2007 Terminology, Models, and Measures Slide 20
System Up and Down Times
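[Figure: timeline from 0 to t showing the system alternating between Up and Down; labeled intervals: time to first failure, time between failures, repair time; marked instants t1, t2, t'1, t'2]
Short repair time implies good maintainability (serviceability)
[State diagram: two-state repairable system; start state Up, transitions “Failure” (Up to Down) and “Repair” (Down to Up)]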
Oct. 2007 Terminology, Models, and Measures Slide 21
Performability and MCBF
Performability: P
Composite measure, incorporating both performance and reliability
[State diagram: three-state degradable system with states Up 2 (start state), Up 1, and Down; transitions labeled Partial failure, Failure, Partial repair, and Repair]
Simple example: worth of “Up2” is twice that of “Up1”
$P = 2 p_{Up2} + p_{Up1}$
$p_{Upi}$ = probability system is in state Upi
$p_{Up2} = 0.92$, $p_{Up1} = 0.06$, $p_{Down} = 0.02$, $P = 1.90$
(system performance equivalent to that of 1.9 processors on average)
Performability improvement factor of this system (akin to RIF) relative
to a fail-hard system that goes down when either processor fails:
$\mathrm{PIF} = (2 - 2 \times 0.92) / (2 - 1.90) = 1.6$
Question: What is system availability here?
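The arithmetic of the performability example can be reproduced with the following Python sketch. The state probabilities and worths are the ones given on the slide; the dictionary layout and variable names are assumptions of this sketch.

```python
# State probabilities from the slide's example.
p_state = {"Up2": 0.92, "Up1": 0.06, "Down": 0.02}
# Worth of each state: Up2 counts as 2 processors, Up1 as 1, Down as 0.
worth = {"Up2": 2, "Up1": 1, "Down": 0}

# Performability: probability-weighted sum of state worths.
performability = sum(p_state[s] * worth[s] for s in p_state)

# Fail-hard reference system: delivers 2 units only while both processors work.
performability_fail_hard = worth["Up2"] * p_state["Up2"]

# Improvement factor, defined by analogy with RIF using the maximum worth 2.
max_worth = worth["Up2"]
pif = (max_worth - performability_fail_hard) / (max_worth - performability)

print(round(performability, 2), round(pif, 2))
```

The same weighted-sum pattern extends to systems with more than two operational states by adding entries to the two dictionaries.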
Oct. 2007 Terminology, Models, and Measures Slide 22
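System Up, Partially Up, and Down Times
[Figure: timeline from 0 to t showing the system in the Up, Partially Up, and Down states; events labeled Partial Failure, Total Failure, and Partial Repair; marked instants t1, t2, t3, t'1, t'2, t'3; annotation: MCBF]
Important to prevent direct transitions to the “Down” state (coverage)
[State diagram: three-state degradable system with states Up 2 (start state), Up 1, and Down; transitions labeled Partial failure, Failure, Partial repair, and Repair]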
Oct. 2007 Terminology, Models, and Measures Slide 23
Integrity and Safety
Risk: Prob. of being in “Unsafe Failed” state
There may be multiple unsafe states,
each with a different consequence (cost)
Three-state
fail-safe system
Simple analysis
Lump “Safe Failed” state with “Good”
state; proceed as in reliability analysis
More detailed analysis
Even though “Safe Failed” state is more
desirable than “Unsafe Failed”, it is still
not as desirable as the “Good” state;
so keeping it separate makes sense
For example, if a repair transition is introduced between “Safe Failed”
and “Good” states, we can tackle questions such as the expected
outage of the system in safe mode, and thus its availability
[State diagram: start state Good, with “Failure” transitions to the Safe failed and Unsafe failed states]
