SlideShare a Scribd company logo
1 of 23
Download to read offline
Dependable Systems
!
Dependability Means
Dr. Peter Tröger

!
Sources: 

!
J.C. Laprie. Dependability: Basic Concepts and Terminology

Eusgeld, Irene et al.: Dependability Metrics. 4909. Springer Publishing, 2008

Echtle, Klaus: Fehlertoleranzverfahren. Heidelberg, Germany : Springer Verlag, 1990.

Pfister, Gregory F.: High Availability. In: In Search of Clusters. , S. 379-452

!
Dependable Systems Course PT 2014
Dependability
• Umbrella term for operational requirements on a system

• IFIP WG 10.4: "[..] the trustworthiness of a computing system which allows reliance
to be justifiably placed on the service it delivers [..]"

• IEC IEV: "dependability (is) the collective term used to describe the availability
performance and its influencing factors : reliability performance, maintainability
performance and maintenance support performance"

• Laprie: „ Trustworthiness of a computer system such that 

reliance can be placed on the service it delivers to the user “

• Adds a third dimension to system quality

• General question: How to deal with unexpected events ?

• In German: ,Verlässlichkeit‘ vs. ,Zuverlässigkeit‘

2
Dependable Systems Course PT 2014
Dependability Tree (Laprie)
3
Dependable Systems Course PT 2014
Means for Dependability
• Fault prevention - Prevent fault occurrence or introduction

• Fault tolerance - Provide service matching the specification under faults

• Fault removal - How to reduce the presence of faults

• Fault forecasting- Estimate the present number, future incidence, and the
consequences of faults

• Combined utilization
4
Dependable Systems Course PT 2014
Dependability Means (Laprie)
• Offline / online techniques

• Fault intolerance techniques

• Fault prevention - Prevent fault occurrence or introduction

• Fault removal - Reduce the presence of faults

• 100% fault-free servicing for the whole life time is not possible

• Fault tolerance techniques

• Fault forecasting - Estimate the present number, future incidence, and the
consequences of faults

• Fault tolerance - Provide service complying with specification in spite of faults

• Problems with coverage and validation of the validator
5
Dependable Systems Course PT 2014
Dependability Means (Echtle)
6
Dependable Systems Course PT 2014
Fault Prevention
• Specific approaches for avoiding faults

• Specialized specification formalisms and techniques

• Specialized development / manufacturing process to prevent design faults

• Shielding

• Only use ultra-reliable components

• Choice of programming language

• General engineering approaches

• Software engineering procedures

• Quality management regulations and enforcement

• Training and organization of maintenance departments
7
Dependable Systems Course PT 2014
Fault Removal
• Make faults disappear before fault tolerance becomes relevant

• Step 1: Verification

• Check if the system adheres to verification conditions; if not, take next steps

• Static verification: Static analysis, data flow analysis, compiler checks

• Dynamic verification: Symbolic execution or verification testing

• Step 2: Diagnosis

• Find the faults that influenced the verification conditions

• Step 3: Correction

• Fix the problem, repeat the steps (regression)

• Fault removal during operation: Corrective maintenance (curative / preventive)
8
Dependable Systems Course PT 2014
Testing
• Selecting test inputs is driven from different view points

• Testing purpose: conformance testing, fault-finding testing
• System model: functional testing (with functional model) or structural testing

• Fault model: enables fault-based testing

• Deterministic testing vs. random testing

• Structural testing of hardware is fault-finding, fault-based, structural testing

• Structural testing of software is fault-finding, non-fault-based, structural testing

• Golden unit: Reference system for comparison of output for a given input

(e.g. reference implementation)
9
Dependable Systems Course PT 2014
Fault Tolerance
• Fault tolerance is the ability of a system to operate correctly in presence of faults. 

or

• A system S is called k-fault-tolerant with respect to a set of 

algorithms {A1, A2, ... , Ap} and a set of faults {F1, F2, ... , Fp} 

if for every k-fault F in S, Ai is executable by a subsystem of system S with k faults.
(Hayes, 9/76)

or

• Fault tolerance is the use of redundancy (time or space) to achieve the desired level
of system dependability - costs !

• Accepts that an implemented system will not be fault-free

• Implements automatic recovery from errors

• Is a recursive concept (voter replication, self-checking checkers, stable memory)
10
Dependable Systems Course PT 2014
Fault Tolerance
• Typical design methodology in many technical and biological systems

• Spare wheel in cars, redundant organs, ...

• Fault tolerance mechanisms need to be evaluated by dependability attributes

• Minimum, maximum, average reliability and availability

• Easy to formulate and understand, hard to prove - failure rate remains unknown

• Quantitative limits based on fault model (which faults in which components)

• Typically ,one-fault-at-a-time‘ assumption

• Different attributes of fault tolerance implementation to be checked

• Functional verification, sensitivity analysis, minimum amount of resource resp.
computational overhead, implementation performance, transparency, portability
11
Error Processing
Dependable Systems Course PT 2014
Phases of Fault Tolerance (Hanmer)
12
Latent
Fault
Error
Normal
OperationFault
Activation
Error Recovery
Error Mitigation
Error
Detection
Fault
Treatment
Dependable Systems Course PT 2014
Decomposition of Fault Tolerance (Lee & Anderson)
• Error detection
• Presence of fault is deducted by detecting an error in some subsystem

• Implies failure of the according component
• Damage confinement
• Delimit damage caused due to the component failure
• Error processing - recovery / compensation

• System recovers from the effect of an error

• Fault treatment 

• Ensure that fault does not cause again failures
13
Dependable Systems Course PT 2014
Fault Tolerance - Error Detection
• Replication check
• Output of replicated components is compared / voted

• Independent failures, physical causes -> many replicas possible (e.g. HW)

• Finds also design faults, if replicated components are from different vendors

• Timing checks (‚watchdog timers‘)

• Timing violation often implies that component output is also incorrect

• Typical solution for node failure detection in a distributed system

• Reasonableness checks - Run-time range checks, assertions

• Structural and coding checks, diagnostics checks, algorithmic checks

• Ideal: Self-checking component with clear error confinement areas
14
Dependable Systems Course PT 2014
Fault Tolerance - Error Detection
• Replication checks are powerful and expensive, examples:

• Execute identical copies on different hardware (component failures)

• Execute separate and different versions (assumes independent design faults)

• Execute same copies different times (transient faults)

• Replicate only portion of the system

• Works for both hardware and software

• Signaling aspect in the error detection task

• Typical software model are exceptions, a way for implementing forward recovery

• Combination fault detection and fault location
15 (C) IEEE
Dependable Systems Course PT 2014
Fault Tolerance - Damage Confinement (Taylor)
• System decomposition
• Every communication link might enable damage spreading

• Introduce mutual suspicion

• Hardware-based separation of software components

• OS-based separation (processes, runtime monitors, special shells)

• Law-governed architecture
• Externalize contrains on interaction by runtime rules

• Strongly-typed language
• Language guarantees the absence of unintended control flows
16
Dependable Systems Course PT 2014
Fault Tolerance - Error Processing Through Recovery
• Forward error recovery

• Error is masked to reach again a consistent state (fault compensation)

• Corrective actions need detailled knowledge (damage assessment)

• New state is typically computed in another way

• Examples with compensation: Error correcting codes, non-journaling file
system check, advanced exception handlers, voters

• Backward error recovery

• Roll back to previous consistent state (recovery point / checkpoint) 

• Very suitable for transient faults

• Computation can be re-done with same components (retry), 

with alternate components (reconfigure), or can be ignored (skip frame)
17
Dependable Systems Course PT 2014
Forward & Backward Recovery - Example
• Transaction processing systems support ACID through journaling and backups

• Journal: Audit trail of transactions and database changes

• Forward recovery: Restore a database backup and apply the transaction journal 

• Journal provides the intended transaction operation (write-ahead logging) and
the ,after-image‘ for completed transactions

• Replay of all conducted and intended transactions from safe point in history

• Backward recovery (,backout‘): Apply journal ,backwards‘ on current database

• Demands ,before-image‘ before each database modification

• Involves logic of reprocessing aborted transactions, which is time-consuming

• Typically both applied - forward recovery for reaching a stable database status,
and backward recovery of modifications from unfinished transactions
18
Dependable Systems Course PT 2014
Fault Tolerance - Fault Treatment
• Fault diagnosis - determine error cause‘s location and nature

• Fault passivation - (remove faulty component &) reconfigure system

• Error processing might already remove the fault - ,soft fault‘

• Typical example are temporary faults

• Fault tolerance manager

• Careful diagnosis with hardware support

• Damage assessment by disabling faulty components automatically

• Example: IBM mainframe architecture

• Software rejuvenation

• Gracefully terminating an application and immediately restarting it at a
clean internal state
19
Dependable Systems Course PT 2014
Fault Tolerant Mindset (Hanmer)
• What can go wrong in any given situation ?

• Mindset to be applied in all development stages

• „Every problem in computer science boils down to tradeoffs“ [Henschen]

• Fault prevention vs. fault tolerance vs. failure severity

• KISS principles, leave out „bells and whistles“

• Incremental additions of reliability - long-term products

• Defensive Programming

• Simple error handling; fix root cause, not symptoms; make data auditble; 

make code maintainable;
20
Dependable Systems Course PT 2014
„Fail-Fast“
• A common concept from system engineering, company management, ...

• „Report failure and stop immediately without further action“
• Discussed by Jim Gray in 1985 as part of his famous article

„Why do computers stop and what can be done about it ?“

• Useful when benefit from recovery is not good enough for its costs, 

or if error propagation is highly probable

• Single units of a redundant set

• Deeply interwired IT system components

• Components under heavy request load

• And ... crappy start-up companies
21
Dependable Systems Course PT 2014
Fault Tolerant Design Methodology (Hanmer)
• Assess things that can go wrong with the system (e.g. fault trees).

• Find potential risks and according system failures.

• Define strategies to mitigate the identified risks.

• Failure avoidance options, prevent faults from activation

• Create a mental model of the system design with redundancy.

• Design error detection and error processing capabilities.

• Design in the failure mitigation capabilities.

• Design human-computer interactions and modes of management.
22
Dependable Systems Course PT 2014
Dependable Design Strategies (Malek)
• Decompose the system 

• Identify fault classes, fault latency and fault impact for the components

• Identify “weak spots” and assess potential damage

• Integrate partial recovery / reintegration / restart 

• Determine qualitative and quantitative specs for fault tolerance and evaluate your
design in specific environment 

• Develop / utilize fault and error detection techniques and algorithms 

• Develop / utilize fault isolation techniques and algorithms 

• Refine fault tolerance, iterate for improvement

• Re-use proven components, but be aware of integration issues
23

More Related Content

What's hot

Dependable Systems -Reliability Prediction (9/16)
Dependable Systems -Reliability Prediction (9/16)Dependable Systems -Reliability Prediction (9/16)
Dependable Systems -Reliability Prediction (9/16)Peter Tröger
 
Dependable Systems - Hardware Dependability with Diagnosis (13/16)
Dependable Systems - Hardware Dependability with Diagnosis (13/16)Dependable Systems - Hardware Dependability with Diagnosis (13/16)
Dependable Systems - Hardware Dependability with Diagnosis (13/16)Peter Tröger
 
What activates a bug? A refinement of the Laprie terminology model.
What activates a bug? A refinement of the Laprie terminology model.What activates a bug? A refinement of the Laprie terminology model.
What activates a bug? A refinement of the Laprie terminology model.Peter Tröger
 
OpenSubmit - How to grade 1200 code submissions
OpenSubmit - How to grade 1200 code submissionsOpenSubmit - How to grade 1200 code submissions
OpenSubmit - How to grade 1200 code submissionsPeter Tröger
 
Dependable Systems - System Dependability Evaluation (8/16)
Dependable Systems - System Dependability Evaluation (8/16)Dependable Systems - System Dependability Evaluation (8/16)
Dependable Systems - System Dependability Evaluation (8/16)Peter Tröger
 
Availability tactics
Availability tacticsAvailability tactics
Availability tacticsahsan riaz
 
Software Fault Tolerance
Software Fault ToleranceSoftware Fault Tolerance
Software Fault ToleranceAnkit Singh
 
Formal Verification
Formal VerificationFormal Verification
Formal VerificationIlia Levin
 
Fault Tolerance System
Fault Tolerance SystemFault Tolerance System
Fault Tolerance Systemprakashjjaya
 
Fault tolerance and computing
Fault tolerance  and computingFault tolerance  and computing
Fault tolerance and computingPalani murugan
 
Fault tolearant system
Fault tolearant systemFault tolearant system
Fault tolearant systemarvinthsaran
 
Fault Tolerance System
Fault Tolerance SystemFault Tolerance System
Fault Tolerance SystemEhsan Ilahi
 
Real Time Systems
Real Time SystemsReal Time Systems
Real Time Systemsleo3004
 
Fault tolerant presentation
Fault tolerant presentationFault tolerant presentation
Fault tolerant presentationskadyan1
 
Reverse Engineering of Software Architecture
Reverse Engineering of Software ArchitectureReverse Engineering of Software Architecture
Reverse Engineering of Software ArchitectureDharmalingam Ganesan
 

What's hot (20)

Dependable Systems -Reliability Prediction (9/16)
Dependable Systems -Reliability Prediction (9/16)Dependable Systems -Reliability Prediction (9/16)
Dependable Systems -Reliability Prediction (9/16)
 
Dependable Systems - Hardware Dependability with Diagnosis (13/16)
Dependable Systems - Hardware Dependability with Diagnosis (13/16)Dependable Systems - Hardware Dependability with Diagnosis (13/16)
Dependable Systems - Hardware Dependability with Diagnosis (13/16)
 
What activates a bug? A refinement of the Laprie terminology model.
What activates a bug? A refinement of the Laprie terminology model.What activates a bug? A refinement of the Laprie terminology model.
What activates a bug? A refinement of the Laprie terminology model.
 
OpenSubmit - How to grade 1200 code submissions
OpenSubmit - How to grade 1200 code submissionsOpenSubmit - How to grade 1200 code submissions
OpenSubmit - How to grade 1200 code submissions
 
Dependable Systems - System Dependability Evaluation (8/16)
Dependable Systems - System Dependability Evaluation (8/16)Dependable Systems - System Dependability Evaluation (8/16)
Dependable Systems - System Dependability Evaluation (8/16)
 
Fault tolerance
Fault toleranceFault tolerance
Fault tolerance
 
Availability tactics
Availability tacticsAvailability tactics
Availability tactics
 
SRE Tools
SRE ToolsSRE Tools
SRE Tools
 
Software Fault Tolerance
Software Fault ToleranceSoftware Fault Tolerance
Software Fault Tolerance
 
Bangalore march07
Bangalore march07Bangalore march07
Bangalore march07
 
Formal Verification
Formal VerificationFormal Verification
Formal Verification
 
Sw testing
Sw testingSw testing
Sw testing
 
Fault Tolerance System
Fault Tolerance SystemFault Tolerance System
Fault Tolerance System
 
Fault tolerance and computing
Fault tolerance  and computingFault tolerance  and computing
Fault tolerance and computing
 
Fault tolearant system
Fault tolearant systemFault tolearant system
Fault tolearant system
 
Fault Tolerance System
Fault Tolerance SystemFault Tolerance System
Fault Tolerance System
 
Real Time Systems
Real Time SystemsReal Time Systems
Real Time Systems
 
Fault tolerance techniques
Fault tolerance techniquesFault tolerance techniques
Fault tolerance techniques
 
Fault tolerant presentation
Fault tolerant presentationFault tolerant presentation
Fault tolerant presentation
 
Reverse Engineering of Software Architecture
Reverse Engineering of Software ArchitectureReverse Engineering of Software Architecture
Reverse Engineering of Software Architecture
 

Similar to Dependable Systems -Dependability Means (3/16)

Chromatography Data System: Getting It “Right First Time” Seminar Series – Pa...
Chromatography Data System: Getting It “Right First Time” Seminar Series – Pa...Chromatography Data System: Getting It “Right First Time” Seminar Series – Pa...
Chromatography Data System: Getting It “Right First Time” Seminar Series – Pa...Chromatography & Mass Spectrometry Solutions
 
Introduction to performance testing
Introduction to performance testingIntroduction to performance testing
Introduction to performance testingRichard Bishop
 
SENG202-v-and-v-modeling_121810.pptx
SENG202-v-and-v-modeling_121810.pptxSENG202-v-and-v-modeling_121810.pptx
SENG202-v-and-v-modeling_121810.pptxMinsasWorld
 
Risk based testing and random testing
Risk based testing and random testingRisk based testing and random testing
Risk based testing and random testingHimanshu
 
Performance tuning Grails applications
 Performance tuning Grails applications Performance tuning Grails applications
Performance tuning Grails applicationsGR8Conf
 
Software engineering quality assurance and testing
Software engineering quality assurance and testingSoftware engineering quality assurance and testing
Software engineering quality assurance and testingBipul Roy Bpl
 
testing strategies and tactics
 testing strategies and tactics testing strategies and tactics
testing strategies and tacticsPreeti Mishra
 
Innovation day 2013 2.5 joris vanderschrick (verhaert) - embedded system de...
Innovation day 2013   2.5 joris vanderschrick (verhaert) - embedded system de...Innovation day 2013   2.5 joris vanderschrick (verhaert) - embedded system de...
Innovation day 2013 2.5 joris vanderschrick (verhaert) - embedded system de...Verhaert Masters in Innovation
 
6. System and its Life Cycle.pptx
6. System and its Life Cycle.pptx6. System and its Life Cycle.pptx
6. System and its Life Cycle.pptxAminaButt14
 
Software Engineering (Software Quality Assurance & Testing: Supplementary Mat...
Software Engineering (Software Quality Assurance & Testing: Supplementary Mat...Software Engineering (Software Quality Assurance & Testing: Supplementary Mat...
Software Engineering (Software Quality Assurance & Testing: Supplementary Mat...ShudipPal
 
Non Functional Requirement.
Non Functional Requirement.Non Functional Requirement.
Non Functional Requirement.Khushboo Shaukat
 
Software Quality Assurance
Software Quality AssuranceSoftware Quality Assurance
Software Quality AssuranceSaqib Raza
 
Context Driven Automation Gtac 2008
Context Driven Automation Gtac 2008Context Driven Automation Gtac 2008
Context Driven Automation Gtac 2008Pete Schneider
 
Software testing-and-analysis
Software testing-and-analysisSoftware testing-and-analysis
Software testing-and-analysisWBUTTUTORIALS
 
Process and Regulated Processes Software Validation Elements
Process and Regulated Processes Software Validation ElementsProcess and Regulated Processes Software Validation Elements
Process and Regulated Processes Software Validation ElementsArta Doci
 
Data Integrity II - Chromatography data system (CDS) in Pharma
Data Integrity II - Chromatography data system (CDS) in PharmaData Integrity II - Chromatography data system (CDS) in Pharma
Data Integrity II - Chromatography data system (CDS) in PharmaSathish Vemula
 
Quality attributes in software architecture
Quality attributes in software architectureQuality attributes in software architecture
Quality attributes in software architectureGang Tao
 
Testing- Fundamentals of Testing-Mazenet solution
Testing- Fundamentals of Testing-Mazenet solutionTesting- Fundamentals of Testing-Mazenet solution
Testing- Fundamentals of Testing-Mazenet solutionMazenetsolution
 

Similar to Dependable Systems -Dependability Means (3/16) (20)

Chromatography Data System: Getting It “Right First Time” Seminar Series – Pa...
Chromatography Data System: Getting It “Right First Time” Seminar Series – Pa...Chromatography Data System: Getting It “Right First Time” Seminar Series – Pa...
Chromatography Data System: Getting It “Right First Time” Seminar Series – Pa...
 
Introduction to performance testing
Introduction to performance testingIntroduction to performance testing
Introduction to performance testing
 
SENG202-v-and-v-modeling_121810.pptx
SENG202-v-and-v-modeling_121810.pptxSENG202-v-and-v-modeling_121810.pptx
SENG202-v-and-v-modeling_121810.pptx
 
Risk based testing and random testing
Risk based testing and random testingRisk based testing and random testing
Risk based testing and random testing
 
Performance tuning Grails applications
 Performance tuning Grails applications Performance tuning Grails applications
Performance tuning Grails applications
 
Software engineering quality assurance and testing
Software engineering quality assurance and testingSoftware engineering quality assurance and testing
Software engineering quality assurance and testing
 
testing strategies and tactics
 testing strategies and tactics testing strategies and tactics
testing strategies and tactics
 
Innovation day 2013 2.5 joris vanderschrick (verhaert) - embedded system de...
Innovation day 2013   2.5 joris vanderschrick (verhaert) - embedded system de...Innovation day 2013   2.5 joris vanderschrick (verhaert) - embedded system de...
Innovation day 2013 2.5 joris vanderschrick (verhaert) - embedded system de...
 
6. System and its Life Cycle.pptx
6. System and its Life Cycle.pptx6. System and its Life Cycle.pptx
6. System and its Life Cycle.pptx
 
System testing
System testingSystem testing
System testing
 
Software Engineering (Software Quality Assurance & Testing: Supplementary Mat...
Software Engineering (Software Quality Assurance & Testing: Supplementary Mat...Software Engineering (Software Quality Assurance & Testing: Supplementary Mat...
Software Engineering (Software Quality Assurance & Testing: Supplementary Mat...
 
Non Functional Requirement.
Non Functional Requirement.Non Functional Requirement.
Non Functional Requirement.
 
Software Quality Assurance
Software Quality AssuranceSoftware Quality Assurance
Software Quality Assurance
 
Context Driven Automation Gtac 2008
Context Driven Automation Gtac 2008Context Driven Automation Gtac 2008
Context Driven Automation Gtac 2008
 
Software testing-and-analysis
Software testing-and-analysisSoftware testing-and-analysis
Software testing-and-analysis
 
Process and Regulated Processes Software Validation Elements
Process and Regulated Processes Software Validation ElementsProcess and Regulated Processes Software Validation Elements
Process and Regulated Processes Software Validation Elements
 
Data Integrity II - Chromatography data system (CDS) in Pharma
Data Integrity II - Chromatography data system (CDS) in PharmaData Integrity II - Chromatography data system (CDS) in Pharma
Data Integrity II - Chromatography data system (CDS) in Pharma
 
Quality attributes in software architecture
Quality attributes in software architectureQuality attributes in software architecture
Quality attributes in software architecture
 
Testing- Fundamentals of Testing-Mazenet solution
Testing- Fundamentals of Testing-Mazenet solutionTesting- Fundamentals of Testing-Mazenet solution
Testing- Fundamentals of Testing-Mazenet solution
 
Software automation
Software automationSoftware automation
Software automation
 

More from Peter Tröger

WannaCry - An OS course perspective
WannaCry - An OS course perspectiveWannaCry - An OS course perspective
WannaCry - An OS course perspectivePeter Tröger
 
Cloud Standards and Virtualization
Cloud Standards and VirtualizationCloud Standards and Virtualization
Cloud Standards and VirtualizationPeter Tröger
 
Distributed Resource Management Application API (DRMAA) Version 2
Distributed Resource Management Application API (DRMAA) Version 2Distributed Resource Management Application API (DRMAA) Version 2
Distributed Resource Management Application API (DRMAA) Version 2Peter Tröger
 
Design of Software for Embedded Systems
Design of Software for Embedded SystemsDesign of Software for Embedded Systems
Design of Software for Embedded SystemsPeter Tröger
 
Humans should not write XML.
Humans should not write XML.Humans should not write XML.
Humans should not write XML.Peter Tröger
 
Verteilte Software-Systeme im Kontext von Industrie 4.0
Verteilte Software-Systeme im Kontext von Industrie 4.0Verteilte Software-Systeme im Kontext von Industrie 4.0
Verteilte Software-Systeme im Kontext von Industrie 4.0Peter Tröger
 
Operating Systems 1 (12/12) - Summary
Operating Systems 1 (12/12) - SummaryOperating Systems 1 (12/12) - Summary
Operating Systems 1 (12/12) - SummaryPeter Tröger
 
Operating Systems 1 (8/12) - Concurrency
Operating Systems 1 (8/12) - ConcurrencyOperating Systems 1 (8/12) - Concurrency
Operating Systems 1 (8/12) - ConcurrencyPeter Tröger
 
Operating Systems 1 (9/12) - Memory Management Concepts
Operating Systems 1 (9/12) - Memory Management ConceptsOperating Systems 1 (9/12) - Memory Management Concepts
Operating Systems 1 (9/12) - Memory Management ConceptsPeter Tröger
 
Operating Systems 1 (7/12) - Threads
Operating Systems 1 (7/12) - ThreadsOperating Systems 1 (7/12) - Threads
Operating Systems 1 (7/12) - ThreadsPeter Tröger
 
Operating Systems 1 (1/12) - History
Operating Systems 1 (1/12) - HistoryOperating Systems 1 (1/12) - History
Operating Systems 1 (1/12) - HistoryPeter Tröger
 
Operating Systems 1 (2/12) - Hardware Basics
Operating Systems 1 (2/12) - Hardware BasicsOperating Systems 1 (2/12) - Hardware Basics
Operating Systems 1 (2/12) - Hardware BasicsPeter Tröger
 
Operating Systems 1 (11/12) - Input / Output
Operating Systems 1 (11/12) - Input / OutputOperating Systems 1 (11/12) - Input / Output
Operating Systems 1 (11/12) - Input / OutputPeter Tröger
 
Operating Systems 1 (3/12) - Architectures
Operating Systems 1 (3/12) - ArchitecturesOperating Systems 1 (3/12) - Architectures
Operating Systems 1 (3/12) - ArchitecturesPeter Tröger
 
Operating Systems 1 (6/12) - Processes
Operating Systems 1 (6/12) - ProcessesOperating Systems 1 (6/12) - Processes
Operating Systems 1 (6/12) - ProcessesPeter Tröger
 
Operating Systems 1 (5/12) - Architectures (Unix)
Operating Systems 1 (5/12) - Architectures (Unix)Operating Systems 1 (5/12) - Architectures (Unix)
Operating Systems 1 (5/12) - Architectures (Unix)Peter Tröger
 
Operating Systems 1 (4/12) - Architectures (Windows)
Operating Systems 1 (4/12) - Architectures (Windows)Operating Systems 1 (4/12) - Architectures (Windows)
Operating Systems 1 (4/12) - Architectures (Windows)Peter Tröger
 

More from Peter Tröger (17)

WannaCry - An OS course perspective
WannaCry - An OS course perspectiveWannaCry - An OS course perspective
WannaCry - An OS course perspective
 
Cloud Standards and Virtualization
Cloud Standards and VirtualizationCloud Standards and Virtualization
Cloud Standards and Virtualization
 
Distributed Resource Management Application API (DRMAA) Version 2
Distributed Resource Management Application API (DRMAA) Version 2Distributed Resource Management Application API (DRMAA) Version 2
Distributed Resource Management Application API (DRMAA) Version 2
 
Design of Software for Embedded Systems
Design of Software for Embedded SystemsDesign of Software for Embedded Systems
Design of Software for Embedded Systems
 
Humans should not write XML.
Humans should not write XML.Humans should not write XML.
Humans should not write XML.
 
Verteilte Software-Systeme im Kontext von Industrie 4.0
Verteilte Software-Systeme im Kontext von Industrie 4.0Verteilte Software-Systeme im Kontext von Industrie 4.0
Verteilte Software-Systeme im Kontext von Industrie 4.0
 
Operating Systems 1 (12/12) - Summary
Operating Systems 1 (12/12) - SummaryOperating Systems 1 (12/12) - Summary
Operating Systems 1 (12/12) - Summary
 
Operating Systems 1 (8/12) - Concurrency
Operating Systems 1 (8/12) - ConcurrencyOperating Systems 1 (8/12) - Concurrency
Operating Systems 1 (8/12) - Concurrency
 
Operating Systems 1 (9/12) - Memory Management Concepts
Operating Systems 1 (9/12) - Memory Management ConceptsOperating Systems 1 (9/12) - Memory Management Concepts
Operating Systems 1 (9/12) - Memory Management Concepts
 
Operating Systems 1 (7/12) - Threads
Operating Systems 1 (7/12) - ThreadsOperating Systems 1 (7/12) - Threads
Operating Systems 1 (7/12) - Threads
 
Operating Systems 1 (1/12) - History
Operating Systems 1 (1/12) - HistoryOperating Systems 1 (1/12) - History
Operating Systems 1 (1/12) - History
 
Operating Systems 1 (2/12) - Hardware Basics
Operating Systems 1 (2/12) - Hardware BasicsOperating Systems 1 (2/12) - Hardware Basics
Operating Systems 1 (2/12) - Hardware Basics
 
Operating Systems 1 (11/12) - Input / Output
Operating Systems 1 (11/12) - Input / OutputOperating Systems 1 (11/12) - Input / Output
Operating Systems 1 (11/12) - Input / Output
 
Operating Systems 1 (3/12) - Architectures
Operating Systems 1 (3/12) - ArchitecturesOperating Systems 1 (3/12) - Architectures
Operating Systems 1 (3/12) - Architectures
 
Operating Systems 1 (6/12) - Processes
Operating Systems 1 (6/12) - ProcessesOperating Systems 1 (6/12) - Processes
Operating Systems 1 (6/12) - Processes
 
Operating Systems 1 (5/12) - Architectures (Unix)
Operating Systems 1 (5/12) - Architectures (Unix)Operating Systems 1 (5/12) - Architectures (Unix)
Operating Systems 1 (5/12) - Architectures (Unix)
 
Operating Systems 1 (4/12) - Architectures (Windows)
Operating Systems 1 (4/12) - Architectures (Windows)Operating Systems 1 (4/12) - Architectures (Windows)
Operating Systems 1 (4/12) - Architectures (Windows)
 

Recently uploaded

Earth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatEarth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatYousafMalik24
 
What is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERPWhat is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERPCeline George
 
Hierarchy of management that covers different levels of management
Hierarchy of management that covers different levels of managementHierarchy of management that covers different levels of management
Hierarchy of management that covers different levels of managementmkooblal
 
CELL CYCLE Division Science 8 quarter IV.pptx
CELL CYCLE Division Science 8 quarter IV.pptxCELL CYCLE Division Science 8 quarter IV.pptx
CELL CYCLE Division Science 8 quarter IV.pptxJiesonDelaCerna
 
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTiammrhaywood
 
How to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxHow to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxmanuelaromero2013
 
Historical philosophical, theoretical, and legal foundations of special and i...
Historical philosophical, theoretical, and legal foundations of special and i...Historical philosophical, theoretical, and legal foundations of special and i...
Historical philosophical, theoretical, and legal foundations of special and i...jaredbarbolino94
 
Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Celine George
 
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...JhezDiaz1
 
DATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginnersDATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginnersSabitha Banu
 
AmericanHighSchoolsprezentacijaoskolama.
AmericanHighSchoolsprezentacijaoskolama.AmericanHighSchoolsprezentacijaoskolama.
AmericanHighSchoolsprezentacijaoskolama.arsicmarija21
 
Gas measurement O2,Co2,& ph) 04/2024.pptx
Gas measurement O2,Co2,& ph) 04/2024.pptxGas measurement O2,Co2,& ph) 04/2024.pptx
Gas measurement O2,Co2,& ph) 04/2024.pptxDr.Ibrahim Hassaan
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxNirmalaLoungPoorunde1
 
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdfEnzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdfSumit Tiwari
 
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdfFraming an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdfUjwalaBharambe
 
Painted Grey Ware.pptx, PGW Culture of India
Painted Grey Ware.pptx, PGW Culture of IndiaPainted Grey Ware.pptx, PGW Culture of India
Painted Grey Ware.pptx, PGW Culture of IndiaVirag Sontakke
 
Blooming Together_ Growing a Community Garden Worksheet.docx
Blooming Together_ Growing a Community Garden Worksheet.docxBlooming Together_ Growing a Community Garden Worksheet.docx
Blooming Together_ Growing a Community Garden Worksheet.docxUnboundStockton
 
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxSayali Powar
 
Solving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxSolving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxOH TEIK BIN
 
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...Marc Dusseiller Dusjagr
 

Recently uploaded (20)

Earth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatEarth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice great
 
What is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERPWhat is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERP
 
Hierarchy of management that covers different levels of management
Hierarchy of management that covers different levels of managementHierarchy of management that covers different levels of management
Hierarchy of management that covers different levels of management
 
CELL CYCLE Division Science 8 quarter IV.pptx
CELL CYCLE Division Science 8 quarter IV.pptxCELL CYCLE Division Science 8 quarter IV.pptx
CELL CYCLE Division Science 8 quarter IV.pptx
 
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
 
How to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxHow to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptx
 
Historical philosophical, theoretical, and legal foundations of special and i...
Historical philosophical, theoretical, and legal foundations of special and i...Historical philosophical, theoretical, and legal foundations of special and i...
Historical philosophical, theoretical, and legal foundations of special and i...
 
Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17
 
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
 
DATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginnersDATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginners
 
AmericanHighSchoolsprezentacijaoskolama.
AmericanHighSchoolsprezentacijaoskolama.AmericanHighSchoolsprezentacijaoskolama.
AmericanHighSchoolsprezentacijaoskolama.
 
Gas measurement O2,Co2,& ph) 04/2024.pptx
Gas measurement O2,Co2,& ph) 04/2024.pptxGas measurement O2,Co2,& ph) 04/2024.pptx
Gas measurement O2,Co2,& ph) 04/2024.pptx
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptx
 
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdfEnzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
 
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdfFraming an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
 
Painted Grey Ware.pptx, PGW Culture of India
Painted Grey Ware.pptx, PGW Culture of IndiaPainted Grey Ware.pptx, PGW Culture of India
Painted Grey Ware.pptx, PGW Culture of India
 
Blooming Together_ Growing a Community Garden Worksheet.docx
Blooming Together_ Growing a Community Garden Worksheet.docxBlooming Together_ Growing a Community Garden Worksheet.docx
Blooming Together_ Growing a Community Garden Worksheet.docx
 
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
 
Solving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxSolving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptx
 
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
 

Dependable Systems -Dependability Means (3/16)

  • 1. Dependable Systems ! Dependability Means Dr. Peter Tröger ! Sources: ! J.C. Laprie. Dependability: Basic Concepts and Terminology Eusgeld, Irene et al.: Dependability Metrics. 4909. Springer Publishing, 2008 Echtle, Klaus: Fehlertoleranzverfahren. Heidelberg, Germany : Springer Verlag, 1990. Pfister, Gregory F.: High Availability. In: In Search of Clusters. , S. 379-452 !
  • 2. Dependable Systems Course PT 2014 Dependability • Umbrella term for operational requirements on a system • IFIP WG 10.4: "[..] the trustworthiness of a computing system which allows reliance to be justifiably placed on the service it delivers [..]" • IEC IEV: "dependability (is) the collective term used to describe the availability performance and its influencing factors : reliability performance, maintainability performance and maintenance support performance" • Laprie: „ Trustworthiness of a computer system such that 
 reliance can be placed on the service it delivers to the user “ • Adds a third dimension to system quality • General question: How to deal with unexpected events ? • In German: ,Verlässlichkeit‘ vs. ,Zuverlässigkeit‘
 2
  • 3. Dependable Systems Course PT 2014 Dependability Tree (Laprie) 3
  • 4. Dependable Systems Course PT 2014 Means for Dependability • Fault prevention - Prevent fault occurrence or introduction • Fault tolerance - Provide service matching the specification under faults • Fault removal - How to reduce the presence of faults • Fault forecasting- Estimate the present number, future incidence, and the consequences of faults • Combined utilization 4
  • 5. Dependable Systems Course PT 2014 Dependability Means (Laprie) • Offline / online techniques • Fault intolerance techniques • Fault prevention - Prevent fault occurrence or introduction • Fault removal - Reduce the presence of faults • 100% fault-free servicing for the whole life time is not possible • Fault tolerance techniques • Fault forecasting - Estimate the present number, future incidence, and the consequences of faults • Fault tolerance - Provide service complying with specification in spite of faults • Problems with coverage and validation of the validator 5
  • 6. Dependable Systems Course PT 2014 Dependability Means (Echtle) 6
  • 7. Dependable Systems Course PT 2014 Fault Prevention • Specific approaches for avoiding faults • Specialized specification formalisms and techniques • Specialized development / manufacturing process to prevent design faults • Shielding • Only use ultra-reliable components • Choice of programming language • General engineering approaches • Software engineering procedures • Quality management regulations and enforcement • Training and organization of maintenance departments 7
  • 8. Dependable Systems Course PT 2014 Fault Removal • Make faults disappear before fault tolerance becomes relevant • Step 1: Verification • Check if the system adheres to verification conditions; if not, take next steps • Static verification: Static analysis, data flow analysis, compiler checks • Dynamic verification: Symbolic execution or verification testing • Step 2: Diagnosis • Find the faults that influenced the verification conditions • Step 3: Correction • Fix the problem, repeat the steps (regression) • Fault removal during operation: Corrective maintenance (curative / preventive) 8
  • 9. Dependable Systems Course PT 2014 Testing • Selecting test inputs is driven from different view points • Testing purpose: conformance testing, fault-finding testing • System model: functional testing (with functional model) or structural testing • Fault model: enables fault-based testing • Deterministic testing vs. random testing • Structural testing of hardware is fault-finding, fault-based, structural testing • Structural testing of software is fault-finding, non-fault-based, structural testing • Golden unit: Reference system for comparison of output for a given input
 (e.g. reference implementation) 9
  • 10. Dependable Systems Course PT 2014 Fault Tolerance • Fault tolerance is the ability of a system to operate correctly in presence of faults. or • A system S is called k-fault-tolerant with respect to a set of 
 algorithms {A1, A2, ... , Ap} and a set of faults {F1, F2, ... , Fp} 
 if for every k-fault F in S, Ai is executable by a subsystem of system S with k faults. (Hayes, 9/76) or • Fault tolerance is the use of redundancy (time or space) to achieve the desired level of system dependability - costs ! • Accepts that an implemented system will not be fault-free • Implements automatic recovery from errors • Is a recursive concept (voter replication, self-checking checkers, stable memory) 10
  • 11. Dependable Systems Course PT 2014 Fault Tolerance • Typical design methodology in many technical and biological systems • Spare wheel in cars, redundant organs, ... • Fault tolerance mechanisms need to be evaluated by dependability attributes • Minimum, maximum, average reliability and availability • Easy to formulate and understand, hard to prove - failure rate remains unknown • Quantitative limits based on fault model (which faults in which components) • Typically ,one-fault-at-a-time‘ assumption • Different attributes of fault tolerance implementation to be checked • Functional verification, sensitivity analysis, minimum amount of resource resp. computational overhead, implementation performance, transparency, portability 11
  • 12. Error Processing Dependable Systems Course PT 2014 Phases of Fault Tolerance (Hanmer) 12 Latent Fault Error Normal OperationFault Activation Error Recovery Error Mitigation Error Detection Fault Treatment
  • 13. Dependable Systems Course PT 2014 Decomposition of Fault Tolerance (Lee & Anderson) • Error detection • Presence of fault is deducted by detecting an error in some subsystem • Implies failure of the according component • Damage confinement • Delimit damage caused due to the component failure • Error processing - recovery / compensation • System recovers from the effect of an error • Fault treatment • Ensure that fault does not cause again failures 13
  • 14. Dependable Systems Course PT 2014 Fault Tolerance - Error Detection • Replication check • Output of replicated components is compared / voted • Independent failures, physical causes -> many replicas possible (e.g. HW) • Finds also design faults, if replicated components are from different vendors • Timing checks (‚watchdog timers‘) • Timing violation often implies that component output is also incorrect • Typical solution for node failure detection in a distributed system • Reasonableness checks - Run-time range checks, assertions • Structural and coding checks, diagnostics checks, algorithmic checks • Ideal: Self-checking component with clear error confinement areas 14
  • 15. Dependable Systems Course PT 2014 Fault Tolerance - Error Detection • Replication checks are powerful and expensive, examples: • Execute identical copies on different hardware (component failures) • Execute separate and different versions (assumes independent design faults) • Execute same copies different times (transient faults) • Replicate only portion of the system • Works for both hardware and software • Signaling aspect in the error detection task • Typical software model are exceptions, a way for implementing forward recovery • Combination fault detection and fault location 15 (C) IEEE
  • 16. Dependable Systems Course PT 2014 Fault Tolerance - Damage Confinement (Taylor) • System decomposition • Every communication link might enable damage spreading • Introduce mutual suspicion • Hardware-based separation of software components • OS-based separation (processes, runtime monitors, special shells) • Law-governed architecture • Externalize contrains on interaction by runtime rules • Strongly-typed language • Language guarantees the absence of unintended control flows 16
  • 17. Dependable Systems Course PT 2014 Fault Tolerance - Error Processing Through Recovery • Forward error recovery • Error is masked to reach again a consistent state (fault compensation) • Corrective actions need detailled knowledge (damage assessment) • New state is typically computed in another way • Examples with compensation: Error correcting codes, non-journaling file system check, advanced exception handlers, voters • Backward error recovery • Roll back to previous consistent state (recovery point / checkpoint) • Very suitable for transient faults • Computation can be re-done with same components (retry), 
 with alternate components (reconfigure), or can be ignored (skip frame) 17
  • 18. Dependable Systems Course PT 2014 Forward & Backward Recovery - Example • Transaction processing systems support ACID through journaling and backups • Journal: Audit trail of transactions and database changes • Forward recovery: Restore a database backup and apply the transaction journal • Journal provides the intended transaction operation (write-ahead logging) and the ,after-image‘ for completed transactions • Replay of all conducted and intended transactions from safe point in history • Backward recovery (,backout‘): Apply journal ,backwards‘ on current database • Demands ,before-image‘ before each database modification • Involves logic of reprocessing aborted transactions, which is time-consuming • Typically both applied - forward recovery for reaching a stable database status, and backward recovery of modifications from unfinished transactions 18
  • 19. Dependable Systems Course PT 2014 Fault Tolerance - Fault Treatment • Fault diagnosis - determine error cause‘s location and nature • Fault passivation - (remove faulty component &) reconfigure system • Error processing might already remove the fault - ,soft fault‘ • Typical example are temporary faults • Fault tolerance manager • Careful diagnosis with hardware support • Damage assessment by disabling faulty components automatically • Example: IBM mainframe architecture • Software rejuvenation • Gracefully terminating an application and immediately restarting it at a clean internal state 19
  • 20. Dependable Systems Course PT 2014 Fault Tolerant Mindset (Hanmer) • What can go wrong in any given situation ? • Mindset to be applied in all development stages • „Every problem in computer science boils down to tradeoffs“ [Henschen] • Fault prevention vs. fault tolerance vs. failure severity • KISS principles, leave out „bells and whistles“ • Incremental additions of reliability - long-term products • Defensive Programming • Simple error handling; fix root cause, not symptoms; make data auditble; 
 make code maintainable; 20
  • 21. Dependable Systems Course PT 2014 „Fail-Fast“ • A common concept from system engineering, company management, ... • „Report failure and stop immediately without further action“ • Discussed by Jim Gray in 1985 as part of his famous article
 „Why do computers stop and what can be done about it ?“ • Useful when benefit from recovery is not good enough for its costs, 
 or if error propagation is highly probable • Single units of a redundant set • Deeply interwired IT system components • Components under heavy request load • And ... crappy start-up companies 21
  • 22. Dependable Systems Course PT 2014 Fault Tolerant Design Methodology (Hanmer) • Assess things that can go wrong with the system (e.g. fault trees). • Find potential risks and according system failures. • Define strategies to mitigate the identified risks. • Failure avoidance options, prevent faults from activation • Create a mental model of the system design with redundancy. • Design error detection and error processing capabilities. • Design in the failure mitigation capabilities. • Design human-computer interactions and modes of management. 22
  • 23. Dependable Systems Course PT 2014 Dependable Design Strategies (Malek) • Decompose the system • Identify fault classes, fault latency and fault impact for the components • Identify “weak spots” and assess potential damage • Integrate partial recovery / reintegration / restart • Determine qualitative and quantitative specs for fault tolerance and evaluate your design in specific environment • Develop / utilize fault and error detection techniques and algorithms • Develop / utilize fault isolation techniques and algorithms • Refine fault tolerance, iterate for improvement • Re-use proven components, but be aware of integration issues 23