SlideShare a Scribd company logo
1 of 23
Download to read offline
Dependable Systems
!
Dependability Means
Dr. Peter Tröger

!
Sources: 

!
J.C. Laprie. Dependability: Basic Concepts and Terminology

Eusgeld, Irene et al.: Dependability Metrics. 4909. Springer Publishing, 2008

Echtle, Klaus: Fehlertoleranzverfahren. Heidelberg, Germany : Springer Verlag, 1990.

Pfister, Gregory F.: High Availability. In: In Search of Clusters. , S. 379-452

!
Dependable Systems Course PT 2014
Dependability
• Umbrella term for operational requirements on a system

• IFIP WG 10.4: "[..] the trustworthiness of a computing system which allows reliance
to be justifiably placed on the service it delivers [..]"

• IEC IEV: "dependability (is) the collective term used to describe the availability
performance and its influencing factors : reliability performance, maintainability
performance and maintenance support performance"

• Laprie: „ Trustworthiness of a computer system such that 

reliance can be placed on the service it delivers to the user “

• Adds a third dimension to system quality

• General question: How to deal with unexpected events ?

• In German: ,Verlässlichkeit‘ vs. ,Zuverlässigkeit‘

2
Dependable Systems Course PT 2014
Dependability Tree (Laprie)
3
Dependable Systems Course PT 2014
Means for Dependability
• Fault prevention - Prevent fault occurrence or introduction

• Fault tolerance - Provide service matching the specification under faults

• Fault removal - How to reduce the presence of faults

• Fault forecasting- Estimate the present number, future incidence, and the
consequences of faults

• Combined utilization
4
Dependable Systems Course PT 2014
Dependability Means (Laprie)
• Offline / online techniques

• Fault intolerance techniques

• Fault prevention - Prevent fault occurrence or introduction

• Fault removal - Reduce the presence of faults

• 100% fault-free servicing for the whole life time is not possible

• Fault tolerance techniques

• Fault forecasting - Estimate the present number, future incidence, and the
consequences of faults

• Fault tolerance - Provide service complying with specification in spite of faults

• Problems with coverage and validation of the validator
5
Dependable Systems Course PT 2014
Dependability Means (Echtle)
6
Dependable Systems Course PT 2014
Fault Prevention
• Specific approaches for avoiding faults

• Specialized specification formalisms and techniques

• Specialized development / manufacturing process to prevent design faults

• Shielding

• Only use ultra-reliable components

• Choice of programming language

• General engineering approaches

• Software engineering procedures

• Quality management regulations and enforcement

• Training and organization of maintenance departments
7
Dependable Systems Course PT 2014
Fault Removal
• Make faults disappear before fault tolerance becomes relevant

• Step 1: Verification

• Check if the system adheres to verification conditions; if not, take next steps

• Static verification: Static analysis, data flow analysis, compiler checks

• Dynamic verification: Symbolic execution or verification testing

• Step 2: Diagnosis

• Find the faults that influenced the verification conditions

• Step 3: Correction

• Fix the problem, repeat the steps (regression)

• Fault removal during operation: Corrective maintenance (curative / preventive)
8
Dependable Systems Course PT 2014
Testing
• Selecting test inputs is driven from different view points

• Testing purpose: conformance testing, fault-finding testing
• System model: functional testing (with functional model) or structural testing

• Fault model: enables fault-based testing

• Deterministic testing vs. random testing

• Structural testing of hardware is fault-finding, fault-based, structural testing

• Structural testing of software is fault-finding, non-fault-based, structural testing

• Golden unit: Reference system for comparison of output for a given input

(e.g. reference implementation)
9
Dependable Systems Course PT 2014
Fault Tolerance
• Fault tolerance is the ability of a system to operate correctly in presence of faults. 

or

• A system S is called k-fault-tolerant with respect to a set of 

algorithms {A1, A2, ... , Ap} and a set of faults {F1, F2, ... , Fp} 

if for every k-fault F in S, Ai is executable by a subsystem of system S with k faults.
(Hayes, 9/76)

or

• Fault tolerance is the use of redundancy (time or space) to achieve the desired level
of system dependability - costs !

• Accepts that an implemented system will not be fault-free

• Implements automatic recovery from errors

• Is a recursive concept (voter replication, self-checking checkers, stable memory)
10
Dependable Systems Course PT 2014
Fault Tolerance
• Typical design methodology in many technical and biological systems

• Spare wheel in cars, redundant organs, ...

• Fault tolerance mechanisms need to be evaluated by dependability attributes

• Minimum, maximum, average reliability and availability

• Easy to formulate and understand, hard to prove - failure rate remains unknown

• Quantitative limits based on fault model (which faults in which components)

• Typically ,one-fault-at-a-time‘ assumption

• Different attributes of fault tolerance implementation to be checked

• Functional verification, sensitivity analysis, minimum amount of resource resp.
computational overhead, implementation performance, transparency, portability
11
Error Processing
Dependable Systems Course PT 2014
Phases of Fault Tolerance (Hanmer)
12
Latent
Fault
Error
Normal
OperationFault
Activation
Error Recovery
Error Mitigation
Error
Detection
Fault
Treatment
Dependable Systems Course PT 2014
Decomposition of Fault Tolerance (Lee & Anderson)
• Error detection
• Presence of fault is deducted by detecting an error in some subsystem

• Implies failure of the according component
• Damage confinement
• Delimit damage caused due to the component failure
• Error processing - recovery / compensation

• System recovers from the effect of an error

• Fault treatment 

• Ensure that fault does not cause again failures
13
Dependable Systems Course PT 2014
Fault Tolerance - Error Detection
• Replication check
• Output of replicated components is compared / voted

• Independent failures, physical causes -> many replicas possible (e.g. HW)

• Finds also design faults, if replicated components are from different vendors

• Timing checks (‚watchdog timers‘)

• Timing violation often implies that component output is also incorrect

• Typical solution for node failure detection in a distributed system

• Reasonableness checks - Run-time range checks, assertions

• Structural and coding checks, diagnostics checks, algorithmic checks

• Ideal: Self-checking component with clear error confinement areas
14
Dependable Systems Course PT 2014
Fault Tolerance - Error Detection
• Replication checks are powerful and expensive, examples:

• Execute identical copies on different hardware (component failures)

• Execute separate and different versions (assumes independent design faults)

• Execute same copies different times (transient faults)

• Replicate only portion of the system

• Works for both hardware and software

• Signaling aspect in the error detection task

• Typical software model are exceptions, a way for implementing forward recovery

• Combination fault detection and fault location
15 (C) IEEE
Dependable Systems Course PT 2014
Fault Tolerance - Damage Confinement (Taylor)
• System decomposition
• Every communication link might enable damage spreading

• Introduce mutual suspicion

• Hardware-based separation of software components

• OS-based separation (processes, runtime monitors, special shells)

• Law-governed architecture
• Externalize contrains on interaction by runtime rules

• Strongly-typed language
• Language guarantees the absence of unintended control flows
16
Dependable Systems Course PT 2014
Fault Tolerance - Error Processing Through Recovery
• Forward error recovery

• Error is masked to reach again a consistent state (fault compensation)

• Corrective actions need detailled knowledge (damage assessment)

• New state is typically computed in another way

• Examples with compensation: Error correcting codes, non-journaling file
system check, advanced exception handlers, voters

• Backward error recovery

• Roll back to previous consistent state (recovery point / checkpoint) 

• Very suitable for transient faults

• Computation can be re-done with same components (retry), 

with alternate components (reconfigure), or can be ignored (skip frame)
17
Dependable Systems Course PT 2014
Forward & Backward Recovery - Example
• Transaction processing systems support ACID through journaling and backups

• Journal: Audit trail of transactions and database changes

• Forward recovery: Restore a database backup and apply the transaction journal 

• Journal provides the intended transaction operation (write-ahead logging) and
the ,after-image‘ for completed transactions

• Replay of all conducted and intended transactions from safe point in history

• Backward recovery (,backout‘): Apply journal ,backwards‘ on current database

• Demands ,before-image‘ before each database modification

• Involves logic of reprocessing aborted transactions, which is time-consuming

• Typically both applied - forward recovery for reaching a stable database status,
and backward recovery of modifications from unfinished transactions
18
Dependable Systems Course PT 2014
Fault Tolerance - Fault Treatment
• Fault diagnosis - determine error cause‘s location and nature

• Fault passivation - (remove faulty component &) reconfigure system

• Error processing might already remove the fault - ,soft fault‘

• Typical example are temporary faults

• Fault tolerance manager

• Careful diagnosis with hardware support

• Damage assessment by disabling faulty components automatically

• Example: IBM mainframe architecture

• Software rejuvenation

• Gracefully terminating an application and immediately restarting it at a
clean internal state
19
Dependable Systems Course PT 2014
Fault Tolerant Mindset (Hanmer)
• What can go wrong in any given situation ?

• Mindset to be applied in all development stages

• „Every problem in computer science boils down to tradeoffs“ [Henschen]

• Fault prevention vs. fault tolerance vs. failure severity

• KISS principles, leave out „bells and whistles“

• Incremental additions of reliability - long-term products

• Defensive Programming

• Simple error handling; fix root cause, not symptoms; make data auditble; 

make code maintainable;
20
Dependable Systems Course PT 2014
„Fail-Fast“
• A common concept from system engineering, company management, ...

• „Report failure and stop immediately without further action“
• Discussed by Jim Gray in 1985 as part of his famous article

„Why do computers stop and what can be done about it ?“

• Useful when benefit from recovery is not good enough for its costs, 

or if error propagation is highly probable

• Single units of a redundant set

• Deeply interwired IT system components

• Components under heavy request load

• And ... crappy start-up companies
21
Dependable Systems Course PT 2014
Fault Tolerant Design Methodology (Hanmer)
• Assess things that can go wrong with the system (e.g. fault trees).

• Find potential risks and according system failures.

• Define strategies to mitigate the identified risks.

• Failure avoidance options, prevent faults from activation

• Create a mental model of the system design with redundancy.

• Design error detection and error processing capabilities.

• Design in the failure mitigation capabilities.

• Design human-computer interactions and modes of management.
22
Dependable Systems Course PT 2014
Dependable Design Strategies (Malek)
• Decompose the system 

• Identify fault classes, fault latency and fault impact for the components

• Identify “weak spots” and assess potential damage

• Integrate partial recovery / reintegration / restart 

• Determine qualitative and quantitative specs for fault tolerance and evaluate your
design in specific environment 

• Develop / utilize fault and error detection techniques and algorithms 

• Develop / utilize fault isolation techniques and algorithms 

• Refine fault tolerance, iterate for improvement

• Re-use proven components, but be aware of integration issues
23

More Related Content

What's hot

Dependable Systems -Reliability Prediction (9/16)
Dependable Systems -Reliability Prediction (9/16)Dependable Systems -Reliability Prediction (9/16)
Dependable Systems -Reliability Prediction (9/16)Peter Tröger
 
Dependable Systems - Hardware Dependability with Diagnosis (13/16)
Dependable Systems - Hardware Dependability with Diagnosis (13/16)Dependable Systems - Hardware Dependability with Diagnosis (13/16)
Dependable Systems - Hardware Dependability with Diagnosis (13/16)Peter Tröger
 
What activates a bug? A refinement of the Laprie terminology model.
What activates a bug? A refinement of the Laprie terminology model.What activates a bug? A refinement of the Laprie terminology model.
What activates a bug? A refinement of the Laprie terminology model.Peter Tröger
 
OpenSubmit - How to grade 1200 code submissions
OpenSubmit - How to grade 1200 code submissionsOpenSubmit - How to grade 1200 code submissions
OpenSubmit - How to grade 1200 code submissionsPeter Tröger
 
Dependable Systems - System Dependability Evaluation (8/16)
Dependable Systems - System Dependability Evaluation (8/16)Dependable Systems - System Dependability Evaluation (8/16)
Dependable Systems - System Dependability Evaluation (8/16)Peter Tröger
 
Availability tactics
Availability tacticsAvailability tactics
Availability tacticsahsan riaz
 
Software Fault Tolerance
Software Fault ToleranceSoftware Fault Tolerance
Software Fault ToleranceAnkit Singh
 
Formal Verification
Formal VerificationFormal Verification
Formal VerificationIlia Levin
 
Fault Tolerance System
Fault Tolerance SystemFault Tolerance System
Fault Tolerance Systemprakashjjaya
 
Fault tolerance and computing
Fault tolerance  and computingFault tolerance  and computing
Fault tolerance and computingPalani murugan
 
Fault tolearant system
Fault tolearant systemFault tolearant system
Fault tolearant systemarvinthsaran
 
Fault Tolerance System
Fault Tolerance SystemFault Tolerance System
Fault Tolerance SystemEhsan Ilahi
 
Real Time Systems
Real Time SystemsReal Time Systems
Real Time Systemsleo3004
 
Fault tolerant presentation
Fault tolerant presentationFault tolerant presentation
Fault tolerant presentationskadyan1
 
Reverse Engineering of Software Architecture
Reverse Engineering of Software ArchitectureReverse Engineering of Software Architecture
Reverse Engineering of Software ArchitectureDharmalingam Ganesan
 

What's hot (20)

Dependable Systems -Reliability Prediction (9/16)
Dependable Systems -Reliability Prediction (9/16)Dependable Systems -Reliability Prediction (9/16)
Dependable Systems -Reliability Prediction (9/16)
 
Dependable Systems - Hardware Dependability with Diagnosis (13/16)
Dependable Systems - Hardware Dependability with Diagnosis (13/16)Dependable Systems - Hardware Dependability with Diagnosis (13/16)
Dependable Systems - Hardware Dependability with Diagnosis (13/16)
 
What activates a bug? A refinement of the Laprie terminology model.
What activates a bug? A refinement of the Laprie terminology model.What activates a bug? A refinement of the Laprie terminology model.
What activates a bug? A refinement of the Laprie terminology model.
 
OpenSubmit - How to grade 1200 code submissions
OpenSubmit - How to grade 1200 code submissionsOpenSubmit - How to grade 1200 code submissions
OpenSubmit - How to grade 1200 code submissions
 
Dependable Systems - System Dependability Evaluation (8/16)
Dependable Systems - System Dependability Evaluation (8/16)Dependable Systems - System Dependability Evaluation (8/16)
Dependable Systems - System Dependability Evaluation (8/16)
 
Fault tolerance
Fault toleranceFault tolerance
Fault tolerance
 
Availability tactics
Availability tacticsAvailability tactics
Availability tactics
 
SRE Tools
SRE ToolsSRE Tools
SRE Tools
 
Software Fault Tolerance
Software Fault ToleranceSoftware Fault Tolerance
Software Fault Tolerance
 
Bangalore march07
Bangalore march07Bangalore march07
Bangalore march07
 
Formal Verification
Formal VerificationFormal Verification
Formal Verification
 
Sw testing
Sw testingSw testing
Sw testing
 
Fault Tolerance System
Fault Tolerance SystemFault Tolerance System
Fault Tolerance System
 
Fault tolerance and computing
Fault tolerance  and computingFault tolerance  and computing
Fault tolerance and computing
 
Fault tolearant system
Fault tolearant systemFault tolearant system
Fault tolearant system
 
Fault Tolerance System
Fault Tolerance SystemFault Tolerance System
Fault Tolerance System
 
Real Time Systems
Real Time SystemsReal Time Systems
Real Time Systems
 
Fault tolerance techniques
Fault tolerance techniquesFault tolerance techniques
Fault tolerance techniques
 
Fault tolerant presentation
Fault tolerant presentationFault tolerant presentation
Fault tolerant presentation
 
Reverse Engineering of Software Architecture
Reverse Engineering of Software ArchitectureReverse Engineering of Software Architecture
Reverse Engineering of Software Architecture
 

Similar to Dependable Systems -Dependability Means (3/16)

Chromatography Data System: Getting It “Right First Time” Seminar Series – Pa...
Chromatography Data System: Getting It “Right First Time” Seminar Series – Pa...Chromatography Data System: Getting It “Right First Time” Seminar Series – Pa...
Chromatography Data System: Getting It “Right First Time” Seminar Series – Pa...Chromatography & Mass Spectrometry Solutions
 
Introduction to performance testing
Introduction to performance testingIntroduction to performance testing
Introduction to performance testingRichard Bishop
 
SENG202-v-and-v-modeling_121810.pptx
SENG202-v-and-v-modeling_121810.pptxSENG202-v-and-v-modeling_121810.pptx
SENG202-v-and-v-modeling_121810.pptxMinsasWorld
 
Risk based testing and random testing
Risk based testing and random testingRisk based testing and random testing
Risk based testing and random testingHimanshu
 
Performance tuning Grails applications
 Performance tuning Grails applications Performance tuning Grails applications
Performance tuning Grails applicationsGR8Conf
 
Software engineering quality assurance and testing
Software engineering quality assurance and testingSoftware engineering quality assurance and testing
Software engineering quality assurance and testingBipul Roy Bpl
 
testing strategies and tactics
 testing strategies and tactics testing strategies and tactics
testing strategies and tacticsPreeti Mishra
 
Innovation day 2013 2.5 joris vanderschrick (verhaert) - embedded system de...
Innovation day 2013   2.5 joris vanderschrick (verhaert) - embedded system de...Innovation day 2013   2.5 joris vanderschrick (verhaert) - embedded system de...
Innovation day 2013 2.5 joris vanderschrick (verhaert) - embedded system de...Verhaert Masters in Innovation
 
6. System and its Life Cycle.pptx
6. System and its Life Cycle.pptx6. System and its Life Cycle.pptx
6. System and its Life Cycle.pptxAminaButt14
 
Software Engineering (Software Quality Assurance & Testing: Supplementary Mat...
Software Engineering (Software Quality Assurance & Testing: Supplementary Mat...Software Engineering (Software Quality Assurance & Testing: Supplementary Mat...
Software Engineering (Software Quality Assurance & Testing: Supplementary Mat...ShudipPal
 
Non Functional Requirement.
Non Functional Requirement.Non Functional Requirement.
Non Functional Requirement.Khushboo Shaukat
 
Software Quality Assurance
Software Quality AssuranceSoftware Quality Assurance
Software Quality AssuranceSaqib Raza
 
Context Driven Automation Gtac 2008
Context Driven Automation Gtac 2008Context Driven Automation Gtac 2008
Context Driven Automation Gtac 2008Pete Schneider
 
Software testing-and-analysis
Software testing-and-analysisSoftware testing-and-analysis
Software testing-and-analysisWBUTTUTORIALS
 
Process and Regulated Processes Software Validation Elements
Process and Regulated Processes Software Validation ElementsProcess and Regulated Processes Software Validation Elements
Process and Regulated Processes Software Validation ElementsArta Doci
 
Data Integrity II - Chromatography data system (CDS) in Pharma
Data Integrity II - Chromatography data system (CDS) in PharmaData Integrity II - Chromatography data system (CDS) in Pharma
Data Integrity II - Chromatography data system (CDS) in PharmaSathish Vemula
 
Quality attributes in software architecture
Quality attributes in software architectureQuality attributes in software architecture
Quality attributes in software architectureGang Tao
 
Testing- Fundamentals of Testing-Mazenet solution
Testing- Fundamentals of Testing-Mazenet solutionTesting- Fundamentals of Testing-Mazenet solution
Testing- Fundamentals of Testing-Mazenet solutionMazenetsolution
 

Similar to Dependable Systems -Dependability Means (3/16) (20)

Chromatography Data System: Getting It “Right First Time” Seminar Series – Pa...
Chromatography Data System: Getting It “Right First Time” Seminar Series – Pa...Chromatography Data System: Getting It “Right First Time” Seminar Series – Pa...
Chromatography Data System: Getting It “Right First Time” Seminar Series – Pa...
 
Introduction to performance testing
Introduction to performance testingIntroduction to performance testing
Introduction to performance testing
 
SENG202-v-and-v-modeling_121810.pptx
SENG202-v-and-v-modeling_121810.pptxSENG202-v-and-v-modeling_121810.pptx
SENG202-v-and-v-modeling_121810.pptx
 
Risk based testing and random testing
Risk based testing and random testingRisk based testing and random testing
Risk based testing and random testing
 
Performance tuning Grails applications
 Performance tuning Grails applications Performance tuning Grails applications
Performance tuning Grails applications
 
Software engineering quality assurance and testing
Software engineering quality assurance and testingSoftware engineering quality assurance and testing
Software engineering quality assurance and testing
 
testing strategies and tactics
 testing strategies and tactics testing strategies and tactics
testing strategies and tactics
 
Innovation day 2013 2.5 joris vanderschrick (verhaert) - embedded system de...
Innovation day 2013   2.5 joris vanderschrick (verhaert) - embedded system de...Innovation day 2013   2.5 joris vanderschrick (verhaert) - embedded system de...
Innovation day 2013 2.5 joris vanderschrick (verhaert) - embedded system de...
 
6. System and its Life Cycle.pptx
6. System and its Life Cycle.pptx6. System and its Life Cycle.pptx
6. System and its Life Cycle.pptx
 
System testing
System testingSystem testing
System testing
 
Software Engineering (Software Quality Assurance & Testing: Supplementary Mat...
Software Engineering (Software Quality Assurance & Testing: Supplementary Mat...Software Engineering (Software Quality Assurance & Testing: Supplementary Mat...
Software Engineering (Software Quality Assurance & Testing: Supplementary Mat...
 
Non Functional Requirement.
Non Functional Requirement.Non Functional Requirement.
Non Functional Requirement.
 
Software Quality Assurance
Software Quality AssuranceSoftware Quality Assurance
Software Quality Assurance
 
Context Driven Automation Gtac 2008
Context Driven Automation Gtac 2008Context Driven Automation Gtac 2008
Context Driven Automation Gtac 2008
 
Software testing-and-analysis
Software testing-and-analysisSoftware testing-and-analysis
Software testing-and-analysis
 
Process and Regulated Processes Software Validation Elements
Process and Regulated Processes Software Validation ElementsProcess and Regulated Processes Software Validation Elements
Process and Regulated Processes Software Validation Elements
 
Data Integrity II - Chromatography data system (CDS) in Pharma
Data Integrity II - Chromatography data system (CDS) in PharmaData Integrity II - Chromatography data system (CDS) in Pharma
Data Integrity II - Chromatography data system (CDS) in Pharma
 
Quality attributes in software architecture
Quality attributes in software architectureQuality attributes in software architecture
Quality attributes in software architecture
 
Testing- Fundamentals of Testing-Mazenet solution
Testing- Fundamentals of Testing-Mazenet solutionTesting- Fundamentals of Testing-Mazenet solution
Testing- Fundamentals of Testing-Mazenet solution
 
Software automation
Software automationSoftware automation
Software automation
 

More from Peter Tröger

WannaCry - An OS course perspective
WannaCry - An OS course perspectiveWannaCry - An OS course perspective
WannaCry - An OS course perspectivePeter Tröger
 
Cloud Standards and Virtualization
Cloud Standards and VirtualizationCloud Standards and Virtualization
Cloud Standards and VirtualizationPeter Tröger
 
Distributed Resource Management Application API (DRMAA) Version 2
Distributed Resource Management Application API (DRMAA) Version 2Distributed Resource Management Application API (DRMAA) Version 2
Distributed Resource Management Application API (DRMAA) Version 2Peter Tröger
 
Design of Software for Embedded Systems
Design of Software for Embedded SystemsDesign of Software for Embedded Systems
Design of Software for Embedded SystemsPeter Tröger
 
Humans should not write XML.
Humans should not write XML.Humans should not write XML.
Humans should not write XML.Peter Tröger
 
Verteilte Software-Systeme im Kontext von Industrie 4.0
Verteilte Software-Systeme im Kontext von Industrie 4.0Verteilte Software-Systeme im Kontext von Industrie 4.0
Verteilte Software-Systeme im Kontext von Industrie 4.0Peter Tröger
 
Operating Systems 1 (12/12) - Summary
Operating Systems 1 (12/12) - SummaryOperating Systems 1 (12/12) - Summary
Operating Systems 1 (12/12) - SummaryPeter Tröger
 
Operating Systems 1 (8/12) - Concurrency
Operating Systems 1 (8/12) - ConcurrencyOperating Systems 1 (8/12) - Concurrency
Operating Systems 1 (8/12) - ConcurrencyPeter Tröger
 
Operating Systems 1 (9/12) - Memory Management Concepts
Operating Systems 1 (9/12) - Memory Management ConceptsOperating Systems 1 (9/12) - Memory Management Concepts
Operating Systems 1 (9/12) - Memory Management ConceptsPeter Tröger
 
Operating Systems 1 (7/12) - Threads
Operating Systems 1 (7/12) - ThreadsOperating Systems 1 (7/12) - Threads
Operating Systems 1 (7/12) - ThreadsPeter Tröger
 
Operating Systems 1 (1/12) - History
Operating Systems 1 (1/12) - HistoryOperating Systems 1 (1/12) - History
Operating Systems 1 (1/12) - HistoryPeter Tröger
 
Operating Systems 1 (2/12) - Hardware Basics
Operating Systems 1 (2/12) - Hardware BasicsOperating Systems 1 (2/12) - Hardware Basics
Operating Systems 1 (2/12) - Hardware BasicsPeter Tröger
 
Operating Systems 1 (11/12) - Input / Output
Operating Systems 1 (11/12) - Input / OutputOperating Systems 1 (11/12) - Input / Output
Operating Systems 1 (11/12) - Input / OutputPeter Tröger
 
Operating Systems 1 (3/12) - Architectures
Operating Systems 1 (3/12) - ArchitecturesOperating Systems 1 (3/12) - Architectures
Operating Systems 1 (3/12) - ArchitecturesPeter Tröger
 
Operating Systems 1 (6/12) - Processes
Operating Systems 1 (6/12) - ProcessesOperating Systems 1 (6/12) - Processes
Operating Systems 1 (6/12) - ProcessesPeter Tröger
 
Operating Systems 1 (5/12) - Architectures (Unix)
Operating Systems 1 (5/12) - Architectures (Unix)Operating Systems 1 (5/12) - Architectures (Unix)
Operating Systems 1 (5/12) - Architectures (Unix)Peter Tröger
 
Operating Systems 1 (4/12) - Architectures (Windows)
Operating Systems 1 (4/12) - Architectures (Windows)Operating Systems 1 (4/12) - Architectures (Windows)
Operating Systems 1 (4/12) - Architectures (Windows)Peter Tröger
 

More from Peter Tröger (17)

WannaCry - An OS course perspective
WannaCry - An OS course perspectiveWannaCry - An OS course perspective
WannaCry - An OS course perspective
 
Cloud Standards and Virtualization
Cloud Standards and VirtualizationCloud Standards and Virtualization
Cloud Standards and Virtualization
 
Distributed Resource Management Application API (DRMAA) Version 2
Distributed Resource Management Application API (DRMAA) Version 2Distributed Resource Management Application API (DRMAA) Version 2
Distributed Resource Management Application API (DRMAA) Version 2
 
Design of Software for Embedded Systems
Design of Software for Embedded SystemsDesign of Software for Embedded Systems
Design of Software for Embedded Systems
 
Humans should not write XML.
Humans should not write XML.Humans should not write XML.
Humans should not write XML.
 
Verteilte Software-Systeme im Kontext von Industrie 4.0
Verteilte Software-Systeme im Kontext von Industrie 4.0Verteilte Software-Systeme im Kontext von Industrie 4.0
Verteilte Software-Systeme im Kontext von Industrie 4.0
 
Operating Systems 1 (12/12) - Summary
Operating Systems 1 (12/12) - SummaryOperating Systems 1 (12/12) - Summary
Operating Systems 1 (12/12) - Summary
 
Operating Systems 1 (8/12) - Concurrency
Operating Systems 1 (8/12) - ConcurrencyOperating Systems 1 (8/12) - Concurrency
Operating Systems 1 (8/12) - Concurrency
 
Operating Systems 1 (9/12) - Memory Management Concepts
Operating Systems 1 (9/12) - Memory Management ConceptsOperating Systems 1 (9/12) - Memory Management Concepts
Operating Systems 1 (9/12) - Memory Management Concepts
 
Operating Systems 1 (7/12) - Threads
Operating Systems 1 (7/12) - ThreadsOperating Systems 1 (7/12) - Threads
Operating Systems 1 (7/12) - Threads
 
Operating Systems 1 (1/12) - History
Operating Systems 1 (1/12) - HistoryOperating Systems 1 (1/12) - History
Operating Systems 1 (1/12) - History
 
Operating Systems 1 (2/12) - Hardware Basics
Operating Systems 1 (2/12) - Hardware BasicsOperating Systems 1 (2/12) - Hardware Basics
Operating Systems 1 (2/12) - Hardware Basics
 
Operating Systems 1 (11/12) - Input / Output
Operating Systems 1 (11/12) - Input / OutputOperating Systems 1 (11/12) - Input / Output
Operating Systems 1 (11/12) - Input / Output
 
Operating Systems 1 (3/12) - Architectures
Operating Systems 1 (3/12) - ArchitecturesOperating Systems 1 (3/12) - Architectures
Operating Systems 1 (3/12) - Architectures
 
Operating Systems 1 (6/12) - Processes
Operating Systems 1 (6/12) - ProcessesOperating Systems 1 (6/12) - Processes
Operating Systems 1 (6/12) - Processes
 
Operating Systems 1 (5/12) - Architectures (Unix)
Operating Systems 1 (5/12) - Architectures (Unix)Operating Systems 1 (5/12) - Architectures (Unix)
Operating Systems 1 (5/12) - Architectures (Unix)
 
Operating Systems 1 (4/12) - Architectures (Windows)
Operating Systems 1 (4/12) - Architectures (Windows)Operating Systems 1 (4/12) - Architectures (Windows)
Operating Systems 1 (4/12) - Architectures (Windows)
 

Recently uploaded

Unit-IV; Professional Sales Representative (PSR).pptx
Unit-IV; Professional Sales Representative (PSR).pptxUnit-IV; Professional Sales Representative (PSR).pptx
Unit-IV; Professional Sales Representative (PSR).pptxVishalSingh1417
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdfQucHHunhnh
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsTechSoup
 
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...christianmathematics
 
Python Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docxPython Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docxRamakrishna Reddy Bijjam
 
ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.MaryamAhmad92
 
Seal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptxSeal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptxnegromaestrong
 
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxBasic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxDenish Jangid
 
Application orientated numerical on hev.ppt
Application orientated numerical on hev.pptApplication orientated numerical on hev.ppt
Application orientated numerical on hev.pptRamjanShidvankar
 
Unit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxUnit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxVishalSingh1417
 
psychiatric nursing HISTORY COLLECTION .docx
psychiatric  nursing HISTORY  COLLECTION  .docxpsychiatric  nursing HISTORY  COLLECTION  .docx
psychiatric nursing HISTORY COLLECTION .docxPoojaSen20
 
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...pradhanghanshyam7136
 
Micro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdfMicro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdfPoh-Sun Goh
 
The basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxThe basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxheathfieldcps1
 
ICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxAreebaZafar22
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfagholdier
 
This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.christianmathematics
 
Magic bus Group work1and 2 (Team 3).pptx
Magic bus Group work1and 2 (Team 3).pptxMagic bus Group work1and 2 (Team 3).pptx
Magic bus Group work1and 2 (Team 3).pptxdhanalakshmis0310
 
Making communications land - Are they received and understood as intended? we...
Making communications land - Are they received and understood as intended? we...Making communications land - Are they received and understood as intended? we...
Making communications land - Are they received and understood as intended? we...Association for Project Management
 

Recently uploaded (20)

Unit-IV; Professional Sales Representative (PSR).pptx
Unit-IV; Professional Sales Representative (PSR).pptxUnit-IV; Professional Sales Representative (PSR).pptx
Unit-IV; Professional Sales Representative (PSR).pptx
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdf
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The Basics
 
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
 
Python Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docxPython Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docx
 
ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.
 
Seal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptxSeal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptx
 
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxBasic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
 
Application orientated numerical on hev.ppt
Application orientated numerical on hev.pptApplication orientated numerical on hev.ppt
Application orientated numerical on hev.ppt
 
Unit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxUnit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptx
 
psychiatric nursing HISTORY COLLECTION .docx
psychiatric  nursing HISTORY  COLLECTION  .docxpsychiatric  nursing HISTORY  COLLECTION  .docx
psychiatric nursing HISTORY COLLECTION .docx
 
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
 
Micro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdfMicro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdf
 
The basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxThe basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptx
 
ICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptx
 
Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdf
 
This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.
 
Magic bus Group work1and 2 (Team 3).pptx
Magic bus Group work1and 2 (Team 3).pptxMagic bus Group work1and 2 (Team 3).pptx
Magic bus Group work1and 2 (Team 3).pptx
 
Making communications land - Are they received and understood as intended? we...
Making communications land - Are they received and understood as intended? we...Making communications land - Are they received and understood as intended? we...
Making communications land - Are they received and understood as intended? we...
 

Dependable Systems -Dependability Means (3/16)

  • 1. Dependable Systems ! Dependability Means Dr. Peter Tröger ! Sources: ! J.C. Laprie. Dependability: Basic Concepts and Terminology Eusgeld, Irene et al.: Dependability Metrics. 4909. Springer Publishing, 2008 Echtle, Klaus: Fehlertoleranzverfahren. Heidelberg, Germany : Springer Verlag, 1990. Pfister, Gregory F.: High Availability. In: In Search of Clusters. , S. 379-452 !
  • 2. Dependable Systems Course PT 2014 Dependability • Umbrella term for operational requirements on a system • IFIP WG 10.4: "[..] the trustworthiness of a computing system which allows reliance to be justifiably placed on the service it delivers [..]" • IEC IEV: "dependability (is) the collective term used to describe the availability performance and its influencing factors : reliability performance, maintainability performance and maintenance support performance" • Laprie: „ Trustworthiness of a computer system such that 
 reliance can be placed on the service it delivers to the user “ • Adds a third dimension to system quality • General question: How to deal with unexpected events ? • In German: ,Verlässlichkeit‘ vs. ,Zuverlässigkeit‘
 2
  • 3. Dependable Systems Course PT 2014 Dependability Tree (Laprie) 3
  • 4. Dependable Systems Course PT 2014 Means for Dependability • Fault prevention - Prevent fault occurrence or introduction • Fault tolerance - Provide service matching the specification under faults • Fault removal - How to reduce the presence of faults • Fault forecasting- Estimate the present number, future incidence, and the consequences of faults • Combined utilization 4
  • 5. Dependable Systems Course PT 2014 Dependability Means (Laprie) • Offline / online techniques • Fault intolerance techniques • Fault prevention - Prevent fault occurrence or introduction • Fault removal - Reduce the presence of faults • 100% fault-free servicing for the whole life time is not possible • Fault tolerance techniques • Fault forecasting - Estimate the present number, future incidence, and the consequences of faults • Fault tolerance - Provide service complying with specification in spite of faults • Problems with coverage and validation of the validator 5
  • 6. Dependable Systems Course PT 2014 Dependability Means (Echtle) 6
  • 7. Dependable Systems Course PT 2014 Fault Prevention • Specific approaches for avoiding faults • Specialized specification formalisms and techniques • Specialized development / manufacturing process to prevent design faults • Shielding • Only use ultra-reliable components • Choice of programming language • General engineering approaches • Software engineering procedures • Quality management regulations and enforcement • Training and organization of maintenance departments 7
  • 8. Dependable Systems Course PT 2014 Fault Removal • Make faults disappear before fault tolerance becomes relevant • Step 1: Verification • Check if the system adheres to verification conditions; if not, take next steps • Static verification: Static analysis, data flow analysis, compiler checks • Dynamic verification: Symbolic execution or verification testing • Step 2: Diagnosis • Find the faults that influenced the verification conditions • Step 3: Correction • Fix the problem, repeat the steps (regression) • Fault removal during operation: Corrective maintenance (curative / preventive) 8
  • 9. Dependable Systems Course PT 2014 Testing • Selecting test inputs is driven from different view points • Testing purpose: conformance testing, fault-finding testing • System model: functional testing (with functional model) or structural testing • Fault model: enables fault-based testing • Deterministic testing vs. random testing • Structural testing of hardware is fault-finding, fault-based, structural testing • Structural testing of software is fault-finding, non-fault-based, structural testing • Golden unit: Reference system for comparison of output for a given input
 (e.g. reference implementation) 9
  • 10. Dependable Systems Course PT 2014 Fault Tolerance • Fault tolerance is the ability of a system to operate correctly in presence of faults. or • A system S is called k-fault-tolerant with respect to a set of 
 algorithms {A1, A2, ... , Ap} and a set of faults {F1, F2, ... , Fp} 
 if for every k-fault F in S, Ai is executable by a subsystem of system S with k faults. (Hayes, 9/76) or • Fault tolerance is the use of redundancy (time or space) to achieve the desired level of system dependability - costs ! • Accepts that an implemented system will not be fault-free • Implements automatic recovery from errors • Is a recursive concept (voter replication, self-checking checkers, stable memory) 10
  • 11. Dependable Systems Course PT 2014 Fault Tolerance • Typical design methodology in many technical and biological systems • Spare wheel in cars, redundant organs, ... • Fault tolerance mechanisms need to be evaluated by dependability attributes • Minimum, maximum, average reliability and availability • Easy to formulate and understand, hard to prove - failure rate remains unknown • Quantitative limits based on fault model (which faults in which components) • Typically ,one-fault-at-a-time‘ assumption • Different attributes of fault tolerance implementation to be checked • Functional verification, sensitivity analysis, minimum amount of resource resp. computational overhead, implementation performance, transparency, portability 11
  • 12. Error Processing Dependable Systems Course PT 2014 Phases of Fault Tolerance (Hanmer) 12 Latent Fault Error Normal OperationFault Activation Error Recovery Error Mitigation Error Detection Fault Treatment
  • 13. Dependable Systems Course PT 2014 Decomposition of Fault Tolerance (Lee & Anderson) • Error detection • Presence of fault is deducted by detecting an error in some subsystem • Implies failure of the according component • Damage confinement • Delimit damage caused due to the component failure • Error processing - recovery / compensation • System recovers from the effect of an error • Fault treatment • Ensure that fault does not cause again failures 13
  • 14. Dependable Systems Course PT 2014 Fault Tolerance - Error Detection • Replication check • Output of replicated components is compared / voted • Independent failures, physical causes -> many replicas possible (e.g. HW) • Finds also design faults, if replicated components are from different vendors • Timing checks (‚watchdog timers‘) • Timing violation often implies that component output is also incorrect • Typical solution for node failure detection in a distributed system • Reasonableness checks - Run-time range checks, assertions • Structural and coding checks, diagnostics checks, algorithmic checks • Ideal: Self-checking component with clear error confinement areas 14
  • 15. Dependable Systems Course PT 2014 Fault Tolerance - Error Detection • Replication checks are powerful and expensive, examples: • Execute identical copies on different hardware (component failures) • Execute separate and different versions (assumes independent design faults) • Execute same copies different times (transient faults) • Replicate only portion of the system • Works for both hardware and software • Signaling aspect in the error detection task • Typical software model are exceptions, a way for implementing forward recovery • Combination fault detection and fault location 15 (C) IEEE
  • 16. Dependable Systems Course PT 2014 Fault Tolerance - Damage Confinement (Taylor) • System decomposition • Every communication link might enable damage spreading • Introduce mutual suspicion • Hardware-based separation of software components • OS-based separation (processes, runtime monitors, special shells) • Law-governed architecture • Externalize contrains on interaction by runtime rules • Strongly-typed language • Language guarantees the absence of unintended control flows 16
  • 17. Dependable Systems Course PT 2014 Fault Tolerance - Error Processing Through Recovery • Forward error recovery • Error is masked to reach again a consistent state (fault compensation) • Corrective actions need detailled knowledge (damage assessment) • New state is typically computed in another way • Examples with compensation: Error correcting codes, non-journaling file system check, advanced exception handlers, voters • Backward error recovery • Roll back to previous consistent state (recovery point / checkpoint) • Very suitable for transient faults • Computation can be re-done with same components (retry), 
 with alternate components (reconfigure), or can be ignored (skip frame) 17
  • 18. Dependable Systems Course PT 2014 Forward & Backward Recovery - Example • Transaction processing systems support ACID through journaling and backups • Journal: Audit trail of transactions and database changes • Forward recovery: Restore a database backup and apply the transaction journal • Journal provides the intended transaction operation (write-ahead logging) and the ,after-image‘ for completed transactions • Replay of all conducted and intended transactions from safe point in history • Backward recovery (,backout‘): Apply journal ,backwards‘ on current database • Demands ,before-image‘ before each database modification • Involves logic of reprocessing aborted transactions, which is time-consuming • Typically both applied - forward recovery for reaching a stable database status, and backward recovery of modifications from unfinished transactions 18
  • 19. Dependable Systems Course PT 2014 Fault Tolerance - Fault Treatment • Fault diagnosis - determine error cause‘s location and nature • Fault passivation - (remove faulty component &) reconfigure system • Error processing might already remove the fault - ,soft fault‘ • Typical example are temporary faults • Fault tolerance manager • Careful diagnosis with hardware support • Damage assessment by disabling faulty components automatically • Example: IBM mainframe architecture • Software rejuvenation • Gracefully terminating an application and immediately restarting it at a clean internal state 19
  • 20. Dependable Systems Course PT 2014 Fault Tolerant Mindset (Hanmer) • What can go wrong in any given situation ? • Mindset to be applied in all development stages • „Every problem in computer science boils down to tradeoffs“ [Henschen] • Fault prevention vs. fault tolerance vs. failure severity • KISS principles, leave out „bells and whistles“ • Incremental additions of reliability - long-term products • Defensive Programming • Simple error handling; fix root cause, not symptoms; make data auditble; 
 make code maintainable; 20
  • 21. Dependable Systems Course PT 2014 „Fail-Fast“ • A common concept from system engineering, company management, ... • „Report failure and stop immediately without further action“ • Discussed by Jim Gray in 1985 as part of his famous article
 „Why do computers stop and what can be done about it ?“ • Useful when benefit from recovery is not good enough for its costs, 
 or if error propagation is highly probable • Single units of a redundant set • Deeply interwired IT system components • Components under heavy request load • And ... crappy start-up companies 21
  • 22. Dependable Systems Course PT 2014 Fault Tolerant Design Methodology (Hanmer) • Assess things that can go wrong with the system (e.g. fault trees). • Find potential risks and according system failures. • Define strategies to mitigate the identified risks. • Failure avoidance options, prevent faults from activation • Create a mental model of the system design with redundancy. • Design error detection and error processing capabilities. • Design in the failure mitigation capabilities. • Design human-computer interactions and modes of management. 22
  • 23. Dependable Systems Course PT 2014 Dependable Design Strategies (Malek) • Decompose the system • Identify fault classes, fault latency and fault impact for the components • Identify “weak spots” and assess potential damage • Integrate partial recovery / reintegration / restart • Determine qualitative and quantitative specs for fault tolerance and evaluate your design in specific environment • Develop / utilize fault and error detection techniques and algorithms • Develop / utilize fault isolation techniques and algorithms • Refine fault tolerance, iterate for improvement • Re-use proven components, but be aware of integration issues 23