This document discusses techniques for engineering dependable software systems. It covers redundancy and diversity approaches to achieve fault tolerance. Dependable systems are achieved through fault avoidance, fault detection, and fault tolerance. Critical systems often use regulated processes and dependable architectures, such as protection systems, self-monitoring architectures, and N-version programming, which rely on redundant and diverse components to continue operating despite failures. The document gives examples of how these techniques are applied in systems like aircraft flight control to maximize availability.
This document summarizes Chapter 12 of a textbook on dependability and security specification. It discusses risk-driven specification, including identifying risks, analyzing risks, and defining requirements to reduce risks. It also covers specifying safety requirements by identifying hazards, assessing hazards, and analyzing hazards to discover root causes. The goal is to specify requirements that ensure systems function dependably and securely without failures causing harm.
The document discusses techniques for achieving dependable software systems. It covers redundancy and diversity approaches including N-version programming where multiple versions of software are developed independently. Dependable system architectures like protection systems and self-monitoring architectures that use redundancy are described. The document emphasizes that a well-defined development process is important for minimizing faults and notes validation activities should include requirements reviews, testing, and change management.
This document discusses safety engineering for systems that contain software. It covers topics like safety-critical systems, safety requirements, and safety engineering processes. Safety is defined as a system's ability to operate, normally or abnormally, without causing harm. For safety-critical systems like aircraft or medical devices, software is often used for control and monitoring, so software safety is important. Hazard identification, risk assessment, and specifying safety requirements to mitigate risks are key parts of the safety engineering process. The goal is to design systems in which failures cannot cause injury, death, or environmental damage.
The document discusses techniques for developing dependable software systems, including fault avoidance, fault tolerance, and fault detection. It describes dependable programming practices like structured programming and exception handling. It also covers fault tolerance mechanisms like redundancy, diversity, and architectures like N-version programming which implement multiple versions of the system and vote on the output.
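The voting scheme at the heart of N-version programming can be sketched in a few lines. This is an illustrative Python sketch, not code from any of the documents summarized here; the `version_*` functions are hypothetical stand-ins for independently developed implementations.

```python
from collections import Counter

def n_version_execute(versions, inputs):
    """Run every independently developed version on the same inputs
    and vote: a strict majority result wins; disagreement without a
    majority is treated as a detected failure."""
    outputs = [version(inputs) for version in versions]
    winner, count = Counter(outputs).most_common(1)[0]
    if count > len(versions) // 2:
        return winner
    raise RuntimeError("no majority among versions: failure detected")

# Three hypothetical versions of the same computation (squaring).
def version_a(x): return x * x
def version_b(x): return x ** 2
def version_c(x): return sum(x for _ in range(x))  # repeated addition

print(n_version_execute([version_a, version_b, version_c], 4))  # 16
```

Real N-version systems vote on richer outputs and must decide what "equal" means for floating-point results; the sketch sidesteps that by comparing exact values.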
This document summarizes key topics from a lecture on security engineering:
1. It discusses security engineering and management, risk assessment, and designing systems for security. Application security focuses on design while infrastructure security is a management problem.
2. It outlines guidelines for secure system design including basing decisions on security policies, avoiding single points of failure, balancing security and usability, validating all inputs, and designing for deployment and recoverability.
3. It also covers risk management, assessing threats, and designing architectures with layered protection and distributed assets to minimize the effects of attacks.
The document discusses critical systems and system dependability. It defines critical systems as systems where failure could result in significant economic losses, damage, or threats to human life. It describes four dimensions of dependability for critical systems: availability, reliability, safety, and security. It emphasizes that critical systems require trusted development methods to achieve high dependability.
The document discusses specifications for dependability and security. It covers topics like risk-driven specification, safety specification, and security specification. It emphasizes that critical systems specification should be risk-driven as risks pose a threat to the system. The risk-driven approach aims to understand risks faced by the system and define requirements to reduce these risks through phased risk analysis including preliminary, life cycle, and operational risk analysis. Safety specification identifies protection requirements to ensure system failures do not cause harm, with risk identification, analysis, and reduction mirroring hazard identification, assessment, and analysis. An example of a safety-critical insulin pump system is provided to illustrate dependability requirements and risk analysis.
The document discusses faults, errors, and failures in systems. A fault is a defect, an error is unexpected behavior, and a failure occurs when specifications are not met. Fault tolerance allows a system to continue operating despite errors. Fault tolerant systems are gracefully degradable and aim to ensure small failure probabilities. Faults can be hardware or software issues. Various failure types and objectives of fault tolerance like availability and reliability are also described.
This document discusses using a virtual machine monitor (VMM) to provide resilient execution of critical applications even when the operating system is corrupted. The VMM monitors application execution and memory to detect corruption and uses error recovery techniques like Luby Transform and Reed-Solomon codes to repair corruption and allow continued application execution. Experimental results show the VMM approach imposes reasonable memory and performance overhead. The VMM protects applications from memory corruption attacks and denial of service attacks by repairing corruption and preventing termination.
Here is an example operations list for a medical enteral pump system:
1. Power on pump
2. Navigate main menu
   1. Set patient details
   2. Set feeding program
      1. Select feeding mode (continuous, intermittent)
      2. Set feeding rate
      3. Set feeding duration
   3. Start/stop feeding
   4. View feeding history
   5. Adjust alarm settings
3. Acknowledge/silence alarms
4. Power off pump
This list was developed by walking through the menu structure and identifying the key operations a user could perform with the pump system. The numbering indicates sub-operations under main operations.
The document discusses critical systems and their dependability requirements. It defines critical systems as those where failure could result in loss of life, environmental damage, or large economic losses. Dependability encompasses availability, reliability, safety, and security. The document uses the example of an insulin pump, a safety-critical system, to illustrate dependability dimensions and how failures could threaten human life. Formal development methods may be required for critical systems due to the high costs of failure.
This document discusses fault-tolerant design techniques for real-time systems. It covers fault masking, reconfiguration, redundancy in hardware, software, and information. Specific techniques discussed include majority voting, error correcting memories, reconfiguration through fault detection, location, containment, and recovery. Hardware redundancy approaches like triple modular redundancy are examined, as are software redundancy techniques like N-version programming and recovery blocks.
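The recovery-block technique mentioned above pairs a primary algorithm with alternates and an acceptance test. A minimal Python sketch, using an invented integer-square-root example (none of this code comes from the document):

```python
def recovery_block(alternates, acceptance_test, inputs):
    """Try each alternate in order; accept the first result that
    passes the acceptance test, otherwise fall through to the next."""
    for alternate in alternates:
        try:
            result = alternate(inputs)
        except Exception:
            continue  # treat a crash like a failed acceptance test
        if acceptance_test(inputs, result):
            return result
    raise RuntimeError("all alternates failed the acceptance test")

# Hypothetical example: compute an integer square root.
def primary(n):
    return round(n ** 0.5)          # fast, but rounds up for some n

def alternate(n):
    r = 0                           # slow but dependable linear search
    while (r + 1) * (r + 1) <= n:
        r += 1
    return r

accept = lambda n, r: r * r <= n < (r + 1) * (r + 1)
print(recovery_block([primary, alternate], accept, 10))  # 3
```

The acceptance test is the crux of the design: unlike N-version voting it needs no redundant agreement, but it must be cheap and reliable enough to tell a correct result from a wrong one.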
The document discusses different types of software testing:
- Development testing includes unit, component, and system testing to discover bugs during development. Unit testing involves testing individual program units in isolation.
- Release testing is done by a separate team to test a complete version before public release.
- User testing involves potential users testing the system in their own environment.
The goals of testing are to demonstrate that software meets requirements and to discover incorrect or undesirable behavior to find defects. Different testing types include validation testing to check correct functionality and defect testing to uncover bugs. Both inspections and testing are important and complementary methods in software verification.
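As a concrete illustration of the validation/defect distinction described above, here is a minimal sketch using Python's standard `unittest` module on a hypothetical `clamp` function (not an example drawn from the document):

```python
import unittest

def clamp(value, lo, hi):
    """Restrict value to the inclusive range [lo, hi]."""
    return max(lo, min(value, hi))

class ClampTests(unittest.TestCase):
    # Validation testing: demonstrate the specified behaviour.
    def test_value_inside_range_is_unchanged(self):
        self.assertEqual(clamp(5, 0, 10), 5)

    # Defect testing: probe boundaries where bugs tend to hide.
    def test_value_below_range_is_raised_to_lower_bound(self):
        self.assertEqual(clamp(-3, 0, 10), 0)

    def test_value_above_range_is_lowered_to_upper_bound(self):
        self.assertEqual(clamp(99, 0, 10), 10)

if __name__ == "__main__":
    unittest.main(exit=False)
```

Unit tests like these belong to development testing; release and user testing exercise the assembled system rather than isolated units.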
Fault-tolerant computer systems are designed to continue operating properly even when some components fail. They achieve this through techniques like redundancy, where backup components take over if primary components fail. The document discusses the goals of fault tolerance like ensuring no single point of failure. It provides examples of how fault tolerance is implemented in areas like data storage and outlines techniques used to design and evaluate fault-tolerant systems.
This document presents several fault-tolerant scheduling schemes and dynamic voltage scaling techniques for real-time embedded systems. It discusses:
1) Methods for fault tolerance including checkpointing, rollback recovery, and determining the optimal number of faults to tolerate.
2) Algorithms for offline application-level and task-level voltage scaling to minimize energy consumption while maintaining schedulability.
3) A technique for online reevaluation of voltage scaling policies using runtime slacks to further reduce energy.
4) Evaluation of the approaches using simulations on different processor architectures showing significant energy savings.
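The checkpointing and rollback-recovery scheme in point 1 can be sketched as follows. This is an illustrative Python sketch with a simulated transient fault; none of the names come from the paper itself.

```python
import copy

def run_with_checkpoints(steps, state, checkpoint_every=2):
    """Execute steps in order, checkpointing the state periodically.
    A RuntimeError models a detected transient fault: roll back to
    the last checkpoint and re-execute from there."""
    checkpoint = copy.deepcopy(state)
    i = 0
    while i < len(steps):
        try:
            state = steps[i](state)
            i += 1
        except RuntimeError:
            state = copy.deepcopy(checkpoint)              # rollback recovery
            i = (i // checkpoint_every) * checkpoint_every  # redo from checkpoint
            continue
        if i % checkpoint_every == 0:
            checkpoint = copy.deepcopy(state)              # take a new checkpoint
    return state

# A step that suffers one transient fault, then succeeds on retry.
class Flaky:
    def __init__(self):
        self.failed = False
    def __call__(self, state):
        if not self.failed:
            self.failed = True
            raise RuntimeError("transient fault")
        return state + 3

steps = [lambda s: s + 1, lambda s: s * 2, Flaky(), lambda s: s + 10]
print(run_with_checkpoints(steps, 0))  # ((0 + 1) * 2) + 3 + 10 = 15
```

Choosing `checkpoint_every` trades checkpointing overhead against re-execution cost after a fault, which is the kind of trade-off such scheduling schemes must balance.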
This document provides an overview of key topics from Chapter 11 on security and dependability, including:
- The principal dependability properties of availability, reliability, safety, and security.
- Dependability covers attributes like maintainability, repairability, survivability, and error tolerance.
- Dependability is important because system failures can have widespread effects and undependable systems may be rejected.
- Dependability is achieved through techniques like fault avoidance, detection and removal, and building in fault tolerance.
This document summarizes key concepts from Chapter 15 on resilience engineering. It discusses resilience as the ability of systems to maintain critical services during disruptions like failures or cyberattacks. Resilience involves recognizing issues, resisting failures when possible, and recovering quickly through activities like redundancy. The document also covers sociotechnical resilience, where human and organizational factors are considered, and characteristics of resilient organizations like responsiveness, monitoring, anticipation, and learning.
The document discusses evaluating the impact of soft errors on embedded systems through fault injection. It provides an overview of soft errors as a prominent problem in embedded systems and integrated circuits due to technology scaling. The report describes assembling a generic motor control system and experimenting with fault injection to better understand how these soft errors are covered. It aims to inject faults at the module level of an embedded system to simulate the effects of soft errors and observe the device behavior to map vulnerable nodes that need protection.
The document discusses dependability in systems. It covers topics like dependability properties, sociotechnical systems, redundancy and diversity, and dependable processes. Dependability reflects how trustworthy a system is and includes attributes like reliability, availability, and security. Dependability is important because system failures can have widespread impacts. Both hardware and software failures and human errors can cause systems to fail. Techniques like redundancy, diversity, and formal methods can help improve dependability. Regulation is also discussed as many critical systems require approval from regulators.
This document provides an overview of topics in chapter 13 on security engineering. It discusses security and dependability, security dimensions of confidentiality, integrity and availability. It also outlines different security levels including infrastructure, application and operational security. Key aspects of security engineering are discussed such as secure system design, security testing and assurance. Security terminology and examples are provided. The relationship between security and dependability factors like reliability, availability, safety and resilience is examined. The document also covers security in organizations and the role of security policies.
The document discusses software evolution and maintenance. It covers evolutionary software development, the staged model of software lifespan, the phased model of software change, research and teaching approaches, and software maintenance. Key topics include iterative development, concept location, impact analysis, reasoning about evolution, and the end of software evolution. The purpose is to provide an overview of these topics and discuss current research and future directions.
Software evolution involves making ongoing changes to software systems to address new requirements, fix errors, and improve performance. There are several approaches to managing software evolution, including maintenance, reengineering, refactoring, and legacy system management. Key considerations for legacy systems include assessing their business value and quality to determine whether they should be replaced, transformed, or maintained.
This document discusses principles of software safety for clinical information systems and electronic medical records (EMRs). It provides background on software safety incidents in other industries. Key concepts discussed include adjusting the software development methodology based on risk level, and that no software is completely safe. The document advocates analyzing EMR software to understand how defects could contribute to patient safety risk scenarios from minor to catastrophic. It suggests increased rigor for software that controls computerized protocols, clinical data posting and updating, and overall EMR performance and availability.
The document provides an introduction to requirements engineering and system requirements. It discusses the importance of requirements engineering in the broader systems engineering process. Requirements engineering involves developing requirements documents that define what a system must do and its constraints. Key challenges include ensuring requirements accurately reflect customer needs and avoiding inconsistencies or misunderstandings.
This document discusses techniques for engineering dependable software systems. It covers topics like redundancy and diversity, dependable processes, and dependable system architectures. Redundancy and diversity are fundamental approaches to achieving fault tolerance. Dependable processes use well-defined and repeatable development practices to minimize faults. Dependable system architectures, like protection systems and self-monitoring architectures, are designed to tolerate faults through techniques such as voting across redundant or diverse versions of the system.
This chapter discusses dependable systems and covers several topics related to ensuring system dependability. It defines key dependability properties like availability, reliability, safety and security. It discusses causes of system failures and the importance of dependability. It also covers approaches to improving dependability like redundancy, diversity, and formal methods. Dependable processes with activities like requirements reviews, testing and inspections are also discussed.
Software fault tolerance technologies can increase application availability and data consistency. The paper discusses Watchdog, libft, and nDFS, reusable software components that provide up to the third level of software fault tolerance. Watchdog monitors processes for crashes, libft checkpoints critical data, and nDFS replicates persistent data. These components have been used to enhance fault tolerance in AT&T telecom products, with performance overhead ranging from 0.1% to 14% depending on the amount of data checkpointed and replicated. The components provide an efficient and economical means to increase software fault tolerance.
What features should your Automation Change Management software have? (MDT Software)
You already know your automation programs need to be backed up and the different versions of the program need to be stored and managed, so you always have a copy for quick disaster recovery. To do this, you need automation change management software (CMS). This SlideShare explains the main features a CMS needs to be effective.
The document discusses reliability engineering and fault tolerance. It covers topics like availability, reliability requirements, fault-tolerant architectures, and reliability measurement. It defines key terms like faults, errors, and failures. It also describes techniques for achieving reliability like fault avoidance, fault detection, and fault tolerance. Specific architectures discussed include redundant systems and protection systems that can take emergency action if failures occur.
This document provides an overview of reliability engineering topics including software reliability, fault tolerance, and reliability requirements. It discusses key concepts such as availability, reliability, faults, errors and failures. It also describes different fault-tolerant system architectures and reliability metrics including probability of failure on demand, rate of occurrence of failures, and availability. Functional reliability requirements and examples are also presented relating to checking requirements, recovery requirements, redundancy requirements and development process requirements.
An Investigation of Fault Tolerance Techniques in Cloud Computingijtsrd
Cloud computing which is created on Internet has the most powerful architecture of computation that provides users with the capabilities of information technology as a service and allows them to have access to these services without having specialized information or controlling the infrastructure. Fault tolerance has. The main advantages of using fault tolerance that has all the necessary techniques to keep active power and reliability in cloud computing include failure recovery, lower costs, and improved performance criteria. In this paper, we will investigation of the different techniques that are used for fault tolerance on cloud computing. Ya Min | Khin Myat Nwe Win | Aye Mya Sandar "An Investigation of Fault Tolerance Techniques in Cloud Computing" Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-3 | Issue-5 , August 2019, URL: https://www.ijtsrd.com/papers/ijtsrd26611.pdfPaper URL: https://www.ijtsrd.com/computer-science/distributed-computing/26611/an-investigation-of-fault-tolerance-techniques-in-cloud-computing/ya-min
The document discusses various techniques for designing fault-tolerant systems, including having a fault-tolerant mindset, performing design tradeoffs that balance reliability and availability, keeping designs simple, and incrementally adding reliability over time. It also covers defensive programming techniques, data structure design, coding standards, redundancy approaches, static analysis tools, and fault insertion testing. The document proposes a six-step fault-tolerant design methodology involving assessing failures, defining risk mitigation strategies, creating system models, making architectural decisions, designing error handling capabilities, and considering human interfaces.
System reliability is an important issue in designing modern multiprocessor systems. This paper proposes a fault-tolerant, scalable, multiprocessor system architecture that adopts a pipeline scheme. To verify the performance of the proposed system, the SimEvent/Stateflow tool of the MATLAB program was used to simulate the system. The proposed system uses twelve processors (P), connected in a linear array, to build a ten-stage system with two backup processors (BP). However, the system can be expanded by adding more processors to increase pipeline stages and performance, and more backup processors to increase system reliability. The system can automatically reorganize itself in the event of a failure of one or two processors and execution continues without interruption. Each processor communicates with its neighboring processors through input/output (I/O) ports which are used as bypass links between the processors. In the event of a processor failure, the function of the faulty processor is assigned to the next processor that is free from faults. The fast Fourier transform (FFT) algorithm is implemented on the simulated circuit to evaluate the performance of the proposed system. The results showed that the system can continue to execute even if one or two processors fail without a noticeable decrease in performance.
The document discusses software fault tolerance techniques. It begins by explaining why fault tolerant software is needed, particularly for safety critical systems. It then covers single version techniques like checkpointing and process pairs. Next it discusses multi-version techniques using multiple variants of software. It also covers software fault injection testing. Finally it provides examples of fault tolerant systems used in aircraft like the Airbus and Boeing.
This document discusses aspects of secure systems programming and dependable programming guidelines. It outlines eight guidelines for writing dependable code, including limiting information visibility, checking all inputs, handling exceptions, minimizing error-prone constructs, providing restart capabilities, checking array bounds, including timeouts for external calls, and naming constants. Following these guidelines can help improve program reliability and security by reducing vulnerabilities.
Fault tolerance is an important issue in the field of cloud computing which is concerned with the techniques or mechanism needed to enable a system to tolerate the faults that may encounter during its functioning. Fault tolerance policy can be categorized into three categories viz. proactive, reactive and adaptive. Providing a systematic solution the loss can be minimized and guarantee the availability and reliability of the critical services. The purpose and scope of this study is to recommend Support Vector Machine, a supervised machine learning algorithm to proactively monitor the fault so as to increase the availability and reliability by combining the strength of machine learning algorithm with cloud computing.
Cloud computing has gained popularity over the years, some organizations are using some form of cloud
computing to enhance their business operations while reducing infrastructure costs and gaining more
agility by deploying applications and making changes to applications easily. Cloud computing systems just
like any other computer system are prone to failure, these failures are due to the distributed and complex
nature of the cloud computing platforms.
Cloud computing systems need to be built for failure to ensure that they continue operating even if the
cloud system has an error. The errors should be masked from the cloud users to ensure that users continue
accessing the cloud services and this intern leads to cloud consumers gaining confidence in the availability
and reliability of cloud services.
In this paper, we propose the use of N-Modular redundancy to design and implement failure-free clouds
Cloud computing has gained popularity over the years, some organizations are using some form of cloud
computing to enhance their business operations while reducing infrastructure costs and gaining more
agility by deploying applications and making changes to applications easily. Cloud computing systems just
like any other computer system are prone to failure, these failures are due to the distributed and complex
nature of the cloud computing platforms.
The document provides an overview of software engineering, discussing what it is, why it is important, and key concepts like the software development lifecycle, processes, and models. It introduces software engineering as a way to build software in a controlled, predictable manner by giving control over functionality, quality, and resources. It also summarizes several software development process models like waterfall, evolutionary development, and spiral.
The document discusses software quality factors and McCall's quality factors model. McCall's model classifies software quality requirements into 11 factors across 3 categories: product operation, product revision, and product transition. Product operation factors include correctness, reliability, efficiency, integrity, and usability. Product revision factors are maintainability, flexibility, and testability. Product transition factors are portability, reusability, and interoperability. The document provides examples to illustrate each quality factor. Alternative models from other authors that expand on McCall's 11 factors are also mentioned.
Distributed Software Engineering with Client-Server ComputingHaseeb Rehman
The document discusses distributed software engineering, which involves developing software systems across multiple independent computers. The key benefits are faster development, fewer faults, and less burden on individual machines. Distributed systems allow resources and software to be shared while developing components concurrently. Some challenges are ensuring transparency for users, authentication, fault tolerance, network reliability, synchronization, security, and quality of service. Client-server models separate programs into client and server components that communicate through defined interfaces.
IRJET- An Efficient Hardware-Oriented Runtime Approach for Stack-Based Softwa...IRJET Journal
This document discusses an efficient hardware-oriented runtime approach for detecting stack-based buffer overflow attacks during program execution. The approach automatically archives and compares the original and modified information of static variables in the program to detect any changes from the compiler-generated object code. This is done transparently to programmers without requiring any source code modifications. By leveraging the hardware of the CPU pipeline, the approach can identify buffer overflows during runtime to prevent security vulnerabilities from being exploited. The approach aims to provide protections against runtime attacks while having low performance and memory overhead.
This document discusses software evolution and maintenance. It covers topics like change processes, program evolution dynamics, software maintenance, and legacy system management. It notes that software change is inevitable due to new requirements, business changes, errors that need fixing, and other factors. Most software budgets are spent on evolving existing systems rather than new development. Lehman's laws describe insights into software evolution, such as the notion that change is continuous. Software maintenance involves modifying operational software to fix bugs, adapt to new environments, or add new functionality. Maintenance costs are typically higher than development costs and increase over time as software degrades.
Dependable Software Development in Software Engineering SE18koolkampus
The document discusses techniques for developing dependable software systems, including fault avoidance, fault tolerance, and fault removal. It covers structured programming, error-prone constructs, information hiding, exceptions, fault detection, damage assessment, recovery approaches, and fault-tolerant architectures using hardware and software redundancy. Key aspects are avoiding errors during development, detecting and handling faults at runtime, and designing systems that can continue operating despite failures.
This document provides an overview of the process improvement process, including process measurement, analysis, change, and the CMMI framework. It discusses measuring current processes to establish a baseline and analyzing processes to identify bottlenecks. Process changes are then introduced based on the analysis. The goal is to understand existing processes, relate them to models and standards, and identify constraints in order to improve quality, reduce costs and time. Process and product quality are closely linked, so improving processes enhances products.
The document summarizes key aspects of configuration management discussed in Chapter 25, including change management, version management, system building, and release management. Version management involves tracking different versions of software components to prevent interference between changes made by different developers. System building is the process of compiling and linking components to create an executable system. Release management prepares software for external distribution and tracks released system versions.
This document provides an overview of quality management in software engineering. It discusses software quality, standards, reviews, and measurements. Specifically, it covers three key areas:
1) Software quality management is concerned with ensuring software meets required quality levels through organizational processes and standards, applying quality processes at the project level, and establishing quality plans.
2) Quality management activities include independent checks on the development process and deliverables to ensure consistency with standards and goals.
3) Reviews and inspections allow groups to examine software and documentation to identify potential problems and approve progress between development stages.
This document provides an overview of project planning and estimation techniques used in software development. It discusses plan-driven and agile planning approaches. For plan-driven projects, it describes creating a detailed project plan with work breakdown, scheduling, milestones and resource allocation. It also discusses estimation techniques like experience-based estimating and algorithmic models like COCOMO. COCOMO models like application composition, early design, reuse and post-architecture are explained for estimating effort at different stages. Factors affecting accuracy and uncertainty in estimates are also covered.
This document discusses project management and risk management. It covers topics such as managing people, teamwork, risk identification, analysis, planning, monitoring, and strategies to manage common project risks like staff turnover, requirements changes, and underestimating timelines. The key aspects of software project management are planning, reporting, risk assessment, and people management to deliver software on schedule, within budget, and that meets customer expectations.
The document discusses aspect-oriented software development and key concepts in aspect-oriented programming such as aspects, join points, pointcuts, and advice. It covers how aspect-oriented programming supports separation of concerns by encapsulating cross-cutting concerns in aspects. Aspects define where advice code should be inserted via pointcuts specified at join points in the base code. This approach addresses issues like tangling and scattering that arise when implementing cross-cutting concerns in traditional object-oriented design.
This document provides an overview of embedded systems and real-time systems. It discusses embedded software characteristics including responsiveness in real-time. Common architectural patterns for embedded systems like observe and react, environmental control, and process pipeline are described. The document also covers timing analysis, real-time operating systems components, and non-stop system components to ensure continuous operation.
The document discusses service-oriented architecture (SOA) and key concepts in SOA, including:
1) SOA uses independent, reusable services as components for developing distributed systems, with services executing on different computers and communicating through standard protocols.
2) The benefits of SOA include flexibility in where services are provided and enabling inter-organizational computing through simplified information exchange.
3) Key SOA standards include SOAP for message exchange, WSDL for defining service interfaces and bindings, and WS-BPEL for defining service composition workflows.
This document provides an overview of distributed software engineering. It discusses key topics like distributed systems issues, client-server computing, and architectural patterns. Design issues for distributed systems like transparency, openness, scalability, security, and failure management are also covered. The document explains models of interaction, including remote procedure calls and message passing, and how middleware supports interaction and common services in distributed systems.
This document provides an overview of component-based software engineering (CBSE). It discusses key CBSE concepts like components, component models, CBSE processes, and component composition. Components are independent, reusable software entities with well-defined interfaces. A component model defines standards for component implementation and interoperability. CBSE processes include developing components for reuse, and developing software using existing reusable components. Middleware provides platform services to allow component communication.
The document discusses software reuse and chapter 16 of an unknown textbook. It covers several topics related to software reuse including application frameworks, software product lines, and COTS product reuse. Application frameworks provide a collection of abstract and concrete classes that can be extended and customized to create specific applications. Frameworks aim to increase reuse by providing a standardized architecture and interfaces that applications can build upon rather than developing everything from scratch. They must be extended by adding new classes and methods to create a complete application or subsystem.
This document discusses techniques for validating critical systems, including static analysis, reliability testing, and security testing. Static analysis techniques like formal verification, model checking, and automated program analysis examine source code or models without executing programs to check for errors. Reliability testing involves exercising software with test data matching actual usage to measure reliability levels. Validation costs are higher for critical systems due to additional analysis needed to ensure safety.
This document provides an overview of topics covered in Chapter 14 on Security Engineering. It discusses security engineering and how it is concerned with applying security to applications, as well as security risk assessment and designing systems based on risk assessments. The document outlines the importance of security management, as well as risk management approaches like preliminary risk assessment, life cycle risk assessment, and operational risk assessment. It also discusses designing systems for security through approaches like incorporating security into architectural design, following best practices, and minimizing vulnerabilities introduced during deployment. Finally, the document discusses system survivability and delivering essential services even when under attack.
This document discusses approaches for specifying dependability and security requirements, including risk-driven, safety, and reliability specifications. It covers topics such as identifying risks and hazards, assessing their likelihood and impacts, analyzing root causes using techniques like fault trees, and defining requirements to reduce risks and prevent accidents. Safety requirements for an insulin pump example are provided. The key points are that risk analysis identifies risks that could lead to accidents, hazards are decomposed to discover their causes, and safety requirements ensure hazards do not occur or are limited if they do.
This document discusses the key aspects of system dependability, including availability, reliability, safety, and security. It notes that dependability reflects the degree to which users trust a system and defines it as covering attributes like availability, reliability, and security. It also discusses factors that influence perceptions of reliability and availability, such as usage patterns, outage length and number of users affected.
This document provides an overview of sociotechnical systems. It discusses how software systems are part of broader systems that include human, social, and organizational aspects. It describes the layers in a sociotechnical systems stack from equipment to society. Emergent properties, non-determinism, and differing views of success from stakeholders are characteristics of these complex systems. Systems engineering is the process of procuring, developing, and maintaining sociotechnical systems over their lifecycle.
The document provides an overview of software testing concepts including:
- The goals of testing are to validate that software meets requirements and to discover defects.
- There are different types of testing such as unit testing, component testing, and system testing that are done during development.
- Release and user testing are also discussed as later stages of the testing process.
- Key concepts covered include test-driven development, validation vs. defect testing, and strategies for effective unit testing including automated testing and partition testing.
This document discusses the design and implementation chapter of a lecture. It covers topics like using UML for object-oriented design, design patterns, and implementation issues. It then discusses the weather station case study used to illustrate the design process, including defining system context, use cases, architectural design, identifying object classes, design models, and interface specification.
The document discusses architectural design and key concepts:
- Architectural design determines the subsystems of a system and how they communicate. It represents the link between specification and design.
- Views and patterns are used to document architectures. Common views include logical, process, development, and physical views. Patterns like MVC and layered architectures organize systems.
- Architectural design involves decisions like the application architecture, distribution, styles, decomposition, and documentation. Views and non-functional requirements influence these decisions.
The document discusses different types of system models, including context models, interaction models, structural models, and behavioral models. It provides examples of each type of model using a case study of a mental health care patient management system (MHC-PMS). Context models show the environment and other related systems. Interaction models include use case diagrams and sequence diagrams to illustrate interactions between users and the system. Structural models, like class diagrams, depict the organization and architecture of a system through classes and their relationships.
Advanced control scheme of doubly fed induction generator for wind turbine us...IJECEIAES
This paper describes a speed control device for generating electrical energy on an electricity network based on the doubly fed induction generator (DFIG) used for wind power conversion systems. At first, a double-fed induction generator model was constructed. A control law is formulated to govern the flow of energy between the stator of a DFIG and the energy network using three types of controllers: proportional integral (PI), sliding mode controller (SMC) and second order sliding mode controller (SOSMC). Their different results in terms of power reference tracking, reaction to unexpected speed fluctuations, sensitivity to perturbations, and resilience against machine parameter alterations are compared. MATLAB/Simulink was used to conduct the simulations for the preceding study. Multiple simulations have shown very satisfying results, and the investigations demonstrate the efficacy and power-enhancing capabilities of the suggested control system.
Low power architecture of logic gates using adiabatic techniquesnooriasukmaningtyas
The growing significance of portable systems to limit power consumption in ultra-large-scale-integration chips of very high density, has recently led to rapid and inventive progresses in low-power design. The most effective technique is adiabatic logic circuit design in energy-efficient hardware. This paper presents two adiabatic approaches for the design of low power circuits, modified positive feedback adiabatic logic (modified PFAL) and the other is direct current diode based positive feedback adiabatic logic (DC-DB PFAL). Logic gates are the preliminary components in any digital circuit design. By improving the performance of basic gates, one can improvise the whole system performance. In this paper proposed circuit design of the low power architecture of OR/NOR, AND/NAND, and XOR/XNOR gates are presented using the said approaches and their results are analyzed for powerdissipation, delay, power-delay-product and rise time and compared with the other adiabatic techniques along with the conventional complementary metal oxide semiconductor (CMOS) designs reported in the literature. It has been found that the designs with DC-DB PFAL technique outperform with the percentage improvement of 65% for NOR gate and 7% for NAND gate and 34% for XNOR gate over the modified PFAL techniques at 10 MHz respectively.
DEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODELgerogepatton
As digital technology becomes more deeply embedded in power systems, protecting the communication
networks of Smart Grids (SG) has emerged as a critical concern. Distributed Network Protocol 3 (DNP3)
represents a multi-tiered application layer protocol extensively utilized in Supervisory Control and Data
Acquisition (SCADA)-based smart grids to facilitate real-time data gathering and control functionalities.
Robust Intrusion Detection Systems (IDS) are necessary for early threat detection and mitigation because
of the interconnection of these networks, which makes them vulnerable to a variety of cyberattacks. To
solve this issue, this paper develops a hybrid Deep Learning (DL) model specifically designed for intrusion
detection in smart grids. The proposed approach is a combination of the Convolutional Neural Network
(CNN) and the Long-Short-Term Memory algorithms (LSTM). We employed a recent intrusion detection
dataset (DNP3), which focuses on unauthorized commands and Denial of Service (DoS) cyberattacks, to
train and test our model. The results of our experiments show that our CNN-LSTM method is much better
at finding smart grid intrusions than other deep learning algorithms used for classification. In addition,
our proposed approach improves accuracy, precision, recall, and F1 score, achieving a high detection
accuracy rate of 99.50%.
ACEP Magazine edition 4th launched on 05.06.2024Rahul
This document provides information about the third edition of the magazine "Sthapatya" published by the Association of Civil Engineers (Practicing) Aurangabad. It includes messages from current and past presidents of ACEP, memories and photos from past ACEP events, information on life time achievement awards given by ACEP, and a technical article on concrete maintenance, repairs and strengthening. The document highlights activities of ACEP and provides a technical educational article for members.
Introduction- e - waste – definition - sources of e-waste– hazardous substances in e-waste - effects of e-waste on environment and human health- need for e-waste management– e-waste handling rules - waste minimization techniques for managing e-waste – recycling of e-waste - disposal treatment methods of e- waste – mechanism of extraction of precious metal from leaching solution-global Scenario of E-waste – E-waste in India- case studies.
A review on techniques and modelling methodologies used for checking electrom...nooriasukmaningtyas
The proper function of the integrated circuit (IC) in an inhibiting electromagnetic environment has always been a serious concern throughout the decades of revolution in the world of electronics, from disjunct devices to today’s integrated circuit technology, where billions of transistors are combined on a single chip. The automotive industry and smart vehicles in particular, are confronting design issues such as being prone to electromagnetic interference (EMI). Electronic control devices calculate incorrect outputs because of EMI and sensors give misleading values which can prove fatal in case of automotives. In this paper, the authors have non exhaustively tried to review research work concerned with the investigation of EMI in ICs and prediction of this EMI using various modelling methodologies and measurement setups.
Presentation of IEEE Slovenia CIS (Computational Intelligence Society) Chapte...University of Maribor
Slides from talk presenting:
Aleš Zamuda: Presentation of IEEE Slovenia CIS (Computational Intelligence Society) Chapter and Networking.
Presentation at IcETRAN 2024 session:
"Inter-Society Networking Panel GRSS/MTT-S/CIS
Panel Session: Promoting Connection and Cooperation"
IEEE Slovenia GRSS
IEEE Serbia and Montenegro MTT-S
IEEE Slovenia CIS
11TH INTERNATIONAL CONFERENCE ON ELECTRICAL, ELECTRONIC AND COMPUTING ENGINEERING
3-6 June 2024, Niš, Serbia
2. Topics covered
Redundancy and diversity
Fundamental approaches to achieving fault tolerance.
Dependable processes
How the use of dependable processes leads to dependable systems.
Dependable system architectures
Architectural patterns for software fault tolerance.
Dependable programming
Guidelines for programming to avoid errors.
Chapter 13 Dependability Engineering 2
3. Software dependability
In general, software customers expect all software to be dependable. However, for non-critical applications, they may be willing to accept some system failures.
Some applications (critical systems) have very high dependability requirements, and special software engineering techniques may be used to achieve this:
Medical systems
Telecommunications and power systems
Aerospace systems
4. Dependability achievement
Fault avoidance
The system is developed in such a way that human error is avoided and thus system faults are minimised. The development process is organised so that faults in the system are detected and repaired before delivery to the customer.
Fault detection
Verification and validation techniques are used to discover and remove faults in a system before it is deployed.
Fault tolerance
The system is designed so that faults in the delivered software do not result in system failure.
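The third approach can be made concrete with a small sketch. This is an illustrative example, not from the chapter: a fault (malformed or impossible input) is detected at runtime and contained with a safe default, so it does not become a system failure. The function name and sensor range are assumptions.

```python
def read_sensor_value(raw: str, default: float = 0.0) -> float:
    """Convert a raw sensor reading, tolerating faulty input."""
    try:
        value = float(raw)
    except ValueError:
        # Fault detected: contain it rather than let the system fail.
        return default
    # Damage containment: reject physically impossible readings
    # (range is a hypothetical sensor specification).
    if not (-50.0 <= value <= 150.0):
        return default
    return value

print(read_sensor_value("21.5"))     # normal case
print(read_sensor_value("garbage"))  # fault tolerated, default used
```

The point is that the fault (bad input, or whatever caused it) still exists; fault tolerance means its effects are contained so the delivered system keeps operating.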
5. The increasing costs of residual fault removal
6. Regulated systems
Many critical systems are regulated systems, which means that their use must be approved by an external regulator before the systems go into service:
Nuclear systems
Air traffic control systems
Medical devices
A safety and dependability case has to be approved by the regulator. Therefore, critical systems development has to create the evidence to convince a regulator that the system is dependable, safe and secure.
7. Diversity and redundancy
Redundancy
Keep more than one version of a critical component available, so that if one fails a backup is available.
Diversity
Provide the same functionality in different ways, so that the versions will not fail in the same way.
However, adding diversity and redundancy adds complexity, and this can increase the chances of error. Some engineers therefore argue that simplicity plus extensive V & V is a more effective route to software dependability.
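A minimal sketch of diversity, under the assumption that two versions of the same function are implemented differently (ideally by different teams) and cross-checked, so that a fault in one version cannot silently produce a wrong result. The function names are illustrative only:

```python
def mean_iterative(values):
    """Version 1: explicit accumulation loop."""
    total = 0.0
    for v in values:
        total += v
    return total / len(values)

def mean_builtin(values):
    """Version 2: a diverse implementation using built-ins."""
    return sum(values) / len(values)

def diverse_mean(values, tolerance=1e-9):
    """Run both versions and cross-check their results."""
    a = mean_iterative(values)
    b = mean_builtin(values)
    if abs(a - b) > tolerance:
        # Disagreement signals a fault in one of the versions.
        raise RuntimeError("diverse implementations disagree")
    return a
```

Note how this illustrates the complexity cost mentioned above: two implementations plus a comparator must now be maintained instead of one function.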
8. Diversity and redundancy examples
Redundancy. Where availability is critical (e.g. in
e-commerce systems), companies normally keep backup
servers and switch to these automatically if failure
occurs.
Diversity. To provide resilience against external attacks,
different servers may be implemented using different
operating systems (e.g. Windows and Linux).
9. Process diversity and redundancy
Process activities, such as validation, should not depend
on a single approach, such as testing, to validate the
system.
Rather, multiple different process activities that
complement each other and allow for cross-checking
help to avoid process errors, which may lead to errors in
the software.
10. Dependable processes
To ensure a minimal number of software faults, it is
important to have a well-defined, repeatable software
process.
A well-defined repeatable process is one that does not
depend entirely on individual skills; rather, it can be
enacted by different people.
Regulators use information about the process to check if
good software engineering practice has been used.
For fault detection, it is clear that the process activities
should include significant effort devoted to verification
and validation.
11. Attributes of dependable processes
Process characteristic Description
Documentable The process should have a defined process
model that sets out the activities in the process
and the documentation that is to be produced
during these activities.
Standardized A comprehensive set of software development
standards covering software production and
documentation should be available.
Auditable The process should be understandable by people
apart from process participants, who can check
that process standards are being followed and
make suggestions for process improvement.
Diverse The process should include redundant and
diverse verification and validation activities.
Robust The process should be able to recover from
failures of individual process activities.
12. Validation activities
Requirements reviews.
Requirements management.
Formal specification.
System modeling.
Design and code inspection.
Static analysis.
Test planning and management.
Change management, discussed in Chapter 25, is also
essential.
13. Fault tolerance
In critical situations, software systems must be
fault tolerant.
Fault tolerance is required where there are high
availability requirements or where system failure costs
are very high.
Fault tolerance means that the system can continue in
operation in spite of software failure.
Even if the system has been proved to conform to its
specification, it must also be fault tolerant as there may
be specification errors or the validation may be incorrect.
14. Dependable system architectures
Dependable systems architectures are used in situations
where fault tolerance is essential. These architectures
are generally all based on redundancy and diversity.
Examples of situations where dependable architectures
are used:
Flight control systems, where system failure could threaten the
safety of passengers
Reactor systems where failure of a control system could lead to
a chemical or nuclear emergency
Telecommunication systems, where there is a need for 24/7
availability.
15. Protection systems
A specialized system that is associated with some other
control system, which can take emergency action if a
failure occurs.
System to stop a train if it passes a red light
System to shut down a reactor if temperature/pressure are too
high
Protection systems independently monitor the controlled
system and the environment.
If a problem is detected, the protection system issues
commands to take emergency action to shut down the
controlled system and avoid a catastrophe.
17. Protection system functionality
Protection systems are redundant because they include
monitoring and control capabilities that replicate those in
the control software.
Protection systems should be diverse and use different
technology from the control software.
They are simpler than the control system so more effort
can be expended in validation and dependability
assurance.
Aim is to ensure that there is a low probability of failure
on demand for the protection system.
18. Self-monitoring architectures
Multi-channel architectures where the system monitors
its own operations and takes action if inconsistencies are
detected.
The same computation is carried out on each channel
and the results are compared. If the results are identical
and are produced at the same time, then it is assumed
that the system is operating correctly.
If the results are different, then a failure is assumed and
a failure exception is raised.
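The compare-and-raise step above can be sketched as follows. The channel functions `channel_a` and `channel_b` are hypothetical stand-ins for diverse implementations of the same computation; in a real self-monitoring system each channel would run on diverse hardware and software.

```python
class ChannelMismatchError(Exception):
    """Raised when the channels disagree, signalling a failure."""


def channel_a(x):
    # First channel's version of the computation.
    return x * x


def channel_b(x):
    # Diverse second implementation of the same computation.
    return x ** 2


def self_monitoring_compute(x):
    """Run the same computation on both channels and compare results.

    If the results are identical, the system is assumed to be operating
    correctly; if they differ, a failure exception is raised.
    """
    a = channel_a(x)
    b = channel_b(x)
    if a != b:
        raise ChannelMismatchError(f"channels disagree: {a} != {b}")
    return a
```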
20. Self-monitoring systems
Hardware in each channel has to be diverse so that
common mode hardware failure will not lead to each
channel producing the same results.
Software in each channel must also be diverse,
otherwise the same software error would affect each
channel.
If high availability is required, you may use several
self-checking systems in parallel.
This is the approach used in the Airbus family of aircraft for their
flight control systems.
22. Airbus architecture discussion
The Airbus FCS has 5 separate computers, any one of
which can run the control software.
Extensive use has been made of diversity
Primary systems use a different processor from the secondary
systems.
Primary and secondary systems use chipsets from different
manufacturers.
Software in secondary systems is less complex than in primary
system – provides only critical functionality.
Software in each channel is developed in different programming
languages by different teams.
Different programming languages used in primary and
secondary systems.
23. Key points
Dependability in a program can be achieved by avoiding the
introduction of faults, by detecting and removing faults before
system deployment, and by including fault tolerance facilities.
The use of redundancy and diversity in hardware, software
processes and software systems is essential for the development of
dependable systems.
The use of a well-defined, repeatable process is essential if faults in
a system are to be minimized.
Dependable system architectures are system architectures that are
designed for fault tolerance. Architectural styles that support fault
tolerance include protection systems, self-monitoring architectures
and N-version programming.
25. N-version programming
Multiple versions of a software system carry out
computations at the same time. There should be an odd
number of computers involved, typically 3.
The results are compared using a voting system and the
majority result is taken to be the correct result.
Approach derived from the notion of triple-modular
redundancy, as used in hardware systems.
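A minimal sketch of the voting step described above, assuming a simple majority rule over the outputs of the different versions. The names `vote` and `NoMajorityError` are illustrative, not from the source.

```python
from collections import Counter


class NoMajorityError(Exception):
    """Raised when no result is produced by a majority of versions."""


def vote(results):
    """Return the majority result from an odd number of version outputs.

    With three versions, a single faulty version is outvoted by the
    other two; if no strict majority exists, a failure is signalled.
    """
    value, count = Counter(results).most_common(1)[0]
    if count <= len(results) // 2:
        raise NoMajorityError(f"no majority among {results}")
    return value
```

With three versions (`vote([3, 3, 7])`), the faulty output 7 is outvoted and 3 is taken as the correct result.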
26. Hardware fault tolerance
Depends on triple-modular redundancy (TMR).
There are three replicated identical components that
receive the same input and whose outputs are
compared.
If one output is different, it is ignored and component
failure is assumed.
Based on most faults resulting from component failures
rather than design faults and a low probability of
simultaneous component failure.
29. N-version programming
The different system versions are designed and
implemented by different teams. It is assumed that there
is a low probability that they will make the same
mistakes. The algorithms used should be different, but in
practice may not be.
There is some empirical evidence that teams commonly
misinterpret specifications in the same way and choose
the same algorithms in their systems.
30. Software diversity
Approaches to software fault tolerance depend on
software diversity where it is assumed that different
implementations of the same software specification will
fail in different ways.
It is assumed that implementations are (a) independent
and (b) do not include common errors.
Strategies to achieve diversity
Different programming languages
Different design methods and tools
Explicit specification of different algorithms
31. Problems with design diversity
Teams are not culturally diverse so they tend to tackle
problems in the same way.
Characteristic errors
Different teams make the same mistakes. Some parts of an
implementation are more difficult than others so all teams tend to
make mistakes in the same place;
Specification errors;
If there is an error in the specification then this is reflected in all
implementations;
This can be addressed to some extent by using multiple
specification representations.
32. Specification dependency
Both approaches to software redundancy are susceptible
to specification errors. If the specification is incorrect, the
system could fail.
This is also a problem with hardware but software
specifications are usually more complex than hardware
specifications and harder to validate.
This has been addressed in some cases by developing
separate software specifications from the same user
specification.
33. Improvements in practice
In principle, if diversity and independence can be
achieved, multi-version programming leads to very
significant improvements in reliability and availability.
In practice, observed improvements are much less
significant, but the approach seems to lead to reliability
improvements of between 5 and 9 times.
The key question is whether or not such improvements
are worth the considerable extra development costs for
multi-version programming.
34. Dependable programming
Good programming practices can be adopted that help
reduce the incidence of program faults.
These programming practices support
Fault avoidance
Fault detection
Fault tolerance
35. Good practice guidelines for dependable
programming
Dependable programming guidelines
1. Limit the visibility of information in a program
2. Check all inputs for validity
3. Provide a handler for all exceptions
4. Minimize the use of error-prone constructs
5. Provide restart capabilities
6. Check array bounds
7. Include timeouts when calling external components
8. Name all constants that represent real-world values
36. Control the visibility of information in a program
Program components should only be allowed access to
data that they need for their implementation.
This means that accidental corruption of parts of the
program state by these components is impossible.
You can control visibility by using abstract data types
where the data representation is private and you only
allow access to the data through predefined operations
such as get () and put ().
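A minimal sketch of this visibility control: a class whose data representation is private (by Python convention, via the leading underscore) and reachable only through predefined `put()` and `get()` operations. The `DataStore` name is hypothetical.

```python
class DataStore:
    """Abstract data type whose representation is hidden from clients."""

    def __init__(self):
        # Private representation; clients should not access this directly,
        # so they cannot accidentally corrupt it.
        self._items = {}

    def put(self, key, value):
        """The only way to add or update a stored value."""
        self._items[key] = value

    def get(self, key):
        """The only way to read a stored value."""
        return self._items[key]
```

Clients that use only `put()` and `get()` remain correct even if the internal representation is later changed, for example from a dictionary to a database table.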
37. Check all inputs for validity
All programs take inputs from their environment and make
assumptions about these inputs.
However, program specifications rarely define what to do
if an input is not consistent with these assumptions.
Consequently, many programs behave unpredictably
when presented with unusual inputs and, sometimes,
these are threats to the security of the system.
Consequently, you should always check inputs against
the assumptions made about them before processing.
38. Validity checks
Range checks
Check that the input falls within a known range.
Size checks
Check that the input does not exceed some maximum size e.g.
40 characters for a name.
Representation checks
Check that the input does not include characters that should not
be part of its representation e.g. names do not include numerals.
Reasonableness checks
Use information about the input to check if it is reasonable rather
than an extreme value.
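These checks can be sketched as simple guard functions. The field names and limits below (a 40-character name, a plausible age range) are illustrative assumptions, not from the source.

```python
def validate_name(name):
    """Apply size and representation checks to a name input."""
    if not 1 <= len(name) <= 40:
        # Size check: the input must not exceed the maximum size.
        raise ValueError("name must be 1-40 characters")
    if any(ch.isdigit() for ch in name):
        # Representation check: names should not include numerals.
        raise ValueError("name must not contain numerals")
    return name


def validate_age(age):
    """Apply range and reasonableness checks to an age input."""
    if not 0 <= age <= 130:
        # Range/reasonableness check: reject extreme values.
        raise ValueError("age outside reasonable range 0-130")
    return age
```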
39. Provide a handler for all exceptions
A program exception is an error or some
unexpected event such as a power failure.
Exception handling constructs allow for such
events to be handled without the need for
continual status checking to detect exceptions.
Using normal control constructs to detect
exceptions needs many additional statements to be
added to the program. This adds a significant
overhead and is potentially error-prone.
41. Exception handling
Three possible exception handling strategies
Signal to a calling component that an exception has occurred
and provide information about the type of exception.
Carry out some alternative processing to the processing where
the exception occurred. This is only possible where the
exception handler has enough information to recover from the
problem that has arisen.
Pass control to a run-time support system to handle the
exception.
Exception handling is a mechanism to provide some fault
tolerance
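The first two strategies can be sketched as follows, assuming a hypothetical `SensorError` exception that signals a failure to the caller and carries information about it; the handler then performs alternative processing using a fallback value.

```python
class SensorError(Exception):
    """Signals to a calling component that a sensor read failed,
    with information about which sensor was involved."""

    def __init__(self, sensor_id, message):
        super().__init__(message)
        self.sensor_id = sensor_id


def read_sensor(sensor_id, raw):
    # Strategy 1: signal the exception to the caller with details.
    if raw is None:
        raise SensorError(sensor_id, "no reading available")
    return raw


def controller(sensor_id, raw, fallback):
    # Strategy 2: alternative processing in the handler, which has
    # enough information (a fallback value) to recover.
    try:
        return read_sensor(sensor_id, raw)
    except SensorError:
        return fallback
```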
42. Minimize the use of error-prone constructs
Program faults are usually a consequence of human
error because programmers lose track of the
relationships between the different parts of the system
This is exacerbated by error-prone constructs in
programming languages that are inherently complex or
that don’t check for mistakes when they could do so.
Therefore, when programming, you should try to avoid or
at least minimize the use of these error-prone constructs.
43. Error-prone constructs
Unconditional branch (goto) statements
Floating-point numbers
Inherently imprecise. The imprecision may lead to invalid
comparisons.
Pointers
Pointers referring to the wrong memory areas can corrupt
data. Aliasing can make programs difficult to understand
and change.
Dynamic memory allocation
Run-time allocation can cause memory overflow.
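Of these constructs, floating-point imprecision is easy to demonstrate: accumulated rounding error makes an exact comparison invalid, so a tolerance-based comparison should be used instead. The helper names below are illustrative.

```python
import math


def unsafe_equal(a, b):
    # Exact comparison: invalid for floats, because rounding error
    # makes mathematically equal values compare unequal.
    return a == b


def safe_equal(a, b, rel_tol=1e-9):
    # Tolerance-based comparison avoids the invalid exact test.
    return math.isclose(a, b, rel_tol=rel_tol)


# Summing 0.1 ten times does not give exactly 1.0 in binary floating
# point, so unsafe_equal(total, 1.0) is False while safe_equal is True.
total = sum([0.1] * 10)
```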
44. Error-prone constructs
Parallelism
Can result in subtle timing errors because of unforeseen
interaction between parallel processes.
Recursion
Errors in recursion can cause memory overflow as the
program stack fills up.
Interrupts
Interrupts can cause a critical operation to be terminated
and make a program difficult to understand.
Inheritance
Code is not localised. This can result in unexpected
behaviour when changes are made and problems of
understanding the code.
45. Error-prone constructs
Aliasing
Using more than 1 name to refer to the same state variable.
Unbounded arrays
Buffer overflow failures can occur if there is no bounds
checking on arrays.
Default input processing
An input action that occurs irrespective of the input.
This can cause problems if the default action is to transfer
control elsewhere in the program. Incorrect or deliberately
malicious input can then trigger a program failure.
46. Provide restart capabilities
For systems that involve long transactions or user
interactions, you should always provide a restart
capability that allows the system to restart after failure
without users having to redo everything that they have
done.
Restart depends on the type of system
Keep copies of forms so that users don’t have to fill them in
again if there is a problem
Save state periodically and restart from the saved state
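The save-state-and-restart approach can be sketched as a simple checkpoint mechanism; the JSON file format and the function names are illustrative assumptions.

```python
import json
import os


def save_state(path, state):
    """Periodically checkpoint the program state to a file."""
    with open(path, "w") as f:
        json.dump(state, f)


def restart(path, default_state):
    """Resume from the last checkpoint if one exists.

    After a failure, users do not have to redo everything: the
    system restarts from the most recently saved state.
    """
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return default_state  # first run: start from the initial state
```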
47. Check array bounds
In some programming languages, such as C, it is
possible to address a memory location outside of the
range allowed for in an array declaration.
This leads to the well-known buffer overflow
vulnerability, where attackers write executable code into
memory by deliberately writing beyond the top element
in an array.
If your language does not include bounds checking, you
should therefore always check that an array access is
within the bounds of the array.
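In a language without built-in checking, you would wrap each access in an explicit guard. The sketch below illustrates the idea in Python (which in fact raises `IndexError` itself, and so does not need this guard); the `safe_read` helper is hypothetical.

```python
def safe_read(buffer, index):
    """Check that an array access is within bounds before performing it.

    In C-like languages this guard prevents reads or writes outside
    the declared range of the array.
    """
    if not 0 <= index < len(buffer):
        raise IndexError(
            f"index {index} outside bounds 0..{len(buffer) - 1}"
        )
    return buffer[index]
```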
48. Include timeouts when calling external
components
In a distributed system, failure of a remote computer can
be ‘silent’ so that programs expecting a service from that
computer may never receive that service or any
indication that there has been a failure.
To avoid this, you should always include timeouts on all
calls to external components.
After a defined time period has elapsed without a
response, your system should then assume failure and
take whatever actions are required to recover from this.
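A sketch of such a timeout, using Python's standard `concurrent.futures` to bound the wait for a remote call (simulated here with a delay); the function names and fallback behaviour are illustrative assumptions.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError


def call_remote(simulated_delay):
    # Stand-in for a call to an external component; a silent failure
    # is simulated simply by a long delay.
    time.sleep(simulated_delay)
    return "response"


def call_with_timeout(fn, arg, timeout, fallback):
    """Call an external component but assume failure after `timeout`.

    If no response arrives in time, the system recovers by taking a
    fallback action instead of waiting forever.
    """
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(fn, arg)
        try:
            return future.result(timeout=timeout)
        except TimeoutError:
            return fallback
```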
49. Name all constants that represent real-world
values
Always give constants that reflect real-world values
(such as tax rates) names rather than using their
numeric values and always refer to them by name
You are less likely to make mistakes and type the wrong
value when you are using a name rather than a value.
This also means that when these ‘constants’ change (as
they often do, since they are rarely truly constant), you
only have to make the change in one place in your program.
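A minimal illustration: the rate is given a name and referred to only by that name, so a rate change is a one-line edit. `VAT_RATE` is a hypothetical tax-rate constant.

```python
# Named constant for a real-world value; if the tax rate changes,
# only this line needs to be edited.
VAT_RATE = 0.20


def price_with_tax(net_price):
    """Compute a gross price, referring to the rate only by name."""
    return round(net_price * (1 + VAT_RATE), 2)
```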
50. Key points
Software diversity is difficult to achieve because it is
practically impossible to ensure that each version of the
software is truly independent.
Dependable programming relies on the inclusion of
redundancy in a program to check the validity of inputs
and the values of program variables.
Some programming constructs and techniques, such as
goto statements, pointers, recursion, inheritance and
floating-point numbers, are inherently error-prone. You
should try to avoid these constructs when developing
dependable systems.