This document discusses dependable systems architectures, including protection systems, self-monitoring architectures, and N-version programming. It notes that dependable architectures use redundancy and diversity to ensure fault tolerance. Key challenges include achieving true software and design diversity, as teams may interpret specifications similarly and diverse versions could still contain common errors.
CS 5032 L6 reliability and security specification, 2013, Ian Sommerville
This document discusses reliability and security specification. It defines reliability metrics like probability of failure on demand, rate of occurrence of failures, mean time to failure, and availability. It describes the reliability specification process of risk identification, analysis, and decomposition to generate quantitative requirements. The document also discusses security specification, threat assessment, and defining security requirements to protect system assets. Formal methods for specification are introduced.
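The reliability metrics named above can be illustrated with a small worked calculation (a hypothetical sketch; all the observation figures are invented for illustration):

```python
# Illustrative calculation of the common reliability metrics from
# hypothetical observation data (all numbers are invented).

# POFOD: probability of failure on demand.
demands = 10_000             # service requests observed
failed_demands = 2           # requests that failed
pofod = failed_demands / demands             # 0.0002

# ROCOF: rate of occurrence of failures.
hours_observed = 500.0
failures_observed = 4
rocof = failures_observed / hours_observed   # 0.008 failures/hour

# MTTF: mean time to failure (reciprocal of ROCOF when the
# failure rate is assumed constant).
mttf = 1.0 / rocof                           # 125 hours

# Availability: fraction of time the system is operational.
mttr = 2.0                   # mean time to repair, hours
availability = mttf / (mttf + mttr)

print(f"POFOD={pofod}, ROCOF={rocof}, MTTF={mttf}h, "
      f"availability={availability:.4f}")
```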
This document discusses dependability and security in computer systems. It defines dependability as the extent to which a system operates as expected without failure: a dependable system continues to deliver its expected services. It explains why dependability matters, shows how the attributes that determine it, such as availability, reliability, safety, and security, are related and affect one another, and introduces terminology for faults, failures, hazards, and risks as they relate to system dependability and security.
This document discusses key aspects of dependability engineering including: achieving dependability through fault avoidance, detection, and tolerance; using redundancy and diversity; the importance of well-defined, repeatable processes; and guidelines for dependable programming such as checking inputs, handling exceptions, avoiding error-prone constructs, and including timeouts. Critical systems often have high dependability requirements and their development must convince regulators that the system is dependable, safe, and secure.
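Two of the dependable-programming guidelines listed above, checking all inputs and bounding waits with timeouts, can be sketched as follows (the function names and limits are invented for illustration):

```python
import threading

def read_dose(raw: str, max_units: float = 25.0) -> float:
    """Parse an operator-entered dose, rejecting malformed or
    out-of-range input rather than trusting it."""
    try:
        dose = float(raw)                  # may raise ValueError
    except ValueError:
        raise ValueError(f"not a number: {raw!r}")
    if not (0.0 <= dose <= max_units):     # range-check every input
        raise ValueError(f"dose {dose} outside 0..{max_units}")
    return dose

def call_with_timeout(fn, timeout_s: float = 1.0):
    """Run fn in a worker thread; fail visibly if it exceeds its
    time budget instead of blocking forever."""
    result = []
    worker = threading.Thread(target=lambda: result.append(fn()),
                              daemon=True)
    worker.start()
    worker.join(timeout_s)
    if worker.is_alive():                  # timed out
        raise TimeoutError("operation exceeded time budget")
    return result[0]

print(read_dose("12.5"))          # valid input passes through
print(call_with_timeout(lambda: 42))
```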
This document discusses dependability and security specification. It covers topics like risk-driven specification, safety specification, and security specification. For risk-driven specification, it emphasizes identifying risks through preliminary, life cycle, and operational risk analysis to define requirements that reduce risks. For safety specification, it describes identifying hazards, assessing hazards, and defining safety requirements to ensure system failures do not cause harm. Examples of applying these techniques to an insulin pump are provided.
CS 5032 L1 critical socio-technical systems, 2013, Ian Sommerville
This document outlines the aims and topics of a course on critical systems engineering. The course aims to help students understand critical systems, which are technical systems that are profoundly affected by organizational and human factors. Key topics covered include system dependability, security engineering, and human/organizational factors. The course will examine critical infrastructure systems through topics like resilience engineering and cybersecurity. Assessment includes an exam and a coursework assignment involving requirements specification.
CS 5032 L11 validation and reliability testing, 2013, Ian Sommerville
Critical systems require additional validation processes beyond non-critical systems due to the high costs of failure. Validation costs for critical systems are significantly higher, usually taking over 50% of development costs. Various static analysis techniques can be used for validation, including formal verification, model checking, and automated program analysis. Statistical testing with an accurate operational profile is also used to measure a critical system's reliability.
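Statistical testing with an operational profile, as mentioned above, can be sketched as follows: test inputs are drawn according to the relative frequencies of input classes, and POFOD is estimated from the observed failure fraction. The profile and the system under test here are invented stand-ins:

```python
# Sketch of statistical reliability testing against an operational
# profile. The profile probabilities and the simulated system are
# hypothetical.
import random

operational_profile = {      # input class -> probability of occurrence
    "normal_query": 0.90,
    "update": 0.08,
    "admin_command": 0.02,
}

def system_under_test(input_class: str) -> bool:
    """Stand-in for the real system: returns True on success.
    Pretend admin commands fail 1% of the time."""
    return not (input_class == "admin_command"
                and random.random() < 0.01)

def estimate_pofod(n_demands: int, seed: int = 1) -> float:
    """Draw demands per the profile and count failures."""
    random.seed(seed)
    classes = list(operational_profile)
    weights = list(operational_profile.values())
    failures = 0
    for _ in range(n_demands):
        cls = random.choices(classes, weights=weights)[0]
        if not system_under_test(cls):
            failures += 1
    return failures / n_demands

print(estimate_pofod(100_000))   # expectation is 0.02 * 0.01 = 2e-4
```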
Requirements engineering involves discovering, documenting, and maintaining requirements for computer systems. It is important because getting requirements wrong can lead to projects being late, over budget, or delivering systems users do not like. Requirements engineering is difficult due to changing needs, differing stakeholder views, and politics influencing priorities.
This document summarizes key topics from a lecture on security engineering, including design guidelines for security, design for deployment, and system survivability. The design guidelines encourage basing decisions on an explicit security policy, avoiding single points of failure, and failing securely. Deployment issues like vulnerable defaults and access permissions are addressed. Finally, resilience strategies like resistance, recognition and recovery are discussed to help systems continue operating during attacks.
CS 5032 L12 security testing and dependability cases, 2013, Ian Sommerville
The document discusses security validation techniques like experience-based validation using known attacks, tiger teams that simulate attacks, and tool-based validation. It also discusses the importance of having a well-defined development process for safety-critical systems that includes identifying and tracking hazards. Safety and dependability cases collect evidence like hazard analyses, test results, and review reports to argue that a system meets its safety requirements. Structured safety arguments demonstrate that hazardous conditions cannot occur by considering all program paths and showing unsafe conditions cannot be true.
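The path-based structured safety argument described above can be illustrated in miniature: show that on every path through the code the unsafe condition cannot be true. The dose rule and limit below are invented for illustration:

```python
# Miniature structured safety argument: the unsafe condition
# "administered dose exceeds MAX_DOSE" is shown to be false on
# every program path. The dose formula and limit are hypothetical.

MAX_DOSE = 5.0  # units; hypothetical safety limit

def computed_dose(blood_sugar: float, previous: float) -> float:
    if blood_sugar <= previous:        # path 1: sugar stable/falling
        dose = 0.0                     #   dose == 0 <= MAX_DOSE
    else:                              # path 2: sugar rising
        dose = (blood_sugar - previous) / 4
    # On path 1 the dose is zero; on path 2 the clamp below bounds
    # it. Hence dose > MAX_DOSE is unreachable on all paths.
    return min(dose, MAX_DOSE)

# Back the argument with an exhaustive check over a grid of inputs.
for bs in range(0, 40):
    for prev in range(0, 40):
        assert computed_dose(bs, prev) <= MAX_DOSE
print("unsafe condition unreachable on all checked inputs")
```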
This document summarizes the topics covered in the first lecture of a security engineering course. It discusses security engineering and management, security risk assessment, and designing systems for security. The lecture covers tools and techniques for developing secure systems, assessing security risks, and designing system architectures to protect assets and distribute them for redundancy.
Reliability and security specification (CS 5032, 2012), Ian Sommerville
This document discusses reliability specification and metrics. It describes how to identify types of system failure, estimate costs and consequences, and identify root causes to generate reliability specifications. Types of failures include loss of service, incorrect service, and system/data corruption. Reliability metrics are discussed such as probability of failure on demand, rate of occurrence of failures/mean time to failure, and availability. These metrics provide measurements of system reliability.
The document discusses specifications for dependability and security. It covers topics like risk-driven specification, safety specification, and security specification. It emphasizes that critical systems specification should be risk-driven as risks pose a threat to the system. The risk-driven approach aims to understand risks faced by the system and define requirements to reduce these risks through phased risk analysis including preliminary, life cycle, and operational risk analysis. Safety specification identifies protection requirements to ensure system failures do not cause harm, with risk identification, analysis, and reduction mirroring hazard identification, assessment, and analysis. An example of a safety-critical insulin pump system is provided to illustrate dependability requirements and risk analysis.
The document discusses security engineering design guidelines and system survivability. It covers:
1) Design guidelines that help make secure design decisions and raise security awareness.
2) Guidelines for avoiding single points of failure, failing securely, balancing security and usability, and more.
3) Designing for deployment to minimize vulnerabilities introduced during configuration and installation.
4) Ensuring systems can continue essential services when under attack through resilience and recoverability.
This document discusses security cases, which provide a structured argument and evidence to support the claim that a system is acceptably secure. It focuses on addressing the potential for buffer overflows in code. The argument is that coding practices, code reviews, static analysis, and system testing with invalid inputs provide evidence there are no buffer overflow possibilities in the code. Tool support is needed to manage the large amount of documentation required to build the security case.
This document provides an overview of key topics from Chapter 11 on security and dependability, including:
- The principal dependability properties of availability, reliability, safety, and security.
- Dependability covers attributes like maintainability, repairability, survivability, and error tolerance.
- Dependability is important because system failures can have widespread effects and undependable systems may be rejected.
- Dependability is achieved through techniques like fault avoidance, detection and removal, and building in fault tolerance.
The document discusses critical systems and their dependability requirements. It defines critical systems as those where failure could result in loss of life, environmental damage, or large economic losses. Dependability encompasses availability, reliability, safety, and security. The document uses the example of an insulin pump, a safety-critical system, to illustrate dependability dimensions and how failures could threaten human life. Formal development methods may be required for critical systems due to the high costs of failure.
This document provides an overview of topics in chapter 13 on security engineering. It discusses security and dependability, security dimensions of confidentiality, integrity and availability. It also outlines different security levels including infrastructure, application and operational security. Key aspects of security engineering are discussed such as secure system design, security testing and assurance. Security terminology and examples are provided. The relationship between security and dependability factors like reliability, availability, safety and resilience is examined. The document also covers security in organizations and the role of security policies.
Norman Patch and Remediation Advanced provides:
• Rapid, accurate and secure patch management
• Automated collection, analysis and delivery of patches
• Security for your organization from worms, trojans, viruses and other malicious threats
• A single consolidated solution for heterogeneous environments, providing effective management at a significantly reduced TCO
This document provides an overview of topics covered in a lecture on security engineering. It discusses security engineering and management, security risk assessment, and designing systems for security. It also covers security concerns like confidentiality, integrity and availability. Application security is a software problem while infrastructure security is a systems management problem. Security risk management involves preliminary risk assessment, life cycle risk assessment and operational risk assessment using techniques like misuse cases and asset analysis. The document provides examples of threats, controls, and security requirements.
This document summarizes Chapter 12 of a textbook on dependability and security specification. It discusses risk-driven specification, including identifying risks, analyzing risks, and defining requirements to reduce risks. It also covers specifying safety requirements by identifying hazards, assessing hazards, and analyzing hazards to discover root causes. The goal is to specify requirements that ensure systems function dependably and securely without failures causing harm.
This document discusses software security engineering. It covers security concepts like assets, vulnerabilities and threats. It discusses why security engineering is important to protect systems from malicious attackers. The document outlines security risk management processes like preliminary risk assessment. It also discusses designing systems for security through architectural choices that provide protection and distributing assets. The document concludes by covering system survivability through building resistance, recognition and recovery capabilities into systems.
This document discusses safety engineering for systems that contain software. It covers topics like safety-critical systems, safety requirements, and safety engineering processes. Safety is defined as a system's ability to operate, normally or abnormally, without causing harm. In safety-critical systems like aircraft or medical devices, software is often used for control and monitoring, so software safety is important. Hazard identification, risk assessment, and specifying safety requirements to mitigate risks are key parts of the safety engineering process. The goal is to design systems whose failures cannot cause injury, death, or environmental damage.
This document describes a method for qualifying IT infrastructure that can scale to organizations of different sizes. It defines what constitutes IT infrastructure, including servers, networks, desktops, and management applications. The method aims to minimize validation effort through a risk-based and layered approach while still meeting regulatory requirements. Like other critical utilities, IT infrastructure is expected to be fault-free, continuously available, and operated under compliant processes and procedures. Regulations surrounding IT infrastructure are discussed, noting the need to demonstrate control over infrastructure through a planned qualification process and ongoing compliance procedures.
This document discusses requirements specification for critical systems. It covers dependability requirements, risk-driven specification, safety specification, security specification, system reliability specification, and non-functional reliability requirements. For risk-driven specification, it describes the stages of risk identification, analysis and classification, decomposition, and risk reduction assessment. It provides examples of applying this process to an insulin pump. For safety specification, it discusses safety requirements, the safety life cycle, and the IEC 61508 standard. For security specification, it outlines a similar process to safety with stages of asset identification, threat analysis, and security requirements specification. It also discusses different types of security requirements.
The document discusses techniques for achieving dependable software systems. It covers redundancy and diversity approaches including N-version programming where multiple versions of software are developed independently. Dependable system architectures like protection systems and self-monitoring architectures that use redundancy are described. The document emphasizes that a well-defined development process is important for minimizing faults and notes validation activities should include requirements reviews, testing, and change management.
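The N-version programming scheme described above can be sketched as three independently written implementations of the same (invented) specification feeding a majority voter:

```python
# Sketch of N-version programming: three versions of the same
# function, developed "independently", with a voter that takes the
# majority result. The specification (square a number) is trivial
# and chosen purely for illustration.
from collections import Counter

def version_a(x: int) -> int:
    return x * x

def version_b(x: int) -> int:
    return x ** 2

def version_c(x: int) -> int:
    # Deliberately different algorithm: repeated addition.
    return sum(abs(x) for _ in range(abs(x)))

def majority_vote(x: int,
                  versions=(version_a, version_b, version_c)) -> int:
    results = [v(x) for v in versions]
    value, count = Counter(results).most_common(1)[0]
    if count < 2:                # all versions disagree: no consensus
        raise RuntimeError(f"no consensus among versions: {results}")
    return value                 # majority masks a single faulty version

print(majority_vote(7))
```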
This document summarizes key concepts from Chapter 15 on resilience engineering. It discusses resilience as the ability of systems to maintain critical services during disruptions like failures or cyberattacks. Resilience involves recognizing issues, resisting failures when possible, and recovering quickly through activities like redundancy. The document also covers sociotechnical resilience, where human and organizational factors are considered, and characteristics of resilient organizations like responsiveness, monitoring, anticipation, and learning.
This document provides a comparative survey of fault handling in field programmable gate arrays (FPGAs) and microcontrollers (MCUs) for safety-critical embedded applications. It discusses how the different hardware platforms lead to fundamental differences in design that influence aspects of safety, reliability, and fault tolerance. Specifically, it examines how the hardware architecture, software design process, verification capabilities, and device technologies of FPGAs and MCUs each impact the ability to avoid or tolerate faults in hardware and software.
The document discusses cybersecurity and why a technological approach alone is not sufficient. It argues that cybersecurity is a socio-technical problem, as technology cannot guarantee reliability and human and organizational factors like insider threats, procedures, carelessness, and social engineering present vulnerabilities. A holistic approach is needed across personal, organizational, national, and international levels that includes deterrence, awareness, realistic procedures, monitoring, and cooperation.
British Midland Flight 92 from Heathrow to Belfast crashed near Kegworth, England in 1989, killing 47 people. A fan blade broke off in the left engine, causing vibration, but the pilots mistakenly shut down the right engine instead. Because the initial error went undetected, the damaged left engine failed about 20 minutes later. The crash resulted from a combination of factors, including pilot error, inadequate training, aircraft design issues, and lack of communication between the flight crew and cabin crew.
1) SCADA systems are used to monitor and control critical infrastructure through networks of sensors and programmable logic controllers.
2) These systems were traditionally isolated but are now increasingly connected to external networks, making them vulnerable to attacks.
3) Common vulnerabilities of SCADA systems include weak passwords, unencrypted network traffic, and lack of input validation. Improving SCADA security is challenging due to the operational needs of control systems and lack of security experience among operators.
The Maroochy SCADA attack in 2000 involved an insider, Vitek Boden, who had worked for the company that installed the sewage pumping control system for Maroochy, Australia. After leaving the company and being denied a job with the local council, Boden stole hardware and software from his previous employer to launch a revenge attack. Over several months, he remotely hacked into the SCADA system controlling the sewage pumps and caused pumps to fail, releasing over 1 million liters of untreated sewage. The attack highlighted security issues with the system's insecure radio communications and lack of monitoring. Boden was later convicted and jailed for his role in the incident.
Cybersecurity involves protecting individuals, businesses, and governments from cyber threats on computers and the internet. It is a broad field that includes threat analysis, security technologies, policies and laws. Cybersecurity problems stem from technical issues as well as human and organizational factors. It aims to prevent malicious cyber attacks and accidental damage. Attacks can come from inside or outside an organization and include fraud, spying, stalking, assault, and warfare between nations. The scale of the problem is large but difficult to measure fully. Cybersecurity issues have arisen because the internet was not designed with security in mind and prioritizes convenience, while widespread connectivity has increased risks.
The document discusses the characteristics of critical systems. Critical systems must have high dependability, meaning they must be reliable, available, safe, and secure. They are often systems of systems that rely on external infrastructure and other systems. Critical systems are important to society and can include safety-critical systems, mission-critical systems, business-critical systems, and infrastructure systems. Dependability is the most important attribute of critical systems.
System dependability is a composite system property that reflects the degree of trust users have in a system. It is determined by availability, reliability, safety, and security. Dependability is subjective as it depends on user expectations - a system deemed dependable by one user may be seen as unreliable by another if it does not meet their expectations. Formal specifications of dependability do not always capture real user experiences.
Critical systems engineering focuses on developing dependable and secure systems for applications where failure could result in loss of life, injury, or significant economic damage. These systems often require certification by regulators to demonstrate compliance with safety and dependability standards before use. The development process for critical systems uses rigorous, plan-driven methodologies and techniques like formal methods, fault detection tools, and redundancy to reduce risk of failure, which may involve higher costs than for other systems due to the potentially severe consequences of errors.
Socio-technical systems include both technical and human elements. They are made up of interconnected layers from equipment and software to business processes and societal rules. Properties emerge from the interactions between these layers, including reliability, security, and usability. Whether a socio-technical system is considered a success or failure depends on perspective, as stakeholders have differing views and system behavior is non-deterministic due to human factors. Failures are also inevitable given the complexity of relationships in socio-technical systems.
This document discusses availability and reliability in systems. Availability is defined as the probability that a system will be operational to deliver requested services, while reliability is the probability of failure-free operation over time. Both can be expressed as percentages. Availability takes into account repair times, whereas reliability does not. Faults can lead to errors and failures if not addressed through techniques like fault avoidance, detection, and tolerance.
The document discusses system security and defines key related terms. System security is the ability of a system to protect itself from accidental or deliberate attacks. It is essential for availability, reliability, and safety as most systems are networked. Without proper security, systems are vulnerable to damage like denial of service, data corruption, and disclosure of confidential information. Security can be achieved through strategies such as avoiding vulnerabilities, detecting and eliminating attacks, and limiting exposure and enabling recovery from successful attacks.
The document discusses techniques for achieving dependable software systems through fault tolerance. It describes hardware-based triple modular redundancy and software-based N-version programming. It also discusses challenges with achieving true design diversity and problems that can arise from specification errors. The document concludes with guidelines for dependable programming practices such as limiting information visibility, checking inputs, using exception handling, avoiding error-prone constructs, including restart capabilities and timeouts.
This document discusses techniques for engineering dependable software systems. It covers topics like redundancy and diversity, dependable processes, and dependable system architectures. Redundancy and diversity are fundamental approaches to achieving fault tolerance. Dependable processes use well-defined and repeatable development practices to minimize faults. Dependable system architectures, like protection systems and self-monitoring architectures, are designed to tolerate faults through techniques such as voting across redundant or diverse versions of the system.
This document discusses techniques for engineering dependable software systems. It covers redundancy and diversity approaches to achieve fault tolerance. Dependable systems are achieved through fault avoidance, detection, and tolerance. Critical systems often use regulated processes and dependable architectures like protection systems, self-monitoring architectures, and N-version programming which involve redundant and diverse components to continue operating despite failures. The document gives examples of how these techniques are applied in systems like aircraft flight control to maximize availability.
This document discusses dependability engineering and techniques for achieving dependable software systems. It covers fault avoidance, fault detection, and fault tolerance. Critical systems often use redundancy, diversity, and regulated development processes to meet high dependability requirements. Dependable architectures and protection systems can provide fault tolerance to prevent failures from causing outages or emergencies.
High dependability of the automated systemsAlan Tatourian
This is the second research talk I gave at the Semiconductor Research Corporation (SRC) in September. Here I bring to attention the need to solve problems of SW maintainability and of the self-adaptable but still reliable architectures. State of the art in the industry now is ‘fail-operational’ which is based on redundancy. We can build a better technology which will optimize itself based on some global minimum function and will be able to adapt both to external changes in the environment and internal operating conditions.
A powerpoint presentation on embedded systems for students and graduates.
Embedded system is an electronic or electro-mechanical system designed to perform a specific task & is combination of both hardware and software.
Every Embedded system is unique, hardware & software are highly specified to the application domain.
#education #technology #business #embeddedsystems #embedded #systems #datacollection #datacommunication #dataprocessing #monitoring #control #application #interface #content #advantages #disadvantages #bridge #electromechanical
Fault-tolerant computer systems are designed to continue operating properly even when some components fail. They achieve this through techniques like redundancy, where backup components take over if primary components fail. The document discusses the goals of fault tolerance like ensuring no single point of failure. It provides examples of how fault tolerance is implemented in areas like data storage and outlines techniques used to design and evaluate fault-tolerant systems.
The document provides an overview of operating systems with the following key points:
1) An operating system acts as an interface between users and hardware, allowing for convenient and efficient usage of resources while providing information protection and parallel processing.
2) Operating systems include components for job scheduling, simultaneous CPU execution and I/O handling, and offline processing to avoid wasted CPU cycles.
3) Other characteristics of operating systems include time sharing, multiprocessing, distributed systems, and supporting real-time applications with rapid response times.
Fault tolerance systems use hardware and software redundancy to continue operating satisfactorily despite failures. They employ techniques like triplication and voting where multiple processors compute the same parameter and vote to ignore suspect values. Duplication and comparison has two processors compare outputs and drop off if disagreement. Self-checking pairs can detect their own errors. Fault tolerant software uses multi-version programming with different teams developing versions, recovery blocks that re-execute if checks fail, and exception handlers.
This document summarizes a software engineering solved paper from June 2013. It discusses various topics in software engineering including attributes of good software, definitions of software engineering and different types of software products, emergent system properties with examples, critical systems and their types, the Rational Unified Process (RUP) methodology with block diagrams, security terminology, and metrics for specifying non-functional requirements. The requirement engineering process is also explained with the main goal being to create and maintain a system requirement document through various sub-processes like feasibility study, elicitation and analysis, validation, and management of requirements.
This document discusses cloud security responsibilities. It outlines that the Cloud Security Alliance (CSA) promotes best practices for securing cloud computing. The CSA provides guidance to help companies implement secure clouds. The document then discusses responsibilities for both cloud providers and customers. For providers, this includes physical security of data centers, operating system security, hypervisor security, and network security. For customers, responsibilities involve firewalls, software updates, password policies, virtual machine security, access device security, and staff security practices. The document provides details on how to implement security controls for each area.
UNIT 1C CHARACTERISTICS _ QUALITY ATT OF ES.pptxSKILL2021
This document discusses the key characteristics and quality attributes of embedded systems. It describes embedded systems as application specific, reactive and operating in real-time, able to withstand harsh environments, potentially distributed across multiple systems, small in size and weight, and concerned with power management. It outlines operational quality attributes like response, throughput, reliability and maintainability. Non-operational attributes include testability, evolvability, portability and cost considerations. Overall the document provides a comprehensive overview of what defines embedded systems and the important factors that determine their quality.
AWS Community Day - Vitaliy Shtym - Pragmatic Container SecurityAWS Chicago
Vitaliy Shtym from Trend Micro discusses pragmatic container security. He outlines six key areas to focus on: (1) the container host, (2) the network, (3) the management stack, (4) the build pipeline, (5) the application foundation, and (6) the application. Specific security best practices are provided for securing containers within each of these areas, such as hardening the container host operating system, using intrusion prevention controls, and scanning container images for vulnerabilities before deployment. The goal is to implement defense in depth across the entire container environment.
This document discusses trends in embedded systems. It outlines that embedded systems integrate computer hardware and software onto a single microprocessor board. Key trends in embedded systems include systems-on-a-chip (SoC), wireless technology, multi-core processors, support for multiple languages, improved user interfaces, use of open source technologies, interoperability, automation, enhanced security, and reduced power consumption. SoCs integrate all system components onto a single chip to reduce power usage. Wireless connectivity and multi-core processors improve performance. Embedded systems also support multiple languages and have improved user interfaces.
The document provides an overview of operating systems with three main points:
1) An operating system acts as an interface between hardware and users, allowing for convenient and efficient usage of resources through parallel processing and protection of information.
2) Operating systems include components like scheduling, memory management, and protection of resources from malicious users through privileged modes.
3) A key function of operating systems is managing the storage hierarchy through caching to make the best use of fast but expensive memory and slower cheaper storage.
The document provides an overview of operating systems with three main points:
1) An operating system acts as an interface between hardware and users, allowing for convenient and efficient usage of resources through multitasking, memory protection, and allocating resources to each user.
2) Operating systems include components like scheduling, handling concurrent CPU execution and I/O, memory management, and deadlock protection. They also have characteristics like time-sharing, multiprocessing, and supporting real-time systems.
3) Operating systems manage hardware resources and a storage hierarchy with caching. They provide protection between users and the system through privileged modes, controlled access to I/O, memory, and CPU time allocation.
This document provides an in-depth study of embedded operating systems. It discusses typical requirements, constraints, and applications of embedded systems. Embedded systems range from devices like watches and MP3 players to large industrial systems. Key characteristics of embedded systems include being application-specific, having real-time performance constraints, limited hardware resources, and high reliability requirements. The document outlines common industrial requirements for embedded systems like availability, reliability, safety, and security. It also discusses system limitations such as size, weight, power consumption, operating environment, lifetime, and cost constraints that embedded operating systems must address.
a comprehensive slide on Embedded System.pptxMohanAhmed3
An embedded system is a microprocessor-based computer hardware system with software that is designed to perform a dedicated function, either as an independent system or as a part of a large system. At the core is an integrated circuit designed to carry out computation for real-time operations.
Similar to CS 5032 L8 dependability engineering 2 2013 (20)
The document discusses ultra large scale systems (ULSS) and introduces key points from an SEI report on ULSS. It defines ULSS as interconnected webs of software, people, policies and economics at an internet scale. The scale of ULSS undermines traditional software engineering approaches. Some challenges in developing ULSS include design/evolution, orchestration, monitoring, organizational integration, and regulation/control. New interdisciplinary research is needed to address issues arising from increased system scale.
The document discusses responsibility modeling for socio-technical systems. It describes how responsibility models can be used to identify vulnerabilities in systems by clarifying agent responsibilities and information needs. The document presents examples of responsibility models for emergency response coordination and uses HAZOP analysis to identify potential failures based on deviations to information resources. The goal of responsibility modeling is to improve system dependability by facilitating analysis of social and organizational factors.
The document discusses approaches to system failure and recovery in large, complex information systems. It argues that a conventional view of failure as deviation from specifications is not adequate, as failures are subjective judgments that depend on context. Failures are also inevitable due to system complexity and stakeholder conflicts. Instead of seeking to avoid all failures, systems should be designed to support recognition of problems when they occur and enable recovery actions. Guidelines for designing systems to enhance recovery include supporting local knowledge, flexible reconfiguration of processes when needed, and redundancy through data replication and alternative records. The goal is to design systems that make recovery from inevitable failures easier rather than focusing solely on preventing failures.
This document discusses the challenges of engineering large, socio-technical systems where there are significant unknown factors (LSCITS). It argues that the traditional reductionist approach to engineering breaks down for these systems. LSCITS engineering must account for scale, uncertainty, and the entanglement of social and technical aspects. Some key challenges identified include managing large scale systems, dealing with uncertainty, reasoning about complex systems, integrating different systems, and developing standards for LSCITS. The document advocates an approach that tempers reductionism with pragmatism and recognition of real-world constraints like imperfect people and organizations.
This document discusses the realities of requirements engineering processes compared to formal textbook models. It notes that real RE processes are often ad hoc and vary significantly based on factors like the type of system, customer, developer culture, and deployment environment. Formal processes do not account for human, social, and political factors that influence requirements. The document emphasizes that stakeholders, customers, and environments are diverse and changing, so requirements engineering must adapt to these realities rather than following rigid predefined processes.
This document discusses dependability requirements engineering for safety-critical information systems. It introduces the concepts of system dependability, dependability requirement types, and the need for integrated requirements engineering. Dependability requirements should be considered alongside other business requirements. The document uses an exemplar healthcare records management system to illustrate how concerns can help identify safety requirements and requirements conflicts. Concerns reflect organizational goals and help bridge the gap between goals and system requirements.
The document discusses the conceptual design process for a large-scale complex IT system (LSCITS) for road pricing. It covers the key activities in conceptual design including concept formulation, problem understanding, requirements engineering, feasibility studies, and architectural design. Specifically, it provides examples for a proposed road pricing system, describing the problem it aims to solve, potential high-level requirements and constraints, different design options considered in the feasibility study, and examples of technologies that could enable each option.
This document provides an introduction to large-scale complex IT systems (LSCITS). It defines LSCITS and distinguishes them from other types of systems. Key points made include:
- An LSCITS is a large-scale IT system where there are significant unknown factors in its development, use, and operating environments that introduce complexity.
- LSCITS are often part of larger socio-technical systems and there are complex relationships between the technical and social aspects.
- Examples given of LSCITS include digital music systems and national identity management systems.
In November 1988, a student at Cornell University deliberately released a program called the Internet Worm that exploited security vulnerabilities in Unix systems and spread across computers connected to the Internet. The worm did not cause direct damage but its replication overloaded systems and severely disrupted network services. It took system administrators several days to devise and implement modifications to vulnerable programs to stop the spread of the worm and clear infected machines. This was the first widely distributed Internet security threat and highlighted the importance of securing systems and patching vulnerabilities.
Discusses sociotechnical issues that arose in the design of a national digital learning system intended for use by more than a million students and their teachers
The Ariane 5 rocket failed on its maiden flight 37 seconds after launch due to a software failure. The failure occurred when an attempt to convert a 64-bit number to a 16-bit integer caused an overflow, crashing the inertial reference system and backup system. The failed software was reused from the Ariane 4 without being properly tested or reviewed for the new rocket configuration. This led to an avoidable launch failure 37 seconds into the flight.
Critical infrastructure refers to the essential systems and services in modern societies that support public services, the economy, and national security. These include systems for power, water, transportation, communications, health care, finance, and emergency services. Modern critical infrastructure is controlled and managed by interconnected computer systems, so these digital systems are also considered critical infrastructure. Critical infrastructure is characterized as being large in scale, complex with many interdependencies, reliant on standards, and long-lasting. The availability of critical infrastructure is essential for society, and the unavailability of critical systems could have significant human, economic, and social consequences.
1. Dependable systems architectures
Lecture 2
Dependable systems architectures, 2013 Slide 1
2. System fault tolerance
• Fault tolerance means that the system can continue in operation in spite of software faults, i.e. a fault does not lead to a system failure.
• Fault tolerance is required where there are high
availability requirements, no ‘fail safe’ state or where
system failure costs are very high.
• Even if the system has been proved to conform to its
specification, it must also be fault tolerant as there
may be specification errors or the validation may be
incorrect.
3. Dependable system architectures
• Dependable systems architectures are used in
situations where fault tolerance is essential. These
architectures are generally all based on redundancy
and diversity.
• Examples of situations where dependable
architectures are used:
– Flight control systems, where system failure could threaten
the safety of passengers
– Reactor systems where failure of a control system could lead
to a chemical or nuclear emergency
– Telecommunication systems, where there is a need for 24/7
availability.
4. Protection systems
• A specialized system that is associated with some
other control system, which can take emergency
action if a failure occurs.
– System to stop a train if it passes a red light
– System to shut down a reactor if temperature/pressure are
too high
• Protection systems independently monitor the
controlled system and the environment.
• If a problem is detected, the protection system issues commands to take
emergency action, shutting down the controlled system to avoid a
catastrophe.
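A minimal sketch of the monitoring loop of a protection system. The function names, sensor interface, and safety limits are illustrative assumptions, not from the lecture; the point is that the protection system reads its own sensors, independently of the control software, and forces an emergency shutdown when safe operating limits are exceeded.

```python
# Hypothetical safe operating limits (illustrative values only)
MAX_TEMPERATURE = 350.0   # degrees C
MAX_PRESSURE = 15.0       # bar

def protection_check(sensors, shutdown):
    """Monitor one cycle: trigger emergency action if limits are exceeded."""
    temperature = sensors.read_temperature()
    pressure = sensors.read_pressure()
    if temperature > MAX_TEMPERATURE or pressure > MAX_PRESSURE:
        shutdown()      # emergency action: shut the controlled system down
        return False    # a problem was detected
    return True         # system is within safe limits
```

In a real protection system this check would run on separate hardware with its own sensors, so that a failure of the main control system cannot disable the check itself.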
5. Sizewell B reactor
• Software controlled
protection system in
Sizewell B reactor
• Go-live in 1993
• The software protection
system has a claimed
probability of failure on
demand (PFD) of 1 in
10,000 (10⁻⁴)
7. Protection system functionality
• Protection systems are redundant because they
include monitoring and control capabilities that
replicate those in the control software.
• Protection systems should be diverse and use
different technology from the control software.
• They are simpler than the control system so more
effort can be expended in validation and
dependability assurance.
• Aim is to ensure that there is a low probability of
failure on demand for the protection system.
8. Self-monitoring architectures
• Multi-channel architectures where the system
monitors its own operations and takes action if
inconsistencies are detected.
• The same computation is carried out on each channel
and the results are compared. If the results are
identical and are produced at the same time, then it is
assumed that the system is operating correctly.
• If the results are different, then a failure is assumed
and a failure exception is raised.
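The channel comparison described above can be sketched as follows. This is an illustrative fragment with assumed names (`ChannelDisagreement`, `self_monitoring_compute`): the same computation is run on both channels and a failure exception is raised if the results differ, rather than a possibly wrong answer being used.

```python
class ChannelDisagreement(Exception):
    """Raised when the channels produce inconsistent results."""

def self_monitoring_compute(channel_a, channel_b, value):
    """Run the computation on both channels and compare the results."""
    result_a = channel_a(value)
    result_b = channel_b(value)
    if result_a != result_b:
        # A failure is assumed; raise a failure exception
        raise ChannelDisagreement(f"{result_a!r} != {result_b!r}")
    return result_a
```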
10. Self-monitoring systems
• Hardware in each channel has to be diverse so that a
common-mode hardware failure cannot lead to every
channel producing the same erroneous results.
• Software in each channel must also be diverse,
otherwise the same software error would affect each
channel.
• If high availability is required, you may use several
self-checking systems in parallel.
– This is the approach used in the Airbus family of aircraft for
their flight control systems.
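The idea of running self-checking systems in parallel can be sketched as below. This is an illustrative simplification (the function and names are assumptions): each pair checks itself by comparing its two channels; a pair whose channels disagree is assumed to have failed and drops out, and the next available pair takes over.

```python
def parallel_self_checking(pairs, value):
    """Return the output of the first pair whose two channels agree."""
    for channel_a, channel_b in pairs:
        result_a = channel_a(value)
        result_b = channel_b(value)
        if result_a == result_b:
            return result_a      # this pair agrees: trust its output
        # disagreement: treat this pair as failed and try the next one
    raise RuntimeError("all self-checking pairs have failed")
```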
11. Airbus 340
• Airbus were the first
commercial aircraft
manufacturer to use ‘fly
by wire’ flight control
systems
• Fly-by-wire systems are
lighter than mechanical
controls (making the
aircraft more fuel
efficient) and can detect
and prevent potentially
dangerous pilot actions.
12. Airbus flight control system architecture
13. Airbus architecture discussion
• The Airbus FCS has 5 separate computers, any one
of which can run the control software.
• Extensive use has been made of diversity
– Primary and secondary systems use different processors.
– Primary and secondary systems use chipsets from different
manufacturers.
– Software in secondary systems is less complex than in
primary system – provides only critical functionality.
– Software in each channel is developed in different
programming languages by different teams.
– Different programming languages used in primary and
secondary systems.
14. Hardware fault tolerance
• Depends on triple-modular
redundancy (TMR).
• Three identical components that
receive the same input and whose
outputs are compared.
• If one output is different, it is
ignored and component failure is
assumed.
• Based on the assumption that most faults result from
component failures rather than design faults, and that
there is a low probability of simultaneous component
failure.
16. N-version programming
• Multiple versions of a software
system carry out computations at
the same time. There should be an
odd number of computers
involved, typically 3.
• The results are compared using a
voting system and the majority
result is taken to be the correct
result.
• Approach derived from the notion
of triple-modular redundancy, as
used in hardware systems.
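A toy sketch of N-version voting, with assumed names throughout. The three "versions" here are deliberately different implementations of one simple specification (the mean of a list); in real N-version programming each version would be developed independently by a separate team.

```python
from collections import Counter
from functools import reduce

def mean_v1(xs):
    return sum(xs) / len(xs)                   # built-in sum

def mean_v2(xs):
    total = 0.0
    for x in xs:                               # explicit accumulation loop
        total += x
    return total / len(xs)

def mean_v3(xs):
    return reduce(lambda acc, x: acc + x, xs, 0.0) / len(xs)   # fold

def n_version_run(versions, xs):
    """Run every version on the same input and take the majority result."""
    results = [version(xs) for version in versions]
    value, votes = Counter(results).most_common(1)[0]
    if votes <= len(versions) // 2:
        raise RuntimeError("no majority among the versions")
    return value
```

Note that the voter itself is now a single point of failure, which is why in practice it is kept as simple as possible.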
18. N-version programming
• The different system versions
are designed and implemented
by different teams. It is
assumed that there is a low
probability that they will make
the same mistakes. The
algorithms used should be
different, but in practice they
may not be.
• There is some empirical
evidence that teams commonly
misinterpret specifications in the
same way and choose the same
algorithms in their systems.
19. Software diversity
• Approaches to software fault tolerance embedded in
the system architecture depend on software diversity
where it is assumed that different implementations of
the same software specification will fail in different
ways.
• It is assumed that implementations are (a)
independent and (b) do not include common errors.
• Strategies to achieve diversity
– Different programming languages
– Different design methods and tools
– Explicit specification of different algorithms
20. Problems with design diversity
• Teams are not culturally diverse so they tend to
tackle problems in the same way
• Characteristic errors
– Different teams make the same mistakes. Some parts of an
implementation are more difficult than others so all teams
tend to make mistakes in the same place;
• Specification errors;
– If there is an error in the specification then this is reflected in
all implementations;
– This can be addressed to some extent by using multiple
specification representations.
21. Specification dependency
• Both approaches to software redundancy are
susceptible to specification errors. If the specification
is incorrect, the system could fail
• This is also a problem with hardware but software
specifications are usually more complex than
hardware specifications and harder to validate.
• This has been addressed in some cases by
developing separate software specifications from the
same user specification.
22. Improvements in practice
• In principle, if diversity and independence can be
achieved, multi-version programming leads to very
significant improvements in reliability and availability.
• In practice, observed improvements are much less
significant, but the approach seems to lead to reliability
improvements of between 5 and 9 times.
• Are such improvements worth the considerable extra
development costs of multi-version programming?
• Is it cheaper to increase the reliability of a single
version using fault avoidance and fault detection
techniques?
23. Key points
• Dependable system architectures are system
architectures that are designed for fault tolerance.
• Architectural styles that support fault tolerance
include protection systems, self-monitoring
architectures and N-version programming.
• Software diversity is difficult to achieve because it is
practically impossible to ensure that each version of
the software is truly independent.