Good afternoon. My name is Ron Stroup, from the Office of Information Services, Process Engineering Division, AIO-200. My specific responsibilities include the development and implementation of software safety and certification processes and standards within the FAA, and the harmonization of those standards within the international aviation community. The focus of my presentation today is the issues the FAA safety and certification communities are currently working on, as documented in the 1997 GAO report.
The National Airspace System (NAS) is not defined by a single component or system; rather, it is a complex collection of systems, procedures, facilities, aircraft, and of course, at the base of it all, making it work, people. The NAS represents the overall environment for the safe operation of aircraft. The FAA has responsibility for civil aviation safety. The FAA’s mission is to ensure the safety, security, and system efficiency of the National Airspace System. Ever-increasing system complexity, interdependencies, and dependence on software-intensive mission-critical and safety-critical systems have heightened our sensitivity to ensuring end-to-end system safety. The NAS is a highly technical system and includes some 36,000 pieces of equipment operating in hundreds of locations throughout the United States. At present there are approximately 45 million flights operating throughout the United States per year. The system provides communications, navigation, surveillance, display, flight planning, and weather data to controllers, traffic managers, and pilots. The FAA has recognized the need for, and taken a proactive approach to, ensuring that software safety engineering is applied effectively and consistently throughout the National Airspace System.
What were our concerns? I talked around risk management on my previous slide; now I would like to identify the concern more directly. Previously, our risk management often resulted in unsatisfied stakeholders, poor performance, and/or cost and schedule overruns. Each program was treated as an island, with system interdependence issues and safety issues being discovered, resolved, or, worst case, ignored until formal systems testing or operational evaluation in the field. Obviously, discovering issues at the culmination of the design and development efforts greatly contributed to stakeholder dissatisfaction, poor system performance, and cost and schedule overruns, and ultimately affected the targeted safety of the system. As I’m sure everyone present recognizes, the most effective and successful safety programs design in safety up front, because backfitting safety design features usually falls short of the desired results. Changing the culture to one of early detection and reduction of risk was required. We also discovered that we were using immature software acquisition processes, which honestly were ad hoc and chaotic. Software is the most costly and complex component of an air traffic system, and we had no standardized means of evaluating and improving our processes. I shall discuss our improvement initiatives later on in the presentation.
How are we improving? This slide identifies a number of initiatives that the FAA has undertaken to address these concerns. Order 8040.4 establishes the safety risk management policy within the FAA. To comply with Order 8040.4, the FAA launched a Software Safety and Certification program to improve certification and approval practices for the software aspects of CNS/ATM ground-based and airborne systems. The Systems Engineering Council was also launched to develop common systems engineering activities across the National Airspace System. The Software Engineering Body of Knowledge provides a systematic, concise, and complete description of the software engineering discipline (methodologies, sources, anticipated use, etc.). This is supplemented by the Software Engineering Curriculum Framework for determining, assessing, and improving software engineering competencies. The FAA-iCMM is a model that describes the essential elements of an organization’s processes that must exist to ensure good acquisition of software-intensive systems. The model combines the features of the software acquisition, software, and systems engineering CMMs.
Order 8040.4, Safety Risk Management, was signed by the Administrator in June 1998. This Order formalizes the safety risk policy for all high-consequence decisions. A high-consequence decision is defined as one that either creates or could be reasonably estimated to result in a statistical increase or decrease in personal injuries and/or loss of life and health, a change in property value, loss of or damage to property, costs or savings, or other economic impacts valued at $100 million or more per annum. A Safety Risk Management Committee (SRMC) was formed to assist the various FAA organizations in developing comprehensive and effective plans for the management of safety risk. The SRMC meets periodically to exchange risk management ideas and information and, upon request, provides advice and counsel to the Office of System Safety and other management officials.
A System Engineering Council was established to assist in the consistent and efficient application of systems engineering throughout the various NAS system components. The council has four primary functions: (1) systems engineering leadership; (2) development of processes and tools using government and industry standards; (3) facilitation of problem definition and resolution; and (4) advocacy for resources to accomplish systems engineering. Products currently being developed by this council are the System Engineering Management Plan (SEMP) and the System Engineering Manual (SEM). The SEMP provides an organizational focus to discuss roles and responsibilities for systems engineering as a process and discipline applied across the FAA. The SEM identifies the technical and programmatic activities and products as the program moves from the initial idea through disposal and elimination of the system.
The System Safety Working Group (SSWG) is an advisory body of FAA system safety professionals. The near-term purpose is to establish guidance for conducting safety risk management processes in accordance with Order 8040.4. The long-term purpose is to control and implement these processes. Products currently being developed by the SSWG are the System Safety Management Plan (SSMP) and the System Safety Handbook (SSH). The SSMP establishes and defines the FAA plan for ensuring that system safety is effectively integrated into NAS modernization in accordance with FAA orders and AMS policy. The SSH provides instructions on how to perform system safety engineering and management (best practices).
The AMS phases are as follows. Mission Analysis enables the Joint Resource Council (JRC) to determine and prioritize its most critical capability shortfalls and best technology opportunities for improving the FAA’s overall safety, security, capacity, efficiency, and effectiveness in providing services to its customers. Investment Analysis defines the functional and performance strategy to satisfy the agency’s mission needs and baselines the best overall solution for satisfying critical capability shortfalls. Solution Implementation begins after the JRC selects a solution and ensures that products are shown to meet user requirements, be operationally suitable, and be compatible with other operational systems prior to an in-service decision. In-Service Management establishes a framework for evolutionary product development and for identifying operational problems early enough to upgrade or replace products prior to their obsolescence. System safety management shall be conducted and documented throughout the Acquisition Management System.
This slide shows the various safety analyses and activities to be accomplished through a combined effort by the System Safety Working Group and the Integrated Product Team throughout the systems acquisition life cycle. Before Order 8040.4 was implemented, the safety analyses and activities were not accomplished until the Solution Implementation phase. There was also inconsistency among the Integrated Product Teams as to the analyses and activities to be performed. This resulted in programs busting their cost and schedule baselines. Today each line of business involved in acquisition management must institute a system safety management process that includes, at a minimum: hazard identification, hazard classification, measures to mitigate the hazards to an acceptable level, verification that mitigation measures are incorporated into product design, and assessment of residual risk. We are also establishing a NAS-wide Hazard Tracking and Risk Resolution database to ensure a closed-loop process of managing safety hazards and risks.
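As a rough illustration of the closed-loop idea, the minimum process elements listed above could be tracked per hazard along the following lines. This is only a sketch: the class names, fields, and severity scale are hypothetical and are not taken from Order 8040.4 or from the actual Hazard Tracking and Risk Resolution database.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Severity(Enum):
    # Hypothetical classification scale; the talk does not define one.
    MINOR = 1
    MAJOR = 2
    HAZARDOUS = 3
    CATASTROPHIC = 4


@dataclass
class Mitigation:
    description: str
    # Set to True once there is evidence the measure is in the product design.
    verified_in_design: bool = False


@dataclass
class Hazard:
    identifier: str                 # hazard identification
    description: str
    severity: Severity              # hazard classification
    mitigations: List[Mitigation] = field(default_factory=list)
    residual_risk_assessed: bool = False

    def open_items(self) -> List[str]:
        """Return the process steps that are still outstanding for this hazard."""
        items = []
        if not self.mitigations:
            items.append("no mitigation defined")
        for m in self.mitigations:
            if not m.verified_in_design:
                items.append(f"unverified mitigation: {m.description}")
        if not self.residual_risk_assessed:
            items.append("residual risk not assessed")
        return items

    def is_closed(self) -> bool:
        """A hazard is closed only when every step of the loop is complete."""
        return not self.open_items()


if __name__ == "__main__":
    hazard = Hazard("HZ-001", "Example hazard (illustrative only)", Severity.HAZARDOUS)
    hazard.mitigations.append(Mitigation("Example mitigation measure"))
    for item in hazard.open_items():
        print(item)
```

The point of the sketch is that a hazard record cannot be marked closed until each mitigation has been verified in the design and the residual risk has been assessed, which is what makes the process closed-loop rather than a one-time analysis.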
As I stated earlier, software safety engineering cannot be performed effectively outside the boundaries of the total system engineering effort. As I discuss the specific components, it must be clear that there is interaction with the systems engineering effort even though it may not be clearly identified. The structure of our software quality model is based on three elements: strategic (FAA-iCMM); enablers and tools (IEEE 12207, DO-178B); and tailored practices (FAA-STD-026, Software Assurance Guidelines).
This slide provides a graphical view of the software quality triangle. FAA-iCMM elements include the following process categories: Engineering (requirements, software development, system test), Project (project management, risk management, contracts management), Supporting (QA, CM, measurement), and Organization (implementation, training). FAA-STD-026 establishes the requirements for software development associated with NAS acquisitions. Formerly this standard required MIL-STD-498; it now requires IEEE 12207, and we are developing an implementation document to align it with the FAA-iCMM, AMS, and software assurance guidelines. Software Development Assurance provides a level of confidence that the software in safety-critical systems is consistent with other components of the NAS and will meet the safety requirements of the system. I will concentrate the remainder of my presentation on the use of software assurance as a vehicle for achieving the desired targeted level of safety and security integrity within the NAS.
As systems become more complex and software-intensive, establishing and maintaining acceptable safety and security integrity level requirements has become increasingly difficult. Software safety and security integrity level requirements are satisfied by applying rigorous design analysis to the system. This analysis includes, but is not limited to, requirements validation and verification, requirements-based testing, system testing, and structural coverage analysis. Other communities may discuss safety and security separately; however, the FAA, given the NAS infrastructure, must consider that an overt security breach could result in a mishap.
As you can see on this slide, the qualification processes for safety and security are similar. You analyze the vulnerabilities, develop mitigating requirements, and verify their effectiveness.
This model shows the various activities and their relationship to the system development process, system safety processes, and system security processes within the Acquisition Management System. Our goal is to have complete, well-defined requirements by the completion of the investment analysis phase to establish the proper baseline and reduce the risk, cost, and schedule of programs. We are attempting to take more of a systems approach to safety and security. In the past, we were uncovering deficiencies late in the design, and risk assessments were too narrowly focused. We have found that risks that are acceptable in independent systems could contribute to a mishap when those systems are fully integrated within the NAS. We have now refocused our safety/security programs to evaluate the NAS as a whole, to ensure total end-to-end system safety and security. Our goal over the next year will be to evaluate and refine this model and to identify those activities, products, and assurance points that are necessary to assure the design of a safe and effective system.
The FAA continues to refine its systems and software engineering processes. We are focusing on the technical and programmatic efficiencies that can be achieved by integrating safety and security into the system life cycle processes. I would like to thank Dr. Leveson for the opportunity to discuss our issues before this prestigious body.
An Approach to the Software Aspects of Safety Management. Ron Stroup, Software Safety and Certification Lead, FAA, Office of Information Services, Process Engineering Division, AIO-200. PH. (202) 493-4390, Ronald.L. [email_address], www.faa.gov/aio
FAA Experience (1/2) <ul><li>What were our concerns? </li></ul><ul><ul><li>Ineffective Risk Management. </li></ul></ul><ul><ul><li>Immature software acquisition processes. </li></ul></ul><ul><ul><li>GAO Report - Air Traffic Control: Immature Software Acquisition Processes Increase FAA’s System Acquisition Risks. AIMD-97-47, March 1997 </li></ul></ul>
FAA Experience (2/2) <ul><li>How are we improving? </li></ul><ul><ul><li>Ineffective Risk Management </li></ul></ul><ul><ul><ul><li>Develop safety risk management policy. </li></ul></ul></ul><ul><ul><ul><li>(FAA Order 8040.4 Safety Risk Management) </li></ul></ul></ul><ul><ul><li>(Software Safety and Certification Initiative) </li></ul></ul><ul><ul><ul><li>Improve knowledge of systems engineering. </li></ul></ul></ul><ul><ul><ul><li>(Systems Engineering Council) </li></ul></ul></ul><ul><ul><li>Immature software acquisition processes. </li></ul></ul><ul><ul><ul><li>Improve knowledge of software engineering. </li></ul></ul></ul><ul><ul><ul><li>(Software Engineering Body of Knowledge) </li></ul></ul></ul><ul><ul><ul><li>Develop software policy, practices, and technologies. </li></ul></ul></ul><ul><ul><ul><li>(FAA integrated Capability Maturity Model) </li></ul></ul></ul>
Order 8040.4 Safety Risk Management <ul><li>Purpose </li></ul><ul><ul><li>Establishes safety risk management policy </li></ul></ul><ul><ul><ul><li>Formalized process for all high-consequence decisions. </li></ul></ul></ul><ul><ul><li>Prescribes procedures for implementing safety risk management as a decision-making tool </li></ul></ul><ul><ul><ul><li>Plan, Identify, Analyze, Assess, Decide </li></ul></ul></ul><ul><ul><li>Establishes Safety Risk Management Committee </li></ul></ul><ul><ul><ul><li>Provides advice and counsel to the organizations </li></ul></ul></ul><ul><li>Safety Risk Management Committee </li></ul><ul><ul><li>Provides supplemental support to assist in the overall risk analysis capability and efficiency of key FAA organizations </li></ul></ul><ul><ul><li>Maintains a risk management resource directory </li></ul></ul><ul><ul><ul><li>Risk methodologies employed </li></ul></ul></ul><ul><ul><ul><li>Resource assistance </li></ul></ul></ul><ul><ul><li>Identifies suitable risk analysis tools and training </li></ul></ul><ul><li>FORMALIZE A COMMON SENSE APPROACH </li></ul>
System Engineering Council <ul><li>Purpose </li></ul><ul><ul><li>Orchestrates common systems engineering activities across the NAS </li></ul></ul><ul><ul><li>Responsibility, authority, and accountability for the development, documentation, deployment, control, and monitoring of the systems engineering process. </li></ul></ul><ul><li>Products </li></ul><ul><ul><li>System Engineering Management Plan </li></ul></ul><ul><ul><li>System Engineering Manual </li></ul></ul>
System Safety Working Group <ul><li>Purpose </li></ul><ul><ul><li>Working arm of the System Engineering Council </li></ul></ul><ul><ul><li>Assists in supporting and evaluating Comparative and Operational Safety Assessments </li></ul></ul><ul><li>Products </li></ul><ul><ul><li>System Safety Management Plan </li></ul></ul><ul><ul><li>System Safety Handbook </li></ul></ul>
Acquisition Management System <ul><li>The FAA’s Acquisition Management System (AMS)/Life-cycle Management System (LMS) consists of: </li></ul><ul><ul><li>Mission Needs </li></ul></ul><ul><ul><li>Investment Analysis </li></ul></ul><ul><ul><li>Solution Implementation </li></ul></ul><ul><ul><li>In-Service Management </li></ul></ul><ul><ul><li>Service-life Extension </li></ul></ul>
[Slide diagram: system safety process across the AMS life cycle. Phases: Mission Needs (Concept of Operation), Investment Analysis (Option 1, Option 2, Option 3; Option Selection; decision points JRC1 and JRC2), Solution Implementation (System Safety Program), In-Service Management (Operations and Maintenance), Service-life Extension (Upgrade or Retire). Safety products shown: OSA, NAS SSMP, PHA, CRA, SSPP, SHA/SSHA, ISD, SSAR, HTRR, CRA. Overarching activity: NAS System Safety Management (Hazard Tracking).]
FAA CNS/ATM Software <ul><li>FAA-iCMM </li></ul><ul><li>Software development </li></ul><ul><li>Software assurance </li></ul><ul><li>Implement and integrate software engineering processes into systems engineering. </li></ul>
Software Quality Triangle: QUALITY SW FOR NAS SYSTEMS. FAA-iCMM: establishes essential elements of an organization’s software acquisition, engineering, and management process. Software Assurance Guidance: establishes a level of confidence for software that is consistent with its environment. FAA-STD-026 (IEEE 12207): establishes a process and documentation guidance for software development.
Software Assurance <ul><li>What do we want to achieve? </li></ul><ul><li>Identify the objectives necessary, throughout the life cycle process, to provide confidence that a product and process satisfy given safety and security integrity level requirements. ICAO has established a targeted Global Risk Factor of extremely remote, or 10⁻⁷. </li></ul>
[Slide diagram: Preliminary Safety/Security Model. The System Development Process, System Security Process, and System Safety Process are shown with Assurance Milestones across the AMS phases: Mission Needs/Investment Analysis, Solution Implementation, In-Service Management, Service Life Extension. Diagram labels: Protection Profiles; Threat Analysis; Preliminary Vulnerability Assessment; Safety Requirements; Operational Safety Assessment; Preliminary Hazard Analysis; Continued Analysis; Requirements Specification; Requirements Analysis; System Specification; Procedures; HW Spec.; SW Spec.; SW Design; SW Code; SW Integration; Security Requirements; Certification; Refined Vulnerability Assessment; System Integration & Test; In-Service Decision; Monitor Vulnerability; Sustainment & Retirement; Hazard Tracking & Monitor Residual Risk; System/SubSystem Hazard Analysis; Operating & Support Hazard Analysis; Security Target.]
Summary <ul><li>The FAA continues to refine its systems and software engineering processes </li></ul><ul><li>We are focusing on the technical and programmatic efficiencies that can be achieved by integrating safety and security into the system life cycle processes. </li></ul><ul><li>The FAA is present to gain knowledge and understanding from other industries on their approach to mitigating safety issues. </li></ul>
Acronyms (1/2) <ul><li>AIO Office of Information Services </li></ul><ul><li>AMS Acquisition Management System </li></ul><ul><li>ATM Air Traffic Management </li></ul><ul><li>CNS Communications, Navigation and Surveillance </li></ul><ul><li>CRA Comparative Risk Analysis </li></ul><ul><li>FAA Federal Aviation Administration </li></ul><ul><li>FMEA Failure Modes and Effects Analysis </li></ul><ul><li>HTRR Hazard Tracking and Risk Resolution </li></ul><ul><li>ICAO International Civil Aviation Organization </li></ul><ul><li>iCMM Integrated Capability Maturity Model </li></ul><ul><li>ISD In-Service Decision </li></ul><ul><li>JRC Joint Resource Council </li></ul>
Acronyms (2/2) <ul><li>LMS Life-cycle Management System </li></ul><ul><li>NAS National Airspace System </li></ul><ul><li>OSA Operational Safety Assessment </li></ul><ul><li>PHA Preliminary Hazard Analysis </li></ul><ul><li>SEMP System Engineering Management Plan </li></ul><ul><li>SEM System Engineering Manual </li></ul><ul><li>SHA System Hazard Analysis </li></ul><ul><li>SSH System Safety Handbook </li></ul><ul><li>SSHA SubSystem Hazard Analysis </li></ul><ul><li>SSMP System Safety Management Plan </li></ul><ul><li>SSAR System Safety Assessment Report </li></ul>