The document provides an overview of a safety management system (SMS) used by the Federal Aviation Administration (FAA). It describes the four components of an SMS - safety policy, safety risk management, safety assurance, and safety promotion. It focuses on the safety risk management component, outlining the processes of system description, hazard identification, risk analysis, risk assessment, and risk control. It provides examples of how these processes would be applied to identifying and mitigating risks associated with aircraft deicing activities.
2. Definitions
• Safety management systems provide a
systematic way to control risk and to
provide assurance that those risk controls
are effective.
• Safety Risk Management performs the
process of controlling risk through hazard
identification, analysis, risk assessment,
and development of risk controls.
Federal Aviation
Administration
SL-2
3. Levels of Risk Management
• Process Risk Management
– Policy (What)
– Procedure (How)
– Controls
• Operational Risk Management
– Operational Control (Flight/Task/Mission)
– Crew/Team (Real time decision making)
4. System Description
SRM process flow: System Description → Hazard Identification → Risk Analysis → Risk Assessment → Risk Control
What is System Description & Task Analysis?
• It is a system design function.
• It is a predictive method of hazard identification.
• It is the foundation for sound safety analysis.
When is it used?
• Used during implementation phases of SMS.
• Used in the development of operational processes.
• Used when a new hazard is identified.
• Used in conjunction with all operational changes.
Who uses System Description & Task Analysis?
• Personnel within the organization who form an appropriately diverse team:
– Stakeholders
– Subject Matter Experts
ICAO Doc. 9859
10. Conditions (Hazards) Related to Human Error
• Time pressure
• Poor procedures and documentation
• Teamwork (Too much, too little)
• Shift turnovers/crew briefings
• Group norms (Values, culture)
• Fatigue management (shifts/circadian
problems)
Alan Hobbs, ATSB (2008)
11. Conditions (Hazards) Related to Human Error (Cont.)
• Lack of System Knowledge
• Poor, worn-out, or missing equipment/facilities
• Human-machine interface (e.g. design for
maintainability)
12. Activities and Conditions: Deicing
Activities/Tasks (things people do):
• Prepare truck / equipment
• Verify type of fluid
• Fluid temp / concentration
• Position at aircraft
• Communicate with crew
• Apply fluid
• Communicate with crew
• Depart ramp area
Workplace Conditions (system and environment):
• Day/Night
• Weather – precip / cold
• Protective clothing
• Equipment condition
• Shift change
• Employee demographics
13. Hazard Identification
A hazard is any real or potential condition that can result in injury, illness, or death to people; damage to, or loss of, a system (hardware or software), equipment, or property; and/or damage to the operating environment.
ICAO Doc. 9859
14. SRM: Hazard Identification from Workplace Conditions
[Diagram: Processes and Activities, evaluated against Workplace Conditions (System Factors & Attributes)]
Workplace Conditions, e.g.:
• Equipment
• Information (Procedures)
• Facilities
• Phys. Envir.
• Other Proc. (Interfaces)
• Training
• Supv./Mgmt. (Controls)
• ….
Deficient conditions impacting activities = Hazards
Inference: variable performance, causing… active failures, resulting in… consequences
15. Risk Analysis
Important to distinguish between:
Hazard – a condition
Consequence – result
Risk – likelihood & severity of the consequence
Analyzing risk involves the consideration of both the likelihood and the severity of any adverse consequences.
ICAO Doc. 9859
16. SRM: From Hazard to Risk
[Diagram: progression from hazard to risk]
Deficient conditions impacting activities = Hazards
Variable performance, causing… active failures, resulting in… consequences
Likelihood and severity of the consequences, weighed with judgment = Risk
17. Failures and Consequences
Active failures (direct results of human activity):
• Incorrect fluid type
• Wrong fluid concentration
• Fluid sprayed into pitot-static ports
• Incomplete deicing
• Hold-over time too long
Potential Consequences (accident/incident severity):
• Aircraft required to be deiced again
• Take-off delay
• Maintenance action required
• Take-off accident due to ice
18. Risk Assessment
Risk assessment determines the level of risk to use in
making a bottom line decision.
Risk matrix (rows: risk likelihood; columns: risk severity)

Risk Likelihood               Catastrophic  Hazardous  Major  Minor  Negligible
                              A             B          C      D      E
Frequent              5       5A            5B         5C     5D     5E
Occasional            4       4A            4B         4C     4D     4E
Remote                3       3A            3B         3C     3D     3E
Improbable            2       2A            2B         2C     2D     2E
Extremely improbable  1       1A            1B         1C     1D     1E

A risk matrix is a tool used for risk assessment. It can vary in
form yet it accomplishes the same purpose.
19. Risk Control = Risk Mitigation
A major component of any safety system is the defenses (controls) put in place to protect people, property or the environment.
These defenses are used to reduce the likelihood or severity of the consequences associated with any given hazard or condition.
ICAO Doc. 9859
The second component of an SMS is Safety Risk Management. We’ve already seen the five major elements of SRM; now let’s see how they work in detail.
The objective of an SMS is to provide a structured management system to enable us to make decisions on controlling risk in our operations.
Once hazards are identified and their related risks analyzed, an organization can focus its resources on eliminating or mitigating those hazards that pose the greatest risk.
This is what SRM helps us to do.
As previously mentioned a Safety Management System…
Click
…controls risk and assures that the controls are working.
Click
Safety Risk Management is how we do the first part, controlling the risk.
Safety Risk Management:
Starts with the identification of hazards which are then analyzed to determine the risk to the company.
The information is assessed to determine if the risk is acceptable or not.
And finally, unacceptable risks are mitigated through implementation of risk controls.
Each organization must define its own acceptable levels of risk, and which levels of management are authorized to accept various degrees of risk.
In Safety Assurance, the whole operation is then monitored for the effectiveness of the controls and for the appearance of new hazards.
Risk management is applied at two primary levels: The first is at the Process level and the second is at the Operational level.
Click
Process Risk Management has three main areas to consider:
Policy: “what needs to be done.”
Many risk decisions should be part of company policy, the result of careful deliberation about a broad range of situations. Policies convey what employees should do in various situations. For example, many operators prohibit circling approaches in instrument weather conditions.
Procedure: “how to do it.”
Employees should be given procedures that have been developed in ways that take risk into account. The selection and sequence of items on the checklists for operations, maintenance, and cabin safety help control human error.
Controls, such as Required Inspection Items, make sure the first two get done and defend against possible errors that could occur in maintenance operations.
Click
Operational Risk Management comprises Operational Control and Crew or Team decision making:
The Operational Control aspects involve how risk management is employed on a mission-by-mission basis. Oversight by dispatchers, supervisors, and flight followers helps to assure that potentially hazardous conditions are identified and controlled during specific operations.
Crew or Team Decision Making starts with front-line personnel, such as pilots, mechanics, cabin crews and controllers. They must have tools and procedures that help them make decisions. Although the pilot-in-command is the “final authority,” several levels of risk control should still be employed.
Organizations need to pay attention to all levels of risk management.
System Description, also called system description and task analysis, is the first step in the SRM process. This step is often skipped or done hastily, yet it is one of the most valuable activities an organization can do. It allows the company to understand what it does, who does it, and in which environment it achieves company goals.
Pilot project participants have reported finding enormous value, both financially and in efficiency, from taking the necessary time to complete this step.
Click
What is System Description & Task Analysis?
It is a system design function involving the whole organization.
It is a predictive method of hazard identification: when the dots do not connect, there is probably a hazard lurking there, so design it out.
It is the foundation for sound safety analysis: if you do not know who is on the team and what position they play, how can you expect to accomplish the goal?
Click
When is it used?
It is used during the implementation phases of SMS, as you design your SMS.
And it is used in conjunction with all operational changes, again, as you design your new system, process or procedure.
Click
Who uses System Description & Task Analysis?
You need an appropriately diverse team:
Stakeholders, and
Subject Matter Experts
Remember, these analyses need only be as complex as required to design procedures, develop training, identify hazards, and prepare documentation for employees and other users. The level of detail required will vary depending on the complexity and safety criticality of the processes involved. Detail beyond this point is not necessary.
In past training or discussions, you may have discovered the importance of distinguishing facts from inferences and judgments. That importance was probably explained to you by your boss, immediately after a less-than-sterling performance.
Click
System and task analyses must concentrate on being factual, so you can accurately consider any process’s potential for causing a problem.
Click
Here we break those processes down into Activities (the things people do) and Workplace Conditions.
Why are these processes, activities and workplace conditions important?
First let’s look at workplace conditions and then at the human activities.
Here are several typical workplace conditions. These conditions exist for all processes. Unsatisfactory conditions can become hazards that exist before an accident.
Click
Understanding these factors and their effects on safety performance will help us to control them, which is the ultimate goal in managing risk.
A poorly written or incomplete procedure can be re-written, the marginal training class can be improved, the broken, missing or worn out tools or equipment can be replaced or serviced, the hangar that is poorly lighted can be upgraded. Finally, these factors, and many more workplace conditions, have been contributing or causal factors to many accidents. If an organization can manage these, they will likely be managing their risk.
This leads us to the safety system attributes.
Process Attributes (also known as System or Safety Attributes)
As noted in the earlier Safety Policy portion of the presentation, one of the primary responsibilities under a system safety approach is to make sure safety is built into our systems.
What do we look for in the design of a system?
A primary component of any well-designed functional system is the adequacy of certain attributes. These attributes affect both safety and quality management systems and have been documented in both national and international publications.
Click
Responsibility – who is accountable for safety and quality of process activities?
Click
Authority – who has the power to establish and modify process procedures?
Click
Procedures – are there clear instructions for members of the organization?
Click
Controls – are the administrative, process or supervisory controls adequate to ensure activities produce the correct results?
Click
Process Measures – is there a way to determine whether a process is being performed according to the established procedures and achieving the desired results?
Click
Interfaces – Are there strong interrelationships between processes? Does the left hand talk to the right hand?
Now for a “process flow” that can be applied to any process in any organization. We will illustrate two things: first, how the process attributes fit into a process flow, and second, how the flow applies to a real-world workplace.
First the process itself…
Click
…a set of interrelated or interacting activities, things that people do, that transform inputs into outputs.
Click
We add Inputs, which can be external (a decision by the company to open a new station) or internal (the output from a previous process; for example, the flight crew calling for deicing triggers the station deicing crew to start their process to deice the aircraft). Notice that inputs are an Interface process attribute.
Next we ask “Who’s in charge”…
Click
…and who can make changes to the process and who can accept risk on behalf of the company.
Click
Then we need procedures to direct or guide the “things that people do.” “A procedure is a specified way to carry out an activity or a process.” This is another process attribute.
Click
Controls (procedures, steps or practices) are designed to keep processes on track so they will achieve their intended results.
Click
And now we get to outputs, the product or end result of a process, which can be recorded, monitored, measured, and analyzed. Outputs may be the input for the next process area in succession, so it is important to get them right. Outputs contain the partner to the input Interface attribute and the final attribute, Process Measures.
This model is a simple, input-activity-output flow showing where the process attributes affect a process.
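The input-activity-output flow and its six attributes can be sketched in a few lines of Python. This is only an illustration, not an FAA tool: the `Process` class, its field names, and the sample entries ("Station manager", "Deicing manual") are hypothetical and chosen for this example.

```python
from dataclasses import dataclass, field

# The six process attributes named in the notes above.
ATTRIBUTES = ("responsibility", "authority", "procedures",
              "controls", "process_measures", "interfaces")


@dataclass
class Process:
    """A simple input-activity-output process description."""
    name: str
    inputs: list[str]
    outputs: list[str]
    # Maps an attribute name to a short note on how it is addressed.
    attributes: dict[str, str] = field(default_factory=dict)

    def missing_attributes(self) -> list[str]:
        """Return the attributes that have no non-empty entry yet."""
        return [a for a in ATTRIBUTES
                if not self.attributes.get(a, "").strip()]


deicing = Process(
    name="Aircraft deicing",
    inputs=["Flight crew calls for deicing"],
    outputs=["Deiced aircraft released"],
    attributes={
        "responsibility": "Station manager",  # hypothetical entry
        "procedures": "Deicing manual",       # hypothetical entry
    },
)

# Attributes still to be defined during system description.
print(deicing.missing_attributes())
```

A check like this only flags gaps in the description; judging whether each attribute is adequate remains a task for the team.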
Now, we will demonstrate how the Process Flow might look in the real world.
Click
In addition to the internal and external inputs we see customer requirements…
Click
…and materials, especially if your output is a product.
Click
On the controls side we see the obvious controls: laws, regulations and standards.
Click
Training is also a control…
Click
…as are the books, aids and software we use.
Click
How about the people; the folks doing the job and those in charge...
Click
…plus their facilities, equipment and tools…
Click
…and finally the environment in which everything has to work.
Click
Combine all these and you generate the output of a product or service.
We’ve described system analysis, workplace conditions and the attributes that should be considered. Now let’s look at the human side of the equation and some conditions that have the potential to cause harm.
Click
Because humans, operating in the system, account for the majority of active failures, either contributing to or being blamed for most accidents, we will examine conditions relating to human error, as we continue with SRM, and attempt to manage the things we can manage.
Click
Here are some examples of conditions that contribute to errors, which can result in active failures.
Click
These conditions are typical findings, when reviewing incidents or accidents; they also exist in a predominant number of daily errors that haven’t yet resulted in an undesired event.
Click
You can think of examples of each of these that you’ve experienced or seen.
These are a few more examples of conditions that lead to errors:
Many of these conditions are subjective and not factual, therefore making them difficult to address directly.
Adjectives such as “poor,” “inadequate,” and “lack of” may at best indicate a trend; however, service providers must learn to manage the risk associated with these conditions.
Typically, these conditions are findings when reviewing incidents or accidents.
They also exist in many daily errors that have not yet resulted in an undesired event.
This example shows how Activities and Workplace Conditions could be depicted for deicing an aircraft. Human activities are on the left and workplace conditions on the right. Human activities and workplace conditions are independent lists.
Pause
While not an exhaustive list, this type of thought process and progression is helpful in an organization’s development of an SMS. Breaking down the process steps into manageable pieces is an effective way to capture and track the information.
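One way to capture and track this information is to pair every activity with every workplace condition and let the team review each pair. The sketch below assumes that approach; the worksheet layout and the `hazard_notes` field are hypothetical, and the entries are a subset of the deicing slide.

```python
from itertools import product

# Subset of the deicing slide; the two lists are independent.
activities = [
    "Prepare truck / equipment",
    "Verify type of fluid",
    "Apply fluid",
    "Communicate with crew",
]
conditions = [
    "Night",
    "Weather: precipitation / cold",
    "Shift change",
]

# One worksheet row per activity/condition pair. The team fills in
# hazard_notes only where the pairing presents a potential for harm.
worksheet = [
    {"activity": a, "condition": c, "hazard_notes": ""}
    for a, c in product(activities, conditions)
]

print(len(worksheet), "pairs to review")
for row in worksheet[:3]:
    print(row["activity"], "|", row["condition"])
```

Not every pair will turn out to be a hazard; the worksheet simply makes sure none is overlooked.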
The next element of SRM, called Hazard Identification, is used to determine the aspects of systems and environments that present hazardous conditions.
Click
Consider these points:
Hazards often are erroneously identified as consequences.
Hazards are not events. As such, hazards do not occur but exist in the environment.
Any workplace condition or set of conditions may or may not be a hazard.
Click
Workplace conditions become hazards when they, singly or in combination, present the potential for harm.
For conditions to be hazards, there must be some type of exposure to aviation operations. As an example, not all power lines or telephone wires are hazards; however, the wires in the photo are a hazard because they are in a place where aviation operations come close enough for airplanes to become tangled up in them.
We just said workplace conditions become hazards when they, singly or in combination, present the potential for harm.
Click
This diagram shows how our knowledge of workplace conditions flows into the hazard identification process, “Deficient conditions impacting activities”.
We interpret the knowledge we have gained about workplace conditions and apply them to identify hazards that could cause harm.
Then we add our knowledge of the human activities to our knowledge of the hazards to infer what could cause active failures.
Click
System and task analyses must concentrate on being factual so that accurate inferences can be made about their potential for causing system failures.
Now we progress from hazard to risk analysis.
Click
A hazard is a condition that may have an adverse consequence. Risk is a hazard that has been analyzed for severity and likelihood of the consequence.
Click
Analysis of risk must consider both likelihood and severity. That is, workplace conditions and active human failures, along with potential consequences, must be analyzed to determine relative likelihood and severity. The more adverse the workplace conditions, the greater the likelihood of active failures and a resulting accident.
This graphic demonstrates the progression from hazard to risk.
After identifying the presence of a safety hazard, analysis is required to identify active failures that have the potential for adverse consequences. The likelihood that an adverse consequence will occur increases with exposure to the hazard. Likelihood is typically a byproduct of an active failure and, further, of a set of underlying adverse workplace conditions. While we may have data to establish the likelihood of failures based on past experience, usually this will be based…
Click
…on the judgment of those involved, which is another reason operational experience is needed in the process.
Active failures are often cited in probable cause statements or audit reports – “the probable cause of this accident was the…[pilot’s, crew’s, mechanic’s, controller’s, etc.]…failure to….”
Be sure to consider all possible adverse consequences that could result from the active failure. One hazard can have multiple consequences, each with different likelihood or severity levels.
Also, remember that severity is driven by the consequence. For instance, if the possible outcome of errors during an instrument approach were Controlled Flight Into Terrain, the consequences would almost certainly be catastrophic.
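To illustrate one hazard or active failure carrying several consequences, here is a minimal Python sketch. The consequences come from the deicing example; the likelihood and severity scores attached to them are hypothetical, and picking the "worst case" by severity first is an assumption of this sketch, not a rule from the presentation.

```python
from dataclasses import dataclass

# Severity letters ordered from least (E) to most (A) severe.
SEVERITY_ORDER = "EDCBA"


@dataclass(frozen=True)
class Consequence:
    description: str
    likelihood: int  # 1 (extremely improbable) .. 5 (frequent)
    severity: str    # "A" (catastrophic) .. "E" (negligible)

    @property
    def risk_index(self) -> str:
        """Matrix cell label, e.g. '4D'."""
        return f"{self.likelihood}{self.severity}"


def worst_case(consequences):
    """Most severe consequence; ties go to the higher likelihood."""
    return max(
        consequences,
        key=lambda c: (SEVERITY_ORDER.index(c.severity), c.likelihood),
    )


# Active failure: incomplete deicing. Scores are hypothetical.
incomplete_deicing = [
    Consequence("Aircraft required to be deiced again", 4, "D"),
    Consequence("Take-off delay", 4, "E"),
    Consequence("Take-off accident due to ice", 2, "A"),
]

for c in incomplete_deicing:
    print(c.risk_index, c.description)
print("Worst case:", worst_case(incomplete_deicing).description)
```

Each consequence keeps its own index, so the team can still review all of them rather than only the worst one.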
On the left are listed some possible active failures that can be a direct result of human activity.
These failures should be related to the process and/or one of its underlying activities.
We have also postulated potential consequences that could result, as an example tool that can be helpful for an organization going through this process.
Risk Assessment is a decision process based on human knowledge, experience and judgment, but guided by a structured process.
Click
It determines the level of risk to assist in bottom line decision making.
Click
A risk matrix is a tool for looking at the combined effects of likelihood and severity in order to prioritize resources and aid in decision making. It is important to remember that a risk matrix is not risk assessment; it is only a tool to help determine which risk to address in what sequence. The risk assessment is the human activity of applying knowledge, experience and judgment to the likelihood and severity values.
Color codes will help to determine the acceptability or “tolerability” of risk. Is the risk acceptable? If so, the system can be put into operation. If not, there is more work to be done.
Typically, “Tolerability” and “Intolerability” are indicated by using the color codes of:
Green = Acceptable Risk.
Yellow = Acceptable with mitigation.
Red = Unacceptable Risk.
The FAA does not require there to be a certain number of severity or likelihood levels and leaves the construction of the matrix up to the organization.
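As a rough illustration of how a matrix lookup might work, here is a short Python sketch. The likelihood and severity scales follow the risk assessment slide, but the assignment of cells to red and green is an assumption made for this example only; each organization defines its own bands.

```python
# Scales as shown on the risk assessment slide.
LIKELIHOOD = {5: "Frequent", 4: "Occasional", 3: "Remote",
              2: "Improbable", 1: "Extremely improbable"}
SEVERITY = {"A": "Catastrophic", "B": "Hazardous", "C": "Major",
            "D": "Minor", "E": "Negligible"}

# ASSUMPTION: these color bands are illustrative, not prescribed.
RED = {"5A", "5B", "5C", "4A", "4B", "3A"}
GREEN = {"3E", "2D", "2E", "1B", "1C", "1D", "1E"}


def risk_index(likelihood: int, severity: str) -> str:
    """Return the matrix cell label, e.g. '4B'."""
    if likelihood not in LIKELIHOOD or severity not in SEVERITY:
        raise ValueError(f"unknown matrix cell: {likelihood!r}, {severity!r}")
    return f"{likelihood}{severity}"


def tolerability(index: str) -> str:
    """Map a cell label from risk_index() to its color band."""
    if index in RED:
        return "Red: unacceptable risk"
    if index in GREEN:
        return "Green: acceptable risk"
    return "Yellow: acceptable with mitigation"


for cell in (risk_index(4, "B"), risk_index(3, "C"), risk_index(1, "E")):
    print(cell, "->", tolerability(cell))
```

The lookup only reports where a likelihood and severity pair falls; choosing those two values is still the human judgment described above.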
At this point, risk assessment has two possible outcomes: the risk is either acceptable or not acceptable. If it is acceptable, the SRM process is complete and the risk moves on to the Safety Assurance (SA) process for monitoring. If the risk is not acceptable, then risk controls are required to mitigate or reduce the risk.
Risk controls are sometimes called “defenses” or “barriers.”
Click
While we may be able to reduce severity, for example with protective equipment, in most risk situations we’ll only be able to reduce the likelihood or probability of an accident.
Click
We’ll do this by reducing the likelihood of an active failure, mostly human errors or equipment malfunctions.
As an example, the use of anti-skid tape on a maintenance stand reduces the likelihood that someone will slip on the platform. Installing railings reduces the severity if someone does slip and fall while working on the stand. If the environment includes working in the rain, wind and at night, what will these conditions do to our likelihood and severity?
Pause
Likelihood will increase, but severity may not.
Risk controls are usually applied to the working conditions, things that can be managed.
The purpose of risk controls is to develop barriers or defenses to mitigate safety risk down to an acceptable level. Once a risk control is developed, it should be run back through the SRM Process, to ensure it will function in the system, doesn’t introduce other hazards, and accomplishes what it was designed to do.
When risk has been found to be undesirable or unacceptable, control measures must be introduced – the higher the risk, the greater the urgency.
The level of risk can be lowered by reducing the severity or likelihood of each possible consequence.
There is no such thing as absolute safety. Risks have to be managed to a level “as low as reasonably practicable,” which means that the risk must be balanced against the time, cost and difficulty of taking measures to reduce or eliminate the risk.
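The idea of applying controls and then re-assessing the risk can be sketched as follows, using the maintenance stand example. Everything numeric here is an assumption for illustration: the 1 to 5 scales, the size of each reduction, and the rule that likelihood times severity must not exceed a threshold of 6. A real program would use the organization's own matrix and acceptance criteria.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Risk:
    consequence: str
    likelihood: int  # 1 (lowest) .. 5 (highest)
    severity: int    # 1 (negligible) .. 5 (catastrophic)


@dataclass(frozen=True)
class Control:
    name: str
    likelihood_reduction: int = 0
    severity_reduction: int = 0


def is_acceptable(risk: Risk, threshold: int = 6) -> bool:
    # Hypothetical acceptance rule: likelihood x severity <= threshold.
    return risk.likelihood * risk.severity <= threshold


def apply_control(risk: Risk, control: Control) -> Risk:
    """Return the re-assessed risk; scores never drop below 1."""
    return replace(
        risk,
        likelihood=max(1, risk.likelihood - control.likelihood_reduction),
        severity=max(1, risk.severity - control.severity_reduction),
    )


def mitigate(risk: Risk, controls: list[Control], threshold: int = 6):
    """Apply controls in order until the risk is acceptable."""
    applied = []
    for control in controls:
        if is_acceptable(risk, threshold):
            break
        risk = apply_control(risk, control)
        applied.append(control.name)
    return risk, applied


fall = Risk("Fall from maintenance stand", likelihood=4, severity=4)
controls = [
    Control("Anti-skid tape", likelihood_reduction=2),
    Control("Railings", severity_reduction=1),
]

residual, applied = mitigate(fall, controls)
print(residual, applied, is_acceptable(residual))
```

The sketch does not check whether a control introduces new hazards; that is why each control should still be run back through the SRM process.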