1/210
Accident and Incident
Investigation
2/210
Objectives of this Section
●
To define the reasons for investigating
accidents and incidents.
●
To outline the process for effectively
investigating accidents and incidents.
●
To facilitate an effective investigation.
3/210
Accident Investigation
●
Important part of any safety management system.
Highlights the reasons why accidents occur and how
to prevent them.
●
The primary purpose of accident investigations is to
improve health and safety performance by:
• Exploring the reasons for the event and identifying both the
immediate and underlying causes;
• Identifying remedies to improve the health and safety
management system by improving risk control, preventing a
recurrence and reducing financial losses.
4/210
What to Investigate?
●
All accidents, whether major or minor, are caused.
●
Serious accidents have the same root causes as
minor accidents, as do incidents with a potential for
serious loss. It is these root causes that bring about
the accident; the severity is often a matter of chance.
●
Accident studies have shown that there is a
consistently greater number of less serious
accidents than serious accidents and in the same
way a greater number of incidents than accidents.
5/210
Many accident ratio studies have been undertaken and
the one shown below is based on studies carried out by
the Health & Safety Executive.
Accident ratio pyramid (top to bottom):
1 Major injury or illness
7 Minor injuries or illnesses
189 Non-injury accidents/illnesses
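As a rough illustration of the 1 : 7 : 189 ratio, the sketch below scales the pyramid to an observed number of major injuries. The function name and the example count are hypothetical, and a real workplace will not follow the ratio exactly.

```python
# Ratio taken from the slide: 1 major : 7 minor : 189 non-injury events.
HSE_RATIO = {"major": 1, "minor": 7, "non_injury": 189}

def expected_events(major_count):
    """Scale the pyramid to a given number of major injuries (illustrative only)."""
    return {tier: major_count * weight for tier, weight in HSE_RATIO.items()}

# Hypothetical example: two major injuries recorded in a period.
estimate = expected_events(2)
print(estimate)
```

The point of the arithmetic is that the bottom tier is by far the largest, which is where most of the learning opportunities sit.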
6/210
Accident Studies
●
In all cases the ‘non injury’ incidents had the
potential to become events with more serious
consequences.
●
Such ratios clearly demonstrate that safety effort
should be aimed at all accidents including unsafe
practices at the bottom of the pyramid, with a
resulting improvement in upper tiers.
●
Petersen (1978), in defining the principles of safety
management, says that “an unsafe act, an unsafe
condition, an accident are symptoms of something
wrong within the management’s system.”
7/210
Accident Studies
●
All events represent a degree of failure in control and
are potential learning experiences. It therefore
follows that all accidents should be investigated to
some extent.
●
This extent should be determined by the loss
potential, rather than just the immediate effect.
8/210
Stages in an Accident/Incident
Investigation
The stages in an accident/incident investigation are
shown in the following diagram.
Deal with immediate
risks.
Select the level of
investigation.
Investigate the event.
Record and analyse the
results.
Review the process.
9/210
Dealing with Immediate
Risks
●
When accidents and incidents
occur, immediate action may be
necessary to:
Make the situation safe and
prevent further injury.
Help, treat and if necessary
rescue injured persons.
●
An effective response can only be
made if it has been planned for in
advance.
10/210
Day 2 start
11/210
Selecting the level of
investigation
The greatest effort should be put into:
Events involving severe injuries, ill-health
or loss.
Events which could have caused
much greater harm or damage.
These types of accidents and incidents
demand more careful investigation and
management time. This can usually be
achieved by:
Looking more closely at the
underlying causes of significant
events.
Assigning the responsibility for the
investigation of more significant
events to more senior managers.
12/210
Investigating the Event
The purpose of investigations is
to establish:
●
The way things were and how they came
to be.
●
What happened – the sequence of events
that led to the outcome.
●
Why things happened as they did
analysing both the immediate and
underlying causes.
●
What needs to be done to avoid a
repetition and how this can be achieved.
13/210
A few sources should give the investigator all that
they need to know.
Observation
Information from physical
sources including:
• Premises and place of
work
• Access & egress
• Plant & substances in use
• Location & relationship of
physical particles
• Any post event checks,
sampling or
reconstruction
Documents
Information from:
• Written instructions;
Procedures, risk
assessments, policies
• Records of earlier
inspections, tests,
examinations and
surveys.
Interviews
Information from:
• Those involved and
their line
management;
• Witnesses;
• Those observed or
involved prior to the
event e.g. inspection
& maintenance staff.
• Checking reliability, accuracy
• Identifying conflicts and resolving differences
• Identifying gaps in evidence
14/210
Interviews
●
Interviewing the person(s) involved and
witnesses to the accident is of prime
importance, ideally in familiar surroundings
so as not to make the person uncomfortable.
●
The interview style is important, with the
emphasis on prevention rather than blame.
●
The person(s) should give an account of
what happened in their own terms rather than the
investigator's.
15/210
Interviews
●
Interviews should be separate to stop people
from influencing each other.
●
Questions should not be intimidating;
otherwise the investigator will be seen
as aggressive and as reflecting a blame culture.
16/210
Observation
The accident site should be inspected as
soon as possible after the accident. Particular
attention should/must be given to:
• Positions of people.
• Personal protective equipment (PPE).
• Tools and equipment, plant or substances in
use.
• Orderliness/Tidiness.
17/210
Documents
Documentation to be looked at includes:
●
Written instructions, procedures and risk
assessments which should have been in operation
and followed. The validity of these documents may
need to be checked by interview. The main points to
look for are:
• Are they adequate/satisfactory?
• Were they followed on this occasion?
• Were people trained/competent to follow them?
●
Records of inspections, tests, examination and
surveys undertaken before the event. These provide
information on how and why the circumstances
leading to the event arose.
18/210
Determining Causes
●
Collect all information and facts which surround the
accident.
●
Immediate causes are obvious and easy to find.
They are brought about by unsafe acts and
conditions and are the ACTIVE FAILURES. Unsafe
acts show poor safety attitudes and indicate a lack of
proper training.
●
These unsafe acts and conditions are brought about
by the so called ‘root causes’. These are the
LATENT FAILURES and are brought about by
failures in organisation and the management’s safety
system.
19/210
Determine what changes are needed
The investigation should determine what control
measures were absent, inadequate or not implemented
and so generate remedial action for implementation to
correct this.
20/210
Generally, remedial actions should follow the
hierarchy of risk control:
●
Eliminate risks by substituting the dangerous with the
inherently less dangerous.
●
Combat risks at source by engineering controls and
giving collective protective measures priority.
●
Minimise risk by designing suitable systems of
working.
●
Use PPE as a last resort.
21/210
Day 3 start
22/210
Recording & Analysing the
Results
●
Recorded in a similar and systematic
manner.
●
Provides a historical record of the accident.
●
Analysis of the causes and recommended
preventative and protective measures should
be listed.
●
Completed as soon after the accident as
possible.
●
Information on the accident and remedial
actions should be passed to all
supervisors.
●
Appropriate preventative measures may
also have to be implemented by such
supervisors.
●
Investigation reports and accident statistics
should be analysed from time to time to identify
common causes, features and trends that may not be
apparent from looking at events in isolation.
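A minimal sketch of that periodic analysis, assuming each investigation report records a single root-cause label. The report identifiers and cause labels are invented for illustration.

```python
from collections import Counter

# Hypothetical investigation records: (event id, root cause recorded by the investigator).
reports = [
    ("A-01", "inadequate training"),
    ("A-02", "poor housekeeping"),
    ("A-03", "inadequate training"),
    ("A-04", "missing guard"),
    ("A-05", "inadequate training"),
]

# Count how often each cause appears across all reports.
cause_counts = Counter(cause for _, cause in reports)

# most_common() lists causes from most to least frequent.
top_cause, top_count = cause_counts.most_common(1)[0]
print(top_cause, top_count)
```

A cause that looks like a one-off in a single report stands out once the reports are counted together.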
23/210
Reviewing the Process
Reviewing the accident/incident
investigation process should
consider:
– The results of investigations and analysis.
– The operation of the investigation system
(in terms of quality and effectiveness).
Line managers should follow
through and action the findings of
investigations and analysis. Follow
up systems should be established
where necessary to keep progress
under control.
24/210
The investigation system should be examined
from time to time to check that it consistently
delivers information in accordance with the
stated objectives and standards. This usually
requires:
●
Checking samples of investigation forms to verify the
standard of investigation and the judgements made
about causation and prioritisation of remedial
actions.
●
Checking the numbers of incidents, near misses,
injury and ill-health events;
●
Checking that all events are being reported.
25/210
What is your definition
of an “Accident”?
26/210
What is an Accident
- an unplanned event
- an unplanned incident involving
injury or fatality
- a series of events culminating in
an unplanned and unforeseen
event
27/210
How do Accidents occur?
- Accidents (with or without injuries) occur
when a series of unrelated events coincide at
a certain time and space.
- This can range from a few events to a series of
a dozen or more
(Because the coincidence of the series of
events is a matter of luck, actual accidents
only happen infrequently)
28/210
Unsafe Acts
- An unsafe act occurs in approximately 85%-95% of
all analyzed accidents with injuries
- An unsafe act is usually the last of a series of
events before the accident occurs (it could
occur at any step of the event)
- By stopping or eliminating the unsafe act, we
can stop the accident from occurring
29/210
What is an Accident Investigation?
●
A systematic approach to the identification of
causal factors and implementation of
corrective actions without placing blame on
or finding personal fault. The information
collected during an investigation is essential
for determining trends and taking appropriate
steps to prevent future accidents.
30/210
Which Accidents should be
Recorded or Reported?
ALL accidents
(including illnesses) shall
be recorded and reported
through the established
procedures and guidance
31/210
Why Investigate Accidents?
●
Determine the cause
●
Develop and implement corrective actions
●
Document the events
●
Meet legal requirements
Primary Focus:
PREVENT RECURRENCE!!!
32/210
Accident vs. Near-Miss
Accident :
Any undesired, unplanned
event arising out of a given
work-related task which
results in physical injury/
illness or damage to property.
Near-Miss :
Events which did not result in injury/illness
or damage but had the potential to do so.
33/210
Accident Ratio Study
Accident ratio pyramid (top to bottom):
1 Serious or Disabling
10 Minor Injuries
30 Property Damage
600 Accidents with no visible injury or damage
6000 Unsafe Acts or Conditions
34/210
Accident Causes
●
Unsafe Act
- an act by the injured person or another
person (or both) which caused the accident,
and/or
●
Unsafe Condition
- some environmental or hazardous
situation which caused the accident
independent of the employee
35/210
Accident Causation Model
●
Results of the accident
- physical harm
- property damage
●
Incident Occurrence
- contact with
- type
●
Immediate causes
- practices
- conditions
●
Basic causes
- personal factors
- job factors
- supervisory performance
- management policy and
decisions
36/210
Results of the Accident
●
Physical Harm
- catastrophic (multiple deaths)
- single death
- disabling
- serious
- minor
●
Property Damage
- catastrophic
- major
- serious
- minor
37/210
Incident Occurrence
●
Type
- struck by - struck against
- slip, trip - fell from
- caught on - fell on same level
- caught in - overexertion
●
Contact with
- electricity - equipment
- noise - vibration
- hazmat - heat/cold
- radiation - animals/insects
38/210
Immediate Causes
●
Practices
- operating without
authority
- using equipment
improperly
- not using PPE when
required
- correct lifting
procedures not
established
- drinking or drug use
- horseplay
- equipment not
properly secured
39/210
Immediate Causes (cont’d)
●
Conditions
- ineffective guards
- unserviceable tools and
equipment
- inadequate warning
systems
- bad housekeeping
practices
- poor work space
illumination
- unhealthy work
environment
40/210
Basic Causes
●
Personal Factors
- lack of knowledge or skill
- improper motivation
- physical or mental condition
- literacy or ability
●
Job Factors
- Physical environment
- sub-standard equipment
- abnormal usage
- wear and tear
- inadequate standards
- design and maintenance
41/210
Basic Causes (cont’d)
●
Supervisory Performance
- inadequate instructions
- failure of SOPs
- rules not enforced
- hazards not corrected
- devices not provided
●
Management Policy and
Decisions
- set measurable standards
- measure work in progress
- evaluate work vs. standards
- correct performance
No animals were hurt as a result of this accident
42/210
Severity of Incident
●
Major
- Employee fatality,
- Hospitalization of 3 or more employees,
- Permanent employee disability,
- Five or more lost workdays,
- Conditions that could pose an imminent
threat of serious injury/illness to other employees
- Property losses in excess of $1 Million
●
Minor
- All other (less serious) incidents and unsafe
conditions reported by employees
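The Major/Minor split above can be sketched as a simple rule check. The field names below are hypothetical; the thresholds are the ones listed on the slide, and any one of them is assumed to be enough to make an incident Major.

```python
from dataclasses import dataclass

@dataclass
class Incident:
    # Hypothetical record layout; one field per criterion on the slide.
    fatalities: int = 0
    hospitalized: int = 0
    permanent_disability: bool = False
    lost_workdays: int = 0
    imminent_threat: bool = False
    property_loss_usd: float = 0.0

def classify(incident):
    """Return 'Major' if any criterion from the slide is met, else 'Minor'."""
    is_major = (
        incident.fatalities > 0
        or incident.hospitalized >= 3          # 3 or more employees hospitalized
        or incident.permanent_disability
        or incident.lost_workdays >= 5         # five or more lost workdays
        or incident.imminent_threat
        or incident.property_loss_usd > 1_000_000
    )
    return "Major" if is_major else "Minor"

print(classify(Incident(lost_workdays=6)))   # Major
print(classify(Incident(hospitalized=1)))    # Minor
```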
43/210
Who Investigates?
●
Major Accidents
- NOAA “GO TEAM” Investigation Team
- LO Representative
- Other agencies such as NTSB, USCG, OSHA
●
Minor Accidents
- First-Line Supervisor
- Site Director or Manager
- Site Safety Representative
- NOAA SECO (if needed)
44/210
Investigator’s Qualifications
●
Technical knowledge
●
Objectivity
●
Analytical approach
●
Familiarity with the job, process or operation
●
Tact in communicating
●
Intellectual honesty
●
Inquisitiveness and curiosity
45/210
When to Investigate?
●
Immediately after incident
- Witness memories fade
- Equipment and clues
are moved
●
Finish investigation quickly
46/210
What to Investigate?
●
All accidents and near-misses
- Conduct investigation upon first
notification
- Keeping the scene intact and
recording witness statements
early is key to a successful
investigation
47/210
Accident Investigation Kit
May Include:
●
Digital Camera
●
Report forms, clipboard, pens
●
Barricade tape
●
Flashlight
●
Tape measure
●
Tape recorder
●
Personal Protective Equipment (as appropriate)
48/210
The Accident Occurs
●
Employee or co-worker immediately reports
the accident to a supervisor
●
Supervisor secures/assesses the scene to
prevent additional injuries to other
employees, before assisting the injured
employee
●
Supervisor treats the injury or seeks
medical treatment for the injured
●
The accident scene is left intact
●
Site safety rep is contacted to assist the
supervisor in the investigation of the
accident.
49/210
Beginning the Investigation
●
Gather investigation
members and kit
●
Report to the scene
●
Look at the big
picture
●
Record initial
observations
●
Take pictures
50/210
What’s Involved?
●
Who was injured?
●
Medication, drugs,
or alcohol?
●
Was employee ill or
fatigued?
●
Environmental conditions?
51/210
Witnesses
●
Who witnessed the
accident?
●
Was a supervisor or
Team Lead nearby?
●
Where were other
employees?
●
Why didn’t anyone
witness the accident
(working alone, remote areas)?
52/210
Interviewing Tips
●
Discuss what happened leading
up to and after the accident
●
Encourage witnesses to describe
the accident in their own words
●
Don’t be defensive or judgmental
●
Use open-ended questions
●
Do not interrupt the witness
53/210
What was Involved?
●
Machine, tool, or
equipment
●
Chemicals
●
Environmental
conditions
●
Field season prep
operations
54/210
Time of Accident
●
Date and time?
●
Normal shift or
working hours?
●
Employee coming
off a vacation?
55/210
Accident Location
●
Work area
●
On, under, in, near
●
Off-site address
●
Doing normal job
duties
●
Performing non-
routine or routine
tasks (i.e., properly
trained)
56/210
Employee’s Activity
●
Motion conducted
at time of accident
●
Repetitive motion?
●
Type of material
being handled
57/210
Accident Narrative
●
Describe the details so the reader
can clearly picture the accident
●
Specific body parts affected
●
Specific motions
of injured employee
just before,
during, and
after accident
58/210
Causal Factors
●
Try not to accept single cause theory
●
Identify underlying causes (root)
●
Primary cause
●
Secondary causes
- Contributing causes
- Effects
59/210
Corrective Actions Taken
●
Include immediate interim controls
implemented at the time of accident
●
Recommended corrective actions
- Employee training
- Preventive maintenance activities
- Better operating procedures
- Hazard recognition (ORM)
- Management awareness of risks involved
60/210
Immediate Notification
●
Supervisor shall complete the NOAA Web Based
Accident/ Illness Report Form and submit within
24 hours of incident occurrence (8 hours for major
incidents).
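The two reporting windows can be expressed as a small deadline calculation. This is only a sketch: the function name and the example timestamp are hypothetical, and it is not part of the NOAA reporting system.

```python
from datetime import datetime, timedelta

def report_deadline(occurred_at, major):
    """Latest submission time: 8 hours for major incidents, 24 hours otherwise."""
    hours = 8 if major else 24
    return occurred_at + timedelta(hours=hours)

# Hypothetical incident time.
occurred = datetime(2024, 3, 1, 9, 30)
print(report_deadline(occurred, major=False))
print(report_deadline(occurred, major=True))
```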
61/210
Accident Analysis Summary
●
Investigate accident immediately
●
Determine who was involved and
who witnessed it
●
Ascertain what items or equipment
were involved
●
Record detailed description
●
Determine causal factors
●
Implement corrective actions
62/210
63/210
64/210
1. What is an Accident Investigation?
a. A systematic approach to the identification of causal
factors and implementation of corrective actions.
b. Finding personal fault and placing blame.
c. The appropriate steps to prevent future actions.
d. The essential step to determine trends and taking
action against person or persons at fault.
65/210
2. Which Accidents should be Recorded or
Reported?
a. Only on the job accidents.
b. ALL accidents (including illnesses) shall be
recorded and reported.
c. Only on the job accidents or illnesses that occur on
the job and reported within 8 hours.
d. All accidents shall be recorded and reported.
66/210
3. Why Investigate Accidents?
a. To develop and implement corrective actions.
b. To document the events.
c. The Primary Focus is to PREVENT
RECURRENCE!!!
d. To determine the cause.
67/210
4. Accident vs. Near-Miss?
a. Any unplanned event arising out of work that
resulted in injury vs. Any event which did not result
in injury but had potential to do so.
b. Any unsafe work habit vs. Any Hazardous working
conditions.
c. Any event which warns us of a problem vs. Any
circumstances that result in injury or property
damage.
68/210
5. Which of the following are the basic areas
that are looked at in an Accident
Investigation?
a. Policies.
b. Equipment.
c. Training.
d. All of the above.
69/210
Accident Investigation
Accident analysis is carried out in order to
determine the cause or causes of an accident
or series of accidents so as to prevent further
incidents of a similar kind. It is also known as
accident investigation.
70/210
Accident Investigation
It may be performed by a range of experts,
including forensic scientists, forensic
engineers or health and safety advisers.
Accident investigators, particularly those in
the aircraft industry, are colloquially known as
"tin-kickers".
71/210
Sequence
Accident analysis is performed in four steps:
Fact gathering: After an accident has happened,
a forensic process starts to gather all possibly
relevant facts that may contribute to
understanding the accident.
72/210
Sequence
Fact Analysis:
After the forensic process has been
completed or at least delivered some results,
the facts are put together to give a "big
picture." The history of the accident is
reconstructed and checked for consistency
and plausibility.
73/210
Sequence
Conclusion Drawing:
If the accident history is sufficiently
informative, conclusions can be drawn about
causation and contributing factors.
74/210
Sequence
Counter-measures:
In some cases the development of counter-
measures is desired or recommendations
have to be issued to prevent further accidents
of the same kind.
75/210
Methods
There exist numerous forms of Accident
Analysis methods. These can be divided into
three categories:
76/210
Methods
Causal Analysis
Causal Analysis uses the principle of
causality to determine the course of events.
Though people casually speak of a "chain of
events", results from Causal Analysis usually
have the form of directed acyclic graphs, the
nodes being events and the edges the
cause-effect relations. Methods of Causal Analysis
differ in their respective notion of causation.
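To make the graph form concrete, here is a minimal sketch of a causal graph stored as an adjacency mapping. The scenario, the event names and both helper functions are hypothetical; real causal analysis methods add their own rules for deciding when an edge is justified.

```python
# Each key is an event; its list holds the events it directly causes (hypothetical scenario).
causes = {
    "no maintenance schedule": ["worn guard"],
    "no refresher training": ["guard bypassed"],
    "worn guard": ["hand contacts blade"],
    "guard bypassed": ["hand contacts blade"],
    "hand contacts blade": ["laceration"],
    "laceration": [],
}

def root_causes(graph):
    """Events that no other event points to: the starting points of the graph."""
    has_cause = {effect for effects in graph.values() for effect in effects}
    return sorted(event for event in graph if event not in has_cause)

def upstream(graph, outcome):
    """Every event from which `outcome` can be reached by following edges."""
    parents = {}
    for cause, effects in graph.items():
        for effect in effects:
            parents.setdefault(effect, []).append(cause)
    found, stack = set(), [outcome]
    while stack:
        for parent in parents.get(stack.pop(), []):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found

print(root_causes(causes))
print(sorted(upstream(causes, "laceration")))
```

Two independent branches lead to the same contact event here, which a single "chain" could not represent.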
77/210
Methods
Expert Analysis
Expert Analysis relies on the knowledge and
experience of field experts. This form of
analysis usually lacks a rigorous
(formal/semiformal) methodological
approach.
This usually affects the falsifiability and
objectivity of analyses. This is of importance
when conclusions are heavily disputed
among experts.
78/210
Methods
Organizational Analysis
Organizational Analysis relies on systemic
theories of organization. Most theories imply
that if a system's behaviour stays within the
bounds of the ideal organization then no
accidents can occur.
79/210
Methods
Organizational Analysis
Organizational Analysis can be falsified and
results from analyses can be checked for
objectivity. Choosing an organizational theory
for accident analysis comes from the
assumption that the system to be analysed
conforms to that theory.
80/210
Using Digital Photographs to Extract
Evidence
Once all available data has been collected by
accident scene investigators and law
enforcement officers, camera matching,
photogrammetry or rectification can be used
to determine the exact location of physical
evidence shown in the accident scene
photos.
81/210
Camera matching:
Camera matching uses accident scene
photos that show various points of evidence.
The technique uses CAD software to create a
3-dimensional model of the accident site and
roadway surface.
82/210
Camera matching:
All survey data and photos are then imported
into a three dimensional software package
like 3D Studio Max.
A virtual camera can then be positioned
relative to the 3D roadway surface.
Physical evidence is then mapped from the
photos onto the 3D roadway to create a three
dimensional accident scene drawing.
83/210
Photogrammetry
Photogrammetry is used to determine the
three-dimensional geometry of an object on
the accident scene from the original two
dimensional photos.
84/210
Photogrammetry
The photographs can be used to extract
evidence that may be lost after the accident
is cleared. Photographs from several
viewpoints are imported into software like
PhotoModeler.
85/210
Photogrammetry
The forensic engineer can then choose points
common to each photo. The software will
calculate the location of each point in a three
dimensional coordinate system.
86/210
Rectification
Photographic rectification is also used to
analyze evidence that may not have been
measured at the accident scene. Two
dimensional rectification transforms a single
photograph into a top-down view. Software
like PC-Rect can be used to rectify a digital
photograph.
87/210
Failure mode and effects analysis
88/210
Failure mode and effects analysis
Failure Mode and Effects Analysis (FMEA) was
one of the first systematic techniques for failure
analysis.
It was developed by reliability engineers in the
1950s to study problems that might arise from
malfunctions of military systems.
89/210
Failure mode and effects analysis
An FMEA is often the first step of a system
reliability study. It involves reviewing as many
components, assemblies, and subsystems as
possible to identify failure modes, and their
causes and effects.
90/210
Failure mode and effects analysis
For each component, the failure modes and their
resulting effects on the rest of the system are
recorded in a specific FMEA worksheet.
There are numerous variations of such
worksheets.
An FMEA is mainly a qualitative analysis.
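Since worksheet layouts vary, the sketch below shows just one possible shape for a worksheet row. The column names and the two example rows are assumptions for illustration, not a standard layout.

```python
from dataclasses import dataclass

@dataclass
class WorksheetRow:
    # Hypothetical columns; real FMEA worksheets often carry many more.
    component: str
    failure_mode: str
    effect_on_item: str
    effect_on_system: str

worksheet = [
    WorksheetRow("axle", "fully fractured", "wheel detaches", "loss of vehicle control"),
    WorksheetRow("electrical contact", "stuck open", "no current to pump", "cooling lost"),
]

def modes_for(rows, component):
    """List the recorded failure modes for one component."""
    return [row.failure_mode for row in rows if row.component == component]

print(modes_for(worksheet, "axle"))
```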
91/210
Failure mode and effects analysis
A few different types of FMEA analysis exist, like
Functional,
Design, and
Process FMEA.
92/210
Failure mode and effects analysis
Sometimes the FMEA is called FMECA to
indicate that criticality analysis is also performed.
93/210
Failure mode and effects analysis
An FMEA is an inductive reasoning (forward
logic) single point of failure analysis and is a core
task in reliability engineering, safety engineering
and quality engineering.
Quality engineering is especially concerned with
the "Process" (Manufacturing and Assembly) type
of FMEA.
94/210
Failure mode and effects analysis
A successful FMEA activity helps to identify
potential failure modes based on experience with
similar products and processes - or based on
common physics of failure logic.
95/210
Failure mode and effects analysis
It is widely used in development and
manufacturing industries in various phases of the
product life cycle.
Effects analysis refers to studying the
consequences of those failures on different
system levels.
96/210
Failure mode and effects analysis
Functional analyses are needed as an input to
determine correct failure modes, at all system
levels, both for functional FMEA or Piece-Part
(hardware) FMEA.
97/210
Failure mode and effects analysis
An FMEA is used to structure mitigation for risk
reduction, based on reducing the severity of the failure
(mode) effect, lowering the probability of failure,
or both.
98/210
Failure mode and effects analysis
The FMEA is in principle a full inductive (forward
logic) analysis; however, the failure probability can
only be estimated or reduced by understanding
the failure mechanism.
99/210
Failure mode and effects analysis
Ideally this probability shall be lowered to
"impossible to occur" by eliminating the (root)
causes. It is therefore important to include in the
FMEA an appropriate depth of information on the
causes of failure (deductive analysis).
100/210
Failure mode and effects analysis
The FME(C)A is a design tool used to
systematically analyze postulated component
failures and identify the resultant effects on
system operations. The analysis is sometimes
characterized as consisting of two sub-analyses,
the first being the failure modes and effects
analysis (FMEA), and the second, the criticality
analysis (CA).
101/210
Failure mode and effects analysis
Successful development of an FMEA requires
that the analyst include all significant failure
modes for each contributing element or part in the
system. FMEAs can be performed at the system,
subsystem, assembly, subassembly or part level.
102/210
Failure mode and effects analysis
The FMECA should be a living document during
development of a hardware design. It should be
scheduled and completed concurrently with the
design. If completed in a timely manner, the
FMECA can help guide design decisions. The
usefulness of the FMECA as a design tool and in
the decision making process is dependent on the
effectiveness and timeliness with which design
problems are identified.
103/210
Failure mode and effects analysis
Timeliness is probably the most important
consideration. In the extreme case, the FMECA
would be of little value to the design decision
process if the analysis is performed after the
hardware is built.
104/210
Failure mode and effects analysis
While the FMECA identifies all part failure modes,
its primary benefit is the early identification of all
critical and catastrophic subsystem or system
failure modes so they can be eliminated or
minimized through design modification at the
earliest point in the development effort.
105/210
Failure mode and effects analysis
Therefore, the FMECA should be performed
at the system level as soon as preliminary
design information is available and extended
to the lower levels as the detail design
progresses.
106/210
Failure mode and effects analysis
Remark: For more complete scenario modelling,
other types of reliability analysis may be considered,
for example fault tree analysis (FTA), a deductive
(backward logic) failure analysis that may handle
multiple failures within the item and/or external to
the item, including maintenance and logistics. It
starts at a higher functional / system level. An FTA
may use the basic failure mode FMEA records or
an effect summary as one of its inputs (the basic
events). Interface hazard analysis, human error
analysis and others may be added for completeness in
scenario modelling.
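A minimal sketch of how a fault tree combines multiple failures, assuming independent basic events and only AND/OR gates. The tree, the event names and the probabilities are invented for illustration; dedicated FTA tools handle dependencies and other gate types.

```python
from math import prod

def evaluate(node, basic_probabilities):
    """Probability of a fault-tree node, assuming independent basic events."""
    if isinstance(node, str):                      # basic event (leaf)
        return basic_probabilities[node]
    gate, children = node
    child_p = [evaluate(child, basic_probabilities) for child in children]
    if gate == "AND":                              # all inputs must fail
        return prod(child_p)
    if gate == "OR":                               # any single input failing is enough
        return 1 - prod(1 - p for p in child_p)
    raise ValueError(f"unknown gate: {gate}")

# Top event: cooling is lost if both pumps fail, or if power is lost.
tree = ("OR", [("AND", ["pump A fails", "pump B fails"]), "power lost"])
probabilities = {"pump A fails": 0.1, "pump B fails": 0.1, "power lost": 0.01}

top = evaluate(tree, probabilities)
print(round(top, 4))
```

The AND gate is what lets the tree express a combination of failures, which a single-failure FMEA does not cover.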
107/210
Functional analysis
The analysis may be performed at the functional
level until the design has matured sufficiently to
identify specific hardware that will perform the
functions; then the analysis should be extended to
the hardware level. When performing the hardware
level FMECA, interfacing hardware is considered to
be operating within specification. In addition, each
part failure postulated is considered to be the only
failure in the system (i.e., it is a single failure
analysis).
108/210
Functional analysis
In addition to the FMEAs done on systems to
evaluate the impact lower level failures have on
system operation, several other FMEAs are done.
Special attention is paid to interfaces between
systems and in fact at all functional interfaces. The
purpose of these FMEAs is to assure that
irreversible physical and/or functional damage is
not propagated across the interface as a result of
failures in one of the interfacing units.
109/210
Functional analysis
These analyses are done to the piece part level for
the circuits that directly interface with the other
units. The FMEA can be accomplished without a
CA, but a CA requires that the FMEA has
previously identified system level critical failures.
When both steps are done, the total process is
called an FMECA.
110/210
Ground rules
The ground rules of each FMEA include a set of
project selected procedures; the assumptions on
which the analysis is based; the hardware that has
been included and excluded from the analysis and
the rationale for the exclusions. The ground rules
also describe the indenture level of the analysis, the
basic hardware status, and the criteria for system
and mission success.
111/210
Ground rules
Every effort should be made to define all ground
rules before the FMEA begins; however, the ground
rules may be expanded and clarified as the analysis
proceeds. A typical set of ground rules
(assumptions) follows:
112/210
Ground rules
● Only one failure mode exists at a time.
● All inputs (including software commands) to the
item being analyzed are present and at nominal
values.
● All consumables are present in sufficient
quantities.
● Nominal power is available.
113/210
Benefits
Major benefits derived from a properly implemented
FMECA effort are as follows:
114/210
Benefits
It provides a documented method for selecting a
design with a high probability of successful
operation and safety.
115/210
Benefits
A documented uniform method of assessing
potential failure mechanisms, failure modes and
their impact on system operation, resulting in a list
of failure modes ranked according to the
seriousness of their system impact and likelihood of
occurrence.
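One simple way to produce such a ranked list is to score each failure mode and sort. The 1-5 scales, the example values and the severity-times-likelihood score are assumptions made for this sketch; the source does not prescribe a particular scoring scheme.

```python
# (failure mode, severity 1-5, likelihood 1-5); scales and values are hypothetical.
failure_modes = [
    ("contact stuck open", 3, 4),
    ("axle fully fractured", 5, 2),
    ("axle deformed", 2, 2),
]

def rank(modes):
    """Sort by severity x likelihood, highest first; ties keep their input order."""
    return sorted(modes, key=lambda m: m[1] * m[2], reverse=True)

ranked = rank(failure_modes)
for name, severity, likelihood in ranked:
    print(name, severity * likelihood)
```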
116/210
Benefits
Early identification of single failure points (SFPs)
and system interface problems, which may be
critical to mission success and/or safety. They also
provide a method of verifying that switching
between redundant elements is not jeopardized by
postulated single failures.
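The idea of a single failure point can be sketched by failing one element at a time and checking whether the system still works. The system layout and element names below are hypothetical, and the model assumes a function is available as long as at least one of its elements is healthy.

```python
# Each function maps to the elements able to provide it (hypothetical system).
system = {
    "cooling": ["pump A", "pump B"],              # redundant pair
    "power": ["main bus"],                        # no backup
    "control": ["controller 1", "controller 2"],  # redundant pair
}

def system_works(functions, failed):
    """The system works if every function still has at least one healthy element."""
    return all(any(e not in failed for e in elements) for elements in functions.values())

def single_failure_points(functions):
    """Fail each element on its own (single-failure assumption) and see what breaks."""
    elements = {e for group in functions.values() for e in group}
    return sorted(e for e in elements if not system_works(functions, {e}))

print(single_failure_points(system))
```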
117/210
Benefits
An effective method for evaluating the effect of
proposed changes to the design and/or operational
procedures on mission success and safety.
118/210
Benefits
A basis for in-flight troubleshooting procedures and
for locating performance monitoring and fault-
detection devices.
119/210
Benefits
Criteria for early planning of tests.
120/210
Basic terms
The following covers some basic FMEA
terminology.
Failure
The loss of a function under stated conditions.
121/210
Basic terms
Failure mode
The specific manner or way by which a failure
occurs in terms of failure of the item (being a part or
(sub) system) function under investigation; it may
generally describe the way the failure occurs. It
shall at least clearly describe a (end) failure state of
the item (or function in case of a Functional FMEA)
under consideration. It is the result of the failure
mechanism (cause of the failure mode). For
example, a fully fractured axle, a deformed axle, or a
fully open or fully closed electrical contact are each
a separate failure mode.
122/210
Basic terms
Failure cause and/or mechanism
Defects in requirements, design, process, quality
control, handling or part application, which are the
underlying cause or sequence of causes that
initiate a process (mechanism) that leads to a
failure mode over a certain time. A failure mode
may have more than one cause.
123/210
Basic terms
Failure cause and/or mechanism
For example, "fatigue or corrosion of a structural
beam" or "fretting corrosion in an electrical contact"
is a failure mechanism and in itself (likely) not a
failure mode. The related failure mode (end state) is
a "full fracture of structural beam" or "an open
electrical contact". The initial cause might have
been "improper application of corrosion protection
layer (paint)" and/or "(abnormal) vibration input
from another (possibly failed) system".
124/210
Basic terms / Failure effect
Immediate consequences of a failure on operation,
function or functionality, or status of some item.
125/210
Indenture levels (bill of material or
functional breakdown)
An identifier for system level and thereby item
complexity. Complexity increases as levels are
closer to one.
126/210
Local effect
The failure effect as it applies to the item under
analysis.
127/210
Next higher level effect
The failure effect as it applies at the next higher
indenture level.
128/210
End effect
The failure effect at the highest indenture level or
total system.
129/210
Detection
The means of detection of the failure mode by
maintainer, operator or built-in detection system,
including estimated dormancy period (if applicable).
130/210
Risk Priority Number (RPN)
Cost (of the event) * Probability (of the event
occurring) * Detection (Probability that the event
would not be detected before the user was aware of
it)
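The RPN definition above can be sketched in a few lines. This is only an illustration, assuming cost is a non-negative number and both probabilities lie between 0 and 1; the function name `risk_priority_number` and the three failure modes with their values are hypothetical and do not come from the slides.

```python
def risk_priority_number(cost: float, probability: float, detection: float) -> float:
    """RPN = cost * probability of occurrence * probability of non-detection."""
    if cost < 0:
        raise ValueError("cost must be non-negative")
    for name, value in (("probability", probability), ("detection", detection)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    return cost * probability * detection


# Hypothetical failure modes: (name, cost, probability, detection)
failure_modes = [
    ("seal leak", 2000.0, 0.05, 0.5),
    ("sensor drift", 500.0, 0.20, 0.9),
    ("axle fracture", 100000.0, 0.001, 0.2),
]

# Rank the failure modes from highest to lowest RPN.
ranked = sorted(
    failure_modes,
    key=lambda fm: risk_priority_number(fm[1], fm[2], fm[3]),
    reverse=True,
)

for name, cost, probability, detection in ranked:
    print(name, round(risk_priority_number(cost, probability, detection), 3))
```

Sorting by RPN gives a first ordering of where to spend effort; the Limitations slides later in this deck explain why that ordering should not be trusted blindly.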
131/210
Severity
The consequences of a failure mode. Severity
considers the worst potential consequence of a
failure, determined by the degree of injury, property
damage, system damage and/or time lost to repair
the failure.
132/210
Remarks / mitigation / actions
Additional info, including the proposed mitigation or
actions used to lower a risk or justify a risk level or
scenario.
133/210
Example FMEA Worksheet
134/210
Probability (P)
In this step it is necessary to look at the cause of
a failure mode and the likelihood of occurrence.
This can be done by analysis, calculations / FEM,
looking at similar items or processes and the
failure modes that have been documented for
them in the past. A failure cause is looked upon
as a design weakness. All the potential causes
for a failure mode should be identified and
documented.
135/210
Probability (P)
This should be in technical terms. Examples of
causes are: human errors in handling,
manufacturing-induced faults, fatigue, creep,
abrasive wear, erroneous algorithms, excessive
voltage, or improper operating conditions or use
(depending on the ground rules used). A failure
mode is given a Probability Ranking.
136/210
Probability (P)
137/210
Severity (S)
Determine the Severity for the worst-case
adverse end effect (state). It is
convenient to write these effects down in terms of
what the user might see or experience in terms of
functional failures. Examples of these end effects
are: full loss of function x, degraded performance,
functions in reversed mode, too late functioning,
erratic functioning, etc.
138/210
Severity (S)
Each end effect is given a Severity number (S)
from, say, I (no effect) to VI (catastrophic), based
on cost and/or loss of life or quality of life. These
numbers prioritize the failure modes (together
with probability and detectability). Below a typical
classification is given. Other classifications are
possible. See also hazard analysis.
139/210
Severity (S)
140/210
Detection (D)
141/210
Detection (D)
The means or method by which a failure is
detected and isolated by operator and/or maintainer,
and the time it may take. This is important for
maintainability control (availability of the system)
and it is especially important for multiple failure
scenarios.
142/210
Detection (D)
This may involve dormant failure modes (e.g. no
direct system effect, while a redundant system /
item automatically takes over, or when the failure is
only problematic during specific mission or system
states) or latent failures (e.g. deterioration failure
mechanisms, such as a crack growing in metal that
has not yet reached a critical length).
143/210
Detection (D)
It should be made clear how the failure mode or
cause can be discovered by an operator under
normal system operation or if it can be discovered
by the maintenance crew by some diagnostic
action or automatic built-in system test. A
dormancy and/or latency period may be entered.
144/210
Detection (D)
145/210
Detection (D)
DORMANCY or LATENCY PERIOD
The average time that a failure mode may be undetected may be
entered if known. For example:
● During aircraft C Block inspection, preventive or predictive
maintenance, X months or X flight hours
● During aircraft B Block inspection, preventive or predictive
maintenance, X months or X flight hours
● During Turn-Around Inspection before or after flight (e.g. 8
hours average)
● During in-built system functional test, X minutes
● Continuously monitored, X seconds
146/210
Detection (D)
INDICATION
If the undetected failure allows the system to remain in a
safe / working state, a second failure situation should be
explored to determine whether or not an indication will be
evident to all operators and what corrective action they may
or should take.
147/210
Detection (D)
Indications to the operator should be described as follows:
Normal. An indication that is evident to an operator when the
system or equipment is operating normally.
Abnormal. An indication that is evident to an operator when
the system has malfunctioned or failed.
Incorrect. An erroneous indication to an operator due to the
malfunction or failure of an indicator (i.e., instruments,
sensing devices, visual or audible warning devices, etc.).
148/210
Detection (D)
PERFORM DETECTION COVERAGE ANALYSIS FOR
TEST PROCESSES AND MONITORING (From ARP4761
Standard):
149/210
Detection (D)
This type of analysis is useful to determine how effective
various test processes are at the detection of latent and
dormant faults. The method used to accomplish this involves
an examination of the applicable failure modes to determine
whether or not their effects are detected, and to determine
the percentage of failure rate applicable to the failure modes
which are detected. The possibility that the detection means
may itself fail latent should be accounted for in the coverage
analysis as a limiting factor (i.e., coverage cannot be more
reliable than the detection means availability).
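The coverage calculation described above can be sketched as a ratio of failure rates. This is a minimal illustration, not the ARP4761 procedure itself: the function name `detection_coverage`, the example failure rates, and the choice to model the "limiting factor" as a simple cap at the availability of the detection means are all assumptions made here.

```python
def detection_coverage(failure_modes, detection_availability=1.0):
    """Fraction of the total failure rate that belongs to detected failure modes.

    failure_modes: iterable of (failure_rate_per_hour, is_detected) pairs.
    detection_availability: assumed availability of the detection means (0..1),
    used here as an upper bound on the reported coverage.
    """
    modes = list(failure_modes)
    total_rate = sum(rate for rate, _ in modes)
    if total_rate <= 0:
        raise ValueError("total failure rate must be positive")
    if not 0.0 <= detection_availability <= 1.0:
        raise ValueError("detection_availability must be between 0 and 1")
    detected_rate = sum(rate for rate, is_detected in modes if is_detected)
    return min(detected_rate / total_rate, detection_availability)


# Hypothetical failure modes: (failure rate per hour, detected by the test?)
example_modes = [(4e-6, True), (1e-6, False), (5e-6, True)]

print(detection_coverage(example_modes))
print(detection_coverage(example_modes, detection_availability=0.8))
```

In the first call roughly 90 percent of the failure rate is covered; in the second the assumed availability of the detection means limits the result.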
150/210
Detection (D)
Inclusion of the detection coverage in the FMEA can lead to
each individual failure that would have been one effect
category now being a separate effect category due to the
detection coverage possibilities. Another way to include
detection coverage is for the FTA to conservatively assume
that no holes in coverage due to latent failure in the
detection method affect detection of all failures assigned to
the failure effect category of concern. The FMEA can be
revised if necessary for those cases where this conservative
assumption does not allow the top event probability
requirements to be met.
151/210
Detection (D)
After these three basic steps the Risk level may be provided.
152/210
Risk level (P*S) and (D)
Risk is the combination of end effect probability and
severity, where probability and severity include the effect
of non-detectability (dormancy time). This may influence the
end effect probability of failure or the worst case effect
severity. The exact calculation may not be easy in case
multiple scenarios (with multiple events) are possible and
detectability / dormancy plays a crucial role (as for
redundant systems). In that case Fault Tree Analysis and/or
Event Trees may be needed to determine exact probability
and risk levels.
153/210
Risk level (P*S) and (D)
Preliminary risk levels can be selected based on a risk
matrix like the one shown below, based on MIL-STD-882. The
higher the Risk level, the more justification and mitigation is
needed to provide evidence and lower the risk to an
acceptable level. High risk should be indicated to higher
level management, who are responsible for final decision
making.
154/210
Risk level (P*S) and (D)
155/210
Risk level (P*S) and (D)
After this step the FMEA has become like an FMECA.
156/210
Timing
The FMEA should be updated whenever:
A new cycle begins (new product/process)
Changes are made to the operating conditions
A change is made in the design
New regulations are instituted
Customer feedback indicates a problem
157/210
Uses
Development of system requirements that minimize the
likelihood of failures.
Development of designs and test systems to ensure that
the failures have been eliminated or the risk is reduced to
acceptable level.
Development and evaluation of diagnostic systems
To help with design choices (trade-off analysis).
158/210
Advantages
Improve the quality, reliability and safety of a
product/process
Improve company image and competitiveness
Increase user satisfaction
Reduce system development time and cost
Collect information to reduce future failures, capture
engineering knowledge
159/210
Advantages
Reduce the potential for warranty concerns
Early identification and elimination of potential failure
modes
Emphasize problem prevention
Minimize late changes and associated cost
Catalyst for teamwork and idea exchange between
functions
Reduce the possibility of the same kind of failure in the future
Reduce impact on company profit margin
Improve production yield
160/210
Limitations
If used as a top-down tool, FMEA may only identify major
failure modes in a system. Fault tree analysis (FTA) is better
suited for "top-down" analysis. When used as a "bottom-up"
tool FMEA can augment or complement FTA and identify
many more causes and failure modes resulting in top-level
symptoms. It is not able to discover complex failure modes
involving multiple failures within a subsystem, or to report
expected failure intervals of particular failure modes up to
the upper level subsystem or system.
161/210
Limitations
Additionally, the multiplication of the severity, occurrence
and detection rankings may result in rank reversals, where a
less serious failure mode receives a higher RPN than a
more serious failure mode.
The reason for this is that the rankings are ordinal scale
numbers, and multiplication is not defined for ordinal
numbers. The ordinal rankings only say that one ranking is
better or worse than another, but not by how much. For
instance, a ranking of "2" may not be twice as severe as a
ranking of "1," or an "8" may not be twice as severe as a "4,"
but multiplication treats them as though they are. See Level
of measurement for further discussion.
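A small worked example makes the rank reversal concrete. The sketch below assumes the common convention of integer rankings from 1 to 10 for severity, occurrence and detection; that scale, the function name `rpn`, and the two failure modes are illustrative assumptions, not values from the slides.

```python
def rpn(severity: int, occurrence: int, detection: int) -> int:
    """Multiply three ordinal rankings (assumed to be on a 1-10 scale)."""
    for ranking in (severity, occurrence, detection):
        if not 1 <= ranking <= 10:
            raise ValueError("rankings are assumed to lie between 1 and 10")
    return severity * occurrence * detection


# Hypothetical failure modes.
mode_a = {"name": "A", "severity": 10, "occurrence": 2, "detection": 2}
mode_b = {"name": "B", "severity": 4, "occurrence": 4, "detection": 4}

rpn_a = rpn(mode_a["severity"], mode_a["occurrence"], mode_a["detection"])
rpn_b = rpn(mode_b["severity"], mode_b["occurrence"], mode_b["detection"])

# Mode A has the worse severity ranking, yet mode B ends up with the higher RPN.
print(mode_a["name"], rpn_a)
print(mode_b["name"], rpn_b)
```

Here the failure mode with the highest possible severity ranking scores 40, while a much less severe mode scores 64, so sorting by RPN alone would put the less serious mode first.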
162/210
Types
Functional: before design solutions are provided (or only on
high level) functions can be evaluated on potential functional
failure effects. General Mitigations ("design to"
requirements) can be proposed to limit consequence of
functional failures or limit the probability of occurrence in this
early development. It is based on a functional breakdown of
a system. This type may also be used for Software
evaluation.
163/210
Types
Concept Design / Hardware: analysis of systems or
subsystems in the early design concept stages to analyse
the failure mechanisms and lower level functional failures,
especially for comparing different concept solutions in more
detail. It may be used in trade-off studies.
164/210
Types
Detailed Design / Hardware: analysis of products prior to
production. These are the most detailed (in MIL-STD-1629 called
Piece-Part or Hardware FMEA) FMEAs and used to identify
any possible hardware (or other) failure mode up to the
lowest part level. It should be based on hardware
breakdown (e.g. the BoM = Bill of Material). Any Failure
effect Severity, failure Prevention (Mitigation), Failure
Detection and Diagnostics may be fully analysed in this
FMEA.
165/210
Types
Process: analysis of manufacturing and assembly
processes. Both quality and reliability may be affected by
process faults. The input for this FMEA is, amongst others, a
work process / task breakdown.
166/210
HOW TO CONDUCT AN
EFFECTIVE SAFETY
ASSESSMENT
OFFICE SPACES
Why should you be conducting
assessments?
●
To spot unsafe conditions and equipment
●
To focus on unsafe work practices or
behavior trends before they lead to injuries
●
Reveal the need for new safeguards
●
To provide a safe working environment for
all workers
What should I look for during an office
assessment?
●
Emergency Egress
●
Work Environment
●
Ergonomics
●
Emergency Information
●
Fire Prevention
●
Electrical Systems
●
Employee Behavior
Emergency Egress
●
Blocked or locked doorways
●
Locking devices that can impede
emergency egress
●
Properly marked exits
●
Properly illuminated exits
●
Clear aisles and pathways
Work Environment
●
Clean, sanitary and orderly work spaces
●
Tripping hazards such as loose tiles,
carpeting, flooring
●
Are drawers kept open when not in use
●
Are items stored above shoulder level and
unsecured
Ergonomics
●
Are workstations configured to prevent
employee discomfort and injury
●
Are employees aware of ergonomic risk
factors
●
Have employees received ergonomic
training
Emergency Information
●
Are emergency phone numbers posted
where they can be readily found
●
Are employees trained in emergency
procedures
●
Are evacuation procedures and diagrams
posted
Fire Prevention
●
Are portable fire extinguishers readily
available and unobstructed
●
Are fire pull stations clearly marked and
unobstructed
●
Are all fire sprinkler heads kept clear and
unobstructed (at least 18 inches)
●
Are space heaters used and authorized
Electrical Systems
●
Are extension cords/power strips kept
uncoupled (not piggy-backed)
●
Are all extension cords/power strips
provided by the agency
●
Are electrical outlets clear of combustible
materials
●
Do electrical cords create trip hazards
●
Are extension cords used as permanent
wiring
Employee Behavior
●
Are employees observing established
safety rules
●
Do employees minimize hazards by
applying Operational Risk Management
principles
●
Are employees allowed to report unsafe
conditions or acts without restraint
Operational Risk Management
Identify
Assess
Decide
Control
Supervise
ORM
How to assess safety
SUMMARY
●
Promoting Safety
●
Monthly Assessment Program
●
Positive Findings (above & beyond
minimum requirements)
●
Assessments – emergency info, egress,
environment, ergonomics, fire prevention,
electrical, unsafe behavior
Risk Assessment and Management
Getting the Measure of Risk
●
Having understood the potential accident
sequences associated with a hazard (e.g.
using ETA) …
●
Next step is to determine the severity of the
credible accidents identified
●
Remember risk is the product of severity and
probability of an accident
●
Two different approaches:
– Estimate probability of accident, and hence get a
measure of accident risk… then decide whether
estimated risk is acceptable
●
Used in many domains, including rail, military
aerospace
●
Will discuss this approach first, using rail standards as
EN 50126
Accident Severity
●
Accident Severity Categories are qualitative
descriptions of consequences of failure
conditions (hazards)
– considering likely impact
Severity Level | Consequence to Persons or Environment | Consequence to Service
Catastrophic | Fatalities and/or multiple severe injuries and/or major damage to the environment | (none listed)
Critical | Single fatality and/or severe injury and/or significant damage to the environment | Loss of a major system
Marginal | Minor injury and/or significant threat to the environment | Severe system(s) damage
Insignificant | Possible minor injury | Minor system damage
Accident Probability
Next, estimate (predict) accident probability
●
Use historical results, analysis, and engineering judgment to
determine appropriate qualitative probability category
●
Note we may have to consider both
– how likely hazard is to arise
– how likely hazard is to develop into accident
EN 50126
Category | Description
Frequent | Likely to occur frequently. The hazard will be continually experienced.
Probable | Will occur several times. The hazard can be expected to occur often.
Occasional | Likely to occur several times. The hazard can be expected to occur several times.
Remote | Likely to occur sometime in the system lifecycle. The hazard can reasonably be expected to occur.
Improbable | Unlikely to occur, but possible. It can be assumed that the hazard will exceptionally occur.
Incredible | Extremely unlikely to occur. It can be assumed that the hazard may not occur.
Classifying Risk
●
Having assigned severity and probability
associated with hazard consequences …
●
Next step is to use a Hazard Risk Matrix to
classify the risk
EN 50126
Risk levels by frequency of occurrence of a hazardous event (rows) and severity level of hazard consequence (columns)
Frequency | Insignificant | Marginal | Critical | Catastrophic
Frequent | Undesirable | Intolerable | Intolerable | Intolerable
Probable | Tolerable | Undesirable | Intolerable | Intolerable
Occasional | Negligible | Undesirable | Undesirable | Intolerable
Remote | Negligible | Tolerable | Undesirable | Undesirable
Improbable | Negligible | Negligible | Tolerable | Tolerable
Incredible | Negligible | Negligible | Negligible | Negligible
Accepting Risk
Reasoning about risk
●
Using the Hazard Risk Index (HRI), it is now possible to say, e.g.
Risk(Hazard H1) > Risk(Hazard H2)
●
In order to say what is acceptable /
unacceptable, must provide an interpretation,
e.g.
EN 50126
Risk Category | Actions to be applied against each category
Intolerable | Shall be eliminated
Undesirable | Shall only be accepted when risk reduction is impracticable and with the agreement of the Railway Authority or the Safety Regulatory Authority, as appropriate
Tolerable | Acceptable with adequate control and with the agreement of the Railway Authority
Negligible | Acceptable with the agreement of the Railway Authority
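The classification and interpretation steps above amount to two table lookups. The sketch below transcribes the EN 50126 risk matrix and action table from these slides into Python dictionaries; the names `RISK_MATRIX`, `ACTIONS`, `classify_risk` and `required_action` are hypothetical helpers introduced only for illustration.

```python
# Severity levels in the column order used by the risk matrix.
SEVERITIES = ("Insignificant", "Marginal", "Critical", "Catastrophic")

# Rows: frequency of occurrence. Values: risk level for each severity column.
RISK_MATRIX = {
    "Frequent": ("Undesirable", "Intolerable", "Intolerable", "Intolerable"),
    "Probable": ("Tolerable", "Undesirable", "Intolerable", "Intolerable"),
    "Occasional": ("Negligible", "Undesirable", "Undesirable", "Intolerable"),
    "Remote": ("Negligible", "Tolerable", "Undesirable", "Undesirable"),
    "Improbable": ("Negligible", "Negligible", "Tolerable", "Tolerable"),
    "Incredible": ("Negligible", "Negligible", "Negligible", "Negligible"),
}

# Interpretation of each risk category, as given on the slide.
ACTIONS = {
    "Intolerable": "Shall be eliminated",
    "Undesirable": (
        "Shall only be accepted when risk reduction is impracticable and with "
        "the agreement of the Railway Authority or the Safety Regulatory "
        "Authority, as appropriate"
    ),
    "Tolerable": (
        "Acceptable with adequate control and with the agreement of the "
        "Railway Authority"
    ),
    "Negligible": "Acceptable with the agreement of the Railway Authority",
}


def classify_risk(frequency: str, severity: str) -> str:
    """Look up the risk level for a frequency category and a severity level."""
    return RISK_MATRIX[frequency][SEVERITIES.index(severity)]


def required_action(frequency: str, severity: str) -> str:
    """Return the action associated with the classified risk level."""
    return ACTIONS[classify_risk(frequency, severity)]


print(classify_risk("Remote", "Critical"))
print(required_action("Remote", "Critical"))
```

Keeping the matrix as data rather than nested conditionals makes it easy to review against the standard and to swap in a different matrix for another domain.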
Managing Risk
Risk Resolution
●
Can associate objectives or actions with risk
class, e.g.
– technologies used
– development processes
– assessment criteria
●
Example, for “undesirable” risk, might decide
– no single point of failure shall lead to system
accident
– probability of fatality must be < 1x10^-8 per hour
– failure behaviour over time (lifetime of system)
ARP 4761
Determining Risk - Civil Aerospace Style 1
Start with determination of severity
●
very similar to rail categories
Determining Risk - Civil Aerospace Style 2
●
When severity has been determined, can set
objectives (requirements) for risk control
– primarily boundaries on acceptable probability of
failure condition (hazard)
Severity Classification | Probability Objective (Descriptive) | Probability Objective (Quantitative, per flight hour)
Catastrophic | Extremely Improbable | < 10^-9
Hazardous | Extremely Remote | 10^-7 to 10^-9
Major | Remote | 10^-5 to 10^-7
Minor | Reasonably Probable | 10^-3 to 10^-5
(none listed) | Frequent | > 10^-3
Adapted from
ARP 4761
Determining Risk - Civil Aerospace Style 3
For civil aerospace, severity-related objectives are set in standards
●
easy to work with
●
unambiguous
– provided you can agree on standardised and
objective measures of severity!
BUT
●
Need to understand that direct mapping from
severity to probability objectives is based on
important assumption:
Determining Risk - Civil Aerospace Style 4
Where does acceptable risk come from?
●
in principle, requirements reflect “what risk the
public is willing to accept”
– risk (A) = probability (A) * severity (A)
– level of acceptable risk hard to determine, and
subjective
●
in practice, certification bodies (airworthiness
authorities) act as surrogates for the public
– “bottom line” is hull loss rate
– civil aviation hull loss rate target is currently 10^-7 per flying hour
●
for comparison, military aviation (UK) hull loss rate
Determining Risk - Civil Aerospace Style 5
●
Has further implications:
– implicit assumption about number of catastrophic
failure conditions on an aircraft
– also implicit assumption about how probable
failure condition is to actually develop into an
accident
●
Example:
– probability objective (target) for catastrophic failure
condition is < 10^-9 per flight hour
– target hull loss rate is < 10^-7 per flight hour
– implies either a maximum of 100 catastrophic
failure conditions on an aircraft, assuming all
occurrences of catastrophic failure conditions will
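The figure of 100 follows from dividing the two targets. The sketch below reproduces that arithmetic with exact fractions. It rests on assumptions made for this illustration: failure conditions are treated as independent with rates that simply add, and the parameter `p_accident_given_condition` (a hypothetical name) is the probability that a failure condition actually develops into a hull loss.

```python
from fractions import Fraction


def max_failure_conditions(hull_loss_target, per_condition_objective,
                           p_accident_given_condition=Fraction(1)):
    """Number of catastrophic failure conditions that fit within the target.

    Assumes each condition contributes
    per_condition_objective * p_accident_given_condition to the hull loss rate
    and that the contributions add up.
    """
    return hull_loss_target / (per_condition_objective * p_accident_given_condition)


hull_loss_target = Fraction(1, 10**7)        # 10^-7 per flight hour
catastrophic_objective = Fraction(1, 10**9)  # 10^-9 per flight hour

# Every occurrence of a catastrophic failure condition is counted as a hull loss.
print(max_failure_conditions(hull_loss_target, catastrophic_objective))

# Illustrative alternative: only 1 in 10 occurrences develops into an accident.
print(max_failure_conditions(hull_loss_target, catastrophic_objective,
                             Fraction(1, 10)))
```

With the first assumption the budget allows 100 catastrophic failure conditions; relaxing it to one accident per ten occurrences raises the count tenfold, which shows how strongly the result depends on the implicit assumptions.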
Determining Risk - Civil Aerospace Style 6
●
Note that objective of probability per flying hour has its problems…
●
Consider:
– histogram shows accidents / time
– 1.8% of accidents occur in load / taxi / unload
The ALARP Principle 1
ALARP = As Low As Reasonably Practicable
INTOLERABLE region: Risk cannot be justified on any grounds
THE ALARP (As Low As Reasonably Practicable) REGION: Risk is undertaken only if benefit is desired
– TOLERABLE only if risk reductions are impracticable or cost grossly disproportionate to the improvement gained
– TOLERABLE if cost of reduction would exceed improvement gained
BROADLY ACCEPTABLE REGION
NEGLIGIBLE RISK
The ALARP Principle 2
●
Provides an interpretation of identified risks
●
Pragmatic – although you can always spend
more money to improve safety, it is not always
cost-effective
●
However, “cost-effectiveness” introduces
ambiguity
●
Regions of tolerability defined by regulatory
domain and customer
●
Approach is often implicit in the management
of safety-critical projects anyway
Risk Reduction Flowchart 1
Identify and determine risk associated with
identified hazards
System Design -> Hazard Identification -> Hazard Risk (Severity/Probability) Established
Stage: IDENTIFY HAZARD and RISK
Risk Reduction Flowchart 2
System Design -> Hazard Identification -> Hazard Risk (Severity/Probability) Established -> Risk Measured Against HRI Matrix Criteria -> Risk Acceptable? (No / Yes)
Stages: Identify Hazard and Risk, ASSESS RISK
Risk Reduction Flowchart 3
System Design -> Hazard Identification -> Hazard Risk (Severity/Probability) Established -> Risk Measured Against HRI Matrix Criteria -> Risk Acceptable?
No: Apply Re-design Precedence Criteria
1. Redesign to eliminate hazard, or reduce likelihood
2. Incorporate mitigation, e.g. safety devices
3. Provide warnings
4. Develop procedures and training
Yes: Continue design. Document analysis and justification
Operator / Crew Training Required
Stages: Identify Hazard and Risk, Assess Risk, TAKE ACTION
Precedence in Risk Reduction 1
●
Redesign to eliminate risk
– Best where practical
●
Change in operational role, or removal of hazardous
material
●
Redesign to reduce hazard likelihood
– Select architecture or components
●
Duplex or triplex or …
●
Higher integrity components, with lower failure rates
●
Incorporate mitigation to reduce impact of
failures
– Automated protection, e.g. pressure relief valves
– Where incorporated, need to check periodically
Precedence in Risk Reduction 2
●
Provide warning devices
– Detect the hazardous condition and warn
operators
●
e.g. indicate that landing gear has not fully deployed
●
e.g. to evacuate building due to fire or fumes
●
Provide procedures and training
– Reduce likelihood of hazard, or mitigate
●
may involve use of personal protective equipment
– Do not assume procedures are enough by
themselves
●
consider evolution of power guillotine regulations
●
Precedence order
Hazard Severity Categories (columns) by Frequency of Occurrence (rows)
Frequency of Occurrence | I CATASTROPHIC | II CRITICAL | III MARGINAL | IV NEGLIGIBLE
A FREQUENT | HIGH | HIGH | HIGH | MEDIUM
B PROBABLE | HIGH | HIGH | MEDIUM | LOW
C OCCASIONAL | HIGH | HIGH | MEDIUM | LOW
D REMOTE | HIGH | MEDIUM | LOW | LOW
E IMPROBABLE | MEDIUM | LOW | LOW | LOW
(MIL-STD-882C)
Residual Risk - 1
●
Residual Risks are those that cannot be
‘designed out’
– risks inherent to design, where benefit is desirable
●
Significant residual risks must be formally
accepted by the appropriate authority (typically
customer / operator)
●
Can use Decision Authority Matrix, e.g.
Residual Risk 2
Appropriate Decision Authority (From MIL-STD-882C)
HIGH – Service Acquisition Executive
– e.g. no ground collision avoidance on F22 – signed off by 4-star Air Force General
MEDIUM – Program Executive Officer
LOW – Program Manager
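A Decision Authority Matrix of this kind can be sketched as a lookup that chains the MIL-STD-882C risk matrix from the Precedence in Risk Reduction slide with the authorities listed above. The dictionary and function names below are hypothetical, and the letter and numeral codes are simply the row and column labels from that matrix.

```python
# Severity categories in column order: I Catastrophic, II Critical,
# III Marginal, IV Negligible.
SEVERITY_COLUMNS = ("I", "II", "III", "IV")

# Rows: frequency of occurrence, A Frequent to E Improbable.
RISK_MATRIX_882C = {
    "A": ("HIGH", "HIGH", "HIGH", "MEDIUM"),
    "B": ("HIGH", "HIGH", "MEDIUM", "LOW"),
    "C": ("HIGH", "HIGH", "MEDIUM", "LOW"),
    "D": ("HIGH", "MEDIUM", "LOW", "LOW"),
    "E": ("MEDIUM", "LOW", "LOW", "LOW"),
}

# Who must formally accept a residual risk at each level.
DECISION_AUTHORITY = {
    "HIGH": "Service Acquisition Executive",
    "MEDIUM": "Program Executive Officer",
    "LOW": "Program Manager",
}


def acceptance_authority(frequency: str, severity: str):
    """Return (risk level, accepting authority) for a residual risk."""
    level = RISK_MATRIX_882C[frequency][SEVERITY_COLUMNS.index(severity)]
    return level, DECISION_AUTHORITY[level]


print(acceptance_authority("D", "II"))
```

For example, a remote (D) hazard with critical (II) severity is a MEDIUM risk, so under this mapping it would go to the Program Executive Officer for acceptance.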
●
Usually a requirement to document all actions
taken to resolve risk within terms of contract
●
Customer authority can then decide whether
Risk Management Summary
●
Risk Assessment is the process of identifying
the risk associated with system hazards
●
Approach in many sectors (military, rail…) is to
use Hazard Risk Matrix to determine the risk
associated with a hazard from severity and
probability estimates
– then decide on acceptability of risk
●
Alternative approach (Civil Aerospace) is
based around severity
– assumption of fixed level of acceptable risk...
– … so can derive objectives, including probability,
from severity
 
Basic of Instrumentation and Control | Gaurav Singh Rajput
Basic of Instrumentation and Control | Gaurav Singh RajputBasic of Instrumentation and Control | Gaurav Singh Rajput
Basic of Instrumentation and Control | Gaurav Singh Rajput
 
Lifting Plan | Lifting Plan for Different Process Equipment | Gaurav Singh Ra...
Lifting Plan | Lifting Plan for Different Process Equipment | Gaurav Singh Ra...Lifting Plan | Lifting Plan for Different Process Equipment | Gaurav Singh Ra...
Lifting Plan | Lifting Plan for Different Process Equipment | Gaurav Singh Ra...
 
Fire Protection System | Gaurav Singh Rajput
Fire Protection System | Gaurav Singh RajputFire Protection System | Gaurav Singh Rajput
Fire Protection System | Gaurav Singh Rajput
 
Basic Instrumentation Symbols | P&ID | PFD | Gaurav Singh Rajput
Basic Instrumentation Symbols | P&ID | PFD | Gaurav Singh RajputBasic Instrumentation Symbols | P&ID | PFD | Gaurav Singh Rajput
Basic Instrumentation Symbols | P&ID | PFD | Gaurav Singh Rajput
 
Hzard Communication | Gaurav Singh Rajput
Hzard Communication | Gaurav Singh RajputHzard Communication | Gaurav Singh Rajput
Hzard Communication | Gaurav Singh Rajput
 
Risk Analysis & Management | Gaurav Singh Rajput
Risk Analysis & Management | Gaurav Singh Rajput Risk Analysis & Management | Gaurav Singh Rajput
Risk Analysis & Management | Gaurav Singh Rajput
 
Inspection of Fire Fighting Equipments | NFPA Regulations | Gaurav Singh Rajput
Inspection of Fire Fighting Equipments | NFPA Regulations | Gaurav Singh RajputInspection of Fire Fighting Equipments | NFPA Regulations | Gaurav Singh Rajput
Inspection of Fire Fighting Equipments | NFPA Regulations | Gaurav Singh Rajput
 
Basic LNG Training | Liquefied Natural Gas | Gaurav Singh Rajput
Basic LNG Training | Liquefied Natural Gas | Gaurav Singh RajputBasic LNG Training | Liquefied Natural Gas | Gaurav Singh Rajput
Basic LNG Training | Liquefied Natural Gas | Gaurav Singh Rajput
 
Noise Analysis | Gaurav Singh Rajput
Noise Analysis | Gaurav Singh Rajput Noise Analysis | Gaurav Singh Rajput
Noise Analysis | Gaurav Singh Rajput
 
BBS - Behaviour Based Safety | Gaurav Singh Rajput
BBS - Behaviour Based Safety | Gaurav Singh RajputBBS - Behaviour Based Safety | Gaurav Singh Rajput
BBS - Behaviour Based Safety | Gaurav Singh Rajput
 
HAZID | Hazard Indentification | Gaurav Singh Rajput
HAZID | Hazard Indentification | Gaurav Singh RajputHAZID | Hazard Indentification | Gaurav Singh Rajput
HAZID | Hazard Indentification | Gaurav Singh Rajput
 
SIL Awareness | Introduction to Safety Life-Cycle | IEC - 61508 & IEC- 61511 ...
SIL Awareness | Introduction to Safety Life-Cycle | IEC - 61508 & IEC- 61511 ...SIL Awareness | Introduction to Safety Life-Cycle | IEC - 61508 & IEC- 61511 ...
SIL Awareness | Introduction to Safety Life-Cycle | IEC - 61508 & IEC- 61511 ...
 
Calculating Scaffold Loads | Gaurav Singh Rajput
Calculating Scaffold Loads | Gaurav Singh RajputCalculating Scaffold Loads | Gaurav Singh Rajput
Calculating Scaffold Loads | Gaurav Singh Rajput
 
Inspection of Pressure Vessels as per ASME Sec VIII Division -1
Inspection of Pressure Vessels as per ASME Sec VIII Division -1Inspection of Pressure Vessels as per ASME Sec VIII Division -1
Inspection of Pressure Vessels as per ASME Sec VIII Division -1
 
Process Safety | Process Safety Management | PSM | Gaurav Singh Rajput
Process Safety | Process Safety Management | PSM | Gaurav Singh RajputProcess Safety | Process Safety Management | PSM | Gaurav Singh Rajput
Process Safety | Process Safety Management | PSM | Gaurav Singh Rajput
 
Pigging | Pipeline Cleaning | Pigging Safety | Gaurav Singh Rajput
Pigging | Pipeline Cleaning | Pigging Safety | Gaurav Singh RajputPigging | Pipeline Cleaning | Pigging Safety | Gaurav Singh Rajput
Pigging | Pipeline Cleaning | Pigging Safety | Gaurav Singh Rajput
 
Process Safety Awareness | PSM | Gaurav Singh Rajput
Process Safety Awareness | PSM | Gaurav Singh RajputProcess Safety Awareness | PSM | Gaurav Singh Rajput
Process Safety Awareness | PSM | Gaurav Singh Rajput
 
Forklift Safety | Gaurav Singh Rajput
Forklift Safety | Gaurav Singh RajputForklift Safety | Gaurav Singh Rajput
Forklift Safety | Gaurav Singh Rajput
 

Recently uploaded

Digital Communication Essentials: DPCM, DM, and ADM .pptx
Digital Communication Essentials: DPCM, DM, and ADM .pptxDigital Communication Essentials: DPCM, DM, and ADM .pptx
Digital Communication Essentials: DPCM, DM, and ADM .pptx
pritamlangde
 
Integrated Test Rig For HTFE-25 - Neometrix
Integrated Test Rig For HTFE-25 - NeometrixIntegrated Test Rig For HTFE-25 - Neometrix
Integrated Test Rig For HTFE-25 - Neometrix
Neometrix_Engineering_Pvt_Ltd
 
1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf
1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf
1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf
AldoGarca30
 
Hospital management system project report.pdf
Hospital management system project report.pdfHospital management system project report.pdf
Hospital management system project report.pdf
Kamal Acharya
 
Standard vs Custom Battery Packs - Decoding the Power Play
Standard vs Custom Battery Packs - Decoding the Power PlayStandard vs Custom Battery Packs - Decoding the Power Play
Standard vs Custom Battery Packs - Decoding the Power Play
Epec Engineered Technologies
 

Recently uploaded (20)

Tamil Call Girls Bhayandar WhatsApp +91-9930687706, Best Service
Tamil Call Girls Bhayandar WhatsApp +91-9930687706, Best ServiceTamil Call Girls Bhayandar WhatsApp +91-9930687706, Best Service
Tamil Call Girls Bhayandar WhatsApp +91-9930687706, Best Service
 
Hostel management system project report..pdf
Hostel management system project report..pdfHostel management system project report..pdf
Hostel management system project report..pdf
 
Digital Communication Essentials: DPCM, DM, and ADM .pptx
Digital Communication Essentials: DPCM, DM, and ADM .pptxDigital Communication Essentials: DPCM, DM, and ADM .pptx
Digital Communication Essentials: DPCM, DM, and ADM .pptx
 
Basic Electronics for diploma students as per technical education Kerala Syll...
Basic Electronics for diploma students as per technical education Kerala Syll...Basic Electronics for diploma students as per technical education Kerala Syll...
Basic Electronics for diploma students as per technical education Kerala Syll...
 
Linux Systems Programming: Inter Process Communication (IPC) using Pipes
Linux Systems Programming: Inter Process Communication (IPC) using PipesLinux Systems Programming: Inter Process Communication (IPC) using Pipes
Linux Systems Programming: Inter Process Communication (IPC) using Pipes
 
School management system project Report.pdf
School management system project Report.pdfSchool management system project Report.pdf
School management system project Report.pdf
 
Integrated Test Rig For HTFE-25 - Neometrix
Integrated Test Rig For HTFE-25 - NeometrixIntegrated Test Rig For HTFE-25 - Neometrix
Integrated Test Rig For HTFE-25 - Neometrix
 
Theory of Time 2024 (Universal Theory for Everything)
Theory of Time 2024 (Universal Theory for Everything)Theory of Time 2024 (Universal Theory for Everything)
Theory of Time 2024 (Universal Theory for Everything)
 
Jaipur ❤CALL GIRL 0000000000❤CALL GIRLS IN Jaipur ESCORT SERVICE❤CALL GIRL IN...
Jaipur ❤CALL GIRL 0000000000❤CALL GIRLS IN Jaipur ESCORT SERVICE❤CALL GIRL IN...Jaipur ❤CALL GIRL 0000000000❤CALL GIRLS IN Jaipur ESCORT SERVICE❤CALL GIRL IN...
Jaipur ❤CALL GIRL 0000000000❤CALL GIRLS IN Jaipur ESCORT SERVICE❤CALL GIRL IN...
 
1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf
1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf
1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf
 
Double Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torqueDouble Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torque
 
Computer Networks Basics of Network Devices
Computer Networks  Basics of Network DevicesComputer Networks  Basics of Network Devices
Computer Networks Basics of Network Devices
 
Hospital management system project report.pdf
Hospital management system project report.pdfHospital management system project report.pdf
Hospital management system project report.pdf
 
Unit 4_Part 1 CSE2001 Exception Handling and Function Template and Class Temp...
Unit 4_Part 1 CSE2001 Exception Handling and Function Template and Class Temp...Unit 4_Part 1 CSE2001 Exception Handling and Function Template and Class Temp...
Unit 4_Part 1 CSE2001 Exception Handling and Function Template and Class Temp...
 
Ghuma $ Russian Call Girls Ahmedabad ₹7.5k Pick Up & Drop With Cash Payment 8...
Ghuma $ Russian Call Girls Ahmedabad ₹7.5k Pick Up & Drop With Cash Payment 8...Ghuma $ Russian Call Girls Ahmedabad ₹7.5k Pick Up & Drop With Cash Payment 8...
Ghuma $ Russian Call Girls Ahmedabad ₹7.5k Pick Up & Drop With Cash Payment 8...
 
💚Trustworthy Call Girls Pune Call Girls Service Just Call 🍑👄6378878445 🍑👄 Top...
💚Trustworthy Call Girls Pune Call Girls Service Just Call 🍑👄6378878445 🍑👄 Top...💚Trustworthy Call Girls Pune Call Girls Service Just Call 🍑👄6378878445 🍑👄 Top...
💚Trustworthy Call Girls Pune Call Girls Service Just Call 🍑👄6378878445 🍑👄 Top...
 
Thermal Engineering -unit - III & IV.ppt
Thermal Engineering -unit - III & IV.pptThermal Engineering -unit - III & IV.ppt
Thermal Engineering -unit - III & IV.ppt
 
S1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptx
S1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptxS1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptx
S1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptx
 
HOA1&2 - Module 3 - PREHISTORCI ARCHITECTURE OF KERALA.pptx
HOA1&2 - Module 3 - PREHISTORCI ARCHITECTURE OF KERALA.pptxHOA1&2 - Module 3 - PREHISTORCI ARCHITECTURE OF KERALA.pptx
HOA1&2 - Module 3 - PREHISTORCI ARCHITECTURE OF KERALA.pptx
 
Standard vs Custom Battery Packs - Decoding the Power Play
Standard vs Custom Battery Packs - Decoding the Power PlayStandard vs Custom Battery Packs - Decoding the Power Play
Standard vs Custom Battery Packs - Decoding the Power Play
 

Accident & Incident Investigation | Gaurav Singh Rajput

  • 2. 2/210 Objectives of this Section ● To define the reasons for investigating accidents and incidents. ● To outline the process for effectively investigating accidents and incidents. ● To facilitate an effective investigation.
  • 3. 3/210 Accident Investigation ● Important part of any safety management system. Highlights the reasons why accidents occur and how to prevent them. ● The primary purpose of accident investigations is to improve health and safety performance by: - Exploring the reasons for the event and identifying both the immediate and underlying causes; - Identifying remedies to improve the health and safety management system by improving risk control, preventing a recurrence and reducing financial losses.
  • 4. 4/210 What to Investigate? ● All accidents, whether major or minor, are caused. ● Serious accidents have the same root causes as minor accidents, as do incidents with a potential for serious loss. It is these root causes that bring about the accident; the severity is often a matter of chance. ● Accident studies have shown that there is a consistently greater number of less serious accidents than serious accidents and, in the same way, a greater number of incidents than accidents.
  • 5. 5/210 Many accident ratio studies have been undertaken and the one shown below is based on studies carried out by the Health & Safety Executive. 1 major injury or illness : 7 minor injuries or illnesses : 189 non-injury accidents/illnesses.
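The 1 : 7 : 189 ratio on this slide can be turned into a small calculation. The sketch below is only an illustration: the function name `expected_counts` and the example figure of 14 minor injuries are assumptions, and the ratio is an average drawn from studies, not a prediction for any one site.

```python
from fractions import Fraction

# Ratio quoted on the slide: 1 major : 7 minor : 189 non-injury events.
HSE_RATIO = {"major": 1, "minor": 7, "non_injury": 189}


def expected_counts(observed_tier: str, observed_count: int) -> dict[str, Fraction]:
    """Scale the ratio so that `observed_tier` equals `observed_count`.

    Hypothetical helper: it only rescales the published ratio and says
    nothing about how likely any individual event is.
    """
    scale = Fraction(observed_count, HSE_RATIO[observed_tier])
    return {tier: weight * scale for tier, weight in HSE_RATIO.items()}


# Example: a site that recorded 14 minor injuries in a period.
counts = expected_counts("minor", 14)
for tier, value in counts.items():
    print(f"{tier}: {float(value):.0f}")
```

If far fewer non-injury events are on record than the scaled ratio suggests, one possible reading is that incidents at the bottom of the pyramid are going unreported.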
  • 6. 6/210 Accident Studies ● In all cases the ‘non injury’ incidents had the potential to become events with more serious consequences. ● Such ratios clearly demonstrate that safety effort should be aimed at all accidents including unsafe practices at the bottom of the pyramid, with a resulting improvement in upper tiers. ● Peterson (1978) in defining the principles of safety management says that “an unsafe act, an unsafe condition, an accident are symptoms of something wrong within the management’s system.”
  • 7. 7/210 Accident Studies ● All events represent a degree of failure in control and are potential learning experiences. It therefore follows that all accidents should be investigated to some extent. ● This extent should be determined by the loss potential, rather than just the immediate effect.
  • 8. 8/210 Stages in an Accident/Incident Investigation The stages in an accident/incident investigation are, in order: 1. Deal with immediate risks. 2. Select the level of investigation. 3. Investigate the event. 4. Record and analyse the results. 5. Review the process.
  • 9. 9/210 Dealing with Immediate Risks ● When accidents and incidents occur, immediate action may be necessary to: - Make the situation safe and prevent further injury. - Help, treat and if necessary rescue injured persons. ● An effective response can only be made if it has been planned for in advance.
  • 11. 11/210 Selecting the level of investigation The greatest effort should be put into events: - involving severe injuries, ill-health or loss; - which could have caused much greater harm or damage. These types of accidents and incidents demand more careful investigation and management time. This can usually be achieved by: - looking more closely at the underlying causes of significant events; - assigning the responsibility for the investigation of more significant events to more senior managers.
  • 12. 12/210 Investigating the Event The purpose of investigations is to establish: ● The way things were and how they came to be. ● What happened – the sequence of events that led to the outcome. ● Why things happened as they did, analysing both the immediate and underlying causes. ● What needs to be done to avoid a repetition and how this can be achieved.
  • 13. 13/210 A few sources should give the investigator all the information needed. Observation: information from physical sources including: • Premises and place of work • Access & egress • Plant & substances in use • Location & relationship of physical particulars • Any post-event checks, sampling or reconstruction. Documents: information from: • Written instructions, procedures, risk assessments, policies • Records of earlier inspections, tests, examinations and surveys. Interviews: information from: • Those involved and their line management • Witnesses • Those observed or involved prior to the event, e.g. inspection & maintenance staff. The evidence is then assessed by: • Checking reliability and accuracy • Identifying conflicts and resolving differences • Identifying gaps in evidence
  • 14. 14/210 Interviews ● Interviewing the person(s) involved and witnesses to the accident is of prime importance, ideally in familiar surroundings so as not to make the person uncomfortable. ● The interview style is important, with emphasis on prevention rather than blame. ● The person(s) should give an account of what happened in their own terms rather than the investigator's.
  • 15. 15/210 Interviews ● Interviews should be held separately to stop people from influencing each other. ● Questions should not be intimidating; otherwise the investigator will be seen as aggressive and as reflecting a blame culture.
  • 16. 16/210 Observation The accident site should be inspected as soon as possible after the accident. Particular attention should be given to: • Positions of people. • Personal protective equipment (PPE). • Tools and equipment, plant or substances in use. • Orderliness/Tidiness.
  • 17. 17/210 Documents Documentation to be looked at includes: ● Written instructions, procedures and risk assessments which should have been in operation and followed. The validity of these documents may need to be checked by interview. The main points to look for are: - Are they adequate/satisfactory? - Were they followed on this occasion? - Were people trained/competent to follow them? ● Records of inspections, tests, examinations and surveys undertaken before the event. These provide information on how and why the circumstances leading to the event arose.
  • 18. 18/210 Determining Causes ● Collect all information and facts which surround the accident. ● Immediate causes are usually obvious and easy to find. They are brought about by unsafe acts and conditions and are the ACTIVE FAILURES. Unsafe acts show poor safety attitudes and indicate a lack of proper training. ● These unsafe acts and conditions are brought about by the so-called ‘root causes’. These are the LATENT FAILURES and are brought about by failures in organisation and the management’s safety system.
  • 19. 19/210 Determine what changes are needed The investigation should determine what control measures were absent, inadequate or not implemented and so generate remedial action for implementation to correct this.
  • 20. 20/210 Generally, remedial actions should follow the hierarchy of risk control: ● Eliminate risks by substituting the dangerous with the inherently less dangerous. ● Combat risks at source by engineering controls and giving collective protective measures priority. ● Minimise risk by designing suitable systems of working. ● Use PPE as a last resort.
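The hierarchy of risk control is an ordering, so proposed remedial actions can be sorted by it. The following is a minimal sketch under assumptions: the `Control` enum, the `rank_actions` helper and the example actions are hypothetical and are not part of the original slides.

```python
from enum import IntEnum


class Control(IntEnum):
    """Hierarchy of risk control from the slide; lower value = preferred."""
    ELIMINATE = 1            # substitute the dangerous with the less dangerous
    COMBAT_AT_SOURCE = 2     # engineering controls, collective protection
    SAFE_SYSTEM_OF_WORK = 3  # minimise risk through suitable systems of working
    PPE = 4                  # last resort


def rank_actions(actions: list[tuple[str, Control]]) -> list[tuple[str, Control]]:
    """Return the actions with the most preferred control type first."""
    return sorted(actions, key=lambda action: action[1])


# Hypothetical remedial actions proposed after an investigation.
proposed = [
    ("Issue cut-resistant gloves", Control.PPE),
    ("Fit a fixed guard to the machine", Control.COMBAT_AT_SOURCE),
    ("Replace solvent with a water-based cleaner", Control.ELIMINATE),
]

ranked = rank_actions(proposed)
for description, control in ranked:
    print(f"{control.name}: {description}")
```

Sorting only shows which kind of control is preferred; whether a given action is practicable still has to be judged case by case.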
  • 22. 22/210 Recording & Analysing the Results ● Recorded in a consistent and systematic manner. ● Provides a historical record of the accident. ● Analysis of the causes and recommended preventative and protective measures should be listed. ● Completed as soon after the accident as possible. ● Information on the accident and remedial actions should be passed to all supervisors. ● Appropriate preventative measures may also have to be implemented by such supervisors. ● Investigation reports and accident statistics should be analysed from time to time to identify common causes, features and trends that may not be apparent from looking at events in isolation.
  • 23. 23/210 Reviewing the Process Reviewing the accident/incident investigation process should consider: – The results of investigations and analysis. – The operation of the investigation system (in terms of quality and effectiveness). Line managers should follow through and action the findings of investigations and analysis. Follow-up systems should be established where necessary to keep progress under control.
  • 24. 24/210 The investigation system should be examined from time to time to check that it consistently delivers information in accordance with the stated objectives and standards. This usually requires: ● Checking samples of investigation forms to verify the standard of investigation and the judgements made about causation and prioritisation of remedial actions. ● Checking the numbers of incidents, near misses, injury and ill-health events; ● Checking that all events are being reported.
  • 25. 25/210 What is your definition of an “Accident”?
  • 26. 26/210 What is an Accident? - an unplanned event - an unplanned incident involving injury or fatality - a series of events culminating in an unplanned and unforeseen event
  • 27. 27/210 How do Accidents occur? - Accidents (with or without injuries) occur when a series of unrelated events coincide at a certain time and place. - This can range from a few events to a series of a dozen or more. (Because the coincidence of the series of events is a matter of luck, actual accidents happen only infrequently.)
  • 28. 28/210 Unsafe Acts - An unsafe act occurs in approximately 85%-95% of all analyzed accidents with injuries - An unsafe act is usually the last of a series of events before the accident occurs (although it could occur at any step in the series) - By stopping or eliminating the unsafe act, we can stop the accident from occurring
  • 29. 29/210 What is an Accident Investigation? ● A systematic approach to the identification of causal factors and implementation of corrective actions without placing blame on or finding personal fault. The information collected during an investigation is essential for determining trends and taking appropriate steps to prevent future accidents.
  • 30. 30/210 Which Accidents should be Recorded or Reported? ALL accidents (including illnesses) shall be recorded and reported through the established procedures and guidance
  • 31. 31/210 Why Investigate Accidents? ● Determine the cause ● Develop and implement corrective actions ● Document the events ● Meet legal requirements Primary Focus: PREVENT REOCCURRENCE!!!
  • 32. 32/210 Accident vs. Near-Miss Accident : Any undesired, unplanned event arising out of a given work-related task which results in physical injury/ illness or damage to property. Near-Miss : Events which did not result in injury/illness or damage but had the potential to do so.
  • 33. 33/210 Accident Ratio Study 1 Serious or Disabling; 10 Minor Injuries; 30 Property Damage; 600 Accidents with no visible injury or damage; 6000 Unsafe Acts or Conditions
  • 34. 34/210 Accident Causes ● Unsafe Act - an act by the injured person or another person (or both) which caused the accident, and/or ● Unsafe Condition - some environmental or hazardous situation which caused the accident independent of the employee
  • 35. 35/210 Accident Causation Model ● Results of the accident - physical harm - property damage ● Incident Occurrence - contact with - type ● Immediate causes - practices - conditions ● Basic causes - personal factors - job factors - supervisory performance - management policy and decisions
  • 36. 36/210 Results of the Accident ● Physical Harm - catastrophic (multiple deaths) - single death - disabling - serious - minor ● Property Damage - catastrophic - major - serious - minor
  • 37. 37/210 Incident Occurrence ● Type - struck by - struck against - slip, trip - fell from - caught on - fell on same level - caught in - overexertion ● Contact with - electricity - equipment - noise - vibration - hazmat - heat/cold - radiation - animals/insects
  • 38. 38/210 Immediate Causes ● Practices - operating without authority - use equipment improperly - not using PPE when required - correct lifting procedures not established - drinking or drug use - horseplay - equipment not properly secured
  • 39. 39/210 Immediate Causes (cont’d) ● Conditions - ineffective guards - unserviceable tools and equipment - inadequate warning systems - bad housekeeping practices - poor work space illumination - unhealthy work environment
  • 40. 40/210 Basic Causes ● Personal Factors - lack of knowledge or skill - improper motivation - physical or mental condition - literacy or ability ● Job Factors - Physical environment - sub-standard equipment - abnormal usage - wear and tear - inadequate standards - design and maintenance
  • 41. 41/210 Basic Causes (cont’d) ● Supervisory Performance - inadequate instructions - failure of SOPs - rules not enforced - hazards not corrected - devices not provided ● Management Policy and Decisions - set measurable standards - measure work in progress - evaluate work vs. standards - correct performance No animals were hurt as a result of this accident
  • 42. 42/210 Severity of Incident ● Major - Employee fatality, - Hospitalization of 3 or more employees, - Permanent employee disability, - Five or more lost workdays, - Conditions that could pose an imminent threat of serious injury/illness to other employees, - Property losses in excess of $1 Million ● Minor - All other (less serious) incidents and unsafe conditions reported by employees
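The major/minor criteria on this slide read like a rule set, so they can be sketched as a small classifier. This is an illustrative sketch only: the `Incident` record, its field names and the `classify` function are assumptions introduced here, and the thresholds are copied from the slide as stated.

```python
from dataclasses import dataclass


@dataclass
class Incident:
    """Hypothetical record of the facts needed to grade an incident."""
    fatality: bool = False
    employees_hospitalized: int = 0
    permanent_disability: bool = False
    lost_workdays: int = 0
    imminent_threat_to_others: bool = False
    property_loss_usd: float = 0.0


def classify(incident: Incident) -> str:
    """Return 'major' if any criterion from the slide is met, else 'minor'."""
    is_major = (
        incident.fatality
        or incident.employees_hospitalized >= 3       # 3 or more hospitalized
        or incident.permanent_disability
        or incident.lost_workdays >= 5                # five or more lost workdays
        or incident.imminent_threat_to_others
        or incident.property_loss_usd > 1_000_000     # in excess of $1 million
    )
    return "major" if is_major else "minor"


print(classify(Incident(lost_workdays=5)))           # meets the lost-workdays rule
print(classify(Incident(employees_hospitalized=2)))  # below every threshold
```

Any single criterion is enough to make an incident major, which is why the conditions are combined with `or`.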
  • 43. 43/210 Who Investigates? ● Major Accidents - NOAA “GO TEAM” Investigation Team - LO Representative - Other agencies such as NTSB, USCG, OSHA ● Minor Accidents - First-Line Supervisor - Site Director or Manager - Site Safety Representative - NOAA SECO (if needed)
  • 44. 44/210 Investigator’s Qualifications ● Technical knowledge ● Objectivity ● Analytical approach ● Familiarity with the job, process or operation ● Tact in communicating ● Intellectual honesty ● Inquisitiveness and curiosity
  • 45. 45/210 When to Investigate? ● Immediately after incident - Witness memories fade - Equipment and clues are moved ● Finish investigation quickly
  • 46. 46/210 What to Investigate? ● All accidents and near-misses - Conduct investigation upon first notification - Keeping the scene intact and recording witness statements early is key to a successful investigation
  • 47. 47/210 Accident Investigation Kit May Include: ● Digital Camera ● Report forms, clipboard, pens ● Barricade tape ● Flashlight ● Tape measure ● Tape recorder ● Personal Protective Equipment (as appropriate)
  • 48. 48/210 The Accident Occurs ● Employee or co-worker immediately reports the accident to a supervisor ● Supervisor secures/assesses the scene to prevent additional injuries to other employees, before assisting the injured employee ● Supervisor treats the injury or seeks medical treatment for the injured ● The accident scene is left intact ● Site safety rep is contacted to assist the supervisor in the investigation of the accident.
  • 49. 49/210 Beginning the Investigation ● Gather investigation members and kit ● Report to the scene ● Look at the big picture ● Record initial observations ● Take pictures
  • 50. 50/210 What’s Involved? ● Who was injured? ● Medication, drugs, or alcohol? ● Was employee ill or fatigued? ● Environmental conditions?
  • 51. 51/210 Witnesses ● Who witnessed the accident? ● Was a supervisor or Team Lead nearby? ● Where were other employees? ● Why didn’t anyone witness the accident (working alone, remote areas)?
  • 52. 52/210 Interviewing Tips ● Discuss what happened leading up to and after the accident ● Encourage witnesses to describe the accident in their own words ● Don’t be defensive or judgmental ● Use open-ended questions ● Do not interrupt the witness
  • 53. 53/210 What was Involved? ● Machine, tool, or equipment ● Chemicals ● Environmental conditions ● Field season prep operations
  • 54. 54/210 Time of Accident ● Date and time? ● Normal shift or working hours? ● Employee coming off a vacation?
  • 55. 55/210 Accident Location ● Work area ● On, under, in, near ● Off-site address ● Doing normal job duties ● Performing non-routine or routine tasks (i.e., properly trained)
  • 56. 56/210 Employee’s Activity ● Motion conducted at time of accident ● Repetitive motion? ● Type of material being handled
  • 57. 57/210 Accident Narrative ● Describe the details so the reader can clearly picture the accident ● Specific body parts affected ● Specific motions of injured employee just before, during, and after accident
  • 58. 58/210 Causal Factors ● Try not to accept single cause theory ● Identify underlying causes (root) ● Primary cause ● Secondary causes - Contributing causes - Effects
  • 59. 59/210 Corrective Actions Taken ● Include immediate interim controls implemented at the time of accident ● Recommended corrective actions - Employee training - Preventive maintenance activities - Better operating procedures - Hazard recognition (ORM) - Management awareness of risks involved
  • 60. 60/210 Immediate Notification ● Supervisor shall complete the NOAA Web Based Accident/ Illness Report Form and submit within 24 hours of incident occurrence (8 hours for major incidents).
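The reporting windows on this slide (24 hours, or 8 hours for major incidents) amount to simple date arithmetic. The sketch below assumes hypothetical names (`REPORT_WINDOW`, `report_due`) and an invented incident time; it does not describe how the NOAA web form itself works.

```python
from datetime import datetime, timedelta

# Submission windows stated on the slide.
REPORT_WINDOW = {
    "major": timedelta(hours=8),
    "minor": timedelta(hours=24),
}


def report_due(occurred_at: datetime, severity: str) -> datetime:
    """Latest time by which the report should be submitted."""
    return occurred_at + REPORT_WINDOW[severity]


# Hypothetical incident time, for illustration only.
occurred = datetime(2024, 3, 1, 14, 30)
print("major incident report due:", report_due(occurred, "major"))
print("minor incident report due:", report_due(occurred, "minor"))
```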
  • 61. 61/210 Accident Analysis Summary ● Investigate accident immediately ● Determine who was involved and who witnessed it ● Ascertain what items or equipment were involved ● Record detailed description ● Determine causal factors ● Implement corrective actions
  • 64. 64/210 1. What is an Accident Investigation? a. A systematic approach to the identification of causal factors and implementation of corrective actions. b. Finding personal fault and placing blame. c. The appropriate steps to prevent future actions. d. The essential step to determine trends and taking action against person or persons at fault.
  • 65. 65/210 2. Which Accidents should be Recorded or Reported? a. Only on the job accidents. b. ALL accidents (including illnesses) shall be recorded and reported. c. Only on the job accidents or illnesses that occur on the job and are reported within 8 hours. d. All accidents shall be recorded and reported.
  • 66. 66/210 3. Why Investigate Accidents? a. To develop and implement corrective actions. b. To document the events. c. The Primary Focus is to PREVENT REOCCURRENCE!!! d. To determine the cause.
  • 67. 67/210 4. Accident vs. Near-Miss? a. Any unplanned event arising out of work that resulted in injury vs. Any event which did not result in injury but had potential to do so. b. Any unsafe work habit vs. Any Hazardous working conditions. c. Any event which warns us of a problem vs. Any circumstances that result in injury or property damage.
  • 68. 68/210 5. Which of the following are the basic areas that are looked at in an Accident Investigation? a. Policies. b. Equipment. c. Training. d. All of the above.
  • 69. 69/210 Accident Investigation Accident analysis is carried out in order to determine the cause or causes of an accident or series of accidents so as to prevent further incidents of a similar kind. It is also known as accident investigation.
  • 70. 70/210 Accident Investigation It may be performed by a range of experts, including forensic scientists, forensic engineers or health and safety advisers. Accident investigators, particularly those in the aircraft industry, are colloquially known as "tin-kickers".
  • 71. 71/210 Sequence Accident analysis is performed in four steps: Fact gathering: After an accident has happened, a forensic process starts to gather all possibly relevant facts that may contribute to understanding the accident.
  • 72. 72/210 Sequence Fact Analysis: After the forensic process has been completed or at least delivered some results, the facts are put together to give a "big picture." The history of the accident is reconstructed and checked for consistency and plausibility.
  • 73. 73/210 Sequence Conclusion Drawing: If the accident history is sufficiently informative, conclusions can be drawn about causation and contributing factors.
  • 74. 74/210 Sequence Counter-measures: In some cases the development of counter-measures is desired or recommendations have to be issued to prevent further accidents of the same kind.
  • 75. 75/210 Methods There exist numerous forms of Accident Analysis methods. These can be divided into three categories:
  • 76. 76/210 Methods Causal Analysis Causal Analysis uses the principle of causality to determine the course of events. Though people casually speak of a "chain of events", results from Causal Analysis usually have the form of directed acyclic graphs, the nodes being events and the edges the cause-effect relations. Methods of Causal Analysis differ in their respective notion of causation.
  • 77. 77/210 Methods Expert Analysis Expert Analysis relies on the knowledge and experience of field experts. This form of analysis usually lacks a rigorous (formal/semiformal) methodological approach. This usually affects the falsifiability and objectivity of analyses, which matters when conclusions are heavily disputed among experts.
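The directed acyclic graph described on this slide can be sketched in a few lines of Python. This is only an illustration: the events and cause-effect relations below are hypothetical and do not come from any real investigation. The standard-library `graphlib.TopologicalSorter` is used to order events so that every cause precedes its effects; it raises `CycleError` if the relations are not acyclic.

```python
from graphlib import TopologicalSorter

# Hypothetical cause-effect relations: each event maps to its direct causes.
causes = {
    "worker injured": {"guard removed", "machine started unexpectedly"},
    "guard removed": {"no lockout procedure"},
    "machine started unexpectedly": {"no lockout procedure", "faulty switch"},
    "no lockout procedure": set(),
    "faulty switch": set(),
}

# static_order() lists every cause before the events it contributes to.
order = list(TopologicalSorter(causes).static_order())

# Events with no recorded cause are the candidate root causes.
root_causes = sorted(event for event, preds in causes.items() if not preds)

print(order)
print(root_causes)
```

Note that "no lockout procedure" feeds two separate branches, which is why a graph rather than a single chain is the natural shape for the result.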
  • 78. 78/210 Methods Organizational Analysis Organizational Analysis relies on systemic theories of organization. Most theories imply that if a system's behaviour stayed within the bounds of the ideal organization then no accidents can occur.
  • 79. 79/210 Methods Organizational Analysis Organizational Analysis can be falsified and results from analyses can be checked for objectivity. Choosing an organizational theory for accident analysis comes from the assumption that the system to be analysed conforms to that theory.
  • 80. 80/210 Using Digital Photographs to Extract Evidence Once all available data has been collected by accident scene investigators and law enforcement officers, camera matching, photogrammetry or rectification can be used to determine the exact location of physical evidence shown in the accident scene photos.
  • 81. 81/210 Camera matching: Camera matching uses accident scene photos that show various points of evidence. The technique uses CAD software to create a 3-dimensional model of the accident site and roadway surface.
  • 82. 82/210 Camera matching: All survey data and photos are then imported into a three dimensional software package like 3D Studio Max. A virtual camera can then be positioned relative to the 3D roadway surface. Physical evidence is then mapped from the photos onto the 3D roadway to create a three dimensional accident scene drawing.
  • 83. 83/210 Photogrammetry Photogrammetry is used to determine the three-dimensional geometry of an object on the accident scene from the original two dimensional photos.
  • 84. 84/210 Photogrammetry The photographs can be used to extract evidence that may be lost after the accident is cleared. Photographs from several viewpoints are imported into software like PhotoModeler.
  • 85. 85/210 Photogrammetry The forensic engineer can then choose points common to each photo. The software will calculate the location of each point in a three dimensional coordinate system.
  • 86. 86/210 Rectification Photographic rectification is also used to analyze evidence that may not have been measured at the accident scene. Two dimensional rectification transforms a single photograph into a top-down view. Software like PC-Rect can be used to rectify a digital photograph.
  • 87. 87/210 Failure mode and effects analysis
  • 88. 88/210 Failure mode and effects analysis Failure Mode and Effects Analysis (FMEA) was one of the first systematic techniques for failure analysis. It was developed by reliability engineers in the 1950s to study problems that might arise from malfunctions of military systems.
  • 89. 89/210 Failure mode and effects analysis A FMEA is often the first step of a system reliability study. It involves reviewing as many components, assemblies, and subsystems as possible to identify failure modes, and their causes and effects.
  • 90. 90/210 Failure mode and effects analysis For each component, the failure modes and their resulting effects on the rest of the system are recorded in a specific FMEA worksheet. There are numerous variations of such worksheets. A FMEA is mainly a qualitative analysis.
  • 91. 91/210 Failure mode and effects analysis A few different types of FMEA analysis exist, like Functional, Design, and Process FMEA.
  • 92. 92/210 Failure mode and effects analysis Sometimes the FMEA is called FMECA to indicate that Criticality analysis is performed also.
  • 93. 93/210 Failure mode and effects analysis An FMEA is an inductive reasoning (forward logic) single point of failure analysis and is a core task in reliability engineering, safety engineering and quality engineering. Quality engineering is especially concerned with the "Process" (Manufacturing and Assembly) type of FMEA.
  • 94. 94/210 Failure mode and effects analysis A successful FMEA activity helps to identify potential failure modes based on experience with similar products and processes - or based on common physics of failure logic.
  • 95. 95/210 Failure mode and effects analysis It is widely used in development and manufacturing industries in various phases of the product life cycle. Effects analysis refers to studying the consequences of those failures on different system levels.
  • 96. 96/210 Failure mode and effects analysis Functional analyses are needed as an input to determine correct failure modes, at all system levels, both for functional FMEA or Piece-Part (hardware) FMEA.
  • 97. 97/210 Failure mode and effects analysis A FMEA is used to structure Mitigation for Risk reduction based on either failure (mode) effect severity reduction or based on lowering the probability of failure or both.
  • 98. 98/210 Failure mode and effects analysis The FMEA is in principle a full inductive (forward logic) analysis; however, the failure probability can only be estimated or reduced by understanding the failure mechanism.
  • 99. 99/210 Failure mode and effects analysis Ideally this probability shall be lowered to "impossible to occur" by eliminating the (root) causes. It is therefore important to include in the FMEA an appropriate depth of information on the causes of failure (deductive analysis).
  • 100. 100/210 Failure mode and effects analysis The FME(C)A is a design tool used to systematically analyze postulated component failures and identify the resultant effects on system operations. The analysis is sometimes characterized as consisting of two sub-analyses, the first being the failure modes and effects analysis (FMEA), and the second, the criticality analysis (CA).
  • 101. 101/210 Failure mode and effects analysis Successful development of an FMEA requires that the analyst include all significant failure modes for each contributing element or part in the system. FMEAs can be performed at the system, subsystem, assembly, subassembly or part level.
  • 102. 102/210 Failure mode and effects analysis The FMECA should be a living document during development of a hardware design. It should be scheduled and completed concurrently with the design. If completed in a timely manner, the FMECA can help guide design decisions. The usefulness of the FMECA as a design tool and in the decision making process is dependent on the effectiveness and timeliness with which design problems are identified.
  • 103. 103/210 Failure mode and effects analysis Timeliness is probably the most important consideration. In the extreme case, the FMECA would be of little value to the design decision process if the analysis is performed after the hardware is built.
  • 104. 104/210 Failure mode and effects analysis While the FMECA identifies all part failure modes, its primary benefit is the early identification of all critical and catastrophic subsystem or system failure modes so they can be eliminated or minimized through design modification at the earliest point in the development effort.
  • 105. 105/210 Failure mode and effects analysis Therefore, the FMECA should be performed at the system level as soon as preliminary design information is available and extended to the lower levels as the detail design progresses.
  • 106. 106/210 Failure mode and effects analysis Remark: For more complete scenario modelling other types of reliability analysis may be considered, for example fault tree analysis (FTA), a deductive (backward logic) failure analysis that may handle multiple failures within the item and/or external to the item, including maintenance and logistics. It starts at a higher functional / system level. An FTA may use the basic failure mode FMEA records or an effect summary as one of its inputs (the basic events). Interface hazard analysis, human error analysis and others may be added to complete the scenario modelling.
  • 107. 107/210 Functional analysis The analysis may be performed at the functional level until the design has matured sufficiently to identify specific hardware that will perform the functions; then the analysis should be extended to the hardware level. When performing the hardware level FMECA, interfacing hardware is considered to be operating within specification. In addition, each part failure postulated is considered to be the only failure in the system (i.e., it is a single failure analysis).
  • 108. 108/210 Functional analysis In addition to the FMEAs done on systems to evaluate the impact lower level failures have on system operation, several other FMEAs are done. Special attention is paid to interfaces between systems and in fact at all functional interfaces. The purpose of these FMEAs is to assure that irreversible physical and/or functional damage is not propagated across the interface as a result of failures in one of the interfacing units.
  • 109. 109/210 Functional analysis These analyses are done to the piece part level for the circuits that directly interface with the other units. The FMEA can be accomplished without a CA, but a CA requires that the FMEA has previously identified system level critical failures. When both steps are done, the total process is called a FMECA.
  • 110. 110/210 Ground rules The ground rules of each FMEA include a set of project selected procedures; the assumptions on which the analysis is based; the hardware that has been included and excluded from the analysis and the rationale for the exclusions. The ground rules also describe the indenture level of the analysis, the basic hardware status, and the criteria for system and mission success.
  • 111. 111/210 Ground rules Every effort should be made to define all ground rules before the FMEA begins; however, the ground rules may be expanded and clarified as the analysis proceeds. A typical set of ground rules (assumptions) follows:
  • 112. 112/210 Ground rules ● Only one failure mode exists at a time. ● All inputs (including software commands) to the item being analyzed are present and at nominal values. ● All consumables are present in sufficient quantities. ● Nominal power is available
  • 113. 113/210 Benefits Major benefits derived from a properly implemented FMECA effort are as follows:
  • 114. 114/210 Benefits It provides a documented method for selecting a design with a high probability of successful operation and safety.
  • 115. 115/210 Benefits A documented uniform method of assessing potential failure mechanisms, failure modes and their impact on system operation, resulting in a list of failure modes ranked according to the seriousness of their system impact and likelihood of occurrence.
  • 116. 116/210 Benefits Early identification of single failure points (SFPs) and system interface problems, which may be critical to mission success and/or safety. They also provide a method of verifying that switching between redundant elements is not jeopardized by postulated single failures.
  • 117. 117/210 Benefits An effective method for evaluating the effect of proposed changes to the design and/or operational procedures on mission success and safety.
  • 118. 118/210 Benefits A basis for in-flight troubleshooting procedures and for locating performance monitoring and fault-detection devices.
  • 119. 119/210 Benefits Criteria for early planning of tests.
  • 120. 120/210 Basic terms The following covers some basic FMEA terminology. Failure The loss of a function under stated conditions.
  • 121. 121/210 Basic terms Failure mode The specific manner or way by which a failure occurs in terms of failure of the item (being a part or (sub) system) function under investigation; it may generally describe the way the failure occurs. It shall at least clearly describe a (end) failure state of the item (or function in case of a Functional FMEA) under consideration. It is the result of the failure mechanism (cause of the failure mode). For example; a fully fractured axle, a deformed axle or a fully open or fully closed electrical contact are each a separate failure mode.
  • 122. 122/210 Basic terms Failure cause and/or mechanism Defects in requirements, design, process, quality control, handling or part application, which are the underlying cause or sequence of causes that initiate a process (mechanism) that leads to a failure mode over a certain time. A failure mode may have more causes.
  • 123. 123/210 Basic terms Failure cause and/or mechanism For example; "fatigue or corrosion of a structural beam" or "fretting corrosion in an electrical contact" is a failure mechanism and in itself (likely) not a failure mode. The related failure mode (end state) is a "full fracture of structural beam" or "an open electrical contact". The initial Cause might have been "Improper application of corrosion protection layer (paint)" and/or "(abnormal) vibration input from another (possibly failed) system".
  • 124. 124/210 Basic terms / Failure effect Immediate consequences of a failure on operation, function or functionality, or status of some item.
  • 125. 125/210 Indenture levels (bill of material or functional breakdown) An identifier for system level and thereby item complexity. Complexity increases as levels are closer to one.
  • 126. 126/210 Local effect The failure effect as it applies to the item under analysis.
  • 127. 127/210 Next higher level effect The failure effect as it applies at the next higher indenture level.
  • 128. 128/210 End effect The failure effect at the highest indenture level or total system.
  • 129. 129/210 Detection The means of detection of the failure mode by maintainer, operator or built in detection system, including estimated dormancy period (if applicable)
  • 130. 130/210 Risk Priority Number (RPN) Cost (of the event) * Probability (of the event occurring) * Detection (Probability that the event would not be detected before the user was aware of it)
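The RPN product on this slide can be written as a one-line calculation. The sketch below is only illustrative: the 1 to 10 ranking scale and the parameter names are assumptions (the slide does not state a scale), and `severity` stands in for the slide's "Cost (of the event)" ranking.

```python
def risk_priority_number(severity: int, occurrence: int, detection: int) -> int:
    """Multiply the three rankings; each is assumed to lie on a 1-10 scale."""
    for name, value in (("severity", severity),
                        ("occurrence", occurrence),
                        ("detection", detection)):
        if not 1 <= value <= 10:
            raise ValueError(f"{name} ranking must be between 1 and 10, got {value}")
    return severity * occurrence * detection


# Illustrative rankings only, not taken from a real worksheet.
print(risk_priority_number(severity=7, occurrence=3, detection=5))  # 105
```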
  • 131. 131/210 Severity The consequences of a failure mode. Severity considers the worst potential consequence of a failure, determined by the degree of injury, property damage, system damage and/or time lost to repair the failure.
  • 132. 132/210 Remarks / mitigation / actions Additional info, including the proposed mitigation or actions used to lower a risk or justify a risk level or scenario.
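The terms defined on the preceding slides correspond to the columns of an FMEA worksheet. As a rough sketch, one row could be modelled as below; the class name, field names and the example values marked "hypothetical" are assumptions made for illustration, since real worksheets vary.

```python
from dataclasses import dataclass


@dataclass
class FmeaRow:
    """One worksheet line; field names are illustrative, not from a standard."""
    item: str
    failure_mode: str
    failure_cause: str
    local_effect: str
    next_higher_effect: str
    end_effect: str
    detection: str
    severity: str
    remarks: str = ""


row = FmeaRow(
    item="structural beam",
    failure_mode="full fracture of structural beam",
    failure_cause="improper application of corrosion protection layer (paint)",
    local_effect="beam no longer carries load",          # hypothetical
    next_higher_effect="frame section loses stiffness",  # hypothetical
    end_effect="loss of structural integrity",           # hypothetical
    detection="visual inspection during maintenance",    # hypothetical
    severity="catastrophic",                             # hypothetical
)
print(row.failure_mode)
```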
  • 134. 134/210 Probability (P) In this step it is necessary to look at the cause of a failure mode and the likelihood of occurrence. This can be done by analysis, calculations / FEM, looking at similar items or processes and the failure modes that have been documented for them in the past. A failure cause is looked upon as a design weakness. All the potential causes for a failure mode should be identified and documented.
  • 135. 135/210 Probability (P) This should be in technical terms. Examples of causes are: Human errors in handling, Manufacturing induced faults, Fatigue, Creep, Abrasive wear, erroneous algorithms, excessive voltage or improper operating conditions or use (depending on the used ground rules). A failure mode is given a Probability Ranking.
  • 137. 137/210 Severity (S) Determine the Severity for the worst case scenario adverse end effect (state). It is convenient to write these effects down in terms of what the user might see or experience in terms of functional failures. Examples of these end effects are: full loss of function x, degraded performance, functions in reversed mode, too late functioning, erratic functioning, etc.
  • 138. 138/210 Severity (S) Each end effect is given a Severity number (S) from, say, I (no effect) to VI (catastrophic), based on cost and/or loss of life or quality of life. These numbers prioritize the failure modes (together with probability and detectability). Below a typical classification is given. Other classifications are possible. See also hazard analysis.
  • 141. 141/210 Detection (D) The means or method by which a failure is detected, isolated by operator and/or maintainer and the time it may take. This is important for maintainability control (Availability of the system) and it is especially important for multiple failure scenarios.
  • 142. 142/210 Detection (D) This may involve dormant failure modes (e.g. no direct system effect, while a redundant system / item automatically takes over or when the failure is only problematic during specific mission or system states) or latent failures (e.g. deterioration failure mechanisms, like a crack growing in metal that has not yet reached a critical length).
  • 143. 143/210 Detection (D) It should be made clear how the failure mode or cause can be discovered by an operator under normal system operation or if it can be discovered by the maintenance crew by some diagnostic action or automatic built in system test. A dormancy and/or latency period may be entered.
  • 145. 145/210 Detection (D) DORMANCY or LATENCY PERIOD The average time that a failure mode may be undetected may be entered if known. For example: ● During aircraft C Block inspection, preventive or predictive maintenance, X months or X flight hours ● During aircraft B Block inspection, preventive or predictive maintenance, X months or X flight hours ● During Turn-Around Inspection before or after flight (e.g. 8 hours average) ● During in-built system functional test, X minutes ● Continuously monitored, X seconds
  • 146. 146/210 Detection (D) INDICATION If the undetected failure allows the system to remain in a safe / working state, a second failure situation should be explored to determine whether or not an indication will be evident to all operators and what corrective action they may or should take.
  • 147. 147/210 Detection (D) Indications to the operator should be described as follows: Normal. An indication that is evident to an operator when the system or equipment is operating normally. Abnormal. An indication that is evident to an operator when the system has malfunctioned or failed. Incorrect. An erroneous indication to an operator due to the malfunction or failure of an indicator (i.e., instruments, sensing devices, visual or audible warning devices, etc.).
  • 148. 148/210 Detection (D) PERFORM DETECTION COVERAGE ANALYSIS FOR TEST PROCESSES AND MONITORING (From ARP4761 Standard):
  • 149. 149/210 Detection (D) This type of analysis is useful to determine how effective various test processes are at the detection of latent and dormant faults. The method used to accomplish this involves an examination of the applicable failure modes to determine whether or not their effects are detected, and to determine the percentage of failure rate applicable to the failure modes which are detected. The possibility that the detection means may itself fail latent should be accounted for in the coverage analysis as a limiting factor (i.e., coverage cannot be more reliable than the detection means availability).
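The coverage figure described on this slide is the share of the total failure rate that belongs to detected failure modes. The sketch below shows that arithmetic under stated assumptions: the failure modes and rates are invented, and capping the result at the detector's availability with `min` is just one simple reading of "coverage cannot be more reliable than the detection means availability".

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    name: str
    failure_rate: float  # failures per million hours (illustrative values)
    detected: bool       # True if the test process reveals this mode


def detection_coverage(modes, detector_availability=1.0):
    """Fraction of total failure rate that is detected, capped by availability."""
    total = sum(m.failure_rate for m in modes)
    if total == 0:
        raise ValueError("total failure rate must be greater than zero")
    detected = sum(m.failure_rate for m in modes if m.detected)
    return min(detected / total, detector_availability)


modes = [
    FailureMode("sensor stuck high", 3.0, detected=True),
    FailureMode("sensor drift", 1.0, detected=False),
]
print(detection_coverage(modes))                             # 0.75
print(detection_coverage(modes, detector_availability=0.6))  # 0.6
```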
  • 150. 150/210 Detection (D) Inclusion of the detection coverage in the FMEA can lead to each individual failure that would have been one effect category now being a separate effect category due to the detection coverage possibilities. Another way to include detection coverage is for the FTA to conservatively assume that no holes in coverage due to latent failure in the detection method affect detection of all failures assigned to the failure effect category of concern. The FMEA can be revised if necessary for those cases where this conservative assumption does not allow the top event probability requirements to be met.
  • 151. 151/210 Detection (D) After these three basic steps the Risk level may be provided.
  • 152. 152/210 Risk level (P*S) and (D) Risk is the combination of end effect probability and severity, where probability and severity include the effect of non-detectability (dormancy time). This may influence the end effect probability of failure or the worst case effect severity. The exact calculation may not be easy when multiple scenarios (with multiple events) are possible and detectability / dormancy plays a crucial role (as for redundant systems). In that case Fault Tree Analysis and/or Event Trees may be needed to determine exact probability and risk levels.
  • 153. 153/210 Risk level (P*S) and (D) Preliminary Risk levels can be selected from a Risk Matrix, such as one based on Mil. Std. 882. The higher the Risk level, the more justification and mitigation is needed to provide evidence and lower the risk to an acceptable level. High risk should be indicated to higher level management, who are responsible for final decision making.
  • 155. 155/210 Risk level (P*S) and (D) After this step the FMEA has become like a FMECA.
  • 156. 156/210 Timing The FMEA should be updated whenever: ● A new cycle begins (new product/process) ● Changes are made to the operating conditions ● A change is made in the design ● New regulations are instituted ● Customer feedback indicates a problem
  • 157. 157/210 Uses ● Development of system requirements that minimize the likelihood of failures. ● Development of designs and test systems to ensure that the failures have been eliminated or the risk is reduced to an acceptable level. ● Development and evaluation of diagnostic systems ● To help with design choices (trade-off analysis).
  • 158. 158/210 Advantages ● Improve the quality, reliability and safety of a product/process ● Improve company image and competitiveness ● Increase user satisfaction ● Reduce system development time and cost ● Collect information to reduce future failures, capture engineering knowledge
  • 159. 159/210 Advantages ● Reduce the potential for warranty concerns ● Early identification and elimination of potential failure modes ● Emphasize problem prevention ● Minimize late changes and associated cost ● Catalyst for teamwork and idea exchange between functions ● Reduce the possibility of same kind of failure in future ● Reduce impact on company profit margin ● Improve production yield
  • 160. 160/210 Limitations If used as a top-down tool, FMEA may only identify major failure modes in a system. Fault tree analysis (FTA) is better suited for "top-down" analysis. When used as a "bottom-up" tool FMEA can augment or complement FTA and identify many more causes and failure modes resulting in top-level symptoms. It is not able to discover complex failure modes involving multiple failures within a subsystem, or to report expected failure intervals of particular failure modes up to the upper level subsystem or system.
  • 161. 161/210 Limitations Additionally, the multiplication of the severity, occurrence and detection rankings may result in rank reversals, where a less serious failure mode receives a higher RPN than a more serious failure mode. The reason for this is that the rankings are ordinal scale numbers, and multiplication is not defined for ordinal numbers. The ordinal rankings only say that one ranking is better or worse than another, but not by how much. For instance, a ranking of "2" may not be twice as severe as a ranking of "1," or an "8" may not be twice as severe as a "4," but multiplication treats them as though they are. See Level of measurement for further discussion.
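The rank reversal described on this slide is easy to reproduce with two invented failure modes. The rankings below are hypothetical values on an assumed 1 to 10 scale, chosen only to show the effect: the mode with the worst severity ends up with the lower RPN.

```python
# Hypothetical rankings on an assumed 1-10 ordinal scale.
modes = {
    "A (severe, rare, easily detected)": {"severity": 10, "occurrence": 2, "detection": 2},
    "B (moderate on every scale)": {"severity": 4, "occurrence": 4, "detection": 4},
}

# RPN is the plain product of the three rankings.
rpn = {
    name: r["severity"] * r["occurrence"] * r["detection"]
    for name, r in modes.items()
}

for name, value in rpn.items():
    print(f"{name}: severity={modes[name]['severity']}, RPN={value}")
```

Mode A scores 40 and mode B scores 64, so sorting by RPN alone would put the far more severe mode A second.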
  • 162. 162/210 Types Functional: before design solutions are provided (or only on high level) functions can be evaluated on potential functional failure effects. General Mitigations ("design to" requirements) can be proposed to limit consequence of functional failures or limit the probability of occurrence in this early development. It is based on a functional breakdown of a system. This type may also be used for Software evaluation.
  • 163. 163/210 Types Concept Design / Hardware: analysis of systems or subsystems in the early design concept stages to analyse the failure mechanisms and lower level functional failures, specially to different concept solutions in more detail. It may be used in trade-off studies.
  • 164. 164/210 Types Detailed Design / Hardware: analysis of products prior to production. These are the most detailed (in mil 1629 called Piece-Part or Hardware FMEA) FMEAs and used to identify any possible hardware (or other) failure mode up to the lowest part level. It should be based on hardware breakdown (e.g. the BoM = Bill of Material). Any Failure effect Severity, failure Prevention (Mitigation), Failure Detection and Diagnostics may be fully analysed in this FMEA.
  • 165. 165/210 Types Process: analysis of manufacturing and assembly processes. Both quality and reliability may be affected from process faults. The input for this FMEA is amongst others a work process / task Breakdown.
  • 167. HOW TO CONDUCT AN EFFECTIVE SAFETY ASSESSMENT OFFICE SPACES
  • 168. Why should you be conducting assessments? ● To spot unsafe conditions and equipment ● To focus on unsafe work practices or behavior trends before they lead to injuries ● Reveal the need for new safeguards ● To provide a safe working environment for all workers
  • 169. What should I look for during an office assessment? ● Emergency Egress ● Work Environment ● Ergonomics ● Emergency Information ● Fire Prevention ● Electrical Systems ● Employee Behavior
  • 170. Emergency Egress ● Blocked or locked doorways ● Locking devices that can impede emergency egress ● Properly marked exits ● Properly illuminated exits ● Clear aisles and pathways
  • 172. Work Environment ● Clean, sanitary and orderly work spaces ● Tripping hazards such as loose tiles, carpeting, flooring ● Are drawers kept open when not in use ● Are items stored above shoulder level and unsecured
  • 175. Ergonomics ● Are workstations configured to prevent employee discomfort and injury ● Are employees aware of ergonomic risk factors ● Have employees received ergonomic training
  • 176. Emergency Information ● Are emergency phone numbers posted where they can be readily found ● Are employees trained in emergency procedures ● Are evacuation procedures and diagrams posted
  • 178. Fire Prevention ● Are portable fire extinguishers readily available and unobstructed ● Are fire pull stations clearly marked and unobstructed ● Are all fire sprinkler heads kept clear and unobstructed (at least 18 inches) ● Are space heaters used and authorized
  • 180. Electrical Systems ● Are extension cords/power strips kept uncoupled (not piggy-backed) ● Are all extension cords/power strips provided by the agency ● Are electrical outlets clear of combustible materials ● Do electrical cords create trip hazards ● Are extension cords used as permanent wiring
  • 183. Employee Behavior ● Are employees observing established safety rules ● Do employees minimize hazards by applying Operational Risk Management principles ● Are employees allowed to report unsafe conditions or acts without restraint
  • 186. How to assess safety SUMMARY ● Promoting Safety ● Monthly Assessment Program ● Positive Findings (above & beyond minimum requirements) ● Assessments – emergency info, egress, environment, ergonomics, fire prevention, electrical, unsafe behavior
  • 188. Risk Assessment and Management
  • 189. Getting the Measure of Risk ● Having understood the potential accident sequences associated with a hazard (e.g. using ETA) … ● Next step is to determine the severity of the credible accidents identified ● Remember risk is the product of severity and probability of an accident ● Two different approaches: – Estimate probability of accident, and hence get a measure of accident risk… then decide whether estimated risk is acceptable ● Used in many domains, including rail, military aerospace ● Will discuss this approach first, using rail standards as
  • 190. EN 50126 Accident Severity ● Accident Severity Categories are qualitative descriptions of consequences of failure conditions (hazards) – considering likely impact
      Severity Level | Consequence to Persons or Environment | Consequence to Service
      Catastrophic | Fatalities and/or multiple severe injuries and/or major damage to the environment | –
      Critical | Single fatality and/or severe injury and/or significant damage to the environment | Loss of a major system
      Marginal | Minor injury and/or significant threat to the environment | Severe system(s) damage
      Insignificant | Possible minor injury | Minor system damage
  • 191. Accident Probability Next, estimate (predict) accident probability ● Use historical results, analysis, and engineering judgment to determine appropriate qualitative probability category ● Note we may have to consider both – how likely hazard is to arise – how likely hazard is to develop into accident
      EN 50126 Category | Description
      Frequent | Likely to occur frequently. The hazard will be continually experienced.
      Probable | Will occur several times. The hazard can be expected to occur often.
      Occasional | Likely to occur several times. The hazard can be expected to occur several times.
      Remote | Likely to occur sometime in the system lifecycle. The hazard can reasonably be expected to occur.
      Improbable | Unlikely to occur, but possible. It can be assumed that the hazard will exceptionally occur.
      Incredible | Extremely unlikely to occur. It can be assumed that the hazard may not occur.
  • 192. Classifying Risk ● Having assigned severity and probability associated with hazard consequences … ● Next step is to use a Hazard Risk Matrix to classify the risk
      EN 50126 Risk Levels: frequency of occurrence of a hazardous event (rows) against severity level of hazard consequence (columns)
      Frequency | Insignificant | Marginal | Critical | Catastrophic
      Frequent | Undesirable | Intolerable | Intolerable | Intolerable
      Probable | Tolerable | Undesirable | Intolerable | Intolerable
      Occasional | Negligible | Undesirable | Undesirable | Intolerable
      Remote | Negligible | Tolerable | Undesirable | Undesirable
      Improbable | Negligible | Negligible | Tolerable | Tolerable
      Incredible | Negligible | Negligible | Negligible | Negligible
  • 193. Accepting Risk Reasoning about risk ● Using the HRI it is now possible to say, e.g. Risk(Hazard H1) > Risk(Hazard H2) ● In order to say what is acceptable / unacceptable, must provide an interpretation, e.g.
      EN 50126 Risk Category | Actions to be applied against each category
      Intolerable | Shall be eliminated
      Undesirable | Shall only be accepted when risk reduction is impracticable and with the agreement of the Railway Authority or the Safety Regulatory Authority, as appropriate
      Tolerable | Acceptable with adequate control and with the agreement of the Railway Authority
      Negligible | Acceptable with the agreement of the Railway Authority
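The Hazard Risk Matrix on this slide is a plain two-key lookup. The sketch below transcribes the slide's EN 50126 matrix into a Python dictionary; the names `SEVERITIES`, `RISK_MATRIX` and `classify_risk` are illustrative choices, not part of the standard.

```python
# Column order follows the slide's severity axis.
SEVERITIES = ["Insignificant", "Marginal", "Critical", "Catastrophic"]

# One row per frequency category, transcribed from the slide.
RISK_MATRIX = {
    "Frequent":   ["Undesirable", "Intolerable", "Intolerable", "Intolerable"],
    "Probable":   ["Tolerable",   "Undesirable", "Intolerable", "Intolerable"],
    "Occasional": ["Negligible",  "Undesirable", "Undesirable", "Intolerable"],
    "Remote":     ["Negligible",  "Tolerable",   "Undesirable", "Undesirable"],
    "Improbable": ["Negligible",  "Negligible",  "Tolerable",   "Tolerable"],
    "Incredible": ["Negligible",  "Negligible",  "Negligible",  "Negligible"],
}


def classify_risk(frequency: str, severity: str) -> str:
    """Return the risk level for a frequency/severity pair."""
    return RISK_MATRIX[frequency][SEVERITIES.index(severity)]


print(classify_risk("Remote", "Critical"))  # Undesirable
```

An unknown category raises `KeyError` or `ValueError`, which is probably the desired behaviour for a safety classification: a mistyped label should fail loudly rather than default to a risk level.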
  • 194. Managing Risk Risk Resolution ● Can associate objectives or actions with risk class, e.g. – technologies used – development processes – assessment criteria ● Example, for “undesirable” risk, might decide – no single point of failure shall lead to system accident – probability of fatality must be < 1 x 10^-8 per hour – failure behaviour over time (lifetime of system)
  • 195. Determining Risk - Civil Aerospace Style 1 Start with determination of severity (ARP 4761) ● very similar to rail categories
  • 196. Determining Risk - Civil Aerospace Style 2 ● When severity has been determined, can set objectives (requirements) for risk control – primarily boundaries on acceptable probability of failure condition (hazard)
    Severity          Probability Objective
    Classification    Descriptive             Quantitative (per flight hour)
    Catastrophic      Extremely Improbable    < 10^-9
    Hazardous         Extremely Remote        10^-7 to 10^-9
    Major             Remote                  10^-5 to 10^-7
    Minor             Reasonably Probable     10^-3 to 10^-5
                      Frequent                > 10^-3
    Adapted from ARP 4761
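As a rough sketch, the severity-to-probability objectives on this slide can be checked mechanically. The code assumes each severity class is paired with the upper bound of its probability band (for example, Hazardous with 10^-7 per flight hour); that pairing is one reading of the slide's table rather than a quotation from ARP 4761, and `meets_objective` is a hypothetical helper.

```python
# Upper bound on acceptable probability per flight hour for each severity
# class (assumed pairing, read from the slide's table).
MAX_PROBABILITY_PER_FLIGHT_HOUR = {
    "Catastrophic": 1e-9,
    "Hazardous": 1e-7,
    "Major": 1e-5,
    "Minor": 1e-3,
}


def meets_objective(severity, predicted_probability):
    """True if the predicted probability of the failure condition is within
    the objective for its severity class."""
    return predicted_probability < MAX_PROBABILITY_PER_FLIGHT_HOUR[severity]


print(meets_objective("Catastrophic", 5e-10))  # True
print(meets_objective("Catastrophic", 2e-9))   # False
```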
  • 197. Determining Risk - Civil Aerospace Style 3 For civil aerospace, severity-related objectives are set in standards ● easy to work with ● unambiguous – provided you can agree on standardised and objective measures of severity! BUT ● Need to understand that direct mapping from severity to probability objectives is based on important assumption:
  • 198. Determining Risk - Civil Aerospace Style 4 Where does acceptable risk come from? ● in principle, requirements reflect “what risk the public is willing to accept” – risk (A) = probability (A) * severity (A) – level of acceptable risk hard to determine, and subjective ● in practice, certification bodies (airworthiness authorities) act as surrogates for the public – “bottom line” is hull loss rate – civil aviation hull loss rate target is currently 10^-7 per flying hour ● for comparison, military aviation (UK) hull loss rate
  • 199. Determining Risk - Civil Aerospace Style 5 ● Has further implications: – implicit assumption about number of catastrophic failure conditions on an aircraft – also implicit assumption about how probable failure condition is to actually develop into an accident ● Example: – probability objective (target) for catastrophic failure condition is < 10^-9 per flight hour – target hull loss rate is < 10^-7 per flight hour – implies either a maximum of 100 catastrophic failure conditions on an aircraft, assuming all occurrences of catastrophic failure conditions will
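The arithmetic behind the figure of 100 can be worked through as follows. This is a sketch under stated assumptions: failure conditions are treated as independent and rare, so their rates simply add, and the parameter `p_accident_given_condition` (the chance that a failure condition actually develops into an accident) is a hypothetical name introduced here. Exact fractions are used to avoid floating-point rounding.

```python
from fractions import Fraction

# Figures quoted on the slide, per flight hour.
CATASTROPHIC_OBJECTIVE = Fraction(1, 10**9)
HULL_LOSS_TARGET = Fraction(1, 10**7)


def max_failure_conditions(system_target, per_condition_objective,
                           p_accident_given_condition=Fraction(1)):
    """Largest number of failure conditions whose summed accident rate
    stays within the system-level target (rates assumed additive)."""
    return system_target / (per_condition_objective * p_accident_given_condition)


# Every occurrence of a catastrophic failure condition leads to an accident.
worst_case = max_failure_conditions(HULL_LOSS_TARGET, CATASTROPHIC_OBJECTIVE)

# Illustrative alternative: only 1 in 10 occurrences develops into an accident.
one_in_ten = max_failure_conditions(
    HULL_LOSS_TARGET, CATASTROPHIC_OBJECTIVE, Fraction(1, 10)
)

print(worst_case)  # 100
print(one_in_ten)  # 1000
```

The second call shows how the two implicit assumptions trade off: the less likely a failure condition is to develop into an accident, the more such conditions the same hull loss target can accommodate.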
  • 200. Determining Risk - Civil Aerospace Style 6 ● Note that objective of probability per flying hour has its problems… ● Consider: – histogram shows accidents / time – 1.8% of accidents occur in load / taxi / unload
  • 201. The ALARP Principle 1 ALARP = As Low As Reasonably Practicable
    INTOLERABLE region: Risk cannot be justified on any grounds
    THE ALARP (As Low As Reasonably Practicable) REGION: Risk is undertaken only if benefit is desired
      – TOLERABLE only if risk reductions are impracticable or cost grossly disproportionate to the improvement gained
      – TOLERABLE if cost of reduction would exceed improvement gained
    BROADLY ACCEPTABLE REGION: NEGLIGIBLE RISK
  • 202. The ALARP Principle 2 ● Provides an interpretation of identified risks ● Pragmatic – although you can always spend more money to improve safety, it is not always cost-effective ● However, “cost-effectiveness” introduces ambiguity ● Regions of tolerability defined by regulatory domain and customer ● Approach is often implicit in the management of safety-critical projects anyway
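The ALARP reasoning can be sketched in code, with the caveat that the slides give no numeric boundaries: the region thresholds and the `disproportion_factor` below are hypothetical parameters that, as noted above, would be set by the regulatory domain and the customer. Both function names are illustrative.

```python
def alarp_region(risk, intolerable_above, broadly_acceptable_below):
    """Place a risk estimate in one of the three ALARP regions.
    Both thresholds are supplied by the caller (hypothetical values)."""
    if risk > intolerable_above:
        return "intolerable"
    if risk < broadly_acceptable_below:
        return "broadly acceptable"
    return "ALARP"


def reduction_expected(cost, benefit, disproportion_factor):
    """Within the ALARP region, a risk reduction is expected unless its cost
    is disproportionate to the improvement gained. A factor near 1 models the
    lower part of the region; a larger factor models 'grossly
    disproportionate' near the intolerable boundary."""
    return cost <= disproportion_factor * benefit


# Illustrative numbers only.
print(alarp_region(1e-5, intolerable_above=1e-4, broadly_acceptable_below=1e-6))  # ALARP
print(reduction_expected(cost=300, benefit=100, disproportion_factor=10))  # True
print(reduction_expected(cost=300, benefit=100, disproportion_factor=1))   # False
```

Expressing cost and benefit in the same units is itself a judgment call, which is where the ambiguity of "cost-effectiveness" mentioned on the slide enters.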
  • 203. Risk Reduction Flowchart 1 Identify and determine risk associated with identified hazards
    IDENTIFY HAZARD and RISK: System Design → Hazard Identification → Hazard Risk (Severity/Probability) Established
  • 204. Risk Reduction Flowchart 2
    Identify Hazard and Risk: System Design → Hazard Identification → Hazard Risk (Severity/Probability) Established
    ASSESS RISK: Risk Measured Against HRI Matrix Criteria → Risk Acceptable? (Yes / No)
  • 205. Risk Reduction Flowchart 3
    Identify Hazard and Risk: System Design → Hazard Identification → Hazard Risk (Severity/Probability) Established
    Assess Risk: Risk Measured Against HRI Matrix Criteria → Risk Acceptable?
    TAKE ACTION:
      – No: Apply Re-design Precedence Criteria
        1. Redesign to eliminate hazard, or reduce likelihood
        2. Incorporate mitigation, e.g. safety devices
        3. Provide warnings
        4. Develop procedures and training (Operator / Crew Training Required)
      – Yes: Continue design. Document analysis and justification
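The loop in this flowchart (assess, act in precedence order, reassess) can be sketched as below. This is a toy model, not a method taken from the slides: the callables `is_acceptable` and `apply_measure`, and the integer risk level that each measure lowers by one, are assumptions made purely to show the control flow.

```python
# Re-design precedence criteria, in the order given on the slide.
PRECEDENCE = [
    "Redesign to eliminate hazard, or reduce likelihood",
    "Incorporate mitigation, e.g. safety devices",
    "Provide warnings",
    "Develop procedures and training",
]


def reduce_risk(is_acceptable, apply_measure):
    """Apply measures in precedence order, reassessing before each one.
    Returns the measures applied and whether the risk ended up acceptable."""
    applied = []
    for measure in PRECEDENCE:
        if is_acceptable():
            break
        apply_measure(measure)
        applied.append(measure)
    return applied, is_acceptable()


# Toy model (hypothetical): risk is an integer level, acceptable at 1 or below,
# and every measure lowers it by exactly one level.
state = {"risk_level": 3}


def is_acceptable():
    return state["risk_level"] <= 1


def apply_measure(measure):
    state["risk_level"] -= 1


applied, accepted = reduce_risk(is_acceptable, apply_measure)
print(applied)   # the first two measures in PRECEDENCE
print(accepted)  # True
```

In practice each measure would be followed by a fresh hazard identification and risk assessment, since a design change can introduce new hazards.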
  • 206. Precedence in Risk Reduction 1 ● Redesign to eliminate risk – Best where practical ● Change in operational role, or removal of hazardous material ● Redesign to reduce hazard likelihood – Select architecture or components ● Duplex or triplex or … ● Higher integrity components, with lower failure rates ● Incorporate mitigation to reduce impact of failures – Automated protection, e.g. pressure relief valves – Where incorporated, need to check periodically
  • 207. Precedence in Risk Reduction 2 ● Provide warning devices – Detect the hazardous condition and warn operators ● e.g. indicate that landing gear has not fully deployed ● e.g. to evacuate building due to fire or fumes ● Provide procedures and training – Reduce likelihood of hazard, or mitigate ● may involve use of personal protective equipment – Do not assume procedures are enough by themselves ● consider evolution of power guillotine regulations ● Precedence order
  • 208. Residual Risk - 1 ● Residual Risks are those that cannot be ‘designed out’ – risks inherent to design, where benefit is desirable ● Significant residual risks must be formally accepted by the appropriate authority (typically customer / operator) ● Can use Decision Authority Matrix, e.g.
                               Hazard Severity Categories
    Frequency of Occurrence    I CATASTROPHIC   II CRITICAL   III MARGINAL   IV NEGLIGIBLE
    A FREQUENT                 HIGH             HIGH          HIGH           MEDIUM
    B PROBABLE                 HIGH             HIGH          MEDIUM         LOW
    C OCCASIONAL               HIGH             HIGH          MEDIUM         LOW
    D REMOTE                   HIGH             MEDIUM        LOW            LOW
    E IMPROBABLE               MEDIUM           LOW           LOW            LOW
    (MIL-STD-882C)
  • 209. Residual Risk 2 Appropriate Decision Authority (From MIL-STD-882C)
    HIGH – Service Acquisition Executive
      – e.g. no ground collision avoidance on F22
      – signed off by 4-star Air Force General
    MEDIUM – Program Executive Officer
    LOW – Program Manager
    ● Usually a requirement to document all actions taken to resolve risk within terms of contract ● Customer authority can then decide whether
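Combining the Decision Authority Matrix from the previous slide with the authorities listed here gives a two-step lookup. The sketch below transcribes both tables as shown on the slides; the function name `decision_authority` and the use of the letter and numeral codes as keys are illustrative choices.

```python
# Severity category columns, left to right: I Catastrophic, II Critical,
# III Marginal, IV Negligible.
SEVERITY_COLUMNS = ["I", "II", "III", "IV"]

# Rows keyed by frequency code: A Frequent, B Probable, C Occasional,
# D Remote, E Improbable.
DECISION_MATRIX = {
    "A": ["HIGH", "HIGH", "HIGH", "MEDIUM"],
    "B": ["HIGH", "HIGH", "MEDIUM", "LOW"],
    "C": ["HIGH", "HIGH", "MEDIUM", "LOW"],
    "D": ["HIGH", "MEDIUM", "LOW", "LOW"],
    "E": ["MEDIUM", "LOW", "LOW", "LOW"],
}

# Authority that must accept a residual risk at each level.
AUTHORITY = {
    "HIGH": "Service Acquisition Executive",
    "MEDIUM": "Program Executive Officer",
    "LOW": "Program Manager",
}


def decision_authority(frequency, severity):
    """Return (risk level, accepting authority) for a residual risk."""
    level = DECISION_MATRIX[frequency][SEVERITY_COLUMNS.index(severity)]
    return level, AUTHORITY[level]


print(decision_authority("D", "II"))  # ('MEDIUM', 'Program Executive Officer')
```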
  • 210. Risk Management Summary ● Risk Assessment is the process of identifying the risk associated with system hazards ● Approach in many sectors (military, rail…) is to use Hazard Risk Matrix to determine the risk associated with a hazard from severity and probability estimates – then decide on acceptability of risk ● Alternative approach (Civil Aerospace) is based around severity – assumption of fixed level of acceptable risk... – … so can derive objectives, including probability, from severity