Introduction to the management of control room alarms including guidance from EEMUA 191.
Control Room Operators receive alarms from the systems they use to monitor and control
Alarms should warn about situations that require a prompt response
There are many problems with alarms.
Hello. My name is Andy Brazier. This is a short presentation giving you an introduction to the management of control room alarms.
The types of control rooms I am referring to will typically be found in oil refineries, gas plants, chemical sites, power stations and similar. Operators receive alarms from control systems. These may be hard wired alarms to a fixed panel, but most nowadays will be computer based systems. These alarms should warn operators about something that requires their prompt attention. Typically an equipment malfunction or control failure. However, there are lots of problems with alarms in many control rooms, especially with newer computer based systems.
These problems mean that alarms become a nuisance and a distraction to operators. But they can also have serious safety consequences. In 1994 a large fire and explosion occurred at Texaco’s refinery in Milford Haven, South Wales, UK. It caused a lot of damage but luckily no one was killed or seriously injured. The UK regulator, the Health and Safety Executive or HSE, investigated the accident and uncovered a number of significant issues with alarms. In particular they found that the operator received a large number of alarms in the run up to the accident, with over 200 occurring in 10 minutes at one stage. The alarms were poorly prioritised so that the operator could not decide which ones were important. The graphical displays provided were not effective, meaning the operators did not realise what was going wrong. And the operators had not been trained in how to deal with alarms or incidents. This accident report has informed a lot of the guidance about alarms currently in use. It is worth noting that although the accident occurred more than 20 years ago, the same problems persist.
The same problems are repeated at sites all over the world. Operators receive lots of alarms during the day that they do not need to respond to. These cause nuisance and distraction, and they are of no practical benefit to the Operator. When things start to go wrong the Operator receives a lot of alarms in a flood so that they cannot work out what is going on or what has caused it. Many sites have a lot of alarms that are active all the time. These standing alarms occur due to poor design or management of change, or are a result of poor maintenance of instruments. Standing alarms are no use to anyone. Most systems allow alarms to be prioritised but this feature is often used badly. A common mistake is to give every alarm a high priority, which essentially means there is no priority. Operators often receive alarms that they do not recognise or understand, or they do not know how to respond. This often occurs because Operators are not actively involved in designing the alarms. Fundamentally one of the main problems is simply the number of alarms configured on a system. This is an unfortunate consequence of computer based control systems allowing alarms to be added very easily and cheaply.
There is a clear consensus across the affected industries that alarms need to be managed and the way this is done needs to be improved. To do this it is important to have a clear definition of what an alarm is and a policy on how alarms should be used. A procedure or guide is required to assign useful priorities in a consistent way. Alarms need to be reviewed and rationalised. This should be a standard activity for any project, including modifications. Also, existing alarms should go through the same process. And to ensure this approach is working it is important to have ways of determining performance; and to monitor the data and respond to it when required.
A lot of the problems with alarms have come about because systems have been designed without having a clear definition of what an alarm is and how it should be used. You should develop your own definition, one that is appropriate for your organisation. But something along the lines of “a means of directing an operator’s attention to something that requires their timely assessment and action” will be a good place to start. Alarms typically create an audible signal and the Operator then receives information visually to allow them to identify the alarm. An alarm can be assigned to a wide range of events and conditions and so it is important to focus on what is important. This will certainly include safety and significant environmental hazards; and probably major damage to equipment. You can try to cover more issues including production and quality, but in practice this is very difficult to achieve whilst maintaining the other objectives of alarm management.
It is important to realise that alarms do not operate in isolation. They are part of an overall risk management system. In fact they are a relatively poor method of controlling risk, appearing low in the hierarchy of risk control. So you should only consider the use of an alarm after you are satisfied that all options for inherent safety and engineering controls have been implemented. Also, you should be sure that the Human Machine Interface (HMI), which is typically the set of graphics used by the Operator to monitor the system, is as good as it can be to allow the Operator to act proactively instead of relying on alarms, which is operating reactively. In most cases the main role of the alarm is to allow the Operator to intervene when things go wrong, early enough to avoid the activation of an automated protection device such as a Safety Instrument System or trip. In other words, the Operator’s response to the alarm is a barrier or layer of protection; but not the only one. The general idea is that the more barriers there are in place the less vulnerable you are to one failing, noting that no automated system will ever be 100% reliable.
The key to improving alarms is to understand that every one configured on your system and every one received by the operator is important. They need to be designed so that they attract the Operator’s attention, inform them about what has happened in a way that they understand and guide them to make the correct decision about how they should respond. The work you do needs to make sure that every alarm is useful and relevant to the Operator. Other people may be interested in things that could be alarms, but unless they are sat in front of the system 24/7 these things should not be alarms. Every alarm should have a defined and documented response. This is one of the key things in making sure an alarm is justified. Every alarm has to sound at the right time so that the Operator has enough time to respond in order to avoid the problem that the alarm is warning about.
One thing that can help is the idea that some current alarms should actually be alerts. An alert is an indication that something has happened that may be interesting to the operator but they do not need to respond. There will be no safety or environmental consequence if the operator does not respond. They are a useful way of reducing the alarm count without depriving the operator of interesting or useful information. Alerts will normally not create an audible signal and will be listed separately from alarms.
When reviewing alarms it becomes clear that lots of events or conditions are being alarmed that should not be. Examples of what should not create an alarm include indications of current status and records of status changes that are considered a normal part of operation or would be expected. Activations of a safety instrument system or trip should not be an alarm in their own right as they occur automatically without operator intervention, although knock-on effects may need to be considered. Duplicate signals should not create duplicate alarms and there should not be any alarms to indicate that something the Operator has done has been successful. For example, if an operator manually stops a pump it should not create a pump stopped alarm.
OK. So once we know what alarms we should have, we can start to think about their priority. A very important message here is that every alarm is important and the operator is expected to respond to every alarm immediately. A low priority does not mean the alarm is not important or that response can be delayed. The purpose of prioritisation is to assist the operator if two or more alarms occur at the same time. Three or four levels of priority are usually sufficient. Low priority, which you may call advisory. You may use medium priority, although often it is not necessary. High is generally the highest priority for normal alarms, giving operators a warning about something requiring particular attention. The final priority is critical, but this is a special case which we will discuss in a minute.
You will need to generate your own rules or guidelines for determining alarm priorities. This flow chart gives an example of the process you may follow. You will start with something that could or should be an alarm. The first question to ask is how serious is the consequence that could occur if the alarm is not responded to? You will need to develop a way of deciding this but normally only a safety hazard with potential to cause a major accident would be considered significant, with the majority of potential outcomes being of lesser concern. The next question is how long will the operator have to respond to the alarm? Again, you will need to determine how you will differentiate between a relatively short or longer amount of time. I have suggested 20 minutes here. From this you have your first priority assigned. For the most significant consequences with a short amount of time for the operator to respond a high priority alarm would be most appropriate. If this is not the case we will be continuing with a situation where the consequence is lower or the time available is longer. Again you need to decide what length of time you use but I have suggested 4 hours. So if the operator has less than 4 hours available a low priority alarm would be appropriate. This leaves the final part of the flow chart where a longer time is available. If that is the case it is unlikely that an alarm can be justified. You may consider an alert instead or the alarm should be deleted.
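As a rough sketch, the flow chart described above could be written down in code. This is only an illustration under stated assumptions: the names `Priority` and `assign_priority` are hypothetical, the 20 minute and 4 hour thresholds are just the example values suggested in this presentation, and treating exactly 20 minutes as "short" is an assumption chosen to agree with the vessel example later on. Critical alarms are a special case and are deliberately left out.

```python
from enum import Enum


class Priority(Enum):
    HIGH = "high"
    LOW = "low"
    ALERT_OR_DELETE = "alert or delete"


# Example thresholds from the presentation; each site should decide its own.
SHORT_RESPONSE_MINUTES = 20
LONG_RESPONSE_MINUTES = 4 * 60


def assign_priority(major_consequence: bool, minutes_to_respond: float) -> Priority:
    """Apply the example flow chart: consequence first, then time available."""
    # Most significant consequence with a short time to respond: high priority.
    if major_consequence and minutes_to_respond <= SHORT_RESPONSE_MINUTES:
        return Priority.HIGH
    # Lesser consequence, or more time, but still under the longer threshold.
    if minutes_to_respond < LONG_RESPONSE_MINUTES:
        return Priority.LOW
    # A longer time is available: an alarm is unlikely to be justified.
    return Priority.ALERT_OR_DELETE


print(assign_priority(True, 20))    # Priority.HIGH
print(assign_priority(True, 40))    # Priority.LOW
print(assign_priority(False, 300))  # Priority.ALERT_OR_DELETE
```

Writing the rules down this explicitly, whether in code or in a procedure, is what makes the priorities consistent from one review to the next.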
Critical alarms are a special case and you may not have any. They are used where an operator action is required to avoid an immediate and significant hazard. This is usually because there is no automatic protection in place. If during your review you decide that one of your alarms should be identified as critical the first thing to do is to look for a better solution. Engineering out the need for a critical alarm must always be considered. If you find that the alarm must be kept you must be sure that a very robust system is in place. The means of notifying the operator of the alarm must be very reliable. Critical alarms must be clearly differentiated from all normal alarms. The operator response must be fully documented and operators must be assessed as competent.
As a guideline we would expect 20% of alarms to be high priority and 80% low with no critical alarms or very few.
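One simple way to compare a configured alarm list against this guideline is to calculate the share of alarms at each priority. The sketch below assumes the priorities are available as a plain list of labels; the function name `priority_shares` is hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable


def priority_shares(priorities: Iterable[str]) -> Dict[str, float]:
    """Return the fraction of configured alarms at each priority label."""
    counts = Counter(priorities)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: count / total for label, count in counts.items()}


# Illustrative data only: 2 high and 8 low priority alarms.
shares = priority_shares(["high"] * 2 + ["low"] * 8)
print(shares)  # {'high': 0.2, 'low': 0.8}
```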
I will just say a couple of words about logical processing of alarms as this is a specialised field and is really something to think about once you are sure you have set up your alarms correctly. We have already said that every alarm received by the operator should be useful and this can depend on the mode of operation at the time. Making alarms conditional on some other condition may be useful. For example, a low flow alarm is not useful if the operator has stopped the pump, so simple logical programming can be used to only activate the alarm when the pump is running. There are more sophisticated options available for managing alarms but these do need to be used with great care and are not in the scope of this presentation.
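The pump example can be expressed as a single logical condition. This is a minimal sketch and not how any particular control system implements it; the function and parameter names are assumptions made for illustration.

```python
def low_flow_alarm_active(flow: float, low_flow_limit: float, pump_running: bool) -> bool:
    """Low flow is only worth alarming while the pump is running."""
    return pump_running and flow < low_flow_limit


# Pump stopped by the operator: zero flow is expected, so no alarm.
print(low_flow_alarm_active(flow=0.0, low_flow_limit=5.0, pump_running=False))  # False
# Pump running but flow below the limit: alarm.
print(low_flow_alarm_active(flow=2.0, low_flow_limit=5.0, pump_running=True))   # True
```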
So you need to put the ideas presented so far into practice. This may be applied to existing alarms or as part of a project that is introducing new alarms. You will need a clear definition of what an alarm is as well as a policy and guide or rules for prioritising. You then need to review every alarm. I am afraid there are no short cuts. First confirm the alarm should be an alarm, making reference to your definition and policy. Is the alarm properly identified and will operators know what it means? How are operators expected to respond? You need to record this. Assign a priority. Consider the need to make the alarm conditional. Review the alarm set point to make sure it would activate at the right time to give the operator a realistic opportunity to respond.
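The review steps above can be captured as a simple record per alarm, with a check for anything still outstanding. This is only a sketch of one possible layout: the class name, field names and the tag `LAH-101` are all hypothetical, and a real review would normally be recorded in whatever spreadsheet or database the site already uses.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AlarmReview:
    identifier: str
    meets_alarm_definition: bool       # checked against your definition and policy
    description_clear: bool            # will operators know what it means?
    operator_response: str = ""        # the defined, documented response
    priority: Optional[str] = None
    condition: Optional[str] = None    # e.g. "pump running"; None if unconditional
    set_point_reviewed: bool = False


def outstanding_items(review: AlarmReview) -> List[str]:
    """List the review steps that have not yet been completed for one alarm."""
    if not review.meets_alarm_definition:
        return ["does not meet the alarm definition: consider an alert or deletion"]
    items = []
    if not review.description_clear:
        items.append("clarify identifier/description")
    if not review.operator_response.strip():
        items.append("define and record operator response")
    if review.priority is None:
        items.append("assign priority")
    if not review.set_point_reviewed:
        items.append("review set point")
    return items


done = AlarmReview("LAH-101", True, True, "Reduce inlet flow", "high", None, True)
print(outstanding_items(done))  # []
```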
I am going to use this very simple, hypothetical example. We have a vessel with a flow of liquid in and out. It normally operates at about 30% full. If the vessel reaches 100% it will overflow and cause a safety hazard. To avoid this we have a trip at 90% level which will shut the inlet valve. This includes an alarm. We also have an alarm set at 70% level.
We will review the two level alarms. This table will guide us and allow us to record our results. The table asks us to identify the alarm. We then consider what it is that we want to avoid. If the alarm occurs how should the operator respond? If they don’t respond in time, what is the consequence going to be? How long do they have to respond to avoid that consequence? And based on that, what is the priority? We will first look at the high level alarm set at the 70% level on the vessel. We are trying to avoid the vessel being overfilled as that will cause a spill with safety consequences. So if the operator receives the alarm at 70% they should reduce the flow into the vessel and may consider shutting the inlet valve completely. If the operator does not respond the level will continue to rise, causing the plant trip to activate at 90%. The time available to react will depend on how quickly the level in the vessel can rise. In this case if the level can rise 1% per minute it will take 20 minutes to go from the alarm at 70% to the trip activation. This gives us a high priority alarm. We now look at the alarm at 90% level, which may be called high high. It is also where the trip activates. We are still trying to avoid overfilling the vessel. However, at 90% the trip stops flow to the vessel automatically, meaning there is no operator action. Hence the consequence of not responding is none and time available is not applicable. The most likely outcome here is that the alarm will be changed to an Alert because the operator will want to know about the trip but is not required to respond. And if they do not respond there will be no consequence. To illustrate the prioritisation according to time available we can look again at the high level alarm at 70%. If the level rise is 1% every 2 minutes it would take 40 minutes to increase from the alarm at 70% to the trip at 90%. This would make it a low priority alarm instead.
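The time available in this example is simple arithmetic: the gap between the alarm and trip levels divided by the rate of rise. A small sketch, assuming a constant rate of rise and using a hypothetical function name:

```python
def minutes_to_trip(alarm_level_pct: float, trip_level_pct: float,
                    rise_pct_per_minute: float) -> float:
    """Time for the level to rise from the alarm set point to the trip set point."""
    if rise_pct_per_minute <= 0:
        raise ValueError("the level must be rising for this calculation to apply")
    return (trip_level_pct - alarm_level_pct) / rise_pct_per_minute


print(minutes_to_trip(70, 90, 1.0))  # 20.0 minutes at 1% per minute
print(minutes_to_trip(70, 90, 0.5))  # 40.0 minutes at 1% every 2 minutes
```

In practice the rate of rise will vary, so the fastest credible rate is the sensible one to use when checking the set point.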
The final thing to consider is what good looks like in terms of alarm performance. This is something you should be looking at to decide if you have a problem and to confirm that the work you do creates improvement. This table gives an idea of the types of performance indicators you may want to use. An average alarm rate of fewer than 6 alarms per hour for each operator gives an indication of the demands on the operator during normal operations. When things do go wrong it is recognised that a greater number of alarms is expected but that too many will overload the operator. A maximum of 10 alarms in 10 minutes is the benchmark to aim for. Ideally there will be no standing, suppressed or shelved alarms as these indicate either that the alarms are not designed for the mode of operation or that there are problems with instrumentation reliability and maintenance.
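If you can export alarm times from your system, the first two indicators are straightforward to calculate. The sketch below assumes the alarm times for one operator are given as minutes from the start of the period; the function names are hypothetical and the sliding 10 minute window is one reasonable interpretation of "alarms in 10 minutes", not a definition taken from any standard.

```python
from typing import Sequence


def average_alarms_per_hour(alarm_times_min: Sequence[float], period_hours: float) -> float:
    """Average alarm rate over the whole period."""
    return len(alarm_times_min) / period_hours


def peak_alarms_in_window(alarm_times_min: Sequence[float], window_min: float = 10.0) -> int:
    """Largest number of alarms falling inside any sliding window of window_min minutes."""
    times = sorted(alarm_times_min)
    peak = 0
    start = 0
    for end, t in enumerate(times):
        # Move the window start forward until it is within window_min of this alarm.
        while t - times[start] >= window_min:
            start += 1
        peak = max(peak, end - start + 1)
    return peak


# Illustrative data only: alarm times in minutes over a one hour period.
times = [0, 1, 2, 3, 30, 31, 45]
print(average_alarms_per_hour(times, period_hours=1))  # 7.0
print(peak_alarms_in_window(times))                    # 4
```

These results can then be compared with the benchmarks of fewer than 6 alarms per hour on average and no more than 10 alarms in 10 minutes.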
If you want more information on the subject there are a number of places you can go for further guidance. The UK’s Health and Safety Executive have a free download that gives simple guidance. Anyone can download it. For more detailed guidance and standards you have choices of EEMUA 191 from the Engineering Equipment and Materials Users Association, IEC 62682 from the International Electrotechnical Commission and ISA 18.2 from the International Society of Automation.
I hope you have found this useful and thank you for your interest. If you have any questions do not hesitate to contact me.
Management of control room alarms
Tel: (+44) 01492 879813 Mob: (+44) 07984 284642
A bit about me
22 years working as Human Factors
Engineering (HFE) consultant
13 years self-employed
Chartered member of the Chartered Institute of
Ergonomics and Human Factors
Associate member of Institute of Chemical
UK – Shell, BP, Tata, Syngenta, Total, Centrica,
Scottish and Southern Energy etc.
Middle East – PDO, KNPC, ADMA, Saudi
India – Larson and Toubro (projects for PDO
Human Factors Engineering
A multidisciplinary science that focuses on the
interaction between the human and the work
Optimise human and system performance
Thinking about how people will interact with plant and
equipment during design
Making sure it will be easy to operate and maintain
Reduce health and safety risks
Reduce the likelihood of human error and improve
Why apply HFE to the design of work
systems, workplaces and products?
Improve HSE performance
Reduce operational HSE risk
Reduce the likelihood or mitigate the
consequences of human error
Improve human efficiency and productivity.
Contribute to more efficient design and avoiding
re-work/changes later on.
Reduce life cycle costs of operating and
Critical task analysis
Plant/piping layout & access (valve analysis)
Control room design
Human machine interfaces
Procedures, training and competence
1994 – Fire & Explosion at Texaco’s
Milford Haven Refinery
Too many alarms
Displays did not help
Typical problems with alarms
Nuisance alarms during normal operations - do
not require a response
Floods of alarms during process upsets and
Too many standing alarms
Priorities not helpful to Operators
Operators do not understand alarm descriptions
or don’t know how to respond
Too many alarms configured.
Alarm definition and philosophy
Alarm review and rationalisation
Definition and philosophy
An alarm is “a means of directing an Operator’s
attention to something that requires their timely
assessment and action”
Applied to situations with potential for
Major equipment damage.
Definition and philosophy
Part of an overall risk management system
Part of the hierarchy of control in this order
Effective Human Machine Interface (HMI) allowing
Operators to act proactively
Main role of alarms is to avoid activation of
Safety Instrument Systems (SIS) or ‘trips.’
Every alarm matters
Key design principles
Attract, inform and guide
Useful and relevant to the Operator
Have a defined response
Allows the operator enough time to respond.
Alerts – not alarms
Indication that something has happened but
Operator does not have to respond
No safety or environmental consequence if
Operator does not respond
Useful way of reducing alarm count without
losing interesting or useful information
No audible signal
Normal or expected status changes
Activation of SIS or trip
Signals confirming successful Operator action.
All alarms are important
Operator must respond to every alarm when it occurs
Low priority does not mean unimportant
Prioritisation used to assist operator if more than
one alarm occurs at the same time
Use 3 or 4 levels of priority
Low – advisory
Medium – may not need this level of priority
High – warning
Critical – special case (see below).
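Priority flow chart: major consequence with a short time to respond (20 minutes suggested) = high; otherwise less than 4 hours to respond = low; longer = alert or delete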
Operator action required to avoid an immediate and significant hazard
No automatic protection in place
Avoid wherever possible and engineer out
Alarm annunciator must be very reliable
Alarm must be clearly differentiated from others
Response must be documented fully
Operators must be assessed as competent.
20% high priority
80% low priority
No critical alarms (or very few)
Every alarm received needs to be useful to the Operator
May depend on mode of operation
Making alarms conditional can ensure this
Example – low flow alarm is only active when a pump is running
More sophisticated methods available but must
be used with great care.
Alarm review and rationalisation
Existing alarms or new project
Clear definition, policy & priority guide
Review every alarm
Confirm it is an alarm (or an alert)
Review alarm identifier/description
Define operator response
Consider conditional alarm
Review alarm set-point.
Example vessel levels: 30% = normal, 70% = alarm, 90% = trip and alarm, 100% = overflow
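Alarm review method
Alarm | What to avoid | Operator response | Consequence of no action | Time available | Priority
High level 70% (level rising 1% per minute) | Vessel overfill | Reduce or stop inlet flow | Plant trip | 20 minutes | High
High high level 90% (trip) | Vessel overfill | None | None | N/A | Alert
High level 70% (level rising 1% every 2 minutes) | Vessel overfill | Reduce or stop inlet flow | Plant trip | 40 minutes | Low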
Criteria Benchmark per operator
Average rate <6 alarms per hour
Maximum rate (upset event) 10 alarms in 10 minutes
Standing alarms 0
Suppressed/shelved alarms 0
Alarms are only part of the HMI – think about
graphics as well
Alarms are a tool for people – need a HFE
The aim is to reduce demands on safety
If the operator does not need to (or cannot)
respond promptly – it is not an alarm.
For new projects
Define the philosophy as soon as possible
Schedule alarm review workshops like other safety
Review performance as soon as operational
For existing plants
Define the philosophy to include continual
Carry out alarm rationalisation
Pertinent requirements (EEMUA 191)
Each alarm should attract, inform and guide
Every alarm presented to the operator should be
useful and relevant to the operator
Every alarm should have a defined response
Adequate time should be allowed for the
Operator to carry out a defined response
Every alarm should be relevant, unique, timely,
prioritised, understandable, diagnostic and
Free, simple guide to better alarm handling
Paid for guides/standards
I hope you found that useful
If you would like any more information you can
contact me as follows:
Email – firstname.lastname@example.org
Phone – +44 1492 879813
Mobile – +44 7984 284642