The problem of multivariate alarm analysis and rationalization is complex and important in the area of smart alarm management because of the interrelationships between variables. Techniques for capturing and visualizing correlation information, especially directly from historical alarm data, are beneficial for further analysis. In this paper, the Gaussian kernel method is applied to generate pseudo-continuous time series from the original binary alarm data; this reduces the influence of missed, false, and chattering alarms. By taking time lags between alarm variables into account, a correlation color map of the transformed (pseudo) data is used to show clusters of correlated variables, with the alarm tags reordered to better group the correlated alarms. Thereafter, correlation and redundancy information can easily be found and used to improve alarm settings, and statistical methods such as singular value decomposition can be applied within each cluster to help design multivariate alarm strategies. Industrial case studies illustrate the practicality and efficacy of the proposed method, which is shown to outperform the alarm similarity color map when applied to industrial alarm data.
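The two core steps of the abstract above — Gaussian-kernel smoothing of binary alarm sequences and lag-aware correlation — can be sketched as follows. This is an illustrative reconstruction on synthetic alarm tags, not the authors' implementation; the kernel width, lag range, and the simulated fault signal are all assumptions.

```python
import numpy as np

def gaussian_smooth(alarms, sigma=3.0):
    """Convolve a 0/1 alarm sequence with a Gaussian kernel to obtain a
    pseudo-continuous series that damps chattering and isolated alarms."""
    half = int(3 * sigma)
    t = np.arange(-half, half + 1)
    kernel = np.exp(-t**2 / (2 * sigma**2))
    kernel /= kernel.sum()
    return np.convolve(alarms, kernel, mode="same")

def lagged_correlation(x, y, max_lag=10):
    """Largest-magnitude Pearson correlation between two series over a
    range of time lags, as used to account for propagation delays."""
    best = 0.0
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, b = x[lag:], y[:len(y) - lag]
        else:
            a, b = x[:len(x) + lag], y[-lag:]
        r = np.corrcoef(a, b)[0, 1]
        if abs(r) > abs(best):
            best = r
    return best

# Two chattering alarm tags driven by the same underlying fault.
rng = np.random.default_rng(0)
fault = np.sin(np.linspace(0, 6 * np.pi, 500)) > 0.6
tag_a = (fault & (rng.random(500) > 0.3)).astype(float)   # with missed alarms
tag_b = np.roll(tag_a, 5)                                  # 5-sample time lag
print(lagged_correlation(gaussian_smooth(tag_a), gaussian_smooth(tag_b)))
```

On the raw binary series the lag and the missed alarms would depress the correlation; after smoothing, the delayed copy is recovered as strongly correlated.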
Detection of Outliers in Large Dataset using Distributed Approach — Editor IJMTER
In this paper, a distributed method is introduced for detecting distance-based outliers in very large data sets. The approach is based on the concept of an outlier detection solving set, a small subset of the data set that can also be employed for predicting novel outliers. The method exploits parallel computation to obtain vast time savings. Indeed, beyond preserving the correctness of the result, the proposed scheme exhibits excellent performance: from a theoretical point of view, for common settings, the temporal cost of our algorithm is expected to be at least three orders of magnitude lower than that of the classical nested-loop-like approach to detecting outliers. Experimental results show that the algorithm is efficient and that its running time scales quite well with an increasing number of nodes. We also discuss a variant of the basic strategy that reduces the amount of data to be transferred, improving both the communication cost and the overall runtime. Importantly, the solving set computed in a distributed environment has the same quality as that produced by the corresponding centralized method.
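The nested-loop baseline that the abstract compares against can be sketched as below: under the classical distance-based definition, a point is an outlier if fewer than a given number of other points lie within a given radius. The data set, radius, and neighbor threshold are illustrative assumptions.

```python
import numpy as np

def distance_outliers(points, radius, min_neighbors):
    """Nested-loop distance-based outlier detection (the centralized
    baseline): a point is flagged as an outlier if fewer than
    `min_neighbors` other points lie within `radius` of it."""
    n = len(points)
    flags = np.zeros(n, dtype=bool)
    for i in range(n):
        count = 0
        for j in range(n):
            if i != j and np.linalg.norm(points[i] - points[j]) <= radius:
                count += 1
                if count >= min_neighbors:
                    break            # early exit once clearly an inlier
        flags[i] = count < min_neighbors
    return flags

rng = np.random.default_rng(1)
cluster = rng.normal(0, 1, size=(200, 2))          # dense normal cluster
planted = np.array([[8.0, 8.0], [-9.0, 7.0]])      # two planted outliers
data = np.vstack([cluster, planted])
print(np.where(distance_outliers(data, radius=2.0, min_neighbors=5))[0])
```

The quadratic cost of this double loop is exactly what the solving-set and distributed formulations are designed to avoid.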
Fuzzy logic applications for data acquisition systems of practical measurement — IJECEIAES
In laboratory work, errors in measurement, misreadings of the measuring devices, overly similar experimental data, and a lack of understanding of the practicum materials are often found, leading to inaccurate and invalid data. As an alternative solution, fuzzy logic is applied to a data acquisition system using a web server. This research focuses on the design of data acquisition systems with the aim of reducing the error rate in measuring experimental data in the laboratory. Measurement on the laboratory practice module is done by taking the analog data resulting from the measurement; the data are then converted into digital form via an Arduino and stored on the server. To obtain valid data, the server processes the data using the fuzzy logic method. The valid data are integrated into a web server so that they can be accessed as needed. The results show that the fuzzy-logic-based data acquisition system is able to provide recommendations for measurement results in the lab work based on the degree of membership and truth value. Fuzzy logic selects the measured data with a maximum error percentage of 5% and picks the measurement result with the minimum error rate.
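The 5%-error selection rule described above can be illustrated with a triangular membership function centred on the expected reading; the function shape and the 5 V reference value are assumptions for illustration, not the paper's exact fuzzy rule base.

```python
def triangular_membership(x, a, b, c):
    """Degree of membership of x in a triangular fuzzy set (a, b, c)."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

def validate(measurements, expected, max_error=0.05):
    """Keep readings whose relative error stays within max_error (5%),
    then return the one with the highest membership degree, i.e. the
    measurement with the minimum error rate."""
    lo, hi = expected * (1 - max_error), expected * (1 + max_error)
    scored = [(m, triangular_membership(m, lo, expected, hi))
              for m in measurements]
    valid = [(m, mu) for m, mu in scored if mu > 0.0]
    return max(valid, key=lambda t: t[1]) if valid else None

# Four student readings of a nominal 5 V reference; two exceed 5% error.
readings = [4.71, 5.03, 5.60, 4.98]
print(validate(readings, expected=5.0))
```

Readings outside the 5% band get zero membership and are discarded; among the rest, the reading closest to the expected value wins.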
Comparative study of various supervised classification methods for analysing def... — eSAT Publishing House
IJRET: International Journal of Research in Engineering and Technology is an international peer-reviewed online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together Scientists, Academicians, Field Engineers, Scholars and Students of related fields of Engineering and Technology.
Comparison of Cost Estimation Methods using Hybrid Artificial Intelligence on... — IJERA Editor
Cost estimating at the schematic design stage, as the basis of project evaluation, engineering design, and cost management, plays an important role in project decisions made under a limited definition of scope, constraints on available information and time, and the presence of uncertainties. The purpose of this study is to compare the performance of cost estimation models built with two different hybrid artificial intelligence approaches: the regression analysis-adaptive neuro-fuzzy inference system (RANFIS) and case-based reasoning-genetic algorithm (CBR-GA) techniques. The models were developed on the same 50 low-cost apartment project datasets in Indonesia. Tested on another five data points, the models were proven to perform very well in terms of accuracy. The CBR-GA model was found to be the best performer, but it suffered from the disadvantage of needing 15 cost drivers, compared to only 4 cost drivers required by RANFIS for on-par performance.
Data imputation is used to posit missing data values, as missing data have a negative effect on the computational validity of models. This study develops a genetic algorithm (GA) to optimize the imputation of missing cost data for fans used in road tunnels by the Swedish Transport Administration (Trafikverket). The GA is used to impute the missing cost data using an optimized valid data period. The results show highly correlated data (R-squared 0.99) after imputing the missing data. The GA therefore provides a wide search space to optimize imputation and create complete data, which can then be used for forecasting and life cycle cost analysis. Ritesh Kumar Pandey | Dr Asha Ambhaikar, "Data Imputation by Soft Computing", published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-2, Issue-4, June 2018. URL: http://www.ijtsrd.com/papers/ijtsrd14112.pdf http://www.ijtsrd.com/computer-science/real-time-computing/14112/data-imputation-by-soft-computing/ritesh-kumar-pandey
Overview of soft intelligent computing technique for supercritical fluid extr... — IJAAS Team
Optimization of the supercritical fluid extraction process with mathematical modeling is essential for industrial applications. The response surface methodology (RSM) has been proven to be a useful and effective statistical method for studying the relationships between measured responses and independent factors. Recently there has been growing interest in applying smart systems or artificial intelligence techniques to model and simulate chemical processes, and to predict, compute, classify, optimize and control them. Such a system works by generalizing the experimental results and the process behavior, and finally predicting and estimating the quantity of interest. These smart systems are a major aid in scaling a process from the laboratory to the pilot or industrial level. The main advantage of intelligent systems is that predictions can be performed easily, quickly, and accurately, which physical models are unable to do. This paper surveys several works that have utilized intelligent systems for modeling and simulating the supercritical fluid extraction process.
A new model for iris data set classification based on linear support vector m... — IJECEIAES
Data mining is known as the process of detecting patterns in large amounts of data, as part of knowledge discovery. Classification is a data analysis task that extracts a model describing important data classes. One of the outstanding classification methods in data mining is the support vector machine (SVM). It is capable of predicting outcomes and is often more effective than other classification methods. The SVM is a well-known supervised machine learning technique that has been applied successfully to a variety of problems, including regression, classification, and clustering, in diverse domains such as gene expression analysis and web text mining. In this study, we propose a new model for classifying the iris data set using an SVM classifier, with a genetic algorithm to optimize the C and gamma parameters of the SVM; in addition, principal components analysis (PCA) was used for feature reduction.
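The pipeline named above (PCA reduction, then a GA tuning C and gamma for an SVM on iris) can be sketched as follows. The GA here is a deliberately minimal one — log-uniform initialization, truncation selection, log-space Gaussian mutation — chosen for brevity, not the authors' exact operator set; an RBF kernel is used so that gamma plays its usual role.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# PCA reduces the four iris features to two before the SVM is fitted.
X, y = load_iris(return_X_y=True)
X = PCA(n_components=2).fit_transform(X)

rng = np.random.default_rng(42)

def fitness(params):
    """5-fold cross-validated accuracy for a (C, gamma) pair."""
    C, gamma = params
    return cross_val_score(SVC(C=C, gamma=gamma), X, y, cv=5).mean()

# Minimal GA: 12 individuals, keep the 4 fittest, mutate in log-space.
pop = 10 ** rng.uniform(-2, 2, size=(12, 2))     # (C, gamma) individuals
for generation in range(10):
    scores = np.array([fitness(p) for p in pop])
    parents = pop[np.argsort(scores)[-4:]]        # truncation selection
    children = parents[rng.integers(0, 4, size=8)]
    children = children * 10 ** rng.normal(0, 0.2, size=children.shape)
    pop = np.vstack([parents, children])

best = pop[np.argmax([fitness(p) for p in pop])]
print("best C=%.3g gamma=%.3g acc=%.3f" % (best[0], best[1], fitness(best)))
```

Searching C and gamma on a log scale is the usual choice because both parameters act multiplicatively on the decision function.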
Data mining techniques application for prediction in OLAP cube — IJECEIAES
Data warehouses are collections of data organized to support decision-making, and provide an appropriate solution for managing large volumes of data. Online analytical processing (OLAP) is a technology that complements data warehouses to make data usable and understandable by users, providing tools for visualization, exploration, and navigation of data cubes. Data mining, on the other hand, allows the extraction of knowledge from data with different methods of description, classification, explanation and prediction. In this work, we propose new ways to improve existing approaches in the decision-support process. Continuing the line of work coupling online analysis with data mining to integrate prediction into OLAP, an approach based on machine learning with clustering is proposed to partition an initial data cube into dense sub-cubes that can serve as learning sets for building a prediction model. Regression trees are then applied to each sub-cube to predict the value of a cell.
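The two-stage scheme above — cluster the cube into dense sub-cubes, then fit one regression tree per sub-cube to predict a cell's measure — can be sketched as follows. The "cube" is a synthetic two-dimension table, and KMeans with four clusters is an assumed stand-in for the paper's clustering step.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.tree import DecisionTreeRegressor

# Synthetic cube: two dimension attributes and one measure per cell,
# with a different local rule on each side of dimension 0.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(400, 2))
measure = np.where(X[:, 0] < 5, 2 * X[:, 1], 30 - X[:, 1])
measure = measure + rng.normal(0, 0.3, 400)

# Step 1: partition the cube into dense sub-cubes.
km = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X)

# Step 2: fit one regression tree per sub-cube.
trees = {}
for k in np.unique(km.labels_):
    mask = km.labels_ == k
    trees[k] = DecisionTreeRegressor(max_depth=4).fit(X[mask], measure[mask])

def predict_cell(x):
    """Route a cell to its sub-cube, then predict with that tree."""
    k = km.predict([x])[0]
    return trees[k].predict([x])[0]

print(predict_cell([2.0, 4.0]))   # local rule here is 2 * x1
```

Fitting small trees on locally homogeneous sub-cubes is the point of the partition: each tree only has to learn one regime of the measure.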
A Threshold fuzzy entropy based feature selection method applied in various b... — IJMER
Large amounts of data have been stored and manipulated using various database technologies, and processing all the attributes for a particular task is difficult. To avoid such difficulties, feature selection is applied. In this paper, we collect eight benchmark datasets from the UCI repository. Feature selection is carried out using a fuzzy entropy based relevance measure algorithm under three selection strategies: the mean selection strategy, the half selection strategy, and a neural network for threshold selection. After the features are selected, they are evaluated using Radial Basis Function (RBF) network, Stacking, Bagging, AdaBoostM1 and Ant-miner classification methodologies. The test results show that the neural network threshold selection strategy works well in selecting features, and that the Ant-miner methodology works best at achieving higher accuracy with the selected features than with the original dataset. The experimental results clearly show that Ant-miner is superior to the other classifiers; thus, the proposed Ant-miner algorithm could be a suitable method for producing good results with fewer features than the original datasets.
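A fuzzy-entropy relevance measure with the mean selection strategy can be sketched as below. The entropy definition (Luca-Termini form over min-max-scaled feature values) and the synthetic crisp/vague features are illustrative assumptions, not the paper's exact relevance measure.

```python
import numpy as np

def fuzzy_entropy(feature):
    """Luca-Termini fuzzy entropy of one feature: values are rescaled to
    [0, 1] as membership degrees; entropy is low when memberships are
    crisp (near 0 or 1) and high when they hover near 0.5."""
    span = feature.max() - feature.min()
    mu = (feature - feature.min()) / (span + 1e-12)
    mu = np.clip(mu, 1e-12, 1 - 1e-12)
    return -np.mean(mu * np.log(mu) + (1 - mu) * np.log(1 - mu))

def mean_selection(X):
    """Mean selection strategy: keep features whose relevance
    (negative entropy) exceeds the mean relevance over all features."""
    relevance = -np.array([fuzzy_entropy(X[:, j]) for j in range(X.shape[1])])
    return np.where(relevance > relevance.mean())[0]

rng = np.random.default_rng(3)
crisp = np.concatenate([rng.uniform(0.0, 0.05, 50), rng.uniform(0.95, 1.0, 50)])
vague = rng.uniform(0.4, 0.6, 100)           # scales to memberships spread over [0, 1]
X = np.column_stack([crisp, vague, crisp + rng.normal(0, 0.01, 100)])
print(mean_selection(X))                      # the crisp features are kept
```

Features whose memberships separate cleanly score low entropy (high relevance), so the mean threshold discards the ambiguous one.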
GI-ANFIS APPROACH FOR ENVISAGE HEART ATTACK DISEASE USING DATA MINING TECHNIQUES — AM Publications
Selecting a subset of relevant features from the feature space for use in model construction is called feature selection and is carried out as a preprocessing step. The filter approach is computationally fast and gives accurate results. The Professional Medical Conduct Board Actions data consist of all public actions taken against physicians, physician assistants, specialist assistants, and medical professionals. Classification and Regression Trees (CART) describe the generation of binary decision trees; CART and related algorithms were invented independently of one another at around the same time, yet follow a similar approach to learning decision trees from training tuples. This research applies GI-ANFIS as a data mining technique on heart data sets to provide diagnosis results.
Influence over the Dimensionality Reduction and Clustering for Air Quality Me... — IJAEMSJORNAL
The current trend in industry is to analyze large data sets and apply data mining and machine learning techniques to identify patterns. A challenge with huge data sets, however, is their high dimensionality. In some data analytics applications, large amounts of data produce worse performance; moreover, most data mining algorithms are implemented column-wise, so too many columns restrict performance and make them slower. Dimensionality reduction is therefore an important step in data analysis: it converts high-dimensional data into a much lower dimension such that maximum variance is explained within the first few dimensions. This paper focuses on multivariate statistical and artificial neural network techniques for data reduction; each method has a different rationale for preserving the relationships between input parameters during analysis. Principal Component Analysis, a multivariate technique, and the Self-Organising Map, a neural network technique, are presented. A hierarchical clustering approach is also applied to the reduced data set. A case study of air quality measurement is used to evaluate the performance of the proposed techniques.
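The PCA half of the pipeline above — keeping only as many components as needed to explain most of the variance — can be sketched with a plain SVD. The synthetic "air quality" table (eight sensors driven by two latent signals) and the 95% variance target are assumptions for illustration.

```python
import numpy as np

def pca_reduce(X, var_target=0.95):
    """Project data onto the fewest principal components that together
    explain at least `var_target` of the total variance."""
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = (s ** 2) / np.sum(s ** 2)      # variance ratio per component
    k = int(np.searchsorted(np.cumsum(explained), var_target)) + 1
    return Xc @ Vt[:k].T, explained[:k]

# Synthetic sensor table: 8 columns, but only 2 independent signals.
rng = np.random.default_rng(7)
latent = rng.normal(size=(300, 2))
mixing = rng.normal(size=(2, 8))
X = latent @ mixing + rng.normal(scale=0.05, size=(300, 8))

scores, explained = pca_reduce(X)
print(scores.shape, explained.round(3))
```

Because the eight sensors are linear mixtures of two signals plus small noise, almost all the variance collapses into the first couple of components, which is exactly the situation dimensionality reduction exploits.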
ANALYSIS AND COMPARISON STUDY OF DATA MINING ALGORITHMS USING RAPIDMINER — IJCSEA Journal
A comparison study of algorithms is very much required before implementing them for the needs of any organization. Comparisons of algorithms depend on various parameters such as data frequency, types of data, and the relationships among the attributes in a given data set. A number of learning and classification algorithms are available to analyse data, learn patterns, and categorize data, but the problem is to find the best algorithm for a given problem and desired output. The desired result has always been higher accuracy in predicting future values or events from the given dataset. The algorithms taken for this comparison study are Neural Net, SVM, Naïve Bayes, BFT and Decision Stump. These are among the most influential data mining algorithms in the research community and are widely used in the field of knowledge discovery and data mining.
Multimode system condition monitoring using sparsity reconstruction for quali... — IJECEIAES
In this paper, we introduce an improved multivariate statistical monitoring method based on the stacked sparse autoencoder (SSAE). Our contribution focuses on the choice of an SSAE model based on neural networks to solve diagnostic problems in complex systems. To monitor process performance, the squared prediction error (SPE) chart is combined with nonparametric adaptive confidence bounds derived from kernel density estimation to minimize erroneous alerts. Faults are then localized using two methods: contribution plots and the sensor validity index (SVI). The results are obtained from experiments and real data from a drinkable water processing plant, demonstrating how the applied technique performs. The simulation results of the SSAE model show a better ability to detect and identify sensor failures.
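The SPE chart with a KDE-derived control limit can be sketched as below. A linear PCA model stands in for the paper's SSAE (an explicit simplification), and the synthetic "normal operating data", bias fault, and 99% limit are all assumptions.

```python
import numpy as np

# Train a model on normal operating data and monitor the squared
# prediction error (SPE); the alarm limit is the 99% quantile of a
# Gaussian KDE of training SPEs rather than a parametric formula.
rng = np.random.default_rng(0)
normal = rng.normal(size=(500, 2)) @ rng.normal(size=(2, 6))
normal += rng.normal(scale=0.1, size=normal.shape)

mean = normal.mean(axis=0)
_, _, Vt = np.linalg.svd(normal - mean, full_matrices=False)
P = Vt[:2].T                                  # retained loadings (2 PCs)

def spe(X):
    """Squared prediction error of samples under the PCA model."""
    R = (X - mean) - (X - mean) @ P @ P.T     # residual outside the PC plane
    return np.sum(R ** 2, axis=1)

def kde_limit(samples, alpha=0.99):
    """Nonparametric control limit: the alpha-quantile of a Gaussian KDE
    of the training SPE values, located on a fine grid."""
    h = 1.06 * samples.std() * len(samples) ** -0.2   # Silverman bandwidth
    grid = np.linspace(0, samples.max() * 3, 2000)
    dens = np.exp(-0.5 * ((grid[:, None] - samples) / h) ** 2).sum(axis=1)
    cdf = np.cumsum(dens) / dens.sum()
    return grid[np.searchsorted(cdf, alpha)]

limit = kde_limit(spe(normal))
faulty = normal[:5] + np.array([0.0, 0.0, 3.0, 0.0, 0.0, 0.0])  # sensor bias
print(spe(faulty) > limit)
```

Samples with a sensor bias leave the subspace the model learned, so their SPE jumps above the KDE limit while almost all normal samples stay below it.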
An intrusion detection algorithm for AMI — IJCI JOURNAL
Nowadays, smart metering devices allow energy utilities to manage a wide variety of subscribers; reading the measuring devices, billing, and managing the disconnection and reconnection of subscribers are important issues. The performance of these intelligent systems rests on information transfer within an information technology context, so the data reported from the network must be managed to avoid malicious activities, including those that could affect the system's quality of service. In this paper, to control the reported data and ensure the veracity of the obtained information, an intrusion detection system based on the support vector machine (SVM) and principal component analysis (PCA) is proposed to recognize and identify intrusions and attacks in the smart grid. The behavior of the intrusion detection system for different SVM kernels, when SVM and PCA are used simultaneously, is studied. To evaluate the algorithm, numerical simulation based on the KDD99 data is performed for five different kernels of an intrusion detection system using SVM together with PCA. A comparative analysis of the presented intrusion detection algorithm is also carried out in terms of response time, the rate of increase in network efficiency, the increase in system error, and the difference between using and not using PCA. The results indicate that the correct detection rate and the attack error detection rate are best when PCA is used and the kernel is of radial type, in which case the SVM algorithm requires less time for data analysis and delivers better intrusion detection performance.
Data repository for sensor network: a data mining approach — ijdms
The development of sensor data repositories will help researchers create benchmark datasets. These benchmark datasets will provide a platform for all researchers to access the data and to test and compare the accuracy of their algorithms. However, the storage and management of sensor data is itself a challenging task for various reasons, such as noisy, redundant, missing, and faulty data. It is therefore very important to create a data repository that contains precise and accurate data and in which storage and management are effective. Hence, in this paper we propose the combination of quantitative association rules and decision trees for the classification of faulty and normal data, the use of multiple linear regression models for the estimation of missing data, a symbolic table approach for the storage and management of sensor data, and the development of a graphical user interface for the visualization of sensor data.
Performance Comparison of Machine Learning Algorithms — Dinusha Dilanka
In this paper we compare the performance of two classification algorithms. It is useful to differentiate algorithms based on computational performance rather than classification accuracy alone: although classification accuracy between the algorithms is similar, computational performance can differ significantly and can affect the final results. The objective of this paper is therefore to perform a comparative analysis of two machine learning algorithms, namely K-nearest neighbor classification and logistic regression. We consider a large dataset of 7981 data points and 112 features and examine the performance of the above-mentioned machine learning algorithms on it. The processing time and accuracy of the different machine learning techniques are estimated on the collected dataset, using 60% of the data for training and the remaining 40% for testing. The paper is organized as follows. Section I contains the introduction and background analysis of the research, and Section II the problem statement. Section III briefly describes our application, the data analysis process, the testing environment, and the methodology of our analysis. Section IV comprises the results of the two algorithms. Finally, the paper concludes with a discussion of future research directions that address the problems existing in the current research methodology.
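As a hedged sketch of this kind of comparison, the two algorithms can be timed and scored with scikit-learn on a synthetic dataset (not the paper's 7981-point, 112-feature data) using the same 60/40 train/test split:

```python
import time
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in dataset; sizes and parameters are arbitrary.
X, y = make_classification(n_samples=2000, n_features=20,
                           n_informative=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, train_size=0.6, random_state=0)

for name, model in [("k-NN", KNeighborsClassifier(n_neighbors=5)),
                    ("LogReg", LogisticRegression(max_iter=1000))]:
    t0 = time.perf_counter()
    model.fit(X_tr, y_tr)
    acc = model.score(X_te, y_te)  # accuracy on the held-out 40%
    print(f"{name}: accuracy={acc:.3f}, time={time.perf_counter() - t0:.4f}s")
```

Both models typically reach similar accuracy here, while their fit/predict times differ, which is exactly the distinction the paper draws.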
UNIT - 5 : 20ACS04 – PROBLEM SOLVING AND PROGRAMMING USING PYTHON - Nandakumar P
UNIT-V INTRODUCTION TO NUMPY, PANDAS, MATPLOTLIB
Exploratory Data Analysis (EDA), Data Science life cycle, Descriptive Statistics, Basic tools (plots, graphs and summary statistics) of EDA, Philosophy of EDA. Data Visualization: Scatter plot, bar chart, histogram, boxplot, heat maps, etc.
TUPLE VALUE BASED MULTIPLICATIVE DATA PERTURBATION APPROACH TO PRESERVE PRIVA... - IJDKP
Huge volumes of data from domain-specific applications such as medical, financial, library, telephone, shopping, and individual records are regularly generated. Sharing these data has proved to be beneficial for data mining applications. On the one hand, such data are an important asset for business decision making when analyzed. On the other hand, data privacy concerns may prevent data owners from sharing information for data analysis. In order to share data while preserving privacy, the data owner must come up with a solution that achieves the dual goal of privacy preservation and accuracy of the data mining tasks of clustering and classification. An efficient and effective approach is proposed that aims to protect the privacy of sensitive information while obtaining data clusters with minimum information loss.
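A generic illustration of the multiplicative-perturbation idea (not the paper's exact tuple-value scheme): each attribute value is multiplied by random noise centered at 1, masking exact values while roughly preserving the cluster structure that the clustering task relies on.

```python
import numpy as np

rng = np.random.default_rng(0)
# Two well-separated synthetic clusters standing in for sensitive records.
data = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(10, 1, (50, 2))])

# Multiplicative perturbation: noise with mean 1 and small spread.
noise = rng.normal(loc=1.0, scale=0.05, size=data.shape)
perturbed = data * noise

# Cluster separation survives: the second cluster's values stay near 10.
print(round(perturbed[50:].mean(), 1))
```

Individual values are distorted (privacy), but a clustering algorithm would still find the same two groups (utility), which is the dual goal the abstract describes.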
Noise-robust classification with hypergraph neural network - nooriasukmaningtyas
This paper presents a novel version of the hypergraph neural network method, which is used to solve the noisy label learning problem. First, we apply the PCA dimensionality reduction technique to the feature matrices of the image datasets in order to reduce the "noise" and the redundant features, and to reduce the runtime of constructing the hypergraph for the hypergraph neural network method. Then, the classic graph-based semi-supervised learning method, the classic hypergraph-based semi-supervised learning method, the graph neural network, the hypergraph neural network, and our proposed hypergraph neural network method are employed to solve the noisy label learning problem. The accuracies of these five methods are evaluated and compared. Experimental results show that the hypergraph neural network methods achieve the best performance as the noise level increases. Moreover, the hypergraph neural network methods perform at least as well as the graph neural network.
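The PCA preprocessing step the abstract describes is standard; a minimal NumPy version (with a placeholder feature matrix and an arbitrary choice of k) might look like:

```python
import numpy as np

rng = np.random.default_rng(0)
features = rng.normal(size=(100, 50))        # placeholder: 100 samples, 50 features

k = 10                                       # number of principal components to keep
centered = features - features.mean(axis=0)  # PCA requires mean-centered data
U, S, Vt = np.linalg.svd(centered, full_matrices=False)
reduced = centered @ Vt[:k].T                # project onto the top-k components

print(reduced.shape)
```

The reduced matrix would then be used to build the hypergraph, cutting both the noise in the features and the construction runtime.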
Traffic Outlier Detection by Density-Based Bounded Local Outlier Factors - ITIIIndustries
Outlier detection (OD) is widely used in many fields, such as finance, information, and medicine, for cleaning up datasets while keeping the useful information. In a traffic system, it alerts the transport department and drivers to abnormal traffic situations such as congestion and traffic accidents. This paper presents a density-based bounded LOF (BLOF) method for large-scale traffic video data in Hong Kong. A dimension reduction by principal component analysis (PCA) was applied to the spatial-temporal traffic signals. Previously, a density-based local outlier factor (LOF) method was performed on a two-dimensional (2D) PCA-processed spatial plane. In this paper, a three-dimensional (3D) PCA-processed spatial space for the classical density-based OD is first compared with the results from the 2D counterpart. In our experiments, the classical density-based LOF OD has been applied to the 3D PCA-processed data domain, which is new in the literature, and compared to the previous 2D domain. The average DSR increased by about 2% in the PM sessions: 91% (2D) versus 93% (3D). Also, comparing the classical density-based LOF and the new BLOF OD methods, the average DSR in the supervised approach increased from 94% (LOF) to 96% (BLOF) for the AM sessions and from 93% (LOF) to 95% (BLOF) for the PM sessions.
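The classical density-based LOF step (the baseline the paper's bounded variant improves on) can be sketched with scikit-learn on toy 3D data; the dataset, neighbor count, and injected outliers below are all invented for illustration:

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(1)
normal = rng.normal(0, 1, size=(300, 3))   # dense cluster of typical traffic points
outliers = rng.uniform(6, 8, size=(5, 3))  # a few far-away abnormal points
X = np.vstack([normal, outliers])

lof = LocalOutlierFactor(n_neighbors=20)
labels = lof.fit_predict(X)                # -1 marks points detected as outliers
print(int((labels[-5:] == -1).sum()))      # how many injected outliers were flagged
```

Points whose local density is much lower than that of their neighbors receive a high LOF score and are flagged; the paper's BLOF bounds this factor to improve detection on the PCA-processed traffic data.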
Application Of Extreme Value Theory To Bursts Prediction - CSCJournals
Bursts and extreme events in quantities such as connection durations, file sizes, and throughput may produce undesirable consequences in computer networks; deterioration in the quality of service is a major consequence. Predicting these extreme events and bursts is important: it helps in reserving the right resources for a better quality of service. We applied extreme value theory (EVT) to predict bursts in network traffic, and took a deeper look into the application of EVT by using EVT-based exploratory data analysis. We found that the traffic is naturally divided into two categories, internal and external traffic. The internal traffic follows a generalized extreme value (GEV) model with a negative shape parameter, which corresponds to the Weibull distribution. The external traffic follows a GEV with a positive shape parameter, which corresponds to the Fréchet distribution. These findings are of great value to the quality of service in data networks, especially when included in service level agreements as traffic descriptor parameters.
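A hedged sketch of the core EVT step: fitting a GEV model to block maxima, as burst analysis would do on traffic measurements. The data here are synthetic, not real network traces; note also that SciPy's `genextreme` uses the sign convention c = -ξ, so its shape parameter has the opposite sign to the ξ the abstract refers to.

```python
import numpy as np
from scipy.stats import genextreme

rng = np.random.default_rng(0)
# Block maxima of light-tailed (exponential) samples converge to a
# Gumbel-type GEV, i.e. shape parameter near zero.
block_maxima = rng.exponential(scale=1.0, size=(1000, 50)).max(axis=1)

c, loc, scale = genextreme.fit(block_maxima)
print(round(c, 2), round(loc, 2), round(scale, 2))
```

For real traffic, a clearly negative ξ (Weibull type) would indicate a bounded tail, while a positive ξ (Fréchet type) indicates heavy-tailed bursts, matching the internal/external split the paper reports.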
An optimal general type-2 fuzzy controller for Urban Traffic Network - ISA Interchange
The urban traffic network model is illustrated by state-charts and an object-diagram. However, these have limitations in showing the behavioral perspective of the traffic information flow. Consequently, a state space model is used to calculate the half-value waiting time of vehicles. In this study, a combination of general type-2 fuzzy logic sets and the modified backtracking search algorithm (MBSA) is used to control the traffic signal scheduling and phase succession so as to guarantee a smooth flow of traffic with the least wait times and average queue length. The parameters of the input and output membership functions are optimized simultaneously by the novel heuristic MBSA. A comparison is made between the achieved results and those of optimal and conventional type-1 fuzzy logic controllers.
Embedded intelligent adaptive PI controller for an electromechanical system - ISA Interchange
In this study, an intelligent adaptive controller approach using the interval type-2 fuzzy neural network (IT2FNN) is presented. The proposed controller consists of a lower-level proportional-integral (PI) controller, which is the main controller, and an upper-level IT2FNN, which tunes the parameters of the PI controller on-line. The proposed adaptive PI controller based on the IT2FNN (API-IT2FNN) is implemented practically using the Arduino DUE kit for controlling the speed of a nonlinear DC motor-generator system. The parameters of the IT2FNN are tuned on-line using the back-propagation algorithm. The Lyapunov theorem is used to derive the stability and convergence of the IT2FNN. The obtained experimental results, which are compared with those of other controllers, demonstrate that the proposed API-IT2FNN is able to improve the system response over a wide range of system uncertainties.
State of charge estimation of lithium-ion batteries using fractional order sl... - ISA Interchange
This paper presents a state of charge (SOC) estimation method based on fractional order sliding mode observer (SMO) for lithium-ion batteries. A fractional order RC equivalent circuit model (FORCECM) is firstly constructed to describe the charging and discharging dynamic characteristics of the battery. Then, based on the differential equations of the FORCECM, fractional order SMOs for SOC, polarization voltage and terminal voltage estimation are designed. After that, convergence of the proposed observers is analyzed by Lyapunov's stability theory method. The framework of the designed observer system is simple and easy to implement. The SMOs can overcome the uncertainties of parameters, modeling and measurement errors, and present good robustness. Simulation results show that the presented estimation method is effective, and the designed observers have good performance.
Fractional order PID for tracking control of a parallel robotic manipulator t... - ISA Interchange
This paper presents the tracking control of a delta-type robotic manipulator employing fractional order PID controllers with a computed torque control strategy, contrasted with an integer order PID controller using the same strategy. The mechanical structure, kinematics, and dynamic models of the delta robot are described. A SOLIDWORKS/MSC-ADAMS/MATLAB co-simulation model of the delta robot is built and employed for the stages of identification, design, and validation of control strategies. Identification of the dynamic model of the robot is performed using the least squares algorithm. A linearized model of the robotic system is obtained by employing the computed torque control strategy, resulting in a decoupled double integrating system. From the linearized model of the delta robot, fractional order PID and integer order PID controllers are designed, and their dynamical behavior is analyzed for many evaluation trajectories. Controller robustness is evaluated against external disturbances employing performance indexes for the joint and spatial error, applied torque in the joints, and trajectory tracking. Results show that the fractional order PID with the computed torque control strategy has robust performance and active disturbance rejection when applied to parallel robotic manipulators on tracking tasks.
Fuzzy logic for plant-wide control of biological wastewater treatment process... - ISA Interchange
The application of control strategies is increasingly used in wastewater treatment plants with the aim of improving effluent quality and reducing operating costs. Due to concerns about the progressive growth of greenhouse gas (GHG) emissions, these are also currently being evaluated in wastewater treatment plants. The present article proposes a fuzzy controller for plant-wide control of the biological wastewater treatment process. Its design is based on 14 inputs and 6 outputs in order to reduce GHG emissions, nutrient concentration in the effluent, and operational costs. The article explains and shows the effect of each one of the inputs and outputs of the fuzzy controller, as well as the relationships between them. Benchmark Simulation Model no. 2 Gas is used for testing the proposed control strategy. The simulation results show that the fuzzy controller is able to reduce GHG emissions while improving, at the same time, the common criteria of effluent quality and operational costs.
Design and implementation of a control structure for quality products in a cr... - ISA Interchange
In recent years, interest in petrochemical processes has been increasing, especially in the refinement area. However, the high variability of the dynamic characteristics present in the atmospheric distillation column poses a challenge for obtaining quality products. To improve distillate quality in spite of changes in the input crude oil composition, this paper details a new design of a control strategy for a conventional crude oil distillation plant, defined using formal interaction analysis tools. The process dynamics and its control are simulated in the Aspen HYSYS dynamic environment under real operating conditions. The simulation results are compared against a typical control strategy commonly used in crude oil atmospheric distillation columns.
Model based PI power system stabilizer design for damping low frequency oscil... - ISA Interchange
This paper explores a two-level control strategy that blends a local controller with a centralized controller for low frequency oscillations in a power system. The proposed control scheme provides stabilization of local modes using a local controller and minimizes the effect of sub-system interconnection on performance through centralized control. For designing the local controllers in the form of a proportional-integral power system stabilizer (PI-PSS), a simple and straightforward frequency domain direct synthesis method is considered that relies on a suitable reference model based on the desired requirements. Several examples, both on one-machine infinite bus and multi-machine systems taken from the literature, are given to show the efficacy of the proposed PI-PSS. The effective damping of the systems is found to increase remarkably, which is reflected in the time responses; even unstable operation has been stabilized with improved damping after applying the proposed controller. The proposed controllers give remarkable improvement in damping the oscillations in all the illustrations considered here; for example, the damping factor increases from 0.0217 to 0.666 in Example 1. The simulation results obtained by the proposed control strategy compare favorably with those of several controllers from the literature.
A comparison of a novel robust decentralized control strategy and MPC for ind... - ISA Interchange
Abstract: In this work we have developed a novel, robust, practical control structure to regulate an industrial methanol distillation column. The proposed control scheme is based on an override control framework and can manage a non-key trace ethanol product impurity specification while maintaining high product recovery. For comparison purposes, an MPC with a discrete process model (based on step tests) was also developed and tested. The results from process disturbance testing show that both the MPC and the proposed controller were capable of maintaining both the trace-level ethanol specification in the distillate (XD) and high product recovery (β). Closer analysis revealed that the MPC controller has tighter XD control, while the proposed controller was tighter in β control. The tight XD control allowed the MPC to operate at a higher XD set point (closer to the 10 ppm AA grade methanol standard), allowing for savings in energy usage. Despite the energy savings of the MPC, the proposed control scheme has lower installation and running costs. An economic analysis revealed a multitude of other external economic and plant design factors that should be considered when making a decision between the two controllers. In general, we found that relatively high energy costs favor MPC.
Fault detection of feed water treatment process using PCA-WD with parameter o... - ISA Interchange
The feed water treatment process (FWTP) is an essential part of utility boilers, and fault detection is expected to improve its reliability. Classical principal component analysis (PCA) was applied to FWTPs in our previous work; however, the noise in the T2 and SPE statistics results in false detections and missed detections. In this paper, wavelet denoising (WD) is combined with PCA to form a new algorithm, PCA-WD, where WD is intentionally employed to deal with the noise. The parameter selection of PCA-WD is further formulated as an optimization problem, and PSO is employed to solve it. A FWTP, sustaining two 1000 MW generation units in a coal-fired power plant, is taken as a study case, and its operation data is collected for the subsequent verification study. The results show that the optimized WD is effective in restraining the noise in the T2 and SPE statistics, so as to improve the performance of the PCA-WD algorithm. Moreover, the parameter optimization enables PCA-WD to obtain its optimal parameters automatically rather than from individual experience. The optimized PCA-WD is further compared with classical PCA and sliding window PCA (SWPCA) in terms of four cases: bias fault, drift fault, broken line fault, and normal condition. The advantages of the optimized PCA-WD over classical PCA and SWPCA are finally confirmed by the results.
Model-based adaptive sliding mode control of the subcritical boiler-turbine s... - ISA Interchange
As higher requirements are placed on load regulation and efficiency enhancement, the control performance of boiler-turbine systems has become much more important. In this paper, a novel robust control approach is proposed to improve the coordinated control performance for subcritical boiler-turbine units. To capture the key features of the boiler-turbine system, a nonlinear control-oriented model is established and validated with the historical operation data of a 300 MW unit. To achieve system linearization and decoupling, an adaptive feedback linearization strategy is proposed, which can asymptotically eliminate the linearization error caused by the model uncertainties. Based on the linearized boiler-turbine system, a second-order sliding mode controller is designed with the super-twisting algorithm. Moreover, the closed-loop system is proved robustly stable with respect to uncertainties and disturbances. Simulation results are presented to illustrate the effectiveness of the proposed control scheme, which achieves excellent tracking performance, strong robustness and chattering reduction.
A Proportional Integral Estimator-Based Clock Synchronization Protocol for Wi... - ISA Interchange
Clock synchronization is an issue of vital importance in applications of wireless sensor networks (WSNs). This paper proposes a proportional integral estimator-based protocol (EBP) to achieve clock synchronization for wireless sensor networks. As each local clock skew gradually drifts, synchronization accuracy declines over time. Compared with existing consensus-based approaches, the proposed synchronization protocol improves synchronization accuracy under time-varying clock skews. Moreover, by restricting the synchronization error of the clock skew to a relatively small quantity, it can reduce the frequency of periodic re-synchronization. Finally, a pseudo-synchronous implementation for skew compensation is introduced, as a fully synchronous protocol is unrealistic in practice. Numerical simulations illustrate the performance of the proposed protocol.
An artificial intelligence based improved classification of two-phase flow patte... - ISA Interchange
Flow pattern recognition is necessary to select design equations for finding operating details of the process and to perform computational simulations. Visual image processing can be used to automate the interpretation of patterns in two-phase flow. In this paper, an attempt has been made to improve the classification accuracy of the flow pattern of gas/liquid two-phase flow using fuzzy logic and a Support Vector Machine (SVM) with Principal Component Analysis (PCA). Videos of six different types of flow patterns, namely annular flow, bubble flow, churn flow, plug flow, slug flow, and stratified flow, are recorded for a period and converted to 2D images for processing. The textural and shape features extracted using image processing are applied as inputs to various classification schemes, namely fuzzy logic, SVM, and SVM with PCA, in order to identify the type of flow pattern. The results obtained are compared, and it is observed that SVM with features reduced using PCA gives better classification accuracy and is computationally less intensive than the other two schemes. The results of this study cover industrial application needs, including oil and gas and other gas/liquid two-phase flows.
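A minimal scikit-learn sketch of the SVM-with-PCA scheme: since the paper's texture and shape features from flow images are not available here, a synthetic six-class dataset stands in for them, and the component count is an arbitrary choice.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split

# Stand-in for extracted textural/shape features of six flow-pattern classes.
X, y = make_classification(n_samples=600, n_features=30, n_informative=10,
                           n_classes=6, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# PCA reduces the feature dimension before the SVM, as in the paper's scheme.
clf = make_pipeline(PCA(n_components=10), SVC(kernel="rbf"))
clf.fit(X_tr, y_tr)
print(round(clf.score(X_te, y_te), 2))
```

The pipeline form makes the comparison with a plain SVM straightforward: swap out the `PCA` step and re-score to measure both the accuracy and runtime effects of the reduction.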
New Method for Tuning PID Controllers Using a Symmetric Send-On-Delta Samplin... - ISA Interchange
In this paper we present a new method for tuning PI controllers with a symmetric send-on-delta (SSOD) sampling strategy. First we analyze the conditions that produce oscillations in event-based systems under the SSOD sampling strategy; the describing function is the tool used to address the problem. Once the conditions for oscillations are established, a new robustness-to-oscillation performance measure is introduced, which parallels the concept of phase margin, one of the most traditional measures of relative stability in closed-loop control systems. Therefore, the application of the proposed robustness measure is easy and intuitive. The method is tested by both simulations and experiments. Additionally, a Java application has been developed to aid in the design according to the results presented in the paper.
Load estimator-based hybrid controller design for two-interleaved boost conve... - ISA Interchange
This paper is devoted to the development of a hybrid controller for a two-interleaved boost converter dedicated to renewable energy and automotive applications. The control requirements, summarized as fast transient response and low input current ripple, are formulated as a problem of fast stabilization of a predefined optimal limit cycle and solved using the hybrid automaton formalism. In addition, a real-time estimation of the load is developed using an algebraic approach for online adjustment of the hybrid controller. Mathematical proofs are provided, with simulations to illustrate the effectiveness and robustness of the proposed controller despite different disturbances. Furthermore, a fuel cell system supplying a resistive load through a two-interleaved boost converter is also highlighted.
Effects of Wireless Packet Loss in Industrial Process Control Systems - ISA Interchange
Timely and reliable sensing and actuation control are essential in networked control. This depends not only on the precision/quality of the sensors and actuators used but also on how well the communications links between the field instruments and the controller have been designed. Wireless networking offers simple deployment, reconfigurability, scalability, and reduced operational expenditure, and is easier to upgrade than wired solutions. However, the adoption of wireless networking has been slow in industrial process control due to the stochastic and less than 100% reliable nature of wireless communications and the lack of a model to evaluate the effects of such communication imperfections on overall control performance. In this paper, we study how control performance is affected by wireless link quality, which in turn is adversely affected by severe propagation loss in harsh industrial environments, co-channel interference, and unintended interference from other devices. We select the Tennessee Eastman Challenge Model (TE) for our study. A decentralized process control system, first proposed by N. Ricker, is adopted that employs 41 sensors and 12 actuators to manage the production process in the TE plant. We consider the scenario where wireless links are used to periodically transmit essential sensor measurement data, such as pressure, temperature, and chemical composition, to the controller, as well as control commands to manipulate the actuators according to predetermined setpoints. We consider two models for packet loss in the wireless links, namely an independent and identically distributed (IID) packet loss model and the two-state Gilbert-Elliott (GE) channel model. While the former is a random loss model, the latter can model bursty losses. With each channel model, the performance of the simulated decentralized controller using wireless links is compared with the one using wired links providing instant and 100% reliable communications.
The sensitivity of the controller to the burstiness of packet loss is also characterized in different process stages. The performance results indicate that wireless links with redundant bandwidth reservation can meet the requirements of the TE process model under normal operational conditions. When disturbances are introduced into the TE plant model, wireless packet loss during transitions between process stages needs further protection in severely impaired links. Techniques such as re-transmission scheduling, multi-path routing, and enhanced physical layer design are discussed, and the latest industrial wireless protocols are compared.
Fault Detection in the Distillation Column Process - ISA Interchange
Chemical plants are complex large-scale systems that require robust fault detection schemes to ensure high product quality, reliability, and safety under different operating conditions. The present paper is concerned with a feasibility study of the application of black-box modeling and the Kullback-Leibler divergence (KLD) to fault detection in a distillation column process. A Nonlinear Auto-Regressive Moving Average with eXogenous input (NARMAX) polynomial model is first developed to estimate the nonlinear behavior of the plant. Furthermore, the KLD is applied to detect abnormal modes. The proposed FD method is implemented and validated experimentally using realistic faults on a distillation plant of laboratory scale. The experimental results clearly demonstrate that the proposed method is effective and gives early alarms to operators.
Neural Network-Based Actuator Fault Diagnosis for a Non-Linear Multi-Tank System - ISA Interchange
The paper is devoted to the problem of robust actuator fault diagnosis of dynamic non-linear systems. In the proposed method, it is assumed that the diagnosed system can be modelled by a recurrent neural network, which can be transformed into a linear parameter varying form. Such a system description allows developing a design scheme for a robust unknown input observer within the H-infinity framework for a class of non-linear systems. The proposed approach is designed in such a way that a prescribed disturbance attenuation level is achieved with respect to the actuator fault estimation error, while guaranteeing the convergence of the observer. The application of the robust unknown input observer enables actuator fault estimation, which allows applying the developed approach to fault tolerant control tasks.
A KPI-based process monitoring and fault detection framework for large-scale ... - ISA Interchange
Large-scale processes, consisting of multiple interconnected sub-processes, are commonly encountered in industrial systems, and their performance needs to be determined. A common approach to this problem is to use a key performance indicator (KPI)-based approach. However, the different KPI-based approaches have not been developed within a coherent and consistent framework. Thus, this paper proposes a framework for KPI-based process monitoring and fault detection (PM-FD) for large-scale industrial processes, which considers the static and dynamic relationships between process and KPI variables. For the static case, a least squares-based approach is developed that provides an explicit link with least-squares regression and gives better performance than partial least squares. For the dynamic case, using the kernel representation of each sub-process, an instrumental variable is used to reduce the dynamic case to the static case. This framework is applied to the TE benchmark process and the hot strip mill rolling process. The results show that the proposed method can detect faults better than previous methods.
An adaptive PID like controller using mix locally recurrent neural network fo... - ISA Interchange
Being a complex, non-linear, and coupled system, the robotic manipulator cannot be effectively controlled using a classical proportional integral derivative (PID) controller. To enhance the effectiveness of the conventional PID controller for nonlinear and uncertain systems, the gains of the PID controller should be conservatively tuned and should adapt to process parameter variations. In this work, a mix locally recurrent neural network (MLRNN) architecture is investigated to mimic a conventional PID controller; it consists of at most three hidden nodes, which act as proportional, integral, and derivative nodes. The gains of the mix locally recurrent neural network based PID (MLRNNPID) controller scheme are initialized with a newly developed cuckoo search algorithm (CSA) based optimization method rather than assumed randomly. A sequential learning based least squares algorithm is then investigated for the on-line adaptation of the gains of the MLRNNPID controller. The performance of the proposed controller scheme is tested against plant parameter uncertainties and external disturbances for both links of a two-link robotic manipulator with variable payload (TL-RMWVP). The stability of the proposed controller is analyzed using the Lyapunov stability criteria. A performance comparison is carried out among the MLRNNPID controller, a CSA-optimized NNPID (OPTNNPID) controller, and a CSA-optimized conventional PID (OPTPID) controller in order to establish the effectiveness of the MLRNNPID controller.
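For readers unfamiliar with the baseline being mimicked: the conventional PID law takes only a few lines in discrete form. The loop below uses an illustrative first-order plant and arbitrary gains, not the paper's manipulator model or tuned values.

```python
kp, ki, kd = 2.0, 1.0, 0.1       # illustrative PID gains
dt, setpoint = 0.05, 1.0
y, integral, prev_err = 0.0, 0.0, 0.0

for _ in range(400):
    err = setpoint - y
    integral += err * dt
    deriv = (err - prev_err) / dt
    u = kp * err + ki * integral + kd * deriv  # PID control law
    prev_err = err
    y += dt * (-y + u)                         # toy first-order plant: y' = -y + u

print(round(y, 3))
```

The integral term drives the steady-state error to zero, so the output settles at the setpoint; the MLRNN architecture in the abstract replaces the three fixed-gain terms with adaptive proportional, integral, and derivative nodes.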
A method to remove chattering alarms using median filters - ISA Interchange
Chattering alarms are the most commonly encountered nuisance alarms and are likely to reduce the usability of, and cause a confidence crisis in, alarm systems for industrial plants. This paper addresses chattering alarm reduction using median filters. Two rules are formulated to design the window size of the median filters. If the alarm probability is estimated using process data, one rule is based on the probability of alarms satisfying requirements on the false alarm rate or the missed alarm rate. If only historical alarm data are available, the other rule is based on the percentage reduction of chattering alarms using the alarm duration distribution. Experimental results for industrial cases confirm that the proposed method is effective.
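The filtering idea can be sketched in a few lines: a binary alarm sequence (1 = alarm on) is passed through a median filter so that on/off flickers shorter than half the window are suppressed. The window size of 5 here is an arbitrary illustration, not one of the paper's design rules.

```python
import numpy as np
from scipy.signal import medfilt

# A chattering alarm: rapid 0/1 flicker around one sustained alarm episode.
raw = np.array([0, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 0, 0, 0])

# Median filter with window 5 (scipy zero-pads at the boundaries).
filtered = medfilt(raw, kernel_size=5)
print(filtered.tolist())
# -> [0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
```

The chattering on/off transitions collapse into a single sustained alarm, which is exactly the reduction the window-size design rules aim to control.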
What is the TDS Return Filing Due Date for FY 2024-25.pdf - seoforlegalpillers
It is crucial for taxpayers to understand the TDS return filing due dates so that they can fulfill their TDS obligations efficiently. Taxpayers can avoid penalties by sticking to the deadlines and filing TDS accurately. Timely filing of TDS also ensures the availability of tax credits. You can seek the professional guidance of experts like Legal Pillers for timely filing of the TDS return.
Falcon stands out as a top-tier P2P Invoice Discounting platform in India, bridging esteemed blue-chip companies and eager investors. Our goal is to transform the investment landscape in India by establishing a comprehensive destination for borrowers and investors with diverse profiles and needs, all while minimizing risk. What sets Falcon apart is the elimination of intermediaries such as commercial banks and depository institutions, allowing investors to enjoy higher yields.
Cracking the Workplace Discipline Code Main.pptx - Workforce Group
Cultivating and maintaining discipline within teams is a critical differentiator for successful organisations.
Forward-thinking leaders and business managers understand the impact that discipline has on organisational success. A disciplined workforce operates with clarity, focus, and a shared understanding of expectations, ultimately driving better results, optimising productivity, and facilitating seamless collaboration.
Although discipline is not a one-size-fits-all approach, it can help create a work environment that encourages personal growth and accountability rather than solely relying on punitive measures.
In this deck, you will learn the significance of workplace discipline for organisational success. You’ll also learn
• Four (4) workplace discipline methods you should consider
• The best and most practical approach to implementing workplace discipline.
• Three (3) key tips to maintain a disciplined workplace.
Discover the innovative and creative projects that highlight my journey throu... - dylandmeas
Discover the innovative and creative projects that highlight my journey through Full Sail University. Below, you’ll find a collection of my work showcasing my skills and expertise in digital marketing, event planning, and media production.
Enterprise Excellence is Inclusive Excellence.pdfKaiNexus
Enterprise excellence and inclusive excellence are closely linked, and real-world challenges have shown that both are essential to the success of any organization. To achieve enterprise excellence, organizations must focus on improving their operations and processes while creating an inclusive environment that engages everyone. In this interactive session, the facilitator will highlight commonly established business practices and how they limit our ability to engage everyone every day. More importantly, though, participants will likely gain increased awareness of what we can do differently to maximize enterprise excellence through deliberate inclusion.
What is Enterprise Excellence?
Enterprise Excellence is a holistic approach that's aimed at achieving world-class performance across all aspects of the organization.
What might I learn?
A way to engage all in creating Inclusive Excellence. Lessons from the US military and their parallels to the story of Harry Potter. How belt systems and CI teams can destroy inclusive practices. How leadership language invites people to the party. There are three things leaders can do to engage everyone every day: maximizing psychological safety to create environments where folks learn, contribute, and challenge the status quo.
Who might benefit? Anyone and everyone leading folks from the shop floor to top floor.
Dr. William Harvey is a seasoned Operations Leader with extensive experience in chemical processing, manufacturing, and operations management. At Michelman, he currently oversees multiple sites, leading teams in strategic planning and coaching/practicing continuous improvement. William is set to start his eighth year of teaching at the University of Cincinnati where he teaches marketing, finance, and management. William holds various certifications in change management, quality, leadership, operational excellence, team building, and DiSC, among others.
Unveiling the Secrets How Does Generative AI Work.pdfSam H
At its core, generative artificial intelligence relies on the concept of generative models, which serve as engines that churn out entirely new data resembling their training data. It is like a sculptor who has studied so many forms found in nature and then uses this knowledge to create sculptures from his imagination that have never been seen before anywhere else. If taken to cyberspace, gans work almost the same way.
Skye Residences | Extended Stay Residences Near Toronto Airportmarketingjdass
Experience unparalleled EXTENDED STAY and comfort at Skye Residences located just minutes from Toronto Airport. Discover sophisticated accommodations tailored for discerning travelers.
Website Link :
https://skyeresidences.com/
https://skyeresidences.com/about-us/
https://skyeresidences.com/gallery/
https://skyeresidences.com/rooms/
https://skyeresidences.com/near-by-attractions/
https://skyeresidences.com/commute/
https://skyeresidences.com/contact/
https://skyeresidences.com/queen-suite-with-sofa-bed/
https://skyeresidences.com/queen-suite-with-sofa-bed-and-balcony/
https://skyeresidences.com/queen-suite-with-sofa-bed-accessible/
https://skyeresidences.com/2-bedroom-deluxe-queen-suite-with-sofa-bed/
https://skyeresidences.com/2-bedroom-deluxe-king-queen-suite-with-sofa-bed/
https://skyeresidences.com/2-bedroom-deluxe-queen-suite-with-sofa-bed-accessible/
#Skye Residences Etobicoke, #Skye Residences Near Toronto Airport, #Skye Residences Toronto, #Skye Hotel Toronto, #Skye Hotel Near Toronto Airport, #Hotel Near Toronto Airport, #Near Toronto Airport Accommodation, #Suites Near Toronto Airport, #Etobicoke Suites Near Airport, #Hotel Near Toronto Pearson International Airport, #Toronto Airport Suite Rentals, #Pearson Airport Hotel Suites
Tata Group Dials Taiwan for Its Chipmaking Ambition in Gujarat’s DholeraAvirahi City Dholera
The Tata Group, a titan of Indian industry, is making waves with its advanced talks with Taiwanese chipmakers Powerchip Semiconductor Manufacturing Corporation (PSMC) and UMC Group. The goal? Establishing a cutting-edge semiconductor fabrication unit (fab) in Dholera, Gujarat. This isn’t just any project; it’s a potential game changer for India’s chipmaking aspirations and a boon for investors seeking promising residential projects in dholera sir.
Visit : https://www.avirahi.com/blog/tata-group-dials-taiwan-for-its-chipmaking-ambition-in-gujarats-dholera/
VAT Registration Outlined In UAE: Benefits and Requirementsuae taxgpt
Vat Registration is a legal obligation for businesses meeting the threshold requirement, helping companies avoid fines and ramifications. Contact now!
https://viralsocialtrends.com/vat-registration-outlined-in-uae/
Memorandum Of Association Constitution of Company.pptseri bangash
www.seribangash.com
A Memorandum of Association (MOA) is a legal document that outlines the fundamental principles and objectives upon which a company operates. It serves as the company's charter or constitution and defines the scope of its activities. Here's a detailed note on the MOA:
Contents of Memorandum of Association:
Name Clause: This clause states the name of the company, which should end with words like "Limited" or "Ltd." for a public limited company and "Private Limited" or "Pvt. Ltd." for a private limited company.
https://seribangash.com/article-of-association-is-legal-doc-of-company/
Registered Office Clause: It specifies the location where the company's registered office is situated. This office is where all official communications and notices are sent.
Objective Clause: This clause delineates the main objectives for which the company is formed. It's important to define these objectives clearly, as the company cannot undertake activities beyond those mentioned in this clause.
www.seribangash.com
Liability Clause: It outlines the extent of liability of the company's members. In the case of companies limited by shares, the liability of members is limited to the amount unpaid on their shares. For companies limited by guarantee, members' liability is limited to the amount they undertake to contribute if the company is wound up.
https://seribangash.com/promotors-is-person-conceived-formation-company/
Capital Clause: This clause specifies the authorized capital of the company, i.e., the maximum amount of share capital the company is authorized to issue. It also mentions the division of this capital into shares and their respective nominal value.
Association Clause: It simply states that the subscribers wish to form a company and agree to become members of it, in accordance with the terms of the MOA.
Importance of Memorandum of Association:
Legal Requirement: The MOA is a legal requirement for the formation of a company. It must be filed with the Registrar of Companies during the incorporation process.
Constitutional Document: It serves as the company's constitutional document, defining its scope, powers, and limitations.
Protection of Members: It protects the interests of the company's members by clearly defining the objectives and limiting their liability.
External Communication: It provides clarity to external parties, such as investors, creditors, and regulatory authorities, regarding the company's objectives and powers.
https://seribangash.com/difference-public-and-private-company-law/
Binding Authority: The company and its members are bound by the provisions of the MOA. Any action taken beyond its scope may be considered ultra vires (beyond the powers) of the company and therefore void.
Amendment of MOA:
While the MOA lays down the company's fundamental principles, it is not entirely immutable. It can be amended, but only under specific circumstances and in compliance with legal procedures. Amendments typically require shareholder
Putting the SPARK into Virtual Training.pptxCynthia Clay
This 60-minute webinar, sponsored by Adobe, was delivered for the Training Mag Network. It explored the five elements of SPARK: Storytelling, Purpose, Action, Relationships, and Kudos. Knowing how to tell a well-structured story is key to building long-term memory. Stating a clear purpose that doesn't take away from the discovery learning process is critical. Ensuring that people move from theory to practical application is imperative. Creating strong social learning is the key to commitment and engagement. Validating and affirming participants' comments is the way to create a positive learning environment.
Buy Verified PayPal Account | Buy Google 5 Star Reviewsusawebmarket
Buy Verified PayPal Account
Looking to buy verified PayPal accounts? Discover 7 expert tips for safely purchasing a verified PayPal account in 2024. Ensure security and reliability for your transactions.
PayPal Services Features-
🟢 Email Access
🟢 Bank Added
🟢 Card Verified
🟢 Full SSN Provided
🟢 Phone Number Access
🟢 Driving License Copy
🟢 Fasted Delivery
Client Satisfaction is Our First priority. Our services is very appropriate to buy. We assume that the first-rate way to purchase our offerings is to order on the website. If you have any worry in our cooperation usually You can order us on Skype or Telegram.
24/7 Hours Reply/Please Contact
usawebmarketEmail: support@usawebmarket.com
Skype: usawebmarket
Telegram: @usawebmarket
WhatsApp: +1(218) 203-5951
USA WEB MARKET is the Best Verified PayPal, Payoneer, Cash App, Skrill, Neteller, Stripe Account and SEO, SMM Service provider.100%Satisfection granted.100% replacement Granted.
Affordable Stationery Printing Services in Jaipur | Navpack n PrintNavpack & Print
Looking for professional printing services in Jaipur? Navpack n Print offers high-quality and affordable stationery printing for all your business needs. Stand out with custom stationery designs and fast turnaround times. Contact us today for a quote!
LA HUG - Video Testimonials with Chynna Morgan - June 2024Lital Barkan
Have you ever heard that user-generated content or video testimonials can take your brand to the next level? We will explore how you can effectively use video testimonials to leverage and boost your sales, content strategy, and increase your CRM data.🤯
We will dig deeper into:
1. How to capture video testimonials that convert from your audience 🎥
2. How to leverage your testimonials to boost your sales 💲
3. How you can capture more CRM data to understand your audience better through video testimonials. 📊
RMD24 | Retail media: hoe zet je dit in als je geen AH of Unilever bent? Heid...BBPMedia1
Grote partijen zijn al een tijdje onderweg met retail media. Ondertussen worden in dit domein ook de kansen zichtbaar voor andere spelers in de markt. Maar met die kansen ontstaan ook vragen: Zelf retail media worden of erop adverteren? In welke fase van de funnel past het en hoe integreer je het in een mediaplan? Wat is nu precies het verschil met marketplaces en Programmatic ads? In dit half uur beslechten we de dilemma's en krijg je antwoorden op wanneer het voor jou tijd is om de volgende stap te zetten.
The world of search engine optimization (SEO) is buzzing with discussions after Google confirmed that around 2,500 leaked internal documents related to its Search feature are indeed authentic. The revelation has sparked significant concerns within the SEO community. The leaked documents were initially reported by SEO experts Rand Fishkin and Mike King, igniting widespread analysis and discourse. For More Info:- https://news.arihantwebtech.com/search-disrupted-googles-leaked-documents-rock-the-seo-world/
Improving profitability for small businessBen Wann
In this comprehensive presentation, we will explore strategies and practical tips for enhancing profitability in small businesses. Tailored to meet the unique challenges faced by small enterprises, this session covers various aspects that directly impact the bottom line. Attendees will learn how to optimize operational efficiency, manage expenses, and increase revenue through innovative marketing and customer engagement techniques.
F. Yang et al. / ISA Transactions 51 (2012) 499–506
There are two ways to capture correlation from alarm data:
one is to employ Pearson’s correlation coefficients as done for
continuous data by assigning a value of 1 for the periods when the
alarm is active and a value of 0 when it is inactive [9]; the other
is to introduce similarity measures based on binary data [10,11].
By computing the correlation coefficient or similarity measure for
each pair of variables, a matrix is constructed. However, a matrix
composed of raw numbers is generally inconvenient
for inspection or cursory analysis. Therefore the correlation values
are converted into a color map and this correlation color map
offers better visualization capability; it uses different colors to
show different degrees of correlation [12,13]. The colors are
usually discretized into several levels according to the color code.
Through the reordering and clustering of variables, the color-coded
correlation map can show the clusters of alarm tags intuitively. In
this way, the problem of a large number of variables is separated
into smaller sub-problems with far fewer variables.
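For illustration only (not the authors' code), the first route above — treating the 0/1 alarm series as continuous data and computing Pearson's correlation coefficients pairwise — can be sketched in pure Python with made-up alarm series:

```python
import math

def pearson(x, y):
    """Pearson's correlation coefficient between two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def correlation_matrix(series):
    """Pairwise correlation matrix of a list of binary alarm series."""
    m = len(series)
    return [[pearson(series[i], series[j]) for j in range(m)] for i in range(m)]

# Hypothetical series: the second duplicates the first, the third is unrelated.
a1 = [0, 0, 1, 1, 1, 0, 0, 1, 1, 0]
a2 = [0, 0, 1, 1, 1, 0, 0, 1, 1, 0]
a3 = [1, 0, 0, 1, 0, 0, 1, 0, 0, 1]
M = correlation_matrix([a1, a2, a3])
```

This matrix M is what the correlation color map visualizes after reordering and clustering.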
The classical correlation color map only uses the correlation
coefficients. However, it may be ineffective when there is a time lag
between two variables. Hence the correlation coefficients should
be lag-adjusted to take into account the delays between each
pair of variables. The alarm similarity color map (ASCM), which
is specially designed for alarm data analysis [10], has proved to
be an effective method for visualization. It captures time-delayed
similarity information from binary data. However it has some
disadvantages. Firstly, in order to weaken the sensitivity caused
by the time shift, each unique alarm is padded with extra 1's
to enrich the data; or, in a similar vein, alarms are checked
within fixed time windows (equivalent to down-sampling) [11].
This step lacks physical justification, and as a result individual
false alarms and chattering alarms may be
magnified unreasonably. In this paper we suggest another method
in which each unique alarm is replaced by a Gaussian distribution
along with its neighbors in the time axis [14]. This set of generated
continuous data with numerical values is labeled as pseudo data.
Based on this data set, we use Pearson’s correlation coefficients to
measure the correlation. This method has some similar problems
as the padding method; however, it provides a better way of
transforming binary alarm data into continuous data that can
be analyzed by statistical approaches. Secondly, the ASCM
method takes into account all the alarm
tags separately, regardless of their physical relationship. If the typical
analogue alarms (HI/LO/HH/LL) associated with the same process
variable are grouped together, the result can be expected to be
more reasonable. Thirdly, the similarity measure only indicates the
distance but loses the direction (positive or negative correlation),
while the correlation coefficient indicates both the similarity and
the direction.
In this paper, a new approach to analyzing multiple alarm series
is proposed to find redundancy. By converting binary data into
continuous (pseudo) data via a novel Gaussian kernel method,
statistical methods are applied, including an improved correlation
and singular value analysis. The numerical results can be visualized
by the ASCM tool via appropriate ordering and clustering of alarm
tags.
2. Basic method for correlation analysis
2.1. Gaussian kernel method for data preprocessing
Alarm data is binary data with ‘1’s’ and ‘0’s’ to denote abnormal
and normal states respectively. Since alarm data is obtained and
discretized from continuous process data according to alarm limits,
some information is lost in this process. However, alarm data
does have advantages as well: (1) binary data is easy to store
and convenient for statistical analysis; (2) it has much higher
sampling frequencies than process data and thus includes detailed
information; and (3) it includes more types of data such as digital
alarms showing the status of some elements (e.g. if a pump is on or
off). Thus we can use alarm data directly to analyze the correlation.
For a particular variable, one alarm point can be regarded
as a sample of the time series. To apply correlation analysis
to alarm occurrences, a continuous analogue signal has to be
generated from the alarm signal which is actually an event series.
By simply assigning ‘1’ or ‘0’, a time series in a continuous temporal
domain can be generated, which is simple but lacks granularity
because non-continuous (binary) values are not commensurate for
correlation analysis. In order to better estimate the time series, the
kernel method can be used in the temporal domain [15], which is a
nonparametric method, to fit the function with any shape. Here the
Gaussian kernel function is used because of its smoothness. At each
alarm point, a Gaussian kernel function is superimposed around
this time instant. The function is defined as:
K(t) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left( -\frac{t^2}{2\sigma^2} \right), (1)
where σ is the standard deviation. If all the alarm points are
considered, a continuous time series can be obtained as the
superposition of all the time-shifted Gaussian kernel functions.
Thus the resulting time series is given as
P(t) = \sum_{i=1}^{N} K(t - t_i), (2)
where N is the number of alarm points at the time instants ti, i =
1, 2, . . . , N. This time series P(t) can be regarded as the estimation
of the corresponding process data although there may not exist
such a physical process variable; hence we call it pseudo data. Fig. 1
illustrates the principle of generating such pseudo data from binary
data.
The pseudo time series has the following properties:
• It is continuous and smooth because it is approximated by
superimposed Gaussian functions.
• For consecutive alarm points lasting a long time, the pseudo
data is also approximately 1 because the integral of the Gaussian kernel
function over the whole time domain is 1. This is different
from the estimation of probability density function because the
samples here are at different time instants.
• With appropriate variance of the kernel function, the magni-
tude of the kernel function is small; thus non-consecutive and
sparse alarm points cannot result in spikes in the pseudo data.
This property is particularly important for the filtering of chat-
tering alarms so that the proposed method can be used directly
on the data set with chattering alarms.
Consider the example of alarm data as shown in Fig. 2(a) in
which there is a consecutive alarm period of 400 s including a short
(5 s) break at around the 510th interval. At both the beginning and
the end of this period, there is some chattering. From this data set,
we can generate the pseudo data with standard deviation of 30 s as
shown in Fig. 2(b). Here the chattering and the gap are sufficiently
smoothened and at the same time the alarm period prevails. The
effects of missed alarms and false alarms are lessened because they
usually only exist over a short time duration.
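The transformation of Eqs. (1) and (2), and the smoothing behavior just described, can be sketched as follows (a minimal pure-Python illustration assuming a unit sampling interval; the alarm series and σ are hypothetical, not the industrial data of Fig. 2):

```python
import math

def pseudo_series(alarm, sigma):
    """Superimpose a Gaussian kernel (Eq. (1)) at every alarm instant and
    sum them (Eq. (2)). `alarm` is a binary series sampled once per unit."""
    n = len(alarm)
    instants = [i for i, a in enumerate(alarm) if a == 1]
    out = []
    for t in range(n):
        s = sum(math.exp(-((t - ti) ** 2) / (2.0 * sigma ** 2))
                / (math.sqrt(2.0 * math.pi) * sigma)
                for ti in instants)
        out.append(s)
    return out

# A sustained alarm period of 30 samples surrounded by normal operation.
alarm = [0] * 10 + [1] * 30 + [0] * 10
p = pseudo_series(alarm, sigma=3.0)
```

Inside the sustained period the unit-spaced kernels sum to approximately 1, while isolated samples far from any alarm stay near 0, which is the filtering property exploited for chattering alarms.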
2.2. Effect of time lags
There may exist a time lag between two correlated alarms
due to the propagation or response time. This lag reduces the
correlation coefficient and thus may mask the correlation be-
tween these variables. To eliminate this influence, different time
Fig. 1. Transforming alarm data (a) to pseudo data (b).
Fig. 2. (a) Original alarm data; (b) generated pseudo data by the Gaussian kernel method.
lags should be assumed and the maximal correlation coeffi-
cient can be computed; this would be regarded as the real
correlation [16].
Assume that x and y are time series of n observations with
means \mu_x, \mu_y and standard deviations \sigma_x, \sigma_y respectively; then the cross-correlation function (CCF) with an assumed lag k on y is:

\phi_{xy}(k) = \frac{E[(x_i - \mu_x)(y_{i+k} - \mu_y)]}{\sigma_x \sigma_y}, \quad k = -n+1, \ldots, n-1. (3)
The expectation can be estimated as the sample CCF by:

\hat{\phi}_{xy}(k) =
\begin{cases}
\dfrac{1}{n-k} \displaystyle\sum_{i=1}^{n-k} \dfrac{(x_i - \mu_x)(y_{i+k} - \mu_y)}{\sigma_x \sigma_y}, & \text{if } k \ge 0, \\[2ex]
\dfrac{1}{n+k} \displaystyle\sum_{i=1-k}^{n} \dfrac{(x_i - \mu_x)(y_{i+k} - \mu_y)}{\sigma_x \sigma_y}, & \text{if } k < 0.
\end{cases} (4)
A value of the CCF is obtained by assuming a certain time lag
for one of the time series. Thus the absolute maximum value can
be regarded as the real cross-correlation and the corresponding
lag as the estimated time lag between these two variables. For
a mathematical description, one can compute the maximum and
minimum values \phi^{\max} = \max_k\{\phi_{xy}(k), 0\} \ge 0 and
\phi^{\min} = \min_k\{\phi_{xy}(k), 0\} \le 0, together with the
corresponding arguments k^{\max} and k^{\min}. Then the estimated
time delay from x to y (corresponding to the maximum absolute
value of the CCF) is:

\lambda =
\begin{cases}
k^{\max}, & \text{if } \phi^{\max} \ge -\phi^{\min}, \\
k^{\min}, & \text{if } \phi^{\max} < -\phi^{\min},
\end{cases} (5)

and the actual time-delayed cross-correlation is \rho = \phi_{xy}(\lambda) (between -1 and
1). If λ is less than zero, then it means that the actual delay is from
y to x. Thus the sign of λ provides the directionality information
between x and y. The sign of ρ indicates whether the correlation
is positive or negative. Note that this directionality does not mean
causality because there may exist another common cause of the
relationship between x and y [17].
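The lag search of Eqs. (3)–(5) can be sketched as follows (illustrative only; the toy series and the lag-search window are our own, and the estimator follows the sample CCF above):

```python
import math

def ccf(x, y, k):
    """Sample cross-correlation (Eq. (4)) at lag k applied to y."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sx = math.sqrt(sum((a - mx) ** 2 for a in x) / n)
    sy = math.sqrt(sum((b - my) ** 2 for b in y) / n)
    if k >= 0:
        pairs = [(x[i], y[i + k]) for i in range(n - k)]
    else:
        pairs = [(x[i], y[i + k]) for i in range(-k, n)]
    cov = sum((a - mx) * (b - my) for a, b in pairs) / len(pairs)
    return cov / (sx * sy)

def lagged_correlation(x, y, max_lag):
    """Return (rho, lambda): the max-|CCF| correlation and the estimated
    delay, per Eq. (5)."""
    best = max(range(-max_lag, max_lag + 1),
               key=lambda k: abs(ccf(x, y, k)))
    return ccf(x, y, best), best

# y is x delayed by 3 samples, so the estimated lag should come out as 3.
x = [0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0]
y = [0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 0]
rho, lam = lagged_correlation(x, y, max_lag=5)
```

A positive lam indicates the delay runs from x to y, consistent with the directionality interpretation above.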
2.3. Finding redundant alarms based on correlation
If several alarms are highly correlated, we should check
if there is redundancy in them, i.e., if some alarms can be
obtained by linearly combining other alarm signals. Singular value
decomposition (SVD) is a well-developed method to do this by
finding singular values of the set of alarm data and thus enabling
one to identify collinear columns. If there are several large singular
values that possess the dominant proportion of all the values,
then the number of large or dominant singular values can be
regarded as the number of independent alarm tags and the residual
number is the number of redundant alarms. However, SVD cannot
tell us which alarm is redundant, because it operates on
the transformed (pseudo) data and generates a set of new data.
It only provides an opportunity for one to check if there is an
alarm that can be removed or replaced by the linear combination
of other alarm signals. In addition, the independent series in the
new data may not have the approximate binary property, making
these values inappropriate to be taken as generated alarm signals.
One should also consider the physical meaning of all alarms and
reconcile this with the SVD information. This ambiguity in using
SVD becomes much more severe when the number of alarms is large. In this case,
it is difficult to identify the independent and redundant alarms
merely based on the result of SVD. Therefore, we need to separate
the variables into several groups and perform SVD on each group
of variables. The clustering process based on visualizing the result
of the correlation analysis would help one in analyzing SVDs of
smaller groups of variables. Another point to be noted is that, for
SVD analysis, the time series should be sufficiently long to include
many alarms, otherwise it cannot reflect its true property.
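The singular-value check can be sketched with NumPy as follows (a hedged illustration: the synthetic series and the relative threshold of 1e-8 for "dominant" singular values are our own assumptions, not values from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

# Three pseudo-alarm series; the third is a linear combination of the
# first two and is therefore redundant in the sense used here.
s1 = rng.random(200)
s2 = rng.random(200)
s3 = 0.5 * s1 + 0.5 * s2

X = np.column_stack([s1, s2, s3])
# Center the columns, then inspect the singular values for a dominant subset.
Xc = X - X.mean(axis=0)
sv = np.linalg.svd(Xc, compute_uv=False)

n_independent = int(np.sum(sv > 1e-8 * sv[0]))
n_redundant = X.shape[1] - n_independent
```

As noted above, SVD only counts the redundancy; identifying which physical alarm to remove still requires engineering judgment within each cluster.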
3. Visualization of correlation
In order to visualize the correlation matrix, the correlation color
map is developed, in which the variables are reordered so that
those highly correlated with each other appear together as blocks
of prominent colors or shades, called clusters.
3.1. Ordering and clustering of alarm tags
Based on the correlation coefficient, the measure to describe
the similarity between two alarms is known as the similarity
measure S [18] which has the properties of positivity (S(x, y)
≥ 0), symmetry (S(x, y) = S(y, x)), and maximality (S(x, x) ≥
S(x, y)). There are many similarity measures available for binary
data [19]; hence if we compute the similarity measures based on
original alarm data, we can choose any one of them; for example
Kondaveeti et al. [10] have used the Jaccard similarity measure.
However we are dealing with pseudo data; so generally we can
use the covariance, Euclidean distance (L2), Kendall’s τ, Pearson’s
correlation coefficient, Spearman’s rank, City-block (L1), etc. [20].
In this paper, we use the classical Pearson’s correlation coefficient
and define the following similarity measure S:
S(x, y) = |ρxy|, (6)
which lies in the interval [0, 1].
During clustering, one should employ both the similarity
measure between two alarms and also the measure between two
clusters of alarms. The latter can be defined by single-, complete-,
and average-linkage methods [20]. The definition of single-linkage is

S_S(X, Y) = \min_{x \in X,\, y \in Y} S(x, y), (7)

the definition of complete-linkage is

S_C(X, Y) = \max_{x \in X,\, y \in Y} S(x, y), (8)

and the definition of average-linkage is

S_A(X, Y) = \frac{1}{|X|\,|Y|} \sum_{x \in X,\, y \in Y} S(x, y), (9)

where x and y are alarm tags, X and Y are clusters of tags, and |·|
means the size of the cluster.
Using the similarity measure between each pair of alarms or
their clusters, all alarm tags can be clustered based on various clus-
tering algorithms such as agglomerative hierarchical clustering,
the methodology for which is shown as a dendrogram. This process
has been illustrated in Kondaveeti et al.’s study [10]. Other ordering
and clustering algorithms can also be used such as the ellipse or-
dering. For this purpose, the data analysis software GAP is a useful
tool [20].
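A toy agglomerative pass using the cluster-level similarity definitions of Eqs. (7)–(9) might look like this (a hedged sketch: the similarity values, stopping threshold, and greedy merge loop are illustrative, not the GAP tool or the authors' implementation):

```python
def cluster_similarity(X, Y, S, link="single"):
    """Cluster-to-cluster similarity per Eqs. (7)-(9);
    S maps a pair of tags to a similarity such as |rho|."""
    vals = [S(x, y) for x in X for y in Y]
    if link == "single":
        return min(vals)          # Eq. (7)
    if link == "complete":
        return max(vals)          # Eq. (8)
    return sum(vals) / len(vals)  # Eq. (9), average-linkage

def agglomerate(tags, S, threshold, link="single"):
    """Greedy agglomerative clustering: repeatedly merge the most similar
    pair of clusters until no pair exceeds `threshold`."""
    clusters = [[t] for t in tags]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                s = cluster_similarity(clusters[i], clusters[j], S, link)
                if best is None or s > best[0]:
                    best = (s, i, j)
        if best[0] < threshold:
            break
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

# Hypothetical similarities: tags 0 and 1 strongly correlated, tag 2 apart.
corr = {(0, 1): 0.9, (0, 2): 0.1, (1, 2): 0.2}
S = lambda x, y: 1.0 if x == y else corr[(min(x, y), max(x, y))]
result = agglomerate([0, 1, 2], S, threshold=0.5)
```

The sequence of merges is exactly what a dendrogram records, with one merge per level.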
3.2. Correlation color map
The correlation matrix with rearranged rows and columns is
color coded to transform it into a color map [12]. The matrix is
symmetric and both the vertical and horizontal axes are alarm
tag names. Each grid (x, y) shows the corresponding correlation
between the two alarm tags x and y. The color is coded to show
the correlation (−1 to +1) or its absolute (0–1). To explain the
meaning of each color, a color bar legend is placed beside the plot.
The order of the alarm tags is determined by the above algorithm.
The color map is symmetric about the diagonal which means the
auto-correlation coefficients are 1’s. The clusters are shown as
blocks located along the diagonal with similar color codes. From
this map, one can acquire the approximate correlation between
each pair of alarms, and easily find clusters and the corresponding
alarm tags. By matching this map with the physical meaning, one
can locate the redundant alarms.
4. Implementation issues
In the above methods there are several issues to be noted.
4.1. Sampling rates of alarm data and pseudo data
The alarm data is recorded by DCS, PLC, or other smart devices;
the sampling rate can be very high. This is necessary sometimes
because alarms occur with a very high frequency due to oscillation
or chattering. Generally speaking, we can record them in the
database as frequently as possible, say one per second. A proposed
rule of thumb is a 15 s sample interval as a lower bound [14].
It is generally unnecessary to record process data at such a
high rate because the process always has a time constant with
a magnitude of minutes or even longer. In real applications, we
often record such samples at one sample per minute. Different
from alarm data and process data, the pseudo data is generated
from alarm data but behaves as continuous process data. Thus we
can record the pseudo data at the same frequency as the original
alarm data, but we can also decrease the sampling rate to reduce
the computational burden and can also match it with the sample
rate of process data.
4.2. Variance of the Gaussian kernel function
The variance of the kernel function affects the robustness of
the pseudo data at each alarm point. If the variance is very small,
each alarm point generates a spike in the pseudo data making the
method very sensitive to an individual sample or during a short
period of alarms. On the other hand if the variance is large, the
magnitude of each kernel function is very small; hence it needs
quite a few alarm points during a period of no alarms or quite a
few ‘holes’ during a period of consecutive alarms to change the
trend of the pseudo curve. Although this can reduce the influence
of the missed alarms, false alarms, and chattering alarms due to
its robustness to individual changes, the latency increases on the
other hand, meaning that there is a longer time delay to reflect a
change. This is a trade-off between large and small variances. To
the best of our knowledge, most variables in the process industry have a
time constant that ranges from several seconds to several minutes
or even hours; thus the standard deviation of the Gaussian kernel
function can be chosen over this range. If there is a step change
(from no alarm to alarm for example) in the alarm data, it needs
several minutes for the pseudo data to completely change the value
(from 0 to 1).
4.3. Computational effort
Although the number of samples is generally large due to
the high sampling rate, the alarm data set includes a
large amount of normal data, resulting in the data set being
quite sparse. In addition, when an alarm occurs consecutively,
the corresponding pseudo time series is a set of consecutive 1’s
according to the property mentioned earlier in Section 2; such
periods can be ignored in the computation by letting them remain
unchanged. Therefore the computation only concentrates on the
non-consecutive alarm points and the boundary of the consecutive
alarm points.
In SVD, the vectors of the pseudo data can be very long, making
the computation infeasible. We can either lower the sampling rate,
or use the technique of sparse matrix analysis [21].
4.4. Selection of data range
Theoretically, a longer series yields better results, but
the computational effort is higher. In addition, in correlation
analysis we assume that the data is stationary; this is difficult to
satisfy because the operating situation or fault pattern may vary,
especially in a long series. Thus we should compromise to look
into shorter series. Based on this compromise, the computation of
correlation requires a sufficiently large data set that contains alarm
Fig. 3. High density plot of the original alarm data for case study 1.
points because the normal points do not provide information. In
practice a short time period of simultaneous alarms is a much
better data set than a long time period of rare alarms. In particular,
the alarm data during an alarm flood is an especially good source
for study.
4.5. Grouping of alarm tags of one process variable
For a typical process variable, there may be as many as four
alarms associated with it: HI, LO, HH, and LL alarms. If these alarm
tags are treated separately, the relationship between them is lost.
Instead, these four alarms can be combined together by assigning
them different values based on their directions and degrees, for
example, HI as +1, LO as −1, HH as +2, and LL as −2. This
treatment makes the alarm data have several values instead of
binary values. When generating the pseudo data, these values
should be taken as the weights of the Gaussian kernels.
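This weighted-kernel treatment can be sketched as follows (the level encoding follows the values suggested above; the event list, horizon, and σ are hypothetical):

```python
import math

# Suggested encoding: direction and degree of each analogue alarm level.
LEVELS = {"HH": 2, "HI": 1, "LO": -1, "LL": -2}

def weighted_pseudo(events, n, sigma):
    """Pseudo series for one process variable whose HI/LO/HH/LL alarms are
    combined: each event (t_i, level) contributes a Gaussian kernel
    weighted by the level value, e.g. HI -> +1, LL -> -2."""
    out = []
    for t in range(n):
        s = sum(LEVELS[lvl] * math.exp(-((t - ti) ** 2) / (2 * sigma ** 2))
                / (math.sqrt(2 * math.pi) * sigma)
                for ti, lvl in events)
        out.append(s)
    return out

# A sustained HI excursion followed by a LO excursion on the same variable.
events = [(t, "HI") for t in range(10, 20)] + [(t, "LO") for t in range(40, 50)]
p = weighted_pseudo(events, n=60, sigma=2.0)
```

The resulting series swings positive during high alarms and negative during low alarms, so the sign information survives into the correlation analysis.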
4.6. Choosing color code for color map
We can either use the same color with different shades or
different colors. Nevertheless, the number of color scales is very
important, which determines the interpretability of the map. There
is a trade-off between the sensitivity and the interpretability. A
general recommendation is to start with a small number of codes and
increase the shades gradually, both to suit the eye in the visualization
process and to provide finer granularity.
In our work we use warm colors to describe positive
correlation and cold colors to describe negative correlation. The
four bands of values are chosen as (0–0.25), (0.25–0.5), (0.5–0.75),
and (0.75–1) respectively. Please see the color bar in Fig. 4(b) for
the color assignment.
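The banding just described can be expressed as a small lookup (the "warm"/"cold" labels and integer band indices are our own naming for illustration, not the paper's color code):

```python
def color_code(rho):
    """Map a correlation value in [-1, 1] to one of eight codes:
    warm shades for positive and cold shades for negative correlation,
    with four absolute-value bands (0-0.25), (0.25-0.5), (0.5-0.75), (0.75-1)."""
    bands = [0.25, 0.5, 0.75, 1.0]
    level = next(i for i, b in enumerate(bands) if abs(rho) <= b)
    sign = "warm" if rho >= 0 else "cold"
    return sign, level

# Example: a strong negative correlation falls in the darkest cold band.
code = color_code(-0.8)
```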
5. Case studies
To illustrate the practicality and utility of the method suggested
in this work, we consider two case studies. The first case study
is to illustrate the procedure of the proposed method and the
improvements. The second case study is an application to a real
industrial process, showing the efficacy of the method.
5.1. Case study 1
First consider data from a real industrial process including 10
analogue alarm tags with HI and LO settings that have alarms over
a period of one week. The sampling rate is one sample per second.
By assigning HI and LO alarms +1 and −1 respectively, the alarm
data is shown as a high density plot in Fig. 3 where alarm tags 3
and 7 have both HI and LO alarms. If we use the method proposed
by Kondaveeti et al. [10], the ASCM obtained is shown in Fig. 4(a)
where the padding length used is 5 s. We find that alarm tags 1 and
2 are correlated, and alarm tags 4, 5, and 6 are grouped in another
cluster.
Now we use the method proposed in this paper by setting the
standard deviation of the Gaussian kernel as 30 s. The correlation
matrix Φ (with the elements below the diagonal removed due to
symmetry) is given in Box I.
Based on this matrix, Table 1 illustrates the clustering process.
Initially each alarm tag is regarded as a cluster. In the first step,
the similarity measure (S = absolute correlation) of each pair
of alarm tags is compared; the similarity measure S
between alarm tags 4 and 5 has the largest value, meaning that
they are the most similar. Therefore alarm tags 4 and 5 are grouped
into a new cluster, 11, and the old clusters 4 and 5 are removed.
Then we compute the similarity measure between each pair of
new clusters. We take alarm tags 1 and 2 in the second step, and
then 10 and 8 in the third step. In the fourth step, the measure
Fig. 4. Alarm similarity color map of the original alarm data based on Kondaveeti’s scheme (a) and correlation color map of the pseudo data (b) for case study 1. (For
interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
\Phi = \begin{bmatrix}
1 & 0.97 & -0.06 & -0.08 & -0.08 & 0.33 & 0.23 & 0.54 & 0.34 & 0.42 \\
  & 1 & -0.03 & -0.07 & -0.07 & 0.34 & 0.23 & 0.55 & 0.35 & 0.43 \\
  &   & 1 & 0.47 & 0.47 & -0.21 & 0.26 & -0.08 & 0.04 & -0.12 \\
  &   &   & 1 & 1.00 & -0.20 & 0.10 & -0.16 & -0.27 & -0.13 \\
  &   &   &   & 1 & -0.20 & 0.10 & -0.16 & -0.27 & -0.13 \\
  &   &   &   &   & 1 & -0.25 & 0.64 & 0.34 & 0.67 \\
  &   &   &   &   &   & 1 & 0.25 & 0.17 & 0.06 \\
  &   &   &   &   &   &   & 1 & 0.60 & 0.79 \\
  &   &   &   &   &   &   &   & 1 & 0.70 \\
  &   &   &   &   &   &   &   &   & 1
\end{bmatrix}
Box I.
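A correlation matrix like Φ is obtained directly from the pseudo time series. A minimal sketch using synthetic data (the series here are random stand-ins, not the case-study data):

```python
import numpy as np

# Rows of X are pseudo time series, one per alarm tag; np.corrcoef
# returns the symmetric matrix of Pearson correlation coefficients.
rng = np.random.default_rng(0)
base = rng.standard_normal(1000)
X = np.vstack([base,
               0.9 * base + 0.1 * rng.standard_normal(1000),  # highly correlated with row 0
               rng.standard_normal(1000)])                    # uncorrelated
Phi = np.corrcoef(X)
```

The first two rows mimic a pair like alarm tags 1 and 2 in Box I (correlation near 0.97), while the third mimics an unrelated tag.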
Table 1
Clustering process for case study 1.
Step | Clusters             | Largest measure (correlation) | Pair
0    | 1 2 9 10 8 6 3 4 5 7 | 1.00  | 4, 5
1    | 1 2 9 10 8 6 3 11 7  | 0.97  | 1, 2
2    | 12 9 10 8 6 3 11 7   | 0.79  | 10, 8
3    | 12 9 13 6 3 11 7     | 0.69  | 9, 8
4    | 12 14 6 3 11 7       | 0.67  | 6, 10
5    | 12 15 3 11 7         | 0.55  | 2, 8
6    | 16 3 11 7            | 0.47  | 3, 4/5
7    | 16 17 7              | −0.27 | 4/5, 9
8    | 18 7                 | 0.26  | 7, 3
9    | 19                   | –     | –
between clusters 9 and 13 is the largest, which is determined by
the similarity between alarm tags 9 and 8 according to the single-
linkage method. So clusters 9 and 13 become the new cluster 14
which contains three alarm tags. The algorithm continues until
only one cluster is left. The corresponding dendrogram is shown
in Fig. 5. Thus the new order of alarm tags is obtained and thereby
we have the correlation color map shown in Fig. 4(b).
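The single-linkage clustering and tag reordering described above can be reproduced with standard tools; a sketch assuming the distance between tags is taken as 1 − |correlation| (the helper name and the toy matrix are illustrative):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, leaves_list
from scipy.spatial.distance import squareform

def reorder_tags(Phi):
    """Single-linkage hierarchical clustering on distance 1 - |correlation|;
    returns the dendrogram leaf order used to rearrange the color map."""
    D = 1.0 - np.abs(Phi)
    np.fill_diagonal(D, 0.0)  # remove numerical noise on the diagonal
    Z = linkage(squareform(D, checks=False), method='single')
    return leaves_list(Z)

# Toy example: tags 0 and 2 are strongly (negatively) correlated,
# so they should end up adjacent in the new order.
Phi = np.array([[1.0, 0.1, -0.9],
                [0.1, 1.0, 0.2],
                [-0.9, 0.2, 1.0]])
order = reorder_tags(Phi)
```

Using the absolute correlation means tags with strong negative correlation are also grouped together, which is what allows the color map to show both warm and cold blocks within one cluster.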
Evidently, alarm tags 1, 2, 9, 10, 8, and 6 form a single cluster
with positive correlations, and alarm tags 3, 4, and 5 also have
some correlation between them. This result clearly provides more
information than the ASCM shown in Fig. 4(a). The proposed
method thus reveals correlations that are not uncovered by the
method of Kondaveeti et al. [10], such as the correlation
between alarm tags 8 and 9. Then we take the data of alarm tags
1, 2, 9, 10, 8, and 6 to perform SVD. The six singular values are
1720, 498, 347, 302, 199, and 165. Thus it is apparent that there
exists redundancy in these tags because one or two singular values
dominate most of the collinear information. According to the user’s
preference and the corresponding physical meanings, some alarms
may be removed.
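The SVD-based redundancy check can be sketched as follows, with synthetic data standing in for the six clustered tags (the common-trend construction is an illustrative assumption, not the case-study data):

```python
import numpy as np

rng = np.random.default_rng(1)
t = rng.standard_normal(500)
# Six "tags" driven by one common trend plus small independent noise,
# mimicking a cluster of strongly correlated pseudo series.
X = np.column_stack([t + 0.05 * rng.standard_normal(500) for _ in range(6)])

s = np.linalg.svd(X, compute_uv=False)  # singular values, descending
ratio = s[0] / s.sum()                  # share of the dominant singular value
```

As in the case study, one dominant singular value (a large `ratio`) indicates that the cluster's information is nearly collinear, so some of its alarms may be candidates for removal.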
5.2. Case study 2
Next consider a hydro treater process in a refinery in Alberta,
Canada. Since there are hundreds of tags with alarms, we order
them by number of alarm occurrences and choose 11 process
tags associated with analogue alarms (HI/LO) that contribute
most to the total alarm count. The data set includes one week's
worth of alarm data at a rate of one sample per second, during
which there are two evident floods leading to a large number of
alarms. The above analysis is implemented on this data set and
the color map obtained is shown in Fig. 6(a). The order of tags is
determined by the same algorithm as the previous case study; the
procedure is shown in Fig. 7.
Fig. 5. Dendrogram for the 10 variables in case study 1 based on the single-linkage measure. The scale on the right indicates the similarity measure.
It can be observed in Fig. 6(a) that alarm tags 1, 3, 9, 10,
and 11 form a cluster with high correlation values. From process
knowledge, we know that tags 1 and 3 are both flow rates, and
tag 3 is downstream of tag 1. Therefore, when a fault occurs
upstream, it can propagate downstream to other variables, which
can be reflected in the alarm data. Although they have similar
trends in this case study, they are not redundant in general
because they reveal the fault propagation and there is some time
delay as is evident from the time stamps. Tags 9, 10, and 11 are
fuel gas pressures of different burners on one piece of equipment,
leading to evident correlations. From this study, we find that they are
somewhat redundant and may be reduced to one tag. In real
engineering practice, however, we should check more data to
confirm this redundancy and resort to process knowledge to
see the physical implication. Actually, some redundancy is also
necessary because it can help improve the accuracy of detection,
especially for important variables. In this case, each burner is
equipped with one sensor to monitor its operating condition,
although all of the sensors are pressure measurements on the same
piece of equipment. In addition, the correlation between the above
two groups is also high, which shows up during the alarm flood when
different tags exhibit similar patterns due to fault propagation or
interrelationships between physical quantities. Note that tag 11 behaves differently
from tags 9 and 10 although they are highly correlated; this also
accounts for the necessity of setting an alarm on tag 11.
As a comparison, the corresponding process data is also
available, as shown in Fig. 8, based on which the correlation color
map is obtained as shown in Fig. 6(b). It is observed that the two
color maps (Fig. 6(a) and (b)) have different patterns even though they
share the same order of tags. During the floods, many process
tags are correlated, while their alarms are less correlated owing
to well-chosen alarm settings. For example, tags 2 and 6 show some
negative correlation because they are upstream and downstream
flow rates, respectively; however, the corresponding
alarms do not show an apparent correlation.
Fig. 6. Color maps for pseudo data (a) and process data (b) in case study 2. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
Fig. 7. Dendrogram for case study 2.
Fig. 8. High density plot of the process data in case study 2.
6. Concluding remarks
In this paper, the correlation matrix is plotted as a color map to
visualize the relationship between different alarm tags. Compared
to the ASCM, the clustered correlation color map of pseudo data
has the following four main advantages: (1) it is robust to missed,
false, and chattering alarms; (2) the correlation of pseudo data
provides directional (positive or negative) information in addition
to the similarity; (3) the pseudo data used in generating the color
map can also be used in other statistical analysis that provides
more potential; and (4) no knowledge of delays between correlated
alarms is required. One disadvantage to be noted is that, when
computing the correlation between two time series, simultaneous
0's in both series carry little information because they lie in the
normal region, which reduces the correlation coefficients.
Thus the period for analysis should be selected appropriately
so that there are dense alarm signals, and it is this window of
alarm data that should be converted to pseudo data for statistical
analysis. From this improved correlation analysis, we can find the
relationship and redundancy in alarm tags that can be translated
into improved alarm settings for efficient alarm rationalization.
It should be noted that the proposed method in this paper is
an improved correlation analysis based on alarm data. However,
to obtain a reasonable and effective alarm configuration, we
should also investigate process data as well as process knowledge,
as illustrated in the second industrial case study and in Fig. 6.
The analysis of alarm data is simple but is lacking in terms
of detailed information, especially the true trend information
of a process variable; as a result, we usually take this analysis
to be the first step to focus our attention on some particular
parts of the industrial process and then apply more complex
techniques. As discussed in Section 5.2, the results should be
compared to the ones based on process data to find the influence
of alarm settings. The correlation and redundancy information
captured through data analysis should be combined with process
knowledge, especially the process connectivity and causality
information between process units and variables [22,23].
The method proposed in this paper can be made more user-friendly
for parameter tuning. In the pseudo data generation stage, the
variance of the Gaussian kernel and the sampling rate of the
pseudo data are two parameters that require tuning. In the color
map plotting stage, the choice of clustering algorithm is another
option. In the color map display stage, one should be able to
change the color code and the number of color scales. It is important to
provide some degrees of freedom for the user because the plot is
sensitive to these tuning parameters and an appropriate choice of
parameters depends on the specific circumstances, requirements
and objectives of the analysis.
Acknowledgments
This work was supported by NSERC (SPG and IRC) in Canada,
NSFC (60736026 and 60904044), Tsinghua National Laboratory
for Information Science and Technology (TNList) Cross-discipline
Foundation, and the 973 Project (2009CB320600) in China.
References
[1] Izadi I, Shah SL, Shook D, Chen T. An introduction to alarm analysis and design.
In: Proc. 7th IFAC symp. fault detection, supervision and safety of technical
processes. IFAC safeprocess 2009. 2009. p. 645–50.
[2] Izadi I, Shah SL, Chen T. Effective resource utilization for alarm management.
In: Proc. 49th IEEE conf. decision and control. CDC 2010. Atlanta, GA. 2010.
p. 6803–8.
[3] Hollifield B, Habibi E. The alarm management handbook: a comprehensive
guide. Houston (TX): PAS; 2007.
[4] Engineering Equipment and Materials Users’ Association. Alarm systems: a
guide to design, management and procurement. EEMUA standard 191. second
ed. 2007.
[5] International Society of Automation. Management of alarm systems for the
process industries. ANSI/ISA standard 18.2. 2009.
[6] Errington J, Reising DV, Burns C. ASM consortium guidelines: effective alarm
management practices. Phoenix, AZ: ASM Consortium; 2009.
[7] American Petroleum Institute. Pipeline SCADA alarm management. API
recommended practice 1167. 2010.
[8] Kondaveeti SR, Izadi I, Shah SL. Application of multivariate statistics for
efficient alarm generation. In: Proc. 7th IFAC symp. fault detection, supervision
and safety of technical processes. IFAC safeprocess 2009. 2009, p. 657–62.
[9] Yang F, Shah SL, Xiao D. Correlation analysis of alarm data and alarm limit
design for industrial processes. In: Proc. 2010 American control conf., ACC
2010. 2010, p. 5850–5.
[10] Kondaveeti SR, Izadi I, Shah SL, Black T. Graphical representation of industrial
alarm data. In: Proc. 11th IFAC/IFIP/IFORS/IEA symp. analysis, design, and
evaluation of human–machine systems. IFAC/IFIP/IFORS/IEA 2010. 2010.
[11] Nishiguchi J, Takai T. IPL2 and 3 performance improvement method for
process safety using event correlation analysis. Comput Chem Eng 2010;
34(12):2007–13.
[12] Tangirala AK, Shah SL, Thornhill NF. PSCMAP: a new tool for plant-wide
oscillation detection. J Process Control 2005;15(8):931–41.
[13] Zhang H. Statistical process monitoring and modeling using PCA and PLS. M.S.
thesis. Edmonton (AB, Canada): University of Alberta; 2000.
[14] Control Arts Inc. Alarm system engineering (e-book), 2011.
http://controlartsinc.com/Support/Publications.html.
[15] Silverman BW. Density estimation for statistics and data analysis. London
(New York): Chapman and Hall; 1986.
[16] Bauer M, Thornhill NF. A practical method for identifying the propagation path
of plant-wide disturbances. J Process Control 2008;18(7–8):707–19.
[17] Yang F, Shah SL, Xiao D. SDG (signed directed graph) based process description
and fault propagation analysis for a tailings pumping process. In: Proc. 13th
IFAC symp. automation in mining, mineral and metal processing. IFAC MMM
2010. 2010.
[18] Lesot MJ, Rifqi M, Benhadda H. Similarity measures for binary and numerical
data: a survey. Int J Knowl Eng Soft Data Paradigm 2009;1(1):63–84.
[19] Choi S, Cha S, Tappert CC. A survey of binary similarity and distance measures.
J Syst Cybernet Inform 2010;8(1):43–8.
[20] Wu HM, Tien YJ, Chen CH. GAP: a graphical environment for matrix
visualization and cluster analysis. Comput Statist Data Anal 2010;54(3):
767–78.
[21] Pissanetzky S. Sparse matrix technology. London (UK): Academic Press; 1984.
[22] Yang F, Shah SL, Xiao D. Signed directed graph based modeling and its
validation from process knowledge and process data. Int J Appl Math Comput
Sci 2012;22(1):41–53.
[23] Larsson JE, DeBor J. Real-time root cause analysis for complex technical
systems. In: Proc. joint IEEE 8th conf. on human factors and power plants
and 13th annual workshop on human performance/root cause/trending/
operating experience/self assessment, joint 8th IEEE HFPP/13th HPRCT. 2007.
p. 156–63.