PCA-WD Fault Detection with PSO Parameter Optimization

Research article
Fault detection of feed water treatment process using PCA-WD
with parameter optimization
Shirong Zhang a, Qian Tang a, Yu Lin a, Yuling Tang b,*
a Department of Automation, College of Power and Mechanical Engineering, Wuhan University, Wuhan 430072, China
b College of Computer Science, South-Central University for Nationalities, Wuhan, Hubei 430074, China
Article info
Article history:
Received 10 December 2015
Received in revised form 15 January 2017
Accepted 22 March 2017
Available online 3 April 2017
Keywords:
Feed water treatment process
Fault detection
PCA
Wavelet denoise
Parameter optimization
Abstract
The feed water treatment process (FWTP) is an essential part of utility boilers, and fault detection is expected to improve its reliability. Classical principal component analysis (PCA) was applied to FWTPs in our previous work; however, the noise in the $T^2$ and SPE statistics results in false detections and missed detections. In this paper, wavelet denoising (WD) is combined with PCA to form a new algorithm, PCA-WD, in which WD is employed specifically to deal with this noise. The parameter selection of PCA-WD is further formulated as an optimization problem, and PSO is employed to solve it. A FWTP serving two 1000 MW generation units in a coal-fired power plant is taken as a study case, and its operation data is collected for the verification study. The results show that the optimized WD effectively suppresses the noise in the $T^2$ and SPE statistics and thereby improves the performance of the PCA-WD algorithm. Moreover, the parameter optimization enables PCA-WD to obtain its optimal parameters automatically rather than relying on individual experience. The optimized PCA-WD is further compared with classical PCA and sliding window PCA (SWPCA) in four cases: bias fault, drift fault, broken line fault and normal condition. The results confirm the advantages of the optimized PCA-WD over classical PCA and SWPCA.
© 2017 ISA. Published by Elsevier Ltd. All rights reserved.
1. Introduction
Presently, supercritical and ultra-supercritical units are widely employed in China and have gradually become the main part of the Chinese electricity supply. A power generation unit is a typical continuous production process and consists of hundreds of sub-processes and devices. A fault in any component can affect the operational safety of the whole unit and may even result in accidents or unit shutdowns, which inevitably lead to large financial losses or casualties [1]. The feed water treatment process (FWTP) is a vital sub-process of a coal-fired utility boiler. It supplies qualified feed water to the steam and water circuit. An ion exchange based feed water treatment process typically consists of cation beds, anion beds, mixed beds and other components such as pumps, fans and pipes. Process faults may push the quality of the feed water below its standard. That in turn results in heavy salification along the heating surfaces of the utility boilers and consequently endangers the operational safety of the boilers. FWTPs are equipped with process sensors, such as pressure and flow rate sensors, and analysis meters, such as electric conductivity, oxygen, silicon and sodium meters. These sensors are the measuring parts of the process control loops and supervisory systems. Compared with actuators, controllers and communication links, sensors are the weak spots of process control systems [2]. They may suffer faults such as drift, bias, strong noise and broken line, which hinder the safe and stable operation of industrial processes [3]. Hence, an effective fault detection algorithm is much needed for FWTPs.
The demand for operational safety in the process industries has spurred the recent development of many fault detection methodologies [4–7]. Most of them are built upon the process sensors. Computer control systems, such as distributed control systems (DCSs) and programmable controllers (PCs), are able to store massive operation data of the processes, which makes data-driven fault detection possible and practical. Multivariate statistical analysis is a typical data-driven methodology, which has been intensively studied and applied to fault detection in the literature [8–13]. Principal component analysis (PCA), independent component analysis (ICA) and partial least squares (PLS) have been widely applied to chemical industries for fault detection [14–17]. In essence, they are all multivariate statistical analysis based methods. Among these methods, PCA is the most popular one and has been successfully applied to industrial processes owing to its simplicity, as in [18–20]. PCA represents the
ISA Transactions
http://dx.doi.org/10.1016/j.isatra.2017.03.019
* Corresponding author. E-mail address: tylzsr@163.com (Y. Tang).
ISA Transactions 68 (2017) 313–326

high-dimensional process data in a reduced dimension; the desired information is then obtained by reducing the weak correlations between the variables [21–24]. PCA is therefore convenient for fault detection of industrial processes. Two statistical hypothesis tests, the Hotelling $T^2$ statistic in the principal component space (PCS) and the SPE statistic in the residual space (RS), are generally conducted in PCA. Several extensions of PCA have also been proposed in the literature to improve certain aspects of its performance. In [25], an online fault detection framework incorporating multi-scale principal component analysis is developed. An algorithm using multisubspace principal component analysis with the local outlier factor technique for process monitoring is further proposed in [26].
As for our case at hand, when classical PCA or the extended PCA methods are applied to the FWTP, excessive false detections and missed detections appear, which makes classical PCA and its extended versions unsuitable for fault detection of FWTPs. The false detections and missed detections are caused by the fluctuations of the $T^2$ and SPE statistics. Analytically, PCA based fault detection is strictly valid only when the following assumptions are satisfied [27]: (1) the process is operating at a pseudo-steady state; (2) the process data used to build the PCA model contain normal operating data only; (3) the process is properly excited. However, field applications can hardly satisfy all these conditions. This is where the fluctuations of the test statistics come from; the resulting temporary violations of the limits lead to false alarms. Naturally, a denoising methodology is expected to be combined with PCA for the plain purpose of dealing with the fluctuations of the two statistics. In [28], an exponentially weighted moving average (EWMA) filtering method is applied to the sensor validity index (SVI) and SPE, and an application study on a boiler process shows that the EWMA filtering method can indeed reduce the false alarms and oscillations of the indicators. In [29], this EWMA filtering method is further integrated into a self-validating soft sensor. Similarly, an EWMA scheme is used to filter the monitoring indices of KICA-PCA to improve monitoring performance [31]. Denoising is a relatively broad topic in engineering. In this paper, we employ the wavelet transform (WT) technique for its simplicity and practicability. The wavelet transform is a well-known multi-resolution analysis tool because of its ability to obtain good time and frequency resolution simultaneously through ‘stretching’ and ‘translation’ of the wavelet. WT has been successfully used in many fields, such as pattern recognition and fault diagnosis [32,33]. Donoho et al. [34] first proposed a method to remove white noise using wavelets, which is known as wavelet threshold denoising. In WT analysis, the signal is first decomposed through the discrete wavelet transform (DWT) to obtain the wavelet coefficients [35]. It has been proven that the wavelet coefficients resulting from noise are smaller than the coefficients of the major signal. With a predefined threshold, the coefficients below the threshold are eliminated. A denoised signal is then obtained through reconstruction from the remaining wavelet coefficients. In [36], a mechanical fault diagnosis method integrating the wavelet transform with a support vector machine is presented, where WT is used to extract the noise from the $T^2$ and SPE statistics of PCA such that the impact caused by noise can be effectively restrained. In this paper, wavelet denoising (WD) is combined with PCA to form a new fault detection algorithm, PCA-WD. PCA-WD is then applied to a FWTP for verification.
The selection of the specific parameters has considerable influence on the performance of WD. One way is to make a decision using the prior knowledge of the engineers; however, this relies too much on individual experience and cannot guarantee an optimal setting. This paper formulates the parameter selection of WD as an optimization problem, whose solution is an optimal parameter configuration. The objective function and the constraints of the optimization problem are complex, non-continuous and strongly nonlinear, which makes conventional optimization techniques, such as linear programming (LP) and dynamic programming (DP), inapplicable. Computational intelligence based techniques, such as the genetic algorithm (GA) and particle swarm optimization (PSO), are alternatives for our parameter optimization problem. In the literature, PSO has been widely used in many fields such as mechanical, chemical, civil, and aerospace design, because of its comparative simplicity, rapid convergence and the few parameters that need to be adjusted. PSO is known to solve large-scale nonlinear optimization problems effectively [37]; hence, it is a suitable candidate for our problem at hand. PSO is an evolutionary computation technique proposed by Kennedy and Eberhart [38]. Classical PSO deals with real-valued variables; however, many optimization problems in practice involve discrete variables, for which classical PSO cannot work. Kennedy and Eberhart therefore extended classical PSO to a discrete binary version, named BPSO, where a sigmoid function is used with a random probability to generate a binary-valued position (0 or 1) for a particle from its real-valued velocity component [39]. Laskari et al. [40] proposed a discrete PSO, where a real value is truncated to its nearest integer value. It was then employed by Liao and Tseng [41] to deal with a flowshop scheduling problem. Moreover, a universal PSO was proposed by Datta and Figueira [42]; it is able to work directly with real, integer and discrete variables without extra conversions. The parameter optimization problem of WD consists of several kinds of variables; thus, the extended PSO is suitable and will be employed to obtain the optimal solutions.
From the application angle, this paper mainly focuses on the fault detection of a feed water treatment process in coal-fired power plants to improve its reliability. We start with an outline of the target process. Massive operation data of the process is then collected from a supervisory information system (SIS), which communicates with the control system of the FWTP and acquires the long-term operation data. Next, the fault detection of FWTPs with classical PCA is introduced, where the control limits of the $T^2$ and SPE statistics, denoted by $T^2_{\lim}$ and $\mathrm{SPE}_{\lim}$, respectively, are obtained. Then, WD is combined with PCA to form the PCA-WD fault detection algorithm. The WD parameters need to be determined prior to the online operation of PCA-WD. The parameter selection is formulated as an optimization problem, where PSO is used to find the combination of parameters that gives the best performance. A FWTP in a coal-fired power plant, equipped with two 1000 MW generation units, is taken as a study case. The real operation data of the FWTP is collected to verify the PCA-WD algorithm. We present results that show the effectiveness of WD in dealing with the noise in the $T^2$ and SPE statistics, and the capability of parameter optimization to determine the optimal parameters of PCA-WD automatically instead of relying on individual experience. Finally, the advantages of the optimized PCA-WD over classical PCA and SWPCA are demonstrated with four study cases: bias fault, drift fault, broken line fault and normal condition.
The remainder of this paper is organized as follows. Section 2 outlines the feed water treatment process, which is used as the study case in the following investigations. In Section 3, the PCA based fault detection algorithm is reviewed. Section 4 combines WD with PCA to form the PCA-WD algorithm; the parameter selection of PCA-WD is then discussed and finally formulated as an optimization problem. In Section 5, the proposed PCA-WD is applied to the FWTP for a verification study, where the effectiveness of WD and the advantages of the optimized PCA-WD are demonstrated. The conclusions are drawn in Section 7.

2. Feed water treatment process
The FWTP is a vital sub-process of a coal-fired utility boiler. It supplies qualified desalted water to the vapor circulating system of the boiler. The quality of the feed water is the primary concern of FWTPs. Unqualified feed water may cause salification along the internal surfaces of critical devices, such as main steam pipes, reheat steam pipes and turbine blades; gradually, it may lead to major safety hazards and economic losses for power plants. Ion exchange is the most popular technology employed for FWTPs in Chinese coal-fired power plants. FWTPs are equipped with many sensors and actuators for supervision and regulation purposes; a FWTP has more than 60 measuring points. Field experience shows that sensor faults are a common reason for unqualified water supply. Hence, an effective sensor fault detection method is much needed for FWTPs. A FWTP in a coal-fired power plant in Guangdong province, China, is taken as our study case. The flow chart of the FWTP is shown in Fig. 1. This power plant is equipped with two 1000 MW generation units; hence, the FWTP is designed to serve both units. Raw water is successively treated by cation beds, anion beds and mixed beds; the treated water is then stored in desalted water tanks and finally pumped into the two utility boilers. The FWTP in Fig. 1 is configured into two operation routes, side A and side B, to assure the reliability of the process. Side A consists of the 1# cation bed, 1# anion bed and 1# mixed bed; side B is formed by the 2# cation bed, 2# anion bed and 2# mixed bed. Generally, one route can satisfy the routine requirements of the two utility boilers while the other route is on standby. Hence, it is reasonable to take only one route for further study; specifically, we take side A as the study case. For reasons of data availability and further field application, only some of the valves and sensors of side A, as listed in Table 1, are selected for the following fault detection research.
According to the operating procedure, the FWTP is scheduled as follows. When the water in the desalted water tanks is sufficient to sustain the boilers, the FWTP is switched off and placed on standby. On the other hand, if the water levels of the tanks fall below a predefined threshold, the FWTP is switched on to produce qualified feed water. Further, the working status of the two operation routes is deliberately scheduled by the operators in order to even out the total working time of the two routes. This operating procedure makes the FWTP work intermittently.
The coal-fired power plant is equipped with a supervisory information system (SIS), which gathers and stores long-term operation data of the whole plant through interfaces to the DCSs, programmable logic controllers (PLCs) and other controllers.
[Figure: flow chart of sides A and B, from raw water (secondary RO tank) through the cation, anion and mixed beds to the desalted water tanks and pumps, with valves W1–W6 and sensors S1–S12 marked.]
Fig. 1. Flow chart of the feed water treatment process.
Table 1
Valves and sensors selected for fault detection.
ID    Description                                  Unit
W1    Inlet valve status of 1# cation bed          –
W2    Outlet valve status of 1# cation bed         –
W3    Inlet valve status of 1# anion bed           –
W4    Outlet valve status of 1# anion bed          –
W5    Inlet valve status of 1# mixed bed           –
W6    Outlet valve status of 1# mixed bed          –
S1    Main pipe pressure of raw water              MPa
S2    Inlet flow rate of 1# cation bed             m³/h
S3    Inlet pressure of 1# cation exchanger        MPa
S4    Outlet pressure of 1# cation exchanger       MPa
S5    Inlet flow rate of 1# anion bed              m³/h
S6    Inlet pressure of 1# anion exchanger         MPa
S7    Outlet pressure of 1# anion exchanger        MPa
S8    Electric conductivity of 1# anion            µS/cm
S9    Inlet flow rate of 1# mixed bed              m³/h
S10   Inlet pressure of 1# mixed ion exchanger     MPa
S11   Outlet pressure of 1# mixed ion exchanger    MPa
S12   Electric conductivity of 1# mixed bed        µS/cm

This makes our data-driven fault detection research feasible and convenient. The historical operation data of the FWTP is collected from the SIS through a program interface provided by the SIS vendor, with a sampling interval of 5 s. The data is used for the following fault detection research.
3. PCA based fault detection
PCA is a multivariate statistical technique which has been widely used in process fault detection. Let $x \in \mathbb{R}^m$ denote a sample vector containing $m$ sensors. Assume that $n$ samples of these sensors are taken at a constant sampling rate. Then a matrix $X \in \mathbb{R}^{n \times m}$ is acquired, where each row represents a sample vector. The matrix $X$ is then standardized as follows to eliminate the effect of the different scales of the sensors (the standardized matrix is again denoted by $X$):
$$X = [X - I\,E(X)]\,D_\sigma^{-1}, \tag{1}$$
where $E(X) = [\mu_1, \mu_2, \ldots, \mu_m] \in \mathbb{R}^{1 \times m}$ is the mean vector of $X$, and $I = [1, 1, \ldots, 1]^T \in \mathbb{R}^{n \times 1}$. In Eq. (1), $D_\sigma = \mathrm{diag}\{\sigma_1, \sigma_2, \ldots, \sigma_m\}$, where $\sigma_i = \sqrt{E(x_i - \mu_i)^2}$ is the $i$th standard deviation of $X$. For the standardized data matrix $X$, its correlation matrix $S = X^T X/(n-1)$ is calculated and decomposed by singular value decomposition. Then, $X$ is projected onto the principal component space (PCS) and the residual space (RS), namely,
$$X = \hat{X} + E = T P^T + E, \tag{2}$$
where $\hat{X}$ represents the projection of $X$ in PCS and $E$ is the residual matrix in RS. In Eq. (2), $T \in \mathbb{R}^{n \times k}$ is the score matrix and $P \in \mathbb{R}^{m \times k}$ is the loading matrix, where $k$ denotes the number of principal components (PCs). Further, $k$ is determined using the cumulative percent variance (CPV)
$$\mathrm{CPV} = \frac{\sum_{i=1}^{k} \lambda_i}{\sum_{i=1}^{m} \lambda_i} \ge l, \tag{3}$$
where $\lambda_i$ represents the $i$th largest eigenvalue of the covariance matrix $S$. The threshold $l$ is usually set between 0.85 and 0.99.
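The standardization of Eq. (1) and the CPV rule of Eq. (3) can be sketched in a few lines of plain Python. This is only an illustrative sketch: the function names `standardize` and `select_k_by_cpv` are hypothetical, the sample standard deviation (denominator n - 1) is an assumption, and the eigenvalues are taken as given rather than computed.

```python
import math

def standardize(X):
    """Column-wise standardization in the spirit of Eq. (1).

    X is a list of n rows, each with m sensor readings.
    Assumption: the sample standard deviation (n - 1) is used.
    """
    n, m = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(m)]
    stds = [math.sqrt(sum((row[j] - means[j]) ** 2 for row in X) / (n - 1))
            for j in range(m)]
    return [[(row[j] - means[j]) / stds[j] for j in range(m)] for row in X]

def select_k_by_cpv(eigenvalues, l=0.85):
    """Smallest k whose cumulative percent variance reaches l, Eq. (3)."""
    lam = sorted(eigenvalues, reverse=True)
    total = sum(lam)
    running = 0.0
    for k, value in enumerate(lam, start=1):
        running += value
        if running / total >= l:
            return k
    return len(lam)
```

For example, with eigenvalues 4, 2, 1, 0.5, 0.5 and l = 0.85, the cumulative ratios are 0.5, 0.75 and 0.875, so three principal components are retained.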
A new sample vector $x \in \mathbb{R}^m$ is projected into PCS and RS, respectively. Its projection in PCS, $\hat{x}$, is
$$\hat{x} = P P^T x = C x, \tag{4}$$
where $C$ is the projection matrix to PCS. The projection in RS, $e$, is defined as
$$e = (I - P P^T) x = \tilde{C} x, \tag{5}$$
where $\tilde{C}$ is the projection matrix to RS and $I$ here denotes the identity matrix.
Generally, PCA based process fault detection is conducted through two indices, the Hotelling $T^2$ and SPE statistics. The $T^2$ statistic is defined as
$$T^2 = \hat{x}^T P \Lambda^{-1} P^T \hat{x} = t^T \Lambda^{-1} t, \tag{6}$$
where $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_k)$ contains the $k$ largest eigenvalues of the covariance matrix $S$, and $t$ represents the score vector of $\hat{x}$. The control limit of the $T^2$ statistic, i.e. $T^2_{\lim}$, is calculated as follows
$$T^2_{\lim} = \frac{k(n^2-1)}{n(n-k)} F_\alpha(k, n-k), \tag{7}$$
where $F_\alpha(k, n-k)$ is the critical point of the F-distribution and $\alpha$ is the confidence level. $k$ and $n-k$ in Eq. (7) are the degrees of freedom. The
SPE statistic is calculated as follows
$$\mathrm{SPE} = \|\tilde{x}\|^2 = \|\tilde{C} x\|^2. \tag{8}$$
The control limit of the SPE statistic was developed by Jackson and Mudholkar [10], that is,
$$\mathrm{SPE}_{\lim} = \theta_1 \left[ \frac{C_\alpha \sqrt{2\theta_2 h_0^2}}{\theta_1} + 1 + \frac{\theta_2 h_0 (h_0 - 1)}{\theta_1^2} \right]^{1/h_0}, \tag{9}$$
where
$$\theta_i = \sum_{j=k+1}^{m} \lambda_j^i \quad (i = 1, 2, 3), \tag{10}$$
and
$$h_0 = 1 - \frac{2\theta_1\theta_3}{3\theta_2^2}. \tag{11}$$
In Eq. (9), $C_\alpha$ is the upper quantile of the standard normal distribution at a significance level of $\alpha$, and $\lambda_j$ $(j = 1, \ldots, m)$ is the $j$th largest eigenvalue of the covariance matrix $S$. In this paper, both the $T^2$ and SPE statistics are taken into account for fault detection of FWTPs. A fault alarm is triggered when either of the two statistics exceeds its corresponding control limit.
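To make Eqs. (7) and (9)–(11) concrete, the following sketch evaluates both control limits from a list of eigenvalues. It is a sketch under stated assumptions: the names `spe_limit` and `t2_limit` are hypothetical, `alpha` is interpreted as a significance level (so `C_alpha` is the 1 - alpha quantile of the standard normal distribution), and the F-distribution critical value is passed in as `f_crit` because the Python standard library has no F quantile function.

```python
import math
from statistics import NormalDist

def spe_limit(eigenvalues, k, alpha=0.01):
    """Jackson-Mudholkar control limit of SPE, Eqs. (9)-(11)."""
    lam = sorted(eigenvalues, reverse=True)
    residual = lam[k:]  # lambda_{k+1}, ..., lambda_m
    theta1, theta2, theta3 = (sum(v ** i for v in residual) for i in (1, 2, 3))
    h0 = 1.0 - 2.0 * theta1 * theta3 / (3.0 * theta2 ** 2)
    # Assumption: alpha is a significance level, so take the upper quantile.
    c_alpha = NormalDist().inv_cdf(1.0 - alpha)
    bracket = (c_alpha * math.sqrt(2.0 * theta2 * h0 ** 2) / theta1
               + 1.0
               + theta2 * h0 * (h0 - 1.0) / theta1 ** 2)
    return theta1 * bracket ** (1.0 / h0)

def t2_limit(k, n, f_crit):
    """Control limit of T^2, Eq. (7); f_crit stands for F_alpha(k, n - k)."""
    return k * (n ** 2 - 1) / (n * (n - k)) * f_crit
```

In practice `f_crit` would come from a statistics table or a numerical library; it is left as an input here so that the sketch stays self-contained.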
The procedure of classical PCA based fault detection is as follows.
(1) Get the operation data of the FWTP under normal condition, and normalize the data according to Eq. (1). The data is then used to form a training data set for PCA models.
(2) Build a PCA model with the training data set, and calculate its $T^2_{\lim}$ and $\mathrm{SPE}_{\lim}$ according to Eqs. (7) and (9), respectively.
(3) Collect a new sample from the FWTP under a condition similar to that in step (1), and calculate the real-time values of the $T^2$ and SPE statistics.
(4) If the real-time value of either the $T^2$ or the SPE statistic exceeds its control limit, the sample is regarded as abnormal and a fault alarm is triggered; otherwise, it is considered to be in normal condition.
(5) Repeat from step (3).
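Steps (3) and (4) of the procedure above can be sketched as follows for a single standardized sample. The helper names `t2_and_spe` and `is_faulty` are hypothetical, and the loading matrix `P` and the retained eigenvalues are assumed to come from an already trained PCA model.

```python
def t2_and_spe(x, P, eigenvalues):
    """Real-time T^2 and SPE of one standardized sample.

    x: list of m readings; P: m x k loading matrix (list of rows);
    eigenvalues: the k largest eigenvalues, in the column order of P.
    """
    m, k = len(P), len(P[0])
    # Score vector t = P^T x
    t = [sum(P[i][j] * x[i] for i in range(m)) for j in range(k)]
    # T^2 = t^T Lambda^{-1} t, Eq. (6)
    t2 = sum(t[j] ** 2 / eigenvalues[j] for j in range(k))
    # Residual e = x - P t, Eq. (5), and SPE = ||e||^2, Eq. (8)
    e = [x[i] - sum(P[i][j] * t[j] for j in range(k)) for i in range(m)]
    spe = sum(v ** 2 for v in e)
    return t2, spe

def is_faulty(t2, spe, t2_lim, spe_lim):
    """Alarm when either statistic exceeds its control limit, step (4)."""
    return t2 > t2_lim or spe > spe_lim
```

With a single principal direction along the first sensor, `P = [[1.0], [0.0]]`, eigenvalue 2 and sample `[2.0, 3.0]`, the score is 2, so T² = 2, while the residual `[0, 3]` gives SPE = 9.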
We applied the above procedure to a FWTP of a power plant in our previous work; unexpectedly, excessive false detections and missed detections appeared, which makes classical PCA and the extended PCA unsuitable for fault detection of FWTPs. Analytically speaking, these phenomena are mainly caused by the noise in the $T^2$ and SPE statistics. Naturally, a denoising technique is expected to solve the problem. In the following section, the WD technique is combined with PCA specifically to deal with the noise problem.
4. PCA-WD based fault detection
4.1. Wavelet denoising
Wavelet transform is a powerful signal-processing method. It transforms time-domain signals into the time–frequency domain while obtaining high-resolution time and frequency information of the signals simultaneously. The continuous wavelet transform (CWT) is defined as
$$\mathrm{CWT}_f(a, \tau) = \frac{1}{\sqrt{|a|}} \int_{\mathbb{R}} f(t)\, \Psi^*\!\left(\frac{t - \tau}{a}\right) dt, \tag{12}$$
where $a$ is the scale factor, which may be regarded as the inverse of frequency, $\tau$ is the translation factor, and $\Psi(\cdot)$ is the base function, with $\Psi^*$ its complex conjugate. In practice, CWT is not widely applied because of its enormous computational cost, which arises from the fact that all scales are used during the computation. Compared with CWT, DWT requires less computation time without degrading the signal-processing performance; hence, DWT is widely used in many fields. Specifically, Mallat proposed a fast algorithm [35], which exploits the fact that the analysis is very efficient if scales and positions are chosen as powers of two (dyadic scale and translation factors). The Mallat fast algorithm achieves the same accuracy as other DWTs while requiring much less computation. It is employed in the following study.
Let $\theta(t)$ be an original signal. A three-level decomposition of $\theta(t)$ using the fast algorithm is shown in Fig. 2 to illustrate the process, where $H_0$ and $H_1$ are the low-pass and high-pass filters, respectively, and $\downarrow 2$ denotes down-sampling by two. With the three-level decomposition, $\theta(t)$ is expressed as
$$\theta(t) = d_{1k} + d_{2k} + d_{3k} + a_{3k}. \tag{13}$$
The signal $\theta(t)$ is thus decomposed into a set of detail coefficients $d_{1k}$, $d_{2k}$, $d_{3k}$ and the approximation coefficient $a_{3k}$.
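The filter-and-down-sample structure of the three-level decomposition can be illustrated with the Haar wavelet, the simplest orthogonal wavelet. The sketch below is not the implementation used in the paper; the function names are hypothetical, and the signal length is assumed to be divisible by 2 raised to the number of levels.

```python
import math

_S = 1.0 / math.sqrt(2.0)  # Haar filter coefficient

def haar_step(signal):
    """One Mallat step: low-pass and high-pass filtering plus down-sampling by 2."""
    approx = [(signal[i] + signal[i + 1]) * _S for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * _S for i in range(0, len(signal), 2)]
    return approx, detail

def haar_decompose(signal, levels=3):
    """Return the final approximation and the details [d1, d2, ..., d_levels]."""
    approx, details = list(signal), []
    for _ in range(levels):
        approx, detail = haar_step(approx)
        details.append(detail)
    return approx, details

def haar_reconstruct(approx, details):
    """Invert haar_decompose; exact up to rounding because Haar is orthogonal."""
    current = list(approx)
    for detail in reversed(details):
        rebuilt = []
        for a, d in zip(current, detail):
            rebuilt.extend([(a + d) * _S, (a - d) * _S])
        current = rebuilt
    return current
```

For an 8-sample signal and three levels, the detail vectors have lengths 4, 2 and 1 and the approximation has length 1, which shows how each level halves the number of coefficients.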
In 1990, Donoho proposed a method to remove white noise using wavelets [34], known as wavelet denoising (WD). WD decomposes the signal through the discrete wavelet transform to obtain the wavelet coefficients, which are then processed with a predefined threshold. The coefficients below the threshold are eliminated, while those above it are retained. Finally, the denoised signal is reconstructed from the remaining coefficients without much loss of the original signal characteristics.
4.2. PCA-WD
Now, WD is combined with PCA to form the PCA-WD method for fault detection, where WD is employed specifically to deal with the noise in the $T^2$ and SPE statistics. The flowchart of PCA-WD for process fault detection is shown in Fig. 3. PCA-WD fault detection is divided into two stages: the off-line modeling stage and the on-line detection stage. The calculation of $T^2_{\lim}$ and $\mathrm{SPE}_{\lim}$ is carried out in the off-line modeling stage. The on-line detection stage includes the calculation of the real-time $T^2$ and SPE statistics, WD, and fault detection. Specifically, WD is employed to denoise the $T^2$ and SPE statistics during the on-line stage, as shown in Fig. 3.
In the off-line modeling stage, a training data set $X \in \mathbb{R}^{n \times m}$, collected under a certain normal operation condition, is used to develop the PCA model; as such, the control limits of $T^2$ and SPE, $T^2_{\lim}$ and $\mathrm{SPE}_{\lim}$, are obtained. In the on-line detection stage, for a newly arriving sample $x$, its $T^2$ and SPE statistics are first calculated. Then, the real-time $T^2$ and SPE statistics slide into a window, where WD is applied to remove their noise. Finally, the denoised $T^2$ and SPE statistics are compared with $T^2_{\lim}$ and $\mathrm{SPE}_{\lim}$, respectively, and a fault alarm is triggered if either of the two statistics exceeds its control limit. Here, the filtering is used to eliminate the noise of the statistics and does not dramatically change their distributions. Indeed, [29] shows that the filtered residuals are closer to a normal distribution than the unfiltered residuals. Mathematically, filtering algorithms may change the control limits of the statistics. In [29], a theoretical analysis of the control limits with and without filtering is given, and examples are used to validate the filtering technique. This technique is also accepted by other researchers. For instance, in [30], a detection index is built upon the filtered statistics to improve the detectability of PCA. Our approach is a combination of PCA and WD; in fact, it also uses WD to filter the statistics of PCA. Moreover, the WD in our framework is designed so that its amplification factor equals 1. Thus, the control limits $T^2_{\lim}$ and $\mathrm{SPE}_{\lim}$, obtained in the off-line modeling stage, can be used in the on-line detection stage.
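The on-line stage described above can be sketched end to end for one window of each statistic. This is a simplified illustration, not the authors' implementation: it uses the Haar wavelet with soft thresholding and a fixed threshold `delta`, and it assumes that the newest denoised value in the window is the one compared with the control limit. The names `haar_denoise` and `detect` are hypothetical.

```python
import math

_S = 1.0 / math.sqrt(2.0)  # orthonormal Haar coefficient (unit gain)

def haar_denoise(window, levels, delta):
    """Haar decomposition, soft thresholding of details, reconstruction."""
    approx, details = list(window), []
    for _ in range(levels):
        pairs = [(approx[i], approx[i + 1]) for i in range(0, len(approx), 2)]
        details.append([(a - b) * _S for a, b in pairs])
        approx = [(a + b) * _S for a, b in pairs]
    # Soft thresholding: shrink each detail coefficient towards zero by delta.
    shrunk = [[math.copysign(max(abs(d) - delta, 0.0), d) for d in level]
              for level in details]
    for level in reversed(shrunk):
        rebuilt = []
        for a, d in zip(approx, level):
            rebuilt.extend([(a + d) * _S, (a - d) * _S])
        approx = rebuilt
    return approx

def detect(window_t2, window_spe, t2_lim, spe_lim, levels=2, delta=1.0):
    """Alarm if the newest denoised T^2 or SPE value exceeds its limit.

    Assumption: only the most recent value in each window is tested.
    """
    t2_clean = haar_denoise(window_t2, levels, delta)[-1]
    spe_clean = haar_denoise(window_spe, levels, delta)[-1]
    return t2_clean > t2_lim or spe_clean > spe_lim
```

In this sketch a single spike at the end of an otherwise flat window is smoothed away and raises no alarm, whereas a sustained shift of the whole window above the limit still does, which is the behavior the denoising step is meant to provide.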
Now, another problem arises. The application of the WD algorithm involves several parameters. The parameter configuration has a great effect on WD performance and can even make a WD algorithm inapplicable under certain conditions. Parameter selection has become a barrier to the field application of WD. Our case combines WD with PCA, where WD is used to denoise the real-time $T^2$ and SPE statistics. This makes the parameter selection of the compound PCA-WD more complicated than in traditional WD applications, and the common experience based techniques cannot deal with our problem properly. We therefore formulate the parameter selection of PCA-WD as an optimization problem and obtain the optimal parameters by solving it. To prepare for the formulation of the optimization problem, the parameters of PCA-WD are reviewed first.
4.3. Parameters of PCA-WD
4.3.1. Sliding window parameters
In our PCA-WD algorithm, WD is employed to denoise the real-time $T^2$ and SPE statistics with a sliding window; the denoised $T^2$ and SPE statistics are then used for fault detection. Thus, a proper sliding window length and moving step, denoted by len and step, respectively, have to be determined prior to the application of PCA-WD. Due to dyadic down-sampling, the length of the wavelet coefficients is reduced by a factor of $2^j$, where $j$ is the scale factor. To ensure perfect reconstruction of the original signal, len must be chosen as a power of 2. step determines how many samples enter and leave the sliding window in a single calculation. Generally, a large step brings a large time delay, while a small one may cause discontinuity in the signal.
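The roles of len and step can be illustrated with a small generator that walks a window over a sequence of statistic values. The sketch is illustrative only; `sliding_windows` and `is_power_of_two` are hypothetical names, and the power-of-two check simply enforces the requirement stated above.

```python
def is_power_of_two(n):
    """True for 1, 2, 4, 8, ...; uses the single-set-bit property."""
    return n > 0 and (n & (n - 1)) == 0

def sliding_windows(samples, length, step):
    """Yield successive windows of `length` samples, advancing by `step`."""
    if not is_power_of_two(length):
        raise ValueError("window length must be a power of 2")
    if step < 1:
        raise ValueError("step must be at least 1")
    for end in range(length, len(samples) + 1, step):
        yield samples[end - length:end]
```

With ten samples, a window length of 4 and a step of 2, four windows are produced and each new window replaces its two oldest samples with two new ones.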
4.3.2. Wavelets
Our research considers only orthogonal wavelets, partly for simplicity; such wavelets are able to achieve perfect reconstruction of the original signals. Specifically, the Mallat fast algorithm is used in this paper because of its efficient computation, and it requires the wavelets to be orthogonal and to have a scaling function $\phi$. Several wavelets satisfy these requirements, such as the Haar, Daubechies, Symlet and Coiflet wavelets.
Fig. 2. Three level decomposition.
Fig. 3. Flowchart of PCA-WD fault detection.
The Daubechies family, built by Ingrid Daubechies, consists of 45 wavelets, of which the Haar wavelet is the first and simplest. Except for the Haar wavelet, the Daubechies family has no explicit mathematical expression. The Symlet family is more symmetrical than the Daubechies family; however, it is not strictly symmetrical. The Coiflet family consists of 5 wavelets. For detailed descriptions of the wavelets, refer to the relevant literature.
In this paper, the first 15 wavelets of the Daubechies family and the first 15 wavelets of the Symlet family are utilized. The remaining wavelets of the two families are rather complex and consequently require more computation time, which makes them unsuitable for our field application. Meanwhile, all 5 wavelets of the Coiflet family are used in the paper owing to their computational efficiency. In the remainder of the paper, db1, db2, …, db15 denote the first 15 wavelets of the Daubechies family, sym1, sym2, …, sym15 denote the 15 wavelets of the Symlet family, and coif1, coif2, …, coif5 denote the Coiflet family.
4.3.3. Threshold parameters
There are two common thresholding methods for WD: soft thresholding and hard thresholding. Let $WT$ be a wavelet coefficient, $\delta$ the threshold and $\widehat{WT}$ the thresholded coefficient; then the two thresholding methods can be expressed as follows.
(i) Hard thresholding
$$\widehat{WT} = \begin{cases} WT, & |WT| > \delta; \\ 0, & |WT| \le \delta. \end{cases} \tag{14}$$
(ii) Soft thresholding
$$\widehat{WT} = \begin{cases} WT - \delta, & WT > \delta; \\ 0, & |WT| \le \delta; \\ WT + \delta, & WT < -\delta. \end{cases} \tag{15}$$
Compared with hard thresholding, soft thresholding performs better because hard thresholding may cause discontinuities at $\pm\delta$, while soft thresholding remains continuous by shrinking nonzero coefficients towards zero.
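Eqs. (14) and (15) translate directly into two small functions applied to one coefficient at a time. The function names are hypothetical and the sketch is only meant to make the case distinctions explicit.

```python
def hard_threshold(w, delta):
    """Eq. (14): keep the coefficient if |w| > delta, otherwise set it to zero."""
    return w if abs(w) > delta else 0.0

def soft_threshold(w, delta):
    """Eq. (15): zero small coefficients and shrink the others by delta."""
    if w > delta:
        return w - delta
    if w < -delta:
        return w + delta
    return 0.0
```

A coefficient just above the threshold shows the difference: with a threshold of 1, the value 1.5 stays 1.5 under hard thresholding but becomes 0.5 under soft thresholding, so the soft rule has no jump at the threshold.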
Four threshold selection rules, ‘rigrsure’, ‘sqtwolog’, ‘heursure’
and ‘minimaxi’, as shown in Table 2 will be considered in this
paper. In fact, these threshold selection rules use statistical re-
gression of the noisy coefficients over time to acquire a non-
parametric estimation of the reconstructed signal. Different
threshold selection rule has different impact on denoising
performance.
The threshold rescaling method also affects the denoising performance,
so it needs investigation as well. The general model of wavelet
denoising is as follows:
$$ s(n) = f(n) + \sigma e(n), \tag{16} $$
where s(n) is the original signal, f(n) is the pure signal without
noise, e(n) represents the noise, and σ is the noise intensity. The denoising
process suppresses the noisy part of the signal s(n) so as to recover
the pure signal f(n). Threshold rescaling adjusts σ with a certain
method; obviously, it influences the
denoising process. Three threshold rescaling methods, ‘one’,
‘sln’, and ‘mln’, are investigated in this paper. These methods
are briefly described in Table 3.
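As a rough illustration of how a selection rule and a rescaling method combine, the sketch below computes the fixed-form threshold sqrt(2 ln n) commonly associated with ‘sqtwolog’ and scales it by a single noise-level estimate taken from the finest detail coefficients, as is commonly done for ‘sln’. The paper does not give these formulas, so the expressions and all function names here are assumptions.

```python
import math
import statistics


def universal_threshold(n):
    """Fixed-form threshold sqrt(2 ln n) for n coefficients (assumed 'sqtwolog' form)."""
    return math.sqrt(2.0 * math.log(n))


def noise_level_sln(detail_level1):
    """Single noise estimate from the finest detail coefficients: median(|d|) / 0.6745."""
    return statistics.median([abs(d) for d in detail_level1]) / 0.6745


def rescaled_threshold(detail_level1, n, rescaling="sln"):
    """Threshold after rescaling; 'one' leaves the threshold unscaled."""
    if rescaling == "one":
        sigma = 1.0
    elif rescaling == "sln":
        sigma = noise_level_sln(detail_level1)
    else:
        raise ValueError("only 'one' and 'sln' are sketched here")
    return sigma * universal_threshold(n)
```

The level-dependent variant ‘mln’ would repeat the noise estimate at every decomposition level; it is left out to keep the sketch short.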
4.3.4. Decomposition level
Generally speaking, the decomposition level, denoted by lev,
should be determined in consideration of the frequency bandwidth
of the original signals. Signals with abundant high-frequency
information need more decomposition levels.
A large lev requires more computation time and introduces time delay.
Meanwhile, the length of the sliding window, len, bounds lev as well,
because the dyadic down-sampling halves the number of wavelet
coefficients in each decomposition step. For instance, if
len = 16, the maximum value of lev is 4. In this paper, the
decomposition level is bounded within a range between 1 and 5.
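The bound that the window length places on lev can be sketched as follows. This is a simplified dyadic argument that ignores the wavelet filter length, and the function names are illustrative assumptions.

```python
import math


def max_decomposition_level(window_length):
    """Largest number of dyadic halvings a window of the given length allows."""
    if window_length < 2:
        return 0
    return int(math.floor(math.log2(window_length)))


def coefficient_lengths(window_length, lev):
    """Approximate coefficient count after each of lev halvings (edge effects ignored)."""
    lengths = []
    n = window_length
    for _ in range(lev):
        n = n // 2
        lengths.append(n)
    return lengths


level_for_16 = max_decomposition_level(16)   # 4, as in the example above
lengths_for_16 = coefficient_lengths(16, 4)  # [8, 4, 2, 1]
```

Under this simplified bound, a window of 32 samples still admits 5 levels, which is consistent with the range of 1 to 5 used in the paper.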
4.4. Optimal parameter selection of PCA-WD
4.4.1. Formulation of the parameter selection optimization problem
The parameters of WD, as reviewed above, all affect
the performance of the WD algorithm to some degree. Traditionally,
the WD parameters are determined mostly from individual experience.
In this paper, WD is combined with PCA to form a fault
detection algorithm. Thus, the parameter selection of the compound
PCA-WD algorithm is far more complicated than in traditional
applications of WD. We therefore formulate
the parameter selection of PCA-WD as an optimization
problem, and obtain the optimal parameter configuration
by solving this problem. The
optimal parameter selection obtains the parameters in an automatic
way rather than from individual experience, and guarantees the optimality of
the parameters with respect to the chosen objective.
Our optimization problem does not consider WD algorithm
itself only; instead, it takes the PCA-WD fault detection algorithm
as a whole to optimize its parameters. Naturally, the performance
criteria from the fault detection perspective, as false alarm rate
(FAR) and missed detection rate (MDR), should be integrated into
the objective function of the optimization problem. They are de-
fined as follows.
(I) False alarm rate: FAR, defined in Eq. (17), is
the percentage of falsely alarmed samples among the total
faultless data samples:
$$ \mathrm{FAR} = \frac{\text{falsely alarmed samples}}{\text{faultless data samples}} \times 100\% \tag{17} $$
(II) Missed detection rate: MDR, defined in Eq. (18), is
the percentage of missed faulty samples among the total
faulty data samples:
$$ \mathrm{MDR} = \frac{\text{missed faulty samples}}{\text{faulty data samples}} \times 100\% \tag{18} $$
Table 2
Threshold selection rules.
Rules      Descriptions
rigrsure   Selection using Stein's Unbiased Risk Estimate (SURE)
sqtwolog   Fixed threshold
heursure   Selection using a mixture of the first two options
minimaxi   Selection using the minimax principle

Table 3
Threshold rescaling methods.
Rescaling methods   Descriptions
one                 Select using the basic noise model
sln                 Select using the basic noise model with unscaled noise
mln                 Select using the basic noise model with non-Gaussian white noise

The above two criteria evaluate the performance of fault detection
algorithm under faultless and faulty conditions, respectively. The
lower the two criteria are, the better performance the algorithm
has achieved. Moreover, signal-to-noise ratio (SNR) is a traditional
measure of denoising algorithms from signal conditioning per-
spective. It is defined as follows:
$$ \mathrm{SNR} = 10 \times \log_{10}\left( power_{signal} / power_{noise} \right), \tag{19} $$
where
$$ power_{signal} = \frac{1}{n} \sum_{n} s^2(n), \tag{20} $$
and
$$ power_{noise} = \frac{1}{n} \sum_{n} \left[ s(n) - \tilde{s}(n) \right]^2. \tag{21} $$
$power_{signal}$ in Eq. (20) represents the power of the original signal,
and $power_{noise}$ in Eq. (21) is the power of the noise. s(n) denotes
the original signal and $\tilde{s}(n)$ is the denoised signal. Generally, a higher
SNR is preferred, for it indicates less information loss through the
denoising process.
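A direct transcription of Eqs. (19) to (21) might look like the sketch below. The function name and the guard for a zero noise power are assumptions added for illustration.

```python
import math


def snr_db(original, denoised):
    """Eqs. (19)-(21): SNR in dB between a signal and its denoised version."""
    n = len(original)
    power_signal = sum(s * s for s in original) / n
    power_noise = sum((s - d) ** 2 for s, d in zip(original, denoised)) / n
    if power_noise == 0.0:
        return math.inf  # the denoised signal equals the original
    return 10.0 * math.log10(power_signal / power_noise)


# Hypothetical signal: removing 10% of a constant amplitude gives roughly 20 dB.
example_snr = snr_db([1.0, 1.0, 1.0, 1.0], [0.9, 0.9, 0.9, 0.9])
```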
It is reasonable to formulate the objective function both from
the fault detection perspective and signal conditioning perspec-
tive. Specifically, the objective function is expressed as follows:
$$ J_{X_V}(Pr) = e^{-\beta}\left( \mathrm{SNR}_{T^2} + \mathrm{SNR}_{SPE} \right) + \frac{1 - e^{-\beta}}{\mathrm{FDR}_{X_{t1}} + \mathrm{MDR}_{X_{t2}} + 1}. \tag{22} $$
The optimization problem is to maximize $J_{X_V}$ while satisfying the
relevant constraints. In Eq. (22), $X_V \in R^{t \times m}$ is a selected data set for
verification, and t is the number of samples. Furthermore, $X_V$ is
composed of a subset $X_{t1}$, which has t1 faultless samples,
and a subset $X_{t2}$, consisting of t2 faulty samples, so that t = t1 + t2.
$Pr = [p_1, p_2, \ldots, p_7]$ in Eq. (22) denotes the parameter vector of WD;
it is the variable to be optimized. $\mathrm{SNR}_{T^2}$ and $\mathrm{SNR}_{SPE}$ are the
signal-to-noise ratios of the $T^2$ and SPE statistics, respectively,
calculated using Eq. (19). $\mathrm{FDR}_{X_{t1}}$ in Eq. (22) is the false alarm
rate of Eq. (17) evaluated on subset $X_{t1}$; it is also written $\mathrm{FAR}_{X_{t1}}$.
The falsely alarmed samples, $C_{t1}$, are first counted as
follows:
int C_{t1} = 0;
for (i = 0; i < t1; i++)
{
    if ( \tilde{T}^2(i) > T^2_{lim} || \widetilde{SPE}(i) > SPE_{lim} )
        C_{t1} = C_{t1} + 1;
}
where $\tilde{T}^2$ and $\widetilde{SPE}$ are the PCA statistics denoised by the WD algorithm:
$\tilde{T}^2 = WD_{Pr}(T^2)$ and $\widetilde{SPE} = WD_{Pr}(SPE)$, where $WD_{Pr}$ denotes a denoising
process with the parameter vector Pr, and $T^2$ and SPE are
obtained according to Eqs. (8) and (6), respectively. Then, FAR is
calculated as $\mathrm{FAR}_{X_{t1}} = (C_{t1}/t1)\%$. Similarly, $\mathrm{MDR}_{X_{t2}}$ is calculated with
subset $X_{t2}$. The undetected faulty samples are counted as follows:
int C_{t2} = 0;
for (i = 0; i < t2; i++)
{
    if ( \tilde{T}^2(i) < T^2_{lim} && \widetilde{SPE}(i) < SPE_{lim} )
        C_{t2} = C_{t2} + 1;
}
Then, $\mathrm{MDR}_{X_{t2}} = (C_{t2}/t2)\%$. In Eq. (22), β is a weighting factor that
balances the criteria from the fault detection perspective
against those from the signal conditioning perspective. It allows the
optimization problem to serve different purposes by
tuning the value of β. Specifically, β can be set to a value larger
than 6 so as to favor lower FDR and MDR; on the other hand,
a value smaller than 6 leads to higher SNR of the original signals. This
paper pays more attention to the FDR and MDR of fault detection;
therefore, β is set to 6.8 in the following investigation.
Mathematically speaking, β = 6.8 compensates for the
difference in scale between the two terms of $J_{X_V}(Pr)$.
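The counting loops and Eq. (22) can be combined into a short sketch. It assumes that a false alarm is raised when either denoised statistic exceeds its limit, that a fault is missed only when both stay below their limits, and that both rates enter Eq. (22) in percent; the function names are illustrative.

```python
import math


def false_alarm_rate(t2, spe, t2_lim, spe_lim):
    """Percentage of faultless samples where either denoised statistic exceeds its limit."""
    count = sum(1 for a, b in zip(t2, spe) if a > t2_lim or b > spe_lim)
    return 100.0 * count / len(t2)


def missed_detection_rate(t2, spe, t2_lim, spe_lim):
    """Percentage of faulty samples where both denoised statistics stay below their limits."""
    count = sum(1 for a, b in zip(t2, spe) if a < t2_lim and b < spe_lim)
    return 100.0 * count / len(t2)


def objective(snr_t2, snr_spe, far, mdr, beta=6.8):
    """Eq. (22): weighted sum of the SNR term and the detection-rate term."""
    w = math.exp(-beta)
    return w * (snr_t2 + snr_spe) + (1.0 - w) / (far + mdr + 1.0)
```

With β = 6.8 the weight exp(-β) is about 0.0011, so the detection-rate term dominates and any nonzero FAR or MDR lowers the objective noticeably.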
Finally, the parameter selection of PCA-WD is formulated as an
optimization problem as follows:
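$$ \max_{Pr} J_{X_V}(Pr) = e^{-\beta}\left( \mathrm{SNR}_{T^2} + \mathrm{SNR}_{SPE} \right) + \frac{1 - e^{-\beta}}{\mathrm{FDR}_{X_{t1}} + \mathrm{MDR}_{X_{t2}} + 1}, \tag{23} $$
s.t.
$$ \begin{aligned}
& \mathrm{FAR}_{X_{t1}} = (C_{t1}/t1)\%, \\
& \mathrm{MDR}_{X_{t2}} = (C_{t2}/t2)\%, \\
& \mathrm{SNR}_{T^2} = 10 \times \log_{10}\left( \sum_{i=1}^{t} \left[ T^2(i) \right]^2 \Big/ \sum_{i=1}^{t} \left[ T^2(i) - \tilde{T}^2(i) \right]^2 \right), \\
& \mathrm{SNR}_{SPE} = 10 \times \log_{10}\left( \sum_{i=1}^{t} SPE(i)^2 \Big/ \sum_{i=1}^{t} \left[ SPE(i) - \widetilde{SPE}(i) \right]^2 \right), \\
& \tilde{T}^2 = WD_{Pr}(T^2), \\
& \widetilde{SPE} = WD_{Pr}(SPE), \\
& T_j^2 = \hat{x}_j^T P \Lambda^{-1} P^T \hat{x}_j, \quad (j = 1, \ldots, t), \\
& SPE_j = \| \tilde{C} \hat{x}_j \|^2, \quad (j = 1, \ldots, t), \\
& Pr = [p_1, p_2, \ldots, p_7], \\
& \beta > 0.
\end{aligned} $$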
$\hat{x}_j$ (j = 1, …, t) is the jth sample of the verification data set. The
score matrix P and projection matrix $\tilde{C}$ are obtained through the
training process of the PCA model at the off-line stage. The solution
to this problem, $\bar{Pr} = [\bar{p}_1, \bar{p}_2, \ldots, \bar{p}_7]$, gives the optimal parameters
of our PCA-WD fault detection algorithm. What remains is to
solve the above optimization problem.
4.4.2. Solving of the optimization problem
Obviously, the objective function and some constraints are
nonlinear and even non-analytical. This makes classical optimization
techniques, such as linear programming (LP) and dynamic
programming (DP), infeasible for our problem. Intelligence-based
techniques such as the genetic algorithm (GA) and PSO can
solve the problem. In the literature, PSO has been widely used
in many fields such as mechanical, chemical, civil, and aerospace
design, because it has advantages such as comparative simplicity,
rapid convergence and few parameters to be adjusted. PSO is
known to solve large-scale nonlinear optimization
problems effectively; it is suitable for our problem at hand.
PSO is a stochastic search method first introduced
by Kennedy and Eberhart [38]. The main strategy of PSO is to
imitate the social behaviors and the communications involved in
swarms such as bird flocks and fish schools. Each particle in
PSO is treated as a volumeless particle in a g-dimensional search
space; its velocity and position are adjusted according to its own past
experience and that of its companions.
PSO starts from a random swarm of particles, called the initial
population, in the g-dimensional search space. Let the swarm size
be U; the position and velocity of each particle are defined as
$$ P_i(t) = [p_{i,1}(t), p_{i,2}(t), \ldots, p_{i,g}(t)], \tag{24} $$
and
$$ V_i(t) = [v_{i,1}(t), v_{i,2}(t), \ldots, v_{i,g}(t)], \tag{25} $$
respectively, where i = 1, 2, …, U indexes the ith particle and t
denotes the iteration time. The velocity $V_i(t)$ and position $P_i(t)$ of
each particle are iteratively modified according to the following
rules:
$$ v_{i,d}(t+1) = \omega v_{i,d}(t) + c_1 r_1(t) \left( P_{bi,d}(t) - p_{i,d}(t) \right) + c_2 r_2(t) \left( P_{g,d}(t) - p_{i,d}(t) \right), \tag{26} $$
$$ p_{i,d}(t+1) = p_{i,d}(t) + v_{i,d}(t), \tag{27} $$
where d = 1, 2, …, g indexes the dth member of a particle, and $r_1(t)$
and $r_2(t)$ are random numbers, generated from a uniform distribution
in the range [0,1], to provide a stochastic weighting for the
components involved in Eq. (26). The constants $c_1$ and $c_2$ represent
the weights of the stochastic acceleration terms that pull each particle
toward its pbest and gbest, respectively. The inertia weight factor
ω is used as a trade-off between the global and local exploration
capabilities of the swarm. A large inertia weight factor tends to
facilitate global exploration, while a small one facilitates local
exploration.
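One update step for a single particle can be sketched as follows. The function name and argument layout are assumptions, and the sketch adds the freshly updated velocity to the position, which is the usual convention, whereas Eq. (27) as printed uses $v_{i,d}(t)$.

```python
import random


def update_particle(position, velocity, pbest, gbest, omega, c1=1.2, c2=1.2, rng=random):
    """Apply Eqs. (26) and (27) once to every dimension of one particle."""
    new_position, new_velocity = [], []
    for p, v, pb, gb in zip(position, velocity, pbest, gbest):
        r1, r2 = rng.random(), rng.random()  # stochastic weights, uniform in [0, 1)
        v_next = omega * v + c1 * r1 * (pb - p) + c2 * r2 * (gb - p)
        new_velocity.append(v_next)
        new_position.append(p + v_next)  # assumption: uses the updated velocity
    return new_position, new_velocity
```

When a particle already sits on both its pbest and the gbest, the two attraction terms vanish and only the inertia term ω v remains.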
In practice, ω generally decreases linearly from 1.2 down to
0.4 during the iterations. Specifically, the inertia weight factor ω
in this paper is generated as follows:
$$ \omega = \omega_{max} - \frac{\omega_{max} - \omega_{min}}{iter_{max}} \times iter, \tag{28} $$
where $iter_{max}$ denotes the maximum iteration number, and iter
represents the current iteration.
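Eq. (28) amounts to a one-line schedule. The sketch below uses the limits 1.2 and 0.4 quoted above as defaults; the function name is an assumption.

```python
def inertia_weight(iteration, iter_max, omega_max=1.2, omega_min=0.4):
    """Eq. (28): linear decrease from omega_max at iteration 0 to omega_min at iter_max."""
    return omega_max - (omega_max - omega_min) / iter_max * iteration


# Hypothetical run of 50 iterations: start, midpoint and end of the schedule.
schedule = [inertia_weight(k, 50) for k in (0, 25, 50)]
```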
In the procedures above, the velocity $v_{i,d}(t)$ and position $p_{i,d}(t)$ of
each particle are bounded to prevent the swarm from over
exploration. The maximum and minimum velocities are denoted by
$v_d^{max}$ and $v_d^{min}$, and the maximum and minimum positions
are denoted by $p_d^{max}$ and $p_d^{min}$.
Thus, if $v_{i,d}(t) > v_d^{max}$, then $v_{i,d}(t) = v_d^{max}$; if $v_{i,d}(t) < v_d^{min}$, then
$v_{i,d}(t) = v_d^{min}$.
If $p_{i,d}(t) > p_d^{max}$, then $p_{i,d}(t) = p_d^{max}$; if $p_{i,d}(t) < p_d^{min}$, then
$p_{i,d}(t) = p_d^{min}$.
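These four rules are a per-dimension clamp. A minimal sketch, with illustrative function names and hypothetical bounds:

```python
def clamp(value, lower, upper):
    """Limit value to the closed interval [lower, upper]."""
    return max(lower, min(upper, value))


def clamp_particle(position, velocity, p_min, p_max, v_min, v_max):
    """Apply the per-dimension velocity and position bounds to one particle."""
    velocity = [clamp(v, lo, hi) for v, lo, hi in zip(velocity, v_min, v_max)]
    position = [clamp(p, lo, hi) for p, lo, hi in zip(position, p_min, p_max)]
    return position, velocity


# Hypothetical two-dimensional particle that violates every bound.
pos, vel = clamp_particle([40.0, 0.2], [20.0, -3.0],
                          p_min=[1, 1], p_max=[35, 2],
                          v_min=[-17.5, -1.0], v_max=[17.5, 1.0])
```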
Eqs. (26) and (27) are iterated until convergence is reached.
Each particle keeps track of the coordinates in the search space of
the best solution it has achieved so far, called pbest and
denoted by $P_{bi}(t) \in R^g$. Accordingly, the global best value is called
gbest and denoted by $P_g(t) \in R^g$, representing the overall best solution
obtained by the particles in the swarm.
Specifically, a particle P(t) in our problem is defined as follows,
which represents a potential solution to the optimization
problem:
$$ P(t) = [p_1(t), p_2(t), \ldots, p_7(t)], \tag{29} $$
where $p_j(t)$ (j = 1, …, 7) represents a specific element of the
parameter vector, and t represents the iteration time. The parameter
vector is described in Table 4. According to the definitions
shown in Table 4, each parameter element is coded so that
different values carry specific meanings. For example, $p_1 = 30$
implies the ‘db15’ wavelet; moreover, a WD parameter configuration of
P = [16, 1, 2, 3, 1, 3, 5] implies ‘db1’ wavelet, ‘soft’ thresholding,
‘sqtwolog’ threshold selection rule, ‘mln’ threshold rescaling
method, 3 level decomposition, 64 lengths and 5 steps of sliding
window.
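The coding of Table 4 can be made concrete with a small decoder. The dictionary keys and the function name are illustrative assumptions; the ordering of the options follows the table.

```python
WAVELETS = (
    [f"sym{k}" for k in range(1, 16)]
    + [f"db{k}" for k in range(1, 16)]
    + [f"coif{k}" for k in range(1, 6)]
)
THRESHOLD_METHODS = ["soft", "hard"]
SELECTION_RULES = ["rigrsure", "sqtwolog", "heursure", "minimaxi"]
RESCALING_METHODS = ["one", "sln", "mln"]
WINDOW_LENGTHS = [256, 128, 64, 32]


def decode(particle):
    """Map the 1-based integer codes p1..p7 of Table 4 to named settings."""
    p1, p2, p3, p4, p5, p6, p7 = (int(round(x)) for x in particle)
    return {
        "wavelet": WAVELETS[p1 - 1],
        "threshold_method": THRESHOLD_METHODS[p2 - 1],
        "selection_rule": SELECTION_RULES[p3 - 1],
        "rescaling": RESCALING_METHODS[p4 - 1],
        "level": p5,                           # the code equals the level itself
        "window_length": WINDOW_LENGTHS[p6 - 1],
        "window_step": p7,                     # the code equals the step itself
    }


config = decode([16, 1, 2, 3, 1, 3, 5])
```

Because the codes are plain integers, a real-valued particle can be rounded and passed through the same lookup during the search.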
5. Fault detection of FWTP
In this section, the proposed fault detection algorithms are
applied to the FWTP outlined in Section 2 for verification
purposes. We take only side A of the FWTP as the study case because side B
is very similar to side A. Further, only part of the valves and sensors
of side A, as listed in Table 1, are selected for fault detection
research because they are accessible through the SIS of the power
plant. In fact, the FWTP of a utility boiler is operated intermittently
due to its unique operating procedure. Generally, PCA
based algorithms can hardly deal with the problems resulting from
the alternating working conditions of industrial processes. In this
paper, the whole working phase is first divided into
several conditions, and the PCA based algorithms are applied to
the same or similar working conditions. Through a detailed analysis of
the flowchart and operating procedure, we found that the working
conditions of side A can be distinguished by the status of the relevant
valves. Thus, W1,…,W6 in Table 1 are used for working condition
classification only, and S1,…,S12 are the sensors selected for fault
detection of the FWTP.
The operation data is collected from the SIS of the plant
through an OPC (OLE for Process Control) interface. 500 samples of
the 12 sensors, under a typical working condition, are
collected with a sampling interval of 5 s. They are used to form
a training data set. Another 1000 samples, under the same working
condition but within a different time period, are collected for
fault detection validation. For a mature industrial process such as
the FWTP, it is not easy to capture abnormal operation conditions.
Hence, we intentionally introduce several kinds of faults into the
operation data to simulate operation conditions with faults.
Note that in the following studies the same training
data set and verification data set are applied to the different
algorithms for a fair comparison.
Table 4
Description of the parameter vector.
Elements  Content                                          Code    Description
p1        sym1,…,sym15, db1,…,db15, coif1,…,coif5          [1,35]  Wavelet species
p2        soft thresholding, hard thresholding             [1,2]   Threshold method
p3        ‘rigrsure’, ‘sqtwolog’, ‘heursure’, ‘minimaxi’   [1,4]   Threshold selection rule
p4        ‘one’, ‘sln’, ‘mln’                              [1,3]   Threshold rescaling method
p5        1, 2, 3, 4, 5                                    [1,5]   Decomposition level
p6        256, 128, 64, 32                                 [1,4]   Length of sliding window
p7        1, 2, 3, …, 32                                   [1,32]  Step of sliding window

5.1. Application of classical PCA
First of all, the classical PCA is investigated. It is applied to the
FWTP according to the procedure proposed in Section 3. Specifically,
the ability of the approach to detect a single
fault from one sensor is verified. The inlet flow rate of the 1# anion bed,
S5, is used as the study case. A constant deviation fault is intentionally
added to S5 from samples 401 to 800. The $T^2$ and SPE statistics of the
classical PCA fault detection algorithm are illustrated in Fig. 4. The
blue solid line represents the real-time statistics and the red dashed
lines denote the corresponding control limits. It can be seen that
under the normal working condition a number of samples exceed the
$T^2$ control limit, which brings false alarms if the fault criterion is
strictly applied. On the other hand, the SPE values of some samples
do not surpass the control limit under the fault condition, which leads
to missed alarms. The problems are mainly caused by the noise
of the $T^2$ and SPE statistics. We carried out several studies in which faults
are added to different sensors, and similar results were obtained. This
shows that classical PCA is not applicable for fault detection of
the FWTP.
5.2. Application of PCA-WD
This paper intends to deal with the noise of the $T^2$ and SPE statistics
with wavelet denoising. A WD step is applied to the $T^2$ and SPE
statistics before fault detection; the application procedure of PCA-WD
is shown in Fig. 3. The same training
data set and verification data set as above are applied to PCA-WD
with and without optimal parameter selection.
5.2.1. PCA-WD without parameter optimization
Accordingly, a constant deviation fault is intentionally added to
S5 from samples 401 to 800 to test the performance of PCA-WD.
Here, the WD parameters are determined through trial calculations
on several cases; in other words, the parameters are selected
mostly from the researcher's experience. The parameters of WD and
the sliding window are listed below.
Coiflet 4 (‘coif4’) wavelet.
Decomposition level = 3.
Threshold parameters: soft thresholding method, sqtwolog
threshold selection rule, sln threshold rescaling method.
Sliding window length len = 256, and sliding window step
step = 32.
The $T^2$ and SPE statistics of the PCA-WD fault detection algorithm
with the above parameters are illustrated in Fig. 5. It can be seen
that both the $T^2$ and SPE statistics of the samples between 401 and
800, where the fault is introduced, rise above their control
limits. On the other hand, during the periods without fault, $T^2$ and
SPE stay below their limits. Fig. 5 demonstrates that the PCA-WD
algorithm, with meticulously selected parameters, is capable
of detecting the fault more precisely. Its FDR and MDR are much
lower than those of classical PCA.
However, the performance of PCA-WD is sensitive to the WD
parameter selection. Poorly selected parameters may decrease the
performance of PCA-WD. Empirical WD parameter selection
relies too much on an individual's experience; it is also time
consuming and does not necessarily reach the optimal parameter configuration. A
better way, proposed in this paper, is to obtain the parameters with
an optimization technique.
5.3. PCA-WD with parameter optimization
In Eq. (23), the parameter selection of PCA-WD is formulated as
an optimization problem, which takes the parameter vector of WD
as the optimization variable. In the literature, the PSO algorithm has been
successfully employed to solve complex optimization problems. In
this paper, PSO is used to determine the optimal parameters
of PCA-WD. The aim of the PSO method is to determine which set
of parameters, i.e. wavelet species, p1, threshold method, p2,
threshold selection rule, p3, threshold rescaling method, p4,
decomposition level, p5, and length and step of the sliding window, p6 and
p7, is optimal for fault detection. Here, the parameters of PCA-WD
are coded as integers, as shown in Table 4; hence, the real values of
the parameters must be rounded to their nearest integer values
during each iteration.
The same training data set, containing 500 samples, and the same
verification data set, containing another 1000 samples, are used to
verify the PCA-WD with parameter optimization. Further, a bias fault,
with an amplitude of 8% of the sensor mean value, is intentionally
introduced to sensor S5 between samples 501 and 800. The PCA
model is first built with the training data set to get the score matrix
P, the projection matrix $\tilde{C}$ and the control limits of the two statistics,
$T^2_{lim}$ and $SPE_{lim}$. Then, the verification data set is applied to the
parameter selection problem shown in Eq. (23), where t = 1000,
t1 = 700, and t2 = 300. PSO is used to solve the optimization problem.
The parameters of PSO are explained as follows.
Generally, the population size implies a balance between accuracy,
stability, computation time and dimension. In our case, the
population size is set to 50.
The inertia weight factor ω is a trade-off between the global and local
exploration capabilities of the swarm. It is set according to
Eq. (28), where ωmax = 1.2 and ωmin = 0.4.
Fig. 4. Statistics of classical PCA ($T^2$ and SPE statistics with their control limits versus samples). (For interpretation of the references to color in
this figure caption, the reader is referred to the web version of this paper.)
Fig. 5. Statistics of PCA-WD without parameter optimization ($T^2$ and SPE statistics with their control limits versus samples).

The lower and upper bounds of $p_d$, $p_d^{min}$ and $p_d^{max}$, are set
according to Table 4.
The limits of velocity change must be within a reasonable
bound. We set $v_d^{max} = p_d^{max}/2$ and $v_d^{min} = -p_d^{max}/2$, so as to avoid
over exploration.
The acceleration constants $c_1$ and $c_2$ represent the weights of the
stochastic acceleration terms toward the local and global best,
respectively. In our case, $c_1$ = 1.2 and $c_2$ = 1.2.
Weighting factor β = 6.8.
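Putting these settings together, a compact PSO loop over integer-coded parameters might look like the sketch below. It is not the authors' implementation: the toy objective stands in for J in Eq. (23), the limit of 50 iterations and the fixed random seed are assumptions, and the two-dimensional bounds are chosen only for illustration.

```python
import random


def pso_maximize(objective, p_min, p_max, swarm_size=50, iter_max=50,
                 c1=1.2, c2=1.2, omega_max=1.2, omega_min=0.4, seed=0):
    """Maximize objective over integer-coded parameters within [p_min, p_max]."""
    rng = random.Random(seed)
    dims = len(p_min)
    v_max = [hi / 2.0 for hi in p_max]    # v_d^max = p_d^max / 2
    v_min = [-hi / 2.0 for hi in p_max]   # v_d^min = -p_d^max / 2

    def to_codes(pos):
        # Round each real-valued coordinate to the nearest admissible integer code.
        return [min(hi, max(lo, int(round(x)))) for x, lo, hi in zip(pos, p_min, p_max)]

    pos = [[rng.uniform(lo, hi) for lo, hi in zip(p_min, p_max)] for _ in range(swarm_size)]
    vel = [[rng.uniform(lo, hi) for lo, hi in zip(v_min, v_max)] for _ in range(swarm_size)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(to_codes(p)) for p in pos]
    best = max(range(swarm_size), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[best][:], pbest_val[best]
    history = [gbest_val]

    for it in range(1, iter_max + 1):
        omega = omega_max - (omega_max - omega_min) / iter_max * it  # Eq. (28)
        for i in range(swarm_size):
            for d in range(dims):
                r1, r2 = rng.random(), rng.random()
                v = (omega * vel[i][d]
                     + c1 * r1 * (pbest[i][d] - pos[i][d])
                     + c2 * r2 * (gbest[d] - pos[i][d]))
                vel[i][d] = min(v_max[d], max(v_min[d], v))
                pos[i][d] = min(p_max[d], max(p_min[d], pos[i][d] + vel[i][d]))
            val = objective(to_codes(pos[i]))
            if val > pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val > gbest_val:
                    gbest, gbest_val = pos[i][:], val
        history.append(gbest_val)
    return to_codes(gbest), gbest_val, history


def toy_objective(codes):
    # Hypothetical stand-in for J in Eq. (23); it peaks at codes [22, 3].
    return -((codes[0] - 22) ** 2) - (codes[1] - 3) ** 2


best_codes, best_value, history = pso_maximize(toy_objective, [1, 1], [35, 5])
```

In the actual problem, each evaluation of the objective would run the WD step on the $T^2$ and SPE statistics with the decoded parameters, which is far more expensive than this toy function.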
The parameter selection of PSO is itself a rather broad topic.
This paper focuses on the application of PSO rather than on the PSO
algorithm itself; its parameters are selected through trial
calculations. The optimization process is shown in Fig. 6,
where the objective function converges to its maximum value,
1.04, after the 20th iteration. Meanwhile, FDR = 0%, MDR = 0%,
$\mathrm{SNR}_{T^2}$ = 18.13 and $\mathrm{SNR}_{SPE}$ = 16.50 when the parameter vector reaches
its optimal value. The solution to the optimization problem,
$\bar{Pr} = [\bar{p}_1, \bar{p}_2, \ldots, \bar{p}_7]$, contains the optimal parameter configuration
of the PCA-WD algorithm, as shown in Table 5.
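As a quick consistency check, the reported values can be substituted into Eq. (22). The variable names below are illustrative; the numbers are the ones stated above.

```python
import math

beta = 6.8
snr_t2, snr_spe = 18.13, 16.50
fdr, mdr = 0.0, 0.0

weight = math.exp(-beta)  # about 0.0011
j_value = weight * (snr_t2 + snr_spe) + (1.0 - weight) / (fdr + mdr + 1.0)
rounded = round(j_value, 2)  # 1.04
```

The detection-rate term contributes about 0.999 and the SNR term about 0.039, which reproduces the converged objective value of 1.04.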
For comparison, the $T^2$ and SPE statistics on the verification
data set, with classical PCA and optimized PCA-WD, are
shown in Fig. 7(a) and (b), respectively, where the blue solid line
represents the real-time statistics and the red dashed line denotes
the corresponding control limit. During the faultless periods,
1–500 and 801–1000, the $T^2$ and SPE statistics of classical PCA
fluctuate heavily. This fluctuation is the source of false detections under the
faultless condition and missed detections under the faulty condition. On the
contrary, the optimized PCA-WD achieves precise
fault detection (FDR = 0% and MDR = 0%), because the WD part
dramatically reduces the effect of the noise of the $T^2$ and SPE statistics.
The performance criteria of classical PCA and optimized PCA-WD
are listed in Table 6 for quantitative comparison. FDR
and MDR are the core criteria of fault detection algorithms. In
Table 6, the FDR and MDR of classical PCA, summed over the $T^2$ and SPE
statistics, are 9.83% and 10.8%, respectively; comparatively, both the FDR and MDR
of the optimized PCA-WD are zero. The results show that the optimized PCA-WD
can improve the fault detection performance greatly.
The results from PCA-WD with and without parameter optimization
are similar, because the two algorithms are identical and
the only difference is the way the parameters are determined. The
PCA-WD with optimization excels in obtaining the optimal parameters
in an automatic way.
6. Comparative studies
The above section demonstrates the application of PCA-WD
and makes a comparative analysis between classical PCA, PCA-WD
Fig. 6. Objective function value (objective function versus iteration).
Table 5
Optimal parameters of PCA-WD.
Component  Parameter                    Value
p̄1         Wavelet species              db15
p̄2         Sliding window step          22
p̄3         Threshold method             soft
p̄4         Threshold selection rule     sqtwolog
p̄5         Threshold rescaling method   sln
p̄6         Decomposition level          3
p̄7         Sliding window length        256
Fig. 7. $T^2$ and SPE statistics with classical PCA and optimized PCA-WD: (a) classical PCA; (b) PCA-WD with parameter optimization. (For interpretation
of the references to color in this figure caption, the reader is referred to
the web version of this paper.)
Table 6
Performance comparison between classical PCA and optimized PCA-WD.
Statistics          FDR (%)        MDR (%)        SNR
                    T2     SPE     T2     SPE     T2      SPE
Classical PCA       1.5    8.33    0      10.8    –       –
Optimized PCA-WD    0      0       0      0       18.13   16.50
Table 7
Fault descriptions.
Study cases        Fault description        Fault samples
Normal condition   –                        –
Bias fault         d1 = 8%                  501–800
Drift fault        d2 = 0.05*(k − 300)      501–800
Broken line        d3 = 0                   501–800

without optimization and PCA-WD with optimization. However,
the results are obtained only with a constant deviation fault; logically,
this cannot guarantee the effectiveness of PCA-WD under other
kinds of faults. To thoroughly test the advantages of the optimized
PCA-WD, comparative studies between classical PCA,
SWPCA, and optimized PCA-WD are carried out. Specifically,
four study cases, as listed in Table 7, are used.
Study Case 1: Normal condition without any fault.
Study Case 2: A bias fault is added to S5 from sample 501 to 800 with an
amplitude of 8% of its mean value.
Study Case 3: A drift fault is added to S2 from sample 501 to 800; its
amplitude varies linearly with time, as
$$ d = 0.05 \times (k - 300), $$
where d is the fault amplitude and k is the sample number.
Study Case 4: A broken line fault is introduced to S8 from sample 501 to
800, which is implemented by setting the amplitude of S8 to zero.
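The three simulated faults can be injected into a recorded sensor signal as sketched below. The helper names, the 1-based sample numbering and the use of the whole-signal mean for the bias amplitude are assumptions made for illustration.

```python
def add_bias(signal, start, end, fraction=0.08):
    """Bias fault: add a constant offset of fraction times the signal mean on samples start..end."""
    offset = fraction * sum(signal) / len(signal)
    return [x + offset if start <= k <= end else x
            for k, x in enumerate(signal, start=1)]


def add_drift(signal, start, end, slope=0.05, origin=300):
    """Drift fault: add d = slope * (k - origin) on samples start..end, with k counted from 1."""
    return [x + slope * (k - origin) if start <= k <= end else x
            for k, x in enumerate(signal, start=1)]


def add_broken_line(signal, start, end):
    """Broken line fault: force the reading to zero on samples start..end."""
    return [0.0 if start <= k <= end else x
            for k, x in enumerate(signal, start=1)]


# Hypothetical constant sensor reading of 1000 samples.
clean = [10.0] * 1000
biased = add_bias(clean, 501, 800)
drifting = add_drift(clean, 501, 800)
broken = add_broken_line(clean, 501, 800)
```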
For a fair comparison between the algorithms, the same training
data set as in the above section is applied to all of them. Moreover,
the threshold of CPV, l, and the confidence limit, α, are set to 85%
and 99%, respectively, and the sliding window length of SWPCA is
set to 500. In the following investigations, PCA-WD takes the optimized
parameters shown in Table 5.
Fig. 8. Fault detection results under normal condition: (a) statistics of PCA; (b) statistics of SWPCA; (c) statistics of PCA-WD.
Fig. 9. Fault detection results under bias fault; panel (c) shows the statistics of optimized PCA-WD.

6.1. Study Case 1: normal condition
Firstly, the original operation data set of the FWTP under normal
condition, from the SIS of the plant, is directly taken as the verification
data set. It is applied to classical PCA, SWPCA, and PCA-WD
with optimization, respectively, and the results are compared
in Fig. 8. If the fault criterion is strictly applied, both classical PCA
and SWPCA cause false detections under the faultless condition; on
the contrary, PCA-WD operates normally without any false alarm
(FDR = 0%). Consequently, under normal conditions, the optimized
PCA-WD shows better fault detection performance compared
with classical PCA and SWPCA, due to its ability to restrain the
fluctuation of the $T^2$ and SPE statistics.
6.2. Study Case 2: bias fault
Secondly, a bias fault is deliberately introduced to S5 from sample 501
to 800 with an amplitude of 8% of its mean value to form a verification
data set. The data set is then applied to each of the
three fault detection algorithms, with the results shown in Fig. 9.
Under the faultless conditions, samples 1–500 and 801–1000, both
classical PCA and SWPCA bring false alarms. The two algorithms
also cause missed detections under the faulty condition
from 501 to 800. Fig. 9 shows that the optimized PCA-WD works
well under both faultless and faulty conditions. In short, the
optimized PCA-WD outperforms classical PCA and SWPCA in
dealing with the bias fault.
Fig. 10. Fault detection results under drift fault; panel (c) shows the statistics of PCA-WD.
Fig. 11. Fault detection results under broken line fault; panel (c) shows the statistics of the PCA-WD method.

6.3. Study Case 3: drift fault
Thirdly, a drift fault, whose amplitude varies linearly with time, is added
to S2 from sample 501 to 800 to form a data set for the comparative study.
The data set is then applied to classical PCA, SWPCA and
optimized PCA-WD, respectively. The results are shown in Fig. 10.
Again, classical PCA and SWPCA cause false detections and missed
detections under faultless and faulty conditions; comparatively, the
optimized PCA-WD is capable of dealing with the drift fault.
6.4. Study Case 4: broken line fault
Finally, a broken line fault is added to S9 from sample 501 to 800 to test
the three algorithms. The broken line fault is simulated by setting
the value of S9 to zero during the faulty period. The results are
shown in Fig. 11. It can be seen that the optimized PCA-WD is
much better than classical PCA and SWPCA under this condition.
A careful analysis shows that even the optimized PCA-WD
brings a relatively high FDR; specifically, its SPE FDR reaches 1.6%.
The reason is that the broken line fault causes strong
signal jumps in the corresponding sensor, which decreases
the performance of all the fault detection algorithms considered,
the optimized PCA-WD included.
Fortunately, the MDRs of the $T^2$ and SPE statistics remain zero; they
are the key indexes of the reliability of the optimized PCA-WD
algorithm.
Furthermore, the performance criteria of the three algorithms
in the four cases are listed in Table 8 for quantitative
comparison. When applied to the FWTP of a coal-fired power
plant, the optimized PCA-WD performs much better than classical
PCA and SWPCA in all four study cases.
Under normal condition, the optimized PCA-WD works well
without any false detection or missed detection. Under the
conditions with the bias fault and drift fault, the optimized PCA-WD
brings false detections too, but its FDRs are acceptable and
much lower than those of classical PCA and SWPCA. Under the condition
with the broken line fault, the performance of the optimized PCA-WD is
decreased by the strong signal jumps. Specifically, the false
detections and missed detections always occur at
the samples where the signal changes abruptly. In practice, the
signals of the feed water treatment process do not fluctuate violently,
unlike in the above simulation studies.
This should improve the performance of the optimized PCA-WD
algorithm and make the newly proposed algorithm applicable
in the field.
7. Conclusion
The feed water treatment process is a vital sub-process of a utility
boiler, and in practice its sensor faults tend to cause severe
consequences. An effective fault detection algorithm is therefore
needed to improve the reliability of FWTPs. Classical PCA was applied
to the FWTP in our previous work; however, the noise in the T² and
SPE statistics leads to relatively high rates of false detections and
missed detections. In this paper, wavelet denoising (WD) is used to
deal with this problem. Specifically, WD is combined with PCA to form
a new PCA-WD algorithm. The performance of PCA-WD is sensitive to its
parameters, and selecting the parameters of this compound algorithm
is difficult. This paper formulates the parameter selection of PCA-WD
as an optimization problem and employs PSO to deal with its
nonlinearity and complexity. A FWTP from a coal-fired power plant is
taken as a study case, and its real operation data is collected to
verify the PCA-WD algorithm. The results show that WD effectively
restrains the noise in the T² and SPE statistics and thereby improves
the performance of the PCA-WD algorithm. The parameter optimization
obtains the optimal parameters of PCA-WD automatically and thus
relieves the dependence on individual experience. Finally,
comparative studies between the classical PCA, SWPCA and optimized
PCA-WD algorithms are carried out for four cases: bias fault, drift
fault, broken line fault and normal condition. The optimized PCA-WD
outperforms its competitors under all the conditions, which further
confirms the advantages of the newly proposed PCA-WD algorithm.
However, this paper mainly focuses on the application of the PCA-WD
algorithm and does not present a theoretical analysis of the
denoising of the PCA statistics. In our future work, we plan to
establish a solid basis for the PCA-WD algorithm with rigorous
theoretical proof.
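The core idea summarized above, computing the T² and SPE statistics from a PCA model and then denoising them with a wavelet transform, can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in this study: a hand-written Haar transform with a fixed soft threshold stands in for the wavelet function, decomposition level and threshold that the paper selects by PSO, the data are synthetic, and the names `pca_statistics` and `haar_denoise` are hypothetical. NumPy is assumed to be available.

```python
import numpy as np


def pca_statistics(train, test, n_components):
    """T^2 and SPE of `test` under a PCA model fitted on normal `train` data."""
    mean, std = train.mean(axis=0), train.std(axis=0, ddof=1)
    z_train, z_test = (train - mean) / std, (test - mean) / std
    # Eigen-decomposition of the covariance of the scaled training data.
    eigvals, eigvecs = np.linalg.eigh(np.cov(z_train, rowvar=False))
    order = np.argsort(eigvals)[::-1][:n_components]
    loadings, lam = eigvecs[:, order], eigvals[order]
    scores = z_test @ loadings
    t2 = np.sum(scores**2 / lam, axis=1)       # variation inside the PC subspace
    residual = z_test - scores @ loadings.T
    spe = np.sum(residual**2, axis=1)          # squared prediction error
    return t2, spe


def haar_denoise(x, levels, threshold):
    """Soft-threshold the Haar detail coefficients of a 1-D signal.

    Assumption: the Haar basis and a single fixed threshold are
    stand-ins for the PSO-selected wavelet parameters of the paper.
    """
    approx = np.asarray(x, dtype=float)
    if approx.size % (2**levels):
        raise ValueError("signal length must be divisible by 2**levels")
    details = []
    for _ in range(levels):                    # decomposition, finest level first
        even, odd = approx[0::2], approx[1::2]
        details.append((even - odd) / np.sqrt(2.0))
        approx = (even + odd) / np.sqrt(2.0)
    for d in reversed(details):                # reconstruction, coarsest level first
        d = np.sign(d) * np.maximum(np.abs(d) - threshold, 0.0)
        out = np.empty(2 * approx.size)
        out[0::2] = (approx + d) / np.sqrt(2.0)
        out[1::2] = (approx - d) / np.sqrt(2.0)
        approx = out
    return approx


# Synthetic data: five correlated variables driven by two latent factors.
rng = np.random.default_rng(0)
mixing = rng.normal(size=(2, 5))
train = rng.normal(size=(512, 2)) @ mixing + 0.1 * rng.normal(size=(512, 5))
test = rng.normal(size=(256, 2)) @ mixing + 0.1 * rng.normal(size=(256, 5))

t2, spe = pca_statistics(train, test, n_components=2)
t2_smooth = haar_denoise(t2, levels=3, threshold=1.0)
print("variance of T2 before and after denoising:", t2.var(), t2_smooth.var())
```

In a complete monitoring scheme, the denoised statistics would then be compared against their control limits; the choice of wavelet, level and threshold is exactly what the parameter optimization in this paper is meant to automate.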
Acknowledgment
This project is supported by the National Natural Science Foundation
of China (Grant No. 51475337) and the International Science &
Technology Cooperation Program of China (Grant No. 2015DFG72440), and
is partially supported by the Open Research Fund of the Key
Laboratory of Transients in Hydraulic Machinery, Ministry of
Education.
References
[1] Yin S, Ding SX, Xie X, Luo H. A review on basic data-driven approaches for
industrial process monitoring. IEEE Trans Ind Electron 2014;61(11):6418–28.
[2] Ge Z, Song Z, Gao F. Review of recent research on data-based process mon-
itoring. Ind Eng Chem Res 2013;52(10):3543–62.
[3] Dong H, Wang Z, Gao H. Fault detection for Markovian jump systems with
sensor saturations and randomly varying nonlinearities. IEEE Trans Circuits
Syst I—Regul Pap 2012;59(10):2354–62.
[4] Chen KY, Chen LS, Chen MC, Lee CL. Using SVM based method for equipment
fault detection in a thermal power plant. Comput Ind 2011;62(1):42–50.
[5] Hong JJ, Zhang J, Morris J. Progressive multi-block modelling for enhanced
fault isolation in batch processes. J Process Control 2014;24(1):13–26.
[6] Widodo A, Yang BS. Wavelet support vector machine for induction machine fault
diagnosis based on transient current signal. Expert Syst Appl 2008;35(1):307–16.
[7] Yin S, Ding SX, Haghani A, Hao H, Zhang P. A comparison study of basic data-
driven fault diagnosis and process monitoring methods on the benchmark
Tennessee Eastman process. J Process Control 2012;22(9):1567–81.
[8] MacGregor JF, Kourti T. Statistical process control of multivariate processes.
Control Eng Pract 1995;3(3):403–14.
[9] Xu X, Xiao F, Wang S. Enhanced chiller sensor fault detection, diagnosis and
estimation using wavelet analysis and principal component analysis methods.
Appl Thermal Eng 2008;28(2):226–37.
[10] Jackson JE, Mudholkar GS. Control procedures for residuals associated with
principal component analysis. Technometrics 1979;21(3):341–9.
[11] Schölkopf B, Smola AJ, Müller K. Nonlinear component analysis as a kernel
eigenvalue problem. Neural Comput 1998;10(5):1299–399.
[12] Jia M, Xu H, Liu X, Wang N. The optimization of the kind and parameters of kernel
function in KPCA for process monitoring. Comput Chem Eng 2012;46(15):94–104.
[13] Li CH, Lin CT, Kuo BC, Chu HS. An automatic method for selecting the para-
meter of the RBF kernel function to support vector machines. In: 2010 IEEE
international geoscience remote sensing symposium, Honolulu, Hawaii, 25–
30 July 2010. p. 836–9.
Table 8
FDR and MDR of different methods.

Case          Algorithm       FDR (%)           MDR (%)
                              T²       SPE      T²       SPE
Normal        Classical PCA   1.4      5.2      –        –
              SWPCA           0.5      3.5      –        –
              PCA-WD          0        0        –        –
Bias fault    Classical PCA   1.33     11.67    0.25     2.5
              SWPCA           0.5      3.5      0.75     17.25
              PCA-WD          0.33     0        0        0.5
Drift fault   Classical PCA   1.5      4.33     0        0.5
              SWPCA           0.5      2.67     0        69
              PCA-WD          0.33     0        0        0
Broken line   Classical PCA   1.11     4.9      0        0
              SWPCA           0.29     3.14     0        0
              PCA-WD          0        1.6      0        0

[14] Ding S, Zhang P, Ding E, Naik A, Deng P, Gui W. On the application of PCA
technique to fault diagnosis. Tsinghua Sci Technol 2010;15(2):138–44.
[15] Lu N, Wang F, Gao F. Combination method of principal component and wa-
velet analysis for multivariate process monitoring and fault diagnosis. Ind Eng
Chem Res 2003;42(18):4198–207.
[16] Zhang Y, Ma C. Fault diagnosis of nonlinear processes using multiscale KPCA
and multiscale KPLS. Chem Eng Sci 2011;66(1):64–72.
[17] Xu J, Hu S. Nonlinear process monitoring and fault diagnosis based on KPCA
and MKL-SVM. In: 2010 international conference on artificial intelligence and
computational intelligence(AICI2010), Sanya, China, 23–24 October 2010. p.
233–7.
[18] Chen D, Li Z, He Z. Research on fault detection of Tennessee Eastman process
based on PCA. In: 25th Chinese control and decision conference (CCDC2013),
Guiyang, China, 25–27 May 2013. p. 1078–81.
[19] Li G, Alcala CF, Qin SJ, et al. Generalized reconstruction-based contributions
for output-relevant fault diagnosis with application to the Tennessee Eastman
process. IEEE Trans Control Syst Technol 2011;19(5):1114–27.
[20] Hu Z, Chen Z, Gui W, Jiang B. Adaptive PCA based fault diagnosis scheme in
imperial smelting process. ISA Trans 2014;53:1446–55.
[21] Liu K, Jin X, Fei Z, Liang J. Adaptive partitioning PCA model for improving fault
detection and isolation. Chinese J Chem Eng 23(6), 2015, 981–991. http://dx.
doi.org/10.1016/j.cjche.2014.09.052.
[22] Jaffel I, Taouali O, Elaissi E, Messaoud H. A new online fault detection method
based on PCA technique. IMA J Math Control Inf 31, 2014, 487–499. http://dx.
doi.org/10.1093/imamci/dnt025.
[23] Wang X, Kruger U, Irwin GW. Process monitoring approach using fast moving
window PCA. Ind Eng Chem Res 2005;44(15):5691–702.
[24] Kim D, Lee IB. Process monitoring based on probabilistic PCA. Chemom Intell
Lab Syst 2003;67(2):109–23.
[25] Lau CK, Ghosh K, Hussain MA, Hussan CRC. Fault diagnosis of Tennessee Eastman
process with multi-scale PCA and ANFIS. Chemom Intell Lab Syst 2013;120:1–14.
[26] Song B, Shi H, Ma Y, Wang J. Multisubspace principal component analysis with
local outlier factor for multimode process monitoring. Ind Eng Chem Res
2014;53(42):16453–64.
[27] Dunia R, Qin S. A unified geometric approach to process and sensor fault
identification and reconstruction: the unidimensional fault case. Comput
Chem Eng 1998;22(7–8):927–43.
[28] Dunia R. Identification of faulty sensors using principal component analysis.
AIChE J 1996;42(10):2797–812.
[29] Qin S, Yue H, Dunia R. Self-validating inferential sensors with application to air
emission monitoring. Ind Eng Chem Res 1997;36:1675–85.
[30] Harkat M, Mourot G, Ragot J. An improved PCA scheme for sensor FDI: application
to an air quality monitoring network. J Process Control 2006;16:625–34.
[31] Fan J, Qin S, Wang Y. Online monitoring of nonlinear multivariate industrial
processes using filtering KICA-PCA. Control Eng Pract 2014;22:205–16.
[32] Shao R, Hu W, Wang Y, Qi X. The fault feature extraction and classification of gear
using principal component analysis and kernel principal component analysis based
on the wavelet packet transform. Measurement 2014;54:118–32.
[33] Kim H, Melhem H. Damage detection of structures by wavelet analysis. Eng
Struct 2004;26(3):347–62.
[34] Donoho DL. Denoising by soft-thresholding. IEEE Trans Inf 1995(3):613–27.
[35] Mallat S. A theory for multiresolution signal decomposition: the wavelet re-
presentation. IEEE Trans Pattern Anal Mach Intell 1989;11(7):674–93.
[36] Li N, Zhou R, Hu Q, Liu X. Mechanical fault diagnosis based on redundant
second generation wavelet packet transform, neighborhood rough set and
support vector machine. Mech Syst Signal Process 2012;28:608–21.
[37] del Valle Y, Venayagamoorthy G, Mohagheghi S, Hernandez J, Harley R. Par-
ticle swarm optimization: basic concepts, variants and applications in power
systems. IEEE Trans Evol Comput 2008;12:171–95.
[38] Kennedy James. Particle swarm optimization. Encyclopedia of machine
learning.US: Springer; 2010. p. 760–6.
[39] Kennedy J, Eberhart RC. A discrete binary version of the particle swarm op-
timizer. In: IEEE conference on computational cybernetics and simulation, vol.
5, 1997. p. 4104–8.
[40] Laskari EC, Parsopoulos KE, Vrahatis MN. Particle swarm optimization for integer
programming. In: IEEE congress on evolutionary computation, 2002. p. 1582–7.
[41] Liao CJ, Tseng CT, Luarn P. A discrete version of particle swarm optimization
for flowshop scheduling problems. Comput Oper Res 2007;34(10):3099–111.
[42] Datta D, Figueira JR. A real-integer-discrete-coded particle swarm optimiza-
tion for design problems. Appl Soft Comput 2011;11(4):3625–33.
