Fault detection of feed water treatment process using PCA-WD with parameter optimization

The feed water treatment process (FWTP) is an essential part of utility boilers, and fault detection is needed to improve its reliability. Classical principal component analysis (PCA) has been applied to FWTPs in our previous work; however, the noise in the $T^2$ and SPE statistics results in false detections and missed detections. In this paper, wavelet denoising (WD) is combined with PCA to form a new algorithm, PCA-WD, in which WD is employed specifically to deal with this noise. The parameter selection of PCA-WD is further formulated as an optimization problem, which is solved with particle swarm optimization (PSO). An FWTP serving two 1000 MW generation units in a coal-fired power plant is taken as a case study, and its operation data is collected for verification. The results show that the optimized WD effectively suppresses the noise in the $T^2$ and SPE statistics and thereby improves the performance of the PCA-WD algorithm. The parameter optimization also allows PCA-WD to obtain its optimal parameters automatically rather than from individual experience. The optimized PCA-WD is further compared with classical PCA and sliding window PCA (SWPCA) in four cases: bias fault, drift fault, broken line fault and normal condition. The results confirm the advantages of the optimized PCA-WD over classical PCA and SWPCA.

Published in: Engineering


Research article

Shirong Zhang a, Qian Tang a, Yu Lin a, Yuling Tang b,*

a Department of Automation, College of Power and Mechanical Engineering, Wuhan University, Wuhan 430072, China
b College of Computer Science, South-Central University for Nationalities, Wuhan, Hubei 430074, China
* Corresponding author. E-mail address: tylzsr@163.com (Y. Tang).

ISA Transactions 68 (2017) 313–326. http://dx.doi.org/10.1016/j.isatra.2017.03.019
0019-0578/© 2017 ISA. Published by Elsevier Ltd. All rights reserved.

Article history: Received 10 December 2015; received in revised form 15 January 2017; accepted 22 March 2017; available online 3 April 2017.

Keywords: Feed water treatment process; Fault detection; PCA; Wavelet denoise; Parameter optimization

1. Introduction

Supercritical and ultra-supercritical units are now widely employed in China and have gradually become the backbone of the Chinese electricity supply. A power generation unit is a typical continuous production process and consists of hundreds of sub-processes and devices. A fault in any component can affect the operational safety of the whole unit and may even result in accidents or unit shutdowns, which inevitably lead to large financial losses or casualties [1]. The feed water treatment process (FWTP) is a vital sub-process of a coal-fired utility boiler. It is responsible for supplying qualified feed water to the steam and water circuit. An ion exchange based FWTP typically consists of cation beds, anion beds, mixed beds and other components such as pumps, fans and pipes. Process faults may push the quality of the feed water below its standard. That in turn results in heavy salt deposition along the heating surfaces of the utility boiler and consequently endangers its operational safety. FWTPs are equipped with process sensors, such as pressure and flow rate sensors, and analysis meters, such as electric conductivity, oxygen, silicon and sodium meters. These sensors are the measuring parts of the process control loops and supervisory systems. Compared with actuators, controllers and communication links, sensors are the weak spots of process control systems [2]. They may suffer faults such as drift, bias, strong noise and broken line, which hinder the safe and stable operation of industrial processes [3]. Hence, an effective fault detection algorithm is much needed for FWTPs.

The demand for operational safety in the process industries has spurred the recent development of many fault detection methodologies [4–7]. Most of them are built upon the process sensors. Computer control systems, such as distributed control systems (DCSs) and programmable controllers (PCs), are able to store massive amounts of process operation data, which makes data-driven fault detection possible and practical. Multivariate statistical analysis is a typical data-driven methodology that has been intensively studied and applied to fault detection in the literature [8–13]. Principal component analysis (PCA), independent component analysis (ICA) and partial least squares (PLS) have been widely applied to fault detection in the chemical industries [14–17]. In essence, they are all multivariate statistical analysis based methods. Among them, PCA is the most popular and has been successfully applied to industrial processes owing to its simplicity, as in [18–20]. PCA represents the
high-dimensional process data in a reduced dimension; the desired information is then obtained by discarding the weak correlations between the variables [21–24]. PCA is convenient for fault detection of industrial processes. Two statistical hypothesis tests are generally conducted in PCA: the Hotelling $T^2$ statistic in the principal component space (PCS) and the SPE statistic in the residual space (RS). Several extensions of PCA have also been proposed in the literature to improve particular aspects of its performance. In [25], an online fault detection framework incorporating multi-scale principal component analysis is developed. An algorithm using multisubspace principal component analysis with the local outlier factor technique for process monitoring is proposed in [26].

In our case, when classical PCA or its extensions are applied to the FWTP, excessive false detections and missed detections appear, which makes them unsuitable for fault detection of FWTPs. The false detections and missed detections are caused by fluctuations of the $T^2$ and SPE statistics. Analytically, PCA based fault detection is strictly valid only when the following assumptions are satisfied [27]:

(1) The process is operating at pseudo-steady state.
(2) The process data used to build the PCA model contain normal operating data only.
(3) The process should be properly excited.

However, field applications can hardly satisfy all these conditions. This is where the fluctuations of the test statistics come from; the resulting temporary violations of the control limits lead to false alarms. Naturally, a denoising methodology is expected to be combined with PCA for the plain purpose of dealing with the fluctuations of the two statistics.
In [28], an exponentially weighted moving average (EWMA) filtering method is applied to the sensor validity index (SVI) and SPE, and an application study on a boiler process shows that EWMA filtering can indeed reduce the false alarms and oscillations of the indicators. In [29], this EWMA filtering method is further integrated into a self-validating soft sensor. An EWMA scheme is likewise used in [31] to filter the monitoring indices of KICA-PCA and improve monitoring performance. Denoising is in fact a relatively broad topic in engineering. In this paper, we employ the wavelet transform (WT) technique for its simplicity and practicability.

The wavelet transform is a well-known multi-resolution analysis tool because it obtains good time and frequency resolution simultaneously through ‘stretching’ and ‘translation’ of the wavelet. WT has been successfully used in many fields, such as pattern recognition and fault diagnosis [32,33]. Donoho et al. [34] first proposed a method to remove white noise using wavelets, known as wavelet threshold denoising. In WT analysis, the signal is first decomposed through the discrete wavelet transform (DWT) to obtain the wavelet coefficients [35]. It has been proven that the wavelet coefficients resulting from noise are smaller than the coefficients of the main signal. Coefficients below a predefined threshold are therefore eliminated, and a largely noise-free signal is recovered by reconstruction from the remaining coefficients. In [36], a mechanical fault diagnosis method integrating the wavelet transform with a support vector machine is presented, where WT is used to extract the noise from the $T^2$ and SPE statistics of PCA so that the impact of the noise can be effectively restrained. In this paper, wavelet denoising (WD) is combined with PCA to form a new fault detection algorithm, PCA-WD.
PCA-WD is then applied to an FWTP for verification. The selection of the WD parameters has considerable influence on WD performance. One way is to rely on the prior knowledge of the engineers; however, this depends too much on individual experience and cannot obtain an optimal setting. This paper formulates the parameter selection of WD as an optimization problem whose solution is an optimal parameter configuration. The objective function and the constraints of the optimization problem are complex, discontinuous and strongly nonlinear, so conventional optimization techniques such as linear programming (LP) and dynamic programming (DP) are not applicable. Computational intelligence based techniques, such as the genetic algorithm (GA) and particle swarm optimization (PSO), are alternatives for our parameter optimization problem. PSO has been widely used in fields such as mechanical, chemical, civil and aerospace design because of its comparative simplicity, rapid convergence and small number of parameters to be adjusted. PSO is known to solve large-scale nonlinear optimization problems effectively [37]; hence, it is a suitable candidate for our problem.

PSO is an evolutionary computation technique proposed by Kennedy and Eberhart [38]. Classical PSO deals with real-valued variables; however, many practical optimization problems involve discrete variables, for which classical PSO cannot work. Kennedy and Eberhart therefore extended classical PSO to a discrete binary version, named BPSO, in which a sigmoid function is used with a random probability to generate a binary-valued position (0 or 1) for a particle from its real-valued velocity component [39]. Laskari et al. [40] proposed a discrete PSO in which a real value is truncated to its nearest integer value.
It is then employed by Liao and Tseng [41] to deal with a flowshop scheduling problem. Moreover, a universal PSO is proposed by Datta and Figueira [42]; it can work directly with real, integer and discrete variables without extra conversions. The parameter optimization problem of WD involves several kinds of variables; thus, this extended PSO is suitable and is employed to obtain the optimal solutions.

From the application angle, this paper focuses on fault detection of a feed water treatment process in coal-fired power plants to improve its reliability. We start with an outline of the process under study. Massive operation data of the process is then collected from a supervisory information system (SIS), which communicates with the control system of the FWTP and acquires its long-term operation data. Next, fault detection of FWTPs with classical PCA is introduced, where the control limits of the $T^2$ and SPE statistics, denoted by $T^2_{lim}$ and $SPE_{lim}$ respectively, are obtained. WD is then combined with PCA to form the PCA-WD fault detection algorithm. The WD parameters must be determined prior to the online operation of PCA-WD. The parameter selection is formulated as an optimization problem, where PSO is used to find the combination of parameters that gives the best performance. An FWTP in a coal-fired power plant equipped with two 1000 MW generation units is taken as a case study, and its real operation data is collected to verify the PCA-WD algorithm. We present results that show the effectiveness of WD in dealing with the noise in the $T^2$ and SPE statistics, and the capability of parameter optimization to determine the optimal parameters of PCA-WD automatically instead of from individual experience. Finally, the advantages of the optimized PCA-WD over classical PCA and SWPCA are demonstrated with four study cases: bias fault, drift fault, broken line fault and normal condition.
The remainder of this paper is organized as follows. Section 2 outlines the feed water treatment process, which is used as the case study in the following investigations. In Section 3, the PCA based fault detection algorithm is reviewed. Section 4 combines WD with PCA to form the PCA-WD algorithm; the parameter selection of PCA-WD is discussed and finally formulated as an optimization problem. In Section 5, the proposed PCA-WD is applied to the FWTP for verification, where the effectiveness of WD and the advantages of the optimized PCA-WD are demonstrated. The conclusions are drawn in Section 7.
2. Feed water treatment process

The FWTP is a vital sub-process of a coal-fired utility boiler. It aims to supply qualified desalted water to the vapor circulating system of the boiler. The quality of the feed water is the primary concern of FWTPs. Unqualified feed water may cause salt deposition on the internal surfaces of critical devices, such as main steam pipes, reheat steam pipes and turbine blades; over time, this may lead to major safety hazards and economic loss for power plants. Ion exchange is the most popular technology employed for FWTPs in Chinese coal-fired power plants. FWTPs are equipped with many sensors and actuators for supervision and regulation purposes; a single FWTP has more than 60 measuring points. Field experience shows that sensor faults are a common reason for unqualified water supply. Hence, an effective sensor fault detection method is much needed for FWTPs.

An FWTP in a coal-fired power plant in Guangdong province, China, is taken as our case study. The flow chart of the FWTP is shown in Fig. 1. The power plant is equipped with two 1000 MW generation units, and the FWTP is designed to serve both. Raw water is successively treated by cation beds, anion beds and mixed beds; the treated water is then stored in desalted water tanks and finally pumped into the two utility boilers. The FWTP in Fig. 1 is configured into two operation routes, side A and side B, to ensure the reliability of the process. Side A consists of the 1# cation bed, 1# anion bed and 1# mixed bed; side B consists of the 2# cation bed, 2# anion bed and 2# mixed bed. Generally, one route can satisfy the routine requirements of the two utility boilers while the other route is on standby. Hence, it is reasonable to study only one route; specifically, we take side A as the study case.
For reasons of data availability and further field application, only some of the valves and sensors of side A, as listed in Table 1, are selected for the following fault detection research. According to the operating procedure, the FWTP is scheduled as follows. When the water in the desalted water tanks is sufficient to sustain the boilers, the FWTP is switched off and put on standby. If the water levels of the tanks fall below a predefined threshold, the FWTP is switched on to produce qualified feed water. Further, the working status of the two operation routes is deliberately scheduled by the operators to even out the total working time of the two routes. This operating procedure makes the FWTP work intermittently.

Fig. 1. Flow chart of the feed water treatment process.

Table 1
Valves and sensors selected for fault detection.

ID    Description                                   Unit
W1    Inlet valve status of 1# cation bed           –
W2    Outlet valve status of 1# cation bed          –
W3    Inlet valve status of 1# anion bed            –
W4    Outlet valve status of 1# anion bed           –
W5    Inlet valve status of 1# mixed bed            –
W6    Outlet valve status of 1# mixed bed           –
S1    Main pipe pressure of raw water               MPa
S2    Inlet flow rate of 1# cation bed              m³/h
S3    Inlet pressure of 1# cation exchanger         MPa
S4    Outlet pressure of 1# cation exchanger        MPa
S5    Inlet flow rate of 1# anion bed               m³/h
S6    Inlet pressure of 1# anion exchanger          MPa
S7    Outlet pressure of 1# anion exchanger         MPa
S8    Electric conductivity of 1# anion             μS/cm
S9    Inlet flow rate of 1# mixed bed               m³/h
S10   Inlet pressure of 1# mixed ion exchanger      MPa
S11   Outlet pressure of 1# mixed ion exchanger     MPa
S12   Electric conductivity of 1# mixed bed         μS/cm

The coal-fired power plant is equipped with a supervisory information system (SIS), which gathers and stores long-term operation data of the whole plant through interfaces to the DCSs, programmable logic controllers (PLCs) and other controllers.
This makes our data-driven fault detection research feasible and convenient. The historical operation data of the FWTP is collected from the SIS through a programming interface provided by the SIS vendor, with a sampling interval of 5 s. It is used for the following fault detection research.

3. PCA based fault detection

PCA is a multivariate statistical technique that has been widely used in process fault detection. Let $x \in \mathbb{R}^m$ denote a sample vector containing m sensors. Assume that n samples of these sensors are taken at a constant sampling rate. Then a matrix $X \in \mathbb{R}^{n \times m}$ is acquired, where each row represents a sample vector. The matrix X is standardized as follows to eliminate the effect of the different scales of the sensors:

$$X = [X - I\,E(X)]\,D_\sigma^{-1}, \tag{1}$$

where $E(X) = [\mu_1, \mu_2, \ldots, \mu_m] \in \mathbb{R}^{1 \times m}$ is the mean vector of X and $I = [1, 1, \ldots, 1]^T \in \mathbb{R}^{n \times 1}$. In Eq. (1), $D_\sigma = \mathrm{diag}\{\sigma_1, \sigma_2, \ldots, \sigma_m\}$, where $\sigma_i = \sqrt{E(x_i - \mu_i)^2}$ is the standard deviation of the ith variable of X. For the standardized data matrix X, the correlation matrix $S = X^T X/(n-1)$ is calculated and decomposed by singular value decomposition. Then X is projected onto the principal component space (PCS) and the residual space (RS), namely,

$$X = \hat{X} + E = T P^T + E, \tag{2}$$

where $\hat{X}$ represents the projection of X in PCS and E is the residual matrix in RS. In Eq. (2), $T \in \mathbb{R}^{n \times k}$ is the score matrix and $P \in \mathbb{R}^{m \times k}$ is the loading matrix, where k denotes the number of principal components (PCs). k is determined using the cumulative percent variance (CPV)

$$\mathrm{CPV} = \frac{\sum_{i=1}^{k} \lambda_i}{\sum_{i=1}^{m} \lambda_i} \ge l, \tag{3}$$

where $\lambda_i$ is the ith largest eigenvalue of the covariance matrix S. The threshold l is usually set between 0.85 and 0.99.

A new sample vector $x \in \mathbb{R}^m$ is projected into PCS and RS, respectively. Its projection in PCS, $\hat{x}$, is

$$\hat{x} = P P^T x = C x, \tag{4}$$

where C is the projection matrix to PCS.
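As a rough illustration of the standardization in Eq. (1) and the CPV rule in Eq. (3), the following Python sketch may help. It is not the authors' implementation: the function names `standardize` and `select_k`, the toy data and the eigenvalues are hypothetical, and the sample standard deviation (divisor n − 1) is an assumption chosen to match $S = X^T X/(n-1)$. The eigenvalues of S are taken as given here rather than computed.

```python
from statistics import mean, stdev


def standardize(rows):
    """Scale every column to zero mean and unit sample standard deviation (Eq. (1))."""
    cols = list(zip(*rows))
    mu = [mean(c) for c in cols]
    sd = [stdev(c) for c in cols]  # assumption: sample standard deviation, divisor n - 1
    return [[(v - m) / s for v, m, s in zip(row, mu, sd)] for row in rows]


def select_k(eigenvalues, l=0.90):
    """Return the smallest k whose cumulative percent variance reaches l (Eq. (3))."""
    lam = sorted(eigenvalues, reverse=True)
    total = sum(lam)
    running = 0.0
    for k, value in enumerate(lam, start=1):
        running += value
        if running / total >= l:
            return k
    return len(lam)


# Hypothetical data: 4 samples of 2 sensors, and 4 made-up eigenvalues of S.
X = [[1.0, 10.0], [2.0, 30.0], [3.0, 20.0], [4.0, 40.0]]
Z = standardize(X)
k = select_k([2.5, 1.0, 0.4, 0.1], l=0.85)
```

With these made-up eigenvalues, the first component alone explains 62.5% of the variance and the first two explain 87.5%, so a threshold of l = 0.85 retains two components.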
The projection in RS, e, is defined as

$$e = (I - P P^T)x = \tilde{C} x, \tag{5}$$

where $\tilde{C}$ is the projection matrix to RS.

Generally, PCA based process fault detection is conducted through two indices, the Hotelling $T^2$ and SPE statistics. The $T^2$ statistic is defined as

$$T^2 = \hat{x}^T P \Lambda^{-1} P^T \hat{x} = t^T \Lambda^{-1} t, \tag{6}$$

where $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_k)$ contains the k largest eigenvalues of the covariance matrix S, and t is the score vector of $\hat{x}$. The control limit of the $T^2$ statistic, $T^2_{lim}$, is calculated as

$$T^2_{lim} = \frac{k(n^2 - 1)}{n(n - k)} F_\alpha(k, n - k), \tag{7}$$

where $F_\alpha(k, n-k)$ is the critical point of the F-distribution, α is the confidence level, and k and n − k are the degrees of freedom.

The SPE statistic is calculated as

$$SPE = \|\tilde{x}\|^2 = \|\tilde{C} x\|^2. \tag{8}$$

The control limit of the SPE statistic was developed by Jackson and Mudholkar [10], that is,

$$SPE_{lim} = \theta_1 \left[ \frac{C_\alpha \sqrt{2\theta_2 h_0^2}}{\theta_1} + 1 + \frac{\theta_2 h_0 (h_0 - 1)}{\theta_1^2} \right]^{1/h_0}, \tag{9}$$

where

$$\theta_i = \sum_{j=k+1}^{m} \lambda_j^i \quad (i = 1, 2, 3), \tag{10}$$

and

$$h_0 = 1 - \frac{2\theta_1 \theta_3}{3\theta_2^2}. \tag{11}$$

In Eq. (9), $C_\alpha$ is the upper quantile of the standard normal distribution at significance level α, and $\lambda_j$ (j = 1, …, m) is the jth largest eigenvalue of the covariance matrix S.

In this paper, both the $T^2$ and SPE statistics are taken into account for fault detection of FWTPs. A fault alarm is triggered when either of the two statistics exceeds its control limit. The procedure of classical PCA based fault detection is as follows.

(1) Get the operation data of the FWTP under normal condition, and normalize the data according to Eq. (1). The data is then used to form a training data set for the PCA model.
(2) Build a PCA model with the training data set, and calculate its $T^2_{lim}$ and $SPE_{lim}$ according to Eqs. (7) and (9), respectively.
(3) Collect a new sample from the FWTP under a condition similar to that in step (1), and calculate the real-time values of the $T^2$ and SPE statistics.
(4) If the real-time value of either statistic exceeds its control limit, the sample is regarded as abnormal and a fault alarm is triggered; otherwise, it is considered to be in normal condition.
(5) Repeat from step (3).

We applied the above procedure to an FWTP of a power plant in our previous work; unexpectedly, excessive false detections and missed detections appeared. This makes classical PCA and extended PCA unsuitable for fault detection of FWTPs. Analytically, the phenomena are mainly caused by the noise in the $T^2$ and SPE statistics. Naturally, a denoising technique is expected to solve the problem. In the following section, the WD technique is combined with PCA to deal with the noise problem.

4. PCA-WD based fault detection

4.1. Wavelet denoising

The wavelet transform is a powerful signal-processing method. It transforms time-domain signals into the time–frequency domain, obtaining high-resolution time and frequency information of the signals simultaneously. The continuous wavelet transform (CWT) is defined as

$$CWT_f(a, \tau) = \frac{1}{\sqrt{|a|}} \int_{\mathbb{R}} f(t)\, \Psi^{*}\!\left(\frac{t - \tau}{a}\right) dt, \tag{12}$$

where a is the scale factor, which may be regarded as the inverse of frequency, τ is the translation factor, and Ψ(x) is the base function. In practice, CWT is not widely applied because of the enormous computation incurred by using all scales. Compared with CWT, DWT requires less computation time so that it will not degrade the signal-processing
performance; hence, DWT is widely used in many fields. Specifically, Mallat proposed a fast algorithm [35] that exploits the fact that the analysis is very efficient if the scales and positions are chosen as powers of two (dyadic scale and translation factors). The Mallat fast algorithm obtains the same accuracy as other DWTs while consuming much less computation, and it is employed in the following study. Let θ(t) be an original signal. A three-level decomposition of θ(t) using the fast algorithm is shown in Fig. 2 to illustrate the process, where $H_0$ and $H_1$ are the low-pass and high-pass filters, respectively, and ↓2 denotes down-sampling. Within the three-level decomposition, θ(t) is expressed as

$$\theta(t) = d_{1k} + d_{2k} + d_{3k} + a_{3k}. \tag{13}$$

The signal θ(t) is thus decomposed into a set of detail coefficients $d_{1k}$, $d_{2k}$, $d_{3k}$ and an approximation coefficient $a_{3k}$.

In 1990, Donoho proposed a method to remove white noise using wavelets [34], that is, wavelet denoising (WD). WD decomposes the signal through the discrete wavelet transform to obtain the wavelet coefficients, which are then processed with a predefined threshold. The coefficients below the threshold are eliminated, while those above it remain. Finally, the denoised signal is reconstructed from the remaining coefficients without much loss of the original signal characteristics.

4.2. PCA-WD

WD is now combined with PCA to form the PCA-WD method for fault detection, where WD is employed specifically to deal with the noise in the $T^2$ and SPE statistics. The flowchart of PCA-WD for process fault detection is shown in Fig. 3. PCA-WD fault detection is divided into two stages: an off-line modeling stage and an on-line detection stage. $T^2_{lim}$ and $SPE_{lim}$ are calculated in the off-line modeling stage. The on-line detection stage includes the calculation of the real-time $T^2$ and SPE statistics, WD, and fault detection.
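For the off-line stage, the SPE control limit of Eqs. (9)–(11) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `spe_limit` and the eigenvalues are hypothetical, the eigenvalues of S are assumed to be already available, and the widely used Jackson–Mudholkar form with $\sqrt{2\theta_2 h_0^2}$ is used. The $T^2$ limit of Eq. (7) is omitted because it needs an F-distribution quantile, which the Python standard library does not provide.

```python
from statistics import NormalDist


def spe_limit(eigenvalues, k, alpha=0.05):
    """SPE control limit computed from the m - k smallest eigenvalues (Eqs. (9)-(11))."""
    residual = sorted(eigenvalues, reverse=True)[k:]   # eigenvalues outside the PCS
    theta1 = sum(residual)
    theta2 = sum(v ** 2 for v in residual)
    theta3 = sum(v ** 3 for v in residual)
    h0 = 1.0 - 2.0 * theta1 * theta3 / (3.0 * theta2 ** 2)
    c_alpha = NormalDist().inv_cdf(1.0 - alpha)        # upper alpha quantile of N(0, 1)
    inner = (c_alpha * (2.0 * theta2 * h0 ** 2) ** 0.5 / theta1
             + 1.0
             + theta2 * h0 * (h0 - 1.0) / theta1 ** 2)
    return theta1 * inner ** (1.0 / h0)


# Hypothetical eigenvalues of S for m = 5 sensors, with k = 2 retained components.
eigs = [3.2, 1.1, 0.4, 0.2, 0.1]
limit = spe_limit(eigs, k=2, alpha=0.05)
alarm = 1.9 > limit   # compare a (denoised) SPE value with the off-line limit
```

In the on-line stage, the same `limit` would be reused for every new sample, as the paper does with $SPE_{lim}$.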
Specifically, WD is employed to denoise the $T^2$ and SPE statistics during the on-line stage, as shown in Fig. 3.

In the off-line modeling stage, a training data set $X \in \mathbb{R}^{n \times m}$, collected under a certain normal operating condition, is used to develop the PCA model, from which the control limits $T^2_{lim}$ and $SPE_{lim}$ are obtained. In the on-line detection stage, the $T^2$ and SPE statistics of each newly arriving sample x are first calculated. The real-time statistics then slide into a window, where WD is applied to remove their noise. Finally, the denoised $T^2$ and SPE statistics are compared with $T^2_{lim}$ and $SPE_{lim}$, respectively, and a fault alarm is triggered if either statistic exceeds its control limit.

Here, the filtering is used to eliminate the noise in the statistics and does not dramatically change their distributions. Indeed, [29] shows that the filtered residuals are closer to a normal distribution than the unfiltered residuals. Mathematically, filtering algorithms may change the control limits of the statistics. In [29], a theoretical analysis of the control limits with and without filtering is given, and examples are used to validate the filtering technique. This technique is also accepted by other researchers; for instance, in [30], a detection index is built on the filtered statistics to improve the detectability of PCA. Our approach is a combination of PCA and WD; in effect, it also uses WD to filter the statistics of PCA. Moreover, the WD in our framework is designed so that its amplification factor equals 1. Thus, the control limits $T^2_{lim}$ and $SPE_{lim}$ obtained in the off-line modeling stage can be used in the on-line detection stage.

Now another problem arises. The application of the WD algorithm involves several parameters. The parameter configuration has a great effect on WD performance and can even make a WD algorithm inapplicable under certain conditions.
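To make the window filtering step and its parameters more concrete, here is a minimal pure-Python sketch of decompose, threshold and reconstruct using the Haar wavelet, the simplest orthogonal wavelet. It is a sketch under assumptions, not the paper's implementation: the function names, the window values and the threshold `delta` are hypothetical, the threshold is hand-picked rather than produced by a selection rule, and coefficients at or below the threshold are simply set to zero.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_decompose(signal, levels):
    """Return (approximation, [d1, ..., d_levels]) using orthonormal Haar filters."""
    approx, details = list(signal), []
    for _ in range(levels):
        pairs = list(zip(approx[0::2], approx[1::2]))
        details.append([(a - b) / SQRT2 for a, b in pairs])  # high-pass, down-sample by 2
        approx = [(a + b) / SQRT2 for a, b in pairs]          # low-pass, down-sample by 2
    return approx, details


def haar_reconstruct(approx, details):
    """Invert haar_decompose, starting from the coarsest level."""
    approx = list(approx)
    for d in reversed(details):
        rebuilt = []
        for a, b in zip(approx, d):
            rebuilt.extend([(a + b) / SQRT2, (a - b) / SQRT2])
        approx = rebuilt
    return approx


def denoise(signal, levels, delta):
    """Zero every detail coefficient with magnitude <= delta, then reconstruct."""
    approx, details = haar_decompose(signal, levels)
    kept = [[c if abs(c) > delta else 0.0 for c in d] for d in details]
    return haar_reconstruct(approx, kept)


# Hypothetical window of 8 statistic values (length is a power of 2), 3 levels.
window = [4.0, 4.2, 3.9, 4.1, 4.0, 4.1, 3.8, 4.0]
smoothed = denoise(window, levels=3, delta=0.2)
```

Even in this small sketch the window length, the number of levels, the wavelet and the threshold all have to be chosen, which is the parameter selection problem the text raises. With `delta = 0` the window is reconstructed exactly, and with a very large `delta` every value collapses to the window mean.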
Parameter selection has become a barrier to the field application of WD. Our case combines WD with PCA, where WD is used to denoise the real-time $T^2$ and SPE statistics, which makes the parameter selection of the compound PCA-WD more complicated than in traditional WD applications. The common experience based techniques cannot deal with our problem properly. We therefore formulate the parameter selection of PCA-WD as an optimization problem and obtain the optimal parameters by solving it. For the purpose of formulating the optimization problem, the parameters of PCA-WD are reviewed first.

Fig. 2. Three level decomposition.

Fig. 3. Flowchart of PCA-WD fault detection.

4.3. Parameters of PCA-WD

4.3.1. Sliding window parameters

In our PCA-WD algorithm, WD is employed to denoise the real-time $T^2$ and SPE statistics with a sliding window; the denoised $T^2$ and SPE statistics are then used for fault detection. Thus, a proper sliding window length and moving step, denoted by len and step respectively, have to be determined prior to the application of PCA-WD. Due to dyadic down-sampling, the length of the wavelet coefficients is reduced by a factor of $2^j$, where j is the scale factor. To ensure perfect reconstruction of the original signal, len must be chosen as a power of 2. step determines how many samples enter and leave the sliding window in a single calculation. Generally, a large step brings a large time delay, while a small one may cause discontinuity in the signal.

4.3.2. Wavelets

Our research considers only orthogonal wavelets, partly for simplicity; in fact, they are able to obtain perfect reconstruction
of the original signals. Specifically, the Mallat fast algorithm is used in this paper for its computational efficiency; it requires the wavelets to be orthogonal and to have a scaling function $\phi$. Several wavelet families satisfy these requirements, namely Haar, Daubechies, Symlets and Coiflets.

The Daubechies family, built by Ingrid Daubechies, consists of 45 wavelets, of which the Haar wavelet is the first and simplest. Except for the Haar wavelet, the Daubechies wavelets have no explicit mathematical expression. The Symlet family is more symmetrical than the Daubechies family, although it is not strictly symmetrical. The Coiflet family consists of 5 wavelets. For detailed descriptions of the wavelets, refer to the relevant literature.

In this paper, the first 15 wavelets of the Daubechies family and the first 15 wavelets of the Symlet family are utilized. The remaining wavelets of the two families are rather complex and require more computation time, which makes them unsuitable for our field application. All 5 wavelets of the Coiflet family are used because of their computational efficiency. In the rest of the paper, db1, db2, …, db15 denote the first 15 wavelets of the Daubechies family, sym1, sym2, …, sym15 denote the 15 wavelets of the Symlet family, and coif1, coif2, …, coif5 denote the Coiflet family.

4.3.3. Threshold parameters

There are two common thresholding methods for WD: soft thresholding and hard thresholding. Let WT be the wavelet coefficients, $\widehat{WT}$ the thresholded coefficients and δ the threshold; the two thresholding methods can then be expressed as follows.

(i) Hard thresholding

$$\widehat{WT} = \begin{cases} WT, & |WT| > \delta; \\ 0, & |WT| \le \delta. \end{cases} \tag{14}$$

(ii) Soft thresholding

$$\widehat{WT} = \begin{cases} WT - \delta, & WT > \delta; \\ 0, & |WT| \le \delta; \\ WT + \delta, & WT < -\delta. \end{cases} \tag{15}$$
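The two rules in Eqs. (14) and (15) translate almost directly into code. The following Python sketch is illustrative only; the function names and the sample coefficients are hypothetical.

```python
def hard_threshold(wt, delta):
    """Eq. (14): keep coefficients with |WT| > delta unchanged, zero the rest."""
    return [c if abs(c) > delta else 0.0 for c in wt]


def soft_threshold(wt, delta):
    """Eq. (15): zero small coefficients and shrink the others towards zero by delta."""
    out = []
    for c in wt:
        if c > delta:
            out.append(c - delta)
        elif c < -delta:
            out.append(c + delta)
        else:
            out.append(0.0)
    return out


# Hypothetical wavelet coefficients and threshold.
coeffs = [-3.0, -0.5, 0.2, 1.0, 2.5]
hard = hard_threshold(coeffs, delta=1.0)   # [-3.0, 0.0, 0.0, 0.0, 2.5]
soft = soft_threshold(coeffs, delta=1.0)   # [-2.0, 0.0, 0.0, 0.0, 1.5]
```

The surviving coefficients keep their full size under the hard rule, while the soft rule moves each of them towards zero by δ.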
15 Compared with hard thresholding, soft thresholding has better performance because hard thresholding may cause discontinuities at δ± while soft thresholding remains continuous by shrinking nonzero coefficients towards zero. Four threshold selection rules, ‘rigrsure’, ‘sqtwolog’, ‘heursure’ and ‘minimaxi’, as shown in Table 2 will be considered in this paper. In fact, these threshold selection rules use statistical re- gression of the noisy coefficients over time to acquire a non- parametric estimation of the reconstructed signal. Different threshold selection rule has different impact on denoising performance. Threshold rescaling method also affects the denoising perfor- mance; it needs investigation as well. The general model of wa- velet denoising is as follows: σ( ) = ( ) + ( ) ( )s n f n e n , 16 where s(n) is the original signal, f(n) is the pure signal without noise, e(n) represents noise, s is the noise intensity. The denoising process is to suppress the noisy part of signal s(n) so as to recover the pure signal f(n) without noise. Threshold rescaling intends to adjust s with certain method; obviously, it has influence on the denoising process. Three threshold rescaling methods as ‘one’, ‘sln’, and ‘mln’ are investigated in this paper. The brief descriptions of these methods are described in Table 3. 4.3.4. Decomposition level Generally speaking, the decomposition level, notated by lev, should be determined in consideration of the frequency band- width of the original signals. Signals with abundant high-fre- quency information need larger numbers of decomposition levels. Large lev requires more computation time and brings time delay. Meanwhile, the length of sliding window, len, bounds lev as well, because the dyadic down-sample halves the length of wavelet coefficients in a single decomposition progress. For instance, if len¼16 the maximum value of lev should be 4. In this paper, the decomposition level is bounded within a range between 1 and 5. 4.4. 
Optimal parameter selection of PCA-WD 4.4.1. Formulation of the parameter selection optimization problem The parameters of WD, as reviewed above, have more or less effect on the performance of WD algorithm. The traditional way to determine the WD parameters is mostly based on individual ex- perience. In this paper, WD is combined with PCA to form a fault detection algorithm. Thus, the parameter selection of the com- pound PCA-WD algorithm is far more complicated than traditional applications of WD. We come out with an innovative idea to for- mulate the parameter selection of PCA-WD as an optimization problem. And the optimal parameter configuration is then ob- tained through the solution of the optimization problem. The optimal parameter selection gets the parameters in an automatic way rather on individual experience and grantees the optimality of the parameters. Our optimization problem does not consider WD algorithm itself only; instead, it takes the PCA-WD fault detection algorithm as a whole to optimize its parameters. Naturally, the performance criteria from the fault detection perspective, as false alarm rate (FAR) and missed detection rate (MDR), should be integrated into the objective function of the optimization problem. They are de- fined as follows. (I) False alarm rate: FAR is described as Eq. (17), which re- presents the percentage of the falsely alarmed samples to the total faultless data samples: = ( ) falsely alarmed samples faultless data samples FAR % 17 (II) Missed detection rate: MDR is calculated as Eq. (18), re- presenting the percentage of the missed faulty samples to the total faulty data samples: = ( ) missed faulty samples faulty data samples MDR % 18 Table 2 Threshold selection rules. Rules Descriptions rigrsure Selection using Steins Unbiased Risk Estimate (SURE) sqtwolog Fixed threshold heursure Selection using a mixture of the first two options minimaxi Selection using the minimax principle Table 3 Threshold rescaling methods. 
Rescaling methods Descriptions one Select using the basic noise model sln Select using the basic noise model with unscaled noise mln Select using the basic noise model with non-Gaussian white noise S. Zhang et al. / ISA Transactions 68 (2017) 313–326318
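To make the thresholding rules of Eqs. (14) and (15) concrete, the following minimal Python sketch applies both rules to a short list of coefficients. The function names `hard_threshold` and `soft_threshold` and the sample values are illustrative assumptions, not part of the paper; the threshold δ is simply passed in rather than derived from one of the selection rules of Table 2.

```python
def hard_threshold(coeffs, delta):
    """Eq. (14): keep coefficients whose magnitude exceeds delta, zero the rest."""
    return [w if abs(w) > delta else 0.0 for w in coeffs]


def soft_threshold(coeffs, delta):
    """Eq. (15): zero small coefficients and shrink the others toward zero by delta."""
    out = []
    for w in coeffs:
        if w > delta:
            out.append(w - delta)
        elif w < -delta:
            out.append(w + delta)
        else:
            out.append(0.0)
    return out


# Hypothetical wavelet coefficients, chosen only for illustration.
wt = [-3.0, -0.5, 0.2, 1.0, 2.5]
print(hard_threshold(wt, 1.0))  # [-3.0, 0.0, 0.0, 0.0, 2.5]
print(soft_threshold(wt, 1.0))  # [-2.0, 0.0, 0.0, 0.0, 1.5]
```

The surviving coefficients keep their full value under hard thresholding but are pulled toward zero by δ under soft thresholding, which is why the soft rule has no jump at ±δ.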
FAR and MDR evaluate the performance of the fault detection algorithm under faultless and faulty conditions, respectively. The lower the two criteria are, the better the algorithm performs. Moreover, the signal-to-noise ratio (SNR) is a traditional measure of denoising algorithms from the signal conditioning perspective. It is defined as follows:

$$\mathrm{SNR} = 10 \times \log_{10}\left(power_{signal} / power_{noise}\right), \tag{19}$$

where

$$power_{signal} = \frac{1}{n} \sum_{n} s(n)^2, \tag{20}$$

and

$$power_{noise} = \frac{1}{n} \sum_{n} \left[s(n) - \tilde{s}(n)\right]^2. \tag{21}$$

$power_{signal}$ in Eq. (20) represents the power of the original signal, and $power_{noise}$ in Eq. (21) is the power of the noise. s(n) denotes the original signal and $\tilde{s}(n)$ is the denoised signal. Generally, a higher SNR is expected, for it indicates less information loss through the denoising process.

It is reasonable to formulate the objective function from both the fault detection perspective and the signal conditioning perspective. Specifically, the objective function is expressed as follows:

$$J_{X_V}(\mathrm{Pr}) = e^{-\beta}\left(\mathrm{SNR}_{T^2} + \mathrm{SNR}_{SPE}\right) + \left(1 - e^{-\beta}\right) / \left(\mathrm{FAR}_{X_{t_1}} + \mathrm{MDR}_{X_{t_2}} + 1\right). \tag{22}$$

The optimization problem is to maximize $J_{X_V}$ while satisfying the relevant constraints. In Eq. (22), $X_V \in R^{t \times m}$ is a data set selected for verification, and t is the number of samples. $X_V$ is composed of a subset $X_{t_1}$, which has $t_1$ faultless samples, and a subset $X_{t_2}$, consisting of $t_2$ faulty samples, with $t = t_1 + t_2$. $\mathrm{Pr} = [p_1, p_2, \ldots, p_7]$ in Eq. (22) denotes the parameter vector of WD; it is the variable to be optimized. $\mathrm{SNR}_{T^2}$ and $\mathrm{SNR}_{SPE}$ are the signal-to-noise ratios of the T2 and SPE statistics, respectively, calculated using Eq. (19). $\mathrm{FAR}_{X_{t_1}}$ in Eq. (22) is calculated using subset $X_{t_1}$; in the remainder of the paper the same quantity is also written as FDR (false detection rate).
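The criteria of Eqs. (17)–(22) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the helper names `rate_percent`, `snr_db` and `objective` are assumptions, FAR and MDR are assumed to enter Eq. (22) as percentages, and the objective follows the form $e^{-\beta}(\mathrm{SNR}_{T^2} + \mathrm{SNR}_{SPE}) + (1 - e^{-\beta})/(\mathrm{FAR} + \mathrm{MDR} + 1)$. The last call uses the values the paper reports for the optimized configuration in Section 5.3 (SNRs of 18.13 and 16.50, zero FAR and MDR, β = 6.8).

```python
import math


def rate_percent(flags):
    """Percentage of True entries; usable for FAR (Eq. 17) and MDR (Eq. 18)."""
    return 100.0 * sum(flags) / len(flags)


def snr_db(original, denoised):
    """Eqs. (19)-(21): 10*log10(power of the signal / power of the removed noise)."""
    n = len(original)
    p_signal = sum(s * s for s in original) / n
    p_noise = sum((s - d) ** 2 for s, d in zip(original, denoised)) / n
    return 10.0 * math.log10(p_signal / p_noise)


def objective(snr_t2, snr_spe, far, mdr, beta=6.8):
    """Eq. (22); far and mdr are assumed to be given in percent."""
    w = math.exp(-beta)
    return w * (snr_t2 + snr_spe) + (1.0 - w) / (far + mdr + 1.0)


# One false alarm among four faultless samples (hypothetical data).
far = rate_percent([False, True, False, False])
# Hypothetical signal and denoised version.
snr = snr_db([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 3.0])
# Values reported in the paper for the optimized PCA-WD.
j = objective(18.13, 16.50, 0.0, 0.0)
print(far, round(snr, 2), round(j, 2))  # 25.0 14.77 1.04
```

With β = 6.8 the weight $e^{-\beta}$ is about 0.0011, so the SNR term contributes only a few hundredths to J while the detection term dominates.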
The number of falsely alarmed samples, $C_{t_1}$, is first obtained as follows:

    int C_t1 = 0;
    for (i = 0; i < t1; i++)
        if (T2(i) > T2_lim || SPE(i) > SPE_lim) C_t1 = C_t1 + 1;

where $\tilde{T}^2$ and $\widetilde{SPE}$, written T2(i) and SPE(i) in the loop, are the PCA statistics denoised with the WD algorithm: $\tilde{T}^2 = WD_{\mathrm{Pr}}(T^2)$ and $\widetilde{SPE} = WD_{\mathrm{Pr}}(SPE)$, where $WD_{\mathrm{Pr}}$ means a denoising process with the parameter vector Pr, and $T^2$ and SPE are obtained according to Eqs. (8) and (6), respectively. Then, FAR is calculated as $\mathrm{FAR}_{X_{t_1}} = (C_{t_1}/t_1) \times 100\%$. Similarly, $\mathrm{MDR}_{X_{t_2}}$ is calculated with subset $X_{t_2}$. The number of undetected faulty samples is calculated as follows:

    int C_t2 = 0;
    for (i = 0; i < t2; i++)
        if (T2(i) < T2_lim && SPE(i) < SPE_lim) C_t2 = C_t2 + 1;

Then, $\mathrm{MDR}_{X_{t_2}} = (C_{t_2}/t_2) \times 100\%$.

In Eq. (22), β is a weighting factor used to balance the criteria from the fault detection perspective against those from the signal conditioning perspective. By tuning β, the optimization problem can serve different purposes. Specifically, β can be set to a value larger than 6 to favor lower FDR and MDR; on the other hand, a value smaller than 6 leads to a higher SNR. This paper pays more attention to the FDR and MDR of fault detection, so β is set to 6.8 in the following investigation. Mathematically speaking, β = 6.8 also compensates for the difference in magnitude between the two terms of $J_{X_V}(\mathrm{Pr})$.

Finally, the parameter selection of PCA-WD is formulated as the following optimization problem:

$$\max\; J_{X_V}(\mathrm{Pr}) = e^{-\beta}\left(\mathrm{SNR}_{T^2} + \mathrm{SNR}_{SPE}\right) + \left(1 - e^{-\beta}\right) / \left(\mathrm{FAR}_{X_{t_1}} + \mathrm{MDR}_{X_{t_2}} + 1\right), \tag{23}$$

s.t.

$$\begin{aligned}
&\mathrm{FAR}_{X_{t_1}} = (C_{t_1}/t_1) \times 100\%, \qquad \mathrm{MDR}_{X_{t_2}} = (C_{t_2}/t_2) \times 100\%, \\
&\mathrm{SNR}_{T^2} = 10 \times \log_{10} \frac{\sum_{i=1}^{t} \left[T^2(i)\right]^2}{\sum_{i=1}^{t} \left[T^2(i) - \tilde{T}^2(i)\right]^2}, \qquad
\mathrm{SNR}_{SPE} = 10 \times \log_{10} \frac{\sum_{i=1}^{t} \left[SPE(i)\right]^2}{\sum_{i=1}^{t} \left[SPE(i) - \widetilde{SPE}(i)\right]^2}, \\
&\tilde{T}^2 = WD_{\mathrm{Pr}}(T^2), \qquad \widetilde{SPE} = WD_{\mathrm{Pr}}(SPE), \\
&T^2_j = x_j^T P \Lambda^{-1} P^T x_j \;\; (j = 1, \ldots, t), \qquad SPE_j = \lVert \tilde{C} x_j \rVert^2 \;\; (j = 1, \ldots, t), \\
&\mathrm{Pr} = [p_1, p_2, \ldots, p_7] > 0.
\end{aligned}$$

Here $x_j$ (j = 1, …, t) is the jth sample of the verification data set. The score matrix P and the projection matrix $\tilde{C}$ are obtained through the training process of the PCA model at the off-line stage. The solution to this problem, $\overline{\mathrm{Pr}} = [\bar{p}_1, \bar{p}_2, \ldots, \bar{p}_7]$, gives the optimal parameters of our PCA-WD fault detection algorithm. We now turn to solving this optimization problem.

4.4.2. Solving of the optimization problem

The objective function and some of the constraints are nonlinear and even non-analytical. This makes classical optimization techniques, such as linear programming (LP) and dynamic programming (DP), infeasible for our problem. Intelligence-based techniques such as the genetic algorithm (GA) and PSO can be used instead. In the literature, PSO has been widely used in fields such as mechanical, chemical, civil and aerospace design, because it is comparatively simple, converges rapidly and has few parameters to adjust. PSO is known to solve large-scale nonlinear optimization problems effectively, so it is suitable for the problem at hand.

PSO is a stochastic search method first introduced by Kennedy and Eberhart [38]. The main strategy of PSO is to exploit the social behavior and communication found in swarms such as bird flocks and fish schools. Each particle in PSO is treated as a volumeless particle in a g-dimensional search space; its velocity and position are adjusted according to its own past experience and that of its companions.

PSO starts from a random swarm of particles, called the initial population, in the g-dimensional search space. Let the swarm size be U; the position and velocity of each particle are defined as

$$P_i(t) = [p_{i,1}(t), p_{i,2}(t), \ldots, p_{i,g}(t)], \tag{24}$$

and

$$V_i(t) = [v_{i,1}(t), v_{i,2}(t), \ldots, v_{i,g}(t)], \tag{25}$$

respectively, where i = 1, 2, …, U represents the ith particle and t denotes the iteration time. The velocity $V_i(t)$ and position $P_i(t)$ of each particle are iteratively modified according to the following rules:

$$v_{i,d}(t+1) = \omega v_{i,d}(t) + c_1 r_1(t)\left(P_{bi,d}(t) - P_{i,d}(t)\right) + c_2 r_2(t)\left(P_{g,d}(t) - P_{i,d}(t)\right), \tag{26}$$

$$p_{i,d}(t+1) = p_{i,d}(t) + v_{i,d}(t+1), \tag{27}$$

where d = 1, 2, …, g represents the dth member of a particle, and $r_1(t)$ and $r_2(t)$ are random numbers, generated from a uniform distribution in the range [0,1], that provide a stochastic weighting for the components involved in Eq. (26). The constants $c_1$ and $c_2$ represent the weights of the stochastic acceleration terms that pull each particle toward its pbest and gbest, respectively. The inertia weight factor ω is used as a trade-off between the global and local exploration capabilities of the swarm. A large inertia weight factor tends to facilitate global exploration, while a small one facilitates local exploration. In practice, ω generally decreases linearly from 1.2 down to 0.4 during the iterations.
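The update rules of Eqs. (26) and (27), together with a linearly decreasing inertia weight, can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the authors' code: the names `inertia` and `pso_step` are hypothetical, the position is advanced with the freshly updated velocity (the usual PSO convention), one pair of random numbers is drawn per particle per step, and the `rng` argument exists only so that the random source can be replaced.

```python
import random


def inertia(it, it_max, w_max=1.2, w_min=0.4):
    """Inertia weight decreasing linearly from w_max (it=0) to w_min (it=it_max)."""
    return w_max - (w_max - w_min) * it / it_max


def pso_step(pos, vel, pbest, gbest, w, c1=1.2, c2=1.2, rng=random):
    """Apply Eqs. (26) and (27) once to a single particle."""
    r1, r2 = rng.random(), rng.random()  # uniform random numbers in [0, 1)
    new_vel = [
        w * v + c1 * r1 * (pb - p) + c2 * r2 * (gb - p)
        for p, v, pb, gb in zip(pos, vel, pbest, gbest)
    ]
    new_pos = [p + v for p, v in zip(pos, new_vel)]
    return new_pos, new_vel


# Hypothetical two-dimensional particle, for illustration only.
pos, vel = pso_step(
    pos=[0.0, 2.0], vel=[1.0, -1.0],
    pbest=[1.0, 2.0], gbest=[2.0, 0.0],
    w=inertia(0, 50),
)
print(pos, vel)
```

In a full run this step would be repeated for every particle and iteration, with pbest and gbest refreshed from the objective values after each move.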
Specifically, the inertia weight factor ω in this paper is generated as follows:

$$\omega = \omega_{max} - \frac{\omega_{max} - \omega_{min}}{iter_{max}} \times iter, \tag{28}$$

where $iter_{max}$ denotes the maximum iteration number and iter represents the current iteration.

In the procedure above, the velocity $v_{i,d}(t)$ and position $p_{i,d}(t)$ of each particle are bounded to prevent the swarm from over-exploring. The maximum and minimum velocities are defined as $v_d^{max}$ and $v_d^{min}$, and the maximum and minimum positions are denoted by $p_d^{max}$ and $p_d^{min}$. Thus, if $v_{i,d}(t) > v_d^{max}$, then $v_{i,d}(t) = v_d^{max}$; if $v_{i,d}(t) < v_d^{min}$, then $v_{i,d}(t) = v_d^{min}$. If $p_{i,d}(t) > p_d^{max}$, then $p_{i,d}(t) = p_d^{max}$; if $p_{i,d}(t) < p_d^{min}$, then $p_{i,d}(t) = p_d^{min}$.

Eqs. (26) and (27) are iterated until convergence is reached. Each particle keeps track of the coordinates in the search space of the best solution it has achieved so far; this is called pbest and denoted by $P_{bi}(t) \in R^g$. Accordingly, the global best value is called gbest and denoted by $P_g(t) \in R^g$, representing the overall best solution obtained by the particles in the swarm.

Specifically, a particle P(t) in our problem is defined as follows, and represents a potential solution to the optimization problem:

$$P(t) = [p_1(t), p_2(t), \ldots, p_7(t)], \tag{29}$$

where $p_j(t)$ (j = 1, …, 7) represents a specific element of the parameter vector and t represents the iteration time. The parameter vector is described in Table 4. According to the definitions in Table 4, each parameter element is coded so that different values imply specific meanings. For example, $p_1 = 30$ implies the ‘db15’ wavelet; moreover, a WD parameter configuration P = [16, 1, 2, 3, 1, 3, 5] implies the ‘db1’ wavelet, ‘soft’ thresholding, the ‘sqtwolog’ threshold selection rule, the ‘mln’ threshold rescaling method, a decomposition level of 1, a sliding window length of 64 and a sliding window step of 5.

5. Fault detection of FWTP

In this section, the proposed fault detection algorithms are applied to the FWTP outlined in Section 2 for verification purposes. We take only side A of the FWTP as the study case, because side B is very similar to side A. Further, only part of the valves and sensors of side A, as listed in Table 1, are selected for the fault detection research, because they are accessible through the SIS of the power plant. The FWTP of a utility boiler is operated intermittently due to its unique operating procedure. Generally, PCA-based algorithms can hardly deal with the problems resulting from the alternating working conditions of industrial processes. In this paper, the whole working phase is first divided into several conditions, and the PCA-based algorithms are applied within the same or similar working conditions. After a close analysis of the flowchart and operating procedure, we found that the working conditions of side A can be distinguished by the status of the relevant valves. Thus, W1, …, W6 in Table 1 are used for working condition classification only, and S1, …, S12 are the sensors selected for fault detection of the FWTP.

The operation data is collected from the SIS of the plant through an OPC (OLE for Process Control) interface. 500 samples of the 12 sensors, under a typical working condition, are collected with a sampling interval of 5 s. They are used to form a training data set. Another 1000 samples, under the same working condition but within a different time period, are collected for fault detection validation. For a mature industrial process such as an FWTP, it is not easy to capture abnormal operating conditions. Hence, we intentionally introduce several kinds of faults into the operation data to simulate operation with faults. Note that in the following studies the same training data set and verification data set are applied to the different algorithms for a fair comparison.
Table 4
Description of the parameter vector.

| Elements | Content | Code | Description |
|---|---|---|---|
| p1 | sym1,…,sym15, db1,…,db15, coif1,…,coif5 | [1,35] | Wavelet species |
| p2 | soft thresholding, hard thresholding | [1,2] | Threshold method |
| p3 | ‘rigrsure’, ‘sqtwolog’, ‘heursure’, ‘minimaxi’ | [1,4] | Threshold selection rule |
| p4 | ‘one’, ‘sln’, ‘mln’ | [1,3] | Threshold rescaling method |
| p5 | 1,2,3,4,5 | [1,5] | Decomposition level |
| p6 | 256, 128, 64, 32 | [1,4] | Length of sliding window |
| p7 | 1,2,3,…,32 | [1,32] | Step of sliding window |
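The integer coding of Table 4 can be illustrated with a small decoder. The sketch below is an assumption about how the codes map to settings (1-based codes, in the order the table lists the contents); the names `decode` and the lookup lists are hypothetical and not taken from the paper.

```python
# Lookup lists in the order given in Table 4 (codes are 1-based).
WAVELETS = (
    [f"sym{i}" for i in range(1, 16)]
    + [f"db{i}" for i in range(1, 16)]
    + [f"coif{i}" for i in range(1, 6)]
)
THRESHOLD_METHODS = ["soft", "hard"]
SELECTION_RULES = ["rigrsure", "sqtwolog", "heursure", "minimaxi"]
RESCALING_METHODS = ["one", "sln", "mln"]
WINDOW_LENGTHS = [256, 128, 64, 32]


def decode(p):
    """Map a 7-element code vector [p1, ..., p7] to readable WD settings."""
    p1, p2, p3, p4, p5, p6, p7 = (int(round(x)) for x in p)
    return {
        "wavelet": WAVELETS[p1 - 1],
        "threshold_method": THRESHOLD_METHODS[p2 - 1],
        "selection_rule": SELECTION_RULES[p3 - 1],
        "rescaling": RESCALING_METHODS[p4 - 1],
        "level": p5,   # decomposition level is coded by its own value
        "length": WINDOW_LENGTHS[p6 - 1],
        "step": p7,    # window step is coded by its own value
    }


print(decode([16, 1, 2, 3, 1, 3, 5]))
```

Under this reading of the table, the first code 16 falls just after the 15 Symlet entries and therefore selects ‘db1’, and a first code of 30 selects ‘db15’.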
5.1. Application of classical PCA

First of all, classical PCA is investigated. It is applied to the FWTP according to the procedure proposed in Section 3. Specifically, the ability of the approach to detect a single sensor fault is verified. The inlet flow rate of the 1# anion bed, S5, is used as the study case. A constant deviation fault is intentionally added to S5 from sample 401 to sample 800. The T2 and SPE statistics of the classical PCA fault detection algorithm are illustrated in Fig. 4. The blue solid lines represent the real-time statistics and the red dashed lines denote the corresponding control limits. Under the normal working condition, a number of samples exceed the T2 control limit, which produces false alarms if the fault criterion is strictly applied. On the other hand, the SPE values of some samples do not surpass the control limit under the fault condition, which leads to missed alarms. These problems are mainly caused by the noise of the T2 and SPE statistics. We carried out several studies in which faults were added to different sensors, and similar results were obtained. This shows that classical PCA alone is not suitable for fault detection of the FWTP.

5.2. Application of PCA-WD

This paper deals with the noise of the T2 and SPE statistics by wavelet denoising. A WD step is applied to the T2 and SPE statistics before fault detection, following the application procedure shown in Fig. 3. The same training data set and verification data set as above are applied to PCA-WD with and without optimal parameter selection.

5.2.1. PCA-WD without parameter optimization

As before, a constant deviation fault is intentionally added to S5 from sample 401 to sample 800 to test the performance of PCA-WD. Here, the WD parameters are determined through trial calculations on sample cases; in other words, the parameters are selected mostly on the basis of the researcher's experience.
The parameters of WD and the sliding window are listed below.

(i) Wavelet: ‘coif4’.
(ii) Decomposition level: 3.
(iii) Threshold parameters: soft thresholding method, ‘sqtwolog’ threshold selection rule, ‘sln’ threshold rescaling method.
(iv) Sliding window length len = 256 and sliding window step step = 32.

The T2 and SPE statistics of the PCA-WD fault detection algorithm with the above parameters are illustrated in Fig. 5. Both the T2 and SPE statistics of the samples between 401 and 800, where the fault is introduced, rise beyond their control limits. On the other hand, during the periods without fault, T2 and SPE stay below their limits. Fig. 5 demonstrates that the PCA-WD algorithm, with meticulously selected parameters, detects the fault more precisely. Its FDR and MDR are much lower than those of classical PCA.

However, the performance of PCA-WD is sensitive to the WD parameter selection. Poorly selected parameters may degrade the performance of PCA-WD. The empirical WD parameter selection relies too much on the individual's experience; it is also time consuming and cannot guarantee the optimal parameter configuration. A better way, proposed in this paper, is to obtain the parameters with an optimization technique.

5.3. PCA-WD with parameter optimization

In Eq. (23), the parameter selection of PCA-WD is formulated as an optimization problem, which takes the parameter vector of WD as the optimization variable. In the literature, the PSO algorithm has been successfully employed to solve complex optimization problems. In this paper, PSO is likewise used to determine the optimal parameters of PCA-WD. The aim of the PSO method is to determine which set of parameters, i.e. the wavelet species p1, the threshold method p2, the threshold selection rule p3, the threshold rescaling method p4, the decomposition level p5, and the length and step of the sliding window, p6 and p7, is optimal for fault detection.
Here, the parameters of PCA-WD are coded as integers, as shown in Table 4; hence, the real-valued parameters must be rounded to their nearest integer values during each iteration.

The same training data set, containing 500 samples, and the same verification data set, containing another 1000 samples, are used to verify PCA-WD with parameter optimization. Further, a bias fault with an amplitude of 8% of the sensor mean value is intentionally introduced to sensor S5 between samples 501 and 800. The PCA model is first built with the training data set to obtain the score matrix P, the projection matrix $\tilde{C}$ and the control limits of the two statistics, $T^2_{lim}$ and $SPE_{lim}$. Then, the verification data set is applied to the parameter selection problem of Eq. (23), where t = 1000, $t_1$ = 700 and $t_2$ = 300. PSO is used to solve the optimization problem. The parameters of PSO are explained as follows.

(i) Generally, the population size implies a balance between accuracy, stability, computation time and dimension. In our case, the population size is set to 50.
(ii) The inertia weight factor ω is a trade-off between the global and local exploration capabilities of the swarm. It is set according to Eq. (28), where $\omega_{max}$ = 1.2 and $\omega_{min}$ = 0.4.
(iii) The lower and upper bounds of $p_d$, $p_d^{min}$ and $p_d^{max}$, are set according to Table 4.
(iv) The velocity change must stay within a reasonable bound. We set $v_d^{max} = p_d^{max}/2$ and $v_d^{min} = -p_d^{max}/2$ to avoid over-exploration.
(v) The acceleration constants $c_1$ and $c_2$ represent the weights of the stochastic acceleration terms toward the local and global best, respectively. In our case, $c_1$ = 1.2 and $c_2$ = 1.2.
(vi) The weighting factor is β = 6.8.

The parameter selection of PSO is itself a rather broad topic. This paper focuses on the application of PSO rather than on the PSO algorithm itself; its parameters are selected through trial calculations on sample cases.

Fig. 4. Statistics of classical PCA (T2 and SPE statistics with their control limits versus samples). (For interpretation of the references to color in this figure caption, the reader is referred to the web version of this paper.)
Fig. 5. Statistics of PCA-WD without parameter optimization (T2 and SPE statistics with their control limits versus samples).

The optimization process is shown in Fig. 6, where the objective function converges to its maximum value, 1.04, after the 20th iteration. At the optimal parameter vector, FDR = 0%, MDR = 0%, $\mathrm{SNR}_{T^2}$ = 18.13 and $\mathrm{SNR}_{SPE}$ = 16.50. The solution to the optimization problem, $\overline{\mathrm{Pr}} = [\bar{p}_1, \bar{p}_2, \ldots, \bar{p}_7]$, contains the optimal parameter configuration of the PCA-WD algorithm, as shown in Table 5.

For comparison, the T2 and SPE statistics of the verification data set, with classical PCA and optimized PCA-WD, are shown in Fig. 7(a) and (b), respectively, where the blue solid lines represent the real-time statistics and the red dashed lines denote the corresponding control limits. During the faultless periods, 1–500 and 801–1000, the T2 and SPE statistics of classical PCA fluctuate heavily. This is the source of false detections under the faultless condition and missed detections under the faulty condition. In contrast, the optimized PCA-WD achieves precise fault detection (FDR = 0% and MDR = 0%), because the WD part largely eliminates the effect of the noise on the T2 and SPE statistics. The performance criteria of classical PCA and optimized PCA-WD are listed in Table 6 for quantitative comparison. FDR and MDR are the core criteria of fault detection algorithms.
In Table 6, the FDR and MDR of classical PCA, summed over the T2 and SPE statistics, are 9.83% and 10.8%; by comparison, both the FDR and the MDR of the optimized PCA-WD are zero. The results show that the optimized PCA-WD greatly improves the fault detection performance.

The results from PCA-WD with and without parameter optimization are similar, because the two algorithms are identical and differ only in the way the parameters are determined. PCA-WD with optimization excels in obtaining the optimal parameters in an automatic and deterministic way.

Fig. 6. Objective function value versus iteration.

Table 5
Optimal parameters of PCA-WD.

| Component | Parameter | Value |
|---|---|---|
| p̄1 | Wavelet species | db15 |
| p̄2 | Threshold method | soft |
| p̄3 | Threshold selection rule | sqtwolog |
| p̄4 | Threshold rescaling method | sln |
| p̄5 | Decomposition level | 3 |
| p̄6 | Sliding window length | 256 |
| p̄7 | Sliding window step | 22 |

Fig. 7. T2 and SPE statistics with classical PCA and optimized PCA-WD: (a) classical PCA; (b) PCA-WD with parameter optimization. (For interpretation of the references to color in this figure caption, the reader is referred to the web version of this paper.)

Table 6
Performance comparison between classical PCA and optimized PCA-WD.

| Algorithm | FDR T2 (%) | FDR SPE (%) | MDR T2 (%) | MDR SPE (%) | SNR T2 | SNR SPE |
|---|---|---|---|---|---|---|
| Classical PCA | 1.5 | 8.33 | 0 | 10.8 | – | – |
| Optimized PCA-WD | 0 | 0 | 0 | 0 | 18.13 | 16.50 |

6. Comparative studies

The above section demonstrates the application of PCA-WD and makes a comparative analysis between classical PCA, PCA-WD without optimization and PCA-WD with optimization. However, those results are obtained only with a constant deviation fault; logically, this cannot guarantee the effectiveness of PCA-WD under other kinds of faults. To test the advantages of the optimized PCA-WD thoroughly, comparative studies between classical PCA, SWPCA and optimized PCA-WD are carried out. Specifically, four study cases, as listed in Table 7, are used.

Table 7
Fault descriptions.

| Study cases | Fault description | Fault samples |
|---|---|---|
| Normal condition | – | – |
| Bias fault | d1 = 8% | 501–800 |
| Drift fault | d2 = 0.05 × (k − 300) | 501–800 |
| Broken line | d3 = 0 | 501–800 |

Fig. 8. Fault detection results under normal condition: (a) statistics of PCA; (b) statistics of SWPCA; (c) statistics of PCA-WD.

Study Case 1: Normal condition without any fault.
Study Case 2: A bias fault is added to S5 from 501 to 800 with an amplitude of 8% of its mean value.
Study Case 3: A drift fault is added to S2 from 501 to 800; its amplitude varies linearly with time as d = 0.05 × (k − 300), where d is the fault amplitude and k is the sample number.
Study Case 4: A broken line fault is introduced to S8 from 501 to 800; it is implemented by setting the amplitude of S8 to zero.

For fair comparisons between the algorithms, the same training data set as in the above section is applied to all of them. Moreover, the CPV threshold l and the confidence limit α are set to 85% and 99%, respectively, and the sliding window length of SWPCA is set to 500. In the following investigations, PCA-WD takes the optimized parameters shown in Table 5.
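The three simulated faults of Study Cases 2–4 can be injected into a recorded sensor signal along the following lines. This is a hedged sketch: the function names are hypothetical, sample numbers are treated as 1-based and inclusive, the bias is taken as 8% of the mean of the whole signal, and the drift follows d = 0.05 × (k − 300) exactly as written in Table 7. How the authors actually generated the faulty data is not stated beyond these descriptions.

```python
def add_bias(x, start, end, fraction=0.08):
    """Bias fault: add fraction * mean(x) to samples start..end (1-based, inclusive)."""
    offset = fraction * sum(x) / len(x)
    return [v + offset if start <= k <= end else v
            for k, v in enumerate(x, start=1)]


def add_drift(x, start, end, slope=0.05, k0=300):
    """Drift fault: add d = slope * (k - k0) to samples start..end, k = sample number."""
    return [v + slope * (k - k0) if start <= k <= end else v
            for k, v in enumerate(x, start=1)]


def break_line(x, start, end):
    """Broken line fault: force the signal to zero for samples start..end."""
    return [0.0 if start <= k <= end else v
            for k, v in enumerate(x, start=1)]


# Hypothetical constant sensor reading of 1000 samples, for illustration only.
signal = [10.0] * 1000
biased = add_bias(signal, 501, 800)
drifted = add_drift(signal, 501, 800)
broken = break_line(signal, 501, 800)
print(biased[500], drifted[500], broken[500])
```

Each function returns a new list and leaves the input untouched, so the same recorded data set can be reused for all study cases.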
Fig. 9. Fault detection results under bias fault: (a) statistics of PCA; (b) statistics of SWPCA; (c) statistics of optimized PCA-WD. Each panel plots the T2 and SPE statistics and their control limits versus samples.
6.1. Study Case 1: normal condition

Firstly, the original operation data set of the FWTP under normal condition, from the SIS of the plant, is directly taken as the verification data set. It is applied to classical PCA, SWPCA and PCA-WD with optimization, respectively, and the results are compared in Fig. 8. If the fault criterion is strictly applied, both classical PCA and SWPCA cause false detections under faultless conditions; in contrast, PCA-WD operates normally without any false alarm (FDR = 0%). Consequently, under normal conditions, the optimized PCA-WD shows better fault detection performance than classical PCA and SWPCA, due to its ability to restrain the fluctuation of the T2 and SPE statistics.

6.2. Study Case 2: bias fault

Secondly, a bias fault is deliberately introduced to S5 from 501 to 800, with an amplitude of 8% of its mean value, to form a verification data set. The data set is then applied to each of the three fault detection algorithms, with the results shown in Fig. 9. Under the faultless conditions, 1–500 and 801–1000, both classical PCA and SWPCA produce false alarms. The two algorithms also cause missed detections under the faulty condition from 501 to 800. Fig. 9 shows that the optimized PCA-WD works well under both faultless and faulty conditions. In short, the optimized PCA-WD outperforms classical PCA and SWPCA in dealing with the bias fault.

Fig. 10. Fault detection results under drift fault: (a) statistics of PCA; (b) statistics of SWPCA; (c) statistics of PCA-WD. Each panel plots the T2 and SPE statistics and their control limits versus samples.
Fig. 11. Fault detection results under broken line fault: (a) statistics of PCA; (b) statistics of SWPCA; (c) statistics of PCA-WD method. Each panel plots the T2 and SPE statistics and their control limits versus samples.
6.3. Study Case 3: drift fault

Thirdly, a drift fault, whose amplitude varies linearly with time, is added to S2 from 501 to 800 to form a data set for comparative study. The data set is then applied to classical PCA, SWPCA and optimized PCA-WD, respectively. The results are shown in Fig. 10. Again, classical PCA and SWPCA cause false detections under faultless conditions and missed detections under faulty conditions; by comparison, the optimized PCA-WD is capable of dealing with the drift fault.

6.4. Study Case 4: broken line fault

Finally, a broken line fault is added to S9 from 501 to 800 to test the three algorithms. The broken line fault is simulated by setting the value of S9 to zero during the faulty period. The results are shown in Fig. 11. The optimized PCA-WD is much better than classical PCA and SWPCA under this condition. On careful analysis, however, we find that even the optimized PCA-WD produces a relatively high FDR; specifically, its SPE FDR reaches 1.6%. Fortunately, the MDRs of the T2 and SPE statistics remain zero; they are the key indexes of the reliability of the optimized PCA-WD algorithm. The reason for the higher FDR is that the broken line fault causes strong signal jumps in the corresponding sensor, which degrade the performance of all the fault detection algorithms under study, including the optimized PCA-WD.

Furthermore, the performance criteria of the three algorithms for the four cases are listed in Table 8 for quantitative comparison. When applied to the FWTP of a coal-fired power plant, the optimized PCA-WD performs much better than classical PCA and SWPCA under the four typical conditions above. Under normal condition, the optimized PCA-WD works well without any false detection or missed detection. Under the bias fault and the drift fault, the optimized PCA-WD produces false detections too, but its FDRs are acceptable and much lower than those of classical PCA and SWPCA.
Under the broken line fault condition, the performance of the optimized PCA-WD is degraded by the strong signal jumps. Specifically, the false detections and missed detections always occur at the samples where the signal changes abruptly. In practice, the signals of the feed water treatment process do not fluctuate as violently as in the above simulation studies. This should improve the performance of the optimized PCA-WD algorithm and make the newly proposed algorithm applicable in the field.

7. Conclusion

The feed water treatment process is a vital sub-process of a utility boiler; in practice, sensor faults tend to cause severe consequences. An effective fault detection algorithm is much needed to improve the reliability of FWTPs. Classical PCA has been applied to the FWTP in our previous work; however, the noises of the T2 and SPE statistics lead to relatively high rates of false detections and missed detections. In this paper, wavelet denoise (WD) is used to deal with this problem. Specifically, WD is combined with PCA to form a new PCA-WD algorithm. The performance of PCA-WD is sensitive to its parameters, and the parameter selection of this compound algorithm is difficult. This paper formulates the parameter selection of PCA-WD as an optimization problem and employs PSO to deal with its nonlinearity and complexity. An FWTP from a coal-fired power plant is taken as a study case. The real operation data of the FWTP is collected to verify the PCA-WD algorithm. The results show that WD is effective in restraining the noises of the T2 and SPE statistics, thereby improving the performance of the PCA-WD algorithm. The parameter optimization obtains the optimal parameters of PCA-WD automatically, and thus relieves the dependence on individual experience. Comparative studies between the classical PCA, SWPCA and optimized PCA-WD algorithms, in terms of four cases (bias fault, drift fault, broken line fault and normal condition), are finally carried out.
The optimized PCA-WD outperforms its competitors under all the conditions. The results further confirm the advantages of the newly proposed PCA-WD algorithm. However, this paper mainly focuses on the application of the PCA-WD algorithm and does not present a theoretical analysis of the denoising of the PCA statistics. In our future work, we plan to establish a solid base for our PCA-WD algorithm with strict theoretical proof.

Acknowledgment

This project is supported by the National Natural Science Foundation of China (Grant No. 51475337) and the International Science Technology Cooperation Program of China (Grant No. 2015DFG72440), and is partially supported by the Open Research Fund of Key Laboratory of Transients in Hydraulic Machinery, Ministry of Education.

Table 8
FDR and MDR of different methods.

Cases         Algorithms       FDR (%)            MDR (%)
                               T2       SPE       T2       SPE
Normal        Classical PCA    1.4      5.2       –        –
              SWPCA            0.5      3.5       –        –
              PCA-WD           0        0         –        –
Bias fault    Classical PCA    1.33     11.67     0.25     2.5
              SWPCA            0.5      3.5       0.75     17.25
              PCA-WD           0.33     0         0        0.5
Drift fault   Classical PCA    1.5      4.33      0        0.5
              SWPCA            0.5      2.67      0        69
              PCA-WD           0.33     0         0        0
Broken line   Classical PCA    1.11     4.9       0        0
              SWPCA            0.29     3.14      0        0
              PCA-WD           0        1.6       0        0
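The FDR and MDR values reported in Table 8 are rates computed over per-sample alarm decisions. The following is a minimal sketch, not the authors' code, of how the three simulated faults and the two rates could be reproduced on a single sensor signal. It assumes that FDR is the share of fault-free samples that raise an alarm and that MDR is the share of faulty samples that raise none; the function names `inject_fault` and `fdr_mdr`, the fault magnitude, and the synthetic alarm pattern are all hypothetical and chosen only for illustration.

```python
import numpy as np


def inject_fault(signal, kind, start=500, stop=800, magnitude=1.0):
    """Return a copy of `signal` with a fault on 0-based samples start..stop-1.

    The defaults correspond to samples 501 to 800 in 1-based counting.
    """
    faulty = np.asarray(signal, dtype=float).copy()
    n = stop - start
    if kind == "bias":
        # Constant offset during the faulty period.
        faulty[start:stop] += magnitude
    elif kind == "drift":
        # Offset that grows linearly with time up to `magnitude`.
        faulty[start:stop] += magnitude * np.arange(1, n + 1) / n
    elif kind == "broken_line":
        # Sensor value forced to zero during the faulty period.
        faulty[start:stop] = 0.0
    else:
        raise ValueError(f"unknown fault kind: {kind}")
    return faulty


def fdr_mdr(alarms, fault_mask):
    """Return (FDR, MDR) in percent under the assumed definitions."""
    alarms = np.asarray(alarms, dtype=bool)
    fault_mask = np.asarray(fault_mask, dtype=bool)
    normal = ~fault_mask
    # False detections: alarms raised on fault-free samples.
    fdr = 100.0 * np.sum(alarms & normal) / normal.sum() if normal.any() else float("nan")
    # Missed detections: faulty samples without an alarm.
    mdr = 100.0 * np.sum(~alarms & fault_mask) / fault_mask.sum() if fault_mask.any() else float("nan")
    return fdr, mdr


# Usage: 1000 samples with a fault on samples 501..800 (1-based).
fault_mask = np.zeros(1000, dtype=bool)
fault_mask[500:800] = True

# Hypothetical detector output: perfect except one false alarm and three misses.
alarms = fault_mask.copy()
alarms[10] = True
alarms[600:603] = False

fdr, mdr = fdr_mdr(alarms, fault_mask)
print(f"FDR = {fdr:.2f}%, MDR = {mdr:.2f}%")
```

In a full pipeline the `alarms` array would come from comparing the (denoised) T2 or SPE statistic with its control limit, so the same helper can be applied once per statistic to fill one row of a table like Table 8.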
