
A KPI-based process monitoring and fault detection framework for large-scale processes

Large-scale processes, consisting of multiple interconnected sub-processes, are commonly encountered in industrial systems, whose performance needs to be determined. A common approach to this problem is to use a key performance indicator (KPI)-based approach. However, the different KPI-based approaches are not developed with a coherent and consistent framework. Thus, this paper proposes a framework for KPI-based process monitoring and fault detection (PM-FD) for large-scale industrial processes, which considers the static and dynamic relationships between process and KPI variables. For the static case, a least squares-based approach is developed that provides an explicit link with least-squares regression, which gives better performance than partial least squares. For the dynamic case, using the kernel representation of each sub-process, an instrument variable is used to reduce the dynamic case to the static case. This framework is applied to the TE benchmark process and the hot strip mill rolling process. The results show that the proposed method can detect faults better than previous methods.

Published in: Technology


Research article

A KPI-based process monitoring and fault detection framework for large-scale processes

Kai Zhang a,b, Yuri A.W. Shardt a,c,*, Zhiwen Chen a, Xu Yang b, Steven X. Ding a, Kaixiang Peng b

a Institute for Automatic Control and Complex Systems, University of Duisburg-Essen, Bismarckstrasse 81 BB, 47057 Duisburg, Germany
b Key Laboratory for Advanced Control of Iron and Steel Process, School of Automation and Electrical Engineering, University of Science and Technology of Beijing, 100083 Beijing, PR China
c Department of Chemical Engineering, Faculty of Engineering, University of Waterloo, E6-3006, 200 University Ave. W., Waterloo, ON, Canada N2L 3G1

* Corresponding author at: Department of Chemical Engineering, Faculty of Engineering, University of Waterloo, E6-3006, 200 University Ave. W., Waterloo, ON, Canada N2L 3G1.
E-mail addresses: kai.zhang@uni-due.de, kaizhang@ustb.edu.cn (K. Zhang), yuri.shardt@uni-due.de, yuri.shardt@uwaterloo.ca (Y.A.W. Shardt), zhiwen.chen@uni-due.de (Z. Chen), xu.yang@ustb.edu.cn (X. Yang), steven.ding@uni-due.de (S.X. Ding), kaixiang@ustb.edu.cn (K. Peng).

ISA Transactions 68 (2017) 276–286. Journal homepage: www.elsevier.com/locate/isatrans. http://dx.doi.org/10.1016/j.isatra.2017.01.029. 0019-0578/© 2017 ISA. Published by Elsevier Ltd. All rights reserved.

Article history: Received January 2016; received in revised form 23 November 2016; accepted 30 January 2017; available online 9 February 2017.

Keywords: KPI; Large-scale; Process monitoring; Least square; Data-driven subspace

1. Introduction

Large-scale processes are ubiquitous features of many chemical, steelmaking, papermaking and semiconductor manufacturing plants [1]. Such large-scale processes consist of many interacting subprocesses, which increase the overall control complexity [2]. Increasing demands for quality have led to a greater emphasis on improving the operating performance of these large-scale processes. A recently proposed approach to this problem is the use of key performance indicators (KPIs) to analyse the process performance [2,3]. This approach has been shown to improve product quality and effectively detect process faults.

Hao et al. [2] showed that industrial KPIs can be classified into three groups: 1) engineering KPIs that refer to the technical performance of the plant, for example, product quality; 2) maintenance KPIs that refer to the operating rate and hence maintenance time and costs; and 3) economic KPIs that refer to business profit, for example, the overall energy consumption or the productivity of a plant. It has been shown that KPIs are closely related to the measurable process variables, but can often be difficult to measure directly, for example, the concentration in a chemical process or the thickness of a steel roll between two stands in the steel mill process. KPI-based fault detection partitions the observed faults into two groups: those which affect the KPI, called KPI-relevant faults, and those which do not affect the KPI, called KPI-irrelevant faults. KPI-based fault detection focuses on the KPI-relevant faults, since only these degrade the monitored performance.

Single-unit process monitoring and fault detection (PM-FD) can be performed using both accurate process models and data-driven ones [4,5]. Unfortunately, model-based KPI monitoring requires complex first-principles models, which are often unavailable. In the absence of detailed process models, data-driven approaches that can handle a large number of inputs must be considered [7–11]. KPI-based PM-FD of large-scale processes has to consider a process that consists of many control loops, sensors and actuators. In general, the methods of monitoring KPIs can be classified based on how they are
constructed into static and dynamic cases.

In the static case, the KPI is assumed to be affected only by the current process measurements. This case is often encountered in simple economic KPIs, where the current values of the inputs and outputs produce the overall instantaneous profit of the plant. Multiple methods exist for solving the static KPI case [13]. One of the most common methods, partial least squares (PLS), first proposed by Wold et al. [14,15] for dealing with collinearity in the linear regression problem [12], has had a wide field of application, including soft-sensor design [16] and PM-FD [7,13]. Recently, various enhancements of this approach, specifically for PM-FD applications, have led to the development of total PLS (TPLS) [17] and concurrent PLS (CPLS) [18]. Both of these methods seek to take into consideration the metrics by which PM-FD is measured, that is, the fault detection rate (FDR), the fraction of faulty samples that raise an alarm, and the false alarm rate (FAR), the fraction of fault-free samples that raise an alarm [5]. However, both methods are based on the PLS model and only use post-modifications to improve the overall performance. Making changes in the approach before implementing the PLS method can also provide a significant improvement. Furthermore, PLS has been extended to consider nonstationary [19,20], nonlinear [21,22], and dynamic [23] systems.

In the dynamic case, the KPI is assumed to be affected not only by the current process measurements, but also by past measurements. Such cases are often encountered when dealing with complex systems whose behaviour can depend on multiple unknown variables. In such cases, it is assumed that a state-space model can describe the dynamic relationships well [23–25]. Using the developed state-space model, a residual generator based on either the parity space (PS) or a diagnostic observer (DO) can be developed to implement PM-FD. Using a data-driven approach, Ding et al.
[5] have developed the data-driven PS (DPS) and DO (DDO), which only require the process data. The vectors used for generating the residuals are called the kernel representation of a process [5]. Applications of this method include the data-driven PM-FD of a complex hot strip mill rolling (HSMR) process [3], wherein all the process variables were used as inputs and all KPIs as outputs to develop a numerically stable solution to the problem. Recently, Shardt et al. [26] extended this approach by incorporating a soft-sensor-based PM-FD to deal with infrequent KPI data. Although these approaches are simple to understand, they have not taken into consideration the dynamics in each individual subprocess.

Thus, based on the above observations, the objectives and contributions of this paper are: for the static case, to develop a least-squares regression-based approach for KPI-based PM-FD that can be shown to perform better than the widely used PLS-based methods; for the dynamic case, to develop a dynamic method that can incorporate the relationships between the individual subprocesses, as well as between the KPIs and process variables; to apply the proposed approaches to the TE benchmark process and a real HSMR process where the KPI-based PM-FD problem can be successfully addressed; and to compare the performance of the proposed and existing methods using these two processes.

The paper is organised as follows. Section 2 introduces the newly developed theory, including least-squares-based PM-FD for the static case and a dynamic method based on the DPS and the characteristics of large-scale processes, while Sections 3 and 4 focus on the industrial applications. In Section 3, the static method is tested on the Tennessee Eastman benchmark process, while in Section 4, the dynamic method is tested on a real HSMR process. Finally, Section 5 presents the key conclusions and future direction.

2. Theory

2.1. Background

Consider a large-scale process, where each single subprocess can be expressed using a time-invariant, state-space model as

$$x(k+1) = A x(k) + B u(k) + w(k), \qquad y(k) = C x(k) + D u(k) + v(k) \qquad (1)$$

where $w(k)$ and $v(k)$ are independent, Gaussian distributed vectors; and $u \in \mathbb{R}^{l}$ is the zero-mean process input, $y \in \mathbb{R}^{m}$ the zero-mean process output, and $x \in \mathbb{R}^{n}$ the zero-mean state variables.

Consider a process consisting of $M$ subprocesses. Let $u^{(i)} \in \mathbb{R}^{l_i}$ be the process input vector and $y^{(i)} \in \mathbb{R}^{m_i}$ the output vector in the $i$th controlled subprocess. Let $\theta \in \mathbb{R}^{\kappa}$ be the KPI of the complete process. Clearly, $\theta$ should be closely related to all the system inputs and outputs. Two different cases now need to be considered: the static case, where the KPI is assumed to be time-independent, and the dynamic case, where the KPI is assumed to depend on the current and past process values.

When designing KPI-based PM-FD systems, it is commonly assumed that the static case applies, since then simple regression methods can be used to obtain the model. However, this approach requires that the measurements be normally distributed, time-independent, and available when required. These conditions may not hold in many commonly encountered applications, where variables may not be normally distributed or available when required. One way around this problem of missing information is to design a soft sensor to provide an estimated value of the process measurements, which are then fed into the static KPI system. Another approach is to directly design a dynamic KPI system that can not only monitor the process, but also estimate the required parameters. This has the advantage that the errors involved are optimised so that the final KPI estimate is as accurate as possible. Finally, it can be noted that these methods can be applied to both small- and large-scale processes as long as the underlying conditions hold.

2.2. Static case

Assume that at steady state $u_{raw} \sim \mathcal{N}(\mu_u, \Sigma_u)$, $y_{raw} \sim \mathcal{N}(\mu_y, \Sigma_y)$ and $\theta_{raw} \sim \mathcal{N}(\mu_\theta, \Sigma_\theta)$. For the sake of simplicity, the zero-centred forms, i.e. $u$, $y$ and $\theta$, are commonly used. In the static method, the relationship can be mathematically modelled as

$$\theta = \begin{bmatrix} \theta_1 \\ \vdots \\ \theta_\kappa \end{bmatrix} = \Psi \begin{bmatrix} \begin{bmatrix} u^{(1)} \\ y^{(1)} \end{bmatrix} \\ \vdots \\ \begin{bmatrix} u^{(M)} \\ y^{(M)} \end{bmatrix} \end{bmatrix} + \zeta = \Psi \begin{bmatrix} u \\ y \end{bmatrix} + \zeta \qquad (2)$$

where $\Psi \in \mathbb{R}^{\kappa \times \sum_i (m_i + l_i)}$ shows the embedded relationship and $\zeta \in \mathbb{R}^{\kappa}$ the noisy component, which is independent of the process measurements. Let $z = \begin{bmatrix} u \\ y \end{bmatrix}$; then the PM-FD objective for this case can be addressed by first developing an estimate $\hat{\Psi}$. A direct solution makes use of least-squares estimation. Based on the historical data matrices, $\mathcal{Z} = [z_1, \ldots, z_N]$ and $\Theta = [\theta_1, \ldots, \theta_N]$, the least-squares solution can be written as

$$\hat{\Psi}_{LS} = \Theta \mathcal{Z}^T \left( \mathcal{Z} \mathcal{Z}^T \right)^{-1} \qquad (3)$$

Note that in the case that the inversion of $\mathcal{Z}\mathcal{Z}^T$ is numerically unstable, a pseudo-inverse can be used to give

$$\left( \mathcal{Z} \mathcal{Z}^T \right)^{\dagger} = P \Lambda^{-1} P^T \qquad (4)$$
where $\mathcal{Z}\mathcal{Z}^T = \begin{bmatrix} P & \tilde{P} \end{bmatrix} \begin{bmatrix} \Lambda & \\ & 0 \end{bmatrix} \begin{bmatrix} P^T \\ \tilde{P}^T \end{bmatrix}$ is obtained using a singular value decomposition (SVD). In the next step, define a projector $\Pi_\theta$ such that

$$\hat{\theta}_{LS} = \hat{\Psi}_{LS} z \;\rightarrow\; z_\theta = \Pi_\theta z \qquad (5)$$

Performing a QR-decomposition on $\hat{\Psi}_{LS}^T$ gives $\hat{\Psi}_{LS}^T = \begin{bmatrix} Q_{\theta 1} & Q_{\theta 2} \end{bmatrix} \begin{bmatrix} R \\ 0 \end{bmatrix}$, where $Q_{\theta 1} \in \mathbb{R}^{\sum_i (m_i + l_i) \times \kappa}$. In this case, it can be proven that $\Pi_\theta = Q_{\theta 1} Q_{\theta 1}^T$ holds. Thus, $z_\theta$ can be regarded as an image of $\theta$ in $z$, that is, it is the KPI-relevant part of $z$. For the online PM-FD, this part can be used to replace the KPI measurements. Since both $z$ and $\theta$ are multivariate Gaussian distributed, the $T^2$ test statistic can be developed based on $z_\theta$:

$$T^2_{s,\theta} = z_\theta^T \Sigma_{z_\theta}^{-1} z_\theta \qquad (6)$$

The threshold $J_{th,s}$ is determined based on $\frac{\kappa (N^2 - 1)}{N (N - \kappa)} F_\alpha(\kappa, N - \kappa)$; then the decision is made according to

$$\begin{cases} T^2_{s,\theta} \le J_{th,s} \Rightarrow \text{KPI is fault-free} \\ T^2_{s,\theta} > J_{th,s} \Rightarrow \text{KPI is faulty} \end{cases} \qquad (7)$$

Two different cases are considered in order to examine the properties of the static method and PLS-based methods. The first case is where the KPI is sufficiently captured by the given models [5], while the second case is where this is not true. The two cases can be distinguished based on the number of principal components, $\upsilon$, selected in the PLS model [14]. As well, a better practice for developing PLS models for fault detection will be given for the second case.

Theorem 1. If $\theta$ can be completely modelled by both methods, that is, $\hat{\Psi}_{LS} = \hat{\Psi}_{PLS} = \Psi$ [5], then LS will give a better performance than PLS.

Proof. In this case, $\hat{\Theta}_{LS} = \hat{\Psi}_{LS} \mathcal{Z} = \hat{\Psi}_{PLS} \mathcal{Z} = \hat{\Theta}_{PLS}$. Eq. (6) can be rewritten as

$$T^2_{s,\theta} = \hat{\theta}_{LS}^T \left( \frac{\hat{\Theta}_{LS} \hat{\Theta}_{LS}^T}{N - 1} \right)^{-1} \hat{\theta}_{LS} = \sum_{i=1}^{\kappa} \frac{t_{LS,i}^2}{\sigma_{LS,i}^2} \sim \chi^2(\kappa) \qquad (8)$$

where $\sigma_{LS,i}^2$ denotes the $i$th singular value of $\frac{\hat{\Theta}_{LS} \hat{\Theta}_{LS}^T}{N - 1}$, and $t_{LS}$ the $\kappa$ principal components of $\hat{\theta}_{LS}$.
Based on [5] and Lemma 3 in [17], there exists a linear relationship $M \in \mathbb{R}^{\upsilon \times \upsilon}$ such that $\begin{bmatrix} t_{PLS,\theta} \\ t_{PLS,\theta^\perp} \end{bmatrix} = M t_{PLS}$, where $t_{PLS}$ is the score of the PLS model. This implies that the $T^2$ statistic in the PLS-based method can be rewritten as

$$\begin{aligned} T^2_{PLS} &= t_{PLS}^T \Sigma_{t_{PLS}}^{-1} t_{PLS} = \sum_{i=1}^{\upsilon} \frac{t_{PLS,i}^2}{\sigma_{PLS,i}^2} = t_{PLS}^T M^T \left( M \Sigma_{t_{PLS}} M^T \right)^{-1} M t_{PLS} \\ &= t_{PLS,\theta}^T \Sigma_{t_{PLS,\theta}}^{-1} t_{PLS,\theta} + t_{PLS,\theta^\perp}^T \Sigma_{t_{PLS,\theta^\perp}}^{-1} t_{PLS,\theta^\perp} \\ &= \sum_{i=1}^{\kappa} \frac{t_{PLS,\theta,i}^2}{\sigma_{PLS,\theta,i}^2} + \sum_{i=\kappa+1}^{\upsilon} \frac{t_{PLS,\theta^\perp,i}^2}{\sigma_{PLS,\theta^\perp,i}^2} = T^2_{PLS,\theta} + T^2_{PLS,\theta^\perp} \sim \chi^2(\upsilon) \end{aligned} \qquad (9)$$

where $\upsilon \ge \kappa$ holds. As in the least-squares case, $T^2_{PLS,\theta}$ can be rewritten as

$$T^2_{PLS,\theta} = \sum_{i=1}^{\kappa} \frac{t_{PLS,\theta,i}^2}{\sigma_{PLS,\theta,i}^2} = \hat{\theta}_{PLS}^T \left( \frac{\hat{\Theta}_{PLS} \hat{\Theta}_{PLS}^T}{N - 1} \right)^{-1} \hat{\theta}_{PLS} \sim \chi^2(\kappa) \qquad (10)$$

Thus, Eq. (8) equals the first part of the right-hand side of Eq. (9), namely $T^2_{PLS,\theta} = T^2_{s,\theta}$. In the case of a fault that directly affects the KPI, this part will be directly influenced, with $\hat{\theta}_{LS,f} = \hat{\theta}_{PLS,f}$. It is clear that

$$\frac{\Delta T^2_{s,\theta}}{\chi^2_\alpha(\kappa)} \ge \frac{\Delta T^2_{PLS}}{\chi^2_\alpha(\upsilon)} \qquad (11)$$

where $\Delta T^2_{s,\theta} = T^2_{s,\theta,f} - E\left(T^2_{s,\theta}\right)$ and $\Delta T^2_{PLS} = \Delta T^2_{PLS,\theta} = T^2_{PLS,\theta,f} - E\left(T^2_{PLS,\theta}\right)$ are the changes of $T^2_{s,\theta}$ and $T^2_{PLS}$ compared with the fault-free scenario in this case, and $\chi^2_\alpha(\kappa) \le \chi^2_\alpha(\upsilon)$ are the thresholds of the LS- and PLS-based methods. The use of Eq. (11) to evaluate the sensitivity of the two detection methods was defined in [5]. Here, it shows that $T^2_{s,\theta}$ is more sensitive to KPI-related faults. In the case of a fault that affects only $T^2_{PLS,\theta^\perp}$,

$$\frac{\Delta T^2_{s,\theta}}{\chi^2_\alpha(\kappa)} < \frac{\Delta T^2_{PLS}}{\chi^2_\alpha(\upsilon)} \qquad (12)$$

where $\Delta T^2_{PLS} = \Delta T^2_{PLS,\theta^\perp} > \Delta T^2_{s,\theta} = 0$. Eq. (12) shows that PLS-based methods will cause a larger number of false alarms for KPI-irrelevant faults. □

Theorem 2. If the PLS model does not completely capture the fault-relevant models, then LS gives a better performance than PLS-based methods.

Proof.
In this case, it is assumed that $t_{PLS}$ does not include all the KPI-related information in $z$. Let $\tilde{t}_{PLS}$ denote the part not covered by PLS that can still affect the KPI. If a fault occurs in this part, the PLS-based method will not detect it. However, LS can capture this part and detect this fault. It can be noted that in [17] and [18], two improvements on PLS, T-PLS and C-PLS, were proposed. However, neither can determine whether PLS has captured most of the KPI-related information in $z$, and both blindly assume that the noisy part has potential for monitoring the KPI. This will inevitably produce a large number of false alarms. Furthermore, the two methods perform a simple principal component analysis (PCA) on the KPI-irrelevant part of the PLS, and the residuals are assumed to include $\tilde{t}_{PLS}$. However, PCA can only isolate the principal part based on the variance in the components; thus, it is possible that $\tilde{t}_{PLS}$ will be assigned to the principal part. In this case, the two methods will not monitor a KPI-relevant fault well.

In order to avoid such a poorly constructed model [17,18], it is necessary to carefully select $t_{PLS}$. This implies that, when using the cross-validation method [14] to build the models, the optimal $\upsilon$ should be selected based not only on prediction performance, but also on the ability to minimise the correlation between the KPI-irrelevant part of the process data and the KPI data, that is,

$$E\left( \theta z_{PLS,\theta^\perp}^T \right) = 0 \qquad (13)$$

Note that if $\upsilon = \mathrm{rank}(\mathcal{Z})$, PLS will be equivalent to LS. □

Thus, for the KPI-based PM-FD problem, the LS-based method is more efficient than the classic PLS-based one. Furthermore, the poor performance of many PLS-based methods can be partially attributed to the inappropriate choice of the principal components in PLS. The following points should also be borne in mind when using this method.
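The static procedure of Eqs. (3)–(7) can be sketched numerically as follows. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the function names `fit_static_monitor` and `t2_statistic` and the synthetic data are hypothetical, the KPI-relevant part is represented by its coordinates $Q_{\theta 1}^T z$ so that its covariance is invertible, and the threshold is taken as an empirical quantile of the training statistic rather than from the F-distribution.

```python
import numpy as np

def fit_static_monitor(Z, Theta, alpha=0.05):
    """Z: (p, N) zero-centred process data; Theta: (kappa, N) zero-centred KPI data."""
    # Least-squares estimate of Psi, Eq. (3); pinv also covers the ill-conditioned case, Eq. (4).
    Psi = Theta @ Z.T @ np.linalg.pinv(Z @ Z.T)
    # Reduced QR of Psi^T: the columns of Q1 span the KPI-relevant directions in z.
    Q1, _ = np.linalg.qr(Psi.T)
    Z_theta = Q1.T @ Z  # coordinates of the KPI-relevant part (assumption, see lead-in)
    Sigma_inv = np.linalg.inv(np.atleast_2d(np.cov(Z_theta)))
    t2_train = np.einsum("in,ij,jn->n", Z_theta, Sigma_inv, Z_theta)
    # Assumption: empirical (1 - alpha) quantile instead of the F-based threshold.
    threshold = np.quantile(t2_train, 1 - alpha)
    return Q1, Sigma_inv, threshold

def t2_statistic(Q1, Sigma_inv, Z_new):
    """T^2 statistic of Eq. (6) for every column of Z_new."""
    Z_theta = Q1.T @ Z_new
    return np.einsum("in,ij,jn->n", Z_theta, Sigma_inv, Z_theta)

# Synthetic example (hypothetical): only the first two variables drive the KPI.
rng = np.random.default_rng(0)
p, N = 6, 1000
psi_true = np.array([[1.0, -0.5, 0.0, 0.0, 0.0, 0.0]])
Z_train = rng.standard_normal((p, N))
Theta_train = psi_true @ Z_train + 0.1 * rng.standard_normal((1, N))
Q1, Sigma_inv, J_th = fit_static_monitor(Z_train, Theta_train)

Z_rel = rng.standard_normal((p, N))
Z_rel[0] += 5.0  # KPI-relevant fault: offset on a variable that drives the KPI
Z_irr = rng.standard_normal((p, N))
Z_irr[5] += 5.0  # KPI-irrelevant fault: offset on a variable that does not

fdr_relevant = np.mean(t2_statistic(Q1, Sigma_inv, Z_rel) > J_th)
far_irrelevant = np.mean(t2_statistic(Q1, Sigma_inv, Z_irr) > J_th)
print(f"alarm rate, KPI-relevant fault:   {fdr_relevant:.3f}")
print(f"alarm rate, KPI-irrelevant fault: {far_irrelevant:.3f}")
```

In this toy setting the statistic should respond strongly to the offset on a KPI-driving variable and stay close to the nominal alarm level for the offset on an unrelated variable, which is the behaviour the static method is designed to show.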
Remark 1. It can be noted that when monitoring the KPI-irrelevant part, namely $z_{\theta^\perp}$, which is calculated by projecting $z$ onto $\Pi_\theta^\perp = Q_{\theta 2} Q_{\theta 2}^T$, a single $T^2$ test on $z_{\theta^\perp}$ is not adequate due to the strong dynamics in this part. Instead, a further decomposition or postprocessing should be performed in order to model and monitor this component. Further details can be found in Li et al. [29].

Remark 2. A cross-validation-based method can be used to improve the robustness of the LS regression solution for online PM-FD [30].

2.3. Dynamic case

The dynamic counterpart of model (2) can be expressed as the following finite impulse response (FIR) model [29]:

$$\theta = \begin{bmatrix} \theta_1 \\ \vdots \\ \theta_\kappa \end{bmatrix} = \Psi_d \begin{bmatrix} \begin{bmatrix} u^{(1)}_{s_1} \\ y^{(1)}_{s_1} \end{bmatrix} \\ \vdots \\ \begin{bmatrix} u^{(M)}_{s_M} \\ y^{(M)}_{s_M} \end{bmatrix} \end{bmatrix} + \zeta = \Psi_d z_{d,s} + \zeta \qquad (14)$$

where $s_1, \ldots, s_M$ are the time delays for the subprocesses. Consider subprocess $i$ such that $s_i > n_i$, with $n_i$ representing the order of the subprocess. For this problem, a direct least-squares solution is not sufficient, since $u$ and $y$ are coupled with each other in the different subprocesses and such a solution will not accurately capture the dynamics of the whole system. Motivated by the parity space methods in the model-based and data-driven approaches [5], it is useful to define an instrument matrix that can successfully reflect the dynamic information of each subsystem. Based on Eq. (1), Ding [3,5] has proposed that, theoretically, the parity space equals the kernel representation of a dynamic system, such that the data-driven form of the kernel representation is

$$\forall u_s(k), x(0): \quad r(k) = \mathcal{K}_{d,s} \begin{bmatrix} u_s(k) \\ y_s(k) \end{bmatrix} \qquad (15)$$

where $u_s$ is the process input data and $y_s$ the process output data in the time interval $[k-s, k]$, and $r(k)$ represents the noise information. One way to solve for $\mathcal{K}_{d,s}$ is to use historical process data [3].
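First, rewrite model (1) into the form

$$\begin{bmatrix} U_{k,s} \\ Y_{k,s} \end{bmatrix} = \psi_s \begin{bmatrix} X_{k-s} \\ U_{k,s} \end{bmatrix} + \begin{bmatrix} 0 \\ H_{w,s} W_{k,s} + V_{k,s} \end{bmatrix} \qquad (16)$$

where

$$\psi_s = \begin{bmatrix} 0 & I_{l(s+1)} \\ \Gamma_s & H_{u,s} \end{bmatrix}, \quad \Gamma_s = \begin{bmatrix} C \\ CA \\ \vdots \\ CA^{s} \end{bmatrix}, \quad H_{u,s} = \begin{bmatrix} D & 0 & \cdots & 0 \\ CB & D & \ddots & \vdots \\ \vdots & \ddots & \ddots & 0 \\ CA^{s-1}B & \cdots & CB & D \end{bmatrix},$$

$$U_{k,s} = \begin{bmatrix} u(k-s) & \cdots & u(k-s+N-1) \\ \vdots & \ddots & \vdots \\ u(k) & \cdots & u(k+N-1) \end{bmatrix}, \quad X_{k-s} = \begin{bmatrix} x(k-s) & \cdots & x(k-s+N-1) \end{bmatrix}.$$

$Y_{k,s}$, $W_{k,s}$ and $V_{k,s}$ have the same structure as $U_{k,s}$, and $H_{w,s}$ has a structure analogous to $H_{u,s}$. Let $\mathcal{K}_{d,s} = \psi_s^{\perp}$, which leads to

$$\mathcal{K}_{d,s} \begin{bmatrix} U_{k,s} \\ Y_{k,s} \end{bmatrix} = \mathcal{K}_{d,s} \begin{bmatrix} 0 \\ H_{w,s} W_{k,s} + V_{k,s} \end{bmatrix} \qquad (17)$$

Let $\mathcal{Z}_p = \mathcal{Z}_{k-s-1,s} = \begin{bmatrix} U_{k-s-1,s} \\ Y_{k-s-1,s} \end{bmatrix}$. Performing a QR-decomposition gives [3,24]

$$\begin{bmatrix} \mathcal{Z}_p \\ U_{k,s} \\ Y_{k,s} \end{bmatrix} = \begin{bmatrix} R_{11} & 0 & 0 \\ R_{21} & R_{22} & 0 \\ R_{31} & R_{32} & R_{33} \end{bmatrix} \begin{bmatrix} Q_1 \\ Q_2 \\ Q_3 \end{bmatrix} \qquad (18)$$

which leads to

$$\begin{bmatrix} U_{k,s} \\ Y_{k,s} \end{bmatrix} = \begin{bmatrix} R_{21} & R_{22} \\ R_{31} & R_{32} \end{bmatrix} \begin{bmatrix} Q_1 \\ Q_2 \end{bmatrix} + \begin{bmatrix} 0 \\ R_{33} Q_3 \end{bmatrix} \qquad (19)$$

and

$$H_{w,s} W_{k,s} + V_{k,s} = R_{33} Q_3 \qquad (20)$$

Further, take an SVD:

$$\begin{bmatrix} R_{21} & R_{22} \\ R_{31} & R_{32} \end{bmatrix} = \begin{bmatrix} P_R & \tilde{P}_R \end{bmatrix} \begin{bmatrix} \Lambda_R & \\ & 0 \end{bmatrix} V_R^T \qquad (21)$$

Thus,

$$\mathcal{K}_{d,s} = \forall\, \mathrm{row}\left\{ \tilde{P}_R^T \right\} \;\rightarrow\; \mathcal{K}_{d,s} \begin{bmatrix} R_{21} & R_{22} \\ R_{31} & R_{32} \end{bmatrix} \begin{bmatrix} Q_1 \\ Q_2 \end{bmatrix} = 0 \qquad (22)$$

which means that $\mathcal{K}_{d,s}$ could be any row of $\tilde{P}_R^T$. Furthermore, it gives

$$r = \mathcal{K}_{d,s,y} \left( H_{w,s} W_{k,s} + V_{k,s} \right) = \mathcal{K}_{d,s,y} R_{33} Q_3 \qquad (23)$$

where $\mathcal{K}_{d,s,y}$ equals the lower part of $\mathcal{K}_{d,s}$. For each subsystem, the data-driven kernel representation can be obtained based on the historical data. Thus, the $M$ residuals are collected as $\Upsilon = \begin{bmatrix} r^{(1)} \\ \vdots \\ r^{(M)} \end{bmatrix}$, which can then be used as the instrument variables. Due to the complexity of the whole system, the $r^{(i)}$'s are mutually correlated. Let the transformation matrix $\mathcal{K}$ be defined as $\mathcal{K} = \begin{bmatrix} \mathcal{K}^{(1)}_{d,s_1} & & \\ & \ddots & \\ & & \mathcal{K}^{(M)}_{d,s_M} \end{bmatrix}$.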
This gives

$$\hat{\Theta} = \hat{\Psi}_d \mathcal{Z}_{d,s} = \hat{\Psi}_{d,\Upsilon} \mathcal{K} \mathcal{Z}_{d,s} = \hat{\Psi}_{d,\Upsilon} \Upsilon \qquad (24)$$

Thus, dynamic process monitoring of this system can be divided into two steps: (1) calculating the instrument variable for each subsystem and (2) monitoring the whole process based on the static model between the instrument variables and the KPI. In step (2), the static model can be developed using the same method as in Eqs. (3)–(5), where $\hat{\Psi}_{d,\Upsilon}^T = \begin{bmatrix} Q_{\Upsilon,1} & Q_{\Upsilon,2} \end{bmatrix} \begin{bmatrix} R_\Upsilon \\ 0 \end{bmatrix}$ and $\Pi_\Upsilon = Q_{\Upsilon,1} Q_{\Upsilon,1}^T$. Let $\Upsilon_\theta = \Pi_\Upsilon \Upsilon$; then the $T^2$ test statistic is

$$T^2_{d,\theta} = \Upsilon_\theta^T \Sigma_{\Upsilon_\theta}^{-1} \Upsilon_\theta \qquad (25)$$

In the dynamic KPI case, the decision is made based on

$$\begin{cases} T^2_{d,\theta} \le J_{th,d} \Rightarrow \text{KPI is fault-free} \\ T^2_{d,\theta} > J_{th,d} \Rightarrow \text{KPI is faulty} \end{cases} \qquad (26)$$

with $J_{th,d} = J_{th,s}$.

Remark 3. The dynamic method requires a large effort when creating the kernel realisations, system order $n_i$, and delay factor $s_i$
for each of the $M$ subsystems.

3. Static case: application to the TE process

3.1. A brief description of the TE process

The Tennessee Eastman (TE) process, first proposed by Downs et al. [27], is a standard chemical engineering benchmark for control. A detailed process schematic is available in Downs et al. [27]. Four reactants (A, C, D, and E) are fed into the reactor, where they are converted into the products G and H; the process also includes a condenser, compressor, separator and stripper. The component F is assumed to be a by-product. As shown in [27], the process can be divided into eight subprocesses, which share a group of inputs (manipulated variables) and have eight different output variables. Thus, there are a total of 12 inputs and 41 output variables under consideration. To show the operational performance of the TE process, a KPI is defined as the operating cost of the overall process, that is,

$$\theta_{raw} = D_p V_p + D_{pro} V_{pro} + D_{comp} W_{comp} + D_s V_s$$

where $D_p$ is the cost of operating the purge stream, $D_{comp}$ the cost of operating the compressor, $D_s$ the cost of operating the steam, $D_{pro}$ the cost of the product stream, $V_p$ the operating rate of the purge stream, $V_{pro}$ the operating rate of the product stream, $V_s$ the operating rate of the steam, and $W_{comp}$ the work that the compressor requires [27]. Downs et al. proposed a calculable cost equation for each part [27]. However, it can be noted that the KPI is not available in real time, since the component cost is generally delayed by 6 to 15 min. Thus, it is necessary to design a KPI-based PM-FD method. Since the KPI is most likely to be statically related to the variables of interest, the static approach will be used here.

3.2. Results and discussion

The training data set consists of 1000 samples of all the process variables $z$ and $\theta$, which implies that $\mathcal{Z} \in \mathbb{R}^{53 \times 1000}$ and $\Theta \in \mathbb{R}^{1 \times 1000}$.
For the online case, the static method can be developed using $\Pi_\theta$ and $J_{th,s}$. Three different cases will be considered: no fault, fault #2 occurred, and fault #4 occurred. The evaluation indices, FAR and FDR, are calculated using

$$\mathrm{FAR} = \frac{\text{number of samples } \left( T^2_\theta > J_{th} \mid \text{without KPI-related fault} \right)}{\text{total fault-free samples}}, \qquad \mathrm{FDR} = \frac{\text{number of samples } \left( T^2_\theta > J_{th} \mid \text{with KPI-related fault} \right)}{\text{total faulty samples}} \qquad (27)$$

where $T^2_\theta$ and $J_{th}$ are $T^2_{s,\theta}$ and $J_{th,s}$ for the static case, and $T^2_{d,\theta}$ and $J_{th,d}$ for the dynamic case.

For the case where no faults have occurred, the false alarm rate should be within the specified tolerance of 5%. Fig. 1 shows the simulation results, where it can be observed that $T^2_{s,\theta}$ provides acceptable results with a FAR of 4.92%. This suggests that the static method has been properly configured and works in the fault-free case.

The second case is where fault #2 occurred. This is one of the predefined faults in the TE process that has a clear physical description. Based on the available information, this fault occurs at 50 h, that is, at the 1000th sample. For this fault, a step change in the composition of component B occurs. The change affects control loops 2 and 7, where component B is involved. In due course, the KPI will also be affected, since this input is a component of the KPI model. Fig. 2 shows the detection results of the static method for this fault. It can be seen that the fault is detected accurately and as soon as it occurs, while directly monitoring the KPI data has at least a 6-minute delay.

Fig. 1. Monitoring result for the fault-free scenario in the TE process. (Top panel: KPI ($) versus samples; bottom panel: $T^2_{s,\theta}$ test statistic and threshold versus samples.)
Fig. 2. Monitoring result for fault scenario #2 in the TE process. (Top panel: KPI ($) versus samples; bottom panel: $T^2_{s,\theta}$ test statistic and threshold versus samples.)

Fig. 3. Monitoring result for fault scenario #4 in the TE process, where the KPI was not affected. (Top panel: KPI ($) versus samples; bottom panel: $T^2_{s,\theta}$ test statistic and threshold versus samples.)
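The evaluation indices of Eq. (27) reduce to counting alarms separately over the fault-free and the faulty samples. A minimal sketch is given below; the function name `far_fdr`, the boolean fault labels and the numbers in the usage example are hypothetical and are not data from the TE simulation.

```python
def far_fdr(t2, threshold, kpi_fault):
    """Return (FAR, FDR) as in Eq. (27).

    t2        : sequence of T^2 values, one per sample
    threshold : detection threshold J_th
    kpi_fault : sequence of booleans, True where a KPI-related fault is present
    """
    alarms_free = [t > threshold for t, f in zip(t2, kpi_fault) if not f]
    alarms_fault = [t > threshold for t, f in zip(t2, kpi_fault) if f]
    # Each rate is the share of alarms within its own group of samples.
    far = sum(alarms_free) / len(alarms_free) if alarms_free else float("nan")
    fdr = sum(alarms_fault) / len(alarms_fault) if alarms_fault else float("nan")
    return far, fdr

# Hypothetical values: four fault-free samples followed by four faulty ones.
t2 = [1.0, 2.0, 12.0, 3.0, 20.0, 30.0, 4.0, 50.0]
kpi_fault = [False] * 4 + [True] * 4
far, fdr = far_fdr(t2, threshold=10.0, kpi_fault=kpi_fault)
print(far, fdr)  # 0.25 0.75
```

Here one of the four fault-free samples exceeds the threshold (FAR = 0.25) and three of the four faulty samples do (FDR = 0.75).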
The third case is where fault #4 occurred. This fault causes a step change in the temperature of the cooling water in the reactor, which, while directly affecting subprocess 2 (the reactor), does not directly affect the KPI. Fig. 3 shows the simulation results for this fault. This figure shows that the KPI remains relatively constant and has not been influenced by the fault.

The above results show that the ability to detect the fault is, as expected, strongly correlated with the strength of the fault. The larger the impact of the fault on the KPI, the easier it is to detect. This is illustrated by the fact that the fault in the third case, which does not affect the KPI, is not detected.

3.3. Comparison with the PLS-based method

In order to understand the behaviour of the proposed static method, its properties will be compared with the standard PLS-based method using Monte Carlo simulations. A KPI-relevant fault is simulated by causing a step change to occur in the concentration of components G and H in subprocesses 7 and 8, while a KPI-irrelevant fault is simulated by changing the concentration set point for by-product F in subprocesses 6-8. The simulation is run 100 times. The means and standard deviations are shown in Table 1. The calculable forms of FAR and FDR in [13] are used. It can be seen that the proposed static method is better than the standard PLS, with a higher average FDR and a lower average FAR.

4. Dynamic case: application to the hot strip mill rolling process

4.1. A brief introduction to the HSMR process

The hot strip mill rolling (HSMR) process is a complex process that decreases the thickness of a hot steel strip to the desired thickness. The HSMR process consists of six subsections: the reheating furnace, the rough mill, the transfer table and crop shear, the finishing mills, the run-out table cooling unit and the coiler. The incoming strip is first reheated in the reheating furnace.
Then, in the rough mill section, it is roughly shaped to the desired thickness and width. After passing through the transfer table, the strip reaches the finishing mill section, where it is accurately milled towards the preset width and thickness. Then, the run-out table cooling section allows the strip to cool to the desired temperature. Finally, the strip is coiled in the coiler for convenient shipping. A detailed description of this process can be found in Peng et al. [31] and Ding et al. [3].

For the HSMR process, one of the key variables is the exit strip thickness, which is primarily determined by the finishing rolling mill process (FRMP). Therefore, the focus of this example will be on analysing and understanding the faults in the FRMP. The FRMP reduces the thickness of the strip steel to the thickness required by the customer or the next process. Fig. 4 shows a schematic of the FRMP, where there are seven groups of stands. The rolling speed is set so that the last stand performs the final reduction at a temperature ranging from 850 °C to 950 °C, which gives the steel its desired mechanical properties. After the milling process, the steel will be rolled into a flat bar as long as 800 m. Compared with the rough milling process, the FRMP mills the strip steel in series, which means that each strip passes through all seven stands. The hot steel is so fragile that the tension between two stands should be precisely controlled in order to avoid stretching or tearing the strip. Between two successive stands, an electromechanical pivot called a looper is set up, such that it is in contact with the bottom of the strip. The looper is used to monitor the tension between the stands. Adjustments can then be made to ensure that the strip goes through each stand without being stretched.

As shown in Fig. 5, for each group of stands, there are four rolls: two working rolls directly in contact with the strip and two backup rolls supporting the working rolls.
Before a strip arrives at the stand, the rolling force for the backup rolls is computed based on the desired thickness reduction rate. The bending force, which can also affect the thickness, is set beforehand using an empirical equation. The rolling force is measured using a magnetic pressure sensor located at the bottom of the lower backup roll, while the bending force can be obtained using the oil pressure sensor located on the two sides of the working roll. Physically speaking, the deformation of the thickness is affected not only by the rolling force but also by the temperature, bending force, and other physical properties that depend on the specific steel. Thus, it is hard to obtain a precise, first-principles model for a single stand.

In the overall FRMP system, the stands do not work individually, but are coupled with each other by different control schemes. For example, in the last stand, the thickness is compared with the desired value and the difference can be fed back to adjust the milling force in that or previous stands. Note that the thickness cannot be measured between two stands; instead, the gap measurements between two working rolls are available.

Table 1. Comparisons of the static method and PLS.
Performance type | PLS [5] | Static method
FDR | 0.9965 ± 0.0015 | 0.9985 ± 0.0007
FAR | 0.4432 ± 0.003 | 0.1017 ± 0.003

Fig. 4. Block diagram of the large-scale finishing mill in the HSMR process. (Blocks: Stand 1, Stand 2, ..., Stand 7, each with rolling force, bending force, and gap signals; process controller; thickness KPI.)

Fig. 5. A diagram of the rolling mill stand. (Labels: backup roll, working roll.)

K. Zhang et al. / ISA Transactions 68 (2017) 276–286

Due to the rebounding phenomenon, the thickness is approximately
equal to the gap less the impact of the roll's stiffness. Note that the gap between two working rolls is measured by a linear displacement transducer located inside the hydraulic cylinder, which is above the upper backup roll. However, since the stiffness is hard to calculate precisely, it is impossible to adjust the downstream stands based on the upstream thickness.

In the HSMR process, the most important KPI is the thickness at the end of the multistand system, which is measured using an X-ray thickness gauge located some distance from the last stand; this distance causes a time delay in the feedback control system. For each of the seven stands, the control structure is explicitly shown in Fig. 4. The manipulated variables are the rolling and bending forces, while the gap is the controlled variable. The KPI, which has been defined as the exit thickness, is closely linked with all the process variables. This suggests that implementing the proposed dynamic fault diagnosis system would be appropriate. Although it is well known that the FRMP is a batch process, where each individual batch represents a single coil varying from 1000 to 2000 m in length, for the purposes of this analysis, batch-to-batch variability will be ignored. Two different strip thicknesses will be considered: $E(\mathrm{KPI}) = 2.70$ mm and $E(\mathrm{KPI}) = 3.95$ mm. The training batch consists of 3000 data points collected to form the data structure $\{U^{(i)} \in \mathbb{R}^{2 \times 3000},\ Y^{(i)} \in \mathbb{R}^{1 \times 3000}\}$ with $i = 1, \ldots, 7$, and $\Theta \in \mathbb{R}^{1 \times 3000}$. Unlike the static method, the kernel representation vector needs to be developed for each control loop and the parameters $s_i$ and $n_i$ fixed. Since this step is not the main focus, the process of fixing these parameters is not shown. Additional information can be found in [5]. The results obtained are $s_i = 13$ for $i = 1, \ldots, 3$, $s_j = 14$ for $j = 4, \ldots, 7$, and $n_k = 5$ for $k = 1, \ldots, 7$.
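As a rough illustration of the data layout just described, the sketch below allocates one record per stand with a 2 × 3000 input block, a 1 × 3000 output block, and the reported $s$ and $n$ values. The names `StandData` and `empty_training_set` are hypothetical, and the assignment of the rolling and bending forces to the inputs and of the gap to the output is an assumption based on the control structure described in the text.

```python
from dataclasses import dataclass
from typing import List

N_SAMPLES = 3000  # training samples per batch, as stated in the text
N_STANDS = 7      # seven stands in the finishing mill


@dataclass
class StandData:
    """Training data and parameters for one stand (hypothetical container)."""
    u: List[List[float]]  # 2 x N inputs: rolling force, bending force (assumed)
    y: List[List[float]]  # 1 x N output: roll gap (assumed)
    s: int                # parameter s for this stand
    n: int                # parameter n for this stand


def empty_training_set(n_samples: int = N_SAMPLES):
    """Allocate zero-filled placeholders with the shapes given in the text."""
    stands = []
    for i in range(1, N_STANDS + 1):
        s_i = 13 if i <= 3 else 14  # s = 13 for stands 1-3, 14 for stands 4-7
        stands.append(StandData(
            u=[[0.0] * n_samples for _ in range(2)],
            y=[[0.0] * n_samples],
            s=s_i,
            n=5,                    # n = 5 for every stand
        ))
    theta = [0.0] * n_samples       # KPI (exit thickness), 1 x N
    return stands, theta


stands, theta = empty_training_set()
```

Real measurements would replace the zero-filled placeholders; only the shapes and parameter values are taken from the text.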
Four different scenarios will be considered:

Scenario 1: Fault-free case, that is, there are no faults in the system.
Scenario 2: The control loop deteriorates in the 4th stand from 20 to 30 s.
Scenario 3: The cooling water valve is blocked between the 2nd and 3rd stands from 12 to 22 s.
Scenario 4: The bending force sensor in the 5th stand malfunctions from 10 to 20 s, but the KPI is not affected.

Since, in Scenarios 1 and 4, the KPI has not been affected, the FAR index is used to evaluate the performance. For Scenario 1, it is calculated by counting the alarms that satisfy $T^2_{d,\theta} > J_{th,d}$ during the whole process (0–30 s, 3000 samples), while for Scenario 4, only the 1000 samples from 10 to 20 s, when the KPI-unrelated fault occurs, are considered. In Scenarios 2 and 3, faults that can affect the KPI occur; thus, the FDR is used. The alarms raised during the period from 20 to 30 s for Scenario 2 and from 12 to 22 s for Scenario 3 are considered.

Fig. 6(a) and (b) show the comparison between normal operation and the situation when Scenario 3 occurs. It can be observed that after the water valve was blocked, the water sprayed onto the steel decreases. This change affects the thickness as shown in Fig. 6(b), and will be propagated to the KPI when the steel strip arrives at the exit of the FRMP. Thus, such faults should be detected as soon as they occur in order to keep a stable KPI.

Fig. 6. Comparison of the strip steel after fault scenario 3 occurred.

Fig. 7. Normal distribution plot of residual signals for the complete process. (Axes: residual signal for the holistic process; probability.)

4.2. Comparison of residual signal properties

In order to understand the behaviour of the residual signals for the proposed approach compared with the original, full model method [3], representative residual signals for both cases will be
examined and compared in terms of how well they resemble the normal distribution. Fig. 7 shows a normal probability plot of the original residual signal. A normally distributed signal will lie along the straight line with minimal deviations from it [6]. Fig. 7 clearly shows that this residual signal deviates strongly from the straight line at the left and right ends (tails). This deviation strongly suggests that the signal is not normal. Therefore, using this signal, as is traditionally done, to monitor the process will lead to many false alarms, which is not desirable. On the other hand, Fig. 8 shows two representative residual signals for the proposed approach. Here, the data points all lie much closer to the straight line without large deviations from it, which suggests that these residuals are normally distributed. This can be compared with the original input and output signals, which are shown in Fig. 9. Both the original inputs and outputs are highly nonnormal and have a highly complex distribution. This nonnormality will lead to excess false alarms and hence limit the usefulness of the resulting fault detection system. Therefore, the proposed approach avoids the problems associated with a nonnormal signal.

Fig. 8. Normal distribution plot of residual signals for subprocesses 1 and 2. (Axes: residual signal of subprocess 1 and of subprocess 2; probability.)

4.3. Comparison of experimental results

Fig. 10 shows the results for Scenario 1. It can be seen that there are very few false alarms in $T^2_{d,\theta}$, even at the starting stage when the process is very unstable. For the strip with KPI = 3.9 mm, the result is similar, with the FAR equal to 4.10%. Fig. 11 gives the results for Scenario 2. It can be seen that the fault affects the KPI and that it is detected before the KPI is affected. As described above, this fault occurred in the 4th stand. Thus, the 4th subprocess was instantly influenced and this was then transmitted to the downstream subprocesses until it finally reached the KPI. As shown in Fig. 11, $T^2_{d,\theta}$ has performed very well in this case.
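The FAR and FDR values quoted for the scenarios are alarm fractions over a chosen time window. The sketch below shows one way to compute them from a test statistic, assuming a sampling rate of 100 samples per second (inferred from 3000 samples covering 0–30 s). The function names, the threshold value, and the synthetic statistic are illustrative assumptions, not values from the paper.

```python
SAMPLE_RATE_HZ = 100  # assumed: 3000 samples cover 0-30 s


def alarms_from_statistic(statistic, threshold):
    """Flag every sample whose test statistic exceeds the threshold."""
    return [value > threshold for value in statistic]


def alarm_rate(alarms, start_s=0.0, end_s=None):
    """Fraction of samples in [start_s, end_s) that raised an alarm.

    Over a fault-free window this is the FAR; over a window containing a
    KPI-relevant fault it is the FDR.
    """
    lo = int(round(start_s * SAMPLE_RATE_HZ))
    hi = len(alarms) if end_s is None else int(round(end_s * SAMPLE_RATE_HZ))
    window = alarms[lo:hi]
    if not window:
        raise ValueError("empty evaluation window")
    return sum(window) / len(window)


# Synthetic statistic (made-up values): quiet until 20 s, then above the
# placeholder threshold except for the last 10 samples.
statistic = [0.5] * 2000 + [9.0] * 990 + [0.5] * 10
alarms = alarms_from_statistic(statistic, threshold=3.84)

far_before_fault = alarm_rate(alarms, 0, 20)  # fault-free part of the run
fdr_scenario2 = alarm_rate(alarms, 20, 30)    # Scenario 2 fault window
```

With the windows from the scenario definitions, `alarm_rate(alarms)` corresponds to the Scenario 1 FAR, `alarm_rate(alarms, 10, 20)` to the Scenario 4 FAR, and `alarm_rate(alarms, 12, 22)` to the Scenario 3 FDR.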
By applying the popular relative contribution plot-based method to the residuals of the 7 subprocesses [28], the mean contribution values of the first 50 samples for subprocesses 1 to 7 are 0.012, 0.008, 0.045, 0.822, 0.100, 0.003, and 0.001, respectively. It is evident that the 4th stand is the origin of the fault, which is consistent with the definition of the fault.

Fig. 9. Normal distribution plot of the rolling force. (Panels: milling force, gap, bending force; vertical axis: probability.)

Ding et al. [3] combined $u^{(i)}$ and $y^{(i)}$ with $i = 1, \ldots, 7$ into $z$, and based on $z$ and $\theta$ developed a data-driven DO-based PM-FD method for KPI. Using this
method for this scenario gives the results shown in Fig. 12. It can be seen that the DO-based method performs poorly and gives many false alarms, in particular during the starting period. Such a large number of false alarms makes it inappropriate in practice.

Fig. 13 shows the results for Scenario 3, which contains a cooling water fault with a strip whose KPI = 3.95 mm. This is also a KPI-relevant fault, as can be seen in the change at 16 s. Similar to Fig. 11, the fault was also detected before it had influenced the KPI. The fault occurred between the 2nd and 3rd stands and directly affected the 3rd subprocess at about 13 s. $T^2_{d,\theta}$ succeeded in detecting it at 12.5 s, which shows an effective performance. After the fault had been manually eliminated at around 22.5 s, the detection showed some disturbance, which is caused by the switching of parameters in some controllers. As in the previous case, the relative contribution values are obtained for subprocesses 1 to 7: 0.017, 0.032, 0.454, 0.306, 0.114, 0.073, and 0.003, which shows that subprocess 3 is responsible for the fault. Similarly to Fig. 12, the DO-based method shows unacceptable false alarms for this scenario, as shown in Fig. 14.

Fig. 15 shows the results for Scenario 4, which also considers a strip whose KPI = 3.95 mm. However, in this case, the KPI is not influenced by the fault. Fig. 15 shows results consistent with normal KPI measurements. $T^2_{d,\theta}$ gave a few false alarms (the rate is 5.4%), which is acceptable compared to the significance level $\alpha = 0.05$. In comparison, the DO-based method (Fig. 16) does not perform as well as the proposed one. It gives many more alarms due to the nonnormal distribution of the residual signal shown in Fig. 7. Thus, it can be seen that the proposed dynamic system can effectively identify faults before they are even noticed in the KPI. This allows for efficient and prompt resolution of any potential problems.
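The link drawn above between nonnormal residuals and excess alarms rests on the normal probability plots of Figs. 7 and 8. A numerical counterpart of such a plot is the correlation between the ordered data and the matching standard normal quantiles: values very close to 1 indicate points lying along the straight line. The sketch below is a minimal illustration on synthetic data; the plotting positions $(k - 0.5)/n$ are one common choice, and the two samples are made up rather than taken from the mill data.

```python
import random
from statistics import NormalDist, mean


def probability_plot_correlation(sample):
    """Correlation between ordered data and standard normal quantiles."""
    ordered = sorted(sample)
    n = len(ordered)
    std_normal = NormalDist()
    # Plotting positions (k - 0.5) / n, one common convention.
    quantiles = [std_normal.inv_cdf((k - 0.5) / n) for k in range(1, n + 1)]
    mq, mo = mean(quantiles), mean(ordered)
    sqo = sum((q - mq) * (v - mo) for q, v in zip(quantiles, ordered))
    sqq = sum((q - mq) ** 2 for q in quantiles)
    soo = sum((v - mo) ** 2 for v in ordered)
    return sqo / (sqq * soo) ** 0.5


rng = random.Random(0)  # fixed seed so the illustration is repeatable

# Synthetic stand-in for a well-behaved residual.
gaussian_like = [rng.gauss(0.0, 1.0) for _ in range(2000)]

# Synthetic stand-in for a heavy-tailed residual: 10% of the samples are
# scaled by a factor of 10, which bends the tails away from the line.
heavy_tailed = [rng.gauss(0.0, 1.0) * (10.0 if rng.random() < 0.1 else 1.0)
                for _ in range(2000)]

r_gaussian = probability_plot_correlation(gaussian_like)
r_heavy = probability_plot_correlation(heavy_tailed)
```

The heavy-tailed sample yields a lower correlation than the Gaussian one, which mirrors the tail deviations visible in a probability plot.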
Fig. 10. Monitoring result for Scenario 1 in the finishing mill process.
Fig. 11. Monitoring result for Scenario 2 in the finishing mill process.
Fig. 12. Monitoring result for Scenario 2 using the DDO-based method.
Fig. 13. Monitoring result for Scenario 3 in the finishing mill process.
Fig. 14. Monitoring result for Scenario 3 using the DDO-based method.

As well, the dynamic PLS (DPLS)-based method [29] developed for addressing steady-state dynamic processes has also been applied to
solve this problem. These three methods were compared in detail in terms of the FDR and FAR using the above four scenarios. Table 2 shows the results. It can be seen that the proposed dynamic method behaves better than the DDO- and DPLS-based methods. In Scenario 1, the fault-free case, both DDO and DPLS give false alarm rates larger than 10%. Given these large rates, it can be concluded that these two methods have failed to monitor the process. The proposed dynamic method has a FAR of 4.8%, which is acceptable. In Scenario 4, where the KPI was not affected, the proposed dynamic method reduces the number of false alarms, while in Scenario 2, the proposed method showed the largest detection rate. It can be noted that the false alarms of the DDO-based method result from the quality of the FRMP data and the fact that the start of the process is extremely unstable, which leads to the design of an unstable observer. On the other hand, both the proposed dynamic and DPLS-based methods have no such problem.

5. Conclusions

This paper proposed a framework for KPI-based process monitoring that can be applied to large-scale industrial processes. The new framework consists of two related methods: a static method and a dynamic method. The static method was developed for the case where the KPI variables are statically related to the process variables. This approach is based on least-squares regression, which was shown to be more efficient than the PLS-based method. The dynamic method was developed for the case where the KPI variables are dynamically related to the process variables. The design of this approach started from the derivation of the kernel representation of each subprocess, which was then used as the transformation vector to obtain the instrument variable for each subprocess. Finally, the static method was used on the transformed dynamic results to develop a model between the instrument variables and the KPI.
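To make the idea of a least-squares link between process (or instrument) variables and the KPI concrete, the sketch below fits a regression on toy data and then checks new samples against a threshold on the predicted KPI deviation. This is a simplified illustration under stated assumptions, not the authors' exact algorithm: the scaled squared prediction used as the statistic, the chi-square threshold with one degree of freedom, the toy coefficients, and all function names are assumptions made for the example.

```python
import random


def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def fit_least_squares(X, y):
    """Least-squares coefficients from the normal equations X^T X b = X^T y."""
    p = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(p)]
           for i in range(p)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve_linear(XtX, Xty)


def predict(x, coeffs):
    """Predicted KPI deviation for one sample of process variables."""
    return sum(xi * ci for xi, ci in zip(x, coeffs))


CHI2_95_1DOF = 3.841  # 95% quantile of chi-square, 1 degree of freedom

rng = random.Random(1)
# Toy, roughly zero-mean training data: two process variables per sample.
X_train = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(500)]
kpi_train = [2.0 * x1 - 1.0 * x2 for x1, x2 in X_train]  # noise-free toy KPI

coeffs = fit_least_squares(X_train, kpi_train)
pred_train = [predict(x, coeffs) for x in X_train]
var_pred = sum(p * p for p in pred_train) / len(pred_train)


def kpi_statistic(x):
    """Squared predicted KPI deviation scaled by its training mean square."""
    return predict(x, coeffs) ** 2 / var_pred


relevant = kpi_statistic([3.0, -3.0])   # shifts the predicted KPI strongly
irrelevant = kpi_statistic([1.0, 2.0])  # 2*1 - 1*2 = 0: no KPI effect
```

In this toy setting, a deviation that moves the predicted KPI exceeds the threshold, while a deviation of similar size that leaves the predicted KPI unchanged does not, which is the distinction between KPI-relevant and KPI-irrelevant faults drawn in Scenarios 2 to 4.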
The proposed static approach was applied to the TE benchmark process to develop the process monitoring model for the operating cost. It was shown that the approach performs well in terms of FDR for the KPI-relevant faults and in terms of FAR for the KPI-irrelevant faults. The dynamic approach was applied to an HSMR process to monitor the KPI of the exit thickness in the FRMP. It was seen that the method presents a large FDR for faults affecting the thickness and a very low FAR for faults that do not influence the thickness. Both methods were compared with existing methods. The results showed that the proposed methods have better statistical properties than the currently available methods.

Future work seeks to consider cases with infrequently measured KPIs or infrequent output measurements in large-scale processes to achieve optimal PM-FD performance.

Acknowledgements

Prof. Yuri A.W. Shardt would like to thank the Alexander von Humboldt Foundation for providing the funding for his stay in Germany. Profs. Xu Yang and Yuri A.W. Shardt would like to thank the National Natural Science Foundation of China (Grant #61673053) and the Beijing Natural Science Foundation (Grant #4162041) for funding. Profs. Kaixiang Peng and Xu Yang would like to thank the National Natural Science Foundation of China (Grants #51205018, #61473033), the Research Project of the State Key Laboratory of Mechanical System and Vibration (Grant #MSV201409), and the Beijing Natural Science Foundation (Grant #4142035) for funding.

References

[1] He QP, Wang J. Large-scale semiconductor process fault detection using a fast pattern recognition-based method. IEEE Trans Semicond Manuf 2010;23(2):194–200.
[2] Hao HY, Zhang K, Ding SX, Chen Z, Lei Y. A data-driven multiplicative fault diagnosis approach for automation processes. ISA Trans 2014;53(5):1436–45.
[3] Ding SX, Yin S, Peng KX, Hao HY, Shen B.
A novel scheme for key performance indicator prediction and diagnosis with application to an industrial hot strip mill. IEEE Trans Ind Inf 2013;9(4):2239–47.
[4] Hajiyev C. Tracy-Widom distribution based fault detection approach: application to aircraft sensor/actuator fault detection. ISA Trans 2012;51:189–97.
[5] Ding SX. Data-driven design of fault diagnosis and fault-tolerant control systems. London: Springer-Verlag; 2014.
[6] Shardt YAW. Statistics for chemical and process engineers: a modern approach. Cham, Switzerland: Springer-Verlag; 2015.
[7] MacGregor JF, Jaeckle C, Kiparissides C, Koutoudi M. Process monitoring and diagnosis by multiblock PLS methods. AIChE J 1994;40(5):826–38.
[8] Hu ZK, Chen ZW, Gui WH, Jiang B. Adaptive PCA based fault diagnosis scheme in imperial smelting process. ISA Trans 2014;53(5):1446–55.

Fig. 15. Monitoring result for Scenario 4 in the finishing mill process.
Fig. 16. Monitoring result for Scenario 4 using the DDO-based method.

Table 2. Comparisons of the dynamic method, DDO and DPLS.
Scenario | Performance metric | DPLS | DDO | Dynamic method
1 | FAR | 0.222 | 0.108 | 0.048
2 | FDR | 0.692 | 0.885 | 0.995
3 | FDR | 0.401 | 0.862 | 0.761
4 | FAR | 0.672 | 0.635 | 0.054

[9] Gao Z, Cecati C, Ding SX. A survey of fault diagnosis and fault-tolerant techniques part II: fault diagnosis with knowledge-based and hybrid/active
approaches. IEEE Trans Ind Electron 2015;62(6):3768–74.
[10] Ralston P, DePuy G, Graham JH. Graphical enhancement to support PCA-based process monitoring and fault diagnosis. ISA Trans 2004;43(4):639–53.
[11] He QP, Wang J. Statistics pattern analysis: a new process monitoring framework and its application to semiconductor batch processes. AIChE J 2011;57(1):107–21.
[12] Phatak A, de Jong S. The geometry of partial least squares. J Chemom 1997;11:311–38.
[13] Zhang K, Hao H, Chen Z, Ding SX, Peng K. A comparison and evaluation of key performance indicator-based multivariate statistics process monitoring approaches. J Process Control 2015;33(11):112–26.
[14] Wold S, Ruhe A, Wold H, Dunn WJ. The collinearity problem in linear regression, the partial least squares (PLS) approach to generalized inverses. SIAM J Sci Comput 1984;5(3):735–43.
[15] Höskuldsson A. PLS regression methods. J Chemom 1988;2(5):211–28.
[16] Wang Z, He QP, Wang J. Comparison of variable selection methods for PLS-based soft sensor modeling. J Process Control 2015;26:56–72.
[17] Zhou DH, Li G, Qin SJ. Total projection to latent structures for process monitoring. AIChE J 2010;56(1):168–78.
[18] Qin SJ, Zheng YY. Quality-relevant and process-relevant fault monitoring with concurrent projection to latent structures. AIChE J 2013;59(2):496–504.
[19] Helland K, Berntsen HE, Borgen OS. Recursive algorithm for partial least squares regression. Chemom Intell Lab Syst 1992;14(1–3):129–37.
[20] Zhao CH, Sun YX. Multispace total projection to latent structures and its application to online process monitoring. IEEE Trans Control Syst Technol 2014;22(3):868–83.
[21] Rosipal R, Trejo LJ. Kernel partial least squares regression in reproducing kernel Hilbert space. J Mach Learn Res 2001;2:97–123.
[22] Baffi G, Martin EB, Morris AJ. Non-linear projection to latent structures revisited (the neural network PLS algorithm). Comput Chem Eng 1999;23:1293–307.
[23] Galicia HJ, He QP, Wang J.
A reduced order soft sensor approach and its application to a continuous digester. J Process Control 2011;21:489–500.
[24] Wang J, Qin SJ. A new subspace identification approach based on principal component analysis. J Process Control 2002;12(8):841–55.
[25] Xie L, Zeng J, Kruger U, Wang X, Geluk J. Fault detection in dynamic systems using the Kullback-Leibler divergence. Control Eng Pract 2015;43:39–48.
[26] Shardt YAW, Hao HY, Ding SX. A new soft-sensor-based process monitoring scheme incorporating infrequent KPI measurements. IEEE Trans Ind Electron 2015;62(6):3843–51.
[27] Downs JJ, Vogel EF. A plant-wide industrial process control problem. Comput Chem Eng 1993;17(3):245–55.
[28] Alcala CF, Qin SJ. Analysis and generalization of fault diagnosis methods for process monitoring. J Process Control 2011;21(3):322–30.
[29] Li G, Qin SJ, Liu BS, Zhou DH. Quality relevant data-driven modelling and monitoring of multivariate dynamic processes: the dynamic T-PLS approach. IEEE Trans Neural Netw 2011;22(12):2262–71.
[30] Wold S. Cross validatory estimation of the number of components in factor and principal component analysis. Technometrics 1978;20:397–406.
[31] Peng KX, Zhang K, Li G, Zhou DH. Contribution rate plot for nonlinear quality-related fault diagnosis with application to the hot strip mill process. Control Eng Pract 2013;21(4):360–9.
