ISA Transactions: Shirong Zhang, Qian Tang, Yu Lin, Yuling Tang

Contents lists available at ScienceDirect
ISA Transactions
journal homepage: www.elsevier.com/locate/isatrans
Research article
Fault detection of feed water treatment process using PCA-WD with parameter optimization
Shirong Zhang (a), Qian Tang (a), Yu Lin (a), Yuling Tang (b, *)
(a) Department of Automation, College of Power and Mechanical Engineering, Wuhan University, Wuhan 430072, China
(b) College of Computer Science, South-Central University for Nationalities, Wuhan, Hubei 430074, China
Article info
Article history:
Received 10 December 2015
Received in revised form 15 January 2017
Accepted 22 March 2017
Keywords:
Feed water treatment process
Fault detection
PCA
Wavelet denoise
Parameter optimization

Abstract
Feed water treatment process (FWTP) is an essential part of utility boilers, and fault detection is expected to improve its reliability. Classical principal component analysis (PCA) has been applied to FWTPs in our previous work; however, the noises of the T² and SPE statistics result in false detections and missed detections. In this paper, wavelet denoising (WD) is combined with PCA to form a new algorithm, PCA-WD, where WD is intentionally employed to deal with the noises. The parameter selection of PCA-WD is further formulated as an optimization problem, and particle swarm optimization (PSO) is employed to solve it. An FWTP sustaining two 1000 MW generation units in a coal-fired power plant is taken as a study case, and its operation data is collected for the subsequent verification study. The results show that the optimized WD is effective in restraining the noises of the T² and SPE statistics, thereby improving the performance of the PCA-WD algorithm. Moreover, the parameter optimization enables PCA-WD to obtain its optimal parameters automatically rather than relying on individual experience. The optimized PCA-WD is further compared with classical PCA and sliding window PCA (SWPCA) in four cases: bias fault, drift fault, broken line fault and normal condition. The results confirm the advantages of the optimized PCA-WD over classical PCA and SWPCA.
© 2017 ISA. Published by Elsevier Ltd. All rights reserved.
* Corresponding author. E-mail address: tylzsr@163.com (Y. Tang).
http://dx.doi.org/10.1016/j.isatra.2017.03.019
Please cite this article as: Zhang S, et al. Fault detection of feed water treatment process using PCA-WD with parameter optimization. ISA Transactions (2017), http://dx.doi.org/10.1016/j.isatra.2017.03.019

1. Introduction
Presently, supercritical and ultra-supercritical units are widely employed in China and have gradually become the main part of the Chinese electricity supply. A power generation unit is a typical continuous production process and consists of hundreds of sub-processes and devices. Faults in any of these components tend to affect the operation safety of the whole unit and may even result in accidents or unit shutdowns, which inevitably lead to large financial losses or casualties [1]. The feed water treatment process (FWTP) is a vital sub-process of a coal-fired utility boiler. It is responsible for supplying qualified feed water to the steam and water circuit. An ion exchange based FWTP typically consists of cation beds, anion beds, mixed beds and other components such as pumps, fans and pipes. Process faults may push the quality of the feed water below its standard, which in turn results in heavy salification along the heating surfaces of the utility boilers and consequently endangers their operation safety. FWTPs are equipped with process sensors, such as pressure and flow rate sensors, and with analysis meters, such as electric conductivity, oxygen, silicon and sodium meters. These sensors are the measuring parts of the process control loops and supervisory systems. Relatively speaking, sensors are the weak spots of process control systems compared with actuators, controllers and communication links [2]. They may suffer faults such as drift, bias, strong noise and broken line, which hinder the safe and stable operation of industrial processes [3]. Hence, an effective fault detection algorithm is much needed for FWTPs.
In fact, the demand for operation safety in the process industries has spurred the recent development of many fault detection methodologies [4-7], most of which are built upon the process sensors. Computer control systems, such as distributed control systems (DCSs) and programmable controllers (PCs), are able to store massive operation data of the processes, which makes data-driven fault detection possible and practical. Multivariate statistical analysis is a typical data-driven methodology that has been intensively studied and applied to fault detection in the literature [8-13]. Principal component analysis (PCA), independent component analysis (ICA) and partial least squares (PLS) have been widely applied to fault detection in the chemical industries [14-17]. In essence, they are all methods based on multivariate statistical analysis. Among them, PCA is the most popular and has been successfully applied to industrial processes owing to its simplicity, e.g., in [18-20]. PCA represents the high-dimensional process data in a reduced dimension; the desired information can then be obtained by reducing the weak correlations between the variables [21-24]. PCA thus facilitates fault detection of industrial processes. Two statistical hypothesis tests are generally conducted in PCA: the Hotelling T² statistic in the principal component space (PCS) and the SPE statistic in the residual subspace (RS). Some extensions of PCA have also been proposed in the literature to improve particular aspects of its performance. In [25], an online fault detection framework incorporating multi-scale principal component analysis is developed. An algorithm using multisubspace principal component analysis with the local outlier factor technique for process monitoring is further proposed in [26].
In our case, when classical PCA or extended PCA is applied to the FWTP, excessive false detections and missed detections appear, which makes classical PCA and its extended versions inapplicable for fault detection of FWTPs. The false detections and missed detections are caused by the fluctuations of the T² and SPE statistics. Analytically, PCA based fault detection is strictly valid only when the following assumptions are satisfied [27]: (1) the process is operating at pseudo-steady state; (2) the process data used to build the PCA model contain normal operating data only; and (3) the process is properly excited. However, field applications can hardly satisfy all these conditions. This is where the fluctuations of the test statistics come from, and the temporary violations of the limits then lead to false alarms. Naturally, a denoising method can be combined with PCA for the plain purpose of dealing with the fluctuations of the two statistics. In [28], an exponentially weighted moving average (EWMA) filtering method is applied to the sensor validity index (SVI) and SPE, and an application study on a boiler process shows that EWMA filtering can indeed reduce the false alarms and oscillations of the indicators. In [29], the above EWMA filtering method is further integrated into a self-validating soft sensor. Similarly, an EWMA scheme is used to filter the monitoring indices of KICA-PCA to improve monitoring performance [31]. Denoising is a relatively broad topic in engineering fields; in this paper, we employ the wavelet transform (WT) technique for simplicity and practicability. The wavelet transform is a well-known multi-resolution analysis tool because it obtains good time and frequency resolution simultaneously through stretching and translation of the wavelet. WT has been successfully used in many fields, such as pattern recognition and fault diagnosis [32,33]. Donoho et al. [34] first proposed a method to remove white noise using wavelets, known as wavelet threshold denoising. In WT analysis, the signal is first decomposed through the discrete wavelet transform (DWT) to obtain the wavelet coefficients [35]. It has been shown that the wavelet coefficients resulting from noise are smaller than the coefficients of the major signal. With a predefined threshold, the coefficients below the threshold are intentionally eliminated, and a denoised signal is then obtained by reconstruction from the remaining wavelet coefficients. In [36], a mechanical fault diagnosis method integrating the wavelet transform with a support vector machine is presented, where WT is used to extract the noise from the T² and SPE statistics of PCA so that the impact of noise can be effectively restrained. In this paper, wavelet denoising (WD) is combined with PCA to form a new fault detection algorithm, PCA-WD, which is then applied to an FWTP for verification.
The selection of the specific WD parameters has considerable influence on the performance of WD. One option is to choose them from the a priori knowledge of engineers; however, this relies too much on individual experience and cannot guarantee an optimal setting. This paper formulates the parameter selection of WD as an optimization problem, whose solution is an optimal parameter configuration. The objective function and the constraints of the optimization problem are complex, non-contiguous and strongly nonlinear, which makes conventional optimization techniques, such as linear programming (LP) and dynamic programming (DP), inapplicable. Computational intelligence-based techniques, such as the genetic algorithm (GA) and particle swarm optimization (PSO), are alternatives for our parameter optimization problem. PSO has been widely used in many fields, such as mechanical, chemical, civil and aerospace design, because of its comparative simplicity, rapid convergence and small number of parameters to be adjusted. PSO is known to solve large-scale nonlinear optimization problems effectively [37]; hence, it is a suitable candidate for our problem. PSO is an evolutionary computation technique proposed by Kennedy and Eberhart [38]. Classical PSO deals with real-valued variables; however, many practical optimization problems involve discrete variables, for which classical PSO cannot work. Kennedy and Eberhart therefore extended classical PSO to a discrete binary version, named BPSO, in which a sigmoid function of a particle's real-valued velocity component gives the probability used, together with a random number, to generate its binary position (0 or 1) [39]. Laskari et al. [40] proposed a discrete PSO in which a real value is rounded to its nearest integer, and it was later employed by Liao and Tseng [41] to deal with a flowshop scheduling problem. Moreover, a universal PSO was proposed by Datta and Figueira [42]; it can work directly with real, integer and discrete variables without extra conversions. The parameter optimization problem of WD involves several kinds of variables; thus, the extended PSO is suitable and will be employed to obtain the optimal solutions.
From the application point of view, this paper mainly focuses on the fault detection of a feed water treatment process in coal-fired power plants to improve its reliability. We start with an outline of the object process. Massive operation data of the process is then collected from a supervisory information system (SIS), which communicates with the control system of the FWTP and acquires long-term operation data. Next, the fault detection of FWTPs with classical PCA is introduced, where the control limits of the T² and SPE statistics, denoted by $T^2_{\mathrm{lim}}$ and $SPE_{\mathrm{lim}}$, respectively, are obtained. Then, WD is combined with PCA to form the PCA-WD fault detection algorithm. The WD parameters need to be determined prior to the online operation of PCA-WD. The parameter selection is formulated as an optimization problem, where PSO is used to find the combination of parameters that gives the best performance. An FWTP in a coal-fired power plant equipped with two 1000 MW generation units is taken as a study case, and the real operation data of the FWTP is collected to verify the PCA-WD algorithm. We present results showing the effectiveness of WD in dealing with the noises of the T² and SPE statistics, and the capability of the parameter optimization to determine the optimal parameters of PCA-WD automatically instead of relying on individual experience. Finally, the advantages of the optimized PCA-WD over classical PCA and SWPCA are demonstrated with four study cases: bias fault, drift fault, broken line fault and normal condition.
The remainder of this paper is organized as follows. Section 2 outlines the feed water treatment process, which is used as the study case in the following investigations. In Section 3, the PCA based fault detection algorithm is reviewed. Section 4 combines WD with PCA to form the PCA-WD algorithm; the parameter selection of PCA-WD is then discussed and finally formulated as an optimization problem. In Section 5, the proposed PCA-WD is applied to the FWTP for a verification study, where the effectiveness of WD and the advantages of the optimized PCA-WD are demonstrated. The conclusions are drawn in Section 7.
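The PSO variants reviewed earlier in this introduction differ mainly in how a real-valued particle update is mapped onto discrete values. The following is a minimal sketch of those mappings, not the optimizer used in this paper: a standard real-valued velocity update, the BPSO sigmoid rule [39], and nearest-integer rounding as in the discrete PSO of Laskari et al. [40]. The function names, the inertia weight w and the acceleration coefficients c1 and c2 (with their default values) are assumptions chosen for illustration.

```python
import numpy as np


def velocity_update(v, x, pbest, gbest, r1, r2, w=0.7, c1=1.5, c2=1.5):
    """Real-valued PSO velocity update with inertia weight w.

    r1 and r2 are uniform random numbers in [0, 1); the default w, c1, c2
    are common textbook values, assumed here for illustration.
    """
    return w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)


def bpso_position(v, r):
    """BPSO rule: a bit becomes 1 when r ~ U(0, 1) is below sigmoid(v)."""
    prob_one = 1.0 / (1.0 + np.exp(-v))
    return (r < prob_one).astype(int)


def rounded_position(x):
    """Discrete PSO in the spirit of Laskari et al.: round to the nearest integer.

    Note that np.rint rounds exact halves to the nearest even integer.
    """
    return np.rint(x).astype(int)


# Usage: one update of a 3-dimensional particle.
rng = np.random.default_rng(1)
x, v = np.zeros(3), np.zeros(3)
v = velocity_update(v, x, pbest=np.ones(3), gbest=2 * np.ones(3),
                    r1=rng.random(3), r2=rng.random(3))
bits = bpso_position(v, rng.random(3))                  # binary decision variables
levels = rounded_position(np.array([0.4, 2.6, -1.4]))   # integer decision variables
```

A mixed-variable problem such as the WD parameter selection would combine these mappings per dimension; how the universal PSO of [42] does this is not shown here.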
2. Feed water treatment process
The FWTP is a vital sub-process of a coal-fired utility boiler. It aims to supply qualified desalted water to the vapor circulating system of the boiler, so the quality of the feed water is the primary concern of FWTPs. Unqualified feed water may cause salification along the internal surfaces of critical devices, such as main steam pipes, reheat steam pipes and turbine blades; gradually, this may lead to major safety hazards and economic losses for power plants. Ion exchange is the most popular technology employed for FWTPs in Chinese coal-fired power plants. Many sensors and actuators are installed in FWTPs for supervision and regulation purposes; an FWTP has more than 60 measuring points. Field experience shows that sensor faults are a common reason for unqualified water supply. Hence, an effective sensor fault detection method is much needed for FWTPs.
An FWTP in a coal-fired power plant in Guangdong province of China is taken as our study case. The flow chart of the FWTP is shown in Fig. 1. The power plant is equipped with two 1000 MW generation units, and the FWTP is designed to sustain both units. Raw water is successively treated by cation beds, anion beds and mixed beds; the treated water is then stored in desalted water tanks and finally pumped into the two utility boilers. The FWTP in Fig. 1 is configured into two operation routes, side A and side B, to assure the reliability of the process. Side A consists of 1# cation bed, 1# anion bed and 1# mixed bed, and side B is formed by 2# cation bed, 2# anion bed and 2# mixed bed. Generally, one route can satisfy the routine requirements of the two utility boilers while the other route is on standby. Hence, it is reasonable to take only one route for further study; specifically, we take side A as the study case. Considering data availability and further field application, only part of the valves and sensors of side A, as listed in Table 1, are selected for the following fault detection research.
According to the operating procedure, the FWTP is scheduled as follows. When the water in the desalted water tanks is sufficient to sustain the boilers, the FWTP is switched off and kept on standby. If the water levels of the tanks fall below a predefined threshold, the FWTP is switched on to produce qualified feed water. Further, the working status of the two operation routes is scheduled by the operators so as to balance the total working time of the two routes. This operating procedure makes the FWTP work intermittently.
A supervisory information system (SIS) is installed in the coal-fired power plant; it gathers and stores long-term operation data of the whole plant through interfaces with DCSs, programmable logic controllers (PLCs) and other controllers.

Table 1
Valves and sensors selected for fault detection.
ID | Description | Unit
W1 | Inlet valve status of 1# cation bed |
W2 | Outlet valve status of 1# cation bed |
W3 | Inlet valve status of 1# anion bed |
W4 | Outlet valve status of 1# anion bed |
W5 | Inlet valve status of 1# mixed bed |
W6 | Outlet valve status of 1# mixed bed |
S1 | Main pipe pressure of raw water | MPa
S2 | Inlet flow rate of 1# cation bed | m3/h
S3 | Inlet pressure of 1# cation exchanger | MPa
S4 | Outlet pressure of 1# cation exchanger | MPa
S5 | Inlet flow rate of 1# anion bed | m3/h
S6 | Inlet pressure of 1# anion exchanger | MPa
S7 | Outlet pressure of 1# anion exchanger | MPa
S8 | Electric conductivity of 1# anion | μS/cm
S9 | Inlet flow rate of 1# mixed bed | m3/h
S10 | Inlet pressure of 1# mixed ion exchanger | MPa
S11 | Outlet pressure of 1# mixed ion exchanger | MPa
S12 | Electric conductivity of 1# mixed bed | μS/cm
Fig. 1. Flow chart of the feed water treatment process.
The SIS makes our data-driven fault detection research applicable and convenient. The historical operation data of the FWTP is collected from the SIS through a programming interface provided by the SIS vendor, with a sampling rate of 5 s, and is used in the following fault detection research.

3. PCA based fault detection
PCA is a multivariate statistical technique that has been widely used in process fault detection. Let $x \in \mathbb{R}^{m}$ denote a sample vector containing the readings of m sensors, and assume that n samples of these sensors are collected at a constant sampling rate. Then a matrix $X \in \mathbb{R}^{n \times m}$ is acquired, where each row represents a sample vector. To eliminate the effect of the different scales of the sensors, X is standardized as follows:
$$\bar{X} = \left[X - I\,E(X)\right] D^{-1} \tag{1}$$
where $E(X) = [\mu_1, \mu_2, \ldots, \mu_m] \in \mathbb{R}^{1 \times m}$ is the mean vector of X and $I = [1, 1, \ldots, 1]^{T} \in \mathbb{R}^{n \times 1}$. In Eq. (1), $D = \mathrm{diag}\{\sigma_1, \sigma_2, \ldots, \sigma_m\}$, where $\sigma_i = \sqrt{E(x_i - \mu_i)^2}$ is the standard deviation of the ith column of X. For the standardized data matrix $\bar{X}$, its correlation matrix $S = \bar{X}^{T}\bar{X}/(n-1)$, which is also the covariance matrix of the standardized data, is calculated and decomposed by singular value decomposition. Then, $\bar{X}$ is projected onto the principal component space (PCS) and the residual space (RS), namely,
$$\bar{X} = \hat{X} + E = TP^{T} + E \tag{2}$$
where $\hat{X}$ represents the projection of $\bar{X}$ in PCS and E is the residual matrix in RS. In Eq. (2), $T \in \mathbb{R}^{n \times k}$ is the score matrix and $P \in \mathbb{R}^{m \times k}$ is the loading matrix, where k denotes the number of principal components (PCs). Further, k is determined using the cumulative percent variance (CPV):
$$\mathrm{CPV} = \frac{\sum_{i=1}^{k}\lambda_i}{\sum_{i=1}^{m}\lambda_i} \ge l \tag{3}$$
where $\lambda_i$ is the ith largest eigenvalue of the covariance matrix S. The threshold l is usually set between 0.85 and 0.99.
A new sample vector $x \in \mathbb{R}^{m}$ is projected into PCS and RS, respectively. Its projection in PCS, $\hat{x}$, is
$$\hat{x} = PP^{T}x = Cx \tag{4}$$
where C is the projection matrix to PCS. The projection in RS, e, is defined as
$$e = (I - PP^{T})x = \tilde{C}x \tag{5}$$
where I here denotes the $m \times m$ identity matrix and $\tilde{C}$ is the projection matrix to RS.
Generally, PCA based process fault detection is conducted through two indices, the Hotelling T² and SPE statistics. The T² statistic is defined as
$$T^2 = \hat{x}^{T} P \Lambda^{-1} P^{T} \hat{x} = t \Lambda^{-1} t^{T} \tag{6}$$
where $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_k)$ contains the k largest eigenvalues of the covariance matrix S, and t is the score vector of $\hat{x}$. The control limit of the T² statistic, $T^2_{\mathrm{lim}}$, is calculated as
$$T^2_{\mathrm{lim}} = \frac{k(n^2 - 1)}{n(n - k)} F_{\alpha}(k, n - k) \tag{7}$$
where $F_{\alpha}(k, n-k)$ is the critical value of the F-distribution with k and $n-k$ degrees of freedom, and α is the confidence. The SPE statistic is calculated as
$$SPE = \|e\|^2 = \|\tilde{C}x\|^2 \tag{8}$$
The control limit of the SPE statistic was developed by Jackson and Mudholkar [10], that is,
$$SPE_{\mathrm{lim}} = \theta_1\left[\frac{C_{\alpha}\sqrt{2\theta_2 h_0^2}}{\theta_1} + 1 + \frac{\theta_2 h_0 (h_0 - 1)}{\theta_1^2}\right]^{1/h_0} \tag{9}$$
where
$$\theta_i = \sum_{j=k+1}^{m} \lambda_j^{\,i}, \quad i = 1, 2, 3 \tag{10}$$
and
$$h_0 = 1 - \frac{2\theta_1\theta_3}{3\theta_2^2} \tag{11}$$
In Eq. (9), $C_{\alpha}$ is the upper fractile value of the standard normal distribution with a significance level of α, and $\lambda_j$ $(j = 1, \ldots, m)$ is the jth largest eigenvalue of the covariance matrix S. In this paper, both the T² and SPE statistics are taken into account for fault detection of FWTPs, and a fault alarm is triggered when either of the two statistics exceeds its control limit.
The procedure of classical PCA based fault detection is as follows.
(1) Collect operation data of the FWTP under normal conditions and normalize the data according to Eq. (1). The data is then used to form a training data set for the PCA model.
(2) Build a PCA model with the training data set, and calculate $T^2_{\mathrm{lim}}$ and $SPE_{\mathrm{lim}}$ according to Eqs. (7) and (9), respectively.
(3) Collect a new sample from the FWTP under a condition similar to that in step (1), normalize it with the mean and standard deviation of the training data, and calculate the real-time values of the T² and SPE statistics.
(4) If the real-time value of either the T² or the SPE statistic exceeds its control limit, the sample is regarded as abnormal and a fault alarm is triggered; otherwise, it is considered to be in normal condition.
(5) Repeat from step (3).
We applied the above procedure to an FWTP of a power plant in our previous work; unexpectedly, excessive false detections and missed detections appeared, which makes classical PCA and its extended versions inapplicable for fault detection of FWTPs. Analytically speaking, these phenomena are mainly caused by the noises of the T² and SPE statistics. Naturally, a denoising technique is expected to solve the problem. In the following section, the WD technique is intentionally combined with PCA to deal with the noise problem.

4. PCA-WD based fault detection
4.1. Wavelet denoising
The wavelet transform is a powerful signal-processing method. It transforms time-domain signals into the time-frequency domain while obtaining high-resolution time and frequency information of the signals simultaneously. The continuous wavelet transform (CWT) is defined as
$$CWT_f(a, \tau) = \frac{1}{\sqrt{|a|}} \int_{\mathbb{R}} f(t)\, \psi^{*}\!\left(\frac{t - \tau}{a}\right) dt \tag{12}$$
where a is the scale factor, which may be regarded as the inverse of frequency, τ is the translation factor, ψ(x) is the base function, and * denotes complex conjugation. In practice, CWT is not widely applied because of its enormous computation, caused by the fact that all scales are used during the computation. Compared with CWT, DWT requires much less computation time without degrading the signal-processing performance; hence, DWT is widely used in many fields. Specifically, Mallat proposed a fast algorithm [35], which exploits the fact that the analysis is very efficient if scales and positions are chosen based on powers of two (dyadic scale and translation factors). The Mallat fast algorithm obtains the same accuracy as other DWTs while consuming much less computation, and it will be employed in the following study.
Fig. 2. Three level decomposition.
Fig. 3. Flowchart of PCA-WD fault detection.
Let s(t) be an original signal. A three-level decomposition of s(t) using the fast algorithm is shown in Fig. 2 to illustrate the process, where $H_0$ and $H_1$ are the low-pass and high-pass filters, respectively, and $\downarrow 2$ denotes down-sampling by a factor of 2. Within the three-level decomposition, s(t) is expressed as
$$s(t) = d_{1k} + d_{2k} + d_{3k} + a_{3k} \tag{13}$$
The signal s(t) is thus decomposed into a set of detail coefficients $d_{1k}$, $d_{2k}$, $d_{3k}$ and an approximation coefficient $a_{3k}$.
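To make Section 3 and the decomposition of Eq. (13) concrete, the following is a minimal sketch written under our own assumptions, not the authors' implementation. It fits the PCA model of Eqs. (1)-(11) on synthetic normal data, computes the T² and SPE statistics of new samples, and applies a three-level Mallat decomposition with the Haar (db1) wavelet to a window of 16 T² values, i.e. the kind of input that WD operates on in PCA-WD. The synthetic data, the CPV threshold l = 0.85, the reading of α = 0.01 as a significance level, and the helper names fit_pca, statistics, haar_decompose and haar_reconstruct are hypothetical choices for illustration.

```python
import numpy as np
from scipy import stats


def fit_pca(X, cpv=0.85, alpha=0.01):
    """Off-line stage on normal data X (n x m): Eqs. (1)-(3), (7) and (9)-(11)."""
    n, m = X.shape
    mu, sd = X.mean(axis=0), X.std(axis=0, ddof=1)
    Xb = (X - mu) / sd                                   # Eq. (1): standardization
    S = Xb.T @ Xb / (n - 1)                              # correlation matrix of X
    lam, V = np.linalg.eigh(S)                           # eigenvalues in ascending order
    lam, V = lam[::-1], V[:, ::-1]                       # reorder: largest first
    k = int(np.searchsorted(np.cumsum(lam) / lam.sum(), cpv)) + 1   # Eq. (3)
    # Eq. (7); alpha is treated here as a significance level (assumption)
    t2_lim = k * (n**2 - 1) / (n * (n - k)) * stats.f.ppf(1 - alpha, k, n - k)
    th1, th2, th3 = (np.sum(lam[k:] ** i) for i in (1, 2, 3))        # Eq. (10)
    h0 = 1 - 2 * th1 * th3 / (3 * th2**2)                            # Eq. (11)
    c_a = stats.norm.ppf(1 - alpha)
    spe_lim = th1 * (c_a * np.sqrt(2 * th2 * h0**2) / th1
                     + 1 + th2 * h0 * (h0 - 1) / th1**2) ** (1 / h0)  # Eq. (9)
    return {"mu": mu, "sd": sd, "P": V[:, :k], "lam": lam[:k], "k": k,
            "t2_lim": t2_lim, "spe_lim": spe_lim}


def statistics(model, x):
    """On-line stage: T2 (Eq. (6)) and SPE (Eq. (8)) of one raw sample x."""
    xb = (x - model["mu"]) / model["sd"]                 # scale with training stats
    t = xb @ model["P"]                                  # score vector
    t2 = float(t @ (t / model["lam"]))                   # t Lambda^{-1} t^T
    e = xb - model["P"] @ t                              # residual (I - P P^T) x, Eq. (5)
    return t2, float(e @ e)


def haar_decompose(s, levels):
    """Dyadic (Mallat) decomposition with the orthonormal Haar (db1) filters."""
    details, a = [], np.asarray(s, dtype=float)
    for _ in range(levels):
        even, odd = a[0::2], a[1::2]
        details.append((even - odd) / np.sqrt(2))        # high-pass branch, down-sampled
        a = (even + odd) / np.sqrt(2)                    # low-pass branch, down-sampled
    return details, a                                    # [d1, d2, d3], a3


def haar_reconstruct(details, a):
    """Inverse of haar_decompose (perfect reconstruction)."""
    for d in reversed(details):
        out = np.empty(2 * a.size)
        out[0::2] = (a + d) / np.sqrt(2)
        out[1::2] = (a - d) / np.sqrt(2)
        a = out
    return a


# Usage on synthetic data: 6 correlated "sensors" driven by 2 latent factors.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2)) @ rng.normal(size=(2, 6)) + 0.1 * rng.normal(size=(500, 6))
model = fit_pca(X)
window = np.array([statistics(model, x)[0] for x in X[:16]])   # 16 T2 values
d, a3 = haar_decompose(window, levels=3)
```

Because the Haar filters are orthonormal, the inverse transform recovers the window exactly, and a window whose length is a power of two can be halved cleanly at every level.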
In 1990, Donoho proposed a method to remove white noise using wavelets [34], namely wavelet denoising (WD). WD decomposes the signal through the discrete wavelet transform to obtain the wavelet coefficients, which are then processed with a predefined threshold: the coefficients below the threshold are eliminated, while those above it are retained. Finally, the denoised signal is reconstructed from the remaining coefficients without much loss of the original signal characteristics.

4.2. PCA-WD
We now combine WD with PCA to form the PCA-WD method for fault detection, where WD is intentionally employed to deal with the noises of the T² and SPE statistics. The flowchart of PCA-WD for process fault detection is shown in Fig. 3. PCA-WD fault detection is divided into two stages: an off-line modeling stage and an on-line detection stage. $T^2_{\mathrm{lim}}$ and $SPE_{\mathrm{lim}}$ are calculated in the off-line modeling stage. The on-line detection stage includes the calculation of the real-time T² and SPE statistics, WD, and fault detection; specifically, WD is employed to denoise the T² and SPE statistics during the on-line stage, as shown in Fig. 3.
In the off-line modeling stage, a training data set $X \in \mathbb{R}^{n \times m}$, collected under a normal operating condition, is used to develop the PCA model, from which the control limits $T^2_{\mathrm{lim}}$ and $SPE_{\mathrm{lim}}$ are obtained. In the on-line detection stage, the T² and SPE statistics of each newly arriving sample x are calculated first. The real-time T² and SPE statistics then slide into a window, where WD is applied to remove their noises. Finally, the denoised T² and SPE statistics are compared with $T^2_{\mathrm{lim}}$ and $SPE_{\mathrm{lim}}$, respectively, and a fault alarm is triggered if either of the two statistics exceeds its control limit. Here, the filtering is used to eliminate the noise of the statistics and does not dramatically change their distributions; indeed, [29] shows that filtered residuals are closer to a normal distribution than unfiltered residuals. Mathematically, filtering may change the control limits of the statistics. In [29], a theoretical analysis of the control limits with and without filtering is given, and examples are used to validate this filtering technique. The technique has also been adopted by other researchers; for instance, in [30], a detection index is constructed from the filtered statistics to improve the detectability of PCA. Our approach, a combination of PCA and WD, likewise uses WD to filter the statistics of PCA. Moreover, the WD in our framework is designed so that its amplification factor equals 1. Thus, the control limits $T^2_{\mathrm{lim}}$ and $SPE_{\mathrm{lim}}$ obtained in the off-line modeling stage can be used in the on-line detection stage.
Another problem now arises. The application of the WD algorithm involves several parameters. The parameter configuration has a great effect on WD performance and may even make a WD algorithm inapplicable under certain conditions, so parameter selection has become a barrier to the field application of WD. Our case combines WD with PCA, where WD is used to denoise the real-time T² and SPE statistics, which makes the parameter selection of the compound PCA-WD more complicated than in traditional WD applications. Common experience-based techniques cannot deal with this problem properly. We therefore formulate the parameter selection of PCA-WD as an optimization problem and obtain the optimal parameters by solving it. For the purpose of formulating this optimization problem, the parameters of PCA-WD are first reviewed.

4.3. Parameters of PCA-WD
4.3.1. Sliding window parameters
In our PCA-WD algorithm, WD is employed to denoise the real-time T² and SPE statistics with a sliding window; the denoised T² and SPE statistics are then used for fault detection. Thus, a proper sliding window length and moving step, denoted by len and step, respectively, have to be determined prior to the application of PCA-WD. Due to dyadic down-sampling, the length of the wavelet coefficients is reduced by a factor of $2^j$, where j is the scale factor. To ensure perfect reconstruction of the original signal, len must be chosen as a power of 2. step determines how many samples enter and leave the sliding window in a single calculation. Generally, a large step brings a large time delay, while a small one may cause discontinuity in the signal.

4.3.2. Wavelets
Our research considers only orthogonal wavelets, partly for simplicity; in fact, they are able to obtain perfect reconstruction of the original signals. Specifically, the Mallat fast algorithm is used in this paper due to its efficient computation; it requires the wavelets to have the orthogonality property along with a scaling function φ. Several wavelets satisfy these requirements, such as Haar, Daubechies, Symlets and Coiflets.
The Daubechies family, built by Ingrid Daubechies, consists of 45 wavelets, of which the Haar wavelet is the first and simplest. Except for the Haar wavelet, the Daubechies wavelets have no explicit mathematical expression. The Symlet family is more symmetrical than the Daubechies family; however, it is not strictly symmetrical. The Coiflet family consists of 5 wavelets. For detailed descriptions of these wavelets, refer to the relevant literature.
In this paper, the first 15 wavelets of the Daubechies family and the first 15 wavelets of the Symlet family are utilized. The remaining wavelets of the two families are rather complex and therefore require more computation time, which makes them unsuitable for our field application. Meanwhile, all 5 wavelets of the Coiflet family are used due to their computational efficiency. In the rest of the paper, db1, db2, ..., db15 denote the first 15 wavelets of the Daubechies family, sym1, sym2, ..., sym15 denote the 15 wavelets of the Symlet family, and coif1, coif2, ..., coif5 denote the Coiflet family.

Table 2
Threshold selection rules.
Rule | Description
rigrsure | Selection using Stein's Unbiased Risk Estimate (SURE)
sqtwolog | Fixed threshold
heursure | Selection using a mixture of the first two options
minimaxi | Selection using the minimax principle

4.3.3. Threshold parameters
There are two common thresholding methods for WD: hard thresholding and soft thresholding. Let WT be the wavelet coefficients and δ be the threshold; the two thresholding methods can then be expressed as follows.
(i) Hard thresholding
$$\widehat{WT} = \begin{cases} WT, & |WT| > \delta \\ 0, & |WT| \le \delta \end{cases} \tag{14}$$
(ii) Soft thresholding
$$\widehat{WT} = \begin{cases} WT - \delta, & WT > \delta \\ 0, & |WT| \le \delta \\ WT + \delta, & WT < -\delta \end{cases} \tag{15}$$
Compared with hard thresholding, soft thresholding performs better, because hard thresholding may cause discontinuities at ±δ, while soft thresholding remains continuous by shrinking the nonzero coefficients towards zero.
Four threshold selection rules, rigrsure, sqtwolog, heursure and minimaxi, listed in Table 2, are considered in this paper. These threshold selection rules use statistical regression of the noisy coefficients over time to acquire a nonparametric estimation of the reconstructed signal. Different

where s(n) is the original signal, f(n) is the pure signal without noise, e(n) represents noise, and σ is the noise intensity. The denoising process suppresses the noisy part of the signal s(n) so as to recover the pure signal f(n). Threshold rescaling adjusts σ with a certain method and therefore influences the denoising process. Three threshold rescaling methods, one, sln and mln, are investigated in this paper; brief descriptions of these methods are given in Table 3.

Table 3
Threshold rescaling methods.
Method | Description
one | Select using the basic noise model
sln | Select using the basic noise model with unscaled noise
mln | Select using the basic noise model with non-Gaussian white noise

4.3.4. Decomposition level
Generally speaking, the decomposition level, denoted by lev, should be determined in consideration of the frequency bandwidth of the original signals. Signals with abundant high-frequency information need more decomposition levels, but a large lev requires more computation time and brings time delay. Meanwhile, the sliding window length len also bounds lev, because dyadic down-sampling halves the length of the wavelet coefficients in each decomposition step. For instance, if len = 16, the maximum value of lev is $\log_2 16 = 4$. In this paper, the decomposition level is bounded within the range from 1 to 5.

4.4. Optimal parameter selection of PCA-WD
4.4.1. Formulation of the parameter selection optimization problem
The WD parameters reviewed above all affect the performance of the WD algorithm to some extent. The traditional way to determine the WD parameters is mostly based on individual experience. In this paper, WD is combined with PCA to form a fault detection algorithm, so the parameter selection of the compound PCA-WD algorithm is far more complicated than in traditional applications of WD. We therefore formulate the parameter selection of PCA-WD as an optimization problem, and the optimal parameter configuration is obtained by solving it. This obtains the parameters automatically rather than from individual experience and guarantees the optimality of the parameters.
Our optimization problem does not consider the WD algorithm alone; instead, it optimizes the parameters of the PCA-WD fault detection algorithm as a whole. Naturally, performance criteria from the fault detection perspective, namely the false alarm rate (FAR) and the missed detection rate (MDR), should be integrated into the objective function of the optimization problem. They are defined as follows.
(I) False alarm rate: FAR is given by Eq. (17) and represents the percentage of falsely alarmed samples among all faultless data samples:
$$\mathrm{FAR} = \frac{\text{falsely alarmed samples}}{\text{total faultless data samples}} \times 100\% \tag{17}$$
faultless data samples (17)
threshold selection rule has different impact on denoising
performance. (II) Missed detection rate: MDR is calculated as Eq. (18), re-
Threshold rescaling method also affects the denoising perfor- presenting the percentage of the missed faulty samples to the total
mance; it needs investigation as well. The general model of wa- faulty data samples:
velet denoising is as follows:
missed faulty samples
MDR = %
s(n) = f (n) + e(n), (16) faulty data samples (18)
The above two criteria evaluate the performance of the fault detection algorithm under faultless and faulty conditions, respectively; the lower the two criteria, the better the performance of the algorithm. Moreover, the signal-to-noise ratio (SNR) is a traditional measure of denoising algorithms from the signal conditioning perspective. It is defined as follows:
$$ \mathrm{SNR} = 10\log_{10}\left(\mathrm{power}_{signal}/\mathrm{power}_{noise}\right), \tag{19} $$
where
$$ \mathrm{power}_{signal} = \frac{1}{N}\sum_{n=1}^{N} s(n)^2, \tag{20} $$
and
$$ \mathrm{power}_{noise} = \frac{1}{N}\sum_{n=1}^{N}\left[s(n) - \tilde{s}(n)\right]^2. \tag{21} $$
$\mathrm{power}_{signal}$ in Eq. (20) represents the power of the original signal, and $\mathrm{power}_{noise}$ in Eq. (21) is the power of the noise; $s(n)$ denotes the original signal, $\tilde{s}(n)$ the denoised signal and $N$ the number of samples. Generally, a higher SNR is desirable, for it indicates less information loss through the denoising process.

It is reasonable to formulate the objective function from both the fault detection perspective and the signal conditioning perspective. Specifically, the objective function is expressed as follows:
$$ J_{X_V}(P_r) = e^{-\lambda}\left(\mathrm{SNR}_{T^2} + \mathrm{SNR}_{SPE}\right) + \frac{1 - e^{-\lambda}}{\mathrm{FAR}_{X_{t_1}} + \mathrm{MDR}_{X_{t_2}} + 1}. \tag{22} $$
The optimization problem is to maximize $J_{X_V}$ while satisfying the relevant constraints. In Eq. (22), $X_V \in \mathbb{R}^{t \times m}$ is a data set selected for verification, and $t$ is the number of samples. $X_V$ is composed of a subset $X_{t_1}$ containing $t_1$ faultless samples and a subset $X_{t_2}$ containing $t_2$ faulty samples, with $t = t_1 + t_2$. $P_r = [p_1, p_2, \ldots, p_7]$ in Eq. (22) denotes the parameter vector of WD; it is the variable to be optimized. $\mathrm{SNR}_{T^2}$ and $\mathrm{SNR}_{SPE}$ are the signal-to-noise ratios of the T² and SPE statistics, respectively, calculated using Eq. (19). $\mathrm{FAR}_{X_{t_1}}$ in Eq. (22) is calculated using subset $X_{t_1}$. The falsely alarmed samples, $C_{t_1}$, are first counted as follows:

```c
/* Count the false alarms C_t1 over the t1 faultless samples of X_t1.
   T2d[] and SPEd[] hold the denoised statistics; T2lim and SPElim are the control limits. */
int Ct1 = 0;
for (int i = 0; i < t1; i++) {
    if (T2d[i] > T2lim || SPEd[i] > SPElim)   /* either statistic exceeds its limit */
        Ct1 = Ct1 + 1;                         /* a faultless sample is alarmed */
}
```

where $\tilde{T}^2$ (T2d) and $\widetilde{\mathrm{SPE}}$ (SPEd) are the PCA statistics denoised by the WD algorithm: $\tilde{T}^2 = \mathrm{WD}_{P_r}(T^2)$ and $\widetilde{\mathrm{SPE}} = \mathrm{WD}_{P_r}(\mathrm{SPE})$, where $\mathrm{WD}_{P_r}$ denotes a denoising process with parameter vector $P_r$, and T² and SPE are obtained according to Eqs. (8) and (6), respectively. Then, FAR is calculated as $\mathrm{FAR}_{X_{t_1}} = C_{t_1}/t_1$ (%). Similarly, $\mathrm{MDR}_{X_{t_2}}$ is calculated with subset $X_{t_2}$. The undetected faulty samples are counted as follows:

```c
/* Count the missed detections C_t2 over the t2 faulty samples of X_t2
   (T2d[] and SPEd[] now hold the denoised statistics of X_t2). */
int Ct2 = 0;
for (int i = 0; i < t2; i++) {
    if (T2d[i] < T2lim && SPEd[i] < SPElim)   /* neither statistic alarms */
        Ct2 = Ct2 + 1;                         /* a faulty sample is missed */
}
```

Then, $\mathrm{MDR}_{X_{t_2}} = C_{t_2}/t_2$ (%). In Eq. (22), $\lambda$ is a weighting factor used to balance the criteria from the fault detection perspective and from the signal conditioning perspective; tuning $\lambda$ allows the optimization problem to serve different purposes. Specifically, $\lambda$ can be set to a value larger than 6 to favor a lower FDR and MDR, whereas a value smaller than 6 favors a higher SNR. This paper pays more attention to the FDR and MDR of fault detection; therefore, $\lambda$ is set to 6.8 in the following investigation. Mathematically speaking, $\lambda = 6.8$ is intended to compensate for the difference in scale between the two terms of $J_{X_V}(P_r)$.

Finally, the parameter selection of PCA-WD is formulated as the following optimization problem:
$$
\begin{aligned}
\max_{P_r}\ & J_{X_V}(P_r) = e^{-\lambda}\left(\mathrm{SNR}_{T^2} + \mathrm{SNR}_{SPE}\right) + \frac{1 - e^{-\lambda}}{\mathrm{FAR}_{X_{t_1}} + \mathrm{MDR}_{X_{t_2}} + 1} \\
\text{s.t.}\ & \mathrm{FAR}_{X_{t_1}} = C_{t_1}/t_1\ (\%), \\
& \mathrm{MDR}_{X_{t_2}} = C_{t_2}/t_2\ (\%), \\
& \mathrm{SNR}_{T^2} = 10\log_{10}\left(\sum_{i=1}^{t} T^2(i)^2 \Big/ \sum_{i=1}^{t}\left[T^2(i) - \tilde{T}^2(i)\right]^2\right), \\
& \mathrm{SNR}_{SPE} = 10\log_{10}\left(\sum_{i=1}^{t} \mathrm{SPE}(i)^2 \Big/ \sum_{i=1}^{t}\left[\mathrm{SPE}(i) - \widetilde{\mathrm{SPE}}(i)\right]^2\right), \\
& \tilde{T}^2 = \mathrm{WD}_{P_r}(T^2), \quad \widetilde{\mathrm{SPE}} = \mathrm{WD}_{P_r}(\mathrm{SPE}), \\
& T^2_j = \hat{x}_j^{T} P \Lambda^{-1} P^{T} \hat{x}_j, \quad j = 1, \ldots, t, \\
& \mathrm{SPE}_j = \left\| C \hat{x}_j \right\|^2, \quad j = 1, \ldots, t, \\
& P_r = [p_1, p_2, \ldots, p_7], \\
& \lambda > 0.
\end{aligned}
\tag{23}
$$
Here $\hat{x}_j$ ($j = 1, \ldots, t$) is the $j$th sample of the verification data set and $\Lambda$ is the diagonal matrix of the retained eigenvalues. The score matrix $P$ and projection matrix $C$ are obtained by training the PCA model at the off-line stage. The solution to this problem, $P_r^* = [p_1^*, p_2^*, \ldots, p_7^*]$, gives the optimal parameters of our PCA-WD fault detection algorithm. We now turn to solving this optimization problem.

4.4.2. Solving of the optimization problem
The objective function and some of the constraints are nonlinear and even non-analytical, which makes classical optimization techniques, such as linear programming (LP) and dynamic programming (DP), infeasible for our problem. Intelligence-based techniques such as the genetic algorithm (GA) and PSO can provide solutions. In the literature, PSO has been widely used in many fields, such as mechanical, chemical, civil and aerospace design, because of its comparative simplicity, rapid convergence and few parameters to be adjusted. PSO is known to solve large-scale nonlinear optimization problems effectively; it is therefore suitable for the problem at hand.

PSO is a stochastic search method first introduced by Kennedy and Eberhart [38]. The main strategy of PSO is to exploit the social behavior and communication involved in swarms, such as bird flocking and fish schooling. Each particle in PSO is treated as a volumeless particle in a g-dimensional search space; its velocity and position are adjusted according to its own past experience and that of its companions.

PSO starts from a random swarm of particles, called the initial population, in the g-dimensional search space. Let the swarm size be $U$; the position and velocity of each particle are defined as
$$ P_i(t) = [p_{i,1}(t), p_{i,2}(t), \ldots, p_{i,g}(t)] \tag{24} $$
and
$$ V_i(t) = [v_{i,1}(t), v_{i,2}(t), \ldots, v_{i,g}(t)], \tag{25} $$
respectively, where $i = 1, 2, \ldots, U$ denotes the $i$th particle and $t$ the iteration. The velocity $V_i(t)$ and position $P_i(t)$ of each particle are iteratively modified according to the following rules:
$$ v_{i,d}(t+1) = \omega\, v_{i,d}(t) + c_1 r_1(t)\left(Pb_{i,d}(t) - p_{i,d}(t)\right) + c_2 r_2(t)\left(Pg_{d}(t) - p_{i,d}(t)\right) \tag{26} $$
$$ p_{i,d}(t+1) = p_{i,d}(t) + v_{i,d}(t+1) \tag{27} $$
where $d = 1, 2, \ldots, g$ indexes the $d$th component of a particle, and $r_1(t)$ and $r_2(t)$ are random numbers drawn from a uniform distribution on [0, 1] to provide stochastic weighting of the terms in Eq. (26). The constants $c_1$ and $c_2$ are the weights of the stochastic acceleration terms that pull each particle toward its pbest and gbest, respectively. The inertia weight factor $\omega$ trades off the global and local exploration capabilities of the swarm: a large inertia weight tends to facilitate global exploration, while a small one facilitates local exploration. In practice, $\omega$ generally decreases linearly from 1.2 down to 0.4 during the iterations. Specifically, the inertia weight factor $\omega$ in this paper is generated as follows:
$$ \omega = \omega_{max} - \frac{\omega_{max} - \omega_{min}}{iter_{max}}\, iter \tag{28} $$
where $iter_{max}$ denotes the maximum number of iterations and $iter$ the current iteration.

In the procedure above, the velocity $v_{i,d}(t)$ and position $p_{i,d}(t)$ of each particle are bounded to prevent the swarm from over-exploring. The maximum and minimum velocities are denoted by $v_d^{max}$ and $v_d^{min}$, and the maximum and minimum positions by $p_d^{max}$ and $p_d^{min}$. Thus, if $v_{i,d}(t) > v_d^{max}$, then $v_{i,d}(t) = v_d^{max}$; if $v_{i,d}(t) < v_d^{min}$, then $v_{i,d}(t) = v_d^{min}$. If $p_{i,d}(t) > p_d^{max}$, then $p_{i,d}(t) = p_d^{max}$; if $p_{i,d}(t) < p_d^{min}$, then $p_{i,d}(t) = p_d^{min}$. Eqs. (26) and (27) are iterated until convergence is reached.

Each particle keeps track of the coordinates in the search space of the best solution it has achieved so far, called pbest and denoted $Pb_i(t) \in \mathbb{R}^g$. Accordingly, the global best is called gbest and denoted $Pg(t) \in \mathbb{R}^g$; it represents the overall best solution obtained by the particles in the swarm.

Specifically, a particle $P(t)$ in our problem, which represents a potential solution to the optimization problem, is defined as follows:
$$ P(t) = [p_1(t), p_2(t), \ldots, p_7(t)], \tag{29} $$
where $p_j(t)$ ($j = 1, \ldots, 7$) represents a specific element of the parameter vector and $t$ the iteration. The parameter vector is described in Table 4. According to the definitions in Table 4, each parameter element is coded so that its values carry specific meanings. For example, $p_1 = 30$ denotes the db15 wavelet; moreover, the WD parameter configuration $P = [16, 1, 2, 3, 1, 3, 5]$ denotes the db1 wavelet, soft thresholding, the sqtwolog threshold selection rule, the mln threshold rescaling method, a 1-level decomposition, and a sliding window of length 64 with a step of 5.

Table 4
Description of the parameter vector.
Element | Content | Code | Description
p1 | sym1, …, sym15, db1, …, db15, coif1, …, coif5 | [1, 35] | Wavelet species
p2 | soft thresholding, hard thresholding | [1, 2] | Threshold method
p3 | rigrsure, sqtwolog, heursure, minimaxi | [1, 4] | Threshold selection rule
p4 | one, sln, mln | [1, 3] | Threshold rescaling method
p5 | 1, 2, 3, 4, 5 | [1, 5] | Decomposition level
p6 | 256, 128, 64, 32 | [1, 4] | Length of sliding window
p7 | 1, 2, 3, …, 32 | [1, 32] | Step of sliding window

5. Fault detection of FWTP

In this section, the proposed fault detection algorithms are applied to the FWTP outlined in Section 2 for verification. We take only side A of the FWTP as the study case, because side B is very similar to side A. Further, only some of the valves and sensors of side A, listed in Table 1, are selected for the fault detection study, because they are accessible through the SIS of the power plant. In fact, the FWTP of a utility boiler is operated intermittently owing to its unique operating procedure. Generally, PCA-based algorithms can hardly deal with the problems resulting from the alternating working conditions of industrial processes. In this paper, the whole working phase is first divided into several conditions, and the PCA-based algorithms are applied within the same or similar working conditions. From a detailed analysis of the flowchart and operating procedure, we found that the working conditions of side A can be distinguished by the status of the relevant valves. Thus, W1, …, W6 in Table 1 are used for working condition classification only, and S1, …, S12 are the sensors selected for fault detection of the FWTP.

The operation data are collected from the SIS of the plant through an OPC (OLE for Process Control) interface. 500 samples of the 12 sensors under a typical working condition are collected at a sampling interval of 5 s; they form the training data set. Another 1000 samples, under the same working condition but from a different time period, are collected for fault detection validation. For a mature industrial process such as the FWTP, it is not easy to capture abnormal operating conditions. Hence, we intentionally introduce several kinds of faults into the operation data to simulate faulty operating conditions. Note that in the following studies the same training data set and verification data set are applied to the different algorithms for a fair comparison.
Fig. 4. Statistics of classical PCA (T² statistic with control limit T²lim and SPE statistic with control limit SPElim, versus samples 0–1000). (For interpretation of the references to color in this figure caption, the reader is referred to the web version of this paper.)

Fig. 5. Statistics of PCA-WD without parameter optimization (T² statistic with T²lim and SPE statistic with SPElim, versus samples 0–1000).
5.1. Application of classical PCA
First of all, classical PCA is investigated. It is applied to the FWTP according to the procedure proposed in Section 3. Specifically, the ability of the approach to detect a single sensor fault is verified. The inlet flow rate of the 1# anion bed, S5, is used as the study case. A constant deviation fault is intentionally added to S5 from sample 401 to 800. The T² and SPE statistics of the classical PCA fault detection algorithm are illustrated in Fig. 4. The blue solid lines represent the real-time statistics and the red dashed lines the corresponding control limits. It can be seen that under the normal working condition a number of samples exceed the T² control limit, which produces false alarms if the fault criterion is strictly applied. On the other hand, the SPE values of some samples do not exceed the control limit under the fault condition, which leads to missed alarms. These problems are mainly caused by the noise of the T² and SPE statistics. We carried out several studies in which faults were added to different sensors, and similar results were obtained. This shows that classical PCA is not suitable for fault detection of the FWTP.

5.2. Application of PCA-WD
This paper deals with the noise of the T² and SPE statistics by wavelet denoising. A WD step is applied to the T² and SPE statistics before fault detection; the application procedure of PCA-WD is shown in Fig. 3. The same training data set and verification data set as above are applied to PCA-WD with and without optimal parameter selection.

5.2.1. PCA-WD without parameter optimization
Accordingly, a constant deviation fault is intentionally added to S5 from sample 401 to 800 to test the performance of PCA-WD. Here, the WD parameters are determined through trial calculations on sample cases; in other words, they are selected mostly on the basis of the researcher's experience. The parameters of WD and the sliding window are listed below.

• Coiflet 4 (coif4) wavelet.
• Decomposition level 3.
• Threshold parameters: soft thresholding method, sqtwolog threshold selection rule, sln threshold rescaling method.
• Sliding window length len = 256 and sliding window step step = 32.

The T² and SPE statistics of the PCA-WD fault detection algorithm with the above parameters are illustrated in Fig. 5. It can be seen that both the T² and SPE statistics of samples 401 to 800, where the fault is introduced, rise beyond their control limits, whereas during the periods without fault they stay below their limits. Fig. 5 demonstrates that the PCA-WD algorithm, with carefully selected parameters, detects the fault more precisely; its FDR and MDR are much lower than those of classical PCA.

However, the performance of PCA-WD is sensitive to the selection of the WD parameters, and poorly selected parameters may degrade it. Empirical WD parameter selection relies too much on individual experience; it is time-consuming and cannot guarantee the optimal parameter configuration. A better way, proposed in this paper, is to obtain the parameters with an optimization technique.

5.3. PCA-WD with parameter optimization
In Eq. (23), the parameter selection of PCA-WD is formulated as an optimization problem that takes the parameter vector of WD as the optimization variable. In the literature, the PSO algorithm has been successfully employed to solve complex optimization problems; in this paper, PSO is also used to determine the optimal parameters of PCA-WD. The aim of the PSO method is to determine which set of parameters, i.e. the wavelet species p1, threshold method p2, threshold selection rule p3, threshold rescaling method p4, decomposition level p5, and the length and step of the sliding window p6 and p7, is optimal for fault detection. Here, the parameters of PCA-WD are coded as integers, as shown in Table 4; hence, the real-valued parameters must be rounded to their nearest integer values during each iteration.

The same training data set, containing 500 samples, and the same verification data set, containing another 1000 samples, are used to verify the PCA-WD with parameter optimization. Further, a bias fault with an amplitude of 8% of the sensor mean value is intentionally introduced to sensor S5 from sample 501 to 800. The PCA model is first built with the training data set to obtain the score matrix P, the projection matrix C and the control limits of the two statistics, T²lim and SPElim. Then, the verification data set is applied to the parameter selection problem in Eq. (23), where t = 1000, t1 = 700 and t2 = 300. PSO is used to solve the optimization problem. The parameters of PSO are specified as follows.

• Generally, the population size reflects a balance between accuracy, stability, computation time and dimension. In our case, the population size is set to 50.
• The inertia weight factor ω is a trade-off between the global and local exploration capabilities of the swarm. It is set according to Eq. (28), where ωmax = 1.2 and ωmin = 0.4.
• The lower and upper bounds of pd, pd^min and pd^max, are set according to Table 4.
• The velocity change must stay within a reasonable bound. We set vd^max = pd^max/2 and vd^min = −pd^max/2 so as to avoid over-exploration.
• The acceleration constants c1 and c2 are the weights of the stochastic acceleration terms toward the local and global best, respectively. In our case, c1 = 1.2 and c2 = 1.2.
• Weighting factor λ = 6.8.

The parameter selection of PSO is itself a rather broad topic. This paper focuses on the application of PSO rather than on the PSO algorithm itself, and the PSO parameters are selected through trial calculations. The optimization process is shown in Fig. 6, where the objective function converges to its maximum value, 1.04, after the 20th iteration. At the optimal parameter vector, FDR = 0%, MDR = 0%, SNR_T² = 18.13 and SNR_SPE = 16.50. The solution to the optimization problem, Pr* = [p1*, p2*, …, p7*], contains the optimal parameter configuration of the PCA-WD algorithm, as shown in Table 5.

Fig. 6. Objective function value (objective function versus iteration, 0–50).

Table 5
Optimal parameters of PCA-WD.
Component | Parameter | Value
p1 | Wavelet species | db15
p2 | Threshold method | soft
p3 | Threshold selection rule | sqtwolog
p4 | Threshold rescaling method | sln
p5 | Decomposition level | 3
p6 | Sliding window length | 256
p7 | Sliding window step | 22

For comparison, the T² and SPE statistics of the verification data set, with classical PCA and optimized PCA-WD, are shown in Fig. 7(a) and (b), respectively, where the blue solid lines represent the real-time statistics and the red dashed lines the corresponding control limits. During the faultless periods, samples 1–500 and 801–1000, the T² and SPE statistics of classical PCA fluctuate heavily. This fluctuation is the source of false detections under faultless conditions and of missed detections under faulty conditions. In contrast, the optimized PCA-WD achieves precise fault detection (FDR = 0% and MDR = 0%), because the WD part dramatically reduces the effect of the noise of the T² and SPE statistics. The performance criteria of classical PCA and optimized PCA-WD are listed in Table 6 for quantitative comparison. FDR and MDR are the core criteria of fault detection algorithms. In Table 6, the FDRs of T² and SPE for classical PCA sum to 9.83% (1.5% + 8.33%) and the MDRs sum to 10.8% (0% + 10.8%), whereas both the FDR and MDR of the optimized PCA-WD are zero. The results show that the optimized PCA-WD greatly improves the fault detection performance.

Fig. 7. T² and SPE statistics with classical PCA and optimized PCA-WD: (a) classical PCA; (b) PCA-WD with parameter optimization (T² statistic with T²lim and SPE statistic with SPElim, versus samples 0–1000). (For interpretation of the references to color in this figure caption, the reader is referred to the web version of this paper.)

Table 6
Performance comparison between classical PCA and optimized PCA-WD.
Method | FDR (%) T² | FDR (%) SPE | MDR (%) T² | MDR (%) SPE | SNR T² | SNR SPE
Classical PCA | 1.5 | 8.33 | 0 | 10.8 | – | –
Optimized PCA-WD | 0 | 0 | 0 | 0 | 18.13 | 16.50

The results from PCA-WD with and without parameter optimization are similar, because the two algorithms are identical and differ only in the way the parameters are determined. The advantage of PCA-WD with optimization is that it obtains the optimal parameters automatically and systematically.

6. Comparative studies

The above section demonstrates the application of PCA-WD and compares classical PCA, PCA-WD without optimization and PCA-WD with optimization. However, these results are obtained with a single kind of deviation fault only; logically, they cannot guarantee the effectiveness of PCA-WD under other kinds of faults. To thoroughly test the advantages of the optimized PCA-WD, comparative studies between classical PCA, SWPCA and optimized PCA-WD are carried out. Specifically, the four study cases listed in Table 7 are used.

Study Case 1: Normal condition without any fault.
Study Case 2: A bias fault is added to S5 from sample 501 to 800 with an amplitude of 8% of its mean value.
Study Case 3: A drift fault is added to S2 from sample 501 to 800; its amplitude varies linearly with time as
$$ d = 0.05\,(k - 300), $$
where d is the fault amplitude and k is the sample number.
Study Case 4: A broken line fault is introduced to S8 from sample 501 to 800; it is implemented by setting the value of S8 to zero.

Table 7
Fault descriptions.
Study cases | Fault description | Fault samples
Normal condition | – | –
Bias fault | d1 = 8% | 501–800
Drift fault | d2 = 0.05(k − 300) | 501–800
Broken line | d3 = 0 | 501–800

For a fair comparison between the algorithms, the same training data set as in the above section is applied to all of them. Moreover, the CPV threshold and the confidence limit are set to 85% and 99%, respectively, and the sliding window length of SWPCA is set to 500. In the following investigations, PCA-WD uses the optimized parameters shown in Table 5.

Fig. 8. Fault detection results under normal condition: (a) statistics of PCA; (b) statistics of SWPCA; (c) statistics of PCA-WD (T² statistic with T²lim and SPE statistic with SPElim, versus samples 0–1000).

Fig. 9. Fault detection results under bias fault (T² and SPE statistics with their control limits versus samples 0–1000; panel (c): statistics of optimized PCA-WD).
Fig. 10. Fault detection results under drift fault: (a) statistics of PCA; (b) statistics of SWPCA; (c) statistics of PCA-WD (T² statistic with T²lim and SPE statistic with SPElim, versus samples 0–1000).

Fig. 11. Fault detection results under broken line fault (T² and SPE statistics with their control limits versus samples 0–1000; panel (c): statistics of PCA-WD method).
6.1. Study Case 1: normal condition
Firstly, the original operation data set of the FWTP under normal condition, taken from the SIS of the plant, is used directly as the verification data set. It is applied to classical PCA, SWPCA and optimized PCA-WD, and the results are compared in Fig. 8. If the fault criterion is strictly applied, both classical PCA and SWPCA produce false detections under faultless conditions; in contrast, PCA-WD operates normally without any false alarm (FDR = 0%). Consequently, under normal conditions, the optimized PCA-WD shows better fault detection performance than classical PCA and SWPCA, owing to its ability to restrain the fluctuation of the T² and SPE statistics.

6.2. Study Case 2: bias fault
Secondly, a bias fault with an amplitude of 8% of its mean value is deliberately introduced to S5 from sample 501 to 800 to form a verification data set. The data set is then applied to the three fault detection algorithms, with the results shown in Fig. 9. Under the faultless conditions, samples 1–500 and 801–1000, both classical PCA and SWPCA produce false alarms, and the two algorithms also miss detections under the faulty condition from 501 to 800. Fig. 9 shows that the optimized PCA-WD works well under both faultless and faulty conditions. In short, the optimized PCA-WD outperforms classical PCA and SWPCA in dealing with the bias fault.
Table 8
FDR and MDR of different methods.
Cases | Algorithms | FDR (%) T² | FDR (%) SPE | MDR (%) T² | MDR (%) SPE
Normal | Classical PCA | 1.4 | 5.2 | – | –
Normal | SWPCA | 0.5 | 3.5 | – | –
Normal | PCA-WD | 0 | 0 | – | –
Bias fault | Classical PCA | 1.33 | 11.67 | 0.25 | 2.5
Bias fault | SWPCA | 0.5 | 3.5 | 0.75 | 17.25
Bias fault | PCA-WD | 0.33 | 0 | 0 | 0.5
Drift fault | Classical PCA | 1.5 | 4.33 | 0 | 0.5
Drift fault | SWPCA | 0.5 | 2.67 | 0 | 69
Drift fault | PCA-WD | 0.33 | 0 | 0 | 0
Broken line | Classical PCA | 1.11 | 4.9 | 0 | 0
Broken line | SWPCA | 0.29 | 3.14 | 0 | 0
Broken line | PCA-WD | 0 | 1.6 | 0 | 0

6.3. Study Case 3: drift fault
Thirdly, a drift fault whose amplitude varies linearly with time is added to S2 from sample 501 to 800 to form a data set for the comparative study. The data set is then applied to classical PCA, SWPCA and optimized PCA-WD, with the results shown in Fig. 10. Again, classical PCA and SWPCA produce false detections under faultless conditions and missed detections under faulty conditions; by comparison, the optimized PCA-WD is capable of dealing with the drift fault.

6.4. Study Case 4: broken line fault
Finally, a broken line fault is added to S9 from sample 501 to 800 to test the three algorithms. The broken line fault is simulated by setting the value of S9 to zero during the faulty period. The results are shown in Fig. 11. It can be seen that the optimized PCA-WD performs much better than classical PCA and SWPCA under this condition. On careful analysis, we find that even the optimized PCA-WD produces a relatively high FDR; specifically, its SPE FDR reaches 1.6%. Fortunately, the MDRs of the T² and SPE statistics remain zero; they are the key indices of the reliability of the optimized PCA-WD algorithm. The reason is that a broken line fault causes strong signal jumps in the corresponding sensor, which degrades the performance of all the fault detection algorithms considered, the optimized PCA-WD included.

Furthermore, the performance criteria of the three algorithms for the four cases are listed in Table 8 for quantitative comparison. When applied to the FWTP of a coal-fired power plant, the optimized PCA-WD performs much better than classical PCA and SWPCA under the above four typical conditions. Under normal condition, the optimized PCA-WD works well without any false detection or missed detection. Under the bias fault and drift fault, the optimized PCA-WD also produces some false detections, but its FDRs are acceptable and much lower than those of classical PCA and SWPCA. Under the broken line fault, the performance of the optimized PCA-WD is degraded by the strong signal jumps; specifically, the false detections and missed detections always occur at the samples where the signal changes abruptly. In practice, the signals of the feed water treatment process do not fluctuate as violently as in the above simulation studies. This should, to some extent, improve the performance of the optimized PCA-

boiler, in practice, sensor faults tend to cause severe consequences. An effective fault detection algorithm is much needed to improve the reliability of FWTPs. Classical PCA has been applied to the FWTP in our previous work; however, the noise of the T² and SPE statistics leads to relatively high rates of false detections and missed detections. In this paper, wavelet denoising is used to deal with this problem. Specifically, WD is combined with PCA to form a new PCA-WD algorithm. The performance of PCA-WD is sensitive to its parameters, and the parameter selection of this compound algorithm is difficult. This paper formulates the parameter selection of PCA-WD as an optimization problem and employs PSO to deal with its nonlinearity and complexity. An FWTP from a coal-fired power plant is taken as a study case, and real operation data of the FWTP are collected to verify the PCA-WD algorithm. The results show that WD effectively restrains the noise of the T² and SPE statistics, thereby improving the performance of the PCA-WD algorithm, and that the parameter optimization obtains the optimal parameters of PCA-WD automatically, thus relieving the dependence on individual experience. Finally, comparative studies between classical PCA, SWPCA and optimized PCA-WD are carried out for four kinds of conditions. The optimized PCA-WD outperforms its counterparts under all the conditions, which further confirms the advantages of the newly proposed PCA-WD algorithm. However, this paper mainly focuses on the application of the PCA-WD algorithm and does not present a theoretical analysis of the denoising of the PCA statistics. In our future work, we plan to establish a solid basis for the PCA-WD algorithm with a rigorous theoretical proof.

Acknowledgment
This project is supported by the National Natural Science Foundation of China (Grant No. 51475337) and the International Science & Technology Cooperation Program of China (Grant No. 2015DFG72440), and is partially supported by the Open Research Fund of the Key Laboratory of Transients in Hydraulic Machinery, Ministry of Education.

References
[1] Yin S, Ding SX, Xie X, Luo H. A review on basic data-driven approaches for industrial process monitoring. IEEE Trans Ind Electron 2014;61(11):6418–28.
[2] Ge Z, Song Z, Gao F. Review of recent research on data-based process monitoring. Ind Eng Chem Res 2013;52(10):3543–62.
[3] Dong H, Wang Z, Gao H. Fault detection for Markovian jump systems with sensor saturations and randomly varying nonlinearities. IEEE Trans Circuits Syst I Regul Pap 2012;59(10):2354–62.
[4] Chen KY, Chen LS, Chen MC, Lee CL. Using SVM based method for equipment fault detection in a thermal power plant. Comput Ind 2011;62(1):42–50.
[5] Hong JJ, Zhang J, Morris J. Progressive multi-block modelling for enhanced fault isolation in batch processes. J Process Control 2014;24(1):13–26.
[6] Widodo A, Yang BS. Wavelet support vector machine for induction machine fault diagnosis based on transient current signal. Expert Syst Appl 2008;35(1):307–16.
[7] Yin S, Ding SX, Haghani A, Hao H, Zhang P. A comparison study of basic data-driven fault diagnosis and process monitoring methods on the benchmark Tennessee Eastman process. J Process Control 2012;22(9):1567–81.
[8] MacGregor JF, Kourti T. Statistical process control of multivariate processes. Control Eng Pract 1995;3(3):403–14.
[9] Xu X, Xiao F, Wang S. Enhanced chiller sensor fault detection, diagnosis and estimation using wavelet analysis and principal component analysis methods. Appl Thermal Eng 2008;28(2):226–37.
[10] Jackson JE, Mudholkar GS. Control procedures for residuals associated with principal component analysis. Technometrics 1979;21(3):341–9.
[11] Schölkopf B, Smola AJ, Müller K. Nonlinear component analysis as a kernel
WD algorithm, and makes the newly proposed algorithm applic- eigenvalue problem. Neural Comput 1998;10(5):1299399.
able in eld application. [12] Jia M, Xu H, Liu X, Wang N. The optimization of the kind and parameters of kernel
function in KPCA for process monitoring. Comput Chem Eng 2012;46(15):94104.
7. Conclusion [13] Li CH, Lin CT, Kuo BC, Chu HS. An automatic method for selecting the para-
meter of the RBF kernel function to support vector machines. In: 2010 IEEE
international geoscience & remote sensing symposium, Honolulu, Hawaii, 25
Feed water treatment process is a vital sub-process of an utility 30 July 2010. p. 8369.
[14] Ding S, Zhang P, Ding E, Naik A, Deng P, Gui W. On the application of PCA technique to fault diagnosis. Tsinghua Sci Technol 2010;15(2):138–44.
[15] Lu N, Wang F, Gao F. Combination method of principal component and wavelet analysis for multivariate process monitoring and fault diagnosis. Ind Eng Chem Res 2003;42(18):4198–207.
[16] Zhang Y, Ma C. Fault diagnosis of nonlinear processes using multiscale KPCA and multiscale KPLS. Chem Eng Sci 2011;66(1):64–72.
[17] Xu J, Hu S. Nonlinear process monitoring and fault diagnosis based on KPCA and MKL-SVM. In: 2010 international conference on artificial intelligence and computational intelligence (AICI 2010), Sanya, China, 23–24 October 2010. p. 233–7.
[18] Chen D, Li Z, He Z. Research on fault detection of Tennessee Eastman process based on PCA. In: 25th Chinese control and decision conference (CCDC 2013), Guiyang, China, 25–27 May 2013. p. 1078–81.
[19] Li G, Alcala CF, Qin SJ, et al. Generalized reconstruction-based contributions for output-relevant fault diagnosis with application to the Tennessee Eastman process. IEEE Trans Control Syst Technol 2011;19(5):1114–27.
[20] Hu Z, Chen Z, Gui W, Jiang B. Adaptive PCA based fault diagnosis scheme in imperial smelting process. ISA Trans 2014;53:1446–55.
[21] Liu K, Jin X, Fei Z, Liang J. Adaptive partitioning PCA model for improving fault detection and isolation. Chinese J Chem Eng 2015;23(6):981–91. http://dx.doi.org/10.1016/j.cjche.2014.09.052.
[22] Jaffel I, Taouali O, Elaissi E, Messaoud H. A new online fault detection method based on PCA technique. IMA J Math Control Inf 2014;31:487–99. http://dx.doi.org/10.1093/imamci/dnt025.
[23] Wang X, Kruger U, Irwin GW. Process monitoring approach using fast moving window PCA. Ind Eng Chem Res 2005;44(15):5691–702.
[24] Kim D, Lee IB. Process monitoring based on probabilistic PCA. Chemom Intell Lab Syst 2003;67(2):109–23.
[25] Lau CK, Ghosh K, Hussain MA, Hussan CRC. Fault diagnosis of Tennessee Eastman process with multi-scale PCA and ANFIS. Chemom Intell Lab Syst 2013;120:1–14.
[26] Song B, Shi H, Ma Y, Wang J. Multisubspace principal component analysis with local outlier factor for multimode process monitoring. Ind Eng Chem Res 2014;53(42):16453–64.
[27] Dunia R, Qin S. A unified geometric approach to process and sensor fault identification and reconstruction: the unidimensional fault case. Comput Chem Eng 1998;22(7–8):927–43.
[28] Dunia R. Identification of faulty sensors using principal component analysis. AIChE J 1996;42(10):2797–812.
[29] Qin S, Yue H, Dunia R. Self-validating inferential sensors with application to air emission monitoring. Ind Eng Chem Res 1997;36:1675–85.
[30] Harkat M, Mourot G, Ragot J. An improved PCA scheme for sensor FDI: application to an air quality monitoring network. J Process Control 2006;16:625–34.
[31] Fan J, Qin S, Wang Y. Online monitoring of nonlinear multivariate industrial processes using filtering KICA-PCA. Control Eng Pract 2014;22:205–16.
[32] Shao R, Hu W, Wang Y, Qi X. The fault feature extraction and classification of gear using principal component analysis and kernel principal component analysis based on the wavelet packet transform. Measurement 2014;54:118–32.
[33] Kim H, Melhem H. Damage detection of structures by wavelet analysis. Eng Struct 2004;26(3):347–62.
[34] Donoho DL. De-noising by soft-thresholding. IEEE Trans Inf Theory 1995;41(3):613–27.
[35] Mallat S. A theory for multiresolution signal decomposition: the wavelet representation. IEEE Trans Pattern Anal Mach Intell 1989;11(7):674–93.
[36] Li N, Zhou R, Hu Q, Liu X. Mechanical fault diagnosis based on redundant second generation wavelet packet transform, neighborhood rough set and support vector machine. Mech Syst Signal Process 2012;28:608–21.
[37] del Valle Y, Venayagamoorthy G, Mohagheghi S, Hernandez J, Harley R. Particle swarm optimization: basic concepts, variants and applications in power systems. IEEE Trans Evol Comput 2008;12:171–95.
[38] Kennedy J. Particle swarm optimization. In: Encyclopedia of machine learning. US: Springer; 2010. p. 760–6.
[39] Kennedy J, Eberhart RC. A discrete binary version of the particle swarm optimizer. In: IEEE conference on computational cybernetics and simulation, vol. 5, 1997. p. 4104–8.
[40] Laskari EC, Parsopoulos KE, Vrahatis MN. Particle swarm optimization for integer programming. In: IEEE congress on evolutionary computation, 2002. p. 1582–7.
[41] Liao CJ, Tseng CT, Luarn P. A discrete version of particle swarm optimization for flowshop scheduling problems. Comput Oper Res 2007;34(10):3099–111.
[42] Datta D, Figueira JR. A real-integer-discrete-coded particle swarm optimization for design problems. Appl Soft Comput 2011;11(4):3625–33.

