
Article
Fault Diagnosis Based on Chemical Sensor Data with
an Active Deep Neural Network
Peng Jiang 1,*, Zhixin Hu 1, Jun Liu 2, Shanen Yu 1 and Feng Wu 1
1 College of Automation, Hangzhou Dianzi University, 310018 Hangzhou, China; hduhzx@163.com (Z.H.); shanen_yu@hdu.edu.cn (S.Y.); fengwu@hdu.edu.cn (F.W.)
2 State Key Laboratory of Industrial Control Technology, Institute of Industrial Process Control, Zhejiang University, 310027 Hangzhou, China; liujun@163.com
* Correspondence: pjiang@hdu.edu.cn; Tel.: +86-571-8691-9131 (ext. 512); Fax: +86-571-8687-8566

Academic Editor: Leonhard M. Reindl


Received: 15 August 2016; Accepted: 6 October 2016; Published: 13 October 2016

Abstract: Big sensor data provide significant potential for chemical fault diagnosis, which involves the baseline values of security, stability and reliability in chemical processes. A deep neural network (DNN) with novel active learning for chemical fault diagnosis is presented in this study. The method uses a large amount of chemical sensor data and combines deep learning with an active learning criterion to target the difficulty of consecutive fault diagnosis. A DNN with a deep architecture, instead of a shallow one, can be developed through deep learning to learn a suitable feature representation from raw sensor data in an unsupervised manner using a stacked denoising auto-encoder (SDAE), working through a layer-by-layer successive learning process. The learned features are fed to the top Softmax regression layer to construct the discriminative fault characteristics for diagnosis in a supervised manner. Considering the expensive and time-consuming labeling of sensor data in chemical applications, in contrast to the available methods, we employ a novel active learning criterion suited to the particularity of chemical processes, which combines the Best vs. Second Best criterion (BvSB) and a Lowest False Positive criterion (LFP), for further fine-tuning of the diagnosis model in an active manner rather than a passive one. That is, we allow the model to rank the most informative sensor data to be labeled for updating the DNN parameters during the interaction phase. The effectiveness of the proposed method is validated on two well-known industrial datasets. Results indicate that the proposed method obtains superior diagnosis accuracy and provides significant improvement in accuracy and false positive rate with less labeled chemical sensor data by further active learning, compared with existing methods.

Keywords: fault diagnosis; deep learning; deep neural network; active learning; big sensor data

1. Introduction
Chemical industries have always been concerned with methods for reducing the risk of accidents, because accidents commonly occur in extreme environments, such as extraordinarily high temperature or pressure, which may result in public damage and large economic losses [1]. Accordingly, the chemical
industry is highly supervised, where effective fault diagnosis provides the baseline values of security,
stability and reliability, since fault diagnosis has been addressed as one of the best methods to prevent
industry accidents [2,3]. Modern chemical processes have become more complex with the development of science and technology, and large amounts of data are being produced. These data can be analyzed to learn whether a fault has occurred in chemical processes, which indicates significant potential for chemical fault diagnosis. The advancement of sensor technology has lessened the difficulties in the acquisition of data [4], namely, chemical sensor data, which implies the necessity of an effective


fault diagnosis method to monitor the entire process and detect the fault in time by mining potential
information from the large amounts of sensor data collected.
Three types of methods are currently used in fault diagnosis from a data-processing perspective,
namely model-based, signal-based and knowledge-based methods [5]. Model-based methods estimate
the output of the system by constructing a model and achieving fault diagnosis through the
residual between estimates and measurements. Methods of this type, such as parameter estimation
and parity space methods, provide in-depth analysis for the dynamic of systems [6,7]. Given the
complexity of modern chemical systems, explicitly representing the real chemical process with
a precise mathematical model is complicated. Signal-based methods are based on the analysis of
output signals, which address the problem of complex modeling. Typical signal analysis techniques,
including fast Fourier transform, spectral estimation and wavelet transform, are employed in this
type of methods [5]. Signal-based methods require thorough analysis and a priori knowledge of the fault mechanism. In addition, manually extracted features are limited in terms of application, that is, they are only suitable for specific diagnosis issues, thus limiting their application in complex chemical systems [8].
As a principal part of artificial intelligence techniques, machine learning techniques, which are
the main part of knowledge-based methods, have shown significant potential in fault diagnosis.
This method is also called the intelligent fault diagnosis method, where artificial intelligence techniques are combined [9]. It attempts to acquire underlying knowledge from large amounts of empirical data through model learning and is more desirable than other methods [8]. As representatives, the artificial neural network (ANN), support vector machine (SVM), and multi-layer perceptron (MLP) have been applied successfully in the field of fault diagnosis [10–13]. This type of method commonly
combines signal processing techniques for initial feature extraction from signals. Amar et al. [14]
extracted an image feature of the vibration spectrum and employed ANN to detect faults. Bin et al. [15]
presented a method that provides wavelet packets-empirical mode decomposition for characteristics
extraction and MLP network for fault classification. Luis et al. [16] achieved fault detection in the
petroleum industry using one-class SVM. The representative features from signal processing and
adaptive learning capability of machine learning algorithm can provide significant accurate results in
detecting or even discriminating the latent faults. However, these are purely supervised learning approaches, which neglect large amounts of unlabeled sensor data and perform unsatisfactorily when labeled sensor data are insufficient, which is often the case in chemical systems. Furthermore, two weaknesses must be improved for better diagnosis:

(a) The poor ability of learning complex nonlinear relationships because of such a shallow
architecture [17];
(b) The features are only extracted based on signal-based techniques, and the model performance strongly depends on expert and a priori knowledge, which has significant limitations in terms of applicability.

Deep learning is a type of semi-supervised learning that holds significant strengths in overcoming
the aforementioned weaknesses in current knowledge-based methods through multiple non-linear
transformations and approximate non-linear functions for various diagnosis issues. Deep learning
demonstrates significant ability in data expression compared to shallow learning and is able to learn
the feature of input patterns adaptively for intelligent fault diagnosis rather than just depending on
manual extraction. Hinton et al. [18] proposed an unsupervised learning algorithm that trains the deep belief network in a greedy layer-by-layer manner. The employment of this deep learning training algorithm solves the training problem of the deep neural network (DNN), which can easily fail catastrophically [17,19]. This development moves the deep learning technique onto a new platform and motivates significant performance in image and face recognition and natural language processing [20–22]. Deep learning has also shown significant merit in a few current studies on fault diagnosis using large amounts of data, mainly in machinery [9,23–25]. However, the preponderance of deep
learning also faces the practical problem of how to select the collected sensor data to be labeled; that is, further improvement in the effective utilization of labeled sensor data is still needed, since a model needs a priori knowledge of the data itself to perform better, whereas labeling collected sensor data can be expensive and difficult in chemical industry systems. Several studies have shown that labeling requires a large number of experts and consumes more than 10-fold the time of obtaining the data [26].
The objective of active learning is to learn a function that improves the model while requiring
as little sensor data labeling as possible. Active learning has been investigated for many real world
problems, such as image classification [27,28], biomedicine [29,30], and system monitoring [31,32], for which comprehensive surveys have been presented. However, the combination of deep learning and active learning, which integrates the merits of feature representation and data efficiency, has not been employed in current chemical fault diagnosis research. It is desirable to employ active learning to rank the most informative sensor data to be labeled and deep learning to fine-tune the model in an active manner, thus minimizing the number of training samples necessary to optimize the discrimination capabilities of the fault diagnosis model as far as possible.
This study presents a novel DNN with active learning using a large amount of chemical sensor data for chemical fault diagnosis. It is a combination of deep learning and a novel active learning criterion that targets the difficulty of consecutive fault diagnosis in chemical systems. The initial DNN model employs a deep architecture with a stacked denoising auto-encoder (SDAE) and works through a hierarchical successive learning process where the deep learning technique is applied for the feature representation of diagnosis sensor data. The learned features are fed to the top Softmax regression layer to construct the discriminative fault characteristics for diagnosis in a supervised manner. For further fine-tuning of the DNN, in contrast to the available methods, we select the most useful samples for labeling by combining the proposed Best vs. Second Best (BvSB) and Lowest False Positive (LFP) criteria, a novel active learning criterion targeting chemical sensor data that improves the DNN actively and yields better improvement in accuracy and false positive rate than existing methods.
Compared with the existing related methods, the contributions of the proposed method are
summarized as follows:

(1) Deep learning is able to learn the features of diagnosis sensor data adaptively for intelligent fault diagnosis rather than merely relying on manual extraction.
(2) The method performed excellently in obtaining the potential information and fault characteristics
of raw sensor data by multiple non-linear transformations and approximate non-linear functions
and presented higher diagnosis accuracy than methods based on shallow architecture. Therefore,
the proposed method is a preferred approach for diagnosis in complex chemical systems.
(3) The combination of deep learning and active learning is proposed in the chemical fault diagnosis,
which improves the existing diagnosis methods significantly. Compared with available active
learning methods, a novel active learning criterion combined with BvSB and LFP is presented,
which is an active labeling method for the cost-effective selection of chemical sensor data to be
labeled and achieves the selection of the most valuable samples for inducing the DNN in chemical
fault diagnosis, thus improving the model performance maximally.

The remainder of this paper is organized as follows: The applicability analysis and preparations
related to the proposed method are formally introduced in Section 2; detailed descriptions of the
proposed method are presented in Section 3; the simulation evaluation is provided in Section 4; and the
conclusions are presented in Section 5.

2. Applicability Analysis and Model Preparation


We first present the merit and applicability of deep learning with chemical sensor data for fault
diagnosis in complex chemical processes in this section. Subsequently, we present several details about the model preparation by giving an overview of the sparse auto-encoder and the method of data preprocessing involved in the construction of the DNN.

2.1. Advantage of Deep Learning with Chemical Sensor Data


The aim of machine learning is to learn knowledge from data for application through specific
algorithms. Mining the discriminative feature concealed in the data is a prerequisite, that is, an abstract
concept that is provided for classification or recognition. Feature selection is a major part of machine
learning that requires a considerable investment of resources in particular areas, including fault
diagnosis [9]. Deep learning is a method that can adaptively mine the feature from raw sensor data,
namely, transforming original sensor data into a highly abstract expression through the stack of some
nonlinear models. With adequate transformation, deep learning attempts to find the internal structure
of input and potential relationship between variables [33].
The majority of traditional models, such as SVM, MLP, and radial basis function (RBF),
are considered as shallow architectures that have less than three layers of computation units [23].
Recent theories have shown the difficulty of maintaining representational ability when reducing the algorithm structure; that is, networks with an inadequate depth of layers are deficient in representation and impose limitations on several learning tasks [17,34]. Deep learning constructs
a deep model through the simulation of the learning process of the human brain. In the field of chemical
fault diagnosis, deep learning obtains potential information and the fault characteristics of original
input of chemical sensor data through multiple non-linear transformations and approximate non-linear
functions with a small error that is suited to various diagnosis issues [9]. In particular, the application of the greedy layer-by-layer training algorithm solved the training problems of deep hierarchical structures, which may easily become stuck in poor local optima [18]. This development improved the performance of feature extraction and identification of health conditions, which led to the applicability of deep learning in chemical fault diagnosis. Chemical systems are commonly composed of different kinds of sub-systems that involve multidimensional heterogeneous sensory signals and highly nonlinear correlations between diagnosis sensor data and the results. Furthermore, some biochemical reactions vary greatly under various configurations of constituents. Accordingly, the combination of deep learning and
chemical fault diagnosis is applicable and significant.
Most of the collected chemical sensor data are unlabeled and are ignored in traditional supervised models like ANN and SVM. Furthermore, the number of quality sensor data is much less than that of the process data because of the disparity in sampling rates [17]. Therefore, only a small number of chemical sensor data can be considered, and the rest of the process samples, which contain abundant information, are ignored. Deep learning is a semi-supervised learning approach that applies these sensor data to unsupervised feature extraction. The chemical sensor data abandoned by previous methods can be used for unsupervised pre-training to extract explicit latent variables. It is therefore plausible that more significant diagnosis models may be attained as more sensor data are used.
Accordingly, attempting to apply the deep learning technique to chemical fault diagnosis is worth
the effort.

2.2. Sparse Auto-Encoder


Auto-encoder is a symmetrical neural network that learns the feature of an input in
an unsupervised manner. The auto-encoder is composed of the input layer, hidden layer and output
layer [35]. The basic structure of an auto-encoder is shown in Figure 1. The hidden layer encodes the
input, whereas the output layer reconstructs the input by minimizing the reconstruction errors to
obtain the best expression of data.
From the measured chemical sensor data for diagnosis, the unlabeled sensor dataset can be represented as $X = \{x_1, x_2, x_3, \dots, x_n\}$, $x_i \in \mathbb{R}^{D_i}$, where $n$ is the number of samples for each diagnosis input. In the phase of the encoder network, the encoder transforms the input vector $x_i$ to a hidden representation $h$ by an encoding function denoted by $f_\theta$:

$$h(x_i) = f_\theta(x_i) = f(W_1 x_i + b_1) \quad (1)$$

where $f$ is a nonlinear activation function and $\theta = \{W_1, b_1\}$. $W_1 \in \mathbb{R}^{D_i \times D_h}$ is the weight matrix of the encoder and $b_1 \in \mathbb{R}^{D_h}$ is the bias. The numbers of units in the input and hidden layers are denoted by $D_i$ and $D_h$, respectively.
In the phase of the decoder network, a reconstruction function denoted by $g_{\theta'}$ maps $h$ back into the input, namely producing a reconstruction $z$:

$$z_i = g_{\theta'}(h(x_i)) = W_2 h(x_i) + b_2 \quad (2)$$

where $\theta' = \{W_2, b_2\}$, and $W_2 \in \mathbb{R}^{D_h \times D_i}$ and $b_2 \in \mathbb{R}^{D_i}$ are the weight matrix and bias of the decoder, respectively.
From the model learning perspective, the weights of the encoder and decoder are learned simultaneously by reconstructing the original sensor data as closely as possible, that is, by minimizing the loss function that denotes the discrepancy between the input signal $x$ and the reconstruction $z$. For a dataset $\{(x_1, y_1), \dots, (x_n, y_n)\}$, where $y_i = x_i$ in the auto-encoder, the loss function can be represented as:

$$J(W,b) = \frac{1}{n}\sum_{i=1}^{n}\left(\frac{1}{2}\|z_i - y_i\|^2\right) + \frac{\lambda}{2}\sum_{l=1}^{m_l-1}\sum_{i=1}^{s_l}\sum_{j=1}^{s_{l+1}}\left(W_{ij}^{(l)}\right)^2 \quad (3)$$

where the first term is the mean square error, the second term is the regularization term that reduces the range of the weights and prevents overfitting, and $\lambda$ is the weight of the regularization term.
The sparse penalty term is added to the auto-encoder to obtain the complex nonlinear relationship between features, such that the learned features satisfy a sparse constraint that captures the most significant factors of the input patterns [36]. We minimize the loss function with a sparsity term to achieve the sparse representation, as follows:

$$J_{sparse}(W,b) = J(W,b) + \beta \sum_{j=1}^{s_2} KL(\rho \| \hat{\rho}_j) \quad (4)$$

where $\beta$ is the weight of the sparsity term; $s_2$ is the number of units in the hidden layer; $\rho$ is a sparsity parameter and is typically a small value close to zero; and $\hat{\rho}_j$ is the average output of the $j$-th unit in the hidden layer. $KL(\cdot)$ is the Kullback–Leibler divergence (KL divergence), which denotes the relative entropy between $\rho$ and $\hat{\rho}_j$ [37]. $KL(\cdot)$ is a convex function with the property $KL(\cdot) = 0$ if $\hat{\rho}_j = \rho$, and it increases monotonically as $\hat{\rho}_j$ diverges from $\rho$. $KL(\cdot)$ acts as the sparsity constraint on the coding and is expressed as:

$$KL(\rho \| \hat{\rho}_j) = \rho \log \frac{\rho}{\hat{\rho}_j} + (1-\rho) \log \frac{1-\rho}{1-\hat{\rho}_j} \quad (5)$$
The optimal weights $W$ and biases $b$ of the sparse auto-encoder can be solved by minimizing the loss function with the sparsity constraint. The optimization process can be realized using the back-propagation (BP) algorithm [38]. For a single auto-encoder, the output of the hidden layer determines the potential expression of the input sensor data, but the shallow architecture significantly limits its ability of representation. The stacked auto-encoder attempts to mine the characteristics of the input patterns in an unsupervised manner through a stack of auto-encoders, which presents a superior effect on feature learning.
Figure 1. Auto-encoder.
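As a concrete reference for Equations (1)–(5), the following minimal NumPy sketch evaluates the sparse auto-encoder objective for one batch. The weight shapes follow the definitions above; the values of lam (λ), beta (β) and rho (ρ) are illustrative assumptions, not the settings used in this paper.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def kl_divergence(rho, rho_hat):
    # Equation (5): relative entropy between target sparsity rho and mean activation rho_hat
    return rho * np.log(rho / rho_hat) + (1 - rho) * np.log((1 - rho) / (1 - rho_hat))

def sparse_autoencoder_loss(X, W1, b1, W2, b2, lam=1e-4, beta=3.0, rho=0.05):
    """J_sparse(W, b) of Equations (3) and (4) for a batch X of shape (n, D_i)."""
    n = X.shape[0]
    H = sigmoid(X @ W1 + b1)                               # encoder, Equation (1)
    Z = H @ W2 + b2                                        # linear decoder, Equation (2)
    recon = np.sum((Z - X) ** 2) / (2 * n)                 # mean squared reconstruction error
    weight_decay = (lam / 2) * (np.sum(W1 ** 2) + np.sum(W2 ** 2))  # regularization term
    rho_hat = H.mean(axis=0)                               # average activation of each hidden unit
    sparsity = beta * np.sum(kl_divergence(rho, rho_hat))  # sparsity penalty, Equation (4)
    return recon + weight_decay + sparsity

# Example with random data: 100 samples, 20 attributes, 8 hidden units
rng = np.random.default_rng(0)
X = rng.random((100, 20))
W1 = rng.normal(0, 0.1, (20, 8)); b1 = np.zeros(8)
W2 = rng.normal(0, 0.1, (8, 20)); b2 = np.zeros(20)
print(sparse_autoencoder_loss(X, W1, b1, W2, b2))
```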
2.3. Method of Data Preprocessing in Models

Data preprocessing is the foundation for the preeminent performance of models, especially feature normalization, which standardizes the range of values stored in different features [7]. In the field of chemical fault diagnosis, different indices of sensor data possess dimensions, whereas sensor measurements may have different units that lead to diverse scales. For example, revolving speed has rpm as its unit, whereas displacement is measured in mm. Accordingly, the influence of different scales between process variables should be avoided using normalization. We adopt Z-score normalization in this study.

Z-score normalization is a data preprocessing approach based on the mean and standard deviation of the raw data. It has become a traditional approach in the field of fault diagnosis and health monitoring; it has the advantages of not requiring the maximum and minimum of attributes beforehand and of significantly reducing the effect of noise [39]. The details of Z-score normalization are as follows.

For the sensor dataset denoted by matrix $X \in \mathbb{R}^{m \times n}$, $m$ is the number of samples and $n$ is the number of attributes. The Z-score method normalizes $X$ to a dimensionless matrix with zero mean and unit variance. The $i$-th row of the processed data can be computed by:

$$X_i^* = \frac{X_i - \bar{x}}{\sigma}, \quad \text{for } i \in \{1, 2, \dots, m\} \quad (6)$$

where $X_i$ is the $i$-th sample of the sensor dataset $X$, $\bar{x}$ is the mean vector of the original data and $\sigma$ is the standard deviation vector, correspondingly.
3. A Fault Diagnosis Method with Active Deep Network


We propose a chemical fault diagnosis method with an active DNN in this section. The DNN is utilized in this method to excavate potential information from chemical sensor data and is combined with a novel active learning criterion to achieve fault diagnosis in an effective manner. The general framework of the proposed method is indicated in Figure 2. We present the description of the proposed method in the following parts.

Figure 2. An illustration of the proposed method for chemical fault diagnosis.

3.1. Unsupervised Learning Using SDAE

The auto-encoder without any constraint is prone to copying the input to the output directly. The model exhibits poor performance with greater reconstruction error, especially when the difference between the training and test data is predominant. The denoising auto-encoder attempts to make the learned feature representation robust, rather than simply repeating the input, by adding partial corruption to the input pattern, and it can be employed to train the stacked auto-encoder that initializes the deep architecture [24]. SDAE achieves a highly abstract expression of the original chemical sensor data through the stack of multiple denoising auto-encoders. The process can be formulated in more detail as follows:

First, random noise is added to the original input via a random map $\tilde{x} \sim q(\tilde{x}|x)$, and the corrupted input is mapped into the hidden layer:

$$h = f(\tilde{x}_i) = f(W_1 \tilde{x}_i + b_1) \quad (7)$$

The activation function can be represented as:

$$f(a) = 1/(1 + \exp(-a)) \quad (8)$$

The reconstruction of the input pattern is represented in the same manner as in the sparse auto-encoder:

$$z_i = g_{\theta'}(h) = W_2 h + b_2 \quad (9)$$

The reconstruction error is computed from the difference between $x_i$ and $z_i$, and can be minimized by solving the cost function (Equation (4)). The training epoch is repeated until the value of the cost function goes below a pre-set threshold, which is close to zero. By this time, the parameters of the denoising auto-encoder $\{\theta, \theta'\}$ can be obtained. The SDAE is constructed with a stack of multilayer denoising auto-encoders; the weight and bias matrices of the SDAE are expressed by $W_{all}$ and $b_{all}$, respectively. The parameters of the SDAE can be computed by:

$$W_{all}, b_{all} = \underset{\theta_i, \theta_i'}{\arg\min} \sum_i J_{sparse}(W, b) \quad (10)$$

where $i = 1, \dots, n$, and $n$ represents the number of layers of the constructed SDAE.

The SDAE process is shown in Figure 3. The trained parameters of the auto-encoder are used to initialize the parameters of the first hidden layer of the SDAE, and the first hidden layer is trained by the input pattern; the output of the first hidden layer is then used as the input of the second hidden layer. The training steps are repeated in sequence until the final auto-encoder is trained and the ideal data expression is completed. The DNN with SDAE pre-training can be described as follows: for $l \in \{1, \dots, L-1\}$:

$$h_{l+1} = f(W_{l+1} h_l + b_{l+1}) \quad (11)$$

where $h_l$ represents the output vector of the $l$-th layer, $W_l$ is the weight matrix, $b_l$ is the encoder bias vector, $f(x)$ is the activation function used in the encoder, and $h_L$ is the final output features.
Figure 3. Stacked denoising auto-encoder (SDAE).
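The greedy layer-by-layer pre-training described above can be sketched as follows. This is a simplified illustration rather than the authors' implementation: it uses masking noise for q(x̃|x), a linear decoder, plain batch gradient descent, and omits the sparsity penalty of Equation (4); the example layer sizes mirror those reported later in Table 2 for case study 1.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def train_dae(X, n_hidden, noise=0.5, lr=0.1, epochs=100):
    """Train one denoising auto-encoder (Equations (7)-(9)) by batch gradient descent."""
    n, d = X.shape
    W1 = rng.normal(0, 0.01, (d, n_hidden)); b1 = np.zeros(n_hidden)
    W2 = rng.normal(0, 0.01, (n_hidden, d)); b2 = np.zeros(d)
    for _ in range(epochs):
        X_tilde = X * (rng.random(X.shape) > noise)   # masking noise q(x~|x)
        H = sigmoid(X_tilde @ W1 + b1)                # hidden representation, Equation (7)
        Z = H @ W2 + b2                               # reconstruction, Equation (9)
        dZ = (Z - X) / n                              # gradient of (1/2n)||Z - X||^2
        dH = (dZ @ W2.T) * H * (1 - H)
        W2 -= lr * H.T @ dZ;       b2 -= lr * dZ.sum(axis=0)
        W1 -= lr * X_tilde.T @ dH; b1 -= lr * dH.sum(axis=0)
    return W1, b1

def pretrain_sdae(X, layer_sizes):
    """Greedy layer-by-layer pre-training: the output of each trained layer
    becomes the input of the next, yielding W_all and b_all (Equation (10))."""
    params, H = [], X
    for n_hidden in layer_sizes:
        W, b = train_dae(H, n_hidden)
        params.append((W, b))
        H = sigmoid(H @ W + b)                        # forward step, Equation (11)
    return params

# Example: two hidden layers of 200 and 100 units on random 48-dimensional input
params = pretrain_sdae(rng.random((256, 48)), [200, 100])
```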

3.2. Supervised Learning Stage and Fine-Tuning

After the completion of the unsupervised pre-training phase through the greedy layer-by-layer approach, fine-tuning is utilized in the next step of the DNN training by adding a Softmax classifier on top of the network, as shown in Figure 4. For a random layer $l \in \{1, \dots, L-1\}$ in the DNN, the weights of the $l$-th layer are the same as those of the SDAE, and the weights of the top network are initialized randomly. The fine-tuning process is a stage of supervised learning. The outputs of the DNN can be expressed as:

$$h_{L+1} = P(W_{L+1} h_L + b_{L+1}) \quad (12)$$

where $P(\cdot)$ is the predictive function. Finally, the BP algorithm is applied to train the entire deep network and can be described as:

$$\theta^* = \underset{\theta}{\arg\min} \sum_{i=1}^{N} L(F(C_i), T_i) \quad (13)$$

where $\theta$ are the model parameters $\{W_l, b_l\}$, $l \in \{1, \dots, L+1\}$, $L(X, Y)$ is the cost function of the entire network, and $F(C)$ is the compound function of the DNN, with $T$ the objective element:

$$F(C) = P_{L+1}(f_L(\dots f_l(\dots f_1(C)))) \quad (14)$$

More information regarding the training process of the BP algorithm can be found in [38].
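As an illustration of the supervised stage of Equation (12), the sketch below trains only the top Softmax layer on the features produced by the pre-trained network. In full fine-tuning, the BP algorithm of Equation (13) would also propagate gradients into the lower layers; that step is omitted here for brevity.

```python
import numpy as np

def softmax(A):
    E = np.exp(A - A.max(axis=1, keepdims=True))   # numerically stable softmax
    return E / E.sum(axis=1, keepdims=True)

def finetune_softmax(H_L, T, n_classes, lr=0.1, epochs=100):
    """Train the top Softmax layer (Equation (12)) on SDAE features H_L (n x d)
    with integer fault labels T (n,), using cross-entropy and gradient descent."""
    rng = np.random.default_rng(0)
    n, d = H_L.shape
    W = rng.normal(0, 0.01, (d, n_classes)); b = np.zeros(n_classes)
    Y = np.eye(n_classes)[T]                       # one-hot targets
    for _ in range(epochs):
        P = softmax(H_L @ W + b)                   # posterior probabilities p(y|x)
        dA = (P - Y) / n                           # gradient of the cross-entropy loss
        W -= lr * H_L.T @ dA
        b -= lr * dA.sum(axis=0)
    return W, b

# Example: 500 labeled feature vectors of dimension 100, 7 fault classes
rng = np.random.default_rng(1)
W, b = finetune_softmax(rng.random((500, 100)), rng.integers(0, 7, 500), n_classes=7)
```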
Figure 4. Deep neural network (DNN) framework.

The supervised learning stage attempts to utilize a small amount of labeled sensor data to reduce the training error further and improve the classification performance of the DNN. The main problem of DNN training, poor local optima that can cause catastrophic failure, is successfully addressed by the combination of the greedy layer-by-layer pre-training and supervised fine-tuning [34]. This semi-supervised learning realizes the effective application of unlabeled sensor data, which occupy a large proportion of chemical big data. The overall training procedure of the DNN is shown in Figure 5.

Figure 5. Procedure of DNN training process.

3.3. Active Learning Procedure for DNN (AL-DNN)


The acquisition of labeled sensor data in the application of chemical fault diagnosis is expensive and time-consuming because of the high complexity of industrial systems and the variety of fault types, which require considerable resources and manpower, such as seeking expert labeling [29,32].
In contrast, unlabeled sensor data can be effectively collected by deploying a sensor network [4,5].
Therefore, exploring a method to productively use the labeled sensor data in the field of chemical fault diagnosis is particularly crucial. This study presents an active learning method, which is applied to the SDAE-based DNN for further fine-tuning. The proposed active learning method is a new active labeling method for the cost-effective selection of collected sensor data to be labeled and improves the model performance maximally.
The labeled sensor dataset to be diagnosed is denoted by $L = \{x_1, x_2, \dots, x_n\}$ and the labels by $Y = \{y_1, y_2, \dots, y_n\}$, where $x_i$ is the sample and $y_i$ is the label of the $i$-th sample. The unlabeled sensor dataset can be expressed by $U = \{x_{n+1}, x_{n+2}, \dots, x_{n+m}\}$. We assume that each sample is independent and identically distributed and that the label of each sample is decided by a conditional distribution $P(y|x)$.
First, the DNN is trained with the unlabeled sensor dataset $U$ and the initial labeled sensor dataset $L$. Active learning allows us to rank the sensor dataset $D_{test} = \{x_{s1}, x_{s2}, \dots, x_{sk}\} \subset U$ for expert labeling, thus minimizing the number of training samples necessary to maintain the discrimination capabilities as high as possible. The most ambiguous chemical sensor data are provided to the expert for labeling and then used to retrain the classifier.
The posterior probabilities $p(y_i|x_i)$ of associating a sample with a given class were selected as the measure of model uncertainty in our study. This work combines two different selection criteria to select the most relevant samples during each iteration:
Best vs. Second Best criterion (BvSB): the samples having the lowest difference between the two highest posterior probabilities.

$$x_{BvSB} = \underset{x}{\arg\min}\left(p(y_1|x) - p(y_2|x)\right) \quad (15)$$

where $p(y_1|x)$ and $p(y_2|x)$ are the two highest posterior probabilities corresponding to the DNN outputs.
Lowest False Positive criterion (LFP): among the samples whose model output is an exception (fault) class, the samples that have the lowest difference in posterior probabilities between this exception class and the corresponding normal class.

$$x_{LFP} = \underset{x}{\arg\min}\left(\max(p(y_F|x)) - p(y_N|x)\right) \quad (16)$$

where $p(y_F|x)$ is the posterior probability of the exception class given by the model outputs and $p(y_N|x)$ is the posterior probability corresponding to the normal class.
The $k$ samples selected for final fine-tuning by the integration of the two criteria can be described as:

$$X_k = x_{BvSB} \cup x_{LFP} \quad (17)$$

For the BvSB criterion, the difference between the two highest posterior probabilities indicates how ambiguous a sample is and how uncertain the classifier is about it. The classifier confidence is low when the two highest values are close. In the second criterion, the selected samples are the most prone to false positives, which result in missed faults. It is worth noting that running with an undetected failure for a long time results in severe damage in industrial systems [3]. Thus, the damage of a false positive in chemical fault diagnosis is far greater than that of other misjudgments, such as false alarms.
The following Algorithm 1 provides the main steps of the proposed method called AL-DNN.

Algorithm 1: AL-DNN
Input:
  Labeled sensor dataset: L = {x1, x2, ..., xn}
  Unlabeled sensor dataset: U = {xn+1, xn+2, ..., xn+m}
  Parameters of DNN: the number of hidden layers n; learning rate
  Iterations: ITER
  Number of chosen samples at each iteration: k
Output:
  The result of fault diagnosis
Main steps:
1. Data preprocessing: data normalization by (Equation (6))
2. Pre-training (unsupervised): use all samples in U to train an SDAE layer-by-layer; compute the weights of the SDAE by minimizing the cost function (Equation (10)): Wall, ball
3. Use Wall and ball to initialize the DNN; use the labeled sensor dataset L to fine-tune the DNN
4. Obtain the overall parameters of the trained DNN
5. Classify all unlabeled samples in U using the constructed DNN: f(U) = Test(C, U)
6. Initialization: Dtest = ∅
7. Active learning stage:
   For iteration = 1 : ITER
     Compute the posterior probabilities of all unlabeled samples in U: post
     Select the samples having the lowest difference between the two highest values in post by (Equation (15)): xBvSB = select_BvSB(U)
     Select the samples that will most probably show a false positive using (Equation (16)): xLFP = select_LFP(U)
     Obtain the k chosen samples Xk = {xs1, xs2, ..., xsk} from U by the combination of the two criteria: Xk = xBvSB ∪ xLFP
     Ask an expert to label the sensor dataset Xk: label(Xk)
     Update the unlabeled sensor dataset: U ← U \ Xk
     Augment the active training set: Dtest ← Dtest ∪ Xk
     Update the weights of the DNN by fine-tuning using Dtest
   End
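A compact sketch of the sample-selection step of Algorithm 1, combining the BvSB and LFP criteria of Equations (15)–(17) on a matrix of posterior probabilities. The index of the normal class and the even split of the budget k between the two criteria are assumptions made for illustration.

```python
import numpy as np

def select_active_samples(post, normal_class=0, k=50):
    """Return indices of the k most informative unlabeled samples.

    post: posterior probability matrix of the unlabeled pool, shape (m, n_classes).
    """
    # BvSB, Equation (15): smallest gap between the two highest posteriors
    top2 = np.sort(post, axis=1)[:, -2:]
    bvsb_margin = top2[:, 1] - top2[:, 0]
    x_bvsb = np.argsort(bvsb_margin)[: k // 2]

    # LFP, Equation (16): among samples predicted as a fault (exception) class,
    # smallest gap between that fault posterior and the normal-class posterior
    pred = post.argmax(axis=1)
    fault_idx = np.where(pred != normal_class)[0]
    lfp_margin = post[fault_idx, pred[fault_idx]] - post[fault_idx, normal_class]
    x_lfp = fault_idx[np.argsort(lfp_margin)[: k // 2]]

    # Equation (17): X_k = x_BvSB U x_LFP, then sent to the expert for labeling
    return np.union1d(x_bvsb, x_lfp)
```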

4. Experimental Study
This section presents the development of chemical fault diagnosis based on the proposed method in two diagnosis cases. The University of California Irvine (UCI) dataset is employed to evaluate the capability and superiority of the proposed framework compared with other frameworks from a general viewpoint, and we further validate the method on the Tennessee Eastman (TE) dataset for a chemical process. Its performance is compared with other related methods.

4.1. Data Description

4.1.1. Case Study 1: UCI Dataset for Sensorless Drive Diagnosis

The experimental data used here are from the UCI machine learning repository, and are provided by a real sensor signal. We employed the proposed method to validate the superiority of the active DNN framework in model performance. In the dataset, features are extracted from electric current drive signals by empirical mode decomposition (EMD). The drive has intact and defective components, which results in 11 different classes with different conditions. Each condition was measured several times under 12 different operating conditions, that is, with different speeds, load moments and load forces. The current signals are measured with a current probe and an oscilloscope on two phases. Six classes of the current conditions are used in this study to test the performance of the proposed method. Type A corresponds to the normal condition, whereas B–F correspond to the different fault types caused by defective components. A total of 5319 samples for each health condition were used. Twenty thousand samples were selected as unlabeled data and used in pre-training, whereas 500 samples were selected as initial labeled data for fine-tuning. The other samples were selected as test data.
4.1.2. Case Study 2: TE Dataset for the Tennessee Eastman Process (TEP)

TEP is a benchmark simulation model that tests the fault diagnosis approaches of process control in real chemical processes [40]. TEP has five major units: a reactor, a compressor, a separator, a stripper and a condenser. TEP involves a large amount of chemical sensor data that can support fault diagnosis. Many variables have strong correlations and coupling between each other (including 41 measurement variables and 11 effective control variables). Therefore, TEP is a highly complex nonlinear process involving multidimensional heterogeneous sensory signals and highly nonlinear correlations between processing variables. The flow diagram of the process is shown in Figure 6. The experimental dataset is generated by the TEP simulation model, and 21 types of faults can be simulated. The simulation times of the training and the test sets are 24 h and 48 h, and the faults appear after 1 h and 8 h, respectively. In this study, six classes of known faults and an unknown fault (IDV 1–7 of the dataset) are used for the experiment, labeled 1–7, respectively. Thus, the dataset includes 8820 samples; 100 samples were selected as the initial labeled data for fine-tuning and 2000 samples for testing. The other samples were selected as unlabeled data for pre-training. The specific description of each fault in the experiment is indicated in Table 1.

Figure 6. Flow diagram of the Tennessee Eastman process (TEP).

Table 1. Description of fault.

Description                                    Type
A/C feed ratio, B composition constant           1
B composition, A/C ratio constant                2
Reactor cooling water inlet temperature          3
Condenser cooling water inlet temperature        4
A feed loss                                      5
C header pressure loss-reduced availability      6
Unknown fault                                    7
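For concreteness, a hypothetical sketch of the TE data partition described above (8820 samples in total, 100 initial labeled samples and 2000 test samples); the random split itself is illustrative and not the authors' protocol.

```python
import numpy as np

rng = np.random.default_rng(0)
idx = rng.permutation(8820)           # all TE samples used in this study
labeled_idx   = idx[:100]             # 100 initial labeled samples for fine-tuning
test_idx      = idx[100:2100]         # 2000 samples for testing
unlabeled_idx = idx[2100:]            # remaining samples for unsupervised pre-training
```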

4.2. Experiment Setup


In this study, the DNN has four layers that are designed using cross validation. The unit number of the input layer is determined by the dimension of the diagnosis input, whereas the unit number of the output layer is determined by the number of fault states. For the two experimental datasets, the numbers of units in the hidden layers are set to {200, 100} and {100, 50}, respectively. The activation function of the DNN is the sigmoid. A sparsity penalty was adopted in the DNN for the two datasets; it is an effective method to prevent overfitting as well as to improve the generalization ability of the DNN. The details of the DNN parameters for the two datasets are shown in Table 2. In addition, the training needs to be repeated several times for stable and reliable results because of the inevitable randomness of the neural network. In the experiment, the results are obtained after repeating the training 10 times.

Table 2. Parameters of DNN.

Parameter DNN for Case Study 1 DNN for Case Study 2


Learning rate 0.1 0.05
Mini-batch 100 50
Momentum 0.9 0.9
Number of epochs 100 100
Coefficient of sparsity penalty 0.05 0.05
Noise level 0.5 0.1
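The hyperparameters of Table 2 can be collected into configuration objects such as the hypothetical dictionaries below; the key names are illustrative only.

```python
# Hypothetical configuration objects mirroring Table 2 (names are illustrative)
dnn_config_case1 = dict(hidden_units=[200, 100], learning_rate=0.10, mini_batch=100,
                        momentum=0.9, epochs=100, sparsity_coeff=0.05, noise_level=0.5)
dnn_config_case2 = dict(hidden_units=[100, 50],  learning_rate=0.05, mini_batch=50,
                        momentum=0.9, epochs=100, sparsity_coeff=0.05, noise_level=0.1)
```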

4.3. Result and Analysis in Different Diagnosis Cases

4.3.1. Result on UCI Dataset


Shallow architectures account for most of the current fault diagnosis methods in the chemical industry. Two widely used fault diagnosis algorithms with shallow architectures, namely, the neural network with a single hidden layer (SNN) and the support vector machine (SVM), are employed with the same active learning criterion for comparison to verify the superiority of the proposed approach. To validate the merit of the deep architecture and the employment of deep learning in the DNN, we apply the above shallow architectures to the same dataset. Meanwhile, the back propagation neural network (BPNN), which shares the same architecture and is trained with the same parameters as the DNN, is included in the comparison to further demonstrate the performance of deep learning in the construction of the DNN, since the BPNN lacks feature learning in the pre-training stage relative to the proposed method. We call the above methods active learning for single neural network (AL-SNN), active learning for support vector machine (AL-SVM), and active learning for back propagation neural network (AL-BPNN), respectively.
Figure 7 shows the diagnostic accuracies of the different methods at the same iteration number of active learning, that is, with the same number of labeled sensor data for training. Ten trials are carried out repeatedly, and we clearly notice that all the accuracies using the proposed method are nearly stable at 95%, while the SNN-, SVM- and BPNN-based methods have lower diagnosis accuracies. This demonstrates the ability of the proposed method to obtain more abundant information than approaches with only a shallow architecture. This advantage becomes more evident in the complex chemical industry, which has numerous variables and highly nonlinear relationships.
Figure 7. Correct diagnosis rate of active learning for deep neural network (AL-DNN), active learning for back propagation neural network (AL-BPNN), active learning for single neural network (AL-SNN), and active learning for support vector machine (AL-SVM).

It is worth mentioning that the performance in the tenth trial using the BPNN-based method was good, while it performed unsatisfactorily overall, presenting large result fluctuations across different trials. In some trials, the BPNN-based method failed catastrophically, as it is a deep architecture with random initial parameters. Taking the seventh and ninth trials as examples, the errors of these trials over different numbers of epochs are indicated in Figure 8a,b. The results show that the proposed method achieves satisfactory results after approximately 100 iterations and updates smoothly into significant solutions in the two trials, whereas the BPNN-based method presents a slower convergence rate. In the seventh trial, the BPNN-based method falls into a local optimal solution where the diagnosis error is approximately 0.3. Neural networks with a deep architecture have a significant capability of distinguishing the highly complex characteristics of industry systems, but not in a stable manner, because random factors in model construction can cause the training to fail catastrophically. The result indicates that the proposed method is more robust and stable, overcomes the training problem of the DNN, and is more effective in achieving better diagnosis results in chemical occasions.

Figure 8. Curves of error of the proposed and AL-BPNN methods: (a) the seventh trial of the dataset; and (b) the ninth trial of the dataset.

We include the results obtained by the random selection and entropy selection criteria in the DNN for comparison, referred to as DNN-random and DNN-entropy, respectively, to highlight the benefit of using the proposed active learning criterion. The number of samples for fine-tuning is 500 initially. We update the weights of the DNN by fine-tuning with 50 new samples obtained by the different selection criteria. Figure 9 shows the behavior of the diagnosis accuracy during each iteration.
The proposed method and DNN-entropy clearly provide better results than DNN-random, which ignores active learning, and the diagnosis performance improves by approximately 2%. This improvement implies the effectiveness and significance of active learning in chemical fault diagnosis. It is worth noting that the proposed method outperforms DNN-entropy with a minor superiority. This shows that the proposed method provides better applicability and adaptability with less sensor data than other active learning criteria in the chemical industry. Furthermore, the comparison of the number of false positive points of the three methods is indicated in Figure 10. The result shows that the proposed method significantly improves the false positive rate to a great extent and presents a better effect as the iterations increase. Such performance in fault detection is always a key concern in system maintenance and monitoring. Even though DNN-entropy is effective in improving model accuracy, its effect on reducing fault omission is limited. In industrial systems, a missed fault means the faulty operation of systems for a long time, where the damage is greater than in other diagnosis states, especially in chemical processes. Accordingly, the proposed method has a significant appeal and potential for fault diagnosis and system monitoring in the chemical industry.
Figure 9. Classification accuracy during each iteration.
Figure 10. The number of false-positive points during each iteration.
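For illustration, the selection step behind these comparisons can be sketched in a few lines of
Python. This is a minimal sketch, not the paper's exact formulation: the way the BvSB term and an
LFP-style term are combined, the weighting alpha, and the function names are all assumptions made
only to show the shape of such a criterion.

    import numpy as np

    def bvsb_margin(probs):
        """Best-vs-Second-Best margin: the gap between the two largest
        class posteriors; a small margin marks an ambiguous sample."""
        s = np.sort(probs, axis=1)
        return s[:, -1] - s[:, -2]

    def rank_queries(probs, normal_class=0, alpha=0.5, k=50):
        """Rank unlabeled samples for labeling by a combined score.

        One plausible reading of a BvSB + LFP combination: the BvSB term
        prefers ambiguous samples, while the second term prefers samples
        the model confidently calls 'normal', whose mislabeling would mean
        a missed fault. The weighting alpha is illustrative only.
        """
        score = alpha * bvsb_margin(probs) - (1.0 - alpha) * probs[:, normal_class]
        return np.argsort(score)[:k]      # the k lowest scores are queried first

    # Toy usage: posteriors for 1000 unlabeled samples over 8 classes.
    rng = np.random.default_rng(0)
    probs = rng.dirichlet(np.ones(8), size=1000)
    query_idx = rank_queries(probs, normal_class=0, k=50)

The queried indices would then be labeled by the expert and appended to the fine-tuning set before
the next iteration.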
Figure 11 shows the ROC (receiver operating characteristic) curve comparison on different labeled
sensor data to further indicate the superiority of the proposed method in fault diagnosis. The ROC
curve reflects the relationship between the true positive rate and the false positive rate at different
thresholds. The curves show that the proposed method has a larger area under the curve than
DNN-random and DNN-entropy. Consequently, the proposed method provides the best
performance through a novel active learning criterion for fault diagnosis. The optimal operating
point on this ROC curve is the threshold with the least error, that is, the one with the fewest false
positives and false negatives.
Figure 11. ROC (receiver operating characteristic) results comparison for the different methods.
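How such a least-error operating point is located can be shown with a short sketch using
scikit-learn's roc_curve; the labels and scores below are synthetic stand-ins, whereas in the
experiments they would come from the test split and the DNN's posterior for the fault class.

    import numpy as np
    from sklearn.metrics import roc_curve, auc

    # Synthetic stand-in data: y_true marks faulty samples, y_score is the
    # model's confidence that a sample is faulty.
    rng = np.random.default_rng(1)
    y_true = rng.integers(0, 2, size=500)
    y_score = np.clip(0.55 * y_true + rng.normal(0.3, 0.2, size=500), 0.0, 1.0)

    fpr, tpr, thresholds = roc_curve(y_true, y_score)
    print("area under the curve:", auc(fpr, tpr))

    # Least-error operating point: minimize false positives plus false
    # negatives, i.e., fpr + (1 - tpr), matching the point described above.
    best = np.argmin(fpr + (1.0 - tpr))
    print("least-error threshold:", thresholds[best])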
4.3.2. Result on TE Dataset
We repeat the above experiments on the TE dataset to further assess the generalizability and
applicability of the proposed method. It is worth mentioning that the methods for comparison are
identical to those in the above experiment. The detailed classification result using the different
diagnosis approaches with 100 labeled data is demonstrated in Figure 12. The proposed method
performs best in ten trials, with nearly 97% average accuracy in fault recognition, compared with
AL-SNN (88.56%), AL-SVM (91.6%), and AL-BPNN (88.9%), indicating that the proposed method
can effectively obtain potential information using deep architecture and feature learning as well as
detect unknown faults. The BPNN-based method falls into a local optimum in the eighth trial,
demonstrating the poor stability and convergence of BPNN caused by the lack of unsupervised
pre-training.
Figure 12. Correct diagnosis rate of AL-DNN, AL-BPNN, AL-SNN, AL-SVM on the TE dataset.
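To make the role of the missing pre-training concrete, the following minimal sketch shows greedy
layer-wise denoising auto-encoder pre-training of the kind an SDAE performs. The hyper-parameters,
sigmoid activations, masking-noise corruption, MSE reconstruction loss and layer sizes are
illustrative assumptions, not the settings used in this study.

    import torch
    import torch.nn as nn

    def pretrain_dae(x, in_dim, hid_dim, noise=0.3, epochs=20, lr=1e-3):
        """One denoising auto-encoder layer: corrupt the input with masking
        noise and learn to reconstruct the clean signal (unsupervised)."""
        enc = nn.Sequential(nn.Linear(in_dim, hid_dim), nn.Sigmoid())
        dec = nn.Sequential(nn.Linear(hid_dim, in_dim), nn.Sigmoid())
        opt = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()), lr=lr)
        for _ in range(epochs):
            corrupted = x * (torch.rand_like(x) > noise).float()  # masking noise
            loss = nn.functional.mse_loss(dec(enc(corrupted)), x)
            opt.zero_grad(); loss.backward(); opt.step()
        return enc

    # Greedy stacking: each layer is pre-trained on the previous layer's
    # output; a Softmax regression layer then tops the network for
    # supervised fine-tuning (e.g., via a cross-entropy loss).
    x = torch.rand(1024, 52)                  # e.g., 52 TE process variables
    enc1 = pretrain_dae(x, 52, 200)
    enc2 = pretrain_dae(enc1(x).detach(), 200, 100)
    dnn = nn.Sequential(enc1, enc2, nn.Linear(100, 8))  # logits for 8 classes

A plain BPNN skips the two pretrain_dae calls and trains all weights from random initialization,
which is what makes it prone to the local optimum observed in the eighth trial.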
Figure 13 shows the results with different selection criteria for DNN fine-tuning. The number of
samples for fine-tuning is 60 initially. We update the weights of the DNN by fine-tuning with 20 new
samples obtained by the different selection criteria. The results show that the proposed method
provides better performance than DNN-entropy and DNN-random, although all three are based on
the DNN method. The accuracies of the three methods reach 99.76%, 99.48%, and 98.2%, respectively,
after 10 iterations. The performance improved by approximately 1.5% with the employment of active
learning. As indicated in Figure 14, the proposed method performs best in the inhibition of false
positives, which lead to running under failure in chemical systems. It is worth illustrating that some
fluctuations occurred in the improvement of the false positive rate using the proposed method (in the
second iteration, the false positive rate increases suddenly, which is most probably caused by the
randomness of the neural network). The improvement becomes stable with the increase in iterations,
outperforming the other two methods overall.
Figure 13. Classification accuracy obtained on the TE dataset.
Figure 14. Number of false-positive points obtained on the TE dataset.
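The iterative protocol used here (60 initial labels, 20 queries per iteration, 10 iterations) can be
summarized by the schematic loop below. An off-the-shelf classifier and random arrays stand in for
the pre-trained DNN and the TE measurements, and only the BvSB term of the criterion is shown for
brevity; none of this is the study's actual pipeline.

    import numpy as np
    from sklearn.neural_network import MLPClassifier

    rng = np.random.default_rng(2)
    X = rng.normal(size=(2000, 52))     # stand-in for TE sensor data
    y = rng.integers(0, 8, size=2000)   # stand-in oracle labels (8 fault types)

    def bvsb(p):
        s = np.sort(p, axis=1)
        return s[:, -1] - s[:, -2]

    labeled = list(range(60))           # 60 initially labeled samples
    for it in range(10):                # 10 active iterations, 20 queries each
        clf = MLPClassifier(hidden_layer_sizes=(100, 50), max_iter=300)
        clf.fit(X[labeled], y[labeled])                 # fine-tuning step
        pool = np.setdiff1d(np.arange(len(X)), labeled)
        queries = pool[np.argsort(bvsb(clf.predict_proba(X[pool])))[:20]]
        labeled += queries.tolist()     # the expert labels the queried samples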
Table 3 presents the details of the fault diagnosis on the TE dataset after three iterations with the
different methods mentioned above. The accuracies for the different fault types vary somewhat,
whereas the proposed method performs steadily and is superior to the other methods. The proposed
method also performs better on the unknown type of fault, which can be seen in the diagnosis result
of type 7. The SVM- and SNN-based methods show limitations in the recognition of types 2, 3 and 5.
These case studies verify the information extraction capability of the proposed method when facing
a complex chemical system that involves multidimensional heterogeneous sensory signals and
highly nonlinear relationships between the diagnosis inputs and results.

Table 3. Classification results obtained on the TE dataset.

Fault Type             Type 1    Type 2    Type 3    Type 4    Type 5    Type 6    Type 7
The proposed method    99.09%    98.69%    99.18%    94.78%    100%      100%      96.87%
DNN-entropy            100%      99.26%    99.71%    92.75%    99.69%    98.13%    96.24%
DNN-random             97.98%    99.80%    96.34%    95.32%    100%      96.11%    95.98%
AL-BPNN                95.06%    82.60%    89.86%    93.03%    94.24%    97.57%    91.03%
AL-SNN                 97.26%    77.56%    75.36%    87.10%    84.37%    94.56%    89.26%
AL-SVM                 95.14%    86.94%    90.53%    90.02%    91.57%    95.67%    92.31%

4.4. Discussion
The main contribution of this study is the construction of DNN with active learning to realize
a reliable and effective fault diagnosis with chemical sensor data. The proposed method is a novel
idea that utilizes deep learning with big chemical sensor data for both fault feature extraction and
intelligent fault diagnosis in chemical processes. A novel active learning criterion, in contrast to the
available criteria, achieves the cost-effective selection of the most valuable sensor data to be labeled
for inducing the DNN model, and thereby maximally improves the model performance. We validate
the method on two well-known industrial datasets. The results show that the proposed method
provides significant improvements in diagnosis accuracy and false positive rate compared with the
existing methods.
The proposed method is compared with state-of-the-art methods from two perspectives, namely, the
diagnosis model and the criterion of sample selection. For the diagnosis models, we have taken
two widely used algorithms with shallow architectures for comparison, namely, general ANN-based
and SVM-based methods; both shallow-architecture methods have been applied to the TE dataset,
as can be found in [10,11,41]. Meanwhile, BPNN, which shares the same architecture and is trained
with the same parameters as the DNN, is also added to the comparisons to further demonstrate the
contribution of deep learning to the construction of the DNN. The results above show that the
proposed method obtains significantly better diagnosis accuracy and stability than the methods with
shallow architectures and the BPNN-based method. For the sample selection criterion, we include
the random selection and entropy selection criteria in the DNN for comparison. Random selection
implies abandoning active learning so that only the DNN is employed, while the entropy selection
criterion means applying the entropy criterion with the DNN; the employment of the entropy
criterion can be found in [29]. The results obtained on the two datasets demonstrate that the
proposed method is robust and efficient during the iterative labeling process and performs better
in terms of diagnosis accuracy and the improvement of the false positive rate. Therefore, the deep
learning technique is particularly suitable for chemical fault diagnosis, not only for the field of
pattern recognition, and the combination of the DNN with active learning strengthens the efficiency
and feasibility of fault diagnosis in chemical processes.
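The practical difference between the entropy criterion and a BvSB-based criterion can be seen in
a small numeric sketch: entropy weighs the whole posterior distribution, whereas BvSB looks only
at the two most probable classes, which matters when many fault types are present. The posterior
values below are fabricated solely for illustration.

    import numpy as np

    def entropy_score(p, eps=1e-12):
        """Entropy selection criterion: query the samples whose posterior
        distribution is most uncertain overall, as employed in [29]."""
        return -np.sum(p * np.log(p + eps), axis=1)

    def bvsb_margin(p):
        """BvSB considers only the two most probable classes."""
        s = np.sort(p, axis=1)
        return s[:, -1] - s[:, -2]

    # The second sample spreads its residual mass over many classes, so its
    # entropy exceeds the first sample's, yet its top-two margin (0.50) says
    # the decision between the best candidates is actually clear-cut.
    p = np.array([[0.45, 0.40, 0.05, 0.05, 0.05],
                  [0.60, 0.10, 0.10, 0.10, 0.10]])
    print(entropy_score(p))   # approx. [1.18, 1.23]: entropy prefers sample 2
    print(bvsb_margin(p))     # [0.05, 0.50]: BvSB prefers sample 1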
The number of hidden layers in the DNN has a significant effect on performance, since the feature
learning performed in the hidden layers is the basis of the DNN [9]. We repeat the above experiments
with different configurations of hidden layers to further assess the capability of the proposed method.
We consider the diagnosis results with the number of hidden layers set to 1, 2, 3 and 4 in the two
datasets. Table 4 shows the details of each configuration. The results corresponding to the different
configurations of the hidden layers are indicated in Figure 15a,b. It is worth noting that the active
learning criteria here are the same and the iterations signify the increase of labeled sensor data for
fine-tuning. The configurations with two hidden layers perform best in terms of diagnosis accuracy.
In contrast, the scenarios based on one hidden layer have limitations in terms of the information
extracted, which lead to less accurate
results. The increase in the number of hidden layers results in poorer performance in these case
studies, suggesting a balance between the amount of sensor data and the complexity of the architecture.
1695 between the sensor data and complexity of architectures. 19 of 22

Table 4. The parameters of the hidden layers.

Configuration                      Case Study 1        Case Study 2
The unit number of hidden layers   {100}               {100}
The unit number of hidden layers   {200,100}           {100,50}
The unit number of hidden layers   {200,100,50}        {200,100,50}
The unit number of hidden layers   {300,200,100,50}    {200,100,100,50}
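A minimal sketch of how such configurations translate into network definitions is given below; the
input width, class count and activation choice are illustrative assumptions rather than the study's
actual settings.

    import torch.nn as nn

    def build_dnn(in_dim, hidden, n_classes):
        """Assemble a feed-forward DNN from one of the Table 4 hidden-layer
        configurations; the Softmax is applied to the final logits during
        training, e.g., via a cross-entropy loss."""
        layers, prev = [], in_dim
        for h in hidden:
            layers += [nn.Linear(prev, h), nn.Sigmoid()]
            prev = h
        layers.append(nn.Linear(prev, n_classes))   # top regression layer
        return nn.Sequential(*layers)

    # The four case-study-2 configurations compared in Figure 15b:
    for cfg in [(100,), (100, 50), (200, 100, 50), (200, 100, 100, 50)]:
        net = build_dnn(52, cfg, 8)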

Figure 15. Classification results obtained by AL-DNN with multiple layers: (a) UCI (University of
California Irvine) dataset; and (b) TE dataset.

The criteria of active learning are manifold and have their own advantages in different scenarios,
which requires further summarization for different occasions. Furthermore, the construction of the
DNN is largely dependent on the regulation of its parameters, which relies on experience.
The architecture selection of DNN is still an open problem that will be investigated further in
future work.
5. Conclusions
This study presented a DNN with active learning for chemical fault diagnosis using chemical
sensor data. The deep learning technique is employed as a novel method for the feature
representation of the original sensor data. The DNN model employs a deep structure with multiple
SDAEs and works through a hierarchical successive learning process to construct the diagnosis
models. A novel active learning criterion for the particularity of chemical processes, that is,
a combination of BvSB and LFP, is applied in combination with the DNN for further fine-tuning,
which improves the performance of the model in an active manner and achieves the selection of the
most valuable sensor data for inducing the DNN model during the interaction phase. This approach
shows several desirable properties:
(1) It is able to adaptively mine features from the measured sensor signals or data by multiple
non-linear transformations and approximate non-linear functions that provide more potential
information for various diagnosis issues;
(2) It is an efficient approach for the use of unlabeled sensor data that improves the nature of the
diagnosis model in an unsupervised manner;
(3) It relies on a novel active learning criterion, compared with available methods, to select the most
valuable sensor data and improve the DNN significantly, which requires less labeled sensor data
during the iterative labeling process.
Compared with the state-of-the-art methods on two well-known datasets, the proposed method
is able to achieve fault diagnosis with high performance and utilize labeled sensor data effectively.
Moreover, this method performs excellently, and is especially superior in diagnosis accuracy and the
improvement of the false positive rate with less labeled sensor data. Accordingly, the proposed
method exhibits significant applicability and potential for fault diagnosis in complex chemical
systems that involve multidimensional heterogeneous sensory signals and highly nonlinear
relationships between the original sensor data and diagnosis results, through an effective manner of
data utilization. In future research, we intend to further investigate the combination of the DNN
with parameter optimization techniques.

Acknowledgments: This paper was supported by the National Key Research and Development Program of
China (2016YFC0201400), the National Natural Science Foundation of China (NSFC61273072), the National
Natural Science Foundation of China and Zhejiang Joint Fund for Integrating of Informatization and
Industrialization (U1509217).
Author Contributions: Peng Jiang and Zhixin Hu conceived and designed the research; Peng Jiang and Zhixin Hu
performed the research; Peng Jiang, Zhixin Hu, Jun Liu, Shanen Yu and Feng Wu wrote the paper.
Conflicts of Interest: The authors declare no conflict of interest.

References
1. Yélamos, I.; Escudero, G.; Graells, M.; Puigjaner, L. Performance assessment of a novel fault diagnosis system
based on support vector machines. Comput. Chem. Eng. 2009, 33, 244–255. [CrossRef]
2. Monroy, I.; Benitez, R.; Escudero, G.; Graells, M. A semi-supervised approach to fault diagnosis for chemical
processes. Comput. Chem. Eng. 2010, 34, 631–642. [CrossRef]
3. Tamilselvan, P.; Wang, P. Failure diagnosis using deep belief learning based health state classification.
Reliab. Eng. Syst. Saf. 2013, 115, 124–135. [CrossRef]
4. Kambatla, K.; Kollias, G.; Kumar, V.; Grama, A. Trends in big data analytics. J. Parallel Distrib. Comput. 2014,
74, 2561–2573. [CrossRef]
5. Dai, X.; Gao, Z. From model, signal to knowledge: A data-driven perspective of fault detection and diagnosis.
IEEE Trans. Ind. Inform. 2013, 9, 2226–2238. [CrossRef]
6. Isermann, R. Model-based fault-detection and diagnosis—status and applications. Annu. Rev. Control 2005,
29, 71–85. [CrossRef]
7. Li, F.; Upadhyaya, B.R.; Coffey, L.A. Model-based monitoring and fault diagnosis of fossil power plant
process units using group method of data handling. ISA Trans. 2009, 48, 213–219. [CrossRef] [PubMed]
8. Zhang, L.; Lin, J.; Karim, R. An angle-based subspace anomaly detection approach to high-dimensional data:
With an application to industrial fault detection. Reliab. Eng. Syst. Saf. 2015, 142, 482–497. [CrossRef]
9. Jia, F.; Lei, Y.; Lin, J.; Zhou, X.; Lu, N. Deep neural networks: A promising tool for fault characteristic mining
and intelligent diagnosis of rotating machinery with massive data. Mech. Syst. Signal Process. 2016, 72,
303–315. [CrossRef]
10. Mahadevan, S.; Shah, S.L. Fault detection and diagnosis in process data using one-class support vector
machines. J. Process Control 2009, 19, 1627–1639. [CrossRef]
11. Eslamloueyan, R. Designing a hierarchical neural network based on fuzzy clustering for fault diagnosis of
the Tennessee–Eastman process. Appl. Soft Comput. 2011, 11, 1407–1415. [CrossRef]
12. Zhang, X.; Zhou, J. Multi-fault diagnosis for rolling element bearings based on ensemble empirical mode
decomposition and optimized support vector machines. Mech. Syst. Signal Process. 2013, 41, 127–140.
[CrossRef]
13. Azadeh, A.; Saberi, M.; Kazem, A.; Ebrahimipour, V.; Nourmohammadzadeh, A.; Saberi, Z. A flexible
algorithm for fault diagnosis in a centrifugal pump with corrupted data and noise based on ANN and
support vector machine with hyper-parameters optimization. Appl. Soft Comput. 2013, 13, 1478–1485.
[CrossRef]
14. Amar, M.; Gondal, I.; Wilson, C. Vibration spectrum imaging: A novel bearing fault classification approach.
IEEE Trans. Ind. Electron. 2015, 62, 494–502. [CrossRef]
15. Bin, G.F.; Gao, J.J.; Li, X.J.; Dhillon, B.S. Early fault diagnosis of rotating machinery based on wavelet
packets–empirical mode decomposition feature extraction and neural network. Mech. Syst. Signal Process.
2012, 27, 696–711. [CrossRef]
16. Martí, L.; Sanchez-Pi, N.; Molina, J.M.; Garcia, A.C. Anomaly detection based on sensor data in petroleum
industry applications. Sensors 2015, 15, 2774–2797. [CrossRef] [PubMed]
17. Shang, C.; Yang, F.; Huang, D.; Lyu, W. Data-driven soft sensor development based on deep learning
technique. J. Process Control 2014, 24, 223–233. [CrossRef]
18. Hinton, G.E.; Osindero, S.; Teh, Y.W. A fast learning algorithm for deep belief nets. Neural Comput. 2006, 18,
1527–1554. [CrossRef] [PubMed]
19. Lu, W.; Wang, X.; Yang, C.; Zhang, T. A novel feature extraction method using deep neural network for
rolling bearing fault diagnosis. In Proceedings of the IEEE 27th Chinese Control and Decision Conference
(CCDC), Qingdao, China, 23–25 May 2015.
20. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. Imagenet classification with deep convolutional neural networks.
In Proceedings of the Advances in Neural Information Processing Systems, Lake Tahoe, NV, USA,
3–8 December 2012; pp. 1097–1105.
21. Nair, V.; Hinton, G.E. Rectified linear units improve restricted Boltzmann machines. In Proceedings of the
27th International Conference on Machine Learning (ICML-10), Haifa, Israel, 21–24 June 2010; pp. 807–814.
22. Socher, R.; Bengio, Y.; Manning, C.D. Deep learning for NLP (without magic). In Proceedings of the Tutorial
Abstracts of ACL 2012, Association for Computational Linguistics, Jeju Island, Korea, 8–14 July 2012.
23. Gan, M.; Wang, C. Construction of hierarchical diagnosis network based on deep learning and its application
in the fault pattern recognition of rolling element bearings. Mech. Syst. Signal Process. 2016, 72, 92–104.
[CrossRef]
24. Sun, W.; Shao, S.; Zhao, R.; Yan, R.; Zhang, X.; Chen, X. A sparse auto-encoder-based deep neural network
approach for induction motor faults classification. Measurement 2016, 89, 171–178. [CrossRef]
25. Xie, D.; Bai, L. A hierarchical deep neural network for fault diagnosis on Tennessee–Eastman process.
In Proceedings of the IEEE 14th International Conference on Machine Learning and Applications (ICMLA),
Miami, FL, USA, 9–11 December 2015; pp. 745–748.
26. Zhu, X. Semi-supervised learning literature survey. Comput. Sci. 2008, 37, 63–77.
27. Bordes, A.; Ertekin, S.; Weston, J.; Bottou, L. Fast kernel classifiers with online and active learning. J. Mach.
Learn. Res. 2005, 6, 1579–1619.
28. Tuia, D.; Pasolli, E.; Emery, W.J. Using active learning to adapt remote sensing image classifiers.
Remote Sens. Environ. 2011, 115, 2232–2242. [CrossRef]
29. Al Rahhal, M.M.; Bazi, Y.; AlHichri, H.; Alajlan, N.; Melgani, F.; Yager, R.R. Deep learning approach for
active classification of electrocardiogram signals. Inf. Sci. 2016, 345, 340–354. [CrossRef]
30. Settles, B. Active Learning Literature Survey; University of Wisconsin: Madison, WI, USA, 2010;
Volume 52, p. 11.
31. Bellala, G.; Stanley, J.; Bhavnani, S.K.; Scott, C. A rank-based approach to active diagnosis. IEEE Trans.
Pattern Anal. Mach. Intell. 2013, 35, 2078–2090. [CrossRef] [PubMed]
32. Zhao, X.; Li, M.; Xu, J.; Song, G. An effective procedure exploiting unlabeled data to build monitoring system.
Expert Syst. Appl. 2011, 38, 10199–10204. [CrossRef]
33. Bengio, Y. Learning deep architectures for AI. Found. Trends Mach. Learn. 2009, 2, 1–127. [CrossRef]
34. Najafabadi, M.M.; Villanustre, F.; Khoshgoftaar, T.M.; Seliya, N.; Wald, R.; Muharemagic, E. Deep learning
applications and challenges in big data analytics. J. Big Data 2015, 2, 1. [CrossRef]
35. Vincent, P.; Larochelle, H.; Lajoie, I.; Bengio, Y.; Manzagol, P.A. Stacked denoising autoencoders:
Learning useful representations in a deep network with a local denoising criterion. J. Mach. Learn. Res. 2010,
11, 3371–3408.
36. Larochelle, H.; Bengio, Y.; Louradour, J.; Lamblin, P. Exploring strategies for training deep neural networks.
J. Mach. Learn. Res. 2009, 10, 1–40.
37. Kullback, S.; Leibler, R.A. On information and sufficiency. Ann. Math. Stat. 1951, 22, 79–86. [CrossRef]
38. Rumelhart, D.E.; Hinton, G.E.; Williams, R.J. Learning representations by back-propagating errors.
Cognit. Model. 1988, 5, 1. [CrossRef]
39. Ellingson, B.M.; Zaw, T.; Cloughesy, T.F.; Naeini, K.M.; Lalezari, S.; Mong, S.; Lai, A.; Nghiemphu, P.L.;
Pope, W.B. Comparison between intensity normalization techniques for dynamic susceptibility contrast
(DSC)-MRI estimates of cerebral blood volume (CBV) in human gliomas. J. Magn. Reson. Imaging 2012, 35,
1472–1477. [CrossRef] [PubMed]
40. Downs, J.J.; Vogel, E.F. A plant-wide industrial process control problem. Comput. Chem. Eng. 1993, 17,
245–255. [CrossRef]
41. Ruiz, D.; Nougués, J.; Calderón, Z.; Espuña, A.; Puigjaner, L. Neural network based framework for fault
diagnosis in batch chemical plants. Comput. Chem. Eng. 2000, 24, 777–784. [CrossRef]

© 2016 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access
article distributed under the terms and conditions of the Creative Commons Attribution
(CC-BY) license (http://creativecommons.org/licenses/by/4.0/).
