
MITSUBISHI ELECTRIC RESEARCH LABORATORIES

http://www.merl.com

Deep Learning Algorithms for Bearing Fault Diagnostics – A Review
Zhang, S.; Zhang, S.; Wang, B.; Habetler, T.
TR2019-084 September 05, 2019

Abstract
This paper presents a comprehensive review on applying various deep learning algorithms
to bearing fault diagnostics. Over the last ten years, the emergence and revolution of deep
learning (DL) methods have sparked great interests in both industry and academia. Some of
the most noticeable advantages of DL based models over conventional physics based models
or heuristic based methods are the automatic fault feature extraction and the improved
classifier performance. In addition, a thorough and intuitive comparison study is presented
summarizing the specific DL algorithm structure and its corresponding classifier accuracy for
a number of papers utilizing the same Case Western Reserve University (CWRU) bearing data
set. Finally, to facilitate the transition on applying various DL algorithms to bearing fault
diagnostics, detailed recommendations and suggestions are provided for specific application
conditions such as the setup environment, the data size, and the number of sensors and sensor
types. Future research directions to further enhance the performance of DL algorithms on
healthy monitoring are also presented.

Symposium on Diagnostics for Electric Machines, Power Electronics and Drives (SDEMPED)

This work may not be copied or reproduced in whole or in part for any commercial purpose. Permission to copy in
whole or in part without payment of fee is granted for nonprofit educational and research purposes provided that all
such whole or partial copies include the following: a notice that such copying is by permission of Mitsubishi Electric
Research Laboratories, Inc.; an acknowledgment of the authors and individual contributions to the work; and all
applicable portions of the copyright notice. Copying, reproduction, or republishing for any other purpose shall require
a license with payment of fee to Mitsubishi Electric Research Laboratories, Inc. All rights reserved.

Copyright © Mitsubishi Electric Research Laboratories, Inc., 2019
201 Broadway, Cambridge, Massachusetts 02139
Deep Learning Algorithms for Bearing Fault
Diagnostics – A Review
Shen Zhang∗†, Shibo Zhang‡, Bingnan Wang∗, and Thomas G. Habetler†
∗ Mitsubishi Electric Research Laboratories, 201 Broadway, Cambridge, MA 02139, USA
† School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA 30332, USA
‡ Computer Science, Northwestern University, Evanston, IL 60201, USA

Abstract—This paper presents a comprehensive review on applying various deep learning algorithms to bearing fault diagnostics. Over the last ten years, the emergence and revolution of deep learning (DL) methods have sparked great interests in both industry and academia. Some of the most noticeable advantages of DL based models over conventional physics based models or heuristic based methods are the automatic fault feature extraction and the improved classifier performance. In addition, a thorough and intuitive comparison study is presented summarizing the specific DL algorithm structure and its corresponding classifier accuracy for a number of papers utilizing the same Case Western Reserve University (CWRU) bearing data set. Finally, to facilitate the transition on applying various DL algorithms to bearing fault diagnostics, detailed recommendations and suggestions are provided for specific application conditions such as the setup environment, the data size, and the number of sensors and sensor types. Future research directions to further enhance the performance of DL algorithms on healthy monitoring are also presented.

Index Terms—Bearing fault; diagnostics; machine learning; deep learning; feature extraction

I. INTRODUCTION

Electric machines are widely employed in a variety of industry application processes and electrified transportation systems. However, for certain applications these machines may operate under unfavorable conditions, such as high ambient temperature, high moisture and overload, which can eventually result in motor malfunctions that lead to high maintenance costs, severe financial losses, and safety concerns [1], [2]. The malfunction of electric machines can be generally attributed to various faults of different categories, which include the drive inverter failure, the stator winding insulation breakdown, the bearing fault, and the air gap eccentricity. Several surveys on the likelihood of induction machine failures conducted by the IEEE Industry Application Society (IEEE-IAS) [3] and the Japan Electrical Manufacturers' Association (JEMA) [4] reveal that the bearing fault is the most common fault type, accounting for 30% to 40% of the total faults.

Since the bearing is the most vulnerable component in a motor and drive system, bearing fault detection has been a research frontier for engineers and scientists for the past decade. Specifically, this problem is approached by interpreting a variety of available signals, including vibration [7], acoustic noise [8], stator current [9], thermal imaging [10], and multiple sensor fusion [11]. The existence of a bearing fault as well as its specific fault type can be readily determined by performing frequency spectral analysis on the monitored signals and analyzing their component at the characteristic fault frequency, which can be calculated by a well-defined mechanical model [5] that depends on the motor speed, the bearing geometry and the specific location of a defect.

However, the accuracy of the mainstream bearing fault diagnostics methods based on vibration or acoustic signals can be affected by background noise due to external mechanical excitation motion, while their sensitivity is also subject to change based on sensor mounting positions and spatial constraints in a highly compact design. Therefore, a popular alternative approach for bearing fault detection is accomplished by analyzing the stator current [9], which is measured in most motor drives and thus would not bring extra device or installation costs.

Despite its advantages such as economic savings and simple implementation, stator current signature analysis can encounter many practical issues. For example, the magnitude of the stator current at bearing fault signature frequencies can vary at different loads, different speeds, and different power ratings of the motor itself, thus bringing challenges to identify the threshold stator current values to trigger a fault alarm at an arbitrary operating condition. Therefore, a thorough testing is usually required while the motor is still at the healthy condition, and the healthy data would be collected while the targeted motor is running at different loads and speeds. This process, summarized as "Learning Stage" in [12], is unfortunately very tedious and expensive to perform, and needs to be repeated for motors with different power ratings.

Most of the challenges described above can be attributed to the fact that all of the conventional methods rely solely upon values at fault characteristic frequencies to determine the presence of a bearing fault. However, there may exist some unique patterns or relationships hidden in the data that can potentially reveal a bearing fault, and these special features can be almost impossible for humans to identify in the first place. Therefore, many researchers began applying various machine learning algorithms, i.e., artificial neural networks (ANN), principal component analysis (PCA), support vector machines (SVM), etc., to better parse the data, learn from them, and then apply what they have learned to make intelligent decisions regarding the presence of bearing faults [13]–[16]. Most of the literature applying these ML algorithms reports satisfactory results with accuracy over 90%.
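For reference, the characteristic fault frequencies mentioned above follow directly from the bearing kinematics. The sketch below evaluates the standard textbook expressions (see, e.g., [5]) for a hypothetical geometry; the numerical values are purely illustrative and do not correspond to any dataset discussed in this paper.

```python
import math

def bearing_fault_frequencies(shaft_hz, n_balls, ball_diam, pitch_diam, contact_angle_deg=0.0):
    """Classical characteristic fault frequencies of a rolling-element bearing.

    shaft_hz          : shaft rotation frequency in Hz
    n_balls           : number of rolling elements
    ball_diam         : rolling-element diameter (same unit as pitch_diam)
    pitch_diam        : pitch diameter
    contact_angle_deg : contact angle in degrees
    """
    ratio = (ball_diam / pitch_diam) * math.cos(math.radians(contact_angle_deg))
    return {
        "BPFO": 0.5 * n_balls * shaft_hz * (1.0 - ratio),              # outer-race defect
        "BPFI": 0.5 * n_balls * shaft_hz * (1.0 + ratio),              # inner-race defect
        "BSF":  0.5 * (pitch_diam / ball_diam) * shaft_hz * (1.0 - ratio ** 2),  # ball defect
        "FTF":  0.5 * shaft_hz * (1.0 - ratio),                        # cage defect
    }

if __name__ == "__main__":
    # Hypothetical geometry: 9 balls, 8 mm ball diameter, 40 mm pitch diameter, 30 Hz shaft speed.
    for name, freq in bearing_fault_frequencies(30.0, 9, 8.0, 40.0).items():
        print(f"{name}: {freq:.1f} Hz")
```

These are the frequencies at which conventional spectral methods look for elevated components, and which the DL methods reviewed below do not need to know in advance.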

Fig. 1. Structure of a rolling-element bearing with four types of abnormal scenarios: (a) misalignment (out-of-line), (b) shaft deflection, (c) crooked or tilted outer race and (d) crooked or tilted inner race [5].

To achieve even better performance and higher classification accuracy under versatile operating conditions or noisy conditions, deep learning based methods are becoming increasingly popular to meet this need. Although this literature survey does not include all DL papers on bearing fault diagnostics due to the length limit, it is observed that the number of papers has grown steadily in the last three years, clearly indicating booming interest in employing DL methods for bearing fault diagnostics. In this context, this paper seeks to present a thorough overview of recent research work devoted to applying deep learning techniques to bearing fault diagnostics.

The rest of this paper is structured as follows. In Section II, we give a brief overview of the bearing structure and the CWRU dataset to be used for a comparative study. In Section III, we discuss the advantages of DL approaches compared with conventional machine learning methods, and introduce a variety of DL approaches that have been applied to bearing fault diagnostics. Then in Section IV, a systematic comparison is provided on the classification accuracy of DL algorithms. Based on those observations, in Section V we provide our recommendations for applying DL algorithms to bearing fault diagnostics, and future directions in this field.

II. BEARING STRUCTURE AND FAULT DATASET

The structure of a rolling-element bearing is illustrated in Fig. 1, which contains the outer race, typically mounted on the motor cap, the inner race to hold the motor shaft, the ball or rolling element, and the cage for restraining the relative distance of the rolling elements. Four common scenarios of misalignment are demonstrated in Fig. 1 (a) to (d).

Data is the foundation for all machine learning and artificial intelligence methods. To develop effective DL algorithms for bearing fault detection, a good collection of datasets is necessary. Since the bearing degradation process may take many years, most people conduct experiments and collect data either using bearings with artificially injected faults or with accelerated life testing. A few organizations have made the effort to provide bearing fault datasets for people to work on DL research, e.g., the CWRU dataset, which can serve as a standard for the comparison of different algorithms.
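As a small illustration of how such vibration records are typically turned into training samples for the DL models reviewed in Section III, the sketch below segments a long 1-D recording into fixed-length labeled windows. It is a minimal example under stated assumptions: the signal, its label, the window length, and the split ratio are placeholders, not actual CWRU file conventions.

```python
import numpy as np

def segment_signal(signal, window=1024, stride=512):
    """Split a 1-D vibration recording into overlapping fixed-length windows."""
    signal = np.asarray(signal, dtype=np.float32)
    n_windows = 1 + (len(signal) - window) // stride
    return np.stack([signal[i * stride: i * stride + window] for i in range(n_windows)])

if __name__ == "__main__":
    # Stand-in for a measured drive-end accelerometer record (e.g., one CWRU file).
    raw = np.random.randn(120_000).astype(np.float32)

    x = segment_signal(raw)                  # shape: (n_samples, 1024)
    y = np.full(len(x), 3, dtype=np.int64)   # hypothetical label id for this record's fault class

    # Simple random train/test split; Section IV discusses why the split and its balance matter.
    rng = np.random.default_rng(0)
    idx = rng.permutation(len(x))
    split = int(0.7 * len(x))
    x_train, x_test = x[idx[:split]], x[idx[split:]]
    y_train, y_test = y[idx[:split]], y[idx[split:]]
    print(x_train.shape, x_test.shape)
```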

III. DEEP LEARNING BASED APPROACHES

As a subset of machine learning, deep learning possesses a powerful capability to learn and represent real-world applications with great flexibility, as a nested hierarchy going from convoluted concepts to simpler concepts, with abstract representations computed in terms of perceptual ones. The trend of transitioning from conventional ML methods to deep learning can be attributed to the following reasons.

1) Hardware evolution: Training deep neural networks is extremely computationally intensive, but running on a high performance GPU can significantly accelerate this training process.

2) Algorithm evolution: More techniques and frameworks have been invented and are maturing in terms of controlling the training process of deeper models to achieve faster speed, better convergence, and better generalization.

3) Data explosion: With the availability of more installed sensors that collect an increasing amount of data, and the application of crowdsourced labeling mechanisms such as Amazon mTurk [17], we have seen a surge of large-scale datasets in many domains, such as ImageNet in image recognition, MPI Sintel Flow in image optical flow, and VoxCeleb in speaker identification.

All of the factors above contribute to the new era of deep learning for a variety of data-related applications. Specifically, the advantages of applying deep learning algorithms compared to conventional approaches include:

1) Best-in-class performance: The complexity of the computed function grows exponentially with depth [18]. Deep learning has best-in-class performance that significantly outperforms other solutions on problems across multiple domains, including speech, language, vision, and gaming.

2) Automatic feature extraction: There is no need for feature engineering. Conventional machine learning algorithms usually call for sophisticated manual feature engineering, which unavoidably involves expert domain knowledge and considerable human effort. With a deep network, this step is not required.

3) Transferability: The strong expressive power and high performance of a deep neural network trained in one domain can be easily generalized or transferred to other contexts or settings.

Due to these advantages, we are witnessing an exponential increase in DL applications. One such example is fault diagnostics and health prognostics, and bearing fault identification is a very representative case.

A. Convolutional Neural Network (CNN)

Inspired by the animal visual cortex [19], the convolution operation was first introduced to detect image patterns in a hierarchical way, from simple to complex features. The low layers detect fundamental low-level visual features such as edges and corners, and later layers detect higher-level features.

Fig. 2. Architecture of the CNN-based fault diagnosis model [25].

The first paper employing CNN to identify bearing faults was published in 2016 [20]. For the next three years many papers [21]–[23], [25], [26] emerged on this topic and contributed to its performance advancement in various aspects. The basic architecture of a CNN-based bearing fault classifier is illustrated in Fig. 2. The 1-D temporal raw data obtained from different accelerometers are first stacked into a 2-D matrix form, similar to the representation of images, which is then passed through a convolution layer for feature extraction, followed by a pooling layer for down-sampling. This convolution-pooling combination can be repeated many times to further deepen the network. Finally, the output from the hidden layers is passed along to one or several fully-connected (FC) layers, the result of which serves as the input to a top classifier such as a Softmax or Sigmoid function.
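To make the architecture above concrete, the following is a minimal sketch of such a conv-pool-FC classifier, assuming PyTorch. The channel counts, input matrix size, and number of fault classes are illustrative assumptions and do not correspond to the settings of any specific paper surveyed here.

```python
import torch
import torch.nn as nn

class BearingCNN(nn.Module):
    """Minimal CNN bearing-fault classifier: conv-pool blocks, FC layer, softmax output."""
    def __init__(self, n_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                              # down-sampling
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 8 * 16, 64), nn.ReLU(),
            nn.Linear(64, n_classes),                     # Softmax is applied inside the loss
        )

    def forward(self, x):
        return self.classifier(self.features(x))

# Hypothetical input: stacked 1-D vibration segments forming a 32x64 "image" per sample.
x = torch.randn(8, 1, 32, 64)            # (batch, channel, stacked rows, samples per row)
y = torch.randint(0, 10, (8,))           # fault-class labels
model = BearingCNN()
loss = nn.CrossEntropyLoss()(model(x), y)
loss.backward()
print(loss.item())
```

The rows of the 2-D input can come from multiple sensors or from successive time windows of one sensor, which is exactly the stacking choice the papers below differ on.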
In [20], the vibration data were acquired using two accelerometers, one installed on top of the housing and the other on the back. After denoising and preprocessing, these sets of 1-D time series are stacked row by row to form a 2-D input matrix. The classification results demonstrate that feature-learning based approaches such as CNN can also identify some early-stage faulty conditions without explicit characteristic frequencies, such as lubrication degradation, which cannot be identified by traditional methods.

An adaptive CNN (ADCNN) is applied to the CWRU dataset to dynamically change the learning rate, for a better trade-off between training speed and accuracy, in [21]. The entire fault diagnosis model employs a fault pattern determination component using 1 ADCNN and a fault size evaluation component using 3 ADCNNs, which are 3-layer CNNs with max pooling. Classification results on the test set demonstrate that ADCNN provides a better accuracy compared to conventional shallow CNN and SVM methods, especially for identifying the rolling element (ball) defect. In addition, this proposed ADCNN is also able to predict the fault size (defect width) with a satisfactory accuracy.

Similar to the earlier work [20], [21], a 4-layer CNN structure with 2 convolutional and 2 pooling layers is implemented for bearing fault detection in [22], and its accuracy outperforms the conventional and shallow Softmax regression classifier. In particular, when the vibration signal is mixed with ambient noise, the improvement can be as large as 25%, showcasing the excellent built-in denoising capabilities of the CNN algorithm. A sensor fusion approach is applied in [23], in which both the temporal and spatial information of the CWRU data from two accelerometers at the drive end and fan end are stacked by transforming the 1-D time-series data into a 2-D input matrix and sent to the CNN as input. The average accuracy with two sensors is 99.41%, comparing favorably against 98.35% with one sensor.

Some other variations of CNN are also employed to tackle the bearing fault diagnosis challenge [25], [26] on the CWRU dataset to obtain more desirable characteristics compared to the conventional CNN. For example, a CNN based on LeNet-5 [24] is applied in [25], containing 2 alternating convolutional-pooling layers and 2 fully-connected (FC) layers. This improved CNN architecture is believed to provide better feature extraction capability, as the accuracy on the test set is an astonishing 99.79%, which is better than other DL methods such as the adaptive CNN (98.1%) and the deep belief network (87.45%), while dominating conventional ML methods such as SVM (87.45%) and ANN (67.70%). In addition, a deep fully convolutional network (DFCNN) incorporating 4 convolution-pooling layer pairs is employed in [26], with the raw data transformed into spectrograms by the scaled Fourier transform for easier processing. An accuracy of 99.22% is achieved, outperforming the 94.28% of linear SVM with particle swarm optimization (PSO) and the 91.43% of conventional SVM.

B. Auto-encoders

Fig. 3. Process of training a one hidden layer auto-encoder [31].

The auto-encoder was proposed in the 1980s as an unsupervised pre-training method for ANNs [6], [51]. After evolving for decades, it has become widely adopted as an unsupervised feature learning method and a greedy layer-wise neural network pre-training method. The training process of a one hidden layer auto-encoder is illustrated in Fig. 3. An auto-encoder is trained from an ANN that is composed of two parts: an encoder and a decoder. The output of the encoder is fed into the decoder as input. The ANN takes the mean square error between the original input and the output as the loss function, which aims at imitating the input as the final output. The decoder part is then dropped and only the encoder part is kept; therefore the output of the encoder is the feature representation that can be employed in various classifiers.
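The encoder/decoder split and reconstruction loss described above can be summarized in a few lines. The sketch below assumes PyTorch; the input dimension, code size, and training loop are placeholders rather than the settings of [27]–[30].

```python
import torch
import torch.nn as nn

# Minimal one-hidden-layer auto-encoder mirroring the encoder/decoder split described above.
input_dim, code_dim = 1024, 64
encoder = nn.Sequential(nn.Linear(input_dim, code_dim), nn.Sigmoid())
decoder = nn.Sequential(nn.Linear(code_dim, input_dim))
autoencoder = nn.Sequential(encoder, decoder)

optimizer = torch.optim.Adam(autoencoder.parameters(), lr=1e-3)
mse = nn.MSELoss()

x = torch.randn(256, input_dim)           # stand-in for pre-processed vibration segments/spectra
for _ in range(100):                      # unsupervised pre-training
    optimizer.zero_grad()
    loss = mse(autoencoder(x), x)         # reconstruction error against the original input
    loss.backward()
    optimizer.step()

# The decoder is dropped; the encoder output is the learned feature representation
# that can be fed to any classifier (Softmax, ELM, SVM, ...).
with torch.no_grad():
    features = encoder(x)
print(features.shape)                     # (256, 64)
```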

There are many studies employing auto-encoders in bearing fault diagnosis [27]–[30]. One of the earliest can be found in [27], where a 5-layer auto-encoder is utilized to mine fault characteristics from the frequency spectrum and effectively classify the health condition of machines. The classification accuracy reaches 99.6%, which is significantly higher than the 70% of back-propagation based neural networks (BPNN). In [28], an auto-encoder based extreme learning machine (ELM) is employed, seeking to integrate the automatic feature extraction capability of auto-encoders and the high training speed of ELMs. The average accuracy of 99.83% compares favorably against other conventional ML methods, including wavelet package decomposition based SVM (WPD-SVM) (94.17%), EMD-SVM (82.83%), WPD-ELM (86.75%) and EMD-ELM (81.55%). More importantly, the required training time drops by around 60% to 70% using the same training and test data, thanks to the ELM.

Compared to CNN, the denoising capabilities of original auto-encoders are not prominent. Thus in [29], a stacked denoising autoencoder (SDA) is implemented, which is suitable for deep architecture-based robust feature extraction on signals containing ambient noise in volatile working conditions. To balance between performance and training speed, three hidden layers with 100, 50, and 25 units respectively are employed. The original CWRU bearing data are perturbed by a 15 dB random noise to mimic the noisy condition, and multiple operating condition datasets are used as test sets to examine its fault identification capability under speed and load changes. This method achieves a worst case accuracy of 91.79%, which is 3% to 10% higher compared to a conventional SAE without denoising capability, SVM, and random forest (RF) algorithms. Similar to [29], another form of SDA is utilized in [30] with three hidden layers. Signals of the CWRU dataset are combined with different levels of random noise in the time domain and converted to frequency domain signals. The proposed method has better diagnosis accuracy than deep belief networks (DBN), particularly with the added noise, where an improvement of 7% in diagnosis accuracy is achieved.

C. Deep Belief Network (DBN)

In deep learning, a Deep Belief Network (DBN) is composed of unsupervised networks such as Restricted Boltzmann Machines (RBMs) or autoencoders, where each preceding hidden layer becomes the visible layer for the next, as demonstrated by the boxes of different colors in Fig. 4. Often recognized as an undirected, generative energy-based model, an RBM consists of a "visible" input layer and a hidden layer with connections in between. The composition of RBMs leads to a fast and unsupervised training procedure that starts from the lowest visible layer. The DBN can be trained greedily, one layer at a time, with contrastive divergence applied to each next RBM in turn [32].

Fig. 4. Architecture of a DBN [33].
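As an illustration of the greedy, layer-wise RBM training mentioned above, the following NumPy sketch performs one-step contrastive divergence (CD-1) updates for a single RBM with binary units; a DBN would stack several such layers and fine-tune the whole network afterwards. The layer sizes, learning rate, and toy data are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

n_visible, n_hidden, lr = 64, 32, 0.05
W = 0.01 * rng.standard_normal((n_visible, n_hidden))
b = np.zeros(n_visible)            # visible bias
c = np.zeros(n_hidden)             # hidden bias

def cd1_update(v0):
    """One contrastive-divergence (CD-1) step on a batch of binary visible vectors."""
    global W, b, c
    ph0 = sigmoid(v0 @ W + c)                          # p(h = 1 | v0)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)   # sample hidden states
    pv1 = sigmoid(h0 @ W.T + b)                        # reconstruction p(v = 1 | h0)
    ph1 = sigmoid(pv1 @ W + c)                         # p(h = 1 | v1)
    n = v0.shape[0]
    W += lr * (v0.T @ ph0 - pv1.T @ ph1) / n
    b += lr * (v0 - pv1).mean(axis=0)
    c += lr * (ph0 - ph1).mean(axis=0)
    return ((v0 - pv1) ** 2).mean()                    # reconstruction error, for monitoring

# Toy binary "feature" batch standing in for the RBM's visible-layer input.
v = (rng.random((128, n_visible)) < 0.3).astype(float)
for epoch in range(20):
    err = cd1_update(v)
print(f"final reconstruction error: {err:.4f}")
```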
The first application of DBN to bearing fault diagnosis was published in 2017 [33], in which a multi-sensor vibration data fusion technique is implemented to fuse the time domain and frequency domain features extracted via multiple 2-layer SAEs. A 3-layer RBM based DBN is then used for classification. Validation is performed on vibration data under different speeds, and a 97.82% accuracy demonstrated that the proposed method can effectively identify bearing faults even after a change of operating condition. In [34], a stochastic convolutional DBN (SCDBN) is implemented by means of stochastic kernels and averaging processing, and an unsupervised CNN is built to extract features. Later a 2-layer DBN is implemented with (28, 14) nodes and 5 kernels in each layer. Finally, a Softmax layer is used, with an average accuracy of over 95%.

Many DBN papers also employ the CWRU bearing dataset as the input data [35]–[37], thanks to its popularity. For example, an adaptive DBN with the dual-tree complex wavelet packet (DTCWPT) is proposed in [35]. The DTCWPT first preprocesses the vibration signals, where an original feature set with 9×8 feature parameters is generated. The decomposition level is 3, and the db5 function, which defines multiple scaling coefficients of the Daubechies wavelet, is taken as the basis function. Then a 5-layer adaptive DBN is used for bearing fault classification. The average accuracy is 94.38%, which is much better compared to the conventional ANN (63.13%), GRNN (69.38%), and SVM (66.88%) using the same training and test data. In [36], data from two accelerometers mounted on the load end and fan end are processed by multiple DBNs for feature extraction; then the faulty conditions based on each extracted feature are determined with Softmax; and the final health condition is fused by D-S evidence theory. An accuracy of 98.8% is accomplished while including the load change from 1 hp to 2 and 3 hp. In contrast, the accuracy of SAE suffers the most from this load change, while the accuracy employing CNN is also lower than that of DBN. Similar to this D-S theory based output fusion [36], a 4-layer DBN with different hyper-parameters coupled with ensemble learning is implemented in [37]. An improved ensemble method is used to acquire the weight matrix for each DBN, and the final diagnosis result is formulated by each DBN based on their weights. The average accuracy of 96.95% is better compared to those employing a single DBN with different weights (around 80%) and a simple voting ensemble DBN (91.21%).

D. Recurrent Neural Network (RNN)

Different from the feed-forward neural network (FNN), the RNN processes input data in a recurrent manner. The architecture is shown in Fig. 5. With a flow path going from its hidden layer to itself, when unrolled over the sequence, it can be viewed as a feed-forward neural network across the input sequence. As a sequence model, it can capture and model the sequential relationship in sequential or time series data.

Fig. 5. (a) Architecture of the RNN, and (b) RNN across a time step [38].

However, when trained with back propagation through time (BPTT), the RNN has the notorious gradient vanishing issue stemming from its nature. Although proposed in the 1980s, RNNs had limited applications for this reason, until the birth of long short-term memory (LSTM) in 1997. Specifically, LSTM is augmented by adding recurrent gates called "forget" gates. Designed to overcome the gradient vanishing issue, LSTM has shown an astonishing capability in modeling long-term dependency in data, and therefore is taking a dominant role in time series and text analysis. It has received great success in speech recognition, natural language processing, video analysis, etc.
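A minimal LSTM-based sequence classifier for vibration snippets, in the spirit of the models discussed in this subsection, could look as follows. PyTorch is assumed, and the sequence length, hidden size, and class count are hypothetical choices, not settings from any surveyed paper.

```python
import torch
import torch.nn as nn

class LSTMFaultClassifier(nn.Module):
    """Minimal LSTM classifier for vibration sequences: final hidden state -> fault class."""
    def __init__(self, n_features=1, hidden=64, n_classes=10):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.out = nn.Linear(hidden, n_classes)

    def forward(self, x):                # x: (batch, time steps, features per step)
        _, (h_n, _) = self.lstm(x)       # h_n: (1, batch, hidden) = final hidden state
        return self.out(h_n[-1])

# Hypothetical input: 8 vibration snippets, 512 time steps, 1 accelerometer channel each.
x = torch.randn(8, 512, 1)
y = torch.randint(0, 10, (8,))
model = LSTMFaultClassifier()
loss = nn.CrossEntropyLoss()(model(x), y)
loss.backward()
print(loss.item())
```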


One of the earliest applications of RNN to bearing fault diagnostics can be found in 2015 [38], where fault features were first extracted using discrete wavelet transforms and later selected based on orthogonal fuzzy neighbourhood discriminative analysis (OFNDA). These features were then fed into an RNN for fault classification, enabling the fault classifier to incorporate a dynamic component. The experiment showed that the proposed scheme based on RNN is capable of detecting and classifying bearing faults accurately, even under non-stationary operating conditions. In addition, a methodology using combined 1-D CNN and LSTM to classify bearing fault types is presented in [39], where the entire architecture is composed of a 1-D CNN layer, a max pooling layer, an LSTM layer, and a Softmax layer as the top classifier. The system input is the raw sampled signal without any pre-processing, and the best testing accuracy reaches as high as 99.6%.

E. Others

There are also numerous other DL methods employed for bearing fault diagnostics; some are based on new algorithms, and some are mixtures of the DL methods listed above. For example, in [40], a new large memory storage retrieval (LAMSTAR) neural network is proposed. In [41], both DBN and SAE are applied simultaneously to identify the presence of a bearing fault. Other examples include a mixture of CNN and DBN [42], the deep residual network (DRN) [43], [44], the deep stack network (DSN) [45], the RNN based autoencoder [46], sparse filtering [50], and a new deep learning architecture, the capsule network [48], proposed by Hinton et al. [47].

IV. COMPARISON OF DL ALGORITHM PERFORMANCE USING THE CWRU DATASET

A systematic comparison of the accuracy of different DL algorithms employing the CWRU dataset is presented in Table I. The testing accuracies of all of the DL algorithms are above 95%, which validates the effectiveness of applying deep learning to bearing fault diagnostics. However, we would like to stress that the specific values of testing accuracy are only demonstrated for an intuitive understanding, for the following reasons:

1) Generalization: Some of the DL methods with accuracy over 99% are generally applied on a very specific dataset at a fixed operating condition. However, this accuracy would suffer, and may drop to below 90%, under the influence of noise and variation of motor speed/load, which is common in practical applications.

2) Evaluation metrics: Regarding the selection of training samples from the CWRU dataset, many papers did not guarantee a balanced sampling, which means the ratio of data samples selected from the healthy condition and the faulty condition is not close to 1:1. In this scenario, accuracy should not be used as the only metric to evaluate an algorithm, and other metrics including the precision and recall should be introduced for further evaluation (see the short example after this list).

3) Randomness: Although these DL methods use the same dataset to perform classifications, the percentages of training data and test data are different. Even if this data distribution were identical, the training and test data might be randomly selected from the CWRU bearing dataset. Therefore this comparison is not performed on common ground, since the classification accuracy is subject to change even with the same algorithm due to randomness in the training and test dataset selection.
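Following up on the evaluation-metrics point above, the short example below shows how a heavily unbalanced healthy/faulty split lets a useless classifier reach high accuracy while precision and recall expose the failure. scikit-learn is assumed and the labels are purely synthetic.

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_score, recall_score

rng = np.random.default_rng(0)

# Synthetic, heavily unbalanced labels: 95% healthy (0), 5% faulty (1).
y_true = (rng.random(1000) < 0.05).astype(int)

# A trivial "classifier" that always predicts healthy still scores about 95% accuracy...
y_pred = np.zeros_like(y_true)
print("accuracy :", accuracy_score(y_true, y_pred))
# ...but precision/recall on the faulty class show that no fault is ever detected.
print("precision:", precision_score(y_true, y_pred, zero_division=0))
print("recall   :", recall_score(y_true, y_pred, zero_division=0))
```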

TABLE I
COMPARISON OF CLASSIFICATION ACCURACY ON CASE WESTERN RESERVE UNIVERSITY BEARING DATASET WITH DIFFERENT DL ALGORITHMS.

Reference | Feature extraction algorithms | No. hidden layers | Classifier | Characteristics | Training sample percentage | Testing accuracy
[21] | Adaptive CNN (ADCNN) | 3 | Softmax | Predict fault size | 50% | 97.90%
[22] | CNN | 4 | Softmax | Noise-resilient | 90% | 92.60%
[23] | CNN | 4 | Softmax | Sensor fusion | 70% | 99.40%
[25] | CNN based on LeNet-5 | 8 | FC layer | Better feature extraction | 83% | 99.79%
[26] | Deep fully CNN (DFCNN) | 8 | Connectionist temporal classification | Validation with actual service data | 78% | 99.22%
[27] | SAE | 3 | ELM | Adapt to load change | 50% | 99.61%
[28] | SAE | 3 | ELM | Reduce training time | 50% | 99.83%
[29] | Stacked denoising AE (SDAE) | 3 | N/A | Noise-resilient | 50% | 91.79%
[30] | SDAE | 3 | Softmax | Noise-resilient | 80% | 99.83%
[35] | Dual-tree complex wavelet | 5 | N/A | Adaptive DBN | 67% | 94.38%
[36] | DBN | 2 | Softmax | Adapt to load change | N/A | 98.80%
[37] | DBN with ensemble learning | 4 | Sigmoid | Accurate & robust | N/A | 96.95%
[39] | CNN-LSTM | 3 | Softmax | Accurate | 83% | 99.6%

V. RECOMMENDATIONS AND FUTURE DIRECTIONS

A. Recommendations and Suggestions

The successful implementation of ML and DL algorithms on bearing fault diagnostics can be attributed to the strong correlations among features that follow the laws of physics. For researchers considering applying ML/DL methods to solve bearing problems, the authors suggest the following:

1) Examine the setup environment: Thoroughly examine the working environment and all possible operating conditions of the setup (i.e., indoor/outdoor, fixed/volatile operating conditions, etc.). For the simplest case with an indoor, single operating point setup, conventional ML methods or even frequency-based analytical models should suffice. Otherwise, more advanced DL approaches with certain denoising blocks and extra layers should be considered to improve the diagnostic robustness.

2) Sensors: Determine the number and type of sensors to be mounted close to the bearings. For traditional frequency based and ML based methods, one or two vibration sensors mounted close to the bearing should be sufficient. For DL based approaches, due to the fact that many algorithms such as CNN are mainly developed for computer vision to handle 2-D image data, multiple 1-D time-series records obtained by the sensors in the bearing setup need to be stacked to form this 2-D data. Therefore, it would be better to have more than two vibration sensors installed at the same time. In addition, other types of sensors such as AE and stator current sensors can be installed to form a multi-physics dataset to enhance the performance.

3) Data Size: If the size of the collected dataset is not sufficient to train a DL model with good generalization, the algorithm and its training process should be selected to make the most out of the data and computation resources. For example, dataset augmentation techniques such as the generative adversarial network (GAN), and random data sampling with replacement such as bootstrapping, can be used, as well as cross validation methods such as leave-one-out, Monte Carlo, etc. (a short sketch of these ideas follows this list). With the problem of a small labeled dataset, a promising routine is to apply a semi-supervised learning paradigm.
semi-supervised learning paradigm. [2] P. Zhang, Y. Du, T. G. Habetler and B. Lu,“A survey of condition
monitoring and protection methods for medium-voltage IMs,” IEEE
B. Future Research Directions Trans. Ind. Appl., vol. 47, no. 1, pp. 34–46, Jan./Feb. 2011.
[3] “Report of large motor reliability survey of industrial and commercial
The following research directions may facilitate the transi- installations, Part I,” IEEE Trans. Ind. Appl., vol. IA–21, no. 4, pp.
tion from research labs to real-world applications: 853-864, Jul./Aug. 1985.

[4] JEMA, "On recommended interval of updating IMs," 2000.
[5] T. Harris, Rolling Bearing Analysis, Hoboken, NJ, USA: Wiley, 1991.
[6] J. Schmidhuber, "Deep learning in neural networks: An overview," Neural Netw., vol. 61, pp. 85–117, Jan. 2015.
[7] F. Immovilli, A. Bellini, R. Rubini and C. Tassoni, "Diagnosis of bearing faults in induction machines by vibration or current signals: A critical comparison," IEEE Trans. Ind. Appl., vol. 46, no. 4, pp. 1350–1359, Jul./Aug. 2010.
[8] M. Kang, J. Kim and J. Kim, "An FPGA-based multicore system for real-time bearing fault diagnosis using ultrasampling rate AE signals," IEEE Trans. Ind. Electron., vol. 62, no. 4, pp. 2319–2329, Apr. 2015.
[9] R. R. Schoen, T. G. Habetler, F. Kamran and R. G. Bartfield, "Motor bearing damage detection using stator current monitoring," IEEE Trans. Ind. Appl., vol. 31, no. 6, pp. 1274–1279, Nov./Dec. 1995.
[10] D. López-Pérez and J. Antonino-Daviu, "Application of infrared thermography to failure detection in industrial IMs: case stories," IEEE Trans. Ind. Appl., vol. 53, no. 3, pp. 1901–1908, May 2017.
[11] E. T. Esfahani, S. Wang and V. Sundararajan, "Multisensor wireless system for eccentricity and bearing fault detection in IMs," IEEE/ASME Trans. Mechatronics, vol. 19, no. 3, pp. 818–826, 2014.
[12] B. Yazici and G. B. Kliman, "Adaptive, on line, statistical method and apparatus for motor bearing fault detection by passive motor current monitoring," U.S. Patent 5 726 905, Sep. 27, 1995.
[13] M. A. Awadallah and M. M. Morcos, "Application of AI tools in fault diagnosis of electrical machines and drives – an overview," IEEE Trans. Energy Convers., vol. 18, no. 2, pp. 245–251, June 2003.
[14] F. Filippetti, G. Franceschini, C. Tassoni and P. Vas, "Recent developments of induction motor drives fault diagnosis using AI techniques," IEEE Trans. Ind. Electron., vol. 47, no. 5, pp. 994–1004, Oct. 2000.
[15] L. Batista, B. Badri, R. Sabourin, and M. Thomas, "A classifier fusion system for bearing fault diagnosis," Expert Syst. Appl., vol. 40, no. 7, pp. 6788–6797, Dec. 2013.
[16] R. Liu, B. Yang, E. Zio, and X. Chen, "Artificial intelligence for fault diagnosis of rotating machinery: A review," Mech. Syst. Signal Process., vol. 108, pp. 33–47, Aug. 2018.
[17] Amazon Mechanical Turk, Dec. 2018. [Online]. Available: https://www.mturk.com/.
[18] M. Raghu, B. Poole, J. Kleinberg, S. Ganguli, and J.-S. Dickstein, "On the expressive power of DNNs," arXiv:1606.05336, 2016.
[19] K. Fukushima, "Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position," Biolog. Cybernetics, vol. 36, pp. 193–202, 1980.
[20] S. Singh, C. Q. Howard, and C. Hansen, "Convolutional neural network based fault detection for rotating machinery," J. Sound Vib., vol. 377, pp. 331–345, 2016.
[21] X. Guo, L. Chen, and C. Shen, "Hierarchical adaptive deep convolution neural network and its application to bearing fault diagnosis," Meas., vol. 93, pp. 490–502, 2016.
[22] C. Lu, Z. Wang, and B. Zhou, "Intelligent fault diagnosis of rolling bearing using hierarchical convolutional network based health state classification," Adv. Eng. Informat., vol. 32, pp. 139–157, 2017.
[23] M. Xia, T. Li, L. Xu, L. Liu and C. W. de Silva, "Fault diagnosis for rotating machinery using multiple sensors and CNNs," IEEE/ASME Trans. Mechatronics, vol. 23, no. 1, pp. 101–110, Feb. 2018.
[24] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient based learning applied to document recognition," Proceedings of the IEEE, vol. 86, no. 11, pp. 2278–2324, Nov. 1998.
[25] L. Wen, X. Li, L. Gao and Y. Zhang, "A new convolutional neural network-based data-driven fault diagnosis method," IEEE Trans. Ind. Electron., vol. 65, no. 7, pp. 5990–5998, July 2018.
[26] W. Zhang, F. Zhang, W. Chen, Y. Jiang and D. Song, "Fault state recognition of rolling bearing based fully convolutional network," Comput. in Sci. & Eng., vol. PP, no. PP, pp. PP–PP, 2018.
[27] F. Jia, Y. Lei, J. Lin, X. Zhou, and N. Lu, "Deep neural networks: A promising tool for fault characteristic mining and intelligent diagnosis of rotating machinery with massive data," Mech. Syst. Signal Process., vol. 100, pp. 743–765, 2016.
[28] W. Mao, J. He, Y. Li, and Y. Yan, "Bearing fault diagnosis with auto-encoder extreme learning machine: A comparative study," Proc. Inst. Mech. Eng. C, vol. 231, no. 8, pp. 1560–1578, 2016.
[29] C. Lu, Z.-Y. Wang, W.-L. Qin, and J. Ma, "Fault diagnosis of rotary machinery components using a stacked denoising AE-based health state identification," Signal Process., vol. 130, pp. 377–388, 2017.
[30] X. Guo, C. Shen, and L. Chen, "Deep fault recognizer: An integrated model to denoise and extract features for fault diagnosis in rotating machinery," Appl. Sci., vol. 7, no. 41, pp. 1–17, 2017.
[31] H. Shao, H. Jiang, H. Zhao, and F. Wang, "A novel deep autoencoder feature learning method for rotating machinery fault diagnosis," Mech. Syst. Signal Process., vol. 95, pp. 187–204, 2017.
[32] Y. Bengio, "Learning deep architectures for AI," Foundations and Trends in Machine Learning, vol. 2, no. 1, pp. 1–127, 2009.
[33] Z. Chen and W. Li, "Multisensor feature fusion for bearing fault diagnosis using sparse autoencoder and deep belief network," IEEE Trans. Instrum. Meas., vol. 66, no. 7, pp. 1693–1702, July 2017.
[34] S. Dong, Z. Zhang and G. Wen, "Design and application of unsupervised convolutional neural networks integrated with deep belief networks for mechanical fault diagnosis," in Proc. Progn. Syst. Health Manag. Conf., 2017, pp. 1–7.
[35] D. T. Hoang and H. J. Kang, "Rolling bearing fault diagnosis using adaptive deep belief network with dual-tree complex wavelet packet," ISA Trans., vol. 69, pp. 187–201, 2017.
[36] D.-T. Hoang and H.-J. Kang, "Deep belief network and Dempster-Shafer evidence theory for bearing fault diagnosis," in Proc. IEEE Int. Sym. Ind. Electron. (ISIE), 2018, pp. 841–846.
[37] T. Liang, S. Wu, W. Duan, and R. Zhang, "Bearing fault diagnosis based on improved ensemble learning and deep belief network," J. Phys.: Conf. Ser., vol. 1074, art. no. 021054, pp. 1–7, 2018.
[38] W. Abed, S. Sharma, R. Sutton, and A. Motwani, "A robust bearing fault detection and diagnosis technique for brushless DC motors under non-stationary operating conditions," J. Control Autom. Electr. Syst., vol. 26, no. 3, pp. 241–254, Jun. 2015.
[39] H. Pan, X. He, S. Tang, and F. Meng, "An improved bearing fault diagnosis method using one-dimensional CNN and LSTM," J. Mech. Eng., vol. 64, no. 7–8, pp. 443–452, 2018.
[40] M. He and D. He, "DL based approach for bearing fault diagnosis," IEEE Trans. Ind. Appl., vol. 53, no. 3, pp. 3057–3065, June 2017.
[41] Z. Chen, S. Deng, X. Chen, C. Li, R.-V. Sanchez, and H. Qin, "Deep neural networks-based rolling bearing fault diagnosis," Microelectron. Rel., vol. 75, pp. 327–333, 2017.
[42] H. Shao, H. Jiang, H. Zhang, W. Duan, T. Liang, and S. Wu, "Rolling bearing fault feature learning using improved convolutional deep belief network with compressed sensing," Mech. Syst. Signal Process., vol. 100, pp. 439–458, 2018.
[43] M. Zhao, M. Kang, B. Tang and M. Pecht, "Deep residual networks with dynamically weighted wavelet coefficients for fault diagnosis of planetary gearboxes," IEEE Trans. Ind. Electron., vol. 65, no. 5, pp. 4290–4300, May 2018.
[44] M. Zhao, M. Kang, B. Tang and M. Pecht, "Multi-wavelet coefficients fusion in deep residual networks for fault diagnosis," IEEE Trans. Ind. Electron., vol. PP, no. PP, pp. PP–PP, 2018.
[45] C. Sun, M. Ma, Z. Zhao and X. Chen, "Sparse deep stacking network for fault diagnosis of motor," IEEE Trans. Ind. Informat., vol. 14, no. 7, pp. 3261–3270, July 2018.
[46] H. Liu, J. Zhou, Y. Zheng, W. Jiang, and Y. Zhang, "Fault diagnosis of rolling bearings with recurrent neural network-based autoencoders," ISA Trans., vol. 77, pp. 167–178, Jun. 2018.
[47] S. Sabour, N. Frosst, and G. E. Hinton, "Dynamic routing between capsules," in Proc. NIPS, 2017, pp. 3859–3869.
[48] Z. Zhu, G. Peng, Y. Chen, and H. Gao, "A convolutional neural network based on a capsule network with strong generalization for bearing fault diagnosis," Neurocomput., vol. 323, pp. 62–75, 2019.
[49] X. Li, W. Zhang, and Q. Ding, "A robust intelligent fault diagnosis method for rolling element bearings based on deep distance metric learning," Neurocomput., vol. 310, pp. 77–95, 2018.
[50] W. Qian, S. Li, J. Wang, and Q. Wu, "A novel supervised sparse feature extraction method and its application on rotating machine fault diagnosis," Neurocomput., vol. 210, pp. 129–140, 2018.
[51] D. H. Ballard, "Modular learning in neural networks," in Proc. AAAI'87 National Conf. AI, Seattle, WA, 1987, pp. 279–284.
[52] C. Beleites, U. Neugebauer, T. Bocklitz, C. Krafft, and J. Popp, "Sample size planning for classification models," Analytica Chimica Acta, vol. 760, pp. 25–33, 2013.
[53] W. Zhang, G. Peng, C. Li, Y. Chen, and Z. Zhang, "A new deep learning model for fault diagnosis with good anti-noise and domain adaptation ability on raw vibration signals," Sensors, vol. 17, no. 2, p. 425, 2017.
[54] W. Zhang, C. Li, G. Peng, Y. Chen, and Z. Zhang, "A deep convolutional neural network with new training methods for bearing fault diagnosis under noisy environment and different working load," Mech. Syst. Signal Process., vol. 100, p. 439, 2018.
