Abstract: How to recognize targets with similar appearances from remote sensing images (RSIs) effectively and efficiently has become a big challenge. Recently, the convolutional neural network (CNN) has been preferred for target classification due to its powerful feature representation ability and better performance. However, the training and testing of CNN mainly rely on a single machine, which has natural limitations and bottlenecks in processing RSIs because of limited hardware resources and huge time consumption. Besides, overfitting is a challenge for the CNN model due to the imbalance between the RSI data and the model structure: when a model is complex or the training data are relatively small, overfitting occurs and leads to poor predictive performance. To address these problems, a distributed CNN architecture for RSI target classification is proposed, which dramatically increases the training speed of CNN and the system scalability, and improves the storage ability and processing efficiency for RSIs. Furthermore, a Bayesian regularization approach is utilized to initialize the weights of the CNN extractor, which increases the robustness and flexibility of the CNN model; it helps prevent overfitting and avoids the local optima caused by limited RSI training images or an inappropriate CNN structure. In addition, considering the efficiency of the Naïve Bayes classifier, a distributed Naïve Bayes classifier is designed to reduce the training cost. Compared with other algorithms, the proposed system and method perform the best and increase the recognition accuracy. The results show that the distributed system framework and the proposed algorithms are suitable for RSI target classification tasks.

Keywords: convolutional neural network (CNN), distributed architecture, remote sensing images (RSIs), target classification, pre-training.

DOI: 10.21629/JSEE.2019.02.02

Manuscript received September 11, 2017.
*Corresponding author.
This work was supported by the National Natural Science Foundation of China (U1435220).

1. Introduction

With the rapid progress in remote sensing technology and the reduction of acquisition costs, a large number of remote sensing images (RSIs) of the Earth become available each day. They are taken from satellites, airplanes or unmanned aerial vehicles (UAVs), with flexible and varied modalities and spatial and spectral resolutions.

In recent years, deep learning methods [1,2], such as the convolutional neural network (CNN), have been preferred for RSI classification tasks. These studies reveal that the feature representation of a deep architecture achieves better performance than traditional approaches [3-10]. Compared with hand-crafted features, such as the histogram of oriented gradients (HOG) and the scale-invariant feature transform (SIFT), CNN is able to automatically learn multiple stages of invariant features for a specific task and has enjoyed success in a great number of applications. CNN shows strong robustness against geometric distortions, such as shifts, scaling and inclination, due to its hierarchical learning structure. Furthermore, unlike many traditional methods, CNN can learn features automatically for RSI classification, which makes it suitable for real-time applications.

However, traditional deep learning algorithms run on a single machine show inadequacies in computational capability and storage ability when applied to RSIs. The storage and processing of RSIs on a personal computer are time-consuming due to the limitations of both hardware and software resources. CNN can also be trained with a graphics processing unit (GPU); however, the major drawback of the GPU is similar to that of the single machine: it cannot handle huge computation and huge storage/memory costs. Therefore, clusters of computers and distributed systems are no doubt the promising choices [11-14]. The key to implementing distributed processing of RSIs is developing the architecture in a parallel manner, as well as providing an effective distributed storage platform for RSI data [15,16].

As a scalable framework, the distributed architecture has been widely utilized for various tasks. Chu et al.
LI Binquan et al.: Effective distributed convolutional neural network architecture for remote sensing images target ... 239
[17] demonstrated that the MapReduce framework is suitable for several machine learning techniques, including the support vector machine (SVM), logistic regression (LR) and Naïve Bayes.

Motivated by the powerful data processing ability of the distributed framework and the excellent performance of CNN, a distributed framework is proposed for RSI target classification tasks.

Nevertheless, the training of CNN encounters some limitations. For example, the training of fully connected layers may suffer from overfitting and local minima. In particular, the CNN model must control its parameters properly when trained on small data sets. As a result, with limited RSI training images or an unsuitable CNN structure, the balance between model complexity and deep architecture should be properly controlled.

To address this problem, Bayesian regularization is combined to pre-train the networks and initialize the weights. The Bayesian approach can potentially avoid the above pitfalls in training neural networks [18,19]. The Bayesian principle can not only automatically infer hyperparameters by marginalizing them out of the posterior distribution, but also naturally account for the uncertainty in parameter estimates and propagate that uncertainty to predictions. Furthermore, Bayesian techniques are often more robust to overfitting, since they average over values of parameters rather than choose a single point estimate.

For the classifier design, a distributed Naïve Bayes classifier is proposed in order to increase the training speed.

Above all, our contributions can be summarized as follows. Firstly, a distributed CNN architecture is proposed for training the networks, which dramatically increases the training speed and system scalability, and improves the storage ability and processing efficiency for RSIs. Secondly, a pre-training algorithm is proposed to increase the robustness, flexibility and classification precision of CNN; it prevents the training from overfitting and local minima. Thirdly, a distributed Naïve Bayes algorithm using MapReduce is proposed in order to decrease the training cost and increase the efficiency of the classifier. Compared with other algorithms, the distributed system and the proposed method are suitable for RSI target classification tasks.

The remainder of this paper is organized as follows. In Section 2, the distributed RSI target classification system is proposed. The experiments and analysis are presented in Section 3. Finally, the paper is concluded in Section 4.

2. Framework of distributed RSIs target classification system

The distributed RSIs target classification system is illustrated in Fig. 1. It consists of two stages: training and recognition. In the training stage, training samples are trained from the feature extractor to the classifier. In the recognition stage, images are sent to the trained framework for recognition.
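As a concrete illustration of this two-stage flow (training samples pass from the feature extractor to the classifier; recognition sends an image through the trained framework), the sketch below wires a stand-in extractor to a stand-in classifier with a fit/predict interface. All names here are hypothetical, and the nearest-centroid classifier is only a placeholder so the sketch is self-contained — in the paper the extractor is the trained CNN and the classifier is the distributed Naïve Bayes classifier.

```python
import numpy as np

class Pipeline:
    """Two-stage RSI target classification: feature extractor -> classifier.

    `extractor` stands in for the trained CNN (any callable mapping an image
    array to a 1-D feature vector); `classifier` is any object with
    fit/predict methods.
    """

    def __init__(self, extractor, classifier):
        self.extractor = extractor
        self.classifier = classifier

    def train(self, images, labels):
        # Training stage: samples flow from the feature extractor to the classifier.
        feats = np.stack([self.extractor(im) for im in images])
        self.classifier.fit(feats, np.asarray(labels))
        return self

    def recognize(self, image):
        # Recognition stage: the image is sent through the trained framework.
        feat = self.extractor(image)[None, :]
        return self.classifier.predict(feat)[0]


class NearestCentroid:
    """Minimal stand-in classifier so the sketch runs on its own."""

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.centroids_ = np.stack([X[y == c].mean(axis=0) for c in self.classes_])
        return self

    def predict(self, X):
        # Distance from each sample to each class centroid; pick the nearest.
        d = np.linalg.norm(X[:, None, :] - self.centroids_[None, :, :], axis=2)
        return self.classes_[d.argmin(axis=1)]
```

The point of the split is that the two stages are decoupled: the same trained extractor can feed any classifier, which is what lets the distributed Naïve Bayes classifier be trained and swapped in independently.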
2.1 Implementation of pre-training strategy

The training stage consists of the feature extractor and the Naïve Bayes classifier. The feature extractor can extract local features and global features: we utilize convolutional layers to obtain local features, and global features are extracted by fully connected layers. The CNN feature vectors of RSIs are then sent to the Naïve Bayes classifier. Although backpropagation is widely used in the training of neural networks [20], it still has some disadvantages, such as overfitting the training data. To address these problems, the Bayesian approach is introduced as a pre-training step.
240 Journal of Systems Engineering and Electronics Vol. 30, No. 2, April 2019
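One way to read the Bayesian regularization used for pre-training is as maximum a posteriori (MAP) weight estimation: a Gaussian prior on the weights contributes a penalty α‖w‖² to the data misfit β‖Xw − t‖², and the evidence framework re-estimates α and β from the data rather than fixing them by hand — the automatic hyperparameter inference and overfitting resistance described in the introduction. The sketch below applies this to a single linear layer; it is a minimal illustration of the principle (MacKay-style evidence updates), not the authors' exact pre-training procedure.

```python
import numpy as np

def map_weights(X, t, alpha, beta):
    """MAP estimate of linear-layer weights under a Gaussian prior.

    Minimizes beta * ||X w - t||^2 + alpha * ||w||^2.
    Returns the weights and the posterior precision matrix A.
    """
    d = X.shape[1]
    A = beta * X.T @ X + alpha * np.eye(d)      # posterior precision
    w = np.linalg.solve(A, beta * X.T @ t)
    return w, A

def evidence_update(X, t, alpha=1.0, beta=1.0, iters=20):
    """Re-estimate the hyperparameters alpha (prior) and beta (noise)
    from the data via the evidence framework, instead of hand-tuning."""
    n, _ = X.shape
    eig = np.linalg.eigvalsh(X.T @ X)           # data eigen-spectrum
    for _ in range(iters):
        w, _ = map_weights(X, t, alpha, beta)
        # gamma: effective number of well-determined parameters
        gamma = np.sum(beta * eig / (alpha + beta * eig))
        alpha = gamma / (w @ w)
        beta = (n - gamma) / np.sum((t - X @ w) ** 2)
    return w, alpha, beta
```

Because α is inferred rather than chosen, the amount of shrinkage adapts to how much the data actually constrain each weight — which is what makes the initialization robust to small RSI training sets.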
(ii) Output the ⟨key, value⟩ pairs as ⟨weight $w$, global $\Delta w$⟩.

2.3 Designing of the MapReduce based Naïve Bayes classifier

In machine learning, Naïve Bayes classifiers are a family of simple and efficient probabilistic classifiers based on Bayes' principle with independence assumptions between the features [25].

The CNN feature extracted from each training sample can be represented as $X = \{x_1, x_2, \ldots, x_n\}$. Suppose there are $m$ possible classes $\{C_1, C_2, \ldots, C_m\}$. The probability of a new CNN feature $X$ being in class $C_i$ can be computed using Bayes' rule:

$$P(C_i|X) = \frac{P(X|C_i)P(C_i)}{P(X)}. \quad (6)$$

Based on the independence assumptions, $P(X|C_i)$ can be computed as follows:

$$P(X|C_i) = P(x_1|C_i)P(x_2|C_i)\cdots P(x_n|C_i) = \prod_{j=1}^{n} P(x_j|C_i). \quad (7)$$

It outputs ⟨key, value⟩ pairs as ⟨class $C_i$, $(\mathrm{count}(x_1), \mathrm{count}(x_2), \ldots, \mathrm{count}(x_n))$⟩.

Reduce stage: reduce by class $C_i$, and compute the global frequency of $x_j$ in $C_i$. The output ⟨key, value⟩ pairs are as follows:

⟨class $C_i$, $(\mathrm{globalfrequency}(x_1), \mathrm{globalfrequency}(x_2), \ldots, \mathrm{globalfrequency}(x_n))$⟩.
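Assuming discrete feature values and Laplace smoothing (neither is specified in the text), the map and reduce stages and the posterior of (6)-(7) can be sketched with plain Python functions standing in for MapReduce workers:

```python
from collections import Counter, defaultdict

def map_stage(samples):
    """Each mapper emits <class Ci, per-feature value counts> for its shard.
    `samples` is a list of (label, feature_vector) with discrete features."""
    out = defaultdict(Counter)
    for label, x in samples:
        for j, xj in enumerate(x):
            out[label][(j, xj)] += 1
    return out

def reduce_stage(mapper_outputs):
    """Reduce by class Ci: sum the shard counts into global frequencies."""
    global_freq = defaultdict(Counter)
    for out in mapper_outputs:
        for label, counts in out.items():
            global_freq[label].update(counts)
    return global_freq

def classify(x, global_freq, class_counts, n_values=2):
    """Eq. (6)-(7): argmax_i P(Ci) * prod_j P(xj|Ci), Laplace-smoothed.
    P(X) is omitted since it is the same for every class."""
    total = sum(class_counts.values())
    best, best_p = None, -1.0
    for c, nc in class_counts.items():
        p = nc / total                              # P(Ci)
        for j, xj in enumerate(x):                  # product over features
            p *= (global_freq[c][(j, xj)] + 1) / (nc + n_values)
        if p > best_p:
            best, best_p = c, p
    return best
```

In the real system each mapper runs on a shard of the training set across the cluster; the reducer only sums counts, which is why Naïve Bayes fits the MapReduce pattern so naturally and why the distributed version cuts the training cost.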
As shown above, the distinction between the aircraft and ship classes is relatively high. For the ship class, the confusion between warship and aircraft carrier is higher than that between either of them and Freighter. For the airplane class, there is a high degree of discrimination between helicopters and other types of aircraft; however, there is some confusion between Helicopter-1 and Helicopter-2. Early warning aircraft and Airliner also show a certain degree of confusion. As the bombers vary in appearance, they are confused with other aircraft types whose shapes are similar to them, such as Bomber-1 with the fighters, and Bomber-2 with transportation planes. For the fighter sub-category, because the fighters' appearances are very similar to each other, the confusion between various fighters is obviously higher. Likewise, there is a certain degree of confusion among transportation airplanes due to their similar shapes, and between airliners and the Early warning aircraft. In general, CNN achieves more discriminative feature representations than hand-crafted features, and it performs better than the methods without pre-training.

Besides, the computational cost is compared with and without the distributed system. As shown in Table 6 and Fig. 6, the distributed system reduces the computational cost in both the training and testing stages. It increases the processing efficiency and is suitable for RSI target classification tasks.

Table 6 Computational cost of training stage

Training method    Time/h
CNN                12.9
Distributed CNN     4.2

The pre-training strategy helps CNN achieve better performance, even with only 10% of the training samples. This reveals that the proposed method is more robust than those without pre-training.

Fig. 7 Robustness evaluation of the CNN model with different percentages of training samples

In addition, the accuracy with different numbers of convolution layers is compared. As shown in Fig. 8, the observation demonstrates that a deeper architecture can provide a more discriminative feature representation of RSIs. The CNN model with the pre-training method performs robustly due to its automatic adaptation to various model structures, even with one convolution layer.