https://doi.org/10.1007/s11554-019-00852-3
Abstract
This paper considers accident images and develops a deep learning method for feature extraction together with a mixture of experts for classification. For the first task, the outputs of the last max-pooling layer of a Convolutional Neural Network (CNN) are used to extract hidden features automatically. For the second task, a mixture of advanced variations of the Extreme Learning Machine (ELM), including basic ELM, constrained ELM (CELM), On-Line Sequential ELM (OSELM) and Kernel ELM (KELM), is developed. This ensemble classifier combines the advantages of the different ELMs using a gating network; its accuracy is very high while the processing time is close to real-time. To show the efficiency, different combinations of traditional feature extraction and feature selection methods and various classifiers are examined on two kinds of benchmarks: an accident images' data set and some general data sets. It is shown that the proposed system detects accidents with 99.31% precision, recall and F-measure. Besides, the precisions of accident-severity classification and involved-vehicle classification are 90.27% and 92.73%, respectively. This system is suitable for on-line processing of accident images captured by Unmanned Aerial Vehicles (UAV) or other surveillance systems.
Keywords Feature extraction · Accident images’ classification · Convolutional neural networks · Mixture of ELM ·
Ensemble learning
Journal of Real-Time Image Processing
Table 1 Some related works and systems in accident detection and feature extraction from accident

Ref. no. | Goal | Proposed method | Limitation
--- | --- | --- | ---
[19] | Accident detection | Using histogram of flow gradient for accident detection | Tracking the individual in crowded scenes was impossible
[20] | Accident detection | Fusion of the wireless sensors | Maybe the sensors are not active*
[21] | Accident detection | Using neural networks, Radon transform for angle detection and traffic-flow measurements | It used satellite images for accident detection*
[22] | Accident detection | Using Gaussian Mixture Model to detect vehicles; then mean shift algorithm for vehicle tracking | It cannot be used in the different weather conditions*
[23] | Accident detection | Using SVM and neural network | It is limited to some special sensors
[24] | Accident detection | Accident detection by GPS and GSM | GPS data cannot be accessed everywhere
[25] | Vehicle classification (no accident) | Using Eigen-window method on images provided by B-snake | Its accuracy is low and it cannot detect motorcycles*
[26] | Vehicle classification (no accident) | Using YOLO model for vehicle detection and AlexNet for vehicle classification | It did not detect motorcycles*
[27] | Vehicle classification (no accident) | Using k-mean values for initial labelling and then using linear SVM to identify the low-confident samples | It used fixed CCTV images for classification*
[28] | Vehicle classification (no accident) | Using CNN for vehicle type detection | The running time is high
[29] | Severity of accident | Using gradient boosting to analyze the relationship between crash severities and risk factors | It cannot be used in the different weather conditions*
[30] | Severity of accident | Using a series of neural networks to model the nonlinear relationships between the injury severity levels and crash-related factors | It cannot estimate the risk
[31] | Severity of accident | Using decision tree for accident severity classification | The focus is just accident severity*
[32] | Severity of accident | Using incident location, date, time, traffic lanes to determine the severity of the accident | It just focused on accident severity*
Convolutional Neural Network (CNN) is one of the best methods for image processing in the recent literature [14]. It can extract hidden features with the aid of pooling layers. Usually, the outputs of the last pooling layer are applied for classification and regression purposes. We propose a mixture of ELMs for the classification task.

CNN is also capable of real-time image processing. For example, Redmon et al. [15] used the YOLO model for object detection in images. They compared this model with other models such as R-CNN and VGG and showed that YOLO has better results in real-time detection. In another work, Ahn [16] proposed a system for real-time video object recognition using CNNs. This system has been implemented on a mid-range FPGA, achieves a computational speed greater than 170,000 classifications per second, and performs scale-invariant object recognition from a 720 × 480 video stream at a speed of 60 fps. This suggests that a CNN-based system such as ours can be used in real-time applications.

In Table 1, some methods are reviewed that have been used for accident-image processing. It is worth noting that CNN processes the image using a series of convolutional, nonlinear, and max-pooling layers. It provides a good description of the image, including useful features that can be applied for classification purposes [17, 18]. In the latter reference, ELM was applied to improve the learning speed of CNN. They used the outputs of ELM after training the CNN for a limited number of epochs. But its recognition rate is rather low.

Besides, some ensemble methods have been used for image processing, and some classifiers were combined with special feature selection methods [19, 20]. Yu et al. [21] developed a hybrid of CNN and ELM for image classification. They used CNN for feature extraction and ELM for classification. Nevertheless, the performance of ELM is not excellent for general cases. In fact, there is no single fixed solution for all problems. In [22], a shallow non-convolutional neural network was trained by ELM for the MNIST benchmark. The results showed that the classification error rate was about 1%. Such evidence encourages us to develop an ensemble method based on ELM for accident image analysis. Li et al. [20] proposed a dynamic ensemble classifier that uses a random feature selection method to generate diverse classifiers. In [19], a three-phase ensemble system based on Mixture of Experts (ME) has been proposed. The authors selected an optimal subset of features in the first phase and used them for training the experts with the standard ME algorithm. In all of these ensemble methods, the learning algorithms must be adapted by incorporating a penalty correlation term in their error functions [23, 24]. In addition, the implicit and explicit features can be used in these algorithms and they
2 Accident analysis system

To analyze accidents based on image processing, in this section we study a system using CNN and a mixture of ELMs. This system takes images from various sources, such as surveillance cameras, on-site people, and UAVs; see, e.g., [28, 29]. Then the necessary features are extracted and sent for classification. In the first step of our proposed system, the system recognizes whether or not an accident has happened. If the response is positive, the system calls two other ensemble classifiers to recognize the types of vehicles involved in the accident and the severity of the accident. Details are presented in the following.

2.3 System outputs

The results of the processing part are sent to the control center, which decides whether to call an accident management procedure as quickly as possible.

3 System architecture

In this section, different algorithms for feature extraction from accident images are discussed. On the extracted features, we try to select the relevant features and classify the images into defined classes. We need to examine the components of Fig. 1 for any task of the accident analysis system.
Fig. 2 The output of traditional feature extraction methods: a original image, b HOG features, c MSER features, d SURF features
3.1 Traditional feature extraction

Because the images are received from various sources, image characteristics such as angle and intensity differ. Thus, feature extraction from these images is hard work. The traditional feature extraction methods that are compared in our experiments are as follows:

• LBP [10]: It has been developed for 2D texture analysis. In this algorithm, a local structure is applied on the images and each pixel is compared with its neighboring pixels.
• HOG [11]: It calculates the gradient in the x and y directions for each pixel and gives the magnitude and direction of each of these gradients.
• MSER [12]: It is one of the blob detection methods in image processing. It uses the input image intensity range to detect stable regions. A region is detected as stable when, across the different thresholds examined, the variation in its colour intensity stays below the input value.
• SURF [13]: It uses a square filter (Hessian matrix) to determine the points. The Hessian matrix measures the local changes around the pixels, and where the determinant of this matrix is maximal, the corresponding point is considered as a candidate for SURF features.

The features extracted from an accident image are shown in Fig. 2. After extracting these features, they are merged into a matrix. This causes redundancy in the matrix. In addition, some of the extracted features are misleading for the classification step and may lead to wrong results. For these reasons, we need to find the proper features to classify the samples efficiently. For the feature selection phase, three methods are used: Principal Component Analysis (PCA) [31], Correlation-based Feature Selection (CFS) [32] and Bagging-Based Feature selection (BBF) [33]. Thus, for each image, first, we extract the features by LBP, HOG, SURF, and MSER. Then, we save these features in a matrix. This matrix is very large, while most of its entries are uninformative and can decrease the accuracy. Finally, we use PCA, CFS, and BBF on this matrix to reduce the dimension of the features and to select the relevant and important features. It is worth noting that PCA is one of the well-known algorithms for this purpose. PCA is also one of the common methods for band reduction in image processing. For example, in [34], PCA has been used for band reduction on coloured images. Thus, PCA seems very useful for the current study.

3.2 Advanced feature extraction by convolution neural network

When we tried to extract features from accident images by traditional nonlinear feature extractors such as LBP, HOG, SURF, and MSER, we found that the classification accuracy was not acceptable. Probably, due to the crushing of the vehicles involved in the accident scene and the disruption of colors, we cannot analyze the accidents based on the traditional features. In fact, the corners, shapes and colors of vehicles in such images are completely irregular. Besides, the accident images have been collected from different sources and, in many cases, properties of the images such as light intensity and image angle are different. In these cases, extracting some hidden and deep features may improve the classification efficiency. Since CNN can greatly improve the detection accuracy by extracting hidden features of images that are distorted by various factors, we were encouraged to apply CNN to accident images. Indeed, one of the main advantages of CNN is its ability to extract proper features from widely dispersed data. However, CNN can also be used to classify the images [14]. Thus, we should state the exact role of CNN for accident image analysis in our paper. To this end, we briefly describe the details of CNN in our implementation. As presented in Fig. 3, this CNN includes the following layers:
[Figure: Input layer (28×28) → Conv 1, F(3,16), Padding[1,1] → MP, Pool[2,2], Stride[2,2] → Conv 2 (14×14), F(3,32), Padding[1,1] → MP, Pool[2,2], Stride[2,2] → Conv 3 (7×7), F(3,64), Padding[1,1] → MP, Pool[2,2], Stride[2,2] → Conv 4 (3×3), F(3,128), Padding[1,1] → FC → SM → car / truck / motorcycle]
Fig. 3 Configuration of CNN baseline to classify the involved vehicles in the accident (denoting “MP” as the max-pooling, “F” as the filter, “Conv” as the convolution, “Pool” as the pooling size, “stride” as the stride size, “FC” as the fully connected, “SM” as the Softmax) (adapted from [49])
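As a rough check of the configuration, the spatial sizes can be traced layer by layer. The sketch below is illustrative only: it assumes a convolution stride of 1 and floor rounding in pooling (neither is stated in the paper), and the helper names `conv_out`, `pool_out` and `trace_sizes` are hypothetical.

```python
def conv_out(size, kernel=3, padding=1, stride=1):
    """Output width of a square convolution (stride 1 is an assumption)."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, pool=2, stride=2):
    """Output width of max-pooling with floor rounding (an assumption)."""
    return (size - pool) // stride + 1


def trace_sizes(input_size=28, n_blocks=4):
    """Follow a 28x28 input through four conv + max-pool blocks."""
    conv_sizes, pool_sizes = [], []
    size = input_size
    for _ in range(n_blocks):
        size = conv_out(size)      # 3x3 filter, padding 1: width is preserved
        conv_sizes.append(size)
        size = pool_out(size)      # pool [2,2], stride [2,2]: width is halved
        pool_sizes.append(size)
    return conv_sizes, pool_sizes


conv_sizes, pool_sizes = trace_sizes()
# Weights of one 3x3 filter applied to a 3-channel input image.
weights_per_filter_conv1 = 3 * 3 * 3
print(conv_sizes, pool_sizes, weights_per_filter_conv1)
```

Under these assumptions the trace gives convolution outputs of 28, 14, 7, 3 and pooling outputs of 14, 7, 3, 1, with 27 weights per first-layer filter, which agrees with the sizes quoted in the layer description.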
• Input layer This layer gets images with the size of 28 × 28 × 3.
• Convolution layer In our model, four convolution layers are used. These layers consist of 16, 32, 64, and 128 filters. The size of the filters in these layers is 3 × 3 and the padding size is 1.

– In the first convolution layer, the number of weights used for a filter is 27 and the output size is 28, which fully covers the whole image. The number of neurons in this layer is 28 × 28 × 16.
– In the second convolution layer, the number of weights is 32 × 32 × 3 and the output size is 14.
– In the third convolution layer, the output size is 7.
– In the last convolution layer, the output size is 3.

• Batch normalization layer After each convolution layer, we define a batch normalization layer to normalize the features extracted by the convolution layer. Our model contains four batch normalization layers.
• Rectified linear unit (ReLU) layer This layer uses max(0,x) as the activation function for each neuron, which converts the negative values to zero. Our model contains four ReLU layers.
• Max-pooling layer We define four max-pooling layers in our proposed model. The pool size in these layers is [2,2]. Also, the stride size of each layer is [2,2]. The output sizes of these max-pooling layers are 14, 7, 3, and 1, respectively.
• Fully connected layer Our model contains one fully connected layer (see Fig. 3).
• Softmax layer The last layer converts the fully connected outputs to a probability distribution over the classes.

Now, to extract the features from accident images, we focus on some issues that vary between accident images. As one can note, the angle, the light intensity, the weather condition, the accident time and the background of accidents are different in these images. We will show that CNN can extract the hidden features despite these issues. In fact, CNN does three tasks: feature extraction by the convolution layers, feature selection by the pooling layers and classification by the fully connected layer. However, in some papers, CNN has been used just for feature extraction and feature selection [35, 36]. Similarly, in the first step of our method, we extract the features using CNN. These features are mainly related to image texture and image colors. For each image, CNN extracts a vector of size 1 × n, where n is the number of classes. For example, for detecting the type of vehicles involved in the accident, the extracted feature vector for any image is a 1 × 3 vector with respect to the three classes of vehicles. This vector is considered as the input of the classification algorithm. In fact, we have mapped each image to the features that are extracted by the last pooling layer. The idea of this part is similar to the transfer learning approach, which follows a different feature space for knowledge transfer; but we do not transfer learning between different domains of interest. See, e.g., [37] to understand the differences between our approach and transfer learning in detail.

Furthermore, the CNN baseline classifies the samples using an MLP in the last layer. Our approach replaces this MLP with a faster classifier. Thus, the extracted features of CNN are sent to a new mixture of ELMs. Note that a linear system should be solved to adapt the synaptic weights in the last layer of ELM. These weights can be determined by three approaches: a direct method, a decomposition method or an iterative method. Thus, the weight adaptation process of ELM can also be implemented iteratively, similar to MLP. Since error back-propagation happens in MLP but not in ELM, in many experiments the results of the trained MLP are superior to those of ELM, while the speed of ELM is still better. Thus, when the processing time is more important, the usage of a mixture of ELMs is defensible.
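The point that ELM only needs a linear system for its last layer can be illustrated with a minimal sketch. This is not the authors' implementation: the sigmoid hidden layer, the ridge term, the toy two-class data and the names `hidden_output`, `train_elm` and `predict_elm` are all assumptions chosen for illustration, and the system is solved directly with NumPy.

```python
import numpy as np


def hidden_output(X, W, b):
    """Sigmoid activations of the random, untrained hidden layer."""
    return 1.0 / (1.0 + np.exp(-(X @ W + b)))


def train_elm(X, T, n_hidden=20, ridge=1e-3, seed=0):
    """Draw random hidden weights, then solve for the output weights only."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], n_hidden))   # input-to-hidden weights stay fixed
    b = rng.normal(size=n_hidden)
    H = hidden_output(X, W, b)
    # Ridge-regularised normal equations, solved in one step (direct method).
    A = H.T @ H + ridge * np.eye(n_hidden)
    beta = np.linalg.solve(A, H.T @ T)
    return W, b, beta


def predict_elm(X, W, b, beta):
    """Pick the class with the largest output score."""
    return np.argmax(hidden_output(X, W, b) @ beta, axis=1)


# Toy stand-in for extracted features: two well-separated Gaussian blobs.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-3.0, 1.0, size=(100, 2)),
               rng.normal(3.0, 1.0, size=(100, 2))])
labels = np.repeat([0, 1], 100)
T = np.eye(2)[labels]                              # one-hot targets

W, b, beta = train_elm(X, T)
accuracy = float(np.mean(predict_elm(X, W, b, beta) == labels))
print(f"training accuracy on toy data: {accuracy:.3f}")
```

No gradient is propagated back to `W` or `b`; only `beta` is fitted, which is the source of the speed advantage discussed above.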
Table 2 Accident detection with hybrid of CNN and a different classifier

Feature extractor | Classifier | Precision (%) | Recall (%) | F-measure (%) | Accuracy (%)
--- | --- | --- | --- | --- | ---
CNN | OSELM | 98.97 | 98.97 | 98.97 | 98.97
CNN | ELM | 98.91 | 98.91 | 98.91 | 98.91
CNN | CELM | 98.02 | 98.02 | 98.02 | 98.03
CNN | KELM | 98.57 | 98.56 | 98.56 | 98.57
CNN | MLP | 98.44 | 98.44 | 98.44 | 98.44
CNN | Stacking | 94.67 | 94.65 | 94.7 | 94.7
CNN | XGBoost | 94.33 | 94.33 | 94 | 94.76
CNN | SVM | 94.33 | 94.33 | 94 | 94.67
CNN | RBF | 94.33 | 94.33 | 94 | 94.8
CNN | MELM | 99.31 | 99.31 | 99.31 | 99.31
Baseline CNN | | 90.37 | 90.37 | 90.33 | 90.84

Table 3 Severity classification with hybrid of CNN and different classifiers

Feature extractor | Classifier | Precision (%) | Recall (%) | F-measure (%) | Accuracy (%)
--- | --- | --- | --- | --- | ---
CNN | OSELM | 84.66 | 71.66 | 74.66 | 91.22
CNN | ELM | 85.49 | 72.94 | 76.08 | 91
CNN | CELM | 60.23 | 62.54 | 61.35 | 90.15
CNN | KELM | 93.47 | 63.38 | 63.10 | 90.15
CNN | MLP | 93.14 | 90.10 | 91.59 | 93.04
CNN | Stacking | 84.52 | 80.7 | 68.8 | 72.3
CNN | XGBoost | 76 | 69.33 | 71.66 | 84.75
CNN | SVM | 88.66 | 57.6 | 56.9 | 83.7
CNN | RBF | 84 | 63.33 | 66.66 | 84.24
CNN | MELM | 90.27 | 69.58 | 72.84 | 91.5
Baseline CNN | | 67.54 | 48.93 | 49.66 | 68.40
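The tables do not state how the multi-class measures were averaged. One common convention is macro-averaging, in which precision, recall and F-measure are computed per class and then averaged; under it, the reported F-measure need not equal the harmonic mean of the reported precision and recall, as in several rows of Table 3. The sketch below assumes macro-averaging (not confirmed by the source) and uses a made-up confusion matrix; `macro_metrics` is a hypothetical helper.

```python
def macro_metrics(confusion):
    """Macro-averaged precision, recall and F-measure.

    confusion[i][j] counts samples of true class i predicted as class j.
    """
    n = len(confusion)
    precisions, recalls, f_measures = [], [], []
    for k in range(n):
        tp = confusion[k][k]
        predicted_k = sum(confusion[i][k] for i in range(n))  # column sum
        actual_k = sum(confusion[k])                          # row sum
        p = tp / predicted_k if predicted_k else 0.0
        r = tp / actual_k if actual_k else 0.0
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        precisions.append(p)
        recalls.append(r)
        f_measures.append(f)
    return sum(precisions) / n, sum(recalls) / n, sum(f_measures) / n


# Made-up two-class example (rows: true class, columns: predicted class).
confusion = [[90, 10],
             [30, 70]]
precision, recall, f_measure = macro_metrics(confusion)
harmonic_of_averages = 2 * precision * recall / (precision + recall)
print(precision, recall, f_measure, harmonic_of_averages)
```

In this toy case the macro F-measure is slightly lower than the harmonic mean of the averaged precision and recall, which shows how the two quantities can diverge.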
understand why ME is necessary, the results of Tables 2, 3 and 4 can be considered. As one can see, the performances of the different neural networks for a single subproblem of accident analysis are very different. Only after the training phase can we recognize the best classifier. Thus, we cannot select the best classifier in advance for a general task, and we need an ME to combine the capabilities of the different neural networks to classify general samples more accurately.

In this paper, we develop an ME including the state-of-the-art variations of ELM. They are the ELM baseline [39], KELM [40], OSELM [41] and CELM [42]. ELM is a machine learning algorithm that uses the topology of feedforward neural networks to solve classification, regression, clustering, sparse approximation, compression, and feature learning. This network includes a single layer of hidden nodes, where the parameters of the hidden nodes (not just the weights connecting inputs to hidden nodes) need not be tuned. In most cases, the output weights of the hidden nodes of ELM are determined by solving a linear system in a single step. Thus, the ELM algorithm can overcome problems such as local minima, the choice of learning parameters, and over-fitting. These problems happen widely in the traditional gradient-based learning algorithms for training shallow networks such as MLP [43].

Now, we review some details of the ELM variations. In KELM, a kernel function such as the Radial Basis Function (RBF) is used to increase the classification accuracy [40]. In CELM, the weight vectors from the input layer to the hidden layer are constrained to be drawn from the closed set of difference vectors between class samples. They are the set of vectors connecting samples of one class to samples of a different class [42]. OSELM consists of two main phases. In the boosting phase, the SLFNs are trained using the primitive ELM method with some batches of training data for the initialization stage. These boosting training data will be discarded as soon as the boosting phase is completed. After the boosting phase, OSELM will learn the training data one-by-one or chunk-by-chunk, and all training data will be discarded once the learning procedure completes [41].

We will show that the hybrid system, which uses CNN for feature selection and a mixture of ELMs (MELM) for classification, called CNN–MELM, is very efficient for the analysis of complex images. The final architecture of CNN–MELM is presented in Fig. 4. In our experiments, OSELM on the accident detection data set, KELM for determining the type of vehicles involved in accidents, and OSELM for determining the severity of the accidents provide the best results. Thus, the usage of different ELM algorithms for classification is justified. Although in our implementation MELM consists of ELM, KELM, CELM, and OSELM, it can be extended by any new version of a rapid classifier.

Finally, in CNN–MELM, after classifying the extracted features by the different ELMs, a gating network is needed to aggregate the results of the classifiers. We examine three gating networks: a plurality voting method (majority voting method), a Behaviour Knowledge Space (BKS) [44] and decision templates [45]. In the plurality voting algorithm, the total votes received by each class are obtained, and then the class with the highest number of votes is selected as the result. An extensive and excellent analysis of this voting approach can be found in [46]. BKS uses a lookup table that lists the most common correct classes for every possible class combination. Decision templates compute a similarity measure between the current decision profile of the unknown instance and the average decision profiles of instances from each class. By experimental study, we show that their results are almost the same. We have used the first one in our proposed system. Based on the architecture of Fig. 4, the main steps of CNN–MELM can be stated as the following:
Table 5 Parameters' analysis for different classification algorithms that are used on the extracted features from accident images analysis data set

Classifier | Parameters | Range of accuracy | Best configuration
--- | --- | --- | ---
RBF | Batch size ∈ {50, 100, 150, 200}, ridge ∈ {0.1, 0.01, 0.001} | Accuracy ∈ [86.9, 89.30] | Batch size = 150, ridge = 0.01
XGBoost | Batch size ∈ {50, 100, 150}, seed ∈ {0.5, 1, 1.5} | Accuracy ∈ [87.02, 88.61] | Batch size = 100, seed = 1
MLP | Num-neurons ∈ {15, 20, 25}, learning-rate ∈ {0.1, 0.2, 0.3, 0.4} | Accuracy ∈ [94.04, 94.8] | Num-neurons = 20, learning-rate = 0.2
ELM | Num-neurons ∈ {10, 20, 30} | Accuracy = 92.03 | Num-neurons = 20
KELM | Num-neurons ∈ {160, 180, 200, 220} | Accuracy ∈ [92.29, 92.54] | Num-neurons = 180
OSELM | Num-neurons ∈ {160, 180, 200, 220}; Initial-training-data ∈ {260, 280, 300, 320}; Size-data-block ∈ {5, 10, 20, 30} | Accuracy ∈ [89.38, 91.28] | Num-neurons = 180; Initial-training-data = 300; Size-data-block = 10
CELM | Num-neurons ∈ {150, 200, 250} | Accuracy ∈ [48.29, 69.03] | Num-neurons = 200
Fig. 6 MLP accuracy with respect to the different learning rates
Fig. 7 OSELM accuracy with respect to the size of data block
Table 6 Comparison between accuracies (%) of different hybrids of feature selectors and classifiers on accident images' analysis data set [62]

Feature extractor | Feature selector | Stacking | XGBoost | MLP | RBF | SVM | ELM | OSELM | CELM | KELM | FC*
--- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | ---
LBP + HOG + SURF + MSER | PCA [45] | 57.49 | 53.88 | 52.02 | 52.49 | 53.28 | 36.19 | 38.68 | 49.61 | 48.62 | –
LBP + HOG + SURF + MSER | CFS [46] | 55.73 | 55.20 | 57.6 | 52.57 | 54.65 | 57.8 | 55.91 | 33.3 | 58.65 | –
LBP + HOG + SURF + MSER | BBF [47] | 47.20 | 47.76 | 48.59 | 46 | 47 | 56.65 | 56.34 | 55.64 | 52.76 | –
CNN | Max-pooling layer | 89.4 | 87.02 | 94.8 | 89.30 | 86.83 | 92.03 | 91.28 | 69.03 | 92.29 | –
Baseline CNN | – | – | – | – | – | – | – | – | – | – | 73.8
Table 7 Finding the best configuration for CNN for accident images' analysis data set [62]

Configuration index | C | M | C | M | C | M | C | FC | Acc. (%)
--- | --- | --- | --- | --- | --- | --- | --- | --- | ---
1 | C(fi(2,16), P(3,3)) | M(Pol(2,2), Stir(2,2)) | C(fi(2,32), P(3,3)) | M(Pol(2,2), Stir(2,2)) | C(fi(2,64), P(3,3)) | M(Pol(2,2), Stir(2,2)) | C(fi(2,128), P(3,3)) | F(O(2)) | 63.46
2 | C(fi(3,16), P(1,1)) | M(Pol(2,2), Stir(2,2)) | C(fi(3,32), P(1,1)) | M(Pol(2,2), Stir(2,2)) | C(fi(3,64), P(1,1)) | – | – | F(O(2)) | 70.16
3 | C(fi(3,16), P(1,1)) | M(Pol(2,2), Stir(2,2)) | C(fi(3,32), P(1,1)) | M(Pol(2,2), Stir(2,2)) | C(fi(3,64), P(1,1)) | M(Pol(2,2), Stir(2,2)) | C(fi(3,128), P(1,1)) | F(O(2)) | 73.8

*C convolution layer, fi filter, P padding, M max-pooling layer, Pol pool size, Stir stride, FC fully connected layer, O output, Acc. accuracy
Table 5 shows the results of OSELM when the number of initial training data and the size of the data block in each step are varied. The best values are given in the last column of Table 5. In Fig. 7, the behaviour of OSELM is presented when the size of the data block changes.
• In CELM, the number of hidden nodes is also analyzed in Table 5 when the activation function is the sigmoid function. In KELM, the number of hidden nodes is also analyzed in Table 5 when the kernel function is RBF.

As a conclusion for this subsection, the range of accuracy of MLP, ELM, and KELM for accident analysis is better than that of the other classifiers when the features are extracted by the last pooling layer of CNN. We discuss this topic in the next subsections.

data as training and testing data, respectively. In addition, we used fivefold cross validation. Table 6 shows the accuracy of different combinations of feature selectors and classifiers. Bold values show the best obtained results throughout the paper. As one can see, on the traditional feature extractors, PCA as the feature selector and stacking as the classifier provide the best accuracy for PCA, 57.49%, while CFS as the feature selector and KELM as the classifier produce 58.65% accuracy. BBF as the feature selector with the ELM classifier gives the best result for BBF (56.65%), which is lower than the previous combinations. However, the accuracies of all of these experiments are worse than the baseline CNN with 73.8% accuracy. It is worth noting that the hybrid of CNN features and the KELM classifier provides 92.29% accuracy, which is the best result among the ELM variants (in Table 6, only CNN features with MLP, at 94.8%, are higher).

4.3 Best configuration for CNN
Table 8 The learning algorithms for MLP as classifier to classify the features extracted by the pooling layer of CNN

Training algorithm for MLP | Cross entropy | Accuracy (%) | Training time (seconds) | Accuracy/training time
--- | --- | --- | --- | ---
Scale-conjugate-gradient | 0.062 | 94.42 | 25 | 3.78
Levenberg–Marquardt | 0.029 | 94.8 | 27 | 3.51
BFGS quasi-Newton | 0.062 | 94.49 | 20 | 4.72
Resilient backpropagation | 0.065 | 94.57 | 18 | 5.25
Conjugate gradient with Powell restart | 0.064 | 94.38 | 18 | 5.24
Fletcher–Powell conjugate gradient | 0.063 | 94.42 | 15 | 6.29
Polak–Ribière conjugate gradient | 0.062 | 94.49 | 15 | 6.30
One step secant | 0.063 | 94.49 | 12 | 7.87
Gradient-descent with momentum | 0.064 | 94.42 | 17 | 5.55
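The last column of Table 8 is simply accuracy divided by training time. A short sketch, using the values from Table 8 and a hypothetical dictionary layout, reproduces the ranking:

```python
# (accuracy in %, training time in seconds), copied from Table 8.
table8 = {
    "Scale-conjugate-gradient": (94.42, 25),
    "Levenberg-Marquardt": (94.8, 27),
    "BFGS quasi-Newton": (94.49, 20),
    "Resilient backpropagation": (94.57, 18),
    "Conjugate gradient with Powell restart": (94.38, 18),
    "Fletcher-Powell conjugate gradient": (94.42, 15),
    "Polak-Ribiere conjugate gradient": (94.49, 15),
    "One step secant": (94.49, 12),
    "Gradient-descent with momentum": (94.42, 17),
}

# Accuracy per second of training: higher means a better trade-off.
rates = {name: acc / seconds for name, (acc, seconds) in table8.items()}
best_algorithm = max(rates, key=rates.get)
most_accurate = max(table8, key=lambda name: table8[name][0])

print(best_algorithm, round(rates[best_algorithm], 2))
print(most_accurate, round(rates[most_accurate], 2))
```

The measure favours the fastest algorithm here because the accuracies differ by less than half a percentage point while the training times differ by more than a factor of two.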
last layer, in what follows, we focus on its MLP configuration to improve the results.

4.4 Best configuration for MLP

In this subsection, the accident images' analysis data set [51] is used to find the best configuration for MLP. For this aim, the features of the last pooling layer of CNN are considered as the inputs for MLP. We have used MATLAB software for implementation. We checked different MLPs with different numbers of hidden layers and neurons, and finally we used an MLP with 3 hidden layers and 20 neurons in each hidden layer for our experiment. Then, the accuracy of different learning algorithms was compared. As one can see in Table 8, the best cross entropy is obtained by the Levenberg–Marquardt algorithm. Also, the accuracies of the different learning algorithms are between 94.38 and 94.8. Again, Levenberg–Marquardt gets the best accuracy.

Since the processing time is important for real-time applications, we used a new measure: the ratio of accuracy to processing time. Because we need to maximize the accuracy and minimize the processing time, the maximal ratio shows the best learning algorithm. As one can see in the last column of Table 8, the “one step secant” algorithm gets the best score. In what follows, we consider this algorithm for the MLP training process.

Table 9 Comparison between different gating networks for CNN–MELM on accident images analysis data set [62]

Feature extractor | Classifier | Gating network | Accuracy (%)
--- | --- | --- | ---
CNN | MELM | Plurality voting | 92.66
CNN | MELM | BKS | 92.52
CNN | MELM | Decision templates | 92
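The plurality-voting gate used in CNN–MELM can be sketched in a few lines. This is an illustration rather than the authors' code: the function name `plurality_vote` is hypothetical, and breaking ties in favour of the earliest expert is an assumption, since the paper does not say how ties are resolved.

```python
from collections import Counter


def plurality_vote(expert_labels):
    """Return the class predicted by the largest number of experts.

    expert_labels holds one predicted label per expert, e.g. the outputs of
    ELM, KELM, CELM and OSELM for the same image. Ties go to the label that
    appears first in the list (an assumed rule).
    """
    if not expert_labels:
        raise ValueError("at least one expert prediction is required")
    counts = Counter(expert_labels)
    top = max(counts.values())
    for label in expert_labels:
        if counts[label] == top:
            return label


# Four experts vote on the vehicle type of one image.
decision = plurality_vote(["car", "truck", "car", "motorcycle"])
tie_decision = plurality_vote(["truck", "car", "car", "truck"])
print(decision, tie_decision)
```

With four experts, two-against-two ties are possible, so whichever tie rule is chosen can affect the reported accuracy.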
Table 10 Comparison between the percent of accuracy of the hybrids of CNN with different classifiers

Feature selector | Classifier | MNIST | Air pollution image data set | Brain tumor data set | Vehicle
--- | --- | --- | --- | --- | ---
CNN | KELM | 99.62 | 66.92 | 93.68 | 98.91
CNN | ELM | 99.75 | 63.91 | 88.51 | 98.84
CNN | CELM | 99.51 | 63.75 | 69.83 | 82.05
CNN | OSELM | 99.60 | 63.75 | 90.2 | 98.77
CNN | MLP | 99.76 | 62.11 | 88.80 | 99.7
CNN | Stacking | 99.73 | 59.42 | 86.91 | 99.76
CNN | XGBoost | 99.71 | 62.73 | 87.33 | 99.64
CNN | SVM | 99.8 | 58.59 | 87.51 | 99.53
CNN | RBF | 77.50 | 60.24 | 86.84 | 99.75
CNN | MELM | 99.70 | 68.42 | 92.81 | 97.66
Baseline CNN | | 99.35 | 61.41 | 82.28 | 98.84
Best reported result | | 99.79 [67] | 59.38 [64] | 91.28 [65] | 92.78
Table 11 Running time of classifiers to classify the vehicle types in [62]

Running time of feature selection using CNN (s) | Classifier | Running time (s), train | Running time (s), test
The number of the epochs in our CNN model is 20 for all Thus, we cannot propose CNN–MLP for accident analyzing
of the training iterations. In addition, the learning rate in our in real-time, while the hybrid of CNN and the variants of
model is 0.05. In Tables 2, 3 and 4, Precision, Recall, and ELM is still defensible.
F-measure of CNN–MELM for accident detection, accident Now, we compare our classifier with one of the best
severity classification, and involved vehicles’ classification CNN-based works to recognize vehicle types [52]. In the
are presented, respectively. In all of these tables, the best val- latter reference, the average of precision to detect the differ-
ues are shown with bold numbers. To compare the results of ent vehicle types was between 66.36 and 90.65% and their
CNN–MELM with that of the other algorithms, we present total average was 81.05%. This is worse than the result of our
the same measures for all of the classifiers. Table 2 shows proposed MELM with averaged precision 92.73%.
that CNN–MELM has the best results among the other algo- Besides, in Table 4, it is shown that the processing times
rithms to detect the accidents in the images. of ELM, CELM, OSELM, and KELM to detect the vehi-
Table 3 reveals that the precision of CNN–KELM is the cle type are 0.38, 0.2, 0.0469, and 0.3127s, respectively,
best for accident severity classification. Its behaviour is while the processing times of MLP, Stacking, XGBoost and
close to CNN–MLP. After these algorithms, CNN–MELM RBF are 12, 3830, 2.6 and 7.96 s, which are worse than
has the third rank. However, the recall, the F-measure, and the accuracy of CNN–MLP are better than those of the other methods. The recall of CNN–ELM has the third rank, the F-measure of CNN–OSELM has the second rank, and the accuracy of CNN–MELM has the second rank in this experiment. These results show that, on all of the measures, the hybrids of CNN and ELM variants can classify the accident severities efficiently.

Table 4 compares the different classifiers for the vehicles that are involved in the accidents. As one can see, the measures of CNN–MLP are superior to those of the other methods. However, the differences between these measures and the corresponding measures of CNN–MELM are 1.55% for precision, 2.34% for recall, 2% for F-measure and 1.83% for accuracy. Thus, judged by these measures alone, it is better to use CNN–MLP for involved-vehicle classification.

However, the training of MLP is very time-consuming. Since both accuracy and training time are important for real-time applications, we define the rate of accuracy to processing time as in Subsection 4.4. This measure is evaluated for all of the classifiers in the last column of Table 4. As one can see, the hybrids of CNN with OSELM, CELM, and KELM are the best algorithms in this examination. Additionally, on this measure MELM is at least 7 times better than MLP.

ELM-based classifiers. Only SVM classifies this data set in 0.72 s, while its precision is 88.66%, which is worse than ELM, KELM, and OSELM as presented in Table 4. Also, the processing times of KELM and ELM are not as good as that of OSELM. This shows that they can be omitted from MELM for real-time implementation. However, as the last column of Table 4 shows, the rate of accuracy of KELM to its training time has the third rank among the mentioned classifiers, which shows its capability for classification of accident images.

Table 11 also shows the running time of CNN–MELM for involved-vehicle classification in [51]. The running time of CNN for feature extraction is 20 s, but this time is for a data set with 2398 images. Therefore, the average running time of feature extraction for a single image is about 0.008 s. After feature extraction, the next step is classification. The running time for training and testing on this data set is shown in Table 11. The average running time for testing a single image is 0.0003 s. This shows that, after training, CNN–MELM takes about 0.0083 s to analyze the type of involved vehicles in any accident image, and so this classification can be done in real time.
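The per-image timing and the rate of accuracy to processing time discussed above can be checked with a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' code. The data-set size, the 20 s feature-extraction time and the 0.0003 s test time are taken from the text. The 25 frames-per-second budget, the function names and the form "accuracy divided by time" for the rate are assumptions, since Subsection 4.4 is not reproduced here. The two accuracy/time pairs at the end are made-up inputs and are not values from Table 4.

```python
# Figures quoted in the text for involved-vehicle classification.
N_IMAGES = 2398                  # images in the data set
FEATURE_TIME_TOTAL_S = 20.0      # CNN feature extraction for the whole set
TEST_TIME_PER_IMAGE_S = 0.0003   # average MELM test time per image

# Assumption (not from the paper): a 25 frames-per-second video budget.
ASSUMED_FPS = 25.0


def per_image_latency(total_feature_s, n_images, test_per_image_s):
    """Return (feature time per image, end-to-end time per image) in seconds."""
    feature_per_image = total_feature_s / n_images
    return feature_per_image, feature_per_image + test_per_image_s


def accuracy_to_time_rate(accuracy_pct, time_s):
    """Assumed form of the rate: accuracy (in percent) divided by time (in s)."""
    if time_s <= 0:
        raise ValueError("time_s must be positive")
    return accuracy_pct / time_s


feature_s, total_s = per_image_latency(
    FEATURE_TIME_TOTAL_S, N_IMAGES, TEST_TIME_PER_IMAGE_S
)
frame_budget_s = 1.0 / ASSUMED_FPS

print(f"feature extraction per image: {feature_s:.4f} s")
print(f"end-to-end per image:         {total_s:.4f} s")
print(f"fits a {ASSUMED_FPS:.0f} fps budget: {total_s <= frame_budget_s}")

# Made-up inputs, NOT values from Table 4: a slightly less accurate but much
# faster classifier can still score higher on this rate.
fast = accuracy_to_time_rate(90.0, 2.0)
slow = accuracy_to_time_rate(92.0, 20.0)
print(f"rate ratio fast/slow: {fast / slow:.1f}")
```

Without intermediate rounding, 20 s over 2398 images plus 0.0003 s gives roughly 0.0086 s per image. The 0.0083 s quoted in the text results from rounding the feature-extraction time to 0.008 s before adding. Either figure is well inside the assumed 0.04 s frame budget.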
14. Krizhevsky, A., Sutskever, I., Hinton, G.: ImageNet classification with deep convolutional neural networks. Adv. Neural Inf. Process. Syst. 25, 1097–1105 (2012)
15. Redmon, J., Divvala, S., Girshick, R., Farhadi, A.: You only look once: unified, real-time object detection. In: Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016)
16. Ahn, B.: Real-time video object recognition using convolutional neural network. In: Neural Networks (IJCNN), pp. 1–7 (2015)
17. LeCun, Y., Kavukcuoglu, K., Farabet, C.: Convolutional networks and applications in vision. In: International Symposium on Circuits and Systems (ISCAS) (2010)
18. Lee, K., Park, D.C.: Image classification using fast learning convolutional neural networks. Adv. Sci. Technol. Lett. 113, 50–55 (2015)
19. Sadek, S., Al-Hamadi, A., Michaelis, B., Sayed, U.: Real-time automatic traffic accident recognition using HFG. In: 20th International Conference on Pattern Recognition (ICPR) (2010)
20. Nejjari, F., Benhlima, L., Bah, S.: Event traffic detection using heterogenous wireless sensors network. In: 13th International Conference of Computer Systems and Applications (AICCSA) (2016)
21. Kahaki, S., Nordin, M.: Highway traffic incident detection using high-resolution aerial remote sensing imagery. J. Comput. Sci. 7(6), 949 (2011)
22. Jiansheng, F.: Vision-based real-time traffic accident detection. In: 11th World Congress on Intelligent Control and Automation, WCICA (2014)
23. Chen, L., Cao, Y., Ji, R.: Automatic incident detection algorithm based on support vector machine. In: Sixth International Conference on Natural Computation (ICNC) (2010)
24. Prabha, C., Sunitha, R., Anitha, R.: Automatic vehicle accident detection and messaging system using GSM and GPS modem. Int. J. Adv. Res Electr. Electron. Instrum. Eng. 3(7), 10723–10727 (2014)
25. Kagesawa, M., Nakamura, A., Ikeuchi, K., Saito, H.: Vehicle type classification in infra-red image using parallel vision board. ITSWC (2000)
26. Zhou, Y., Nejati, H., Do, T.-T., Cheung, N.-M., Cheah, L.: Image-based vehicle analysis using deep neural network: a systematic study. In: International Conference on Digital Signal Processing (DSP) (2016)
27. Chen, Z., Ellis, T.: Semi-automatic annotation samples for vehicle type classification in urban environments. IET Intel. Transp. Syst. 3(9), 240–249 (2014)
28. Wang, X., Zhang, W., Wu, X., Xiao, L., Qian, Y., Fang, Z.: Real-time vehicle type classification with deep convolutional neural networks. J. Real-Time Image Proc. (2017). https://doi.org/10.1007/s11554-017-0712-5
29. Zheng, Z., Lu, P., Lantz, B.: Commercial truck crash injury severity analysis using gradient boosting data mining model. J. Saf. Res. 65, 115–124 (2018)
30. Delen, D., Sharda, R., Bessonov, M.: Identifying significant predictors of injury severity in traffic accidents using a series of artificial neural networks. Accid. Anal. Prev. 38(3), 434–444 (2006)
31. Chang, L.-Y., Wang, H.-W.: Analysis of traffic injury severity: an application of non-parametric classification tree techniques. Accid. Anal. Prev. 38(5), 1019–1027 (2006)
32. Nguyen, C.H., Cai, Chen, F.: Automatic classification of traffic incident’s severity using machine learning approaches. IET Intel. Transp. Syst. 11, 615–623 (2017)
33. Kheradpisheh, S., Sharifizadeh, F., Nowzari-Dalini, A., Ganjtabesh, M., Ebrahimpour, R.: Mixture of feature specified experts. Inf. Fusion 20, 242–251 (2014)
34. Li, L., Zou, B., Hu, Q., Wu, X., Yu, D.: Dynamic classifier ensemble using classification confidence. Neurocomputing 99, 581–591 (2013)
35. Yu, J.S., Chen, J., Xiang, Z.Q., Zou, Y.X.: A hybrid convolutional neural networks with extreme learning machine for WCE image classification. In: IEEE International Conference on Robotics and Biomimetics (ROBIO) (2015)
36. McDonnell, M.D., Tissera, M.D., Vladusich, T., Van Schaik, A., Tapson, J.: Fast, simple and accurate handwritten digit classification by training shallow neural network classifiers with the ‘extreme learning machine’ algorithm. PLoS One 10(8), e0134254 (2015)
37. Liu, Y., Yao, X.: Ensemble learning via negative correlation. Neural Netw. 12(10), 1399–1404 (1999)
38. Masoudnia, S., Ebrahimpour, R., Arani, S.: Incorporation of a regularization term to control negative correlation in mixture of experts. Neural Process. Lett. 36(1), 31–47 (2012)
39. Islam, M., Yao, X., Nirjon, S., Islam, M., Murase, K.: Bagging and boosting negatively correlated neural networks. IEEE Trans. Syst. Man Cybern. Part B (Cybern.) 38(3), 771–784 (2008)
40. Liu, Y., Yao, X.: Simultaneous training of negatively correlated neural networks in an ensemble. IEEE Trans. Syst. Man Cybern. Part B (Cybern.) 29(6), 716–725 (1999)
41. Ebrahimpour, R., Sadeghnejad, N., Masoudnia, S., Arani, S.: Boosted pre-loaded mixture of experts for low-resolution face recognition. Int. J. Hybrid Intell. Syst. 9(3), 145–158 (2012)
42. Lotfi, M., Motamedi, S., Sharifian, S.: Time-based feedback-control framework for real-time video surveillance systems with utilization control. J. Real-Time Image Proc. (2016). https://doi.org/10.1007/s11554-016-0637-4
43. Zarándy, Á., Nemeth, M., Nagy, Z., Kiss, A., Santha, L., Zsedrovits: A real-time multi-camera vision system for UAV collision warning and navigation. J. Real-Time Image Proc. 4, 709–724 (2016)
44. Puri, A.: A survey of unmanned aerial vehicles (UAV) for traffic surveillance. Department of Computer Science and Engineering, University of South Florida (2005)
45. Pearson, K.: LIII. On lines and planes of closest fit to systems of points in space. Lond. Edinb. Dublin Philos. Mag. J. Sci. 2(11), 559–572 (1901)
46. Hall, M.: Correlation-based feature selection for machine learning. PhD thesis, Department of Computer Science, University of Waikato, Hamilton (1999)
47. Hamon, J.: Optimisation Combinatoire Pour La Sélection De Variables En Régression En Grande Dimension: Application En Génétique Animale. Université des Sciences et Technologie de Lille-Lille I. (2013)
48. Mofarreh-Bonab, M., Mofarreh-Bonab, M.: Color image compression using PCA. Int. J. Comput. Appl. 111(5), 16–19 (2015)
49. Zhang, L., Yang, F., Zhang, Y.D., Zhu, Y.J.: Road crack detection using deep convolutional neural network. In: International Conference on Image Processing (ICIP), pp. 3708–3712 (2016)
50. Weng, Q., Mao, Z., Lin, J., Liao, X.: Land-use scene classification based on a CNN using a constrained extreme learning machine. Int. J. Remote Sens. pp. 1–19 (2018)
51. Martinel, N., Piciarelli, C., Foresti, G., Micheloni, C.: Mobile food recognition with an extreme deep tree. In: Proceedings of the 10th International Conference on Distributed Smart Camera (2016)
52. Pan, S.J., Yang, Q.: A survey on transfer learning. IEEE Trans. Knowl. Data Eng. 22(10), 1345–1359 (2010)
53. Abbasi, E., Shiri, M., Ghatee, M.: A regularized root–quartic mixture of experts for complex classification problems. Knowl.-Based Syst. 110, 98–109 (2016)
54. Huang, G.-B., Zhu, Q.-Y., Siew, C.-K.: Extreme learning machine: a new learning scheme of feedforward neural networks. In: International Joint Conference on Neural Networks (2004)
55. Huang, G.-B., Zhou, H., Ding, X., Zhang, R.: Extreme learning machine for regression and multiclass classification. IEEE Trans. Syst. Man Cybern. Part B (Cybern.) 42(2), 513–529 (2012)
56. Huang, G.-B., Liang, N.-Y., Rong, H.-J., Saratchandran, P., Sundararajan, N.: On-line sequential extreme learning machine. Comput. Intell. 2005, 232–237 (2005)
57. Zhu, W., Miao, J., Qing, L.: Constrained extreme learning machine: a novel highly discriminative random feedforward neural network. In: International Joint Conference on Neural Networks (IJCNN) (2014)
58. Tian, H.X., Mao, Z.Z.: An ensemble ELM based on modified AdaBoost.RT algorithm for predicting the temperature of molten steel in ladle furnace. IEEE Trans. Autom. Sci. Eng. 7(1), 73–80 (2010)
59. Huang, Y., Suen, C.: The behavior-knowledge space method for combination of multiple classifiers. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (1993)
60. Kuncheva, L., Bezdek, J., Duin, R.: Decision templates for multiple classifier fusion: an experimental comparison. Pattern Recognit. 34(2), 299–314 (2001)
61. Kuncheva, L.I.: Combining Pattern Classifiers: Methods and Algorithms. John Wiley & Sons (2004)
62. Pashaei, A., Ghatee, M., Sajedi, H.: Accident images analysis dataset. Amirkabir University of Technology, 2018. (Online). https://github.com/mghatee/Accident-Images-Analysis-Dataset. Accessed 2018
63. LeCun, Y., Cortes, C., Burges, C.J.: The MNIST Database of Handwritten Digits (Online). http://yann.lecun.com/exdb/mnist (1998)
64. Vahdatpour, M., Sajedi, H., Ramezani, F.: Air pollution forecasting from sky images with shallow and deep classifiers. Earth Sci. Inf. 11(3), 413–422 (2018)
65. Cheng, J., Huang, W., Cao, S., Yang, R., Yang, W., Yun, Z., Wang, Z., Feng, Q.: Enhanced performance of brain tumor classification via tumor region augmentation and partition. PLoS One 10(10), e0140381 (2015)
66. Arróspide, J., Salgado, L., Nieto, M.: Video analysis based vehicle detection and tracking using an MCMC sampling framework. EURASIP J. Adv. Signal Process. (2012)
67. Wan, L., Zeiler, M., Zhang, S., LeCun, Y., Fergus, R.: Regularization of neural networks using dropconnect. In: International Conference on Machine Learning (2013)
68. Haut, J., Paoletti, M., Plaza, J., Plaza, A.: Fast dimensionality reduction and classification of hyperspectral images with extreme learning machines. J. Real-Time Image Proc. 15(3), 439–462 (2018)

Publisher’s Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Ali Pashaei is an M.Sc. student in the Department of Computer Science of Amirkabir University of Technology, Tehran, Iran. He works on image processing, deep learning and extreme learning algorithms. He has written and presented two papers at two international conferences on Intelligent Transportation Systems.

Mehdi Ghatee is an Associate Professor with the Department of Computer Science, Amirkabir University of Technology, Tehran, Iran. His main research areas are ITS, smartphone-based ITS systems, neural networks and fuzzy systems. He has written more than 100 papers in national and international journals and conferences. Previously, he was Chairman of the Department of Computer Science and Project Manager of the Iranian National Plan of Intelligent Transportation Systems, and he is currently an Associate Dean for Undergraduate Affairs of the Faculty of Mathematics and Computer Science. He is also a member of the Board of Directors of the ITS-RI and director of NORC.

Hedieh Sajedi received a B.Sc. degree in computer engineering from Amirkabir University of Technology in 2003, and M.Sc. and Ph.D. degrees in computer engineering (artificial intelligence) from Sharif University of Technology, Tehran, Iran in 2006 and 2010, respectively. She is currently an Assistant Professor at the Department of Computer Science, Tehran University, Iran. She has written more than 80 papers in national and international journals and conferences. Her research interests include multimedia data hiding, steganography and steganalysis methods, pattern recognition, and machine learning.