Convolution Neural Network Joint With Mixture of Extreme Learning Machines For Feature Extraction and Classification of Accident Images

Journal of Real-Time Image Processing
https://doi.org/10.1007/s11554-019-00852-3
ORIGINAL RESEARCH PAPER
Ali Pashaei¹ · Mehdi Ghatee¹ · Hedieh Sajedi²
Received: 12 August 2018 / Accepted: 16 January 2019

© Springer-Verlag GmbH Germany, part of Springer Nature 2019
Abstract
This paper considers accident images and develops a deep learning method for feature extraction together with a mixture of experts for classification. For the first task, the outputs of the last max-pooling layer of a Convolutional Neural Network (CNN) are used to extract the hidden features automatically. For the second task, a mixture of advanced variations of the Extreme Learning Machine (ELM), including basic ELM, Constrained ELM (CELM), On-Line Sequential ELM (OSELM) and Kernel ELM (KELM), is developed. This ensemble classifier combines the advantages of the different ELMs using a gating network; its accuracy is very high while its processing time is close to real-time. To show its efficiency, different combinations of traditional feature extraction and feature selection methods with various classifiers are examined on two kinds of benchmarks: an accident-image data set and some general data sets. It is shown that the proposed system detects accidents with 99.31% precision, recall and F-measure. Besides, the precisions of accident-severity classification and involved-vehicle classification are 90.27% and 92.73%, respectively. This system is suitable for on-line processing of accident images captured by Unmanned Aerial Vehicles (UAVs) or other surveillance systems.
Keywords Feature extraction · Accident images’ classification · Convolutional neural networks · Mixture of ELM ·
Ensemble learning
* Mehdi Ghatee
ghatee@aut.ac.ir
Hedieh Sajedi
hhsajedi@ut.ac.ir
¹ Department of Computer Science, Amirkabir University of Technology, Tehran, Iran
² School of Mathematics, Statistics and Computer Science, College of Science, University of Tehran, Tehran, Iran

1 Introduction

A system for accident management plays an important role in traffic control and emergency systems. In such systems, data from different sources can be collected to support injured people [1]. One of the most important data sources is images [2, 3]. Both fixed and portable cameras can capture these images, but the latter are more effective. In [4], a drone was used for a freeway monitoring system. Cao et al. [5] proposed a system for vehicle detection and tracking using Unmanned Aerial Vehicles (UAVs). Also, in [6], a similar system was developed to monitor road conditions.

On the other hand, there is much research on accident analysis. To obtain a real-time system for accident support, different image-processing methods have been developed [2, 3]. Some of these methods apply machine learning algorithms such as artificial neural networks (ANN) [7]. Others have focused on accident characteristics. For example, Chiou and Fu [8] conducted an integrated study of crash frequency and severity. They mentioned that the road and traffic conditions, the land use and cultural parameters influence the frequency and the severity of the incidents. The type of vehicle also has a major effect on the severity of the accident. Many works have investigated the severity of truck accidents [9].

Continuing this line of research, this paper focuses on image processing and machine learning for accident analysis. To this aim, we first concentrate on feature extraction from accident images. The traditional methods for this task, such as Local Binary Pattern (LBP) [10], Histogram of Oriented Gradients (HOG) [11], Maximally Stable Extremal Regions (MSER) [12] and Speeded-Up Robust Features (SURF) [13], are considered first; however, we will show that they are not well suited to accident analysis.
Table 1 Some related works and systems in accident detection and feature extraction from accident

| Ref. no. | Goal | Proposed method | Limitation |
|---|---|---|---|
| [19] | Accident detection | Using histogram of flow gradient for accident detection | Tracking the individual in crowded scenes was impossible |
| [20] | Accident detection | Fusion of the wireless sensors | Maybe the sensors are not active* |
| [21] | Accident detection | Using neural networks, Radon transform for angle detection and traffic-flow measurements | It used satellite images for accident detection* |
| [22] | Accident detection | Using Gaussian Mixture Model to detect vehicles; then mean shift algorithm for vehicle tracking | It cannot be used in the different weather conditions* |
| [23] | Accident detection | Using SVM and neural network | It is limited to some special sensors |
| [24] | Accident detection | Accident detection by GPS and GSM | GPS data cannot be accessed everywhere |
| [25] | Vehicle classification (no accident) | Using Eigen-window method on images provided by B-snake | Its accuracy is low and it cannot detect motorcycles* |
| [26] | Vehicle classification (no accident) | Using YOLO model for vehicle detection and AlexNet for vehicle classification | It did not detect motorcycles* |
| [27] | Vehicle classification (no accident) | Using k-mean values for initial labelling and then using linear SVM to identify the low-confident samples | It used fixed CCTV images for classification* |
| [28] | Vehicle classification (no accident) | Using CNN for vehicle type detection | The running time is high |
| [29] | Severity of accident | Using gradient boosting to analyze the relationship between crash severities and risk factors | It cannot be used in the different weather conditions* |
| [30] | Severity of accident | Using a series of neural networks to model the nonlinear relationships between the injury severity levels and crash-related factors | It cannot estimate the risk |
| [31] | Severity of accident | Using decision tree for accident severity classification | The focus is just accident severity* |
| [32] | Severity of accident | Using incident location, date, time, traffic lanes to determine the severity of the accident | It just focused on accident severity* |

*These issues were solved in this paper
The Convolutional Neural Network (CNN), by contrast, is one of the best methods for image processing in the recent literature [14]. It can extract hidden features with the aid of pooling layers. Usually, the outputs of the last pooling layer are used for classification and regression purposes. We propose a mixture of ELMs for the classification task.

CNNs are also capable of real-time image processing. For example, Redmon et al. [15] used the YOLO model for object detection in images. They compared this model with other models such as R-CNN and VGG and showed that YOLO gives better results in real-time detection. In another work, Ahn [16] proposed a system for real-time video object recognition using CNNs. This system was implemented on a mid-range FPGA; it achieves a computational speed greater than 170,000 classifications per second and performs scale-invariant object recognition from a 720 × 480 video stream at 60 fps. This suggests that a CNN-based system such as the one we propose can be used in real-time applications.

In Table 1, some methods that have been used for accident-image processing are reviewed. It is worth noting that a CNN processes the image through a series of convolutional, nonlinear, and max-pooling layers. It provides a good description of the image, including useful features that can be applied for classification purposes [17, 18]. In the latter reference, ELM was applied to improve the learning speed of the CNN. The authors used the outputs of ELM after training the CNN for a limited number of epochs, but the recognition rate was rather low.

Besides, some ensemble methods have been used for image processing, and some classifiers have been combined with special feature selection methods [19, 20]. Yu et al. [21] developed a hybrid of CNN and ELM for image classification, using the CNN for feature extraction and ELM for classification. Nevertheless, the performance of ELM is not excellent in general cases; in fact, no single solution fits all problems. In [22], a shallow non-convolutional neural network was trained by ELM on the MNIST benchmark, and the classification error rate was about 1%. Such evidence encourages us to develop an ensemble method based on ELM for accident-image analysis. Li et al. [20] proposed a dynamic ensemble classifier that uses a random feature selection method to generate diverse classifiers. In [19], a three-phase ensemble system based on the Mixture of Experts (ME) was proposed. The authors selected an optimal subset of features in the first phase and used them to train the experts with the standard ME algorithm. In all of these ensemble methods, the learning algorithms must be adapted by incorporating a correlation penalty term in their error functions [23, 24]. In addition, implicit and explicit features can both be used in these algorithms, and they can complement each other [25, 26]. For example, in bagging and boosting algorithms, a series of experts is arranged sequentially and the experts depend on each other. Thus, they cannot provide cooperation among experts. Also, in these methods, there is no feedback from the combination phase to the learning phase. For more details about the combination of implicit and explicit approaches, one can refer to [27].
In this paper, we propose a system to analyze accident images from various sources, including UAVs, fixed cameras and portable cameras; however, we do not focus on the technological aspects. Instead, we concentrate on accident analysis: recognizing the occurrence of an accident, classifying the vehicles involved in it, and classifying its severity. To do this, a new two-phase system is developed. In the first phase, we extract the features from the max-pooling layer of a CNN. Since the images are received from different sources and from different angles, traditional algorithms cannot extract the proper features, while a CNN can. In the second phase, a mixture of experts including different variations of ELM is developed. Experimental results show that this ensemble algorithm is capable of classifying the images based on the features extracted by the CNN. The main contributions of this paper are as follows:

• Technological viewpoint: This paper classifies the properties of accidents in real-time (or at least close to real-time), which is its main benefit compared with previous works. Also, the precision of the system is acceptable for accident-support applications.
• Theoretical viewpoint: This paper develops a mixture of experts (ME) in place of the fully connected layer of the CNN to enhance the quality of classification with the aid of different experts, namely the variants of the ELM learning algorithm.

2 Accident analysis system

To analyze accidents based on image processing, in this section we study a system using a CNN and a mixture of ELMs. This system takes images from various sources, such as surveillance cameras, on-site people, and UAVs, see, e.g., [28, 29]. Then the necessary features are extracted and sent for classification. In the first step, the system recognizes whether or not an accident has happened. If the response is positive, the system calls two other ensemble classifiers to recognize the types of vehicles involved in the accident and the severity of the accident. Details are presented in the following.

Fig. 1 The examined components of system for any task of accident analysis

2.1 System inputs

In the first step, the system receives the images from the accident scene. These images can be obtained from various sources, see, e.g., Puri [30] for a survey. The system then passes these images on to detect the occurrence of an accident and to extract its properties.

2.2 System processes

The proposed system first detects the occurrence of the accident. Second, it extracts the characteristics of the accident. Since UAV images are captured in different situations and from different angles, powerful image processing and machine learning algorithms should be combined to obtain accurate results. Our proposed algorithm for this part consists of two main parts: feature extraction and classification based on these features. The details of these parts are presented in Sect. 3.

2.3 System outputs

The results of the processing part are sent to the control center, which decides whether to call an accident management procedure as quickly as possible.

3 System architecture

In this section, different algorithms for feature extraction from accident images are discussed. From the extracted features, we try to select the relevant ones and classify the images into the defined classes. The components of Fig. 1 need to be examined for each task of the accident analysis system.
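The cascade described in Sect. 2 (first detect an accident, then call the vehicle-type and severity classifiers only when the detection is positive) can be sketched in a few lines of Python. This is only an illustration of the control flow: the function `analyze_image` and the three stand-in classifiers are hypothetical names, and the toy "images" are dictionaries that simply carry their own labels.

```python
def analyze_image(image, detect_accident, classify_vehicle, classify_severity):
    """Run the cascade: detection first, the two other classifiers only on a positive result."""
    report = {"accident": detect_accident(image), "vehicle": None, "severity": None}
    if report["accident"]:
        report["vehicle"] = classify_vehicle(image)
        report["severity"] = classify_severity(image)
    return report


# Hypothetical stand-ins for the three ensemble classifiers (assumption: each
# toy "image" is a dict that already contains its labels).
def demo_detect(image):
    return image["accident"]


def demo_vehicle(image):
    return image["vehicle"]


def demo_severity(image):
    return image["severity"]


positive = analyze_image(
    {"accident": True, "vehicle": "motorcycle", "severity": "high dangerous"},
    demo_detect, demo_vehicle, demo_severity,
)
negative = analyze_image({"accident": False}, demo_detect, demo_vehicle, demo_severity)
print(positive)
print(negative)
```

In a real deployment each stand-in would be replaced by a trained classifier operating on the features extracted from the image; the control flow stays the same.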
Fig. 2 The output of traditional feature extraction methods: a original image, b HOG features, c MSER features, d SURF features

3.1 Traditional feature extraction

Because the images are received from various sources, image characteristics such as angle and intensity differ. Thus, feature extraction from these images is a hard task. The traditional feature extraction methods compared in our experiments are as follows:

• LBP [10]: It was developed for 2D texture analysis. In this algorithm, a local structure is applied to the image and each pixel is compared with its neighboring pixels.
• HOG [11]: It calculates the gradient in the x and y directions for each pixel and gives the magnitude and direction of each of these gradients.
• MSER [12]: It is one of the blob detection methods in image processing. It uses the intensity range of the input image to detect stable regions. A region is considered stable when, as different thresholds are examined, the variation in its colour intensity stays below an input threshold value.
• SURF [13]: It uses square (box) filters to approximate the Hessian matrix and determine the interest points. The Hessian matrix measures the local changes around a pixel, and the points where the determinant of this matrix is maximal are considered candidates for SURF features.

The features extracted from an accident image are shown in Fig. 2. After extraction, these features are merged into a matrix, which causes redundancy. In addition, some of the extracted features are misleading for the classification step and may lead to wrong results. For these reasons, we need to find the proper features to classify the samples efficiently. For the feature selection phase, three methods are used: Principal Component Analysis (PCA) [31], Correlation-based Feature Selection (CFS) [32] and Bagging-Based Feature selection (BBF) [33]. Thus, for each image, we first extract the features by LBP, HOG, SURF, and MSER. Then, we save these features in a matrix. This matrix is very large, while most of its entries are uninformative and can decrease the accuracy. Finally, we apply PCA, CFS, and BBF to this matrix to reduce the dimension of the features and to select the relevant and important ones. It is worth noting that PCA is one of the well-known algorithms for this purpose. PCA is also one of the common methods for band reduction in image processing; for example, in [34], PCA was used for band reduction on coloured images. Thus, PCA appears to be very useful for the current study.

3.2 Advanced feature extraction by convolution neural network

When we tried to extract features from accident images by traditional nonlinear feature extractors such as LBP, HOG, SURF, and MSER, we found that the classification accuracy was not acceptable. Probably, due to the deformation of the vehicles involved in the accident scene and the disruption of colors, accidents cannot be analyzed based on the traditional features. In fact, the corners, shapes and colors of vehicles in such images are completely irregular. Besides, the accident images have been collected from different sources, and in many cases properties such as light intensity and image angle differ. In these cases, extracting some hidden and deep features may improve the classification efficiency. Since a CNN can greatly improve the detection accuracy by extracting hidden features of images that are distorted by various factors, we were encouraged to apply a CNN to accident images. Indeed, one of the main advantages of a CNN is its ability to extract proper features from widely dispersed data. However, a CNN can also be used to classify the images [14]. Thus, we should state the exact role of the CNN for accident image analysis in our paper. To this end, we briefly describe the details of the CNN in our implementation. As presented in Fig. 3, this CNN includes the following layers:
• Input layer: This layer takes images of size 28 × 28 × 3.
• Convolution layer: In our model, four convolution layers are used. These layers consist of 16, 32, 64, and 128 filters. The size of the filters in these layers is 3 × 3 and the padding size is 1.
  – In the first convolution layer, the number of weights used by a filter is 27 and the output size is 28, so the output fully covers the whole image. The number of neurons in this layer is 28 × 28 × 16.
  – In the second convolution layer, the number of weights is 32 × 32 × 3 and the output size is 14.
  – In the third convolution layer, the output size is 7.
  – In the last convolution layer, the output size is 3.
• Batch normalization layer: After each convolution layer, we define a batch normalization layer to normalize the features extracted by the convolution layer. Our model contains four batch normalization layers.
• Rectified linear unit (ReLU) layer: This layer uses max(0, x) as the activation function for each neuron, which converts the negative values to zero. Our model contains four ReLU layers.
• Max-pooling layer: We define four max-pooling layers in our proposed model. The pool size in these layers is [2,2], and the stride of each layer is also [2,2]. The output sizes of these max-pooling layers are 14, 7, 3, and 1, respectively.
• Fully connected layer: Our model contains one fully connected layer (see Fig. 3).
• Softmax layer: The last layer converts the fully connected outputs to a probability distribution over the classes.

Fig. 3 Configuration of CNN baseline to classify the involved vehicles in the accident (denoting "MP" as the max-pooling, "F" as the filter, "Conv" as the convolution, "Pool" as the pooling size, "stride" as the stride size, "FC" as the fully connected, "SM" as the Softmax; adapted from [49]). Layer sequence shown in the figure: Input layer → Conv 1 (28×28, F(3,16), Padding[1,1]) → MP (Pool[2,2], Stride[2,2]) → Conv 2 (14×14, F(3,32), Padding[1,1]) → MP (Pool[2,2], Stride[2,2]) → Conv 3 (7×7, F(3,64), Padding[1,1]) → MP (Pool[2,2], Stride[2,2]) → Conv 4 (3×3, F(3,128), Padding[1,1]) → FC → SM → car / truck / motorcycle.

Now, to extract the features from accident images, we focus on several issues in these images. As one can note, the angle, the light intensity, the weather condition, the accident time and the background of the accidents differ between images. We will show that a CNN can extract the hidden features under these varying conditions. In fact, a CNN performs three tasks: feature extraction by the convolution layers, feature selection by the pooling layers and classification by the fully connected layer. However, in some papers, the CNN has been used only for feature extraction and feature selection [35, 36]. Similarly, in the first step of our method, we extract the features using a CNN. These features are mainly related to image texture and image colors. For each image, the CNN extracts a vector of size 1 × n, where n is the number of classes. For example, for detecting the type of vehicles involved in the accident, the extracted feature vector for any image is a 1 × 3 vector corresponding to the three classes of vehicles. This vector is the input of the classification algorithm. In fact, we have transformed each image into the features extracted by the last pooling layer. The idea of this part is similar to the transfer learning approach, which uses a different feature space for knowledge transfer; however, we do not transfer learning between different domains of interest. See, e.g., [37] for the differences between our approach and transfer learning in detail.

Furthermore, the CNN baseline classifies the samples using an MLP in the last layer. Our approach replaces this MLP with a faster classifier. Thus, the features extracted by the CNN are sent to a new mixture of ELMs. Note that a linear system must be solved to determine the synaptic weights in the last layer of an ELM. These weights can be determined by three approaches: a direct method, a decomposition method or an iterative method. Thus, the weight adaptation process of ELM can also be implemented iteratively, similar to MLP. Since the error is back-propagated in MLP but not in ELM, in many experiments the results of the trained MLP are superior to those of ELM, while ELM is still faster. Thus, when the processing time is more important, the use of a mixture of ELMs is defensible.
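The spatial sizes quoted in the layer list of Sect. 3.2 (28, 14, 7, 3 after the four convolutions and 14, 7, 3, 1 after the four max-pooling layers) can be checked with the usual output-size formula. The sketch below assumes a convolution stride of 1 (the text does not state it), floor rounding, and no padding in the pooling layers; the helper names `conv_out` and `pool_out` are hypothetical.

```python
def conv_out(size, kernel=3, stride=1, padding=1):
    """Output width/height of a convolution layer (stride 1 is an assumption)."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, pool=2, stride=2):
    """Output width/height of a max-pooling layer without padding."""
    return (size - pool) // stride + 1


size = 28                      # input images are 28 x 28 x 3
conv_sizes, pool_sizes = [], []
for _ in range(4):             # four convolution + max-pooling stages
    size = conv_out(size)
    conv_sizes.append(size)
    size = pool_out(size)
    pool_sizes.append(size)

# Weights of one 3 x 3 filter over the 3 input channels of the first layer
weights_per_filter_layer1 = 3 * 3 * 3

print(conv_sizes, pool_sizes, weights_per_filter_layer1)
```

Under these assumptions the computed sizes agree with the values listed for the convolution and max-pooling layers, and the first-layer filter has the 27 weights stated in the text.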
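Table 2 Accident detection with hybrid of CNN and a different classifier

| Feature extractor | Classifier | Precision (%) | Recall (%) | F-measure (%) | Accuracy (%) |
|---|---|---|---|---|---|
| CNN | OSELM | 98.97 | 98.97 | 98.97 | 98.97 |
| CNN | ELM | 98.91 | 98.91 | 98.91 | 98.91 |
| CNN | CELM | 98.02 | 98.02 | 98.02 | 98.03 |
| CNN | KELM | 98.57 | 98.56 | 98.56 | 98.57 |
| CNN | MLP | 98.44 | 98.44 | 98.44 | 98.44 |
| CNN | Stacking | 94.67 | 94.65 | 94.7 | 94.7 |
| CNN | XGBoost | 94.33 | 94.33 | 94 | 94.76 |
| CNN | SVM | 94.33 | 94.33 | 94 | 94.67 |
| CNN | RBF | 94.33 | 94.33 | 94 | 94.8 |
| CNN | MELM* | 99.31 | 99.31 | 99.31 | 99.31 |
| CNN | Baseline CNN | 90.37 | 90.37 | 90.33 | 90.84 |

*Bold values in the original are the best obtained results (the MELM row)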
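The MELM row combines the outputs of the individual ELM experts through a gate. For the plurality-voting gate that Sect. 3.3 adopts, a minimal Python sketch could look as follows. The function name, the example votes and the tie-breaking rule (the first label seen among the tied ones wins) are assumptions for illustration; the paper does not say how ties are resolved.

```python
from collections import Counter


def plurality_vote(expert_labels):
    """Return the label with the most votes; ties go to the earliest tied label."""
    counts = Counter(expert_labels)
    best = max(counts.values())
    for label in expert_labels:          # scan in input order to break ties
        if counts[label] == best:
            return label


# Hypothetical outputs of the four experts (ELM, KELM, OSELM, CELM) for one image
votes = ["with-accident", "with-accident", "without-accident", "with-accident"]
decision = plurality_vote(votes)
print(decision)
```

With four experts and two classes a 2-2 tie is possible, so an explicit tie-breaking rule is needed in practice.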
Table 3 Severity classification with hybrid of CNN and different classifiers

| Feature extractor | Classifier | Precision (%) | Recall (%) | F-measure (%) | Accuracy (%) |
|---|---|---|---|---|---|
| CNN | OSELM | 84.66 | 71.66 | 74.66 | 91.22 |
| CNN | ELM | 85.49 | 72.94 | 76.08 | 91 |
| CNN | CELM | 60.23 | 62.54 | 61.35 | 90.15 |
| CNN | KELM | 93.47 | 63.38 | 63.10 | 90.15 |
| CNN | MLP | 93.14 | 90.10 | 91.59 | 93.04 |
| CNN | Stacking | 84.52 | 80.7 | 68.8 | 72.3 |
| CNN | XGBoost | 76 | 69.33 | 71.66 | 84.75 |
| CNN | SVM | 88.66 | 57.6 | 56.9 | 83.7 |
| CNN | RBF | 84 | 63.33 | 66.66 | 84.24 |
| CNN | MELM | 90.27 | 69.58 | 72.84 | 91.5 |
| CNN | Baseline CNN | 67.54 | 48.93 | 49.66 | 68.40 |
Table 4 Involved vehicles classification with hybrid of CNN and different classifiers

| Feature extractor | Classifier | Precision (%) | Recall (%) | F-measure (%) | Accuracy (%) | Training time (s) | Accuracy/training time |
|---|---|---|---|---|---|---|---|
| CNN | OSELM | 92.16 | 92.22 | 92.13 | 91.28 | 0.0469 | 1942.86 |
| CNN | ELM | 92.33 | 92.22 | 92.2 | 92.03 | 0.38 | 242.18 |
| CNN | CELM | 67.49 | 68.72 | 67.79 | 69.03 | 0.2 | 345.15 |
| CNN | KELM | 92.05 | 91.84 | 91.84 | 92.29 | 0.3127 | 295.14 |
| CNN | MLP | 94.28 | 94.94 | 94.6 | 94.8 | 12 | 7.87 |
| CNN | Stacking | 89.33 | 89.43 | 89.33 | 89.4 | 3830 | 0.02 |
| CNN | XGBoost | 88 | 88.33 | 88 | 87.02 | 2.6 | 33.47 |
| CNN | SVM | 88.66 | 89 | 88.66 | 86.83 | 0.42 | 206.74 |
| CNN | RBF | 89.4 | 89.7 | 90.1 | 89.30 | 7.96 | 10.94 |
| CNN | MELM | 92.73 | 92.6 | 92.6 | 92.66 | 1.65 | 56.16 |
| CNN | Baseline CNN | 75.1 | 74.19 | 74.21 | 73.8 | 20 | 3.69 |
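The short training times of the ELM variants in Table 4 are consistent with the remark in Sect. 3.2 that an ELM only solves a linear system for its output weights. The following is a minimal sketch of a basic ELM, assuming NumPy, a sigmoid hidden layer and the Moore-Penrose pseudoinverse as the direct solver. The function names `train_elm` and `predict_elm`, the toy XOR data and the choice of 20 hidden nodes are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def hidden_output(X, W, b):
    """Sigmoid activations of the random hidden layer."""
    return 1.0 / (1.0 + np.exp(-(X @ W + b)))


def train_elm(X, T, n_hidden, rng):
    """Basic ELM: hidden parameters are random and never tuned;
    only the output weights are computed, in one linear step."""
    W = rng.standard_normal((X.shape[1], n_hidden))  # input-to-hidden weights
    b = rng.standard_normal(n_hidden)                # hidden biases
    H = hidden_output(X, W, b)                       # hidden-layer output matrix
    beta = np.linalg.pinv(H) @ T                     # least-squares solution of H @ beta = T
    return W, b, beta


def predict_elm(X, W, b, beta):
    """Predicted class index for each row of X."""
    return np.argmax(hidden_output(X, W, b) @ beta, axis=1)


# Toy data (assumption): the XOR problem with one-hot targets
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([0, 1, 1, 0])
T = np.eye(2)[y]

rng = np.random.default_rng(0)
W, b, beta = train_elm(X, T, n_hidden=20, rng=rng)
pred = predict_elm(X, W, b, beta)
print(pred)
```

In the setting of the paper, `X` would hold the feature vectors extracted by the CNN rather than toy inputs. The pseudoinverse corresponds to the direct method mentioned in Sect. 3.2; a decomposition or an iterative solver could be substituted for larger problems.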
3.3 Mixture of experts (ME)

An ensemble of neural networks (NNs) combines multiple NNs to learn complex samples effectively. Their decisions are aggregated to obtain better generalization ability than the baseline models. Mixture of experts (ME) is an ensemble learning approach consisting of several experts and a gating network [38]. To understand why ME is necessary, consider the results of Tables 2, 3 and 4. As one can see, the performances of the different neural networks on a single subproblem of accident analysis are very different. Moreover, the best classifier can be recognized only after the training phase. Thus, we cannot select the best classifier for a general task in advance, and we need an ME to combine the capabilities of the different neural networks to classify general samples more accurately.

In this paper, we develop an ME including the state-of-the-art variations of ELM: baseline ELM [39], KELM [40], OSELM [41] and CELM [42]. ELM is a machine learning algorithm that uses the topology of feedforward neural networks for classification, regression, clustering, sparse approximation, compression, and feature learning. The network includes a single layer of hidden nodes, whose parameters (not just the weights connecting inputs to hidden nodes) need not be tuned. In most cases, the output weights of the hidden nodes of ELM are determined by solving a linear system in a single step. Thus, the ELM algorithm can overcome problems such as local minima, the tuning of learning parameters, and over-fitting. These problems occur widely in the traditional gradient-based learning algorithms used to train shallow networks such as MLP [43].

Now we review some details of the ELM variations. In KELM, a kernel function such as the Radial Basis Function (RBF) is used to increase the classification accuracy [40]. In CELM, the weight vectors from the input layer to the hidden layer are constrained to be drawn from the closed set of difference vectors between class samples, i.e., the set of vectors connecting samples of one class to samples of a different class [42]. OSELM consists of two main phases. In the boosting phase, the SLFN (single-hidden-layer feedforward network) is trained by the primitive ELM method on some batches of training data for initialization. These boosting training data are discarded as soon as the boosting phase is completed. After the boosting phase, OSELM learns the training data one-by-one or chunk-by-chunk, and all training data are discarded once the learning procedure completes [41].

We will show that the hybrid system, which uses a CNN for feature extraction and a mixture of ELMs (MELM) for classification, entitled CNN–MELM, is very efficient for the analysis of complex images. The final architecture of CNN–MELM is presented in Fig. 4. In our experiments, OSELM on the accident detection data set, KELM for determining the type of vehicles involved in accidents, and OSELM for determining the severity of the accidents provide the best results. Thus, the use of different ELM algorithms for classification is justified. Although MELM consists of ELM, KELM, CELM, and OSELM in our implementation, it can be extended by any new fast classifier.

Fig. 4 The architecture of CNN–MELM for accident analysis

Finally, in CNN–MELM, after the extracted features are classified by the different ELMs, a gating network is needed to aggregate the results of these classifiers. We examine three gating networks: plurality voting (majority voting), Behaviour Knowledge Space (BKS) [44] and decision templates [45]. In the plurality voting algorithm, the total votes received by each class are counted, and the class with the highest number of votes is selected as the result. An extensive analysis of this voting approach can be found in [46]. BKS uses a lookup table that lists the most common correct class for every possible combination of class labels. Decision templates compute a similarity measure between the decision profile of the unknown instance and the average decision profiles of the instances of each class. Our experimental study shows that the three perform almost similarly; we use the first one in our proposed system. Based on the architecture of Fig. 4, the main steps of CNN–MELM are as follows:
• Extracting features from the accident images by CNN, • Accident-detection

• Detecting the accident by MELM, • Vehicles-in-accidents
• Classifying the severity and the type of involved vehicles • Accident-severity
in accident by two separated MELMs.
The aim of “Accident-detection” data set is to detect the
In the next section, the results of this system on different occurrence of accidents, by image processing. It includes
benchmarks are presented. two subfolders with labels “without-accident” and “with-
accident” consisting of 2500 and 2398 images, respectively.
The goal of “Vehicles-in-accidents” data set is to clas-
4 Experiment result sify the vehicles involved in an accident, by image process-
ing. It includes three subfolders with labels “light vehicle”,
To test the performance of CNN–MELM, this system is “heavy vehicle” and “motorcycle” including 892, 876, and
applied on several standard data sets, including, MNIST 868 images, respectively.
[47], air pollution [48], brain tumor data set [49], vehicle The purpose of “Accident-severity” data set is to classify
image data set [50] and a new accident images’ analysis data the severity of accidents, by image processing. It includes
set [51]. The details of benchmarks in the examinations are three subfolders with labels “low dangerous”, “medium dan-
the following: gerous” and “high dangerous” consisting of 118, 1603, and
1225 images. These three data sets are accessible in [51].
a. MNIST data set This data set (Modified National Insti- Considering these five data sets, in this section, we first
tute of Standards and Technology database) [47] is a show the quality of features extracted by traditional methods.
large data set of handwritten digits that is commonly Then, we obtain the best configuration of CNN for feature
used for training various image processing systems. This extraction and show the results of accident analysis.
data set is also widely used for training and testing in the
field of machine learning. The MNIST database contains 4.1 Comparison between the different classifiers
60,000 training images and 10,000 testing images. In
[22], the error rates on this data set have been reported In this subsection, we analyze the performance of the differ-
below 1% when they used shallow neural networks. ent classifiers that work on the extracted features by CNN.
They trained such networks using ELM approach, which We present the results on accident images analysis data set
also enables a very rapid training time.
b. Air pollution data set This data set was introduced by Vahdatpour et al. in 2018 [48]. It contains four classes, and each class contains about 100 images. It was prepared on different days by imaging the sky of Tehran city.
c. Brain tumor images data set This data set contains 3064 images from 233 patients with 3 kinds of brain tumors: meningioma (708 slices), glioma (1426 slices), and pituitary tumor (930 slices).
d. Vehicle image data set This data set includes 7325 images. The database consists of 3425 images of vehicle rears taken from different points of view, and 3900 images extracted from road sequences not containing vehicles [50].
e. Accident images analysis data set For the special purpose of accident analysis, we collected a new data set. The images were collected by UAV and from some Google Images resources, covering different scenes at different times and locations. The images were resized to 28×28 pixels, and their resolution was set to 96 dpi by a MATLAB program. The images were labelled by human experts. This data set includes three parts:

[51]. We pursue the following experimental studies:
1. For networks except for ELM, we examine the following cases:
• SVM is considered as the linear support vector machine and the polynomial function is used as its kernel.
• We consider a radial basis function neural network (RBF) that is trained in a fully supervised manner by the BFGS minimizing method.¹ For this network, different batch sizes and ridge parameters are evaluated in Table 5, and the minimum and maximum accuracies on the accident images' analysis data set are reported. The best parameters are given in the last column of this table. Fig. 5 illustrates the accuracy of RBF for different batch sizes. As one can see, for the fixed ridge = 20, the batch size = 150 is the best option. A similar analysis has been done for XGBoost in Table 5 to determine the best batch size and seed number.

¹ The Broyden–Fletcher–Goldfarb–Shanno algorithm (BFGS) is one of the best quasi-Newton methods that has been proposed for unconstrained nonlinear programming.
Table 5 Parameters' analysis for different classification algorithms that are used on the extracted features from accident images analysis data set
Classifier | Parameters | Range of accuracy | Best configuration
RBF | Batch size ∈ {50, 100, 150, 200}, ridge ∈ {0.1, 0.01, 0.001} | Accuracy ∈ [86.9, 89.30] | Batch size = 150, ridge = 0.01
XGBoost | Batch size ∈ {50, 100, 150}, seed ∈ {0.5, 1, 1.5} | Accuracy ∈ [87.02, 88.61] | Batch size = 100, seed = 1
MLP | Num-neurons ∈ {15, 20, 25}, learning-rate ∈ {0.1, 0.2, 0.3, 0.4} | Accuracy ∈ [94.04, 94.8] | Num-neurons = 20, learning-rate = 0.2
ELM | Num-neurons ∈ {10, 20, 30} | Accuracy = 92.03 | Num-neurons = 20
KELM | Num-neurons ∈ {160, 180, 200, 220} | Accuracy ∈ [92.29, 92.54] | Num-neurons = 180
OSELM | Num-neurons ∈ {160, 180, 200, 220}, Initial-training-data ∈ {260, 280, 300, 320}, Size-data-block ∈ {5, 10, 20, 30} | Accuracy ∈ [89.38, 91.28] | Num-neurons = 180, Initial-training-data = 300, Size-data-block = 10
CELM | Num-neurons ∈ {150, 200, 250} | Accuracy ∈ [48.29, 69.03] | Num-neurons = 200
• Stacking applies SVM, RBF, and XGBoost as sub-classifiers, and SVM is used as the meta-classifier.
• In MLP, the number of hidden layers is 3. The number of nodes in each hidden layer and the learning rate were evaluated in Table 5. As the last column of Table 5 shows, 20 hidden neurons and a learning rate of 0.2 provide the best results for MLP with the sigmoid function. Fig. 6 presents the trend of MLP accuracy with respect to the learning rate, from which the best learning rate is identified.
• For ELM-based networks, we studied the following cases:
• We consider KELM with the RBF function as the kernel and with different numbers of neurons in the hidden layer; see Table 5.
• For OSELM, the number of hidden nodes changes when the activation function is the sigmoid function.

Fig. 5 RBF accuracy with respect to the different batch sizes
Fig. 6 MLP accuracy with respect to the different learning rates
Fig. 7 OSELM accuracy with respect to the size of data block
Table 6 Comparison between accuracies (%) of different hybrids of feature selectors and classifiers on accident images' analysis data set [62]
Feature extractor | Feature selector | Stacking | XGBoost | MLP | RBF | SVM | ELM | OSELM | CELM | KELM | FC*
LBP + HOG + SURF + MSER | PCA [45] | 57.49 | 53.88 | 52.02 | 52.49 | 53.28 | 36.19 | 38.68 | 49.61 | 48.62 | –
LBP + HOG + SURF + MSER | CSF [46] | 55.73 | 55.20 | 57.6 | 52.57 | 54.65 | 57.8 | 55.91 | 33.3 | 58.65 | –
LBP + HOG + SURF + MSER | BBF [47] | 47.20 | 47.76 | 48.59 | 46 | 47 | 56.65 | 56.34 | 55.64 | 52.76 | –
CNN | Max-pooling layer | 89.4 | 87.02 | 94.8 | 89.30 | 86.83 | 92.03 | 91.28 | 69.03 | 92.29 | –
Baseline CNN | – | – | – | – | – | – | – | – | – | – | 73.8
*FC fully connected layer of baseline CNN
Table 7 Finding the best configuration for CNN for accident images' analysis data set [62]
Configuration index | C | M | C | M | C | M | C | FC | Acc. (%)
1 | C(fi(2,16), P(3,3)) | M(Pol(2,2), Stir(2,2)) | C(fi(2,32), P(3,3)) | M(Pol(2,2), Stir(2,2)) | C(fi(2,64), P(3,3)) | M(Pol(2,2), Stir(2,2)) | C(fi(2,128), P(3,3)) | F(O(2)) | 63.46
2 | C(fi(3,16), P(1,1)) | M(Pol(2,2), Stir(2,2)) | C(fi(3,32), P(1,1)) | M(Pol(2,2), Stir(2,2)) | C(fi(3,64), P(1,1)) | – | – | F(O(2)) | 70.16
3 | C(fi(3,16), P(1,1)) | M(Pol(2,2), Stir(2,2)) | C(fi(3,32), P(1,1)) | M(Pol(2,2), Stir(2,2)) | C(fi(3,64), P(1,1)) | M(Pol(2,2), Stir(2,2)) | C(fi(3,128), P(1,1)) | F(O(2)) | 73.8
*C convolution layer, fi filter, P padding, M max-pooling layer, Pol pool-size, Stir stride, FC fully connected layer, O output, Acc. accuracy
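The feature-map sizes implied by configuration 3 of Table 7 can be traced with simple arithmetic. The sketch below is illustrative only and rests on assumptions that the table does not state: fi(k, n) is read as n filters of size k × k, every convolution uses stride 1, the input is a 28 × 28 image, and output sizes are rounded down. The helper names are hypothetical.

```python
# Illustrative sketch (not the authors' code): trace feature-map sizes for
# configuration 3 of Table 7, assuming stride-1 convolutions and floor rounding.

def conv_out(size, kernel, pad, stride=1):
    """Spatial size after a convolution: floor((n + 2p - k) / s) + 1."""
    return (size + 2 * pad - kernel) // stride + 1

def pool_out(size, pool, stride):
    """Spatial size after max-pooling without padding."""
    return (size - pool) // stride + 1

def trace_shapes(size, layers):
    """Return (layer type, height, width, channels) after every layer."""
    shapes = []
    channels = 1  # assumed single-channel input; it does not affect the result
    for layer in layers:
        if layer[0] == "conv":
            _, kernel, filters, pad = layer
            size = conv_out(size, kernel, pad)
            channels = filters
        else:
            _, pool, stride = layer
            size = pool_out(size, pool, stride)
        shapes.append((layer[0], size, size, channels))
    return shapes

# ("conv", kernel, filters, padding) and ("pool", pool size, stride)
CONFIG_3 = [
    ("conv", 3, 16, 1), ("pool", 2, 2),
    ("conv", 3, 32, 1), ("pool", 2, 2),
    ("conv", 3, 64, 1), ("pool", 2, 2),
    ("conv", 3, 128, 1),
]

shapes = trace_shapes(28, CONFIG_3)
last_pool = [s for s in shapes if s[0] == "pool"][-1]
n_features = last_pool[1] * last_pool[2] * last_pool[3]
print(last_pool, n_features)
```

Under these assumptions the last max-pooling layer produces a 3 × 3 × 64 map, that is, 576 values per image that would be handed to the classifiers.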
Table 5 shows the results of OSELM when the number of initial training data and the size of the data block in each step vary. The best values are given in the last column of Table 5. Fig. 7 presents the behaviour of OSELM when the size of the data block changes.
• In CELM, the number of hidden nodes is also analyzed in Table 5 when the activation function is the sigmoid function. In KELM, the number of hidden nodes is also analyzed in Table 5 when the kernel function is RBF.

As a conclusion for this subsection, the range of accuracy of MLP, ELM, and KELM for accident analysis is better than that of the other classifiers when the features are extracted by the last pooling layer of CNN. We discuss this topic in the next subsections.

4.2 Quality of features
To test the performance of the feature extractor and feature selector methods, we applied these methods on the accident images' analysis data set [51] to detect involved vehicle types in the accident. We considered 80% and 20% of the data as training and testing data, respectively. In addition, we used fivefold cross validation. Table 6 shows the accuracy of different combinations of feature selectors and classifiers. Bold values show the best obtained results throughout the paper. With the traditional feature extractors, PCA as the feature selector with stacking as the classifier gives 57.49% accuracy, while CSF as the feature selector with KELM as the classifier gives 58.65%. BBF as the feature selector works best with the ELM classifier, but its accuracy is lower than that of the previous combinations. However, the accuracies of all of these experiments are worse than that of baseline CNN, with 73.8% accuracy. It is worthwhile to note that the hybrid of CNN features and the KELM classifier provides 92.29% accuracy, which is the best result among the ELM variants.

4.3 Best configuration for CNN
To find the best configuration of CNN, we considered the accident images' analysis data set [51]. In Table 7, three different configurations of baseline CNN are compared. The third configuration provides the best accuracy. Thus, this configuration is used for all of the experiments in our study. However, the best accuracy of CNN is 73.8%. Since baseline CNN applies MLP for classification in the
Table 8 The learning algorithms for MLP as classifier to classify the features extracted by the pooling layer of CNN
Training algorithm for MLP | Cross entropy | Accuracy (%) | Training time (seconds) | Accuracy/training time
Scale-conjugate-gradient | 0.062 | 94.42 | 25 | 3.78
Levenberg–Marquardt | 0.029 | 94.8 | 27 | 3.51
BFGS quasi-Newton | 0.062 | 94.49 | 20 | 4.72
Resilient backpropagation | 0.065 | 94.57 | 18 | 5.25
Conjugate gradient with Powell restart | 0.064 | 94.38 | 18 | 5.24
Fletcher–Powell conjugate gradient | 0.063 | 94.42 | 15 | 6.29
Polak–Ribière conjugate gradient | 0.062 | 94.49 | 15 | 6.30
One step secant | 0.063 | 94.49 | 12 | 7.87
Gradient-descent with momentum | 0.064 | 94.42 | 17 | 5.55
last layer, in what follows, we focus on its MLP configuration to improve the results.

Table 9 Comparison between different gating networks for CNN–MELM on accident images analysis data set [62]
Feature extractor | Classifier | Gating network | Accuracy (%)
CNN | MELM | Plurality voting | 92.66
CNN | MELM | BSK | 92.52
CNN | MELM | Decision templates | 92

4.4 Best configuration for MLP
In this subsection, the accident images' analysis data set [51] is used to find the best configuration for MLP. For this aim, the features of the last pooling layer of CNN are considered as the inputs for MLP. We used MATLAB software for implementation. We checked MLPs with different numbers of hidden layers and neurons, and finally used an MLP with 3 hidden layers and 20 neurons in each hidden layer for our experiment. Then, the accuracy of different learning algorithms was compared. As one can see in Table 8, the best cross entropy is obtained by the Levenberg–Marquardt algorithm. Also, the accuracies of the different learning algorithms are between 94.38 and 94.8. Again, Levenberg–Marquardt gets the best accuracy.
Since the processing time is important for real-time applications, we used a new measure: the ratio of accuracy to processing time. Because we need to maximize the accuracy and minimize the processing time, the maximal ratio indicates the best learning algorithm. As one can see in the last column of Table 8, the "one step secant" algorithm gets the best score. In what follows, we consider this algorithm for the MLP training process.
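The accuracy-to-training-time measure in the last column of Table 8 is a plain ratio, so the ranking can be re-derived from the table itself. The following sketch recomputes it from the reported accuracies and training times. It is written in Python rather than the MATLAB used in the study, and the variable names are illustrative.

```python
# Illustrative sketch: recompute the accuracy / training-time ratio of Table 8.
# Values are (accuracy in %, training time in seconds) copied from the table.
TABLE_8 = {
    "Scale-conjugate-gradient": (94.42, 25),
    "Levenberg-Marquardt": (94.8, 27),
    "BFGS quasi-Newton": (94.49, 20),
    "Resilient backpropagation": (94.57, 18),
    "Conjugate gradient with Powell restart": (94.38, 18),
    "Fletcher-Powell conjugate gradient": (94.42, 15),
    "Polak-Ribiere conjugate gradient": (94.49, 15),
    "One step secant": (94.49, 12),
    "Gradient-descent with momentum": (94.42, 17),
}

# Higher is better: more accuracy obtained per second of training.
ratios = {
    name: round(accuracy / seconds, 2)
    for name, (accuracy, seconds) in TABLE_8.items()
}

best_by_ratio = max(ratios, key=ratios.get)
best_by_accuracy = max(TABLE_8, key=lambda name: TABLE_8[name][0])

for name, ratio in sorted(ratios.items(), key=lambda item: -item[1]):
    print(f"{name:40s} {ratio:5.2f}")
print("best ratio:", best_by_ratio)
print("best accuracy:", best_by_accuracy)
```

The two selections differ: the ratio favours "one step secant", while raw accuracy favours Levenberg–Marquardt, which is the trade-off the measure is meant to expose.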
Table 10 Comparison between the percent of accuracy of the hybrids of CNN with different classifiers
Feature selector | Classifier | MNIST | Air pollution | Brain tumor data set | Vehicle image data set
CNN | KELM | 99.62 | 66.92 | 93.68 | 98.91
CNN | ELM | 99.75 | 63.91 | 88.51 | 98.84
CNN | CELM | 99.51 | 63.75 | 69.83 | 82.05
CNN | OSELM | 99.60 | 63.75 | 90.2 | 98.77
CNN | MLP | 99.76 | 62.11 | 88.80 | 99.7
CNN | Stacking | 99.73 | 59.42 | 86.91 | 99.76
CNN | XGBoost | 99.71 | 62.73 | 87.33 | 99.64
CNN | SVM | 99.8 | 58.59 | 87.51 | 99.53
CNN | RBF | 77.50 | 60.24 | 86.84 | 99.75
CNN | MELM | 99.70 | 68.42 | 92.81 | 97.66
Baseline CNN | | 99.35 | 61.41 | 82.28 | 98.84
Best reported result | | 99.79 [67] | 59.38 [64] | 91.28 [65] | 92.78
Fig. 8 Some samples of accident images (raw data)
Fig. 9 The transformed 28 × 28 pixels of accident images presented in Fig. 8 (They are provided in accident images' analysis data set [62] and used as the input samples of our proposed system)

4.5 Best gating network
In this experiment, we compared three approaches for the gating network. We determine the performance of the proposed MELM with respect to these networks. As one can see in Table 9, there are no significant differences between the presented gating networks. However, the plurality voting method is the best option for the gating network when classifying the accident images.

4.6 Effectiveness of CNN–MELM
Now the features extracted by CNN are directly passed to MELM. The results of CNN–MELM on four standard benchmarks are compared with various classifiers in Table 10. This table shows that the hybrid of CNN with SVM has the best result on the MNIST data set, but CNN–MELM is close to this classifier. On air pollution, the best results are obtained by CNN–MELM. On the brain tumor data set, the best classifier is CNN–KELM, followed by CNN–MELM. Finally, for the vehicle image data set, the best result is provided by CNN-Stacking, while the difference between the accuracies of MELM and this method is 2.1%, which shows that CNN–MELM is still a reasonable classifier. As a final point, the results of Table 10 show that the MNIST and vehicle image data sets are not complex enough to reveal the differences between the presented classifiers, while the other data sets discriminate between the classifiers better. On these more complex data sets, CNN–MELM and CNN–KELM could classify the samples successfully in a low processing time.
As a conclusion, using a mixture of ELM as the classifier, we can obtain the best results from CNN–MELM. Therefore, to detect the accidents and to distinguish the characteristics of crashes, we can use this ensemble method on the accident images effectively.

4.7 Accident analysis by CNN–MELM
In this section, we evaluate our proposed system for accident detection. If an accident happens, we distinguish the characteristics of this accident by classifying the accident severities and involved vehicles. To evaluate the performance of the different classifiers, we used the accident images' analysis data set [51]. In Figs. 8 and 9, some samples of these images are illustrated. Because accident severity is a relative concept, defined with respect to the severity of the injury, the number of injuries, the number of deaths, and the number of damaged cars, it is very hard to propose a common measure for severity evaluation based on the images. Usually the police officers judge the severity of accidents based on their experience and the status of vehicles and injured people. On the other hand, intelligent sensors can collect some details about these statuses. Since this paper focuses just on accident images, these approaches cannot be followed, and we need to get an overall description of severity for accident management in the initial time steps (golden time). Thus, we invited some human experts to assign the labels "low dangerous", "medium dangerous" and "high dangerous" to each image. The severity classification by CNN–MELM is stated in Table 3.
We used MATLAB-2017b and the WEKA open-source software to compare the ELM-based methods with the other classifiers. We implemented all of the classifiers on a system with an Intel i5-2430M CPU @ 2.4 GHz processor and 4G of memory.
Table 11 Running time of classifiers to classify the vehicle types in [62]
Running time of feature selection using CNN (s) | Classifier | Running time, train (s) | Running time, test (s)
20 | MELM | 1.65 | 0.95
20 | CELM | 0.20 | 0.07
20 | ELM | 0.38 | 0.03
20 | OSELM | 0.0469 | About 0
20 | KELM | 0.3127 | 0.1451
20 | MLP | 12 | 4.52
20 | Stacking | 3830 | 1000
20 | XGBoost | 2.6 | 0.7
20 | SVM | 0.42 | 0.15
20 | RBF | 7.96 | 0.2
The number of epochs in our CNN model is 20 for all of the training iterations. In addition, the learning rate in our model is 0.05. In Tables 2, 3 and 4, the Precision, Recall, and F-measure of CNN–MELM for accident detection, accident severity classification, and involved vehicles' classification are presented, respectively. In all of these tables, the best values are shown with bold numbers. To compare the results of CNN–MELM with those of the other algorithms, we present the same measures for all of the classifiers. Table 2 shows that CNN–MELM has the best results among the algorithms for detecting the accidents in the images.
Table 3 reveals that the precision of CNN–KELM is the best for accident severity classification. Its behaviour is close to CNN–MLP. After these algorithms, CNN–MELM has the third rank. However, the recall, the F-measure, and the accuracy of CNN–MLP are better than the others. The recall of CNN–ELM has the third rank, the F-measure of CNN–OSELM has the second rank, and the accuracy of CNN–MELM has the second rank in this experiment. These results show that, based on all of the measures, the hybrid of CNN and ELM variants can be used to classify the accident severities efficiently.
Table 4 compares the different classifiers for vehicles that are involved in the accidents. As one can see, the different measures of CNN–MLP are superior compared with the other methods. However, the differences between these measures and the corresponding measures of CNN–MELM are 1.55% for precision, 2.34% for recall, 2% for F-measure and 1.83% for accuracy. Thus, one can say that it is better to use CNN–MLP for involved-vehicle classification.
However, the training of MLP is very time-consuming. Since both the accuracy and the training time are important for real-time applications, we define the ratio of accuracy to processing time as in Subsection 4.4. In the last column of Table 4, this measure is evaluated for all of the classifiers. As one can see, the hybrids of CNN with OSELM, CELM, and KELM are the best algorithms for this examination. Additionally, MELM is at least 7 times better than MLP. Thus, we cannot propose CNN–MLP for accident analysis in real-time, while the hybrid of CNN and the variants of ELM is still defensible.
Now, we compare our classifier with one of the best CNN-based works to recognize vehicle types [52]. In the latter reference, the average precision to detect the different vehicle types was between 66.36 and 90.65%, and the total average was 81.05%. This is worse than the result of our proposed MELM, with an averaged precision of 92.73%.
Besides, Table 4 shows that the processing times of ELM, CELM, OSELM, and KELM to detect the vehicle type are 0.38, 0.2, 0.0469, and 0.3127 s, respectively, while the processing times of MLP, Stacking, XGBoost and RBF are 12, 3830, 2.6 and 7.96 s, which are worse than those of the ELM-based classifiers. Only SVM classifies this data set in 0.72 s, while its precision is 88.66%, which is worse than ELM, KELM, and OSELM as presented in Table 4. Also, the processing times of KELM and ELM are not as good as that of OSELM. This shows that they can be removed from MELM for real-time implementation. However, as the last column of Table 4 shows, the ratio of the accuracy of KELM to its training time has the third rank among the mentioned classifiers, which shows its capability for classification of accident images.
Table 11 also shows the running time of CNN–MELM for involved-vehicle classification in [51]. The running time of CNN for feature extraction is 20 s, but this time is for a data set with 2398 images. Therefore, the average running time of feature extraction for a single image is 0.008 s. After feature extraction, the next step is classification. The running time for training and testing on this data set is shown in Table 11. The average running time for testing a single image is 0.0003 s. This shows that, after training, CNN–MELM consumes 0.0083 s for analyzing the type of involved vehicles in any accident image, and so this classification can be done in real-time.
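The per-image timing quoted for involved-vehicle classification follows from simple arithmetic on the reported figures. The sketch below reproduces it. The constants are taken from the text and Table 11, the variable names are illustrative, and the throughput figure is a derived quantity that the paper does not report.

```python
# Illustrative sketch: per-image latency of CNN-MELM from the reported figures.
N_IMAGES = 2398                 # images in the data set used for timing
CNN_FEATURE_TIME_S = 20.0       # total CNN feature-extraction time (Table 11)
TEST_TIME_PER_IMAGE_S = 0.0003  # per-image classification time quoted in the text

# Average feature-extraction time for one image.
feature_time_per_image = CNN_FEATURE_TIME_S / N_IMAGES

# The text rounds the first term to 0.008 s before adding the test time.
total_as_reported = round(round(feature_time_per_image, 3) + TEST_TIME_PER_IMAGE_S, 4)

# Without intermediate rounding the total is slightly larger.
total_unrounded = feature_time_per_image + TEST_TIME_PER_IMAGE_S

# Derived throughput (images per second); not a figure reported in the paper.
images_per_second = 1.0 / total_unrounded

print(f"feature extraction per image: {feature_time_per_image:.4f} s")
print(f"total per image (as reported): {total_as_reported:.4f} s")
print(f"total per image (unrounded):   {total_unrounded:.4f} s")
print(f"throughput: {images_per_second:.0f} images/s")
```

Either way the per-image cost stays below 0.01 s on the hardware described above, which is consistent with the real-time claim.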
5 Conclusion

Understanding the characteristics of an accident provides a great help for accident management centers. To recognize these characteristics, images are good sources of information. When these images are collected by various sources such as UAVs, surveillance cameras, and residents and are sent to a processing center, we need to process them in real-time to call a supporting procedure for accident management. This paper tried to detect the accidents, to classify the severity of the accident and finally to recognize the type of involved vehicles in real-time.
For this aim, we proposed a hybrid system consisting of a CNN for feature extraction and a mixture of ELM (MELM) for classification purposes. To evaluate this system, first, traditional feature extraction methods such as LBP, HOG, SURF, and MSER were applied. We compared the features of these methods with the features of the pooling layer of CNN and showed that the results of CNN were preferable. Then, to select the best classifier, we compared CNN–MELM with some shallow networks and baseline CNN on MNIST, air pollution, brain tumor data set, vehicle image data set and accident images' analysis data set [51]. The results showed that the accuracy, the precision, the recall and the F-measure of CNN–MELM were acceptable while the processing times were close to real-time. Specifically, with CNN–MELM on accident images [62], we obtained 99.31% precision for accident detection, 90.27% precision for severity classification and 92.73% precision for vehicle classification. By considering the accident images' analysis data set [62], we also showed that the accident characteristics could be recognized in real-time.
Finally, note that our ensemble mixture-of-experts method consists of four learning algorithms: ELM, KELM, CELM, and OSELM. We used KELM because of its high accuracy. But as the experiments showed, KELM is somewhat time-consuming compared with the other variants of ELM or some other learning algorithms such as SVM and XGBoost. However, their differences can be neglected. For real-time applications, we can remove KELM from the MELM, and the experimental results showed that the accuracy remains almost fixed. The reason is that the difference between the accuracies of KELM and OSELM is not significant. Since, in the proposed ensemble method, we want to provide an opportunity to obtain a rapid solution together with an accurate solution, we have not removed KELM. If real-time computation has the highest priority in an application, the user can omit KELM from the mixture of ELM.
For future work, we will try to combine the power of ELM for feature selection inside the CNN model. Also, optimization of the weights of ELM can lead us to an accuracy similar to MLP in less computation time. Besides, from the technological viewpoint, since the images can be sent to our proposed system by different sources such as CCTV cameras or UAV images, we need a robust algorithm for data fusion before feature extraction and accident classification. For fusion purposes, some kinds of neural networks or some layers of deep networks may be utilized. In addition, the readers can define an ensemble of CNN and the traditional feature extraction methods such as LBP or HOG for feature extraction. In such approaches, the ensemble method will extract the features instead of performing classification, and again the different classifiers can be applied. In addition, a theoretical analysis should be stated for the combination of CNN and ELM variations in a single system.

References

1. Bisht, N., Siddhi, P., Kashyap, H.: Monitoring road accidents using sensors and providing medical facilities. Treatise Electr Magn 2, 68–73 (2012)
2. Hoose, N., Vicencio, M., Zhang, X.: Incident detection in urban roads using computer image processing. Traffic Eng Control 33(4), 236–244 (1992)
3. Zifeng, J.: Macro and micro freeway automatic incident detection (aid) methods based on image processing. In: Intelligent Transportation System, ITSC'97 (1997)
4. Coifman, B., McCord, M., Mishalani, R., Iswalt, M., Ji, Y.: Roadway traffic monitoring from an unmanned aerial vehicle. In: IEE Proceedings-Intelligent Transport Systems (2006)
5. Cao, X., Lan, J., Yan, P., Li, X.: Vehicle detection and tracking in airborne videos by multi-motion layer analysis. Mach. Vis. Appl. 23(5), 921–935 (2012)
6. Kim, N., Chervonenkis, M.: Situation control of unmanned aerial vehicles for road traffic monitoring. Modern Appl. Sci. 9(5), 1 (2015)
7. Srinivasan, D., Jin, X., Cheu, R.: Evaluation of adaptive neural network models for freeway incident detection. IEEE Trans. Intell. Transp. Syst. 5(1), 1–11 (2004)
8. Chiou, Y.-C., Fu, C.: Modeling crash frequency and severity using multinomial-generalized poisson model with error components. Accid. Anal. Prev. 50, 73–82 (2013)
9. Anderson, J., Govada, M., Steffen, T., Thorne, C., Varvarigou, V., Kales, S., Burks, S.: Obesity is associated with the future risk of heavy truck crashes among newly recruited commercial drivers. Accid. Anal. Prev. 49, 378–384 (2012)
10. Ahonen, T., Hadid, A., Pietikainen, M.: Face description with local binary patterns: application to face recognition. IEEE Trans. Pattern Anal. Mach. Intell. 28(12), 2037–2041 (2006)
11. Wang, X., Han, T., Yan, S.: An HOG-LBP human detector with partial occlusion handling. In: International Conference on Computer Vision (2009)
12. Chen, H., Tsai, S., Schroth, G., Chen, D., Grzeszczuk, R., Girod, B.: Robust text detection in natural images with edge-enhanced maximally stable extremal regions. In: 18th IEEE International Conference on Image Processing (ICIP) (2011)
13. Bay, H., Ess, A., Tuytelaars, T., Van Gool, L.: Speeded-up robust features (surf). Comput Vis. Image Underst. 110(3), 346–359 (2008)
14. Krizhevsky, A., Sutskever, I., Hinton, G.: Imagenet classification with deep convolutional neural networks. Adv. Neural Inf. Process. Syst. 25, 1097–1105 (2012)
15. Redmon, J., Divvala, S., Girshick, R., Farhadi, A.: You only look once: unified, real-time object detection. In: Conference on computer vision and pattern recognition, pp. 779–788 (2016)
16. Ahn, B.: Real-time video object recognition using convolutional neural network. In: Neural Networks (IJCNN). pp. 1–7 (2015)
17. LeCun, Y., Kavukcuoglu, K., Farabet, C.: Convolutional networks and applications in vision. In: International Symposium on Circuits and Systems (ISCAS) (2010)
18. Lee, K., Park, D.C.: Image classification using fast learning convolutional neural networks. Adv. Sci. Technol. Lett. 113, 50–55 (2015)
19. Sadeky, S., Al-Hamadiy, A., Michaelisy, B., Sayed, U.: Real-time automatic traffic accident recognition using Hfg. In: 20th International Conference on Pattern Recognition (ICPR) (2010)
20. Nejjari, F., Benhlima, L., Bah, S.: Event traffic detection using heterogenous wireless sensors network. In: 13th International Conference of Computer Systems and Applications (AICCSA) (2016)
21. Kahaki, S., Nordin, M.: Highway traffic incident detection using high-resolution aerial remote sensing imagery. J. Comput. Sci. 7(6), 949 (2011)
22. Jiansheng, F.: Vision-based real-time traffic accident detection. In: 11th World Congress on Intelligent Control and Automation, WCICA (2014)
23. Chen, L., Cao, Y., Ji, R.: Automatic incident detection algorithm based on support vector machine. In: Sixth International Conference on Natural Computation (ICNC) (2010)
24. Prabha, C., Sunitha, R., Anitha, R.: Automatic vehicle accident detection and messaging system using GSM and GPS modem. Int. J. Adv. Res Electr. Electron. Instrum. Eng. 3(7), 10723–10727 (2014)
25. Kagesawa, M., Nakamura, A., Ikeuchi, K., Saito, H.: Vehicle type classification in infra-red image using parallel vision board. ITSWC (2000)
26. Zhou, Y., Nejati, H., Do, T.-T., Cheung, N.-M., Cheah, L.: Image-based vehicle analysis using deep neural network: a systematic study. In: International Conference on Digital Signal Processing (DSP) (2016)
27. Chen, Z., Ellis, T.: Semi-automatic annotation samples for vehicle type classification in urban environments. IET Intel. Transp. Syst. 3(9), 240–249 (2014)
28. Wang, X., Zhang, W., Wu, X., Xiao, L., Qian, Y., Fang, Z.: Real-time vehicle type classification with deep convolutional neural networks. J. Real-Time Image Proc. (2017). https://doi.org/10.1007/s11554-017-0712-5
29. Zheng, Z., Lu, P., Lantz, B.: Commercial truck crash injury severity analysis using gradient boosting data mining model. J. Saf. Res. 65, 115–124 (2018)
30. Delen, D., Sharda, R., Bessonov, M.: Identifying significant predictors of injury severity in traffic accidents using a series of artificial neural networks. Accid. Anal. Prev. 38(3), 434–444 (2006)
31. Chang, L.-Y., Wang, H.-W.: Analysis of traffic injury severity: an application of non-parametric classification tree techniques. Accid. Anal. Prev. 38(5), 1019–1027 (2006)
32. Nguyen, C.H., Cai, Chen, F.: Automatic classification of traffic incident's severity using machine learning approaches. IET Intel. Transp. Syst. 11, 615–623 (2017)
33. Kheradpisheh, S., Sharifizadeh, F., Nowzari-Dalini, A., Ganjtabesh, M., Ebrahimpour, R.: Mixture of feature specified experts. Inf. Fusion 20, 242–251 (2014)
34. Li, L., Zou, B., Hu, Q., Wu, X., Yu, D.: Dynamic classifier ensemble using classification confidence. Neurocomputing 99, 581–591 (2013)
35. Yu, J.S., Chen, J., Xiang, Z.Q., Zou, Y.X.: A hybrid convolutional neural networks with extreme learning machine for WCE image classification. In: IEEE International Conference on Robotics and Biomimetics (ROBIO) (2015)
36. McDonnell, M.D., Tissera, M.D., Vladusich, T., Van Schaik, A., Tapson, J.: Fast, simple and accurate handwritten digit classification by training shallow neural network classifiers with the 'extreme learning machine' algorithm. PLoS One 10(8), e0134254 (2015)
37. Liu, Y., Yao, X.: Ensemble learning via negative correlation. Neural Netw. 12(10), 1399–1404 (1999)
38. Masoudnia, S., Ebrahimpour, R., Arani, S.: Incorporation of a regularization term to control negative correlation in mixture of experts. Neural Process. Lett. 36(1), 31–47 (2012)
39. Islam, M., Yao, X., Nirjon, S., Islam, M., Murase, K.: Bagging and boosting negatively correlated neural networks. IEEE Trans. Syst. Man Cybern. Part B (Cybern.) 38(3), 771–784 (2008)
40. Liu, Y., Yao, X.: Simultaneous training of negatively correlated neural networks in an ensemble. IEEE Trans. Syst. Man Cybern. Part B (Cybern.) 29(6), 716–725 (1999)
41. Ebrahimpour, R., Sadeghnejad, N., Masoudnia, S., Arani, S.: Boosted pre-loaded mixture of experts for low-resolution face recognition. Int. J. Hybrid Intell. Syst. 9(3), 145–158 (2012)
42. Lotfi, M., Motamedi, S., Sharifian, S.: Time-based feedback-control framework for real-time video surveillance systems with utilization control. J. Real-Time Image Proc. (2016). https://doi.org/10.1007/s11554-016-0637-4
43. Zarándy, Á, Nemeth, M., Nagy, Z., Kiss, A., Santha, L., Zsedrovits: A real-time multi-camera vision system for UAV collision warning and navigation. J. Real-Time Image Proc. 4, 709–724 (2016)
44. Puri, A.: A survey of unmanned aerial vehicles (UAV) for traffic surveillance. Department of computer science and engineering, University of South Florida (2005)
45. Pearson, K.: LIII. On lines and planes of closest fit to systems of points in space. Lond. Edinb. Dublin Philos. Mag. J. Sci. 2(11), 559–572 (1901)
46. Hall, M.: Correlation-based feature selection for machine learning. PhD thesis, Department of Computer Science, University of Waikato Hamilton (1999)
47. Hamon, J.: Optimisation Combinatoire Pour La Sélection De Variables En Régression En Grande Dimension: Application En Génétique Animale. Université des Sciences et Technologie de Lille-Lille I. (2013)
48. Mofarreh-Bonab, M., Mofarreh-Bonab, M.: Color image compression using PCA. Int. J. Comput. Appl. 111(5), 16–19 (2015)
49. Zhang, L., Yang, F., Zhang, Y.D., Zhu, Y.J.: Road crack detection using deep convolutional neural network. In: International Conference on Image Processing (ICIP), pp. 3708–3712 (2016)
50. Weng, Q., Mao, Z., Lin, J., Liao, X.: Land-use scene classification based on a cnn using a constrained extreme learning machine. Int. J. Remote Sens. pp. 1–19 (2018)
51. Martinel, N., Piciarelli, C., Foresti, G., Micheloni, C.: Mobile food recognition with an extreme deep tree. In: Proceedings of the 10th International Conference on Distributed Smart Camera (2016)
52. Pan, S.J., Yang, Q.: A survey on transfer learning. IEEE Trans. Knowl. Data Eng. 22(10), 1345–1359 (2010)
53. Abbasi, E., Shiri, M., Ghatee, M.: A regularized root–quartic mixture of experts for complex classification problems. Knowl.-Based Syst. 110, 98–109 (2016)
54. Huang, G.-B., Zhu, Q.-Y., Siew, C.-K.: Extreme learning machine: a new learning scheme of feedforward neural networks. In: International Joint Conference on Neural Networks (2004)
55. Huang, G.-B., Zhou, H., Ding, X., Zhang, R.: Extreme learning machine for regression and multiclass classification. IEEE Trans. Syst. Man Cybern. Part B (Cybern.) 42(2), 513–529 (2012)
56. Huang, G.-B., Liang, N.-Y., Rong, H.-J., Saratchandran, P., Sundararajan, N.: On-line sequential extreme learning machine. Comput. Intell. 2005, 232–237 (2005)
57. Zhu, W., Miao, J., Qing, L.: Constrained extreme learning machine: a novel highly discriminative random feedforward neural network. In: International Joint Conference on Neural Networks (IJCNN) (2014)
58. Tian, H.X., Mao, Z.Z.: An ensemble ELM based on modified AdaBoost.RT algorithm for predicting the temperature of molten steel in ladle furnace. IEEE Trans. Autom. Sci. Eng. 7(1), 73–80 (2010)
59. Huang, Y., Suen, C.: The behavior-knowledge space method for combination of multiple classifiers. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (1993)
60. Kuncheva, L., Bezdek, J., Duin, R.: Decision templates for multiple classifier fusion: an experimental comparison. Pattern Recognit. 34(2), 299–314 (2001)
61. Kuncheva, L.I.: Combining Pattern Classifiers: Methods and Algorithms. John Wiley & Sons (2004)
62. Pashaei, A., Ghatee, M., Sajedi, H.: Accident images analysis dataset. Amirkabir University of Technology, 2018. (Online). https://github.com/mghatee/Accident-Images-Analysis-Dataset. Accessed 2018
63. Yann, L., Corinna, C., Christopher, J.: The Mnist Database of Handwritten Digits (Online). http://yhann.lecun.com/exdb/mnist (1998)
64. Vahdatpour, M., Sajedi, H., Ramezani, F.: Air pollution forecasting from sky images with shallow and deep classifiers. Earth Sci. Inf. 11(3), 413–422 (2018)
65. Cheng, J., Huang, W., Cao, S., Yang, R., Yang, W., Yun, Z., Wang, Z., Feng, Q.: Enhanced performance of brain tumor classification via tumor region augmentation and partition. PLoS One 10(10), e0140381 (2015)
66. Arróspide, J., Salgado, L., Nieto, M.: Video analysis based vehicle detection and tracking using an MCMC sampling framework. EURASIP J. Adv. Signal Process. (2012)
67. Wan, L., Zeiler, M., Zhang, S., Le Cun, Y., Fergus, R.: Regularization of neural networks using dropconnect. In: International Conference on Machine Learning (2013)
68. Haut, J., Paoletti, M., Plaza, J., Plaza, A.: Fast dimensionality reduction and classification of hyperspectral images with extreme learning machines. J. Real-Time Image Proc. 15(3), 439–462 (2018)

Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Ali Pashaei is an M.Sc. student in the Department of Computer Science of Amirkabir University of Technology, Tehran, Iran. He works on image processing, deep learning and extreme learning algorithms. He has written and presented two papers in two international conferences on Intelligent Transportation Systems.

Mehdi Ghatee is an Associate Professor with the Department of Computer Science, Amirkabir University of Technology, Tehran, Iran. His major is ITS, Smartphone-based ITS Systems, Neural Network and Fuzzy Systems. He has written more than 100 papers in national and international journals and conferences. Previously, he was Chairman of the Department of Computer Science and Project Manager of Iranian Nation Plan of Intelligent Transportation Systems and currently is an Associate Dean for Undergraduate Affairs of Faculty of Mathematics and Computer Science. He is also a member of the Board of Directors of the ITS-RI and director of NORC.

Hedieh Sajedi received a B.Sc. degree in computer engineering from Amirkabir University of Technology in 2003, and M.Sc. and Ph.D. degrees in computer engineering (artificial intelligence) from Sharif University of Technology, Tehran, Iran in 2006 and 2010, respectively. She is currently an Assistant Professor at the Department of Computer Science, Tehran University, Iran. She has written more than 80 papers in national and international journals and conferences. Her research interests include multimedia data hiding, steganography and steganalysis methods, pattern recognition, and machine learning.