
Neural Comput & Applic (2010) 19:421–436
DOI 10.1007/s00521-009-0298-3

ORIGINAL ARTICLE

Vehicle inductive signatures recognition using a Madaline neural network


Glauston R. Teixeira de Lima · José Demísio S. Silva · Osamu Saotome

Received: 25 September 2008 / Accepted: 19 August 2009 / Published online: 10 September 2009
© Springer-Verlag London Limited 2009

Abstract In this paper, we report results obtained with a Madaline neural network trained to classify inductive signatures of two vehicle classes: trucks with one rear axle and trucks with a double rear axle. In order to train the Madaline, the inductive signatures were pre-processed and both classes, named C2 and C3, were subdivided into four subclasses. Thus, the initial classification task was split into four smaller tasks that are (theoretically) easier to perform. The heuristic adopted in the training attempts to minimize the effects of the input space non-linearity on the classifier performance by uncoupling the learning of the classes; for this, we induce the output Adalines to specialize in learning one of the classes. The percentages of correct classifications presented concern patterns which were not submitted to the neural network during the training process and, therefore, they indicate the neural network's generalization ability. The results are good and encourage the continuation of this research on the use of Madaline networks in vehicle

classification tasks using inductive signatures that are not linearly separable.

Keywords Vehicle inductive signatures · Vehicle classification · Madaline neural network

1 Introduction

The degree of difficulty of a classification task is primarily determined by the class overlap in the input space [1, 2]. The difficulty is even greater if, in addition to the overlap, the classes are unbalanced and the number of available patterns is small. Consider, for instance, a classification problem in which the input patterns are the inductive signatures of two classes of vehicles, as shown in Fig. 1. These signals are collected by inductive loop traffic sensors [3], and the morphology of the curves in Fig. 1 derives from the change in the impedance of the magnetic loop when the vehicle passes over it [4]. It is hypothesized that the proximity of the metal parts of the axles alters the impedance of the loops and thus signals the presence of the axles. In this way, the vehicle can be classified by the number of axles.

Inductive signatures are used in traffic surveillance and management systems to recognize the class of a vehicle, to estimate its speed and even to identify individual vehicles, among other expected results [5–9]. This information is used to build a statistical database that may help traffic surveillance and management systems in decision-making. The class of a vehicle is one of the most important pieces of information and serves, for instance, for access control to areas where circulation is restricted to certain types of vehicles and for charging different toll values at tollgates.

G. R. T. de Lima (corresponding author)
Programa de Pós-Graduação em Engenharia Eletrônica e Computação, Instituto Tecnológico de Aeronáutica-ITA, Praça Marechal Eduardo Gomes, 50, Vila das Acácias, São José dos Campos, SP 12228-900, Brazil
e-mail: glau11@gmail.com

J. D. S. Silva
Laboratório Associado de Computação e Matemática Aplicada-LAC, Instituto Nacional de Pesquisas Espaciais-INPE, Av. dos Astronautas, 1758, Jardim da Granja, São José dos Campos, SP 12227-010, Brazil

O. Saotome
Divisão de Engenharia Eletrônica e Computação, Instituto Tecnológico de Aeronáutica-ITA, Praça Marechal Eduardo Gomes, 50, Vila das Acácias, São José dos Campos, SP 12228-900, Brazil


Fig. 1 Inductive signatures: C2 at left and C3 at right

The left side of Fig. 1 shows the inductive signatures of trucks with two axles (one rear axle) and the right side shows the inductive signatures of trucks with three axles (double rear axle). The inductive signatures of the trucks with two axles will be referred to in this article as class C2 and the inductive signatures of the trucks with three axles will be referred to as class C3. The data shown in Fig. 1 were acquired in a real-world setup assembled near a toll park on a road. There are 328 C2 signatures and 132 C3 signatures, each one with 1,024 samples. The metallic structures of these two classes of trucks are very similar; hence, the respective inductive signatures are also very similar. So, for classification purposes, a problem to be faced is the strong overlap of attributes between the classes C2 and C3. Moreover, in addition to the class overlapping, this database presents two more problems: class imbalance and a reduced amount of data. Thus, although the use of the classifier proposed in this article in a real vehicle classification task is a very attractive possibility (indeed, this possibility is the primary motivation of this research), it is necessary to emphasize the following: for now, the inductive signatures are used in this work just as a database (among many others that would be possible and convenient) to study a new approach to the recurring problem of performing a classification task in representation domains with few patterns of each class and with a high rate of overlap among the classes.

Some artificial neural networks, such as Kohonen Self-Organizing Maps, Multilayer Perceptrons and Probabilistic Neural Networks, have been trained with inductive signatures to aid in vehicle classification tasks [10–14]. In this paper, we propose a Madaline neural network as the classifier. Although the Madaline, even without any prior data processing, is able to separate patterns that are not linearly separable [15], in the present proposal we attempt to improve the classifier performance by adapting the training algorithm.

The remainder of this paper is organized as follows: in Sect. 2, we briefly explain the pre-processing applied to the inductive signatures; in Sect. 3, a short review of works concerning Madaline training algorithms is given; in Sect. 4, the Madaline architecture and the algorithm used to train the network are presented; the last two sections are dedicated to the results, discussion and conclusions.

2 Data pre-processing

The inductive signatures were filtered with a Savitzky–Golay filter [16] and their amplitudes were normalized to the range from -1 to 1. Moreover, before training the Madaline, the signatures were divided into morphological subclasses. The filtering, the amplitude normalization and the subdivision are justified as follows.

2.1 Filtering

In order to develop a classification method for classes C2 and C3, we work with the following assumption: the information that is useful to distinguish one class from another, i.e., the information about the rear axle of the truck, is stored in the inductive signature in long-duration segments or, in other words, in the low-frequency components of the inductive signatures. So, to remove the existing high-frequency components from the raw data (which, according to our working hypothesis, do not carry useful information), the signatures were smoothed with the Savitzky–Golay filter. This filter does a local polynomial fitting on a set of neighboring samples to calculate the smoothed value for each sample. The main advantage of this method is that the smoothed samples tend to preserve features of the distribution of the original samples.
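As an illustration of this pre-processing, the following minimal sketch smooths one raw signature and rescales its amplitude to the range [-1, 1] (SciPy is assumed here for convenience; the window length and polynomial order are illustrative assumptions, since the paper does not report the Savitzky–Golay parameters actually used):

```python
import numpy as np
from scipy.signal import savgol_filter

def preprocess_signature(raw, window_length=51, polyorder=3):
    """Smooth a raw inductive signature and normalize its amplitude to [-1, 1].

    window_length and polyorder are illustrative values only; the parameters
    used by the authors are not reported in the paper.
    """
    smoothed = savgol_filter(raw, window_length, polyorder)
    # Linear rescaling of the smoothed signal onto the range [-1, 1].
    lo, hi = smoothed.min(), smoothed.max()
    return 2.0 * (smoothed - lo) / (hi - lo) - 1.0

# Example with a synthetic 1,024-sample signature.
signature = np.random.randn(1024).cumsum()
x = preprocess_signature(signature)
```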


2.2 Amplitude normalization

The minimum value of an inductive signature is determined, primarily, by the proximity of the vehicle's metallic structure to the inductive loop. Considering that the distance from the rear axles to the road surface is standardized for C2 trucks and C3 trucks, any discrepancies in the values of this attribute (the minimum value of the inductive signature) are generated by other parts of the metallic structure that are close to the rear axles. But these other parts of the vehicle's metallic structure (unlike the rear axles) are not intrinsic characteristics of the vehicle class; they can be present in trucks of both classes and, therefore, their influence on the shape of the inductive signatures should be suppressed. So, the inductive signature amplitudes were normalized.

2.3 Division into morphological subclasses

In order to better understand and formally characterize the overlap between the classes C2 and C3, some preliminary tests were performed with the inductive signatures. In these tests, the results obtained with a Kohonen Self-Organizing Map showed that sets of patterns in both classes share very similar morphological prototypes. Since it is desired to classify the inductive signatures of classes C2 and C3 using a supervised learning neural network, and bearing in mind that a number of C2 and C3 vectors share the same morphological prototype, we supposed that characterizing each class by identifying its several morphological prototypes and then dividing each class into morphological subclasses built around the found prototypes would probably be a pre-processing step that could facilitate the solution of the classification task. We then developed a routine to accomplish the subdivision of the classes, as presented in [17], and the resulting morphological subclasses are shown in Figs. 2 and 3.

Although the subdivision shown in Figs. 2 and 3 is satisfactory, because all the relevant morphological groups of the two classes were well identified and separated, for the purpose of accomplishing a classification task it is convenient to establish a one-to-one correspondence among subclasses, so that the classifiers may be trained with pairs of corresponding subclasses. In this regard, a new subdivision may be adopted by merging the subclasses (1, 2, 3, 4) and (5, 6) of class C2 into only two subclasses. After this grouping, the number of subclasses in C2 matches the number of subclasses in C3 (which was not modified). The result is shown in Fig. 4, in which the pairs of corresponding subclasses are in the same column.

3 Madaline training algorithms: a short review

Madaline neural networks are built with multiple Adalines (whence the name Madaline), or multiple adaptive linear neurons: the basic processing unit trained with the delta rule proposed by Widrow and Hoff [18]. The first Madaline networks, proposed in the early 1960s, were formed by a layer of adaptive Adalines connected to a single non-adaptive output Adaline that worked as a fixed logical element [15, 19]. Threshold functions were used as the Adalines' activation functions. In this early Madaline structure, the adaptive training of the Adalines in the first layer was performed using an algorithm called Madaline rule I (or MRI) [20]. The MRI algorithm adopts the principle of minimal disturbance and performs the adaptation as follows: no adaptation is made if the network response corresponds to the desired answer for the current input pattern; if the network responds incorrectly, then, to correct the network response, the weight vectors (connecting the input pattern to the first layer) of the Adalines whose net inputs are close to zero are adjusted. By choosing to adapt only the Adalines with the smallest net input values (those whose input sign (+ or -) is easiest to revert), the learning that the network has already acquired on the other input patterns previously presented is disturbed as little as possible, because the necessary changes in the weights are smaller (this is the principle of minimal disturbance).

In the late 1980s, a training algorithm called Madaline rule II (or MRII) was proposed by Winter and Widrow [19]. MRII is an extension of MRI, but with some modifications. Regarding the architecture, a Madaline network trained by MRII may contain many adaptive elements in the output layer and more than one hidden layer with adaptive elements. If the network response to an input pattern is incorrect, an adaptation by trial and error is made taking into account the principle of minimal disturbance. Starting in the first hidden layer, the activation of the Adaline whose net input is closest to zero is reversed and the network response is recomputed. If the network output error is reduced, the Adaline's weights are adjusted by the delta rule using the new Adaline output value as the target. Otherwise, if the network output error is not reduced, this trial adaptation is discarded and a trial adaptation is done for the next Adaline in the first layer whose net input is closest to zero. For each input pattern presented to the neural network, such adaptations by trial and error are applied to hidden Adalines with net input near zero until the network's response is correct (or another stopping criterion is met) and until the neural network is considered trained. MRII can also be implemented by applying trial adaptation to more than one hidden Adaline at once.

The Madaline rule III (or MRIII) algorithm, proposed by Andes and Widrow [15] in the early 1990s, is a modified


Fig. 2 Morphological subclasses of the class C2

Fig. 3 Morphological subclasses of the class C3

Fig. 4 Final subdivision of the classes C2 (top) and C3 (bottom)

version of MRII in which the threshold functions are replaced by sigmoid functions for the calculation of the Adalines' activations. MRIII was developed to circumvent problems with hardware implementations of the Backpropagation algorithm due to the need to accurately implement the sigmoid

function and its derivative. In MRIII, instead of using the derivative of the sigmoid function, a small disturbance Δs is added to the Adaline net input. Since Δs is small enough, the measured effects of Δs on the Adaline output and on the error signal (difference between the desired


response and the Adaline output) can be used to replace the derivative of the sigmoid function in the hardware implementation of the training algorithm. Therefore, the MRIII and Backpropagation algorithms are equivalent. Another difference between MRII and MRIII is that in MRIII all hidden Adalines undergo adaptation. Even so, the principle of minimal disturbance is still observed in MRIII, because the most significant changes in the network are produced by the adaptation of the hidden Adalines with the lowest net input: the derivative of the sigmoid function has its maximum for an input value equal to zero and, therefore, the disturbance Δs applied to the Adaline net input produces the greatest effect on those Adalines whose net input values are closest to zero.

In the MRI, MRII and MRIII algorithms, the output errors play a key role, but hidden errors are not considered. However, there is a drawback with respect to hidden errors in the supervised training of multilayer neural networks. If, on the one hand, calculating the error for the output layer is straightforward (since the desired output responses are known), on the other hand, such ease does not exist for the hidden layers: the desired responses for the neurons in the hidden layers are not known a priori. A different kind of approach for training multilayer neural networks addresses this issue by including in the training algorithm tools for estimating optimum values for the hidden neurons' responses [21–23]. In short, the optimum hidden-neuron targets are calculated as follows: an input pattern is presented; if the network responses are incorrect, the weight matrix M of the output layer is updated using the delta rule; a vector $T_o = [f(t_{o1}), f(t_{o2}), \ldots, f(t_{oi}), \ldots, f(t_{ok})]$ is calculated (where f is the inverse of the output layer sigmoid activation function and the $t_{oi}$ are the output layer targets); the vector $T_h$ with the target values for the hidden neurons is then computed by the product $M^{\dagger} T_o$ (where $M^{\dagger}$ is the pseudoinverse of M). The expected effectiveness of this training algorithm is based on the assumption that, by computing the target values for the hidden neurons in this way, the algorithm forces the network outputs to assume values that are estimates (in the least-square-error sense) of their respective output targets ($t_{oi}$).

Other interesting approaches for training neural networks are proposed in [24, 25]. However, the structure trained with the algorithms proposed in those works is not a multilayer neural network; instead, it consists of a single layer of Perceptrons, denominated parallel Perceptron, that outputs only binary or bipolar values. The output of the parallel Perceptron is calculated by a threshold function whose net input is the sum of all the individual Perceptron outputs. If the absolute value of the output error is smaller than a pre-established tolerance value, ε, then the output of the parallel Perceptron is correct and the weights need not be updated. If the absolute value of the output error is greater than ε, the correction of the error is achieved by

updating a number of weight vectors equal to the minimal number of sign changes in the individual Perceptron outputs that is necessary for the output of the parallel Perceptron to become correct [24], or by applying the so-called p-delta learning rule [25]. The p-delta algorithm applies the delta rule to the weight vectors of the Perceptrons whose output is greater than zero if the output error of the parallel Perceptron is lower than -ε, or applies the delta rule to the weight vectors of the Perceptrons whose output is lower than zero if the output error of the parallel Perceptron is greater than ε. The delta rule (but with a different learning rate) is also applied to those individual Perceptrons whose output is right, but only by a too small margin. This serves to increase the margin of classification which, in turn, improves stability against noise in the input patterns as well as the generalization ability of the parallel Perceptron. As pointed out in [25], the parallel Perceptron is advantageous because it is a universal approximator in which it is not necessary to use sigmoid functions and to tune many hidden layers (such as, e.g., in a Multilayer Perceptron trained with Backpropagation).

In this paper, we propose an algorithm for training a Madaline neural network that is advantageous because, to some extent, it integrates (with adaptations) the strengths of the three training approaches outlined above: the principle of minimal disturbance; the estimation of appropriate target values for hidden neurons (which removes the need for backpropagation of output errors); and the adoption of a margin of classification (which increases the robustness against noisy input patterns and improves the ability to generalize). In our algorithm, in order to include the principle of minimal disturbance, each output Adaline was assigned to learn (or to specialize in) one of the classes present in the input space. Then, during the presentation of a pattern to the network, only the weight vector of the output Adaline that was assigned to learn the class of the input pattern is updated. Thus, we uncoupled (to some extent) the learning of one class from the learning of the other class (in other words, we try to make the network learn the patterns of a class without disturbing the learning already acquired on the other class). With respect to target values for hidden neurons, we did not estimate optimum values, but values that were good enough to produce the desired network responses. Compared with the method that estimates optimum target values, these approximate values have the advantage of providing good results with fast convergence and without additional computational costs such as the calculation of pseudoinverse matrices and the inversion of activation functions. We addressed the margin of classification requirement by adopting a decision threshold, ε, different from zero. An input pattern is considered as belonging to class C2 if the network response is greater than ε, or it is considered as belonging to class C3 if the network response is lower than -ε.
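To make the hidden-target estimation summarized above [21–23] more concrete, the sketch below computes hidden targets from the output targets and the (already updated) output weight matrix M. It is only an illustration of that approach, not of the algorithm proposed in this paper; the logistic activation (and its logit inverse) and the layer sizes are assumptions.

```python
import numpy as np

def hidden_targets(M, t_out, eps=1e-6):
    """Estimate hidden-layer targets as T_h = pinv(M) . T_o (cf. [21-23]).

    M     : output weight matrix, shape (n_outputs, n_hidden), already
            updated by the delta rule for the current pattern.
    t_out : desired output responses, assumed to lie in (0, 1) because a
            logistic output activation is assumed for this illustration.
    """
    t_out = np.clip(t_out, eps, 1.0 - eps)
    T_o = np.log(t_out / (1.0 - t_out))   # inverse of the logistic sigmoid
    return np.linalg.pinv(M) @ T_o        # least-squares hidden targets

# Toy usage: 2 output neurons, 4 hidden neurons.
M = np.random.randn(2, 4)
print(hidden_targets(M, np.array([0.9, 0.1])))
```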


4 The Madaline architecture and training algorithm

4.1 Motivation to use a Madaline classifier

We believe that the division of the classes into morphological subclasses might improve the classification performance, because the division breaks the initial large problem, which requires a more difficult solution, into smaller problems with easier solutions. However, after the subdivision, the four resulting input spaces, shown in Fig. 4, correspond to the sub-regions of the original input space where the class overlap is, in fact, critical, since the pairs of subclasses which constitute the new input spaces are morphologically very similar. At first, a null classification error is not to be expected in cases as difficult as this, and the best one can do is to attempt to minimize the classification error. The delta rule is a suitable choice for the learning process because it can produce the least mean square solution for patterns that are not linearly independent [26, 27]. The possibility of minimizing the classification error was the first reason to choose a Madaline neural network as the classifier in this study. Even so, in the tests with the Madaline, we introduced some modifications in the training algorithm aimed at improving the classifier performance in a non-linear representation domain (like the two classes of inductive signatures).

4.2 Architecture

Figure 5 illustrates the Madaline network designed for the experiments reported in this paper. The activations, the weight vectors and the threshold used in the neural network are listed below.

$W_{h1j}, W_{h2j}, W_{h3j}, W_{h4j}$ : weight vectors of the Adalines in the hidden layer, j = 1, …, n = 1,024
$y_{h1}, y_{h2}, y_{h3}, y_{h4}$ : activations of the Adalines in the hidden layer
$W_{o1j}, W_{o2j}$ : weight vectors of the Adalines in the output layer, j = 1, …, 4
$y_{o1}, y_{o2}$ : activations of the Adalines in the output layer
L : threshold adopted to decide whether the responses of the network are correct or not

Fig. 5 Madaline architecture

4.3 Training algorithm

The activations of the hidden Adalines and of the output Adalines are calculated as follows:

$y_{h1} = x\, W_{h1}^{T}$   (1a)
$y_{h2} = x\, W_{h2}^{T}$   (1b)
$y_{h3} = x\, W_{h3}^{T}$   (1c)
$y_{h4} = x\, W_{h4}^{T}$   (1d)

where x is the current input pattern.

$y_{o1} = [y_{h1}\; y_{h2}\; y_{h3}\; y_{h4}]\, W_{o1}^{T}$   (2a)
$y_{o2} = [y_{h1}\; y_{h2}\; y_{h3}\; y_{h4}]\, W_{o2}^{T}$   (2b)

(It can be noted from (1a)–(1d) and (2a), (2b) that the identity function was used as the activation function.)
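A minimal sketch of this forward pass, (1a)–(2b), together with the decision rule used in the rest of Sect. 4.3, is given below (an illustration in numpy, not the authors' code; the value L = 0.1 is the one reported later in Sect. 5.1):

```python
import numpy as np

def forward(x, Wh, Wo):
    """Forward pass of the Madaline of Fig. 5 with identity activations.

    x  : input pattern, shape (1024,)
    Wh : hidden weight vectors Wh1..Wh4 stacked as rows, shape (4, 1024)
    Wo : output weight vectors Wo1, Wo2 stacked as rows, shape (2, 4)
    Returns (yh, yo) as in (1a)-(1d) and (2a)-(2b).
    """
    yh = Wh @ x   # yh_i = x . Wh_i^T
    yo = Wo @ yh  # yo_j = [yh1 yh2 yh3 yh4] . Wo_j^T
    return yh, yo

def respond(yo, L=0.1):
    """Decision rule: C2 if yo1 > L and yo2 < -L, C3 if the signs are
    reversed, and undecided otherwise."""
    if yo[0] > L and yo[1] < -L:
        return "C2"
    if yo[0] < -L and yo[1] > L:
        return "C3"
    return "undecided"
```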


If the pattern presented to the network belongs to class C2, the response of the network is considered correct if $y_{o1}$ is greater than L and $y_{o2}$ is smaller than -L. For input patterns that belong to class C3, $y_{o1}$ must be smaller than -L and $y_{o2}$ must be greater than L. During the training, if the network responds properly to an input pattern, no adjustment is applied to the weight vectors. If the network does not respond correctly, the weights are adjusted, but in a different way for each class.

4.3.1 The input pattern belongs to C2

The weights $W_{h1}$ and $W_{h2}$ are adjusted by the delta rule in order to push the activations $y_{h1}$ and $y_{h2}$ toward positive values, and the weights $W_{h3}$ and $W_{h4}$ are adjusted by the delta rule in order to push the activations $y_{h3}$ and $y_{h4}$ toward negative values:

$W_{h1} \leftarrow W_{h1} + \alpha_h (kL - y_{h1})\, x$   (3a)
$W_{h2} \leftarrow W_{h2} + \alpha_h (kL - y_{h2})\, x$   (3b)
$W_{h3} \leftarrow W_{h3} + \alpha_h (-kL - y_{h3})\, x$   (3c)
$W_{h4} \leftarrow W_{h4} + \alpha_h (-kL - y_{h4})\, x$   (3d)

where $\alpha_h$ is the learning rate of the hidden Adalines and k is a constant integer. For the Adalines in the hidden layer, the desired responses are not known; therefore, the values kL and -kL in (3a)–(3d) are not desired responses, but just values that, according to the proposed training heuristic, are considered an appropriate reference to ensure the learning. Taking into account how the hidden activation values are manipulated in (2a) and (2b), it is plausible to think that, during the training, the weight adjustment given by (3a)–(3d) brings the activation $y_{o1}$ to a positive value greater than L and also brings the activation $y_{o2}$ to a negative value smaller than -L. If so, since eventual errors associated with $y_{o2}$ are already corrected (at least partly) by the application of (3a)–(3d), we decided to adjust only the values of $W_{o1}$ when the neural network responds incorrectly to a pattern of C2. This is our way of inducing the first Adaline in the output layer to specialize in learning only inductive signatures of C2 or, in other words, this is our way of (to some extent) uncoupling the C2 learning from the C3 learning and, thus, trying to minimize the effects of class overlapping on the classifier performance. The update of the weight vector $W_{o1}$ is given by:

$W_{o1} \leftarrow W_{o1} + \alpha_o (kL - y_{o1})\, [y_{h1}\; y_{h2}\; y_{h3}\; y_{h4}]$   (4)

where $\alpha_o$ is the learning rate of the output Adalines. The same considerations made above regarding the use of the values kL and -kL apply to (4), since $y_{o1}$ is considered correct if its value is greater than L.

4.3.2 The input pattern belongs to C3

The weights $W_{h1}$ and $W_{h2}$ are adjusted by the delta rule in order to push the activations $y_{h1}$ and $y_{h2}$ toward negative values, and the weights $W_{h3}$ and $W_{h4}$ are adjusted by the delta rule in order to push the activations $y_{h3}$ and $y_{h4}$ toward positive values:

$W_{h1} \leftarrow W_{h1} + \alpha_h (-kL - y_{h1})\, x$   (5a)
$W_{h2} \leftarrow W_{h2} + \alpha_h (-kL - y_{h2})\, x$   (5b)
$W_{h3} \leftarrow W_{h3} + \alpha_h (kL - y_{h3})\, x$   (5c)
$W_{h4} \leftarrow W_{h4} + \alpha_h (kL - y_{h4})\, x$   (5d)

Similarly to the previous case, the adjustment given by (5a)–(5d) already partially contributes to correcting the errors in $y_{o1}$ during the presentation of class C3 input patterns. So, only the output weights $W_{o2}$ are adjusted during the presentation of class C3 patterns and, thus, the second Adaline of the output layer specializes in learning only inductive signatures of C3. The update of the weight vector $W_{o2}$ is given by:

$W_{o2} \leftarrow W_{o2} + \alpha_o (kL - y_{o2})\, [y_{h1}\; y_{h2}\; y_{h3}\; y_{h4}]$   (6)

5 Simulations, results and discussions

5.1 Simulations with the proposed Madaline network

For each pair of corresponding subclasses shown in Fig. 4, 500 simulations were executed with the proposed Madaline network. In each simulation, 75% of the inductive signatures of each subclass of the pair were randomly chosen and used to train the network, and the remaining 25% of the inductive signatures were used to evaluate the network's generalization ability after training. Table 1 shows, for each pair of subclasses, the number of inductive signatures corresponding to this division into a training set with 75% of the signatures and a test set with 25% of the signatures. In these simulations, the weight vectors were initialized with small random values. The learning rate of the hidden layer, $\alpha_h$, was initialized as 0.0001 and the initial value of the output layer learning rate, $\alpha_o$, was set as 0.001. At each epoch, the values of $\alpha_h$ and $\alpha_o$ were adjusted according to (7a) and (7b), in which mne is the maximum number of epochs allowed for each simulation, set as 100,000.
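A minimal sketch of the per-subclass random split described above (numpy; the authors' exact sampling procedure is not reported, but truncating 75% of the subclass size reproduces the counts listed in Table 1):

```python
import numpy as np

def split_subclass(signatures, train_fraction=0.75, seed=None):
    """Randomly split one subclass into a training set and a test set."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(signatures))
    n_train = int(train_fraction * len(signatures))  # truncation, e.g. 170 -> 127
    return signatures[idx[:n_train]], signatures[idx[n_train:]]

# Example: the 170 C2 signatures of pair 1 give 127 training and 43 test signatures.
c2_pair1 = np.random.randn(170, 1024)
train_set, test_set = split_subclass(c2_pair1, seed=0)
```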


Table 1 Number of inductive signatures used to train and to test the classifiers

Pair of subclasses | C2 training | C2 test | C3 training | C3 test
1 | 127 | 43 | 24 | 9
2 | 52 | 18 | 20 | 7
3 | 39 | 14 | 33 | 12
4 | 26 | 9 | 20 | 7

According to (7a) and (7b), if the number of epochs reaches the maximum allowed, the final values of $\alpha_h$ and $\alpha_o$ will be, respectively, 0.00001 and 0.0001.

$\alpha_h \leftarrow \alpha_h \cdot 0.1^{1/(mne-1)}$   (7a)
$\alpha_o \leftarrow \alpha_o \cdot 0.1^{1/(mne-1)}$   (7b)
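Putting (1a)–(6) and the schedule (7a)–(7b) together, one presentation of a training pattern can be sketched as follows (a non-authoritative reconstruction of the procedure described in Sect. 4.3; the epoch loop, the weight initialization and the array layout are assumptions):

```python
import numpy as np

K, MNE = 10, 100_000                 # constant k and maximum number of epochs
DECAY = 0.1 ** (1.0 / (MNE - 1))     # per-epoch factor implied by (7a)-(7b)

def train_step(x, label, Wh, Wo, alpha_h, alpha_o, L=0.1):
    """One presentation of pattern x (label 'C2' or 'C3') to the Madaline.

    Wh: hidden weights, shape (4, 1024); Wo: output weights, shape (2, 4).
    Both are updated in place only when the network response is incorrect.
    Returns True if the response was already correct.
    """
    yh = Wh @ x
    yo = Wo @ yh
    if label == "C2":
        correct = yo[0] > L and yo[1] < -L
        if not correct:
            t_h = np.array([K * L, K * L, -K * L, -K * L])    # (3a)-(3d)
            Wh += alpha_h * (t_h - yh)[:, None] * x[None, :]
            Wo[0] += alpha_o * (K * L - yo[0]) * yh           # (4): only Wo1
    else:
        correct = yo[0] < -L and yo[1] > L
        if not correct:
            t_h = np.array([-K * L, -K * L, K * L, K * L])    # (5a)-(5d)
            Wh += alpha_h * (t_h - yh)[:, None] * x[None, :]
            Wo[1] += alpha_o * (K * L - yo[1]) * yh           # (6): only Wo2
    return correct

# At the end of each epoch the learning rates decay as in (7a)-(7b):
#   alpha_h *= DECAY;  alpha_o *= DECAY
```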

The threshold L was set as 0.1 and the value of the constant k was set as 10. The stopping criterion was the first of the following to be satisfied:

1. the maximum number of epochs is reached; or
2. the response of the network is correct for all the training patterns in an epoch.


In all the simulations, the training process was stopped by the second criterion. The average number of epochs needed to train the network was 654, 21, 1,139, and 137 for the pairs of subclasses 1, 2, 3, and 4, respectively. Figure 6 shows typical mean squared error curves of the Madaline network during the training with each pair of subclasses.

5.2 Other simulations performed for results comparison

For the purpose of comparing results, five further kinds of simulations were performed. All the simulations were conducted as follows: 75% of the signatures were randomly selected to train the classifier and the remaining 25% were used to test the trained classifier. Five hundred simulations were executed in each case.

1. Simulation 1: a Madaline architecture with only two hidden Adalines was used. These simulations aim to justify the hidden layer with four Adalines in the Madaline neural network proposed in this paper.
2. Simulation 2: the training algorithm was modified so that, if the Madaline neural network responds incorrectly to an input pattern, both output weight vectors are adjusted, independently of the class to which the input pattern belongs. These simulations were conducted to verify our assumption that updating just the weight vector of one output Adaline (depending on the class to which the input pattern belongs) uncouples the learning of the classes and minimizes the effect of the non-linear input space on the Madaline performance.
3. Simulation 3: class C2 and class C3 were not divided into subclasses. These simulations were accomplished to verify the hypothesis that the subdivision of the classes provides a better overall classification performance.
4. Simulation 4: a Multilayer Perceptron network trained with the Backpropagation algorithm was used to classify the inductive signatures. These simulations were conducted to compare the performance of the proposed Madaline with that of another classic neural classifier.
5. Simulation 5: a Support Vector Machine was used to classify the inductive signatures. These simulations were conducted to compare the performance of the proposed Madaline with that of a non-neural classification approach.


In simulations 1, 2 and 3 above, the weight initialization, the initial values and formulas for adjusting the learning rates, and the criteria for stopping the simulations remained the same as those used in the simulations described in Sect. 5.1. In simulation 1, the average number of epochs required to train the network was 858, 30, 1,667 and 202 for the pairs of subclasses 1, 2, 3 and 4, respectively. Figure 7 shows typical mean squared error curves for each pair of subclasses in the network training stage. In simulation 2, the average number of epochs in the training phase was 761, 22, 1,511 and 165 for the pairs of subclasses 1, 2, 3 and 4, respectively. Figure 8 shows typical mean squared error curves during the Madaline training for each pair of subclasses. In simulation 3, the allowed maximum number of training epochs (100,000) was reached in all the simulations. In the training phase, the average number of patterns learned by the network (over the 500 simulations) was 324, which amounts to approximately 93.9% of the number of patterns used for training.

Fig. 6 Typical mean squared error curves of the proposed Madaline

In the simulations with the MLP, the weight vectors were initialized with small random values close to zero, the initial learning rate of the hidden layer was set as 0.0001 and the initial learning rate of the output layer was set as 0.01. The learning rates were adjusted by (7a) and (7b) and could vary down to 0.00001 and 0.001, respectively, over 100,000 epochs. The architecture of the Perceptron was the same as that of the Madaline network (Fig. 5). The threshold used to evaluate the responses of the network was set as 0.3. For C2 patterns, the network response was correct if the activation of the first output neuron was greater than 0.3 and the activation of the second output neuron was smaller than -0.3. The opposite was considered for C3 patterns. The stopping criterion was chosen between the maximum number of epochs and the neural network responding properly to all input patterns in an epoch. Figure 9 shows the activation function of the neurons in the MLP network. The average number of epochs in the MLP training phase was 1,585, 135, 1,911 and 650 for the pairs of subclasses 1, 2, 3 and 4, respectively. Figure 10 shows typical mean squared error curves during the MLP training for each pair of subclasses.

In the simulations with the SVM, routines available in [28] were used. The mapping of the input patterns to the feature space was made with a linear kernel function (inner-product kernel) and the upper bound for the Lagrange multipliers was set as 5 (a value selected after some tests). It is necessary to clarify that, in one round of simulations, the simulation described in Sect. 5.1 and the five kinds of simulations described in this section were all accomplished. The same training set and the same test set were used in the six simulations. Therefore, the results presented in the next section are not affected by differences in the training and test sets.

5.3 Results

The percentages of correct classifications presented in this section were obtained with the test set (the 25% of the inductive signatures separated in each simulation). Therefore, the results estimate the generalization ability of the classifiers. Table 2 presents, for each pair of subclasses, the average percentage of correct classifications obtained with the proposed Madaline neural network in the 500 simulations conducted.


Fig. 7 Typical mean squared error curves of the Madaline in simulation 1

Table 3 presents, for each pair of subclasses, the average percentage of correct classifications obtained with the Madaline neural network in simulation 1. Table 4 presents the corresponding results for simulation 2. Table 5 presents the results obtained with the Madaline neural network in simulation 3. Table 6 presents, for each pair of subclasses, the average percentage of correct classifications obtained with the Multilayer Perceptron in simulation 4. Table 7 presents the corresponding results obtained with the SVM classifier in simulation 5.

5.4 Results and discussion

Recalling (once more) that the numbers in Table 2 express the generalization ability of the proposed Madaline neural network after training, it may be concluded that the results are excellent in view of the reduced amount of data available to train the network and the class imbalance (there are 328 C2 patterns and 132 C3 patterns). Moreover, the architecture of the Madaline and the training algorithm presented in this paper are prototype versions that may be improved. It must be emphasized that, during the network training, in all the simulations, all the patterns were learned in a relatively small average number of epochs.

The Madaline's performance in this classification task, where the input space is not linearly separable, indicates that the modifications introduced in the training algorithm (i.e., a simple but suitable manipulation of the neural network's internal states [given by (2a), (2b)], whereby the hidden weights may be adjusted in a supervised way [given by (3a)–(3d) and (5a)–(5d)] and each output weight vector may be adjusted so as to learn only one of the classes [given by (4) and (6)]) have worked properly in the sense of reducing the impact of the input space non-linearities on the classifier performance.

As shown in Tables 2 and 3, the Madaline architecture with only two hidden Adalines, as used in simulation 1, provides percentages of correct classifications that, for practical purposes, are equivalent to those obtained using the Madaline architecture with four hidden Adalines (in


Fig. 8 Typical mean squared error curves of the Madaline in simulation 2

Fig. 9 Sigmoid used as activation function in the MLP

fact, the percentages in Table 2 are a little better). The advantage of using four hidden units is just the faster convergence during the training of the network. This is attested by the smaller average numbers of epochs needed to train the Madaline architecture with four hidden Adalines and also by the comparison between the learning curves in Figs. 6 and 7, which shows a more pronounced decay of the mean squared error in the curves of Fig. 6 for the same number of epochs.

Updating just one output weight vector, depending on the class of the current input pattern, seems to be beneficial for the overall performance of the Madaline. Although the percentages of correct classifications shown in Tables 2 and 4 have close values, it is worth noting that the results of Table 2 are a little better. The largest difference between the percentages in these two tables appears in the classification of the pair of subclasses 3. This pair of subclasses seems to be the one whose classification is most difficult, probably because it is the pair of subclasses with the greatest overlap of attributes (observe that in all the tables presented in Sect. 5.3 the pair of subclasses 3 has the lowest percentage of correct classifications) and, therefore, it is the pair of subclasses in which the learning of one class can most intensely disturb the learning already acquired on the other class. Thus, the difference of more than 1% between the values in Tables 2 and 4 for this pair of subclasses supports the hypothesis that adjusting only one of the output weight vectors may decrease, to some extent, the learning interference between the classes. We may draw the same conclusion by comparing the learning curves in Figs. 6 and 8, especially those for the pairs of subclasses 1 and 3. The learning curves for the pairs of subclasses 2 and 4 have very similar behavior in the two kinds of simulation. In Fig. 6 (pairs of subclasses 1 and 3), the mean squared


Fig. 10 Typical mean squared error curves of the MLP

Table 2 Classification results obtained with the proposed Madaline neural network

Pair of subclasses | Average percentage of correct classifications
                   | Class C2 | Class C3
1 | 94.5 | 86.1
2 | 97.4 | 94.0
3 | 87.0 | 83.0
4 | 86.3 | 94.8
Weighted average per class | 93.0 | 88.3

Total percentage of correct classifications (calculated by weighted average): 91.7
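The weighted averages in Tables 2, 3, 4, 6 and 7 are consistent with weighting each pair of subclasses by its number of test signatures in Table 1. For the class C2 column of Table 2, for instance,

$$\frac{43(94.5) + 18(97.4) + 14(87.0) + 9(86.3)}{43 + 18 + 14 + 9} = \frac{7811.4}{84} \approx 93.0,$$

and the total of 91.7 is the analogous average of the two per-class figures, weighted by the 84 C2 and 35 C3 test signatures.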

Table 3 Classification results obtained with the Madaline neural network in simulation 1

Pair of subclasses | Average percentage of correct classifications
                   | Class C2 | Class C3
1 | 94.5 | 85.0
2 | 96.8 | 91.4
3 | 87.0 | 83.0
4 | 86.3 | 93.4
Weighted average per class | 92.9 | 87.3

Total percentage of correct classifications (calculated by weighted average): 91.3


Table 4 Classification results obtained with the Madaline neural network in simulation 2

Pair of subclasses | Average percentage of correct classifications
                   | Class C2 | Class C3
1 | 94.3 | 86.0
2 | 97.3 | 94.1
3 | 86.8 | 81.5
4 | 85.4 | 94.5
Weighted average per class | 92.7 | 87.8

Total percentage of correct classifications (calculated by weighted average): 91.3

error converges to zero more quickly than in Fig. 8 (compare, for instance, the values of the error at 100 epochs and at 200 epochs) and the changes in the mean squared error value from one epoch to the next are smaller in Fig. 6. Lastly, a smaller average number of epochs was necessary to train the Madaline in the simulations where the proposed training algorithm was used.

Table 5 Classification results obtained with the Madaline neural network in simulation 3

Class | Average percentage of correct classifications
C2 | 88.4
C3 | 76.5

Total percentage of correct classifications (calculated by weighted average): 85.0

Table 6 Classification results obtained with the MLP (simulation 4)

Pair of subclasses | Average percentage of correct classifications
                   | Class C2 | Class C3
1 | 92.6 | 79.8
2 | 97.2 | 88.7
3 | 86.2 | 78.2
4 | 83.3 | 92.9
Weighted average per class | 91.5 | 83.7

Total percentage of correct classifications (calculated by weighted average): 89.3

Table 7 Classification results obtained with the SVM classifier (simulation 5)

Pair of subclasses | Average percentage of correct classifications
                   | Class C2 | Class C3
1 | 98.4 | 93.0
2 | 100.0 | 100.0
3 | 88.0 | 94.2
4 | 97.7 | 100.0
Weighted average per class | 96.9 | 96.2

Total percentage of correct classifications (calculated by weighted average): 96.7

The Multilayer Perceptron network trained with the Backpropagation algorithm is probably the most widely used neural classifier today. Thus, when the issue is performance comparison, the MLP is an excellent (and almost obligatory) reference. The results in Tables 2 and 6 demonstrate that, in the classification of C2 and C3, a Madaline trained with the proposed learning algorithm performs better than the MLP. The average number of epochs in the training phase is another point in favor of the proposed Madaline in comparison with the MLP. Taking into account that an MLP can approximate any continuous nonlinear function arbitrarily well, to any degree of accuracy, provided it contains one or more hidden layers, the fact that the proposed Madaline has presented a better overall performance is important evidence that the training heuristic we are studying is an interesting alternative for solving non-linear classification tasks. This result, at first, seems


contradictory, because the equations that update the hidden weights of the proposed Madaline [(3a)–(3d) and (5a)–(5d)] use less information than the adjustment made by the Backpropagation algorithm. But we can understand these results by considering how the traditional Backpropagation algorithm handles the problem of local minima in the error surface [29–31]. To leave a local minimum, traditional Backpropagation performs a search in all directions of the weight space without attempting to harmonize the adjustment of the hidden weights with the adjustment of the output weights. This generates a displacement to a point on the error surface considerably distant from the local minimum. Furthermore, the process is very slow [32]. Thus, the MLP restarts the search for the global minimum from a point on the error surface where the error is greater than at the local minimum. In the search for a solution to the classification task, there is no guarantee that the learning will converge to the global minimum; therefore, the best possible solution may correspond to a local minimum. But, due to the strong oscillations of Backpropagation when leaving a local minimum, such best solutions may be lost. In the proposed Madaline training algorithm, the adjustment of the hidden weights and of the output weights is made harmoniously. This allowed us to update the weights of just one of the output Adalines, since the correction of the error in the response of the other output Adaline (if such an error existed) would be (to some extent) achieved by the adjustment of the hidden weights. Hence, the neural network shows a smoother behavior when it leaves a local minimum point and remains close to it. That is, the proposed Madaline network takes better advantage of the local solutions.

The comparison of the results in Tables 2 and 7 leaves no doubt about the superiority of the SVM classifier. This result was already expected, and the inclusion of the SVM among the benchmarks had exactly this objective: to evidence the limitations of the Madaline learning algorithm presented in this article. Even so, two points should be considered. The first one refers to the aforesaid simplicity of the version of the algorithm proposed. In fact, this is a prototype version that, for now, serves as a proof of concept for a heuristic learning algorithm, but it will be improved in the sequence of this research. The second point is that, in the absence of an adequate data pre-processing, the reduced amount of training examples associated with a strong class overlap may be critical to the performance of a supervised learning neural network, but the same is not true for the SVM, since the theoretical fundamentals that support the SVM [33, 34] can overcome these problems. Of course, being able to overcome these problems is always advantageous. But if

the classification system designer has a larger and really representative database instead of a limited database, a simpler classifier design may be more attractive. It is worth remembering that the SVM design involves solving a quadratic programming problem, including the inversion of large matrices, which demands a huge computational effort for large databases [35]; this can represent a limitation to the use of SVMs, since a good solution for any classification task usually not only addresses accuracy but also involves trade-offs between computational complexity and accuracy.

6 Conclusions

In this article, a Madaline neural network was trained to recognize inductive signatures of two vehicle classes: trucks with one rear axle and trucks with two rear axles. The two classes of inductive signatures were pre-processed and divided into four morphological subclasses before being presented to the Madaline for classification. The inductive signatures of these two vehicle classes are linearly dependent and strongly overlapped. Moreover, the number of signatures available to train the network is small and there is a considerable class imbalance. On such input spaces, the first options to perform a classification task are generally non-linear classifiers more complex than the Madaline. However, the Madaline performed very well in classifying the inductive signatures.

We credit the good performance of the network in this non-linear case to the innovations that were introduced in the training algorithm. Taking advantage of the fact that the input space consists of only two classes, we handled the network's internal states (its hidden activations) in such a way that allowed us to adjust, through supervised learning, not only the output weight vectors but also the weight vectors of the hidden layer. Further, by properly handling the hidden activations, we could update only one output weight vector at each presentation of an input pattern, depending on the class of the pattern. The heuristic behind this decision was to reduce the interference between the classes during the training phase and, thus, minimize the adverse effects of the class overlap on the classifier performance.

To benchmark the proposed Madaline, five different kinds of simulations were made. The division of the input space into subclasses and the updating of just one output weight vector tied to the class of the current input pattern were confirmed as beneficial decisions for the overall performance of the classification system. In comparison with the Multilayer Perceptron trained with Backpropagation, the proposed Madaline's performance proved to be superior, and this result is particularly important given that the MLP is


probably the most widely used neural network in classification and recognition tasks, precisely because of its already proven power. The SVM classifier yielded better results than the Madaline, but this result was already expected, because the operation of the SVM is supported by a strong theoretical basis that allows it to overcome difficulties such as class overlapping and data scarcity, whereas supervised learning neural networks can be quite sensitive to these difficulties. Even so, in the cases where a large and representative database is available, the proposed Madaline may be more attractive, taking into account the trade-offs between complexity and accuracy.

The article presents a novelty in vehicle classification using inductive signatures: the authors found no reports of other works where Madaline neural networks are used to recognize this kind of signal. Good solutions for traffic management systems have undeniable importance in daily life today, and the possibility that reliable solutions to practical and relevant problems (for instance, road automation) could be developed from studies like this emphasizes still more the importance of the good results reported in this article. The Madaline learning algorithm described in this article is just a preliminary version, but the results are encouraging and, hence, more elaborate versions of this heuristic learning algorithm should be developed and tested in subsequent stages of our research.

References

1. Duda RO, Hart PE, Stork DG (2000) Pattern classification, 2nd edn. Wiley, New York
2. Bishop CM (1995) Neural networks for pattern recognition. Clarendon Press, Oxford
3. US Department of Transportation/Federal Highway Administration (2006) Traffic detector handbook, 3rd edn, vol 1, chap 2, pp 1–56
4. Mimbela LEY, Klein LA (2003) A summary of vehicle detection and surveillance technologies used in intelligent transportation systems. Vehicle Detector Clearinghouse Project, Federal Highway Administration's (FHWA) Intelligent Transportation Systems Program Office
5. Sun C, Ritchie SG (1999) Individual vehicle speed estimation using single loop inductive waveforms. J Transp Eng 125(6):531–538
6. Ki Y-K, Baik D-K (2006) Model for accurate speed measurement using double-loop detectors. IEEE Trans Veh Technol 55(4):1094–1101
7. Gajda J, Sroka R, Stencel M, Wajda A, Zeglen T (2001) A vehicle classification based on inductive loop detectors. In: Proceedings of the 18th IEEE instrumentation and measurement technology conference, Budapest, pp 460–464
8. Oh C, Tok A, Ritchie SG (2005) Real-time freeway level of service using inductive-signature-based vehicle reidentification system. IEEE Trans Intell Transp Syst 6(2):138–146
9. Sun CC, Arr GS, Ramachandran RP, Ritchie SG (2004) Vehicle reidentification using multidetector fusion. IEEE Trans Intell Transp Syst 5(3):155–164
10. Oh S, Ritchie SG, Oh C (2002) Real time traffic measurement from single loop inductive signatures. In: 81st annual meeting of the Transportation Research Board, Washington, DC
11. Sun C, Ritchie SG, Oh S (2003) Inductive classifying artificial network for vehicle type categorization. Comput-Aided Civil Infrastruct Eng 18:161–172
12. Sun C (2000) An investigation in the use of inductive loop signatures for vehicle classification. California PATH Research Report UCB-ITS-PRR-2000-4
13. Ki Y-K, Baik D-K (2006) Vehicle-classification algorithm for single-loop detectors using neural networks. IEEE Trans Veh Technol 55(6):1704–1711
14. Oh C, Ritchie SG (2007) Recognizing vehicle classification information from blade sensor signature. Pattern Recognit Lett 29(9):1041–1049
15. Widrow B, Lehr MA (1990) 30 years of adaptive neural networks: perceptron, Madaline and backpropagation. Proc IEEE 78(9):1415–1442
16. Savitzky A, Golay MJE (1964) Smoothing and differentiation of data by simplified least squares procedures. Anal Chem 36:1627–1639
17. Lima GRT, Silva JDS, Saotome O (2007) Morphological identification of inductive signatures using an Adaptive Resonance Theory based learning scheme. Learn Nonlinear Model-SBRN 5(1):20–35
18. Widrow B, Hoff ME (1960) Adaptive switching circuits. 1960 IRE Western electric show and convention record, Part 4, pp 96–104
19. Winter R, Widrow B (1988) Madaline Rule II: a training algorithm for neural networks. In: IEEE international conference on neural networks, pp 401–408
20. Ridgway WC III (1962) An adaptive logic system with generalizing properties. PhD thesis, Stanford Electronics Labs Rep 1556-1, Stanford University, Stanford
21. Linden A, Kindermann J (1989) Inversion of multilayer nets. In: Proceedings of the international joint conference on neural networks, Washington DC, vol II, pp 425–430
22. Shepanski JF (1988) Fast learning in artificial neural systems: multilayer perceptron training using optimal estimation. In: Proceedings of the IEEE international conference on neural networks, San Diego, vol I, pp 465–472
23. Song J, Hassoun MH (1990) Learning with hidden targets. In: Proceedings of the international joint conference on neural networks, San Diego, vol 3, pp 93–98
24. Nilsson NJ (1990) The mathematical foundations of learning machines, Sect. 6.3. Morgan Kaufmann, San Mateo
25. Auer P, Burgsteiner H, Maass W (2008) A learning rule for very simple universal approximators consisting of a single layer of Perceptrons. Neural Netw 21:786–795
26. Fausett L (1994) Fundamentals of neural networks: architectures, algorithms and applications. Prentice-Hall, Upper Saddle River
27. Rumelhart DE, McClelland JL (1986) Parallel distributed processing: explorations in the microstructure of cognition, vol 1: foundations. MIT Press, Cambridge
28. http://www.isis.ecs.soton.ac.uk/resources/svminfo/
29. Haykin S (1999) Neural networks: a comprehensive foundation, 2nd edn. Prentice-Hall, Upper Saddle River
30. Gori M, Tesi A (1992) On the problem of local minima in backpropagation. IEEE Trans Pattern Anal Mach Intell 14(1):76–86
31. Bi W, Wang X, Zeng T, Tamura H (2005) Avoiding the local minima problem in backpropagation algorithm with modified error function. IEICE Trans Fundam E88-A(12):3645–3653


32. Xiong J-J, Zhang H (2003) Research on the problem of neural network convergence. In: Proceedings of the IEEE second international conference on machine learning and cybernetics, Xi'an, China, vol 2, pp 1132–1134
33. Vapnik VN (1995) The nature of statistical learning theory. Springer, New York

34. Vapnik VN (1991) Principles of risk minimization for learning theory. Neural Inf Process Syst 4:831–838
35. Mitra P, Murthy CA, Pal SK (2005) Active support vector learning with statistical queries. In: Support vector machines: theory and applications. Springer, Berlin, pp 99–111

