CHAPTER ONE
INTRODUCTION
Preprocessing partitions the image into isolated objects and may also scale the image to allow focus on the object. This part has not been implemented in the example program yet and may be considered in a future application.
Feature extraction extracts high-level information about an individual pattern to facilitate recognition.
The classifier identifies the category to which the pattern belongs or, in general, the attributes associated with the given pattern. In this thesis, an artificial neural network is chosen as the classifier.
Subsequently, this method was used for pattern recognition by Alt [2] in 1962, ship identification by Smith and Wright [20] in 1971, aircraft identification by Dudani et al. [7] in 1977, pattern matching by Dirilten [6] in 1977 and scene matching by Wong and Hall [23] in 1978. Sadjadi [19] extended the definition of moments into three dimensions and derived the corresponding invariants. Teague [21] introduced the concept of using orthogonal moments such as the Zernike and Legendre moments in 1980. He also gave a treatment of the properties of lower-order moments and outlined the relationship between regular moments and Zernike moments. Reddi [17] presented the concept of radial and angular moments without the aid of the theory of algebraic invariants in 1981. A general notion of complex moments was introduced by Abu-Mostafa [1] in 1984, together with their relationship to regular moments and an analysis of their properties in terms of information redundancy and noise sensitivity.
Further work was carried out by Lu [13]. Teh and Chin [22] presented a paper on the analysis of the method of moments, addressing certain fundamental questions such as image representation ability, noise sensitivity and information redundancy. Reiss [18] revised the fundamental theorem of moment invariants in 1991. Belkasim et al. [4] presented a study of moments under noisy conditions for pattern recognition, comprising Zernike, pseudo-Zernike, normalised Zernike, normalised pseudo-Zernike, Teague-Zernike, Hu moment invariants and regular moment invariants, in 1991. Pawlak [16] studied the reconstruction aspects of moment descriptors in 1992. Bailey and Srinath [3] used a number of orthogonal polynomials for the recognition of handwritten Arabic numerals in 1996. Liao and Pawlak [14] performed an analysis of the error caused by discretisation and noise on moment descriptors and proposed several new techniques to increase their efficiency and accuracy. Gruber and Hsu [10] analysed the effects of noise with non-zero mean on moment-based image normalisation methods and suggested some modifications to reduce noise sensitivity; these improved moments were then tested to validate the proposed methods.
This thesis analyses the use of functions of Cartesian moments, Legendre moments and Zernike moments in pattern recognition problems. An introduction to all of the moments mentioned earlier is presented. The Hu moment invariants, which are obtained from the Cartesian moments, are also discussed. The best moment for use as a feature descriptor is determined by experiment, and the robustness of the moments in a noisy environment is also examined.
This thesis consists of six chapters, including this introductory chapter and a final chapter devoted to the conclusion.
Chapter 2 discusses both non-orthogonal and orthogonal moments. The non-orthogonal moment discussed is the Cartesian moment, while the orthogonal moments are the Zernike and Legendre moments. This chapter gives the mathematical foundation behind these moments. The method of Hu moment invariants is also covered.
Chapter 3 introduces the artificial neural network, in particular the multilayer perceptron and the back-propagation algorithm used to train it.
Chapter 4 is where the performance of each moment is tested with the multilayer perceptron neural network. The data set used, the preprocessing applied and the testing method are explained, and the results obtained from the experiment are discussed.
In chapter 5, the various parameters used in the multilayer perceptron for pattern recognition are analysed. The effects of normalising the input, modifying the values of the learning rate and momentum, and using two layers of hidden neurons are investigated, and the results obtained are presented and discussed.
The final chapter is the conclusion, in which all of the experimental results are summarised. Various suggestions for future work are also presented.
CHAPTER TWO
STATISTICAL MOMENTS
In both cases, it is preferable that the features selected are insensitive to variations such as changes in shift, orientation and size.
The concept of moments is widely used not only in pattern recognition but also in statistical theory, and an analogy can be drawn between the two areas. In statistics, the moments of order zero, one and two of a probability density function represent the total probability, the expectation and the variance respectively. In pattern recognition, for a two-dimensional image, the geometric moment functions of order zero, one and two provide shape information. This information can be used to generate features which are invariant to image translation, rotation and scaling.
The general moment of order $(p+q)$ of an image $f(x,y)$ is defined as

$$M_{pq} = \iint \psi_{pq}(x,y)\, f(x,y)\, dx\, dy \qquad (2.1)$$

where $p, q$ are the order indices, $(x,y)$ are Cartesian co-ordinates and $f$ is a non-negative intensity function with bounded and compact support, so that integration within the available image plane is sufficient to gather all the signal information. The kernel function, or basis set, $\psi_{pq}(x,y)$ determines the type of moment obtained. The image plane is considered finite and all integrals are taken over this finite plane.
Since many useful feature descriptors can be obtained more easily from a binary representation, all moment computations will use this binary representation. If the image is a grey-level image, a threshold will be applied to obtain a binary image: a certain grey level is chosen to segregate object pixels from background pixels. This type of image is known as a binary image and the corresponding moments are known as silhouette moments.
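A minimal sketch of this thresholding step is shown below, assuming the grey-level image is stored as a 2-D array of values in the range 0 to 255 and that object pixels are those at or above the chosen grey level; the function name and the polarity of the comparison are illustrative assumptions, not details taken from the thesis program.

#include <vector>

// Threshold a grey-level image (values 0-255) into a binary image:
// pixels at or above the chosen grey level become object pixels (1),
// all others become background pixels (0).
std::vector<std::vector<int>> toBinary(const std::vector<std::vector<int>>& grey,
                                       int threshold)
{
    std::vector<std::vector<int>> binary(grey.size(),
        std::vector<int>(grey.empty() ? 0 : grey[0].size(), 0));
    for (size_t y = 0; y < grey.size(); ++y)
        for (size_t x = 0; x < grey[y].size(); ++x)
            binary[y][x] = (grey[y][x] >= threshold) ? 1 : 0;
    return binary;
}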
Cartesian moments are the simplest of the moment functions, and the combination of being easy to understand and easy to implement has made them popular in many applications. Cartesian moments are defined with the basis set in (2.1) replaced by the monomial $x^p y^q$, giving the Cartesian moments

$$m_{pq} = \iint x^p y^q\, f(x,y)\, dx\, dy. \qquad (2.2)$$

The zero-order moment $m_{00}$ is therefore the total mass of the image (for a binary image, the area of the object).
Region-based features usually require a point from which the features may be derived. The centroid is usually used to specify the location of the object; it is the point for which the sum of the squared distances to all other points within the object is a minimum. The centroid co-ordinates can be expressed using the zero- and first-order moments as shown below:

$$\bar{x} = \frac{m_{10}}{m_{00}}, \qquad \bar{y} = \frac{m_{01}}{m_{00}}$$
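A minimal sketch of how these quantities can be computed for a discrete binary image is given below; the image is assumed to be stored as a 2-D array of 0/1 values, and the function names are illustrative rather than those of the thesis program.

#include <vector>
#include <cmath>

// Cartesian (raw) moment m_pq of a binary image: sum of x^p * y^q over object pixels.
double cartesianMoment(const std::vector<std::vector<int>>& img, int p, int q)
{
    double m = 0.0;
    for (size_t y = 0; y < img.size(); ++y)
        for (size_t x = 0; x < img[y].size(); ++x)
            if (img[y][x])
                m += std::pow((double)x, p) * std::pow((double)y, q);
    return m;
}

// Centroid (xbar, ybar) from the zero- and first-order moments.
void centroid(const std::vector<std::vector<int>>& img, double& xbar, double& ybar)
{
    double m00 = cartesianMoment(img, 0, 0);
    xbar = cartesianMoment(img, 1, 0) / m00;
    ybar = cartesianMoment(img, 0, 1) / m00;
}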
Cartesian moments are limited in their usefulness since they are not invariant to different positions of the object with respect to the origin; the central moments are used to circumvent this problem. The central moment $\mu_{pq}$ is the moment of the image translated by the amount $(-\bar{x}, -\bar{y})$, so that the centroid coincides with the origin, and is defined as

$$\mu_{pq} = \iint (x-\bar{x})^p\, (y-\bar{y})^q\, f(x,y)\, dx\, dy.$$
To enable invariance to scale, normalised moments are used. The two-dimensional scale-normalised central moments are given by [25]

$$\eta_{pq} = \frac{\mu_{pq}}{\mu_{00}^{\gamma}}, \qquad \gamma = \frac{p+q}{2} + 1.$$
The central geometric moments are translation invariant and can be normalised with respect to changes in scale. However, to enable invariance to rotation they require reformulation. Hu [11] described two different methods for producing rotation-invariant moments. The first used a method called principal axes. The second method Hu described, the method of absolute moment invariants, is discussed here. Hu derived these expressions from algebraic invariants applied to the moment generating function under a rotation transformation. They consist of groups of nonlinear central moment expressions. The result is a set of absolute orthogonal (i.e. rotation) moment invariants, which can be used for scale-, position- and rotation-invariant pattern identification. They are computed from the normalised central moments up to order three, as shown below:

$$\phi_1 = \eta_{20} + \eta_{02}$$
$$\phi_2 = (\eta_{20} - \eta_{02})^2 + 4\eta_{11}^2$$
$$\phi_3 = (\eta_{30} - 3\eta_{12})^2 + (3\eta_{21} - \eta_{03})^2$$
$$\phi_4 = (\eta_{30} + \eta_{12})^2 + (\eta_{21} + \eta_{03})^2$$
$$\phi_5 = (\eta_{30} - 3\eta_{12})(\eta_{30} + \eta_{12})\left[(\eta_{30}+\eta_{12})^2 - 3(\eta_{21}+\eta_{03})^2\right] + (3\eta_{21} - \eta_{03})(\eta_{21} + \eta_{03})\left[3(\eta_{30}+\eta_{12})^2 - (\eta_{21}+\eta_{03})^2\right]$$
$$\phi_6 = (\eta_{20} - \eta_{02})\left[(\eta_{30}+\eta_{12})^2 - (\eta_{21}+\eta_{03})^2\right] + 4\eta_{11}(\eta_{30}+\eta_{12})(\eta_{21}+\eta_{03})$$
$$\phi_7 = (3\eta_{21} - \eta_{03})(\eta_{30} + \eta_{12})\left[(\eta_{30}+\eta_{12})^2 - 3(\eta_{21}+\eta_{03})^2\right] - (\eta_{30} - 3\eta_{12})(\eta_{21} + \eta_{03})\left[3(\eta_{30}+\eta_{12})^2 - (\eta_{21}+\eta_{03})^2\right]$$
The reader is referred to [11] for a complete analysis of the moment invariants. These moments are of finite order; therefore, unlike the central moments, they do not comprise a complete set of image descriptors [26]. However, higher-order invariants can be derived [4, 11].
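As an illustration of how these expressions translate into code, a minimal sketch computing the first four invariants from the scale-normalised central moments is given below; the array layout eta[p][q] and the function name are assumptions made for illustration, not the thesis program's own interface.

#include <cmath>

// Compute the first four Hu moment invariants from the scale-normalised
// central moments eta[p][q] (indices up to order three are assumed valid).
void huInvariants(const double eta[4][4], double phi[4])
{
    phi[0] = eta[2][0] + eta[0][2];
    phi[1] = std::pow(eta[2][0] - eta[0][2], 2) + 4.0 * std::pow(eta[1][1], 2);
    phi[2] = std::pow(eta[3][0] - 3.0 * eta[1][2], 2)
           + std::pow(3.0 * eta[2][1] - eta[0][3], 2);
    phi[3] = std::pow(eta[3][0] + eta[1][2], 2)
           + std::pow(eta[2][1] + eta[0][3], 2);
}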
Cartesian moments (equation 2.2) are formed using a monomial basis set $x^p y^q$, which is non-orthogonal, and this property is passed on to the Cartesian moments. These monomials increase rapidly in range as the order increases, producing highly correlated descriptions. Important descriptive information is therefore contained within small differences between moments, which leads to the need for high computational precision. Moments produced using orthogonal basis sets do exist, and these orthogonal moments have the advantage of needing lower precision to represent the same differences to the same accuracy as the monomials. Orthogonality means mutually perpendicular; expressed mathematically, two functions $p_m(x)$ and $p_n(x)$ are orthogonal over an interval $a \le x \le b$ if

$$\int_a^b p_m(x)\, p_n(x)\, dx = 0 \qquad \text{for } m \ne n.$$
Since the primary interest of this thesis is discrete images, the integrals within the moment descriptors are replaced by summations. Sequences of polynomials which are orthogonal with respect to integration are also orthogonal with respect to summation [28]. Two such orthogonal moments, the Legendre and Zernike moments, will be discussed in this thesis.
The Legendre moments of order $(p+q)$ are defined as

$$\lambda_{pq} = \frac{(2p+1)(2q+1)}{4} \int_{-1}^{1}\!\!\int_{-1}^{1} P_p(x)\, P_q(y)\, f(x,y)\, dx\, dy$$

where $P_p(x)$ is the $p$-th order Legendre polynomial,

$$P_p(x) = \frac{1}{2^p\, p!}\, \frac{d^p}{dx^p}\left(x^2 - 1\right)^p, \qquad (2.10)$$

and the image function $f(x,y)$ is defined over the same interval as the basis set, i.e. $x, y \in [-1, 1]$.
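A sketch of how the Legendre moments can be evaluated for a discrete N x N binary image is given below. The three-term recurrence for P_n, the mapping of pixel indices onto [-1, 1] and the normalisation by the pixel area are standard choices assumed here for illustration; the exact discretisation used in the thesis program is not shown in the text.

#include <vector>

// Legendre polynomial P_n(x) via the recurrence (k)P_k = (2k-1)x P_{k-1} - (k-1)P_{k-2}.
double legendreP(int n, double x)
{
    if (n == 0) return 1.0;
    if (n == 1) return x;
    double pPrev = 1.0, pCurr = x;
    for (int k = 2; k <= n; ++k) {
        double pNext = ((2.0 * k - 1.0) * x * pCurr - (k - 1.0) * pPrev) / k;
        pPrev = pCurr;
        pCurr = pNext;
    }
    return pCurr;
}

// Discrete Legendre moment lambda_pq of an N x N binary image (N >= 2),
// with pixel coordinates mapped onto the interval [-1, 1].
double legendreMoment(const std::vector<std::vector<int>>& img, int p, int q)
{
    int N = (int)img.size();
    double sum = 0.0;
    for (int i = 0; i < N; ++i) {
        double y = 2.0 * i / (N - 1) - 1.0;       // map row index to [-1, 1]
        for (int j = 0; j < N; ++j) {
            double x = 2.0 * j / (N - 1) - 1.0;   // map column index to [-1, 1]
            if (img[i][j])
                sum += legendreP(p, x) * legendreP(q, y);
        }
    }
    // normalisation from the continuous definition, with dx dy ~ (2/(N-1))^2
    double norm = (2.0 * p + 1.0) * (2.0 * q + 1.0) / 4.0;
    return norm * sum * (2.0 / (N - 1)) * (2.0 / (N - 1));
}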
The Zernike polynomials were first proposed in 1934 by Zernike [9]. Complex Zernike moments [15] are constructed using a set of complex polynomials which form a complete orthogonal basis set defined on the unit disc $x^2 + y^2 \le 1$. Here we will express them as

$$A_{mn} = \frac{m+1}{\pi} \iint_{x^2+y^2 \le 1} f(x,y)\, \left[V_{mn}(x,y)\right]^{*}\, dx\, dy$$

where $m = 0, 1, 2, \ldots$ defines the order, $f(x,y)$ is the image function, $*$ denotes the complex conjugate, and $n$ is an integer (positive or negative) for which

$$|n| \le m, \qquad m - |n| \ \text{even} \qquad (2.13)$$

is true.
The Zernike polynomial $V_{mn}(x,y)$ expressed in polar co-ordinates is

$$V_{mn}(r,\theta) = R_{mn}(r)\, e^{jn\theta}$$

where $(r,\theta)$ are defined over the unit disc and $R_{mn}(r)$ is the orthogonal radial polynomial

$$R_{mn}(r) = \sum_{s=0}^{(m-|n|)/2} (-1)^{s}\, \frac{(m-s)!}{s!\left(\frac{m+|n|}{2}-s\right)!\left(\frac{m-|n|}{2}-s\right)!}\; r^{\,m-2s}$$

and it must be noted that if the conditions in equation 2.13 are not met, then $R_{mn}(r) = 0$.
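The radial polynomial translates directly into code; a sketch is shown below, assuming only small orders so that the factorials fit comfortably in a double (the function names are illustrative).

#include <cmath>
#include <cstdlib>

// factorial as a double (adequate for the small orders used here)
static double fact(int n)
{
    double f = 1.0;
    for (int k = 2; k <= n; ++k) f *= k;
    return f;
}

// Zernike radial polynomial R_mn(r); returns 0 if the conditions
// |n| <= m and m - |n| even are not met.
double zernikeR(int m, int n, double r)
{
    n = std::abs(n);
    if (n > m || (m - n) % 2 != 0) return 0.0;
    double sum = 0.0;
    for (int s = 0; s <= (m - n) / 2; ++s) {
        double term = fact(m - s) /
                      (fact(s) * fact((m + n) / 2 - s) * fact((m - n) / 2 - s));
        sum += ((s % 2 == 0) ? 1.0 : -1.0) * term * std::pow(r, m - 2 * s);
    }
    return sum;
}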
In this chapter, we have established what a statistical moment is and presented a general theory of moments. Moments can be divided into two categories: non-orthogonal and orthogonal moments.
The Zernike and Legendre moments, which are orthogonal moments, have also been introduced in this chapter, and the equations used to calculate both moments have been discussed.
CHAPTER THREE
In this chapter we introduce the concept of the artificial neural network (ANN). The chapter begins with a general discussion of neural networks before moving on to the multilayer perceptron model. The back-propagation method, which is used to train the multilayer perceptron, is described next.
The term neural network is taken from the neural systems, or nervous systems, of living creatures. Such networks are also sometimes referred to as connectionist models, parallel distributed processors or neuro-computers. A neural network is basically an information processing system and can be thought of as a black-box device that accepts inputs and produces outputs.
Central to the human nervous system is the brain, represented by a neural net, which continually receives information, perceives it and makes appropriate decisions. The nerve cells in the brain are called neurons, each of which can make contact with several thousand other neurons. Neurons are the units which the brain uses to process information. It is estimated that there are approximately 10 billion neurons in the human cortex. Since the neuron is the building block of the brain and is also essential to the formulation of the artificial neural network concept, it will be studied in more detail here.
Dendrites - Dendrites are hair-like extensions of a neuron, and each dendrite can bring some input to the neuron (from neurons in the previous layer). These inputs are given to the soma.
Soma - The soma is responsible for processing these inputs, and the output is provided to other neurons through the axon and synapses.
Axon - The axon is responsible for carrying the output of the soma to other neurons, through the synapses.
The dashed line in the figure shows the axon hillock, where the transmission of signals starts. The boundary of the neuron is known as the cell membrane, and there is a voltage difference (the membrane potential) between the inside and the outside of the membrane. If the input is large enough, an action potential is generated. The action potential (neuronal spike) then travels down the axon, away from the cell body.
The connections between one neuron and another are called synapses. Information always leaves a neuron via its axon (see Figure 3.1 above) and is then transmitted across a synapse to the receiving neuron.
Neurons only fire when the input is bigger than some threshold. It should, however, be noted that the firing does not get bigger as the stimulus increases; it is an all-or-nothing arrangement.
Spikes (signals) are important, since other neurons receive them. Neurons communicate with spikes, and the information sent is coded by spikes.
The block diagram of figure 3.4 shows the model of an artificial neuron, which forms the basis for designing artificial neural networks. Table 3.1 lists the three basic elements of the neuron model based on the basic design in figure 3.4.
Each input signal $x_j$ is multiplied by a synaptic weight $w_{kj}$, where $k$ refers to the neuron and $j$ to the input. The weighted inputs are summed, together with a bias $b_k$, to produce the induced local field

$$v_k = \sum_{j} w_{kj}\, x_j + b_k,$$

and the output of the neuron is $y_k = \varphi(v_k)$, where $\varphi(\cdot)$ is the activation function applied to the induced local field. Here we present three basic types of activation function:

1. Threshold function (Figure 3.5(a)): $\varphi(v) = 1$ if $v \ge 0$, and $\varphi(v) = 0$ if $v < 0$.

2. Piecewise-linear function: $\varphi(v) = 1$ for $v \ge \tfrac{1}{2}$, $\varphi(v) = v + \tfrac{1}{2}$ for $-\tfrac{1}{2} < v < \tfrac{1}{2}$, and $\varphi(v) = 0$ for $v \le -\tfrac{1}{2}$.

3. Sigmoid function (Figure 3.5(b)): $\varphi(v) = \dfrac{1}{1 + e^{-av}}$, where $a$ is the slope parameter.
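A minimal sketch of the three activation functions follows, with the sigmoid slope parameter fixed at a = 1 (an assumption made for illustration; the function names are not those of the thesis program).

#include <cmath>

// Threshold (Heaviside) activation.
double thresholdFn(double v)
{
    return (v >= 0.0) ? 1.0 : 0.0;
}

// Piecewise-linear activation, saturating at 0 and 1.
double piecewiseLinear(double v)
{
    if (v >= 0.5) return 1.0;
    if (v <= -0.5) return 0.0;
    return v + 0.5;
}

// Logistic sigmoid activation with unit slope.
double sigmoidFn(double v)
{
    return 1.0 / (1.0 + std::exp(-v));
}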
3.2.3 ARCHITECTURE
How the neurons are structured is intimately linked with the learning algorithm used to train the network. In general, three fundamentally different classes of network architecture may be identified:
1. Single-layer feedforward networks, in which an input layer of source nodes projects directly onto an output layer of neurons.
2. Multilayer feedforward networks, which contain one or more hidden layers between the input layer and the output layer. If some of the synaptic connections are missing from the network, the network is classified as partially connected.
3. Recurrent networks
Recurrent networks differ from feedforward neural networks in that they have at least one feedback loop. For example, a recurrent network may consist of a single layer of neurons with each neuron feeding its output signal back to the inputs of the other neurons. The presence of feedback loops has a profound impact on the learning capability of the network and on its performance. Moreover, the feedback loops involve the use of particular branches composed of unit-delay elements, which result in a nonlinear dynamical behaviour.
There are a great many neural network types by now, and nobody knows exactly how many. Some of the well-known methods are listed below.
There are two main kinds of learning algorithm: supervised and unsupervised [30].
In supervised learning, the correct results are known and are given to the neural network during training so that the network can be adjusted to try to match its output to the target values. After training, the neural network is tested by giving it only input values, not target values, and seeing how close it comes to outputting the correct target values.
In unsupervised learning, the neural network is not provided with correct results during training.
Supervised learning:
- Feedforward networks
  - Linear: Perceptron - Rosenblatt; Fausett (1994)
  - Multilayer: Backprop - Rumelhart et al. (1986)
  - Classification only
  - Regression only: GNN (General Regression Neural Network) - Specht (1991), Nadaraya (1964), Watson (1964)
- Competitive networks
  - Learning Vector Quantization - Kohonen; Fausett (1994)

Unsupervised learning:
- Competitive networks
  - Self-Organizing Map - Kohonen (1995); Fausett (1994)
  - Grossberg networks - Grossberg
  - Neocognitron - Fukushima
- Autoassociation
  - BAM (Bidirectional Associative Memory); Fausett (1994)
Back-propagation can be operated in two modes, sequential or batch. In the sequential mode of operation, weight updating is performed after the presentation of each training example, while in the batch mode, weight updating is performed after the presentation of all the training examples that constitute an epoch. Although the sequential mode of back-propagation learning has several disadvantages, it is highly popular (and is the mode used for the pattern recognition application in this thesis) for two important practical reasons [31]: the algorithm is simple to implement, and it provides effective solutions to large and difficult problems.
For the sequential mode of operation, the algorithm cycles through the training samples as follows:

1. Initialisation. The synaptic weights are picked from a uniform distribution whose mean is zero and whose variance is chosen to make the standard deviation of the induced local fields of the neurons lie at the transition between the linear and saturated parts of the sigmoid function.

2. Presentation of training examples. The network is presented with an epoch of training examples. For each example, the forward and backward computations described below are performed.

3. Forward computation. The induced local field of neuron $j$ in layer $l$ at iteration $n$ is

$$v_j^{(l)}(n) = \sum_i w_{ji}^{(l)}(n)\, y_i^{(l-1)}(n)$$

and the output signal of neuron $j$ in layer $l$ is

$$y_j^{(l)}(n) = \varphi\!\left(v_j^{(l)}(n)\right).$$

If neuron $j$ is in the output layer $L$, the error signal is given by

$$e_j(n) = d_j(n) - y_j^{(L)}(n).$$

4. Backward computation. The local gradients of the network are computed as

$$\delta_j^{(L)}(n) = e_j(n)\, \varphi'\!\left(v_j^{(L)}(n)\right) \qquad \text{for neuron } j \text{ in output layer } L,$$

$$\delta_j^{(l)}(n) = \varphi'\!\left(v_j^{(l)}(n)\right) \sum_k \delta_k^{(l+1)}(n)\, w_{kj}^{(l+1)}(n) \qquad \text{for neuron } j \text{ in hidden layer } l,$$

where the prime denotes differentiation with respect to the argument. The synaptic weights are then adjusted according to

$$w_{ji}^{(l)}(n+1) = w_{ji}^{(l)}(n) + \eta\, \delta_j^{(l)}(n)\, y_i^{(l-1)}(n)$$

where $\eta$ is the learning-rate parameter.

Once the training phase is completed, application of the network involves only the computations of the feed-forward phase.
In order to avoid becoming trapped in a local minimum, the learning parameters, the number of hidden nodes or the initial values of the connecting weights may be changed. These techniques essentially change the scenario involved in moving about the error terrain.
The learning-rate constant relates the change in the connection weights to the gradient of the error with respect to the weights: the larger the constant, the larger the changes in the connection weights. Usually a value is selected as high as possible without leading to oscillation.
A few methods discussed here can improve the performance of the back-propagation algorithm.
The back-propagation algorithm provides an approximation to the trajectory in weight space computed by the method of steepest descent. The smaller the learning-rate parameter, the smaller the changes to the synaptic weights in the network will be from one iteration to the next, which leads to a smoother trajectory in weight space. This improvement, however, is attained at the cost of a slower rate of learning. For a large learning rate, on the other hand, the network may become unstable (i.e. oscillatory). A simple method of increasing the rate of learning while avoiding instability is to modify the delta rule by including a momentum term, as shown by Rumelhart et al. (1986a) [32].
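In the usual notation of the generalized delta rule [32], with $\eta$ the learning rate, $\alpha$ the momentum constant, $\delta_j(n)$ the local gradient of neuron $j$ and $y_i(n)$ the input signal of the synapse (symbols used here for illustration), the weight correction becomes

$$\Delta w_{ji}(n) = \alpha\, \Delta w_{ji}(n-1) + \eta\, \delta_j(n)\, y_i(n).$$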
The momentum term also has the benefit of preventing the learning process from terminating in a shallow local minimum on the error surface.
Normalize the input
"Normalizing" a vector most often means dividing by a norm of the vector, for example to make the Euclidean length of the vector equal to one. In the artificial neural network literature, "normalizing" also often refers to rescaling by the minimum and range of the vector, to make all the elements lie between 0 and 1. The following equation is used to normalize the input values:

$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$$
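A small sketch of this rescaling for a feature vector is given below; the vector layout and function name are illustrative assumptions.

#include <vector>
#include <algorithm>

// Rescale a feature vector so all elements lie between 0 and 1,
// using the minimum and the range of the vector.
void normalize(std::vector<double>& v)
{
    double lo = *std::min_element(v.begin(), v.end());
    double hi = *std::max_element(v.begin(), v.end());
    double range = hi - lo;
    if (range == 0.0) return;               // constant vector: leave unchanged
    for (size_t i = 0; i < v.size(); ++i)
        v[i] = (v[i] - lo) / range;
}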
Target values
It is important for the desired response to be chosen within the range of the sigmoid activation function. Otherwise the back-propagation algorithm tends to drive the free parameters of the network to infinity, which slows down the learning process by driving the hidden neurons into saturation.
The concept of the multilayer perceptron and its training algorithm have been discussed in detail since they will be implemented as the classifier for pattern recognition in a later chapter. Methods to improve the performance of the back-propagation algorithm have also been discussed.
CHAPTER FOUR
In this chapter, the classification power of all the moments discussed in chapter 2, namely the Cartesian moments, Hu moment invariants, Zernike moments and Legendre moments, is experimentally tested and the results are reported. The noise sensitivity of all of the features is also examined.
A data set consisting of Roman alphabet characters in different fonts was generated. It consists of 10 lower-case Roman characters, from a to j. Ten different sets of fonts (for a total of 100 images) are used as the data set. Figure 4.1 shows the 10 different characters for the first set of fonts.
Figure 4.1: The ten alphabet characters for one set of fonts (scaled up for clarity).
The multilayer perceptron with back-propagation learning is used as the classifier. All of the details regarding the back-propagation algorithm and the multilayer perceptron have been discussed earlier and will therefore not be repeated here.
All of the features for each image are fed into the input of the neural network, while 10 outputs are used, one for each class of alphabet. Fifty of the images in the data set are used for training (where the value of the output is given) and the other fifty are used as the test set. For the training phase, each datum has the output corresponding to its class set to 1, while all other outputs are set to 0.
Six different numbers of hidden units, namely 0, 5, 10, 15, 20 and 25, are used in the classification experiment. The training error threshold is set at 0.001 for all experiments, while the learning rate and momentum are both set at 0.1.
An experiment to test the noise tolerance of each moment is also conducted. All of the parameters are the same as in the previous experiment, except that this time only one number of hidden units, 15, is used. The network is trained with the same 50 training images as in the previous experiment, but random noise of 0.1%, 0.2%, 0.3%, 0.4% and 0.5% is inserted into the 50 test images.
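The thesis does not state the exact noise model; a plausible sketch, assuming the stated percentage is the fraction of binary pixels flipped at random, is given below (the function name is illustrative).

#include <vector>
#include <cstdlib>

// Flip a randomly chosen fraction of pixels in a binary image,
// e.g. fraction = 0.001 corresponds to 0.1% random noise.
// (std::srand should be seeded once elsewhere, e.g. with time(NULL).)
void addRandomNoise(std::vector<std::vector<int>>& img, double fraction)
{
    if (img.empty()) return;
    int rows = (int)img.size();
    int cols = (int)img[0].size();
    int flips = (int)(fraction * rows * cols);
    for (int k = 0; k < flips; ++k) {
        int r = std::rand() % rows;
        int c = std::rand() % cols;
        img[r][c] = 1 - img[r][c];            // flip the pixel
    }
}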
HMI values: 0.3742, 0.005225, 0.0340, 0.0011, -0.0034, -0.0011, -1.43E-1
CM values:  0.485085, 0.354145, 0.032151, 0.02072, 0.0843867, 0.02568, 0.00211
ZM values:  2.09919, 0.262626, 0.629776, 0.361196, 0.366395, 0.381796, 0.42044, 0.703121, 0.42901, 0.701442
LM values:  0.0052, 0.00975, -0.0759, -0.0402, -0.18954, 0.09376, 0.028075, 0.13361, 0.137221, -0.19553, 0.10315, 0.0641634

Table 4.1: List of all moments computed from one datum in the data set (HMI = Hu moment invariants, CM = Cartesian moments, ZM = Zernike moments, LM = Legendre moments).
Figure 4.2 shows the results for all of the moments tested with the neural network. From the graph, the Legendre moments give the highest recognition rates for all values of hidden neurons. The performance of the Legendre moments (and likewise of the other sets of moments) decreases after 15 hidden neurons, probably due to overfitting caused by the small data set used to train the network.
From the experimental results, the Legendre moments are identified as the best feature descriptor, probably owing to the orthogonality of the moments. The performance of the moments in the presence of random noise is also tested, and once again the Legendre moments are shown to be superior to the other types of moments.
CHAPTER FIVE
Since the classifier also plays an important role in pattern recognition, in this chapter the performance of the multilayer perceptron with the back-propagation learning rule is analysed when certain parameters of the neural network are modified.
The setup of the neural network is the same as that used in chapter 4 to determine the performance of the different moments. The only difference here is the inputs: all of the inputs are normalized using the method mentioned earlier and the results are observed.
Table 5.2 shows the results obtained by normalizing the input, while table 5.1 shows the results with regular inputs (without any normalization). Normalization has improved the performance of all moments except the Legendre moments. The performance of the Hu moment invariants and the Cartesian moments improved by at least 10 percent for each value of hidden neurons. The normalization of inputs also seems to make the neural network less prone to overfitting, as the results are more consistent across all values of hidden neurons.
In back-propagation, too low a learning rate makes the network learn very slowly, while too high a learning rate makes the weights and objective function diverge, so there is no learning at all. Care must therefore be taken to choose the right value of the learning rate.
Two sets of experiments are carried out: one varying the value of the learning rate while keeping the momentum constant at 0.1, and another varying the momentum while keeping the learning rate constant at 0.1. The same data set as in the previous experiments is used. The number of hidden neurons is set at 15 and the threshold value at 0.001. Normalized Legendre moments are used as inputs since the earlier experiment showed them to be the best feature descriptor.
The results in figure 5.3 show that the lower the value of the learning rate, the higher the recognition rate. With a lower learning rate, however, comes the problem of the large number of iterations needed to reach the required threshold value. As table 5.1 shows, 758709 iterations are needed to reach the threshold for a learning rate of 0.01, compared to 10793 and 1225 iterations for learning rates of 1 and 5 respectively. Since the utmost concern in pattern recognition is recognition ability, it is recommended that a small value be chosen for the learning rate, though consideration must be given to the learning time (the number of iterations needed to reach the threshold).
Figure 5.4 shows that momentum does help improve the recognition rate, since there is an increase of 6 percent when the momentum value is raised from 0 to 0.1. A large value of momentum, however, decreases the recognition performance. As for the effect of momentum on learning, table 5.2 shows that a higher value of momentum decreases the number of iterations required to reach the threshold value.
Learning Rate    No. of Iterations
0.01             758709
0.1              82726
1                10793
5                1225
10               625

Table 5.1: Number of iterations needed to reach the threshold value for different values of learning rate, with constant momentum.
Momentum         No. of Iterations
0.1              82609
                 61109
                 41909
                 13812
10               7655

Table 5.2: Number of iterations needed to reach the threshold value for different values of momentum, with constant learning rate.
From the results obtained, it is observed that the inclusion of momentum has a positive effect on the performance of back-propagation, in terms of both learning time and recognition performance, though care must be taken to choose the right value. It has also been shown that, with normalization of the inputs, all of the moments give better recognition performance except for the Legendre moments.
CHAPTER SIX
CONCLUSION
This thesis has discussed the use of statistical moments and artificial neural networks in the area of pattern recognition. The mathematical foundations behind the Cartesian moments, Hu moment invariants, Legendre moments and Zernike moments have been discussed. The multilayer perceptron with back-propagation learning as a classifier was discussed in chapter 3, and suggestions to improve the neural network's performance were also presented.
In chapter 5, the effects on the multilayer perceptron with back-propagation learning of normalizing the inputs and of different values of the learning rate and momentum were analysed. Normalization of the inputs increased the classification rate of every moment except the Legendre moments, and it was also found to make the neural network less susceptible to overfitting. From the results obtained, it is also concluded that a higher classification rate can be achieved by making the learning rate smaller, though many more iterations are then needed to reach the threshold value. It was also shown that the presence of momentum in learning can improve the classification performance and decrease the number of iterations needed to reach the threshold value. A higher momentum requires fewer iterations for the neural network to reach the threshold, though it leads to a lower recognition rate.
In this thesis, moments and a multilayer perceptron neural network have been applied to a certain type of pattern recognition problem. Though it has been shown that they can work, much can still be done to improve them further. In addition, more work has to be performed to ensure that these methods work in other situations as well. Some of the suggested tasks are:
A small data set is usually one of the causes of overfitting; a larger data set would probably make the neural network less prone to overfit.
The data set used in this thesis contains only word-processing fonts. The subject of hand-written characters could be looked into.
APPENDIX A
All coding is done using the C++ programming language, and the IDE used is Visual Studio 2010.
[Listing: the central moments of orders (0,0), (0,1), (1,1), (2,0), (0,2), (2,1), (1,2), (3,0) and (0,3) are computed about the centroid (xbar, ybar).]
APPENDIX B
The code is used under the GNU General Public License (GPLv3), modified from the original at http://www.codeproject.com/KB/recipes/BP.aspx
Header file:
//////////////////////////////////////////////
//  Fully connected multilayered feed        //
//  forward artificial neural network using  //
//  Backpropagation algorithm for training.  //
//////////////////////////////////////////////
#ifndef backprop_h
#define backprop_h

#include <assert.h>
#include <iostream>
#include <stdio.h>
#include <math.h>
#include <tchar.h>
#include <string>
#include <cmath>
#include <cctype>
#include <cstring>
#include <fstream>
#include <cstdlib>
#include <iomanip>

using namespace std;
class CBackProp{

	// no of layers in net, including input layer
	int numl;

	// array holding the size of each layer
	int *lsize;

	// learning rate
	double beta;

	// momentum parameter
	double alpha;

	// output of each neuron
	double **out;

	// delta error value for each neuron
	double **delta;

	// 3-D array of synaptic weights: weight[layer][neuron][input]
	double ***weight;

	// previous weight changes, used by the momentum term
	double ***prevDwt;

	// squashing function
	double sigmoid(double in);

public:

	CBackProp(int nl, int *sz, double b, double a);
	~CBackProp();

	// backpropagate the error for one input/target pair
	void bpgt(double *in, double *tgt);

	// feed forward one set of inputs
	void ffwd(double *in);

	// mean square error of the net for a target vector
	double mse(double *tgt) const;

	// returns the i'th output of the net
	double Out(int i) const;
};

#endif
CPP file:

#include "stdafx.h"
#include "BackProp.h"
#include <time.h>
#include <stdlib.h>
// initializes and allocates memory on heap
CBackProp::CBackProp(int nl,int *sz,double b,double a):beta(b),alpha(a)
{
	// set no of layers and their sizes
	numl=nl;
	lsize=new int[numl];
	for(int i=0;i<numl;i++){
		lsize[i]=sz[i];
	}

	// allocate memory for output of each neuron
	out = new double*[numl];
	for( int i=0;i<numl;i++){
		out[i]=new double[lsize[i]];
	}

	// allocate memory for delta
	delta = new double*[numl];
	for(int i=1;i<numl;i++){
		delta[i]=new double[lsize[i]];
	}

	// allocate memory for weights
	weight = new double**[numl];
	for(int i=1;i<numl;i++){
		weight[i]=new double*[lsize[i]];
	}
	for(int i=1;i<numl;i++){
		for(int j=0;j<lsize[i];j++){
			weight[i][j]=new double[lsize[i-1]+1];
		}
	}

	// allocate memory for previous weight changes
	prevDwt = new double**[numl];
	for(int i=1;i<numl;i++){
		prevDwt[i]=new double*[lsize[i]];
	}
	for(int i=1;i<numl;i++){
		for(int j=0;j<lsize[i];j++){
			prevDwt[i][j]=new double[lsize[i-1]+1];
		}
	}

	// seed and assign random weights in the range [-1, 1]
	srand((unsigned)(time(NULL)));
	for(int i=1;i<numl;i++)
		for(int j=0;j<lsize[i];j++)
			for(int k=0;k<lsize[i-1]+1;k++)
				weight[i][j][k]=(double)(rand())/(RAND_MAX/2) - 1.0;

	// initialize previous weight changes to 0 for first iteration
	for(int i=1;i<numl;i++)
		for(int j=0;j<lsize[i];j++)
			for(int k=0;k<lsize[i-1]+1;k++)
				prevDwt[i][j][k]=(double)0.0;
}
CBackProp::~CBackProp()
{
	// free out
	for(int i=0;i<numl;i++)
		delete[] out[i];
	delete[] out;

	// free delta
	for(int i=1;i<numl;i++)
		delete[] delta[i];
	delete[] delta;

	// free weight
	for(int i=1;i<numl;i++)
		for(int j=0;j<lsize[i];j++)
			delete[] weight[i][j];
	for(int i=1;i<numl;i++)
		delete[] weight[i];
	delete[] weight;

	// free prevDwt
	for(int i=1;i<numl;i++)
		for(int j=0;j<lsize[i];j++)
			delete[] prevDwt[i][j];
	for(int i=1;i<numl;i++)
		delete[] prevDwt[i];
	delete[] prevDwt;

	// free layer info
	delete[] lsize;
}
// sigmoid function
double CBackProp::sigmoid(double in)
{
	return (double)(1/(1+exp(-in)));
}

// mean square error
double CBackProp::mse(double *tgt) const
{
	double mse=0;
	for(int i=0;i<lsize[numl-1];i++){
		mse+=(tgt[i]-out[numl-1][i])*(tgt[i]-out[numl-1][i]);
	}
	return mse/2;
}

// returns i'th output of the net
double CBackProp::Out(int i) const
{
	return out[numl-1][i];
}

// feed forward one set of input
void CBackProp::ffwd(double *in)
{
	double sum;

	// assign content to input layer
	for(int i=0;i<lsize[0];i++)
		out[0][i]=in[i];                        // out[i][j]: j'th neuron in i'th layer

	// assign output (activation) value to each neuron using the sigmoid function
	for(int i=1;i<numl;i++){                    // for each layer
		for(int j=0;j<lsize[i];j++){            // for each neuron in current layer
			sum=0.0;
			for(int k=0;k<lsize[i-1];k++){      // for input from each neuron in preceding layer
				sum+= out[i-1][k]*weight[i][j][k];  // apply weight to inputs and add to sum
			}
			sum+=weight[i][j][lsize[i-1]];      // apply bias
			out[i][j]=sigmoid(sum);             // apply sigmoid function
		}
	}
}
// backpropagate errors from the output
// layer up to the first hidden layer
void CBackProp::bpgt(double *in,double *tgt)
{
	double sum;

	// update output values for each neuron
	ffwd(in);

	// find delta for output layer
	for(int i=0;i<lsize[numl-1];i++){
		delta[numl-1][i]=out[numl-1][i]*
			(1-out[numl-1][i])*(tgt[i]-out[numl-1][i]);
	}

	// find delta for hidden layers
	for(int i=numl-2;i>0;i--){
		for(int j=0;j<lsize[i];j++){
			sum=0.0;
			for(int k=0;k<lsize[i+1];k++){
				sum+=delta[i+1][k]*weight[i+1][k][j];
			}
			delta[i][j]=out[i][j]*(1-out[i][j])*sum;
		}
	}

	// apply momentum (does nothing if alpha=0)
	for(int i=1;i<numl;i++){
		for(int j=0;j<lsize[i];j++){
			for(int k=0;k<lsize[i-1];k++){
				weight[i][j][k]+=alpha*prevDwt[i][j][k];
			}
			weight[i][j][lsize[i-1]]+=alpha*prevDwt[i][j][lsize[i-1]];
		}
	}

	// adjust weights using steepest descent
	for(int i=1;i<numl;i++){
		for(int j=0;j<lsize[i];j++){
			for(int k=0;k<lsize[i-1];k++){
				prevDwt[i][j][k]=beta*delta[i][j]*out[i-1][k];
				weight[i][j][k]+=prevDwt[i][j][k];
			}
			prevDwt[i][j][lsize[i-1]]=beta*delta[i][j];
			weight[i][j][lsize[i-1]]+=prevDwt[i][j][lsize[i-1]];
		}
	}
}
REFERENCES
1. Abu Mostafa Y. S., Recognition aspects of moment invariants, IEEE
Transactions on Pattern Analysis and Machine Intelligence, Vol. 6, No. 6, 1984,
pp. 698-706.
2. Alt F. L., Digital pattern recognition by moments, Journal of the Association for Computing Machinery, Vol. 9, No. 2, 1962, pp. 240-258.
3. Bailey R. R. and Srinath M., Orthogonal moment features for use with parametric and non-parametric classifiers, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 18, No. 4, 1996, pp. 389-399.
4. Belkasim S. O., Shridhar M. and Ahmadi M., Pattern recognition with moment invariants: A comparative study and new results, Pattern Recognition, Vol. 24, 1991, pp. 1117-1138.
5. Carpenter G. A., Grossberg S., Markuzon N., Reynolds J. H. and Rosen D. B., Fuzzy ARTMAP: A neural network architecture for incremental supervised learning of analog multidimensional maps, IEEE Transactions on Neural Networks, Vol. 3, No. 5, 1992, pp. 698-713.
6. Dirilten H., Pattern matching under affine transformations, IEEE Trans. on
Computers, Vol. 26, No. 3, 1977, pp. 314-317.
7. Dudani S. A., Breeding K. J. and McGhee R. B., Aircraft identification by moment invariants, IEEE Transactions on Computers, Vol. C-26, 1977, pp. 39-45.
8. Elliott E. B., Algebra of Quantics, Oxford University Press, New York, 2nd edition, 1913.
9. F. Zernike. Beugungstheorie des Schneidenverfahrens und seiner verbesserten
Form, der Phasenkontrastmethode (Diffraction theory of the cut procedure and its
improved form, the phase contrast method). Physica, 1:pp. 689-704, 1934.
10. Gruber M. and Hsu K.Y., Moment based image normalization with high noise
tolerance, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol.
19, No.2, 1997, pp. 136-138.
11. Hu M. K., Visual pattern recognition by moment invariants, IRE Transactions on Information Theory, Vol. 8, No. 1, 1962, pp. 179-187.
12. Khotanzad A., Rotation invariant pattern recognition using Zernike moments,
International Conference on Pattern Recognition, 1988, pp. 326-328.