
CHAPTER 1

INTRODUCTION

Pattern recognition is the study of how a machine, whether a computer or any other hardware of interest, can observe its environment, learn to distinguish patterns of interest and make sound, reasonable decisions about the categories of those patterns. Most children as young as a few years old can recognize digits and letters. Small characters, large characters, handwritten, machine printed or rotated - all are easily recognized by the young. We take this ability for granted until we face the task of teaching a machine to do the same.

1.1 BASICS OF A PATTERN RECOGNITION SYSTEM

Typical practical pattern recognition systems usually contain several stages in addition to the recognition engine. Figure 1.1 shows a typical recognition system and illustrates all the aspects of a typical pattern recognition task [24].

Preprocessing partitions the image into isolated objects and may also scale the image to allow focus on the object. This part has not been implemented in the example program yet and may be considered in a future application.

Feature extraction extracts high-level information about individual patterns to facilitate recognition.

The classifier identifies the category to which the pattern belongs or, in general, the attributes associated with the given pattern. In this thesis, an artificial neural network is chosen as the classifier.

1.2 BRIEF HISTORY OF MOMENTS AS FEATURES

As moments play an important part in image recognition, a brief historical background of moments will be discussed. One of the earliest significant papers on the application of moments was published by Hu [11] in 1962. His approach was based on the work of the nineteenth-century mathematicians Boole, Cayley and Sylvester on the theory of algebraic forms [8]. He used regular moments to develop a set of non-linear functions, called moment invariants, that are invariant to translation, size and rotation, which he then applied to a simple character recognition problem.

Subsequently, this method was used for pattern recognition by Alt [2] in 1962, ship identification by Smith and Wright [20] in 1971, aircraft identification by Dudani et al. [7] in 1977, pattern matching by Dirilten [6] in 1977 and scene matching by Wong and Hall [23] in 1978. Sadjadi [19] extended the definition of moments into three dimensions and derived the corresponding invariants. Teague [21] introduced the concept of using orthogonal moments such as Zernike and Legendre moments in 1980. He also gave a treatment of the properties of lower-order moments and outlined the relationship between regular moments and Zernike moments. Reddi [17] presented the concept of radial and angular moments without the aid of the theory of algebraic invariants in 1981. A general notion of complex moments was introduced by Abu-Mostafa [1] in 1984, together with their relationship to regular moments, and their properties were analysed in terms of information redundancy and noise sensitivity.

Despite the emergence of other image and shape representation features in pattern recognition, research on moments is still pursued rigorously. Zernike moments were applied to rotation-invariant pattern recognition in 1988 by Khotanzad [12] and to object identification using a neural network in 1990 by Khotanzad and Lu [13]. Teh and Chin [22] presented a paper on the analysis of the method of moments, addressing certain fundamental questions such as image representation ability, noise sensitivity and information redundancy. Reiss [18] revised the fundamental theorem of moment invariants in 1991. Belkasim et al. [4] presented a study of moments under noisy conditions for pattern recognition, comprising Zernike, Pseudo-Zernike, Normalised-Zernike, Normalised-Pseudo-Zernike, Teague-Zernike, Hu moment invariants and regular moment invariants, in 1991. Pawlak [16] studied the reconstruction aspects of moment descriptors in 1992. Bailey and Srinath [3] used a number of orthogonal polynomials for the recognition of handwritten Arabic numerals in 1996. Liao and Pawlak [14] performed an analysis of the error caused by discretisation and noise on moment descriptors and proposed several new techniques to increase their efficiency and accuracy. Gruber and Hsu [10] analysed the effects of noise with non-zero mean on moment-based image normalisation methods and suggested some modifications to reduce noise sensitivity, and the resulting improved moments were tested to validate the proposed methods.

1.3 SCOPE AND CONTRIBUTION OF THE THESIS

This thesis analyses the use of functions of Cartesian moments, Legendre moments and Zernike moments in pattern recognition problems. An introduction to all of the moments mentioned above is presented. Hu's moment invariants, which are obtained from Cartesian moments, are also discussed. The best moment as a feature descriptor is determined by experiment, and the robustness of the moments in a noisy environment is also examined experimentally.

An analysis of the multilayer perceptron with back propagation learning, the artificial neural network used as the classifier, has also been conducted. The phenomenon of over fitting is studied by training the neural network with different numbers of hidden neurons. The effect of normalization of the inputs on the recognition performance of the neural network is also discussed and results presented. A study of the effect of learning rate and momentum on back propagation learning and recognition rate has also been carried out.

1.4 ORGANISATION OF THE THESIS

This thesis consists of six chapters, inclusive of the first chapter on introduction and a final chapter devoted to the conclusion.

Chapter 2 presents a discussion of both non-orthogonal and orthogonal moments. The non-orthogonal moment discussed is the Cartesian moment, while the orthogonal moments are the Zernike and Legendre moments. This chapter gives the mathematical foundation behind these moments. The method of Hu moment invariants is also covered.

In chapter 3, an introduction to the classifier used in this thesis, the artificial neural network, specifically the multilayer perceptron with the back propagation learning algorithm, is presented. The biological neural networks which inspired artificial neural networks are discussed briefly. The architecture of artificial neural networks and the algorithm for the multilayer perceptron with back propagation learning are discussed in detail, as they will be applied in the classification experiments later. Several methods to improve the multilayer perceptron are also presented.

Chapter 4 is where the performance of each moment is tested with the multilayer perceptron neural network. The data set used, the preprocessing applied and the testing method are explained, and the results obtained from the experiments are discussed.

In chapter 5, the various parameters used in the multilayer perceptron for pattern recognition are analyzed. The effects of normalization of the inputs, modifying the values of learning rate and momentum, and using two layers of hidden neurons are investigated, and the results obtained are presented and discussed.

The final chapter is the conclusion, where all of the experimental results are summarized. Various suggestions for future work are also presented.

CHAPTER TWO

STATISTICAL MOMENTS

In this chapter, we will concentrate on laying the foundation for understanding two-dimensional geometric moments and two-dimensional Legendre moments. The chapter starts with an introduction to the general definition of a moment. Then, the functions that are used to normalize Cartesian moments to obtain invariance to translation, rotation and scaling are explained. The method of moments by Hu [11] will also be discussed.

The ability to describe or represent a given image in a form that is suitable for mathematical manipulation and computer processing has always been a goal in the field of pattern recognition. Generally, there are two choices for representing an image:

The image is represented by its external characteristics, such as its boundary.

The image is represented in terms of its internal characteristics, such as the pixels comprising the image.

In both cases, it is preferable that the features selected are insensitive to variations such as changes in shift, orientation and size.

The concept of moments is widely used not only in pattern recognition but also in statistical theory, and an analogy can be drawn between the two areas. In statistics, the moments of order zero, one and two of a probability density function represent the total probability, expectation and variance respectively. In pattern recognition, for a two-dimensional image, the geometric moment functions of order zero, one and two provide shape information. This information can be used to generate features which are invariant to image translation, rotation and scaling.

2.1 GENERAL DEFINITION OF MOMENTS

Moments are applicable to many different aspects of image processing. When


applied to images, they describe the image distribution with respect to its axes. They
are designed to capture both global and detailed geometric information about the
image.
If an image can be thought of as a two-dimensional density function, then the general moment definition is given by:

$$M_{pq} = \iint \psi_{pq}(x,y)\, f(x,y)\, dx\, dy \qquad (2.1)$$

where $p, q$ are non-negative integers acting as the order indices; $(x, y)$ are Cartesian co-ordinates; and $f$ is a non-negative intensity function with bounded and compact support, so that integration within the available image plane is sufficient to gather all the signal information. The kernel function, or basis set, $\psi_{pq}$ is a continuous function of $(x, y)$ defined on the image plane. The image plane is considered finite and all integrals are taken over this finite plane.

Since many useful feature descriptors can be obtained more easily from a binary representation, all moment computations will use a binary representation. If the image is a grey-level image, a threshold is applied to obtain a binary image: a grey-level value is chosen to segregate object pixels from background pixels. Moments computed from such binary images are known as silhouette moments.
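As an illustration of this thresholding step (a minimal sketch, not code from the thesis; the image container, the pixel range of 0-255 and the polarity of the object pixels are assumptions made for the example), a grey-level image might be converted into a binary silhouette as follows:

#include <vector>

// Convert a grey-level image (assumed values 0-255) into a binary image:
// pixels at or above the chosen grey-level become object pixels (1),
// all other pixels become background pixels (0).
std::vector<std::vector<int>> thresholdImage(const std::vector<std::vector<int>>& grey, int level)
{
    std::vector<std::vector<int>> binary(grey.size());
    for (std::size_t y = 0; y < grey.size(); ++y)
    {
        binary[y].resize(grey[y].size());
        for (std::size_t x = 0; x < grey[y].size(); ++x)
            binary[y][x] = (grey[y][x] >= level) ? 1 : 0;
    }
    return binary;
}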

2.2 CARTESIAN MOMENTS

Cartesian moments are the simplest of the moment functions, and the combination of being easy to understand and easy to implement has made them popular in many applications. Cartesian moments are defined with the basis set in (2.1) replaced by the monomial $x^p y^q$. Therefore the Cartesian moment $m_{pq}$ can be expressed as:

$$m_{pq} = \iint x^p y^q\, f(x,y)\, dx\, dy \qquad (2.2)$$

where the parameters are as defined in (2.1).

For digital images with an $M \times M$ array of pixels, the double integration of equation (2.2) can be approximated by summation, which translates to:

$$m_{pq} = \sum_{x=0}^{M-1} \sum_{y=0}^{M-1} x^p y^q\, f(x,y) \qquad (2.3)$$

The zero order moment $m_{00}$ is defined as the total mass of the image.

Region-based features usually require a point from which the features may be derived. The centroid is usually used to specify the location of an object; it is the point for which the sum of the squared distances from it to all other points within the object is a minimum. The centroid co-ordinates can be expressed using the zeroth and first order moments as shown below:

$$\bar{x} = \frac{m_{10}}{m_{00}}, \qquad \bar{y} = \frac{m_{01}}{m_{00}} \qquad (2.4)$$
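As a minimal illustration of equations (2.3) and (2.4) (a sketch only; the simple 2D vector image container is assumed here and differs from the BMP class used in Appendix A), the raw moments and centroid of a binary image might be computed as:

#include <vector>
#include <cmath>

// Raw (Cartesian) moment m_pq of a binary image, equation (2.3).
// img[y][x] is assumed to hold 0 (background) or 1 (object).
double rawMoment(const std::vector<std::vector<int>>& img, int p, int q)
{
    double m = 0.0;
    for (std::size_t y = 0; y < img.size(); ++y)
        for (std::size_t x = 0; x < img[y].size(); ++x)
            m += std::pow((double)x, p) * std::pow((double)y, q) * img[y][x];
    return m;
}

// Centroid from the zeroth and first order moments, equation (2.4).
void centroid(const std::vector<std::vector<int>>& img, double& xbar, double& ybar)
{
    double m00 = rawMoment(img, 0, 0);
    xbar = rawMoment(img, 1, 0) / m00;
    ybar = rawMoment(img, 0, 1) / m00;
}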

2.2.1 INVARIANCE TO TRANSLATION AND SCALE

Cartesian moments are limited in their usefulness since they are not invariant to different positions of the object with respect to the origin; central moments are used to circumvent this problem. The central moment $\mu_{pq}$ is the moment of the image translated by the amounts $-\bar{x}$ and $-\bar{y}$, and is defined as:

$$\mu_{pq} = \iint (x-\bar{x})^p (y-\bar{y})^q\, f(x,y)\, dx\, dy \qquad (2.5)$$

Thus the central moments can be computed from (2.2) by substituting $x - \bar{x}$ for $x$ and $y - \bar{y}$ for $y$.

To enable invariance to scale, normalized moments are used. The two-dimensional scale-normalized central moments are given by [25]:

$$\eta_{pq} = \frac{\mu_{pq}}{\mu_{00}^{\gamma}}, \qquad \gamma = \frac{p+q}{2} + 1 \qquad (2.6)$$
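Continuing the sketch given at the end of section 2.2 (and mirroring the ComputeMomentNormalized and ScalingNormalization routines of Appendix A; the 2D vector image container is again an assumption made for the example), central and scale-normalized moments might be computed as:

#include <vector>
#include <cmath>

// Central moment mu_pq about the centroid (xbar, ybar), equation (2.5).
double centralMoment(const std::vector<std::vector<int>>& img,
                     int p, int q, double xbar, double ybar)
{
    double mu = 0.0;
    for (std::size_t y = 0; y < img.size(); ++y)
        for (std::size_t x = 0; x < img[y].size(); ++x)
            mu += std::pow(x - xbar, p) * std::pow(y - ybar, q) * img[y][x];
    return mu;
}

// Scale-normalized central moment eta_pq = mu_pq / mu_00^((p+q)/2 + 1), equation (2.6).
double normalizedCentralMoment(double mu_pq, double mu_00, int p, int q)
{
    double gamma = (p + q) / 2.0 + 1.0;
    return mu_pq / std::pow(mu_00, gamma);
}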

2.2.2 HU MOMENT INVARIANT

The central geometric moments are translation invariant and can be normalized with respect to changes in scale. However, to obtain invariance to rotation they require reformulation. Hu [11] described two different methods for producing rotation invariant moments. The first used a method called principal axes. The second method, the method of absolute moment invariants, is the one discussed here. Hu derived these expressions from algebraic invariants applied to the moment generating function under a rotation transformation. They consist of groups of nonlinear central moment expressions. The result is a set of absolute orthogonal (i.e. rotation) moment invariants, which can be used for scale, position and rotation invariant pattern identification. They are computed from the normalized central moments up to order three as shown below:

$$\phi_1 = \eta_{20} + \eta_{02}$$

$$\phi_2 = (\eta_{20} - \eta_{02})^2 + 4\eta_{11}^2$$

$$\phi_3 = (\eta_{30} - 3\eta_{12})^2 + (3\eta_{21} - \eta_{03})^2$$

$$\phi_4 = (\eta_{30} + \eta_{12})^2 + (\eta_{21} + \eta_{03})^2$$

$$\phi_5 = (\eta_{30} - 3\eta_{12})(\eta_{30} + \eta_{12})\left[(\eta_{30} + \eta_{12})^2 - 3(\eta_{21} + \eta_{03})^2\right] + (3\eta_{21} - \eta_{03})(\eta_{21} + \eta_{03})\left[3(\eta_{30} + \eta_{12})^2 - (\eta_{21} + \eta_{03})^2\right]$$

$$\phi_6 = (\eta_{20} - \eta_{02})\left[(\eta_{30} + \eta_{12})^2 - (\eta_{21} + \eta_{03})^2\right] + 4\eta_{11}(\eta_{30} + \eta_{12})(\eta_{21} + \eta_{03})$$

Finally, a skew invariant, which helps distinguish mirror images:

$$\phi_7 = (3\eta_{21} - \eta_{03})(\eta_{30} + \eta_{12})\left[(\eta_{30} + \eta_{12})^2 - 3(\eta_{21} + \eta_{03})^2\right] - (\eta_{30} - 3\eta_{12})(\eta_{21} + \eta_{03})\left[3(\eta_{30} + \eta_{12})^2 - (\eta_{21} + \eta_{03})^2\right]$$

The reader is referred to [11] for a complete analysis of the moment invariants. These moments are of finite order; therefore, unlike the central moments, they do not comprise a complete set of image descriptors [26]. However, higher order invariants can be derived [4, 11].

2.3 ORTHOGONAL MOMENTS

Cartesian moments (equation 2.2) are formed using the monomial basis set $x^p y^q$, which is non-orthogonal, and this property is passed on to the Cartesian moments. These monomials increase rapidly in range as the order increases, producing highly correlated descriptions. Thus important descriptive information is contained within small differences between moments, which leads to the need for high computational precision. Moments produced using orthogonal basis sets do exist, and these orthogonal moments have the advantage of needing lower precision to represent the differences to the same accuracy as the monomials. Orthogonality means mutually perpendicular; expressed mathematically, two functions $u_m(x)$ and $u_n(x)$ are orthogonal over an interval $a \le x \le b$ if and only if:

$$\int_a^b u_m(x)\, u_n(x)\, dx = 0, \qquad m \ne n \qquad (2.7)$$

Since the primary interest of this thesis is discrete images, the integrals within the moment descriptors are replaced by summations. Sequences of polynomials which are orthogonal with respect to integration are also orthogonal with respect to summation [28]. Two such orthogonal moments, Legendre and Zernike, will be discussed in this thesis.

2.3.1 LEGENDRE MOMENTS

The Legendre moments [18] of order $(p, q)$ are defined as:

$$\lambda_{pq} = \frac{(2p+1)(2q+1)}{4} \int_{-1}^{1}\int_{-1}^{1} P_p(x)\, P_q(y)\, f(x,y)\, dx\, dy \qquad (2.8)$$

where $P_p(x)$ and $P_q(y)$ are the Legendre polynomials and $f(x,y)$ is the continuous image function. The Legendre polynomials are a complete orthogonal basis set defined over the interval $[-1, 1]$. For orthogonality to exist in the moments, the image function $f(x,y)$ is defined over the same interval as the basis set, where the $n$th order Legendre polynomial is defined as:

$$P_n(x) = \sum_{j=0}^{n} a_{nj}\, x^j \qquad (2.9)$$

and $a_{nj}$ are the Legendre coefficients given by:

$$a_{nj} = (-1)^{(n-j)/2}\, \frac{1}{2^n}\, \frac{(n+j)!}{\left(\frac{n-j}{2}\right)!\left(\frac{n+j}{2}\right)!\, j!} \qquad (2.10)$$

where $n - j$ = even.

So, for a discrete $N \times M$ image with current pixel $f(i, j)$, equation 2.8 becomes:

$$\lambda_{pq} = \frac{(2p+1)(2q+1)}{NM} \sum_{i=0}^{N-1} \sum_{j=0}^{M-1} P_p(x_i)\, P_q(y_j)\, f(i, j) \qquad (2.11)$$

where $x_i$ and $y_j$ are the pixel co-ordinates normalized so that they are defined over the interval $[-1, 1]$.
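For completeness, a small C++ sketch (not part of the thesis code, which instead evaluates the coefficient formula (2.10) directly, as in Appendix A) shows how the Legendre polynomials can also be evaluated with the standard three-term recurrence $P_0(x) = 1$, $P_1(x) = x$, $n P_n(x) = (2n-1)\,x\,P_{n-1}(x) - (n-1)\,P_{n-2}(x)$:

// Evaluate the Legendre polynomial P_n(x) on [-1, 1] using the
// three-term recurrence, which avoids the large factorials of (2.10).
double legendreP(int n, double x)
{
    if (n == 0) return 1.0;
    if (n == 1) return x;
    double pPrev = 1.0, pCurr = x;
    for (int k = 2; k <= n; ++k)
    {
        double pNext = ((2 * k - 1) * x * pCurr - (k - 1) * pPrev) / k;
        pPrev = pCurr;
        pCurr = pNext;
    }
    return pCurr;
}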

2.3.2 COMPLEX ZERNIKE MOMENTS

The Zernike polynomials were first proposed in 1934 by Zernike [9]. Complex Zernike moments [15] are constructed using a set of complex polynomials which form a complete orthogonal basis set defined on the unit disc $x^2 + y^2 \le 1$. Here we express the two-dimensional Zernike moment of order $n$ with repetition $m$ as:

$$A_{nm} = \frac{n+1}{\pi} \iint_{x^2+y^2 \le 1} f(x,y)\, V_{nm}^{*}(\rho, \theta)\, dx\, dy \qquad (2.12)$$

where $f(x,y)$ is the function being described and $*$ denotes the complex conjugate. The order $n$ is a non-negative integer, while the repetition $m$ is an integer (positive or negative) depicting the angular dependence, or rotation, subject to the conditions:

$$n - |m| \ \text{even}, \qquad |m| \le n \qquad (2.13)$$

The Zernike polynomials [9] $V_{nm}(\rho, \theta)$ are defined over the unit disc and, expressed in polar coordinates, are:

$$V_{nm}(\rho, \theta) = R_{nm}(\rho)\, e^{jm\theta} \qquad (2.14)$$

where $R_{nm}(\rho)$ is the orthogonal radial polynomial, defined as:

$$R_{nm}(\rho) = \sum_{s=0}^{(n-|m|)/2} (-1)^s \frac{(n-s)!}{s!\left(\frac{n+|m|}{2}-s\right)!\left(\frac{n-|m|}{2}-s\right)!}\, \rho^{\,n-2s} \qquad (2.15)$$

and it must be noted that if the conditions in equation 2.13 are not met, then $R_{nm}(\rho) = 0$.

2.4 CHAPTER SUMMARY

In this chapter, we have established what a statistical moment is, and the general theory of moments has been presented. Moments can be divided into two categories: non-orthogonal and orthogonal moments.

The Cartesian moment, which belongs to the category of non-orthogonal moments, has been discussed. The derivation that makes the Cartesian moments invariant to translation and scale has been shown. Hu's moment invariants, which make the Cartesian moments invariant to rotation and skew, have been discussed briefly.

Zernike moments and Legendre moments, which are orthogonal moments, have also been introduced in this chapter, together with the equations used to calculate both moments.

CHAPTER 3

ARTIFICIAL NEURAL NETWORK

In this chapter we introduce the concept of the Artificial Neural Network (ANN). The chapter begins with a general discussion of neural networks before moving on to the multilayer perceptron model. The back propagation method, which is used to train the multilayer perceptron, is described next.

The term neural network is taken from the neural or nervous systems of living creatures. Neural networks are also sometimes referred to as connectionist models, parallel distributed processors or neuro-computers. A neural network is basically an information processing system and can be thought of as a black box device that accepts inputs and produces outputs.

3.1 BIOLOGICAL NEURAL NETWORK

Central to the human nervous system is the brain, represented as a neural net, which continually receives information, perceives it, and makes appropriate decisions. Nerve cells in the brain are called neurons, and each neuron can make contact with several thousand other neurons. Neurons are the units which the brain uses to process information. It is estimated that there are approximately 10 billion neurons in the human cortex. Since the neuron is the building block of the brain and is also essential to the formation of the artificial neural network concept, it will be studied in more detail here.

3.1.1 BIOLOGICAL NEURON

The four basic components of a biological neuron are:

Dendrites - Dendrites are hair-like extensions of a neuron, and each dendrite can bring some input to the neuron (from neurons in the previous layer). These inputs are passed to the soma.

Soma - The soma is responsible for processing these inputs, and the output is provided to other neurons through the axon and synapses.

Axon - The axon is responsible for carrying the output of the soma to other neurons, through the synapses.

Synapses - The synapses of one neuron are connected to the dendrites of neurons in the next layer. The connections between neurons are possible because of synapses and dendrites.

The dashed line in Figure 3.1 shows the axon hillock, where transmission of signals starts.

Figure 3.1: Neuron [29]

The boundary of the neuron is known as the cell membrane. There is a voltage
difference (the membrane potential) between the inside and outside of the membrane.

If the input is large enough, an action potential is then generated. The action potential
(neuronal spike) then travels down the axon, away from the cell body.


Figure 3.2 Neuron Spiking [29]


3.1.2 SYNAPSES

The connections between one neuron and another are called synapses. Information
always leaves a neuron via its axon (see Figure 3.1 above), and is then transmitted
across a synapse to the receiving neuron.

3.1.3 NEURON FIRING

Neurons only fire when the input is bigger than some threshold. It should, however, be noted that the firing does not get bigger as the stimulus increases; it is an all-or-nothing arrangement.


Figure 3.3 Neuron Firing [29]

Spikes (signals) are important, since other neurons receive them. Neurons
communicate with spikes. The information sent is coded by spikes.

3.2 ARTIFICIAL NEURAL NETWORK

An artificial neural network is a system that processes information in a parallel, distributed manner. It consists of a large number of simple processing elements called neurons. These processing elements are interconnected, and the power of an artificial neural network lies in the tremendous number of interconnections, which underpin its learning capability.

3.2.1 MODELS OF AN ARTIFICIAL NEURON

The block diagram in Figure 3.4 shows the model of an artificial neuron, which forms the basis for designing artificial neural networks. Table 3.1 lists the three basic elements of the neuron model based on the basic design in Figure 3.4.

Figure 3.4 Model of an artificial neuron.

1. A set of synapses, or connecting links, each characterized by a weight or strength of its own. A signal $x_j$ at the input of synapse $j$ connected to neuron $k$ is multiplied by the synaptic weight $w_{kj}$, where the first subscript refers to the neuron in question and the second subscript refers to the input end of the synapse to which the weight refers.

2. An adder for summing the input signals, weighted by the respective synapses of the neuron.

3. An activation function for limiting the amplitude of the output of the neuron. Typically, the normalized amplitude range of the output of a neuron is written as the closed unit interval [0,1] or alternatively [-1,1].

Table 3.1: Three basic elements of the neuron model.

In mathematical terms, a neuron $k$ may be described by the following pair of equations:

$$u_k = \sum_{j=1}^{m} w_{kj}\, x_j$$

$$y_k = \varphi(u_k + b_k)$$

where $x_1, x_2, \ldots, x_m$ are the input signals; $w_{k1}, w_{k2}, \ldots, w_{km}$ are the synaptic weights of neuron $k$; $u_k$ is the linear combiner output due to the input signals; $b_k$ is the bias; $\varphi(\cdot)$ is the activation function; and $y_k$ is the output signal of the neuron.
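As a small illustration of this pair of equations (a sketch written for this discussion, not code from the thesis; the logistic choice of $\varphi$ and the vector containers are assumptions made for the example), a single artificial neuron might be computed in C++ as:

#include <vector>
#include <cmath>

// Output of one artificial neuron: y = phi(sum_j w_j * x_j + b),
// here using the logistic sigmoid as the activation function phi.
double neuronOutput(const std::vector<double>& x,
                    const std::vector<double>& w,
                    double bias)
{
    double u = bias;                       // linear combiner output plus bias
    for (std::size_t j = 0; j < x.size() && j < w.size(); ++j)
        u += w[j] * x[j];
    return 1.0 / (1.0 + std::exp(-u));     // activation function
}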

3.2.2 TYPES OF ACTIVATION FUNCTION

The activation function, denoted by $\varphi(v)$, defines the output of a neuron in terms of the induced local field $v$. Here we present three basic types of activation function:

1. Threshold Function. For this type of activation function, as described in Figure 3.5(a):

$$\varphi(v) = \begin{cases} 1, & v \ge 0 \\ 0, & v < 0 \end{cases}$$

2. Piecewise-Linear Function. For the piecewise-linear function we have:

$$\varphi(v) = \begin{cases} 1, & v \ge +\tfrac{1}{2} \\ v + \tfrac{1}{2}, & -\tfrac{1}{2} < v < +\tfrac{1}{2} \\ 0, & v \le -\tfrac{1}{2} \end{cases}$$

where the amplification factor inside the linear region of operation is assumed to be unity.

3. Sigmoid Function. The sigmoid function, whose graph is S-shaped, is the most common form of activation function used in the construction of artificial neural networks. An example of a sigmoid function is the logistic function, defined by

$$\varphi(v) = \frac{1}{1 + \exp(-av)}$$

where $a$ is the slope parameter of the sigmoid function. By varying the parameter $a$, sigmoid functions of different slopes are obtained. In the limit, as the slope parameter approaches infinity, the sigmoid function becomes simply a threshold function.

Besides ranging from 0 to +1, it is sometimes desirable to have the activation function range from -1 to +1, in which case the activation function assumes an anti-symmetric form with respect to the origin.

Figure 3.5(a): Threshold Function [29]

Figure 3.5(b): Sigmoid Function.
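A minimal C++ sketch of these three activation functions (written for illustration only; the thesis implementation in Appendix B uses just the logistic sigmoid) could look like this:

#include <cmath>

// Threshold (Heaviside) activation.
double thresholdActivation(double v)
{
    return v >= 0.0 ? 1.0 : 0.0;
}

// Piecewise-linear activation with unity gain in the linear region.
double piecewiseLinearActivation(double v)
{
    if (v >= 0.5)  return 1.0;
    if (v <= -0.5) return 0.0;
    return v + 0.5;
}

// Logistic sigmoid activation with slope parameter a.
double sigmoidActivation(double v, double a = 1.0)
{
    return 1.0 / (1.0 + std::exp(-a * v));
}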

3.2.3 ARCHITECTURE

How the neurons are structured is intimately linked with the learning algorithm used to train the network. The classification of learning algorithms will be considered later; in this section, the network architectures are given attention.

In general, three fundamentally different classes of network architectures can be identified:

1. Single-Layer Feed forward Networks

In a layered neural network the neurons are organized in the form of layers. In the simplest form of a layered network, we have an input layer of source nodes that projects onto an output layer of neurons (computation nodes), but not vice versa. In other words, this network is strictly of a feed forward or acyclic type. The designation single-layer refers to the output layer of computation nodes (neurons). The input layer is not counted since no computation is performed there.

2. Multilayer Feed forward Networks

This class of feed forward neural network distinguishes itself by the presence of one or more hidden layers. The nodes in these hidden layers are correspondingly called hidden neurons or hidden units. The function of the hidden neurons is to intervene between the external input and the network output in some useful manner. By adding one or more hidden layers, the network is able to extract higher-order statistics. This ability is particularly valuable when the size of the input layer is large.

The source nodes in the input layer of the network supply the respective elements of the activation pattern (input vector), which constitute the input signals applied to the neurons in the second layer. The output signals of the second layer are used as inputs to the third layer, and so on for the rest of the network. The set of output signals of the neurons in the output layer of the network constitutes the overall response of the network to the activation pattern supplied by the source nodes in the input layer.

The network is said to be fully connected when every node in each layer of the network is connected to every node in the adjacent forward layer. If some of the synaptic connections are missing from the network, the network is classified as partially connected.

3. Recurrent Networks

A recurrent network differs from a feed forward neural network in that it has at least one feedback loop. For example, a recurrent network may consist of a single layer of neurons with each neuron feeding its output signal back to the inputs of all the other neurons. The presence of feedback loops has a profound impact on the learning capability of the network and on its performance. Moreover, the feedback loops involve the use of particular branches composed of unit-delay elements, which result in a nonlinear dynamical behavior.

Figure 3.6: A two layered feed forward network.


3.3 OVERVIEW OF LEARNING ALGORITHMS

Many neural network models exist by now, and nobody knows exactly how many. There are two main kinds of learning algorithms, supervised and unsupervised [30]:

In supervised learning, the correct results are known and are given to the neural network during training so that the network can be adjusted to try to match its output to the target values. After training, the neural network is tested by giving it only input values, not target values, and seeing how close it comes to outputting the correct target values.

In unsupervised learning, the neural network is not provided with the correct results during training.

Table 3.2 lists some well-known neural networks and learning algorithms.

Supervised

Feedforward
- Linear: Hebbian - Hebb (1949), Fausett (1994); Perceptron - Rosenblatt (1958), Minsky and Papert (1969/1988), Fausett (1994)
- MLP: Multilayer perceptron - Bishop (1995), Reed and Marks (1999), Fausett (1994); Backprop - Rumelhart, Hinton, and Williams (1986)
- Classification only: LVQ: Learning Vector Quantization - Kohonen (1988), Fausett (1994)
- Regression only: GNN: General Regression Neural Network - Specht (1991), Nadaraya (1964), Watson (1964)

Feedback - Hertz, Krogh, and Palmer (1991), Medsker and Jain (2000)
- BAM: Bidirectional Associative Memory - Kosko (1992), Fausett (1994)

Competitive
- Neocognitron - Fukushima, Miyake, and Ito (1983), Fukushima (1988), Fausett (1994)

Unsupervised

Competitive
- Vector Quantization: Grossberg - Grossberg (1976); Kohonen - Kohonen (1995), Fausett (1994)
- Self-Organizing Map: Kohonen - Kohonen (1995), Fausett (1994); GTM - Bishop, Svensén and Williams (1997)
- Adaptive resonance theory: Fuzzy ART - Carpenter, Grossberg and Rosen (1991b)
- DCL: Differential Competitive Learning - Kosko (1992)

Dimension Reduction - Diamantaras and Kung (1996)
- Hebbian - Hebb (1949), Fausett (1994); Oja - Oja (1989); Sanger - Sanger (1989); Differential Hebbian - Kosko (1992)

Autoassociation
- BSB: Brain State in a Box - Anderson et al. (1977), Fausett (1994); Hopfield - Hopfield (1982), Fausett (1994)

Table 3.2: Some well-known learning algorithms of neural networks [30].



3.4 BACKPROPAGATION ALGORITHM

Back propagation can be operated in two modes, sequential or batch. In the sequential mode of operation, weight updating is performed after the presentation of each training example, while in the batch mode, weight updating is performed after the presentation of all the training examples that constitute an epoch. Although the sequential mode of back-propagation learning has several disadvantages, it is highly popular (and is used for the pattern recognition application in this thesis) for two important practical reasons [31]:

1. The algorithm is simple to implement.

2. It provides effective solutions to large and difficult problems.

For the sequential mode of operation, the algorithm cycles through the training samples as follows:

1. Initialization. The synaptic weights and thresholds are picked from a uniform distribution whose mean is zero and whose variance is chosen to make the standard deviation of the induced local fields of the neurons lie at the transition between the linear and saturated parts of the sigmoid function.

2. Presentation of Training Examples. The network is presented with an epoch of training examples. For each example in the set, ordered in some fashion, perform the sequence of forward and backward computations described in points 3 and 4 below.

3. Forward Computation. The induced local fields and function signals of the network are computed by proceeding forward through the network, layer by layer. The induced local field for neuron $j$ in layer $l$ is:

$$v_j^{(l)}(n) = \sum_{i} w_{ji}^{(l)}(n)\, y_i^{(l-1)}(n)$$

Using the sigmoid function, the output signal of neuron $j$ in layer $l$ is:

$$y_j^{(l)}(n) = \varphi\!\left(v_j^{(l)}(n)\right)$$

If neuron $j$ is in the output layer $L$, set $o_j(n) = y_j^{(L)}(n)$. The error signal is then computed:

$$e_j(n) = d_j(n) - o_j(n)$$

4. Backward Computation. Compute the local gradients of the network, defined by:

$$\delta_j^{(L)}(n) = e_j(n)\, \varphi'\!\left(v_j^{(L)}(n)\right) \quad \text{for neuron } j \text{ in output layer } L$$

$$\delta_j^{(l)}(n) = \varphi'\!\left(v_j^{(l)}(n)\right) \sum_{k} \delta_k^{(l+1)}(n)\, w_{kj}^{(l+1)}(n) \quad \text{for neuron } j \text{ in hidden layer } l$$

where the prime denotes differentiation with respect to the argument. The synaptic weights in layer $l$ are adjusted according to the generalized delta rule:

$$w_{ji}^{(l)}(n+1) = w_{ji}^{(l)}(n) + \alpha\left[w_{ji}^{(l)}(n) - w_{ji}^{(l)}(n-1)\right] + \eta\, \delta_j^{(l)}(n)\, y_i^{(l-1)}(n)$$

where $\eta$ is the learning-rate parameter and $\alpha$ is the momentum constant.

5. Iteration. The forward and backward computations under points 3 and 4 are iterated by presenting new epochs of training examples to the network until the stopping criterion is met.

Once the training phase is completed, application of the network involves only the computations of the feed forward phase.


Figure 3.7 Flow Chart of Training Process

$i, j, k$ — Indices referring to different neurons in the network; with signals propagating from left to right, neuron $j$ lies in a layer to the right of neuron $i$, and neuron $k$ lies in a layer to the right of neuron $j$ when neuron $j$ is a hidden unit.
$E(n)$ — Instantaneous sum of error squares, or error energy, at iteration $n$.
$E_{av}$ — The average of $E(n)$ over all values of $n$.
$e_j(n)$ — Error signal at the output of neuron $j$ for iteration $n$.
$d_j(n)$ — Desired response for neuron $j$, used to compute $e_j(n)$.
$y_j(n)$ — Function signal appearing at the output of neuron $j$ at iteration $n$.
$w_{ji}(n)$ — Synaptic weight connecting the output of neuron $i$ to the input of neuron $j$ at iteration $n$.
$\Delta w_{ji}(n)$ — Correction applied to the synaptic weight connecting the output of neuron $i$ to the input of neuron $j$ at iteration $n$.
$v_j(n)$ — Induced local field (i.e. weighted sum of all synaptic inputs plus bias) of neuron $j$ at iteration $n$.
$\varphi_j(\cdot)$ — Activation function describing the input-output functional relationship of the nonlinearity associated with neuron $j$.
$b_j$ — Bias applied to neuron $j$.
$x_i$ — The $i$th element of the input vector.
$o_k$ — The $k$th element of the output vector.
$\eta$ — Learning rate parameter.
$m_l$ — Size (i.e. number of nodes) of layer $l$ of the multilayer perceptron, $l = 0, 1, \ldots, L$, where $L$ is the depth of the network.

Table 3.3: Notation used in the back propagation algorithm.

3.4.1 DISCUSSION ABOUT BACKPROPAGATION ALGORITHM

The back propagation algorithm is based on the steepest descent method to find a minimum of the error, i.e. a local minimum. Ideally, the global minimum is sought. Since higher ground may surround a local minimum, the network usually cannot leave a local minimum using the standard back propagation algorithm.

In order to avoid being trapped in a local minimum, the learning parameters, the number of hidden nodes, or the initial values of the connecting weights may be changed. These techniques basically try to change the scenario involved in moving about the error terrain.


The learning rate constant relates the change of the connection weights to the gradient of the error with respect to the weights. The larger the constant, the larger the changes in the connection weights. Usually a value is selected as high as possible without leading to oscillation.

Another problem to be addressed is the determination of the size of the hidden layers. It is known from regression analysis that if many kinds of variables are chosen, the prediction error becomes small. However, it must be noted that minimizing the squared error by increasing the number of neurons does not mean building a good artificial neural network. The network may learn the training data perfectly yet fail in the testing stage, since a small variation in the input would be considered as belonging to a different class. This is known as the over fitting problem, which is due to the artificial neural network memorizing the patterns instead of generalizing from them.

A few methods that can improve the performance of the back propagation algorithm are discussed here.

Introduction of a momentum constant

The back propagation algorithm provides an approximation to the trajectory in weight space computed by the method of steepest descent. The smaller the learning rate parameter, the smaller the changes to the synaptic weights in the network from one iteration to the next, which leads to a smoother trajectory in weight space. This improvement, however, is attained at the cost of a slower rate of learning, while for a large learning rate the network may become unstable (i.e. oscillatory). A simple method is to modify the delta rule by including a momentum term, as shown by Rumelhart et al. 1986a [32]:

$$\Delta w_{ji}(n) = \alpha\, \Delta w_{ji}(n-1) + \eta\, \delta_j(n)\, y_i(n)$$

where $\alpha$ is a positive number called the momentum constant. This equation is called the generalized delta rule.

The introduction of momentum is a minor modification to the weight update, but it can have some beneficial effects on the learning behavior of the algorithm. It may also have the benefit of preventing the learning process from terminating in a shallow local minimum on the error surface.
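As an illustration of the generalized delta rule (a sketch consistent with, but simpler than, the bpgt routine in Appendix B; the variable names are chosen only for the example), a single weight update with momentum might look like this:

// One application of the generalized delta rule for a single weight.
// delta  : local gradient of the receiving neuron
// y      : output of the sending neuron
// eta    : learning rate, alpha : momentum constant
// prevDw : weight change made in the previous iteration (updated in place)
double updateWeight(double w, double delta, double y,
                    double eta, double alpha, double& prevDw)
{
    double dw = alpha * prevDw + eta * delta * y;  // momentum term plus gradient term
    prevDw = dw;                                   // remember for the next iteration
    return w + dw;                                 // adjusted weight
}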
Normalize the inputs

"Normalizing" a vector most often means dividing by a norm of the vector, for example to make the Euclidean length of the vector equal to one. In the artificial neural network literature, "normalizing" also often refers to rescaling by the minimum and range of the vector, to make all the elements lie between 0 and 1. The following equation is used to normalize an input value:

$$x_{norm} = \frac{x - x_{min}}{x_{max} - x_{min}}$$
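A minimal C++ sketch of this min-max rescaling (written here for illustration, not taken from the thesis code) is:

#include <vector>
#include <algorithm>

// Rescale all elements of v to the interval [0, 1] using the
// minimum and range of the vector (min-max normalization).
void normalizeInputs(std::vector<double>& v)
{
    if (v.empty()) return;
    double lo = *std::min_element(v.begin(), v.end());
    double hi = *std::max_element(v.begin(), v.end());
    double range = hi - lo;
    if (range == 0.0) return;              // constant vector: leave unchanged
    for (double& x : v)
        x = (x - lo) / range;
}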

Target values

It is important for the desired response to be chosen within the range of the sigmoid activation function. Otherwise the back propagation algorithm tends to drive the free parameters of the network to infinity, which slows down the learning process by driving the hidden neurons into saturation.

3.5 CHAPTER SUMMARY

In this chapter, we have discussed the general theory of neural networks. Biological neural networks, especially neurons, have been discussed briefly since they play a vital role in our model of the artificial neural network.

The underlying components of artificial neurons, the weights connecting to the neuron and the various types of activation function, have also been introduced in this chapter. The various architectures of neural networks and learning algorithms have also been mentioned briefly.

The concept of the multilayer perceptron and its training algorithm have been discussed in detail, since it will be implemented as the classifier for pattern recognition in a later chapter. Methods to improve the performance of the back propagation algorithm have also been discussed.


CHAPTER FOUR

MOMENTS AS FEATURES FOR PATTERN RECOGNITION

In this chapter, the classification power of all the moments discussed in chapter 2, namely Cartesian moments, Hu moment invariants, Zernike moments and Legendre moments, is experimentally tested and the results are reported. The noise sensitivity of all of the features is also examined.

4.1 THE UTILIZED DATA SET

A data set consisting of Roman characters in different fonts was generated. It consists of 10 lower case Roman characters, from a to j. Ten different fonts (for a total of 100 images) are used, stored as greyscale bitmap images. Figure 4.1 shows the 10 characters for the first font.

Figure 4.1: The ten characters of the alphabet for one font in the data set (scaled up for display).


4.2 FEATURES EXTRACTED


In this experiment, the number of input features extracted using the Hu invariant moments is seven, the number of Zernike moments extracted is 10, and the number of Legendre moments is 12. Table 4.1 lists all the features extracted from one sample of data in the data set.

4.3 NEURAL NETWORK CLASSIFICATION

The multilayer perceptron with back propagation learning will be used as the classifier. All of the details regarding the back propagation algorithm and the multilayer perceptron have been discussed earlier and therefore will not be repeated here.

All of the features for each image will be fed forward into the inputs of the neural network, while 10 outputs will be used, one for each class of character. 50 of the images in the data set will be used for training (where the value of the output is given) and the other 50 will be used as the test set. For the training phase, each datum will have the output of the class it belongs to set to 1, while all other outputs are set to 0.

Six different numbers of hidden units, 0, 5, 10, 15, 20 and 25, are used in the classification experiments. The training error threshold is set at 0.001 for all experiments, while the values of the learning rate and momentum are both set at 0.1.
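As an illustration of how such a configuration maps onto the implementation in Appendix B (the driver code itself is not reproduced in this thesis, so the following is only a sketch; the 12-input layer shown corresponds to the Legendre features), the CBackProp class might be set up as:

#include "BackProp.h"

int main()
{
    // 3 layers: 12 Legendre-moment inputs, 15 hidden neurons, 10 output classes.
    int layerSizes[] = {12, 15, 10};
    double learningRate = 0.1, momentum = 0.1;
    CBackProp net(3, layerSizes, learningRate, momentum);

    // Training would repeatedly call net.bpgt(features, targets) for each of the
    // 50 training images until the mean square error returned by net.mse(targets)
    // falls below the 0.001 threshold; classification of a test image then uses
    // net.ffwd(features) followed by reading net.Out(i) for each output class i.
    return 0;
}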

An experiment to test the noise tolerance of each moment is also conducted. All of the parameters are the same as in the previous experiment, except that this time only one number of hidden units, 15, is used. The network is trained with the same 50 training images as in the previous experiment, but random noise of 0.1%, 0.2%, 0.3%, 0.4% and 0.5% is inserted into the 50 test images.


HMI values: 0.3742, 0.005225, 0.0340, 0.0011, -0.0034, -0.0011, -1.43E-1

CM values: 0.485085, 0.354145, 0.032151, 0.02072, 0.0843867, 0.02568, 0.00211

ZM values: 2.09919, 0.262626, 0.629776, 0.361196, 0.366395, 0.381796, 0.42044, 0.703121, 0.42901, 0.701442

LM values: 0.0052, 0.00975, -0.0759, -0.0402, -0.18954, 0.09376, 0.028075, 0.13361, 0.137221, -0.19553, 0.10315, 0.0641634

Table 4.1: All of the moments (Hu moment invariants, Cartesian moments, Zernike moments and Legendre moments) computed from one sample in the data set.

4.4 RESULT AND DISCUSSION

Figure 4.2 shows the results for all of the moments tested with the neural network. From the graph, it can be seen that the Legendre moments have the highest recognition rates for every number of hidden neurons. The performance of the Legendre moments (as also happens for the other sets of moments) decreases after 15 hidden neurons, probably due to over fitting caused by the small set of data used to train the network.

As for the performance of the moments in the presence of random noise in the images, once again the Legendre moments are shown to be superior to the other types of moments. It can be seen from the graph in Figure 4.3 that the recognition rate of the Legendre moments is the highest for every percentage of noise introduced to the data. This is probably due to the orthogonal nature of the Legendre moments, which leads to less fluctuation in their values in the presence of noise in the data.

Figure 4.2 Comparison of recognition performance between moments.


Figure 4.3: Comparison of performance between different moments under a noisy environment.

4.5 CHAPTER SUMMARY

In this chapter, the methodology used to test the performance of all the moments has been presented. Details of the data set used are given and all of the parameters used for the artificial neural network are presented.

From the experimental results, the Legendre moments are found to be the best feature descriptor, probably due to the orthogonality of the moments. The performance of the moments in the presence of random noise has also been tested, and once again the Legendre moments are shown to be superior to the other types of moments.


CHAPTER FIVE

ANALYSIS OF MULTILAYER PERCEPTRON WITH BACKPROPAGATION LEARNING RULE AS CLASSIFIER

Since the classifier also plays an important role in pattern recognition, in this chapter an analysis is carried out of the performance of the multilayer perceptron with the back propagation learning rule when certain parameters of the neural network are modified.

Firstly, the effect of normalization of the inputs on recognition performance is examined. Then, the impact of adding an additional layer of hidden neurons to the neural network is also investigated and the results discussed. Finally, the impact of the learning rate and momentum is also discussed.

5.1 NORMALIZATION OF INPUT

As discussed earlier in chapter 3, normalization of the inputs involves rescaling all of the input (i.e. feature) values to lie between 0 and 1. There is a common misconception that the inputs to a multilayer perceptron must be in the interval [0, 1]. There is in fact no such requirement, although there are often benefits to standardizing the inputs. Normalization of cases should be approached with caution because it discards information. If that information is irrelevant, then normalizing cases can be quite helpful. If that information is important, then normalizing cases can be disastrous. Here, the effect that normalization of the inputs has on the recognition abilities of the moments will be studied.

5.1.1 EXPERIMENT PROCEDURE

The setup of the neural network is the same as that used in chapter 4 to determine the performance of the different moments. The only difference here is the inputs: all of the inputs are normalized using the method mentioned earlier and the results are observed.

5.1.2 RESULT AND DISCUSSION

Figure 5.2 shows the results obtained by normalizing the inputs, while Figure 5.1 shows the results with regular inputs (without any normalization). It is shown here that normalization has improved the performance of all the moments except the Legendre moments. The performance of the Hu moment invariants and Cartesian moments has improved by at least 10 percent for each number of hidden neurons. The normalization of inputs also seems to make the neural network less prone to over fitting, as the results are more consistent across all numbers of hidden neurons.

Figure 5.1: Performance of Moments without Normalization of Inputs


Figure 5.2: Performance of Moments with Normalized Inputs

5.2 EFFECT OF LEARNING RATE AND MOMENTUM ON PERFORMANCE

In back propagation, too low a learning rate makes the network learn very slowly. Too high a learning rate makes the weights and objective function diverge, so there is no learning at all. Therefore care must be taken to choose the right value of the learning rate.

As discussed in chapter 3, the introduction of momentum can have some beneficial effects on the learning behavior of the algorithm. It may also have the benefit of preventing the learning process from terminating in a shallow local minimum on the error surface. Here, the effects of both the learning rate and the momentum are examined experimentally and the results discussed.


5.2.1 EXPERIMENT CONDUCTED

Two sets of experiments are conducted: one varying the value of the learning rate while keeping the momentum constant at 0.1, and another varying the value of the momentum while keeping the learning rate constant at 0.1. The same data set as in the previous experiments is used. The number of hidden neurons is set at 15 and the threshold value at 0.001. Normalized values of the Legendre moments are used as inputs since, from the earlier experiments, they are shown to be the best feature descriptor.

5.2.2 RESULTS AND DISCUSSION

The results shown in figure 5.3 indicate that the lower the value of the learning rate, the higher the recognition rate. However, with a lower learning rate comes the problem of the large number of iterations needed to reach the required threshold value. As table 5.1 shows, 758709 iterations are needed to reach the threshold for a learning rate of 0.01, compared to 10793 and 1225 iterations for learning rates of 1 and 5, respectively. Since the utmost concern of pattern recognition is recognition ability, it is recommended that a small value be chosen for the learning rate, though consideration must be given to the training time (the number of iterations needed to reach the threshold).

From figure 5.4, it is shown that momentum does help improve the recognition rate, since there is an increase of 6 percent when the momentum value is raised from 0 to 0.1, although a large value of momentum makes the recognition performance decrease. As for the effect of momentum on training, table 5.2 shows that a higher value of momentum decreases the number of iterations required to reach the threshold value.


Figure 5.3: Performance of the Neural Network with Different Values of Learning Rate and Constant Momentum

Learning Rate: 0.01 | 0.1 | 1 | 5 | 10
No. of Iterations: 758709 | 82726 | 10793 | 1225 | 625

Table 5.1: Number of Iterations Needed to Reach the Threshold Value with Different Values of Learning Rate and Constant Momentum


Figure 5.4: Performance of the Neural Network with Different Values of Momentum and Constant Learning Rate

Momentum: 0.1 | 10
No. of Iterations: 82609 | 61109 | 41909 | 13812 | 7655

Table 5.2: Number of Iterations Needed to Reach the Threshold Value with Different Values of Momentum and Constant Learning Rate

From the results obtained, it is thus observed that the inclusion of momentum has a positive effect on back propagation, both on training time and on recognition performance, though care must be taken to choose the right value.

5.3 CHAPTER SUMMARY

In this chapter, the effect of normalization of the inputs on recognition performance has been examined. From the results obtained, it is found that normalization gives better recognition performance for all of the moments except the Legendre moments.

The effects of learning rate and momentum on the performance of the back propagation neural network are also examined. The experiments look at both the recognition percentage and the number of iterations needed to reach the threshold value. The results show that a lower learning rate gives higher recognition performance, though at the cost of many more iterations needed to reach the threshold value. The results also show that the addition of momentum increases the recognition performance and reduces the number of iterations the neural network needs to reach the threshold, although a higher value of momentum also leads to lower recognition rates.


CHAPTER SIX

CONCLUSION

This thesis has discussed the use of statistical moments and artificial neural networks in the area of pattern recognition. The mathematical foundations behind Cartesian moments, Hu moment invariants, Legendre moments and Zernike moments have been discussed. A discussion of the multilayer perceptron with back propagation learning as a classifier has also been given in chapter 3, together with suggestions for improving the neural network's performance.

This thesis has successfully implemented a multilayer perceptron neural network to classify English characters in different fonts. Each moment that has been discussed is tested experimentally and its classification performance determined. The Legendre moments are shown to be the best feature descriptor for both noiseless and noisy data. The neural network is also found to be affected by the phenomenon of over fitting when a large number of hidden neurons is used.

In chapter 5, the effects of normalization of the inputs and of different values of learning rate and momentum on the multilayer perceptron with back propagation learning are analyzed. Normalization of the inputs increases the classification rate of every moment except the Legendre moments. It is also found that normalization of the inputs makes the neural network less susceptible to the over fitting phenomenon. From the results obtained, it is also concluded that a higher classification rate can be achieved by making the learning rate smaller, though many more iterations are needed to reach the threshold value. It is also shown that the presence of momentum in learning can improve the classification performance and decrease the number of iterations needed to reach the threshold value. A higher momentum requires fewer iterations for the neural network to reach the threshold, though it leads to a lower recognition rate.


6.1 SUGGESTIONS FOR FUTURE WORK

In this thesis, moments and a multilayer perceptron neural network have been applied to a certain type of problem. Though it has been shown that the approach can work, much can still be done to improve it further. In addition, more work has to be performed to ensure that these methods work in other situations as well. Some of the suggested tasks are:

6.1.1 A Larger Data Set

A small data set is usually one of the causes of over fitting. Therefore a larger data set would probably make the neural network less prone to over fit.

6.1.2 Hand-written Data Set

The data set used in this thesis contains only word-processing fonts. The subject of hand-written characters could be looked into.


APPENDIX A

CODING FOR CALCULATION OF MOMENTS

All coding is done using the C++ programming language, and the IDE used is Visual Studio 2010.

/******** Cartesian Moments ********/


double BMP::ComputeMoment(int p, int q)
{
	// Raw Cartesian moment m_pq, equation (2.3).
	double mass = 0;
	for (int y = 0; y < Height; ++y)
	{
		for (int x = 0; x < Width; ++x)
		{
			mass = mass + IntPow(x, p) * IntPow(y, q) * (DataArrays[y][x]);
		}
	}
	return mass;
}

double BMP::ComputeMomentNormalized(int p, int q, double xbar, double ybar)
{
	// Central moment mu_pq about the centroid (xbar, ybar), equation (2.5).
	double mass = 0;
	for (int y = 0; y < Height; ++y)
	{
		for (int x = 0; x < Width; ++x)
		{
			mass = mass + pow(x - xbar, p) * pow(y - ybar, q) * (DataArrays[y][x]);
		}
	}
	return mass;
}

double BMP::ScalingNormalization(double p, double q, double upq, double uzero)
{
	// Scale-normalized central moment eta_pq = mu_pq / mu_00^((p+q)/2 + 1), equation (2.6).
	double l;
	double mass;
	l = ((p + q) / 2) + 1;
	mass = upq / (pow(uzero, l));
	return mass;
}

/********** Hu Moment Invariant *********/


void BMP::ComputeHuSet()
{
	moment = new double[7];

	// for Hu moment computation
	double u20, u02, u21, u12, u30, u03, u11, u00, u01;
	double xbar, ybar;
	double n20, n02, n21, n12, n30, n03, n11;

	m00 = (ComputeMoment(0, 0));
	m01 = (ComputeMoment(0, 1));
	m10 = (ComputeMoment(1, 0));
	xbar = m10 / m00;
	ybar = m01 / m00;

	// central moments are computed
	u01 = (ComputeMomentNormalized(0, 1, xbar, ybar));
	u20 = (ComputeMomentNormalized(2, 0, xbar, ybar));
	u02 = (ComputeMomentNormalized(0, 2, xbar, ybar));
	u21 = (ComputeMomentNormalized(2, 1, xbar, ybar));
	u12 = (ComputeMomentNormalized(1, 2, xbar, ybar));
	u30 = (ComputeMomentNormalized(3, 0, xbar, ybar));
	u03 = (ComputeMomentNormalized(0, 3, xbar, ybar));
	u11 = (ComputeMomentNormalized(1, 1, xbar, ybar));
	u00 = (ComputeMomentNormalized(0, 0, xbar, ybar));

	// scaling normalization applied to the central moments:
	n20 = (ScalingNormalization(2, 0, u20, u00));
	n02 = (ScalingNormalization(0, 2, u02, u00));
	n21 = (ScalingNormalization(2, 1, u21, u00));
	n12 = (ScalingNormalization(1, 2, u12, u00));
	n30 = (ScalingNormalization(3, 0, u30, u00));
	n03 = (ScalingNormalization(0, 3, u03, u00));
	n11 = (ScalingNormalization(1, 1, u11, u00));

	// the seven Hu moment invariants (phi_1 ... phi_7)
	moment[0] = (n20 + n02);
	moment[1] = (pow((n20 - n02), 2) + 4 * pow(n11, 2));
	moment[2] = (pow((n30 - 3 * n12), 2) + pow((3 * n21 - n03), 2));
	moment[3] = (pow((n30 + n12), 2) + pow((n21 + n03), 2));
	moment[4] = ((n30 - 3 * n12) * (n30 + n12) * (pow((n30 + n12), 2) - 3 * pow((n21 + n03), 2))
	           + (3 * n21 - n03) * (n21 + n03) * (3 * pow((n30 + n12), 2) - pow((n21 + n03), 2)));
	moment[5] = ((n20 - n02) * (pow((n30 + n12), 2) - pow((n21 + n03), 2))
	           + 4 * n11 * (n30 + n12) * (n21 + n03));
	moment[6] = ((3 * n21 - n03) * (n30 + n12) * (pow((n30 + n12), 2) - 3 * pow((n21 + n03), 2))
	           - (n30 - 3 * n12) * (n21 + n03) * (3 * pow((n30 + n12), 2) - pow((n21 + n03), 2)));
}

/********** Zernike Moment *********/


double BMP::ComputeZernikeMoment(int n, int m)
{
	double M_PI = 3.14159265358979;		// value of pi
	double rho, theta;
	cmplx zernike(0, 0);
	for (int i = 0; i < Width; i++)
	{
		for (int j = 0; j < Height; j++)
		{
			// map x, y to the unit circle
			double y = -1 + (2 * (j + 0.5)) / Height;
			double x = -1 + (2 * (i + 0.5)) / Width;
			theta = 0.0;
			rho = (double)sqrt(x * x + y * y);
			if (rho <= 1.0)
			{
				if (x != 0)
					theta = atan(y / x);
				cmplx pixel = (int)DataArrays[i][j];
				// accumulate f(x,y) * conjugate(V_nm(rho, theta)), equation (2.12)
				zernike += conj(R(n, m, rho) * polar(1.0, m * theta)) * pixel;
			}
		}
	}
	cmplx val = (n + 1) / M_PI;
	zernike = zernike * val;
	double result = abs(zernike / (cmplx)Height);
	return result;
}

// Radial polynomial R_nm(rho), equation (2.15).
// n-|m| must be even and |m| <= n.
double R(int p, int q, double rho)
{
	double val = 0.0, j;
	for (int s = 0; s <= ((p - q) / 2); s++)
	{
		j = -1;
		double num = pow(j, s) * Factorial(p - s);
		double denom = Factorial(s) * Factorial(((p + q) / 2) - s) * Factorial(((p - q) / 2) - s);
		val += (num / denom) * pow(rho, p - 2 * s);
	}
	return val;
}

/********** Legendre Moment *********/


double BMP::ComputeLegendreMoment(double p, double q)
{
	double xnorm, ynorm;
	double mass = 0;
	double lpq;
	// normalization constant (2p+1)(2q+1)/(N*M) of the discrete Legendre moment, equation (2.11)
	lpq = ((2 * (double)p + 1) * (2 * (double)q + 1)) / (Height * Width);
	for (int j = 0; j < Height; ++j)
	{
		for (int i = 0; i < Width; ++i)
		{
			// map pixel indices to the interval [-1, 1]
			xnorm = ((2 * ((double)i + 1)) / ((double)Width - 1)) - 1;
			ynorm = ((2 * ((double)j + 1)) / ((double)Height - 1)) - 1;
			mass = mass + lpq * ComputeLegendrePolynomial(p, xnorm)
			            * ComputeLegendrePolynomial(q, ynorm) * (double)DataArrays[j][i];
		}
	}
	return mass;
}

double BMP::ComputeLegendrePolynomial(double p, double x)
{
	// Legendre polynomial P_p(x) evaluated from its coefficients, equations (2.9) and (2.10).
	double mass = 0;
	double m, n, j, q;
	int g;
	for (int k = 0; k < p + 1; k++)
	{
		g = p - k;
		if (g % 2)
			mass = mass;		// coefficients exist only when p - k is even
		else
		{
			m = p - k;
			n = p + k;
			j = -1;
			q = 2;
			mass = mass + pow(j, m / 2) * (1 / pow(q, p))
			     * ((Factorial(n) * pow(x, k)) / (Factorial(m / 2) * Factorial((n) / 2) * Factorial(k)));
		}
	}
	return mass;
}


APPENDIX B

CODING FOR MULTILAYER PERCEPTRON WITH BACK PROPAGATION LEARNING

The code is used under the GNU General Public License (GPLv3), modified from the original at http://www.codeproject.com/KB/recipes/BP.aspx

Header file:

//////////////////////////////////////////////
//  Fully connected multilayered feed       //
//  forward artificial neural network using //
//  Backpropogation algorithm for training. //
//////////////////////////////////////////////

#ifndef backprop_h
#define backprop_h

#include <assert.h>
#include <iostream>
#include <stdio.h>
#include <math.h>
#include <tchar.h>
#include <string>
#include <cmath>
#include <cctype>
#include <cstring>
#include <fstream>
#include <cstdlib>
#include <iomanip>

using namespace std;
class CBackProp{

	//	output of each neuron
	double **out;

	//	delta error value for each neuron
	double **delta;

	//	vector of weights for each neuron
	double ***weight;

	//	no of layers in net, including input layer
	int numl;

	//	vector of numl elements for size of each layer
	int *lsize;

	//	learning rate
	double beta;

	//	momentum parameter
	double alpha;

	//	storage for weight-change made in previous epoch
	double ***prevDwt;

	//	squashing function
	double sigmoid(double in);

public:

	~CBackProp();

	//	initializes and allocates memory
	CBackProp(int nl,int *sz,double b,double a);

	//	backpropogates error for one set of input
	void bpgt(double *in,double *tgt);

	//	feed forwards activations for one set of inputs
	void ffwd(double *in);

	//	returns mean square error of the net
	double mse(double *tgt) const;

	//	returns i'th output of the net
	double Out(int i) const;

};
#endif

CPP file:

#include "stdafx.h"
#include "BackProp.h"
#include <time.h>
#include <stdlib.h>

//	initializes and allocates memory on heap
CBackProp::CBackProp(int nl,int *sz,double b,double a):beta(b),alpha(a)
{
	//	set no of layers and their sizes
	numl=nl;
	lsize=new int[numl];
	for(int i=0;i<numl;i++){
		lsize[i]=sz[i];
	}

	//	allocate memory for output of each neuron
	out = new double*[numl];
	for( int i=0;i<numl;i++){
		out[i]=new double[lsize[i]];
	}

	//	allocate memory for delta
	delta = new double*[numl];
	for(int i=1;i<numl;i++){
		delta[i]=new double[lsize[i]];
	}

	//	allocate memory for weights
	weight = new double**[numl];
	for(int i=1;i<numl;i++){
		weight[i]=new double*[lsize[i]];
	}
	for(int i=1;i<numl;i++){
		for(int j=0;j<lsize[i];j++){
			weight[i][j]=new double[lsize[i-1]+1];
		}
	}

	//	allocate memory for previous weights
	prevDwt = new double**[numl];
	for(int i=1;i<numl;i++){
		prevDwt[i]=new double*[lsize[i]];
	}
	for(int i=1;i<numl;i++){
		for(int j=0;j<lsize[i];j++){
			prevDwt[i][j]=new double[lsize[i-1]+1];
		}
	}

	//	seed and assign random weights in the range [-1, 1]
	srand((unsigned)(time(NULL)));
	for(int i=1;i<numl;i++)
		for(int j=0;j<lsize[i];j++)
			for(int k=0;k<lsize[i-1]+1;k++)
				weight[i][j][k]=(double)(rand())/(RAND_MAX/2) - 1;

	//	initialize previous weights to 0 for first iteration
	for(int i=1;i<numl;i++)
		for(int j=0;j<lsize[i];j++)
			for(int k=0;k<lsize[i-1]+1;k++)
				prevDwt[i][j][k]=(double)0.0;
}

CBackProp::~CBackProp()
{
	//	free out
	for(int i=0;i<numl;i++)
		delete[] out[i];
	delete[] out;

	//	free delta
	for(int i=1;i<numl;i++)
		delete[] delta[i];
	delete[] delta;

	//	free weight
	for(int i=1;i<numl;i++)
		for(int j=0;j<lsize[i];j++)
			delete[] weight[i][j];
	for(int i=1;i<numl;i++)
		delete[] weight[i];
	delete[] weight;

	//	free prevDwt
	for(int i=1;i<numl;i++)
		for(int j=0;j<lsize[i];j++)
			delete[] prevDwt[i][j];
	for(int i=1;i<numl;i++)
		delete[] prevDwt[i];
	delete[] prevDwt;

	//	free layer info
	delete[] lsize;
}

//	sigmoid function
double CBackProp::sigmoid(double in)
{
	return (double)(1/(1+exp(-in)));
}

//	mean square error
double CBackProp::mse(double *tgt) const
{
	double mse=0;
	for(int i=0;i<lsize[numl-1];i++){
		mse+=(tgt[i]-out[numl-1][i])*(tgt[i]-out[numl-1][i]);
	}
	return mse/2;
}

//	returns i'th output of the net
double CBackProp::Out(int i) const
{
	return out[numl-1][i];
}
}
//	feed forward one set of input
void CBackProp::ffwd(double *in)
{
	double sum;

	//	assign content to input layer
	for(int i=0;i<lsize[0];i++)
		out[0][i]=in[i];				// output_from_neuron(i,j): Jth neuron in Ith layer

	//	assign output (activation) value to each neuron using the sigmoid function
	for(int i=1;i<numl;i++){			// for each layer
		for(int j=0;j<lsize[i];j++){		// for each neuron in the current layer
			sum=0.0;
			for(int k=0;k<lsize[i-1];k++){	// for input from each neuron in the preceding layer
				sum+= out[i-1][k]*weight[i][j][k];	// apply weight to inputs and add to sum
			}
			sum+=weight[i][j][lsize[i-1]];	// apply bias
			out[i][j]=sigmoid(sum);		// apply sigmoid function
		}
	}
}
//	backpropogate errors from the output layer up to the first hidden layer
void CBackProp::bpgt(double *in,double *tgt)
{
	double sum;

	//	update output values for each neuron
	ffwd(in);

	//	find delta for output layer
	for(int i=0;i<lsize[numl-1];i++){
		delta[numl-1][i]=out[numl-1][i]*
			(1-out[numl-1][i])*(tgt[i]-out[numl-1][i]);
	}

	//	find delta for hidden layers
	for(int i=numl-2;i>0;i--){
		for(int j=0;j<lsize[i];j++){
			sum=0.0;
			for(int k=0;k<lsize[i+1];k++){
				sum+=delta[i+1][k]*weight[i+1][k][j];
			}
			delta[i][j]=out[i][j]*(1-out[i][j])*sum;
		}
	}

	//	apply momentum (does nothing if alpha=0)
	for(int i=1;i<numl;i++){
		for(int j=0;j<lsize[i];j++){
			for(int k=0;k<lsize[i-1];k++){
				weight[i][j][k]+=alpha*prevDwt[i][j][k];
			}
			weight[i][j][lsize[i-1]]+=alpha*prevDwt[i][j][lsize[i-1]];
		}
	}

	//	adjust weights using steepest descent
	for(int i=1;i<numl;i++){
		for(int j=0;j<lsize[i];j++){
			for(int k=0;k<lsize[i-1];k++){
				prevDwt[i][j][k]=beta*delta[i][j]*out[i-1][k];
				weight[i][j][k]+=prevDwt[i][j][k];
			}
			prevDwt[i][j][lsize[i-1]]=beta*delta[i][j];
			weight[i][j][lsize[i-1]]+=prevDwt[i][j][lsize[i-1]];
		}
	}
}


REFERENCES
1. Abu-Mostafa Y. S., Recognition aspects of moment invariants, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 6, No. 6, 1984, pp. 698-706.
2. Alt F. L., Digital pattern recognition by moments, Journal of the Association for Computing Machinery, Vol. 9, No. 2, 1962, pp. 240-258.
3. Bailey R. R. and Srinath M., Orthogonal moment features for use with parametric and non-parametric classifiers, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 18, No. 4, 1996, pp. 389-399.
4. Belkasim S. O., Shridhar M. and Ahmadi M., Pattern recognition with moment invariants: A comparative study and new results, Pattern Recognition, Vol. 24, 1991, pp. 1117-1138.
5. Carpenter G. A., Grossberg S., Markuzon N., Reynolds J. H. and Rosen D. B., Fuzzy ARTMAP: A neural network architecture for incremental supervised learning of analog multidimensional maps, IEEE Transactions on Neural Networks, Vol. 3, No. 5, 1992, pp. 698-713.
6. Dirilten H., Pattern matching under affine transformations, IEEE Transactions on Computers, Vol. 26, No. 3, 1977, pp. 314-317.
7. Dudani A., Breeding K. J. and McGhee R. B., Aircraft identification by moment invariants, IEEE Transactions on Computers, Vol. C-26, 1977, pp. 39-45.
8. Elliot E. B., Algebra of Quantics, Oxford University Press, New York, 2nd edition, 1913.
9. Zernike F., Beugungstheorie des Schneidenverfahrens und seiner verbesserten Form, der Phasenkontrastmethode (Diffraction theory of the cut procedure and its improved form, the phase contrast method), Physica, Vol. 1, 1934, pp. 689-704.
10. Gruber M. and Hsu K. Y., Moment-based image normalization with high noise tolerance, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 19, No. 2, 1997, pp. 136-138.
11. Hu M. K., Visual pattern recognition by moment invariants, IRE Transactions on Information Theory, Vol. 8, No. 1, 1962, pp. 179-187.
12. Khotanzad A., Rotation invariant pattern recognition using Zernike moments, International Conference on Pattern Recognition, 1988, pp. 326-328.
13. Khotanzad A. and Lu J. H., Rotation invariant image recognition using features selected via a systematic method, Pattern Recognition, Vol. 23, No. 10, 1990, pp. 1089-1101.
14. Liao S. X. and Pawlak M., On image analysis by moments, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 16, No. 3, 1996, pp. 254-266.
15. Teague M. R., Image analysis via the general theory of moments, Journal of the Optical Society of America, Vol. 70, No. 8, 1980, pp. 920-930.
16. Pawlak M., On the reconstruction aspects of moment descriptors, IEEE Transactions on Information Theory, Vol. 38, No. 6, 1992, pp. 1695-1708.
17. Reddi S. S., Radial and angular moment invariants for image identification, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 3, No. 2, 1981, pp. 240-242.
18. Reiss T. H., The revised fundamental theorem of moment invariants, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 13, No. 8, 1991, pp. 830-834.
19. Sadjadi T. J., Three dimensional moment invariants, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 2, No. 2, 1980, pp. 127-136.
20. Smith F. W. and Wright M. H., Automatic ship photo interpretation by the method of moments, IEEE Transactions on Computers, Vol. 20, No. 9, 1971, pp. 1089-1095.
21. Teague M. R., Image analysis via the general theory of moments, Journal of the Optical Society of America, Vol. 70, No. 8, 1980, pp. 920-930.
22. Teh C. H. and Chin R. T., On image analysis by the method of moments, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 10.
23. Wong R. Y. and Hall E. L., Scene matching with moment invariants, Computer Vision Graphics and Image Processing, Vol. 8, No. 1, 1978, pp. 16-24.
24. Chen C. H., Statistical Pattern Recognition, Hayden, Washington, D.C.
25. Wood J., Invariant pattern recognition: A review, Pattern Recognition, Vol. 29, No. 1, 1996, pp. 1-17.
26. Li B. C., Applications of moment invariants to neurocomputing for pattern recognition, PhD Dissertation, The Pennsylvania State University, 1990.
28. Yudell L. L., Mathematical Functions and Approximations, chapter 11, page 432, Academic Press Inc, 1975.
29. Sacha B. (17 Nov 2006), AI: Neural Network for Beginners (Part 1 of 3), retrieved from http://www.codeproject.com/KB/recipes/NeuralNetwork_1.aspx
30. Sarle W. S., ed. (1997), Neural Network FAQ, part 1 of 7: Introduction, periodic posting to the Usenet newsgroup comp.ai.neural-nets, URL: ftp://ftp.sas.com/pub/neural/FAQ.html
31. Haykin S., Neural Networks: A Comprehensive Foundation, Prentice Hall, 1999.
32. Rumelhart D. E., Hinton G. E. and Williams R. J., 1986a, Learning representations by back-propagating errors, Nature (London), Vol. 323, pp. 533-536.
33. LeCun Y., 1993, Efficient Learning and Second-order Methods, A Tutorial at NIPS 93, Denver.
