Neural Networks
What is a Neural Network?
The human brain is a highly complex, nonlinear and parallel computer. It has the
capability to organize its structural constituents, known as neurons, so as to perform
certain computations many times faster than the fastest digital computer in existence
today.
A neural network is a massively parallel distributed processor made up of simple
processing units, which has a natural propensity for storing experiential knowledge and
making it available for use.
It resembles the brain in two respects:
1. Knowledge is acquired by the network from its environment through a learning process.
2. Interneuron connection strengths, known as synaptic weights, are used to store
the acquired knowledge.
The procedure used to perform the learning process is called a learning algorithm.
Its function is to modify the synaptic weights of the network to attain a desired design
objective.
Neural networks are also referred to as neurocomputers, connectionist networks, or
parallel distributed processors.
http://rajakishor.co.cc Page 2
Benefits of Neural Networks
1. Nonlinearity
2. Input-output mapping
3. Adaptivity
Neural networks have a built-in capability to adapt their synaptic weights to changes in the surrounding environment.
4. Evidential response
5. Contextual information
6. Fault tolerance
7. VLSI implementability
8. Uniformity of analysis and design
9. Neurobiological analogy
Human Brain
The human nervous system may be viewed as a three-stage system.
Central to the nervous system is the brain. It is represented by the neural net. The
brain continually receives the information, perceives it, and makes appropriate decisions.
The arrows pointing from left to right indicate the forward transmission of information –
bearing signals through the system. The arrows pointing from right to left signify the
presence of feedback in the system.
The receptors convert stimuli from the human body or the external environment
into electrical impulses that convey information to the neural net (the brain). The effectors
convert electrical impulses generated by the neural net into discernible responses as
system outputs.
Typically, neurons are five to six orders of magnitude slower than silicon gates.
Events in a silicon chip happen in the nanosecond (10^-9 s) range, whereas neural events
happen in the millisecond (10^-3 s) range.
It is estimated that there are approximately 10 billion neurons and 60 trillion
synapses or connections in the human brain.
Synapses are elementary structural and functional units that mediate the
interactions between neurons. The most common kind of synapse is a chemical synapse.
A chemical synapse operates as follows. A pre-synaptic process liberates a
transmitter substance that diffuses across the synaptic junction between neurons and then
acts on a post-synaptic process. Thus, a synapse converts a pre-synaptic electric signal into
a chemical signal and then back into a post synaptic electrical signal.
Structural organization of levels in the brain
The synapses represent the most fundamental level, depending on molecules and
ions for their action.
A neural microcircuit refers to an assembly of synapses organized into patterns of
connectivity to produce a functional operation of interest.
The neural microcircuits are grouped to form dendritic subunits within the
dendritic trees of individual neurons.
The whole neuron is about 100 μm in size. It contains several dendritic subunits.
The local circuits are made up of neurons with similar or different properties. Each
circuit is about 1mm in size. The neural assemblies perform operations on characteristics
of a localized region in the brain.
The interregional circuits are made up of pathways, columns and topographic
maps, which involve multiple regions located in different parts of the brain.
Topographic maps are organized to respond to incoming sensory information.
The central nervous system is the final level of complexity where the topographic
maps and other interregional circuits mediate specific types of behavior.
Models of a Neuron
A neuron is an information-processing unit that is fundamental to the operation of a
neural network. Its model can be shown in the following block diagram.
A neuron k may be mathematically described by the following pair of equations:

uk = Σj Wkj xj (summed over j = 1, …, m),    yk = φ(uk + bk)    ---- (1)

where x1, x2, …, xm are the input signals; Wk1, Wk2, …, Wkm are the synaptic weights of
neuron k; uk is the linear combiner output due to the input signals; bk is the bias; vk is the
induced local field; φ(.) is the activation function; and yk is the output signal of neuron k.
The use of the bias bk has the effect of applying an affine transformation to the output uk
of the linear combiner.
So, we can have

vk = uk + bk    ---- (2)

and equation (1) may now be written as

yk = φ(vk)    ---- (3)

Due to this affine transformation, the graph of vk versus uk no longer passes through
the origin.
vk is called the induced local field or activation potential of neuron k. Equivalently, the
bias may be accounted for by adding a new synapse whose input is x0 = +1 and whose
weight is Wk0 = bk.
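As a concrete sketch of equations (1)-(3), the neuron's computation can be written directly in code. The weights, inputs, and the choice of a logistic activation below are illustrative assumptions, not values from the text:

```python
# Sketch of the neuron model: weighted sum, bias, activation (illustrative values).
import math

def neuron_output(x, w, b):
    """Compute y_k = phi(v_k) with v_k = sum_j W_kj * x_j + b_k.

    A logistic function stands in for the activation phi(.).
    """
    u = sum(wj * xj for wj, xj in zip(w, x))  # linear combiner output u_k
    v = u + b                                 # induced local field v_k
    return 1.0 / (1.0 + math.exp(-v))         # activation phi(v_k)

def neuron_output_folded(x, w, b):
    """Same neuron, with the bias folded in as weight W_k0 = b_k on input x_0 = +1."""
    return neuron_output([1.0] + list(x), [b] + list(w), 0.0)

y1 = neuron_output([0.5, -1.0], [2.0, 0.5], 0.1)
y2 = neuron_output_folded([0.5, -1.0], [2.0, 0.5], 0.1)
assert abs(y1 - y2) < 1e-9  # the two formulations agree
```

The folded form shows why treating the bias as an extra synapse is convenient: the update rules later in these notes then apply uniformly to all weights, bias included.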
Types of Activation Function
1. Threshold function
2. Piecewise-linear function
3. Sigmoid function
The activation function defines the output of a neuron in terms of the induced local
field vk.
1. Threshold function
The function is defined as

φ(v) = 1, if v ≥ 0
       0, if v < 0

Correspondingly, the output of neuron k is

yk = 1, if vk ≥ 0
     0, if vk < 0

where

vk = Σj Wkj xj + bk (summed over j = 1, …, m)
This model is also called the McCulloch-Pitts model. In this model, the output of a
neuron is 1 if the induced local field of that neuron is nonnegative, and 0 otherwise. This
statement describes the all-or-none property of the model.
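A minimal sketch of a McCulloch-Pitts neuron with the threshold activation; the weights and bias below are illustrative choices that realize a two-input AND gate, not values from the text:

```python
def threshold(v):
    """Threshold activation: 1 if v >= 0, else 0 (all-or-none property)."""
    return 1 if v >= 0 else 0

def mcp_neuron(x, w, b):
    """McCulloch-Pitts neuron: threshold applied to the induced local field."""
    v = sum(wj * xj for wj, xj in zip(w, x)) + b
    return threshold(v)

# With weights (1, 1) and bias -1.5 the neuron fires only when both inputs are 1.
outputs = [mcp_neuron([a, c], [1, 1], -1.5) for a in (0, 1) for c in (0, 1)]
assert outputs == [0, 0, 0, 1]
```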
2. Piecewise-linear function
The activation function here is defined as

φ(v) = 1,  if v ≥ +1/2
       v,  if −1/2 < v < +1/2
       0,  if v ≤ −1/2

where the amplification factor inside the linear region of operation is assumed to be unity.
Two situations can be observed for this function:
A linear combiner arises if the linear region of operation is maintained without
running into saturation.
The piecewise-linear function reduces to a threshold function if the amplification
factor of the linear region is made infinitely large.
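The definition above translates directly into code; this is a sketch of the function as given, with illustrative test values:

```python
def piecewise_linear(v):
    """Piecewise-linear activation with unity amplification in the linear region:
    saturates at 1 for v >= +1/2, at 0 for v <= -1/2, and is linear in between."""
    if v >= 0.5:
        return 1.0
    if v <= -0.5:
        return 0.0
    return v

assert piecewise_linear(2.0) == 1.0    # saturated high
assert piecewise_linear(-2.0) == 0.0   # saturated low
assert piecewise_linear(0.25) == 0.25  # linear region: acts as a linear combiner
```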
3. Sigmoid function
This is the most common form of activation function used in the construction of
artificial neural networks. It is defined as a strictly increasing function that exhibits a
graceful balance between linear and nonlinear behavior.
An example of a sigmoid function is the logistic function, which is defined as

φ(v) = 1 / (1 + e^(−av))

where a is the slope parameter of the sigmoid function.
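A sketch of the logistic function, with a check of the standard limiting behavior (the specific slope values are illustrative):

```python
import math

def logistic(v, a=1.0):
    """Logistic sigmoid phi(v) = 1 / (1 + exp(-a*v)); a is the slope parameter."""
    return 1.0 / (1.0 + math.exp(-a * v))

assert abs(logistic(0.0) - 0.5) < 1e-12  # sigmoid is centered at 0.5
# As the slope parameter a grows, the logistic approaches the threshold function.
assert logistic(1.0, a=50.0) > 0.999
assert logistic(-1.0, a=50.0) < 0.001
```

This limiting behavior is why the threshold function can be seen as a special case of the sigmoid with an infinitely large slope.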
N eural N etworks and D irected G raphs
The neural network can be represented through a signal-flow graph. A signal-flow
graph is a network of directed links (branches) that are interconnected at certain points
called nodes. A typical node j has an associated node signal xj. A typical directed link
originates at node j and terminates on node k; it has an associated transfer function or
transmittance that specifies the manner in which the signal yk at node k depends on the
signal xj at node j.
The flow of signals in the various parts of the graph is directed by three basic rules.
Rule-1:
A signal flows along a link only in the direction defined by the arrow on the link.
There are two types of links:
Synaptic links: whose behavior is governed by a linear input-output relation.
Here, we have yk = Wkjxj.
Activation links: whose behavior is governed in general by a nonlinear input-output
relation. Here, we have yk = φ(xj).
For example,
Rule-2:
A node signal equals the algebraic sum of all signals entering the pertinent node via
the incoming links. This is also called synaptic convergence or fan-in.
For example,
Rule-3:
The signal at a node is transmitted to each outgoing link originating from that node.
For example,
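Taken together, the three rules can be sketched in code; the graph, weights, and signals below are illustrative, not taken from the figures:

```python
# Sketch of the three signal-flow rules on a tiny hypothetical graph.
def synaptic_link(w, x):
    """Rule-1, synaptic link: linear transmittance y = w * x along the arrow."""
    return w * x

def fan_in(signals):
    """Rule-2, synaptic convergence: node signal is the sum of incoming signals."""
    return sum(signals)

def fan_out(signal, n_links):
    """Rule-3: the node signal is transmitted unchanged on each outgoing link."""
    return [signal] * n_links

# Two links converge on one node, whose signal then fans out to three links.
v = fan_in([synaptic_link(2.0, 1.5), synaptic_link(-1.0, 0.5)])
assert v == 2.5
assert fan_out(v, 3) == [2.5, 2.5, 2.5]
```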
For example,
The above network is a feedforward or acyclic type. This is also called a single-layer
network. The single layer refers to the output layer as computations take place only at the
output nodes.
The source nodes in the input layer supply respective elements of the
activation pattern (input vector), which constitute the input signals applied
to the second layer.
The output signals of the second layer are used as inputs to the
third layer, and so on for the rest of the network.
The set of output signals of the neurons in the output layer constitutes the
overall response of the network to the activation pattern supplied by the
source nodes in the input layer.
The above is a recurrent network with no hidden neurons. The presence of feedback
loops has an impact on the learning capability of the network and on its performance.
Moreover, the feedback loops involve the use of unit-delay elements (denoted by z^-1),
which result in nonlinear dynamical behavior of the network.
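A minimal sketch of such a feedback loop: a single neuron whose delayed output is fed back as an extra input. The tanh activation, the feedback weight, and the input sequence are all illustrative assumptions:

```python
import math

def run_recurrent(inputs, w_in=1.0, w_fb=0.5):
    """Simulate one neuron with a unit-delay (z^-1) feedback loop."""
    y_prev = 0.0            # z^-1 element: output delayed by one time step
    outputs = []
    for x in inputs:
        v = w_in * x + w_fb * y_prev   # external input plus delayed feedback
        y = math.tanh(v)               # nonlinear activation -> nonlinear dynamics
        outputs.append(y)
        y_prev = y
    return outputs

# An impulse input: the response persists through the loop but decays (w_fb < 1).
ys = run_recurrent([1.0, 0.0, 0.0, 0.0])
assert ys[0] == math.tanh(1.0)
assert all(abs(ys[i]) < abs(ys[i - 1]) for i in range(1, 4))
```

Even after the external input returns to zero, the network keeps producing output, which is exactly the memory effect the feedback loop introduces.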
Knowledge Representation
Knowledge refers to stored information or models used by a person or machine to
interpret, predict, and appropriately respond to the outside world.
Knowledge representation involves the following:
1. Identifying the information that is to be processed
2. Physically encoding the information for subsequent use
Knowledge representation is goal directed. In real-world applications of “intelligent”
machines, a good solution depends on a good representation of knowledge.
A major task for a neural network is to provide a model of the environment
in which it is embedded. Knowledge of the world consists of two kinds of information:
1. Prior information: It gives the known state of the world. It is represented by facts
about what is and what has been known.
2. Observations: These are measurements of the world, obtained by sensors
that probe the environment in which the neural network operates.
The set of input-output pairs, with each pair consisting of an input signal and the
corresponding desired response is called a set of training data or training sample.
Ex: Handwritten digit recognition.
The training sample consists of a large variety of handwritten digits that are
representative of a real-time situation. Given such a set of examples, the design of a neural
network may proceed as follows:
Step-1: Select an appropriate architecture for the NN, with an input layer consisting of
source nodes equal in number to the pixels of an input image, and an output layer
consisting of 10 neurons (one for each digit). A subset of examples is then used to
train the network by means of a suitable algorithm. This phase is the learning
phase.
Step-2: The recognition performance of the trained network is tested with data not seen
before. Here, an input image is presented to the network without its corresponding
digit. The NN compares the input image with the stored representations of the digits
and then produces the required output digit. This phase is called the generalization phase.
Note: The training data for a NN may consist of both positive and negative examples.
Example: A simple neuronal model for recognizing handwritten digits.
Consider an input set X of key patterns X1, X2, X3, ……
Each key pattern represents a specific handwritten digit.
The network has k neurons.
Let W = {w1j(i), w2j(i), w3j(i), ……}, for j = 1, 2, 3, …, k, be the set of weights of X1, X2,
X3, ….. with respect to each of the k neurons in the network; i refers to an instance.
Let y(j) be the generated output of neuron j for j=1,2,…k.
Let d(j) be the desired output of neuron j, for j=1,2,…..k.
Let e(j)= d(j) – y(j) be the error that is calculated at neuron j, for j = 1,2,…,k.
Now we design the neuronal model for the system as follows.
In the above model, each neuron computes a specific digit j. Every key pattern has
synapses to every neuron in the model. We assume that the weights of
each key pattern can be either 0 or 1.
Ex: Let the key pattern x1 correspond to the handwritten digit 1. Then its synaptic weight
W11(i) should be 1 for the 1st neuron, and all other synaptic weights for x1 must be 0.
Weight matrix for the above model can be as follows.
The two inputs Xi and Xj are said to be similar if the distance d(Xi, Xj) is minimal.
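The distance d(Xi, Xj) is not spelled out above; for binary key patterns the Hamming distance (number of differing positions) is a natural choice. The sketch below uses that assumption, with illustrative patterns:

```python
# Similarity between binary key patterns via Hamming distance (an assumed metric,
# not the text's own definition of d).
def hamming(p, q):
    """Number of positions at which two equal-length patterns differ."""
    return sum(pi != qi for pi, qi in zip(p, q))

def most_similar(x, key_patterns):
    """Return the index of the stored key pattern closest to input x."""
    return min(range(len(key_patterns)), key=lambda j: hamming(x, key_patterns[j]))

keys = [(1, 1, 0, 0), (0, 0, 1, 1)]
assert most_similar((1, 0, 0, 0), keys) == 0  # one bit away from the first pattern
assert most_similar((0, 1, 1, 1), keys) == 1  # one bit away from the second pattern
```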
Basic Learning Laws
A neural network learns about its environment through an interactive process of
adjustments applied to its synaptic weights and bias levels. The network becomes more
knowledgeable after each iteration of the learning process.
Learning is a process by which the free parameters of a neural network are adapted
through a process of stimulation by the environment in which the network is embedded.
The operation of a neural network is governed by neuronal dynamics. Neuronal
dynamics consists of two parts: one corresponding to the dynamics of the activation state
and the other corresponding to the dynamics of the synaptic weights.
The Short Term Memory (STM) in neural networks is modeled by the activation
state of the network. The Long Term Memory (LTM) corresponds to the encoded pattern
information in the synaptic weights due to learning.
Learning laws are merely implementation models of synaptic dynamics. Typically, a
model of synaptic dynamics is described in terms of expressions for the first derivative of
the weights. They are called learning equations.
Learning laws describe the weight vector for the ith processing unit at time instant
(t+1) in terms of the weight vector at time instant (t) as follows:
Wi(t+1) = Wi(t) + ΔWi(t)

where ΔWi(t) is the change in the weight vector.
There are different methods for implementing the learning feature of a neural
network, leading to several learning laws. Some basic learning laws are discussed below.
All these learning laws use only local information for adjusting the weight of the connection
between two units.
Hebb’s Law
Here the change in the weight vector is given by

ΔWi(t) = η f(WiT a) a

Therefore, the jth component of ΔWi is given by

Δwij = η f(WiT a) aj
     = η si aj,  for j = 1, 2, …, M

where η is the learning rate parameter, si is the output signal of the ith unit, and a is the input vector.
Hebb’s law states that the weight increment is proportional to the product of
the input data and the resulting output signal of the unit. This law requires the weights
to be initialized to small random values around wij = 0 prior to learning. This law
represents unsupervised learning.
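A minimal sketch of one Hebbian update, assuming the form Δwij = η si aj with a linear output function (the learning rate, weights, and input are illustrative):

```python
# Sketch of Hebb's law: delta_w_ij = eta * s_i * a_j.
def hebb_update(w, a, eta=0.1, f=lambda v: v):
    """One Hebbian step: compute s_i = f(W_i^T a), then grow each weight by eta*s_i*a_j."""
    s = f(sum(wj * aj for wj, aj in zip(w, a)))   # output signal s_i
    return [wj + eta * s * aj for wj, aj in zip(w, a)]

w = [0.02, -0.01]     # small random values around zero, per the law's requirement
a = [1.0, 0.5]
w = hebb_update(w, a)
# Since s_i > 0 here, each weight moves in the direction of its input component.
assert w[0] > 0.02 and w[1] > -0.01
```

Note the purely local character: the update for wij uses only the unit's own output and the jth input, with no desired output anywhere, which is why this is unsupervised.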
Perceptron Learning Law
The perceptron law is applicable only for bipolar output functions f(.). This is also
called the discrete perceptron learning law. The change in the weight is given by

Δwij = η [di − si] aj,  for j = 1, 2, …, M

where di is the desired output and si = f(WiT a) is the actual output. The expression
for Δwij shows that the weights are adjusted only if the actual output si is incorrect,
since the term in the square brackets is zero for the correct output.
This is a supervised learning law, as the law requires a desired output for each
input. In implementation, the weights can be initialized to any random initial values, as
they are not critical. The weights converge to the final values eventually by repeated use of
the input-output pattern pairs, provided the pattern pairs are representable by the system.
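A minimal sketch of the discrete perceptron law, assuming the update Δwij = η (di − si) aj with a bipolar sign output; the training pairs and learning rate are illustrative:

```python
# Sketch of the discrete perceptron learning law with bipolar outputs.
def sign(v):
    """Bipolar output function: +1 or -1."""
    return 1 if v >= 0 else -1

def perceptron_update(w, a, d, eta=0.5):
    """One supervised step: weights change only when the output s is wrong."""
    s = sign(sum(wj * aj for wj, aj in zip(w, a)))
    return [wj + eta * (d - s) * aj for wj, aj in zip(w, a)]

w = [0.0, 0.0]   # initial weights are not critical; any values work
data = [((1.0, 1.0), 1), ((-1.0, -1.0), -1)]
for _ in range(10):  # repeated presentation of the pattern pairs
    for a, d in data:
        w = perceptron_update(w, a, d)
# Both (representable) pattern pairs end up correctly classified.
assert all(sign(sum(wj * aj for wj, aj in zip(w, a))) == d for a, d in data)
```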
Widrow and Hoff LMS Learning Law

In this law, the change in the weight is proportional to the error between the desired
output and the output of the linear combiner:

Δwij = η [di − WiT a] aj,  for j = 1, 2, …, M

This is a supervised learning law, also called the least mean squared (LMS) error
learning law, and it is independent of the output function f(.). In implementation, the
weights may be initialized to any values.
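A minimal sketch of the Widrow-Hoff (LMS) rule, assuming the standard form Δwij = η (di − WiT a) aj; the input, target, and learning rate below are illustrative:

```python
# Sketch of the Widrow-Hoff LMS rule: error is taken on the linear combiner output.
def lms_update(w, a, d, eta=0.2):
    """One LMS step; returns the updated weights and the pre-update error."""
    err = d - sum(wj * aj for wj, aj in zip(w, a))   # d_i - W_i^T a
    return [wj + eta * err * aj for wj, aj in zip(w, a)], err

w = [0.0, 0.0]   # weights may start at any values
for _ in range(50):
    w, err = lms_update(w, [1.0, 0.5], 1.0)
assert abs(err) < 1e-3   # the error shrinks toward zero as the weights converge
```

Unlike the perceptron law, the error here is continuous-valued, so the weights keep adjusting smoothly even when the sign of the output is already correct.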
Instar (Winner-take-all) Learning Law
All the inputs are connected to each of the units in the output layer in a feedforward
manner. For a given input vector a, the output from each unit i is computed using the
weighted sum WiT a. The unit k that gives the maximum output is identified. That is,

WkT a = max over all i of (WiT a)
Then the weight vector leading to the kth unit is adjusted as follows:

ΔWk = η (a − Wk)

Therefore,

Δwkj = η (aj − wkj),  for j = 1, 2, …, M.
The final weight vector tends to represent a group of input vectors within a small
neighbourhood. This is a case of unsupervised learning. In implementation, the values of
the weight vectors are initialized to random values prior to learning, and the vector lengths
are normalized during learning.
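A minimal sketch of one instar step, assuming the update Δwkj = η (aj − wkj) applied only to the winning unit (the two-unit weight matrix and the input are illustrative):

```python
# Sketch of instar (winner-take-all) learning: only the winner's weights move.
def instar_step(W, a, eta=0.5):
    """Find the unit with maximal W_i^T a, pull its weights toward a; return winner index."""
    dots = [sum(wj * aj for wj, aj in zip(w, a)) for w in W]
    k = max(range(len(W)), key=lambda i: dots[i])           # winning unit
    W[k] = [wj + eta * (aj - wj) for wj, aj in zip(W[k], a)]  # delta = eta*(a - W_k)
    return k

W = [[1.0, 0.0], [0.0, 1.0]]          # two output units
k = instar_step(W, [0.9, 0.1])        # input near (1, 0)
assert k == 0                         # unit 0 wins
assert abs(W[0][0] - 0.95) < 1e-9     # its weights move halfway toward the input
assert W[1] == [0.0, 1.0]             # the losing unit is untouched
```

Repeated over many inputs, each unit's weight vector drifts toward the mean of the input vectors it wins, which is the clustering behavior described above.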
Outstar Learning Law

The outstar learning law is also related to a group of units arranged in a layer as
shown below.
In this law the weights are adjusted so as to capture the desired output pattern
characteristics. The adjustment of the weights is given by
Δwjk = η (dj − wjk),  for j = 1, 2, …, M
where the kth unit is the only active unit in the input layer. The vector d = (d1, d2, …, dM)T is
the desired response from the layer of M units.
The outstar learning is a supervised learning law, and it is used with a network of
instars to capture the characteristics of the input and output patterns for data compression.
In implementation, the weight vectors are initialized to zero prior to learning.
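A minimal sketch of outstar learning for a single active input unit k, assuming the update Δwjk = η (dj − wjk); the desired pattern and learning rate are illustrative:

```python
# Sketch of outstar learning: weights from the active unit k converge to the
# desired output pattern d of the M-unit layer.
def outstar_step(Wk, d, eta=0.5):
    """One step: pull each weight w_jk toward the desired output d_j."""
    return [wjk + eta * (dj - wjk) for wjk, dj in zip(Wk, d)]

Wk = [0.0, 0.0, 0.0]        # weights initialized to zero prior to learning
d = [1.0, 0.5, -0.5]        # desired response of the M = 3 output units
for _ in range(30):
    Wk = outstar_step(Wk, d)
# The weight vector captures the desired output pattern.
assert all(abs(wv - dj) < 1e-6 for wv, dj in zip(Wk, d))
```

The symmetry with the instar law is visible in the code: an instar pulls weights toward the input pattern, an outstar pulls them toward the desired output pattern.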
Pattern Recognition
Data refers to a collection of raw facts, whereas a pattern refers to an observed
sequence of facts.
The main difference between human and machine intelligence comes from the fact
that humans perceive everything as a pattern, whereas for a machine everything is data.
Even in routine data consisting of integer numbers (like telephone numbers, bank account
numbers, car numbers) humans tend to perceive a pattern. If there is no pattern, then it is
very difficult for a human being to remember and reproduce the data later.
Thus, storage and recall operations in human beings and machines are performed by
different mechanisms. The pattern nature of storage and recall automatically gives
robustness and fault tolerance to the human system.
Pattern recognition tasks
Pattern recognition is the process of identifying a specified sequence that is hidden
in a large amount of data.
Following are the pattern recognition tasks.
1. Pattern association
2. Pattern classification
3. Pattern mapping
4. Pattern grouping
5. Feature mapping
6. Pattern variability
7. Temporal patterns
8. Stability-plasticity dilemma
Basic ANN Models for Pattern Recognition Problems
1. Feedforward ANN
Pattern association
Pattern classification
Pattern mapping/classification
2. Feedback ANN
Autoassociation
Pattern storage (LTM)
Pattern environment storage (LTM)
3. Feedforward and Feedback (Competitive Learning) ANN
Pattern storage (STM)
Pattern clustering
Feature mapping
In any pattern recognition task we have a set of input patterns and the
corresponding output patterns. Depending on the nature of the output patterns and the
nature of the task environment, the problem could be identified as one of association or
classification or mapping.
The given set of input-output pattern pairs form only a few samples of an unknown
system. From these samples the pattern recognition model should capture the
characteristics of the system.
Without looking into the details of the system, let us assume that the input-output
patterns are available or given to us. Without loss of generality, let us also assume that the
patterns could be represented as vectors in multidimensional spaces.