
Lecture 1

Lecturer: (M.Sc.) Miss. Bushra K.U.

Introduction to Neural Networks

References:
1- Lippmann R. P., "An Introduction to Computing with Neural Nets", IEEE ASSP Magazine, April 1987.
2- Zurada J. M., "Introduction to Artificial Neural Systems", Jaico Publishing House, 1996.
3- Pham D. T. and Xing L., "Neural Networks for Identification, Prediction and Control", Springer-Verlag London Limited, 1997.
4- Ruan D., "Intelligent Hybrid Systems: Fuzzy Logic, Neural Networks and Genetic Algorithms", Kluwer Academic Publishers, 1997.
5- Stergiou C. and Siganos D., "Neural Networks", http:/www.WINDOWS.000\Desktop\NN\Neural Networks.html, 2005.


Intelligent systems include: Artificial Neural Networks, Fuzzy Systems, and Genetic Algorithms.

1- Introduction:
Intelligent control is the discipline in which control algorithms are developed by emulating certain characteristics of intelligent biological systems, for example:
1) Neural Networks (NN): try to emulate the low-level biological functions of the brain to solve difficult control problems (after training).
2) Fuzzy Systems: can be designed to emulate the human deductive process, that is, the process people use to infer conclusions from what they know. They use collections of rules, called knowledge bases or rule bases, that hold a set of If-Then rules quantifying the expert's knowledge about solving a particular problem.
3) Genetic Algorithms: here the goal is to embody the principles of evolution, natural selection and genetics from natural biological systems in a computer algorithm.
Intelligent control is now becoming a common tool in many engineering and industrial applications. Intelligent control must have the following features:
1) learning ability and adaptability
2) robustness
3) a simple control algorithm.


2- Neural Networks:
2.1 What is a Neural Network?
Artificial neural networks are computational (mathematical) models of the human brain. They are composed of a large number of highly interconnected processing elements (neurons) working in unison to solve specific problems. A neuron is a computational element that defines the characteristics of the input/output relationship.
2.2 Historical background
The most basic element of the human brain is a specific type of cell which provides us with the abilities to remember, think, and apply previous experiences to our every action. These cells are known as neurons; each of them can connect with up to 200,000 other neurons. The power of the brain comes from the number of these basic components and the multiple connections between them. In 1909, Cajal and Purkinje presented a model of the nerve cell in which the neuron consists of four regions:
1- Cell body (Soma): provides the support functions and structure of the cell.
2- Axon: a branching fiber which carries signals away from the neuron.
3- Dendrites: consist of more branching fibers which receive signals.
4- Synapses: the electrochemical contacts between neurons.

Fig.(1) Schematic diagram of biological neuron



2.3 Artificial Neural Networks (ANN)

Fig. (2) Artificial (simulated) neuron: the inputs x1, x2, ..., xn are multiplied by the weights w1, w2, ..., wn, summed together with a bias, and passed through an activation function f(.) to produce the output.

The output of the artificial neuron is

a = f\left( \sum_{i=1}^{n} w_i x_i + b \right)

In 1943, McCulloch and Pitts proposed the mathematical model of the neuron shown in Fig. (2). In the figure, f is the activation function, which may be linear or nonlinear and is commonly one of the following types.


Types of activation function

a- Hard limiter (On-Off)

f(n) = \begin{cases} 1 & n \ge 0 \\ -1 & n < 0 \end{cases}    (bipolar)

f(n) = \begin{cases} 1 & n \ge 0 \\ 0 & n < 0 \end{cases}    (unipolar)

b- Linear (ramp activation function)

f(n) = n    (bipolar)

f(n) = \begin{cases} n & n \ge 0 \\ 0 & n < 0 \end{cases}    (unipolar)

c- Threshold logic unit (TLU), saturation

f(n) = \begin{cases} a & n > b \\ k\,n & -b \le n \le b \\ -a & n < -b \end{cases}    (bipolar)

f(n) = \begin{cases} a & n > b \\ k\,n & 0 < n \le b \\ 0 & n \le 0 \end{cases}    (unipolar)


d- Sigmoid activation function

Modern NNs use the sigmoid nonlinearity, which is also known as the logistic, semilinear or squashing function:

f(n) = \frac{1}{1 + e^{-n}}    (unipolar)

where
n = 0:  f(n) = 0.5
n \to +\infty:  f(n) \to 1
n \to -\infty:  f(n) \to 0

f(n) = \frac{2}{1 + e^{-n}} - 1    (bipolar)

where
n = 0:  f(n) = \frac{2}{1+1} - 1 = 0
n \to +\infty:  f(n) \to \frac{2}{1+0} - 1 = 1
n \to -\infty:  f(n) \to \frac{2}{1+\infty} - 1 = -1

e- Hyperbolic tangent function: similar to the sigmoid in shape but symmetric about the origin

\tanh(n) = \frac{e^{n} - e^{-n}}{e^{n} + e^{-n}}

where
n = 0:  f(n) = 0
n \to +\infty:  f(n) \to 1
n \to -\infty:  f(n) \to -1


Ex1: Find y for the following neuron if X1=0.5, X2=1, X3=-0.7, W1=0, W2=-0.3, W3=0.6.

(The neuron has the three inputs X1, X2, X3 weighted by w1, w2, w3; there is no bias.)

Net = X1*W1 + X2*W2 + X3*W3
Net = (0.5*0) + (1*-0.3) + (-0.7*0.6) = -0.72

1) if f is linear:
y = -0.72
2) if f is a hard limiter (on-off):
y = -1
3) if f is sigmoid:
y = \frac{1}{1 + e^{0.72}} = 0.327
4) if f is tanh:
y = \frac{e^{-0.72} - e^{0.72}}{e^{-0.72} + e^{0.72}} = -0.6169
5) if f is a TLU with b=0.6, a=3:
y = -3 (since Net < -b)

Ex2: Find y for the following neuron if X1=0.5, X2=1, X3=-0.7, W1=0, W2=-0.3, W3=0.6, b=1 (the same neuron as in Ex1, with an additional bias input of 1 weighted by b).

Net = X1*W1 + X2*W2 + X3*W3 + 1*b
Net = (0.5*0) + (1*-0.3) + (-0.7*0.6) + (1*1) = 0.28

1) if f is linear:
y = 0.28
2) if f is a hard limiter:
y = 1
3) if f is sigmoid:
y = \frac{1}{1 + e^{-0.28}} = 0.569
4) if f is tanh:
y = \frac{e^{0.28} - e^{-0.28}}{e^{0.28} + e^{-0.28}} = 0.272
5) if f is a TLU (b=0.6, a=3, unity slope k):
y = 0.28 (since 0 < Net < b)
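As a quick numerical check of Ex1 and Ex2, the following Python sketch (not part of the original notes; the function names and the unity TLU slope are assumptions) computes the net input and applies each activation function:

import math

def hard_limiter(n):                 # bipolar on-off activation
    return 1 if n >= 0 else -1

def sigmoid(n):                      # unipolar logistic function
    return 1.0 / (1.0 + math.exp(-n))

def tlu(n, a=3.0, b=0.6, k=1.0):     # bipolar threshold logic unit (saturating ramp)
    if n > b:
        return a
    if n < -b:
        return -a
    return k * n

def net(x, w, bias=0.0):             # weighted sum of inputs plus bias
    return sum(xi * wi for xi, wi in zip(x, w)) + bias

x, w = [0.5, 1.0, -0.7], [0.0, -0.3, 0.6]
for bias in (0.0, 1.0):              # bias = 0 reproduces Ex1, bias = 1 reproduces Ex2
    n = net(x, w, bias)
    print(round(n, 2), round(sigmoid(n), 3), round(math.tanh(n), 4),
          hard_limiter(n), round(tlu(n), 2))
# Ex1 row: Net = -0.72, sigmoid ≈ 0.327, tanh ≈ -0.617, hard limiter = -1, TLU = -3
# Ex2 row: Net =  0.28, sigmoid ≈ 0.570, tanh ≈  0.273, hard limiter =  1, TLU = 0.28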

1.3 Why use neural networks?

- NNs were developed to realize simplified mathematical models of brain-like systems.
- Neural models are simple and the model computation is fast.
- NNs take a different approach to problem solving.
- NNs have the ability to solve difficult and complicated problems.
- NNs have the ability to model multi-dimensional nonlinear relationships.
- The error between the training data and the outputs is fed back to the NN to guide the internal weight updates of the network.

Lecture 2
Lecturer:(M.Sc.) Miss. Bushra K.U.

Types of Neural Networks

2.4 Types of Neural Networks (NNs):

NNs can be classified by their structure and by their learning algorithm.

Structures:
* Feedforward networks. Examples: the multi-layer perceptron (MLP) {Rumelhart and McClelland, 1986}; the learning vector quantization (LVQ) network {Kohonen, 1989}; the cerebellar model articulation controller (CMAC) {Albus, 1975}; the group-method of data handling (GMDH) network {Hecht-Nielsen, 1990}.
* Recurrent networks. Examples: the Hopfield network {Hopfield, 1982}; the Elman network {Elman, 1990}; the Jordan network {Jordan, 1986}.

Learning algorithms:
* Supervised learning. Examples: the delta rule {Widrow and Hoff, 1960}; the back-propagation algorithm {Rumelhart and McClelland, 1986}; the LVQ algorithm {Kohonen, 1989}.
* Unsupervised learning. Examples: the Kohonen learning algorithm {Kohonen, 1989}; the Carpenter-Grossberg Adaptive Resonance Theory (ART) competitive learning algorithm {Carpenter and Grossberg, 1988}.

The accompanying figures show an example of a simple feedforward network and an example of a recurrent network.


2.4.1 NN Topology
The topology of a NN describes factors such as how many interconnections there are for each neuron, that is, whether each neuron is connected to a few other neurons, to many other neurons, or to all other neurons in the network.
The term "fully interconnected" NN refers to network models in which the output of each neuron may be connected to the inputs of all the neurons (in the next layer). In a partially ("not fully") interconnected NN, on the other hand, the output of a given neuron is allowed to connect only to certain of its neighbours.

2.4.2 NN structure

In terms of their structure, neural networks can be divided into two types:
1) Feedforward networks: in a feedforward network the neurons are generally grouped into layers. Signals flow from the input layer through to the output layer via unidirectional connections, the neurons being connected from one layer to the next, but not within the same layer. An example of a feedforward network is the multi-layer perceptron.
2) Recurrent networks: the outputs of some neurons are fed back to the same neurons or to neurons in preceding layers, so signals can flow in both forward and backward directions (bidirectional). Examples of recurrent networks are the Hopfield network (1982), the Elman network (1990), and the Jordan network (1986).


2.4.3 Learning algorithm categorization

Many researchers have classified NNs on the basis of their learning strategies, i.e. on whether learning is achieved in the presence of a supervisor (or teacher) or not:
1) Supervised learning: occurs when a teacher or trainer provides the NN with information during learning to teach the network the correct and incorrect responses (the training patterns are input/output pairs), as in the single-layer and multi-layer perceptron.
2) Unsupervised learning: no information about the output is provided and the network cannot know the correct response to a specific pattern (the training patterns are input patterns only), as in the Kohonen neural network.
3) Reinforcement learning: occurs when the network monitors itself and corrects errors in the interpretation of data by feedback through the network, as in genetic algorithms.

2.5 Single layer Perceptron

Figure: a single-layer perceptron. An input layer (buffer) with inputs x1, ..., xn is fully connected through weights wji to an output layer of neurons with outputs y1, ..., ym, each output neuron also receiving a bias input of 1 weighted by bj.

The connection weights and the threshold in a perceptron can be fixed or adapted using a number of different algorithms. Here we use the Basic Delta Rule (BDR), which can be described as follows:


Step 1: Initialize weights and thresholds
The initial weights must be 1- non-zero, 2- small, 3- random values.
Step 2: Present a new continuous-valued input x0, x1, ..., xn along with the desired output d(t).
Step 3: Calculate the actual output

y_j(t) = f\left( \sum_{i=1}^{n} w_{ji} x_i + b_j \right)

Step 4: Adapt the weights

w_{ji}(new) = w_{ji}(old) + \Delta w_{ji}
\Delta w_{ji} = \eta\, \delta_j\, x_i
\delta_j = yd_j - y_j

where yd = desired output, y = actual output, and \eta = learning rate (positive gain) between 0 and 1.

Step 5: Calculate the error

e_j = yd_j - y_j

and the Root Mean Square error

RMS = \sqrt{ \frac{ \sum_{k=1}^{NOD} \left( yd(k) - y(k) \right)^2 }{ NOD } }

where NOD is the number of training data.

Step 6: If the error is less than the desired error, finish; otherwise go to Step 3.
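A minimal Python sketch of the BDR loop above, applied to a single neuron with a unipolar hard-limiter output learning the logical AND function (the training data, learning rate and stopping threshold are illustrative assumptions, not values from the lecture):

import random

def hard_limiter(n):                    # unipolar on-off activation
    return 1.0 if n >= 0 else 0.0

# Training pairs for a logical AND gate (inputs, desired output)
patterns = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]

random.seed(0)
w = [random.uniform(-0.5, 0.5) for _ in range(2)]   # Step 1: small non-zero random weights
b = random.uniform(-0.5, 0.5)                       # bias treated like an extra weight
eta = 0.2                                           # learning rate in (0, 1)

for epoch in range(100):
    sse = 0.0
    for x, yd in patterns:                          # Steps 2-3: present pattern, compute output
        y = hard_limiter(sum(wi * xi for wi, xi in zip(w, x)) + b)
        delta = yd - y                              # Step 4: delta = desired - actual
        w = [wi + eta * delta * xi for wi, xi in zip(w, x)]
        b += eta * delta
        sse += delta ** 2                           # Step 5: accumulate squared error
    rms = (sse / len(patterns)) ** 0.5
    if rms < 1e-4:                                  # Step 6: stop when the error is small enough
        break

print("weights:", w, "bias:", b, "epochs used:", epoch + 1)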


The BDR procedure is summarized by the following flowchart:

START → Initialize weights & bias → Present input & desired output (training pairs) → Calculate actual output → Calculate error (yd − ya) → Is error < 0.0001? If No, update the weights to minimize the error and repeat from the presentation step; if Yes, END.


Lecture 3
Lecturer: (M.Sc.) Miss. Bushra K. U.

Multi layer Perceptron

2.6 Multi layer Perceptron (MLP)

Multi-layer perceptrons are feedforward networks with one or more layers of nodes between the input and output nodes. The figure below shows an MLP with three layers: an input layer, an output layer and an intermediate or hidden layer. Neurons in the input layer only act as buffers, distributing the input signals to the neurons in the hidden layer. Each neuron in the hidden layer sums up its input signals after weighting them with the strengths of the respective connections from the input layer and computes its output as a function of that sum. The outputs of the neurons in the output layer are computed similarly.
Figure: an MLP with an input layer (buffer) of inputs x1, ..., xn, a hidden layer with outputs O1, ..., Ol and biases bj, and an output layer with outputs y1, ..., ym and biases bk.

The additional layers are called hidden because they do not interact directly with the outside world.

2.6.1 Back-propagation (BP) algorithm

Back-propagation (or back-error-propagation), a gradient descent algorithm, is the most commonly adopted MLP training algorithm. It was first presented in 1974 by Werbos and then independently re-invented in 1986 by Rumelhart et al. The learning procedure involves the presentation of a set of pairs of input and output patterns. The neural network first uses the input pattern to produce its own output pattern and then compares this with the desired output, or target, pattern. If there is no difference between the actual output and the target pattern, no learning takes place; otherwise, the connection weights are changed to reduce the difference. The algorithm uses gradient descent to minimize the root mean square error (or mean square error) between the actual output of the multilayer feedforward perceptron and the desired output. It requires continuously differentiable nonlinearities; the following assumes a sigmoid nonlinearity.
Step 1: Initialize weights and thresholds
The initial weights must be 1- non-zero, 2- small, 3- random values.
Step 2: Present a new continuous-valued input x0, x1, ..., xn along with the desired output yd.
Step 3: Calculate the actual output

o_j = f\left( \sum_{i=1}^{n} w_{ji} x_i + b_j \right)    (hidden layer)

y_k = f\left( \sum_{j=1}^{m} w_{kj} o_j + b_k \right)    (output layer)

Step 4: Adapt the weights

Use a recursive algorithm starting at the output nodes and working back to the hidden layer. Adjust the weights by:

Output layer:
w_{kj}(new) = w_{kj}(old) + \Delta w_{kj}
\Delta w_{kj} = \eta\, \delta_k\, o_j
\delta_k = f'_k(net_k)\,( yd_k - y_k )

where, for the sigmoid, f' = y(1 - y).

Hidden layer:
w_{ji}(new) = w_{ji}(old) + \Delta w_{ji}
\Delta w_{ji} = \eta\, \delta_j\, x_i
\delta_j = f'_j(net_j) \sum_{k=1}^{m} \delta_k w_{kj}

where yd = desired output, y = actual output, and \eta = learning rate (positive gain) between 0 and 1.

Step 5: Calculate the error

e_j = yd_j - y_j

and the Root Mean Square error

RMS = \sqrt{ \frac{ \sum_{k=1}^{NOD} \left( yd(k) - y(k) \right)^2 }{ NOD } }

Step 6: If the error is less than the desired error, finish; otherwise go to Step 2.
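The following Python sketch (illustrative only; the 2-3-1 network size, learning rate, epoch count and XOR data are assumptions) walks through the forward pass, the output- and hidden-layer deltas, and the weight updates described above:

import math, random

def sigmoid(n):
    return 1.0 / (1.0 + math.exp(-n))

# XOR training pairs (inputs, desired output)
data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]

n_in, n_hid, eta = 2, 3, 0.5
w_h = [[random.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hid)]
b_h = [random.uniform(-0.5, 0.5) for _ in range(n_hid)]
w_o = [random.uniform(-0.5, 0.5) for _ in range(n_hid)]
b_o = random.uniform(-0.5, 0.5)

def forward(x):
    o = [sigmoid(sum(w_h[j][i] * x[i] for i in range(n_in)) + b_h[j])
         for j in range(n_hid)]
    y = sigmoid(sum(w_o[j] * o[j] for j in range(n_hid)) + b_o)
    return o, y

for epoch in range(20000):
    for x, yd in data:
        o, y = forward(x)                                  # Step 3: actual outputs
        d_k = y * (1 - y) * (yd - y)                       # output delta: f'(net)(yd - y)
        d_j = [o[j] * (1 - o[j]) * d_k * w_o[j]            # hidden deltas
               for j in range(n_hid)]
        for j in range(n_hid):                             # Step 4: adapt weights
            w_o[j] += eta * d_k * o[j]
            b_h[j] += eta * d_j[j]
            for i in range(n_in):
                w_h[j][i] += eta * d_j[j] * x[i]
        b_o += eta * d_k

# If training gets stuck in a local minimum, re-run with different random initial weights.
for x, yd in data:
    print(x, yd, round(forward(x)[1], 2))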


Lecture 4(part1)
Lecturer: (M.Sc.) Miss. Bushra K. U.

Notes

The bias
Some networks employ a bias unit as part of every layer except the output layer.

Figure: a bias unit with constant value 1 feeds each hidden neuron (through weights bj) and each output neuron (through weights bk), alongside the ordinary inputs x1, ..., xn and the outputs y1, ..., ym.

These units have a constant activation value of 1 (or -1); their weights may be adjusted during learning. The bias unit provides a constant term in the weighted sum, which results in an improvement in the convergence properties of the network. It is equivalent to translating the sigmoid curve to the left or the right, where the bias \theta appears in the sigmoid function as

f = \frac{1}{1 + e^{-(wx + \theta)}}

The parameter \theta_o in the sigmoid function

f = \frac{1}{1 + e^{-(wx + \theta)/\theta_o}}

has the effect of modifying the shape of the sigmoid: a low value of \theta_o tends to make the sigmoid take on the characteristics of a TLU, whereas a high value results in a gently varying function.

The sigmoid function

- The sigmoid function is desirable because it has a simple derivative (it is differentiable) and it provides a form of AGC (automatic gain control):

f(x) = \frac{1}{1 + e^{-x}}, \qquad f'(x) = y(1 - y) \quad \text{where } y = f(x)

Example: prove that

1) if f(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}} then f'(x) = 1 - y^2

2) if f(x) = \frac{2}{1 + e^{-x}} - 1 then f'(x) = \frac{1}{2}(1 - y^2)
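As a reminder of where the f'(x) = y(1 - y) result comes from (a standard calculus derivation, not part of the original notes):

\[
f(x) = \frac{1}{1+e^{-x}} = y
\quad\Longrightarrow\quad
f'(x) = \frac{e^{-x}}{\left(1+e^{-x}\right)^{2}}
      = \frac{1}{1+e^{-x}} \cdot \frac{e^{-x}}{1+e^{-x}}
      = y\,(1-y),
\]
since \(1 - y = \dfrac{e^{-x}}{1+e^{-x}}\). The two identities in the exercise follow in the same way.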

Note: before training the net, a decision has to be made on the setting of the learning rate \eta. Theoretically, the larger the learning rate, the faster the training process goes. In practice, however, the learning rate may have to be set to a small value (0.5 to 0.6) in order to prevent the training process from being trapped at a local minimum or from producing an oscillatory response.

Momentum term
The main drawback of BP is trapping in local minima. One way to increase the learning rate without leading to oscillation is to modify the back-propagation rule by adding a momentum term:

w_{ji}(t+1) = w_{ji}(t) + \eta\, \delta_j x_i + \alpha \left( w_{ji}(t) - w_{ji}(t-1) \right)

where \alpha (0 < \alpha < 1) is the momentum term.
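A minimal runnable illustration of this update for a single weight (the learning rate, momentum value and the sequence of local gradient terms delta*x are made-up numbers):

eta, alpha = 0.5, 0.9            # learning rate and momentum term, 0 < alpha < 1
w, prev_dw = 0.1, 0.0            # current weight and previous change w(t) - w(t-1)

for grad in [0.20, 0.15, 0.10]:  # successive values of delta_j * x_i
    dw = eta * grad + alpha * prev_dw   # gradient step plus momentum
    w, prev_dw = w + dw, dw
    print(round(w, 4))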



Convergence
The network does not leave a local minimum under the standard BP algorithm; therefore special techniques should be used to get out of a local minimum, for example:
a- Change the learning rate or the momentum term.
b- Start the learning process again with different initial weights.
c- Add small random values to the weights (e.g. 10% of the values of oscillatory weights).
d- Avoid repeated or noisy data.
e- Increase the number of hidden units (e.g. by 10%).
General useful notes
a- Initialize the weights between (-0.1 to 0.1) or (-0.5 to 0.5).
b- Use a fixed or adjustable bias; its weight is updated by the same learning algorithm.
c- Use a variable learning rate \eta (e.g. high at the beginning, then smaller as the convergence state is approached) or an adaptive \eta.
d- Usually the input patterns are scaled before presenting them to the net, to avoid saturating the neurons.
e- In modeling applications, the transformation function of the output neurons is chosen to be linear, f(y) = y, to obtain an unlimited range for the output.


Lecture 5
NN classification and Hopfield network
Lecturer: (M.Sc.) Miss. Bushra K. U.

Neural Network classification according to input

* Binary input:
  - Supervised: Hopfield NN, Hamming NN
  - Unsupervised: Carpenter and Grossberg (ART) NN
* Continuous input:
  - Supervised: SLP, MLP
  - Unsupervised: Kohonen NN

Hopfield NN
In the beginning of the 1980s Hopfield published two scientific papers about his network. This network normally accepts binary and bipolar inputs (+1 or -1). It has a single layer of neurons, each connected to all the others, giving it a recurrent structure. Hopfield networks, or associative networks, are typically used for classification. Given a distorted input vector, the Hopfield network associates it with an undistorted pattern stored in the network. This net is most appropriate when exact binary representations are possible, as with black and white images where the input elements are pixel values, or with ASCII text where the input values could represent bits in the 8-bit ASCII representation of each character. The Hopfield NN is shown in the figure below.

Figure: a Hopfield network with nodes X1, X2, ..., Xn and outputs Y1, Y2, ..., Yn; every node is connected to all the others.


It has N nodes containing hard-limiting nonlinearities, with binary inputs and outputs taking on the values +1 and -1; the output of each node is fed back to all other nodes via weights, as mentioned above. The operation of this net is as follows: first, the weights are set using the given recipe from the exemplar patterns of all classes. Then an unknown pattern is imposed on the net at time zero by forcing the output of the net to match the unknown pattern. The net then iterates in discrete time until the outputs no longer change.


Hopfield Net Algorithm

Step 1: Assign connection weights. The weights wij of the network are assigned directly as follows:

w_{ij} = \begin{cases} \sum_{s=1}^{M} x_i^s x_j^s & i \ne j \\ 0 & i = j \end{cases}    for the (s) patterns

where wij is the connection weight from node i to node j, x_i^s (which is either +1 or -1) is the ith component of the training input pattern for class s, M is the number of classes and N is the number of neurons (or the number of components in the input pattern).

Step 2: Initialize with the unknown input pattern

y_i(0) = x_i, \qquad 1 \le i \le N

where yi(t) is the output of node i at time t and xi, which can be +1 or -1, is element i of the unknown input pattern.

Step 3: Iterate until convergence

y_j(t+1) = f_h\left( \sum_{i=1}^{N} w_{ij}\, y_i(t) \right), \qquad 1 \le j \le N

The function fh is the hard-limiting nonlinearity defined below. The process is repeated until the node outputs remain unchanged with further iterations. The node outputs then represent the exemplar pattern that best matches the unknown input.

f_h(x) = \begin{cases} 1 & x \ge 0 \\ -1 & x < 0 \end{cases}

Step 4: Repeat by going to Step 3.
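A minimal Python sketch of this recall procedure (the two stored patterns, the distorted input and the synchronous update are illustrative assumptions; Hopfield recall is also often done asynchronously):

def sign(x):                                   # hard limiter f_h
    return 1 if x >= 0 else -1

# Two stored exemplar patterns (bipolar, N = 6 components)
patterns = [[1, -1, 1, -1, 1, -1],
            [1, 1, 1, -1, -1, -1]]
N = len(patterns[0])

# Step 1: weights from the exemplars, with a zero diagonal
w = [[0 if i == j else sum(p[i] * p[j] for p in patterns)
      for j in range(N)] for i in range(N)]

# Step 2: present a distorted version of the first pattern
y = [1, -1, -1, -1, 1, -1]

# Step 3: iterate until the outputs stop changing
while True:
    y_new = [sign(sum(w[i][j] * y[i] for i in range(N))) for j in range(N)]
    if y_new == y:
        break
    y = y_new

print(y)    # settles on the stored pattern closest to the input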


Lecture 6
Lecturer: (M.Sc.) Miss. Bushra K.U.
Hamming Network

Hamming Network

It is used when the inputs are binary valued. The net selects the winner from the stored patterns (x(m), m = 1, 2, ..., M) which has the least Hamming distance from the input vector.
The Hamming distance is the number of bits in the input which do not match the corresponding exemplar bits.
The Hamming network consists of two parts. The first is the lower subnet, which calculates N minus the Hamming distance to the M exemplar patterns. These matching scores range from 0 to the number of elements (N) in the input and are highest for those nodes corresponding to the classes with exemplars that best match the input.
The second part is the upper subnet (MAXNET), to which the M matching scores are presented. The input is then removed and the MAXNET iterates until the output of only one node is positive (the largest one). Classification is then complete and the selected class is that corresponding to the node with the positive output.
Hamming net algorithm

Step 1: Assign connection weights and biases

In the lower subnet:

w_{ji} = \frac{x_i^j}{2}, \qquad \theta_j = \frac{N}{2}, \qquad 1 \le i \le N,\ 1 \le j \le M

or, in matrix form,

w = \frac{1}{2} \begin{pmatrix} x_1^1 & x_2^1 & \cdots & x_N^1 \\ x_1^2 & x_2^2 & \cdots & x_N^2 \\ \vdots & & & \vdots \\ x_1^M & x_2^M & \cdots & x_N^M \end{pmatrix}

where N is the number of input nodes and M is the number of exemplar (stored) patterns.

In the upper subnet:

w_{kl} = \begin{cases} 1 & k = l \\ -\varepsilon & k \ne l \end{cases}, \qquad \text{bias} = 0, \qquad 0 < \varepsilon < \frac{1}{M}, \qquad 1 \le l, k \le M

where wji is the connection weight from input i to node j in the lower subnet and \theta_j is the bias (threshold) in that node; wkl is the connection weight from node k to node l in the upper subnet, and all the biases in this subnet are zero; x_i^j is element i of exemplar j.

Step 2: Initialize with the unknown input pattern

The output of the lower subnet is

net_j = \sum_{i=1}^{N} w_{ji} x_i + \theta_j, \qquad y_j^{(0)} = f_t(net_j)

where y_j^{(0)} is the output of node j in the lower subnet at time zero, xi is element i of the input vector, and ft is the threshold-logic nonlinearity.

Step 3: Iterate until convergence

The output of the upper subnet is

y_k^{(t+1)} = f_t\left( \sum_{l=1}^{M} w_{kl}\, y_l^{(t)} \right), \qquad 1 \le k \le M

This process is repeated until convergence, after which the output of only one node remains positive.

Step 4: Repeat by going to Step 3.
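An illustrative Python sketch of the Hamming net recall (the exemplar patterns, the value of epsilon and the simple threshold-logic function are assumptions chosen for the example):

def f_t(x):                        # threshold-logic nonlinearity: pass positive values, else zero
    return x if x > 0 else 0.0

# Exemplar (stored) patterns, bipolar, N = 4 inputs, M = 2 classes
exemplars = [[1, -1, 1, -1],
             [1, 1, -1, -1]]
N, M = 4, 2
eps = 0.4                          # 0 < eps < 1/M

# Lower subnet: matching score = N - Hamming distance, via w = x/2 and theta = N/2
x = [1, -1, 1, 1]                  # unknown input (closest to exemplar 0)
y = [f_t(sum(e[i] / 2 * x[i] for i in range(N)) + N / 2) for e in exemplars]

# Upper subnet (MAXNET): iterate until only one node stays positive
while sum(1 for v in y if v > 0) > 1:
    y = [f_t(y[k] - eps * sum(y[l] for l in range(M) if l != k)) for k in range(M)]

winner = max(range(M), key=lambda k: y[k])
print("selected class:", winner)   # class 0 for this input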


Figure: the Hamming network. The lower subnet computes the matching scores Y1^0, ..., Ym^0 from the inputs X1, ..., Xn (each node has threshold n/2); the upper subnet (MAXNET) iterates on Y1, ..., Ym until a single winner remains. The inset shows the threshold-logic function f(net).

Lecture 7

Kohonen's Self organizing Feature Map

Lecturer: (M.Sc.) Miss. Bushra K. U.


Kohonen's Self Organizing Feature Map
The Self Organizing Feature Map, introduced by Kohonen in 1982, uses unsupervised learning and continuous data. It sets itself up as the human brain does, by putting nodes that have similar features close together and making stronger weighted connections between them than to farther nodes.
A Kohonen network has two layers, an input layer to receive the input and an output layer. Neurons in the output layer are usually arranged into a two-dimensional grid, as shown in Figure (1). Output nodes are extensively interconnected with many local connections, and each output neuron is connected to all input neurons. The weights of these connections form the components of the reference vector associated with the given output neuron. Continuous-valued input vectors are presented sequentially in time without specifying the desired output.

Figure (1): Kohonen SOM with two-dimensional neighborhood and input vector

The basic feature of this net is the concept of excitatory learning within a neighborhood around the winning neuron, which slowly decreases in size with each iteration. Weights between input and output nodes are initially set to small random values and an input is presented without a desired output.



Kohonen self-organizing algorithm
Step 1: Initialize weights
Initialize the weights from the N inputs to the M output nodes to small random values. Set the initial radius of the neighborhood.
Step 2: Present a new input.
Step 3: Compute the distance to all nodes
Compute the distance dj between the input and each output node j using the Euclidean distance

d_j = \sum_{i=1}^{N} \left( x_i(t) - w_{ji}(t) \right)^2

where xi(t) is the input to node i at time t and wji(t) is the weight from input node i to output node j at time t.
Step 4: Select the output node with minimum distance. Select node j* as the output node with minimum dj.
Step 5: Update the weights of node j* and its neighbors
Weights are updated for node j* and all nodes in the neighborhood defined by NE_{j*}(t). The new weights are

w_{ji}(t+1) = w_{ji}(t) + \eta(t)\left( x_i(t) - w_{ji}(t) \right), \qquad j \in NE_{j*}(t), \quad 1 \le i \le N

The term \eta(t) is a gain term (0 < \eta(t) < 1) that decreases in time, for example

\eta(t) = K_a e^{-t/T_a}

where Ka < 1 and Ta is the decay constant of the learning rate, which ranges from 1000 to 10000.
Step 6: Repeat by going to Step 2.
The weights eventually converge and are fixed after the gain term in Step 5 is reduced to zero.
In a well-trained Kohonen network, output neurons that are close to one another have similar reference vectors.
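A small Python sketch of Steps 1-6 (the 1-D arrangement of output nodes, the neighborhood schedule and the random 2-D input data are assumptions made for the illustration):

import math, random

random.seed(0)
N, M = 2, 10                                    # input dimension, number of output nodes
w = [[random.random() for _ in range(N)] for _ in range(M)]   # Step 1: small random weights

def eta(t, Ka=0.9, Ta=1000.0):                  # decaying gain term
    return Ka * math.exp(-t / Ta)

def radius(t):                                  # shrinking neighborhood radius
    return max(0, int(round(3 * math.exp(-t / 500.0))))

for t in range(2000):
    x = [random.random(), random.random()]      # Step 2: present a new input
    # Step 3: squared Euclidean distance to every output node
    d = [sum((x[i] - w[j][i]) ** 2 for i in range(N)) for j in range(M)]
    j_star = min(range(M), key=lambda j: d[j])  # Step 4: winning node
    # Step 5: update the winner and its neighbors (map treated as a 1-D line of nodes)
    for j in range(M):
        if abs(j - j_star) <= radius(t):
            for i in range(N):
                w[j][i] += eta(t) * (x[i] - w[j][i])

# Step 6 is the loop above; nearby nodes tend to end up with similar weight vectors.
print([tuple(round(v, 2) for v in wj) for wj in w])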


Lecture 8
Lecturer:(M.Sc.) Miss. Bushra K.U.

Fuzzy Logic controller

Fuzzy System
1- Introduction:
Fuzzy systems (or controllers) can be designed to emulate the human deductive process, that is, the process people use to infer conclusions from what they know.
They use collections of rules, called knowledge bases or rule bases, that hold a set of If-Then rules which quantify the expert's knowledge about solving a problem.
There are applications of fuzzy system theory to control (robots, mobile robots, engines, motors, washing machines, vacuum cleaners, etc.), to signal processing (fuzzy filters, fuzzy signal detection), to medical diagnosis, securities, data compression, and so on.
Figure (1): A block diagram of a fuzzy control system. The reference input is compared with the actual plant output; the error e and its change are fed to the fuzzy logic controller (FLC), which drives the plant.


Why fuzzy control?
1- Simple, easy to implement technology.
2- The controller can be developed along linguistic lines, which have a close association with the field of artificial intelligence.
3- Smooth controller behavior.
4- Results are easy to transfer from product to product.
5- They can deal successfully with processes that are multi-variable, inherently nonlinear, and time varying in nature, or with ill-defined systems of unknown dynamics.


2- Basic structure of a fuzzy logic controller (FLC)

Lotfi A. Zadeh (1965) developed the theory of fuzzy sets and algorithms by extending the classical concept of a set. Unlike classical logic, in which elements either do or do not belong to a set, the degree of membership of an element of a fuzzy set can take on any value in the interval [0,1]. Fuzzy logic offers a framework for representing imprecise, uncertain knowledge. Similar to the way in which human beings make their decisions, fuzzy systems use a mode of approximate reasoning, which allows them to deal with vagueness and incomplete information.
Fuzzy logic was first applied to the design of a controller for a dynamic plant by Mamdani in 1974. A block diagram of a fuzzy control system is shown in the figure below. The fuzzy logic controller is composed of the following four elements:

1- Fuzzification: converts the measured "crisp" inputs to "fuzzy" values such as Positive Big (PB) or Negative Small (NS) (i.e. it converts the controller inputs into information that the inference mechanism can easily use to activate and apply rules).

2- The knowledge base (rule base): contains a set of rules, or a relation matrix representing those rules (it contains a fuzzy logic quantification of the expert's linguistic description of how to achieve good control).

3- Decision making (also called the inference engine, inference mechanism, fuzzy inference module or computational unit): simulates the inference mechanism in humans. It produces the fuzzy control action using fuzzy implication (it emulates the expert's decision making in interpreting and applying knowledge about how best to control the plant).


4- Defuzzification: an interface unit between the decision-making unit and the process which converts the fuzzy output set to a crisp output (it converts the conclusions of the inference mechanism into actual inputs for the process).

Figure (2): The fuzzy controller. The crisp error signal is fuzzified, processed by the inference engine using the knowledge base, and the resulting fuzzy output is defuzzified into a crisp control signal for the plant.


2-1 Crisp set: a collection of objects of any kind: numbers, points, chairs, pencils, etc.
A crisp set can be represented by two methods:
1- List method: list the members of the set.
Universe set: 1, 2, 3, 4, 5, 6, ..., 1000
Crisp set: 3, 4, 5, 6
2- Rule method: take only the members which satisfy the rule.
Rule: take only the houses higher than 15 m.


In a crisp set, an element u can either belong or not belong to a set A (i.e. the degree to which element u belongs to set A is either 1 or 0):

\mu_A(u) = \begin{cases} 1 & \text{if and only if } u \in A \\ 0 & \text{if and only if } u \notin A \end{cases}, \qquad \mu_A : U \to [0,1]

The function \mu_A is called the membership function.

The operations that can be done with crisp sets (within a universe set U) are:
a- Complement: A'
b- Union: A ∪ B
c- Intersection: A ∩ B

Example: if the universal set = all students (male and female) in a class,
Set A = all students younger than 20 years
Set B = male students
Find A', B', A ∩ B, A ∪ B.
1- A' = all students older than 20 years
2- B' = all female students
3- A ∩ B = all male students who are younger than 20
4- A ∪ B = all male students, and all female students younger than 20 years.
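These crisp-set operations map directly onto Python's built-in set type; the small class of students below is hypothetical:

# Universe: students represented as (name, age, gender)
U = {("Ali", 19, "M"), ("Sara", 22, "F"), ("Omar", 25, "M"), ("Lina", 18, "F")}

A = {s for s in U if s[1] < 20}       # students younger than 20
B = {s for s in U if s[2] == "M"}     # male students

print(U - A)                          # complement A': students not younger than 20
print(A & B)                          # intersection: male students younger than 20
print(A | B)                          # union: male students or students younger than 20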


2.2 Fuzzy set

A fuzzy set has a dual representation: a linguistic term and a numeric value through its membership function, which maps elements in a universe of discourse (input space) to their membership degree in the set.
A set in classical set theory always has a sharp boundary because membership in a set is a black and white concept: everything is either black or white, and an object either belongs to the set or does not belong to the set at all.
A fuzzy set is always associated with a linguistic term, and also with a membership value. This offers us two important benefits. First, the association makes it easier for human experts to express their knowledge, because linguistic terms are easily understandable and comprehensible. Second, the membership value makes it possible to compute with the data and obtain an output.
For instance, in crisp set theory, if someone is taller than 1.8 meters we can state that such a person belongs to the "set of tall people". However, such a sharp change from a "short person" at 1.799 meters to a "tall person" at 1.8001 meters is against common sense.

Example: let us have a set of five pencils located in a box. Determine a fuzzy set A of "short pencils".
A = {p1/0.2, p2/0.5, p3/1.0, p4/1.0, p5/0.9}
P3 and p4 are exactly short, p5 is almost short, p2 is more or less short and p1 is almost exactly not short.

Figure: membership function for the linguistic term "short".


The operator "very" is usually defined as a concentration operator:

very u = u^2
very (very u) = (u^2)^2

Example: the composite term "very old" can be obtained from the term "old" as
very old = old^2
Example: consider the fuzzy set of short pencils
A = {pencil1/0.2, pencil2/0.5, pencil3/1.0, pencil4/1.0, pencil5/0.9}
Then a fuzzy set of very short pencils can be determined as
B = {pencil1/0.04, pencil2/0.25, pencil3/1.0, pencil4/1.0, pencil5/0.81}
Note: less (very u) = u, i.e. \sqrt{u^2} = u.

These operations can be implemented on the membership functions:

1- Complement
\mu_{\bar{A}}(u) = 1 - \mu_A(u)

2- Intersection
\mu_{A \cap B}(u) = \min\left( \mu_A(u), \mu_B(u) \right)

3- Union
\mu_{A \cup B}(u) = \max\left( \mu_A(u), \mu_B(u) \right)
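A short Python sketch of the complement, intersection, union and "very" operations, applied to the "short pencils" set above (the second fuzzy set "long" is an assumption added so the union and intersection have something to combine):

short = {"p1": 0.2, "p2": 0.5, "p3": 1.0, "p4": 1.0, "p5": 0.9}
long_ = {"p1": 0.9, "p2": 0.6, "p3": 0.1, "p4": 0.0, "p5": 0.2}   # hypothetical second set

very_short     = {u: m ** 2 for u, m in short.items()}            # concentration: very u = u^2
not_short      = {u: 1 - m for u, m in short.items()}             # complement
short_and_long = {u: min(short[u], long_[u]) for u in short}      # intersection (min)
short_or_long  = {u: max(short[u], long_[u]) for u in short}      # union (max)

print(very_short)   # {'p1': 0.04, 'p2': 0.25, 'p3': 1.0, 'p4': 1.0, 'p5': 0.81}
print(not_short)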


2.2.1 Membership function

A membership function is a curve that defines how each point in the universe of discourse, or input space, is mapped to a membership value (or degree of membership) between 0 and 1, which is the only condition that must really be satisfied by a membership function. One of the most important properties of membership functions is that they provide a gradual transition from regions completely outside a set to regions completely inside the set.
Membership functions can take many shapes, such as:
1- triangular and trapezoidal
2- quadratic
3- Gaussian
4- sharp peak

Figure: typical membership function shapes (triangular, trapezoidal, Gaussian, quadratic).


2.2.2 Basic fuzzy set operations

The three fundamental operations on fuzzy sets are union, intersection, and complement.
1- Union (OR): the union D of fuzzy sets A and B is given by
\mu_D(u) = \max\{ \mu_A(u), \mu_B(u) \}, \quad \forall u \in U
2- Intersection (AND): the intersection C of fuzzy sets A and B is defined by
\mu_C(u) = \min\{ \mu_A(u), \mu_B(u) \}, \quad \forall u \in U
3- Complement (NOT):
\mu_{\bar{A}}(u) = 1 - \mu_A(u)


2.3 Fuzzy If-Then Rules

Fuzzy controllers make use of many IF-THEN rules to achieve their desired tasks. These rules are given in the following form:

IF (antecedent) THEN (consequent)

or

IF (a is A) AND (b is B) THEN (c is C)

where A, B, and C are linguistic terms of fuzzy sets, which represent the linguistic labels associated with the fuzzy sets and specify their meaning (for example: SMALL, MEDIUM, LARGE, HIGH, TALL, SHORT, FAST).
a and b are the variables that represent the inputs of the fuzzy controller.
c is the output variable.
(a is A) AND (b is B) is called the antecedent and describes a condition.
(c is C) is called the consequent and describes a conclusion.

2.4 The working of a fuzzy system (fuzzy inference system)

Fuzzy inference is the actual process of mapping from a given input to an output using fuzzy logic. The process involves all the elements that we have discussed previously: membership functions, fuzzy logic operators, and if-then rules.
The following are the different steps needed to implement the fuzzy inference system:
1- fuzzify the inputs
2- apply the fuzzy operator (fuzzy matching)
3- apply the implication method (inference)
4- aggregate all outputs (combine the fuzzy conclusions)
5- defuzzify

2.4.1 Fuzzify the inputs

This step takes the inputs and determines the degree to which they belong to each of the appropriate fuzzy sets via the membership functions.
The input is always a crisp value which lies within its universe of discourse. The output is a fuzzy degree of membership, which is always between 0 and 1. So fuzzification is simply looking at the membership function and finding the fuzzy degree of membership.
2.4.2 Apply the fuzzy operator
In this step we calculate the degree to which the input data match the conditions of the fuzzy rules. If the antecedent has more than one part, the fuzzy operator is applied to obtain one number that represents the result of the antecedent for that rule.
2.4.3 Apply the inference method
This step calculates each rule's conclusion based on its degree of matching. There are two main methods: the clipping method and the scaling method.
2.4.3.1 The clipping method
This method cuts off the top of the membership function wherever its value is higher than the degree of matching.
2.4.3.2 The scaling method
This method, also called prod, changes the membership degrees of all the elements of the universe by a scale rate that depends on the degree of matching. A scaled membership function has its peak value equal to the degree of matching.
Figure: the output membership function "short" after clipping and after scaling by the degree of matching.
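A small sketch of the two methods, using a hypothetical triangular output membership function and a matching degree of 0.6:

def tri(u, a, b, c):                        # triangular membership function on [a, c], peak at b
    if u <= a or u >= c:
        return 0.0
    return (u - a) / (b - a) if u <= b else (c - u) / (c - b)

match = 0.6                                 # degree to which the rule antecedent is satisfied
universe = [i / 10 for i in range(0, 11)]   # discretized universe of discourse

clipped = [min(tri(u, 0.0, 0.5, 1.0), match) for u in universe]   # clipping: cut at the matching degree
scaled  = [tri(u, 0.0, 0.5, 1.0) * match for u in universe]       # scaling (prod): multiply by it

print(clipped)
print(scaled)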


2.4.4 Combine the fuzzy conclusions (aggregation)

In this stage we combine (aggregate) the inference results of the individual rules. The output of the aggregation process is one fuzzy set for each output variable.
2.4.5 Defuzzify
This is the last step of the inference system. It is needed because we usually require a crisp output value, not a fuzzy one. The input to the defuzzification process is a fuzzy set (the aggregated output fuzzy set) and the output is a single crisp value.
There are seven major defuzzification techniques:


1- The mean of Maximum (MOM)
2- Center- of-area/gravity
3- Centre-of- largest-area
4- First-of-maxima
5- Middle-of-maxima
6- Last-of-maxima
7- Height

Since the first defuzzifcation method ( Centre of gravity/ area) is the best
well known method ( one of the most popular defuzzification method), then it
will introduced mathematically. The crisp output using the centre of gravity/area
is given by:

U =

u
i =1

(u i )

(u )
i =1
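A minimal Python sketch of centre-of-gravity defuzzification over a discretized universe (the universe points and aggregated membership values are made-up numbers for illustration):

# Discretized output universe and the aggregated output fuzzy set (hypothetical values)
u  = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
mu = [0.0, 0.2, 0.6, 0.6, 0.3, 0.0]

# Centre of gravity: sum(u_i * mu_i) / sum(mu_i)
crisp = sum(ui * mi for ui, mi in zip(u, mu)) / sum(mu)
print(round(crisp, 3))   # a single crisp output value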
