
The McCulloch Neuron (1943)

[Figure: neuron with inputs p_1, ..., p_n, weights w_1, ..., w_n, a summing junction and threshold b.]

    a = g( Σ_{i=1..n} w_i p_i − b ) = g( wᵀp − b ),    a ∈ {0, 1}

    g = step function

For n = 2 the decision boundary is the line w_1 p_1 + w_2 p_2 = b:
the Euclidean space ℝⁿ is divided into two regions, A and B
(in the (p_1, p_2) plane, region A on one side of the line, region B on the other).
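A minimal MATLAB sketch (not part of the original slide) of this neuron: the weights w, the threshold b and the test patterns P are illustrative values chosen so that the neuron acts as a logical AND.

% McCulloch neuron: a = g(w'*p - b), g = step function
w = [1; 1];                        % illustrative weights
b = 1.5;                           % illustrative threshold
g = @(s) double(s >= 0);           % step function
P = [0 0 1 1; 0 1 0 1];            % four test patterns p (one per column)
a = g(w'*P - b)                    % = [0 0 0 1]: region A only for p = [1;1]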

The McCulloch Neuron
as a pattern classifier
[Figure: point sets of x's and o's in the plane — left: linearly separable collections; right: non-separable collections.]

Some Boolean functions of two variables represented in the binary plane.

Linear and Non-Linear Classifiers

There exist 2^m = 2^(2^n) possible logical functions connecting n inputs to one binary output.

n    # binary patterns    # logical functions    # linearly separable    % linearly separable
1            2                      4                       4                 100
2            4                     16                      14                 87.5
3            8                    256                     104                 40.6
4           16                  65536                    1772                  2.9
5           32             4.3 x 10^9                   94572           2.2 x 10^-3
6           64             1.8 x 10^19               5028134           3.1 x 10^-13

The logical functions of one variable:

    A, ¬A, 0, 1

The logical functions of two variables:

    A, B, ¬A, ¬B, 0, 1,
    A∧B, A∨B, ¬(A∧B), ¬(A∨B), A⊕B, A≡B,
    A∧¬B, ¬A∧B, A∨¬B, ¬A∨B

Two Step Binary Perceptron

Neuron 6 implements a logical AND function by choosing b_6 = Σ_{i=3..5} w_i6.

For example:

    w_36 = w_46 = w_56 = 1/3,  b_6 = 1   ⇒   a_6 = 1 if and only if a_3 = a_4 = a_5 = 1
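A short MATLAB check (illustrative, not from the slide) of this choice: with w_36 = w_46 = w_56 = 1/3 and b_6 = 1, neuron 6 fires only when all three first-layer outputs are 1.

% second-layer AND (neuron 6) over the first-layer outputs a3, a4, a5
w36 = 1/3; w46 = 1/3; w56 = 1/3;   % weights
b6  = w36 + w46 + w56;             % = 1, the threshold chosen above
step = @(s) double(s >= 0);
a6 = @(a3,a4,a5) step(w36*a3 + w46*a4 + w56*a5 - b6);
a6(1,1,1)                          % = 1: all first-layer neurons active
a6(1,0,1)                          % = 0: any inactive input switches the AND off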

Three Step Binary Perceptron
[Figure: three-step binary perceptron — inputs p_1 and p_2 feed a first layer of neurons (3-7), a second layer forms the regions A and B (neurons 9, 10), and the output neuron 11 computes a = A ∧ B.]

Neurons and Artificial Neural Networks
Micro-structure
    characteristics of each neuron in the network
Meso-structure
    organization of the network
Macro-structure
    association of networks, possibly with some analytical processing;
    an approach for complex problems

[Figure: neuron with inputs p_1, ..., p_n, weights w_1, ..., w_n, a summing junction and a bias input b.]

Bias: with p = 0, an output different from 0 is still possible!

Typical activation functions
Linear:
    f(s) = s                                          Hopfield, BSB       purelin

Sign:
    f(s) = +1 if s ≥ 0;  −1 if s < 0                  Perceptron          hardlims

Step:
    f(s) = +1 if s ≥ 0;  0 if s < 0                   Perceptron, BAM     hardlim

Hopfield/BAM:
    f(s) = +1 if s > 0;  −1 if s < 0;                 Hopfield, BAM
           unchanged if s = 0

Typical activation functions

BSB / Logical Threshold (saturating linear):
    f(s) = −K if s ≤ −K;  s if −K < s < +K;  +K if s ≥ +K     BSB                          satlin / satlins

Logistic:
    f(s) = 1 / (1 + e^(−s))                                    Perceptron, Hopfield,        logsig
                                                               BAM, BSB

Hyperbolic Tangent:
    f(s) = tanh(s) = (1 − e^(−2s)) / (1 + e^(−2s))             Perceptron, Hopfield,        tansig
                                                               BAM, BSB
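A small MATLAB sketch (illustrative) of these activation functions written as anonymous functions; the comments give the toolbox names quoted on the slides (purelin, hardlims, hardlim, satlins, logsig, tansig), here with K = 1 for the saturating linear case.

f_lin  = @(s) s;                     % linear                (purelin)
f_sgn  = @(s) 2*double(s >= 0) - 1;  % sign                  (hardlims)
f_stp  = @(s) double(s >= 0);        % step                  (hardlim)
f_sat  = @(s) max(-1, min(1, s));    % saturating linear     (satlins, K = 1)
f_log  = @(s) 1 ./ (1 + exp(-s));    % logistic              (logsig)
f_tanh = @(s) tanh(s);               % hyperbolic tangent    (tansig)
s = -5:0.1:5;                        % plot all six over [-5, 5]
plot(s, [f_lin(s); f_sgn(s); f_stp(s); f_sat(s); f_log(s); f_tanh(s)])
legend('purelin','hardlims','hardlim','satlins','logsig','tansig')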

Meso-Structure Network Organization...

- # neurons per layer
- # network layers
- connection type (forward, backward, lateral)

1 - Multilayer Feedforward
    Multilayer Perceptron (MLP)

Meso-Structure Network Organization...

2 - Single layer, laterally connected (BSB (self-feedback), Hopfield)

3 - Bilayer Feedforward/Feedback

Meso-Structure Network Organization

4 - Multilayer Cooperative/Comparative Network

5 - Hybrid Network

[Figure: hybrid network built from Sub-Network 1 and Sub-Network 2.]

Neural Macro-Structure

[Figure: macro-structure composed of Network 1, Network 2 (sub-networks 2a, 2b, 2c) and Network 3.]

- # networks
- connection type
- size of networks
- degree of connectivity

Supervised Learning
[Figure: supervised learning scheme — input x, network output y, desired output d, error ε = d − y used to adjust the weights.]

Delta Rule (Perceptron):
    w ← w + η ε x          η - learning rate

Widrow-Hoff delta rule (LMS) - ADALINE, MADALINE:
    w_ij ← w_ij + η ε_j x_ij / Σ_k x_k²

Generalized Delta Rule (Multilayer Perceptron, see below)

Delta rule Perceptron
Perceptron (Rosenblatt, 1957)

[Figure: perceptron j with inputs p_1j, ..., p_nj, weights w_1j, ..., w_nj, bias b_j, summation s_j and output y_j, compared with the desired output d.]

Dynamics:

    s_j = Σ_i w_ij p_ij + b_j

    y_j = f(s_j) = +1 if s_j ≥ 0
                    0 if s_j < 0

    ε_j = d_j − y_j

    w_ij ← w_ij + η ε_j x_ij        Delta Rule
    η - learning rate
    If ε_j = 0 the weight is not changed.

Psychological reasoning:
    - positive reinforcement
    - negative reinforcement
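A minimal MATLAB sketch (illustrative, plain MATLAB rather than the toolbox) of this delta-rule training loop for a single perceptron; the AND target, the zero initialization and the learning rate are assumptions chosen to give a separable example.

% Delta rule: w <- w + eta*err*x, err = d - y
P   = [0 0 1 1; 0 1 0 1];              % input patterns (columns)
d   = [0 0 0 1];                       % desired outputs (logical AND, separable)
w   = zeros(2,1);  b = 0;              % initial weights and bias
eta = 0.5;                             % learning rate
for epoch = 1:20
    for k = 1:size(P,2)
        y   = double(w'*P(:,k) + b >= 0);   % f(s): 1 if s >= 0, 0 otherwise
        err = d(k) - y;                      % error eps_j
        w   = w + eta*err*P(:,k);            % weight update (delta rule)
        b   = b + eta*err;                   % bias treated as a weight on input 1
    end
end
double(w'*P + b >= 0)                   % reproduces d = [0 0 0 1]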

ADALINE and MADALINE
Widrow & Hoff, 1960 - (Multiple) Adaptive Linear Element

    y_j = Σ_i w_ij p_ij + b_j

Training:

    ε_j = d_j − s_j = d_j − ( Σ_i w_ij p_ij + b_j )

    w_ij ← w_ij + η ε_j x_ij / Σ_k x_k²

Widrow-Hoff delta rule
LMS - Least Mean Squares algorithm
0.1 < η < 1 - stability and convergence speed.
MatLab: NEWLIN, NEWLIND, ADAPT, LEARNWH

Obs.: with ε_j written as δ_j, w_ij ← w_ij + η δ_j x_ij is the Delta Rule.

LMS Algorithm
Objective: learn a function f: ℝⁿ → ℝ from the samples (x_k, d_k)

{x_k}, {d_k} and {e_k} are stationary stochastic processes.

    e = d − y        actual stochastic error

Linear neuron:
    y = Σ_{i=1..n} x_i w_i = x wᵀ

Expected value:
    E[e²] = E[(d − y)²]
          = E[(d − x wᵀ)²]
          = E[d²] − 2 E[d x] wᵀ + w E[xᵀx] wᵀ

Assuming w deterministic. With
    E[xᵀx] = R      input autocorrelation matrix
    E[d x] = P      cross-correlation vector

    E[e²] = E[d²] − 2 P wᵀ + w R wᵀ

Setting the partial derivatives to 0 for the optimal w*:

    0 = 2 w* R − 2 P    ⇒    w* = P R⁻¹

Optimal analytic solution of the optimization (solvelin.m).
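A small MATLAB sketch (illustrative) of this analytic solution: R and P are estimated from samples and w* = P R⁻¹ is computed directly, in the spirit of solvelin.m / newlind; the data-generating weights and the noise level are assumptions made up for the example.

% Analytic LMS solution w* = P*R^-1 from sample estimates of R and P
N      = 1000;                        % number of samples
x      = randn(N, 3);                 % input samples x_k (one per row)
w_true = [2 -1 0.5];                  % assumed "true" linear relation
d      = x*w_true' + 0.1*randn(N,1);  % desired outputs with a little noise
R = (x'*x)/N;                         % autocorrelation matrix  E[x'x]
P = (d'*x)/N;                         % cross-correlation row vector E[d x]
w_opt = P / R                         % w* = P*R^-1, close to w_true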

Iterative LMS Algorithm
Objective: adaptively learn a function f: ℝⁿ → ℝ from the samples (x_k, d_k)

Knowing P and R (and hence R⁻¹), then for some w:

    ∇_w E[e²] = 2 w R − 2 P

Post-multiplying by R⁻¹:

    ∇_w E[e²] R⁻¹ = 2 w − 2 P R⁻¹ = 2 (w − w*)

    w* = w − ½ ∇_w E[e²] R⁻¹

    w_k+1 = w_k − c_k ∇_w E[e²] R⁻¹        (c_k = ½ → Newton's method)

How to cautiously find new (better) values for w_i, the free parameters?

LMS hypothesis:

    E[ e²_k+1 | e²_0, e²_1, ..., e²_k ] = e²_k

Iterative LMS Algorithm...
Assuming R = I, we obtain the estimated steepest descent algorithm:

    w_k+1 = w_k − c_k ∇_w e²_k

Gradient of e²_k with respect to w:

    ∇_w e²_k = [ ∂e²_k/∂w_1, ..., ∂e²_k/∂w_n ]
             = [ ∂(d_k − y_k)²/∂w_1, ..., ∂(d_k − y_k)²/∂w_n ]
             = [ −2 (d_k − y_k) ∂y_k/∂w_1, ..., −2 (d_k − y_k) ∂y_k/∂w_n ]
             = −2 e_k [ ∂y_k/∂w_1, ..., ∂y_k/∂w_n ]
             = −2 e_k [ x_k1, ..., x_kn ] = −2 e_k x_k        (since y_k = x_k wᵀ)

The LMS algorithm therefore reduces to the iterative (adaptive) solution

    w_k+1 = w_k + 2 c_k e_k x_k

(The optimal solution is never reached!)

With normalization by Σ_k x_k², this is the MADALINE update
w_ij ← w_ij + η ε_j x_ij / Σ_k x_k²    (i - input, j - neuron).
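A minimal MATLAB sketch (illustrative) of this adaptive update w_k+1 = w_k + 2 c e_k x_k with a fixed step size; the target relation (the same illustrative one as in the analytic sketch above) and the step size are assumptions.

% Iterative (adaptive) LMS: w_{k+1} = w_k + 2*c*e_k*x_k
N = 2000;  c = 0.01;                  % number of samples, fixed step size
w_true = [2 -1 0.5];                  % assumed target relation
w = zeros(1, 3);                      % row-vector weights, y_k = x_k*w'
for k = 1:N
    xk = randn(1, 3);                 % input sample
    dk = xk*w_true' + 0.1*randn;      % desired output
    ek = dk - xk*w';                  % instantaneous error e_k
    w  = w + 2*c*ek*xk;               % step along -grad(e_k^2)
end
w                                     % converges towards w* (never exactly reached)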
The Multilayer Perceptron
- The Generalized Delta Rule
Rumelhart, Hinton and Williams, PDP/MIT, 1986

[Figure: two-layer MLP — inputs p_1 = x_1^(0), p_2 = x_2^(0), p_3 = x_3^(0); hidden-layer signals x_1^(1), x_2^(1), x_3^(1); outputs x_1^(2) = y_1, x_2^(2) = y_2.]

Neuron Dynamics:

    s_j^(k) = w_0j^(k) + Σ_i w_ij^(k) x_i^(k−1)

    x_j^(k) = f( s_j^(k) )

Processing Element (PE) j in layer k, input i,
with f (activation function) continuous and differentiable.

Turning Point Question:
How to find the error associated with an internal neuron??

The generalized delta rule
Training:

    ε² = Σ_{j=1..m} (d_j − y_j)²                        quadratic error

    w_j^(k) = ( w_0j^(k), w_1j^(k), ..., w_mj^(k) )     weights of PE j

    x_j^(k−1) = ( 1, x_1j^(k−1), ..., x_nj^(k−1) )      input vector of PE j

With s_j^(k) = w_j^(k) · x_j^(k−1) we have ∂s_j^(k)/∂w_j^(k) = x_j^(k−1).

Instantaneous gradient:

    ∇_j^(k) ε² = ∂ε²/∂w_j^(k) = [ ∂ε²/∂w_0j^(k), ∂ε²/∂w_1j^(k), ..., ∂ε²/∂w_mj^(k) ]
               = (∂ε²/∂s_j^(k)) (∂s_j^(k)/∂w_j^(k)) = (∂ε²/∂s_j^(k)) x_j^(k−1)

Defining the quadratic derivative error as

    δ_j^(k) = −(1/2) ∂ε²/∂s_j^(k)

we get

    ∇_j^(k) ε² = −2 δ_j^(k) x_j^(k−1)

Gradient of the error with respect to the weights as a function of the
former layer signals!!

The generalized delta rule...
For the output layer, the quadratic derivative error is:

    δ_j^(k) = −(1/2) ∂/∂s_j^(k) Σ_{i=1..N_k} (d_i − y_i)²
            = −(1/2) ∂/∂s_j^(k) Σ_{i=1..N_k} (d_i − f(s_i^(k)))²

The partial derivatives are 0 for i ≠ j:

    δ_j^(k) = −(1/2) ∂(d_j − f(s_j^(k)))² / ∂s_j^(k)
            = −(d_j − f(s_j^(k))) ∂(d_j − f(s_j^(k))) / ∂s_j^(k)
            = (d_j − x_j^(k)) f'(s_j^(k))

The output error associated with PE j in the last layer:

    ε_j^(k) = d_j − x_j^(k) = d_j − y_j

Giving:

    δ_j^(k) = ε_j^(k) · f'(s_j^(k))

Remember: the activation function f is continuous and differentiable.

The generalized delta rule...
For a hidden layer k, the quadratic derivative error can be calculated
using the linear outputs of layer k+1 (chain rule):

    δ_j^(k) = −(1/2) ∂ε²/∂s_j^(k)
            = −(1/2) Σ_{i=1..N_k+1} (∂ε²/∂s_i^(k+1)) (∂s_i^(k+1)/∂s_j^(k))
            = Σ_{i=1..N_k+1} δ_i^(k+1) ∂s_i^(k+1)/∂s_j^(k)

Taking into account that

    s_i^(k+1) = w_0i^(k+1) + Σ_{l=1..N_k} w_li^(k+1) f(s_l^(k)),

that ∂f(s_l^(k))/∂s_j^(k) = 0 if l ≠ j, and that ∂f(s_j^(k))/∂s_j^(k) = f'(s_j^(k)):

    ∂s_i^(k+1)/∂s_j^(k) = w_ji^(k+1) f'(s_j^(k))

We have:

    δ_j^(k) = ( Σ_{i=1..N_k+1} δ_i^(k+1) w_ji^(k+1) ) · f'(s_j^(k))
            = ε_j^(k) · f'(s_j^(k)),      with  ε_j^(k) = Σ_{i=1..N_k+1} δ_i^(k+1) w_ji^(k+1)

Finally, the quadratic derivative error for a hidden layer:

    δ_j^(k) = ε_j^(k) · f'(s_j^(k))

The Error Backpropagation algorithm

1. wij( k ) random , initialize the network weigths


m
2. for (x,d), training pair, obtain y. Feedforward propagation: = 2
(d
j =1
j y j )2

3. k last layer
4. for each element j in the layer k do:
Compute (kj ) using (jk ) = d j x (jk ) = d j y j if k is the last layer,
N k +1
(k )
j = i( k +1) w(jik +1) if it is a hidden layer;
i =1

Compute (j k ) = (jk ) . f ( s (jk ) )


5. k k 1 if k > 0 go to step 4, else continue.
6. w (jk ) (n + 1) = w (jk ) (n) + 2 i( k ) x i( k )
7. For the next training pair go to step 2.
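A compact MATLAB sketch (illustrative, plain MATLAB) of these steps for a 2-2-1 network with logistic activations, trained on XOR; the network size, learning rate and number of epochs are assumptions, and a bad random start may occasionally end in a local minimum (see the next slide).

% Error backpropagation for a 2-2-1 MLP with logsig activations
P = [0 0 1 1; 0 1 0 1];   D = [0 1 1 0];   % XOR training pairs (x, d)
f  = @(s) 1./(1 + exp(-s));                % logsig
df = @(x) x.*(1 - x);                      % f'(s) written in terms of x = f(s)
W1 = 0.5*randn(2,3);  W2 = 0.5*randn(1,3); % step 1: random weights (last column = bias)
eta = 0.5;
for epoch = 1:5000
    for n = 1:4
        x0 = P(:,n);                       % layer-0 signals
        x1 = f(W1*[x0; 1]);                % step 2: feedforward, hidden layer
        x2 = f(W2*[x1; 1]);                % output layer, y = x2
        e2 = D(n) - x2;                    % step 4 (last layer): eps = d - y
        d2 = e2 .* df(x2);                 % delta of the output layer
        e1 = W2(:,1:2)' * d2;              % step 4 (hidden layer): backpropagated eps
        d1 = e1 .* df(x1);                 % delta of the hidden layer
        W2 = W2 + 2*eta * d2 * [x1; 1]';   % step 6: weight updates
        W1 = W1 + 2*eta * d1 * [x0; 1]';
    end
end
f(W2*[f(W1*[P; ones(1,4)]); ones(1,4)])    % outputs approach [0 1 1 0]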

The Backpropagation Algorithm in practice

1 In the standard form BP is very slow.


Ee-2 Energia da rede
2 BP Pathologies: paralysis in regions of small gradient.

3 Initial conditions can lead to local minima. PadroStart


Bad esprio Good Start
Valor Inicial

4 Stop conditions number of epochs, wij <

5 BP variants
- trainbpm (with momentum) Padro recuperado
Optimum
PadresMinima
Local armazenados
- trainbpx (adaptive learning rate)
- ....
wi,j
Estados
- trainlm (Levenberg-Marquard J, Jacobian)
e2(wi,j) - Illustrative quadratic error
W (j k ) = ( J T J + J ) 1 J T e as function of the weights

Obs: the error surface is, normally, unknown.

Steepest descent go in the opposite


direction of the local gradient (downhill).
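A one-step MATLAB sketch (illustrative) of the Levenberg-Marquardt update quoted above; it assumes the Jacobian J of the network outputs with respect to the (vectorized) weights W and the error vector e = d − y are already available, and mu is the damping factor.

% Levenberg-Marquardt step: dW = (J'*J + mu*I)^-1 * J'*e
% J  - Jacobian of the outputs w.r.t. the weights (n_samples x n_weights)
% e  - error vector d - y (n_samples x 1)
% mu - damping: large mu ~ gradient descent, small mu ~ Gauss-Newton
dW = (J'*J + mu*eye(size(J,2))) \ (J'*e);
W  = W + dW;                       % apply the step to the vectorized weights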

Computational Tools

SNNS
MatLab
- Neural Network Toolbox
NeuralWorks
Java
C++

Hardware Implementations of ANNs

SNNS - Stuttgart Neural Network Simulator

MatLab
- complete environment
- system simulation
- training
- control

[Figure: Simulink model of a neural Model Reference Controller — a reference model (4th-order liquid-level plant), a neural network controller producing the control signal q_i, the plant output h4 and a scope; the neural blocks use purelin, logsig, tansig and radbas activations together with discrete state-space blocks, matrix gains, switches, a saturation and a zero-order hold.]

Demonstration - Perceptron
% Perceptron
% Training an ANN to learn to classify a non-linear problem
% Input patterns
P=[ 0 0 0 0 1 1 1 1
    0 0 1 1 0 0 1 1
    0 1 0 1 0 1 0 1]
% Target
%T=[1 0 1 1 1 0 1 0]  % linearly separable
T=[1 0 0 1 1 0 1 0]   % non-separable

% Try with Rosenblatt's Perceptron
net=newp(P,T,'hardlim')
% train the network
net=train(net,P,T)

Y=sim(net,P)

Results: for the separable target T = 1 0 1 1 1 0 1 0 the perceptron reproduces it, Y = 1 0 1 1 1 0 1 0;
for the non-separable target T = 1 0 0 1 1 0 1 0 it fails, Y = 1 0 1 0 0 0 1 0.

Demonstration - OCR
[Figure: training vectors (character bitmaps) with 20 % noise presented to the ANN.]

Demonstration OCR...
[Plot: % of misclassifications of the neural OCR classifier vs. noise level.
Training with 10 x (0, 10, 20, 30, 40, 50) % noise — noisy patterns used in training
(up to the given % of bits flipped).
* - error without noisy training patterns
* - error using noisy training patterns]

With some noisy training patterns the network learns how to treat any noise.
Demonstration LMS, ADALINE, FIR
    y(k) = w_0 u(k) + w_1 u(k−1) + w_2 u(k−2) + ... + w_n u(k−n)

    Y(z)/U(z) = w_0 + w_1 z⁻¹ + w_2 z⁻² + ... + w_n z⁻ⁿ

FIR model (always stable, only zeros)

Obs.: an IIR model is more compact, but can be unstable!

    g_1 (0 - 79.9 s) = 1 / (s² + 0.2 s + 1)        g_2 (80 - 150 s) = 3 / (s² + 2 s + 1)

The system changes at 80 s.    Sampling time Ts = 0.1 s.

(TDL - Tapped Delay Line)

[Plot: system output from 0 to 150 s, amplitude roughly between −4 and 4.]
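A tiny MATLAB sketch (illustrative): the FIR model above is just a weighted tapped delay line, so its response to an input sequence can be computed with filter(); the weight values here are arbitrary examples.

% FIR model: y(k) = w0*u(k) + w1*u(k-1) + ... + wn*u(k-n)
w = [0.5 0.3 0.1 -0.05];           % example FIR weights w0..w3
u = idinput(200, 'PRBS');          % PRBS excitation, as in the demo that follows
y = filter(w, 1, u);               % always stable: only zeros, no poles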

Demo LMS, ADALINE, FIR...
% ADALINE - Adaptive dynamic system identification
% First sampled system - until 80 sec
g1=tf(1,[1 .2 1]), gd1=c2d(g1,.1)
% System changes dramatically - after 80 sec
g2=tf(3,[1 2 1]), gd2=c2d(g2,.1)

% Pseudo Random Binary Signal - good for identification
u=idinput(120*10,'PRBS',[0 0.01],[-1 1]);

% time vector
...
[y1,t1,x1]=lsim(gd1,u1,t1);
[y2,t2,x2]=lsim(gd2,u2,t2,x1);

% Create a new ADALINE network with delayed inputs (FIR)
% Learning rate = 0.09
net=newlin(t,y,[1 2 3 4 5 6 7 8 9 10],0.09)
[net,Y,E]=adapt(net,t,y)

% design an average transfer function
netd=newlind(t,y)

Demo LMS, ADALINE, FIR...
RMSE Set 1=6.5742 u=idinput(1500,'PRBS',[0 0.01])
4

2 n=10, lr=0.1
0

-2

-4
0 500 1000 1500
ADALINE
Error
Learns System AND 4

also Changes in the Dynamics!! 2

-2

-4
0 500 1000 1500

10
RMSE Set 2=22.7817 u=idinput(1200,'PRBS',[0 0.05])

5 n=10, lr=0.1 Verification Signal


0
But, in other frequency range
not so good... -5
0 200 400 600 800 1000 1200

(needs to Adjust TDL, lr, Ts) Error


5

-5
0 200 400 600 800 1000 1200

