
Programming in MATLAB

Chapter 3: Multi Layer Perceptron
Gp.Capt. Thanapant Raicharoen, PhD

Outline
- Limitation of Single Layer Perceptron
- Multi Layer Perceptron (MLP)
- Backpropagation Algorithm
- MLP for non-linearly separable classification problems
- MLP for function approximation problems


Limitation of Perceptron (XOR Function)


No.  P1  P2  Output/Target
1    0   0   0
2    0   1   1
3    1   0   1
4    1   1   0


Multilayer Feedforward Network Structure


[Figure: a multilayer feedforward network with input nodes x1, x2, x3, a layer of hidden nodes, and output nodes; each connection carries a weight $w_{i,j}^{(h)}$]

Notation:
- $y_i^{(h)}$ : output of node i of layer h (h = layer number)
- $w_{i,j}^{(h)}$ : weight from node j of layer h-1 to node i of layer h

Output of each node


$y_i^{(h)} = f\!\left( w_{i,1}^{(h)} y_1^{(h-1)} + w_{i,2}^{(h)} y_2^{(h-1)} + w_{i,3}^{(h)} y_3^{(h-1)} + \cdots + w_{i,m}^{(h)} y_m^{(h-1)} + \theta_i^{(h)} \right) = f\!\left( \sum_j w_{i,j}^{(h)} y_j^{(h-1)} + \theta_i^{(h)} \right)$

where $y_j^{(0)} = x_j$ = input j, and $y_i^{(N)} = o_i$ = output i of the network (N = output layer).
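To make this layer-by-layer computation concrete, here is a minimal MATLAB sketch of a forward pass; the cell arrays W and b, the 3-2-1 layer sizes and the logistic activation are illustrative assumptions, not material from the slides.

% forward_pass_sketch.m  (illustrative only)
f = @(h) 1./(1 + exp(-h));          % assumed activation function
x = [0; 1; 1];                      % example input vector, y^(0)
W = {randn(2,3), randn(1,2)};       % W{h}(i,j) = w_ij^(h) of a 3-2-1 network
b = {randn(2,1), randn(1,1)};       % b{h}(i)   = theta_i^(h)
y = x;
for h = 1:numel(W)
    y = f(W{h}*y + b{h});           % y^(h) = f( sum_j w_ij^(h) y_j^(h-1) + theta_i^(h) )
end
o = y                               % y^(N) = network output o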

Multilayer Perceptron : How it works


Function XOR:

x1  x2  y
0   0   0
0   1   1
1   0   1
1   1   0

[Figure: a 2-2-1 network with inputs x1, x2, hidden-node outputs y1, y2 and network output o]

Layer 1:
$y_1 = f( w_{1,1}^{(1)} x_1 + w_{1,2}^{(1)} x_2 + \theta_1^{(1)} )$
$y_2 = f( w_{2,1}^{(1)} x_1 + w_{2,2}^{(1)} x_2 + \theta_2^{(1)} )$

Layer 2:
$o = f( w_{1,1}^{(2)} y_1 + w_{1,2}^{(2)} y_2 + \theta_1^{(2)} )$

f(·) = activation (or transfer) function
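One concrete set of weights that realizes XOR with this 2-2-1 structure is sketched below in MATLAB; the weight values and the hard-limit activation are illustrative choices (matching the y1 = OR, y2 = AND construction shown on the following slides), not values given in the slides.

% XOR_by_hand_sketch.m  (illustrative weights, not from the slides)
f  = @(h) double(h >= 0);                % hard-limit (threshold) activation
P  = [0 0 1 1; 0 1 0 1];                 % the four input patterns
W1 = [1 1; 1 1];  b1 = [-0.5; -1.5];     % layer 1: y1 = OR(x1,x2), y2 = AND(x1,x2)
W2 = [1 -1];      b2 = -0.5;             % layer 2: o = y1 AND (NOT y2) = XOR
Y  = f(W1*P + b1*ones(1,4));             % hidden-layer outputs
O  = f(W2*Y + b2*ones(1,4))              % prints 0 1 1 0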

Multilayer Perceptron : How it works (cont.)


Outputs at layer 1:

x1  x2  y1  y2
0   0   0   0
0   1   1   0
1   0   1   0
1   1   1   1

[Figure: the x1-x2 plane with the four input points (0,0), (0,1), (1,0), (1,1) and the two hidden-node decision lines L1 and L2]

Line L1: $w_{1,1}^{(1)} x_1 + w_{1,2}^{(1)} x_2 + \theta_1^{(1)} = 0$
Line L2: $w_{2,1}^{(1)} x_1 + w_{2,2}^{(1)} x_2 + \theta_2^{(1)} = 0$

Multilayer Perceptron : How it works (cont.)


Inside layer 1:

[Figure: left, the x1-x2 space with the four XOR points (class 0 and class 1) and the lines L1 and L2; right, the y1-y2 space into which layer 1 maps these points at (0,0), (1,0) and (1,1). In the y1-y2 space the classes are linearly separable!]

Line L1: $w_{1,1}^{(1)} x_1 + w_{1,2}^{(1)} x_2 + \theta_1^{(1)} = 0$
Line L2: $w_{2,1}^{(1)} x_1 + w_{2,2}^{(1)} x_2 + \theta_2^{(1)} = 0$

Multilayer Perceptron : How it works (cont.)


Inside the output layer:

[Figure: the y1-y2 space with the mapped points (0,0), (1,0) and (1,1) and the output-node decision line L3 separating class 1 from class 0]

Line L3: $w_{1,1}^{(2)} y_1 + w_{1,2}^{(2)} y_2 + \theta_1^{(2)} = 0$

The y1-y2 space is linearly separable; therefore the single line L3 can classify (separate) class 0 and class 1.

Multilayer Perceptron : How it works (cont.)


How hidden layers work:
- The hidden layer tries to map the data into a linearly separable representation before passing it on to the output layer.
- After the final hidden layer, the data should therefore be linearly separable.
- More than one hidden layer may be needed to map the data into a linearly separable form.
- In general, the activation function of each layer does not have to be a hard-limit (thresholding) function, and the layers do not all have to use the same function.

How can we adjust weights?


Assume we have the function y = x1 + 2·x2, and we want to use a single-layer perceptron to approximate it.

[Figure: a single node with inputs x1, x2, weights w1, w2 and output ŷ]

The output is:
$\hat{y} = w_1 x_1 + w_2 x_2$

In this case the activation function is the identity (linear) function, f(x) = x. We need to adjust w1 and w2 so that ŷ becomes close (or equal) to y.

Delta Learning Rule (Widrow-Hoff Rule)


Consider the mean square error (MSE):

$\overline{\varepsilon^2} = \overline{(y - \hat{y})^2} = \overline{(y - w_1 x_1 - w_2 x_2)^2}$

where the overbar denotes the average over the training data.

$\overline{\varepsilon^2}$ is a function of w1 and w2, as seen in the graph below. This graph is called the error surface (a paraboloid).

[Figure: the MSE plotted as a surface over the (w1, w2) plane]

Delta Learning Rule (Widrow-Hoff Rule)


Mean square error $\overline{\varepsilon^2}$ as a function of w1 and w2:

[Figure: contour plot of the error surface over the (w1, w2) plane]

The minimum point is (w1, w2) = (1, 2), because there the MSE = 0.

Therefore, w1 and w2 must be adjusted so that they reach the minimum point of this error surface.

Delta Learning Rule (Widrow-Hoff Rule)

w1 and w2 are adjusted towards the minimum point like this:

[Figure: contour plot of the error surface showing the path from the initial values (w1, w2) through adjustments No. 1, No. 2, No. 3, ..., No. k towards the target (minimum) point]

Gradient Descent Method


In what direction does the error function decrease most rapidly, i.e. what is the direction of steepest descent?
1. Compute the gradient of the error surface at the current position (w1, w2); the gradient points in the direction of steepest ascent (uphill).
2. Take a step in the opposite direction of the gradient (adjust w1, w2).
3. Go to step 1 and repeat until the minimum point is reached.
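A minimal MATLAB sketch of this procedure, applied to the earlier example y = x1 + 2·x2 with a linear unit, is shown below; the training data, learning rate and number of iterations are assumptions made for the illustration.

% delta_rule_sketch.m  (illustrative)
x   = randn(2,200);                    % assumed training inputs [x1; x2]
y   = x(1,:) + 2*x(2,:);               % target function y = x1 + 2*x2
w   = [0; 0];                          % initial weights (w1, w2)
eta = 0.05;                            % learning rate (assumed)
for k = 1:100
    yhat = w'*x;                       % network outputs for all patterns
    e    = y - yhat;                   % errors
    grad = -2*(x*e')/size(x,2);        % gradient of the mean square error
    w    = w - eta*grad;               % step opposite to the gradient
end
w                                      % approaches the minimum point [1; 2]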


Backpropagation Algorithm
2-Layer case
Output layer: $o_k = f\!\left(\sum_j w_{k,j}^{(2)} y_j + \theta_k^{(2)}\right) = f(h_k^{(2)})$

Hidden layer: $y_j = f\!\left(\sum_i w_{j,i}^{(1)} x_i + \theta_j^{(1)}\right) = f(h_j^{(1)})$

Input layer: $x_i$

where $h_m^{(n)}$ = weighted sum of the inputs of node m in layer n.

Backpropagation Algorithm (cont.)


2-Layer case

$\varepsilon^2 = \sum_k (\hat{o}_k - o_k)^2$   (2.1)
$\varepsilon^2 = \sum_k \big(\hat{o}_k - f(\sum_j w_{k,j}^{(2)} y_j + \theta_k^{(2)})\big)^2$   (2.2)
$\varepsilon^2 = \sum_k \big(\hat{o}_k - f(\sum_j w_{k,j}^{(2)} f(\sum_i w_{j,i}^{(1)} x_i + \theta_j^{(1)}) + \theta_k^{(2)})\big)^2$   (2.3)

where $\hat{o}_k$ is the desired (target) output of node k and $o_k = f(h_k^{(2)})$ is the network output.

The derivative of $\varepsilon^2$ with respect to $w_{k,j}^{(2)}$:
$\dfrac{\partial \varepsilon^2}{\partial w_{k,j}^{(2)}} = -2\,(\hat{o}_k - o_k)\, f'(h_k^{(2)})\, y_j$

The derivative of $\varepsilon^2$ with respect to $\theta_k^{(2)}$:
$\dfrac{\partial \varepsilon^2}{\partial \theta_k^{(2)}} = -2\,(\hat{o}_k - o_k)\, f'(h_k^{(2)})$

Backpropagation Algorithm (cont.)


2-Layer case

$\varepsilon^2 = \sum_k \big(\hat{o}_k - f(\sum_j w_{k,j}^{(2)} y_j + \theta_k^{(2)})\big)^2$   (2.2)
$\varepsilon^2 = \sum_k \big(\hat{o}_k - f(\sum_j w_{k,j}^{(2)} f(\sum_i w_{j,i}^{(1)} x_i + \theta_j^{(1)}) + \theta_k^{(2)})\big)^2$   (2.3)

The derivative of $\varepsilon^2$ with respect to $w_{j,i}^{(1)}$:

$\dfrac{\partial \varepsilon^2}{\partial w_{j,i}^{(1)}} = -2 \sum_k (\hat{o}_k - o_k)\, \dfrac{\partial}{\partial w_{j,i}^{(1)}} f\big(\sum_j w_{k,j}^{(2)} y_j + \theta_k^{(2)}\big)$
$\qquad = -2 \sum_k (\hat{o}_k - o_k)\, f'\big(\sum_j w_{k,j}^{(2)} y_j + \theta_k^{(2)}\big)\, w_{k,j}^{(2)}\, f'\big(\sum_i w_{j,i}^{(1)} x_i + \theta_j^{(1)}\big)\, x_i$
$\qquad = -2 \sum_k (\hat{o}_k - o_k)\, f'(h_k^{(2)})\, w_{k,j}^{(2)}\, f'(h_j^{(1)})\, x_i$

Backpropagation Algorithm (cont.)


Taking the derivative of $\varepsilon^2$ with respect to $w_{j,i}^{(1)}$ gives the rule for adjusting the weight connecting node j of the current layer (layer 1) with node i of the lower layer (layer 0):

$\dfrac{\partial \varepsilon^2}{\partial w_{j,i}^{(1)}} = -2 \sum_k \underbrace{(\hat{o}_k - o_k)}_{\text{error from upper node } k}\; \underbrace{f'(h_k^{(2)})}_{\text{derivative of upper node } k}\; \underbrace{w_{k,j}^{(2)}}_{\text{weight between upper node } k \text{ and node } j}\; \underbrace{f'(h_j^{(1)})}_{\text{derivative of node } j \text{ of current layer}}\; \underbrace{x_i}_{\text{input from lower node } i}$

The sum over k (error of upper node × derivative of upper node × weight) is the back-propagation of the error to node j of the current layer.

Backpropagation Algorithm (cont.)

The derivative of $\varepsilon^2$ with respect to $w_{k,j}^{(2)}$:

$\dfrac{\partial \varepsilon^2}{\partial w_{k,j}^{(2)}} = -2\,(\hat{o}_k - o_k)\, f'(h_k^{(2)})\, y_j$

(error at current node) × (derivative of current node) × (input from lower node)

The derivative of $\varepsilon^2$ with respect to $w_{j,i}^{(1)}$:

$\dfrac{\partial \varepsilon^2}{\partial w_{j,i}^{(1)}} = -2 \sum_k (\hat{o}_k - o_k)\, f'(h_k^{(2)})\, w_{k,j}^{(2)}\, f'(h_j^{(1)})\, x_i$

Updating Weights : Gradient Descent Method


$\Delta w_{j,i}^{(n)} = -\eta\, \dfrac{\partial \varepsilon^2}{\partial w_{j,i}^{(n)}} = \eta\, \delta_j^{(n)}\, f'(h_j^{(n)})\, x_i^{(n-1)}$

$\Delta \theta_j^{(n)} = -\eta\, \dfrac{\partial \varepsilon^2}{\partial \theta_j^{(n)}} = \eta\, \delta_j^{(n)}\, f'(h_j^{(n)})$

where $\eta$ is the learning rate and $\delta_j^{(n)}$ is the error back-propagated to node j of layer n.

Updating the weights and biases:

$w_{j,i}^{(n)}(\text{new}) = w_{j,i}^{(n)}(\text{old}) + \Delta w_{j,i}^{(n)}$
$\theta_j^{(n)}(\text{new}) = \theta_j^{(n)}(\text{old}) + \Delta \theta_j^{(n)}$


Adjusting Weights for a Nonlinear Unit

Calculation of f' in the case of a nonlinear (activation) function:

1. Sigmoid function
$f(x) = \dfrac{1}{1 + e^{-2x}}$
We get $f'(x) = 2\, f(x)\,(1 - f(x))$
This is a special property of f: f' is expressed through f itself, so it is easy to calculate.

2. Hyperbolic tangent
$f(x) = \tanh(x)$
We get $f'(x) = 1 - f(x)^2$
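Putting the update rule and these derivatives together, the following MATLAB sketch trains a 2-2-1 network on XOR with hand-written backpropagation; the layer sizes, learning rate, number of epochs and random initialization are assumptions made for the illustration, not values from the slides.

% backprop_sketch.m  (illustrative)
f  = @(h) 1./(1 + exp(-2*h));             % sigmoid from the slide above
df = @(y) 2*y.*(1 - y);                   % its derivative, f' = 2 f (1 - f)
P  = [0 0 1 1; 0 1 0 1];  T = [0 1 1 0];  % XOR patterns and targets
W1 = randn(2,2); b1 = randn(2,1);         % layer-1 weights and biases
W2 = randn(1,2); b2 = randn(1,1);         % layer-2 weights and bias
eta = 0.5;                                % learning rate (assumed)
for epoch = 1:5000
    Y1 = f(W1*P + b1*ones(1,4));          % hidden outputs y_j = f(h_j^(1))
    O  = f(W2*Y1 + b2*ones(1,4));         % network outputs o_k = f(h_k^(2))
    d2 = (T - O) .* df(O);                % output-layer error term
    d1 = (W2'*d2) .* df(Y1);              % error back-propagated to the hidden layer
    W2 = W2 + eta*d2*Y1';  b2 = b2 + eta*sum(d2,2);   % weight and bias updates
    W1 = W1 + eta*d1*P';   b1 = b1 + eta*sum(d1,2);
end
round(f(W2*f(W1*P + b1*ones(1,4)) + b2*ones(1,4)))    % typically prints 0 1 1 0
                                                      % (depends on the random start)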

Backpropagation Calculation Demonstration


Example : Application of MLP for classification


Example: Run_XOR_MLP_Newff.m
% Run_XOR_MLP_Newff.m
P = [0 0 1 1; 0 1 0 1];          % XOR function inputs
T = [0 1 1 0];                   % XOR targets
plotpv(P,T,[-1, 2, -1, 2]);      % plot data
PR = [min(P(1,:)) max(P(1,:)); min(P(2,:)) max(P(2,:))];
S1 = 2; S2 = 1;
TF1 = 'logsig'; TF2 = 'logsig';
PF = 'mse';
%
net = newff(PR,[S1 S2],{TF1 TF2});
%
net.trainParam.epochs = 100;
net.trainParam.goal = 0.001;
net = train(net,P,T);
%
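After training, the network response can be checked with sim; the following two lines are an assumed follow-up, not part of the original script:

y = sim(net,P);        % continuous outputs of the trained network
netout = y > 0.5       % thresholded binary outputs, ideally 0 1 1 0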

Example : Application of MLP for classification


Example: Run_MLP_Random.m

Matlab command: Create training data

x = randn([2 200]);            % input patterns x1 and x2 drawn from random numbers
o = (x(1,:).^2+x(2,:).^2)<1;   % desired output

Desired output o: if (x1, x2) lies inside the circle of radius 1 centered at the origin, then o = 1, else o = 0.

[Figure: scatter plot of the 200 training points in the x1-x2 plane, class 1 inside the unit circle and class 0 outside]

Example : Application of MLP for classification (cont.)

Matlab command: Create a 2-layer network

PR = [min(x(1,:)) max(x(1,:)); min(x(2,:)) max(x(2,:))];   % range of inputs
S1 = 10; S2 = 1;                  % no. of nodes in layers 1 and 2
TF1 = 'logsig'; TF2 = 'logsig';   % activation functions of layers 1 and 2
BTF = 'traingd';                  % training function
BLF = 'learngd';                  % learning function
PF  = 'mse';                      % cost function
net = newff(PR,[S1 S2],{TF1 TF2},BTF,BLF,PF);   % command for creating the network

Example : Application of MLP for classification (cont.)

Matlab command: Train the network

net.trainParam.epochs = 2000;    % no. of training rounds
net.trainParam.goal = 0.002;     % maximum desired error
net = train(net,x,o);            % training command
y = sim(net,x);                  % compute network outputs (continuous)
netout = y>0.5;                  % convert to binary outputs

Example : Application of MLP for classification (cont.)


Network structure:

[Figure: input nodes x1 and x2 feeding 10 hidden nodes (sigmoid), one output node (sigmoid), and a threshold unit for the binary output]

Example : Application of MLP for classification (cont.)

[Figure: the training data (class 1 inside the unit circle, class 0 outside) together with the initial weights of the 10 hidden-layer nodes, displayed as the lines w1*x1 + w2*x2 + θ = 0]

Example : Application of MLP for classification (cont.)


Training algorithm: Gradient descent method

[Figure: MSE vs. training epochs; after 20000 epochs the performance is 0.151511, while the goal is 0.002]

Example : Application of MLP for classification (cont.)


Results obtained using the gradient descent method:

[Figure: decision regions learned by the network in the x1-x2 plane, compared with the circular class boundary]

Classification error: 40/200

Example : Application of MLP for classification (cont.)


Training algorithm: Levenberg-Marquardt backpropagation

[Figure: MSE vs. training epochs; the performance reaches 0.00172594, below the goal of 0.002]

Success within only 10 epochs!

Example : Application of MLP for classification (cont.)


Results obtained using Levenberg-Marquardt backpropagation:

[Figure: decision regions learned by the network in the x1-x2 plane; the circular boundary is reproduced and some hidden nodes remain unused]

Only 6 hidden nodes are adequate! Classification error: 0/200
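For reference, switching the earlier network from gradient descent to Levenberg-Marquardt only changes the training function; the snippet below is an assumed follow-up that reuses the variables (PR, S1, S2, TF1, TF2, BLF, PF, x, o) from the previous commands and counts the classification errors.

net = newff(PR,[S1 S2],{TF1 TF2},'trainlm',BLF,PF);   % same network, LM training
net.trainParam.epochs = 2000;
net.trainParam.goal = 0.002;
net = train(net,x,o);
netout = sim(net,x) > 0.5;          % binary network outputs
nErrors = sum(netout ~= o)          % classification errors out of 200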

Example : Application of MLP for classification (cont.)

Summary: MLP for Classification Problem
- Each lower-layer (hidden) node of the neural network creates a local decision boundary.
- The upper-layer nodes of the neural network combine all local decision boundaries into a global decision boundary.

Example: Application of MLP for function approximation


Example: Run_MLP_SinFunction.m

Matlab command: Create a 2-layer network

PR = [min(x) max(x)];             % range of inputs
S1 = 6; S2 = 1;                   % no. of nodes in layers 1 and 2
TF1 = 'logsig'; TF2 = 'purelin';  % activation functions of layers 1 and 2
BTF = 'trainlm';                  % training function
BLF = 'learngd';                  % learning function
PF  = 'mse';                      % cost function
net = newff(PR,[S1 S2],{TF1 TF2},BTF,BLF,PF);   % command for creating the network

Example: Application of MLP for function approximation


Network structure:

[Figure: input node x feeding 6 hidden nodes (sigmoid) and one linear output node producing y]

Example: Application of MLP for function approximation


Example: Run_MLP_SinFunction.m

% Run_MLP_SinFunction.m
p = 0:0.25:5;
t = sin(p);
figure; plot(p,t,'+b');
axis([-0.5 5.5 -1.5 1.5]);
%
net = newff([0 10],[6,1],{'logsig','purelin'},'trainlm');
%
net.trainParam.epochs = 50;
net.trainParam.goal = 0.01;
net = train(net,p,t);
%
a = sim(net,p);
hold on; plot(p,a,'.r');
%

Example: Application of MLP for function approximation

Matlab command: Create a 2-layer network

PR = [min(x) max(x)];             % range of inputs
S1 = 3; S2 = 1;                   % no. of nodes in layers 1 and 2
TF1 = 'logsig'; TF2 = 'purelin';  % activation functions of layers 1 and 2
BTF = 'trainlm';                  % training function
BLF = 'learngd';                  % learning function
PF  = 'mse';                      % cost function
net = newff(PR,[S1 S2],{TF1 TF2},BTF,BLF,PF);   % command for creating the network

Example: Application of MLP for function approximation


Function to be approximated:

x = 0:0.01:4;
y = (sin(2*pi*x)+1).*exp(-x.^2);

[Figure: plot of the function y against the input x over the range 0 to 4]

Example: Application of MLP for function approximation


Function approximated using the network:

[Figure: desired output and network output plotted together; the network output does not follow the desired output well]

The number of hidden nodes is too small!

Example: Application of MLP for function approximation

Matlab command: Create a 2-layer network

PR = [min(x) max(x)];             % range of inputs
S1 = 5; S2 = 1;                   % no. of nodes in layers 1 and 2
TF1 = 'radbas'; TF2 = 'purelin';  % activation functions of layers 1 and 2
BTF = 'trainlm';                  % training function
BLF = 'learngd';                  % learning function
PF  = 'mse';                      % cost function
net = newff(PR,[S1 S2],{TF1 TF2},BTF,BLF,PF);   % command for creating the network
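A complete script along the lines of Run_MLP_SinFunction.m could train and plot this radial-basis network; the sketch below is illustrative, and the training parameters (epochs, goal) are assumptions rather than values from the slides.

% Run_MLP_Radbas_sketch.m  (illustrative)
x = 0:0.01:4;
y = (sin(2*pi*x)+1).*exp(-x.^2);          % function to be approximated
PR = [min(x) max(x)];
net = newff(PR,[5 1],{'radbas','purelin'},'trainlm');
net.trainParam.epochs = 100;
net.trainParam.goal = 0.001;
net = train(net,x,y);
a = sim(net,x);                           % network output
figure; plot(x,y,'-b',x,a,'.r');          % desired output vs. network output
legend('Desired output','Network output');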

Example: Application of MLP for function approximation

Function approximated using the network:

[Figure: desired output and network output plotted together over the input range 0 to 4]

Example: Application of MLP for function approximation

Summary: MLP for Function Approximation Problem
- Each lower-layer (hidden) node of the neural network creates a local (short-range) approximation of the function.
- The upper-layer nodes of the neural network combine all local approximations into a global approximation that covers the whole input range.

Summary
Backpropagation can train multilayer feed-forward networks with differentiable transfer functions to perform function approximation, pattern association, and pattern classification.

The term backpropagation refers to the process by which the derivatives of the network error, with respect to the network weights and biases, are computed.

The number of inputs and outputs of the network is constrained by the problem. However, the number of layers between the network inputs and the output layer, and the sizes of those layers, are up to the designer.

A two-layer sigmoid/linear network can represent any functional relationship between inputs and outputs if the sigmoid layer has enough neurons.

Programming in MATLAB Exercise


Exercise:
1. Write a MATLAB program to solve question 1 in Exercise 4.
2. Write a MATLAB program to solve question 2 in Exercise 4.

