Backpropagation Learning Rule for MLFN

Initialize all weights to small random numbers.


Present all sample patterns (one at a time) and perform the
following steps for each sample pattern.

(i) For each output unit $k = 1, 2, \ldots, m$, compute its output $o_k$
(propagating the inputs forward through the network).

(ii) For each output unit $k$, compute the error term

$$\delta_k = o_k (1 - o_k)(t_k - o_k)$$

(iii) For each hidden unit $h$, compute the error term

$$\delta_h = o_h (1 - o_h) \sum_k w_{hk}\, \delta_k$$

(iv) Update each weight $w_{ij}$:

$$w_{ij} \leftarrow w_{ij} + \Delta w_{ij}, \qquad \text{where } \Delta w_{ij} = \eta\, \delta_j\, y_i$$

$y_i$ is the input carried by weight $w_{ij}$, and $\eta$ is the learning rate.
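To make the four steps concrete, here is a minimal NumPy sketch of one training step for a single-hidden-layer network. The array shapes, variable names, and learning rate are illustrative assumptions, not part of the notes.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_step(x, t, W_hid, W_out, eta=0.1):
    """One backprop step for a 1-hidden-layer net (illustrative sketch)."""
    # Step (i): forward pass, computing hidden outputs o_h and outputs o_k.
    o_h = sigmoid(W_hid @ x)
    o_k = sigmoid(W_out @ o_h)
    # Step (ii): output-unit error terms, delta_k = o_k(1-o_k)(t_k - o_k).
    delta_k = o_k * (1.0 - o_k) * (t - o_k)
    # Step (iii): hidden-unit error terms,
    # delta_h = o_h(1-o_h) * sum_k w_hk * delta_k (uses pre-update weights).
    delta_h = o_h * (1.0 - o_h) * (W_out.T @ delta_k)
    # Step (iv): weight updates, Delta w_ij = eta * delta_j * y_i.
    W_out += eta * np.outer(delta_k, o_h)
    W_hid += eta * np.outer(delta_h, x)
    return W_hid, W_out
```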
The Mathematics of Backpropagation Rule
Notations:

- $E_p$ is the error function for pattern $p$,
- $t_{pj}$ is the target output for pattern $p$ on node $j$,
- $o_{pj}$ is the actual output of node $j$,
- $w_{ij}$ is the weight from node $i$ to node $j$.
The Activation:
We require a continuous activation function, such as the sigmoid
function. The net input to node $j$ for pattern $p$, and the resulting
output, are

$$net_{pj} = \sum_i w_{ij}\, o_{pi}, \qquad
o_{pj} = f_j(net_{pj}) = f_j\Big(\sum_i w_{ij}\, o_{pi}\Big)$$
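A tiny NumPy sketch of these two formulas; the numeric values are invented for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Outputs o_pi of the nodes feeding node j, and the weights w_ij
# on those connections (values are made up).
o_pi = np.array([0.5, 0.9, 0.2])
w_ij = np.array([0.4, -0.6, 0.3])

net_pj = np.dot(w_ij, o_pi)   # net_pj = sum_i w_ij * o_pi
o_pj = sigmoid(net_pj)        # o_pj = f_j(net_pj)
print(net_pj, o_pj)           # -0.28 and roughly 0.43
```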
The Error Function:
Let us define the error function $E_p$ as

$$E_p = \frac{1}{2} \sum_k (t_{pk} - o_{pk})^2$$

This means that the error function, $E_p$, is proportional to the
square of the difference between the actual and desired output.
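For concreteness, a small worked example with invented numbers: suppose pattern $p$ has two output units with targets $(t_{p1}, t_{p2}) = (1, 0)$ and actual outputs $(o_{p1}, o_{p2}) = (0.8, 0.3)$. Then

$$E_p = \frac{1}{2}\big[(1 - 0.8)^2 + (0 - 0.3)^2\big]
= \frac{1}{2}(0.04 + 0.09) = 0.065$$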
Weight Changing:
Since weight changes should be proportional to the negative gradient
of the error, we have

$$\Delta_p w_{ij} = -\eta\, \frac{\partial E_p}{\partial w_{ij}} \qquad (4.1)$$
Deriving the Formula of Error Change
By using the chain rule, we can break down the gradient
components as

$$\frac{\partial E_p}{\partial w_{ij}}
= \frac{\partial E_p}{\partial o_{pj}}\,
  \frac{\partial o_{pj}}{\partial net_{pj}}\,
  \frac{\partial net_{pj}}{\partial w_{ij}} \qquad (4.2)$$
Consider the third factor of (4.2):

$$\frac{\partial net_{pj}}{\partial w_{ij}}
= \frac{\partial}{\partial w_{ij}} \sum_k w_{kj}\, o_{pk}
= \sum_k \frac{\partial w_{kj}}{\partial w_{ij}}\, o_{pk}
= o_{pi} \quad \text{(true for a feedforward net)} \qquad (4.3)$$

since

$$\frac{\partial w_{kj}}{\partial w_{ij}}
= \begin{cases} 1, & \text{when } k = i \\ 0, & \text{when } k \neq i \end{cases}$$
We now consider the middle factor of (4.2):

$$\frac{\partial o_{pj}}{\partial net_{pj}}
= \frac{\partial f_j(net_{pj})}{\partial net_{pj}}
= f_j'(net_{pj}) \qquad (4.4)$$
Now we turn to the first factor of (4.2), which is a little more
complicated:

$$\frac{\partial E_p}{\partial o_{pj}}
= \frac{\partial}{\partial o_{pj}} \left[ \frac{1}{2} \sum_j (t_{pj} - o_{pj})^2 \right]
= -(t_{pj} - o_{pj}) \qquad (4.5)$$
Putting the values from (4.3), (4.4), and (4.5) into (4.2), we have

$$\frac{\partial E_p}{\partial w_{ij}}
= -(t_{pj} - o_{pj})\, f_j'(net_{pj})\, o_{pi} \qquad (4.6)$$

This is useful for the output units, because the target and output are
both available, but not for the hidden units, because their targets
are not known.
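As an illustrative numeric check (numbers invented): take $t_{pj} = 1$, $o_{pj} = 0.8$, input $o_{pi} = 0.5$, and suppose the activation slope there is $f_j'(net_{pj}) = 0.16$, the value of $o_{pj}(1 - o_{pj})$ for a unit-spread sigmoid (anticipating (4.14)). Then

$$\frac{\partial E_p}{\partial w_{ij}} = -(1 - 0.8)(0.16)(0.5) = -0.016$$

The gradient is negative, so rule (4.1) increases $w_{ij}$, pushing $o_{pj}$ toward its target.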
So, if unit $j$ is not an output unit, we can write, by the chain rule
again,

$$\frac{\partial E_p}{\partial o_{pj}}
= \sum_k \frac{\partial E_p}{\partial net_{pk}}\,
         \frac{\partial net_{pk}}{\partial o_{pj}}
= \sum_k \frac{\partial E_p}{\partial net_{pk}}\,
         \frac{\partial}{\partial o_{pj}} \sum_i w_{ik}\, o_{pi}
= \sum_k \frac{\partial E_p}{\partial net_{pk}}\, w_{jk} \qquad (4.7)$$

since

$$\frac{\partial o_{pi}}{\partial o_{pj}}
= \begin{cases} 1, & \text{when } i = j \\ 0, & \text{when } i \neq j \end{cases}$$
Putting the values from (4.3), (4.4), and (4.7) into (4.2), we have

$$\frac{\partial E_p}{\partial w_{ij}}
= o_{pi}\, f_j'(net_{pj}) \sum_k \frac{\partial E_p}{\partial net_{pk}}\, w_{jk} \qquad (4.8)$$
It is useful to define

$$\delta_{pj} = -\frac{\partial E_p}{\partial net_{pj}}
= -\frac{\partial E_p}{\partial o_{pj}}\,
   \frac{\partial o_{pj}}{\partial net_{pj}} \qquad (4.9)$$
In view of (4.9), equation (4.6) can be written as

$$\frac{\partial E_p}{\partial w_{ij}} = -\delta_{pj}\, o_{pi} \qquad (4.10)$$

where $\delta_{pj} = (t_{pj} - o_{pj})\, f_j'(net_{pj})$.
Similarly, equation (4.8) can be written as

$$\frac{\partial E_p}{\partial w_{ij}}
= -o_{pi}\, f_j'(net_{pj}) \sum_k \delta_{pk}\, w_{jk}
= -\delta_{pj}\, o_{pi} \qquad (4.11)$$

where $\delta_{pj} = f_j'(net_{pj}) \sum_k \delta_{pk}\, w_{jk}$.
Putting $\partial E_p / \partial w_{ij} = -\delta_{pj}\, o_{pi}$ into (4.1), we have

$$\Delta_p w_{ij} = \eta\, \delta_{pj}\, o_{pi} \qquad (4.12)$$

where

$$\delta_{pj} =
\begin{cases}
(t_{pj} - o_{pj})\, f_j'(net_{pj}), & \text{if } j \text{ is an output unit} \\[4pt]
f_j'(net_{pj}) \sum_k \delta_{pk}\, w_{jk}, & \text{if } j \text{ is not an output unit}
\end{cases} \qquad (4.13)$$
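Equations (4.12) and (4.13) translate directly into code. A minimal NumPy sketch follows; the function names and the convention of passing $f_j'(net_{pj})$ in as a precomputed value are illustrative choices, not from the notes.

```python
import numpy as np

def delta_output(t_pj, o_pj, fprime_net_pj):
    # Eq. (4.13), output unit: delta_pj = (t_pj - o_pj) * f'(net_pj)
    return (t_pj - o_pj) * fprime_net_pj

def delta_hidden(fprime_net_pj, w_jk, delta_pk):
    # Eq. (4.13), hidden unit: delta_pj = f'(net_pj) * sum_k delta_pk * w_jk
    return fprime_net_pj * np.dot(w_jk, delta_pk)

def weight_change(eta, delta_pj, o_pi):
    # Eq. (4.12): Delta_p w_ij = eta * delta_pj * o_pi
    return eta * delta_pj * o_pi
```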
The Sigmoid Function and its Derivative
The sigmoid function is used advantageously as the
activation function (threshold function).
It is defined as

$$f(net) = \frac{1}{1 + e^{-c \cdot net}}$$

and has the range $0 < f(net) < 1$. Here $c$ is a positive constant
that controls the spread of the function; large values of $c$
squash the function until, as $c \to \infty$, it approaches the step
function.
Advantages:
(i) It is quite like the step function, and so demonstrates
behavior of a similar nature.
(ii) It acts as an automatic gain control: for small input
signals the slope is quite steep and so the function is
changing quite rapidly, producing a large gain. For large
inputs, the slope, and thus the gain, is much smaller. This
means that the network can accept large inputs and still
remain sensitive to small changes (see the sketch after
this list).
(iii) A major reason for its use is that it has a simple
derivative, and this makes the implementation of the
back-propagation system much easier.
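The sketch below illustrates advantage (ii) numerically: the sigmoid's slope (the gain) is largest near $net = 0$ and falls off for large inputs. The sample points are arbitrary, and the derivative formula used is the one derived in the next subsection.

```python
import numpy as np

def sigmoid(net, c=1.0):
    return 1.0 / (1.0 + np.exp(-c * net))

def gain(net, c=1.0):
    f = sigmoid(net, c)
    return c * f * (1.0 - f)   # slope of the sigmoid at this input

for net in [0.0, 2.0, 5.0, 10.0]:
    print(f"net={net:5.1f}  f={sigmoid(net):.4f}  gain={gain(net):.4f}")
# net=  0.0  f=0.5000  gain=0.2500
# net=  2.0  f=0.8808  gain=0.1050
# net=  5.0  f=0.9933  gain=0.0066
# net= 10.0  f=1.0000  gain=0.0000
```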
The Derivative
Given that the output of the $j$th unit, $o_{pj}$, is given by

$$o_{pj} = f(net) = \frac{1}{1 + e^{-c \cdot net}}$$

the derivative with respect to the unit's net input, $f'(net)$, is given by

$$f'(net) = \frac{c\, e^{-c \cdot net}}{(1 + e^{-c \cdot net})^2}
= c\, f(net)\,\big(1 - f(net)\big)
= c\, o_{pj}\, (1 - o_{pj}) \qquad (4.14)$$

The derivative is therefore a simple function of the outputs.
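As a quick sanity check of (4.14), the sketch below compares the analytic derivative $c\, o_{pj}(1 - o_{pj})$ with a central finite difference at an arbitrary test point (the point, spread $c$, and tolerance are assumptions).

```python
import numpy as np

c = 1.0
f = lambda net: 1.0 / (1.0 + np.exp(-c * net))

net = 0.7                        # arbitrary test point
o_pj = f(net)
analytic = c * o_pj * (1.0 - o_pj)             # eq. (4.14)
h = 1e-6
numeric = (f(net + h) - f(net - h)) / (2 * h)  # central difference
print(abs(analytic - numeric) < 1e-8)          # True: the forms agree
```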
Putting the value from (4.14) into (4.13), we have

$$\delta_{pj} =
\begin{cases}
c\, o_{pj}(1 - o_{pj})(t_{pj} - o_{pj}), & \text{if } j \text{ is an output unit} \\[4pt]
c\, o_{pj}(1 - o_{pj}) \sum_k \delta_{pk}\, w_{jk}, & \text{if } j \text{ is not an output unit}
\end{cases}$$
The Algorithm for MLFN
(i) Initialize weights and thresholds. Set all weights and
thresholds to small random values.
(ii) Present input and desired output. Present input
$X_p = (x_0, x_1, x_2, \ldots, x_{n-1})$ and target output
$T_p = (t_0, t_1, t_2, \ldots, t_{m-1})$, where $n$ is the
number of input neurons and $m$ is the number of
output neurons. For pattern association, $X_p$ and $T_p$
represent the patterns to be associated. For
classification, $T_p$ is set to zero except for one element
set to 1 that corresponds to the class that $X_p$ is in.
(iii) Calculate actual output. Each layer calculates

$$y_{pj} = f\Big(\sum_{i=0}^{n-1} w_{ij}\, x_i\Big)$$

and passes that as input to the next layer. The final
layer outputs values $o_{pj}$.
(iv) Adapt weights. Start from the output layer, and work
backwards:

$$w_{ij}(t+1) = w_{ij}(t) + \eta\, \delta_{pj}\, o_{pi}$$

where $w_{ij}(t)$ represents the weight from node $i$ to node $j$ at
time $t$, $\eta$ is a gain term, and $\delta_{pj}$ is an error term
for pattern $p$ on neuron $j$.

For output neurons:

$$\delta_{pj} = c\, o_{pj}(1 - o_{pj})(t_{pj} - o_{pj})$$

For hidden neurons:

$$\delta_{pj} = c\, o_{pj}(1 - o_{pj}) \sum_k \delta_{pk}\, w_{jk}$$
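Putting steps (i) through (iv) together, here is a compact end-to-end sketch that trains a 2-2-1 network on XOR. The thresholds are folded in as weights from a constant +1 input; the layer sizes, gain term, random seed, and epoch count are illustrative assumptions, and convergence can vary with the seed.

```python
import numpy as np

rng = np.random.default_rng(1)
c, eta = 1.0, 0.5                       # sigmoid spread and gain term

def f(net):
    return 1.0 / (1.0 + np.exp(-c * net))

# Step (i): small random weights; the last column of each matrix is the
# threshold, treated as a weight from a constant +1 input.
W1 = rng.uniform(-0.5, 0.5, (2, 3))     # input (+bias) -> hidden
W2 = rng.uniform(-0.5, 0.5, (1, 3))     # hidden (+bias) -> output

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)

for epoch in range(10000):
    for x_p, t_p in zip(X, T):          # step (ii): present each pattern
        xb = np.append(x_p, 1.0)
        y = f(W1 @ xb)                  # step (iii): hidden outputs y_pj
        yb = np.append(y, 1.0)
        o = f(W2 @ yb)                  # final-layer outputs o_pj
        # Step (iv): adapt weights, output layer first, then backwards.
        d_out = c * o * (1 - o) * (t_p - o)               # output delta
        d_hid = c * y * (1 - y) * (W2[:, :2].T @ d_out)   # hidden delta
        W2 += eta * np.outer(d_out, yb)
        W1 += eta * np.outer(d_hid, xb)

for x_p in X:                           # outputs should approach T
    y = f(W1 @ np.append(x_p, 1.0))
    print(x_p, np.round(f(W2 @ np.append(y, 1.0)), 2))
```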