In backpropagation with momentum, the weight change is in a direction that is a combination of the current gradient and the previous gradient. Convergence is sometimes faster if a momentum term is added to the weight update formulas. To use this strategy, weights from one or more previous training steps must be saved. In the simplest form, the new weights for training step t + 1 are based on the weights at training steps t and t − 1. The weight update formulas for backpropagation with momentum are:
w_jk(t + 1) = w_jk(t) + α δ_k z_j + μ[w_jk(t) − w_jk(t − 1)]

and

v_ij(t + 1) = v_ij(t) + α δ_j x_i + μ[v_ij(t) − v_ij(t − 1)]
where the momentum parameter μ is constrained to the range from 0 to 1, exclusive of the end points.
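The momentum update above can be sketched as a one-line function; the weight values, gradient term, and momentum parameter below are hypothetical numbers chosen only for illustration.

```python
def momentum_update(w, w_prev, grad_term, mu):
    """One backprop-with-momentum step for a single weight:
    w(t+1) = w(t) + grad_term + mu * [w(t) - w(t-1)],
    where grad_term plays the role of alpha * delta_k * z_j."""
    return w + grad_term + mu * (w - w_prev)

# hypothetical values: weights at steps t-1 and t, one gradient term,
# and a momentum parameter mu with 0 < mu < 1
w_prev, w = 0.5, 0.6
grad_term = 0.02
mu = 0.9
w_next = momentum_update(w, w_prev, grad_term, mu)   # 0.6 + 0.02 + 0.9 * 0.1
```

Because the previous change [w(t) − w(t−1)] is reused, only the weights from the preceding step need to be stored, as the text notes.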
In some cases it is advantageous to accumulate the weight correction terms for several patterns (or even an entire epoch, if there are not too many patterns) and then make a single weight adjustment (equal to the average) for each weight, rather than updating the weights after each pattern is presented. This procedure has a smoothing effect on the correction terms. In some cases, the smoothing may increase the chances of convergence.
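Batch updating as described above amounts to averaging the accumulated correction terms before applying them; the correction values below are hypothetical.

```python
def batch_update(w, correction_terms):
    """Accumulate the weight correction terms for several patterns and
    apply a single adjustment equal to their average (batch updating)."""
    avg = sum(correction_terms) / len(correction_terms)
    return w + avg

# hypothetical per-pattern correction terms for one weight over one epoch
corrections = [0.04, -0.02, 0.01, 0.05]
w_new = batch_update(1.0, corrections)   # one smoothed step instead of four
```

Averaging damps the pattern-to-pattern fluctuations that per-pattern updating would produce, which is the smoothing effect the text refers to.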
The binary sigmoid,

f(x) = 1 / (1 + exp(−x)),

with

f′(x) = f(x)[1 − f(x)],

can be modified to cover any desired range, to be centered at any desired value of x, and to have any desired slope at its center. The binary sigmoid can have its range expanded and shifted so that it maps the real numbers into the interval [a, b], for any a and b. To do this, we define the parameters:
γ = b − a
η = −a

Then the function

g(x) = γ f(x) − η

has the desired property, i.e., its range is [a, b]. Furthermore, its derivative can also be expressed in terms of the function value:

g′(x) = (1/γ)[η + g(x)][γ − η − g(x)]
For example, for a problem with bipolar target output, the appropriate activation function would be
g(x) = 2 f(x) − 1

with

g′(x) = (1/2)[1 + g(x)][1 − g(x)]
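The range-shifted sigmoid and its derivative identity can be checked numerically; the code below is a minimal sketch, with the bipolar case a = −1, b = 1 used as the test point.

```python
import math

def f(x):
    return 1.0 / (1.0 + math.exp(-x))        # binary sigmoid, range (0, 1)

def g(x, a, b):
    gamma, eta = b - a, -a                   # gamma = b - a, eta = -a
    return gamma * f(x) - eta                # range-shifted sigmoid, range (a, b)

def g_prime(x, a, b):
    """Derivative expressed in terms of the function value:
    g'(x) = (1/gamma) [eta + g(x)] [gamma - eta - g(x)]."""
    gamma, eta = b - a, -a
    gx = g(x, a, b)
    return (1.0 / gamma) * (eta + gx) * (gamma - eta - gx)

# bipolar case a = -1, b = 1: g(x) = 2 f(x) - 1, g(0) = 0
x = 0.3
num = (g(x + 1e-6, -1, 1) - g(x - 1e-6, -1, 1)) / 2e-6   # numerical derivative
```

The central-difference value `num` agrees with the closed-form derivative, confirming that the identity holds for the shifted range.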
The steepness of the logistic sigmoid can be modified by a slope parameter σ. Thus we have a more general function:

g(x) = f(σx) = 1 / (1 + exp(−σx))

and

g′(x) = (σ/γ)[η + g(x)][γ − η − g(x)],

which for the binary sigmoid (γ = 1, η = 0) reduces to g′(x) = σ g(x)[1 − g(x)].

[Figure: the binary sigmoid for slope parameters σ = 1 and σ = 2.]
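As a small sketch for the binary case (γ = 1, η = 0), the slope parameter σ directly scales the steepness at the origin, where g′(0) = σ/4:

```python
import math

def f(x):
    return 1.0 / (1.0 + math.exp(-x))

def g(x, sigma):
    return f(sigma * x)            # binary sigmoid with slope parameter sigma

def g_prime(x, sigma):
    gx = g(x, sigma)
    return sigma * gx * (1.0 - gx) # g'(x) = sigma * g(x) * [1 - g(x)]

# doubling sigma doubles the slope at the center: g'(0) = sigma / 4
```

This is why the σ = 2 curve in the figure rises more steeply through x = 0 than the σ = 1 curve.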
For the output unit Y_k, the argument of the activation function is taken to be

x = σ_k y_in_k ,

and for the hidden unit Z_j it is taken to be

x = σ_j z_in_j .
Thus the activation function for an output unit depends both on the weights on the connections coming into the unit and on the slope σ_k for that unit; the case is similar for the hidden units. Let us use the abbreviations

y_k = f(σ_k y_in_k)

and

z_j = f(σ_j z_in_j).
The weight updates for the output units are

Δw_jk = α δ_k σ_k z_j ,

and for the weights to the hidden units,

Δv_ij = α δ_j σ_j x_i .

Similarly, the updates for the slopes on the output units are

Δσ_k = α δ_k y_in_k ,

and for the slopes on the hidden units,

Δσ_j = α δ_j z_in_j .
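One adaptive-slope training step for a single output unit can be sketched as follows; the weight vector, inputs, target, and learning rate below are hypothetical values, and δ is taken as (t − y) f′(σ y_in) as in standard backpropagation.

```python
import math

def f(x):
    return 1.0 / (1.0 + math.exp(-x))

def f_prime(x):
    fx = f(x)
    return fx * (1.0 - fx)

def output_step(w, sigma, z, t, alpha):
    """One adaptive-slope update for a single output unit:
    y = f(sigma * y_in),  delta = (t - y) * f'(sigma * y_in),
    Delta w_j   = alpha * delta * sigma * z_j,
    Delta sigma = alpha * delta * y_in."""
    y_in = sum(wj * zj for wj, zj in zip(w, z))
    y = f(sigma * y_in)
    delta = (t - y) * f_prime(sigma * y_in)
    w_new = [wj + alpha * delta * sigma * zj for wj, zj in zip(w, z)]
    sigma_new = sigma + alpha * delta * y_in
    return w_new, sigma_new, y
```

After one step with a moderate learning rate, both the weights and the unit's own slope have moved so as to reduce the error for that pattern.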
A nonsaturating, logarithmic activation function,

f(x) = log(1 + x)    for x ≥ 0
f(x) = −log(1 − x)   for x < 0 ,

has the derivative

f′(x) = 1/(1 + x)    for x > 0
f′(x) = 1/(1 − x)    for x < 0 .

Note that the derivative is continuous at x = 0 (both pieces approach 1). This function can be used in place of the sigmoid function in some applications.
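A minimal sketch of the logarithmic activation and its derivative, including the continuity check at x = 0:

```python
import math

def f_log(x):
    """Nonsaturating logarithmic activation:
    log(1 + x) for x >= 0, -log(1 - x) for x < 0 (an odd function)."""
    return math.log(1.0 + x) if x >= 0 else -math.log(1.0 - x)

def f_log_prime(x):
    """Derivative: 1/(1 + x) for x > 0, 1/(1 - x) for x < 0.
    Both branches approach 1 as x -> 0, so the derivative is continuous."""
    return 1.0 / (1.0 + x) if x >= 0 else 1.0 / (1.0 - x)
```

Unlike the sigmoid, the derivative decays only like 1/|x| rather than exponentially, so the gradient does not saturate for large net inputs.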
Example
Fewer epochs of training are required for the XOR problem (with either standard bipolar or modified bipolar representation) when we use the logarithmic activation function in place of the bipolar sigmoid. The following table compares the two functions with respect to the number of epochs they require:

Problem                                   Logarithmic   Bipolar sigmoid
Standard bipolar XOR                      144 epochs    387 epochs
Modified bipolar XOR (targets ±0.8)        77 epochs    267 epochs
Radial basis functions, activation functions with a local field of response, are also used in backpropagation neural nets. The response of such a function is nonnegative for all values of x; the response decreases to 0 as the distance |x − c| from the center c grows. A common example is the Gaussian function:
f(x) = exp(−x²)

with

f′(x) = −2x exp(−x²) = −2x f(x)
[Figure: the Gaussian function f(x) = exp(−x²).]
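The Gaussian and its derivative are a one-liner each; this sketch just encodes the two formulas above and their local-response behavior.

```python
import math

def gaussian(x):
    return math.exp(-x * x)            # f(x) = exp(-x^2), peak of 1 at x = 0

def gaussian_prime(x):
    return -2.0 * x * gaussian(x)      # f'(x) = -2x exp(-x^2) = -2x f(x)

# local field of response: the output is largest at the center (here x = 0)
# and decays toward 0 as |x| grows, unlike the monotone sigmoid
```

This locality is what distinguishes radial basis activations from sigmoids: each unit responds strongly only to inputs near its center.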