Neural Networks
Dimitri P. Solomatine
www.ihe.nl/hi/sol sol@ihe.nl
[Figure: taxonomy of neural networks: feed-forward (linear, non-linear), feedback (Hopfield model, Boltzmann machine), self-organising (feature maps, ART); learning is either supervised or unsupervised]
[Figure: linear regression y = a0 + a1 x1 fitted to training data {x(t), y(t)}; the fitted line predicts y for a new input value x(v)]
In the one-dimensional case (one input x), given T data vectors {x(t), y(t)}, t = 1,...,T, the coefficients of the equation
y = f(x) = a1 x + a0
can be found. Then, for V new vectors {x(v)}, v = 1,...,V, this equation can approximately reproduce the corresponding function values {y(v)}, v = 1,...,V.
Least-squares estimation of the parameters assumes that the errors of the individual measurements are independent and normally distributed. An optimization problem has to be solved: find such a0 and a1 that E is minimal:
$$E = \sum_{t=1}^{T} \left( y^{(t)} - (a_0 + a_1 x^{(t)}) \right)^2$$
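As a minimal illustration (not part of the original slides), the least-squares estimates of a0 and a1 can be computed directly; the data values below are synthetic placeholders:

```python
import numpy as np

# Synthetic training data {x(t), y(t)}, t = 1..T (values are placeholders)
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])

# Design matrix with a column of ones for the intercept a0
A = np.column_stack([np.ones_like(x), x])

# Least squares: find a0, a1 minimizing E = sum_t (y(t) - (a0 + a1 x(t)))^2
(a0, a1), *_ = np.linalg.lstsq(A, y, rcond=None)

# Reproduce function values for new inputs x(v)
x_new = np.array([1.5, 2.5])
print(a0, a1, a0 + a1 * x_new)
```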
In a similar fashion the problem can be posed for multiple regression (with many inputs):
y = a0 + a1 x1 + a2 x2
[Figure: linear regression as a single node: inputs x1, x2 with weights a1, a2 and bias a0 produce the output Y]
[Figure: threshold transfer function F(u) switching between 0 and 1]
There are (Ninp + 1) Nhid + (Nhid + 1) Nout weights (aij and bjk) to be identified by minimizing the mean squared error (Y(X) - f(X))^2. The method used is a gradient-based steepest descent method, called error backpropagation.
$$E = \sum_i (\mathrm{OBS}_i - \mathrm{ANN}_i)^2$$
Training an ANN means solving a (multi-extremum) optimization problem: find the values of the weights that bring E to a minimum. A problem of the backpropagation algorithm is that it assumes single-extremality.
Biological motivation
Signals are transmitted between neurons by electrical pulses. A neuron sums up the effects of thousands of impulses, and if the integrated potential exceeds a threshold, the cell fires: it generates an impulse that travels further along the axon.
[Figure: biological neurons, with dendrites and cell bodies labelled]
Hidden node
[Figure: a hidden node with inputs x1, x2, weights a1, a2 and bias a0: u = a0 + a1 x1 + a2 x2, output y = g(u)]
Inputs are x_i, i = 1,...,Ninp. The output of the j-th node of the hidden layer is
$$y_j = g\Big( a_{0j} + \sum_{i=1}^{N_{inp}} a_{ij}\, x_i \Big), \quad j = 1, \dots, N_{hid}$$
Output node
[Figure: an output node with inputs y1, y2, weights b1, b2 and bias b0: v = b0 + b1 y1 + b2 y2, output z = g(v)]
Its inputs are the outputs of the hidden nodes y1, ..., yNhid; the outputs are
$$z_k = g\Big( b_{0k} + \sum_{j=1}^{N_{hid}} b_{jk}\, y_j \Big), \quad k = 1, \dots, N_{out}$$
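A sketch of this forward pass in Python/NumPy (the layer sizes, random weights and input values are illustrative; the logistic g is the one defined on the next slides):

```python
import numpy as np

def g(u):
    """Logistic transfer function (defined on the following slides)."""
    return 1.0 / (1.0 + np.exp(-u))

N_inp, N_hid, N_out = 3, 4, 2
rng = np.random.default_rng(0)
a = rng.uniform(-0.5, 0.5, size=(N_hid, N_inp + 1))  # hidden weights; column 0 holds a0j
b = rng.uniform(-0.5, 0.5, size=(N_out, N_hid + 1))  # output weights; column 0 holds b0k

x = np.array([0.2, -0.1, 0.7])        # one input vector x_i

y = g(a[:, 0] + a[:, 1:] @ x)         # y_j = g(a0j + sum_i a_ij x_i)
z = g(b[:, 0] + b[:, 1:] @ y)         # z_k = g(b0k + sum_j b_jk y_j)
print(z)
```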
Transfer function g
The transfer function is usually non-linear, bounded and differentiable. Widely used is the logistic function:
$$g(u) = \frac{1}{1 + e^{-u}}$$
[Plot: the logistic function; the output value rises from 0 to 1 as the input value goes from -10 to 10; slope at the origin = 1/4]
Derivative of g
$$\frac{\partial g(u)}{\partial u} = \frac{\partial (1 + e^{-u})^{-1}}{\partial u} = (-1)(1 + e^{-u})^{-2} \cdot (-e^{-u}) = e^{-u} (1 + e^{-u})^{-2} = e^{-u}\, g^2(u)$$
Note that
$$e^{-u} = \frac{1 - g(u)}{g(u)}$$
Then
$$\frac{\partial g(u)}{\partial u} = \frac{1 - g(u)}{g(u)}\, g^2(u) = (1 - g(u))\, g(u)$$
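This identity is easy to check numerically; a small sketch (not from the slides) comparing the analytic form g(u)(1 - g(u)) with a finite-difference approximation:

```python
import numpy as np

def g(u):
    return 1.0 / (1.0 + np.exp(-u))

u = np.linspace(-5.0, 5.0, 11)
h = 1e-6
numeric = (g(u + h) - g(u - h)) / (2 * h)   # central finite difference
analytic = (1.0 - g(u)) * g(u)              # the identity derived above
print(np.max(np.abs(numeric - analytic)))   # should be very small (~1e-10)
```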
[Figure panels: approximations by a network with a) 1 hidden node, b) 2 hidden nodes, c) 3 hidden nodes, d) 4 hidden nodes]
If Nout functions, each with Ninp independent (input) variables, are given, together with T instances (vectors) for training, then for V new vectors, v = 1,...,V, the trained network can approximately reproduce the corresponding output values.
For output k, the error for the input pattern t is:
$$E_k^{(t)} = (f_k^{(t)} - z_k^{(t)})^2$$
The total error over all outputs for input pattern t is:
$$E^{(t)} = \frac{1}{2} \sum_k (f_k^{(t)} - z_k^{(t)})^2$$
The total error is the summation of the errors for all output nodes and all T instances:
$$E_{tot} = \frac{1}{2} \sum_t \sum_k (f_k^{(t)} - z_k^{(t)})^2 \;\to\; \min$$
$$= \frac{1}{2} \sum_t \sum_k \Big[ f_k^{(t)} - g_{out}\big(b_{0k} + \sum_j b_{jk}\, y_j\big) \Big]^2$$
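Computed directly in the notation of this slide (a sketch; in practice f and z come from the data and from the network's forward pass, here they are placeholder arrays):

```python
import numpy as np

# f[t, k] = desired outputs, z[t, k] = network outputs (placeholder values)
f = np.array([[1.0, 0.0],
              [0.8, 0.2]])
z = np.array([[0.9, 0.1],
              [0.6, 0.3]])

E_tk = (f - z) ** 2               # E_k(t), per output and per pattern
E_t = 0.5 * E_tk.sum(axis=1)      # E(t), per input pattern
E_tot = 0.5 * E_tk.sum()          # total error over all t and k
print(E_t, E_tot)
```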
1. Randomize the weights {ws} (denoted above as matrices a and b) to small random values (both positive and negative), to ensure that the network is not saturated by large values of weights.
2. Select an instance t, that is, the vector {xi(t)}, i = 1,...,Ninp (a pair of input and output patterns), from the training set.
3. Apply the network input vector to the network input.
4. Calculate the network output vector {zk(t)}, k = 1,...,Nout.
5. Calculate the errors for each of the outputs k, k = 1,...,Nout, as the difference between the desired output and the network output:
$$E = E_k^{(t)} = (f_k^{(t)} - z_k^{(t)})^2$$
...
...
6. Calculate the necessary updates Δws for the weights ws in a way that minimizes this error (discussed below).
7. Adjust the weights of the network by Δws.
8. Repeat steps 2-7 for each instance (pair of input-output vectors) in the training set until the error for the entire system (the error E defined above, or the error on the cross-validation set) is acceptably low, or the pre-defined number of iterations is reached.
A variant with epoch-wise (batch) updates:
1-6. As above.
7. Add the calculated weight updates {Δws} to the accumulated total updates {ΔWs}.
8. Repeat steps 2-7 for several instances comprising an epoch (could be the whole set).
9. Adjust the weights {ws} of the network by the updates {ΔWs}.
10. Repeat steps 2-9 until all instances in the training set are processed. This constitutes one iteration.
11. Repeat the iteration of steps 2-10 until the error for the entire system (the error E defined above, or the error on the cross-validation set) is acceptably low, or the pre-defined number of iterations is reached.
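A skeleton of the two regimes described above (per-instance vs. epoch-wise updates), with the gradient computation left abstract since it is derived on the following slides; the function and parameter names are illustrative:

```python
import numpy as np

def train(weights, data, grad_fn, eta=0.1, n_iter=100, batch=True):
    """Sketch of online vs. batch training.

    grad_fn(weights, x, f) is assumed to return dE/dw for one instance
    (its form is derived on the next slides).
    """
    for _ in range(n_iter):
        if batch:
            total = np.zeros_like(weights)       # accumulated updates {delta_Ws}
            for x, f in data:                    # steps 2-7 over an epoch
                total += -eta * grad_fn(weights, x, f)
            weights = weights + total            # step 9: adjust once per epoch
        else:
            for x, f in data:                    # online: adjust per instance
                weights = weights + (-eta * grad_fn(weights, x, f))
    return weights
```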
Optimization is done by the steepest descent algorithm: steps are made in the space of variables (the weights w) in the direction opposite to the gradient of the function E:
$$\mathbf{w}(N+1) = \mathbf{w}(N) - \eta\, \nabla E(\mathbf{w}(N))$$
For the individual weights the changes will be:
$$w_s(N+1) = w_s(N) + \Delta w_s(N)$$
and the update step for weight s is:
$$\Delta w_s = -\eta \frac{\partial E}{\partial w_s}$$
(this is the delta rule of Widrow and Hoff (1960) for a single linear perceptron; η is the learning-rate coefficient)
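For the single linear perceptron to which the Widrow-Hoff delta rule applies, the update can be written out directly; a small sketch with an illustrative learning rate:

```python
import numpy as np

def delta_rule_step(w, x, f, eta=0.05):
    """One Widrow-Hoff step for a single linear neuron y = w . x.

    With E = (f - y)^2, dE/dw = -2 (f - y) x, so the step
    dw = -eta * dE/dw = 2 eta (f - y) x moves against the gradient.
    """
    y = w @ x
    return w + 2 * eta * (f - y) * x

w = np.zeros(3)
w = delta_rule_step(w, x=np.array([1.0, 0.5, -0.2]), f=1.0)
print(w)
```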
[Figure: output node k with net input v_k and output z_k = g_k^{out}(v_k)]
$$\Delta b_{jk} = -\eta \frac{\partial E}{\partial b_{jk}} = \eta\, \delta_k\, y_j$$
where
$$\delta_k = 2 (f_k - z_k) \frac{\partial g_k(v)}{\partial v}$$
Hidden nodes do not have explicit values of an error. Such errors are propagated back from each of the nodes of the output layer to each of the nodes in the hidden layer:
$$E^{(t)} = \frac{1}{2} \sum_k (f_k^{(t)} - z_k^{(t)})^2$$
$$\Delta a_{ij} = \eta\, x_i\, \frac{\partial g_j(u)}{\partial u} \sum_k \delta_k\, b_{jk}$$
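Putting the two update formulas together for one training instance; a sketch that reuses the forward pass and g'(u) = g(u)(1 - g(u)) from the earlier slides (the factor 2 and the learning rate follow the formulas above; array layouts are illustrative):

```python
import numpy as np

def g(u):
    return 1.0 / (1.0 + np.exp(-u))

def backprop_step(a, b, x, f, eta=0.1):
    """One backpropagation update for a 1-hidden-layer network.

    a: (N_hid, N_inp+1) hidden weights, column 0 = biases a0j
    b: (N_out, N_hid+1) output weights, column 0 = biases b0k
    """
    # Forward pass (as on the earlier slides)
    y = g(a[:, 0] + a[:, 1:] @ x)
    z = g(b[:, 0] + b[:, 1:] @ y)

    # Output layer: delta_k = 2 (f_k - z_k) g'(v_k), with g' = g (1 - g)
    delta_k = 2.0 * (f - z) * z * (1.0 - z)
    # Hidden layer: error propagated back, delta_j = g'(u_j) sum_k delta_k b_jk
    delta_j = y * (1.0 - y) * (b[:, 1:].T @ delta_k)

    # Weight updates: delta_b_jk = eta delta_k y_j, delta_a_ij = eta delta_j x_i
    b[:, 0] += eta * delta_k
    b[:, 1:] += eta * np.outer(delta_k, y)
    a[:, 0] += eta * delta_j
    a[:, 1:] += eta * np.outer(delta_j, x)
    return a, b
```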
Momentum
With momentum the weight change becomes
$$\Delta w_s(N+1) = -\eta \frac{\partial E}{\partial w_s} + \alpha\, \Delta w_s(N)$$
and, combined with an adaptive learning rate:
$$\Delta w_s(N+1) = -(1-\alpha)\, \varepsilon_s(N) \frac{\partial E}{\partial w_s} + \alpha\, \Delta w_s(N)$$
where ε_s(N) is the learning rate, which is updated according to the following rule:
$$\varepsilon_s(N) = \varepsilon_s(N-1) + \kappa, \quad \text{if } \frac{\partial E}{\partial w_s} \cdot RAED(N-1) > 0$$
$$\varepsilon_s(N) = \varphi\, \varepsilon_s(N-1), \quad \text{otherwise}$$
Here RAED is the recent "average" of that derivative, calculated recursively:
$$RAED(N) = (1-\theta) \frac{\partial E}{\partial w_s} + \theta\, RAED(N-1)$$
(α, κ, φ and θ are tunable coefficients)
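A sketch of these two update rules for a single weight (the coefficient values are illustrative, and the per-weight bookkeeping is left to the caller):

```python
def momentum_step(w, dE_dw, prev_dw, eta=0.1, alpha=0.9):
    """dw(N+1) = -eta dE/dw + alpha dw(N); returns the new weight and step."""
    dw = -eta * dE_dw + alpha * prev_dw
    return w + dw, dw

def adapt_rate(eps, raed, dE_dw, kappa=0.01, phi=0.5, theta=0.7):
    """Per-weight learning-rate adaptation using the RAED running average."""
    eps = eps + kappa if dE_dw * raed > 0 else phi * eps
    raed = (1.0 - theta) * dE_dw + theta * raed
    return eps, raed
```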
Choices in the ANN architecture: the number of hidden nodes Nhid (given Ninp inputs and Nout outputs), the choice of the activation functions, and removing part of the connections ("optimal brain damage").
Other function approximators:
- linear regression
- splines: cubic functions that pass through the points, with their 1st and 2nd derivatives equal at the boundaries
- orthogonal functions (Chebyshev polynomials)
- combinations of simple kernel functions
The idea: use simple functions F(x) that approximate the given function in the proximity of some representative locations (centers). These F(x) depend only on the distance from the centers and drop to zero as the distance from the centers increases.
[Figure: J centers placed in the input space]
A function z = f(x) is given, where x is a vector {x1, ..., xI} in I-dimensional space. Centers wj, j = 1,...,J are selected, and f(x) is approximated by
$$z(\mathbf{x}) = \sum_{j=1}^{J} F(|\mathbf{x} - \mathbf{w}_j|;\, b_j)$$
where |x - wj| is a distance (e.g., Euclidean) and bj are coefficients associated with the j-th center wj.
$$z(\mathbf{x}) = \sum_{j=1}^{J} b_j\, F(|\mathbf{x} - \mathbf{w}_j|)$$
It is common to choose a Gaussian function for F:
$$F(r) = \exp(-r^2 / 2\sigma^2)$$
(σ is analogous to the standard deviation in a Gaussian normal distribution). The distance |x - wj| is usually understood in the Euclidean sense and denoted ρj:
$$\rho_j = \sqrt{ \sum_{i=1}^{I} (x_i - w_{ij})^2 }$$
so that
$$z(\mathbf{x}) = \sum_{j=1}^{J} b_j \exp(-\rho_j^2 / 2\sigma_j^2) = \sum_{j=1}^{J} b_j \exp(-|\mathbf{x} - \mathbf{w}_j|^2 / 2\sigma_j^2)$$
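A sketch of evaluating z(x) with Gaussian kernels; the centers, widths and weights below are placeholder values:

```python
import numpy as np

def rbf_eval(x, W, b, sigma):
    """z(x) = sum_j b_j exp(-|x - w_j|^2 / (2 sigma_j^2)).

    W: (J, I) center positions, b: (J,) weights, sigma: (J,) widths.
    """
    rho2 = np.sum((W - x) ** 2, axis=1)          # squared Euclidean distances rho_j^2
    return np.sum(b * np.exp(-rho2 / (2.0 * sigma ** 2)))

W = np.array([[0.0, 0.0], [1.0, 1.0]])           # J = 2 centers in I = 2 dimensions
print(rbf_eval(np.array([0.5, 0.5]), W,
               b=np.array([1.0, -0.5]), sigma=np.array([0.7, 0.7])))
```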
[Figure: RBF network; inputs xi feed hidden nodes yj with Gaussian functions (centers wij), whose outputs are combined by linear functions with weights bjk into the outputs zk]
1. Choose randomly J instances xj and use them as the positions of the centers {wj}.
2. All other instances are assigned to the class j of the closest center wj, and the location of each center is then recalculated, e.g., using the k-nearest-neighbour method. The above steps are repeated until the locations of the centers stop changing.
...
3. Weights {bjk} for the output layer are calculated by solving a multiple linear regression problem, formulated as a system of linear equations. The output from output node k can be expressed as
$$z_k = \frac{ \sum_{j=1}^{J} b_{jk}\, y_j }{ \sum_{j=1}^{J} y_j }$$
where bjk is the weight on the connection from hidden node j to output node k, and yj is the output from hidden node j.
4. If the total error is more than the desired limit, change the number of hidden units and repeat all the steps.
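A compressed sketch of the whole procedure: random initial centers, a simple k-means-style reassignment loop (a simplification of the center-relocation step above), then a linear least-squares solve for the output weights. Unnormalized outputs, a single output and a fixed width sigma are further simplifying assumptions:

```python
import numpy as np

def fit_rbf(X, f, J, sigma=1.0, n_iter=10, seed=0):
    """X: (T, I) float inputs, f: (T,) targets; returns centers W and weights b."""
    rng = np.random.default_rng(seed)
    W = X[rng.choice(len(X), size=J, replace=False)]   # step 1: random centers
    for _ in range(n_iter):                            # step 2: move the centers
        d2 = ((X[:, None, :] - W[None, :, :]) ** 2).sum(axis=-1)
        labels = d2.argmin(axis=1)                     # class of the closest center
        for j in range(J):
            if np.any(labels == j):
                W[j] = X[labels == j].mean(axis=0)
    d2 = ((X[:, None, :] - W[None, :, :]) ** 2).sum(axis=-1)
    Y = np.exp(-d2 / (2.0 * sigma ** 2))               # hidden outputs y_j
    b, *_ = np.linalg.lstsq(Y, f, rcond=None)          # step 3: linear regression
    return W, b
```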
[Figure: 18 centers found]
Example: using RBF network to reproduce behaviour of a 1-D modelling system (SOBEK)
Input file: 7 inputs (prev. rainfalls, flows), 1 output (flow), 1303 examples
[Figure: 21 centers found]
Recurrent networks were developed to deal with time-varying or time-lagged patterns. They are usable for problems where the dynamics of the considered process is complex and the measured data are noisy. Examples: Hopfield networks, regressive networks, Jordan-Elman networks, and Brain-State-in-a-Box (BSB) networks.
[Figure: recurrent network with context units feeding back between outputs and inputs]
Hopfield network
The Hopfield network belongs to a class of devices with autoassociative memory: they store a set of patterns in such a way that when a new pattern is presented, the network responds by producing whichever of the stored patterns most closely resembles the new one. The network has feedback from each node to every other node, but not to itself. Each node computes the weighted sum of inputs and outputs, and if it exceeds a fixed threshold, generates 1, otherwise -1. The network stores N-dimensional vectors comprising the symbols ±1, and these vectors are used as generalizations over the possible patterns presented to the network.
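A sketch of the recall behaviour described here, using the standard Hebbian weight construction and synchronous updates (both are assumptions, since the slide does not specify how the weights are set or the update order):

```python
import numpy as np

def hopfield_weights(patterns):
    """Hebbian storage of +/-1 patterns; zero diagonal (no self-feedback)."""
    P = np.array(patterns, dtype=float)
    W = P.T @ P / P.shape[1]
    np.fill_diagonal(W, 0.0)
    return W

def recall(W, x, n_iter=10):
    """Repeatedly threshold the weighted sums until the state settles."""
    for _ in range(n_iter):
        x = np.where(W @ x >= 0, 1, -1)   # fixed threshold at zero
    return x

W = hopfield_weights([[1, -1, 1, -1], [1, 1, -1, -1]])
print(recall(W, np.array([1, -1, -1, -1])))   # should settle near a stored pattern
```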
Some references
Solomatine D.P., Torres L.A. (1996). Neural network approximation of a hydrodynamic model in optimizing reservoir operation. Proc. 2nd Intern. Conference on Hydroinformatics, Zurich, September 9-13, pp. 201-206.
Kuo-lin Hsu, Gupta H.V., Sorooshian S. (1995). Artificial neural network modelling of the rainfall-runoff process. Water Resources Res., vol. 31, No. 10, pp. 2517-2530.
Gong N., Denoeux T., Bertrand-Krajewski J.-L. (1996). Neural networks for solid transport modelling in sewer systems during storm events. Water Sci. Tech., vol. 33, No. 9, pp. 85-92.
Minns A.W., Hall M.J. (1996). Artificial neural networks as rainfall-runoff models. Hydrological Sci. J., vol. 41, No. 3, pp. 399-417.
Shen Y., Solomatine D.P., van den Boogaard H. (1998). Improving performance of chlorophyll concentration time series simulation with artificial neural networks. Annual Journal of Hydraulic Engineering, JSCE, vol. 42, February, pp. 751-756.
Dawson C.W., Wilby R. (1998). An artificial neural network approach to rainfall-runoff modelling. Hydrological Sci. J., vol. 43, No. 1, pp. 47-66.
Dibike Y., Solomatine D.P., Abbott M.B. (1999). On the encapsulation of numerical-hydraulic models in artificial neural networks. Journal of Hydraulic Research, No. 2.
Lobbrecht A.H., Solomatine D.P. (1999). Control of water levels in polder areas using neural networks and fuzzy adaptive systems. In: Water Industry Systems: Modelling and Optimization Applications, D. Savic, G. Walters (eds.). Research Studies Press Ltd., pp. 509-518.
End of Part 2