CHAPTER 1

ARTIFICIAL NEURAL NETWORKS


INTRODUCTION

In machine learning and cognitive science, artificial neural networks (ANNs) are a family
of models inspired by biological neural networks (the central nervous systems of animals, in
particular the brain) which are used to estimate or approximate functions that can depend on a
large number of inputs and are generally unknown. Artificial neural networks are generally
presented as systems of interconnected "neurons" that exchange messages with each other.
The connections have numeric weights that can be tuned based on experience, making neural
nets adaptive to inputs and capable of learning. For example, a neural network for handwriting
recognition is defined by a set of input neurons which may be activated by the pixels of an input
image. After being weighted and transformed by a function (determined by the network's
designer), the activations of these neurons are then passed on to other neurons. This process is
repeated until, finally, the output neuron that determines which character was read is activated.
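The layer-by-layer flow described above (weight the incoming activations, apply a transfer function, pass the result on) can be sketched in a few lines of Python with NumPy. This is a minimal illustration, not a working character recognizer: the 8x8 image size, the 16 hidden neurons, the 10 output classes, the sigmoid transfer function and the random, untrained weights are all assumptions chosen for the example.

```python
import numpy as np

def sigmoid(a):
    """Logistic transfer function, squashing any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-a))

def forward(x, layers):
    """Propagate activations through a feed-forward network: each layer
    weights its inputs, adds a bias, applies the transfer function, and
    passes the result on as the input of the next layer."""
    for W, b in layers:
        x = sigmoid(W @ x + b)
    return x

rng = np.random.default_rng(0)
pixels = rng.random(64)  # hypothetical 8x8 grey-level image, flattened
layers = [
    (rng.normal(size=(16, 64)), np.zeros(16)),  # input pixels -> 16 hidden neurons
    (rng.normal(size=(10, 16)), np.zeros(10)),  # hidden -> 10 output neurons, one per character
]
scores = forward(pixels, layers)
predicted = int(np.argmax(scores))  # index of the most strongly activated output neuron
```

Because the weights are random, `predicted` is meaningless here; training is what tunes the weights so that the correct output neuron wins.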

Examinations of humans' central nervous systems inspired the concept of artificial neural
networks. In an artificial neural network, simple artificial nodes, known as "neurons",
"neurodes", "processing elements" or "units", are connected together to form a network which
mimics a biological neural network.

There is no single formal definition of what an artificial neural network is. However, a
class of statistical models may commonly be called "neural" if it possesses the following
characteristics:

1. contains sets of adaptive weights, i.e. numerical parameters that are tuned by a learning
algorithm, and
2. is capable of approximating nonlinear functions of its inputs.

The adaptive weights can be thought of as connection strengths between neurons, which
are activated during training and prediction. Artificial neural networks resemble biological
neural networks in that their units perform functions collectively and in parallel, rather than
through a clear delineation of subtasks assigned to individual units. The term "neural
network" usually refers to models employed in statistics, cognitive psychology and artificial
intelligence. Neural network models which emulate the central nervous system and the rest of
the brain are part of theoretical neuroscience and computational neuroscience.

BENEFITS OF NEURAL NETWORKS

A neural network derives its computing power through, first, its massively parallel
distributed structure and, second, its ability to learn and, therefore, generalize. Generalization
refers to the neural network producing reasonable outputs for inputs not encountered during
training (learning). These two information processing capabilities make it possible for neural
networks to solve complex (large-scale) problems that are currently intractable. In practice,
however, neural networks cannot provide the solution by working alone. Rather, they
need to be integrated into a consistent system engineering approach. Specifically, a complex
problem of interest is decomposed into a number of relatively simple tasks, and neural networks
are assigned a subset of the tasks (e.g. pattern recognition, associative memory, control) that
match their inherent capabilities. It is important to recognize, however, that we have a long way
to go (if ever) before we can build a computer architecture that mimics a human brain. The use
of neural networks offers the following useful properties and capabilities:

1. Nonlinearity: A neuron is basically a nonlinear device. Consequently, a neural network, made up of an interconnection of neurons, is itself nonlinear. Moreover, the nonlinearity is of a special kind in the sense that it is distributed throughout the network.

2. Input-output mapping: A popular paradigm of learning called supervised learning involves the modification of the synaptic weights of a neural network by applying a set of training samples. Each sample consists of a unique input signal and the corresponding desired response. The network is presented with a sample picked at random from the set, and the synaptic weights (free parameters) of the network are modified so as to minimize the difference between the desired response and the actual response of the network produced by the input signal, in accordance with an appropriate criterion. The training of the network is repeated for many samples in the set until the network reaches a steady state, where there are no further significant changes in the synaptic weights. The previously applied training samples may be reapplied during the training session, usually in a different order. Thus the network learns from the samples by constructing an input-output mapping for the problem at hand.

3. Adaptivity: Neural networks have a built-in capability to adapt their synaptic weights to changes in the surrounding environment. In particular, a neural network trained to operate in a specific environment can be easily retrained to deal with minor changes in the operating environmental conditions. Moreover, when it is operating in a non-stationary environment, a neural network can be designed to change its synaptic weights in real time. The natural architecture of a neural network for pattern classification, signal processing, and control applications, coupled with the adaptive capability of the network, makes it an ideal tool for use in adaptive pattern classification, adaptive signal processing, and adaptive control.

4. Contextual information: Knowledge is represented by the very structure and activation state of a neural network. Every neuron in the network is potentially affected by the global activity of all other neurons in the network. Consequently, contextual information is dealt with naturally by a neural network.

5. Fault tolerance: A neural network, implemented in hardware form, has the potential to be inherently fault tolerant in the sense that its performance degrades gracefully under adverse operating conditions. For example, if a neuron or its connecting links are damaged, recall of a stored pattern is impaired in quality. However, owing to the distributed nature of information in the network, the damage has to be extensive before the overall response of the network is degraded seriously. Thus, in principle, a neural network exhibits a graceful degradation in performance rather than catastrophic failure.

6. VLSI implementability: The massively parallel nature of a neural network makes it potentially fast for the computation of certain tasks. This same feature makes a neural network ideally suited for implementation using very-large-scale-integrated (VLSI) technology.

7. Uniformity of analysis and design: Basically, neural networks enjoy universality as information processors. We say this in the sense that the same notation is used in all the domains involving the application of neural networks. This feature manifests itself in different ways:
a) Neurons, in one form or another, represent an ingredient common to all neural networks.
b) This commonality makes it possible to share theories and learning algorithms in different applications of neural networks.
c) Modular networks can be built through a seamless integration of modules.

8. Neurobiological analogy: The design of a neural network is motivated by analogy with the brain, which is living proof that fault-tolerant parallel processing is not only physically possible but also fast and powerful. Neurobiologists look to (artificial) neural networks as a research tool for the interpretation of neurobiological phenomena. On the other hand, engineers look to neurobiology for new ideas to solve problems more complex than those based on conventional hard-wired design techniques. The neurobiological analogy is also useful in another important way: it provides a hope and belief that physical understanding of neurobiological structures could influence the art of electronics and thus VLSI.
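The supervised-learning procedure described under input-output mapping (present samples in random order, compare the actual response with the desired one, adjust the free parameters, repeat until the weights settle) can be illustrated with the simplest possible case: a single linear neuron trained by the least-mean-square (LMS), or delta, rule. This is a minimal sketch; the noise-free linear target, the learning rate of 0.05 and the number of epochs are assumptions made for the example, not values from the text.

```python
import numpy as np

def lms_train(X, d, lr=0.05, epochs=50, seed=0):
    """Train one linear neuron with the LMS (delta) rule. Each epoch presents
    the samples in a fresh random order and nudges the weights to reduce the
    squared difference between the desired response d and the actual
    response w.x + b."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for i in rng.permutation(len(X)):  # reapply the samples in a new order
            e = d[i] - (X[i] @ w + b)       # error signal for this sample
            w += lr * e * X[i]              # move weights to shrink the error
            b += lr * e
    return w, b

rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(200, 2))
d = 1.5 * X[:, 0] - 0.7 * X[:, 1] + 0.3  # desired responses (noise-free, for illustration)
w, b = lms_train(X, d)
```

After training, `w` and `b` settle close to the coefficients that generated `d`, which is the "steady state" described above. A multilayer network replaces the single linear neuron with nonlinear layers but keeps the same present-compare-adjust loop.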

CRITICISM

1. TRAINING ISSUES

A common criticism of neural networks, particularly in robotics, is that they require a large diversity of training for real-world operation. This is not surprising, since any learning machine needs sufficient representative examples in order to capture the underlying structure that allows it to generalize to new cases. Dean A. Pomerleau, in his research presented in the paper "Knowledge-based Training of Artificial Neural Networks for Autonomous Robot Driving," uses a neural network to train a robotic vehicle to drive on multiple types of roads (single lane, multi-lane, dirt, etc.). A large amount of his research is devoted to

(1) extrapolating multiple training scenarios from a single training experience, and
(2) preserving past training diversity so that the system does not become overtrained (if, for
example, it is presented with a series of right turns, it should not learn to always turn right).

These issues are common in neural networks that must decide from amongst a wide
variety of responses, but can be dealt with in several ways, for example by randomly shuffling
the training examples, by using a numerical optimization algorithm that does not take too large
steps when changing the network connections following an example, or by grouping examples in
so-called minibatches. A. K. Dewdney, a mathematician and computer scientist at the University of
Western Ontario and former Scientific American columnist, wrote in 1997, "Although neural nets
do solve a few toy problems, their powers of computation are so limited that I am surprised
anyone takes them seriously as a general problem solving tool."
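Two of the remedies listed above, shuffling the examples and grouping them into minibatches while taking modest gradient steps, are easy to show in code. This sketch uses a plain linear least-squares model as a stand-in for a neural network so that the data-handling logic stays visible; the batch size, learning rate and the deliberately sorted toy data set are assumptions for illustration.

```python
import numpy as np

def minibatches(X, y, batch_size, rng):
    """Yield the training set in shuffled minibatches, so consecutive updates
    do not all come from one kind of example (e.g. only right turns)."""
    order = rng.permutation(len(X))
    for start in range(0, len(X), batch_size):
        batch = order[start:start + batch_size]
        yield X[batch], y[batch]

def train_linear_sgd(X, y, lr=0.1, batch_size=16, epochs=100, seed=0):
    """Minibatch gradient descent on mean squared error for a linear model."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for Xb, yb in minibatches(X, y, batch_size, rng):
            grad = 2.0 * Xb.T @ (Xb @ w - yb) / len(Xb)  # MSE gradient on this batch
            w -= lr * grad                                # small step per batch
    return w

# Toy data stored in a biased order (sorted by the first input), the situation
# in which presenting the examples sequentially would be harmful.
rng = np.random.default_rng(3)
X = rng.uniform(-1, 1, size=(200, 2))
X = X[np.argsort(X[:, 0])]
y = X @ np.array([2.0, -1.0])
w = train_linear_sgd(X, y)
```

Reshuffling every epoch means each minibatch is a small random sample of the whole data set, so no single run of similar examples can drag the weights far in one direction.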

2. HARDWARE ISSUES

To implement large and effective software neural networks, considerable processing and storage resources need to be committed. While the brain has hardware tailored to the task of processing signals through a graph of neurons, simulating even a highly simplified form on a von Neumann architecture may compel a neural network designer to fill many millions of database rows for its connections, which can consume vast amounts of computer memory and hard disk space. Furthermore, the designer of neural network systems will often need to simulate the transmission of signals through many of these connections and their associated neurons, which must often be matched with enormous amounts of CPU processing power and time. Jürgen Schmidhuber notes that the resurgence of neural networks in the 21st century, and their renewed success at image recognition tasks, is largely attributable to advances in hardware: from 1991 to 2015, computing power, especially as delivered by GPGPUs (on GPUs), has increased around a millionfold, making the standard backpropagation algorithm feasible for training networks that are several layers deeper than before (but he adds that this does not overcome algorithmic problems such as vanishing gradients "in a fundamental way"). The use of GPUs instead of ordinary CPUs can bring training times for some networks down from months to mere days.

3. MODELING ISSUES IN ARTIFICIAL NEURAL NETWORKS

In order to improve performance, ANN models need to be developed in a systematic manner. Such an approach needs to address major factors such as the determination of adequate model inputs, data division and pre-processing, the choice of suitable network architecture, careful selection of some internal parameters that control the optimization method, the stopping criteria and model validation. These factors are explained and discussed below.

Determination of Model Inputs

An important step in developing ANN models is to select the model input variables that have the most significant impact on model performance (Faraway and Chatfield 1998). A good subset of input variables can substantially improve model performance. Presenting as large a number of input variables as possible to ANN models usually increases network size (Maier and Dandy 2000), resulting in a decrease in processing speed and a reduction in the efficiency of the network (Lachtermacher and Fuller 1994). A number of techniques have been suggested in the literature to assist with the selection of input variables. One approach is to select appropriate input variables in advance based on a priori knowledge (Maier and Dandy 2000). Another approach used by some researchers (Goh 1994b; Najjar et al. 1996b; Ural and Saka 1998) is to train many neural networks with different combinations of input variables and to select the network that has the best performance. A step-wise technique described by Maier and Dandy (2000) can also be used, in which separate networks are trained, each using only one of the available variables as model input. The variable whose network performs best is retained and is then combined with each of the remaining variables in turn. This process is repeated for an increasing number of input variables until adding further variables results in no improvement in model performance. Another useful approach is to employ a genetic algorithm to search for the best sets of input variables (NeuralWare 1997). For each possible set of input variables chosen by the genetic algorithm, a neural network is trained and used to rank different subsets of possible inputs. A set of input variables derives its fitness from the model error obtained based on those variables. The adaptive spline modeling of observation data (ASMOD) algorithm proposed by Kavli (1993) is also a useful technique that can be used for developing parsimonious neurofuzzy networks by automatically selecting a combination of model input variables that have the most significant impact on the outputs.
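The step-wise (forward selection) procedure described above can be sketched as follows. To keep the example self-contained, a least-squares linear fit scored on held-out rows stands in for "train a network with these inputs and measure its performance"; in practice that function would train an ANN. The function names, the tolerance and the synthetic data (in which only inputs 1 and 3 matter) are assumptions for illustration.

```python
import numpy as np

def holdout_error(X, y, cols, n_train):
    """Stand-in for 'train a model on these inputs and measure its error':
    least-squares linear fit on the first n_train rows, mean squared error
    on the remaining rows."""
    A = np.column_stack([X[:, cols], np.ones(len(X))])  # selected inputs + bias
    coef, *_ = np.linalg.lstsq(A[:n_train], y[:n_train], rcond=None)
    resid = A[n_train:] @ coef - y[n_train:]
    return float(np.mean(resid ** 2))

def stepwise_select(X, y, n_train, tol=1e-6):
    """Forward step-wise selection: try each remaining variable alongside the
    ones already kept, keep the best, and stop when no addition lowers the
    held-out error by more than tol."""
    selected, remaining = [], list(range(X.shape[1]))
    best_err = np.inf
    while remaining:
        errs = {j: holdout_error(X, y, selected + [j], n_train) for j in remaining}
        j_best = min(errs, key=errs.get)
        if best_err - errs[j_best] <= tol:
            break  # adding another variable brings no real improvement
        best_err = errs[j_best]
        selected.append(j_best)
        remaining.remove(j_best)
    return selected, best_err

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
y = 3.0 * X[:, 1] - 2.0 * X[:, 3] + 0.1 * rng.normal(size=300)
cols, err = stepwise_select(X, y, n_train=200)
```

The first two variables chosen are 1 and 3, in order of their influence on `y`. Because the score comes from a trained model, the selection inherits that model's structure, which is the limitation discussed next.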

A potential shortcoming of the above approaches is that they are model-based. In other words, the determination as to whether an input is significant or not depends on the error of a trained model, which is a function not only of the inputs but also of the model structure and calibration. This can potentially obscure the impact of different model inputs. In order to overcome this limitation, model-free approaches can be utilized, which use linear dependence measures, such as correlation, or non-linear measures of dependence, such as mutual information, to obtain the significant model inputs prior to developing the ANN models (e.g. Bowden et al. 2005; May et al. 2008).
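A model-free screen based on linear dependence can be as simple as ranking the candidate inputs by the absolute value of their Pearson correlation with the output, computed before any network is trained. This sketch assumes NumPy and synthetic data in which input 2 dominates and input 0 has a weaker effect.

```python
import numpy as np

def rank_inputs_by_correlation(X, y):
    """Model-free input screening: rank candidate inputs by the absolute
    Pearson correlation between each column of X and the output y."""
    r = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])
    order = np.argsort(-np.abs(r))  # strongest linear dependence first
    return order, r

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 4))
y = 2.0 * X[:, 2] + 0.5 * X[:, 0] + 0.1 * rng.normal(size=500)
order, r = rank_inputs_by_correlation(X, y)
```

Correlation only captures linear dependence: for a symmetric input x, the output x**2 is fully determined by x yet its correlation with x is close to zero. This is why non-linear measures such as mutual information are also used.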

Division of Data

Supervised ANNs are similar to conventional statistical models in the sense that model parameters (e.g. connection weights) are adjusted in the model calibration phase (training) so as to minimize the error between model outputs and the corresponding measured values for a particular data set (the training set). ANNs perform best when they do not extrapolate beyond the range of the data used for calibration (Flood and Kartam 1994; Minns and Hall 1996; Tokar and Johnson 1999). Therefore, the purpose of ANNs is to non-linearly interpolate (generalize) in high-dimensional space between the data used for calibration. Unlike conventional statistical models, ANN models generally have a large number of model parameters (connection weights) and can therefore overfit the training data, especially if the training data are noisy. In other words, if the number of degrees of freedom of the model is large compared with the number of data points used for calibration, the model might no longer fit the general trend, as desired, but might learn the idiosyncrasies of the particular data points used for calibration, leading to "memorization" rather than "generalization". Consequently, a separate validation set is needed to ensure that the model can generalize within the range of the data used for calibration. It is common practice to divide the available data into two subsets: a training set, to construct the neural network model, and an independent validation set, to estimate the model performance in a deployed environment (Maier and Dandy 2000; Twomey and Smith 1997). Usually, two-thirds of the data are suggested for model training and one-third for validation (Hammerstrom 1993). A modification of the above data division method is cross-validation (Stone 1974), in which the data are divided into three sets: training, testing and validation. The training set is used to adjust the connection weights, whereas the testing set is used to check the performance of the model at various stages of training and to determine when to stop training to avoid over-fitting. The validation set is used to estimate the performance of the trained network in the deployed environment.
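The three-way division used in cross-validation can be produced by shuffling the sample indices once and cutting them into consecutive blocks. The 60/20/20 proportions and the fixed random seed below are illustrative assumptions, not recommendations from the text (which quotes a two-thirds/one-third split for the two-subset case).

```python
import numpy as np

def split_train_test_validation(n, fractions=(0.6, 0.2, 0.2), seed=0):
    """Randomly partition the indices 0..n-1 into training, testing and
    validation subsets in the given proportions."""
    if not np.isclose(sum(fractions), 1.0):
        raise ValueError("fractions must sum to 1")
    idx = np.random.default_rng(seed).permutation(n)
    n_train = int(round(fractions[0] * n))
    n_test = int(round(fractions[1] * n))
    return idx[:n_train], idx[n_train:n_train + n_test], idx[n_train + n_test:]

train, test, val = split_train_test_validation(300)
```

A purely random split like this does not guarantee that the subsets share the same statistical properties, which is the concern raised in the following paragraphs.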

In many situations, the available data are so limited that all of them must be devoted to model training, and collecting more data for validation is difficult. In this situation, the leave-k-out method can be used (Masters 1993), which involves holding back a small fraction of the data for validation and using the rest of the data for training. After training, the performance of the trained network is estimated with the aid of the held-out validation set. A different small subset of data is then held back, and the network is trained and tested again. This process is repeated many times with different subsets until an optimal model can be obtained from the use of all of the available data. Recent studies have found that the way the data are divided can have a significant impact on the results obtained (Tokar and Johnson 1999). As ANNs have difficulty extrapolating beyond the range of the data used for calibration, in order to develop the best ANN model, given the available data, all of the patterns that are contained in the data need to be included in the calibration set. For example, if the available data contain extreme data points that were excluded from the calibration data set, the model cannot be expected to perform well, as the validation data will test the model's extrapolation ability, and not its interpolation ability. If all of the patterns that are contained in the available data are contained in the calibration set, the toughest evaluation of the generalization ability of the model is if all the patterns (and not just a subset) are also contained in the validation data. In addition, if cross-validation is used as the stopping criterion, the results obtained using the testing set have to be representative of those obtained using the training set, as the testing set is used to decide when to stop training or, for example, which model architecture or learning rate is optimal. Consequently, the statistical properties (e.g. mean and standard deviation) of the various data subsets (e.g. training, testing and validation) need to be similar to ensure that each subset represents the same statistical population (Masters 1993). If this is not the case, it may be difficult to judge the validity of ANN models (Maier and Dandy 2000). This fact has been recognized for some time (ASCE 2000; Maier and Dandy 2000; Masters 1993), and several studies have used ad-hoc methods to ensure that the data used for calibration and validation have the same statistical properties (Braddock et al. 1998; Campolo et al. 1999; Ray and Klindworth 2000; Tokar and Johnson 1999). Masters (1993) strongly supports the above strategy of data division, stating that "if our training set is not representative of the data on which the network will be tested, we will be wasting our time". However, it was not until a few years ago that systematic approaches for data division were proposed in the literature.
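The leave-k-out procedure described at the start of the previous paragraph can be sketched as a generator of training/hold-out index pairs. This version assumes one common reading of the method, in which consecutive blocks of k shuffled samples are held out in turn so that every sample is used for validation exactly once; the validation errors from the individual passes would then be averaged. The function name and parameters are hypothetical.

```python
import numpy as np

def leave_k_out_splits(n, k, seed=0):
    """Yield (train_idx, holdout_idx) pairs. Each pass holds back a different
    block of up to k shuffled samples for validation and trains on the rest,
    so every sample is held out exactly once."""
    idx = np.random.default_rng(seed).permutation(n)
    for start in range(0, n, k):
        holdout = idx[start:start + k]
        train = np.concatenate([idx[:start], idx[start + k:]])
        yield train, holdout

splits = list(leave_k_out_splits(n=10, k=3))  # hold-out blocks of sizes 3, 3, 3 and 1
```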
Bowden et al. (2002) used a genetic algorithm to minimize the difference between the means and standard deviations of the data in the training, testing and validation sets. While this approach ensures that the statistical properties of the various data subsets are similar, there is still a need to choose which proportion of the data to use for training, testing and validation. Kocjancic and Zupan (2000) and Bowden et al. (2002) used a self-organizing map (SOM) to cluster high-dimensional input and output data in two-dimensional space and divided the available data so that values from each cluster are represented in the various data subsets. This ensures that data in the different subsets are representative of each other and has the additional advantage that there is no need to decide what percentage of the data to use for training, testing and validation. The major shortcoming of this approach is that there are no guidelines for determining the optimum size and shape of the SOM (Cai et al. 1994; Giraudel and Lek 2001). This has the potential to have a significant impact on the results obtained, as the underlying assumption of the approach is that the data points in one cluster provide the same information in high-dimensional space. However, if the SOM is too small, there may be significant intra-cluster variation. Conversely, if the map is too large, too many clusters may contain single data points, making it difficult to choose representative subsets. To overcome the problem of determining the optimum size of clusters associated with using SOMs, Shahin et al. (2004b) have introduced a data division approach that utilizes a fuzzy clustering technique so that data division can be carried out in a systematic manner.
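The idea behind the approach of Bowden et al. (2002), choosing a division whose subsets have similar means and standard deviations, can be sketched with a simpler search strategy. The code below deliberately swaps the genetic algorithm for plain random search over many candidate partitions and keeps the one with the smallest total discrepancy. The discrepancy measure (sum of absolute differences in per-column mean and standard deviation relative to the full data set), the subset sizes and the synthetic skewed data are all assumptions.

```python
import numpy as np

def subset_discrepancy(data, subsets):
    """Total absolute difference between each subset's per-column mean and
    standard deviation and those of the full data set (0 = perfect match)."""
    mu, sd = data.mean(axis=0), data.std(axis=0)
    return float(sum(np.abs(data[s].mean(axis=0) - mu).sum()
                     + np.abs(data[s].std(axis=0) - sd).sum()
                     for s in subsets))

def most_representative_split(data, sizes, n_trials=500, seed=0):
    """Random-search stand-in for a genetic algorithm: draw many random
    partitions with the requested subset sizes and keep the one whose
    subsets best reproduce the statistics of the whole data set."""
    if sum(sizes) != len(data):
        raise ValueError("subset sizes must add up to the number of rows")
    rng = np.random.default_rng(seed)
    cuts = np.cumsum(sizes)[:-1]  # boundaries between consecutive subsets
    best, best_score = None, np.inf
    for _ in range(n_trials):
        subsets = np.split(rng.permutation(len(data)), cuts)
        score = subset_discrepancy(data, subsets)
        if score < best_score:
            best, best_score = subsets, score
    return best, best_score

rng = np.random.default_rng(2)
data = rng.lognormal(size=(200, 3))  # skewed columns: inputs and output side by side
(train, test, val), score = most_representative_split(data, sizes=(120, 40, 40))
```

As the text notes for the genetic-algorithm version, this still leaves the subset proportions (here 120/40/40) to be chosen by hand.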

Data Pre-processing

Once the available data have been divided into their subsets (i.e. training, testing and validation), it is important to pre-process the data into a suitable form before they are applied to the ANN. Data pre-processing is necessary to ensure all variables receive equal attention during the training process (Maier and Dandy 2000). Moreover, pre-processing usually speeds up the learning process. Pre-processing can be in the form of data scaling, normalization and transformation (Masters 1993). Scaling the output data is essential, as they have to be commensurate with the limits of the transfer functions used in the output layer (e.g. between -1.0 and 1.0 for the tanh transfer function and between 0.0 and 1.0 for the sigmoid transfer function). Scaling the input data is not necessary, but it is almost always recommended (Masters 1993). In some cases, the input data need to be normally distributed in order to obtain optimal results (Fortin et al. 1997). However, Burke and Ignizio (1992) stated that the probability distribution of the input data does not have to be known. Transforming the input data into some known forms (e.g. linear, log, exponential, etc.) may be helpful to improve ANN performance. However, empirical trials (Faraway and Chatfield 1998) showed that the model fits were the same, regardless of whether raw or transformed data were used.
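Output scaling of the kind just described is usually a linear min-max map. The sketch below assumes the scaling constants are taken from the training subset only and then reused unchanged for the testing and validation subsets, so that all subsets are transformed identically; the helper names are hypothetical.

```python
import numpy as np

def fit_minmax(train, lo=-1.0, hi=1.0):
    """Return scale/unscale functions that map each column linearly from the
    training data's [min, max] onto [lo, hi] and back."""
    dmin, dmax = train.min(axis=0), train.max(axis=0)
    span = np.where(dmax > dmin, dmax - dmin, 1.0)  # avoid division by zero for constant columns

    def scale(x):
        return lo + (hi - lo) * (x - dmin) / span

    def unscale(z):
        return dmin + (z - lo) * span / (hi - lo)

    return scale, unscale

y_train = np.array([[12.0], [30.0], [21.0]])
scale, unscale = fit_minmax(y_train, 0.0, 1.0)  # sigmoid output range
z = scale(y_train)                               # [[0.0], [1.0], [0.5]]
z_tanh = fit_minmax(y_train)[0](y_train)         # tanh output range: [[-1.0], [1.0], [0.0]]
```

Values outside the training range are mapped outside [lo, hi], which is another way of seeing why a network cannot be expected to extrapolate beyond its calibration data.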

Determination of Model Architecture

Determining the network architecture is one of the most important and difficult tasks in ANN model development. It requires the selection of the optimum number of layers and the number of nodes in each of these. There is no unified approach for determination of an optimal ANN architecture. It is generally achieved by fixing the number of layers and choosing the number of nodes in each layer. In any MLP, there are always two layers representing the input and output variables. It has been shown that one hidden layer is sufficient to approximate any continuous function provided that sufficient connection weights are given (Cybenko 1989; Hornik et al. 1989). Hecht-Nielsen (1989) provided a proof that a single hidden layer of neurons, using a sigmoidal activation function, is sufficient to model any solution surface of practical interest. To the contrary, Flood (1991) stated that there are many solution surfaces that are extremely difficult to model using a sigmoidal network with one hidden layer. In addition, some researchers (Flood and Kartam 1994; Ripley 1996; Sarle 1994) stated that the use of more than one hidden layer provides the flexibility needed to model complex functions in many situations. Lapedes and Farber (1988) provided a more practical proof that two hidden layers are sufficient, and according to Chester (1990), the first hidden layer is used to extract the local features of the input patterns while the second hidden layer is useful to extract the global features of the training patterns. However, Masters (1993) stated that using more than one hidden layer often slows the training process dramatically and increases the chance of getting trapped in local minima.

The number of nodes in the input and output layers is restricted by the number of model inputs and outputs, respectively. There is no direct and precise way of determining the best number of nodes in each hidden layer. It has been shown in the literature (Maren et al. 1990; Masters 1993; Rojas 1996) that neural networks with a large number of free parameters (connection weights) are more subject to overfitting and poor generalization. Consequently, keeping the number of hidden nodes to a minimum, provided that satisfactory performance is achieved, is always better, as it:

(a) reduces the computational time needed for training;
(b) helps the network achieve better generalization performance;
(c) helps avoid the problem of overfitting; and
(d) allows the trained network to be analyzed more easily.

For single hidden layer networks, there are a number of rules of thumb to obtain the best number of hidden layer nodes. One approach is to assume the number of hidden nodes to be 75% of the number of input units (Salchenberger et al. 1992). Another approach suggests that the number of hidden nodes should be between the average and the sum of the nodes in the input and output layers (Berke and Hajela 1991). A third approach is to fix an upper bound and work back from this bound. Hecht-Nielsen (1987) and Caudill (1988) suggested that the upper limit of the number of hidden nodes in a single layer network may be taken as (2I+1), where I is the number of inputs. The best approach found by Nawari et al. (1999) was to start with a small number of nodes and to slightly increase the number until no significant improvement in model performance is achieved. Yu (1992) showed that the error surface of a network with one hidden layer and (I–1) hidden nodes has no local minima. For networks with two hidden layers, the geometric pyramid rule described by Nawari et al. (1999) can be used. The notion behind this method is that the number of nodes in each layer follows a geometric progression of a pyramid shape, in which the number of nodes decreases from the input layer towards the output layer. Kudrycki (1988) found empirically that the optimum ratio of the first to second hidden layer nodes is 3:1, even for high dimensional inputs.
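The rules of thumb above are simple enough to compute directly. The helpers below evaluate them for I inputs and O outputs. Rounding to whole nodes is an assumption, and the two-hidden-layer function uses the geometric pyramid rule in the form usually attributed to Masters (1993), r = (I/O)^(1/3), which may differ in detail from the version used by Nawari et al. (1999). The results are starting points for a search, not answers.

```python
import math

def single_layer_rules(n_in, n_out):
    """Hidden-node counts for one hidden layer suggested by the rules of
    thumb quoted above (I = n_in inputs, O = n_out outputs)."""
    return {
        "salchenberger_75pct": round(0.75 * n_in),                           # 75% of the inputs
        "berke_hajela_range": (math.ceil((n_in + n_out) / 2), n_in + n_out),  # average .. sum
        "hecht_nielsen_upper": 2 * n_in + 1,                                  # upper bound 2I+1
    }

def geometric_pyramid(n_in, n_out):
    """Two hidden layers shrinking geometrically from I inputs to O outputs,
    assuming the commonly cited form r = (I/O)**(1/3), h1 = O*r**2, h2 = O*r."""
    r = (n_in / n_out) ** (1 / 3)
    return round(n_out * r ** 2), round(n_out * r)

rules = single_layer_rules(8, 1)
h1, h2 = geometric_pyramid(27, 1)  # 27 inputs, 1 output -> (9, 3)
```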

Another way of determining the optimal number of hidden nodes that can result in good model generalization and avoid overfitting is to relate the number of hidden nodes to the number of available training samples (Maier and Dandy 2000). Masters (1993) stated "the only way to prevent the network from learning unique characteristics of the training set, to the detriment of learning universal characteristics, is to flood it with so many examples that it cannot possibly learn all of their idiosyncrasies". There are a number of rules of thumb that have been suggested in the literature to relate the training samples to the number of connection weights. For instance, Rogers and Dowla (1994) suggested that the number of weights should not exceed the number of training samples. Masters (1993) stated that the required minimum ratio of the number of training samples to the number of connection weights should be 2, and that the minimum ratio of the optimum training sample size to the number of connection weights should be 4. Hush and Horne (1993) suggested that this ratio should be 10. Amari et al. (1997) demonstrated that if this ratio is at least 30, overfitting does not occur.
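To apply these sample-to-weight ratios, the number of connection weights has to be counted first. The sketch below assumes a fully connected feed-forward network in which every hidden and output node also has a bias weight (counting biases is an assumption; some authors count only connections between nodes), and checks a given training-set size against the ratios quoted above.

```python
def count_weights(layer_sizes):
    """Connection weights in a fully connected feed-forward network with
    layer sizes [inputs, hidden..., outputs], counting one bias weight per
    hidden and output node."""
    return sum((n_prev + 1) * n_next
               for n_prev, n_next in zip(layer_sizes[:-1], layer_sizes[1:]))

# Minimum ratios of training samples to weights quoted in the text.
MIN_RATIOS = {
    "Rogers and Dowla (1994)": 1,
    "Masters (1993), minimum": 2,
    "Masters (1993), optimum": 4,
    "Hush and Horne (1993)": 10,
    "Amari et al. (1997)": 30,
}

def check_sample_size(n_samples, layer_sizes):
    """Return the samples-per-weight ratio and which rules of thumb it meets."""
    ratio = n_samples / count_weights(layer_sizes)
    return ratio, {rule: ratio >= r for rule, r in MIN_RATIOS.items()}

# 8 inputs, 5 hidden nodes, 1 output: (8+1)*5 + (5+1)*1 = 51 weights.
ratio, verdicts = check_sample_size(500, [8, 5, 1])
```

With 500 samples the ratio is about 9.8, which satisfies the first three rules but not those of Hush and Horne or Amari et al.; shrinking the hidden layer is one way to close the gap.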

A number of systematic approaches have also been proposed to obtain the optimal network architecture automatically. The adaptive method of architecture determination, suggested by Ghaboussi and Sidarta (1998), is an example of the automatic methods for obtaining the optimal network architecture; it starts with an arbitrary, but small, number of nodes in the hidden layers. During training, and as the network approaches its capacity, new nodes are added to the hidden layers, and new connection weights are generated. Training is continued immediately after the new hidden nodes are added to allow the new connection weights to acquire the portion of the knowledge base which was not stored in the old connection weights. For this process to be achieved, some training is carried out with the new modified connection weights only, while the old connection weights are frozen. Additional cycles of training are then carried out where all the connection weights are allowed to change. The above steps are repeated and new hidden nodes are added as needed until the end of the training process, at which point the appropriate network architecture has been determined automatically. Kingston et al. (2008) showed that Bayesian approaches can be used to determine the optimal number of hidden nodes by using Bayes' factors in conjunction with an examination of the correlation structure between connection weights. Pruning is another automatic approach to determine the optimal number of hidden nodes. One such technique, proposed by Karnin (1990), starts by training a network that is relatively large and later reduces the size of the network by removing the unnecessary hidden nodes. Genetic algorithms provide evolutionary alternatives for obtaining an optimal neural network architecture and have been used successfully in many situations (Miller et al. 1989). The adaptive spline modeling of observation data (ASMOD) algorithm (Kavli 1993) is an automatic method for obtaining the optimal architecture of B-spline neurofuzzy networks, as mentioned previously. Cascade-Correlation (Fahlman and Lebiere 1990) is another automatic method to obtain the optimal architecture of ANNs. Cascade-Correlation is a constructive method that can be characterized by the following steps (Fahlman and Lebiere 1990). The neural network is initially trained using Fahlman's quickprop (Fahlman 1988) algorithm without hidden nodes and with direct connections between the input layer and the output layer. Hidden nodes are added randomly one or a few at a time. New hidden nodes receive connections from all previously established hidden nodes as well as from the original inputs. At the time new hidden nodes are added to the network, their connections with the inputs are frozen and only their output connections are trained using the quickprop algorithm. This process is stopped when the model performance shows no further improvement. Consequently, in a network built with Cascade-Correlation, the input nodes are connected to the output nodes, and the hidden nodes are connected to the input and output nodes as well as to other previously established hidden nodes. The constructive nature of the Cascade-Correlation method means that the way in which the hidden nodes are connected results in the addition of a new single-node layer to the network each time a new node is added. This is designed to result in the smallest network that can adequately map the desired input-output relationship, which has a number of advantages, including improved generalization ability (Castellano et al. 1997) and higher processing speed (Bebis and Georgiopoulos 1994). It should be noted that Masters (1993) has argued that the automatic approaches for obtaining optimal network architectures can be easily abused, as they do not directly address the problem of overfitting.
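Several of the approaches above share one constructive idea: start with a small network, add hidden nodes, and stop once performance no longer improves (this is also the approach favored by Nawari et al. 1999). The loop below sketches only that outer search; it is not Cascade-Correlation or the method of Ghaboussi and Sidarta. It assumes a caller-supplied, hypothetical function `evaluate(h)` that trains a network with h hidden nodes and returns its error on held-out data. In keeping with Masters' warning, scoring on held-out rather than training data is what gives the stopping rule some protection against overfitting. The toy error curve in the example is invented for illustration.

```python
from typing import Callable

def grow_hidden_layer(evaluate: Callable[[int], float], start: int = 1,
                      max_nodes: int = 50, min_gain: float = 0.01) -> int:
    """Constructive search for the hidden-layer size: begin small and add one
    node at a time until the held-out error stops improving by at least the
    relative amount min_gain. evaluate(h) is assumed to train a network with
    h hidden nodes and return its held-out error."""
    best_h, best_err = start, evaluate(start)
    for h in range(start + 1, max_nodes + 1):
        err = evaluate(h)
        if err > best_err * (1.0 - min_gain):
            break  # no significant improvement: keep the smaller network
        best_h, best_err = h, err
    return best_h

def toy_error(h: int) -> float:
    """Invented error curve: falls as capacity grows, then rises (overfitting)."""
    return 1.0 / h + 0.02 * h

best = grow_hidden_layer(toy_error)  # stops at 7 hidden nodes for this curve
```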

Model Optimization (Training)

As mentioned previously, the process of optimizing the connection weights is known as “training” or “learning”. This is equivalent to the parameter estimation phase in conventional statistical models (Maier and Dandy 2000). The aim is to find a global solution to what is typically a highly non-linear optimization problem (White 1989). The method most commonly used for finding the optimum weight combination of feed-forward MLP neural networks is the back-propagation algorithm (Rumelhart et al. 1986), which is based on first-order gradient descent. The use of global optimization methods, such as simulated annealing and genetic algorithms, has also been proposed (Hassoun 1995). The advantage of these methods is that they can escape local minima in the error surface and thus produce optimal or near-optimal solutions. However, they also have a slow convergence rate. Ultimately, the model performance criteria, which are problem specific, will dictate which training algorithm is most appropriate. If training speed is not a major concern, there is no reason why the back-propagation algorithm cannot be used successfully (Breiman 1994). On the other hand, as mentioned previously, the weights of B-spline neurofuzzy networks are generally updated using the Least Mean Squared or Normalized Least Mean Squared learning rules (Brown and Harris 1994).
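For a single linear neuron, the Least Mean Squared (Widrow-Hoff) rule changes each weight by the step size times the output error times that weight's input; the Normalized variant divides the step by the squared length of the input vector. The sketch below illustrates both rules on a plain linear neuron with a bias input. It is a generic illustration under stated assumptions (the step sizes `mu`, the epoch count, the helper names and the toy data are arbitrary choices), not Brown and Harris's B-spline neurofuzzy formulation, where the rule would typically be applied to the vector of B-spline basis-function outputs rather than to the raw inputs.

```python
# Sketch only: generic (N)LMS on a linear neuron; names and data are hypothetical.
def lms_step(w, x, target, mu, normalized=False, eps=1e-8):
    """One Least Mean Squared (or Normalized LMS) update for y = w . x.

    Returns the updated weights and the error measured before the update.
    """
    y = sum(wi * xi for wi, xi in zip(w, x))
    error = target - y
    step = mu
    if normalized:
        # NLMS divides the step by the input energy |x|^2, so the size of
        # the correction does not depend on the scale of the inputs
        step = mu / (eps + sum(xi * xi for xi in x))
    return [wi + step * error * xi for wi, xi in zip(w, x)], error


def train_linear(samples, n_weights, mu, epochs=500, normalized=False):
    """Present every (x, target) pair once per epoch and update after each."""
    w = [0.0] * n_weights
    for _ in range(epochs):
        for x, target in samples:
            w, _ = lms_step(w, x, target, mu, normalized)
    return w


# hypothetical noise-free data from y = 2 + 3u; the leading 1.0 is a bias input
samples = [([1.0, u], 2.0 + 3.0 * u) for u in (0.0, 0.5, 1.0, 1.5, 2.0)]
w_lms = train_linear(samples, 2, mu=0.05)
w_nlms = train_linear(samples, 2, mu=0.5, normalized=True)
```

Both runs recover weights close to (2, 3). The normalized rule tolerates a much larger `mu` because its step is rescaled sample by sample.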

Stopping Criteria

Stopping criteria are used to decide when to stop the training process. They determine whether the model has been optimally or sub-optimally trained (Maier and Dandy 2000). Many approaches can be used to determine when to stop training. Training can be stopped: after the presentation of a fixed number of training records; when the training error reaches a sufficiently small value; or when no or only slight changes in the training error occur. However, these stopping criteria may cause the model to stop prematurely or to over-train. As mentioned previously, the cross-validation technique (Stone 1974) is an approach that can be used to overcome such problems. It is considered to be the most valuable tool to ensure that overfitting does not occur (Smith 1993). Amari et al. (1997) suggested that there are clear benefits in using cross-validation when limited data are available, as is the case for many real-life case studies. The benefits of cross-validation are discussed further in Hassoun (1995). As mentioned previously, the cross-validation technique requires that the data be divided into three sets: training, testing and validation. The training set is used to adjust the connection weights. The testing set measures the ability of the model to generalize, and the performance of the model on this set is checked at many stages of the training process. Training is stopped when the error on the testing set starts to increase. The testing set is also used to determine the optimum number of hidden layer nodes and the optimum values of the internal parameters (learning rate, momentum term and initial weights). The validation set is used to assess model performance once training has been accomplished. A number of other stopping criteria (e.g. Bayesian Information Criterion, Akaike's Information Criterion and Final Prediction Error) can also be used, as mentioned previously. Unlike cross-validation, these stopping criteria require the data to be divided into only two sets: a training set, to construct the model, and an independent validation set, to test the validity of the model in the deployed environment. The basic notion of these stopping criteria is that model performance should balance model complexity against the amount of training data and model error.
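The trade-off these criteria encode can be shown numerically. A minimal sketch, assuming independent Gaussian errors, in which case the likelihood terms reduce to n·ln(SSE/n) up to additive constants that do not change a ranking; `k` counts the free parameters (for a network, its weights and biases), and the two networks in the usage lines are hypothetical.

```python
import math

# Sketch only: Gaussian-error forms of AIC, BIC and Akaike's FPE.
def information_criteria(sse, n, k):
    """Return (AIC, BIC, FPE) for a model fitted by least squares.

    sse -- sum of squared errors on the training data
    n   -- number of training samples
    k   -- number of free parameters (e.g. connection weights and biases)
    Lower values indicate a better balance of model error and complexity.
    """
    if not 0 < k < n:
        raise ValueError("need 0 < k < n")
    mse = sse / n
    aic = n * math.log(mse) + 2 * k            # penalty of 2 per parameter
    bic = n * math.log(mse) + k * math.log(n)  # penalty grows with ln(n)
    fpe = mse * (n + k) / (n - k)              # Akaike's Final Prediction Error
    return aic, bic, fpe


# two hypothetical networks trained on the same 200 samples
small = information_criteria(sse=12.0, n=200, k=10)
large = information_criteria(sse=10.5, n=200, k=40)
```

Although the larger network fits the training data better, all three criteria favour the smaller one here, because the drop in error does not pay for the 30 extra parameters.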

Model Validation

Once the training phase of the model has been successfully accomplished, the performance of the trained model should be validated. The purpose of the model validation phase is to ensure that the model has the ability to generalize within the limits set by the training data in a robust fashion, rather than simply having memorized the input-output relationships that are contained in the training data. The approach generally adopted in the literature to achieve this is to test the performance of trained ANNs on an independent validation set, which has not been used as part of the model building process. If such performance is adequate, the model is deemed to be able to generalize and is considered to be robust.

The coefficient of correlation, r, the root mean squared error, RMSE, and the mean absolute error, MAE, are the main criteria often used to evaluate the prediction performance of ANN models. The coefficient of correlation is a measure used to determine the relative correlation and the goodness-of-fit between the predicted and observed data. Smith (1986) suggested the following guide for values of |r| between 0.0 and 1.0:

|r| ≥ 0.8: strong correlation exists between the two sets of variables;
0.2 < |r| < 0.8: correlation exists between the two sets of variables; and
|r| ≤ 0.2: weak correlation exists between the two sets of variables.
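The coefficient of correlation and Smith's bands can be computed directly. A minimal sketch in plain Python; the helper names are hypothetical, and the band labels are taken from the guide above.

```python
import math

# Sketch only: Pearson r plus the three bands quoted from Smith (1986).
def pearson_r(observed, predicted):
    """Coefficient of correlation r between observed and predicted values."""
    n = len(observed)
    if n != len(predicted) or n < 2:
        raise ValueError("need two series of equal length (at least 2 values)")
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    ss_o = sum((o - mean_o) ** 2 for o in observed)
    ss_p = sum((p - mean_p) ** 2 for p in predicted)
    if ss_o == 0 or ss_p == 0:
        raise ValueError("r is undefined for a constant series")
    return cov / math.sqrt(ss_o * ss_p)


def smith_category(r):
    """Label |r| with the three bands of Smith's (1986) guide."""
    a = abs(r)
    if a >= 0.8:
        return "strong correlation"
    if a > 0.2:
        return "correlation exists"
    return "weak correlation"
```

Because the bands are defined on |r|, a strongly negative correlation is classified the same way as a strongly positive one.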

The RMSE is the most popular measure of error and has the advantage that large errors receive much greater attention than small errors (Hecht-Nielsen 1990). In contrast with RMSE, MAE eliminates the emphasis given to large errors. Both RMSE and MAE are desirable when the evaluated output data are smooth or continuous (Twomey and Smith 1997). Kingston et al. (2005b) stated that if “ANNs are to become more widely accepted and reach their full potential…, they should not only provide a good fit to the calibration and validation data, but the predictions should also be plausible in terms of the relationship modeled and robust under a wide range of conditions.” and that “while ANNs validated against error alone may produce accurate predictions for situations similar to those contained in the training data, they may not be robust under different conditions unless the relationship by which the data were generated has been adequately estimated”.
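The different emphasis of RMSE and MAE shows up clearly on two sets of predictions with the same total absolute error. A minimal sketch; the helper names and the numbers are invented for illustration.

```python
import math

# Sketch only: RMSE and MAE on invented predictions.
def rmse(observed, predicted):
    errors = [o - p for o, p in zip(observed, predicted)]
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def mae(observed, predicted):
    errors = [o - p for o, p in zip(observed, predicted)]
    return sum(abs(e) for e in errors) / len(errors)


observed = [10.0, 10.0, 10.0, 10.0]
spread_out = [9.0, 11.0, 9.0, 11.0]     # four errors of size 1
one_outlier = [10.0, 10.0, 10.0, 14.0]  # a single error of size 4

print(mae(observed, spread_out), rmse(observed, spread_out))    # 1.0 1.0
print(mae(observed, one_outlier), rmse(observed, one_outlier))  # 1.0 2.0
```

MAE cannot distinguish the two cases, while RMSE doubles when the same total error is concentrated in one large miss.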

References:

Hajek, M. Neural Networks.

Sy, N. L. (2006) Modelling the infiltration process with a multi-layer perceptron artificial neural network. Hydrological Sciences Journal 51(1), 3–20.

Reddy, K. B., Rao, B. V. & Sarala, C. Hydrology and Watershed Management: Ecosystem Resilience – Rural and Urban.

Sydenham, P. H. & Thorn, R. (2005) Handbook of Measuring System Design. John Wiley & Sons, Ltd. ISBN: 0-470-02143-8.

Maier, H. R. & Dandy, G. C. (2000) Neural networks for the prediction and forecasting of water resources variables: a review of modelling issues and applications. Environmental Modelling & Software 15, 101–124.

Shahin, M. A., Jaksa, M. B. & Maier, H. R. State of the art of artificial neural networks in geotechnical engineering. Electronic Journal of Geotechnical Engineering 8, 1–26.

CHAPTER 2

Modelling the infiltration process with a multi-layer perceptron artificial neural network
NESTOR L. SY
Faculty of Civil Engineering & Geosciences, Delft University of Technology, NL-2600 GA Delft,
The Netherlands
n.sy@ct.tudelft.nl

Abstract: Infiltration is a significant process which controls the fate of water in a catchment. Over the years, many infiltration models have been developed which are either physically based, conceptual or empirical. The literature shows that a model's applicability will always be limited to its context (such as location, availability of data, etc.). Artificial neural networks (ANNs) have recently been used with success for a variety of nonlinear hydrological processes. In this study, the ANN multilayer perceptron was employed to model infiltration using data derived from plot-scale rainfall simulator experiments conducted in Cebu, the Philippines. Training parameters such as the stopping criteria and the type of transfer function affected the efficiency and the generalization capability of the ANN. The ANN resulted in a satisfactory network with an average R2 of 0.9110. The network performance was also found to be related to the input, as the accuracy of the ANN model was higher for soil types with higher proportions in the training data. The time distribution of infiltration showed that the model was unable to estimate the first few minutes of the process, but improved significantly in the later time intervals. Sensitivity analysis showed that soil moisture and hydraulic conductivity are the influencing factors in modelling infiltration using ANN. When compared with the traditional Philip and Green-Ampt models, ANNs provided the highest accuracy in terms of cumulative infiltration.

Key words: artificial neural network; infiltration; multilayer perceptron; rainfall simulator

INTRODUCTION

Horton’s (1933) classic infiltration theory of surface runoff has been the dominant concept in hillslope hydrology for several decades. It assumes that the (sole) source of storm runoff is the excess water which is unable to infiltrate into the soil. Later, the partial area concept showed that this occurrence may be localized in certain contexts (Anderson & Burt, 1990); nevertheless, the evaluation of runoff is largely affected by the mechanisms of infiltration. Understanding how infiltration contributes to runoff remains an important research area and has implications in many other aspects of the hydrological cycle, including erosion, contaminant transport and agricultural water use.

Throughout the past century, several infiltration estimation equations have been developed. Rawls et al. (1993) categorized these models as:

(a) physically-based;
(b) approximate theory-based; and
(c) empirical.

The physics-based approach of Richards (1931) combines Darcy’s Law and the law of conservation of mass to derive the equation for unsaturated flow. This is expressed as:

$$\frac{\partial \theta}{\partial t} = \frac{\partial}{\partial z}\left[D(\theta)\,\frac{\partial \theta}{\partial z}\right] + \frac{\partial K(\theta)}{\partial z} \qquad (1)$$

where θ is the volumetric water content, K is the unsaturated hydraulic conductivity, D is the soil water diffusivity, t is time and z is the vertical coordinate, taken as positive upward. Solving the Richards equation involves detailed data input, specific boundary conditions and numerical solutions, which in most cases are very theoretical. Approximate models (e.g. Green & Ampt, 1911 and Philip, 1957) and empirical models (e.g. Kostiakov, 1932; Horton, 1940) usually have drawbacks in the estimation of their parameters and are limited in the context of how, or whence, the data were derived. Table 1 shows a summary of these equations as well as a description of the model parameters. Many of these models have been applied to varied contexts with mixed results; it is difficult to assess which model performs better under which conditions (Mishra et al., 2003).

Artificial neural networks (ANN) have traditionally been used to mimic tasks performed by the human brain. Applications have expanded over the years in such disciplines as computer science and robotics, statistics, engineering, physics, medicine, biology and psychology, to name a few. In these fields, ANNs are beginning to be the favoured option over other methods because they are highly nonlinear, universal approximators. The main advantage of ANNs is their ability to model nonlinear processes of a system without any a priori assumptions about the nature of the generating processes. The ANN paradigm is inspired by the way the densely interconnected, parallel structure of the mammalian brain processes information.

Table 1. Approximate theory-based and empirical infiltration equations.

Green-Ampt (1911): $f = K\left(1 + \eta S_c / F\right)$
    f is the infiltration rate, K is the saturated hydraulic conductivity, Sc is the suction at the wetting front, F is the cumulative infiltration depth, and η is the porosity.

Philip (1957): $f = \tfrac{1}{2}\, s\, t^{-1/2} + A$
    s is a parameter called sorptivity and A is a parameter depending upon soil properties.

Kostiakov (1932): $f = a b\, t^{b-1}$
    f is the infiltration capacity, t is the time after infiltration starts, and a > 0 and 0 < b < 1 are parameters which depend on the soil and initial conditions. The parameters a and b are determined using observed infiltration data.

Horton (1940): $f = f_c + (f_o - f_c)\, e^{-kt}$
    f is the instantaneous infiltration rate, fc is the final (constant) infiltration rate, fo is the initial infiltration rate, k is an empirical soil parameter (constant for a given soil), and t is the infiltration time.
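The four equations in Table 1 translate directly into functions. A minimal sketch with hypothetical function names: the functions simply evaluate the formulas and do not check units, so rates come out in whatever units the parameters use. ηSc is treated as one lumped parameter, as it is reported in Tables 6 to 8, and the Green-Ampt rate is a function of cumulative infiltration F rather than of time.

```python
import math

# Sketch only: direct evaluation of the Table 1 formulas, no unit handling.
def green_ampt(F, K, eta_sc):
    """Green-Ampt: f = K (1 + eta*Sc / F), with eta*Sc lumped as one parameter."""
    if F <= 0:
        raise ValueError("cumulative infiltration F must be positive")
    return K * (1.0 + eta_sc / F)


def philip(t, s, A):
    """Philip: f = (1/2) s t^(-1/2) + A, for t > 0."""
    return 0.5 * s / math.sqrt(t) + A


def kostiakov(t, a, b):
    """Kostiakov: f = a b t^(b - 1), with a > 0 and 0 < b < 1."""
    return a * b * t ** (b - 1.0)


def horton(t, fc, fo, k):
    """Horton: f = fc + (fo - fc) exp(-k t)."""
    return fc + (fo - fc) * math.exp(-k * t)
```

Horton's rate starts at fo and decays towards fc, whereas the Philip and Kostiakov rates become unbounded as t approaches zero, one reason these equations describe the first minutes of an event poorly.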

Fig. 1 Typical feedforward architecture.

The ANN is a collection of mathematical models which draws on the analogies of adaptive biological learning. It is composed of a number of interconnected processing elements that are similar to neurons and are joined together with weighted connections that are analogous to synapses. Learning occurs through training, or exposure to a true set of input/output data, during which the training algorithm iteratively adjusts the connection weights. These connection weights store the knowledge needed to solve specific problems. Feedforward networks are the most popular ANN structures in use today. A feedforward network is composed of a hierarchy of neurons, organized as a series of layers. The first layer (input layer) acts as a space for the inputs fed to the network. The last layer (output layer) is where the overall mapping of the network input is made available. Between the input and output layers may lie one or more hidden layers, where further remapping or computation takes place. The units in each layer are connected to the units in the next layer by weighted links. Figure 1 illustrates a typical feedforward network. Multilayer perceptrons (MLPs) and radial basis function (RBF) networks are the two most commonly used types of feedforward artificial neural network.
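The layered mapping described above amounts to a weighted sum followed by a transfer function at each hidden neuron. A minimal sketch, assuming one hidden layer with hyperbolic tangent transfer functions and a single linear output neuron (one common choice, not necessarily the configuration used later in this chapter); the function name and the weights in the usage line are hypothetical.

```python
import math

# Sketch only: one tanh hidden layer, one linear output, hypothetical weights.
def mlp_forward(x, w_hidden, b_hidden, w_out, b_out):
    """Forward pass of a three-layer (one hidden layer) feedforward network.

    w_hidden[j][i] -- weight on the link from input i to hidden neuron j
    b_hidden[j]    -- bias of hidden neuron j
    w_out[j]       -- weight on the link from hidden neuron j to the output
    Returns the network output and the hidden-layer activations.
    """
    hidden = [math.tanh(b + sum(w * xi for w, xi in zip(row, x)))
              for row, b in zip(w_hidden, b_hidden)]
    output = b_out + sum(w * h for w, h in zip(w_out, hidden))
    return output, hidden


# one input, two hidden neurons, one output (hypothetical weights)
y, h = mlp_forward([0.5], [[2.0], [-2.0]], [0.0, 0.0], [1.0, -1.0], 0.0)
```

Here the two hidden activations are tanh(1) and tanh(-1), so the output is 2·tanh(1).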

The teaching process of the feedforward network begins by propagating all the input values through the network and determining all the unit outputs. The modelled (simulated) output and the desired output/response are then compared, and their difference is taken as the error. This error is back-propagated (starting from the output layer) to the previous layer(s), where it is usually scaled by the derivative of the transfer function, and the weights are adjusted using the delta rule. This process proceeds backwards, layer by layer, until the input layer is reached. The ASCE Task Committee on Application of Neural Networks in Hydrology (2000a,b), Maier & Dandy (1998, 2000) and Dawson & Wilby (2001) give comprehensive overviews of issues, applications and perspectives of ANNs in the field of water resources, hydrology and the environment. One of the major criticisms of ANNs is their black-box label, since they do not provide any explanation of the underlying processes. Maier & Dandy (2000) suggest that the determination of adequate model inputs, suitable selection of the neural model and its internal parameters, parameter estimation and model validation are vital in improving the performance of the ANN. A good grasp of the ANN limitations, as well as knowledge of the different tools for handling ANNs, is helpful in making practical and reliable ANN models. This understanding is important because a neural network may perform with reasonable precision for all the samples that contribute to the training process, but this does not necessarily mean it will always perform satisfactorily on samples outside the training set.
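The forward and backward passes described above can be sketched for a network with one hidden layer. A minimal sketch, assuming tanh hidden neurons, a linear output neuron, online (per-sample) updates, a fixed learning rate without momentum and small random initial weights; the function name, toy data set, learning rate and epoch count are illustrative choices, not values from the studies cited.

```python
import math
import random

# Sketch only: online back-propagation with the delta rule; settings are illustrative.
def train_mlp(samples, n_hidden=3, lr=0.05, epochs=2000, seed=0):
    """Online back-propagation for a one-hidden-layer MLP.

    Hidden neurons use tanh; the single output neuron is linear.
    Returns the trained weights and the mean squared error of each epoch.
    """
    rng = random.Random(seed)
    n_in = len(samples[0][0])
    w_h = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    b_h = [0.0] * n_hidden
    w_o = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    b_o = 0.0
    history = []
    for _ in range(epochs):
        sse = 0.0
        for x, target in samples:
            # forward pass: propagate the inputs and record every unit output
            h = [math.tanh(b + sum(w * xi for w, xi in zip(row, x)))
                 for row, b in zip(w_h, b_h)]
            y = b_o + sum(w * hj for w, hj in zip(w_o, h))
            err = target - y  # desired minus modelled output
            sse += err * err
            # backward pass: the output delta is err (linear unit); each hidden
            # delta is err passed back through w_o and scaled by tanh' = 1 - h^2
            delta_h = [err * w_o[j] * (1.0 - h[j] ** 2) for j in range(n_hidden)]
            # delta rule: change = learning rate * delta * input to that link
            for j in range(n_hidden):
                w_o[j] += lr * err * h[j]
                b_h[j] += lr * delta_h[j]
                w_h[j] = [w + lr * delta_h[j] * xi for w, xi in zip(w_h[j], x)]
            b_o += lr * err
        history.append(sse / len(samples))
    return (w_h, b_h, w_o, b_o), history


# hypothetical toy problem: learn f(u) = u^2 on [-1, 1]
data = [([u], u * u) for u in (i / 5 - 1 for i in range(11))]
weights, mse_per_epoch = train_mlp(data)
```

The hidden deltas are computed before the output weights are updated, so every delta uses the weights that produced the current error.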

Most ANN researchers in the fields of hydrology, water resources and environment generally use the MLP, followed by the RBF. Neural networks, using both MLP and RBF, have been applied in many areas such as rainfall–runoff, hydrological processes, flood forecasting, drought analysis, remote sensing, atmospheric studies, soil moisture and environmental monitoring (Fernando & Jayawardena, 1998; Islam & Kothari, 2000; Shin & Salas, 2000; Tokar & Momcilo, 2000; Bandibas & Kohyama, 2001; Abdul-Wahab & Al-Alawi, 2002; Cameron et al., 2002; Sudheer et al., 2002; Sudheer & Jain, 2003; Suen & Eheart, 2003; Trajkovic et al., 2003; Zhang & Govindaraju, 2003; Anctil et al., 2004; Cigizoglu, 2004; Rajurkar et al., 2004; Sy, 2004).

In this study, the multilayer perceptron ANN was used to model the infiltration process. The networks were trained using physically measurable data from rainfall simulator experiments. The effects of some internal parameters were considered, and the generalization capability of the resulting model was tested using production sets. The sensitivity of the network was evaluated to determine which input parameters affected the output. In order to evaluate the performance of the ANN, traditional infiltration models were also fitted to the same data and compared with the resulting neural networks.

MATERIALS AND METHODS

Study area and data

The study was conducted in Cebu, the Philippines. This island is approximately 240 km long from north to south, with an average width of 25 km. Cebu City, located in the centre of the island, is the third largest city in the Philippines.

Field experiments were conducted at the Kotkot and Lusaran watersheds over the period 2001–2003 using a drop-former rainfall simulator (Fig. 2) with intensities ranging from 72.21 to 161.55 mm h-1. The Kotkot-Lusaran catchments have a combined area of approximately 15 163 ha. The parent material of the soils is Carcar limestone. Degraded areas have very little organic matter, and soil depth is usually between 1.0 and 2.5 m. Upper portions of grassland areas have shallow soils, while other areas are rocky. The Kotkot-Lusaran area has rolling to rugged mountainous terrain. The simulator produced raindrops with an average size of 3.7 mm. From a height of 1.5 m above the ground, the drops reached an average velocity of 7.8 m s-1, corresponding to approx. 82% of the kinetic energy of natural rainfall. The simulator was designed to suit tropical field conditions such as sloping terrain and high-intensity rainfall. Test plots of one square metre were chosen. A vertical trench was dug at the downstream end of each test plot and a trough was positioned to catch the runoff water. The experiments usually lasted between one and three hours, and runoff was recorded every 5 min. The rainfall simulator experiments and the procedures used are described in detail by Sy & Calo (2001). The experiments were carried out on slopes ranging from 10 to 45°, a range that covers approximately 75% of the entire Kotkot-Lusaran watershed.

Fig. 2 Schematic representation of rainfall-simulator experiment.

The input variables taken from the rainfall simulator experiments include: soil type (in terms of %sand and %clay), hydraulic conductivity (K), slope (S), bulk density (BD) and soil moisture (SM). The output variable is the infiltration. Infiltration f (mm h-1) is calculated using a simple water balance equation:

$$f = i - q \qquad (2)$$

where i is the intensity of the rainfall simulator and q is the runoff, both in mm h-1.

Because of the short duration of the experiments, evaporation was assumed to be negligible. Particle-size properties of the soil were ascertained from the size distribution of individual particles in a soil sample using the Bouyoucos hydrometer method. The results were expressed as three particle-size fractions, %sand, %silt and %clay, and classified using the US Department of Agriculture (USDA) classification scheme.
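Equation (2) is applied interval by interval. A minimal sketch with hypothetical helper names, assuming that runoff is read as a volume (in litres) collected in the trough over each 5-min interval on the 1 m² plot, and that evaporation is neglected as stated above; the run in the usage lines is invented.

```python
# Sketch only: assumes runoff volumes in litres per interval on a 1 m^2 plot.
def infiltration_rates(intensity_mm_h, runoff_litres, plot_area_m2=1.0,
                       interval_min=5.0):
    """Infiltration rate f = i - q (Eq. 2) for each recording interval.

    runoff_litres -- runoff volume caught in the trough in each interval
    One litre spread over one square metre is a depth of 1 mm.
    """
    intervals_per_hour = 60.0 / interval_min
    rates = []
    for volume in runoff_litres:
        q = volume / plot_area_m2 * intervals_per_hour  # runoff rate, mm/h
        rates.append(intensity_mm_h - q)
    return rates


def cumulative_infiltration(rates_mm_h, interval_min=5.0):
    """Cumulative infiltration depth (mm) at the end of each interval."""
    total, depths = 0.0, []
    for f in rates_mm_h:
        total += f * interval_min / 60.0
        depths.append(total)
    return depths


# hypothetical run: 120 mm/h on a 1 m^2 plot, runoff read every 5 min
f = infiltration_rates(120.0, [0.0, 2.0, 5.0])
F = cumulative_infiltration(f)
```

With these numbers the rates are 120, 96 and 60 mm/h, and the cumulative depths are 10, 18 and 23 mm.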

Hydraulic conductivity was measured using the auger hole method because of its simplicity and ease of application in the field. The rate of filling of the borehole provides an indication of the soil’s hydraulic conductivity. Bulk density was determined using a special sampler designed to drive and remove a cylindrical core using a drop hammer. The weight of the soil core is then determined after oven drying (Klute, 1986), and bulk density is obtained as the oven-dry mass divided by the core volume. Soil moisture was determined using a Theta Probe, which measures the volumetric soil moisture content by responding to changes in the apparent dielectric constant. These changes are converted into a DC voltage and related to soil moisture content.

Data set

The 80 samples/experiments were split into two sets: 56 samples were used in the ANN training/testing and 24 for production. The training/testing set was further split into three subsets: 60% training, 20% cross-validation and 20% testing. Each experiment usually took 90–125 min. Each experiment result was then broken down into 5-min intervals, and these 5-min interval samples were used as the training/testing values for the artificial neural network. The production set (sometimes called the validation set) is a secondary set of data which has not been analysed by the network. The idea is for the neural network model to be adjusted and/or selected based on its performance on the testing set, data that it has not seen before, and then evaluated on the production set.
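The two-stage split can be sketched as follows. This is an illustration under stated assumptions: the text says the 24 production experiments were selected to represent the six soil types under different slope conditions, whereas the hypothetical `hold_out_production` helper below simply draws four experiments per soil type at random (matching the counts in Table 2) and ignores slope. The 60/20/20 division in the equally hypothetical `split_60_20_20` is applied to the 5-min samples, and the seed is arbitrary.

```python
import random
from collections import defaultdict

# Sketch only: random per-soil hold-out; the study's actual selection also
# considered slope, which is not modelled here.
def hold_out_production(experiments, per_soil=4, seed=1):
    """Set aside `per_soil` experiments of every soil type as the production set.

    experiments -- list of (experiment_id, soil_type) pairs
    Returns (development_ids, production_ids).
    """
    rng = random.Random(seed)
    by_soil = defaultdict(list)
    for exp_id, soil in experiments:
        by_soil[soil].append(exp_id)
    development, production = [], []
    for soil in sorted(by_soil):
        ids = by_soil[soil][:]
        rng.shuffle(ids)
        production.extend(ids[:per_soil])
        development.extend(ids[per_soil:])
    return development, production


def split_60_20_20(samples, seed=1):
    """Shuffle samples and cut them into training, cross-validation and testing."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    n_train = round(0.6 * len(shuffled))
    n_cv = round(0.2 * len(shuffled))
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_cv],
            shuffled[n_train + n_cv:])


# experiment counts per soil type taken from Table 2
counts = {"Loam": 16, "Clay Loam": 18, "Sandy Loam": 16,
          "Silty Loam": 10, "Clay": 10, "Silty Clay Loam": 10}
experiments = [(f"{soil}-{i}", soil) for soil, n in counts.items() for i in range(n)]
development, production = hold_out_production(experiments)
train, cv, test = split_60_20_20(list(range(100)))  # e.g. 100 five-minute samples
```

Holding out whole experiments before the 5-min samples are shuffled keeps every sample of a production experiment unseen during training.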

Table 2 Data set distribution used for training, cross-validation, testing and production.

Soil type          Training/cross-validation/testing   Production set   Total
Loam               12                                  4                16
Clay Loam          14                                  4                18
Sandy Loam         12                                  4                16
Silty Loam         6                                   4                10
Clay               6                                   4                10
Silty Clay Loam    6                                   4                10

Table 3 Input parameter statistics.

                     Intensity   Moisture   %sand   %clay   Ks     Slope   BD
Mean                 119.34      0.48       35.31   27.83   2.66   27.09   1.46
Standard deviation   31.21       0.07       18.35   17.24   2.31   9.89    0.11
Kurtosis             -1.11       -0.88      -0.63   3.73    1.53   -1.18   -0.79
Skewness             -0.18       -0.71      0.20    1.47    1.28   0.05    0.29
Minimum              72.21       0.34       1.62    0.89    0.03   10.76   1.23
Maximum              161.55      0.55       68.79   86.22   9.02   44.08   1.70

Ks: saturated hydraulic conductivity; BD: bulk density. Intensity is in mm h-1 and slope in degrees.

The 24 production-set experiments were selected to represent the six soil types under different slope conditions. Tables 2 and 3 summarize the input variables used for the ANN simulations.

Pre-processing of input data

Data variables have different ranges of values and units. To overcome this, the data were standardized so that all variables receive equal attention during the training process and to remove the effect of similarity between objects. The input data were standardized to the range [0,1] using the following function:

$$X_{ij} = \frac{x_{ij} - V_{\min j}}{V_{\max j} - V_{\min j}} \qquad (3)$$

where $X_{ij}$ is the standardized value of the input $x_{ij}$, and $V_{\min j}$ and $V_{\max j}$ are the minimum and maximum values of the jth variable in all observations, respectively.
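Equation (3) takes only a few lines. The helper names below are hypothetical; `column_ranges` returns the minimum and maximum of each variable over whatever rows are passed in, which corresponds to the text's "all observations" when all 80 experiments are supplied. The usage lines scale the Table 3 minimum, mean and maximum of intensity and bulk density.

```python
# Sketch only: column-wise min-max standardization as in Eq. (3).
def column_ranges(rows):
    """V_min,j and V_max,j of every column j over the rows given."""
    columns = list(zip(*rows))
    return [min(c) for c in columns], [max(c) for c in columns]


def min_max_scale(rows, v_min, v_max):
    """Apply Eq. (3), X_ij = (x_ij - V_min,j) / (V_max,j - V_min,j), row by row."""
    scaled = []
    for row in rows:
        scaled.append([
            (x - lo) / (hi - lo) if hi > lo else 0.0  # constant column -> 0
            for x, lo, hi in zip(row, v_min, v_max)
        ])
    return scaled


# intensity (mm/h) and bulk density, using the minima and maxima of Table 3
v_min, v_max = [72.21, 1.23], [161.55, 1.70]
scaled = min_max_scale([[72.21, 1.23], [119.34, 1.46], [161.55, 1.70]], v_min, v_max)
```

The minimum row maps to 0 and the maximum row to 1 in every column, so variables as different as intensity and bulk density end up on the same scale.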

Design and training of the model

The multilayer perceptron (MLP) was the network type used in this study. The hyperbolic tangent and sigmoidal transfer functions were evaluated. Determining the network size is usually done by trial-and-error experimentation. The method applied herein followed that of Fahlman (1988), termed the constructive algorithm approach by Maier & Dandy (2000). The procedure started with one neuron in one hidden layer and progressively increased the number of neurons until the performance on the testing set was found to be suitable. The results of the model were evaluated using the mean squared error (MSE) and the coefficient of determination (R2). The MSE is defined as:

$$\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}\left(f_i - \hat{f}_i\right)^2 \qquad (4)$$

where $f_i$ is the observed infiltration, $\hat{f}_i$ is the modelled infiltration for the ith sample and N is the number of samples. The coefficient of determination (R2) between the observed and simulated infiltration is defined as:

$$R^2 = \frac{\left[\sum_{i=1}^{N}\left(f_i - \bar{f}\right)\left(\hat{f}_i - \bar{\hat{f}}\right)\right]^2}{\sum_{i=1}^{N}\left(f_i - \bar{f}\right)^2\,\sum_{i=1}^{N}\left(\hat{f}_i - \bar{\hat{f}}\right)^2} \qquad (5)$$

where $\bar{f}$ is the mean of the observed infiltration and $\bar{\hat{f}}$ is the mean of the modelled infiltration.
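Equations (4) and (5) can be checked numerically. A minimal sketch in plain Python with hypothetical function names; the series below are invented to show how the two criteria differ.

```python
# Sketch only: Eq. (4) and the squared-correlation form of Eq. (5).
def mse(observed, modelled):
    """Eq. (4): mean of the squared differences between observed and modelled."""
    return sum((o - m) ** 2 for o, m in zip(observed, modelled)) / len(observed)


def r_squared(observed, modelled):
    """Eq. (5): squared correlation between observed and modelled values."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_m = sum(modelled) / n
    cov = sum((o - mean_o) * (m - mean_m) for o, m in zip(observed, modelled))
    ss_o = sum((o - mean_o) ** 2 for o in observed)
    ss_m = sum((m - mean_m) ** 2 for m in modelled)
    return cov * cov / (ss_o * ss_m)


observed = [1.0, 2.0, 3.0, 4.0]
biased = [6.0, 7.0, 8.0, 9.0]  # hypothetical model that is always 5 too high
print(r_squared(observed, biased), mse(observed, biased))  # 1.0 25.0
```

Because Eq. (5) measures only linear association, a consistently biased model can still score R2 = 1. Reporting the MSE alongside it, as done here, exposes such an offset.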

The learning curve (MSE as a function of time) was also used as a convergence/stop criterion using the cross-validation data. This was done by stopping the learning when the error on the cross-validation set began to increase. The effects of the convergence criteria and the transfer functions on network performance were evaluated by running simulations several times and comparing the values of the coefficient of determination. After this, the best network was determined and evaluated on how it modelled the production set. Many neural network researchers agree that, in order to have robust and fast convergence, one should ensure that all the weights in the network change at roughly the same rate (LeCun et al., 1998; Principe et al., 2000). The software used in this study (NeuroSolutions) implements adaptive learning rates, which control the speed of convergence by increasing or decreasing the learning rate based on the error. This meant that the learning and momentum rates were set only at the beginning of the run, and the adaptive rules of the software ensured that the weights in the network changed at the same rate. LeCun et al. (1998) further suggest that using the Levenberg-Marquardt back-propagation algorithm would result in more efficient networks for large and redundant samples. However, the basic gradient descent method was used here due to software limitations. Only recently has a version of NeuroSolutions become available that employs the Levenberg-Marquardt algorithm.
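Error-driven learning-rate adaptation of this kind can be illustrated with the simple "bold driver" heuristic. This is a generic sketch, not the rule implemented in NeuroSolutions, whose exact update is not described here. The function name, the growth and shrink factors `up` and `down`, the starting rate and the quadratic error surface in the usage lines are all assumptions, and the sketch adapts a single global learning rate only.

```python
# Sketch only: "bold driver" step-size adaptation, not the NeuroSolutions rule.
def bold_driver(loss, grad, w, lr=0.1, up=1.05, down=0.5, steps=300):
    """Gradient descent whose learning rate adapts to the change in error.

    If a step lowers the error it is kept and lr is multiplied by `up`;
    otherwise the step is discarded and lr is multiplied by `down`.
    """
    current = loss(w)
    for _ in range(steps):
        g = grad(w)
        candidate = [wi - lr * gi for wi, gi in zip(w, g)]
        new = loss(candidate)
        if new < current:
            w, current = candidate, new  # error fell: accept, speed up
            lr *= up
        else:
            lr *= down                   # error rose: reject, slow down
    return w, current, lr


# hypothetical error surface with a minimum at (3, -1)
def loss(w):
    return (w[0] - 3.0) ** 2 + 10.0 * (w[1] + 1.0) ** 2


def grad(w):
    return [2.0 * (w[0] - 3.0), 20.0 * (w[1] + 1.0)]


w_final, e_final, lr_final = bold_driver(loss, grad, [0.0, 0.0])
```

The learning rate settles just below the value at which the steeper direction would start to diverge, so the user only has to choose the starting value.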

RESULTS, ANALYSIS AND DISCUSSION

Correlation matrix and multiple linear regression

The common statistical parameters (mean, standard deviation, skewness, kurtosis, minimum and maximum values) of the input data are summarized in Table 3.

The correlation matrix, with infiltration as the dependent variable (Table 4), indicates that there is a moderate relationship between bulk density, %sand and infiltration. There is also a fair degree of relationship between hydraulic conductivity, %clay and infiltration. These are soil characteristics which give information on the degree of compactness (bulk density), size distribution (%sand, %clay) and the ability of a soil sample to transmit water (hydraulic conductivity). They influence the soil water movement and water retention characteristics, which in turn affect infiltration. Intensity, slope and soil moisture show weak or no correlation with infiltration.

Table 4 Correlation matrix for rainfall-simulator training data.

              Intensity  %sand   %clay   K       Slope   BD      Moisture  Infiltration
Intensity     1.00
%sand         0.034      1.00
%clay         -0.010     -0.444  1.00
K             0.096      0.595   -0.521  1.00
Slope         -0.010     0.004   -0.036  -0.008  1.00
BD            -0.041     0.481   -0.233  0.467   0.022   1.00
Moisture      0.006      -0.046  -0.104  0.038   0.022   -0.012  1.00
Infiltration  -0.134     0.546   -0.389  0.311   0.112   0.476   0.051     1.00

Note: figures in bold indicate fair to moderate degree of relationship.

Table 5 Model parameters using multiple linear regression, R2 = 0.431.

            Intercept  Intensity  %sand  %clay   K       Slope  BD      Moisture
Parameter   14.644     -0.085     0.472  -0.236  -1.483  0.196  41.366  18.044

The links between %sand, %clay, bulk density and hydraulic conductivity are also seen in the correlation matrix. This is expected, since these are closely related soil physical properties that can affect soil water movement and the water-retention characteristics of the media (Rawls et al., 1993).

Modelling the data using multiple linear regression resulted in an R2 of 0.431 (Table 5). A stepwise solution did not yield a better value for R2. The input parameters were tested for normality with the Shapiro-Wilk test. For all input parameters, the result was to reject the null hypothesis that the sample follows a normal distribution; that is, non-normality is significant for all the input variables.

Correlation and regression only reflect the effect of the parameters under the assumption that the system is linear. Given the low R2 of the regression and the non-normality of the inputs, regression is inappropriate for modelling infiltration in this instance.

Parameter estimation of infiltration equations

The infiltration parameters of the Green & Ampt (1911), Kostiakov (1932), Horton (1940) and Philip (1957) infiltration models were determined from the field rainfall simulator data by empirical fitting using linear regression. These four models were chosen as they are commonly used equations in the literature (Rawls et al., 1993). Minimum, maximum and average parameter values of the four models are presented in Table 6. Table 7 shows the performance of the models sorted on the basis of soil type. The results show that the Philip model gave the best performance with an average R2 of 0.5741, followed closely by the Green-Ampt model with an R2 of 0.5639. The Horton and Kostiakov models gave R2 values of less than 0.4. However, it can be noted from the table that R2 varies from 0.0004 to 0.9510 for the Philip model. This large discrepancy is also true of all the other models. The parameters themselves vary just as widely: the Philip model parameter A varies from 1.41 to 113.70, the Green-Ampt parameter K from 4.64 to 123.37, the Kostiakov parameter a from 35.95 to 136.93, and the Horton parameters fc and fo from 0.01 to 66.00 and from 8.51 to 146.66, respectively. Mishra et al. (2003), in comparing different infiltration models, also found that the values of the parameters can vary significantly, even compared with those reported in the literature.
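One way to carry out empirical fitting by linear regression is to linearize each equation. A minimal sketch for the Philip equation, assuming ordinary least squares on the transformed variable t^(-1/2); the exact fitting procedure used in the study is not given here, and `fit_philip` is a hypothetical helper. The Kostiakov equation can be treated in the same way after taking logarithms, ln f = ln(ab) + (b - 1) ln t. The data in the usage lines are synthetic.

```python
import math

# Sketch only: OLS on x = t^(-1/2); not necessarily the study's procedure.
def fit_philip(times, rates):
    """Fit f = (1/2) s t^(-1/2) + A by ordinary least squares.

    With x = t^(-1/2) the equation is a straight line f = A + (s/2) x,
    so the slope gives s/2 and the intercept gives A.
    Returns (s, A, R2), where R2 is the coefficient of determination of the line.
    """
    x = [1.0 / math.sqrt(t) for t in times]
    n = len(x)
    mean_x = sum(x) / n
    mean_f = sum(rates) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxf = sum((xi - mean_x) * (fi - mean_f) for xi, fi in zip(x, rates))
    slope = sxf / sxx
    intercept = mean_f - slope * mean_x
    ss_res = sum((fi - (intercept + slope * xi)) ** 2 for xi, fi in zip(x, rates))
    ss_tot = sum((fi - mean_f) ** 2 for fi in rates)
    return 2.0 * slope, intercept, 1.0 - ss_res / ss_tot


# synthetic noise-free rates generated with s = 4 and A = 20
times = [5.0 * k for k in range(1, 13)]
rates = [0.5 * 4.0 / math.sqrt(t) + 20.0 for t in times]
s, A, r2 = fit_philip(times, rates)
```

On real plot data the same fit can return a high R2 with very different (s, A) pairs, which is the parameter variability discussed in this section.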

Table 6 Parameters of Philip, Green-Ampt, Kostiakov and Horton derived from rainfall-simulator experiments by empirical fitting.

        Philip                   Green-Ampt               Kostiakov                Horton
        s      A       R2        K       ηSc     R2       a       b     R2        fc     fo      R2
Min     0.02   1.41    0.0042    4.64    0.04    0.0046   35.95   0.57  0.0000    0.01   8.51    0.004
Max     10.47  113.70  0.9510    123.37  100.46  0.9207   136.93  1.00  0.9255    66.00  146.66  0.889
Mean    4.53   62.72   0.5741    68.45   11.03   0.5639   78.83   0.89  0.3983    23.37  77.92   0.3991

Note: figures in bold indicate fair to moderate degree of relationship.

Table 7 Mean parameters of Philip, Green-Ampt, Kostiakov and Horton arranged by soil type.

                   Philip                 Green-Ampt              Kostiakov               Horton
Soil type          s     A      R2        K       ηSc    R2       a       b     R2        fc     fo      R2
Loam               2.66  82.34  0.3433    86.18   1.97   0.3267   92.07   0.95  0.3433    40.23  72.14   0.3133
Clay Loam          6.36  47.53  0.6933    55.11   20.13  0.6617   70.03   0.83  0.6467    21.99  77.69   0.4933
Sandy Loam         3.94  62.96  0.6530    68.41   2.83   0.6880   76.88   0.92  0.6200    15.31  79.83   0.3300
Silty Loam         1.46  82.00  0.4200    115.39  1.02   0.3100   118.60  0.99  0.3900    0.01   115.27  0.4300
Clay               2.87  43.77  0.5200    48.41   1.80   0.5000   54.04   0.91  0.3300    12.50  57.45   0.2100
Silty Clay Loam    2.17  78.74  0.4300    81.90   1.41   0.4550   86.32   0.95  0.3600    6.26   86.36   0.2350

Table 8 Two sample experiments with resulting parameters by empirical fitting.

                                                   Philip                     Green-Ampt
Expt no.  Soil type  Intensity  K      Slope  BD    s       A       R2        K       ηSc   R2
2c        Clay Loam  151.2      0.241  27.33  1.35  26.095  112.60  0.8878    123.37  3.59  0.7359
3c        Clay Loam  151.2      0.216  24.21  1.38  41.869  53.21   0.9509    69.28   8.58  0.8754

Another issue is that the R2 value may be satisfactory while the derived parameters are variable. For instance, two experiments were done under very similar conditions, and the Philip equation gave R2 values of 0.8878 and 0.9509, respectively. However, the derived parameters differ clearly (Table 8). This variability in the parameter values is found in many of the results and makes it very difficult to determine the extent of model suitability.

Effect of internal parameters on the ANN

An array of model selection procedures has surfaced throughout the development of neural networks, most of which penalize complexity (Zapranis & Refenes, 1999). The larger the number of processing elements in the network, the more complex the input-output mappings the network can represent. However, as the network size increases, the network may overfit the training set, so that when it is given new data which it has never seen before, its response becomes unpredictable. In order to improve generalization, the ideal network has the least number of degrees of freedom that still gives optimum performance.

Obtaining an optimal neural network is usually one aim during training. Because the MLP is prone to overfitting (Geman et al., 1992), some form of regularization is usually performed for faster learning and improved generalization. Because of advances in computational power, it may seem unnecessary to seek efficient algorithms. However, in the pursuit of faster solutions, it is preferable that better and more stable solutions are found as well (Orr & Müller, 1998).

In this study, 30 simulations were performed per set by varying the number of neurons (ranging from one to six), the stop criteria (i.e. with a stop criterion based on the cross-validation data, and without a stop criterion, training until the total number of epochs was reached) and the transfer function (hyperbolic tangent and sigmoid). Early stopping involves taking an independent cross-validation set and monitoring the error on this set during training. While the error on the training set decreases continuously, the error on the cross-validation set reaches its minimum at a certain point and then starts to increase with further training iterations. The early stopping point is where the cross-validation error is at its lowest; this is where the trained network provides the best generalization capability (Hanson et al., 1993). The stopping criterion employed in this study is based on this method, i.e. stopping with cross-validation. The choice of a stopping criterion should minimize training time while maximizing generalization.
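Early stopping with cross-validation, including the variant that stops once the error has failed to improve for a set number of epochs, can be sketched as a training loop. This is a minimal, framework-independent sketch: the function name, the callbacks `train_one_epoch` and `cv_error`, and the `patience` value are hypothetical placeholders, not NeuroSolutions settings, and the toy stand-ins in the usage lines replace a real network.

```python
# Sketch only: generic early stopping with hypothetical callbacks.
def train_with_early_stopping(train_one_epoch, cv_error, max_epochs=1000,
                              patience=10):
    """Keep the weights with the lowest cross-validation error.

    train_one_epoch() -- updates the network once and returns a weight snapshot
    cv_error(weights) -- error of those weights on the cross-validation set
    Training stops once the cross-validation error has not improved for
    `patience` consecutive epochs, or when max_epochs is reached.
    """
    best_err, best_weights, best_epoch = float("inf"), None, -1
    for epoch in range(max_epochs):
        weights = train_one_epoch()
        err = cv_error(weights)
        if err < best_err:
            best_err, best_weights, best_epoch = err, weights, epoch
        elif epoch - best_epoch >= patience:
            break  # generalization error has stopped improving
    return best_weights, best_epoch, best_err


# toy stand-ins: the "weights" are just an epoch counter and the
# cross-validation error is U-shaped with its minimum at snapshot 20
state = {"epochs_run": 0}


def toy_epoch():
    state["epochs_run"] += 1
    return state["epochs_run"]


def toy_cv_error(weights):
    return (weights - 20) ** 2


best, best_epoch, best_err = train_with_early_stopping(toy_epoch, toy_cv_error)
```

The loop returns the snapshot from the minimum of the cross-validation curve rather than the last one trained. Here training halts ten epochs after that minimum instead of running to `max_epochs`.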

Stopping when the generalization error has increased for a certain number of epochs has
the advantage of faster training compared with letting the process continue until the epoch
limit is reached. With the stop criterion, the networks with three, four and five hidden-layer
neurons using the hyperbolic tangent transfer function converged in fewer epochs than the other
combinations (Table 9), which implies faster and more efficient ANN training. The coefficient of
determination for the testing set also peaked for the same combination. Neither the average R²
nor the number of epochs showed any considerable improvement beyond seven neurons. Another
observation was that the MSE and R² had (on average) slightly higher standard deviations when a
stop criterion was included than when training was allowed to reach the maximum epoch. Since
training to the maximum epoch gave no significant improvement in generalization, training the
network with a stop criterion is practical and can save a considerable amount of processing time.
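
The two performance measures used in this comparison can be computed as shown below. The sketch assumes that R² is the usual coefficient of determination, 1 − SS_res/SS_tot (some studies instead report the squared Pearson correlation), and that the spread over repeated simulations is summarized by the sample standard deviation. The observed and predicted values are made-up numbers for illustration.

```python
import numpy as np


def mse(observed, predicted) -> float:
    """Mean squared error between observed and predicted values."""
    o = np.asarray(observed, dtype=float)
    p = np.asarray(predicted, dtype=float)
    return float(np.mean((o - p) ** 2))


def r_squared(observed, predicted) -> float:
    """Coefficient of determination, computed here as 1 - SS_res / SS_tot."""
    o = np.asarray(observed, dtype=float)
    p = np.asarray(predicted, dtype=float)
    ss_res = np.sum((o - p) ** 2)           # residual sum of squares
    ss_tot = np.sum((o - o.mean()) ** 2)    # total sum of squares about the mean
    return float(1.0 - ss_res / ss_tot)


def mean_and_sd(values):
    """Average and sample standard deviation of a metric over repeated runs."""
    v = np.asarray(values, dtype=float)
    return float(v.mean()), float(v.std(ddof=1))


obs = [10.0, 8.0, 6.0, 5.0]     # illustrative measured values
pred = [9.5, 8.2, 6.4, 4.9]     # illustrative network outputs
print(mse(obs, pred), r_squared(obs, pred))
```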

Table 9 Average MSE, R² and epoch for hyperbolic tangent and sigmoidal transfer
functions.

Expt  Soil       Intensity  K      Slope  BD    Philip                  Green-Ampt
no.   type                                      s       A       R²      K       ηSc   R²
2c    Clay Loam  151.2      0.241  27.33  1.35  26.095  112.60  0.8878  123.37  3.59  0.7359
3c    Clay Loam  151.2      0.216  24.21  1.38  41.869  53.21   0.9509  69.28   8.58  0.8754

Production data

Based on the simulations made, one can state that the ANN models with the best results
had the following characteristics:

(a) network type: multilayer perceptron;
(b) number of layers: three (with one hidden layer);
(c) number of neurons in the hidden layer: three, four or five;
(d) transfer function: hyperbolic tangent.
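
The configuration listed above can be illustrated with a short NumPy forward pass: an input layer, one hidden layer of tanh neurons and a single output. The linear output neuron, the weight-initialization scheme and the assumption of seven pre-scaled inputs are choices made for this sketch, not details taken from the study; training (e.g. back-propagation) is not shown.

```python
import numpy as np


def init_mlp(n_inputs, n_hidden, seed=0):
    """Random initial weights for an input -> tanh hidden -> linear output network."""
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(0.0, 1.0 / np.sqrt(n_inputs), size=(n_inputs, n_hidden)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(0.0, 1.0 / np.sqrt(n_hidden), size=(n_hidden, 1)),
        "b2": np.zeros(1),
    }


def forward(params, X):
    """Predict one output per row of X (rows = input vectors, assumed pre-scaled)."""
    hidden = np.tanh(X @ params["W1"] + params["b1"])      # hyperbolic tangent hidden layer
    return (hidden @ params["W2"] + params["b2"]).ravel()  # linear output neuron


params = init_mlp(n_inputs=7, n_hidden=4)   # the 4-neuron configuration
X = np.zeros((3, 7))                        # three dummy input vectors
print(forward(params, X))                   # [0. 0. 0.] because all biases start at zero
```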

The ANN models trained with the above characteristics were then used to evaluate the
production set. Although the average R² of the ANN model for the testing set was generally
satisfactory, a vital test of the model is its performance on the production set. The production
set in this study was chosen to represent six different soil types with varying ranges of rainfall
intensity and slope. Of the three ANN sizes used, the model with four neurons gave the best
result. The average R² values for the three models are 0.8560, 0.9110 and 0.8821 for three, four
and five neurons, respectively. This means that, as the number of neurons was increased beyond
four, the predictive capability of the network actually decreased.

Comparison between actual observations and ANN simulation

Table 10 shows the results of the ANN model (MLP, 4-neuron, hyperbolic) with the
production set. The mean R² for this set is 0.9110. By soil type, the ANN prediction was best
for Clay Loam (mean R² = 0.9435), followed by Loam (mean R² = 0.9417). The poorest
performance was for Silty Clay Loam (mean R² = 0.8558). A closer look at Tables 2 and 10
shows a relationship between the percentage of data available for input and the performance of
the ANN. The results are consistent with the view that ANN predictions are most reliable when
the inputs lie within the range of the training data (Minns & Hall, 1996): the larger the training
data set, the better the network performance. Since a large percentage of the input data used for
training/testing was Clay Loam and Loam, the ANN prediction was also better for these soil
types. With less data available in the training set for Silty Clay Loam, for example, the
predictive performance of the ANN was correspondingly low for this soil type.
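
One practical way to act on this observation is to flag production inputs that fall outside the range covered by the training data, since those are the cases where ANN predictions are least trustworthy. The function below is a hypothetical helper written for illustration; it checks each input variable against its training minimum and maximum, and the numbers are made up.

```python
import numpy as np


def outside_training_range(X_train, X_new):
    """Boolean mask, True for each row of X_new that has at least one
    input variable outside the [min, max] range seen in X_train."""
    X_train = np.asarray(X_train, dtype=float)
    X_new = np.asarray(X_new, dtype=float)
    lo = X_train.min(axis=0)   # per-variable training minimum
    hi = X_train.max(axis=0)   # per-variable training maximum
    return np.any((X_new < lo) | (X_new > hi), axis=1)


# Two illustrative input variables per row; values are made up.
X_train = [[0.0, 10.0], [1.0, 20.0], [2.0, 30.0]]
X_new = [[1.0, 15.0], [3.0, 25.0], [0.5, 5.0]]
mask = outside_training_range(X_train, X_new)
print(mask, mask.mean())  # mask.mean() = fraction of rows that require extrapolation
```

A range check of this kind only detects extrapolation in single variables; a row can lie inside every per-variable range and still be far from any training example.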

Table 10 Mean R² results of ANN (MLP 4-neuron, hyperbolic) with production data.

The ANN model predictions and the measured data were plotted on graphs for all of the
production set data. The trained ANN was able to satisfactorily predict the infiltration during the
later time intervals (15 min and after). However, for the early time intervals (0–15 min) the
predictability decreased significantly, presumably due to increased noise. In the actual rainfall
experiments, zero runoff, and hence 100% infiltration, can be expected during the initial minutes.
The ANN model prediction was unreliable during this phase and could even overestimate
infiltration (i.e. predict more infiltration than the actual amount of rainfall). After these first few
minutes, the model prediction improves significantly (Fig. 3). The coefficient of determination of
the ANN model (MLP 4-neuron hyperbolic) was partitioned into the first 0–15 min and the period
thereafter, and is given in Table 11. Figure 4 shows scatter plots of the results from the
production set for Loam, Clay Loam and Sandy Loam. There is wider scatter for larger values of
infiltration in the experimental set, which means that there is a larger disparity between measured
and predicted values during the first several minutes of the experiment.
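
The split of the coefficient of determination into an early (0–15 min) and a later (after 15 min) period can be reproduced with a sketch like the one below. It assumes R² = 1 − SS_res/SS_tot and that time is recorded in minutes from the start of each run; `r_squared_by_window` is a hypothetical name.

```python
import numpy as np


def r_squared(observed, predicted) -> float:
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    o = np.asarray(observed, dtype=float)
    p = np.asarray(predicted, dtype=float)
    return float(1.0 - np.sum((o - p) ** 2) / np.sum((o - o.mean()) ** 2))


def r_squared_by_window(time_min, observed, predicted, split_min=15.0):
    """R² computed separately for t <= split_min ("early") and t > split_min ("late")."""
    t = np.asarray(time_min, dtype=float)
    o = np.asarray(observed, dtype=float)
    p = np.asarray(predicted, dtype=float)
    early = t <= split_min           # boolean mask selecting the first window
    return {
        "early": r_squared(o[early], p[early]),
        "late": r_squared(o[~early], p[~early]),
    }


# Illustrative run sampled every 5 min (values are made up).
print(r_squared_by_window([5, 10, 15, 20, 25, 30],
                          [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
                          [1.0, 2.0, 3.0, 5.0, 5.0, 5.0]))
```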

Fig. 3 Comparison of time-distribution graph of simulated (ANN) and measured
(rainfall simulator) results.

Table 11 Partition of soil series results (R²) into 0–15 min and above 15 min.

Soil type          0–15 min   Above 15 min   Mean
Loam               0.76       0.97           0.9417
Clay Loam          0.78       0.97           0.9435
Sandy Loam         0.72       0.95           0.9154
Silty Loam         0.70       0.94           0.9112
Clay               0.68       0.94           0.8926
Silty Clay Loam    0.65       0.92           0.8558

Fig. 4 Scatter plots of measured vs ANN-predicted infiltration for: (a) Loam; (b) Clay Loam;
and (c) Sandy Loam.

Sensitivity analysis

A sensitivity analysis of the inputs to infiltration was carried out for the three separately
trained ANNs to determine the contribution of each parameter to infiltration. Each input
parameter was varied from its mean minus one standard deviation (SD) to its mean plus one SD.
The network output was computed for 50 steps below and above the mean, and the sensitivity was
calculated. A summary of the sensitivity analysis for the three separately trained ANNs is shown
in Table 12, and Fig. 5 illustrates the mean results. In these experiments, the soil moisture
content had a much greater effect on infiltration than any other input. The sensitivity analysis
was repeated for the three trained ANNs on the assumption that ANNs with low correlations
between actual and predicted outputs could yield different results. All three sensitivity analyses
yielded similar and consistent results, showing that soil moisture was the major determining
factor in the prediction capability of the ANN. The next most influential factor was the hydraulic
conductivity. All the other parameters fell within the same range of influence. Although it may
seem that dynamic variables such as rainfall intensity and soil moisture are solely responsible for
the training of the ANN, the sensitivity analysis shows that hydraulic conductivity, which does
not change in time, also has an impact on the prediction. Rainfall intensity, by contrast, has lower
sensitivity, as shown in Fig. 5.
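
The procedure described above (vary one input across its mean ± 1 SD in 50 steps on each side of the mean while holding the others at their means) can be sketched as follows. The text does not state how the sensitivity value itself was computed; this sketch assumes it is the standard deviation of the network output over the varied range, and `model` is any hypothetical callable mapping an input matrix to predictions.

```python
import numpy as np


def sensitivity_about_mean(model, X, steps=50):
    """Sensitivity of a model to each input, varied over mean +/- 1 SD.

    For input j, the value runs from mean_j - SD_j to mean_j + SD_j in `steps`
    increments on each side of the mean (2*steps + 1 points) while every other
    input is held at its mean. The sensitivity is taken here to be the standard
    deviation of the resulting model outputs (an assumption of this sketch).
    """
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    sd = X.std(axis=0)
    sensitivities = np.empty(X.shape[1])
    for j in range(X.shape[1]):
        values = np.linspace(mean[j] - sd[j], mean[j] + sd[j], 2 * steps + 1)
        probe = np.tile(mean, (values.size, 1))  # every row starts at the input means
        probe[:, j] = values                     # only input j varies
        sensitivities[j] = np.std(model(probe))
    return sensitivities


def toy_model(P):
    """Toy model: twice as sensitive to input 0 as to input 1, ignores input 2."""
    return 2.0 * P[:, 0] + 1.0 * P[:, 1]


X = np.array([[0.0, 0.0, 5.0], [2.0, 2.0, 7.0]])
print(sensitivity_about_mean(toy_model, X))
```

With a trained network in place of `toy_model`, repeating the call for each of the three networks and averaging the results would give a summary in the form of Table 12.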

Table 12 Contribution of input parameters based on sensitivity analysis of three
trained ANNs.

Input      ANN MLP 3-neuron   ANN MLP 4-neuron   ANN MLP 5-neuron   Mean
           hyperbolic         hyperbolic         hyperbolic
Intensity  0.1491             0.1578             0.1622             0.1564
%sand      0.1836             0.1813             0.1933             0.1861
%clay      0.1889             0.1872             0.1993             0.1918
K          0.2755             0.2679             0.2536             0.2657
BD         0.2391             0.2000             0.2203             0.2198
Slope      0.1790             0.1901             0.1916             0.1869
Moisture   0.4755             0.4509             0.4580             0.4615

Fig. 5 Mean sensitivity analysis of three trained ANNs.

However, this does not agree with the correlation analysis of the variables (Table 4),
where soil moisture showed only a weak relationship to infiltration. Hydraulic conductivity,
together with other soil properties, did show a fair level of relationship to infiltration.
Researchers performing simulated field measurements have shown that soil characteristics,
slope, vegetation cover, bulk density and seasonal changes in soil moisture are among the factors
affecting infiltration (Cerdà, 1997; Martin & Moody, 2001; Harden & Scruggs, 2003).
Correlation reflects only the linear association between a parameter and infiltration. Because the
data are nonlinear, this explains the discrepancy between the correlation analysis and the
sensitivity results from the ANN models.
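
A toy example makes the point about correlation concrete: when the output depends on an input only nonlinearly, the Pearson correlation can be close to zero even though the input fully determines the output. The data below are synthetic and serve only to illustrate this.

```python
import numpy as np

x = np.linspace(-1.0, 1.0, 201)   # input values symmetric about zero
y = x ** 2                        # output depends on x, but not linearly

r = np.corrcoef(x, y)[0, 1]       # Pearson correlation coefficient
print(f"Pearson r = {r:.3f}")     # close to zero despite the exact dependence
print(f"output range = {y.min():.2f} to {y.max():.2f}")
```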

Sensitivity analysis in neural networks is usually done for two purposes: to investigate the
effect of the inputs on the outputs and to determine whether any insignificant inputs can be
ignored. In this study, soil moisture and hydraulic conductivity were found to be the major
influencing factors in modelling infiltration using an ANN. This suggests that including the
moisture values from the previous time step as an additional input may improve the predictive
capability of the ANN. The other inputs show similar levels of sensitivity and give no reason to
discard any of them in order to improve the ANN performance.
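
Adding the previous time step's moisture as an extra input, as suggested above, amounts to appending a lagged column to the input matrix. The helper below is a hypothetical sketch that assumes the rows of `X` are consecutive time steps of a single rainfall run and that the moisture column index is known.

```python
import numpy as np


def add_previous_moisture(X, moisture_col):
    """Append moisture at t-1 as an extra input column.

    Rows of X are assumed to be consecutive time steps of one rainfall run.
    The first row is dropped because it has no previous time step.
    """
    X = np.asarray(X, dtype=float)
    previous = X[:-1, moisture_col]          # moisture at t-1 for rows 1..n-1
    return np.column_stack([X[1:], previous])


run = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]   # columns: time step, moisture (illustrative)
print(add_previous_moisture(run, moisture_col=1))
```

If several runs are stacked in one data set, the function would be applied to each run separately, so that the last step of one run is not used as the "previous" step of the next.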

Cumulative infiltration

To further evaluate the predictive capability of the ANN model, it was compared with (a) the
actual infiltration and (b) the traditional Philip and Green-Ampt models in terms of cumulative
infiltration, using the entire production set. The parameters of the Philip and Green-Ampt
equations were taken from the mean values for each soil type (Table 7). Cumulative infiltration
was then computed for the given rainfall intensity of each run using the method of Chow et al.
(1988). Since the Horton (1940) and Kostiakov (1932) models showed poorer results in the
empirical fitting of the parameters, both have been excluded from this comparison.
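
For reference, the Green-Ampt part of such a computation can be sketched using the constant-intensity ponding-time approach described by Chow et al. (1988): all rainfall infiltrates until the ponding time t_p = KψΔθ / [i(i − K)], after which F is found from F − F_p − ψΔθ ln[(ψΔθ + F)/(ψΔθ + F_p)] = K(t − t_p) by successive substitution. The function name, the parameter names (`K`, `psi_dtheta`), the units (cm, h) and the example values are assumptions for this sketch; the Philip equation is not shown.

```python
import math


def green_ampt_cumulative(t_h, i, K, psi_dtheta, tol=1e-9, max_iter=500):
    """Cumulative infiltration F (cm) at time t_h (h) under constant rainfall.

    i: rainfall intensity (cm/h); K: hydraulic conductivity (cm/h);
    psi_dtheta: wetting-front suction head times moisture deficit (cm).
    """
    if i <= K:
        return i * t_h                       # no ponding: all rain infiltrates
    t_p = K * psi_dtheta / (i * (i - K))     # ponding time (h)
    if t_h <= t_p:
        return i * t_h                       # before ponding: supply-limited
    F_p = i * t_p                            # cumulative infiltration at ponding
    F = F_p + K * (t_h - t_p)                # starting guess (a lower bound)
    for _ in range(max_iter):
        # Successive substitution; converges because d(RHS)/dF = psi_dtheta / (psi_dtheta + F) < 1.
        F_next = F_p + K * (t_h - t_p) + psi_dtheta * math.log(
            (psi_dtheta + F) / (psi_dtheta + F_p)
        )
        if abs(F_next - F) < tol:
            return F_next
        F = F_next
    return F


# Hypothetical parameters: K = 1 cm/h, psi*dtheta = 5 cm, i = 3 cm/h.
for t in (0.5, 1.0, 2.0):
    print(t, round(green_ampt_cumulative(t, i=3.0, K=1.0, psi_dtheta=5.0), 3))
```

Because a single (mean) parameter set per soil type drives this calculation, any error in K or ψΔθ propagates directly into the cumulative curve.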

As illustrated in Fig. 6, the accuracy of the Philip and Green-Ampt equations was
inconsistent, and they mostly overestimated the cumulative infiltration. The poor performance of
both the Philip and Green-Ampt models can be attributed to the use of mean parameter values. As
discussed previously, there is a wide variability between the minimum and maximum values
derived from the empirical fitting, making it very difficult to determine which parameter to use in
which context. Moreover, these infiltration equations were not designed to account for many
other local features such as macropores, crusting and moisture within the soil profile. Because
ANNs are universal approximators, able to represent nonlinearity and generalize the structure of
the whole data set, the ANN model provided superior accuracy.

Fig. 6 Comparison of actual, ANN (4-neuron, hyperbolic), Philip and Green-Ampt
cumulative infiltration of production data.

CONCLUSIONS

This paper shows the applicability of the ANN multilayer perceptron approach to modelling
the infiltration process using experimental data from rainfall simulator field experiments.
Correlation analysis of the variables showed that their relationships were nonlinear; hence linear
regression is an inappropriate method for modelling infiltration. Parameter estimation for the
traditional infiltration models yielded widely variable results, which makes it difficult to
determine which parameter values to use in which situation.

The internal network parameters play a role in the efficiency and generalization capability
of the ANN. Simulations were made in which internal parameters such as the convergence/stop
criteria and transfer functions were varied, and the efficiency and generalization were evaluated
to find the best combination. The results with the production data also reveal that the
performance of the ANN model is influenced by the availability of input vectors. However, even
with the limited variables, the ANN model still achieved satisfactory outcomes. Comparison of
the time distribution of infiltration in the production set shows that the model is not able to
predict sufficiently well the early time steps, when runoff usually begins at zero (and thus
infiltration is at its maximum). In the subsequent time steps, however, the accuracy of the ANN
model improves significantly. Sensitivity analysis showed that soil moisture and hydraulic
conductivity are the main influencing factors in modelling infiltration using the ANN. The ANN
model provided higher accuracy than the traditional Philip and Green-Ampt models in terms of
cumulative infiltration.

Further simulations are recommended to improve the ANN performance by including
data from other soil groups and varying slopes, and by incorporating other variables such as the
soil moisture values of the previous time step. The performance of the ANN model suggests that,
using this approach, infiltration can be estimated from easily available physical data. The trained
ANN could be coupled to larger hydrological models where limited data are available and
rainfall simulator experiments can be performed.

Acknowledgements

This project was made possible through the Joint Financing Programme for Cooperation
on Higher Education (MHO) administered by the Netherlands Organization for International
Cooperation in Higher Education (NUFFIC) and the Japan Society for the Promotion of Science
(JSPS) with visiting fellowships at Delft University of Technology (DUT), the Netherlands, and
Tokyo Institute of Technology (TIT), Japan. Special thanks are due to Prof. Dr Ir Cees van den
Akker of the DUT Section of Ecology and Hydrology, Dr Josef Cser of the DUT Section of Civil
Engineering Informatics and Dr Joel C. Bandibas of the National Institute of Advanced Industrial
Science and Technology (AIST) in Japan.
REFERENCES

Abdul-Wahab, S. A. & Al-Alawi, S. M. (2002) Assessment and prediction of tropospheric ozone
concentration levels using artificial neural networks. Envir. Modell. Software 17, 219–228.

Anctil, F., Michel, C., Perrin, C. & Andréassian, V. (2004) A soil moisture index as an auxiliary
ANN input for stream flow forecasting. J. Hydrol. 286, 155–167.

Anderson, M. G. & Burt, T. P. (1990) Process studies in hillslope hydrology: an overview. In:
Process Studies in Hillslope Hydrology (ed. by M. G. Anderson & T. P. Burt), 1–9. John Wiley
& Sons Ltd, Chichester, UK.

ASCE Task Committee on Application of Artificial Neural Networks in Hydrology (2000a)
Artificial neural networks in hydrology. I. Preliminary concepts. J. Hydrol. Engng ASCE 5(2),
115–123.

ASCE Task Committee on Application of Artificial Neural Networks in Hydrology (2000b)
Artificial neural networks in hydrology. II. Hydrologic applications. J. Hydrol. Engng ASCE
5(2), 124–137.

Bandibas, J. C. & Kohyama, K. (2001) An efficient artificial neural network training method
through induced learning retardation: inhibited brain learning. Asian J. Geoinformatics 1(4),
45–55.

Cameron, D., Kneale, P. & See, L. (2002) An evaluation of a traditional and a neural net
modelling approach to flood forecasting for an upland catchment. Hydrol. Processes 16,
1033–1046.

Cerdà, A. (1997) Seasonal changes of the infiltration rates in a Mediterranean scrubland on
limestone. J. Hydrol. 198, 209–225.

Chow, V. T., Maidment, D. R. & Mays, L. W. (1988) Applied Hydrology. McGraw-Hill Book
Company, New York, USA.

Cigizoglu, H. K. (2004) Estimation and forecasting of daily suspended sediment data by
multi-layer perceptrons. Adv. Water Resour. 27, 185–195.

Dawson, C. W. & Wilby, R. (2001) Hydrological modeling using artificial neural networks. Progr.
Phys. Geogr. 25, 80–108.
Fahlman, S. E. (1988) Fast learning variations of back-propagation: an empirical study. Proc.
1988 Connectionist Models Summer School, 38–51, Morgan Kaufmann Publishers, Inc., San
Francisco, California, USA.

Fernando, D., Achela, K. & Jayawardena, A. W. (1998) Runoff forecasting using RBF
networks with OLS algorithm. J. Hydrol. Engng ASCE 3(3), 203–209.

Geman, S., Bienenstock, E. & Doursat, R. (1992) Neural networks and the bias/variance
dilemma. Neural Comp. 4, 1–58.

Green, W. H. & Ampt, G. A. (1911) Studies in soil physics. I. Flow of air and water through
soils. J. Agric. Sci. 4, 1–24.

Hanson, S. J., Cowan, J. D. & Giles, C. L. (1993) Advances in Neural Information Processing
Systems, vol. 5. Morgan Kaufmann Publishers, Inc., San Francisco, California, USA.

Harden, C. P. & Scruggs, P. D. (2003) Infiltration on mountain slopes: a comparison of three
environments. Geomorphol. 55, 5–24.

Horton, R. E. (1933) The role of infiltration in the hydrological cycle. Trans. Am. Geophys.
Union 14, 446–460.

Horton, R. E. (1940) An approach toward a physical interpretation of infiltration capacity. Soil
Sci. Soc. Am. J. 5, 399–417.

Islam, S. & Kothari, R. (2000) Artificial neural networks in remote sensing of hydrologic
processes. J. Hydrol. Engng ASCE 5(2), 138–144.

Klute, A. (1986) Methods of Soil Analysis, Part 1: Physical and Mineralogical Methods (second
edn). American Soc. of Agron., Madison, Wisconsin, USA.

Kostiakov, A. N. (1932) On the dynamics of the coefficient of water-percolation in soils and on
the necessity of studying it from a dynamic point of view for purposes of amelioration. Trans.
Sixth Comm. of the Int. Soc. of Soil Sci., Part A, 17–31.

LeCun, Y., Bottou, L., Orr, G. B. & Müller, K. R. (1998) Efficient BackProp. In: Neural
Networks: Tricks of the Trade (ed. by G. B. Orr & K. R. Müller), 9–53. Springer-Verlag, Berlin,
Germany.

Maier, H. R. & Dandy, G. C. (1998) Understanding the behavior and optimising the performance
of back-propagation neural networks: an empirical study. Envir. Modell. Software 13, 179–191.

Maier, H. R. & Dandy, G. C. (2000) Neural networks for the prediction and forecasting of water
resources variables: a review of modelling issues and applications. Envir. Modell. Software 15,
101–124.

Martin, D. A. & Moody, J. A. (2001) Comparison of soil infiltration rates in burned and
unburned mountainous watersheds. Hydrol. Processes 15, 2893–2903.

Minns, A. W. & Hall, M. J. (1996) Artificial neural networks as rainfall–runoff models.
Hydrol. Sci. J. 41(3), 399–417.

Mishra, S. K., Tyagi, J. V. & Singh, V. P. (2003) Comparison of infiltration models. Hydrol.
Processes 17, 2629–2652.

Orr, G. B. & Müller, K. R. (1998) (eds) Neural Networks: Tricks of the Trade. Springer-Verlag,
Berlin, Germany.

Philip, J. R. (1957) Theory of Infiltration. Division of Plant Industry, CSIRO, Australia.

Principe, J. C., Euliano, N. R. & Lefebvre, W. C. (2000) Neural and Adaptive Systems:
Fundamentals through Simulations. John Wiley & Sons, Inc., New York, USA.

Rajurkar, M. P., Kothyari, U. C. & Chaube, U. C. (2004) Modeling of the daily rainfall–runoff
relationship with artificial neural network. J. Hydrol. 285, 96–113.

Rawls, W. J., Ahuja, L. R., Brakensiek, D. L. & Shirmohammadi, A. (1993) Infiltration and soil
water movement. Chapter 5 in: Handbook of Hydrology (ed. by D. R. Maidment), McGraw–Hill,
Inc., New York, USA.

Richards, L. A. (1931) Capillary conduction of liquids through porous mediums. Physics 1,
318–333.

Shin, H. S. & Salas, J. D. (2000) Regional drought analysis based on neural networks. J.
Hydrol. Engng ASCE 5(2), 145–155.

Sudheer, K. P. & Jain, S. K. (2003) Radial basis function neural network for modeling rating
curves. J. Hydrol. Engng ASCE 8(3), 161–164.

Sudheer, K. P., Gosain, A. K. & Rangan, D. M. (2002) A data-driven algorithm for constructing
artificial neural network rainfall–runoff models. Hydrol. Processes 16, 1325–1330.

Suen, J. P. & Eheart, J. W. (2003) Evaluation of neural networks for modeling nitrate
concentration in rivers. J. Water Resour. Plan. Manage. 129(6), 505–510.

Sy, N. L. (2004) Artificial neural network modeling of infiltration from plot-scale rainfall
simulator data. In: Hydroinformatics (ed. by S. Y. Liong, K. K. Phoon & V. Babovic) (Proc.
Sixth Int. Conf. on Hydroinformatics, June 2004), 1433–1440. World Scientific Publishing Co.,
Pte. Ltd., Singapore.

Sy, N. L. & Calo, E. T. (2001) Utilization of the USC-CED field rainfall simulator as a research
tool for understanding the infiltration process of the hydrologic cycle. In: Proc. Fourth National
Civil Engng Education Congress (De La Salle Univ., Manila, Philippines, 2–4 May 2001). De La
Salle Univ. Press, Manila, Philippines.

Tokar, A. S. & Momcilo, M. (2000) Precipitation–runoff modeling using artificial neural
networks and conceptual models. J. Hydrol. Engng ASCE 5(2), 156–161.

Trajkovic, S., Todorovic, B. & Stankovic, M. (2003) Forecasting of reference evapotranspiration
by artificial neural networks. J. Irrig. Drain. Div. ASCE 128(6), 454–457.

Zapranis, A. & Refenes, A. P. (1999) Principles of Neural Model Identification, Selection and
Adequacy with Applications to Financial Econometrics. Springer-Verlag, Berlin, Germany.

Zhang, B. & Govindaraju, R. S. (2003) Geomorphology-based artificial neural networks (GANN)
for estimation of direct runoff over watershed. J. Hydrol. 273, 18–34.

Received 19 August 2004; accepted 2 September 2005

CHAPTER 3

PRESENTATION
