
Pergamon 0893-6080(93)E0030-B

Neural Networks, Vol. 7, No. 5, pp. 833-846, 1994. Copyright 1994 Elsevier Science Ltd. Printed in the USA. All rights reserved. 0893-6080/94 $6.00 + .00

CONTRIBUTED ARTICLE

A Recurrent Neural Network Controller and Learning Algorithm for the On-Line Learning Control of Autonomous Underwater Vehicles
K. P. VENUGOPAL, A. S. PANDYA, AND R. SUDHAKAR
Florida Atlantic University
(Received 10 December 1992; revised and accepted 14 December 1993)

Abstract--A new on-line direct control scheme for the Autonomous Underwater Vehicles (AUV), using recurrent
neural networks, is investigated. In the proposed scheme, the controller consists of a three-layer network architecture having feedforward input and output layers, and a totally recurrent hidden layer. All the interconnection strengths are synchronously updated using a computationally inexpensive learning algorithm called Alopex. The updating is based on the output error of the system directly, rather than using a transformed version of the error employed in the other neural network based direct control schemes. In the present implementation, the network starts from random initial conditions without needing any prior training, and learns the dynamics of the AUV to provide the correct control signal. Based on the simulation experiments using the nonlinear dynamics of an AUV, we demonstrate that the proposed learning algorithm and the network architecture provide stable and accurate tracking performance. We have also addressed the issue of robustness of the controller to system parameter variations as well as to measurement disturbances.

Keywords--Neural networks, Recurrent neural networks, Learning control, Learning algorithm, Autonomous underwater vehicles.

1. INTRODUCTION

Underwater robotic systems are being increasingly employed in ocean exploration studies. Such systems can efficiently execute operations such as underwater structure inspection, recovery, geographical surveys, and a number of military operations. Even though many underwater robotic vehicles are currently operational, most of them require constant human intervention and supervision in their operation. The present trend in the technology is to make such vehicles more autonomous, thereby expanding their operating regime and duration. Such vehicles, known as Autonomous Underwater Vehicles (AUV), have great commercial as well as military potential. Currently there are about 45 AUVs in various stages of development around the world (Bellingham, 1992).

1.1. Control Techniques for Underwater Vehicles

The objective of the controller is to generate a correct control signal required to drive the vehicle from an initial state to a desired final state through an optimum state trajectory. Even though well-defined and general controller design techniques are available for linear dynamical systems, there is no general methodology available for the design of controllers for nonlinear dynamical systems. The most common approach in the latter case is to linearize the dynamics and design a linear controller for each operating point. When the operational regime of the dynamical system is large, the effort involved in such a design is rather exhaustive. The design of an AUV control scheme is significantly more difficult due to the following factors. The ocean environment is highly uncertain and the disturbances affecting the vehicle dynamics are difficult to model. The nonlinear vehicle dynamics itself is only partially known and the vehicle has degrees of freedom in all three spatial dimensions.
Conventional linear controllers may be inadequate for such vehicles because

Acknowledgements: The authors wish to thank Prof. Stanley Dunn for his interest and support throughout this research work. The simulation of the vehicle dynamics was developed by Andrew Shein and John Kloske, and their suggestions were useful on a number of occasions. The authors are also grateful to the reviewers for their constructive criticisms and suggestions, which helped in improving the clarity of this paper. Requests for reprints should be sent to K. P. Venugopal at his present address: Medical Image Processing Group, University of Pennsylvania, 4th Floor Blockley Hall, 418 Service Drive, Philadelphia, PA 19104.



they cannot provide stable, robust operation over large operating ranges (Yuh, 1990b; Fujii & Ura, 1990; Venugopal, Sudhakar, & Pandya, 1992). Further, the design of such controllers requires precise knowledge of the complex motion dynamics of the vehicle. Strictly speaking, the hydrodynamical characteristics of the underwater vehicle are known precisely only after the design of the vehicle is complete. Thus, the development of a conventional controller linking the multiple linearized models involves exhaustive and expensive testing of the vehicle. Recently, this has motivated the application of intelligent, adaptive control strategies, such as neural network control, for underwater vehicles (Yuh, 1990b; Sanner & Akin, 1990; Yuh & Lakshmi, 1993; Venugopal, Sudhakar, & Pandya, 1992). Some of the traditional control schemes used in underwater robotics are cited below. A control scheme for Remotely Operated Vehicles (ROV), using a hierarchical structure of supervisory control, is discussed by Goheen, Jefferys, and Broome (1987). In this scheme the multi-input multi-output (MIMO) controller is split into a series of single-input single-output (SISO) controllers at the lower level. A higher-level controller supervises the control actions and often assists them. For a recent extension of this scheme to the velocity control of ROVs, see Goheen and Jefferys (1990). A sliding mode control scheme for ROVs is described by Yoerger and Slotine (1985). In this scheme, the control of an nth order dynamical system is treated as a first-order stabilization problem (Slotine & Li, 1991). The design process consists of picking a well-behaved function s of the tracking error and then selecting a feedback law such that s^2 is a stable function in the closed-loop system. In practice, this corresponds to replacing a switching control law by its smooth approximation.

The main advantage of the method is that it can deal with the nonlinear vehicle dynamics directly, without the need for its piecewise linearization. Although the method is robust to imprecise models and parameter variations, the design process needs a rough estimation of the parameters and, in effect, leads to a trade-off between tracking performance and parameter uncertainty. Also, this method is nonadaptive. Some of the other traditional control schemes studied include the layered control scheme by Bellingham and Humphreys (1990) and Bricks (1989), and the parameter adaptation scheme by Yuh (1990a). Conventional adaptive control techniques such as self-tuning regulators (STR) and model reference adaptive control (MRAC) (Miller, Sutton, & Werbos, 1991, pp. 145-150) rely on the continuous identification of the system and consequent adjustment of controller gains. Though feasible, these techniques are complex and may be too slow for real-time applications where system learning and control actions are on-going processes (on-line controllers).

1.2. Neural Network Based Controllers for Underwater Vehicles

The emergence of neural networks as alternatives to traditional computing techniques has resulted in their wide use in control applications also (Miller et al., 1991; Narendra & Parthasarathy, 1990, 1991; Yabuta & Yamada, 1992; Yuh, 1990b). The learning and generalization abilities of these networks for adaptation and disturbance rejection make them better candidates for control applications. Also, due to their highly parallel nature of computing, neural networks are able to interpret and process very large amounts of sensory information that traditional control techniques are unable to handle. In addition, very little a priori information about the system dynamics is necessary for the design of such controllers. Note that STR and MRAC require explicit modelling of the dynamics (Karakasoglu, Sudharsanan, & Sundaresan, 1991; Narendra & Annaswamy, 1989). A neural network based direct control scheme for Underwater Robotic Vehicles (URV) is described by Yuh (1990b). In this scheme, a feedforward neural network architecture with three layers of interconnection strengths is used. The network is trained using the popular back propagation algorithm (Rumelhart & McClelland, 1986). In a recent study, Yuh and Lakshmi (1993) investigate a learning with critic (Widrow, Gupta, & Maitra, 1973) approach for the control of remotely operated vehicles. They compare the performance of this controller when using the back propagation algorithm and the parallel recursive prediction error (PRPE) algorithm. A real-time single-axis attitude regulation scheme for an underwater telerobot using neural networks is discussed by Sanner and Akin (1990). Their study deals with some of the practical implementation issues and indicates that a single serial microprocessor implementation of the neural network controller may be impossible. The schemes discussed above are generally for the motion control of underwater vehicles when the velocities along all three axes are comparable.
In such cases, the motion along each direction is independently controlled by separate thrusters. The problem is more complex when surfaces such as the stern plane and rudder are used, instead of thrusters, for controlling the vehicle direction. Fujii and Ura (1990) describe one of the first implementations of neurocontrollers for underwater vehicles with control surfaces. In their scheme, a fuzzy controller is used as a start-up controller until the neural network learns the vehicle dynamics. The control scheme involves a specific learning cycle. Venugopal et al. (1992) describe an on-line learning control scheme for tracking the pitch, depth, and heading of AUVs. In this scheme, the controller starts from random initial conditions and generates correct control


signals in a few time instants, without any a priori training cycle. In all the above studies, feedforward neural network implementations with the back propagation learning algorithm are used. Even though feedforward networks have been shown to provide a certain level of performance when used as adaptive controllers for dynamical systems, the complexity required by such networks may be large and the learning may be slow (Narendra & Parthasarathy, 1990; Yabuta & Yamada, 1992). Static feedforward networks are unable to map nonlinear dynamical systems efficiently and hence the implementation of neurocontrollers for dynamical systems is most appropriate when the networks themselves are dynamical in nature (Narendra & Parthasarathy, 1990).
1.3. Direct Control Schemes Based on Neural Networks

The two generally used neural network based control architectures are the direct control scheme and the indirect control scheme. In the indirect control scheme, the parameters of the dynamical system are identified at each instant, which are then used to estimate the controller parameters. Thus, there is an explicit identification process in this approach and such a scheme need not result in the minimum output error (Narendra & Annaswamy, 1989; Narendra & Parthasarathy, 1990). In the direct control scheme, the controller parameters are directly adjusted to reduce the output error while maintaining the overall stability (see Figure 1). Hence, it is simpler to implement such a scheme. It may be noted that even when the dynamics are linear and time-invariant, the above approaches result in overall nonlinear systems (Narendra & Parthasarathy, 1990). The basic direct control scheme for the case of a SISO dynamics is shown in Figure 1. The neural network learns the inverse characteristics of the dynamics implicitly to generate the correct control signal u, to drive the system state y to the desired state y_d. One of the problems in having a direct control scheme using neural networks is that the error at the output of the neural network (network error, e_u), which is needed for updating the network weights, is not directly available (Figure 1). Only the system error e_y (implicitly related to e_u) can be measured at the output of the dynamics. Defining the squared error at the system output as the objective function to be minimized by the controller, we have:

E_y = (y_d - y)^2   (1)

If (y_d - y) is defined as e_y, the gradient of the error with respect to a network interconnection strength w is:

∂E_y/∂w = -e_y ∂y/∂w   (2)
        = -e_y (∂y/∂u)(∂u/∂w).   (3)

The above equations can be generalized to the case when w is a vector. The term ∂y/∂u corresponds to the forward gain of the dynamics and is an important factor determining the stability of the overall system (Yabuta & Yamada, 1992). If we define the system Jacobian J(u) = ∂y/∂u, eqn (3) can be rewritten as,

∂E_y/∂w = -e_y J(u) ∂u/∂w.   (4)

FIGURE 1. Direct control scheme.
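For reference, the gradient update of eqns (2)-(4) can be sketched directly. The scalar toy plant, the step sizes, and the finite-difference estimate of J(u) below are illustrative assumptions only (the scheme proposed in this paper avoids this estimate altogether):

```python
def gradient_weight_step(w, u, y, y_d, du_dw, plant, eta=0.01, h=1e-4):
    """One gradient step w <- w - eta * dE_y/dw, per eqns (2)-(4).

    The plant Jacobian J(u) = dy/du is estimated by central differences,
    which is exactly the on-line estimate whose fast variations the text
    warns about.  'plant', 'eta', and 'h' are hypothetical stand-ins.
    """
    e_y = y_d - y
    J = (plant(u + h) - plant(u - h)) / (2 * h)  # finite-difference J(u)
    return w + eta * e_y * J * du_dw             # moves w to reduce E_y

# Toy linear plant y = 2u: with y_d > y, the weight is nudged upward.
plant = lambda u: 2.0 * u
w_new = gradient_weight_step(w=0.0, u=1.0, y=2.0, y_d=3.0, du_dw=1.0, plant=plant)
```

Even in this one-line plant, every update depends on the Jacobian estimate, which motivates the gradient-free alternative pursued in this paper.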

It can be seen that, in the case of an on-line control scheme, the parameter J(u) affects the network weights directly. An on-line numerical estimation of the Jacobian may cause fast variations in the instantaneous estimate and hence directly influence the weight updating of the controller neural network. These variations can be minimized by averaging, but this will result in slow or ineffective learning. The problem is more complex when the neural network has recurrent feedback, as there may be cumulative effects of J(u) on the weight updating (Venugopal, Sudhakar, & Pandya, in press). This can be tackled either by moderating the impact of the Jacobian estimate on the weight updating of the controller neural network or by employing a learning algorithm that does not explicitly require the error gradient and hence the use of the network error. We pursue the latter strategy in this paper.

In a typical control application, the learning has to be on-line (run-time) rather than off-line. In off-line learning, the training data (predetermined trajectories) are already available before learning starts. This approach is suitable only when the command signals are few and known. On the other hand, on-line learning is an incremental procedure attempting to minimize the output error as the training data become available. It has to keep up with the temporal flow of events. For accurate and complete learning of the system, the input set must be rich enough to reveal all modes of the system activity (persistent excitation) (Miller et al., 1991, pp. 12-15, 193). Thus, the sequence of command signals and system (initial) conditions needs to provide a sufficiently varied set of training instances and a training regime that repeats them often enough. However, this may not be readily achieved for realistic complex systems operating in unstructured environments. In practice, the initial conditions in the state space cannot be chosen at the discretion of the designer, but must be realized by sequencing the variety of inputs. In such situations, a useful strategy may be not to discontinue the adaptation of the neural network controller, even though the weights converge to constant values, so as to provide the option of learning any unencountered initial condition in the state space. In short, the training is continued as long as the control action is needed. It may be seen that for on-line learning, issues such as speed of learning, system convergence, controller performance in poorly trained regions, and learning interference have special significance.

We have two major objectives in this paper. (i) We introduce the Alopex algorithm as an effective learning algorithm for on-line learning control applications. Because Alopex does not need an explicit error derivative for the network updating, the system error e_y (Figure 1) can be used directly. This eliminates the problems associated with algorithms that need an explicit error derivative. Further, the simple updating procedure of Alopex permits the algorithm to be used with any network architecture, including recurrent networks. (ii) We demonstrate the effectiveness of the Alopex based direct control scheme for the on-line control of the complex, highly nonlinear dynamics of an AUV. It is shown that the proposed neural network controller architecture and the learning algorithm give good tracking performance and stability. We have also addressed the issues of disturbance rejection and the adaptation capability of the proposed scheme in the presence of parameter disturbances.

The paper is organized as follows. Section 2 deals with the overall control scheme. The recurrent network architecture and learning algorithm are described in Section 3. Section 4 presents the simulation results and Section 5 provides a discussion of the results. Our conclusions are presented in Section 6.

2. CONTROL SCHEME

2.1. Vehicle Dynamics


The simulation studies are performed on the dynamics of the Ocean Voyager, an AUV being built at the Center for Advanced Marine Systems, Florida Atlantic University. The vehicle is 22' long and 13" in diameter, with a displacement of 2700 lb. The design speed of the vehicle is 10.13 ft/s, with an overall cruise time of 2 h. The forward propulsion is of thruster type and the vehicle motion is controlled by two surfaces, the stern plane and the rudder. The modelling of the vehicle dynamics is done using the conventional Euler equations of motion of a rigid body moving in a medium, based on the David Taylor Naval Ship Research and Development Center (DTNSRDC) revised equations of motion (Feldman, 1979; Shein & Kloske, 1991). The notations and coordinate system transformations used are of the DTNSRDC standard and are illustrated in Figure 2.

FIGURE 2. Details of vehicle coordinate system (adapted from Feldman, 1979).


The vehicle dynamics consists of two control inputs, viz., the stern plane and rudder deflections, and six output parameters, viz., surge, sway, heave, roll, pitch, and yaw. The complexity of the vehicle dynamics is reflected in the following nonlinear relationship for the pitch angle. It consists of a fifth-order differential equation having nonlinearities up to degree four. The other input/output relationships can also be expressed in a similar fashion, and the complete dynamic equations representing the vehicle motion are described in Feldman (1979). The pitch angle relationship is given by,
I_y q̇ + (I_x - I_z)rp - (ṗ + qr)I_xy + (p^2 - r^2)I_xz + (qp - ṙ)I_yz + m[z_G(v̇_x - v_y r + v_z q) - x_G(v̇_z - v_x q + v_y p)]
  = 0.5ρl^5[M'_q̇ q̇ + M'_rp rp] + 0.5ρl^4[M'_v̇z v̇_z + M'_q v_x q]
  + 0.5ρl^3 (terms in v_x, v_y, v_z and the stern plane deflection δ_s, with further hydrodynamic coefficients M')
  + crossflow drag integrals of the form ∫ x b(x) v_z(x) [v_y^2(x) + v_z^2(x)]^{1/2} dx   (5)

where the terms on the right-hand side of the equation correspond to the summation of all the moments with respect to the y axis, which is a nonlinear function of the form H(u, v_x, v_y, v_z, v̇_z, p, q, r, q̇) involving the stern plane deflection u. The parameters v_x, v_y, and v_z are the linear velocities along the three coordinate axes; p, q, and r are the angular velocities; and x_G, y_G, and z_G are the location of the center of gravity with respect to the global coordinate system. I stands for the moment of inertia, with the subscripts indicating whether planar or axial, and m is the mass of the vehicle (Feldman, 1979; Venugopal et al., 1992). Other parameters, such as M', are various hydrodynamic coefficients of the vehicle. The states of the vehicle (v_x, v_y, v_z, p, q, r) are obtained by integrating the above set of equations at each time instant. From the above discussion, it is apparent that the dynamics of the vehicle are highly nonlinear and the number of parameters affecting the nonlinearity is large. This, coupled with the turbulent ocean environment, renders nonadaptive conventional proportional-integral-derivative (PID) controllers inadequate.
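The state propagation just described (integrating the equations of motion at each time instant) can be sketched with a generic fixed-step integrator. The derivative function below is a hypothetical stand-in for the full DTNSRDC right-hand side, which is not reproduced here:

```python
def integrate_states(state, control, state_derivative, dt=0.05):
    """One fixed-step Euler update of the six vehicle states.

    state            : [v_x, v_y, v_z, p, q, r]
    control          : control-surface deflection(s) at this instant
    state_derivative : callback returning d(state)/dt; here a placeholder
                       for the full nonlinear equations of motion
    dt               : 0.05 s, the sampling time used in Section 4
    """
    ds = state_derivative(state, control)
    return [x + dt * dx for x, dx in zip(state, ds)]

# Hypothetical derivative function (simple decay), for illustration only.
decay = lambda s, u: [-0.1 * x for x in s]
state = integrate_states([1.0, 0.0, 0.0, 0.0, 0.0, 0.0], 0.0, decay)
```

In the actual simulation a higher-order integration rule may be preferable; the Euler step is shown only for clarity.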
2.2. Control Scheme for the Vehicle

Preliminary simulation studies (Venugopal, Pandya, & Sudhakar, 1991) indicated that the rudder and stern plane relationships are fairly separable and, hence, the overall control scheme can be simplified into two separate implementations. They are: (i) a control scheme for the output parameters yaw (ψ), roll (φ), and sway (y), with the rudder deflection δ_r as the control input, and (ii) a control scheme for the output parameters depth (z) and pitch (Θ), with the stern plane deflection δ_s as the control input. Here the effect of stern plane deflection on the forward speed is neglected. Figure 3a shows the control scheme for the stern plane relationship. Each network controls one output parameter and the overall control input δ_s is generated by combining them, as shown in the figure. Splitting the controller into SISO networks has the following advantages. (a) The behavior of SISO systems is well understood and hence controller design for them is easier than that for MIMO systems. This is true in the case of conventional controllers also. (b) From the simulation studies, it is observed that the updating rate of the networks is a critical factor, and it is easy to choose separate optimum values for each SISO network to get better convergence and tracking. The procedure for generating the desired signal is given below. The path planner of the AUV generates the desired velocities and positions at each instant, which in turn are used as inputs to the networks. Θ_d and z_d are the desired pitch and depth, respectively, and the control input δ_s is generated by combining the neural network outputs. The change in forward velocity due to the control of the stern plane is neglected and only the depth and pitch are controlled in this scheme. Each network is trained using the corresponding errors and the training is continued throughout the experiment. Figure 3b shows the rudder control scheme. ψ_d, φ_d, and y_d denote the desired signals for yaw, roll, and sway, respectively. The corresponding outputs of the AUV dynamics are ψ, φ, and y. As in the case of stern plane control, the networks are trained using the corresponding errors.
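The combination of the two SISO network outputs into the single stern plane command δ_s can be sketched as follows. How the outputs are combined and the deflection limit used below are our assumptions, since the combination block is specified only in Figure 3:

```python
def stern_plane_command(u_pitch, u_depth, limit=0.35):
    """Combine the pitch and depth network outputs into one deflection.

    A simple sum followed by saturation; both the additive combination
    and the +/-0.35 rad actuator limit are illustrative assumptions,
    not values taken from the paper.
    """
    delta_s = u_pitch + u_depth                 # combine SISO outputs
    return max(-limit, min(limit, delta_s))     # respect actuator limits

print(stern_plane_command(0.2, 0.3))   # saturates at 0.35
```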
It may be noted that the critical parameter in the lateral dynamics is the heading angle (yaw) and hence the next level of simplification will be to control the yaw only, while monitoring the other parameters to be within safe limits (Venugopal et al., 1992). The preliminary phase in the controller design process is to design the SISO controllers (Figure 4a) independently and to optimize their performance. In the subsequent phases, the designed networks can be used as start-up controllers for further training of the combined schemes described in Figure 3a,b to implement the complete control scheme. In this paper, we restrict our discussion only to the design of single-input single-output controllers for the three most important parameters, viz., pitch, depth, and heading. The individual network architecture is shown in Figure 4b. Each controller consists of a three-layer recurrent neural network having one input layer of two neurons, one totally interconnected hidden layer of 10


FIGURE 3. (a) The controller architecture for the lateral motion. The vehicle states fed back to the path planner are the linear and angular velocities. (b) The controller architecture for the longitudinal motion.

neurons, and an output layer having one neuron (a 2 × 10 × 1 network). The desired command and the actual response of the vehicle are used as the two inputs to the controller. The size of the neural network was determined empirically, based on a rough estimate of the complexity of the dynamics. In our earlier studies using feedforward networks (Venugopal et al., 1991, 1992), we found that a four-layer network of size 2 × 20 × 10 × 1 provided comparable tracking performance and disturbance rejection capability.

3. THE RECURRENT NETWORK ARCHITECTURE AND LEARNING ALGORITHM

The mapping capabilities of feedforward sigmoidal neural networks are well studied (Hornik, Stinchcombe, & White, 1989; Cybenko, 1989). Even though such networks are not dynamical in nature, they give satisfactory results, under certain conditions, for a number of control problems involving dynamical systems (Narendra & Parthasarathy, 1990; Miller et al., 1991). However, to control higher order systems, the network complexity (in terms of the number of neurons and interconnections) needed may be large. Also, for

such networks, the learning rate needs to be kept small, to ensure overall stability of the system (Narendra & Parthasarathy, 1990; Yabuta & Yamada, 1992), and this may result in slow dynamical response. Thus, in the control of dynamical systems, a dynamic (recurrent) neural network controller can outperform its static counterpart.
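To make the notion of a dynamic controller concrete, the hidden-layer model introduced in Section 3.1 below (a leaky integrator with lateral recurrent weights) can be stepped in discrete time. The Euler discretization, the weight ranges, and treating the 0.05 s sampling time as the integration step are our assumptions; the sizes follow the paper's 2 × 10 × 1 controller:

```python
import math
import random

def f(v, T0=1.0):
    """Bipolar sigmoid, bounded in (-1, 1), as in eqn (9)."""
    return [(1 - math.exp(-x / T0)) / (1 + math.exp(-x / T0)) for x in v]

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def recurrent_step(x, inp, W, B, H, dt=0.05):
    """One Euler step of x' = -x + W f(x) + B i, then output u = f(H x)."""
    fx = f(x)
    Wf, Bi = matvec(W, fx), matvec(B, inp)
    x_new = [xi + dt * (-xi + wf + bi) for xi, wf, bi in zip(x, Wf, Bi)]
    return x_new, f(matvec(H, x_new))

random.seed(0)
p, q, r = 2, 10, 1                      # input, hidden, output layer sizes
W = [[random.uniform(-0.1, 0.1) for _ in range(q)] for _ in range(q)]
B = [[random.uniform(-0.1, 0.1) for _ in range(p)] for _ in range(q)]
H = [[random.uniform(-0.1, 0.1) for _ in range(q)] for _ in range(r)]
x, u = recurrent_step([0.0] * q, [1.0, 0.5], W, B, H)
```

The hidden state x carries memory across sampling instants, which is precisely what a static feedforward network lacks.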
3.1. The Recurrent Neural Network Architecture

The recurrent neural network architecture has gained increased attention in recent years (Almeida, 1988; Pineda, 1987; Williams & Zipser, 1989; Elman, 1990). In a number of applications involving temporal signals, they are able to produce promising results (Karakasoglu et al., 1991; Narendra & Parthasarathy, 1991; Krishnapuram & Chen, 1993; Jordan, 1986). Though the learning ability of the recurrent network architectures is impressive, no simple algorithms are presently available for training such networks. One approach is to modify the feedforward back propagation learning algorithm for the recurrent architecture. In this case, the algorithm becomes computationally intensive (Narendra & Parthasarathy, 1991), restricting its application in on-line learning control tasks. Because the algorithm uses explicit gradient information, it requires nonlocal information for the updating of the weights (Williams & Zipser, 1989). These problems can be alleviated by the proposed Alopex learning scheme (Harth, 1976; Harth & Pandya, 1988; Harth, Unnikrishnan, & Pandya, 1987; Venugopal & Pandya, 1991; Pandya & Venugopal, in press). In this study, we use a three-layer recurrent network architecture described by the following dynamical model (see Figure 4b):

ẋ = -x + W f(x) + B i   (6)
u = f(H x)   (7)

where the hidden layer output x ∈ R^q; the lateral hidden layer weights W ∈ R^{q×q}; the input layer weights B ∈ R^{q×p}; the output layer weights H ∈ R^{r×q}; p, q, and r are the numbers of neurons in the input, hidden, and output layers; and i represents the input vector. The function f(·) is a vector-valued sigmoidal transformation function, with bounds between -1.0 and +1.0. The corresponding network architecture is shown in Figure 4b. The network has a feedforward input layer, a totally recurrent hidden layer, and a feedforward output layer. We employ a simple, stochastic, synchronous learning algorithm called Alopex for training this network architecture. Alopex does not impose any restriction on the network architecture because the algorithm is a separate entity, and no explicit error gradient needs to be computed.

FIGURE 4. (a) Individual neural network controller architecture. (b) Recurrent neural network architecture. The input and output layers are feedforward and the hidden layer is totally recurrent.

3.2. The Alopex Algorithm

Alopex is a biologically influenced stochastic parallel process designed to find the global minimum of an error surface (Harth & Tzanakou, 1974; Harth, 1976; Harth et al., 1987). The algorithm works by broadcasting a measure of the global performance, a cost function, to all the neurons in the network, synchronously. The explicit derivative of the error function need not be calculated for updating the weights in this procedure. A correlation measure between the change in weight and the global error change is estimated, and the individual weights are changed based on a probability index of going in the right direction, so that the global error function is minimized (Tzanakou, 1992; Venugopal & Pandya, 1991). The algorithm is similar to simulated annealing (Kirkpatrick, Gelatt, & Vecchi, 1983) in an implicit manner, allowing faster transitions out of local minima. The details of the algorithm are given below (Harth & Pandya, 1988; Venugopal & Pandya, 1991; Venugopal, 1993). Consider a neuron i with an interconnection strength w_ij from neuron j in the lower layer. The output of the neuron i during the nth iteration is given by:

net_i(n) = Σ_j w_ij(n) out_j(n) + θ_i(n)   (8)

where θ_i is the threshold of neuron i. Applying a sigmoidal transformation to net_i, we obtain,

out_i(n) = [1 - exp(-net_i(n)/T_0)] / [1 + exp(-net_i(n)/T_0)]   (9)

where T_0 is the sigmoidal gain. During the nth iteration, the weight w_ij is updated as,

w_ij(n) = w_ij(n - 1) + δ_ij(n)   (10)

where δ_ij(n) is a small positive or negative step of size δ taken with the following probabilities:

δ_ij(n) = -δ with probability p_ij(n)
        = +δ with probability [1 - p_ij(n)].   (11)

The probability p_ij(n) is given by the expression:

p_ij(n) = 1 / (1 + exp[-Δ_ij(n)/T])   (12)

where Δ_ij(n) is given by the correlation:

Δ_ij(n) = Δw_ij(n) ΔE(n),   (13)

Δw_ij(n) and ΔE(n) are the changes in the weight w_ij and the error measure E over the previous two iterations [for the first two iterations, p_ij(n) is taken as 0.5]. In the expression for p_ij(n), T is a positive temperature that determines the effective randomness in the system. With a nonzero value for T, the algorithm takes biased random walks in the direction of decreasing E. The simulations are started with a large value for the temperature T. Subsequently, the temperature is set equal to the average value of the correlation Δ_ij, that is,

T = <|Δ_ij|>.   (14)
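The update rule of eqns (10)-(13), together with the annealing of eqn (14), reduces to a few lines of code. This generic sketch operates on a flat weight list; the step size, initial temperature, and the way the error change is supplied are placeholders rather than the paper's tuned values:

```python
import math
import random

def alopex_step(w, dw_prev, dE, delta=0.01, T=1.0):
    """One synchronous Alopex update of all weights, eqns (10)-(14).

    dw_prev holds the weight changes from the previous iteration; dE is
    the change in the global error over the previous iteration.  Returns
    the updated weights, the new weight changes, and the annealed
    temperature (average |correlation|, eqn (14)).
    """
    w_new, dw_new, corrs = [], [], []
    for wi, dwi in zip(w, dw_prev):
        corr = dwi * dE                                    # eqn (13)
        p = 1.0 / (1.0 + math.exp(-corr / T))              # eqn (12)
        step = -delta if random.random() < p else +delta   # eqn (11)
        w_new.append(wi + step)                            # eqn (10)
        dw_new.append(step)
        corrs.append(abs(corr))
    T_next = sum(corrs) / len(corrs)                       # eqn (14)
    return w_new, dw_new, T_next

random.seed(1)
w, dw, T = alopex_step([0.0, 0.5, -0.5], [0.01, -0.01, 0.01], dE=0.2)
```

Note that no gradient of E with respect to any weight appears anywhere: only the scalar error change dE is broadcast to all weights.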

Setting T in this way automatically reduces the temperature when the parameters are close to the optima, where the correlations are small. Towards the end, the step size can also be reduced for precise convergence. The stochastic approach of learning in Alopex differs from the instantaneous incremental learning in schemes such as back propagation (Rumelhart & McClelland, 1986). In Alopex, the weights affecting the system error are changed simultaneously by small amounts that are determined in part by the cross-correlation between the previous change in weight and the change in the system error, and in part by the noise (randomness) that is represented by an effective temperature. The stochastic algorithm increments or decrements the weights in such a way that the weight trajectories execute a random walk with a superimposed bias that directly minimizes the system error (Harth et al., 1987). Also, Alopex learns by relating the pattern of the recent history of the system to the pattern of weight changes. Note that this pattern learning procedure also results in a decrease in network error, without having to determine it explicitly. The Alopex algorithm also differs from other learning algorithms in that the system error is computed after all weights are changed synchronously. Some of the important properties of the Alopex algorithm that make it useful for control applications are: (i) the algorithm makes no assumption about the structure of the network, the error to be minimized, or the transfer function at the individual nodes; hence, any network architecture can be employed, including recurrent ones (Venugopal, 1993); (ii) the weights are updated synchronously, which allows efficient parallelization in hardware (Pandya, Shankar, & Freytag, 1990); (iii) studies have indicated that Alopex has very good global generalization and disturbance rejection characteristics (Venugopal, 1993). Here, generalization refers to the capability of the networks to learn the regularities and characteristics underlying the input data, rather than performing a look-up table type of operation. Such a property will be very useful in applications where the disturbances affecting the dynamical system are ill-modelled or unknown.

3.3. Batch Updating

The weight changes in Alopex correspond only loosely to the gradient, due to the stochastic nature of the algorithm. Hence, the instantaneous updating of the network at each sampling instant may not be desirable in the present case (Venugopal, Pandya, & Sudhakar, 1992). We employed a procedure of batch updating over time in which the output error is averaged over T s. That is, the averaged error is

E = (1/T) Σ_t [y_d(t) - y(t)]^2   (15)

The weights are updated every T s using the equations in Section 3.2. Note that the time of accumulation T, influences the speed of neural network response. Thus, whenever there is a sudden change in the error or when the command signals vary rapidly, the batch size is reduced automatically for a faster updating. These choices of Tare made based on the maximum operational speed of the AUV and the bandwidth of its dynamics (Shein & Kloskie, 1991 ). 4. SIMULATION RESULTS This section provides the experimental results of the study conducted for the motion control of the AUVs using the recurrent network architecture and the Alopex learning algorithm. Three specific experiments are conducted; keeping the vehicle at (i) a desired pitch, (ii) a desired heading, and (iii) a desired depth. In each case, the performance of the controller is studied by monitoring the speed of convergence, tracking error, and stability of the overall system. The experimental aspects and assumptions are outlined here: (i) The desired responses of the vehicle (pitch, depth, and yaw) are made available at each instant from a path planning system. (ii) The actual response of the vehicle is measured by numerically integrating the differential equations of motion. The sampling time as well as the time interval for one iteration for the neural network is chosen as 0.05 s. The sampling time is determined based on the numerical accuracy required for the integration of the differential equations and the frequency response of the vehicle dynamics. (iii) The neural networks used in all the cases are of size 2 10 1, with the hidden layer totally recurrent. (iv) The network weights are initialized to small values between +0.1 and -0.1 in all the cases. (v) On-line learning is employed for the control with the network weights. Weights of the networks are updated every 10 s (200 iterations), when the command signals are constant. 
When the command signals are time varying, the updating interval is reduced to half of the previous value ( 5 s). (vi) The step size and learning rate determine the time constant of the network and are critical parameters

It can be seen from eqns (8)-(13) that in the Alopex algorithm, the weight updating is done without calculating the error derivatives explicitly. The weight updating at some instants may be in the direction opposite

Learning Control of Autonomous Underwater Vehicles


for the stable operation of the overall system. Stability is also related to the initial distribution of the weights, which are set to small random values. (vii) The type of command signals and vehicle trajectories used in the studies are similar to the ones employed in practical situations. The rise time for the trajectories are chosen based on the system bandwidth. Note that the sampling time employed in the simulations is also related to the system bandwidth. Normally, the command signal to the neural network controller and the error signal need to be in the range ( - l, 1) because a threshold function [ eqn (9) ] of range ( - 1, 1 ) is used. In some cases, for example, the depth control problem, the command signal is in the range of hundreds of feet. There are two ways to address this problem: (a) normalize the command signal to the range ( - 1 . 0 , 1.0) before feeding as the input to the network; the corresponding vehicle response also has to be scaled down by the same factor; (b) the training algorithm can be modified to incorporate a threshold function having larger bounds; this in turn expands the range of the weight space. We employ the first method in the present studies. The controller output is in the range ( - 1 rad., 1 tad.), corresponding to an angular deflection ofT-57.3 The operational limit of the vehicle actuator for stern plane and rudder is 7-20.0 and hence any controllel output is limited between the limits T-20.0 . But, in all of our studies, the control signals generated by the neural network were well within these operational limits. The designed maximum speed of the vehicle is 10.16 It/s, which corresponds to the forward thruster rpm of 1200.

841
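Since eqns (8)-(13) of Section 3.2 are only summarized in this excerpt, the sketch below illustrates one common form of the correlation-based Alopex update together with the batch-averaged error of eqn (15). The function names, the probability expression, and all parameter values are illustrative assumptions, not the authors' exact formulation.

```python
import numpy as np

def batch_error(y_desired, y_actual):
    """Batch-averaged squared error over T samples, as in eqn (15)."""
    y_desired = np.asarray(y_desired, dtype=float)
    y_actual = np.asarray(y_actual, dtype=float)
    return float(np.mean((y_desired - y_actual) ** 2))

def alopex_step(w, dw_prev, dE, T, delta=0.006, rng=None):
    """One synchronous Alopex update (illustrative form).

    Every weight moves by +/- delta. The probability of a +delta step is
    biased by the correlation between the previous weight change and the
    resulting change in error; the temperature T controls the randomness
    (large T gives a nearly unbiased random walk).
    """
    rng = np.random.default_rng() if rng is None else rng
    corr = dw_prev * dE                                  # per-weight correlation term
    p_plus = 1.0 / (1.0 + np.exp(np.clip(corr / T, -50.0, 50.0)))
    dw = np.where(rng.random(w.shape) < p_plus, delta, -delta)
    return w + dw, dw
```

The automatic cooling described in the text can be layered on top of this sketch by recomputing T before each call as the running average of |dw_prev * dE|, which shrinks as the weights approach an optimum.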

4.1. Results
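The set-up items listed above amount to a single closed control loop; a minimal sketch is given below. The toy pitch dynamics, the choice of network inputs (scaled command and tracking error), and all weight values are assumptions for illustration only; the actual AUV equations of motion and the trained network weights are not reproduced here.

```python
import numpy as np

DT = 0.05                      # sampling interval (s), as in the experiments
rng = np.random.default_rng(0)

# Illustrative 2 x 10 x 1 network: feedforward input and output layers,
# fully recurrent hidden layer (these weights would be adapted by Alopex).
W_in = rng.uniform(-0.1, 0.1, (10, 2))
W_rec = rng.uniform(-0.1, 0.1, (10, 10))
W_out = rng.uniform(-0.1, 0.1, (1, 10))

def controller(x, h):
    """One forward pass; the output is bounded to (-1, 1) rad by tanh."""
    h_new = np.tanh(W_in @ x + W_rec @ h)
    u = np.tanh(W_out @ h_new)[0]
    return u, h_new

def plant_step(pitch, u):
    """Toy first-order stand-in for the AUV pitch dynamics (Euler step)."""
    return pitch + DT * (-0.5 * pitch + 2.0 * u)

pitch, h = 0.0, np.zeros(10)
scale = 10.0                               # normalize a 10-degree command to (-1, 1)
for k in range(200):                       # 10 s of simulated time
    cmd = 10.0                             # commanded pitch (degrees)
    x = np.array([cmd / scale, (cmd - pitch) / scale])
    u, h = controller(x, h)
    pitch = plant_step(pitch, u)
```

Between weight updates the loop runs with frozen weights; every T s the batch-averaged error would be fed to the Alopex update of Section 3.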
Experiment 1. The effectiveness with which the controller is able to drive the vehicle at a specified constant pitch is investigated in this case. The stern plane deflection is the control input to the AUV dynamics. The response of the vehicle to a step pitch command of 10° is shown in Figure 5a. The network is updated every 10 s using a step size of 0.006 and a learning rate of 1.0. Considering the inertial nature of the AUV dynamics, the command signal is varied linearly in the initial interval. It can be seen that the vehicle is able to track the commanded pitch closely. The corresponding neural network controller output is shown in Figure 5b. The validity of the assumption that the rudder and stern plane relationships are separable is confirmed by monitoring the variations in yaw, roll, and sway during the pitch control, which remain steady.

FIGURE 5. (a) Vehicle response to a step pitch command of 10° plotted as a function of time. The command signal and the actual response are represented by dotted and solid lines, respectively. Each second on the time axis corresponds to 20 iterations. The network is updated at every 10 s with a step size of 0.006. The vehicle speed is 8.4 ft/s (corresponding to a thruster rpm of 1000). (b) The control input generated by the neural network for the response shown in (a). The network size is 2 x 10 x 1.

The tracking control of a time-varying pitch command signal (a different trajectory) is shown in Figure 6a. The vehicle was commanded to be at a pitch of 10° till 200 s and at a pitch of -10° at 1300 s. The network is updated faster, at double the rate, whenever there is a change in the command signal; this is a necessary requirement for closer tracking. Figure 6b shows the corresponding control input generated by the neural network controller. It can be observed that the overshoot at time = 1300 s (corresponding to the second transition in the command signal) is higher than that during the first transition. This is because the vehicle is commanded to change its pitch from +10° to -10° during the second transition, whereas the change was only 10° from the neutral position (0°) during the first transition. It can be seen that the network is able to reduce this overshoot in the next transition, from -10° to +10°.

FIGURE 6. (a) Vehicle response to a time-varying pitch command. The network is updated at every 10 s with a step size of 0.006. (b) Corresponding control input.

The step size for the network updating is determined empirically. In some of the earlier static pattern recognition simulations using the Alopex algorithm (Venugopal & Pandya, 1991; Venugopal, 1993), we had observed that the step size may be of the order of 0.001 to 0.01 for an effective convergence of the network. Because the step sizes generally used in control applications are smaller than those of the static cases, we started with a guess of 0.001 and adjusted it to obtain closer tracking and stability. The effect of employing a constant step size is evident from Figure 6b. When the command signal is stationary, the neural network output hunts around a constant value. It may be noted that these fluctuations are not nonlinear oscillations; they appear only in the neural network output and not in the vehicle response. Adapting the step size is an effective way to reduce such fluctuations. A smaller step size can be employed whenever the commanded trajectories are of a stationary nature. Also, in the presented scheme, the overshoot and rise time can be independently controlled by adapting the step size and employing an averaging batch size, unlike in the case of conventional linear controllers, where they depend on the system damping.

Experiment 2. Using the same control strategy, a neural network controller is designed for commanding the vehicle to track a specified heading (yaw). Figure 7a shows the response of the vehicle for a constant heading command of 30°. Because of the inertia of the vehicle, in this case the command signal is also increased linearly during the initial period. As evident from the figure, a close tracking of the commanded yaw angle is achieved, even though the controller is started from a random state at time t = 0. The corresponding control input to the rudder given by the neural network controller is shown in Figure 7b. The network step size is 0.008 and the forward thruster rpm is 1000.

FIGURE 7. (a) Vehicle response for a heading (yaw) command of 30°. The vehicle speed is 8.4 ft/s. (b) The control input generated by the neural network controller. The controller is updated at every 10 s with a step size of 0.008.

Experiment 3. Figure 8a shows the response of the vehicle to a depth command signal that is meant to direct the vehicle to a depth of -150 ft in 1000 s and to remain steady at that depth. Again, because of the vehicle inertia, the command signal was varied in a linear fashion initially. As pointed out earlier, such sudden transitions in command signals are not usual in practical situa-

tions. In our case, the forward thruster rpm is initially set to 1000 and when the depth is brought to -200 ft, the rpm is decreased to half (500). This reduction in rpm is a necessary operational requirement, whenever there is a change in the depth command signal, because of the constraints due to the vehicle inertia. A step size of 0.008 is used for the neural network controller. The corresponding control input is shown in Figure 8b. It is to be noted that in all three experiments described above, the step sizes were not varied significantly from each other to obtain close tracking performance. Hence, in an implementation where all three output parameters, pitch, depth, and yaw, are controlled simultaneously (as in Figure 3a,b), the fine-tuning required after linking all the individual controllers may not be significant.
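The update-interval policy used throughout these experiments (a base interval of 10 s, halved to 5 s when the command signal is time varying) can be written as a small helper. The function name and the change-detection threshold below are illustrative assumptions.

```python
def update_interval(cmd_history, base_T=10.0, change_threshold=1e-3):
    """Return the batch-update interval in seconds.

    The interval is halved whenever the command signal is time varying,
    mirroring the schedule used in the experiments (10 s for constant
    commands, 5 s otherwise).
    """
    if len(cmd_history) < 2:
        return base_T
    changing = abs(cmd_history[-1] - cmd_history[-2]) > change_threshold
    return base_T / 2.0 if changing else base_T
```

In a full implementation the same test could also trigger a temporary increase in step size, since both parameters trade tracking speed against stability.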
4.2. Controller Performance Under Disturbance
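The measurement-noise study reported in this section can be sketched as follows. The first-order plant, the fixed feedback gain, and the noise injection are illustrative stand-ins for the AUV dynamics and the trained network, assuming white noise scaled relative to the command amplitude.

```python
import numpy as np

def mse_under_noise(noise_to_signal, n_steps=2000, seed=0):
    """Mean square tracking error with white measurement noise added to
    the vehicle response (cf. the study summarized in Figure 10b)."""
    rng = np.random.default_rng(seed)
    y, cmd = 0.0, 1.0
    sq_errors = []
    for _ in range(n_steps):
        # measured output = true output + white noise scaled by the ratio
        measured = y + noise_to_signal * cmd * rng.standard_normal()
        u = 2.0 * (cmd - measured)          # stand-in for the learned controller
        y += 0.05 * (-y + 2.0 * u)          # toy first-order dynamics, dt = 0.05 s
        sq_errors.append((cmd - y) ** 2)
    return float(np.mean(sq_errors))
```

Sweeping `noise_to_signal` from 0.05 to 1.5 and plotting the returned error reproduces the shape of the experiment, though the absolute values depend entirely on the stand-in plant and gain.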

As indicated before, the external parameters affecting the AUV dynamics are many. The forward speed of the vehicle (acceleration and deceleration) affects the flow over the vehicle body and stern plane, thereby affecting the overall dynamics [see eqn (5)]. The changes in the environment, such as water currents, temperature, and pressure differences, at different parts of the ocean during a mission may be significant. Under such circumstances, the necessary requirements of the AUV controller are: (a) the ability to deal with the consequences of parameter variations and random disturbances; (b) real-time performance; and (c) reasonable complexity (Miller et al., 1991, pp. 427-474). Conventional controllers are often inadequate for such tasks. We conducted some limited studies on the ability of the neural network controller to adapt to sudden changes in the dynamics, by changing the vehicle speed abruptly and observing the tracking performance. Figure 9a shows the tracking of the pitch command when the vehicle speed was reduced suddenly by 50% (from 1000 to 500 rpm, as shown in Figure 9b) at 1250 s. It can be seen that the neural network is able to adapt accordingly to drive the vehicle to the specified pitch angle. Another parameter of significant importance to the controller performance is measurement noise. A high noise-to-signal ratio may cause poor tracking and even instability in the case of traditional control techniques. One of the promising properties of neural networks is their ability to generalize even when the input data are noisy. To study the ability of the neural network controller to adapt to measurement noise, we introduced a white noise component into the measured response (pitch) of the vehicle (Figure 10a). The noise amplitude was varied from 5% to 150% of the output signal, and the mean square error between the

FIGURE 8. (a) Commanded depth signal and the corresponding vehicle response. (b) Neural network output for the case in (a). The networks are updated at every 5 s with a step size of 0.008.

desired and actual responses was calculated. The mean square error plotted as a function of the noise-to-signal ratio is shown in Figure 10b. It can be seen that the variation in the error is very small (only about 0.025). This shows that the controller is able to reject the noise very effectively over a wide range.

5. DISCUSSION

In this paper, we demonstrate the effectiveness of using the Alopex algorithm as a learning algorithm for a recurrent network architecture. It is shown that the proposed algorithm and the controller architecture provide an effective on-line control strategy for AUVs. Simulation studies of the control of pitch, yaw, and depth of the AUV indicate that the proposed strategy gives good tracking performance and stability. It may be noted that the generalization capabilities of the learning

algorithm in the presence of noise and disturbances are an important factor affecting the tracking performance of an on-line controller. The presented algorithm is computationally simple and has already been shown (Pandya et al., 1990) to be well suited for implementation on highly parallel hardware. In previous studies using the same vehicle dynamics (Venugopal et al., 1991, 1992), we used a feedforward network of size 2 x 20 x 10 x 1 for effective control of each of the parameters individually. In the recurrent neural network implementation, we could achieve the same performance with networks of size 2 x 10 x 1. Detailed studies are necessary to see whether such a small network has enough degrees of freedom to learn a large number of trajectories for a highly nonlinear vehicle dynamics. It is to be noted that the totally recurrent hidden layer provides a large effective memory in the system. In each of the experiments described above, including the ones with parameter and measurement noise disturbances, the step sizes employed were not significantly different from each other. Further, the changes in the weight space were not significant as the operation changed from one trajectory to another.

FIGURE 9. (a) Change in thruster rpm. The vehicle speed is reduced suddenly by 50% (from 8.4 ft/s to 4.2 ft/s) at 1250 s. (b) The response when the vehicle speed was changed as in (a).

FIGURE 10. (a) Addition of measurement noise to the vehicle response. The noise-to-signal ratio is changed from 5% to 150%. (b) Mean square error plotted as a function of the noise-to-signal ratio.

The step size, δ [eqn (11)], is found to be a critical factor for the overall stability of the control system. The step size can be adjusted to obtain the desired dynamic responses (such as overshoot and rise time). The updating of the networks is done by averaging the error over a period of T s; such a strategy is needed to ensure stability. The updating needs to be faster whenever there is a sudden change in the command signal; the updating interval is decreased to T/2 s in such situations. Closer tracking can be achieved by adaptively varying the step size according to the system error. A similar effect can also be achieved by adapting the temperature T [eqn (12)]. In all the studies described above, the step size is kept constant during the entire control duration. The choice of step size is a compromise between speed of response and closeness of tracking. When the command signal is constant or slowly varying, a smaller step size ensures closer tracking, but the response to sudden changes in the command signal may not be of the desired level. The effect of a constant step size on the controller performance can be observed in Figure 6b, where the neural network output shows a hunting behavior. Note that this action of the Alopex algorithm when tracking slowly varying command signals should not be confused with nonlinear oscillations. These fluctuations can be reduced by a proper step size schedule. Also, because of the stochastic nature of the algorithm, perfect tracking with zero error is impossible with Alopex, but with suitable parameter choices, sufficiently close tracking can be obtained.

Due to the inertial nature of the AUV dynamics, there is a noticeable rise time in the actual as well as the desired system outputs, which is decided by the system bandwidth. As the derived system model is accurate only for signals of comparable bandwidth, an abruptly changing simulated command can produce artifacts in the system response. To avoid this during simulation, the command signals are increased at a rate that does not violate the system bandwidth restriction.

As with traditional adaptive control schemes and other neural network based control schemes, the presented control scheme may also enter a chaotic regime. Chaos is more likely if the initial conditions or the external input cause the system to operate in a strongly nonlinear region of the state trajectory (Slotine & Li, 1991, p. 11). We do not have a foolproof way of detecting the onset of, or avoiding, the chaotic regime. Our simple strategy was to prevent the system from entering such a state by initializing the weights to small random values, keeping the step size sufficiently small, and choosing the batch size for averaging such that the neural network behavior does not change substantially over two consecutive iterations.

Algorithms that use local generalization and employ gradient descent based adaptation, such as CMAC (Albus, 1971), have been studied for control applications (Miller, Glanz, & Kraft, 1990; Miller, 1989; Miller et al., 1991, pp. 143-169). In this approach, first a characteristic system surface (a look-up table for control input vs.
state space) is generated from the input/output measurements of the dynamics. The generated surface is used as feedforward information to calculate the control signal for the required system state. This necessitates the use of a traditional controller in the first few control cycles, until the look-up table is generated. The method described in this paper differs from the above approach in two ways. First, Alopex generalizes globally and no explicit gradient of the error surface is calculated. Second, the control scheme starts from random initial conditions and no start-up controller is employed in the implementation. The relative merits of the two schemes will be a worthwhile topic for future research.

6. CONCLUSION

In this paper, we introduced the Alopex algorithm as an effective learning algorithm for recurrent neural networks. We also presented a recurrent neural network based on-line learning control scheme for AUVs, splitting the dynamics into single-input single-output cases. Studies were conducted on the control of the pitch, yaw, and depth relationships of the AUV under different command conditions, and the results were discussed. The robustness of the controller to parameter as well as measurement disturbances was investigated. The results show that the networks are able to adapt effectively under disturbances.
REFERENCES
Albus, J. S. (1971). A theory of cerebellar functions. Mathematical Biosciences, 10, 25-61.
Almeida, L. B. (1988). Backpropagation in perceptrons with feedback. In R. Eckmiller & C. von der Malsburg (Eds.), Neural computers. NATO ASI Series (Vol. F41). New York: Springer-Verlag.
Bellingham, J. (1992). Capabilities of autonomous underwater vehicles. Proceedings of the Workshop on Scientific and Environmental Data Collection with Autonomous Underwater Vehicles (pp. 7-14). Cambridge, MA.
Bellingham, J., & Humphreys, D. (1990). Using layered control for supervisory control of underwater vehicles. Proceedings of ROV '90.
Bricks, D. C. (1989). A project to develop and test a layered control system for autonomous underwater vehicles. Proceedings of the Eighth International Conference on Offshore Mechanics and Arctic Engineering (pp. 123-127). The Hague.
Cybenko, G. (1989). Approximation by superpositions of a sigmoidal function. Mathematics of Control, Signals, and Systems, 2, 303-314.
Elman, J. (1990). Finding structure in time. Cognitive Science, 14, 179-211.
Feldman, J. (1979). DTNSRDC revised standard submarine equations of motion (Tech. Rep. No. SPD-0393-09). David W. Taylor Naval Ship Research and Development Center.
Fu, K. S. (1972). Learning control--a review and outlook. IEEE Transactions on Automatic Control, 4, 210-221.
Fujii, T., & Ura, T. (1990). Development of motion control system for AUV using neural nets. Proceedings of AUV '90 (pp. 81-86).
Goheen, K. R., & Jefferys, E. R. (1990). On adaptive control of remotely operated underwater vehicles. International Journal of Adaptive Control and Signal Processing, 4, 287-297.
Goheen, K. R., Jefferys, E. R., & Broome, D. R. (1987). Robust self-designing controllers for underwater vehicles. Transactions of the American Society of Mechanical Engineers, 109, 170-177.
Harth, E. (1976). Visual perception: A dynamic theory. Biological Cybernetics, 22, 169-180.
Harth, E., & Pandya, A. S. (1988). Dynamics of the Alopex process: Applications to optimization problems. In L. M. Ricciardi (Ed.), Biomathematics and related computational problems. Amsterdam: Reidel Publications.
Harth, E., & Tzanakou, E. (1974). Alopex: A stochastic method for determining visual receptive fields. Vision Research, 14, 1475-1482.
Harth, E., Unnikrishnan, K. P., & Pandya, A. S. (1987). The inversion of sensory processing by feedback pathways: A model of visual cognitive functions. Science, 237, 184-187.
Hornik, K., Stinchcombe, M., & White, H. (1989). Multilayer feedforward networks are universal approximators. Neural Networks, 2, 359-366.
Jordan, M. I. (1986). Attractor dynamics and parallelism in a connectionist sequential machine. Proceedings of the Eighth Annual Conference of the Cognitive Science Society, Amherst.
Karakasoglu, A., Sudharsanan, S. I., & Sundaresan, M. K. (1991). Neural network based identification and adaptive control of nonlinear systems: A novel dynamical network architecture and training policy. Proceedings of the Conference on Decision and Control. New York: IEEE.
Kirkpatrick, S., Gelatt, C. D., & Vecchi, M. P. (1983). Optimization by simulated annealing. Science, 220, 671-680.
Krishnapuram, R., & Chen, L. F. (1993). Implementation of parallel thinning algorithms using recurrent neural networks. IEEE Transactions on Neural Networks, 4, 142-147.
Miller, W. T. (1989). Real-time application of neural networks for sensor-based control of robots with vision. IEEE Transactions on Systems, Man, and Cybernetics, 4, 825-831.
Miller, W. T., Glanz, F. H., & Kraft, L. G. (1990). CMAC: An associative neural network alternative to backpropagation. Proceedings of the IEEE, 78, 1561-1567.
Miller, W. T., Sutton, R. S., & Werbos, P. J. (Eds.). (1991). Neural networks for control. Cambridge: MIT Press.
Narendra, K. S., & Annaswamy, A. M. (1989). Stable adaptive systems. Englewood Cliffs, NJ: Prentice Hall.
Narendra, K. S., & Parthasarathy, K. (1990). Identification and control of dynamical systems using neural networks. IEEE Transactions on Neural Networks, 1, 4-27.
Narendra, K. S., & Parthasarathy, K. (1991). Gradient methods for the optimization of dynamical systems containing neural networks. IEEE Transactions on Neural Networks, 2, 252-262.
Pandya, A. S., Shankar, R., & Freytag, L. (1990). SIMD architecture for the Alopex neural network. In J. Ghosh (Ed.), Parallel architectures for image processing. Proceedings of SPIE, 1246, 275-287.
Pandya, A. S., & Venugopal, K. P. (in press). A stochastic parallel algorithm for learning in neural networks. IEICE Transactions on Information and Systems.
Pineda, F. J. (1987). Generalization of backpropagation to recurrent networks. Physical Review Letters, 59, 2229-2232.
Rumelhart, D. E., & McClelland, J. L. (Eds.). (1986). Parallel distributed processing (Vol. 1). Cambridge: MIT Press.
Sanner, R. M., & Akin, D. L. (1990). Neuromorphic pitch attitude regulation of an underwater telerobot. IEEE Control Systems Magazine, 4, 62-67.
Shein, A., & Kloske, J. (1991). Development of simulation facilities for autonomous underwater vehicle research. Proceedings of the Summer Simulations Conference (pp. 746-748).
Slotine, J. E., & Li, W. (1991). Applied nonlinear control. Englewood Cliffs, NJ: Prentice Hall.
Tzanakou, E. (1992). Unsupervised global optimization: Applications on classification of handwritten digits and visual evoked potentials. Proceedings of the IEEE Conference on SMC, 381-386.
Venugopal, K. P. (1993). Learning in connectionist networks using the Alopex algorithm. Doctoral dissertation, Florida Atlantic University, Boca Raton.
Venugopal, K. P., & Pandya, A. S. (1991). Alopex algorithm for training multilayered neural networks. Proceedings of IJCNN '91, Singapore.
Venugopal, K. P., Pandya, A. S., & Sudhakar, R. (1991). Adaptive neural network controllers for autonomous underwater vehicles. Proceedings of ROV '91 (pp. 361-366).
Venugopal, K. P., Pandya, A. S., & Sudhakar, R. (1992). Alopex neural networks for adaptive control of dynamical systems. Proceedings of IJCNN '92 (Vol. 3, pp. 875-880), Baltimore.
Venugopal, K. P., Sudhakar, R., & Pandya, A. S. (1992). On-line learning control of autonomous underwater vehicles using feedforward neural networks. IEEE Journal of Oceanic Engineering, 17, 308-320.
Venugopal, K. P., Sudhakar, R., & Pandya, A. S. (in press). An improved direct control scheme for adaptive control of dynamical systems using backpropagation neural networks. Circuits, Systems, and Signal Processing.
Widrow, B., Gupta, N. K., & Maitra, S. (1973). Learning with a critic in adaptive threshold systems. IEEE Transactions on Systems, Man, and Cybernetics, 3, 455-465.
Williams, R. J., & Zipser, D. (1989). A learning algorithm for continually running fully recurrent networks. Neural Computation, 1, 270-280.
Yabuta, T., & Yamada, Y. (1992). Neural network controller characteristics with regard to adaptive control. IEEE Transactions on Systems, Man, and Cybernetics, 22, 170-176.
Yoerger, D. N., & Slotine, J. E. (1985). Robust trajectory control of underwater vehicles. IEEE Journal of Oceanic Engineering, 4, 462-470.
Yuh, J. (1990a). Modelling and control of underwater robotic vehicles. IEEE Transactions on Systems, Man, and Cybernetics, 20, 1475-1483.
Yuh, J. (1990b). A neural net controller for underwater robotic vehicles. IEEE Journal of Oceanic Engineering, 10, 161-166.
Yuh, J., & Lakshmi, R. (1993). An intelligent control system for remotely operated vehicles. IEEE Journal of Oceanic Engineering, 18, 55-61.
