
Path Planning for Autonomous Soaring Flight in Dynamic Wind Fields

Nicholas R.J. Lawrance and Salah Sukkarieh


Abstract: An autonomous aircraft capable of utilising soaring flight in a dynamic wind field could considerably extend flight duration by limiting the use of on-board energy for propulsion. While soaring flight is relatively well understood for known wind, an autonomous soaring aircraft would have to generate paths based only on local observations of the wind made during the flight. This paper presents a method to simultaneously map and utilise a wind field using Gaussian process regression to generate a spatio-temporal map of the wind, and a path planning and dynamic target assignment algorithm to generate energy-gain paths from the current wind estimate. The planning architecture is tested in simulation for dynamic wind fields and shows consistent energy gain through exploration and exploitation of the wind environment.
I. INTRODUCTION
Unmanned Aerial Vehicles (UAVs) provide unique capabilities in a range of industrial, scientific and defence applications. However, current UAV systems rely on propulsion systems with on-board energy storage to maintain flight. These systems limit the allowable mission duration of a UAV to the amount of energy storage available.
Energy can be captured from the atmosphere during flight through soaring. Natural wind phenomena such as thermals and shear layers can be exploited to provide energy in terms of altitude or airspeed gains. This paper presents a method to simultaneously collect information about a wind field and utilise that information for soaring. However, the problem is complicated by the fact that usually only local observations of the wind are available, and atmospheric structures vary over a range of spatial and temporal scales.
Early soaring research focused on observation and analysis of the trajectories used by soaring birds [1], [2]. More recently, computational optimisation has allowed further analysis of the conditions required for soaring in known wind conditions [3], [4], [5]. However, this research largely focused on determining trajectories off-line in known wind conditions. More reactive strategies have also been proposed, where knowledge of soaring mechanisms is utilised to determine and track the current optimal flight condition based on direct observations [6], [7], even in random gust fields [8], [9]. A further alternative is to attempt fitting a predefined wind model (such as a thermal model) to observations made in-flight. Energy-gain strategies are determined off-line and repeated during flight to gain energy [10], [11].
This work is supported in part by the ARC Centre of Excellence programme, funded by the Australian Research Council (ARC) and the Australian Centre for Field Robotics, The University of Sydney.
N. Lawrance and S. Sukkarieh are with the Australian Centre for Field Robotics, Faculty of Aerospace, Mechanical and Mechatronic Engineering, The University of Sydney, NSW, 2006, Australia {n.lawrance, salah}@acfr.usyd.edu.au
The difficulty with current techniques is the inability to provide a sufficiently accurate wind estimate to generate energy-gain paths on-line. This paper aims to address the problem by generating a spatio-temporal map of the wind field from observations made during the flight to provide an adequate map for energy-gain path planning. This work is an extension of previous work [12] in simultaneously mapping and utilising a static wind field. The major contributions presented here are the use of a spatio-temporal wind model accounting for drift and temporal variation of the wind and a power-based target assignment algorithm which allows static and dynamic soaring in the same framework.
Section II presents the dynamic model used in simulation and illustrates mechanisms of energy-gain flight. Section III discusses the spatio-temporal Gaussian process regression used to generate a wind field estimate. Section IV presents the planning architecture and discusses the target assignment algorithm and path planner. The simulation is described in Section V and results are presented and discussed in Section VI. Concluding remarks are made in Section VII.
II. SOARING MODEL
The glider dynamic model used in the subsequent analysis and simulation is a point mass flight model for an unpowered aircraft in wind. The applied forces are the aerodynamic force (decomposed into lift, $L$, and drag, $D$) and weight ($mg$). Weight is the force due to gravity and is directed down in inertial space (flat Earth model). Most quantities are defined in an inertial frame, denoted subscript $i$, which is fixed with respect to the ground. Air-relative quantities, denoted subscript $a$, are relative to the surrounding air. Wind is described as a velocity vector in inertial space $\vec{W} = [W_x, W_y, W_z]^T$.
Fig. 1. Air-relative velocity and applied forces for a gliding aircraft.


2011 IEEE International Conference on Robotics and Automation
Shanghai International Conference Center
May 9-13, 2011, Shanghai, China
978-1-61284-380-3/11/$26.00 2011 Crown 2499
The air-relative velocity $\vec{V}_a$ is a vector of the velocity of the vehicle with respect to the air expressed in the inertial frame. It can be written as a function of the airspeed $V_a$ and the air-relative heading and climb angles $\psi_a$ and $\gamma_a$ as shown in Fig. 1. The bank angle $\phi$ is the rotation of the lift vector around the velocity vector. Let $C^i_a$ be the transformation from air-relative to inertial frames, such that $C^i_a = L_z(\psi_a) L_y(\gamma_a) L_x(\phi)$.
$$\vec{V}_a = C^i_a \begin{bmatrix} V_a \\ 0 \\ 0 \end{bmatrix} = \begin{bmatrix} V_a \cos\gamma_a \cos\psi_a \\ V_a \cos\gamma_a \sin\psi_a \\ -V_a \sin\gamma_a \end{bmatrix} \tag{1}$$
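Let $J_w$ be the local spatial wind gradients:
$$J_w = \begin{bmatrix} \frac{\partial W_x}{\partial x} & \frac{\partial W_x}{\partial y} & \frac{\partial W_x}{\partial z} \\ \frac{\partial W_y}{\partial x} & \frac{\partial W_y}{\partial y} & \frac{\partial W_y}{\partial z} \\ \frac{\partial W_z}{\partial x} & \frac{\partial W_z}{\partial y} & \frac{\partial W_z}{\partial z} \end{bmatrix} \tag{2}$$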
The air-relative velocity $\vec{V}_a$ is the difference between the inertial velocity $\dot{\vec{R}}$ and the wind $\vec{W}$, in the inertial frame:
$$\vec{V}_a = \dot{\vec{R}} - \vec{W} \tag{3}$$
The applied forces are lift, weight and drag. Note that the vehicle is assumed to be aligned with the velocity vector; sideslip and lateral forces are ignored.
$$\vec{F} = \vec{L} + \vec{D} + m\vec{g} \tag{4}$$
Lift and drag are related through the common approximation of drag coefficient ($C_D$) as the sum of parasitic ($C_{D,0}$) and lift-induced ($C_{D,i}$) components. The induced drag is a function of the lift coefficient $C_L$, wing aspect ratio $AR$ and Oswald's efficiency factor $e$:
$$C_D = C_{D,0} + \frac{C_L^2}{\pi\, AR\, e} \tag{5}$$
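Differentiating the inertial velocity in (3) yields the resulting acceleration from applied forces in (4).
$$\frac{1}{m}\left( C^i_a \begin{bmatrix} -D \\ 0 \\ -L \end{bmatrix} + m\vec{g} \right) = \frac{dC^i_a}{dt} \begin{bmatrix} V_a \\ 0 \\ 0 \end{bmatrix} + C^i_a \frac{d}{dt} \begin{bmatrix} V_a \\ 0 \\ 0 \end{bmatrix} + J_w \dot{\vec{R}} \tag{6}$$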
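If it is assumed that the roll rate is directly controlled (as a control input to the system) then the two remaining variables are the pitch rate and lift. These equations can be solved to give the lift required for a certain pitch rate (7) or the pitch rate produced for a given lift constraint (8). The resulting equations of motion are (7)-(11).
$$L = \frac{m}{\cos\phi}\left( V_a \frac{d\gamma_a}{dt} + g\cos\gamma_a - \begin{bmatrix} \sin\gamma_a\cos\psi_a \\ \sin\gamma_a\sin\psi_a \\ \cos\gamma_a \end{bmatrix}^T J_w \dot{\vec{R}} \right) \tag{7}$$
$$\frac{d\gamma_a}{dt} = \frac{1}{V_a}\left( \frac{L}{m}\cos\phi - g\cos\gamma_a + \begin{bmatrix} \sin\gamma_a\cos\psi_a \\ \sin\gamma_a\sin\psi_a \\ \cos\gamma_a \end{bmatrix}^T J_w \dot{\vec{R}} \right) \tag{8}$$
$$D = \frac{1}{2}\rho V_a^2 S C_{D,0} + \frac{L^2}{\frac{1}{2}\rho V_a^2 S \pi\, AR\, e} \tag{9}$$
$$\frac{dV_a}{dt} = -\frac{D}{m} - g\sin\gamma_a - \begin{bmatrix} \cos\gamma_a\cos\psi_a \\ \cos\gamma_a\sin\psi_a \\ -\sin\gamma_a \end{bmatrix}^T J_w \dot{\vec{R}} \tag{10}$$
$$\frac{d\psi_a}{dt} = \frac{1}{V_a\cos\gamma_a}\left( \frac{L}{m}\sin\phi + \begin{bmatrix} \sin\psi_a \\ -\cos\psi_a \\ 0 \end{bmatrix}^T J_w \dot{\vec{R}} \right) \tag{11}$$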
For simulation of a glider these equations can be integrated numerically with specified roll rate and either lift or pitch rate functions. In the simulations presented in this paper, the pitch rate is specified in all cases but rejected if the lift required would violate either a predetermined maximum lift coefficient or a load factor limit. In that case, the lift is specified and the resulting pitch rate is determined. This effectively models stall prevention and limit loading.
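The lift-limiting logic described above can be sketched in a few lines. This is a minimal illustration rather than the authors' simulation code: the function names, the air density value and the assumption that inertial z points down are choices made here, while the aircraft parameters are taken from Table I. The commanded pitch rate is first converted to a required lift with (7); if that lift violates the maximum lift coefficient or the load factor limits, the lift is clipped and the achieved pitch rate is recovered from (8).

```python
import math

G = 9.81     # gravitational acceleration (m/s^2)
RHO = 1.225  # air density (kg/m^3); assumed sea-level value, not given in the paper

# SB-XC parameters from Table I
MASS, S_REF, ASPECT, OSWALD, CD0 = 5.44, 0.957, 19.54, 0.85, 0.012
CL_MAX, N_MAX, N_MIN = 1.2, 2.0, 0.0


def drag(lift, v_a):
    """Eq. (9): parasitic plus lift-induced drag (N)."""
    q_s = 0.5 * RHO * v_a ** 2 * S_REF
    return q_s * CD0 + lift ** 2 / (q_s * math.pi * ASPECT * OSWALD)


def lift_and_pitch_rate(v_a, gamma, psi, phi, gamma_dot_cmd,
                        jw_rdot=(0.0, 0.0, 0.0)):
    """Return (lift, pitch rate) after applying the lift limits.

    jw_rdot is the wind-gradient acceleration J_w * R_dot (m/s^2),
    supplied by the caller; it defaults to zero (uniform wind).
    """
    e_gamma = (math.sin(gamma) * math.cos(psi),
               math.sin(gamma) * math.sin(psi),
               math.cos(gamma))
    grad = sum(e * a for e, a in zip(e_gamma, jw_rdot))

    # Eq. (7): lift needed to achieve the commanded pitch rate.
    lift = MASS / math.cos(phi) * (v_a * gamma_dot_cmd
                                   + G * math.cos(gamma) - grad)

    # Limits from the maximum lift coefficient and the load factors.
    lift_max = min(0.5 * RHO * v_a ** 2 * S_REF * CL_MAX, N_MAX * MASS * G)
    lift_min = N_MIN * MASS * G
    if lift_min <= lift <= lift_max:
        return lift, gamma_dot_cmd

    # Command rejected: clip the lift and use Eq. (8) for the pitch rate.
    lift = min(max(lift, lift_min), lift_max)
    gamma_dot = (lift / MASS * math.cos(phi)
                 - G * math.cos(gamma) + grad) / v_a
    return lift, gamma_dot


# Level flight at 15 m/s: an aggressive 2 rad/s pitch-up command is limited.
level_lift, level_rate = lift_and_pitch_rate(15.0, 0.0, 0.0, 0.0, 0.0)
limited_lift, limited_rate = lift_and_pitch_rate(15.0, 0.0, 0.0, 0.0, 2.0)
```

In the second call the required lift exceeds the load factor limit, so the returned pitch rate is lower than the commanded one.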
The aircraft flight energy is the sum of gravitational potential and local air-relative kinetic energy:
$$E_a = -mgz_i + \frac{1}{2}mV_a^2 \tag{12}$$
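Taking the time derivative of the air-relative energy and substituting the airspeed acceleration (10) yields the overall specific power:
$$\frac{\dot{E}_a}{m} = -V_a\frac{D}{m} - gW_z - V_a \begin{bmatrix} \cos\gamma_a\cos\psi_a \\ \cos\gamma_a\sin\psi_a \\ -\sin\gamma_a \end{bmatrix}^T J_w \dot{\vec{R}} \tag{13}$$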
Equation (13) illustrates how a gliding aircraft can gain or lose air-relative energy from a wind field. The first term is the power loss due to drag. This is always energy loss because airspeed must be greater than zero. The second term is the energy gained or lost from vertical air motion, known as static soaring. The third term is energy gained or lost due to wind gradients, known as dynamic soaring. This term is affected by airspeed, aerodynamic angles and wind gradients.
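To make the three contributions in (13) concrete, the following sketch evaluates each term separately. It assumes an inertial frame with z pointing down (so an updraft has negative $W_z$), which is consistent with the negative altitudes used later in the paper; the function name and the numerical values are illustrative only.

```python
import math


def specific_power(v_a, gamma, psi, drag, mass, wind_z, j_w, r_dot, g=9.81):
    """Return the (drag, static, dynamic) terms of Eq. (13) in W/kg.

    j_w is the 3x3 wind gradient matrix of Eq. (2) and r_dot the inertial
    velocity; both are given in the inertial frame (z down, assumed here).
    """
    # Unit vector along the air-relative velocity, from Eq. (1).
    e_v = (math.cos(gamma) * math.cos(psi),
           math.cos(gamma) * math.sin(psi),
           -math.sin(gamma))
    jw_rdot = [sum(j_w[i][j] * r_dot[j] for j in range(3)) for i in range(3)]
    p_drag = -v_a * drag / mass
    p_static = -g * wind_z
    p_dynamic = -v_a * sum(e * a for e, a in zip(e_v, jw_rdot))
    return p_drag, p_static, p_dynamic


# Illustrative case: climbing at 0.3 rad through a 2 m/s updraft into a
# headwind that strengthens with height (dWx/dz = 0.1 1/s with z down).
v_a, gamma = 15.0, 0.3
r_dot = (v_a * math.cos(gamma), 0.0, -v_a * math.sin(gamma))
j_w = [[0.0, 0.0, 0.1], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
p_drag, p_static, p_dynamic = specific_power(
    v_a, gamma, 0.0, drag=2.0, mass=5.44, wind_z=-2.0, j_w=j_w, r_dot=r_dot)
```

Here the drag term is negative, while both the static term (updraft) and the dynamic term (climbing into an increasing headwind) are positive.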
III. SPATIO-TEMPORAL GAUSSIAN PROCESS
REGRESSION
Planning soaring paths requires an adequate estimate of the wind field. Wind speed and direction are measured during flight by a combination of air data and inertial measurement systems. The air-data system is modelled on a probe capable of measuring airspeed, angle of attack and angle of sideslip. The inertial system is assumed to be capable of measuring inertial position and speed. As shown in (3) the difference between the inertial and air-relative velocities is the wind at that point in time and inertial space.
A suitable wind model needs to predict the wind from point observations made during the flight. This is complicated by the fact that wind fields are driven by a number of factors, leading to wind fields at small scales ($< 1$ km) being made up of a combination of static, dynamic and random turbulent features. The mapping method should provide wind predictions for a reasonable planning horizon ($> 10$ s) and in a relatively small region ($< 100$ m).
While there are several regression methods potentially capable of performing this task, there are also a number of limitations. Model-based methods with a library of known wind features were discounted due to the limited flexibility of modelling new structures encountered during flight. Explicit solutions or approximations of the wind field by solving aerodynamic equations were also discounted due to computational complexity. Limited on-board storage effectively excludes the use of grid-based representations, which would be too coarse or too large for accurate wind modelling.
In the current framework, Gaussian Process (GP) regression was selected to perform wind mapping. GP regression returns a continuous estimate of both the mean and variance of the field at any point in space. The variance estimate is used to quantify confidence in the regression to direct a sampling strategy for balancing map exploration and platform energy. The continuous nature of GPs does not limit regression to a fixed grid resolution. GP regression also assumes observations of the true function with added Gaussian noise, which is a common model for turbulent wind fields with an underlying structured field overlaid with turbulence. Finally, GP regression with a suitable covariance function captures some of the features to be expected from a fluid flow such as differentiability in velocity due to viscosity.
An in-depth discussion of GP regression is beyond the scope of this paper, and the reader is referred to Rasmussen's work [13] which provided much of the framework for the current implementation. As in most regression techniques, the goal is to characterise the underlying function from a finite set of observations, $\mathbf{y} = \{y_i\}_{i=1}^{n}$ where $y_i \in \mathbb{R}$, taken at locations in the input space $X = \{\mathbf{x}_i\}_{i=1}^{n}$ where $\mathbf{x}_i \in \mathbb{R}^d$. The observations are assumed to be drawn from the set of actual function values $f(\mathbf{x})$ with additive zero-mean Gaussian noise with variance $\sigma_n^2$.
$$y = f(\mathbf{x}) + \varepsilon \tag{14}$$
$$\varepsilon \sim \mathcal{N}(0, \sigma_n^2) \tag{15}$$
GP regression relies on a positive semi-definite covariance function $k(\mathbf{x}, \mathbf{x}')$ which determines the covariance between pairs of data points. Information such as the expected smoothness of the resulting estimate is encapsulated in the properties of the covariance function. The estimated mean value $\bar{f}_*$ and covariance $\mathrm{cov}(f_*)$ for a set of test points $X_*$, training points $X$, observations $\mathbf{y}$, covariance function $k$ and covariance matrix $K = K(X, X)$ are shown below.
$$\bar{f}_* = E[f(X_*) \mid X, \mathbf{y}, X_*] = K(X_*, X)\left[K + \sigma_n^2 I\right]^{-1}\mathbf{y} \tag{16}$$
$$\mathrm{cov}(f_*) = K(X_*, X_*) - K(X_*, X)\left[K + \sigma_n^2 I\right]^{-1}K(X, X_*) \tag{17}$$
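The predictive equations (16) and (17) can be illustrated with a small self-contained sketch. It uses a one-dimensional squared exponential covariance and a hand-written Cholesky solve so that it needs only the standard library; the kernel choice, function names and data are assumptions made for illustration and are not the wind model used in this paper. Only the diagonal of (17), the predictive variance at each test point, is computed.

```python
import math


def sq_exp(x1, x2, sigma_f=1.0, length=1.0):
    """One-dimensional squared exponential covariance (illustrative)."""
    return sigma_f ** 2 * math.exp(-(x1 - x2) ** 2 / (2.0 * length ** 2))


def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low


def chol_solve(low, b):
    """Solve (low * low^T) x = b by forward then backward substitution."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(low[i][k] * y[k] for k in range(i))) / low[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(low[k][i] * x[k] for k in range(i + 1, n))) / low[i][i]
    return x


def gp_predict(x_train, y_train, x_test, sigma_n=0.1, kernel=sq_exp):
    """Return predictive means (Eq. 16) and variances (diagonal of Eq. 17)."""
    n = len(x_train)
    k_y = [[kernel(x_train[i], x_train[j]) + (sigma_n ** 2 if i == j else 0.0)
            for j in range(n)] for i in range(n)]
    low = cholesky(k_y)
    weights = chol_solve(low, y_train)          # [K + sigma_n^2 I]^-1 y
    means, variances = [], []
    for x_s in x_test:
        k_star = [kernel(x_s, x_i) for x_i in x_train]
        means.append(sum(k * w for k, w in zip(k_star, weights)))
        v = chol_solve(low, k_star)             # [K + sigma_n^2 I]^-1 k_*
        variances.append(kernel(x_s, x_s) - sum(k * w for k, w in zip(k_star, v)))
    return means, variances


# Three observations; predict at an observed point and far from the data.
means, variances = gp_predict([0.0, 1.0, 2.0], [0.0, 1.0, 0.0], [1.0, 10.0])
```

Near the data the variance is small and the mean follows the observation; far from the data the mean reverts to zero and the variance returns to the prior value $\sigma_f^2$, which is the behaviour the planner uses to identify unexplored regions.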
Properties of the covariance function are defined by a set of hyperparameters $\theta$. To achieve a good fit of the data, the hyperparameters must be selected by training the GP model. Marginal likelihood is a common metric which represents the probability of obtaining the training observations given the training inputs, the set of hyperparameters, and the current model. Although an in-depth discussion is beyond the scope of this work, it is sufficient to say that maximising the marginal likelihood represents a fit based on a balance of model complexity and prediction of the training data. Complex models tend to fit the data well near the training points but are poor predictors away from the training data (overfitting) whereas simple models may not be capable of capturing the information in the training data and consequently assume a simple underlying function and very noisy observations (underfitting). For GP regression the log marginal likelihood can be calculated as shown in (18) where $n$ is the number of training points.
$$\log p(\mathbf{y} \mid X, \theta, M) = -\frac{1}{2}\mathbf{y}^T\left(K + \sigma_n^2 I\right)^{-1}\mathbf{y} - \frac{1}{2}\log\left|K + \sigma_n^2 I\right| - \frac{n}{2}\log 2\pi - (\theta - \theta_0)^T\Sigma^{-1}(\theta - \theta_0) \tag{18}$$
The final term in (18) is an additional penalty on variation of the hyperparameters. This represents a Gaussian prior estimate of the hyperparameters $\mathcal{N}(\theta_0, \Sigma)$ and penalises variation from $\theta_0$. The GP is trained using an optimisation routine (Matlab's fminunc or a simple gradient descent algorithm) which selects hyperparameters to minimise negative log marginal likelihood. In the authors' previous work [12] a spatial GP regression with fixed hyperparameters was used to map a static wind field. The current paper extends this by introducing a spatio-temporal model to allow for variation of the field with time and drift of features in the field due to mean wind.
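As a rough illustration of the training objective, the sketch below evaluates the negative of (18) for a one-dimensional squared exponential GP with hyperparameters $(\sigma_f, l, \sigma_n)$. The kernel, the diagonal prior covariance and all names are assumptions made for this example; an optimiser such as gradient descent would then adjust the hyperparameters to reduce the returned value.

```python
import math


def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low


def neg_log_marginal_likelihood(x, y, theta, theta0, prior_var):
    """Negative of Eq. (18) for theta = (sigma_f, length, sigma_n).

    prior_var holds the diagonal of the prior covariance (assumed diagonal).
    """
    sigma_f, length, sigma_n = theta
    n = len(x)
    k_y = [[sigma_f ** 2 * math.exp(-(x[i] - x[j]) ** 2 / (2.0 * length ** 2))
            + (sigma_n ** 2 if i == j else 0.0)
            for j in range(n)] for i in range(n)]
    low = cholesky(k_y)
    # Solve low * z = y, so that y^T K_y^-1 y = z^T z.
    z = []
    for i in range(n):
        z.append((y[i] - sum(low[i][k] * z[k] for k in range(i))) / low[i][i])
    data_fit = 0.5 * sum(v * v for v in z)
    complexity = sum(math.log(low[i][i]) for i in range(n))  # 0.5 * log|K_y|
    constant = 0.5 * n * math.log(2.0 * math.pi)
    prior = sum((t - t0) ** 2 / v for t, t0, v in zip(theta, theta0, prior_var))
    return data_fit + complexity + constant + prior


# Smooth data: a unit length scale should score better than a tiny one.
x_obs = [0.5 * i for i in range(7)]
y_obs = [math.sin(v) for v in x_obs]
wide_prior = (1e6, 1e6, 1e6)
nlml_smooth = neg_log_marginal_likelihood(
    x_obs, y_obs, (1.0, 1.0, 0.1), (1.0, 1.0, 0.1), wide_prior)
nlml_rough = neg_log_marginal_likelihood(
    x_obs, y_obs, (1.0, 0.01, 0.1), (1.0, 1.0, 0.1), wide_prior)
# A tight prior centred one unit away adds a penalty of 1 per hyperparameter.
nlml_shifted = neg_log_marginal_likelihood(
    x_obs, y_obs, (1.0, 1.0, 0.1), (2.0, 2.0, 1.1), (1.0, 1.0, 1.0))
```

The comparison between `nlml_smooth` and `nlml_rough` shows the balance between data fit and model complexity, and `nlml_shifted` isolates the effect of the prior penalty term.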
In this work we propose and demonstrate use of a stationary separable covariance function of spatial distance, time and a drifted distance estimate. It is based on a separable squared exponential function suggested by Cressie [14] with an additional term to estimate drift. Drift is estimated using an additional distance measure which quantifies the estimated distance between two non-simultaneous observations which would have moved by an estimate of the wind drift $\vec{W}_d$.
$$d_W^2(\mathbf{x}, t, \mathbf{x}', t') = \left|\Delta\mathbf{x} + \vec{W}_d\,\Delta t\right|^2 \tag{19}$$
The distance metric is commutative and stationary with respect to the inputs. The drift velocity in each dimension is modelled by a hyperparameter of the covariance function. Also included is a standard isotropic square distance. The two distance measures are combined through a weighting $\beta \in (0, 1)$. The resulting covariance function is illustrated below as a function of the set of hyperparameters $\theta = [\sigma_f, l_x, l_t, l_d, \alpha, \vec{W}_d]$. The weighting is calculated through the logistic function $\beta = (1 + e^{-\alpha})^{-1}$.
$$k(\mathbf{x}, t, \mathbf{x}', t') = \sigma_f^2 \exp\left[ -\beta\frac{|\mathbf{x} - \mathbf{x}'|^2}{2l_x^2} - \frac{|t - t'|^2}{2l_t^2} - (1 - \beta)\frac{d_W^2}{2l_d^2} \right] \tag{20}$$
The weight $\beta$ provides flexibility in estimating drift. If a good drift estimation is found then the model can decrease $\beta$ towards zero and rely solely on the drifted estimates with a higher temporal length scale, as seen in the results. An additional advantage of this covariance function is the fact that the wind gradient $J_w$ can be calculated analytically by differentiating the covariance function with respect to the target locations. Wind gradients are used by both the target finding and path planning algorithms.
A stationary function is selected as air properties should be consistent across inertial space. Note that due to the GP formulation this still permits local modelling of (drifting) features, but the expected rate of spatial and temporal variation is the same across inertial space. A separable function is preferred as it represents the simplest combination of spatial, temporal and drift distance estimates. A non-linear combination of these factors would require either additional hyperparameters or a restrictive function which may not offer additional benefit over a separable combination. The separable function also allows easier maintenance of the observation set as reduction in covariance with time is bounded, so that old observations will have a known limited effect on future estimation and can be confidently discarded based on a simple time threshold.
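A possible form of the drifting covariance function is sketched below. Several details are assumptions made for this illustration: the drifted distance is taken as the separation between the second observation and the position the first would reach after drifting at `w_d` for the elapsed time, the logistic weight (called `beta` here, computed from an unconstrained hyperparameter `alpha`) is applied to the isotropic spatial term with its complement on the drifted term, and the temporal term is left unweighted. The numerical values are those reported for the end of the simulation in Section VI.

```python
import math


def drift_covariance(x, t, x_p, t_p, sigma_f, l_x, l_t, l_d, alpha, w_d):
    """Squared exponential covariance with a drifted-distance term.

    x, x_p and w_d are 3-element sequences; t and t_p are times in seconds.
    """
    beta = 1.0 / (1.0 + math.exp(-alpha))     # logistic weighting in (0, 1)
    dx = [a - b for a, b in zip(x, x_p)]
    dt = t_p - t
    d2_iso = sum(d * d for d in dx)
    # Distance remaining after moving the first point with the drift.
    d2_drift = sum((d + w * dt) ** 2 for d, w in zip(dx, w_d))
    exponent = (-beta * d2_iso / (2.0 * l_x ** 2)
                - dt ** 2 / (2.0 * l_t ** 2)
                - (1.0 - beta) * d2_drift / (2.0 * l_d ** 2))
    return sigma_f ** 2 * math.exp(exponent)


# Hyperparameter values reported at t = 500 s (drift rounded for clarity).
params = dict(sigma_f=1.04, l_x=50.2, l_t=121.8, l_d=65.0, alpha=-6.49)
a, t_a = (0.0, 0.0, 0.0), 0.0
b, t_b = (6.0, 1.0, 0.0), 10.0   # where a feature at `a` drifts to in 10 s

k_with_drift = drift_covariance(a, t_a, b, t_b, w_d=(0.6, 0.1, 0.0), **params)
k_no_drift = drift_covariance(a, t_a, b, t_b, w_d=(0.0, 0.0, 0.0), **params)
k_swapped = drift_covariance(b, t_b, a, t_a, w_d=(0.6, 0.1, 0.0), **params)
k_same = drift_covariance(a, t_a, a, t_a, w_d=(0.6, 0.1, 0.0), **params)
```

With the correct drift, two observations of the same moving feature are more strongly correlated than they would be under a zero-drift model, and the function is symmetric in its arguments.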
IV. PLANNING ARCHITECTURE
The planning architecture shown in Fig. 2 was implemented to manage the task of simultaneously exploring and exploiting a dynamic wind field. The higher level planner is a global target selection algorithm, which selects target points in the search space to maximise exploration or energy capture taking into account the vehicle's current energy state. The lower level planner is a limited time-horizon planner which plans over a finite set of control inputs to maximise energy gain, exploration and progress towards the global goal.
A. Global target selection
The global planner determines the current target through a heuristic utility measure based on knowledge of the aircraft and wind field. The utility maps goal locations into potential power estimates, $P$, which represent the expected average power available by visiting that goal location from the current location based on map uncertainty, the energy required to travel to the target, and the energy available for capture at the target.
Exploration is required to maintain a good map estimate. Thus, the global planner should identify targets which will yield more information about the current wind map. However, depending on the vehicle's current energy state, it may be better to explore or exploit at the current time. This is accounted for in the utility function by estimating the amount of excess energy the vehicle would have if it travelled to the target. The excess energy $e_{excess}$ is the current aircraft energy, $e_{current}$, minus the target energy $e_{target}$ and an estimate of the energy required to travel to the target $e_{travel}$. The travel energy is estimated by assuming direct travel to the target for a distance $d_{target}$ at a nominal glide ratio $(L/D)_{est}$.
$$e_{travel} = mg\,d_{target}\left(\frac{L}{D}\right)_{est}^{-1} \tag{21}$$
$$e_{excess} = e_{current} - e_{target} - e_{travel} \tag{22}$$
The GP regression returns a normal distribution defined by a mean, $\bar{W}$, and variance, $\sigma_W^2$, estimate for the wind velocity. This feature is exploited by the utility heuristic by assuming an optimistic estimate of the wind in unexplored regions. The wind is estimated using the mean estimate from the current map plus a multiple of the standard deviation (to a maximum of $2\sigma_W$), weighted by the excess energy $e_{excess}$.
Thus, when the UAV has extra energy the wind is optimistically estimated and higher power is predicted from high variance regions. When the current energy is less than the target, the variance bonus is negative, so the power estimates are pessimistic and the global planner will tend towards the nearest high-power region with low variance. The resulting wind estimate is shown in (23) where $e_{max}$ and $e_{min}$ are the energy limits at the highest and lowest altitude of the exploration region respectively.
$$W_{target} = \bar{W} + 2\sigma_W\frac{e_{excess}}{e_{max} - e_{min}} \tag{23}$$
Fig. 2. System overview of simultaneous exploration and exploitation path planning architecture for a soaring UAV. Blocks: spatio-temporal GP mapping (the aircraft makes wind observations which are sent to the GP wind mapping module); global target identification (the GP map is used to determine the power and uncertainty at target positions in the wind field, and the current global target is assigned based on the current energy state of the vehicle); local path planner (the path planner uses the current target and wind estimate to plan a path for energy capture, map improvement and progress towards the goal).
The power available in a target region can be estimated
from the adjusted wind and gradient estimates using (13).
Dynamic soaring power is calculated by solving for the
optimum orientation which would maximise power from
the estimated wind gradient. However, the average power
over the length of a dynamic soaring cycle is less than the
maximum instantaneous power (due to energy loss in turns).
Previous research [7] has shown that the actual power over
a cycle is approximately 1/3 of the maximum instantaneous
power (above a minimum cut-off).
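The energy bookkeeping of (21)-(23) is simple enough to sketch directly. The function names below are illustrative, and clamping the energy ratio to [-1, 1] (so that the adjustment never exceeds two standard deviations in either direction) is an assumption; the text only states an upper limit of $2\sigma_W$.

```python
def travel_energy(mass, d_target, glide_ratio, g=9.81):
    """Eq. (21): energy lost gliding directly to the target (J)."""
    return mass * g * d_target / glide_ratio


def target_wind(w_mean, w_std, e_current, e_target, e_travel, e_max, e_min):
    """Eqs. (22)-(23): wind estimate shifted by the excess-energy ratio."""
    e_excess = e_current - e_target - e_travel
    ratio = e_excess / (e_max - e_min)
    ratio = max(-1.0, min(1.0, ratio))   # assumed clamp to +/- 2 sigma
    return w_mean + 2.0 * w_std * ratio


# SB-XC mass and glide ratio from Table I; a 100 m altitude band gives the
# energy range. The remaining numbers are illustrative.
mass, glide_ratio = 5.44, 20.0
e_range = mass * 9.81 * 100.0
e_travel = travel_energy(mass, 200.0, glide_ratio)

wind_high_energy = target_wind(1.0, 0.5, 3000.0, 1000.0, e_travel, e_range, 0.0)
wind_low_energy = target_wind(1.0, 0.5, 1000.0, 1000.0, e_travel, e_range, 0.0)
```

With spare energy the estimate is shifted above the mean in proportion to the uncertainty, while an energy deficit shifts it below the mean, which makes uncertain regions less attractive.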
B. Local path planning
The local planner plans paths to maximise energy and reduce uncertainty over a limited planning horizon. The approach used in this application is to forward simulate over a discrete set of control actions using the estimated wind field and motion model, then rank the resulting paths using an energy-based reward function. Paths are planned with a limited planning horizon where the highest ranked paths are selected and further propagated with the same control sets, as illustrated in Fig. 3. The highest ranked path at a specified tree depth is selected and the actions carried out open loop by the aircraft. The advantage of this type of planning is that it only searches feasible paths and is capable of generating energy-gain trajectories based on the estimated wind model. The energy-based reward function allows planning of paths that utilise both static and dynamic soaring.
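The search described above can be sketched as a generic pruned tree search. This is not the authors' planner: the `step` and `reward` callables stand in for the glider motion model and the energy-based reward, the beam width is a hypothetical parameter, and the usage example is a toy grid problem chosen only so that the result can be checked by hand.

```python
from itertools import product


def plan_tree(state, step, reward, controls, depth, beam_width):
    """Forward simulate every control from the best partial paths.

    Returns (total_reward, actions) for the highest ranked path at `depth`.
    """
    beam = [(0.0, [], state)]
    for _ in range(depth):
        candidates = []
        for total, actions, s in beam:
            for u in controls:
                s_next = step(s, u)
                candidates.append(
                    (total + reward(s, u, s_next), actions + [u], s_next))
        # Rank the simulated paths and keep only the best few for expansion.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beam = candidates[:beam_width]
    best_total, best_actions, _ = beam[0]
    return best_total, best_actions


# Nine control options per branch, as with three roll rate and three
# pitch rate commands; here they simply move a point on a grid.
controls = list(product((-1, 0, 1), (-1, 0, 1)))


def toy_step(s, u):
    return (s[0] + u[0], s[1] + u[1])


def toy_reward(s, u, s_next):
    # Reward is higher (less negative) closer to the point (2, -1).
    return -abs(s_next[0] - 2) - abs(s_next[1] + 1)


best_total, best_actions = plan_tree(
    (0, 0), toy_step, toy_reward, controls, depth=2, beam_width=3)
```

Because every candidate is produced by the motion model itself, only reachable paths are ever ranked, which mirrors the feasibility argument made in the text.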
The reward function is a unified measure of the energy utility of a path segment. It is a linear combination of three components: energy capture, progress towards the global goal and map improvement. These values are combined under a flight energy framework for consistency.
Fig. 3. Local path planning for gliding flight. Paths are planned over a fixed time horizon with a discrete set of control inputs. The resulting path estimates are ranked with a reward function and the highest ranked paths are further propagated by the same process. (Axes: $x_i$ (m), $y_i$ (m), $z_i$ (m).)
The energy capture reward $R_{energy}$ is the energy collected or lost during the planned path with an additional correction on the power available at the end of the segment, $\dot{E}_2$. This ensures that the planner is not locally greedy and favours paths that terminate in high power regions for improved performance in the following planned segments.
$$R_{energy} = -mg(z_2 - z_1) + \frac{1}{2}m\left(V_2^2 - V_1^2\right) + \dot{E}_2\,\Delta t \tag{24}$$
The navigation reward $R_{nav}$ quantifies the energy advantage of travelling towards the global goal. This is determined by calculating the distance travelled towards the goal, $d_{goal,1} - d_{goal,2}$, and converting this distance into an energy estimate using the nominal glide ratio. This effectively acts as an energy discount for travel towards the goal.
$$R_{nav} = mg\,\frac{d_{goal,1} - d_{goal,2}}{\left(\frac{L}{D}\right)_{est}} \tag{25}$$
The final local planner reward component is the exploration reward. In some cases local path options may have similar energy performance but offer different utility in terms of map exploration. As in the global planner, a potential energy reward is used which estimates the amount of energy that could be gained in favourable wind conditions. The additional energy that would have been collected under these conditions is the exploration reward. It is estimated by taking the variance at each end of the segment and determining the wind within one standard deviation of the mean which would result in maximum energy gain. This is achieved by solving the energy equation (13) for wind. The optimal solution is for an increase in vertical wind of $1\sigma$ to maximise static soaring energy, and opposing increases in lateral shear to maximise dynamic soaring energy. Note that this term is always positive, such that areas of high energy but low variance are not penalised.
$$R_{explore} = \frac{1}{2}\left(\sigma_{W_z,1} + \sigma_{W_z,2}\right)mg\,\Delta t + \frac{1}{2}m\left(\bar{V}\Delta V + \Delta V^2\right) \tag{26}$$
where $\bar{V}$ is the original mean airspeed $\frac{1}{2}(V_1 + V_2)$, and $\Delta V$ is the estimated change in airspeed under the new conditions, $V_2' - V_2$.
The local planner reward function $R$ uses a weighted combination of these components to rank candidate paths.
$$R = \lambda_E R_{energy} + (1 - \lambda_E)R_{nav} + \lambda_{explore}R_{explore} \tag{27}$$
Of particular importance is the relative weighting of the navigation and energy rewards, $\lambda_E$. The global planner provides a single target for each planning phase of the local planner. The targets are not necessarily high resolution estimates of the best location to fly, but indicate regions where there should be energy or information gain. Thus, as the aircraft gets close to a target, the need to travel towards the target becomes less important, and local energy and/or information gain become more important. To encode this flexibility into the planner, a weighting function for the navigation reward based on distance to the goal is used. This provides the local planner with enough flexibility to provide efficient paths towards the goal but maximise energy and information gain in the region around the goal. In this case, a simple exponential decay function based on distance to the goal is used, as shown in (28) where $\lambda_{E,min}$ is a minimum energy weighting to ensure efficient paths even at large distances from the goal and $\lambda_{E,100}$ is the weight at 100 m from the goal. The exploration weighting $\lambda_{explore}$ remains constant during each flight. The resulting heuristic is not particularly sensitive to selection of $\lambda_{explore}$ as the energy terms tend to dominate and exploration is used primarily to break ties between similar energy level targets.
$$\lambda_E = \lambda_{E,min} + (1 - \lambda_{E,min})\exp\left[\frac{|d_{goal}|}{100}\log\left(\frac{\lambda_{E,100} - \lambda_{E,min}}{1 - \lambda_{E,min}}\right)\right] \tag{28}$$
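The distance-dependent weighting of (28) and the combined reward of (27) can be checked numerically with the short sketch below. The function names and the weight values 0.2 and 0.5 are hypothetical (the paper does not state the values used); the weighting requires the minimum weight to be smaller than the 100 m weight, which in turn must be below one.

```python
import math


def energy_weight(d_goal, w_min, w_100):
    """Eq. (28): energy weighting as a function of distance to goal (m)."""
    ratio = (w_100 - w_min) / (1.0 - w_min)   # in (0, 1), so its log is negative
    return w_min + (1.0 - w_min) * math.exp(abs(d_goal) / 100.0 * math.log(ratio))


def nav_reward(mass, d_goal_1, d_goal_2, glide_ratio, g=9.81):
    """Eq. (25): progress towards the goal converted to energy (J)."""
    return mass * g * (d_goal_1 - d_goal_2) / glide_ratio


def total_reward(r_energy, r_nav, r_explore, w_energy, w_explore):
    """Eq. (27): weighted combination used to rank candidate paths."""
    return w_energy * r_energy + (1.0 - w_energy) * r_nav + w_explore * r_explore


# Hypothetical weights: minimum 0.2 and 0.5 at 100 m from the goal.
w_at_goal = energy_weight(0.0, 0.2, 0.5)
w_at_100 = energy_weight(100.0, 0.2, 0.5)
w_far = energy_weight(10000.0, 0.2, 0.5)

# A segment that closes 40 m on the goal with the SB-XC values of Table I.
r_nav = nav_reward(5.44, 300.0, 260.0, 20.0)
```

At the goal the energy weight is one, so navigation no longer influences the ranking; far from the goal it decays to the minimum value and progress towards the target dominates.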
V. SIMULATION
The aircraft is simulated by numerical integration of
the 6DOF dynamic equations from Section II. The aircraft
parameters are for an RnR SBXC remote controlled cross-
country glider model as shown in Table I.
The GP mapping uses observation data collected during the flight by a simulated air data system. This system is simulated by taking the actual wind data from the simulation and adding unbiased Gaussian noise with standard deviation 0.1 m/s to represent measurement error. The GP hyperparameters are trained in-flight using a limited number of gradient descent steps for minimisation of the negative log marginal likelihood. This would ideally be a parallel computation but in this simulation the gradient descent is run as a serial process with up to four gradient descent steps allowed after each planning stage.
The hyperparameter values are initialised at $\sigma_f = 0.5$ m/s, $l_x = 50$ m, $l_t = 80$ s, $l_d = 80$ m, $\alpha = 0$, $\vec{W}_d = [0, 0, 0]$ m/s and $\sigma_n = 0.1$ m/s, based on estimates of the sensor noise and the scale of features expected to be useful for soaring in a wind field.
During the flights, the maximum number of stored training points is fixed at 150 to prevent excessive computational load. The simulated sensor system collects data at a frequency of 2 Hz. The relevance of the data is effectively calculated as part of the covariance function for prediction, so observations that are too close together or too old are removed and replaced with new observations. The set maintenance assumes that only current or future predictions are required. This results in a natural spatial sparsity of the data set to provide the best coverage of the target space with the limited number of observations made along the flight trajectory.
TABLE I
AERODYNAMIC AND GEOMETRIC PROPERTIES OF THE SB-XC GLIDER
Parameter              Value   Units   Explanation
$C_{D,0}$              0.012   -       Parasitic drag coefficient
$b$                    4.32    m       Wing span
$S$                    0.957   m^2     Wing reference area
$AR$                   19.54   -       Wing aspect ratio
$e$                    0.85    -       Oswald's efficiency factor
$m$                    5.44    kg      Vehicle mass
$n_{max}$              2.0     -       Maximum load factor (positive)
$n_{min}$              0       -       Minimum load factor (negative)
$C_{L,max}$            1.2     -       Maximum lift coefficient
$(d\phi/dt)_{max}$     30      deg/s   Maximum roll rate
$\gamma_{a,max}$       50      deg     Maximum air relative climb angle
$(L/D)_{est}$          20      -       Approximate glide ratio
The local planner plans for a total horizon of 5 seconds, but only the first 3 seconds of the plan are used before replanning. There are three roll rate and three pitch rate commands for a total of nine control options for each branch. The control sequence returned by the path planner is carried out open-loop by the simulated aircraft with wind data drawn from the simulated wind field.
The wind is represented by a dynamic field consisting of features which can move and change strength during simulation. The field is overlaid with a Dryden continuous turbulence model as specified in MIL-F-8785C [15] for a moderate level of turbulence at low altitude ($< 1000$ ft).
VI. RESULTS
Results are presented for a simulated flight of 500 s in a dynamic wind field. The target exploration region is defined by the box $x \in [0, 400]$, $y \in [-100, 100]$ and $z \in [-250, -150]$. The aircraft begins the simulation with a small set of data already collected; this represents manual or autonomous flight before autonomous soaring control is activated.
The mean wind field is a sinusoidal wave heading in the positive x-direction with a wind speed of 2 m/s and wavelength 200 m. The maximum vertical contribution from the wave is 1.41 m/s. Overlaid are two thermal bubbles, both with a core lift of 3 m/s which reduces to zero lift at 50 m radius. The thermal model is a toroidal recirculating model with conservative vertical and horizontal flow so that the inner lifting core is surrounded by sinking air with a maximum sink of 0.65 m/s. The two thermals start centred at (150, 50, -200) and (-150, -50, -200) respectively. The entire field drifts through inertial space at a constant speed of $W_{drift} = (0.6, 0.1, 0)$ m/s. This means that one thermal starts near the centre of the field and drifts outside the +x boundary by the end of the simulation. The second thermal starts outside the region and drifts into the region by the end of the simulation. This scenario was selected to highlight the ability of the planner to utilise a number of energy sources, and show that continuous exploration is necessary to utilise all available energy sources.
The resulting path is demonstrated in the attached video and following figures. Figure 4 shows the progress of the simulation at $t = 250$ s and $t = 450$ s. The actual wind is also shown to compare against the estimate. Figure 5 illustrates the energy change during the flight. Of particular note here is the overall energy control and particularly the use of semi-dynamic dolphin soaring in the sinusoidal field early in the flight, demonstrating the ability of the planner to utilise both static and dynamic soaring strategies in the same framework.
Figure 6 illustrates the global utility at $t = 250$ s. In this instance, there is a well mapped thermal near the middle of the field, and a new thermal entering the region of interest through the $x = 0$ plane. At 250 s the aircraft is in a relatively high energy state, flying at mid altitude and high speed ($V_a = 25$ m/s). Thus, the global planner favours exploration of regions with high estimated energy and high uncertainty. The smaller number of recent sample points near the incoming thermal increases the estimated power utility of targets in that region. In the remaining flight, the two thermals are alternately tracked as the planner attempts to maintain low uncertainty estimates of both high-energy features.
The hyperparameters at the end of the simulation also provide information on how well the field has been estimated. At $t = 500$ s the hyperparameter estimates were: $\sigma_f = 1.04$ m/s, $l_x = 50.2$ m, $l_t = 121.8$ s, $l_d = 65.0$ m, $\alpha = -6.49$, $\vec{W}_d = [0.599, 0.141, 0.164]$ m/s and $\sigma_n = 0.0535$ m/s. The negative $\alpha$ (corresponding to $\beta = 0.0015$) indicates that the prediction was relying almost solely on the drifted estimate, and that the wind drift was relatively well estimated, especially in the x-direction. Other simulations not presented here showed similar results, with most simulations ending with a good drift estimate and low $\beta$ values. For very low drift however, the two length scales tend to converge and $\beta$ becomes insignificant.
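The relationship between the trained weighting hyperparameter and the weight itself is a one-line logistic transform, which can be verified directly. The function name is illustrative; the values -6.49 and 0 are the final and initial hyperparameter values reported in the text.

```python
import math


def logistic(value):
    """Map an unconstrained hyperparameter to a weight in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-value))


final_weight = logistic(-6.49)    # hyperparameter value at t = 500 s
initial_weight = logistic(0.0)    # hyperparameter value at initialisation
```

The initial value corresponds to an even split between the isotropic and drifted distance measures, while the final value is approximately 0.0015, so nearly all of the weight has moved to the drifted estimate.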
0 100 200 300 400 500
4000
3000
2000
1000
0
1000
2000
3000
4000
5000
0 100 200 300 400 500
4000
3000
2000
1000
0
1000
2000
3000
4000
5000
Time, t (s)
E
n
e
r
g
y

c
h
a
n
g
e
,

E
(
J
)
Kinetic energy
Potential energy
Total energy
Fig. 5. Platform energy during simulated ight.
2504
[Figure 4: four panels on axes x_i (m), y_i (m), z_i (m): (a) t = 250 s actual wind field; (b) t = 450 s actual wind field; (c) t = 250 s path taken and current wind estimate; (d) t = 450 s path taken and current wind estimate. Panels (c) and (d) include a colour scale from 0.1 to 0.8.]
Fig. 4. Simulated flight through a drifting wind field. Autonomous soaring begins at the green triangle heading in the positive x-direction. The path taken
is indicated by the grey line, the current position is the red circle and the heavy line is the previous 40 s of flight. Cone colours represent the variance
estimate in (m/s)$^2$.
Fig. 6. Global utility heuristic estimate at t = 250 s in terms of specific
power (W/kg). Both thermals are of similar estimated strength, but the
incoming thermal near (0, 100, 200) has a higher utility because the
aircraft is in a high energy state and favours exploration over exploitation.
VII. CONCLUSION
This paper presents a system architecture for simultaneously
mapping and utilising a wind field for autonomous soaring
with a UAV. Simulated results demonstrate the validity of the
planning and mapping architecture for generating energy-gain
paths in dynamic wind fields. The use of a novel drifting
spatio-temporal covariance function for GP regression provides
a wind-relative mapping method which can effectively estimate
a dynamic wind field. The planner utilises the mean and
variance estimates returned by the GP regression to balance
exploration and exploitation of the wind field and to maintain
platform energy.
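To make the idea of a drifting covariance concrete, the sketch below evaluates a squared-exponential covariance after removing an assumed constant drift from the separation between two space-time points. The functional form and the function name are assumptions chosen for illustration and are not the covariance function derived in the paper; the numerical values are the length scale (65.0 m), drift estimate and signal standard deviation (1.04 m/s) reported in the results.

```python
import math


def drifted_se_cov(x1, t1, x2, t2, sigma_f, l_d, w_d):
    """Squared-exponential covariance in a frame moving with drift w_d.

    x1, x2 are 3D positions (m), t1, t2 are times (s), sigma_f is the
    signal standard deviation (m/s), l_d a length scale (m) and w_d the
    drift velocity (m/s). Assumed form, for illustration only.
    """
    dt = t2 - t1
    # Separation that remains after advecting the first point with the drift.
    r = [b - a - w * dt for a, b, w in zip(x1, x2, w_d)]
    r_squared = sum(c * c for c in r)
    return sigma_f ** 2 * math.exp(-0.5 * r_squared / l_d ** 2)


sigma_f, l_d = 1.04, 65.0
w_d = (0.599, 0.141, 0.164)
x_obs = (100.0, 0.0, 200.0)  # hypothetical observation location

# The same air parcel 100 s later, carried along by the drift.
x_advected = tuple(a + w * 100.0 for a, w in zip(x_obs, w_d))

k_advected = drifted_se_cov(x_obs, 0.0, x_advected, 100.0, sigma_f, l_d, w_d)
k_fixed = drifted_se_cov(x_obs, 0.0, x_obs, 100.0, sigma_f, l_d, w_d)
print(k_advected, k_fixed)
```

In this sketch an observation remains fully correlated with the point the drift has carried it to, while the correlation with the original fixed location decays as the feature moves away; this is the sense in which the mapping is wind-relative.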
REFERENCES
[1] C. Pennycuick, "Soaring behaviour and performance of some East
African birds, observed from a motor-glider," Ibis, vol. 114, no. 2,
pp. 178-218, 1972.
[2] G. Sachs, "Minimum shear wind strength required for dynamic soaring
of albatrosses," Ibis, vol. 147, no. 1, pp. 1-10, 2005.
[3] M. B. Boslough, "Autonomous dynamic soaring platform for
distributed mobile sensor arrays," Sandia National Laboratories, Tech.
Rep. SAND2002-1896, June 2002.
[4] J. Wharington, "Autonomous control of soaring aircraft by
reinforcement learning," Ph.D. dissertation, Royal Melbourne Institute of
Technology, 1998.
[5] Y. J. Zhao, "Optimal patterns of glider dynamic soaring," Optimal
Control Applications and Methods, vol. 25, no. 2, pp. 67-89, 2004.
[6] J. W. Langelaan, "Gust energy extraction for mini and micro
uninhabited aerial vehicles," Journal of Guidance, Control, and Dynamics,
vol. 32, no. 2, pp. 464-473, 2009.
[7] N. R. Lawrance and S. Sukkarieh, "A guidance and control strategy
for dynamic soaring with a gliding UAV," in IEEE International
Conference on Robotics and Automation, Kobe, Japan, 2009, pp.
3632-3637.
[8] P. Lissaman, "Wind energy extraction by birds and flight vehicles," in
43rd AIAA Aerospace Sciences Meeting and Exhibit, Reno, Nevada,
2005, AIAA Paper 2005-241.
[9] N. T. Depenbusch and J. W. Langelaan, "Receding horizon control for
atmospheric energy harvesting by small UAVs," in AIAA Guidance,
Navigation and Control Conference, Toronto, Ontario, Canada, 2010,
AIAA Paper 2010-8180.
[10] M. J. Allen, "Autonomous soaring for improved endurance of a small
uninhabited air vehicle," in 43rd AIAA Aerospace Sciences Meeting
and Exhibit, Reno, Nevada, 2005, AIAA Paper 2005-1025.
[11] D. Edwards, "Implementation details and flight test results of an
autonomous soaring controller," in AIAA Guidance, Navigation and
Control Conference, Honolulu, Hawaii, 2008, AIAA Paper 2008-7244.
[12] N. R. Lawrance and S. Sukkarieh, "Simultaneous exploration and
exploitation of a wind field for a small gliding UAV," in AIAA Guidance,
Navigation and Control Conference, Toronto, Ontario, Canada, 2010,
AIAA Paper 2010-8032.
[13] C. E. Rasmussen and C. K. Williams, Gaussian Processes for Machine
Learning, ser. Adaptive computation and machine learning.
Cambridge, Massachusetts: The MIT Press, 2006, pp. 7-29.
[14] N. Cressie and H.-C. Huang, "Classes of nonseparable, spatio-temporal
stationary covariance functions," Journal of the American Statistical
Association, vol. 94, no. 448, 1999.
[15] Military Specification: Flying Qualities of Piloted Airplanes, United
States Department of Defense, November 1980, MIL-F-8785c.