
Simultaneous Aerial Vehicle Localization and Human Tracking
Kiran Kumar Lekkala, Vinay Kumar Mittal
Indian Institute of Information Technology Chittoor, Sri City, Andhra Pradesh, India
kiran.l13@iiits.in, vkmittal@iiits.in

Abstract: In this paper, we propose a robotic video surveillance system. A prototype system is implemented on an aerial vehicle, a quadcopter. It is capable of following a person or any moving object, while simultaneously localizing itself, i.e., measuring the coordinates of the quadcopter on a scaled map. The system can operate well even if it is flying in an unknown and GPS-silent environment. The prototype system consists of three major modules. The first is an Image-based Visual Servoing (IBVS) module for tracking the desired object or person and sending the error parameters to the actuators. The second module is a Parallel Tracking and Mapping (PTAM) module. The third module is an Extended Kalman Filter (EKF) module. The PTAM and EKF modules help in estimating the location of the quadcopter at every time-instant. The proposed system enables the quadcopter to follow the desired object smoothly, while keeping the tracked object in the center of the frame and maintaining a constant distance from the object. The effectiveness of the system is validated by computing precise measurements of the aerial movements, while ensuring minimal error. The proposed method can be helpful for quadcopters used in diverse applications such as surveillance, monitoring and tracking a desired object.

Index Terms: human tracking, pose estimation, simultaneous localization and mapping, quadcopter, surveillance, PTAM, EKF, image-based visual servoing

Fig. 1. Picture of the Parrot AR Drone.
I. INTRODUCTION

In recent years, research on Simultaneous Localization and Mapping (SLAM) of autonomous aerial vehicles has made significant progress. That has facilitated the increased utilization of these flying robots for a vast range of applications, some of which are otherwise difficult for human beings. A few examples are transportation, mapping, surveillance, monitoring, relief operations and construction activities [1]. In these tasks, the key challenges lie in performing aggressive flight maneuvers [2], collaborative construction [3], coordination of quadcopters (swarm robotics) [4], and throwing, flying and catching back these aerial vehicles/robots [5].

Apart from surveillance, aerial robots can also be used for tracking and following an object. The simplest way to achieve this is by implementing an Image-based Visual Servoing (IBVS) system, which uses feedback information from a visual sensor for controlling the actuators of a robot. In general, the coordinates of the set of feature points extracted during tracking are used for determining the error from the set-point. This method is not suitable for robotic systems that perform movements with large rotations but without sufficient translations, a limitation known as camera retreat. However, it still has a wide range of applications, such as high-speed actuation using a robotic arm and ball catching using a quadcopter [5].

The most common and challenging requirement for a quadcopter during autonomous navigation is to know its location at every instant of time. A few sensors that help the quadcopter in this are a Motion Capture (MoCap) system, GPS, camera, depth sensor and Time-of-Flight camera. In the last two decades, researchers have been trying to solve the problem of Simultaneous Localization and Mapping, as well as robust state estimation and control, in different ways. A few prominent studies involve the use of laser-range scanners [6], monocular cameras [7], stereo cameras, and Red Green Blue and Depth (RGB-D) cameras [8].

Visual odometry is one of the key challenges in navigation and tracking of aerial vehicles such as Micro Aerial Vehicles (MAVs) in unknown, unstructured and cluttered environments [9] [10]. Earlier attempts used laser scanners to obtain depth measurements for the visual processes, but these scanners are relatively heavy and costly. Hence, they were replaced by monocular, stereo and depth cameras that can be used for dense continuous-time tracking [11]. In recent years, intensity-based approaches have slowly been replacing feature-based approaches for determining the odometry of a robot. An intensity-based approach for computing semi-dense visual odometry involves motion estimation [12]. These intensity-based approaches are not only useful for monocular cameras, but have also been explored for stereo and RGB-D cameras [13] [14].

Fig. 2. Block diagram of the proposed system.

Fig. 3. Illustration of person-following on a street. The orientation of the drone is measured in the roll-pitch-yaw axes (Red-Green-White) attached to the drone. On the top-left, we provide the ground truth, which is the GPS data of the drone while executing the trajectory. Note that every big square represents 1 meter and every small square represents 200 cm.
In the proposed method, a lightweight monocular camera, mounted as the frontal camera on a Parrot AR Drone, is used for pose estimation and object tracking. An IBVS controller is used for tracking the object [15]. For better visual state-estimation (apart from the camera), sensor data is extracted from the Inertial Measurement Unit (IMU) and various other onboard devices. Recently proposed approaches for visual navigation of the Parrot AR Drone are used, in which PTAM is employed for scale estimation, along with an Extended Kalman Filter (EKF) for visual odometry and for eliminating frames falsely tracked by PTAM [16] [17] [18]. A Robot Operating System (ROS) package named tum-ardrone has also been developed recently, and is practically demonstrated using the Parrot AR Drone 2.0 [19]. This paper is organized as follows. In Section 2, the prototype of the proposed quadcopter-based system is described in detail. In Section 3, the localization method is discussed. The human-tracking method is explained in Section 4. In Section 5, the experiments carried out and the performance evaluation results are discussed. Lastly, in Section 6, a summary is given, along with the scope for further improvements and our work-in-progress on this topic.

II. DETAILS OF PROTOTYPE SYSTEM

In the proposed system prototype, the Parrot AR Drone 2.0 (Figure 1) is used as the hardware platform, since it is widely available and economical. The Parrot AR Drone 2.0 is equipped with a wide variety of improved sensors compared to its predecessor, the Parrot AR Drone 1.0. To compute the orientation, a 9-axis IMU comprising a gyroscope, an accelerometer and a magnetometer is available. The front camera uses a 720p sensor with a 93-degree lens and records at up to 30 fps. The bottom camera is a QVGA sensor with a 64-degree lens and records at up to 60 fps; it is essentially used for computing coarse motion vectors using optical flow. The planar velocities are then calculated with the help of an ultrasound sensor co-located with the bottom camera. The Wi-Fi has been upgraded in version 2.0 to follow the newer 802.11n standard.

The block diagram of the proposed system is shown in Figure 2. Simultaneous localization and human-tracking are achieved by running two independent parallel processes on the computer (laptop). The IBVS controller is implemented to steer the robot as per the location of the features of the desired object in the image plane. The controller first learns the target location before sending any control signals to the actuators. After the controller is started, the drone operates either in flight mode (if the object is moving) or in hover mode (if the object is stationary). Once the drone is airborne, the state estimation module uses the sensor and image data to estimate the pose of the quadcopter. The system can then estimate the scale of the map, using the approach whose details are discussed in sub-section 3A.

III. LOCALIZATION OF AERIAL VEHICLE

A. Localization and Mapping

Pose estimation is obtained using a visual SLAM technique, i.e., PTAM (Parallel Tracking and Mapping) [20]. PTAM extracts features from the input images and then tracks these feature points (of objects) in successive frames, by minimizing the reprojection error. The point distances for different feature vectors are used for estimating the camera pose in the world-coordinate system. Tracking and mapping are run as two individual threads, so as to simultaneously track the camera and build the depth-map.

After initialization, the global map is rotated such that the xy plane corresponds to the accelerometer data. Initially, the average keypoint depth is taken as unity, but during tracking the scale $\lambda \in \mathbb{R}$ of the map is estimated by a closed-form Maximum Likelihood approach. This scaling approach is used essentially for estimating $\lambda$ (the map coefficient) while mapping from x to y. The problem is first statistically modeled using the data obtained from the sensor measurements, and then the scale (map coefficient) is estimated using the maximum-likelihood (ML) approach.
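As a concrete illustration of this alignment-and-scaling step, the short sketch below (in Python) rotates a position expressed in the PTAM map frame so that the xy plane matches the gravity direction measured by the accelerometer, and then applies an estimated scale to obtain metric coordinates. This is only a minimal sketch under our own assumptions; the function names and the Rodrigues-style construction are illustrative and are not taken from the PTAM or tum-ardrone implementations.

import numpy as np

def align_to_gravity(accel):
    # Rotation that maps the measured gravity direction onto the world z-axis,
    # so that the xy plane of the map corresponds to the accelerometer data.
    g = accel / np.linalg.norm(accel)
    z = np.array([0.0, 0.0, 1.0])
    v = np.cross(g, z)
    c = float(np.dot(g, z))
    if np.isclose(c, 1.0):
        return np.eye(3)            # already aligned
    vx = np.array([[0.0, -v[2], v[1]],
                   [v[2], 0.0, -v[0]],
                   [-v[1], v[0], 0.0]])
    # Rodrigues-style rotation between two unit vectors.
    return np.eye(3) + vx + vx @ vx * (1.0 / (1.0 + c))

def to_metric(p_ptam, scale, R_align):
    # Scale an up-to-scale PTAM position into metres in the gravity-aligned frame.
    return scale * (R_align @ p_ptam)

if __name__ == "__main__":
    accel = np.array([0.1, -0.2, 9.7])                  # hypothetical IMU reading
    R_align = align_to_gravity(accel)
    print(to_metric(np.array([0.4, 0.1, 1.2]), 2.3, R_align))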
B. Extended Kalman Filter
The EKF is a non-linear version of the linear Kalman Filter, and it is used for fusing all the available data to estimate a full motion model of the quadcopter. This model can then be used to predict the state of the quadcopter while, at the same time, rejecting frames falsely tracked by PTAM, with the help of the EKF-estimated pose.

We use a prediction model based on the Extended Kalman Filter for predicting the transition from one state to the next. The control command is $u_t = (\bar{\Phi}_t, \bar{\Theta}_t, \bar{\Psi}_t, \bar{z}_t)$. The designed model is as follows:

$\dot{\Phi}(x_t, u_t) = c_3 \bar{\Phi}_t - c_4 \Phi_t$    (1)

$\dot{\Theta}(x_t, u_t) = c_3 \bar{\Theta}_t - c_4 \Theta_t$    (2)

$\ddot{\Psi}(x_t, u_t) = c_5 \bar{\Psi}_t - c_6 \dot{\Psi}_t$    (3)

$\ddot{z}(x_t, u_t) = c_7 \bar{z}_t - c_8 \dot{z}_t$    (4)

In the above equations, $c_1, c_2, c_3, c_4, c_5, c_6, c_7$ and $c_8$ are the proportionality coefficients obtained from the data collected in a series of test flights.

Fig. 4. Image frames showing the PTAM tracking process of the system, for the continuous pose estimation shown in Figure 3 (with ground-truth).
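To illustrate how the model in eqns (1)-(4) can be used inside the EKF prediction step, a minimal Euler-integration sketch is given below. The numeric values of c3-c8, the time step and the command are hypothetical placeholders; the actual coefficients were identified from the test-flight data and are not reproduced here.

import numpy as np

# Hypothetical proportionality coefficients c3..c8 (placeholders only).
C3, C4, C5, C6, C7, C8 = 10.0, 35.0, 25.0, 9.0, 1.4, 1.0

def predict(state, command, dt):
    # One Euler step of the prediction model, eqns (1)-(4).
    # state   = [roll, pitch, yaw, yaw_rate, z, z_rate]
    # command = [roll_cmd, pitch_cmd, yaw_rate_cmd, z_rate_cmd]  (u_t)
    roll, pitch, yaw, yaw_rate, z, z_rate = state
    roll_cmd, pitch_cmd, yaw_rate_cmd, z_rate_cmd = command

    roll_dot  = C3 * roll_cmd     - C4 * roll       # eqn (1)
    pitch_dot = C3 * pitch_cmd    - C4 * pitch      # eqn (2)
    yaw_acc   = C5 * yaw_rate_cmd - C6 * yaw_rate   # eqn (3)
    z_acc     = C7 * z_rate_cmd   - C8 * z_rate     # eqn (4)

    return np.array([roll + dt * roll_dot,
                     pitch + dt * pitch_dot,
                     yaw + dt * yaw_rate,
                     yaw_rate + dt * yaw_acc,
                     z + dt * z_rate,
                     z_rate + dt * z_acc])

if __name__ == "__main__":
    x = np.zeros(6)
    u = np.array([0.1, -0.05, 0.2, 0.3])   # hypothetical control command u_t
    for _ in range(50):
        x = predict(x, u, dt=0.02)
    print(np.round(x, 4))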
C. Scale Estimation
To estimate the scale of the map built by the quadcopter, a monocular SLAM system based on Parallel Tracking and Mapping (PTAM) is used, together with an altimeter as a metric sensor. For each time-interval, a pair of values $v$ and $m$ is computed, where $m$ is the altimeter reading (in meters) and $v$ is obtained from the six-DoF pose estimates of the visual PTAM system. Both variables indicate estimates of the one-dimensional motion of the quadcopter. They are related as $v = \lambda m$, where $\lambda$ is the scale of the map, estimated by the system in real-time at every time-instant. Please note that the dimension of the identity matrix is one here, because the data from the altimeter is uni-dimensional, i.e., only the depth is measured. It is assumed that all the sensor readings are noisy; hence each of them can be modeled probabilistically, using Gaussian noise, as:

$v_i \sim N(\lambda \mu_i, \sigma_v^2 I_{1 \times 1})$    (5)

$m_i \sim N(\mu_i, \sigma_m^2 I_{1 \times 1})$    (6)

Here, $\sigma^2$ is the variance, $\mu$ is the mean and $N$ is the Normal distribution function. Once the sensor data is collected, the scale is found by minimizing the negative log-likelihood, which can be expressed as:

$L(\mu_1 \ldots \mu_n, \lambda) \propto \frac{1}{2} \sum_{i=1}^{n} \left( \frac{|v_i - \lambda \mu_i|^2}{\sigma_v^2} + \frac{|m_i - \mu_i|^2}{\sigma_m^2} \right)$    (7)

where $\lambda$ is the scale of the map and $n$ represents the number of sensor measurements. After solving this minimization problem, the global mean for eqn (7) is obtained as:

$\mu_i^* = \frac{\lambda^* \sigma_m^2 v_i + \sigma_v^2 m_i}{\lambda^{*2} \sigma_m^2 + \sigma_v^2}$    (8)

Finally, using this global mean, the scale at every time-instant is determined as:

$\lambda^* = \frac{s_{vv} - s_{mm} + \mathrm{sign}(s_{vm}) \sqrt{(s_{vv} - s_{mm})^2 + 4 s_{vm}^2}}{2 \sigma_v^{-1} \sigma_m s_{vm}}$    (9)

where $s_{vv} = \sigma_m^2 \sum_{i=1}^{n} v_i^T v_i$, $s_{mm} = \sigma_v^2 \sum_{i=1}^{n} m_i^T m_i$ and $s_{vm} = \sigma_v \sigma_m \sum_{i=1}^{n} v_i^T m_i$. Here, $n$ is the number of samples used. The measurement variances are $\sigma_m^2$, the metric/altimeter variance, and $\sigma_v^2$, the vision-estimate variance. These can be directly computed from the real-time sensor data while mapping the trajectory.
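The closed-form estimator of eqns (8) and (9) translates directly into a few lines of code. The sketch below generates synthetic altimeter and vision pairs with a known scale and recovers it; the variance values and the synthetic data are assumptions made only for this example.

import numpy as np

def estimate_scale(v, m, sigma_v, sigma_m):
    # Closed-form ML estimate of the map scale, eqn (9), from paired
    # one-dimensional vision (v) and altimeter (m) motion estimates.
    s_vv = sigma_m ** 2 * np.sum(v * v)
    s_mm = sigma_v ** 2 * np.sum(m * m)
    s_vm = sigma_v * sigma_m * np.sum(v * m)
    disc = np.sqrt((s_vv - s_mm) ** 2 + 4.0 * s_vm ** 2)
    return (s_vv - s_mm + np.sign(s_vm) * disc) / (2.0 * sigma_m * s_vm / sigma_v)

def global_means(v, m, lam, sigma_v, sigma_m):
    # Per-sample estimate of the true motion mu_i*, eqn (8).
    return (lam * sigma_m ** 2 * v + sigma_v ** 2 * m) / (lam ** 2 * sigma_m ** 2 + sigma_v ** 2)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    true_scale, sigma_v, sigma_m = 2.5, 0.05, 0.02            # assumed values
    mu = rng.uniform(0.1, 0.5, size=200)                       # true per-interval motion (metres)
    v = true_scale * mu + rng.normal(0.0, sigma_v, mu.size)    # eqn (5)
    m = mu + rng.normal(0.0, sigma_m, mu.size)                 # eqn (6)
    lam = estimate_scale(v, m, sigma_v, sigma_m)
    print("estimated scale:", round(float(lam), 3))
    print("first mu* values:", np.round(global_means(v, m, lam, sigma_v, sigma_m)[:3], 3))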
IV. HUMAN TRACKING

The proposed system consists of three major components, as shown in Figure 2: PTAM, EKF and IBVS. PTAM and EKF have already been discussed in the previous section. The Image-based Visual Servoing (IBVS) method is used for human tracking while carrying out the surveillance. The system essentially consists of a closed-loop feedback mechanism, such as a PD controller. It takes the location of the object in the camera frame at every time-instant and calculates the deviation of the object from the center of the frame, which is called the error-deviation. The total resultant error is derived using some heuristics, in order to obtain the control command sent to the AR Drone. Note that the error term $e$ in eqn (13) is the Euclidean norm over the roll and pitch error components. A PD controller with a decoupled heuristic is then used as the Visual Servoing (IBVS) system to achieve the desired set-point. In every iteration, the control vector is sent to the AR Drone flight controller, which steers the drone to the desired pose. This process continues till the user/controller gives the land command signal.

Once the human pattern learning is achieved and the quadcopter is commanded to take off, the gradients of the tracked object in the image frames are calculated. The constituent control vectors consist of $(\phi, \theta, \phi', \theta')$. Note that $\phi$, $\theta$ are the roll and pitch angles, respectively. Here, $\phi'$ and $\theta'$ are the derivatives of the distance of the object from the set-point, which is the mid-point of the frame. The location of the tracked object and the size of the bounding box can be obtained as follows:

$f_x = \frac{x_{bb} + (w_{bb}/2)}{w_{im}}$    (10)

$f_y = \frac{y_{bb} + (h_{bb}/2)}{h_{im}}$    (11)

$f = \sqrt{\frac{w_{im} \cdot h_{im}}{w_{bb} \cdot h_{bb}}} \propto x_{tm}$    (12)

Note that $(f_x, f_y) \in \mathbb{R}^2$ is the location of the tracked object and $f$ is the size of the bounding box (enclosing the desired object) in the camera coordinate frame, corresponding to the size of the object in the world coordinate frame. The variables $w_{bb}$, $h_{bb}$ (in pixels) are the width and height of the bounding box, respectively, and $x_{bb}$, $y_{bb}$ (in pixels) represent the x and y locations of the upper-left corner of the bounding box. The size measure $f$ is proportional to $x_{tm}$, where $x_{tm}$ is the distance of the target object from the camera. The error of the object from the set-point can be computed as follows:

$E(t) = K_p e(t) + K_d \frac{d}{dt} e(t)$    (13)

where $E$ is the total error, $K_p$ is the proportional coefficient and $K_d$ is the derivative coefficient.
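A compact sketch of eqns (10)-(13) is given below. The frame size, bounding box, desired size f_ref, time step and PD gains are hypothetical values chosen only to make the example runnable; the heuristic that maps the individual error channels to the roll, pitch and vertical commands is not reproduced here.

import math

def bbox_features(x_bb, y_bb, w_bb, h_bb, w_im, h_im):
    # Normalized object location (f_x, f_y), eqns (10)-(11), and the
    # size measure f, eqn (12), which grows with the object's distance.
    f_x = (x_bb + w_bb / 2.0) / w_im
    f_y = (y_bb + h_bb / 2.0) / h_im
    f = math.sqrt((w_im * h_im) / (w_bb * h_bb))
    return f_x, f_y, f

def pd_error(e, e_prev, dt, kp, kd):
    # Total error of eqn (13) for one channel, with a finite-difference derivative.
    return kp * e + kd * (e - e_prev) / dt

if __name__ == "__main__":
    # Hypothetical 640x360 frame with a 90x200 px bounding box around the person.
    f_x, f_y, f = bbox_features(x_bb=300, y_bb=80, w_bb=90, h_bb=200, w_im=640, h_im=360)
    f_ref = 3.0                                   # desired size (keeps a constant distance)
    errors = (f_x - 0.5, f_y - 0.5, f - f_ref)    # deviations from the set-point
    prev = (0.0, 0.0, 0.0)
    cmds = [pd_error(e, p, dt=1.0 / 30.0, kp=0.8, kd=0.1) for e, p in zip(errors, prev)]
    print("features:", round(f_x, 3), round(f_y, 3), round(f, 3))
    print("PD errors per channel:", [round(c, 3) for c in cmds])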
V. EXPERIMENTS AND RESULTS

A series of experiments was carried out to observe the differences from the predicted results. To show the robustness of the AR Drone while following the desired object, a dataset [15] is used in the experiments described below.

A. Positioning Accuracy during Object Following

The quadcopter was tasked to follow a person walking at an average speed along a curvy path. The stability of the system was tested by this experiment and the observations were recorded. The trajectory followed was about 30 m long and took approximately 3 minutes, including the object pattern learning and the following. The position of the quadcopter, plotted on a 3D scaled map, is shown in Figure 3, where it is contrasted with the ground-truth data from the GPS. Out of the 4705 frames recorded, 8 frames are illustrated in Figure 4, in order to explain the state-estimation process.

Fig. 5. Sequence of image frames and pose-estimates showing how the drone recovers and holds on to the locked pattern, while tracking a person.
B. Robustness to Loss of Visual Tracking and Occlusions

The proposed system is robust even to the loss of visual tracking in the IBVS system. When visual object tracking is lost, the system retraces its path back to the point where the last visual measurements of the tracking system were present. It exploits the fact that the system (i.e., the quadcopter) maintains a record of its position at every instant of time on the scaled map. To validate this feature, the dataset [15] containing image frames of human tracking with occlusion (another human and an object) was used, and the state at every time-instant was estimated, as illustrated in Figure 5.

VI. SUMMARY AND CONCLUSION

In this paper, a quadcopter-based system is proposed that can be used for accurate human following and localization. The stability and accuracy of the IBVS controller in tracking the desired object or person, and in sending the control signals to the actuators, are validated by performance evaluation tests. Also, by integrating the EKF module with the SLAM system, and by removing falsely tracked frames using the EKF pose, the system is made robust to disturbances and to the loss of visual tracking.

The accuracy of the state-estimation module can be improved further by including better SLAM systems, such as LSD-SLAM or ORB-SLAM, which have better tracking and denser mapping. Another improvement, concerning the elimination of positional drift in long trajectories, can be to augment the system with a loop-closure method. It may possibly be achieved by integrating an appearance-based SLAM system such as FAB-MAP, or a Bag-of-Words model such as DBoW2, and then allowing the system to recognize loops and further bundle-adjust the pose-graph. These works are in progress by the authors and may appear in their future publications.
REFERENCES
[1] L. Doitsidis, S. Weiss, A. Renzaglia, E. Kosmatopoulos, R. Siegwart,
D. Scaramuzza, and M. Achtelik. Optimal surveillance coverage for teams
of micro aerial vehicles in GPS-denied environments using onboard
vision. Autonomous Robots, 2012.
[2] Daniel Mellinger and Vijay Kumar. Minimum snap trajectory generation
and control for quadrotors. In IEEE International Conference on
Robotics and Automation, ICRA 2011, Shanghai, China, 9-13 May 2011,
pages 2520-2525, 2011.
[3] Quentin Lindsey, Daniel Mellinger, and Vijay Kumar. Construction of
cubic structures with quadrotor teams. In Proceedings of Robotics:
Science and Systems, Los Angeles, CA, USA, June 2011.
[4] Aleksandr Kushleyev, Vijay Kumar, and Daniel Mellinger. Towards A
swarm of agile micro quadrotors. In Robotics: Science and Systems VIII,
University of Sydney, Sydney, NSW, Australia, July 9-13, 2012, 2012.
[5] Robin Ritz, Mark W. Müller, Markus Hehn, and Raffaello D'Andrea.
Cooperative quadrocopter ball throwing and catching. In 2012 IEEE/RSJ
International Conference on Intelligent Robots and Systems, IROS 2012,
Vilamoura, Algarve, Portugal, October 7-12, 2012, pages 4972-4978,
2012.
[6] Slawomir Grzonka, Giorgio Grisetti, and Wolfram Burgard. Towards
a navigation system for autonomous indoor flying. In 2009 IEEE
International Conference on Robotics and Automation, ICRA 2009,
Kobe, Japan, May 12-17, 2009, pages 2878-2883, 2009.
[7] Markus Achtelik, Simon Lynen, Stephan Weiss, Laurent Kneip, Mar-
garita Chli, and Roland Siegwart. Visual-inertial SLAM for a small he-
licopter in large outdoor environments. In 2012 IEEE/RSJ International
Conference on Intelligent Robots and Systems, IROS 2012, Vilamoura,
Algarve, Portugal, October 7-12, 2012, pages 2651-2652, 2012.
[8] Erik Bylow, Jurgen Sturm, Christian Kerl, Fredrik Kahl, and Daniel
Cremers. Real-time camera tracking and 3D reconstruction using signed
distance functions. In Robotics: Science and Systems IX, Technische
Universitat Berlin, Berlin, Germany, June 24 - June 28, 2013, 2013.
[9] M. Bloesch, S. Weiss, D. Scaramuzza, and R. Siegwart. Vision based
MAV navigation in unknown and unstructured environments. In Proc. of
The IEEE International Conference on Robotics and Automation (ICRA),
May 2010.
[10] C. Kerl. Odometry from rgb-d cameras for autonomous quadrocopters.
Master's thesis, Technical University Munich, Germany, Nov. 2012.
[11] C. Kerl, J. Stueckler, and D. Cremers. Dense continuous-time tracking
and mapping with rolling shutter RGB-D cameras. In IEEE International
Conference on Computer Vision (ICCV), Santiago, Chile, 2015.
[12] J. Engel, J. Sturm, and D. Cremers. Semi-dense visual odometry for
a monocular camera. In IEEE International Conference on Computer
Vision (ICCV), Sydney, Australia, December 2013.
[13] C. Kerl, J. Sturm, and D. Cremers. Robust odometry estimation for
RGB-D cameras. In Int. Conf. on Robotics and Automation, May 2013.
[14] F. Steinbruecker, J. Sturm, and D. Cremers. Real-time visual odometry
from dense RGB-D images. In Workshop on Live Dense Reconstruction
with Moving Cameras at the Intl. Conf. on Computer Vision (ICCV),
2011.
[15] Jesus Pestana, Jose Luis Sanchez-Lopez, Pascual Campoy Cervera, and
Srikanth Saripalli. Vision based GPS-denied object tracking and follow-
ing for unmanned aerial vehicles. In IEEE International Symposium on
Safety, Security, and Rescue Robotics 2013, Linkoping, Sweden, October
21-26, 2013, pages 1-6, 2013.
[16] J. Engel, J. Sturm, and D. Cremers. Scale-aware navigation of a low-
cost quadrocopter with a monocular camera. Robotics and Autonomous
Systems (RAS), 62(11):1646-1656, 2014.
[17] J. Engel, J. Sturm, and D. Cremers. Accurate figure flying with a
quadrocopter using onboard visual and inertial sensing. In Proc. of
the Workshop on Visual Control of Mobile Robots (ViCoMoR) at the
IEEE/RJS International Conference on Intelligent Robot Systems (IROS),
Oct. 2012.
[18] J. Engel, J. Sturm, and D. Cremers. Camera-based navigation of a low-cost quadrocopter. In Proc. of the International Conference on Intelligent Robot Systems (IROS), Oct. 2012.
[19] C. Kerl, J. Sturm, and D. Cremers. Dense visual SLAM for RGB-D cameras. In Proc. of the Int. Conf. on Intelligent Robot Systems (IROS), 2013.
[20] Georg Klein and David Murray. Parallel tracking and mapping on a camera phone. In Proc. Eighth IEEE and ACM International Symposium on Mixed and Augmented Reality (ISMAR'09), Orlando, October 2009.
