Such motion-guided sampling has several benefits with respect to the original uniform sampling:
• significant reduction of the search space;
• successful tracking of objects moving at high velocities;
• reduction in identity swaps between similar-looking objects.
(a) Original frame (t0)  (b) Uniform sampling (t0 + 1)  (c) Motion model (t0 + 1)
Fig. 4. Sampling algorithms for finding the location of the object in frame t0 + 1, based on its position at t0, which is marked with a red dot. (a) depicts the original frame at step t0. (b) illustrates the conventional uniform sampling around the position of the object. (c) shows how the search space can be significantly reduced using motion-model-based sampling. Best seen in color.
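The two sampling strategies compared in Fig. 4 can be sketched as follows. This is our illustration, not the paper's implementation; the positions, velocities, sample counts, and noise levels are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def uniform_samples(pos, radius, n):
    """Baseline: candidate locations drawn uniformly in a square around pos."""
    return pos + rng.uniform(-radius, radius, size=(n, 2))

def motion_model_samples(pos, velocity, dt, sigma, n):
    """Motion-guided: candidates drawn around a constant-velocity prediction."""
    predicted = pos + velocity * dt
    return predicted + rng.normal(0.0, sigma, size=(n, 2))

pos = np.array([320.0, 240.0])   # object position at t0 (pixels), illustrative
vel = np.array([60.0, -20.0])    # estimated velocity (pixels/frame), illustrative

uni = uniform_samples(pos, radius=80.0, n=500)
mot = motion_model_samples(pos, vel, dt=1.0, sigma=15.0, n=500)

# The motion-guided cloud is concentrated near the predicted location,
# so far fewer candidates are needed to cover the object's likely position.
print(uni.std(axis=0), mot.std(axis=0))
```

A fast-moving object leaves the uniform window entirely unless the radius is made large, which is exactly the search-space blow-up the motion model avoids.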
Gaussians allows reformulating the problem to use Kalman filtering to predict and update each Gaussian.
The prediction step for the GM-PHD filter is computed as

v_{k|k-1}(x) = p_{S,k} \sum_{i=1}^{J_k} w_k^{(i)} \mathcal{N}\big(x;\, m_{k|k-1}^{(i)},\, P_{k|k-1}^{(i)}\big) + \gamma_k(x),    (4)

with

m_{k|k-1}^{(i)} = F_{k-1}\, m_{k-1}^{(i)}    (5)
P_{k|k-1}^{(i)} = Q_{k-1} + F_{k-1}\, P_{k-1}^{(i)}\, F_{k-1}^T,    (6)

where F_{k-1} is the motion model from time k−1 to k and Q_{k-1} is the process noise during that interval. \gamma_k(x) is the intensity of newly appearing targets, described as a mixture of Gaussians. In this work, \gamma_k(x) is implemented as a single Gaussian two meters in front of the camera. p_{S,k} is the probability that a target survives between time k−1 and k.

[Fig. 5 panels: No Visual Tracking vs. Visual Tracking; rows: Horizontal [pixel], Vertical [pixel], Width [pixel] over Time [s]]
Fig. 5. Position estimates obtained from the computer vision algorithm. The data on the left are obtained from the cascade of detectors with non-maximum suppression set to 1, and the data on the right are from the visual tracking algorithm with non-maximum suppression set to 3. Different colors correspond to different measurements for the same timestep, in this order: blue, red, black, green, cyan. Best seen in color.
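The prediction step of Equations (4)–(6) can be sketched in a few lines of NumPy. This is an illustration under a linear motion model, not the authors' code; the constant-velocity model, frame interval, and birth-component values below are hypothetical.

```python
import numpy as np

def gmphd_predict(weights, means, covs, F, Q, p_survive, birth_components):
    """GM-PHD prediction step (Eqs. 4-6): each surviving Gaussian is
    propagated through the motion model F with process noise Q and its
    weight scaled by p_S,k; the birth intensity gamma_k(x) is appended
    as extra mixture components."""
    pred_w, pred_m, pred_P = [], [], []
    for w, m, P in zip(weights, means, covs):
        pred_w.append(p_survive * w)       # p_S,k * w_k^(i)
        pred_m.append(F @ m)               # Eq. (5)
        pred_P.append(Q + F @ P @ F.T)     # Eq. (6)
    for w, m, P in birth_components:       # gamma_k(x) as extra Gaussians
        pred_w.append(w)
        pred_m.append(m)
        pred_P.append(P)
    return pred_w, pred_m, pred_P

# Toy constant-velocity example; all numbers are illustrative.
dt = 1.0 / 30.0                            # hypothetical frame interval
F = np.array([[1.0, dt], [0.0, 1.0]])      # constant-velocity motion model
Q = 0.01 * np.eye(2)                       # process noise
birth = [(0.1, np.array([2.0, 0.0]), np.eye(2))]  # e.g. one birth Gaussian
w, m, P = gmphd_predict([1.0], [np.array([0.0, 1.0])], [np.eye(2)],
                        F, Q, 0.99, birth)
```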
The update step is

v_k(x) = \big(1 - p_{D,k}\big) \sum_{i=1}^{J_k} w_{k|k-1}^{(i)} \mathcal{N}\big(x;\, m_{k|k-1}^{(i)},\, P_{k|k-1}^{(i)}\big) + \sum_{z \in Z_k} \sum_{i=1}^{J_k} w_k^{(i)}(z)\, \mathcal{N}\big(x;\, m_{k|k}^{(i)}(z),\, P_{k|k}^{(i)}\big),    (7)

with

w_k^{(i)}(z) = \frac{p_{D,k}(m_{k|k-1}^{(i)})\, w_{k|k-1}^{(i)}\, l_k^{(i)}(z)}{\kappa(z) + \sum_{i=1}^{J_k} p_{D,k}(m_{k|k-1}^{(i)})\, w_{k|k-1}^{(i)}\, l_k^{(i)}(z)}    (8)

l_k^{(i)}(z) = \mathcal{N}\big(z;\, H_k m_{k|k-1}^{(i)},\, R_k + H_k P_{k|k-1}^{(i)} H_k^T\big)    (9)

m_{k|k}^{(i)}(z) = m_{k|k-1}^{(i)} + K_k^{(i)}\big(z - H_k m_{k|k-1}^{(i)}\big)    (10)

P_{k|k}^{(i)} = \big[I - K_k^{(i)} H_k\big] P_{k|k-1}^{(i)}    (11)

K_k^{(i)} = P_{k|k-1}^{(i)} H_k^T \big(H_k P_{k|k-1}^{(i)} H_k^T + R_k\big)^{-1},    (12)

where \kappa(z) is the clutter level and R_k the sensor noise covariance. p_{D,k} is the probability that a target i is detected and H_k is the matrix that transforms the target's state to a measurement. Note that Equations 10, 11 and 12 are similar to a Kalman filter update.

2) Implementation: In our framework, the computer vision module returns the bounding boxes of detected objects. Because the width and height of these bounding boxes are heavily correlated, the measurement of a target is z = [x_c, y_c, w_c]^T, with the components being the detection center in horizontal and vertical coordinates and the width of the bounding box, respectively. A sample of the measurements is shown in Fig. 5. The implementation of the camera sensor model is similar to [19], which uses the standard camera model with distortion, with the exception of the measurement model for the apparent width w_c^{(i)} of target i at time k. The equations for our sensor model are

x_c^{(i)} = F_x\, x_d^{(i)} + X_0    (13)
y_c^{(i)} = F_y\, y_d^{(i)} + Y_0    (14)
w_c^{(i)} = \frac{F_x\, W_t\, (1 + K_1 s_r)}{\|q_{b,k}^{(i)}\| + K_w \|q_{b,k}^{(i)}\|^2}    (15)
\big[x_d^{(i)},\, y_d^{(i)}\big] = \rho\, [r_x,\, r_y], \quad \rho = 1 + K_1 s_r + K_2 s_r^2    (16)
r_x = x_{b,k}^{(i)} / z_{b,k}^{(i)}, \quad r_y = y_{b,k}^{(i)} / z_{b,k}^{(i)}, \quad s_r = r_x^2 + r_y^2,    (17)

with q_{b,k}^{(i)} = [x_{b,k}^{(i)}, y_{b,k}^{(i)}, z_{b,k}^{(i)}]^T the position of the target relative to the camera, expressed in the camera frame. K_1, K_2, F_x, F_y, X_0 and Y_0 are the standard intrinsic camera parameters and were determined using OpenCV. W_t is the size of the target and is assumed to be known. K_w is a parameter related to the change of the apparent width as a function of the target's distance to the camera and has been determined
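The structure of the update step in Equations (7)–(12) can be sketched as follows. This is a NumPy illustration assuming a linear measurement matrix H; the paper's camera model is nonlinear, which is why the authors use the Extended Kalman PHD variant [18] instead.

```python
import numpy as np

def gaussian_density(x, mean, cov):
    """Multivariate normal density N(x; mean, cov)."""
    d = x - mean
    k = len(x)
    norm = np.sqrt((2 * np.pi) ** k * np.linalg.det(cov))
    return float(np.exp(-0.5 * d @ np.linalg.solve(cov, d)) / norm)

def gmphd_update(weights, means, covs, measurements, H, R, p_detect, clutter):
    """GM-PHD update step (Eqs. 7-12) for a linear-Gaussian sensor model."""
    # Missed-detection term: (1 - p_D) times the predicted mixture.
    new_w = [(1.0 - p_detect) * w for w in weights]
    new_m = [m.copy() for m in means]
    new_P = [P.copy() for P in covs]
    for z in measurements:
        batch_w = []
        for w, m, P in zip(weights, means, covs):
            S = R + H @ P @ H.T                       # innovation cov., Eq. (9)
            lik = gaussian_density(z, H @ m, S)       # l_k(z)
            K = P @ H.T @ np.linalg.inv(S)            # Kalman gain, Eq. (12)
            batch_w.append(p_detect * w * lik)
            new_m.append(m + K @ (z - H @ m))         # Eq. (10)
            new_P.append((np.eye(len(m)) - K @ H) @ P)  # Eq. (11)
        norm = clutter + sum(batch_w)                 # denominator of Eq. (8)
        new_w.extend(bw / norm for bw in batch_w)
    return new_w, new_m, new_P
```

With one predicted component and one measurement, the update returns two components: a low-weight missed-detection copy and a detection-updated Gaussian whose normalized weight (Eq. 8) is close to one when clutter is negligible.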
[Fig. 6 panels: (a) Position estimates, (b) Velocity estimates; rows: X/Y/Z position [m] and X/Y/Z velocity [m/s] over Time [s], with and without visual tracking]
Fig. 6. Results for our experiment with two quadrotors (blue and red track, respectively) in a cluttered environment. The solid, thinner lines are the true
trajectories of the quadrotors acquired by the MCS. The dots are the estimated positions from the GM-PHD filter. The red and blue colored dots are the
ones closest to the red and blue curves, respectively. The data association is obtained during the computation of the OSPA metric and not as a result of the
GM-PHD filter. The black dots are the estimates that were not associated with a quadrotor. Best seen in color.
empirically. This parameter was added to compensate for a discrepancy between the model and the observed measurements: the cascade of detectors would report targets larger than their true size when the quadrotors are far away. As the camera sensor model is not linear, the GM-PHD filter is implemented as an Extended Kalman PHD filter [18].
After updating, a pruning and merging step is performed as described in [18] in the following manner: Gaussians with a weight less than 10⁻⁵ are pruned, and two Gaussians closer than two times their covariances are merged. Finally, all Gaussians with a weight higher than 0.5 are reported as targets.

IV. IMPLEMENTATION

The experimental setup is composed of an AscTec Firefly quadrotor carrying a camera and two AscTec Hummingbird quadrotors flying in front of it as targets of the tracking framework. The camera is equipped with a fish-eye lens with a field of view of 185° (an example of a video frame from the accompanying video¹ can be seen in Fig. 1). All quadrotors flew autonomously using localization data acquired by an external Motion Capture System (MCS) and sent through WiFi to the vehicles. The MCS tracks the pose of the quadrotors with millimeter-level accuracy and is used as ground truth to assess the performance of our framework. The trajectories of the Hummingbird quadrotors were set as follows: an ellipse of 3×2 m for one and an ∞-shape of 2×1 m for the other. The quadrotors completed their loop in 20 s. The Firefly was mostly in static flight, moving to another position from time to time. All the data were recorded on the Firefly for post-processing by leveraging the rosbag functionality of the Robot Operating System (ROS).
In the post-processing phase, the data were played back in real time and fed to the computer vision and tracking algorithms. The computation was performed on a desktop computer with an Intel Core i7, hardware comparable to the high-end computation modules for quadrotors (e.g., the AscTec Mastermind).
Section V presents the results of our framework with and without visual-based tracking. Although both methods operate on the same video stream, the parameters were set differently for a fair comparison. An important difference resides in the neighborhood size of the non-maximum suppression step, which is set to 3 in the case with visual tracking and 1 in the case without. This is because visual tracking is more sensitive to false positives, whereas the multi-target state tracker is more sensitive to missed detections. Other differences were in the parameters of the multi-target state tracker: with visual tracking, pD = 0.95, κ(z) = 3·10⁻⁹ and R = diag(200, 200, 1600); without it, pD = 0.5, κ(z) = 2.4·10⁻⁹ and R = diag(200, 200, 3200).

V. RESULTS

In this section, we present the results of our tracking framework. We compare two cases: one with visual tracking enabled and one with it disabled. The obtained trajectories for both cases are shown in Fig. 6(a), and Fig. 6(b) shows the estimated velocities. Both figures show the estimates as dots and the ground truth as solid lines. The dots are colored according to the ground truth they are closest to. On the one hand, the trajectories obtained without visual tracking are not only less accurate than those with visual tracking, but also suffer from targets disappearing. This is due to the cascade of detectors not reliably detecting a target: when one is not detected, the multi-target state tracker immediately reacts by lowering the weight of the estimate below the target existence threshold. On the other hand, the visual tracking searches the image for features that it learned previously. As a result, the algorithm returns a measurement for each quadrotor at each frame. This consistency allows the multi-target state tracker to correctly track both quadrotors and obtain a smooth estimate of their velocity.

¹The video and more information about the project can be found at http://disal.epfl.ch/research/UAVCollisionAvoidance
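The pruning and merging step described at the end of Section III can be illustrated with a small sketch. This is our code, not the authors'; interpreting "closer than two times their covariances" as a Mahalanobis-distance threshold of 2 is our assumption, while the pruning and reporting thresholds follow the values given in the text.

```python
import numpy as np

def prune_and_merge(weights, means, covs, prune_thresh=1e-5,
                    merge_dist=2.0, report_thresh=0.5):
    """Drop components with weight below prune_thresh, merge components
    within merge_dist (Mahalanobis) of the strongest component, and
    report components whose merged weight exceeds report_thresh."""
    # Pruning: discard negligible components.
    comps = [(w, m, P) for w, m, P in zip(weights, means, covs)
             if w > prune_thresh]
    merged = []
    while comps:
        # Use the highest-weight remaining component as the merge seed.
        comps.sort(key=lambda c: -c[0])
        w0, m0, P0 = comps.pop(0)
        group, rest = [(w0, m0, P0)], []
        for w, m, P in comps:
            d2 = (m - m0) @ np.linalg.solve(P, m - m0)  # squared Mahalanobis
            (group if d2 <= merge_dist ** 2 else rest).append((w, m, P))
        comps = rest
        # Moment-matched merge of the grouped Gaussians.
        W = sum(w for w, _, _ in group)
        mean = sum(w * m for w, m, _ in group) / W
        cov = sum(w * (P + np.outer(m - mean, m - mean))
                  for w, m, P in group) / W
        merged.append((W, mean, cov))
    targets = [m for w, m, _ in merged if w > report_thresh]
    return merged, targets
```

For example, two nearby components with weights 0.6 and 0.3 merge into a single component of weight 0.9, which is then reported as a target, while a component of weight 10⁻⁷ is pruned outright.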
[Figure: OSPA metric]
step for a fully functional system able to accurately track any aircraft as it relies on the size of a target to estimate distance.

REFERENCES

[1] S. Weiss, M. W. Achtelik, S. Lynen, M. Chli, and R. Siegwart, "Real-