Carlo Tomasi
Computer Science Department
Stanford University
Stanford, CA 94305
Abstract
1 Introduction
Is feature tracking a solved problem? The extensive studies of image correlation [4], [3], [15], [18], [7], [17] and sum-of-squared-difference (SSD) methods [2], [1] show that all the basics are in place. With small inter-frame displacements, a window can be tracked by optimizing some matching criterion with respect to translation [10], [1] and linear image deformation [6], [8], [11], possibly with adaptive window size [14]. Feature windows can be selected based on some measure of texturedness or cornerness, such as a high standard deviation in the spatial intensity profile [13], the presence of zero crossings of the Laplacian of the image intensity [12], and corners [9], [5]. Yet, even a region rich in texture can be poor. For instance, it can straddle a depth discontinuity or the boundary of a reflection highlight on a glossy surface. In either case, the window is not attached to a fixed point in the world, making that feature useless or even harmful to most structure-from-motion algorithms. Furthermore,
This research was supported by the National Science Foundation under contract IRI-9201751.
Setting to zero the derivatives of the dissimilarity $\epsilon$ with respect to the deformation $D$ and the displacement $\mathbf{d}$ gives

$$\frac{1}{2}\frac{\partial\epsilon}{\partial D} = \int\!\!\int_W [J(A\mathbf{x}+\mathbf{d}) - I(\mathbf{x})]\, \mathbf{g}\,\mathbf{x}^T w \, d\mathbf{x} = 0, \qquad \frac{1}{2}\frac{\partial\epsilon}{\partial \mathbf{d}} = \int\!\!\int_W [J(A\mathbf{x}+\mathbf{d}) - I(\mathbf{x})]\, \mathbf{g}\, w \, d\mathbf{x} = 0. \quad (3)$$

Linearizing $J$ by the truncated Taylor expansion of equation (4) turns these conditions into the $6\times 6$ linear system $T\mathbf{z} = \mathbf{a}$ of equation (5), whose coefficient matrix can be partitioned as

$$T = \int\!\!\int_W \begin{bmatrix} U & V \\ V^T & Z \end{bmatrix} w \, d\mathbf{x}, \quad (6)$$

where $U$ is the $4\times 4$ block of second-order moments (entries of the form $x^2 g_x^2$, $x^2 g_x g_y$, ..., $y^2 g_y^2$), the $4\times 2$ block $V$ is

$$V = \begin{bmatrix} x g_x^2 & x g_x g_y \\ x g_x g_y & x g_y^2 \\ y g_x^2 & y g_x g_y \\ y g_x g_y & y g_y^2 \end{bmatrix},$$

and the $2\times 2$ block is

$$Z = \begin{bmatrix} g_x^2 & g_x g_y \\ g_x g_y & g_y^2 \end{bmatrix},$$

with $g_x$ and $g_y$ the horizontal and vertical image gradients.
Even when affine motion is a good model, equation (5) is only approximately satisfied, because of the linearization of equation (4). However, the correct affine change can be found by using equation (5) iteratively in a Newton-Raphson style minimization [16].

During tracking, the affine deformation D of the feature window is likely to be small, since motion between adjacent frames must be small in the first place for tracking to work at all. It is then safer to set D to the zero matrix. In fact, attempting to determine deformation parameters in this situation is not only useless but can lead to poor displacement solutions: the deformation D and the displacement d interact through the 4x2 matrix V of equation (6), and any error in D would cause errors in d. Consequently, when the goal is to determine d, the smaller system

Zd = e     (7)

should be solved, where e collects the last two entries of the vector a of equation (5).
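To make the pure-translation step concrete, here is a minimal sketch (not the authors' implementation) of solving Zd = e for one feature window. The function name, the window-bounds convention, the unit weights (w = 1), and the optional `sigma_min` threshold on the smaller eigenvalue of Z (the texturedness criterion discussed in section 4) are all illustrative assumptions:

```python
import numpy as np

def translation_step(I, J, win, sigma_min=None):
    """One pure-translation tracking step: solve the 2x2 system Z d = e
    of equation (7) for a single feature window.

    I, J      -- previous and current frame (2-D arrays)
    win       -- (r0, r1, c0, c1) window bounds (illustrative convention)
    sigma_min -- optional threshold on the smaller eigenvalue of Z

    A sketch with unit weights (w = 1), not the paper's implementation.
    """
    r0, r1, c0, c1 = win
    A = I[r0:r1, c0:c1].astype(float)
    B = J[r0:r1, c0:c1].astype(float)

    # Spatial gradients of the current window (rows = y, columns = x).
    gy, gx = np.gradient(B)

    # Z = sum over the window of [gx^2, gx*gy; gx*gy, gy^2].
    Z = np.array([[np.sum(gx * gx), np.sum(gx * gy)],
                  [np.sum(gx * gy), np.sum(gy * gy)]])

    # A trackable window needs two large eigenvalues of Z.
    if sigma_min is not None and np.linalg.eigvalsh(Z)[0] < sigma_min:
        return None

    # e = sum of (I - J) * g over the window: the last two entries of a.
    e = np.array([np.sum((A - B) * gx), np.sum((A - B) * gy)])

    return np.linalg.solve(Z, e)  # displacement d = (dx, dy)
```

In practice this step is iterated in Newton-Raphson fashion, resampling J at the current estimate of d until the update becomes negligible.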
When monitoring features for dissimilarities in their appearance between the first and the current frame, on the other hand, the full affine motion system (5) should be solved. In fact, motion is now too large to be described well by the pure translation model. Furthermore, in determining dissimilarity, the whole transformation between the two windows is of interest, and a precise displacement is less critical, so it is acceptable for D and d to interact to some extent through the matrix V.

In the next two sections we discuss these issues in more detail: first we determine when system (7) yields a good displacement measurement (section 4), and then we see when equation (5) can be used reliably to monitor a feature's quality (section 5).
4 Texturedness
Regardless of the method used for tracking, not all parts of an image contain complete motion information (the aperture problem): for instance, only the vertical component of motion can be determined for a horizontal intensity edge. To overcome this difficulty, researchers have proposed to track corners, or windows with a high spatial frequency content, or regions where some mix of second-order derivatives is sufficiently high. However, there are two problems with
5 Dissimilarity

[Figure: plot of dissimilarity versus frame number.]
6 Convergence
The simulations in this section show that when the affine motion model is correct our iterative tracking algorithm converges even when the starting point is far removed from the true solution. The first series of simulations is run on the four circular blobs shown in the leftmost column of figure 6. The three motions of table 1 are considered. To see their effects, compare the first and last column of figure 6. The images in the last column are the images in the leftmost column warped, translated, and corrupted with random Gaussian noise with a standard deviation equal to 16 percent of the maximum image intensity. The images in the intermediate columns are the results of the deformations and translations to which the tracking algorithm subjects the images in the leftmost column after 4, 8, and 19 iterations, respectively. The algorithm works correctly, making the images in the fourth column of figure 6 as similar as possible to those in the fifth column.
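The warped-and-noisy test images used in these simulations can be generated as in the following sketch. It is an illustrative reconstruction of the setup, not the original code; the function name is hypothetical and SciPy's `affine_transform` is assumed for the resampling. Given an affine matrix A and translation d, the returned image J satisfies J(Ax + d) = I(x), and is then corrupted with Gaussian noise whose standard deviation is a fraction of the maximum intensity (16 percent in the experiments above):

```python
import numpy as np
from scipy.ndimage import affine_transform

def make_test_pair(I, A, d, noise_frac=0.16, seed=None):
    """Produce a (template, target) pair for a tracking simulation.

    The target J is I warped so that J(A x + d) = I(x), with coordinates
    in array (row, col) order, then corrupted with Gaussian noise of
    standard deviation noise_frac * max(I).
    """
    rng = np.random.default_rng(seed)
    Ainv = np.linalg.inv(A)
    # affine_transform maps output coords y to input coords M y + offset,
    # so J(y) = I(A^{-1}(y - d)) needs M = A^{-1}, offset = -A^{-1} d.
    J = affine_transform(I.astype(float), Ainv,
                         offset=-Ainv @ np.asarray(d, dtype=float),
                         order=1)
    J += rng.normal(0.0, noise_frac * I.max(), I.shape)
    return I, J
```

With A = 1 + D for the deformations of table 1 and the listed translations, this produces target images of the kind shown in the last column of figure 6.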
Figure 7: Dissimilarity (1st column), displacement error (2nd), and deformation error (3rd) versus iteration number for figure 6. The last two columns are displacements and deformations computed during tracking, starting from zero. See text for units.
Table 1: True and computed affine deformations and translations for the three test motions.

  True deformation       Computed deformation     True transl.   Computed transl.
  [ 1.409  -0.342 ]      [ 1.393  -0.334 ]        (3, 0)         (3.0785, -0.0007)
  [ 0.342   0.563 ]      [ 0.338   0.569 ]

  [ 0.658  -0.342 ]      [ 0.670  -0.343 ]        (2, 0)         (2.0920,  0.0155)
  [ 0.342   0.658 ]      [ 0.351   0.660 ]

  [ 0.809   0.342 ]      [ 0.802   0.319 ]        (3, 0)         (3.0591,  0.0342)
  [ 0.253   1.232 ]      [ 0.235   1.227 ]
7 Monitoring Features
This section presents some experiments with real images and shows how features can be monitored during tracking to detect potentially bad features. Figure 10 shows the first frame of a 26-frame sequence. A Pulnix camera equipped with a 16mm lens moves forward 2mm per frame. Because of the forward motion, features loom larger from frame to frame. The pure translation model is sufficient for inter-frame tracking but not to monitor features, as discussed below.
Figure 11 displays the 102 features selected according to the criterion introduced in section 4. To limit the number of features and to use each portion of the image at most once, the constraint was imposed that no two feature windows can overlap in the first frame. Figure 12 shows the dissimilarity of each feature under the pure translation motion model, that is, with the deformation matrix D set to zero for all features. This dissimilarity is nearly useless for feature monitoring: except for features 58 and 89, all features have comparable dissimilarities, and no clean discrimination can be drawn between good and bad features.

From figure 13 we see that feature 58 is at the boundary of the block with a letter U visible in the lower right-hand side of the figure. The feature window straddles the vertical dark edge of the block in the foreground as well as parts of the letters "Cra" in the word "Crayola" in the background. Six frames of this window are visible in the third row of figure 14. As the camera moves forward, the pure translation tracking stays on top of approximately the same part of the image. However, the gap between the vertical edge in the foreground and the letters in the background widens, and it becomes harder to warp the current window into the window in the first frame, thereby leading to the rising dissimilarity.
Figure 11: The features selected according to the texturedness criterion of section 4.
Figure 12: Pure translation dissimilarity for the features in figure 11. This dissimilarity is nearly useless for feature discrimination.
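A toy version of the monitoring computation can be sketched as follows. This is an assumed simplification, not the paper's exact measure: the paper compares the first window with the current one mapped back through the computed affine transformation, whereas here the warp is taken as already applied, and the threshold value is an illustrative tuning parameter:

```python
import numpy as np

def dissimilarity(first_win, current_win):
    """Mean squared intensity difference between the feature window in
    the first frame and the current window (assumed already warped back
    into the first frame's coordinates)."""
    a = np.asarray(first_win, dtype=float)
    b = np.asarray(current_win, dtype=float)
    return float(np.mean((a - b) ** 2))

def should_abandon(dissim, threshold=0.02):
    """Flag a feature once its dissimilarity exceeds the threshold;
    the threshold is an assumed tuning parameter."""
    return dissim > threshold
```

Computed under the pure translation model this measure separates almost nothing, as figure 12 shows; warping with the full affine model is what makes the good/bad discrimination clean.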
[Figure: dissimilarity versus frame number for the features in figure 11.]
image's pixel size (the feature window is 25x25 pixels). The matching between one frame and the next is haphazard, because the characters in the lettering are badly aliased. This behavior is not a problem: erratic dissimilarities indicate trouble, and the corresponding features ought to be abandoned.
8 Conclusion
In this paper, we have proposed a method for feature selection, a tracking algorithm based on a model of affine image changes, and a technique for monitoring features during tracking. Selection specifically maximizes the quality of tracking, and is therefore optimal by construction, as opposed to more ad hoc measures of texturedness. Monitoring is computationally inexpensive and sound, and helps discriminate between good and bad features based on a measure of dissimilarity that uses affine motion as the underlying image change model.
Of course, monitoring feature dissimilarity does not solve all the problems of tracking. In some situations, a bright spot on a glossy surface is a bad (that is, nonrigid) feature, but may change little over a long sequence: dissimilarity may not detect the problem. However, even in principle, not everything can be decided locally. Rigidity is not a local feature, so a local method cannot be expected to always detect its violation. On the other hand, many problems can indeed be discovered locally, and these are the target of the investigation in this paper. Our experiments and simulations show that monitoring is indeed effective in realistic circumstances. A good discrimination at the beginning of the processing chain can reduce the remaining bad features to a few outliers, rather than leaving them an overwhelming majority. Outlier detection techniques at higher levels in the processing chain are then more likely to succeed.
References
[1] P. Anandan. A computational framework and an algorithm for the measurement of visual motion. IJCV, 2(3):283-310, 1989.
[2] P. J. Burt, C. Yen, and X. Xu. Local correlation measures for motion analysis: a comparative study. IEEE CPRIP, 269-274, 1982.
[3] C. Cafforio and F. Rocca. Methods for measuring small displacements in television images. IEEE Trans. IT-22:573-579, 1976.
[4] D. J. Connor and J. O. Limb. Properties of frame-difference signals generated by moving images. IEEE Trans. COM-22(10):1564-1575, 1974.
[5] L. Dreschler and H.-H. Nagel. Volumetric model and 3d trajectory of a moving car derived from monocular tv frame sequences of a street scene. IJCAI, 692-697, 1981.
[6] W. Förstner. Reliability analysis of parameter estimation in linear models with applications to mensuration problems in computer vision. CVGIP, 40:273-310, 1987.
[7] W. Förstner and A. Pertl. Photogrammetric Standard Methods and Digital Image Matching Techniques for High Precision Surface Measurements.
[8]
[9]
[10]
[11]
[12]
[13]
[14]
[15]
[16]
[17]
[18]