Abstract

Mobile robot navigation is often based on the use of landmarks distributed in the operation environment of the robot. They can be simplistic and engineered to ease the detection but, ideally, a robot should only use landmarks that are naturally found in its environment. We present an approach in which a robot learns to recognize landmarks based on their silhouettes, which can be essentially arbitrary and depend on the viewpoint of the robot. While the problems of real-time image segmentation force us to use less than completely natural landmarks, the approach still offers improvements over techniques that only take into account simplistic features of landmarks. Empirical evaluation with a real robot demonstrates that the detection times are viable for real-time applications.

1 Introduction

Visual object recognition typically requires two more or less distinct steps: in image segmentation the object boundaries are estimated, and in recognition the segments are classified. For a mobile robot, which is supposed to negotiate its environment in real time, efficiency is a major concern. Therefore, it cannot afford expensive preprocessing of the observations, but has to make do with a straightforward segmentation approach.

We develop a landmark detection system for a mobile robot. The segmentation method that we use is based on the detection of colored objects. The robot gathers images of interesting objects from different distances and perspectives. These observations are labeled and given as a sample to a learning algorithm implemented by Support Vector Machines (SVMs) [11, 4] for classifier formation. The output classifiers are used to classify subsequent observations into the landmark categories. We present initial experiments with a real robot platform.

SVMs have been successfully applied to image [6, 2] and hand-written digit [3, 11] recognition. Our approach differs from these studies in many respects. In our work, segmentation and object recognition use only the color values of pixels. The approach of Chapelle et al. [2] also uses color and luminance information, but it concerns the recognition of whole images from color histograms. We normalize image segments into 32×32 bitmaps, which are given to an SVM as a vector without any further information attached. In [6] a similar representation of images was used together with the grey-level value of each pixel. However, segmentation was not taken into account at all; the images were perfectly segmented objects.

Section 2 of this paper discusses the problem setting in visual object recognition and some methods suggested for it. In Section 3 we briefly describe the segmentation approach used in our experiments. An overview of SVMs is given in Section 4. Empirical experiments with the landmark detection system are presented in Section 5. Finally, we give the concluding remarks of this study and outline some future directions.

* Work supported by Helsinki Graduate School in Computer Science and Engineering.
† Work supported by the Univ. of Helsinki project 20562: "Intelligent Control of an Autonomous Robot".
2 The Problem Setting

In real-world environments the segmentation problem cannot be reduced to any simple criterion that could be evaluated separately for every disjoint part of the image [9]. Consider, for example, an image containing a small spherical area with several flowers in it. One can argue that there should be only two segments: the implicit "sphere" of flowers and the background. Still, a segmentation with each flower as its own segment is hardly wrong.

Methods that handle such cases in a principled way are computationally heavy. For instance, Shi and Malik [9] manage to make the normalized cut problem tractable by approximating it with a generalized eigensystem problem. Nevertheless, the computation still takes several minutes for a single low-resolution image, even with additional heuristics and a specialized eigensolver. Other apparently realistic and fairly general approaches, such as the one in [5], also seem to require quite heavy computations.

As sophisticated segmentation seems too expensive for real-time robotics, one often resorts to environmental engineering that enables the use of strong prior knowledge about the objects. For example, color coding of the objects eases the problem by several orders of magnitude. If we are satisfied with the assumption that adjacent pixels having the same color belong to the same segment, we can find the segments by repeatedly joining two pixel subsets that contain adjacent pixels of the same color. There are, of course, more advanced variations of this color-based region-growing approach [1]. In any case, the segmentation problem disappears when we have a segmentation rule that can be evaluated locally for each pixel.

After image segmentation, the robot has to classify the segments in order to obtain an "understanding" of the image content, i.e., to determine whether any significant objects are present and where they are relative to the robot. The class, if any, must be determined by a function of features calculated from the raw pixel data of the segment.

The purpose of the features is to reduce the dimensionality of the data without any significant loss of information. Finding such general-purpose features is difficult, since the value of information is task specific and depends heavily on the types of objects allowed. It is tempting to resort to environmental engineering by disallowing all objects that cannot be classified well with the chosen set of features. In combination with a simple segmentation system, the use of simple features may work well in some specialized environments.

We propose to avoid the feature selection problem by using SVMs, which can circumvent the curse of dimensionality even with small samples of data. Recently there have been interesting results in the application of SVMs to image recognition [6, 2, 8]. These results are not directly applicable to robotics, because they do not address the segmentation problem, even though in reality it is tightly coupled with the recognition problem. Moreover, we cannot assume that either whole objects or no objects at all are seen; the robot must be able to cope with partially visible objects as well.

In a model-based approach a library of 3D object models is available, and the aim is to find the one whose planar projection resembles some segment of the image. Projection fitting is computationally expensive. Moreover, in navigation it suffices to recognize the objects from several viewpoints; it is not necessary to know how those viewpoints relate to each other. We only need to know the class and the approximate position of the object; its orientation with respect to the robot is of no consequence as long as we can detect orientation from the relative positions of several visible objects.

We choose a view-based approach [8] and record multiple images of all interesting objects from various viewpoints. We also use different instances of each object class. The resulting images are then normalized into 1024-dimensional binary vectors, which are essentially low-resolution images. These raw vectors are used to train one SVM for each object class, which then acts as a classifier for its class.

Our experiments show that the collection of classifiers almost always gives reliable and unambiguous results. The very small number of false positives is especially important; in realistic environments the vast majority of objects do not belong to any class. In cases of ambiguity, we propose that the robot should use active exploration to resolve it, i.e., get closer to take another look. The 1-against-all selection used here is sometimes considered inferior to 1-against-1 tournament selection [6, 8], but this does not apply in our problem domain: tournament selection always picks a winner, whereas the possibility of no winner is crucial in our domain.

3 Segmentation

We use a primitive color-based segmentation in which all approximately red objects are extracted from the image by making adjacent red pixels belong to the same set. No further clustering is done: if nearby red areas do not touch, they remain permanently separate segments. At the training stage, the largest red segment exceeding a predefined size threshold (say, 50 pixels) is normalized into a 32×32 binary matrix, hence discarding the intensities and precise color values. For the SVMs the manually classified matrices are simply 1024-dimensional […] real applications all visible segments must be […]

4 Support Vector Learning

SVMs are a fairly new class of machine learning algorithms [3, 11]. They fall under the category of kernel-based methods [4]. Characteristic properties of SVMs are that they lack local minima, have a sparse solution, and are dimension independent. Taken together, these properties make SVMs an attractive approach for applications such as machine vision.

4.1 Background

Consider the following binary classification task. We are given a sample of $k$ training examples $S = ((x_1, y_1), \ldots, (x_k, y_k))$, where each $x_i \in X$, $X$ is the input space, and each label $y_i \in \{-1, +1\}$. The task is to formulate a hypothesis on the basis of the sample for classifying further instances from the input space. The hypotheses that we consider are linear separators, i.e., hyperplanes.

The classical perceptron algorithm is guaranteed to converge in a finite number of iterations provided that the input binary sample is linearly separable. The perceptron is an example of a linear machine; it outputs a linear function of the input instance $z$:

$$f(z) = \langle w \cdot z \rangle + b = \sum_{i=1}^{n} w_i z_i + b,$$

where $w = (w_1, \ldots, w_n)$ is a weight vector, […]
$$f(z) = \sum_{i=1}^{N} w_i \phi_i(z) + b.$$

[…]

$$f(z) = \sum_{i=1}^{k} \alpha_i y_i \langle \phi(x_i) \cdot \phi(z) \rangle + b.$$

[…] a kernel $K$ has $K(x, z) = \langle \phi(x) \cdot \phi(z) \rangle$, where $\phi : X \to F$. For the detailed requirements of kernel functions we refer the reader to the SVM literature [11, 4].

Figure 1: Nomad Super Scout II, a small mobile robot equipped with a color video camera.

[…] Then the decision rule given by the sign of

$$f(z) = \sum_{i=1}^{k} \alpha_i y_i K(x_i, z) + b$$

[…] They are called support vectors. Let $\mathrm{SV}$ denote the set of support vectors in $S$. The maximal margin hyperplane has geometric margin

$$\gamma = \Big( \sum_{i \in \mathrm{SV}} \alpha_i \Big)^{-1/2}.$$
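The perceptron named in Section 4.1 can be illustrated with a short Python sketch. It is not code from the system described here: the function names, the learning rate `eta`, the epoch cap `max_epochs`, and the toy data are assumptions, and the bias $b$ is updated as if it were the weight of a constant input of 1.

```python
def perceptron(samples, labels, eta=1.0, max_epochs=100):
    """Primal perceptron for labels in {-1, +1}.

    Returns (w, b, converged); converged is True once a full pass over
    the sample makes no mistakes, which must eventually happen when the
    sample is linearly separable.
    """
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in zip(samples, labels):
            # Linear machine: f(z) = <w . z> + b
            f = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * f <= 0:  # misclassified, or exactly on the hyperplane
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
                b += eta * y  # bias acts as the weight of a constant input 1
                mistakes += 1
        if mistakes == 0:
            return w, b, True
    return w, b, False


def predict(w, b, z):
    """Decision given by the sign of f(z) = <w . z> + b."""
    return 1 if sum(wi * zi for wi, zi in zip(w, z)) + b > 0 else -1
```

On a sample that is not linearly separable, such as the XOR pattern, no pass is mistake-free, so the loop stops at `max_epochs` and reports `converged = False`; this is why the convergence guarantee is stated only for separable samples.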
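The dual decision rule of Section 4.1 and the 1-against-all selection of Section 2 can be combined in a small Python sketch. To stay self-contained it swaps in a kernel perceptron for the SVM solver: the decision function has the same form, $f(z) = \sum_i \alpha_i y_i K(x_i, z) + b$, but the $\alpha_i$ are mistake counts rather than the maximal-margin solution. The Gaussian kernel, its `gamma`, and all function names are assumptions; a `None` class label stands for background segments used only as negative examples.

```python
import math


def rbf_kernel(x, z, gamma=0.5):
    """Gaussian kernel K(x, z) = exp(-gamma * ||x - z||^2) (an assumed choice)."""
    return math.exp(-gamma * sum((xi - zi) ** 2 for xi, zi in zip(x, z)))


def train_dual(samples, labels, kernel, max_epochs=100):
    """Kernel perceptron in dual form; a stand-in for an SVM solver.

    Returns coefficients alpha_i >= 0 and a bias b for
    f(z) = sum_i alpha_i * y_i * K(x_i, z) + b.
    """
    k = len(samples)
    gram = [[kernel(xi, xj) for xj in samples] for xi in samples]
    alpha, b = [0] * k, 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for i in range(k):
            f = sum(alpha[j] * labels[j] * gram[j][i] for j in range(k)) + b
            if labels[i] * f <= 0:  # x_i is misclassified
                alpha[i] += 1
                b += labels[i]
                mistakes += 1
        if mistakes == 0:
            break
    return alpha, b


def decision(machine, z, kernel):
    """f(z) = sum_i alpha_i y_i K(x_i, z) + b; only terms with alpha_i > 0 count."""
    samples, labels, alpha, b = machine
    return sum(a * y * kernel(x, z)
               for x, y, a in zip(samples, labels, alpha) if a > 0) + b


def train_one_against_all(train_set, kernel):
    """One binary machine per class: that class is +1, everything else -1.

    train_set holds (vector, class_name) pairs; class_name None marks a
    background segment that belongs to no landmark class.
    """
    samples = [x for x, _ in train_set]
    classes = sorted({c for _, c in train_set if c is not None})
    machines = {}
    for c in classes:
        labels = [1 if name == c else -1 for _, name in train_set]
        alpha, b = train_dual(samples, labels, kernel)
        machines[c] = (samples, labels, alpha, b)
    return machines


def decide(scores):
    """1-against-all selection that allows 'no winner'."""
    fired = sorted(c for c, f in scores.items() if f > 0)
    if not fired:
        return ("none", [])          # not a landmark
    if len(fired) == 1:
        return ("landmark", fired)
    return ("ambiguous", fired)      # e.g. move closer and look again


def classify(machines, z, kernel):
    return decide({c: decision(m, z, kernel) for c, m in machines.items()})
```

Because each class machine is thresholded independently, the rule can answer "none" when no machine fires and "ambiguous" when several do, outcomes that a 1-against-1 tournament does not produce.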
Figure 4: Observations of a football.

[…] caused by one particular recording having some shape in common with the letter R.

SVMs can be used to recognize easily extractable landmarks. In robot navigation based on positioned landmarks, using simple color-based segmentation and SVMs seems a viable alternative to ad hoc feature-based methods.

5.2 Real-World Object Recognition

Real-world objects differ dramatically from engineered landmarks. They are usually three-dimensional and thus look very different from varying view angles. They also have varying coloring and surfaces that reflect light unevenly from different angles. Our test data contained images of five real-world objects: a dotted coffee cup, a dotted football, two beverage cans of different sizes, and one red Christmas figure (see Fig. 3).

For real-world objects only a few observations are ideal segmentations of the object. Our simple segmentation method falls apart when faced with more complicated objects. Variation between recordings of the same object is often large. It is particularly large in the segmented images of the football, as illustrated in Fig. 4; it is hard to think of any features common to all these observations. This further illustrates the strength of SVMs over feature-based methods in real-world object recognition.

The results of this test set are given in Table 2. Again, most of the objects are recognized with high accuracy; a true positive rate above 95% is achieved for all objects except the football. There is an increase in the number of false positives compared to the character recognition task.

There is a dramatic rise in the number of false negatives for the football. However, the number of false positives stays within acceptable limits. The reflecting properties of the ball surface are, of course, as uneven as possible, and the black dots on the red ball further interfere with the segmentation.

Although we used a very primitive segmentation method, the SVMs classified real-world observations with high accuracy. Provided that there are enough examples of different typical segmentations of an object, SVMs can overcome some of the problems involved in segmentation.

These results are very encouraging. A robot can be taught to recognize natural landmarks, e.g., doors, in its environment. In exploration these landmarks can be used to build a map of the environment. Furthermore, it seems that with SVMs the robot could easily be taught to recognize task-specific objects.

5.3 Performance Issues

Training and classification were run off-board for practical reasons. The speed of classification is adequate for our needs; in fact, it was one of the main reasons why we turned our attention to SVMs. Although our segmentation method is quite fast, it still constitutes the time-consumption bottleneck of the system.
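To make the per-frame segmentation work concrete, the following Python sketch implements color-based region growing of the kind described in Section 3: adjacent red pixels are joined with a breadth-first flood fill, and the largest segment above the 50-pixel threshold is scaled to a 32×32 binary bitmap and flattened into a 1024-element vector. It visits each pixel a bounded number of times, so its cost is linear in the image size, but that cost is paid for every pixel of every frame. The red test thresholds in `is_red`, 4-connectivity, nearest-neighbour resampling of the bounding box, and all names are assumptions; the text does not specify them.

```python
from collections import deque

SIZE = 32        # bitmap side length; SIZE * SIZE = 1024 features
MIN_PIXELS = 50  # size threshold ("say 50 pixels" in Section 3)


def is_red(pixel):
    """Hypothetical 'approximately red' test on an (R, G, B) triple."""
    r, g, b = pixel
    return r > 150 and g < 100 and b < 100


def red_segments(image):
    """Join 4-adjacent red pixels into segments with a flood fill.

    image is a list of rows of (R, G, B) tuples; each returned segment is
    a list of (row, col) coordinates.
    """
    h, w = len(image), len(image[0])
    seen = [[False] * w for _ in range(h)]
    segments = []
    for r0 in range(h):
        for c0 in range(w):
            if seen[r0][c0] or not is_red(image[r0][c0]):
                continue
            seen[r0][c0] = True
            queue, segment = deque([(r0, c0)]), []
            while queue:
                r, c = queue.popleft()
                segment.append((r, c))
                for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                    inside = 0 <= nr < h and 0 <= nc < w
                    if inside and not seen[nr][nc] and is_red(image[nr][nc]):
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            segments.append(segment)
    return segments


def normalize(segment, size=SIZE):
    """Scale the segment's bounding box to a size x size binary bitmap and
    flatten it row by row; intensities and exact colors are discarded."""
    rows = [r for r, _ in segment]
    cols = [c for _, c in segment]
    top, left = min(rows), min(cols)
    height, width = max(rows) - top + 1, max(cols) - left + 1
    members = set(segment)
    # Nearest-neighbour sampling of the bounding box (an assumption).
    return [
        1 if (top + i * height // size, left + j * width // size) in members else 0
        for i in range(size)
        for j in range(size)
    ]


def training_vector(image, min_pixels=MIN_PIXELS):
    """Training-stage rule: normalize the largest red segment if it is big enough."""
    segments = red_segments(image)
    if not segments:
        return None
    largest = max(segments, key=len)
    return normalize(largest) if len(largest) >= min_pixels else None
```

Stretching the bounding box to a square discards the segment's aspect ratio; that is one possible reading of "normalized into a 32×32 binary matrix", not necessarily the one used in the experiments.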
Our initial tests show that the number of positive training examples does not affect the accuracy of classification very much. As long as there are a couple of representative examples, classification of positive examples performs well. However, the number of negative training examples affects the number of false positive classifications. This suggests that when training SVMs for a particular real-world object, one could use random segments from real-world video images as negative examples.

6 Conclusion

Our robot landmark detection system is based on learning interesting objects from labeled examples. It does not require expensive preprocessing of the images to work with excellent accuracy. In contrast to systems that assume perfect segmentation, perfect classification cannot be expected in real-time robotics.

We should test the landmark detection system in topological mapping. Our sonar-based systems have been very brittle due to their limited ability to perceive detail and their inability to detect moving objects, or even closed doors, reliably. With vision, localization can be based more on immediate perceptions and less on the experienced history of perceptions. Thus, there is less need for world or movement models to tie past perceptions together.

In sonar-based perception, places cannot be recognized without recording an extensive history of past movements and perceptions. This leads to serious trouble as the role of odometric error grows. In mapping and navigation, world and movement models become a necessity, leading to complex and computationally expensive solutions such as the one in [10]. That is, it seems necessary to consider all possible robot poses all the time instead of making periodic corrections based on map correlation.

References

[1] James Bruce, Tucker Balch, and Manuela Veloso. Fast and inexpensive color image segmentation for interactive robots. In IEEE/RSJ International Conf. on Intelligent Robots and Systems. IEEE, 2000.
[2] Olivier Chapelle, Patrick Haffner, and Vladimir Vapnik. Support vector machines for histogram-based image classification. IEEE Transactions on Neural Networks, 10(5): 1055–1064, 1999.
[3] Corinna Cortes and Vladimir N. Vapnik. Support vector networks. Machine Learning, 20(3): 273–297, 1995.
[4] Nello Cristianini and John Shawe-Taylor. An Introduction to Support Vector Machines and Other Kernel-Based Methods. Cambridge University Press, 2000.
[5] Thomas Hofmann, Jan Puzicha, and Joachim M. Buhmann. Unsupervised texture segmentation in a deterministic annealing framework. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(8): 803–818, 1998.
[6] Massimiliano Pontil and Alessandro Verri. Support vector machines for 3D object recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(6): 637–646, 1998.
[7] Ryan Rifkin. SvmFu Software. Massachusetts Institute of Technology, 2000. http://five-percent-nation.mit.edu/SvmFu/.
[8] Danny Roobaert and Marc M. van Hulle. View-based 3D object recognition with support vector machines. In Proc. 1999 IEEE International Workshop on Neural Networks for Signal Processing (pp. 77–84). IEEE, 1999.
[9] Jianbo Shi and Jitendra Malik. Normalized cuts and image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(8): 888–905, 2000.
[10] Sebastian Thrun, Wolfram Burgard, and Dieter Fox. A probabilistic approach to concurrent mapping and localization for mobile robots. Machine Learning, 31(1–3): 29–53, 1998.
[11] Vladimir N. Vapnik. Statistical Learning Theory. John Wiley & Sons, 1998.