
Support Vector Classification of Landmarks
for a Mobile Robot

Ilkka Autio*   Tapio Elomaa   Teemu Kurppa†
Department of Computer Science, P. O. Box 26,
FIN-00014 University of Helsinki, Finland

* Work supported by the Helsinki Graduate School in Computer Science and Engineering.
† Work supported by the Univ. of Helsinki project 20562: "Intelligent Control of an Autonomous Robot".

Abstract

Mobile robot navigation is often based on the use of landmarks distributed in the operation environment of the robot. They can be simplistic and engineered to ease the detection but, ideally, a robot should only use landmarks that are naturally found in its environment. We present an approach in which a robot learns to recognize landmarks based on their silhouettes, which can be essentially arbitrary and depend on the viewpoint of the robot. While the problems of real-time image segmentation force us to use less than completely natural landmarks, the approach still offers improvements over techniques that only take into account simplistic features of landmarks. Empirical evaluation with a real robot demonstrates that the detection times are viable for real-time applications.

1 Introduction

Visual object recognition typically requires two more or less distinct steps: in the image segmentation the object boundaries are estimated, and in the recognition the segments are classified. For a mobile robot, which is supposed to negotiate its environment in real time, efficiency is a major concern. Therefore, it cannot afford to do expensive preprocessing of the observations, but has to be content with a straightforward segmentation approach.

We develop a landmark detection system for a mobile robot. The segmentation method that we use is based on detection of colored objects. The robot gathers images of interesting objects from different distances and perspectives. These observations are labeled and given as a sample to a learning algorithm implemented by Support Vector Machines (or SVMs) [11, 4] for classifier formation. The output classifiers are used to classify subsequent observations into the landmark categories. We present initial experiments with a real robot platform.

SVMs have been successfully applied to image [6, 2] and hand-written digit [3, 11] recognition. Our approach differs from these studies in many respects. In our work segmentation and object recognition use only the color values of pixels. The approach of Chapelle et al. [2] is also based on using color and luminance information, but concerns recognition of whole images from color histograms. We segment images as 32 × 32 bitmaps, which are given to a SVM as a vector, without assigning any further information to the data. In [6] a similar representation of images was used together with the grey-level information of each pixel. However, segmentation was not taken into account at all; the images were perfectly segmented objects.

Section 2 of this paper discusses the problem setting in visual object recognition and some suggested methods for it. In Section 3 we briefly describe the segmentation approach used in our experiments. An overview of SVMs is given in Section 4. Empirical experiments with the landmark detection system are presented in Section 5. Finally, we give the concluding remarks of this study and outline some future directions.
2 The Problem Setting

In real-world environments the segmentation problem cannot be reduced to any simple criteria that could be evaluated separately for every disjoint part of the image [9]. Consider, for example, an image containing a small spherical area with several flowers in it. One can argue that there should be only two segments: the implicit "sphere" of flowers and the background. Still, a segmentation with each flower as its own segment is hardly wrong.

For instance, Shi and Malik [9] manage to make the normalized cut problem tractable by approximating it with a generalized eigensystem problem. Nevertheless, the computation still takes several minutes for a single low-resolution image. Computation is slow even with additional heuristics and a specialized eigensolver. Other apparently realistic and fairly general approaches, such as the one in [5], also seem to require quite heavy computations.

As sophisticated segmentation seems too expensive for real-time robotics, one often resorts to environmental engineering that enables the use of strong prior knowledge about the objects. For example, color coding of the objects eases the problem by several orders of magnitude. If we are satisfied with the assumption that adjacent pixels having the same color belong to the same segment, we can find the segments by repeatedly joining two pixel subsets that contain adjacent pixels of the same color. There are, of course, more advanced variations of this color-based region-growing approach [1]. In any case, the segmentation problem disappears when we have a segmentation rule that can be evaluated for each pixel locally.

After image segmentation, the robot has to classify the segments in order to obtain an "understanding" of image content; i.e., to determine if there are any significant objects present and where they are relative to the robot. The class, if any, must be determined by a function of the features that are calculated from the raw pixel data of the segment.

The purpose of the features is to reduce the dimensionality of the data without any significant loss of information. Finding such general-purpose features is difficult, since the value of information is task specific and depends heavily on the types of objects allowed. It is tempting to resort to environmental engineering by disallowing all objects that cannot be classified well with the chosen set of features. In combination with a simple segmentation system, the use of simple features may work well in some specialized environments.

We propose to avoid the feature selection problem by using SVMs, which can circumvent the curse of dimensionality with small samples of data. Recently there have been interesting results in the application of SVMs to image recognition [6, 2, 8]. These results are not directly applicable to robotics, because they do not address the segmentation problem, even though it is in reality tightly coupled with the recognition problem. Moreover, we cannot assume that whole objects or no objects at all are always seen; the robot must be able to cope with partially visible objects as well.

In a model-based approach a library of 3D object models is available and the aim is to find the one which has a planar projection resembling some segment of the image. Projection fitting is computationally expensive. Also, in navigation it suffices to recognize the objects from several viewpoints, but it is not necessary to know how those viewpoints relate to each other. We only need to know the class and the approximate position of the object; its orientation with respect to the robot is of no consequence as long as we can detect orientation from the relative positions of several visible objects.

We choose a view-based approach [8] and record multiple images of all interesting objects from various viewpoints. We also use different instances of the object class. The resulting images are then normalized into 1024-dimensional binary vectors, which are essentially low-resolution images. These raw vectors are used to train one SVM for each object class, which then acts as a classifier for its class.
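As an illustration of this normalization step, the following minimal sketch (assuming NumPy; the crop-to-bounding-box and nearest-neighbour resampling details are our assumptions, since the exact resampling procedure is not specified) turns a binary segment mask into a 32 × 32 matrix and flattens it into a 1024-dimensional binary vector:

    import numpy as np

    def segment_to_vector(mask, size=32):
        # Normalize one (non-empty) binary segment mask into a size x size
        # binary matrix and flatten it into a (size*size)-dimensional vector.
        ys, xs = np.nonzero(mask)                      # pixels of the segment
        crop = mask[ys.min():ys.max() + 1, xs.min():xs.max() + 1]
        # Nearest-neighbour resampling of the bounding box to size x size.
        rows = np.arange(size) * crop.shape[0] // size
        cols = np.arange(size) * crop.shape[1] // size
        return crop[np.ix_(rows, cols)].astype(np.uint8).ravel()

With size = 32 the result is the 1024-dimensional representation used throughout the rest of the paper.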
Our experiments show that the collection of classifiers almost always gives reliable and unambiguous results. Especially the very small number of false positives is important; in realistic environments the vast majority of objects do not belong to any class. In cases of ambiguity, we propose that the robot should use active exploration to resolve it; i.e., get closer to take another look. The 1-against-all selection used here is sometimes seen as inferior to 1-against-1 tournament selection [6, 8], but this does not apply in our problem domain: tournament selection always picks a winner, whereas the possibility of no winner is crucial in our domain.

3 Segmentation

We use a primitive color-based segmentation in which all approximately red objects are extracted from the image by making adjacent red pixels belong to the same set. No further clustering is done; if close-by red areas do not touch, they remain permanently separate segments. At the training stage the largest red segment exceeding a predefined threshold (say, 50 pixels) is normalized into a 32 × 32 binary matrix, hence discarding the intensities and precise color values. For the SVMs the manually classified matrices are simply 1024-dimensional binary vectors. The choice of the largest segment is simply a convenience; in real applications all visible segments must be classified separately.

In this color-based segmentation scheme, partially visible objects can be handled in a trivial manner. If the bounding box of a segment touches the outer edges of the camera image, the robot can either discard the segment altogether or turn the camera in the appropriate direction after classifying all the certainly visible objects.

We do not try to build the best possible color-based segmentation system here. The roughness of the segmentations highlights the robustness of the SVM classification results in the later sections. Close-to-perfect segmentation algorithms are not necessary for SVM-based landmark detection to work.

4 Support Vector Learning

SVMs are a fairly new class of machine learning algorithms [3, 11]. They fall under the category of kernel-based methods [4]. Characteristic properties of SVMs are that they lack local minima, have a sparse solution, and are dimension independent. Taken together, all this makes SVMs an attractive approach to use in applications such as machine vision.

4.1 Background

Consider the following binary classification task. We are given a sample of k training examples S = ((x_1, y_1), ..., (x_k, y_k)), where each x_i ∈ X and y_i ∈ {-1, 1}. The n-dimensional space X is the input space. The task is to formulate a hypothesis on the basis of the sample for classifying further instances from the input space. The hypotheses that we consider are linear separators, i.e., hyperplanes.

The classical perceptron algorithm is guaranteed to converge in a finite number of iterations provided that the input sample is linearly separable. The perceptron is an example of a linear machine; it outputs a linear function of the input instance z:

    f(z) = ⟨w · z⟩ + b = Σ_{i=1}^{n} w_i z_i + b,

where w = (w_1, ..., w_n) is a weight vector, which together with the parameter b defines a hyperplane in the input space. It is learned from the training examples, and thus is a linear combination of them: w = Σ_{i=1}^{k} α_i y_i x_i. The number of iterations required to learn the hypothesis depends on the (geometric) margin of the training set, i.e., the largest value, over all separating hyperplanes, of the minimum Euclidean distance of the training instances from the hyperplane.

Rosenblatt's on-line, mistake-driven procedure for training a perceptron works by adding misclassified positive training examples to, or subtracting misclassified negative ones from, an initial zero weight vector. Hence, once a sample S has been fixed, the coefficient vector α can be thought of as an alternative representation of the hypothesis: the dual representation.
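As a concrete picture of this dual representation, a minimal sketch of the mistake-driven perceptron kept entirely in dual form might look as follows (assuming NumPy; this illustrates the dual representation only, not the SVM training procedure of Section 4.2):

    import numpy as np

    def dual_perceptron(X, y, epochs=10):
        # The hypothesis is stored as one coefficient alpha_i per training
        # example instead of an explicit weight vector w.
        k = len(y)
        alpha, b = np.zeros(k), 0.0
        for _ in range(epochs):
            for i in range(k):
                # f(x_i) = sum_j alpha_j y_j <x_j . x_i> + b
                f = np.sum(alpha * y * (X @ X[i])) + b
                if y[i] * f <= 0:        # mistake: strengthen example i
                    alpha[i] += 1.0
                    b += y[i]
        return alpha, b

Because every training point enters the computation only through the inner products ⟨x_j · x_i⟩, each of them can later be replaced by a kernel evaluation, which is exactly the substitution made below.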
An alternative learning scheme projects the data through a non-linear mapping to an N-dimensional feature space F, instead of operating on the input space. That is, there is some fixed nonlinear map φ : X → F. Usually the dimension of F is large compared to that of the input space. The hypotheses considered in F are again linear machines of the type

    f(z) = Σ_{i=1}^{N} w_i φ_i(z) + b.

Figure 1: Nomad Super Scout II, a small mobile robot equipped with a color video camera.

Operating in a very high-dimensional feature space is inefficient. However, since linear machines can be represented in a dual representation, the hypothesis can be expressed implicitly as a linear combination of the training points. The decision rule can then be evaluated using just inner products between the test point z and the training points x_1, ..., x_k:

    f(z) = Σ_{i=1}^{k} α_i y_i ⟨φ(x_i) · φ(z)⟩ + b.

To compute the feature space inner product ⟨φ(x_i) · φ(z)⟩ directly as a function of the input points we use kernel functions. For all x, z ∈ X a kernel K has K(x, z) = ⟨φ(x) · φ(z)⟩, where φ : X → F. For the detailed requirements of kernel functions we refer the reader to the SVM literature [11, 4].

4.2 Support Vector Machines

Vapnik and Chervonenkis' [11] theory of learning bounds the generalization error of linear machines in terms of the margin of the hypothesis with respect to the sample. This result does not depend on the dimensionality of the feature space. By enforcing conditions from optimization theory, the dual representation of the hypothesis is sparse and, hence, produces efficient algorithms.

Taken together, the basis of the maximal margin classifier is the following result from optimization theory. Given a sample S = ((x_1, y_1), ..., (x_k, y_k)) that is linearly separable in the feature space implicitly defined by the kernel K(x, z), suppose that the parameters α* and b* solve the following quadratic optimization problem: maximize

    W(α) = Σ_{i=1}^{k} α_i - (1/2) Σ_{i,j=1}^{k} y_i y_j α_i α_j ⟨x_i · x_j⟩

subject to the constraints Σ_{i=1}^{k} α_i y_i = 0 and α_i ≥ 0 for i = 1, ..., k. Here the inner products are evaluated in the feature space, i.e., ⟨x_i · x_j⟩ = K(x_i, x_j). Then the decision rule given by the sign of the function f(z) = Σ_{i=1}^{k} y_i α_i* ⟨x_i · z⟩ + b* is equivalent to the maximal margin hyperplane implicitly defined by the kernel K(x, z).

Only for the inputs x_i which lie closest to the hyperplane are the corresponding α_i* non-zero. These inputs are called support vectors. Let SV denote the set of support vectors in S. The maximal margin hyperplane has geometric margin γ = (Σ_{i∈SV} α_i*)^{-1/2}.

Technical properties of kernels ensure that the optimization problem is convex, which in turn means that the maximal margin optimization problem has a unique solution that can be found efficiently. There are no local minima.

Maximal margin classification requires the data to be linearly separable, which is not usually the case in noisy real-world data. Therefore, the strict requirement of linear separability has to be relaxed. The theory behind such machines has also been worked out [11, 4].
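To summarize Section 4, the following sketch evaluates the dual decision rule and the geometric margin. It assumes that the coefficients and the offset have already been obtained from some quadratic programming solver (not shown) and that the kernel is passed in as a function; all names are ours:

    import numpy as np

    def decision(z, sv_x, sv_y, sv_alpha, b, kernel):
        # f(z) = sum over support vectors of y_i alpha_i K(x_i, z) + b;
        # the remaining training points have alpha_i = 0 and can be dropped.
        f = sum(a * y * kernel(x, z)
                for x, y, a in zip(sv_x, sv_y, sv_alpha)) + b
        return np.sign(f)

    def geometric_margin(sv_alpha):
        # gamma = (sum of alpha_i over the support vectors) ** (-1/2)
        return 1.0 / np.sqrt(np.sum(sv_alpha))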
5 Empirical Evaluation

Our experiments consist of collecting images of different objects, extracting normalized matrices, and training and testing SVMs using the corresponding vectors. Images of objects were collected by a mobile robot with a vision system. They were captured under slightly varying lighting conditions from several viewpoints. The robotic equipment used in our empirical experiments is a small-sized robot with a color video camera and a pan-and-tilt unit mounted on top (see Fig. 1).

The segmentation method is quite fast, but produces rather rough results. From one object 20 to 40 different images were collected, and a SVM was trained using the corresponding vectors as positive examples and those of some other objects as negative examples. We also used a special class of non-object observations, which were formed of random color segments from real vision data. Five-fold cross-validation was used to bring variation to the training and test data. In selecting the vectors we ensured that examples from each class appeared in both the training and the test set.

The SVM used in these experiments is a publicly available C++ implementation [7]. The kernel function used is K(x, z) = ⟨x · z⟩^2. Hence, the 1024-bit examples are implicitly mapped to a feature space of dimensionality approximately one million. This kernel function was chosen more or less arbitrarily from earlier work [8]. The performance of SVMs is often thought to be highly sensitive to the selection of the kernel function. We obtained excellent results without paying much attention to the choice of the kernel. However, it had already been tested in an image recognition task.

Two sets of experiments are reported in the following. The first concerns the recognition of paper letters. Letters represent engineered landmarks easily extracted from the images. The other set concerns the recognition of real-world objects, which are harder to extract, but represent natural landmarks more closely. Landmarks of the first category are flat 2D objects, while those of the second category are 3D objects, which may look completely different when viewed from different sides or angles. Moreover, their reflecting properties change from point to point. Hence, e.g., the recorded intensities vary.

Figure 2: Observations of the number 8 and the letter A.

Table 1: Results of the 5-fold cross-validation letter recognition test. Column TP gives the percentage of correctly classified observations for the SVM in question. Column False Positive lists the percentages of different observations accepted by the machine.

    Class   TP    False Positive
    A        97   2 (None)
    B        94   1 (8)
    C       100
    D        98
    O        99
    P        95
    R        99   8 (None)
    S        97
    8        92

5.1 Letter Recognition

Letters cut from red paper were used as landmarks that can easily be extracted from the images. Although each extracted letter segment usually looks like the letter it represents, the actual form of the segment varies with the view angle the image was taken from. In some cases the simple segmentation method fails to maintain the whole character as one segment, and thus produces different results (see Fig. 2).

A SVM was trained for each letter using observations of the particular letter as positive examples, and the vectors of the other letters and non-objects as negative examples. Then each SVM was tested against examples of all letters and non-objects. The characters used in this test are shown in Table 1.
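This 1-against-all protocol can be sketched compactly. The sketch below uses scikit-learn as a stand-in for the SvmFu implementation actually used here (an assumption on our part); with degree 2, gamma = 1 and coef0 = 0, its polynomial kernel reduces to K(x, z) = ⟨x · z⟩^2, and only plain cross-validated accuracy is reported rather than the separate true and false positive rates of Table 1:

    import numpy as np
    from sklearn.svm import SVC
    from sklearn.model_selection import cross_val_score

    def one_vs_all_scores(X, labels, classes):
        # One SVM per class: positive = that class, negative = all other
        # letters and the non-object observations.
        scores = {}
        for c in classes:
            y = np.where(labels == c, 1, -1)
            clf = SVC(kernel="poly", degree=2, gamma=1.0, coef0=0.0)
            scores[c] = cross_val_score(clf, X, y, cv=5).mean()
        return scores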
Results for this test set are given in Table 1. Despite variation in the training data, test examples are classified almost perfectly. There are some false negatives in most of the classes, but the number of false positives is notably low. The only exception is the SVM for the letter R, which classified 8% of the non-object examples as positive. Many of these false positives were caused by one particular recording having some shape in common with the letter R.

SVMs can thus be used to recognize easily extractable landmarks. In robot navigation based on positioned landmarks, using simple color-based segmentation and SVMs seems a viable alternative to ad-hoc feature-based methods.

5.2 Real-World Object Recognition

Real-world objects differ dramatically from engineered landmarks. They are usually three-dimensional and, thus, look very different from varying view angles. They also have varying coloring and surfaces that reflect light unevenly from different angles. In our test data there were images of five real-world objects: a dotted coffee cup, a dotted football, two beverage cans of different sizes, and one red Christmas figure (see Fig. 3).

Figure 3: Observations of real-world objects: (a) coffee cup, (b) large can, (c) small can, (d) football, and (e) Christmas figure.

Figure 4: Observations of a football.

For real-world objects only few observations are ideal segmentations of the object. Our simple segmentation method falls apart when faced with more complicated objects. Variation between recordings of the same object is often large. It is particularly large in the segmented images of the football, as illustrated in Fig. 4; it is hard to think of any features common to all these observations. This further illustrates the strength of SVMs over feature-based methods in real-world object recognition.

Table 2: Results of the real-world object recognition test. False positive recordings are this time listed for all object classes and the non-object observations (in the order Cup / Ball / L. Can / S. Can / Pixie / None).

    SVM      TP   False Positive
    Cup      95   - / 2 / 0 / 1 / 0 / 8
    Ball     74   4 / - / 0 / 2 / 0 / 7
    L. Can   95   0 / 0 / - / 0 / 0 / 8
    S. Can   95   0 / 2 / 0 / - / 0 / 0
    Pixie    97   0 / 0 / 0 / 0 / - / 0

The results of this test set are given in Table 2. Again, most of the objects are recognized with high accuracy; over 95% true positive success rate is achieved for all objects except the football. There is an increase in the number of false positives when compared to the character recognition task.

There is a dramatic rise in the number of false negatives for the football. However, the number of false positives stays within acceptable limits. The reflecting properties of the ball surface are, of course, as uneven as possible. The black dots on the red ball further interfere with the segmentation.

Although we used a very primitive segmentation method, SVMs classified real-world observations with high accuracy. Provided that there are enough examples of the different typical segmentations of an object, SVMs can overcome some of the problems involved in segmentation.

These results are very encouraging. A robot can be taught to recognize natural landmarks (e.g., doors) in its environment. In exploration these landmarks can be used to build a map of the environment. Furthermore, it seems that with SVMs the robot could easily be taught to recognize task-specific objects.

5.3 Performance Issues

Training and classification were run off-board for practical reasons. The speed of classification is adequate for our needs, and it was actually one of the main reasons why we turned our attention to SVMs. Although our segmentation method is quite fast, it still constitutes the time-consumption bottleneck of the system.
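At run time the per-class machines are applied to every extracted segment. A minimal sketch of this decision step, including the "no winner" outcome argued for in Section 2 (function and variable names are ours), is:

    def classify_segment(vector, classifiers):
        # classifiers maps a class name to its trained decision function f.
        # A class is reported only when exactly one machine accepts the
        # segment; otherwise the segment is treated as unknown and the robot
        # may move closer to take another look (active exploration).
        accepted = [name for name, f in classifiers.items() if f(vector) > 0]
        return accepted[0] if len(accepted) == 1 else "unknown"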
Our initial tests show that the number of positive training examples does not affect the accuracy of classification very much. As long as there are a couple of representative examples, classification of positive examples performs well. However, the number of negative training examples affects the number of false positive classifications. This suggests that when training SVMs for a particular real-world object, one could use random segments from real-world video images as negative examples.

6 Conclusion

Our robot landmark detection system is based on learning interesting objects from labeled examples. It does not require expensive preprocessing of the images to work with excellent accuracy. Perfect classification, contrary to systems based on perfect segmentation, cannot be expected in real-time robotics.

We should test the landmark detection system in topological mapping. Our sonar-based systems have been very brittle due to the limited ability to perceive detail and the inability to detect moving objects, or even closed doors, reliably. With vision, localization can be based more on the immediate perceptions and less on the experienced history of perceptions. Thus, there is less need for any world or movement models to tie the past perceptions together.

In sonar-based perception, places cannot be recognized without recording an extensive history of past movements and perceptions. This leads to serious trouble as the role of odometric error becomes larger. In mapping and navigation, world and movement models become a necessity, leading to complex and computationally expensive solutions such as the one in [10]; i.e., it seems necessary to consider all possible robot poses all the time instead of making periodic corrections based on map correlation.

References

[1] James Bruce, Tucker Balch, and Manuela Veloso. Fast and inexpensive color image segmentation for interactive robots. In IEEE/RSJ International Conference on Intelligent Robots and Systems. IEEE, 2000.

[2] Olivier Chapelle, Patrick Haffner, and Vladimir Vapnik. Support vector machines for histogram-based image classification. IEEE Transactions on Neural Networks, 10(5): 1055-1064, 1999.

[3] Corinna Cortes and Vladimir N. Vapnik. Support-vector networks. Machine Learning, 20(3): 273-297, 1995.

[4] Nello Cristianini and John Shawe-Taylor. An Introduction to Support Vector Machines and Other Kernel-Based Methods. Cambridge University Press, 2000.

[5] Thomas Hofmann, Jan Puzicha, and Joachim M. Buhmann. Unsupervised texture segmentation in a deterministic annealing framework. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(8): 803-818, 1998.

[6] Massimiliano Pontil and Alessandro Verri. Support vector machines for 3D object recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(6): 637-646, 1998.

[7] Ryan Rifkin. SvmFu Software. Massachusetts Institute of Technology, 2000. http://five-percent-nation.mit.edu/SvmFu/.

[8] Danny Roobaert and Marc M. van Hulle. View-based 3D object recognition with support vector machines. In Proc. 1999 IEEE International Workshop on Neural Networks for Signal Processing, pp. 77-84. IEEE, 1999.

[9] Jianbo Shi and Jitendra Malik. Normalized cuts and image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(8): 888-905, 2000.

[10] Sebastian Thrun, Wolfram Burgard, and Dieter Fox. A probabilistic approach to concurrent mapping and localization for mobile robots. Machine Learning, 31(1-3): 29-53, 1998.

[11] Vladimir N. Vapnik. Statistical Learning Theory. John Wiley & Sons, 1998.
