
International Conference on Control, Automation and Systems 2008, Oct. 14-17, 2008 in COEX, Seoul, Korea

Vision Based Obstacle Detection for Wheeled Robots


Tilman Wekel, Olaf Kroll-Peters, Sahin Albayrak
DAI-Labor, Technische Universität Berlin, Faculty IV - Electrical Engineering and Computer Science
(Tel: +49-30-314-74104; E-mail: {tilman.wekel|olaf.kroll-peters|sahin.albayrak}@dai-labor.de)

Abstract: In this paper a real-time obstacle detection system for wheeled robots is presented. The technique is based on a single camera; no further sensors or encoders are required. The algorithm is independent of object geometry, and even moving objects can be detected. The system provides a top-view map of the robot's field of view in real time. First, the images are segmented reasonably. Ground motion estimation and stereo matching are then used to classify each segment as either belonging to the ground plane or belonging to an obstacle. The resulting map is used for further navigational processing such as obstacle avoidance routines, path planning, or static map creation. The algorithm has been tested successfully on several platforms.

Keywords: navigation, robotics, computer vision, obstacle detection.

1. INTRODUCTION
The integration of mobile devices into indoor environments is one of the most promising and challenging parts of robotics. Compared to stationary AI technologies, robots have many advantages. As mobile agents they are not only able to move around and find their way in an apartment, but also to interact with the environment through mounted effectors and kinematics. As a close and equal partner, a robot recognizes the inhabitants, provides customized support in everyday life, and maintains detailed information about each user. Equipped with a large knowledge base and various skills, a robot represents an independent individual. To take care of its duties and responsibilities, the robot must be able to navigate the environment autonomously.

Robots acting in indoor environments have to be aware of a large variety of dynamic obstacles, which are often unpredictable. Unlike map-based approaches, map-less navigation uses no explicit representation of the environment but concentrates on the recognition and pursuit of particular objects, such as obstacles or items the robot wants to interact with [1]. The robot is thus able to react to its local environment. Obstacle detection is closely related to map-less navigation methods: a moving robot has to be aware of the local situation and avoid areas that are impossible to traverse. The proposed system provides an accurate 2D map of the robot's field of view in order to control and navigate the robot without any interference with the environment. Compared to related work, our approach does not rely on any sensors beyond a single camera [2], [3]. There are several advantages to this approach: a simple camera is easy to handle and a low-cost product, whereas common approaches are often based on devices like laser range finders or stereo rigs, which are very expensive and whose setup and configuration are difficult to manage [4].

The paper is structured as follows: the next section describes the indoor scenario and states the necessary assumptions. The third section presents the actual approach and derives a mathematical expression for

elevation estimation. Section four takes a closer look at implementation details, while the evaluation results are discussed in section five. The presented approach is summarized with a brief conclusion in the last section.

2. PROBLEM FORMULATION
Consider a wheeled robot with a single camera in an indoor environment. In this approach the ground surface is assumed to be completely planar, which is usually satisfied in indoor environments [5]. The robot is able to access an area as long as there are no objects like walls, tables or chairs the robot could collide with. Every point which extrudes from the ground surface is defined as an obstacle. Figuratively speaking, all objects which are either above or below this planar surface need to be detected by the robot [3]. The system is vision based and no further sensor data is required.

A camera projection is a process in which one dimension is lost, and several images are needed to reconstruct three-dimensional structure. But even with the use of multiple images there is no unique relation between the camera and the real-world coordinate system; some prior knowledge is necessary to overcome the problem of ambiguity. The planar world assumption induces a homographic relation between projected ground points within two images [6]. With regard to typical indoor applications, it is assumed that the robot is not able to fly. Consequently, there is no change in height: the planar ground surface defines the (z = 0) plane. Furthermore, the vehicle has only one rotation axis, which is orthogonal to the ground plane. In total, only three degrees of freedom are left. The position of the camera is fixed relative to the ground plane and can be estimated by extrinsic camera calibration.

The discussed preconditions and assumptions lead to the mathematical model of the problem shown in Fig. 1. The situation is shown from a side view for clarity. The camera is represented by its center c and an image plane. The origin of the world coordinate system is defined by the


intersection of the perpendicular from the camera center to the ground plane.

Fig. 1 Technical setup (planar translation and z-rotation of the vehicle; x, y, z world coordinate system on the ground plane; obstacles)

3. APPROACH TO DETECT OBSTACLES


In order to extract information about elevation, a parameter describing the height of an object needs to be derived from the available image data. First, the planar world model is presented, as well as the inverse projection technique. Then the so-called zero disparity and its relation to the height of objects relative to the ground plane are identified. Finally, the obstacle map can be computed through the inverse projection transformation [7].

3.1 Inverse Projection

A camera transforms a 3D point into its 2D correspondence on an image plane. One dimension is lost, and thus it is not possible to reverse this mathematical operation in general. However, if all points in the scene lie on a plane and have a Z component equal to zero, the projection equation is:

x = P X = K R [I | −C̃] X   (1)

The camera is located above the origin of the world coordinate system and has identical orientation, so the rotation component equals the identity matrix; with R = I and t = −C̃ = (x_t, y_t, z_t)^T:

x = [f 0 p_x; 0 f p_y; 0 0 1] [1 0 0 x_t; 0 1 0 y_t; 0 0 1 z_t] (X, Y, Z=0, 1)^T   (2)

Since Z = 0, the third column becomes irrelevant and can be removed; the projective transformation degenerates to a (3,3)-matrix:

x = [f 0 (f x_t + p_x z_t); 0 f (f y_t + p_y z_t); 0 0 z_t] (X, Y, 1)^T   (3)

The resulting matrix is capable of projecting a 3D point belonging to the planar surface (X, Y, Z=0) onto an image plane. In order to reverse this process, the inverse matrix P^−1 needs to be computed. There are several criteria for invertibility, but the determinant is the easiest to test: the matrix P is invertible if det(P) ≠ 0. The determinant of P is composed as follows:

det(P) = f^2 z_t   (4)

Obviously the matrix is invertible unless the z-translation component z_t is zero. A typical image acquired from the robot's camera system is shown in Fig. 2; the adjacent image is the result of the inverse projection. The chessboard is only necessary for the initial calibration and is dispensable in later operation. The square-shaped structure of the tiles on the ground is recovered properly.

Fig. 2 Inverse projection of ground surface
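As an illustration of Eqs. (1)-(4), the following minimal Python sketch (not part of the original paper; all numeric values are illustrative) builds the degenerate 3×3 ground-plane projection and re-projects an image pixel onto the plane:

```python
import numpy as np

# Illustrative intrinsics and world-to-camera translation (not from the paper).
f, p_x, p_y = 500.0, 320.0, 240.0
x_t, y_t, z_t = 0.0, 0.0, 1.2

K = np.array([[f, 0.0, p_x],
              [0.0, f, p_y],
              [0.0, 0.0, 1.0]])

# Planar world (Z = 0): drop the third column of [I | t], as in Eq. (3).
P = K @ np.array([[1.0, 0.0, x_t],
                  [0.0, 1.0, y_t],
                  [0.0, 0.0, z_t]])

# Eq. (4): det(P) = f^2 * z_t, so P is invertible as long as z_t != 0.
assert not np.isclose(np.linalg.det(P), 0.0)
P_inv = np.linalg.inv(P)

def image_to_ground(u, v):
    """Inverse projection: map an image pixel to ground coordinates (X, Y)."""
    X_h = P_inv @ np.array([u, v, 1.0])
    return X_h[:2] / X_h[2]

print(image_to_ground(320.0, 400.0))
```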


3.2 World Plane Homography

A multiple-view perspective is motivated by obtaining data for scene reconstruction tasks. A stereo setup can either be a stereo rig or a single camera acquiring two images sequentially. The mathematical background is quite similar, since a moving single camera taking two images at different positions can be modeled as a simple stereo rig. Fig. 3 describes a simple stereo camera setup. This setup is motivated by the problem of searching for corresponding pixels between the two images.

Fig. 3 Epipolar geometry in context of a world plane (world plane point X; image planes π, π′ with projections x, x′; epipoles e, e′; camera centers c, c′; homography H)

The setup consists of two cameras, represented by their image planes (π, π′) and their centers c and c′. The world point X is visible to both cameras, and its projections onto the image planes are described by rays. It is assumed that all points in the scene lie on a planar surface called the world plane. The problem of re-projection then degenerates to the question of how the projection rays intersect this plane. The situation is easier to handle, and there is a linear relation between a plane point X and its projection x, which is derived in the following.

1588
Authorized licensed use limited to: University of Missouri Libraries. Downloaded on September 3, 2009 at 15:15 from IEEE Xplore. Restrictions apply.

Consider a stereo setup and corresponding image planes as shown above. There is a linear transformation between x and x′. This correspondence is defined by a homography H, which is derived in the following: first the point x is re-projected onto the world plane, then it is projected onto the second image plane.

x′ = P′ P^−1 x   (5)

The camera projection matrix P describes the internal and external parameters, representing a unique relation between a real-world point X and its projection x. It consists of several components and can be decomposed as follows:

P = K [R | t],   t = −R C̃   (6)

Now the stereo rig model is replaced by a single moving camera taking images at different positions. For the sake of simplicity the world origin is located somewhere on the ground plane, but it never changes its relative position and orientation to the camera; pictorially speaking, the ground plane moves instead of the camera. Only one camera is used, and its position and orientation relative to the introduced coordinate system do not change. This leads to identical camera matrices:

P = P′ = K [R | −C̃]   (7)

The degrees of freedom of the camera motion are reduced to three by the degenerate motion constraints mentioned in the introduction. The motion can be described by a rotation and a translation component:

T = [R | t]   (8)

R = [cos θ  −sin θ  0;  sin θ  cos θ  0;  0  0  1],   t = (x, y, 0)^T   (9)

The linear transformation x → x′ can now be decomposed into an inverse projection, a transformation of the resulting ground points, and a projection onto the image plane of the second camera:

x′ = P T P^−1 x = H x   (10)
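Eq. (10) can be composed directly. A minimal sketch (assuming the 3×3 planar camera matrix P of Eq. (3), with the planar motion of Eqs. (8)-(9) written as a homogeneous 3×3 transform; names are illustrative):

```python
import numpy as np

def ground_homography(P, theta, tx, ty):
    """Compose H = P T P^-1 of Eq. (10): the image-to-image homography
    induced by a planar motion (theta, tx, ty) of the ground plane."""
    T = np.array([[np.cos(theta), -np.sin(theta), tx],
                  [np.sin(theta),  np.cos(theta), ty],
                  [0.0, 0.0, 1.0]])
    return P @ T @ np.linalg.inv(P)
```

Applying H to a pixel of the first image predicts where the corresponding ground point appears in the second image; the prediction fails exactly for points that do not lie on the ground plane, which is what Section 3.3 exploits.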

Before the points are translated and projected, they need to be re-projected, which is described by the inverted camera matrix P^−1. The relation between x and x′ can be derived and interpreted in different ways. However, the estimation of the three missing parameters can conveniently be seen as the estimation of an isotropic transformation T. Based on this interpretation, the transformation needs to be fitted to a group of 2D ground point correspondences; theoretically, only two point pairs are necessary. Fig. 4 shows the re-projected versions of two images taken at slightly different positions. The dotted and non-dotted crosshairs indicate correspondences found by the motion tracking module. These point sets are identical up to an isotropic transformation that can easily be computed. The approach does not rely on any special floor pattern; the square-shaped tiles are for illustration only. Fig. 5 shows the mismatch between the actual displacement (indicated by cross markers) and the displacement estimated by the linear transformation (indicated by circle markers).

Fig. 4 Inverse projected images and tracking points

Fig. 5 Mismatch of ground motion estimation
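The three motion parameters can be fitted in closed form. A minimal sketch of such a fit (a standard 2D least-squares/Procrustes solution over re-projected ground points; an assumption, not the authors' exact implementation):

```python
import numpy as np

def fit_ground_motion(p, q):
    """Fit the 3-DOF planar motion (R, t) mapping re-projected ground
    points p to q; p and q are (N, 2) arrays with N >= 2 correspondences."""
    pc, qc = p.mean(axis=0), q.mean(axis=0)
    A = (q - qc).T @ (p - pc)                       # 2x2 cross-covariance
    theta = np.arctan2(A[1, 0] - A[0, 1], A[0, 0] + A[1, 1])
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    t = qc - R @ pc
    return R, t
```

Points with a large residual ‖q − (R p + t)‖ do not follow the ground motion; these are exactly the outliers that RANSAC removes in Section 4.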

3.3 Zero Disparity

Consider Fig. 6, which shows the two-dimensional version of the model introduced previously.

Fig. 6 Mathematical model (side view: camera 1 at height x_c1 and position z_c1, camera 2 at x_c2, z_c2; point X with assumed ground point X′; the mismatch between the projections p_X and p_X′ is the zero disparity)


For the sake of clarity the degrees of translation are reduced to z-motion, and rotation is not considered. Point X has a non-zero height component and is projected into the first camera. If the point is assumed to lie on the ground plane (X′), its projection x′ in the second camera image can be estimated by the linear transformation presented in the previous section. However, its estimated position deviates from its actual position. The resulting mismatch can be seen as an expression for the height of X above ground level [3]. The analytic relation between mismatch and height is derived in the following. The scene is observed by the first camera, and X′ is the ground point corresponding to X; p_x is the projection of X and X′ onto the first camera plane:

X′ = ( 0,  (z x_c1 − z_c1 x) / (x_c1 − x),  1 )^T   (11)

p_x = P_c1 X = P_c1 X′   (12)

P_c1,2 = [ f  0  −f x_c1,2 ;  0  1  −z_c1,2 ]   (13)

Now a second photo is taken at a different position. The projections of X and X′ onto the second camera plane are given by:

p′_x = P_c2 X = ( f (x − x_c2),  z − z_c2 )^T   (14)

p′_x′ = P_c2 X′   (15)

p′_x′ = ( −f x_c2 (x_c1 − x),  z x_c1 − z_c1 x − z_c2 (x_c1 − x) )^T   (16)
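A quick numeric check of Eqs. (11)-(16) with illustrative values (camera height 1, forward motion 0.2; none of these numbers are from the paper):

```python
import numpy as np

f, x_c1, x_c2, z_c1, z_c2 = 500.0, 1.0, 1.0, 0.0, 0.2
x, z = 0.5, 3.0                                 # elevated point X

P_c2 = np.array([[f, 0.0, -f * x_c2],           # Eq. (13) for camera 2
                 [0.0, 1.0, -z_c2]])
X = np.array([x, z, 1.0])
X_g = np.array([0.0, (z * x_c1 - z_c1 * x) / (x_c1 - x), 1.0])  # Eq. (11)

p_x = P_c2 @ X                                  # Eq. (14)
p_xg = P_c2 @ X_g                               # Eqs. (15)/(16)
# The normalized image coordinates differ, so X is flagged as elevated.
print(p_x[0] / p_x[1], p_xg[0] / p_xg[1])
```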

The image points p′_x and p′_x′ are the results of two different projections of X: p′_x is the actual projection of X onto the second camera, while p′_x′ is the projection under the assumption that X belongs to the ground plane. Considering a unified camera height x_c = x_c1 = x_c2, the resulting mismatch can be written as:

e(x, z, x_c, z_c1, z_c2, f) = (p′_x)_1 / (p′_x)_2 − (p′_x′)_1 / (p′_x′)_2   (17)

(p_x)_1 denotes the first entry of the vector p_x. Inserting Eqs. (14) and (16) yields:

e(..) = f x (x − x_c) (z_c1 − z_c2) / [ (x (z_c1 − z_c2) − x_c (z − z_c2)) (z − z_c2) ]   (18)

e is the Euclidean distance between the two image points; it is called the zero disparity and gives information about the elevation of the considered point X. e depends on several parameters. Obviously, the focal length f has a scaling effect on e. Fig. 7 shows the zero disparity as a function of obstacle height and position of the second camera; both values are normalized. The magnitude of the zero disparity increases with the distance between the two camera positions. Fig. 8 shows the general characteristic of e with respect to the obstacle height. The zero disparity is positive for all points with 0 < x < x_c. All visible points are expected to be below the camera level if it is assumed that the camera looks slightly towards the ground surface. With regard to height estimation, a long distance between the two cameras is optimal; however, the distance needs to be small enough to get satisfying results from the stereo correspondence analysis.

Fig. 7 Zero disparity surface

Fig. 8 Zero disparity curve

Fig. 9 shows the quantitative results of the presented approach. Based on two sequential images, the disparity according to ground motion is displayed in the upper right image. The pixel intensity represents the magnitude of the disparity vector. As expected, there is a large displacement in the foreground, while the pixels in the background are nearly motionless. The true correspondences are found by stereo matching, which is shown in the lower left corner. The zero disparity is the Euclidean distance between the stereo matching correspondence and the displacement found by the ground motion estimation, shown in the lower right image. Obstacles can already be clearly identified.

Fig. 9 Quantitative results of zero-disparity estimation
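For reference, the closed form of Eq. (18) in a short sketch (values illustrative): ground points (x = 0) yield e = 0, while elevated points do not.

```python
def zero_disparity(x, z, x_c, z_c1, z_c2, f):
    """Zero disparity of Eq. (18) for a point at height x and ground
    position z, seen from camera height x_c at positions z_c1, z_c2."""
    num = f * x * (x - x_c) * (z_c1 - z_c2)
    den = (x * (z_c1 - z_c2) - x_c * (z - z_c2)) * (z - z_c2)
    return num / den

print(zero_disparity(0.0, 3.0, 1.0, 0.0, 0.2, 500.0))  # ground point: 0.0
print(zero_disparity(0.5, 3.0, 1.0, 0.0, 0.2, 500.0))  # obstacle: non-zero
```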

4. IMPLEMENTATION
According to the introduced concept, the system is designed as shown below in Fig. 10.

Fig. 10 Structure of obstacle detection system
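A front-end sketch of the acquisition and tracking stage described below, using OpenCV's Harris corner detection [8] and pyramidal Lucas-Kanade tracking [9] (file names and parameter values are illustrative assumptions, not the authors' configuration):

```python
import cv2

prev = cv2.imread("frame0.png", cv2.IMREAD_GRAYSCALE)
curr = cv2.imread("frame1.png", cv2.IMREAD_GRAYSCALE)

# Harris corners in the first frame ...
p0 = cv2.goodFeaturesToTrack(prev, maxCorners=300, qualityLevel=0.01,
                             minDistance=7, useHarrisDetector=True)
# ... tracked into the second frame with pyramidal Lucas-Kanade.
p1, status, _err = cv2.calcOpticalFlowPyrLK(prev, curr, p0, None)

good0 = p0[status.ravel() == 1].reshape(-1, 2)
good1 = p1[status.ravel() == 1].reshape(-1, 2)
# good0/good1 feed the ground-motion fit of Section 3.2; outliers and
# moving objects are rejected, e.g. with RANSAC [10].
```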

First, appropriate image pairs need to be acquired; the two images should be shot at slightly different positions. Feature point tracking is necessary to estimate the fundamental matrix, which is essential for fast stereo matching, and to estimate the structure of motion in order to find a unique relation between corresponding ground points in the two views. Feature detection is done with the Harris corner detector [8], and the tracking process is based on the approach developed by Lucas and Kanade [9]. The application of the system is not limited to static environments: moving objects are detected by analyzing the motion variance, and each point classified as belonging to a moving object is discarded. The homography describing the corresponding ground points is known except for three parameters which need to be estimated; theoretically, only two point pairs are necessary. In order to compute the transformation matrix, some ground points need to be tracked. A small rectangle in the lower area of the camera image is used to detect these kinds of points. Areas directly in front of the vehicle have often already been classified in previous steps, so all or at least the majority of points detected in this area can be expected to lie on the ground plane. Outliers can easily be detected by algorithms like RANSAC [10].

Stereo matching is motivated by finding pixel correspondences between the two views. In this approach a customized block matching algorithm is used. The epipolar constraint is utilized to reduce the search space in the corresponding image. Indoor environments are often poorly textured; a multi-resolution approach is used to overcome this problem and to enhance performance [11]. The final result of the stereo correspondence algorithm is a displacement map for each image pair, where the actual pixel correspondence is described by a simple 2D vector. The zero disparity is the Euclidean distance between the actual correspondence and the displacement according to the ground motion. The obtained map is a representation of the height of each considered point in the scene and is ready to use for region classification. To enhance performance and to get more stable results, the image is segmented beforehand, and the zero disparity of every n-th pixel is calculated to get averaged values for each segment (see the sketch below); furthermore, segmentation structures the image reasonably. The current camera perspective is not suitable for navigational operations: since the robot only moves on a planar surface, the situation can be entirely observed from a top view. The two-dimensional map can be calculated by inverse projection, and the resulting map is true to scale. A few examples in the context of different situations are presented in the following section.
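The per-segment classification can be sketched as follows (array shapes, names, and the threshold are illustrative assumptions, not the authors' exact code): flow holds the block-matching displacement per pixel, H the ground homography of Eq. (10), and segments a label image from the segmentation step.

```python
import numpy as np

def classify_segments(flow, H, segments, n=4, thresh=2.0):
    """Average the zero disparity over every n-th pixel of each segment
    and flag segments whose mean exceeds a threshold as obstacles.
    flow: (h, w, 2) stereo displacement; segments: (h, w) label image."""
    h, w = segments.shape
    ys, xs = np.mgrid[0:h:n, 0:w:n]
    ys, xs = ys.ravel(), xs.ravel()
    pts = np.stack([xs, ys], axis=1).astype(float)
    proj = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T
    pred = proj[:, :2] / proj[:, 2:3] - pts        # ground-motion flow
    actual = flow[ys, xs]                          # block-matching flow
    e = np.linalg.norm(actual - pred, axis=1)      # zero disparity
    labels = segments[ys, xs]
    return {int(s): float(e[labels == s].mean()) > thresh
            for s in np.unique(labels)}            # segment id -> obstacle?
```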

5. EVALUATION

The system has been tested successfully on a robot platform equipped with a webcam. The camera is calibrated beforehand in order to estimate its intrinsic characteristics as well as its position and orientation relative to the ground plane. The testing environment is a typical office room with common characteristics; there are no special preparations or simplifications. The ground surface is weakly textured and the system is confronted with different types of carpeting. Surrounding objects have arbitrary shape and color. Each situation is described by four images, ordered according to the data processing chain: the current view of the robot is shown in the upper left corner and its segmented version in the upper right image; the re-projection according to the ground surface is shown in the lower left, while the final result is presented in the lower right corner. The three examples demonstrate typical situations. In Fig. 11 the robot is about to pass a door and is confronted with changing floor texture. The proposed system clearly identifies the ground surface behind the door, while appearance-based approaches would suffer from the inhomogeneous ground color. In Fig. 12 the robot is confronted with a small obstacle, which appears as a black area in the obstacle map. The last example, shown


in Fig. 13, demonstrates one of the main advantages of the proposed system: the newspaper on the carpet is silhouetted against the surrounding texture but is still classified as belonging to the planar ground. The examples show that the system works correctly and manages even difficult situations.

6. CONCLUSION
The proposed system has been shown to be correct and efficient for the mentioned applications. The presented results show that the system is able to construct obstacle maps with satisfying accuracy. In future work we will build static maps autonomously by merging local observations and address the estimation of the camera trajectory within the environment. Vision-based navigation is a major research field, and topics like single-camera obstacle detection will be very important in the near future.

REFERENCES
[1] G.N. DeSouza, A.C. Kak, "Vision for mobile robot navigation: a survey," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 24, No. 2, pp. 237-267, 2002.
[2] Y.H. Chow, R. Chung, "Obstacle avoidance of legged robot without 3D reconstruction of the surroundings," Robotics and Automation (ICRA 2000), Vol. 3, pp. 2316-2321, 2000.
[3] Hui Wang, Kui Yuan, Wei Zou, Yizhun Peng, "Real-time obstacle detection with a single camera," Industrial Technology (ICIT 2005), pp. 92-96, 2005.
[4] F. Nashashibi, M. Devy, P. Fillatreau, "Indoor scene terrain modeling using multiple range images for autonomous mobile robots," Robotics and Automation 1992, Vol. 1, pp. 40-46, 1992.
[5] L. Iocchi, K. Konolige, M. Bajracharya, "Visually realistic mapping of a planar environment with stereo," Proc. of the 7th Int. Symp. on Experimental Robotics, 2000.
[6] R.I. Hartley, A. Zisserman, Multiple View Geometry in Computer Vision, Cambridge University Press, pp. 325-329, 2004.
[7] Z. Zhu, X. Lin, "Real-time algorithms for obstacle avoidance by using reprojection transformation," IAPR Machine Vision and Applications, Tokyo, Japan, pp. 393-396, 1990.
[8] C. Harris, M. Stephens, "A combined corner and edge detector," 4th Alvey Vision Conference, pp. 147-151, 1988.
[9] B.D. Lucas, T. Kanade, "An iterative image registration technique with an application to stereo vision," Imaging Understanding Workshop, pp. 121-130, 1981.
[10] M.A. Fischler, R.C. Bolles, "Random Sample Consensus: a paradigm for model fitting with applications to image analysis and automated cartography," Comm. of the ACM, Vol. 24, pp. 381-395, 1981.
[11] W. Zhang, Q. Zhang, L. Qu, S. Wei, "A stereo matching algorithm based on multiresolution and epipolar constraint," Image and Graphics 2004, pp. 180-183, 2004.

Fig. 11 First situation: robot is going to pass a door

Fig. 12 Second situation: a small obstacle on the ground

Fig. 13 Third situation: newspaper on the carpet

