
Autonomous Segmentation of Near-Symmetric Objects through Vision and Robotic Nudging

Wai Ho Li and Lindsay Kleeman


Abstract— This paper details a robust and accurate segmentation method for near-symmetric objects placed on a table of known geometry. Here we define visual segmentation as the problem of isolating all portions of an image that belong to a physically coherent object. The term near-symmetric is used because our method can segment objects with some non-symmetric parts, such as a coffee mug and its handle. Using bilateral symmetry, this problem is solved autonomously and robustly through the aid of physical action provided by a robot manipulator. Our proposed approach does not require prior models of target objects and assumes no previously collected background statistics. Instead, it relies on a precise robotic nudge to generate the object motion necessary for segmentation. Experiments performed on ten objects show that our model-free approach can autonomously and accurately segment a variety of objects. These experiments also indicate that our segmentation approach is not adversely affected when operating in cluttered scenes and can robustly segment multi-coloured and transparent objects.

I. INTRODUCTION

A. Motivation

Object segmentation is an important sensory process for robots using vision. It allows a robot to build accurate internal models of its surroundings by isolating regions of images that correspond to objects in the real world. Multi-scale computer vision object recognition methods, such as SIFT [1] and Haar boosted cascades [2], can imbue a robot with the ability to robustly detect and classify modeled objects. However, training such schemes to recognize objects requires many hand-labeled and well-segmented images of positive and negative examples. Precious human resources are required to obtain this kind of training data, and for very large object sets the amount of time and effort required can be prohibitive. The autonomous process described in this paper addresses this problem by obtaining accurate object segmentations robustly, without the need for human aid or intervention. Another motivating factor is to provide a segmentation process that is highly autonomous. By limiting target objects to those with bilateral symmetry, a model-free approach can be applied, which allows us to abandon the a priori assumptions and offline training demanded by other segmentation approaches. For example, our method can operate on transparent objects, as we do not assume any temporal constancy or colour uniformity in an object's appearance.

This work is intended for use in domestic robotics applications, as there are many objects with symmetry in most households. However, the sensing parts of the process, namely locating points of interest using symmetry triangulation and object segmentation by folded frame difference, are applicable to other robotic tasks. The overall aim is to provide robots with general methods of dealing with common household objects such as cups, bottles and cans, without the burden of mandatory offline training for every new object. As our approach assumes nothing about the appearance of the robot manipulator, the actuation of target objects can be provided by any manipulator capable of performing a robotic nudge as described in Section III, including a human hand.

B. Contributions

Segmentation using robotic action has been explored in the past, most recently by Fitzpatrick et al. [3], [4]. Their approach uses a poking action, which sweeps the end effector across the workspace. The presence of an object is detected when visual motion increases due to contact with the moving effector. Their segmentation method uses frames from just before and after this point of contact. No planning is performed prior to robotic action. Assuming the target object is not deformed by the poking action, objects of any shape can be segmented. The main contributions of our work are as follows. Firstly, by limiting our scope to near-symmetric objects, locations of interest are found prior to the application of robotic action. This is achieved by clustering the intersections between stereo triangulated symmetry axes and the table plane. By avoiding dense stereo approaches, we can also localize transparent objects with bilateral symmetry. Details of our stereo triangulation approach, including a comparison of results against dense stereo, can be found in [5]. Limited by the use of elastic actuators in their manipulator, the approach of Fitzpatrick et al. applies an imprecise poking action to objects. In contrast, our method uses a short, accurate robotic nudge, applied only to locations of interest. In experiments, we show that our method does not tip over tall objects such as empty bottles and does not damage fragile objects such as ceramic mugs. This level of gentleness in object manipulation is not demonstrated in the work of Fitzpatrick et al. While neither method addresses the problem of end effector obstacle avoidance, the small workspace footprint of the robotic nudge should make path planning easier. Finally, while appearing similar at a glance, our approach to visual segmentation is very different from that of Fitzpatrick et al.
Wai Ho Li and Lindsay Kleeman are with the Department of Electrical and Computer Systems Engineering, Monash University, Clayton Campus, Melbourne, Australia. waiholi@gmail.com, Lindsay.Kleeman@eng.monash.edu.au

Their approach uses video frames captured during robotic action, around the time of contact between the end effector and the object. Due to their motion-based initiation, bad frame timing with respect to the time of contact can produce poor segmentations. This is highlighted in Figure 11 of [4], which shows that their end effector can be included in the segmentation results. This problem never occurs in our approach, as we use video frames that are temporally further apart, captured before and after robot action. Also, near-empty segmentations can be returned by their approach. Our approach will only perform segmentation if object motion is detected during the nudge and the subsequent stereo tracking remains convergent. The satisfaction of these conditions prevents poor segmentations due to insufficient or unexpected object motion.

C. System Overview

Fig. 2. Autonomous Segmentation Flowchart

The components of our robot system are shown in Figure 1. The stereo cameras consist of two Videre Design 1394 CMOS cameras verged together at around 15 degrees from parallel. These cameras capture 640x480 images at 25 Hz during nearly all parts of the segmentation process, except for high resolution 1280x960 snapshots of the scene taken before and after the robotic nudge. The PUMA 260 robot arm has six degrees of freedom. The calibration grid is used to perform camera-arm calibration and to estimate the geometry of the table plane. Details of both are described in Section II-A.

Our autonomous segmentation process is summarized in Figure 2. The robot begins by surveying the scene for interesting locations to explore. The details of this process are described in Section II. Once an interesting location has been found, the robot manipulator nudges the target location. If motion is detected during the nudge, stereo tracking is initiated to keep track of the moving object. Section III describes the robotic nudge and stereo tracking. If tracking converges, the object is segmented using the method described in Section IV. Bilateral symmetry is used as the primary visual feature throughout all stages of the process. Our Fast Bilateral Symmetry Detection [6] scheme, herein referred to simply as symmetry detection, is used to locate lines of symmetry within input images. The noise robustness of our detection method is crucial when performing segmentation in visually cluttered environments.

II. FINDING INTERESTING LOCATIONS
Fig. 1. Robot System Components

A. System Calibration

This section details the methods used to calibrate our robotic system. Firstly, the stereo cameras are calibrated using the MATLAB camera calibration toolbox [7]. The intrinsic parameters of each camera in the stereo pair are obtained individually. This is followed by a calibration to obtain the extrinsics of the stereo system. After this, the camera system can be used to triangulate locations in 3D space. The geometry of the table is found by fitting a plane to the checkerboard corners, the locations of which are found using stereo triangulation. Prior to calibration, a grid of points is drawn on the table by the robot manipulator using a special pen attachment. This places the corners of the calibration grid at known locations in the manipulator's coordinate frame. The same corners are triangulated using the stereo cameras to find their coordinates relative to the camera frame of reference. The arm-camera calibration is performed by solving the Absolute Orientation problem, finding the transformation that maps the corner points from one frame of reference to the other. We use the least-squares solution proposed by Arun et al. [8].
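As a concrete illustration, below is a minimal Python/NumPy sketch of the least-squares Absolute Orientation solution of Arun et al. [8]; the function and variable names are ours, and the paper's own implementation may differ in detail.

```python
import numpy as np

def absolute_orientation(P, Q):
    """Least-squares rigid transform (R, t) aligning point set P to Q.

    P, Q: (N, 3) arrays of corresponding 3D points (e.g. grid corners in the
    manipulator and camera frames). Returns R, t such that q_i ~ R @ p_i + t.
    """
    p_bar, q_bar = P.mean(axis=0), Q.mean(axis=0)   # centroids of both sets
    H = (P - p_bar).T @ (Q - q_bar)                 # cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:                        # guard against a reflection
        Vt[-1, :] *= -1
        R = Vt.T @ U.T
    t = q_bar - R @ p_bar
    return R, t
```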


B. Clustering Symmetry Intersects

Symmetry lines are detected in the left and right video frames to provide data to a clustering algorithm. All possible pairings of symmetry lines between the left and right images are triangulated to form 3D axes of symmetry, using the method described in our previous paper [5]. In our experiments, three symmetry lines are detected for each image, resulting in a maximum of nine triangulated axes of symmetry. Symmetry axes that lie outside the robot manipulator's workspace are left out of the clustering data. Axes that are more than 10 degrees from being perpendicular to the table plane are also rejected. The intersections between valid symmetry axes and the table plane are collected over 25 pairs of video frames and recorded as 2D locations on the table plane. This collection of locations is grouped into clusters using a modified QT algorithm [9]. The QT clustering algorithm does not require any prior knowledge of the number of actual clusters. This is important, as we make no assumptions concerning the number of objects on the table. The QT algorithm also provides a way to limit the diameter of clusters, reducing the likelihood of clusters that include symmetry lines from multiple objects. The original QT algorithm was modified with the addition of a cluster quality threshold, which is used to ignore clusters formed by symmetry axes that occur in less than half of all collected frames. The geometric centroids of the clusters provide the robot with a list of interesting locations to explore. A nudge is performed on the valid location closest to the camera. A location is deemed invalid if the robot gripper would collide with other locations of interest during a nudge.
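Below is a minimal Python sketch of QT clustering with a diameter limit and the added quality threshold. It illustrates the technique of [9] as modified here, but the parameter names (max_diameter, min_support) and data layout are our assumptions, not the authors' code.

```python
import numpy as np

def qt_cluster(points, max_diameter, min_support):
    """QT clustering [9] with an added cluster-quality threshold.

    points: (N, 2) array of symmetry-axis/table intersections gathered over
    the collected frame pairs. Clusters wider than max_diameter are never
    formed, and clusters supported by fewer than min_support points (axes
    seen in less than half of the frames) are discarded.
    Returns a list of cluster centroids: interesting locations to explore.
    """
    remaining = list(range(len(points)))
    centroids = []
    while remaining:
        best = []
        for seed in remaining:          # grow a candidate cluster per seed
            candidate = [seed]
            others = [i for i in remaining if i != seed]
            while others:
                # add the point that least increases the cluster diameter
                diams = []
                for i in others:
                    trial = points[candidate + [i]]
                    gaps = np.linalg.norm(trial[:, None] - trial[None, :], axis=-1)
                    diams.append(gaps.max())
                j = int(np.argmin(diams))
                if diams[j] > max_diameter:
                    break
                candidate.append(others.pop(j))
            if len(candidate) > len(best):
                best = candidate
        if len(best) < min_support:
            break                       # remaining clusters fail the threshold
        centroids.append(points[best].mean(axis=0))
        remaining = [i for i in remaining if i not in best]
    return centroids
```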

Fig. 3. Side View of Robotic Nudge


Fig. 4. Top-Down View of Robotic Nudge

Fig. 5. Consecutive video frames from the right camera during a nudge. The frames are taken from the P1-P2-P1 portion of the nudge motion.

III. OBJECT MANIPULATION: THE ROBOTIC NUDGE

A. Motion Control

The motion of the robot gripper during a nudge is shown in Figures 3 and 4. The L-shaped protrusion is made of sponge to provide damping during contact, which is especially important when nudging brittle objects such as ceramic cups. The L-shaped sponge also allows pushing force to be applied at a height very close to the table plane. By applying force to the bottom of objects, nudged objects are less likely to tip over. An example nudge captured by the right camera is shown in Figure 5.

The nudge begins by lowering the gripper from P0 to P1. The height of the gripper at location P0 is well above the height of the tallest expected object. Dmax is set to ensure that the L-shaped sponge will not hit the largest expected object during its descent. After arriving at P1, the gripper travels towards P2. Dmin is selected such that the gripper will make contact with the smallest expected object before arriving at P2. The nudge motion ensures that the gripper never visually crosses the object's symmetry line when viewed from the right camera. The gripper then retreats back through P1 to P0. In early tests, the gripper was moved directly from P2 back to P0. This knocked over tapered objects such as the blue cup in Figure 8, due to friction between the soft sponge and the object's outer surface. In the overhead view of Figure 4, the nudge vector is perpendicular to the line formed between the focal point of the right camera and the target object's symmetry line, assuming one is present at the location to explore. This choice of motion nudges the object horizontally across the camera's image, which reduces the scale change of the target object and also lowers the probability of glancing contact, improving the quality of segmentation. After a location of interest has been found, P0, P1 and P2 are determined based on the camera's location. Using inverse kinematics, linearly-interpolated encoder values are generated at run time to move the gripper smoothly between these three points.
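To make the geometry concrete, the following hypothetical Python sketch shows one way the waypoints could be computed from the camera and target positions; the function, its parameters and the coordinate conventions are ours and are not taken from the paper.

```python
import numpy as np

def nudge_waypoints(target_xy, camera_xy, d_max, d_min, z_high, z_low):
    """Hypothetical sketch of choosing the nudge waypoints P0, P1 and P2.

    target_xy: intersection of the object's symmetry axis with the table.
    camera_xy: right camera focal point projected onto the table plane.
    The nudge direction is perpendicular to the camera-target line, so the
    object is pushed horizontally across the right camera's image.
    """
    target_xy = np.asarray(target_xy, dtype=float)
    camera_xy = np.asarray(camera_xy, dtype=float)
    view = target_xy - camera_xy
    view /= np.linalg.norm(view)
    nudge_dir = np.array([-view[1], view[0]])   # 90 degree rotation in plane
    p1_xy = target_xy - d_max * nudge_dir       # start Dmax short of the axis
    p2_xy = target_xy - d_min * nudge_dir       # stop Dmin short of the axis
    p0 = np.append(p1_xy, z_high)               # above the tallest object
    p1 = np.append(p1_xy, z_low)                # lowered, close to the table
    p2 = np.append(p2_xy, z_low)
    return p0, p1, p2
```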

B. Obtaining Visual Feedback by Stereo Tracking

When the gripper begins its descent at P0, the right camera image is monitored for motion. Motion detection is performed at a coarse resolution using 8x8 pixel cells. Cells with two times the motion of the global average are labeled as moving. This block motion algorithm is the same as the one used in our symmetry tracking paper [10] (a sketch is given at the end of this subsection). To prevent ego-motion of the robot manipulator from being interpreted as object motion, the object's symmetry line is used as a visual barrier. As the robot gripper never crosses the symmetry line, motion detection is only performed on the green region in Figure 3. Once motion has been detected, the robot begins stereo tracking of the target object's symmetry line. A Kalman filter is used to track the polar parameters of the target symmetry line. The tracking system is identical to the one described in our previous work on real time monocular symmetry tracking [10]; the monocular tracker is replicated twice to perform stereo tracking. Visual segmentation only takes place if tracking converges to a symmetry axis roughly perpendicular to the table plane. This prevents poor segmentation caused by insufficient object motion. Videos of the robotic nudge and stereo tracking can be downloaded from:
www.ecse.monash.edu.au/centres/irrc/li_iro08.php
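A minimal sketch of the coarse block motion test described above, assuming greyscale frames whose dimensions divide evenly by the cell size; this is in the spirit of [10], not the authors' exact implementation.

```python
import numpy as np

def moving_cells(frame_prev, frame_curr, cell=8, gain=2.0):
    """Label coarse cells as moving based on the mean frame difference.

    frame_prev, frame_curr: greyscale uint8 images of identical size.
    A cell is labeled moving if its mean absolute frame difference exceeds
    gain times the global average difference.
    """
    diff = np.abs(frame_curr.astype(np.int16) - frame_prev.astype(np.int16))
    h, w = diff.shape
    # Average the difference within each cell x cell block
    cells = diff.reshape(h // cell, cell, w // cell, cell).mean(axis=(1, 3))
    return cells > gain * diff.mean()
```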

IV. OBJECT SEGMENTATION

A. Object Segmentation by Folded Frame Difference

Segmentation is performed using the object motion generated by the robotic nudge. Figure 6 illustrates the major steps of segmentation. Figures 6(a) and 6(b) are images taken by the right camera before and after the nudge. The absolute frame difference between the before and after images is shown in Figure 6(c). The green lines are the object's symmetry lines before and after the nudge, found using our symmetry detector. Note that thresholding the raw frame difference would produce a mask that includes many background pixels. The mask would also have a large gap at the center of low-texture objects, such as the clear cup in the example. Using the object's symmetry lines, we can overcome these problems. Figure 6(d) shows the folded frame difference of the object. This image is produced by removing the frame difference pixels between the two symmetry lines. This process folds the frame difference image together as if it were printed on a piece of paper, pressing the creases at the symmetry lines together. Changes in the orientation of the object's symmetry lines before and after the nudge are removed prior to folding. This folding process autonomously removes the excess area of the motion mask and reduces the size of the motion gap at the center of the moved object's frame difference. After folding, a small gap still remains in the frame difference. This can be seen in Figure 6(d) as a dark vertical section inside the cup-like shape. To remedy this, we again exploit object symmetry to our advantage. Recall that the folding step merges the symmetry lines of the object in the before and after frames. Using this newly merged symmetry line as a mirror, we search for motion on either side of it. A pixel is considered moving if its frame difference value is above a threshold. The folded difference image is rotated so that the merged symmetry line is vertical. The widest pair of moving pixels bisected by the object's symmetry line is recorded for each row of the image. This produces a symmetric contour of the object. By filling the interior of this contour, we produce the image in Figure 6(e). Note that this filling approach retains the non-symmetric parts of objects. The final segmentation result in Figure 6(f) is obtained by thresholding the symmetry filled difference image. A sketch of the contour fill step is given below.
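The following Python sketch illustrates the symmetric contour fill on a folded difference image that has already been rotated so the merged symmetry line is the centre column; the threshold parameter and the centre-column convention are our assumptions, not the paper's exact implementation.

```python
import numpy as np

def symmetry_fill(folded_diff, threshold):
    """Fill the interior of the widest moving contour around the symmetry line.

    folded_diff: 2D float array, rotated so the merged symmetry line is the
    vertical centre column. For each row, the widest pair of moving pixels
    bisected by the symmetry line bounds the filled region.
    """
    h, w = folded_diff.shape
    axis = w // 2                       # merged symmetry line (centre column)
    moving = folded_diff > threshold
    filled = np.zeros_like(moving)
    for row in range(h):
        cols = np.nonzero(moving[row])[0]
        left = cols[cols < axis]
        right = cols[cols > axis]
        if left.size and right.size:
            # fill between the outermost moving pixels on either side
            filled[row, left.min():right.max() + 1] = True
    return filled
```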

Fig. 6. Segmentation by Folded Frame Difference: (a) Before Nudge; (b) After Nudge; (c) Frame Difference; (d) Folded Difference; (e) Symmetry Filled; (f) Segmentation Result. Note that the Folded Difference and Symmetry Filled images are rotated such that the object's symmetry line is vertical.

V. SEGMENTATION EXPERIMENT RESULTS

Segmentation experiments were carried out on ten objects of different size, shape, texture and colour. Transparent, multi-coloured and partially symmetric objects are also included. Objects are set against different backgrounds, ranging from plain to cluttered. All segmentation results are obtained autonomously by our robot without any human aid. Objects in our scenes cast many shadows, due to four bright fluorescent ceiling light sources illuminating the table. For safety reasons, a flashing warning beacon is active during robot motion, periodically casting red light on the table while the robot manipulator is powered. Due to space constraints, some segmentation results have been left out. They can be found at:
www.ecse.monash.edu.au/centres/irrc/li_iro08.php


A. Cups without Handles

The white cup in Figure 7 poses a challenge to our segmentation process not because of its imperfect symmetry, but because of its shape. Due to its narrow stem-like bottom half, the nudge produces very small shifts in the object's location, creating a narrow and weak contour of pixels in the frame difference.

Fig. 7. Partially Symmetric White Cup

Fig. 8. Blue Cup

Fig. 10. White Mug

Fig. 11. Multi-coloured Mug

As seen from the resulting segmentation, our algorithm is able to handle this kind of object. Figure 8 shows detection results for a symmetric cup against background clutter. Lastly, Figure 9 illustrates the robustness and accuracy of our segmentation process: the robot was able to autonomously obtain a very clean segmentation of a transparent cup against background clutter.

B. Mugs with Handles

The mugs in Figures 10 and 11 test the robustness of our segmentation approach for objects with non-symmetric parts. The handles of both mugs are successfully included in the segmentation results. The multi-coloured mug in Figure 11 was chosen for additional reasons. Firstly, it is a multi-coloured object with intensities similar to the background shadows, which is why the segmentation result is quite noisy around the bottom of the mug.

Secondly, it is a brittle object made of ceramic. Its successful manipulation provides evidence of the gentle nature of our robotic nudge.

C. Drink Bottles

Fig. 9. Transparent Cup in Clutter

Fig. 12. Small Water-Filled Bottle

The water-filled bottle in Figure 12 is used to test the strength and accuracy of the robotic nudge. Due to its small size and weight, the nudge must be accurate and firm to produce enough object motion for segmentation. The segmentation result shows that the nudge can actuate small and dense objects. The remaining test objects are empty plastic drink bottles. They are lightweight and have high centers of gravity, making them very easy to tip over. During the nudge, their symmetry lines tend to wobble, which provides noisy measurements to the symmetry trackers. As such, these objects test the robustness of stereo tracking and the robotic nudge. Figure 13 shows a successful segmentation of a textured bottle against a plain background. Figure 14 is a similar experiment repeated against background clutter. Finally, Figure 15 contains two segmentation results for a transparent bottle. Note the accurate segmentation obtained for the transparent bottle, which produces a very weak motion signature when nudged.

Fig. 13. Textured Bottle

Fig. 14. Textured Bottle in Clutter

Fig. 15. Transparent Bottle

VI. CONCLUSION

Our segmentation approach performs robustly and accurately on near-symmetric objects in cluttered environments. By using the robotic nudge, the entire segmentation process is carried out autonomously. Multi-coloured and transparent objects, as well as objects with non-symmetric parts, are handled in a robust manner. We have shown that our approach can segment objects of varying visual appearance autonomously, shifting the burden of training data collection from the user to the robot. End effector obstacle avoidance and path planning, especially in situations where non-symmetric objects are present in the nudge path, are left to future work. As our symmetry detection method uses edge pixels as input, our segmentation approach is visually orthogonal to those that use pixel information such as colour and image gradient. In situations where the target object is non-symmetric, approaches relying on other features can be applied synergistically. Our objection to stereo optical flow and graph cuts is their reliance on object surface information, which is completely unreliable for transparent and reflective objects. However, if the opacity of an object has been confirmed, these approaches can be used with our robotic nudge. As the geometry of our table plane is known, a stereo approach to segmentation could further improve results by removing the object shadow that is present in some of them.

VII. ACKNOWLEDGMENTS

Thanks go to Steve Armstrong for his help with repairing the PUMA 260 manipulator and to the anonymous reviewers for their insightful comments.

REFERENCES

[1] D. G. Lowe, "Distinctive image features from scale-invariant keypoints," IJCV, vol. 60, no. 2, pp. 91-110, November 2004.
[2] P. Viola and M. J. Jones, "Rapid object detection using a boosted cascade of simple features," in IEEE CVPR, 2001.
[3] P. Fitzpatrick, "First contact: an active vision approach to segmentation," in Proceedings of Intelligent Robots and Systems (IROS 2003), vol. 3, IEEE, October 2003, pp. 2161-2166.
[4] P. Fitzpatrick and G. Metta, "Grounding vision through experimental manipulation," Philosophical Transactions of the Royal Society: Mathematical, Physical, and Engineering Sciences, 2003, pp. 2165-2185.
[5] W. H. Li and L. Kleeman, "Fast stereo triangulation using symmetry," in Australasian Conference on Robotics and Automation, 2006.
[6] W. H. Li, A. M. Zhang, and L. Kleeman, "Fast global reflectional symmetry detection for robotic grasping and visual tracking," in Australasian Conference on Robotics and Automation, 2005.
[7] J.-Y. Bouguet, "Camera calibration toolbox for Matlab," July 2006, http://www.vision.caltech.edu/bouguetj/calib_doc/.
[8] K. S. Arun, T. S. Huang, and S. D. Blostein, "Least-squares fitting of two 3-D point sets," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 9, pp. 698-700, 1987.
[9] L. J. Heyer, S. Kruglyak, and S. Yooseph, "Exploring expression data: Identification and analysis of coexpressed genes," Genome Research, vol. 9, pp. 1106-1115, 1999.
[10] W. H. Li and L. Kleeman, "Real time object tracking using reflectional symmetry and motion," in IEEE/RSJ Conference on Intelligent Robots and Systems, 2006, pp. 2798-2803.
