
Approaches to Robotic Vision Control Using Image Pointing Recognition Techniques

Tian-Ding Chen
Institute of Information and Electronics Engineering, Zhejiang Gongshang University, Hangzhou 310018 chentianding@163.com

Abstract. Intelligent human-machine interaction technology for robots will increasingly be incorporated into daily life and industrial production. This paper presents an autonomous mobile robot control system for human-machine interaction. Computer vision is used to reconstruct, from two-dimensional image information, the 3D coordinates of the specific point indicated by the user. Using the known appearance characteristics of the pointing object, a vision recognition algorithm screens the original color image data and recognizes the target, so that users can issue commands to the robot through simple body movements. A support vector machine (SVM) is used to classify the nonlinearly separable data obtained from user input and recognized actions, improving the target-identification accuracy of the robot vision system and thereby achieving the goal of human-computer interaction.

Keywords: Human-machine interaction, Computer vision, SVM, Robot control.

1 Introduction
In recent years, many studies have explored how to use human body actions to communicate with computers. These actions include facial expressions, body movements and pointing gestures. Pointing recognition is a very natural and common way of communicating, and it has become an important and active research topic that occupies an important position in the field of Human-Computer Interaction (HCI). Much of the related work on pointing recognition comes from medical applications, such as sign language recognition and care systems, whose aim is to make the needs of patients clear. Some scholars have combined pointing recognition with human-robot interaction, so that gestures serve as orders to the robot: the robot recognizes which gesture was performed [1] and acts on it, achieving interaction, although the interactive vocabulary is relatively simple. Literature [2] used the AdaBoost algorithm to find hand characteristics as the basis for recognition and then compared them with previously collected samples to complete the identification. Literature [3] also compares against samples: morphological operations find the outline of the hand, and localized contour sequences are compared with stored samples to distinguish different gestures.
The common disadvantage of [2] and [3] is the heavy learning and training load. Literature [4] used moment invariants computed from the gesture as the input vector of a neural network and trained on a large number of samples to improve the recognition rate, but the amount of data processing is large and real-time performance is poor. Unlike most studies, which use a single camera, literature [5] used two cameras to achieve 3D gesture recognition, so that recognition is no longer limited to a plane. Literatures [6] and [7] detect the number of extended fingers by laser scanning to determine the gesture; the difference is that [6] can only judge gestures made without the arm, while [7] can identify gestures with the arm but cannot remove the arm information, so if the arm occupies too much of the image the recognition rate suffers. Literatures [8] and [9] determine dynamic hand gestures from the relative positions of the hands and face over time; the disadvantage is susceptibility to misjudgment caused by shadowing effects.

2 Object Tracking and Feature Extraction


This paper develops an efficient real-time method for tracking the pointing region and extracting features from the moving pointing object. To achieve this goal, the system uses two processes. The first process extracts the shape of the stationary pointing object, while the second process uses multi-threshold and self-adjusting threshold mechanisms in place of the single color-screening threshold used in the past. A smart tracking scheme on a single camera reduces the processing steps by using epipolar geometry. The work space environment is complex because it contains many other objects with different colors. In the beginning there are many colors in the image, and the computer cannot directly interpret each color, so filters are used to transfer the image from RGB to gray scale and then to a binary image. To obtain the binary value, this step maps the RGB color space to the YIQ color space to get a gray image; in the YIQ color model, the luminance value Y is the component used for this gray-scale step. The transformation from the RGB color space to the YIQ luminance channel is:

Y = 0.299 R + 0.587 G + 0.114 B                                                    (1)

After this step, a gray image is obtained. In order to separate objects from the whole image, gray-level information is needed, so a histogram is computed to count the number of pixels at each gray level. A threshold is then chosen to filter out the background, yielding the shape of the object; in this way the system obtains the binary image. Through the histogram and binarization steps the image preprocessing is finished, and other image filters can then be applied. In summary, the color image is converted to gray scale by (1), and then, through the threshold value, the binary image is obtained.
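As an illustration of this preprocessing stage, the following Python sketch converts a color frame to gray scale with the coefficients of (1) and binarizes it with a threshold derived from the histogram. The use of Otsu's method to pick the threshold automatically is an assumption made for illustration; the paper only states that a threshold is chosen from the histogram, and Section 2.2 later uses a fixed threshold of 60.

import numpy as np
import cv2

def preprocess(frame_bgr, use_otsu=True, fixed_threshold=60):
    # Convert a color frame to a binary object mask (illustrative sketch, not the authors' code).
    # OpenCV stores images as BGR; split the channels and apply Eq. (1).
    b, g, r = cv2.split(frame_bgr.astype(np.float32))
    gray = (0.299 * r + 0.587 * g + 0.114 * b).astype(np.uint8)

    # Histogram of the 256 gray levels, as described in the text.
    hist = cv2.calcHist([gray], [0], None, [256], [0, 256])

    if use_otsu:
        # Assumed choice: let Otsu's method pick the threshold from the histogram.
        _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    else:
        # Alternatively, a fixed threshold such as T = 60 (Section 2.2).
        _, binary = cv2.threshold(gray, fixed_threshold, 255, cv2.THRESH_BINARY)
    return gray, hist, binary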


2.1 Extraction of Hand Image

The extracted images are in RGB color space with a size of 320 x 240 pixels. Each pixel contains three bytes representing red (R), green (G) and blue (B), and each color intensity ranges from 0 to 255; R = G = B = 0 is black and R = G = B = 255 is white. In gesture recognition it is very important to locate the hand against the background. Using skin color to distinguish the hand from the background is often unsatisfactory because of lighting or individual differences in skin color, and the quality of the extracted hand image has a great impact on the system. This system therefore uses background subtraction to find the differences between the input image and the background, and the CamShift algorithm to remove the body, head and other areas so that only the hand image is extracted.

2.2 Image Processing and Wrist Removal

The input image is compared with the previously established background to produce a binary output image. Using the background separation method of Section 2.1, the threshold T is set to 60. If the difference between the input image I(x, y) and the background image B(x, y) is less than the threshold T, the pixel is treated as background and the output pixel O(x, y) is set to 0 (dark); otherwise O(x, y) is set to 255 (bright). After background removal the image still contains a lot of noise, and the hand region itself may contain breaks or slits. A morphological closing operation eliminates the breaks and slits in the hand region, and an opening operation removes the smaller noise points and makes the image smoother. The binary image obtained after background subtraction and noise removal is tracked with CamShift, which returns an ellipse enclosing the hand; a labeling algorithm applied within this region then selects the largest connected block as the true hand shape. CamShift also provides the tilt angle of the tracked object, which is used to normalize the orientation of the hand image and to find the widest row w of the hand and the center coordinates (Cy, Cx) of that row. These coordinates are used as the reference in formula (2) to remove the wrist; the purpose is to keep only the palm and fingers and to reduce the influence of the wrist or arm position on finger detection.

I(x, y) = 0    if    y > Cy + w/2                                                   (2)

where I(x, y) is the palm image, y is the y-coordinate of a palm pixel, w is the widest palm width, and Cy is the y-coordinate of the center of the widest row.
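A minimal Python sketch of the Section 2.2 pipeline is given below: background subtraction with the threshold T = 60, morphological clean-up, and wrist removal according to (2). The CamShift tracking and labeling steps are omitted, and the way the widest row and its center are located here is an assumption made for illustration.

import numpy as np
import cv2

T = 60  # background-difference threshold from Section 2.2

def segment_hand(frame_gray, background_gray):
    # Background subtraction: pixels close to the background become 0, others 255.
    diff = cv2.absdiff(frame_gray, background_gray)
    _, mask = cv2.threshold(diff, T, 255, cv2.THRESH_BINARY)

    # Closing removes breaks and slits in the hand region; opening removes small noise points.
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
    mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)
    return mask

def remove_wrist(mask):
    # Assumed estimate of the widest palm row and its center, followed by Eq. (2).
    row_widths = mask.astype(bool).sum(axis=1)
    cy = int(np.argmax(row_widths))   # y-coordinate of the widest row (Cy)
    w = int(row_widths[cy])           # widest palm width (w)
    out = mask.copy()
    out[cy + w // 2:, :] = 0          # I(x, y) = 0 if y > Cy + w/2
    return out, cy, w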

2.3 Finger Search

The border of the hand image is searched in sequence to obtain a set of hand contour coordinates {P0, P1, ..., Pn}, where P0 is the pixel coordinate at the lower left of the outline. The contour is then followed in the clockwise direction and split into k segments, each segment consisting of x pixels, giving a new set of contour coordinates {P0, Px, P2x, ..., Pn}. The angle of each segment with respect to the horizontal direction is calculated as:


θi = arctan( (P(i+1)x(y) - Pix(y)) / (P(i+1)x(x) - Pix(x)) ),    i = 0, ..., k-1              (3)

With the angles θi of the k segments available, the following steps determine the fingers: (1) set i = 0 and θstart = θi; (2) check whether i equals k-1: if yes, stop; otherwise set θend = θi and compare the angle difference between θend and θstart, and if the difference is greater than 180°, the contour has passed over a fingertip and is turning in the clockwise direction, so a finger is counted; (3) reset θstart = θi and continue from step (2). A Python sketch of this procedure is given below.
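The sketch illustrates the finger-search idea of (3) and steps (1)-(3): walk the hand contour in segments of a fixed pixel length, compute the angle of each segment, and count a finger whenever the angle change relative to θstart exceeds the 180° turning threshold. The contour-extraction call and the exact accumulation rule are assumptions made for illustration; only the segment angles and the 180° test come from the text.

import math
import cv2

def count_fingers(binary_hand, seg_len=10, turn_threshold_deg=180.0):
    # Estimate the number of fingers from a binary palm image (illustrative sketch).
    # OpenCV 4 API: findContours returns (contours, hierarchy).
    contours, _ = cv2.findContours(binary_hand, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
    if not contours:
        return 0
    contour = max(contours, key=cv2.contourArea).reshape(-1, 2)

    # Resample the contour every seg_len pixels: {P0, Px, P2x, ...}.
    pts = contour[::seg_len]
    # Angle of each segment with respect to the horizontal, Eq. (3).
    angles = [math.degrees(math.atan2(p2[1] - p1[1], p2[0] - p1[0]))
              for p1, p2 in zip(pts, pts[1:])]

    fingers = 0
    theta_start = angles[0]
    for theta in angles[1:]:
        # A large clockwise turn relative to theta_start marks a fingertip.
        if abs(theta - theta_start) > turn_threshold_deg:
            fingers += 1
            theta_start = theta   # step (3): reset theta_start and continue
    return fingers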

2.4 Epipolar Constraint Model

In stereo vision, in order to solve the image correspondence problem, a number of constraints are established to reduce errors when matching corresponding points, such as the epipolar constraint, the consistency constraint and the uniqueness constraint. These provide a strong guarantee that an accurate match is obtained in the final result. Here, the epipolar constraint matching algorithm achieves a dimensionality reduction that shrinks the data search space, as shown in Fig. 1.

Fig. 1. Epipolar geometry model
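As a sketch of how the epipolar constraint of Fig. 1 reduces the correspondence search to one dimension, the snippet below uses OpenCV to compute, for a point in the left image, the epipolar line in the right image along which the matching point must lie. The fundamental matrix F is assumed to have been obtained beforehand (for example with cv2.findFundamentalMat); this is illustrative rather than the authors' implementation.

import numpy as np
import cv2

def epipolar_line_in_right(point_left, F):
    # Return the line a*x + b*y + c = 0 in the right image for a left-image point.
    pt = np.array([[point_left]], dtype=np.float32)    # shape (1, 1, 2)
    line = cv2.computeCorrespondEpilines(pt, 1, F)     # 1 means the point is from the left image
    return line.reshape(3)                             # coefficients (a, b, c)

# Candidate matches only need to be searched along this line, turning a 2-D
# search over the right image into a 1-D search along the epipolar line.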

3 Support Vector Machine Classifier


Support vector machine (SVM) was proposed by Vapnik in 1995 [10], but it was not until 1998 that SVM became famous for its outstanding accuracy and performance in pattern recognition. Since then, SVM has been acknowledged as a potential and powerful tool for classification problems and applied in many different domains. This paper uses an SVM to classify the binary image of the palm: the six kinds of gestures (0 to 5) are recorded at different angles and sizes, the SVM is trained to find the Optimal Separating Hyperplane (OSH), and this OSH is then used to classify the test data set.

Definition (Primal Optimization Problem). Let the objective function be f(ω), with inequality and equality constraint conditions gi(ω), i = 1, ..., k, and hj(ω), j = 1, ..., m, all defined on Ω ⊆ R^n. The primal optimization problem can be written as (4):

    minimize     f(ω)
    subject to   gi(ω) ≤ 0,   i = 1, ..., k
                 hj(ω) = 0,   j = 1, ..., m                                          (4)

The set of all ω satisfying the constraints in (4) is called the feasible region, and the optimal solution ω* is the global minimum, i.e., f(ω*) ≤ f(ω) for every feasible ω. In the recognition field, if the extracted features discriminate the data well, classification can be based on distances in the feature space. Therefore, if each sample is described by n characteristic values, i.e., by coordinates in an n-dimensional space, an optimal separating plane can be found to distinguish the data, where the best separating plane is the one whose distance (D) to the training data points is largest. As shown in Fig. 2, both (a) and (b) show the data points and a separating plane (black solid line), but in (b) the distance between the separating plane and the data points is greatest; for unknown data, the two classes then maintain the largest boundary (margin), giving a higher classification success rate.
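For concreteness, the maximum-margin idea can be written as an instance of the primal problem (4). The following is the standard soft-margin SVM formulation from the literature, included here for reference rather than quoted from the paper:

    minimize     (1/2) ||w||^2 + C Σi ξi
    subject to   yi (w · xi + b) ≥ 1 - ξi,   ξi ≥ 0,   i = 1, ..., n

where maximizing the margin between the separating hyperplane and the nearest training points corresponds to minimizing ||w||, and the slack variables ξi allow for data that are not linearly separable.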

Fig. 2. Data points and separating planes: (a) a valid separating plane; (b) the maximum-margin separating plane
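A minimal sketch of how the six gesture classes could be trained and classified with an SVM is shown below, using scikit-learn. The feature choice (the resized binary palm image flattened into a vector) and the RBF kernel are assumptions made for illustration, since the paper does not specify them.

import numpy as np
import cv2
from sklearn.svm import SVC

def palm_features(binary_palm, size=(32, 32)):
    # Resize the binary palm image and flatten it into a feature vector (assumed feature choice).
    resized = cv2.resize(binary_palm, size, interpolation=cv2.INTER_NEAREST)
    return (resized.flatten() > 0).astype(np.float32)

def train_gesture_svm(palm_images, labels):
    # Train a six-class gesture classifier (labels 0..5) on binary palm images.
    X = np.stack([palm_features(img) for img in palm_images])
    y = np.asarray(labels)
    clf = SVC(kernel="rbf", C=1.0, gamma="scale")   # kernel and parameters are assumptions
    clf.fit(X, y)
    return clf

# Usage: predicted = clf.predict(palm_features(new_palm)[None, :])[0]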

4 Experiment and Evaluation


The experimental platform is shown in Fig. 3. This paper designs a computer-controlled human-machine interaction robot. For the visual part, the robot uses a monocular CCD camera. The robot system's camera captures the user's gesture images and carries them to the computer; after the visual recognition system processes them, the gesture category is obtained. There are six kinds of gestures in total, shown in Fig. 4, and they correspond to the following robot movement patterns: (1) circular motion in place, (2) go straight, (3) back up, (4) stop, (5) turn left, (6) turn right. Table 1 gives the interactive recognition accuracy, and Fig. 5 shows the guiding results based on gesture recognition.

Fig. 3. Experimental robot platform

Fig. 4. The six gesture categories: (1) circular motion in place, (2) go straight, (3) back up, (4) stop, (5) turn left, (6) turn right

Table 1. Interactive recognition results

Gesture category        (1)      (2)      (3)      (4)      (5)      (6)
Recognition accuracy    0.901    0.912    0.905    0.930    0.906    0.906
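To indicate how a recognized gesture class could drive the robot, the sketch below maps the six classes of Fig. 4 to motion commands. The command names and the robot interface are hypothetical, introduced only to illustrate the control flow of the interaction loop.

# Hypothetical mapping from recognized gesture class to a robot motion command.
GESTURE_TO_COMMAND = {
    1: "circle_in_place",
    2: "go_straight",
    3: "back_up",
    4: "stop",
    5: "turn_left",
    6: "turn_right",
}

def dispatch(gesture_class, robot):
    # Send the motion command for a recognized gesture to a robot object (assumed interface).
    command = GESTURE_TO_COMMAND.get(gesture_class, "stop")  # default to a safe stop
    robot.execute(command)  # robot.execute() is a placeholder method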


Fig. 5. Gesture-based guiding results

5 Conclusions
In this paper, we develop and put into practice a real-time computer-vision human-machine interaction system. To track the pointing object, we analyze features such as its color, shape and motion from the image. The focus of the study is to use SVM concepts so that users can communicate with the computer easily through a friendly human-machine interface, and to develop point-based object tracking and reconstruction of physical characteristics that can effectively detect the user's behavior. The experiment is designed around a set of replaceable action-based control modules to make the system scalable. In the robot's human-computer interaction mechanism, the current system only recognizes six kinds of gesture commands; we hope that in the future the number of recognized gesture types can be increased so that the visual recognition system can be applied more widely.

References
1. Tseng, K.T., Huang, W.F., Wu, C.H.: Vision-based Finger Guessing Game in Human Machine Interaction. In: IEEE International Conference on Robotics and Biomimetics, pp. 619-624. IEEE Press, Kunming (2006)
2. Wagner, S., Alefs, B., Picus, C.: Framework for a Portable Gesture Interface. In: 7th International Conference on Automatic Face and Gesture Recognition, pp. 275-280. IEEE Press, Southampton (2006)
3. Gupta, L., Suwei, M.: Gesture-based Interaction and Communication: Automated Classification of Hand Gesture Contours. IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews 31, 114-120 (2001)
4. Premaratne, P., Nguyen, Q.: Consumer Electronics Control System Based on Hand Gesture Moment Invariants. IET Computer Vision 1, 35-41 (2007)
5. Abe, K., Saito, H., Ozawa, S.: Virtual 3-D Interface System via Hand Motion Recognition From Two Cameras. IEEE Transactions on Systems, Man and Cybernetics, Part A 32, 536-540 (2002)
6. Holden, E.J., Owens, R.: Recognizing Moving Hand Shapes. In: 12th International Conference on Image Analysis and Processing, pp. 14-19. IEEE Press, Halmstad (2003)
7. Yin, X., Zhu, X.: Hand Posture Recognition in Gesture-Based Human-Robot Interaction. In: 1st IEEE Conference on Industrial Electronics and Applications, pp. 1-6. IEEE Press, Singapore (2006)
8. Kim, K.K., Kwak, K.C., Chi, S.Y.: Gesture Analysis for Human-Robot Interaction. In: 8th IEEE Conference on Advanced Communication Technology, pp. 1824-1827. IEEE Press, Korea (2006)
9. Wu, A., Shah, M., Da Vitoria Lobo, N.: A Virtual 3D Blackboard: 3D Finger Tracking Using a Single Camera. In: 4th IEEE Conference on Automatic Face and Gesture Recognition, pp. 536-543. IEEE Press, Grenoble (2000)
10. Vapnik, V.: The Nature of Statistical Learning Theory. Springer, New York (1995)
