I. INTRODUCTION
Over the last few decades, robotics research has expanded its topics from industrial robotics, which relieves the human operator of risky tasks, to the field of service robotics, which focuses more on assisting humans and on the interaction between human and robot [1]. However, because such robots are used by ordinary people without specialized knowledge, it is imperative that they be easy to operate through a flexible programming system. Robot programming by demonstration (robot PbD), in which robot task programming is simplified by using a vision system to learn from human demonstration, has therefore become a central topic in the field of robotics. Artificial neural networks are widely used in this type of robot programming. M. Stoica et al. developed a system to train an artificial neural network that can be incorporated in industrial robot programming by demonstration [2]. Their work explored how to deal with the
This work was supported in part by the National Chung-Shan Institute of Science & Technology under Grant NCSIST-803-V101 and by the National Science Council under Grants NSC 102-2218-E-027-016-MY2 and NSC 102-2221-E-027-085.
Hsien-I Lin is with the Graduate Institute of Automation Technology, National Taipei University of Technology, Taipei, Taiwan (e-mail: sofin@mail.ntut.edu.tw).
Yi-Yu Chen is with the Graduate Institute of Automation Technology, National Taipei University of Technology, Taipei, Taiwan (e-mail: z7172930@yahoo.com.tw).
Yung-Yao Chen is with the Graduate Institute of Automation Technology, National Taipei University of Technology, Taipei, Taiwan (corresponding author; phone: +886-2-2771-2171; fax: +886-8773-3217; e-mail: yungyaochen@mail.ntut.edu.tw).
Figure 2. Overall frameworks. (a) Framework of the robot PbD; in this paper, we focus on the blocks within the green rectangle. (b) Framework of the proposed robot vision system, whose goal is to provide prior knowledge for the robot before human demonstration.
II. ALGORITHM
For robot PbD of pick-and-place tasks, it is useful if, before human demonstration, the robot can obtain initial information, e.g., recognize the target object and estimate its pose.
Figure 3. Illustration of the training data and the test environment. In this paper, five target objects are selected since they are commonly used in our target work environments. For each object, the training samples are acquired at every five degrees of rotation from 5 to 360 degrees, i.e., θ = 5j, j = 1, ..., 72. (a) Object A with rotational angle θ = 15; (b) object B with θ = 15; (c) object C with θ = ; (d) object D with θ = 240; (e) object E with θ = 320; (f) the test environment. When testing, the object is placed within a 20 cm × 20 cm square area, with an arbitrary rotational angle θ and an arbitrary displacement d. Since the position and the angle of the camera are fixed, it is imperative to deal with the image scaling issue due to the parameters (θ, d).
B. Foreground extraction
In addition to the training data set, the background image of the training data, I_bg_training, is also recorded; this is the image of the disc without any object on it. In this paper, the target object is considered as the foreground, and it can be extracted pixel by pixel wherever the color difference of each channel between the input image and the background image is large. The foreground extraction of the training data can be expressed as

  B_fg_training[m] = 1, if |I_training[m] − I_bg_training[m]| > τ;  0, otherwise,   (2)

where m denotes a pixel location and τ is a color-difference threshold.

C. Shadow detection

To detect shadow pixels, the brightness ratio between the input image and the background image is computed as

  r[m] = I[m] / (I_bg[m] + 0.1),   (3)

where the constant 0.1 prevents division by zero.
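The per-pixel thresholding of Eq. (2) can be sketched as follows. This is a minimal illustration, not the paper's implementation: images are nested lists of (R, G, B) tuples, and the threshold value is an assumption, since the paper does not state one.

```python
TAU = 30  # assumed per-channel color-difference threshold (not from the paper)

def extract_foreground(image, background, tau=TAU):
    """Return a binary mask: 1 where any color channel differs from the
    background by more than tau (foreground), 0 otherwise, as in Eq. (2)."""
    h, w = len(image), len(image[0])
    mask = [[0] * w for _ in range(h)]
    for m in range(h):
        for n in range(w):
            # Largest per-channel absolute difference at pixel (m, n)
            diff = max(abs(a - b) for a, b in zip(image[m][n], background[m][n]))
            mask[m][n] = 1 if diff > tau else 0
    return mask
```

For example, with a uniform gray background, a pixel whose red channel jumps from 10 to 200 is marked as foreground, while a pixel that changes by only a couple of gray levels is not.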
Figure 4. Shadow detection results in the training data case. (a) Target objects A to E (from left to right), all with rotational angle θ = 10; (b) corresponding results of shadow detection, where shadow pixels are indicated in pink.
The binary image is then cleaned up using connected component analysis. In the complemented image

  B'_fg[m] = 1 − B_fg[m],   (5)

the value 1 now means a background pixel. The result of the connected component analysis in step one is used again: in B'_fg, only the connected region that has the most pixels is kept, and thus the new binary image can be expressed by

  B''_fg[m] = 1, if B'_fg[m] ∈ the largest connected region;  0, otherwise.   (6)

Taking the complement of B''_fg, the final foreground is obtained, as shown in Fig. 5(b). The color version of the final foreground is formed by retrieving the RGB color values at the pixel locations where the final foreground mask equals 1, as shown in Fig. 5(c).
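The cleanup step of Eqs. (5) and (6), keeping only the largest connected region of a binary mask and taking a complement, can be sketched with a breadth-first flood fill. This is an illustrative sketch assuming 4-connectivity; the function names are not from the paper.

```python
from collections import deque

def largest_component(mask):
    """Return a mask keeping only the largest 4-connected region of 1s."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    best = []
    for i in range(h):
        for j in range(w):
            if mask[i][j] == 1 and not seen[i][j]:
                # BFS flood fill collecting one connected region
                region, queue = [], deque([(i, j)])
                seen[i][j] = True
                while queue:
                    y, x = queue.popleft()
                    region.append((y, x))
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        v, u = y + dy, x + dx
                        if 0 <= v < h and 0 <= u < w and mask[v][u] == 1 and not seen[v][u]:
                            seen[v][u] = True
                            queue.append((v, u))
                if len(region) > len(best):
                    best = region
    out = [[0] * w for _ in range(h)]
    for y, x in best:
        out[y][x] = 1
    return out

def complement(mask):
    """Pixel-wise complement of a binary mask, as in Eq. (5)."""
    return [[1 - v for v in row] for row in mask]
```

Applying `largest_component` to a mask with a two-pixel region and an isolated pixel keeps only the two-pixel region; `complement` then flips foreground and background labels.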
Figure 5. Foreground cleanup results: (b) the final binary foreground; (c) the color version of the final foreground.
Figure 6. The proposed feature performance evaluation GUI, which can be used before constructing a predictive model. (a) The user interface, where the user can select the feature types and visually evaluate the performance of the selected features by looking into the feature sub-space. The bottom-left figure of the GUI represents the distribution of all training samples. Five target objects are tested, whose training samples are indicated by different colors. For each object, there are 72 samples due to the different rotational angles. (b) The distribution of the 1-D (size) feature sub-space. (c) The distribution of the 2-D (CbCr color space) feature sub-space. As shown in (b) and (c), the distributions of the different objects are far from each other, which implies that the selected features are appropriate for model construction.
E. Pose estimation
When testing, the target object is allowed to be placed arbitrarily within a square area. Since the position and the angle of the camera are fixed, the problem of image scaling needs to be considered before pose estimation. In this paper, we propose a simple scale-invariant pose estimation method that is based on image resizing by bilinear interpolation and takes the coverage ratio of the binary foreground into account. For each binary foreground in the training image set, the smallest bounding rectangle is calculated, as shown in Fig. 7. Given an input test image, we use an exhaustive search algorithm to find the most suitable pose estimate. That is, the test image is compared with all training images one after another by (1) extracting the binary foreground
Figure 7. (a)-(d) The smallest bounding rectangles of the binary foregrounds.

The centroid (Cx, Cy) of the foreground pixel set D is computed as

  Cx = ( Σ_{[m,n]∈D} m ) / ( Σ_{[m,n]∈D} 1 ),  Cy = ( Σ_{[m,n]∈D} n ) / ( Σ_{[m,n]∈D} 1 ),   (7)

and the coverage ratio is normalized by the area of the binary test foreground:

   = ( · ) / area(B''_fg_test).   (8)
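The centroid of Eq. (7), the average row and column index of all foreground pixels, can be sketched as follows. This is an illustrative helper, not the paper's code; it assumes the mask contains at least one foreground pixel.

```python
def centroid(mask):
    """Return (Cx, Cy): the mean row index m and mean column index n
    over all pixels [m, n] in the foreground set D (mask == 1), per Eq. (7)."""
    pixels = [(m, n)
              for m, row in enumerate(mask)
              for n, v in enumerate(row) if v == 1]
    k = len(pixels)  # |D|, the number of foreground pixels
    cx = sum(m for m, _ in pixels) / k
    cy = sum(n for _, n in pixels) / k
    return cx, cy
```

For a vertical three-pixel bar in the middle column of a 3 × 3 mask, the centroid lands on the center pixel.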
Seven features {σ_Y, σ_Cb, σ_Cr, σ_H, σ_S, σ_V, object size} are used for training, where the first six values represent the standard deviations of the individual channels in the YCbCr and HSV color spaces, respectively. For testing our method, we created a software application in Matlab version 7.8, in which the built-in neural network toolbox is used. The network type is feed-forward back-propagation, the training method is the Levenberg-Marquardt method with the momentum option, and the performance function is the mean square error (MSE). The proposed neural network has one hidden layer with 15 neurons. For the hidden layer we use the hyperbolic tangent sigmoid activation function, and for the output layer we use the linear transfer function. For object recognition, since the five target objects have quite different shapes and colors, we would expect a very high accuracy rate; indeed, the average accuracy rate is 99%. Only one image (of object A) is recognized incorrectly.
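Computing the seven-feature vector described above can be sketched as follows. This is a sketch under assumptions, not the paper's Matlab code: the RGB-to-YCbCr weights are the common ITU-R BT.601 full-range ones (the paper does not state which conversion it uses), and HSV conversion uses Python's standard colorsys module.

```python
import colorsys
from statistics import pstdev

def rgb_to_ycbcr(r, g, b):
    # ITU-R BT.601 full-range conversion (assumed, not from the paper)
    y  =  0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr

def feature_vector(foreground_pixels):
    """foreground_pixels: list of (r, g, b) tuples (0-255) of the object.
    Returns [sigma_Y, sigma_Cb, sigma_Cr, sigma_H, sigma_S, sigma_V, size]."""
    ycbcr = [rgb_to_ycbcr(*p) for p in foreground_pixels]
    hsv = [colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
           for r, g, b in foreground_pixels]
    # Standard deviation of each channel in both color spaces
    feats = [pstdev(c[i] for c in ycbcr) for i in range(3)]
    feats += [pstdev(c[i] for c in hsv) for i in range(3)]
    feats.append(len(foreground_pixels))  # object size = foreground pixel count
    return feats
```

A uniformly colored object gives zero standard deviation in every channel, so only its size feature distinguishes it, which is consistent with Fig. 6(b), where size alone already separates the five objects.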
III. EXPERIMENTAL RESULTS
IV. CONCLUSION