I. INTRODUCTION
A. Sign Language
Sign language is the fundamental communication method for people who suffer from hearing impairments. For an ordinary person to communicate with deaf people, a translator is usually needed to translate sign language into natural language and vice versa (International Journal of Language and Communication Disorders, 2005). Sign language can be considered a collection of gestures, movements, postures, and facial expressions corresponding to letters and words in natural languages.
American Sign Language (ASL) (National Institute on Deafness and Other Communication Disorders, 2005) is a complete language that employs signs made with the hands together with facial expressions and postures of the body. According to research by Ted Camp found on the Web site www.silentworldministries.org, ASL is the fourth most used language in the United States, behind only English, Spanish, and Italian (Camp). ASL is a visual language, meaning it is expressed not through sound but through combinations of hand shapes, movements of the hands and arms, and facial expressions.
B. Related Work
Attempts to automatically recognize sign language began to appear in the 1990s. Research on hand gestures can be classified into two categories. The first category relies on electromechanical devices that are used to measure the different gesture parameters, such as the hand's position, angle, and the location of the fingertips. Systems that use such devices are called glove-based systems. A major problem with such systems is that they force the signer to wear cumbersome and inconvenient devices; hence the way the user interacts with the system becomes complicated and less natural. The second category uses machine vision and image processing techniques to create vision-based hand gesture recognition systems. Vision-based gesture recognition systems are further divided into two categories. The first relies on specially designed gloves with visual markers, called visual-based gesture with glove-markers (VBGwGM), that help in determining hand postures. But gloves and markers do not provide the naturalness required in human-computer interaction systems, and if colored gloves are used, the processing complexity increases. The second, an alternative to the first kind, can be called pure visual-based gesture (PVBG), meaning visual-based gesture recognition without glove markers. This type tries to achieve the ultimate convenience and naturalness by using images of bare hands to recognize gestures. Techniques reported in the literature for posture and gesture recognition fall into the following groups:
1. Learning algorithms.
a. Neural networks (e.g. the work of Banarse, 1993).
b. Hidden Markov models (e.g. the work of Charniak, 1993).
c. Instance-based learning (e.g. the work of Kadous, 1995).
2. Miscellaneous techniques.
a. The linguistic approach (e.g. the work of Hand, Sexton, and Mullan, 1994).
b. Appearance-based motion analysis (e.g. the work of Davis and Shah, 1993).
c. Spatio-temporal vector analysis (e.g. the work of Quek, 1994).
3. Feature extraction, statistics, and models.
a. Template matching (e.g. the work of Darrell and Pentland, 1993).
b. Feature extraction and analysis (e.g. the work of Rubine, 1991).
c. Active shape models ("smart snakes") (e.g. the work of Heap and Samaria, 1995).
d. Principal component analysis (e.g. the work of Birk, Moeslund, and Madsen, 1997).
e. Linear fingertip models (e.g. the work of Davis and Shah, 1993).
f. Causal analysis (e.g. the work of Brand and Essa, 1995).
Among many factors, five important factors must be considered for the successful development of a vision-based solution for collecting hand posture and gesture data for recognition.
[Figure: block diagram of the recognition phase: prepared image → image resizing → feature extraction (edge detection) → feature vector → neural network classification → classified sign]
B. Classification Phase
After applying a proper feature extraction method, the next step is the classification stage, for which a 3-layer feed-forward back-propagation neural network is constructed. The classification neural network is shown in figure 3. It has 256 inputs. The classification phase includes defining the network architecture, creating the network, and training it. A feed-forward back-propagation network with supervised learning is used.
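The paper specifies only the 256-element input and the 3-layer feed-forward back-propagation design, so the sketch below is illustrative rather than a reproduction of the authors' network; the hidden-layer size, learning rate, and iteration count are assumptions, and the 26 outputs correspond to the 26 ASL letters used in the experiments.

```python
import numpy as np

# Minimal 3-layer feed-forward network trained with back-propagation.
# Sizes: 256 inputs (feature vector), 40 hidden units (assumed),
# 26 outputs (one per ASL letter). Hyperparameters are illustrative.
rng = np.random.default_rng(0)
n_in, n_hid, n_out = 256, 40, 26
W1 = rng.normal(0, 0.1, (n_in, n_hid)); b1 = np.zeros(n_hid)
W2 = rng.normal(0, 0.1, (n_hid, n_out)); b2 = np.zeros(n_out)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x):
    h = sigmoid(x @ W1 + b1)        # hidden activations
    y = sigmoid(h @ W2 + b2)        # output activations
    return h, y

def train_step(x, target, lr=0.5):
    """One supervised back-propagation update for a single sample."""
    global W1, b1, W2, b2
    h, y = forward(x)
    # Output-layer error (squared-error loss; sigmoid derivative y*(1-y)).
    delta_out = (y - target) * y * (1 - y)
    # Hidden-layer error propagated back through W2.
    delta_hid = (delta_out @ W2.T) * h * (1 - h)
    W2 -= lr * np.outer(h, delta_out); b2 -= lr * delta_out
    W1 -= lr * np.outer(x, delta_hid); b1 -= lr * delta_hid

# Toy usage: one random feature vector labelled as letter 'A' (class 0).
x = rng.random(n_in)
t = np.zeros(n_out); t[0] = 1.0
for _ in range(100):
    train_step(x, t)
print(np.argmax(forward(x)[1]))     # expected to print 0 after training
```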
The goal is to design a highly distinctive descriptor for each interest point to facilitate meaningful matches, while simultaneously ensuring that a given interest point will have the same descriptor regardless of the hand position, the lighting in the environment, and so on. Thus both the detection and description steps rely on the invariance of various properties for effective image matching. The system processes static images of the subject and then matches them against a statistical database of preprocessed images to ultimately recognize the specific signed letter.
A. FINDING KEYPOINTS
The SIFT feature algorithm is based upon finding locations (called key points) within the scale space of an image that can be reliably extracted. The first stage of computation searches over all scales and image locations. It is implemented efficiently by using a difference-of-Gaussian (DoG) function to identify potential interest points that are invariant to scale and orientation. Key points are identified as local maxima or minima of the DoG images across scales: each pixel in a DoG image is compared to its 8 neighbours at the same scale and the 9 corresponding neighbours at each of the two neighbouring scales. If the pixel is a local maximum or minimum, it is selected as a candidate key point.
We have a small image database, so we do not need a large number of key points for each image. Also, the difference in scale between large and small bare hands is not very big.
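As a concrete illustration of this detection step, the sketch below builds a small Gaussian scale stack, takes differences of adjacent levels, and tests each interior pixel against its 26 neighbours. The number of scales, the sigma schedule, and the contrast threshold are assumptions for the sketch, not values given in the paper.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def dog_keypoints(img, sigmas=(1.0, 1.6, 2.6, 4.2), thresh=0.02):
    """Return (row, col, scale-index) of DoG extrema in a grayscale image."""
    img = img.astype(np.float64) / 255.0
    blurred = np.stack([gaussian_filter(img, s) for s in sigmas])
    dog = blurred[1:] - blurred[:-1]          # difference-of-Gaussian stack
    keypoints = []
    for s in range(1, dog.shape[0] - 1):      # interior scales only
        for r in range(1, img.shape[0] - 1):
            for c in range(1, img.shape[1] - 1):
                val = dog[s, r, c]
                if abs(val) < thresh:          # skip low-contrast responses
                    continue
                cube = dog[s-1:s+2, r-1:r+2, c-1:c+2]  # 3x3x3 neighbourhood
                # Local maximum or minimum among all 26 neighbours.
                if val >= cube.max() or val <= cube.min():
                    keypoints.append((r, c, s))
    return keypoints
```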
B. KEYPOINT LOCALIZATION
In this step the key points are filtered so that only stable, well-localized key points are retained. First a 3D quadratic function is fitted to the local sample points to determine the location of the extremum. If the extremum is found to lie closer to a different sample point, the sample point is changed and the interpolation is performed about that point instead. The function value at the extremum is used for rejecting unstable extrema with low contrast. The DoG operator also has a strong response along edges in an image, which gives rise to unstable key points: a poorly defined peak in the DoG function has a large principal curvature across the edge but a small principal curvature in the perpendicular direction.
C. ORIENTATION ASSIGNMENT
Each key point is assigned a dominant local gradient orientation so that its descriptor can be represented relative to that orientation. The key-point locations are returned as a P-by-4 matrix, in which each row holds the 4 values for a key-point location (row, column, scale, orientation). The orientation is in the range [-PI, PI] radians.
D. KEYPOINT DESCRIPTORS
First the image gradient magnitudes and orientations are calculated around the key point, using the scale of the key point to select the level of Gaussian blur for the image. The coordinates of the descriptor and the gradient orientations are rotated relative to the key-point orientation. Note that after the grid around the key point is rotated, we need to interpolate the Gaussian-blurred image around the key point at non-integer pixel values. We found that the 2D interpolation in MATLAB takes much time, so for simplicity we always round the rotated grid around the key point to the nearest integer values. By experiment we realized that this operation increased the speed considerably while having only a minor effect on the accuracy of the whole algorithm. The gradient magnitude is weighted by a Gaussian weighting function, with sigma equal to one half of the descriptor window width, to give less credit to gradients far from the center of the descriptor. These magnitude samples are then accumulated into orientation histograms summarizing the content over 4x4 subregions; figure 4 describes the whole operation. Trilinear interpolation is used to distribute the value of each gradient sample into adjacent bins. The descriptor is formed from a vector containing the values of all the orientation histogram entries. The algorithm uses a 4x4 array of histograms with 8 orientation bins in each, resulting in a feature vector of 128 elements. The feature vector is then normalized to unit length to reduce the effect of illumination change. The values in the unit-length vector are thresholded at 0.2 and the vector is renormalized to unit length; this is done to reduce the effect of non-linear illumination changes.
[Figure 7: 2x2 descriptor array computed from 8x8 samples (Source: Reference 1)]
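To make the descriptor layout concrete, here is a much-simplified sketch that bins gradients from a 16x16 patch into the 4x4x8 = 128-element histogram vector, then applies the unit-length normalization and the 0.2 clamp described above. It omits the rotation to the key-point orientation and the trilinear interpolation for brevity, so it illustrates the data layout rather than reproducing the full SIFT descriptor.

```python
import numpy as np

def descriptor_128(patch):
    """Build a simplified 128-D SIFT-like descriptor from a 16x16 patch."""
    assert patch.shape == (16, 16)
    # Image gradients by central differences.
    gy, gx = np.gradient(patch.astype(np.float64))
    mag = np.hypot(gx, gy)
    ang = np.arctan2(gy, gx)                     # range [-pi, pi]
    # Gaussian weighting, sigma = half the descriptor window width (8).
    yy, xx = np.mgrid[-7.5:8.5, -7.5:8.5]
    mag = mag * np.exp(-(xx**2 + yy**2) / (2 * 8.0**2))
    hist = np.zeros((4, 4, 8))                   # 4x4 subregions, 8 bins each
    for r in range(16):
        for c in range(16):
            b = int((ang[r, c] + np.pi) / (2 * np.pi) * 8) % 8
            hist[r // 4, c // 4, b] += mag[r, c]
    v = hist.ravel()                             # 128 elements
    v = v / (np.linalg.norm(v) + 1e-12)          # normalize to unit length
    v = np.minimum(v, 0.2)                       # clamp for non-linear lighting
    return v / (np.linalg.norm(v) + 1e-12)       # renormalize to unit length
```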
When the SIFT algorithm is used to match images, the distance between each feature point in the first image and all feature points in the second image must be calculated. Every feature point is 128-dimensional, so the complexity of the calculation can well be imagined. A modified similarity measurement is therefore introduced to improve the efficiency of the SIFT algorithm. First, the Euclidean distance is replaced by the dot product of unit vectors, as it is computationally cheaper. Then, parts of the 128-dimensional feature vector take part in the calculation gradually, which reduces the running time of the SIFT algorithm.
The Euclidean distance is the distance between the end points of the two vectors. It is a poor measure here because it is large for vectors of different lengths: two images with very similar content can have a significant vector difference simply because one vector is much longer than the other. The relative distributions may be identical in the two images while the absolute magnitudes of one are far larger. The key idea is therefore to rank images according to their angle with the query image. To compensate for the effect of length, the standard way of quantifying the similarity between two images d1 and d2 is to compute the cosine similarity of their vector representations V(d1) and V(d2).
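Written out, the cosine similarity implied by the text is the standard definition; the formula is supplied here for completeness rather than quoted from the paper:

$$ \mathrm{sim}(d_1, d_2) = \frac{V(d_1) \cdot V(d_2)}{\lVert V(d_1) \rVert \, \lVert V(d_2) \rVert} $$

When V(d1) and V(d2) are already normalized to unit length, as the 128-element descriptors are, this reduces to a plain dot product, which is why the dot product of unit vectors can stand in for the Euclidean distance here.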
[Figure: algorithm block diagram]
Now we apply these steps to the previous image from which SIFT features were extracted.
A. Data Set
The data set used for training and testing the recognition system consists of grayscale images of all the ASL signs used in the experiments (see fig. 4). Eight samples of each sign were taken from eight different volunteers. For each sign, 5 of the 8 samples were used for training, while the remaining three samples were used for testing. The samples were taken from different distances with a web camera and with different orientations. In this way a data set was obtained with cases that have different sizes and orientations, so the capabilities of the feature extraction scheme can be examined.
B. Recognition Rate
[Figure 9: training chart for a network trained on 8 samples for each sign, (0.25) Canny threshold]
[Figure 10: percentage error recognition chart of the neural network]
The recognition rate is defined as
Recognition rate = (no. of correctly classified signs / total no. of signs) x 100.
For example, 193 correctly classified samples out of 26 x 8 = 208 gives the overall rate of 92.78%.
C. Experimental Results
Sign | Recognized samples | Misclassified samples | Recognition rate (%)
A | 7 | 1 | 66.66
B | 7 | 1 | 66.66
C | 7 | 1 | 66.66
D | 8 | 0 | 100
E | 8 | 0 | 100
F | 8 | 0 | 100
G | 7 | 1 | 66.66
H | 7 | 1 | 66.66
I | 8 | 0 | 100
J | 8 | 0 | 100
K | 7 | 1 | 66.66
L | 7 | 1 | 66.66
M | 8 | 0 | 100
N | 7 | 1 | 66.66
O | 7 | 1 | 66.66
P | 8 | 0 | 100
Q | 8 | 0 | 100
R | 7 | 1 | 66.66
S | 7 | 1 | 66.66
T | 8 | 0 | 100
U | 8 | 0 | 100
V | 8 | 0 | 100
W | 8 | 0 | 100
X | 6 | 2 | 33.33
Y | 8 | 0 | 100
Z | 6 | 2 | 33.33
TOTAL | 193 | 15 | 92.78

Table 1: Results of training 8 samples for each sign with (0.25) Canny threshold
In figure 8 we compare database images 1, 3, and 7 with the input image key points; database image 3 is the closest match to the input image.
A text-to-sign interpreter means that if the user types any word or sentence, the corresponding signs are shown, so that a hearing person can communicate with deaf people. An example of the converter is shown in figure 6: when the user types the name BOB in the text box, the corresponding signs appear on the screen one by one above the text.
The problem now is how to identify a 'No Match'. We observed that 'No Match' query images are in many cases confused with the database images that have a large number of feature vectors in the feature-vector database. We therefore compare the highest vote (corresponding to the best-matching image) and the second-highest vote (corresponding to the most conflicting image). If the difference between them is larger than a threshold, there is a match, and the match corresponds to the highest vote; if the difference is smaller than the threshold, we declare a 'No Match'. The threshold value was chosen by experiments on training-set images containing both match and no-match cases.
The approach described above has been implemented using MATLAB. The implementation has two aspects: training and inference. During the training phase, locally invariant features (key points, orientations, scales, and descriptors) are retrieved from all training images using the SIFT algorithm. During inference, the objective is to recognize a test image: a set of local invariant features is retrieved for the test image and compared against the training feature set using the metric explained in section 4. The title of the closest match is returned as the final output.
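The paper's implementation is in MATLAB; purely as an illustration of the same train-then-infer flow, here is a Python sketch using OpenCV's SIFT. The directory layout, the brute-force matcher, and Lowe's 0.75 ratio test are assumptions made for this sketch, not details taken from the paper.

```python
import cv2
import glob
import os

def train(image_dir):
    """Extract SIFT descriptors from every training image in a folder."""
    sift = cv2.SIFT_create()
    db = {}
    for path in glob.glob(os.path.join(image_dir, "*.png")):
        img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
        _, desc = sift.detectAndCompute(img, None)
        db[os.path.basename(path)] = desc       # title -> descriptor set
    return db

def infer(db, test_img):
    """Return the training-image title with the most good matches."""
    sift = cv2.SIFT_create()
    _, query = sift.detectAndCompute(test_img, None)
    matcher = cv2.BFMatcher()
    votes = {}
    for title, desc in db.items():
        # Two nearest neighbours per query descriptor, then the ratio test.
        pairs = matcher.knnMatch(query, desc, k=2)
        good = [p[0] for p in pairs
                if len(p) == 2 and p[0].distance < 0.75 * p[1].distance]
        votes[title] = len(good)
    return max(votes, key=votes.get)
```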
In order to evaluate the performance of the proposed system, we predefined a set of gestures (B, C, H, I, L, O, Y) and created a hand gesture database. Matching between images is performed with unit vectors. The matching is carried out with the proposed method, and the result shows that it produces 98% accuracy. In figure 7 we can easily see that database images 1, 3, and 7 have more key points matched with the input image key points, so the distance-ratio parameter and threshold are adjusted.
The experiments were run on a PC with a 1.5 GHz AMD processor and 128 MB of RAM, running under Windows 2008. A WEB-CAM-1.3 is used for image capturing.
The system proved robust against changes in gesture. Using the histogram technique we got misclassified results; hence the histogram technique is applicable only to a small set of ASL alphabets or gestures that are completely different from each other. It does not work well for a large set, or for all 26 ASL signs. For a larger set of sign gestures, a segmentation method is suggested. The main problem with this technique is how good a differentiation one can achieve; this depends mainly on the images, but it comes down to the algorithm as well. It may be enhanced using other image processing techniques, such as the edge detection used in the present paper. We used the well-known Canny, Sobel, and Prewitt edge detectors with different thresholds, and we got good results with Canny at a 0.25 threshold value. Using edge detection along with the segmentation method, a recognition rate of 92.33% is achieved. The system is also made background independent. As we have implemented a sign-to-text interpreter, the reverse, a text-to-sign interpreter, is implemented as well.
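For reference, a minimal Canny edge-detection call is sketched below. The 0.25 threshold in the paper is in MATLAB's normalized 0-1 convention; the mapping to OpenCV's 8-bit hysteresis thresholds shown here, and the 2:1 high-to-low ratio, are assumptions for illustration, as is the input file name.

```python
import cv2

img = cv2.imread("sign.png", cv2.IMREAD_GRAYSCALE)  # hypothetical input image
high = 0.25 * 255   # paper's normalized 0.25 threshold, scaled to 8-bit range
low = high / 2      # assumed low threshold for hysteresis
edges = cv2.Canny(img, low, high)
cv2.imwrite("sign_edges.png", edges)
```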
Gesture name | Testing number | Success number | Correct rate (%)
B | 150 | 149 | 99.3
C | 150 | 148 | 98.7
H | 150 | 148 | 98.7
I | 150 | 149 | 99.3
L | 150 | 148 | 98.7
O | 150 | 148 | 98.7
Y | 150 | 149 | 99.3

Table 2: The results of the classifier for the training set and testing set