
Indian Institute of Information Technology, Allahabad
Deoghat, Jhalwa, Allahabad - 211012

A PROJECT REPORT ON:

Indian Sign Language Gesture Recognition


Submitted By: SAURAV KUMAR (IEC2009032), HIMANSHU (IEC2009036), VIVEK GUPTA (IEC2009091)

Under the supervision of

Dr. Pavan Chakraborty

Indian Institute of Information Technology, Allahabad
Deoghat, Jhalwa, Allahabad - 211012


INDIAN INSTITUTE OF INFORMATION TECHNOLOGY

ALLAHABAD

CERTIFICATE

Certified that this project report

INDIAN SIGN LANGUAGE (ISL)

GESTURE RECOGNITION is the bonafide work of SAURAV KUMAR, HIMANSHU and VIVEK GUPTA, who carried out the project work under my supervision.

SIGNATURE

Dr. Pavan Chakraborty


(SUPERVISOR)
Assistant Lecturer, ROBOTICS, IIITA


Index
1. Abstract
2. Introduction
   2.1 Problem Definition
3. Literature Survey
4. Methodology
5. Work Done till Mid-Sem
6. Proposed Work till End-Sem
7. References


ABSTRACT

This project considers vision-based gesture recognition for imitation-based learning. It illustrates a new approach to the understanding of Indian Sign Language (ISL) gestures. The technique used is sufficiently robust to deal with dynamic hand gestures, as provided by standard ISL. Dynamic gestures profoundly improve the communication ability of deaf and mute persons but, at the same time, increase the computational complexity: a video database has to be considered instead of a static image database, and the complexity grows further when both hands are used extensively. The orientation histogram has been chosen as an important feature since it is invariant to orientation and scene illumination. A simple and effective algorithm has been developed to calculate edge orientations in the sequences of ISL gesture images. Classification is performed using the Euclidean distance metric and the k-nearest neighbour method. The behaviour of the classified ISL gestures is learned using the Hidden Markov Model (HMM) technique.


INTRODUCTION

Sign language provides a complete means of communication among deaf communities all over the world. It relies on movements of the hands, arms, head and body performed in a conceptual, predetermined way so that a gesture language can be constructed meaningfully. Since sign language is not a universal language, each one carries its own syntax and grammar. Sign language is a major concern for the speech- and hearing-impaired as a means of enhancing their communication capabilities. It offers two types of gesture: static gestures and dynamic gestures. ISL contains both, but the more challenging part is dealing with dynamic gestures. A static gesture is a particular configuration and position of the hand, represented by a single image; a dynamic gesture is a moving gesture, represented by a sequence of images. Recognizing ISL therefore requires a comprehensive study of the complexity of classifying the highly structured sign language, together with a full analysis of hand gestures.

2.1 Problem Definition: To develop a good classifier that can recognize hand gestures in real time. To accomplish this task, the project has been divided into five modules according to their functional requirements:
1) Module 1: sensing or capturing the image input.
2) Module 2: computing orientation contrast and edge orientation as feature vectors.
3) Module 3: implementing SIFT (Scale-Invariant Feature Transform) to make the operation robust.
4) Module 4: using the LIBSVM tool for classification.
5) Module 5: using the classified dynamic ISL gesture to produce text or speech output.


LITERATURE SURVEY

Gesture recognition is one of the most challenging areas of research, with a huge range of appealing applications. This literature survey explores two different approaches to hand gesture recognition: the data-glove-based approach and the computer-vision-based approach.

3.1 Data Glove Based Technology


This technology emerged from its valuable application to hand pose tracking. A glove is equipped with several sensors that provide information about the position and orientation of the hand and the flex of the fingers. The measured movements are then processed to recognize sign language effectively. The approach was a considerable commercial success. It was described by Zimmerman, Lanier, Blanchard, Bryson and Harvill (1987), who worked on an optical flex sensor mounted in a glove to measure finger bending. Fibre-optic cables containing small cracks run along the back of each hand; when the fingers bend while performing a gesture, light escapes through the cracks, and the measured light loss determines the accurate pose of the hand.

Fig 3.1(a): structure of the data glove

The data glove is intended to measure the bending of each joint with a significant level of accuracy. Its weak point, however, is that the sideways motion of the fingers is very difficult to measure. Figure 3.1(a) shows the structure of the data glove.


The weakness of the data-glove-based technology was addressed by the CyberGlove approach [1], in which finger abduction is measured accurately. It was developed by Kramer (1989) and uses strain gauges placed between the fingers to measure the sideways motion more accurately while continuing to sense the bending of the fingers. The following figure gives an overview of the CyberGlove device.

Fig 3.1(b): structure of the CyberGlove

The CyberGlove captures the motion and position of the fingers and the wrist. It contains up to 22 sensors: three bend sensors mounted on each finger, four abduction sensors, and further sensors that measure wrist abduction and flexion, thumb crossover and palm arch. The KHU-1 data glove has been used for 3D hand motion tracking and hand gesture recognition. It consists of three tri-axis accelerometer sensors, one controller and one Bluetooth module, and it transmits motion signals to the system wirelessly via Bluetooth. Kinematic theory is applied to construct a 3D hand gesture model in digital space, and a rule-based algorithm is used to recognize hand gestures from the KHU-1 data glove and the 3D hand gesture model.

3.2 Computer Vision Based Technique


Data-glove-based instruments give very accurate results for hand gesture recognition, but several limitations make such recognition systems expensive and cumbersome. Computer-vision-based techniques have therefore emerged to overcome these limitations. An extremely good review of vision-based gesture recognition systems is offered by Pavlovic et al. (1995); it shows their capability for accurate hand and body tracking, although the technique is not applicable to all applications. Rehg and Kanade (1994) proposed a computer-vision approach that models the hand with cylindrical components using a stereoscopic camera; Etoh, Tomono and Kishino (1991) introduced similar work. Image-based systems attempt to segment the hand from the background by extracting features such as edges, shape, fingertips and hand orientation. Starner and Pentland (1995) proposed an HMM-based model of hand shape that recognizes 42 ASL gestures with 99% accuracy. [2]

Vision-based hand gesture recognition approaches for human-computer interaction, with applications to virtual reality, fall into two categories. The first uses a 3D model: 3D kinematics with a considerable number of degrees of freedom (DOFs) is used to formulate a hand model, and the hand parameters are estimated by comparing the input images with the 2D appearance projected from the 3D hand model. The second is the appearance-based method, in which features extracted from the images are used to model the visual appearance of the hand, and these features are compared with the features extracted from the video input. The appearance-based approach improves real-time performance because it deals with 2D image features. In recent research, the vision-based approach to gesture recognition has achieved remarkable progress, and it will continue to grow in the areas of feature extraction, classification methods and gesture representation.


METHODOLOGY

Local Features For Object Recognition


Matching features [3] across different images is a major task in computer vision. Local features provide a representation that allows us to find a particular object we have encountered before. For any object in an image, interesting points on the object can be extracted to provide a "feature description" of the object. This description, extracted from a training image, can then be used to identify the object in a test image containing many other objects. To perform reliable recognition, it is important that the features extracted from the training image be detectable even under changes in image scale, noise and illumination. When the images are similar in nature (same scale, orientation, etc.), simple corner detectors such as the Harris corner detector can work; but when the images have different scales, rotations and illumination, more descriptive local features need to be obtained. A good local feature should be:
- reasonably tolerant to image noise, uniform scaling, rotation, changes in illumination and minor changes in viewing direction;
- highly distinctive, so that it allows correct object identification with a low probability of mismatch;
- easy to extract;
- easy to match against a large database of local features.
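As a concrete illustration of the simpler case mentioned above, the following is a rough sketch of Harris corner detection with OpenCV; the file name is only an illustrative placeholder.

```python
# Sketch: Harris corner detection on a single grayscale frame.
import cv2
import numpy as np

img = cv2.imread("gesture_frame.png")                 # illustrative file name
gray = np.float32(cv2.cvtColor(img, cv2.COLOR_BGR2GRAY))

# blockSize: neighbourhood size, ksize: Sobel aperture, k: Harris parameter
response = cv2.cornerHarris(gray, blockSize=2, ksize=3, k=0.04)

# Keep points whose corner response exceeds 1% of the strongest response
corners = np.argwhere(response > 0.01 * response.max())
print(len(corners), "corner candidates")
```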

Scale-Invariant Feature Transform (SIFT)


Scale-invariant feature transform (SIFT) [4] is an algorithm in computer vision to detect and extract local feature descriptors in an image. The algorithm was published by David Lowe in 1999. These local feature descriptors are reasonably invariant to changes in illumination, image noise, rotation, scaling and minor changes in viewpoint. The following detection stages are involved in the implementation of the SIFT algorithm:

1. Scale-space extrema detection
2. Keypoint localization
3. Orientation assignment
4. Generation of keypoint descriptors

1. Scale-space extrema detection

This is the stage where the interest points, called keypoints in the SIFT framework, are detected. The image is convolved with Gaussian filters at different scales, and the differences of successive Gaussian-blurred images are taken. Keypoints are then taken as the maxima/minima of the Difference of Gaussians [5] (DoG) that occur at multiple scales. Specifically, a DoG image is given by

D(x, y, \sigma) = L(x, y, k_i \sigma) - L(x, y, k_j \sigma)

where L(x, y, k \sigma) = G(x, y, k \sigma) * I(x, y) is the convolution of the original image I(x, y) with the Gaussian blur G(x, y, k \sigma) at scale k \sigma.

Fig. Various octaves and their DoG obtained by subtraction



Hence a DoG image between scales k_i \sigma and k_j \sigma is just the difference of the Gaussian-blurred images at scales k_i \sigma and k_j \sigma. For scale-space extrema detection in the SIFT algorithm, the image is first convolved with Gaussian blurs at different scales. The convolved images are grouped by octave (an octave corresponds to doubling the value of \sigma), and the value of k_i is selected so that we obtain a fixed number of convolved images per octave. The Difference-of-Gaussian images are then taken from adjacent Gaussian-blurred images within each octave. Once DoG images have been obtained, keypoints are identified as local minima/maxima of the DoG images across scales. This is done by comparing each pixel in the DoG images to its eight neighbours at the same scale and the nine corresponding neighbouring pixels in each of the two adjacent scales. If the pixel value is the maximum or minimum among all compared pixels, it is selected as a candidate keypoint.

Fig. Minima and maxima extraction from the DoG images
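A minimal sketch of this scale-space construction, assuming OpenCV and NumPy; the values of sigma, k and the number of levels are illustrative choices.

```python
# Sketch: one octave of Gaussian blurs and their Difference-of-Gaussian images.
import cv2
import numpy as np

def dog_octave(gray, sigma=1.6, k=2 ** 0.5, levels=5):
    # L(x, y, k^i * sigma): progressively blurred copies of the image
    blurred = [cv2.GaussianBlur(gray.astype(np.float32), (0, 0), sigma * k ** i)
               for i in range(levels)]
    # D: difference of adjacent Gaussian-blurred images
    return [b2 - b1 for b1, b2 in zip(blurred, blurred[1:])]

gray = cv2.imread("gesture_frame.png", cv2.IMREAD_GRAYSCALE)
dogs = dog_octave(gray)
# A candidate keypoint is a pixel that is the minimum or maximum of its
# 26 neighbours: 8 at its own scale plus 9 in each adjacent DoG image.
```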

2. Keypoint localization

Scale-space extrema detection produces too many keypoint candidates, some of which are unstable. The next step in the algorithm is to perform a detailed fit to the nearby data (interpolation) to obtain an accurate location and scale. This information allows us to reject points that have low contrast (and are therefore more sensitive to noise) or that are poorly localized along an edge.


3. Orientation assignment

In this step, each keypoint is assigned one or more orientations based on the local image gradient directions. This is the key step in achieving invariance to rotation, as the keypoint descriptor can be represented relative to this orientation. First, the Gaussian-smoothed image L(x, y, \sigma) at the keypoint's scale \sigma is taken so that all computations are performed in a scale-invariant manner. For an image sample L(x, y) at scale \sigma, the gradient magnitude m(x, y) and orientation \theta(x, y) are precomputed using pixel differences:

m(x, y) = \sqrt{(L(x+1, y) - L(x-1, y))^2 + (L(x, y+1) - L(x, y-1))^2}

\theta(x, y) = \tan^{-1}\left(\frac{L(x, y+1) - L(x, y-1)}{L(x+1, y) - L(x-1, y)}\right)

Fig. Magnitude and orientation of gradient in the neighbourhood of the keypoint

The magnitude and direction of the gradient are computed for every pixel in a neighbouring region around the keypoint in the Gaussian-blurred image L. An orientation histogram with 36 bins is formed, with each bin covering 10 degrees. Each sample in the neighbouring window added to a histogram bin is weighted by its gradient magnitude and by a Gaussian-weighted circular window with a \sigma that is 1.5 times the scale of the keypoint.

The peaks in this histogram correspond to dominant orientations. Once the histogram is filled, the orientation corresponding to the highest peak, and any local peaks within 80% of the highest peak, are assigned to the keypoint. When multiple orientations are assigned, an additional keypoint is created with the same location and scale as the original keypoint for each additional orientation.
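A minimal NumPy sketch of the 36-bin histogram described above; for brevity it omits the Gaussian weighting of the samples, and the patch radius is an illustrative choice.

```python
# Sketch: gradient orientation histogram around a keypoint at (x, y)
# in a Gaussian-smoothed image L, using the pixel differences above.
import numpy as np

def dominant_orientation(L, x, y, radius=8):
    patch = L[y - radius:y + radius + 1, x - radius:x + radius + 1].astype(np.float64)
    dx = patch[1:-1, 2:] - patch[1:-1, :-2]   # L(x+1, y) - L(x-1, y)
    dy = patch[2:, 1:-1] - patch[:-2, 1:-1]   # L(x, y+1) - L(x, y-1)
    magnitude = np.sqrt(dx ** 2 + dy ** 2)
    theta = np.degrees(np.arctan2(dy, dx)) % 360.0
    # 36 bins of 10 degrees each, weighted by the gradient magnitude
    hist, _ = np.histogram(theta, bins=36, range=(0.0, 360.0), weights=magnitude)
    return hist, hist.argmax() * 10           # histogram and its peak orientation
```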

Fig. Orientation histogram for a keypoint

4. Generation of keypoint descriptors

The previous steps found keypoint locations at particular scales and assigned orientations to them, ensuring invariance to image location, scale and rotation. We now want to compute a descriptor vector for each keypoint that is highly distinctive and partially invariant to the remaining variations, such as illumination and 3D viewpoint. This step is performed on the image closest in scale to the keypoint's scale. First, a set of orientation histograms is created on 4x4 pixel neighbourhoods, with 8 bins each. These histograms are computed from the magnitude and orientation values of samples in a 16x16 region around the keypoint, such that each histogram contains the samples from a 4x4 subregion of the original neighbourhood. The magnitudes are further weighted by a Gaussian function with \sigma equal to one half the width of the descriptor window. The descriptor then becomes the vector of all the values of these histograms. Since there are 4x4 = 16 histograms, each with 8 bins, the vector has 128 elements. This vector is finally normalized to unit length in order to enhance invariance to affine changes in illumination.

Indian Sign Language Gesture Recognition

Page 13

Fig. A keypoint descriptor is obtained from the image gradients in an 8x8 neighbourhood
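In practice, the whole pipeline above is available as a library routine. A minimal sketch using OpenCV's SIFT implementation follows; the image path is illustrative, and older OpenCV builds expose SIFT through cv2.xfeatures2d instead.

```python
# Sketch: detect keypoints and compute 128-element SIFT descriptors with OpenCV.
import cv2

gray = cv2.imread("gesture_frame.png", cv2.IMREAD_GRAYSCALE)
sift = cv2.SIFT_create()
keypoints, descriptors = sift.detectAndCompute(gray, None)
print(descriptors.shape)   # (number_of_keypoints, 128): 4x4 histograms x 8 bins
```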

Recognition System:

Recognition Process

Training:
1. Capture an image.
2. Convert it into a grayscale image.
3. Subsample the image (to make the system real-time).
4. Find the orientation histogram.
5. Save it as a training pattern.
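A rough sketch of these five training steps, assuming OpenCV, a camera at index 0, and illustrative choices for the subsampled resolution and the output file name.

```python
# Sketch: capture -> grayscale -> subsample -> orientation histogram -> save.
import cv2
import numpy as np

cap = cv2.VideoCapture(0)                        # 1. capture an image
ok, frame = cap.read()
cap.release()

gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)   # 2. convert to grayscale
small = cv2.resize(gray, (64, 64))               # 3. subsample for speed

dx = cv2.Sobel(small, cv2.CV_64F, 1, 0)          # 4. orientation histogram
dy = cv2.Sobel(small, cv2.CV_64F, 0, 1)
theta = np.degrees(np.arctan2(dy, dx)) % 360.0
hist, _ = np.histogram(theta, bins=36, range=(0.0, 360.0),
                       weights=np.sqrt(dx ** 2 + dy ** 2))

np.save("train_pattern.npy", hist)               # 5. save as a training pattern
```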

Classification:

In machine learning, pattern recognition is the assignment of a label to a given input value. An example of pattern recognition is classification, which attempts to assign each input value to one of a given set of classes. Image classification is a complex process that may be affected by many factors. We intend to use one of the following algorithms for image classification.

1. K-nearest neighbour algorithm (k-NN)

In pattern recognition, the k-nearest neighbour algorithm [6] (k-NN) is a method for classifying objects based on the closest training examples in the feature space. The k-nearest neighbour algorithm is amongst the simplest of all machine learning algorithms: an object is classified by a majority vote of its neighbours, the object being assigned to the class most common amongst its k nearest neighbours. If k = 1, the object is simply assigned to the class of its nearest neighbour.

Algorithm: the training examples are vectors in a multidimensional feature space, each with a class label. Given a testing example, we first calculate the Euclidean distance between each training sample and the testing sample.

Euclidean distance calculation

Let x_i be an input sample with p features (x_{i1}, x_{i2}, ..., x_{ip}), where n is the total number of input samples (i = 1, 2, ..., n) and p is the total number of features. The Euclidean distance between samples x_i and x_j is defined as

d(x_i, x_j) = \sqrt{(x_{i1} - x_{j1})^2 + (x_{i2} - x_{j2})^2 + ... + (x_{ip} - x_{jp})^2}

Finding K-Nearest Neighbours



After calculating the distances between the training samples and the new sample, we select the k minimum values among all the distances d(x_i, x_j) and list the corresponding samples.

Assignment of class: by observing these k samples, we find the class (group) to which most of them belong and assign the testing sample to that class (majority voting). Here we have used the Euclidean distance as the metric; the absolute (Manhattan) distance could also be used.
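A minimal NumPy sketch of this procedure, with a toy two-class data set for illustration.

```python
# Sketch: k-NN with Euclidean distance and majority voting, as described above.
import numpy as np
from collections import Counter

def knn_classify(train_X, train_y, test_x, k=3):
    # d(x_i, x_j) = sqrt(sum_l (x_il - x_jl)^2) for every training sample
    distances = np.sqrt(((train_X - test_x) ** 2).sum(axis=1))
    nearest = np.argsort(distances)[:k]           # the k minimum distances
    votes = Counter(train_y[i] for i in nearest)  # majority vote of neighbours
    return votes.most_common(1)[0][0]

# Toy usage with two classes of 2-feature samples:
X = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]])
y = np.array(["A", "A", "B", "B"])
print(knn_classify(X, y, np.array([0.95, 1.0]), k=3))   # -> "B"
```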

2. Support Vector Machine

The standard SVM takes a set of input data and predicts, for each given input, which of two possible classes the input belongs to, making the SVM a non-probabilistic binary linear classifier [7]. Given a set of training examples, each marked as belonging to one of two categories, an SVM training algorithm builds a model that assigns new examples to one category or the other. More formally, a support vector machine constructs a hyperplane, or set of hyperplanes, in a high- or infinite-dimensional space, which can be used for classification or regression. Intuitively, a good separation is achieved by the hyperplane that has the largest distance to the nearest training data point of any class (the so-called functional margin), since in general the larger the margin, the lower the generalization error of the classifier.

A hyperplane is a concept in geometry: a generalization of the plane to a different number of dimensions. A hyperplane of an n-dimensional space is a flat subset of dimension n - 1; by its nature, it separates the space into two half-spaces. In a vector space, a vector hyperplane is a linear subspace of codimension 1; such a hyperplane is the solution set of a single homogeneous linear equation [8].

Support Vector Machines are based on the concept of decision planes that define decision boundaries. A decision plane is one that separates a set of objects having different class memberships. A schematic example is shown in the illustration below. In this example, the objects belong either to class GREEN or to class RED. The separating line defines a boundary: all objects to the right of it are GREEN, and all objects to the left of it are RED. Any new object (white circle) falling to the right is labelled, i.e. classified, as GREEN (or classified as RED should it fall to the left of the separating line) [9].

Fig. Linear classifier

The above is a classic example of a linear classifier, i.e. a classifier that separates a set of objects into their respective groups (GREEN and RED in this case) with a line. Most classification tasks, however, are not that simple, and often more complex structures are needed in order to make an optimal separation, i.e. to correctly classify new objects (test cases) on the basis of the examples that are available (training cases). This situation is depicted in the illustration below. Compared to the previous schematic, it is clear that a full separation of the GREEN and RED objects would require a curve, which is more complex than a line. Classification tasks based on drawing separating lines to distinguish between objects of different class memberships are known as hyperplane classifiers. Support Vector Machines are particularly suited to handle such tasks.

Fig. Maximum-margin non-linear classifier

SVMs start from the goal of separating the data with a hyperplane and extend this to non-linear decision boundaries using the kernel trick. The equation of a general hyperplane is w*x + b = 0, with x being a point (a vector) and w the weights (also a vector). The hyperplane should separate the data, so that w*x_k + b > 0 for all the x_k of one class and w*x_j + b < 0 for all the x_j of the other class.
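A toy sketch of a linear SVM on two classes; scikit-learn's SVC is built on LIBSVM, the tool named in the problem definition, and the data here are illustrative.

```python
# Sketch: fit a linear SVM and inspect the learned hyperplane w*x + b = 0.
import numpy as np
from sklearn.svm import SVC

X = np.array([[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [1.1, 0.9]])
y = np.array([0, 0, 1, 1])

clf = SVC(kernel="linear")         # learns the separating hyperplane
clf.fit(X, y)

print(clf.coef_, clf.intercept_)   # the learned w and b
print(clf.predict([[0.9, 1.2]]))   # -> [1]
```

Replacing kernel="linear" with, for example, "rbf" yields the non-linear decision boundaries obtained through the kernel trick.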


WORK DONE TILL MID-SEM

ISL acquisition technique: the prime focus of this project is to create a repository with a large number of image sequences for several kinds of ISL class/word. ISL dynamic gestures were first recorded at different frame rates. The ISL video was captured by selecting several dynamic gestures (i.e. sequences of frames) in real time using a handycam.


Obtaining SIFT feature descriptors

We have split each gesture into a sequence of frames and converted each frame into a grayscale image. In order to make the gesture recognition reasonable in an uncontrolled environment, we have implemented the SIFT algorithm to obtain the feature descriptors for each frame.

We obtained the feature descriptors for each frame: 128-element (16 x 8) vectors that describe the location and orientation of each keypoint in that frame.
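A minimal sketch of this step, assuming OpenCV; the video file name is an illustrative placeholder.

```python
# Sketch: split a recorded gesture video into frames and compute the SIFT
# descriptors for each frame.
import cv2

sift = cv2.SIFT_create()
cap = cv2.VideoCapture("isl_gesture.avi")
frame_descriptors = []
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    _, desc = sift.detectAndCompute(gray, None)   # (num_keypoints, 128) or None
    frame_descriptors.append(desc)
cap.release()
```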


PROPOSED WORK TILL END-SEM

Our proposed classification approach involves several distinct tasks in order to reach its target effectively. The goal is approached by choosing appropriate features from the gathered ISL gestures: feature selection and extraction are carried out algorithmically, and classification is then performed with statistical techniques, namely the Euclidean distance metric and the k-nearest neighbour method. The entire process has been segmented into the following sections:
- Feature selection for ISL videos
- Applied algorithm for feature extraction
- Classical mechanism for calculating the Euclidean metric
- Overview of the k-NN method
- Recognition system


REFERENCES

[1] http://www.billbuxton.com/input14.Gesture.pdf
[2] Pragati Garg, Naveen Aggarwal and Sanjeev Sofat, "Vision Based Hand Gesture Recognition", World Academy of Science, Engineering and Technology, Vol. 49, January 2009.
[3] www.aishack.in/2010/05/sift-scale-invariant-feature-transform, as on Jan 15, 2012.
[4] http://en.wikipedia.org/wiki/Scale-invariant_feature_transform, as on Jan 18, 2012.
[5] http://en.wikipedia.org/wiki/Feature_(computer_vision), as on Feb 17, 2012.
[6] http://homepages.inf.ed.ac.uk/rbf/HIPR2/log.htm, as on Jan 24, 2012.
[7] http://www.theopavlidis.com/technology/CBIR/PaperE/AnSIFT1.htm, as on Jan 19, 2012.
[8] http://en.wikipedia.org/wiki/Support_vector_machine, as on Jan 26, 2012.
[9] http://www.statsoft.com/textbook/support-vector-machines, as on Feb 23, 2012.


SUGGESTIONS BY THE BOARD

