
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

AMITY SCHOOL OF ENGINEERING AND TECHNOLOGY

AMITY UNIVERSITY UTTAR PRADESH

Term Paper
on

Vision Based Gesture Recognition Using Neural Networks

Guided by: Submitted by:

Rohan Prabhakar

A2305218186

Submitted to:

Amity University Uttar Pradesh


DECLARATION
I, Rohan Prabhakar, student of B.Tech (2-C.S.E.-3(Y)), hereby declare that
the project titled “Vision Based Gesture Recognition Using Neural
Networks”, which is submitted by me to the Department of Computer Science
and Engineering, Amity School of Engineering and Technology, Amity
University Uttar Pradesh, Noida, in partial fulfilment of the requirements
for the award of the degree of Bachelor of Technology in Computer Science
and Engineering, has not previously formed the basis for the award of any
degree, diploma or other similar title or recognition.

The Author attests that permission has been obtained for the use of any
copyrighted material appearing in the Project report other than brief
excerpts requiring only proper acknowledgement in scholarly writing and all
such use is acknowledged.

Date: __________________
Rohan Prabhakar
A2305219999
CSE- (2017-21)

CERTIFICATE

This is to certify that Rohan Prabhakar, student of B.Tech in Computer
Science and Engineering, has carried out the work presented in the Term
Paper entitled “Vision Based Gesture Recognition Using Neural Networks”
as a part of the First year program of Bachelor of Technology in
Computer Science and Engineering from Amity University, Uttar Pradesh,
Noida under my supervision.

_________________________

Dr Akash Punhani
Department of Computer Science and Engineering
ASET, Noida

ACKNOWLEDGEMENT

The satisfaction that accompanies the successful completion of any task
would be incomplete without the mention of the people whose ceaseless
cooperation made it possible, and whose constant guidance and
encouragement crown all efforts with success. I would like to thank Prof
(Dr) , Head of Department-CSE, and Amity University for giving me the
opportunity to undertake this project. I would like to thank my faculty guide
Dr Akash Punhani who is the biggest driving force behind my successful
completion of the project. She has always been there to solve any query of
mine and also guided me in the right direction regarding the project.
Without her help and inspiration, I would not have been able to complete
the project. I would also like to thank my batch mates who guided me,
helped me and gave ideas and motivation at each step.

Dr Akash Punhani

ABSTRACT

This report discusses various aspects of the use of neural networks in
gesture recognition for better communication between computers and the
user. The aim of a gesture recognition system is to identify gestures
easily and use them to control devices or to convey information. In this
paper we discuss research done in the field of gesture recognition based
on Artificial Neural Networks. Several gesture recognition methods are
presented, together with the advantages and drawbacks of each. The
specific environment, implementation tools and other requirements for
the methods are also discussed.

INTRODUCTION

The world is witnessing rapid development of information technology, and
with it the expectation of widespread computer use has risen. These
developments will enter our everyday environments, which need simple,
natural and easy-to-use interfaces for human-computer interaction,
referred to in short as HCI. Computer user interfaces have evolved from
primitive text user interfaces to graphical user interfaces (GUIs), which
are still limited to keyboard and mouse input. However, these GUIs are
unnatural, inconvenient, and not suitable for working in virtual
environments. Hand gestures offer an efficient alternative to such onerous
interface devices for human-computer interaction.

Feelings and thoughts can be expressed by gestures, but the utility of
gestures goes beyond this point: hostility and enmity can be expressed
during speech, and emotion and approval are also expressed by gestures.

A good understanding of the structure of the human hand is required to
specify postures and gestures for the development of such user
interfaces. A hand posture is considered to be a static hand pose. A hand
gesture, on the other hand, is comprised of a sequence of static postures
that form one single gesture displayed within a specific time period.
Some gestures have both static and dynamic characteristics, as observed
in sign languages.

Gestures can be defined as meaningful physical movements of the hands,
fingers, arms or other parts of the body with the aim of conveying
information or meaning for interaction with the environment.
Identification of hand gestures requires a good interpretation of the
hand movement as meaningful commands. For a human-computer interaction
(HCI) interpretation system there are two common approaches.

APPROACH AND METHODS

1) Data Gloves Approaches: These methods employ optical or mechanical
sensors attached to a glove that convert finger flexions into electrical
signals to determine the hand gesture. The collected data, covering one
or more of the glove's degrees of freedom, describe the orientation and
position of the hand. However, the glove must be worn at all times in this
method, and a wearisome device with lots of cables connected to the
computer hampers the ease and naturalness of the user-computer
interaction.

Fig 1: Data Gloves


Source: http://blog.leapmotion.com

2) Vision Based Approaches: These techniques are based on the way a
person perceives information about the environment. They are usually
performed by using cameras to capture the input images. In order to
create the database for the gesture system, the gestures should be
selected with their relevant meaning. The database may contain multiple
images of the same symbol or gesture at different angles and orientations
to improve the accuracy of the system. In this term paper we have used
vision based approaches, and some research that used glove based
approaches is discussed to compare the advantages and disadvantages of
the two approaches.

Fig 2: Vision based
Source: shutterstock.com

Vision based hand gesture recognition approaches can be categorised into
model based approaches and appearance based approaches:

a) Appearance Based Approaches: These approaches use features extracted
from the visual appearance of the input image to model the hand. The
model is then compared with the features extracted from the camera or
video input.

b) 3-D Model Based Approaches: Model based approaches depend on the
kinematic degrees of freedom (DOFs) of the hand. The pose of the palm and
the joint angles estimated from the input image are the parameters used
to form a 2-D projection of the 3-D model.

IMAGE EXTRACTION AND IDENTIFICATION

Prior to the recognition phase, tracking and segmentation are essential
in order to extract useful information from raw images. It is therefore
necessary to distinguish and separate the background and foreground in a
given image. The objects of interest that have to be tracked form the
foreground of the image, while the non-relevant pixels form the
background and are discarded. Many different techniques can be used to
differentiate the foreground and background of an image: binary
thresholding, image differencing, connected components labelling, etc.
Image differencing uses a simple idea: the foreground is modelled by
taking the absolute difference between the current frame and the
background image. It then filters out the pixels with a small difference
using a threshold function, leaving the foreground pixels extracted.

The equation is:

D_{i,j} = |I_{i,j} - R_{i,j}|

where
D_{i,j} = the difference image at pixel (i,j)
I_{i,j} = the input image at pixel (i,j)
R_{i,j} = the background image at pixel (i,j)
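
The differencing step can be illustrated with a minimal sketch. It assumes grayscale images stored as nested Python lists; the function name `difference_image` and the pixel values are hypothetical and chosen only for illustration.

```python
def difference_image(frame, background):
    """Return D[i][j] = |I[i][j] - R[i][j]| for two equally sized grayscale images."""
    if len(frame) != len(background):
        raise ValueError("images must have the same number of rows")
    diff = []
    for row_i, row_r in zip(frame, background):
        if len(row_i) != len(row_r):
            raise ValueError("rows must have the same length")
        diff.append([abs(i - r) for i, r in zip(row_i, row_r)])
    return diff


# Hypothetical 3x3 example: a few bright "hand" pixels on a dark background.
background = [[10, 10, 10], [10, 10, 10], [10, 10, 10]]
frame = [[12, 10, 9], [10, 200, 180], [11, 190, 10]]
difference = difference_image(frame, background)
```

Pixels that barely change keep a small difference value, while pixels covered by the hand stand out with large values.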

The images may also differ in brightness. By applying a small threshold
value, say T_foreground, each pixel whose D_{i,j} exceeds the threshold is
labelled as foreground. This equation works only when the background is
defined prior to the calculation. For an adaptive algorithm, the
background can be estimated in real time using:

B_{t+1} = α I_t + (1 - α) B_t

where
t = current time
I_t = input image at time t
B_t = background image at time t
B_{t+1} = updated background image used for the next frame
α = weighting factor
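
As a rough sketch of the thresholding and adaptive background steps, the code below labels foreground pixels and then blends the current frame into the background as a running average. The threshold of 25 and the weighting factor of 0.5 are hypothetical values picked for readability, not values taken from the text; in practice a much smaller weighting factor is common so that the background adapts slowly.

```python
def foreground_mask(frame, background, threshold):
    """Label a pixel as foreground (1) when |I - B| exceeds the threshold."""
    return [
        [1 if abs(i - b) > threshold else 0 for i, b in zip(row_i, row_b)]
        for row_i, row_b in zip(frame, background)
    ]


def update_background(frame, background, alpha):
    """Running average: next background = alpha * I_t + (1 - alpha) * B_t."""
    return [
        [alpha * i + (1 - alpha) * b for i, b in zip(row_i, row_b)]
        for row_i, row_b in zip(frame, background)
    ]


# Hypothetical 2x2 frame and background.
background = [[10.0, 10.0], [10.0, 10.0]]
frame = [[12.0, 200.0], [10.0, 190.0]]

mask = foreground_mask(frame, background, threshold=25)
background = update_background(frame, background, alpha=0.5)
```

The mask is computed before the background is updated, so the current frame is always compared against the background estimated from earlier frames.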

The image is now divided into foreground and background, with brightness
differences taken into account. The other two functions, segmentation and
the skin colour filter, are finally applied to the image to extract a
meaningful symbol or gesture for the computer to understand.

The following sections explain how segmentation and skin filters work.

SKIN COLOUR FILTER

Human skin colour is produced by two hues, red from blood and yellow from
melanin, and has moderate saturation. These skin properties are essential
information that can be used in a hand tracking algorithm. The skin
filter is modelled as follows: three-channel pixels (RGB values) are
first transformed into log-opponent values.
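
The transformation itself is not written out in the text. The description of the constants 105 and 1, the noise term n, and the symbols I, Rg and By matches the widely cited log-opponent formulation for skin filtering, so the equations below are given under that assumption rather than as the author's own statement:

```latex
\begin{aligned}
L(x) &= 105 \, \log_{10}(x + 1 + n) \\
I    &= L(G) \\
R_g  &= L(R) - L(G) \\
B_y  &= L(B) - \frac{L(G) + L(R)}{2}
\end{aligned}
```

Here R, G and B are the channel values of one pixel, I is the intensity, and R_g and B_y are the red-green and blue-yellow opponent values.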

The green channel is used to represent intensity because the red and blue
channels have poor spatial resolution. The constant 105 simply scales the
output of the log function into the range [0,254]. n is a random noise
value, generated from a distribution uniform over the range [0,1). The
random noise is added to prevent banding artifacts in dark areas of the
image. The constant 1 added before the log transformation prevents
excessive inflation of colour distinctions in very dark regions. The log
transformation makes the Rg and By values, as well as differences
between I values (e.g. texture amplitude), independent of illumination
level. The hue at a pixel is defined to be atan2(Rg, By), where Rg and By
are the smoothed log-opponent values.

The saturation at a pixel is defined as

Saturation = sqrt(Rg^2 + By^2)

Because this definition ignores intensity, yellow and brown regions
cannot be distinguished, and both are treated as yellow.
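
The hue and saturation computation can be sketched for a single pixel as follows. This assumes the commonly cited log-opponent definitions (Rg as the difference of the log-scaled red and green channels, By as the log-scaled blue channel minus the mean of the green and red ones), expresses hue in degrees, and omits the smoothing step; the function names are hypothetical.

```python
import math
import random


def log_scale(x, noise):
    """L(x) = 105 * log10(x + 1 + n), mapping 0..255 into roughly [0, 254]."""
    return 105.0 * math.log10(x + 1.0 + noise)


def hue_saturation(r, g, b, rng=None):
    """Return (hue in degrees, saturation) of one RGB pixel in log-opponent space.

    rng is an optional random.Random used to draw the noise n from [0, 1);
    when it is None the noise is set to 0 so the result is deterministic.
    """
    def noise():
        return rng.random() if rng is not None else 0.0

    l_r = log_scale(r, noise())
    l_g = log_scale(g, noise())
    l_b = log_scale(b, noise())
    rg = l_r - l_g                  # red-green opponent value
    by = l_b - (l_g + l_r) / 2.0    # blue-yellow opponent value
    hue = math.degrees(math.atan2(rg, by))
    saturation = math.sqrt(rg ** 2 + by ** 2)
    return hue, saturation


# A neutral grey pixel and a hypothetical skin-toned pixel, both without noise.
grey_hue, grey_sat = hue_saturation(100, 100, 100)
skin_hue, skin_sat = hue_saturation(200, 150, 120)

# The same pixel with noise drawn from a seeded generator.
noisy_hue, noisy_sat = hue_saturation(200, 150, 120, rng=random.Random(0))
```

A grey pixel has equal channels, so both opponent values vanish and its saturation is zero, whereas the reddish pixel has a clearly non-zero saturation.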

Once the hue and saturation are calculated, the skin regions can be
marked as those pixels that satisfy either of the following properties:

(a) Hue is between 110 and 150 and saturation is between 20 and 60.

(b) Hue is between 130 and 170 and saturation is between 30 and 130.
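
The two rules translate directly into a per-pixel test. The sketch below assumes the upper hue bound in rule (a) is 150, that all bounds are inclusive, and that hue and saturation have already been computed for every pixel; the names `is_skin` and `skin_mask` are hypothetical.

```python
def is_skin(hue, saturation):
    """Apply the two hue/saturation ranges to one pixel."""
    rule_a = 110 <= hue <= 150 and 20 <= saturation <= 60
    rule_b = 130 <= hue <= 170 and 30 <= saturation <= 130
    return rule_a or rule_b


def skin_mask(hue_sat_image):
    """Turn a 2-D grid of (hue, saturation) pairs into a 0/1 skin mask."""
    return [[1 if is_skin(h, s) else 0 for h, s in row] for row in hue_sat_image]


# Hypothetical (hue, saturation) values for a 2x2 image.
pixels = [[(142.0, 21.0), (0.0, 0.0)], [(165.0, 90.0), (165.0, 20.0)]]
mask = skin_mask(pixels)
```

The last pixel shows why both rules are needed together: its hue falls only in range (b), but its saturation is too low for that range, so it is rejected.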

Fig 3: Foregrounding using skin filter


Source: semanticscholar.com

SEGMENTATION

The shape extracted by the background subtraction algorithm may not be
exactly the one desired. Many undesired components are captured as
foreground because their characteristics are similar to those of the
target, as depicted in the following figures.

In order to resolve this problem, the foreground is partitioned into
different regions using a connected components algorithm. After several
regions are identified, the largest object in the image is taken as the
main piece to be further processed.
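
A minimal sketch of this step is shown below: it flood-fills each foreground region of a binary mask and keeps the largest one. It assumes 4-connectivity and a mask stored as nested lists of 0 and 1; both choices, and the name `largest_component`, are assumptions made for illustration.

```python
from collections import deque


def largest_component(mask):
    """Return the set of (row, col) pixels in the largest 4-connected foreground region."""
    rows = len(mask)
    cols = len(mask[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    best = set()

    for r in range(rows):
        for c in range(cols):
            if mask[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill collects one connected region.
            region = set()
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                region.add((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and mask[ny][nx] == 1 and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(region) > len(best):
                best = region
    return best


# Hypothetical mask: a 1-pixel noise blob (top left) and a 4-pixel "hand" region.
mask = [
    [1, 0, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
hand = largest_component(mask)
```

The isolated pixel in the corner is discarded because its region is smaller than the 2x2 block.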

This algorithm fails if one of the non-relevant objects has a larger
area than the actual main object. Besides, the region captured as
foreground might not be easily partitioned into disjoint regions. A way
to solve this problem is to combine several image segmentation
algorithms, which together tackle the problem of similar
characteristics. Applying an edge detection algorithm to the image
defines the outline of every object in the captured scene.

To obtain more information about the image, a skin colour detection
model can be applied. This model identifies the regions that contain
human skin colours. However, the output may come out as a merged region
with an undefined borderline.

To combine both kinds of information about the object borders, an image
operator can be applied: Absolute Difference. Taking the absolute
difference of the edge image and the skin colour image should ideally
produce an output in which a clear cut exists between the two detected
objects that contain human skin colour information.

By applying the region identification algorithm using connected
component labelling to the resulting image, it is possible to identify
and label each of the regions in the image.

ARTIFICIAL NEURAL NETWORK (ANN)

Over the years, developments in computing have given rise to new
technologies. Artificial Neural Networks are one of the technologies
that have solved a broad range of problems in an easy and convenient
manner.

The manner in which artificial neural networks work is similar to the
human nervous system, which is where the name neural network comes from.
An artificial neural network can be defined as a massively parallel
distributed processor made up of simple processing units, which has a
natural tendency for storing experiential knowledge and making it
available for use.

The artificial neuron (named perceptron) takes numerical input values,
multiplies each by a weight and adds a bias. The perceptron fires its
output only when the total input signal exceeds a specific threshold
value. The activation function controls the magnitude of the output, and
the output is then fed to other perceptrons in the network.
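
The behaviour of a single perceptron can be sketched in a few lines. The example assumes a simple step activation with a threshold of 0; the weights and bias are hypothetical values chosen so that the unit behaves like a logical AND of two binary inputs.

```python
def perceptron(inputs, weights, bias, threshold=0.0):
    """Fire (return 1) only when the weighted input sum plus bias exceeds the threshold."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1 if total > threshold else 0


# Hypothetical weights and bias for a two-input unit.
weights, bias = [1.0, 1.0], -1.5
outputs = [perceptron([a, b], weights, bias) for a in (0, 1) for b in (0, 1)]
```

Only the input pair (1, 1) produces a total above the threshold, so only that case fires.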

Fig: Representation of a simple artificial neuron

NEURAL NETWORKS CLASSIFICATION

(1) FEED FORWARD NETWORKS

Feed forward networks are the simplest type of neural network. As the
name ‘forward’ suggests, the information moves in one direction only,
from the input nodes through the hidden nodes to the output nodes, with
no cycles.
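
A forward pass through such a network can be sketched as two fully connected layers applied one after the other. The sigmoid activation, the layer sizes and the weight values below are assumptions made for illustration only.

```python
import math


def sigmoid(z):
    """Squash any real value into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def layer(inputs, weights, biases):
    """One fully connected layer: each row of weights feeds one output node."""
    return [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def feed_forward(inputs, hidden_w, hidden_b, output_w, output_b):
    """Input -> hidden -> output, with no connection looping back."""
    hidden = layer(inputs, hidden_w, hidden_b)
    return layer(hidden, output_w, output_b)


# Hypothetical network with 2 inputs, 2 hidden nodes and 1 output node.
hidden_w = [[0.5, -0.4], [0.9, 0.1]]
hidden_b = [0.0, -0.2]
output_w = [[1.2, -0.7]]
output_b = [0.1]
result = feed_forward([0.3, 0.7], hidden_w, hidden_b, output_w, output_b)
```

Each layer only reads the values produced by the layer before it, which is what makes the data flow strictly one-directional.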

(2) RECURRENT NEURAL NETWORK

Recurrent neural networks are models with bi-directional data flow,
which allows connection loops between perceptrons.
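
One common way to realise such a loop is to feed the previous hidden state back into the hidden layer at the next time step. The sketch below assumes this simple (Elman-style) arrangement with a tanh activation; the weights and the input sequence are hypothetical.

```python
import math


def recurrent_step(inputs, context, w_in, w_context, biases):
    """One time step of a simple recurrent layer.

    The new hidden state depends on the current inputs and on the previous
    hidden state (the context), which is the loop a feed forward network lacks.
    """
    hidden = []
    for row_in, row_ctx, b in zip(w_in, w_context, biases):
        total = b
        total += sum(w * x for w, x in zip(row_in, inputs))
        total += sum(w * h for w, h in zip(row_ctx, context))
        hidden.append(math.tanh(total))
    return hidden


# Hypothetical layer with 1 input and 2 hidden nodes processing a short sequence.
w_in = [[1.0], [0.5]]
w_context = [[0.0, 0.5], [0.5, 0.0]]
biases = [0.0, 0.0]

state = [0.0, 0.0]
states = []
for sample in ([1.0], [0.0], [0.0]):
    state = recurrent_step(sample, state, w_in, w_context, biases)
    states.append(state)
```

After the first sample the input is zero, yet the hidden state stays non-zero because it is carried forward through the context connections.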

GESTURE RECOGNITION USING NEURAL NETWORKS

Because an Artificial Neural Network (ANN) consists of many
interconnected processing elements, it can be constructed for problems
such as identification and control, game-playing and decision making,
pattern recognition, medical diagnosis, financial applications, and data
mining. An ANN also has the ability to self-organize adaptively. Various
approaches have been utilized to deal with the gesture recognition
problem, ranging from soft computing approaches to statistical models
based on the Hidden Markov Model (HMM) [21,22] and the Finite State
Machine (FSM). Soft computing tools generally include ANNs, fuzzy logic
sets and Genetic Algorithms. In this paper we focus on the connectionist
approach.

One such system used an Elman Recurrent Neural Network for gesture
recognition that could recognize 10 words. The data items were taken
from a data glove and normalized. The features extracted were 16 data
items: 10 for bending, 3 for angles in the coordinates, and 3 for
positional data. The network consists of three layers: an input layer
with 16 nodes, a hidden layer with 150 nodes, and an output layer with 10
nodes, which correspond to the 10 recognized words. Some improvements
were added to the system. First, the positional data extracted from the
data glove was augmented using a pre-wiring network, and two kinds of
positional data were used. Secondly, the data space was filtered: data
from three different time points were given to the input layer, and
these data are shifted for the next sample. With these two changes the
input layer has 93 nodes instead of 16 nodes.
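
The idea of presenting three time points at once and shifting them for the next sample can be sketched as a sliding window over the glove data. The figure of 31 features per time point is inferred here from 93 / 3 and is not stated in the text; the function name and the sample values are hypothetical.

```python
def sliding_windows(samples, window=3):
    """Concatenate `window` consecutive feature vectors, shifting by one sample each time."""
    return [
        [value for sample in samples[start:start + window] for value in sample]
        for start in range(len(samples) - window + 1)
    ]


# Hypothetical stream of 5 glove samples, each with 31 features
# (31 is inferred from 93 / 3; the text does not state it directly).
FEATURES_PER_SAMPLE = 31
stream = [[float(t)] * FEATURES_PER_SAMPLE for t in range(5)]
network_inputs = sliding_windows(stream)
```

Each entry of `network_inputs` is one 93-value vector for the input layer, and consecutive entries overlap in two of their three time points.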

APPLICATIONS

(1) HANDS FREE CAR DRIVING

The diversion of a driver’s attention from driving can be catastrophic.
Given that conventional button- and touch-based interfaces may distract
the driver, developing novel distraction-free interfaces for the various
devices present in cars has become necessary. Hand gesture recognition
may provide an alternative interface inside cars. Given that cars are
the targeted application area, the radar sensor can be placed at an
optimal location so that the signal reflected from the driver’s hand
during gesturing is unaffected by interference from the motion of the
driver’s body or other motions within the car.

(2) VIDEO GAMES

Gesture based gaming is easy and fun. As no equipment is required to
operate the system described above, the money spent on console
controllers, keyboards and mice can be saved. Furthermore, making a
gesture directly is expected to be faster than clicking a mouse or
pressing a key on a keyboard. This means quicker reflex actions and
therefore a better gaming experience.

(3) CONTROLLING HOME APPLIANCES AND LIGHTING

With most home appliances now becoming more and more Wi-Fi oriented, it
will become easier in the future to implement gesture recognition to
control such appliances.

Philips Hue, an app-controlled connected lighting series by Philips,
also suggests that gesture-controlled lighting may be possible in the
near future.

DISCUSSION AND CONCLUSION

In this paper we have presented an overview of hand gesture recognition
and Neural Network approaches. Artificial Neural Networks are one of the
most effective soft computing techniques and have many applications to
the hand gesture recognition problem. Research that handles the hand
gesture recognition problem using different neural network systems is
discussed in detail, showing the advantages and disadvantages of each
system. The input for all the selected methods was either a digitized
camera image or a data glove system. Some preprocessing was then applied
to the input image, such as normalization, an edge detection filter, or
thresholding, which is necessary for segmenting the hand gesture from
the background.

A comparison was made between these methods. As seen, different Neural
Network systems are used in different stages of recognition systems
according to the nature of the problem, its complexity, and the
environment available.

Feature extraction must then be performed. The methods presented in this
paper use either geometric features, such as angles, orientations and
the palm centre, or non-geometric features, such as colour, silhouette
and texture, but the latter are inadequate for recognition. A Neural
Network system can then be applied to the features extracted from the
input gesture image after segmentation, for example to extract the shape
of the hand.
