
Implementation of Hand Gesture Recognition System to Aid Deaf-dumb People

Miss. Supriya Ghule1 and Mrs. Mrunalini Chavaan2


1 M-tech Student, Department of Electrical Engineering, MITAOE, India
2 Assistant Professor, Department of Electrical Engineering, MITAOE, India
1 ssghule@mitaoe.ac.in and 2 mhchavaan@etx.maepune.ac.in

Abstract. In recent years, there has been a rapid increase in the number of people affected by deafness and muteness due to birth defects and other causes. Since a deaf and mute person cannot talk with an ordinary person, they have to rely on some form of communication system. A gesture is a physical movement of the body that conveys a message, and gesture recognition is the mathematical interpretation of a person's motion by an information processing system. Sign language provides the most effective communication platform for a deaf and mute person to speak with an ordinary person. The objective of this paper is to develop a real-time hand gesture recognition system that recognizes hand gestures and then converts them into voice and text. The system is implemented on a Raspberry Pi with a camera module and programmed in the Python programming language supported by the Open Source Computer Vision (OpenCV) library. It also contains a 5-inch 800x480 resistive HDMI touch screen display for I/O, and a 5-megapixel Pi camera captures the images of the user's hand. In this paper, efforts have been made to detect 8 different gestures, each assigned a unique sound and text output. In the experimental results, 800 samples were considered, of which 760 were detected correctly and 40 were detected wrongly. Hence the proposed system gives an accuracy of 95%.

Keywords: Raspberry Pi, Python, Feature Extraction, Contours, OpenCV

1 INTRODUCTION

In our daily routine, we communicate with one another using speech. Gestures are a more natural and preferred way for humans to interact with computers, so they build a bridge between humans and machines. For many deaf and mute people, sign language is their primary language, creating a strong sense of social and cultural identity. The proposed system is based on a vision-based hand recognition approach, which is more natural and does not need any extra information to identify a particular gesture. The hand gestures must be recognized under varying illumination conditions. Many feature extraction methods and classification techniques are available, and deciding which of them to use is a difficult task. The proposed method performs background segmentation of the hand from the input and then assigns each gesture to a different sentence. It involves feature extraction methods such as angle calculation of hand gestures; finally, the gestures are detected and converted into text and voice. The proposed system is based on a Raspberry Pi with a camera module and programmed in the Python programming language supported by the Open Source Computer Vision (OpenCV) library. It also contains a 5-inch 800x480 resistive HDMI touch screen display for I/O, and a 5-megapixel Pi camera captures the images of the user's hand. In this paper, efforts have been made to detect 8 different gestures, each assigned a unique sound and text output. In the experimental results, 800 samples (100 samples for each gesture) were considered, of which 760 were detected correctly and 40 were detected wrongly.

2 LITERATURE SURVEY:

Rathi et al. [1] proposed a framework for recognizing dynamic hand gestures of Indian Sign Language and converting the recognized signs into text and voice, and vice versa. Eigenvectors and eigenvalues were used for feature extraction, and an eigenvalue-weighted Euclidean distance classifier was employed. Geethu and Anu [2] used an ARM Cortex-A8 processor board; a Haar classifier is used for image classification, while a 1-D HMM is used for speech conversion. Sign recognition has gained importance in several areas such as human-computer interaction (HCI), robotic control, and home automation. Quiapo et al. [3] were able to fulfill the requirements of a sign language translator; the project improved the range of the flex-detection components and added new sensor states for additional filtering, and the processing GUI provided the greater part of the functions required in the two-way translation process. Sayan Tapadar et al. [5] trained a model on acquired features that are nearly unique to different hand gestures, using distinctive feature extraction; in this way sign language can be recognized, helping disabled people integrate socially. Hamid A. Jalab and Herman K. Omer [6] presented a hand gesture interface for controlling a media player using a neural network. The proposed method recognizes a set of four specific hand gestures, namely Play, Stop, Forward, and Reverse, and is based on four stages: image acquisition, hand segmentation, feature extraction, and classification. Geethu G Nath and Arun C S [7] implemented a sign language recognition framework for deaf people on an ARM Cortex-A8 processor board using a convex-concave hull algorithm and template matching; the framework is used to control devices such as robots, car audio systems, and home appliances. Shweta et al. [8] developed a real-time hand gesture recognition system that recognizes hand gestures and hand features such as peak calculation and angle calculation, and converts gesture images into voice and vice versa using image processing. Ali A. Abed and Sarah A. Rahman [9] built and tested a mobile robot to demonstrate the effectiveness of their approach; the robot moves and navigates in different directions: Forward, Backward, Right, Left, and Stop. The recognition rate of the robotic system reached about 98% using a Raspberry Pi with a camera module programmed in Python. Muhammad Yaqoob Javed et al. [10] presented the Digital Dactylology Converser (DOC), a device that converts sign language into voice signals and text messages; the device works well and translates alphabet letters into text and sound. Anup Nandy et al. [11] reported efficient recognition accuracy for a limited set of dynamic ISL gestures, including strong results for Euclidean distance and K-nearest neighbor metrics.

3 PROPOSED ARCHITECTURE:

3.1 SYSTEM ARCHITECTURE:

i. Frame capture

The input is a frame or a sequence of video frames captured by a Raspberry Pi camera module pointed at the user's hand. A 5 MP camera module, capable of 1080p video and still images as well as 720p60 and 640x480p60/90, captures the frame. The image is captured with a stable background and lighting. The Region of Interest (ROI) is the hand region, so the images of the hand are captured and converted to binary scale in order to find the ROI [9].
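
As an illustration, a minimal capture-and-crop sketch is given below; the resolution and the ROI coordinates are illustrative assumptions rather than the paper's exact setup:

import cv2
import picamera
import picamera.array

with picamera.PiCamera() as camera:
    camera.resolution = (640, 480)
    with picamera.array.PiRGBArray(camera) as output:
        # capture straight into a numpy array in BGR order for OpenCV
        camera.capture(output, format='bgr')
        frame = output.array

roi = frame[100:300, 100:300]                 # hypothetical hand region
gray = cv2.cvtColor(roi, cv2.COLOR_BGR2GRAY)  # grayscale before binarization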

ii. Blur Image:

In image processing, a Gaussian blur is the result of blurring a picture with a Gaussian function. It is a widely used effect in graphics software, usually to reduce image noise and detail. Blurring the frame is an important step for image enhancement and for obtaining good results: it smooths the picture and removes noise and fine detail. An image can be filtered by a low-pass filter (LPF) or a high-pass filter (HPF); an LPF helps in removing noise and blurring the picture, whereas an HPF helps in finding edges. Mathematically, applying a Gaussian blur to a picture is the same as convolving the picture with a Gaussian function (Equation 1).

Gaussian blur formula (Equation 1):

G(x) = (1 / (√(2π) σ)) e^(−x² / (2σ²)),    G(x, y) = (1 / (2πσ²)) e^(−(x² + y²) / (2σ²))
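
A one-line OpenCV sketch of this step, reusing the 5x5 kernel and sigma from the code in Section 4.1 (mask is assumed to be the skin-color mask produced there):

import cv2

# low-pass filter: convolve the mask with a 5x5 Gaussian kernel
blurred = cv2.GaussianBlur(mask, (5, 5), 100)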

iii. Frame Segmentation:



Frame segmentation is the first step of any frame recognition method. The main purpose of hand segmentation is to distinguish the hand area from the background within the picture. To achieve this, different image segmentation algorithms are used, such as the thresholding method; the result is shown in (fig. 1).

Fig. 1 Thresholding Process
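
A minimal thresholding sketch is shown below; Otsu's automatically chosen threshold is an assumption here, since the paper does not state its threshold value (blurred is the smoothed grayscale image from the previous step):

import cv2

# pixels brighter than the Otsu threshold become white (hand),
# the rest become black (background)
_, binary = cv2.threshold(blurred, 0, 255,
                          cv2.THRESH_BINARY + cv2.THRESH_OTSU)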

iv. Find Contours and Convex Hull:

A contour is a curve joining all the continuous points along a boundary. It is a useful tool for shape analysis, detection, and recognition. In OpenCV, finding contours is like finding a white object on a black background. The green line around the hand (fig. 2) is called the convex hull; it is the convex set enclosing the hand region and is used to locate the fingertips. A convex hull can look similar to a contour approximation, but it is not the same (although both can give the same result in some cases). It checks a curve for convexity defects and corrects it.

Fig. 2 Convex Hull and Defects
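
A short sketch of this step, using the same OpenCV calls that appear later in Section 4.1 (binary is the segmented image from the previous step; the two-value return is the OpenCV 4.x signature):

import cv2

contours, hierarchy = cv2.findContours(binary, cv2.RETR_TREE,
                                       cv2.CHAIN_APPROX_SIMPLE)
cnt = max(contours, key=cv2.contourArea)  # largest contour = the hand
hull = cv2.convexHull(cnt)                # the green outline in fig. 2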


v. Find Convexity Defects and Area Ratio:

Any deviation of the object from this hull can be considered a convexity defect, denoted by the blue dots in (fig. 2). A defect has three points: a start point, a far point (the defect point) between the two fingers, and an end point. If the angle between two fingers is greater than 30 degrees and less than 90 degrees, the cavity formed between them is termed a defect. A defect is therefore most likely of triangular shape, with three corner points: the start point, the far point, and the end point.

(Sketch: the defect triangle; side a joins the start and end points, side b joins the start and far points, and side c joins the end and far points, with the far point lying between the two fingers.)
Formulas for a, b, c, the triangle area, the angle, and the accuracy:

a = √((end(0) − start(0))² + (end(1) − start(1))²)

b = √((far(0) − start(0))² + (far(1) − start(1))²)

c = √((end(0) − far(0))² + (end(1) − far(1))²)

s = (a + b + c) / 2

The area of the defect triangle is given by Heron's formula (the area ratio used for gesture classification is computed separately from the hull and contour areas, as shown in Section 4.1):

ar = √(s (s − a)(s − b)(s − c))

The angle at the far point, in degrees, is:

Angle = acos((b² + c² − a²) / (2bc)) × 57

Recognition accuracy is computed as:

Accuracy = (No. of correctly detected gestures / Total no. of tested gestures) × 100
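
As a worked example with hypothetical points start = (50, 100), end = (150, 100), and far = (100, 30): a = 100 and b = c = √(50² + 70²) ≈ 86.02, so Angle = acos((7400 + 7400 − 10000) / (2 × 7400)) × 57 ≈ 70.7. Since 70.7° lies between 30° and 90°, the cavity counts as a defect. (The factor 57 ≈ 180/π converts radians to degrees.)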

3.2 METHODOLOGY
The camera coupled with the Raspberry Pi first captures the image of the hand to be processed and identified. The input image must be converted to a specific format before the gesture is processed and identified. After processing, the gesture is identified and the corresponding text is generated. This text is for ordinary people to read, and a text-to-speech message is available for those who cannot see it.

Camera → Input Frame → RGB to Binary → Identify Colour → Identify Gesture → Generate Text → Speaker

Fig. 3 Block Diagram of Working Module

A. Image Capture:
The input is a picture or a series of pictures taken by a single camera directed at the user's hand. The gesture pictures are real images of various sizes taken with the camera.

B. RGB to Binary:

Two skin-color bounds are defined in HSV, a lower bound and an upper bound; the input image captured by the camera is masked with these bounds and then converted to a grayscale image.

C. Identify Color:
In this step, the skin color is extracted from the object frame. Next, the image is cropped to remove the unwanted parts of the original picture. Finally, we get clear result images with uniform size and a consistent background.

D. Identify Gesture:
Identify the convex hull and the contour within the green box around the hand. Inside this region, the gap between two fingers is called a defect. If the angle between two fingers is less than 90 degrees and greater than 30 degrees, it is counted as a defect; these defects are marked with blue dots. Based on the hand area and the number of defects, we can identify the gesture, as sketched below.
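
As an illustration, a minimal lookup consistent with Table 2 is given below; where two gestures share a defect count and area-ratio band (e.g. HELLO vs. ALL THE BEST), the paper does not spell out the tie-breaking feature, so this mapping is an assumption:

def identify_gesture(num_defects, area_ratio):
    # thresholds taken from Table 2; tie-breaks are illustrative
    if num_defects == 0:
        return "NO" if area_ratio < 12 else "HELLO"  # ALL THE BEST shares this band
    if num_defects == 1:
        return "PLEASE HELP"
    if num_defects == 2:
        return "I AM THIRSTY"                        # OK shares this band
    if num_defects == 3:
        return "YES"
    if num_defects == 4:
        return "THANK YOU"
    return None                                      # no matching gesture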

E. Generate text and text to speech:


The text corresponding to the gesture is displayed on the LCD screen, and the same text is then converted to sound at the output; a minimal sketch follows.
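
The paper does not name its text-to-speech engine, so the use of the espeak command (commonly available on Raspberry Pi) in this sketch is an assumption:

import os

def speak(text):
    # shell out to the espeak TTS engine (assumed; any TTS tool would do)
    os.system('espeak "{}"'.format(text))

speak("HELLO")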

4 IMPLEMENTATION AND RESULT:

4.1 ALGORITHM:

 Import the necessary packages required by this algorithm:
from tkinter import *
from PIL import Image
from PIL import ImageTk
import cv2
import time
import os
import picamera
import numpy as np
import math

 Define frames for the tkinter image:

root = Tk()
frame1 = Frame(root)
frame2 = Frame(root)
frame3 = Frame(root)

 Place the frames in the window as follows:


frame1.pack(side=TOP)
frame2.pack(side=LEFT)
frame3.pack(side=LEFT)

Fig. 4 TKinter Image

 Capture the image as follows:

camera = picamera.PiCamera()
camera.resolution = (200, 200)
camera.start_preview()
time.sleep(5)
camera.capture('gesture.jpg')  # filename is illustrative; the paper omits it
camera.stop_preview()
camera.close()

 Convert the image to HSV, obtain the skin color object, and blur the image as follows:

hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
lower_skin = np.array([0,20,70], dtype=np.uint8)
upper_skin = np.array([20,255,255], dtype=np.uint8)
mask = cv2.inRange(hsv, lower_skin, upper_skin)  # skin-color mask
mask = cv2.GaussianBlur(mask, (5,5), 100)        # smooth the mask

 Then find the contours as follows:

contours, hierarchy = cv2.findContours(mask, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)

 Find convex hull as follows:

cnt = max(contours, key = lambda x: cv2.contourArea(x))


hull = cv2.convexHull(cnt)

 Calculate the area ratio for the gesture:

# the calls below use an approximated contour; this step is implied by
# the later use of 'approx', and the 0.0005 factor is a common choice
# (an assumption, not stated in the paper)
epsilon = 0.0005 * cv2.arcLength(cnt, True)
approx = cv2.approxPolyDP(cnt, epsilon, True)
areahull = cv2.contourArea(hull)
areacnt = cv2.contourArea(cnt)
arearatio = ((areahull - areacnt) / areacnt) * 100
hull = cv2.convexHull(approx, returnPoints=False)
defects = cv2.convexityDefects(approx, hull)

 Code for finding the number of defects due to fingers:

for i in range(defects.shape[0]):
    s, e, f, d = defects[i, 0]
    start = tuple(approx[s][0])
    end = tuple(approx[e][0])
    far = tuple(approx[f][0])
    pt = (100, 180)
    # side lengths of the defect triangle (see Section 3.1)
    a = math.sqrt((end[0] - start[0])**2 + (end[1] - start[1])**2)
    b = math.sqrt((far[0] - start[0])**2 + (far[1] - start[1])**2)
    c = math.sqrt((end[0] - far[0])**2 + (end[1] - far[1])**2)
    s = (a + b + c) / 2
    ar = math.sqrt(s * (s - a) * (s - b) * (s - c))  # triangle area (Heron)
    angle = math.acos((b**2 + c**2 - a**2) / (2 * b * c)) * 57
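
The listing stops at the angle calculation; a minimal sketch of the remaining defect count, assuming a counter num_defects initialized to 0 before the loop, could read:

    # still inside the loop: apply the 30-90 degree rule from Section 3.1
    if 30 < angle < 90:
        num_defects += 1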

4.2 HARDWARE IMPLEMENTATION:



Fig. 5 Hardware Module



4.3 FLOW CHART:

Fig. 6 Flowchart

4.4 RESULT:

Table 1: Image to Sound Conversion

Gestures   Input image   Threshold image   Identified text output   Identified audio file output

(The table body holds one row per gesture: the captured image, the thresholded image, and the corresponding text and audio outputs.)

The table above shows how an input image is converted into sound and display output. Each gesture has a unique sound and display output, which is discussed in the next table.
Table 2: Result Related Discussion
Gesture   Number of Defects   Area Ratio       Display and Audio Output
1         0                   Less than 12     NO
2         0                   Less than 17.5   HELLO
3         1                   Less than 17.5   PLEASE HELP
4         2                   Less than 27     I AM THIRSTY
5         3                   Less than 27     YES
6         4                   Less than 27     THANK YOU
7         0                   Less than 17.5   ALL THE BEST
8         2                   Less than 27     OK

4.5 ANALYSIS PART OF RESULT:

Table 3: Confusion Matrix

INPUT \ OUTPUT   Gesture 1   Gesture 2   Gesture 3   Gesture 4   Gesture 5   Gesture 6   Gesture 7   Gesture 8
Gesture 1        100         0           0           0           0           0           0           0
Gesture 2        0           100         0           0           0           0           0           0
Gesture 3        0           4           95          0           0           0           1           0
Gesture 4        0           0           3           90          2           0           0           5
Gesture 5        0           0           0           0           100         0           0           0
Gesture 6        0           0           0           0           0           100         0           0
Gesture 7        7           6           2           0           0           0           85          0
Gesture 8        0           0           3           5           2           0           0           90
For hand gesture detection, we considered 8 different gestures. Each gesture was repeated one hundred times, so the total number of tested pictures was 800; of these, 760 were recognized correctly and 40 incorrectly, giving a mean hand gesture detection rate of 95.00%.

Accuracy = (No. of correctly detected gestures / Total no. of tested gestures) × 100 = (760 / 800) × 100 = 95%

Table 4: Percentage Accuracy of Various Gestures

Different Gesture   Total No. of Gestures   No. of Right Gestures   No. of Wrong Gestures   Accuracy in Percentage
Gesture 1           100                     100                     0                       100%
Gesture 2           100                     100                     0                       100%
Gesture 3           100                     95                      5                       95%
Gesture 4           100                     90                      10                      90%
Gesture 5           100                     100                     0                       100%
Gesture 6           100                     100                     0                       100%
Gesture 7           100                     85                      15                      85%
Gesture 8           100                     90                      10                      90%
Total Gesture       800                     760                     40                      95%

The graph of the average recognition rate of the various hand gestures, plotting the total number of gestures, the number of right gestures, the number of wrong gestures, and the accuracy for each gesture, is shown below (fig. 7).

Fig. 7 Graph for Average Recognition Rate

5 CONCLUSION
The proposed system is simple to implement, as there is no complicated feature calculation. It was implemented using a Raspberry Pi with a camera module and programmed in the Python programming language supported by the Open Source Computer Vision (OpenCV) library. The system was used to recognize the sign language used by deaf and mute persons and thus to bridge the communication gap between mute and ordinary people. Further research is needed in the areas of feature extraction and illumination handling so that the system becomes more reliable. With this system, a deaf and mute person can use hand gestures to perform sign language, which is converted into voice and text with an accuracy of 95%.

REFERENCES
1. Rathi, S., & Gawande, U. (2017). Development of full duplex intelligent communication
system for deaf and dumb people. 2017 7th International Conference on Cloud
Computing, Data Science & Engineering -
Confluence.doi:10.1109/confluence.2017.7943247
2. Geethu G Nath, Anu V S, “Embedded Sign Language Interpreter System For Deaf and
Dumb People”, 2017 International Conference on Innovations in information Embedded
and Communication Systems (ICIIECS)
3. Quiapo, Carlos Emmanuel A. and Ramos, Katrina Nicole M., “Development of a Sign
Language Translator Using Simplified Tilt, Flex and Contact Sensor Modules”, 978-1-
5090-2597-8/16/$31.00 c 2016 IEEE
4. Subhankar Chattoraj Karan Vishwakarma, “Assistive System for Physically Disabled
People using Gesture Recognition”, 2017 IEEE 2nd International Conference on Signal
and Image Processing
5. Sayan Tapadar, Suhrid Krishna Chatterjee, Himadri Nath Saha, Shinjini Ray, Sudipta
Saha, “A Machine Learning Based Approach for Hand Gesture Recognition using
Distinctive Feature Extraction”, 978-1-5386-4649-6/18/$31.00 ©2018 IEEE
6. Hamid A. Jalab Herman .K. Omer, “Human Computer Interface Using Hand Gesture
Recognition Based On Neural Network”, 978-1-4799-7626-3/15/$31.00 ©2015 IEEE
7. Geethu G Nath, Arun C S, “Real Time Sign Language Interpreter”, 2017 International
Conference on Electrical, Instrumentation and Communication Engineering (ICEICE2017)
8. Shweta, Rajesh, Vitthal, “Real Time Two Way Communication Approach for Hearing
Impaired and Dumb Person Based on Image Processing”, 2016 IEEE International
Conference on Computational Intelligence and Computing Research
9. Ali A. Abed, Sarah A. Rahman, “Python-based Raspberry Pi for Hand Gesture
Recognition”, International Journal of Computer Applications (0975 – 8887) Volume 173
– No.4, September 2017
10. Muhammad Yaqoob Javed!, Muhammad Majid Gulzar3Syed Tahir Hussain Rizvi 2M
Junaid Asif4 Zaineb Iqbal, “Implementation of Image Processing Based Digital
Dactylology Converser for Deaf-Mute Persons”, 978-1-4673-8753-8/16/$31 .00 ©2016
IEEE
16

11. Anup Nandy, Jay Shankar Prasad, Soumik Mondal, Pavan Chakraborty, and G.C. Nandi,”
Recognition of Isolated Indian Sign Language Gesture in Real Time”, BAIP 2010, CCIS
70, pp. 102–107, 2010. © Springer-Verlag Berlin Heidelberg 2010
12. Kapil Yadav and Jhilik Bhattacharya, “Real-Time Hand Gesture Detection and
Recognition for Human Computer Interaction”, Intelligent Systems Technologies and
Applications, Advances in Intelligent Systems and Computing 384, DOI: 10.1007/978-3-
319-23036-8_49
13. Giuseppe Air`o Farulla, Ludovico Orlando Russo, Chiara Pintor, Daniele Pianu, Giorgio
Micotti, Alice Rita Salgarella, Domenico Camboni, Marco Controzzi, Christian Cipriani,
Calogero Maria Oddo, Stefano Rosa1, and Marco Indaco, “Real-Time Single Camera
Hand Gesture Recognition System for Remote Deaf-Blind Communication”, c Springer
International Publishing Switzerland 2014
14. [R, GONZALEZ. R, WOODS. Digital Image Processing, New Jersy: Pearson Education.
Inc.
15. S, UMBAUGH. Computer Vision and Image Processing: A Practical Approach Using
Cviptools with Cdrom, Prentice Hall PTR.
16. H, JALAB. R. W, IBRAHIM Texture enhancement based on the SavitzkyGolay fractional
differential operator. Mathematical Problems in Engineering, 2013
