
Abstract

Computer vision includes methods for acquiring, processing, analyzing, and
understanding images. Applications of this field include detecting events,
controlling processes, navigation, modelling objects or environments,
automatic inspection and many more. Activity recognition is one of the
applications of computer vision; it aims to recognize the actions and goals of
one or more agents from a series of observations of the agents' actions and
the environmental conditions. The goal of the project is to train an algorithm
to automate the detection and recognition of human activities performed in
video data. The project can be used in scenarios such as surveillance systems,
intelligent environments, sports play analysis and web-based video retrieval.
This paper proposes an approach to recognizing human actions using image
processing and machine learning techniques. We use the publicly available
Weizmann dataset, which consists of videos. Each video is split into frames,
and one frame at a time is taken as input for image preprocessing. Because the
frame is noisy, we first remove the noise; of the several methods available,
we use the median blur technique. We then convert the image to grayscale and
apply a threshold to detect the object in it. The HOG algorithm counts
occurrences of gradient orientations in a particular region of interest (ROI);
we use it to extract features from the image, and those features are used to
train an SVM classifier. For testing, we capture input from a web camera and
apply the same preprocessing and feature extraction. The support vector
machine takes the resulting features as input and predicts the class label
based on the trained SVM model.


List of Abbreviations
GUI Graphical User Interface
RGB Red, Green, and Blue
SVM Support Vector Machine
HOF Hand Of Gesture
HOG Histogram of Oriented Gradients
HMM Hidden Markov Model

Department of Computer Engineering 4

SYNOPSIS

Name of the student: Vivek D. Lad
Roll No: 8727
Course and Academic year: Computer Engineering (2016-2017)
Dissertation Title: Real Time Hand Gesture Recognition And Translate Into
Text Description
Sponsorship details (if any): No

Internal Guide: Mr. Ramesh Kagalkar.


Technical Keywords- ACM:

1. Hand Gesture Recognition
2. Human-Machine Interaction
3. Action Recognition
4. Motion Shape
5. Computer Vision

Relevant Objectives:

1. To build a system that converts videos of signs made by different signers
into English text.
2. To obtain better results for gesture videos.
3. To achieve better recognition rates for different gestures under different
conditions.
4. To obtain higher accuracy from the classification technique.

Motivation:

When deaf and speech-impaired people communicate through signs, people who do
not know sign language often cannot understand them properly. We therefore
took up the challenge of building a lasting solution that lets them
communicate as easily as anyone else.

Hypothesis:

The proposed scheme includes two models: a training model and a testing model.
For the training model, we have produced our own video database of Indian
signs, covering alphabets, numbers, words and sentences performed by several
signers of Indian Sign Language, and stored it in a database. The video of the
signer is preprocessed using different techniques to avoid false segmentation
in the next stage. In the testing model, a video related to the database is
tested and the text linked to the sign is predicted in Hindi. The system can
be tested with the signs present in the database using a rule-based fuzzy
inference system. Rule-based systems are a very powerful tool used by many
researchers for pattern classification tasks.

Strategy planned associated with the dissertation:

Fig. 1: System Architecture

Fig. 1 gives a detailed overview of the proposed system. The system takes a
video as input; a video is a collection of frames. We collect frames from the
video for later processing, analyze each frame, and identify the action based
on it. During preprocessing we remove noise from the image to improve
accuracy. Features are extracted with the HOG algorithm, and the support
vector machine is trained on them. In real-time operation, an image is
captured from the web camera, its HOG features are extracted, and those
features are passed to the trained support vector machine, which predicts the
class label.
Name of at least two journals where papers (Sem-I and Sem-II) can be
published:

1. International Journal of Computer Vision and Image Processing
2. International Journal of Emerging Technologies in Learning (IJET)
3. International Journal of Internet and Web Technology
4. International Journal of Embedded Systems (IJES)

Review of conference/Journal papers supporting dissertation idea:

In paper [3], silhouette extraction is used to separate the subject from the
background, which is removed with the help of a depth image. A motion history
image, computed over n frames, is used to identify motion. The system collects
a feature vector, trains a support vector machine on it, and predicts the
action.
In paper [4], human action recognition is done using data gloves and various
other hardware parameters. The HNN algorithm is used to extract features from
the image, but the system is more limited when the environment changes.
According to Naidoo et al. [11], these systems are restricted by their use of
sophisticated and typically expensive devices. Applications that use
glove-based analysis also encounter a wide range of problems, including
reliability, accuracy and electromagnetic noise. However, as Fudickar and
Nurzynska [12] point out, a less researched but very promising domain for
further development is systems that use only a video camera for image capture
(without any additional markers). The main approaches of such real-time
systems were sign language communication based on recording, transferring and
presenting video streams. However, the need for a high-speed Internet
connection, due to the large amount of data being sent, was the main barrier.
Plan of dissertation Execution:

MONTH    WORK
0-2      Problem identification, Problem analysis
2-3      Literature survey
3-6      Formulation of objectives and system architecture
6-8      Methodologies, Data processing and analysis
8-10     Pre-test, results and discussion
10-12    Conclusion

Problem Statement:

The proposed system aims to generate an accurate text description from
real-time gestures made by deaf people.

Solving Approach:

Unlike American Sign Language or British Sign Language, Indian Sign Language
does not have a standard database available for use. We have therefore created
our own video database and proposed a system that recognizes sign language
gestures from a video stream of the signer. The proposed system consists of:

1) Hand Detection

The system sets a specific ROI and then checks whether that ROI contains a
hand. If a hand is found, we locate the actual hand and crop the image to it,
so that features are extracted from the hand alone instead of from the whole
frame. This improves the accuracy of the system.
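The ROI check and crop described above can be sketched as follows. This is only an illustrative sketch: the source does not say how the hand is detected inside the ROI, so the example assumes a grayscale frame stored as a list of rows, a hand that is brighter than the background, and hypothetical names and defaults (`crop_hand`, `threshold`, `min_pixels`).

```python
def crop_hand(gray, roi, threshold=128, min_pixels=4):
    """Return a tight crop around bright pixels inside the ROI, or None.

    gray: 2-D list of intensities (0-255).
    roi:  (top, left, bottom, right), half-open row and column ranges.
    Assumption: the hand is brighter than `threshold`; the source does not
    specify the detection rule.
    """
    top, left, bottom, right = roi
    # Collect the coordinates of candidate hand pixels inside the ROI.
    hits = [(r, c)
            for r in range(top, bottom)
            for c in range(left, right)
            if gray[r][c] >= threshold]
    if len(hits) < min_pixels:
        return None  # the ROI is judged to contain no hand
    rows = [r for r, _ in hits]
    cols = [c for _, c in hits]
    r0, r1 = min(rows), max(rows)
    c0, c1 = min(cols), max(cols)
    # Crop the bounding box of the detected pixels out of the frame.
    return [row[c0:c1 + 1] for row in gray[r0:r1 + 1]]
```

Features would then be computed on the returned crop rather than on the full frame, which is the point the paragraph above makes about accuracy.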
2) Preprocess

This stage consists of preprocessing operations on the video, such as frame
extraction, noise and blur elimination, and edge detection. A video holds a
huge amount of data at different levels, in terms of scenes, shots and frames.
To process a video, we therefore first extract its frames. These frames are
simply images that are used for further processing.

3) Feature Extraction

We extract features from the hand using a histogram-based technique. A
histogram is a graphical representation of the distribution of numerical data.
It is an estimate of the probability distribution of a continuous
(quantitative) variable and was first introduced by Karl Pearson. To construct
a histogram, the first step is to bin the range of values, that is, divide the
entire range of values into a series of intervals, and then count how many
values fall into each interval. The bins are usually specified as consecutive,
non-overlapping intervals of a variable. The bins (intervals) must be adjacent
and are usually of equal size.
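The binning procedure described above can be written in a few lines. The sketch below is illustrative: the function name and the choice to place the upper bound in the last bin are assumptions, not details from the source.

```python
def histogram(values, lo, hi, bins):
    """Count how many values fall into each of `bins` equal-width intervals.

    The intervals are adjacent and non-overlapping and together cover
    [lo, hi]; a value equal to `hi` is counted in the last bin.
    """
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        if lo <= v <= hi:
            index = min(int((v - lo) / width), bins - 1)
            counts[index] += 1
    return counts
```

For example, calling it on pixel intensities with `lo=0`, `hi=256` and `bins=4` gives four counts, one per 64-level intensity band.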

4) Classification

SVM classification is essentially a binary (two-class) classification
technique, which has to be modified to handle the multiclass tasks that arise
in real-world situations. The SVM classifies using features of the image: it
is trained on the training videos and classifies each test video, producing a
text description as output.
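One common way to extend a binary classifier to several classes is the one-vs-rest scheme: one binary scorer is kept per class and the class with the highest score wins. The source does not state which multiclass scheme is used, so the sketch below is an assumption, and the linear scorers with hand-set weights are purely hypothetical stand-ins for trained SVMs.

```python
def linear_scorer(weights, bias):
    """Build a binary scoring function s(x) = w . x + b (hypothetical stand-in
    for a trained SVM)."""
    def score(x):
        return sum(w * xi for w, xi in zip(weights, x)) + bias
    return score


def predict_one_vs_rest(scorers, x):
    """Return the label whose binary scorer gives the largest score for x.

    scorers: dict mapping a class label to its scoring function.
    """
    return max(scorers, key=lambda label: scorers[label](x))


# Hand-set example scorers, not learned from any data.
EXAMPLE_SCORERS = {
    "A": linear_scorer([1.0, 0.0], 0.0),
    "B": linear_scorer([0.0, 1.0], 0.0),
    "C": linear_scorer([-1.0, -1.0], 0.0),
}
```

With one scorer per gesture label, the predicted label can be mapped directly to the text that is shown as output.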

Outcomes:
The English text description is generated from hand gesture recognition by SVM
classification. The process uses gesture recognition, noise removal and
feature extraction methods to produce a textual output for the visual content.
Mathematical model:

V: Video (video of the signer).

Process:

Step 1: Convert the video into image frames.
Step 2: Convert the RGB (Red, Green, and Blue) color video into a grayscale
video by eliminating the hue and saturation information and retaining the
luminance.
Step 3: Apply a Gaussian filter to suppress noise.
Step 4: Apply image segmentation by running an edge detection algorithm, and
perform training.
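As an illustration of Step 2, the sketch below converts one RGB frame to grayscale by keeping a weighted luminance value per pixel. The weights 0.299, 0.587 and 0.114 are the widely used ITU-R BT.601 luma coefficients; the source does not say which weights it uses, so they are an assumption, as is the function name.

```python
def to_gray(rgb_frame):
    """Convert a frame of (r, g, b) tuples into a 2-D list of luminance values.

    Uses the BT.601 luma weights (an assumption; the source only says that
    hue and saturation are discarded and luminance is retained).
    """
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
            for row in rgb_frame]
```

Green contributes most to the result and blue least, which matches how bright each primary looks to the eye.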

Algorithm for SVM classifier

Input: Training dataset
Output: Class label

Algorithm
1) Compute the score of the input vector.
2) Use a kernel function (radial basis function).
3) Class y = -1 when the output of the scoring function is negative.
4) Class y = 1 when the output of the scoring function is positive.

Parameters
Xi: ith value of the input vector
Yi: ith value of the class label
αi: the coefficient associated with the ith training sample
b: scalar value
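The scoring step can be sketched with the standard kernel SVM decision function, score(x) = sum over i of αi * Yi * K(Xi, x) + b, with the radial basis function K(u, v) = exp(-gamma * ||u - v||^2). The source names these parameters but does not write the formula out, so the formula, the `gamma` default, and the choice to map a score of exactly zero to class 1 are assumptions of this sketch.

```python
import math


def rbf_kernel(u, v, gamma=0.5):
    """Radial basis function kernel: exp(-gamma * squared distance)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))


def svm_score(x, support_vectors, labels, alphas, b, gamma=0.5):
    """Score of input x: sum of alpha_i * y_i * K(x_i, x), plus bias b."""
    return sum(alpha * y * rbf_kernel(sv, x, gamma)
               for sv, y, alpha in zip(support_vectors, labels, alphas)) + b


def svm_predict(x, support_vectors, labels, alphas, b, gamma=0.5):
    """Class -1 for a negative score, class 1 otherwise (zero maps to 1 here)."""
    return 1 if svm_score(x, support_vectors, labels, alphas, b, gamma) >= 0 else -1
```

In practice the coefficients αi and the bias b come out of training; here they are simply passed in, so the sketch covers prediction only.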
Let S be the whole system, consisting of
S = {LV, TD, TE, OP, Φ}
where
LV: Live Video
TD: Trained Dataset
TE: Text in English
OP: Output
Φ: null/empty set

1. LV = {VF, Φ}
where
VF: Video frame
Φ: null/empty set

2. TD = {DV, DE, Φ}
where
DV: Database of video
DE: Database of English text

3. OP = {EA, EW, ES, Φ}
where
EA: English text alphabet
EW: English text word
ES: English text sentences
References:

[1] G. Kulkarni, V. Premraj, V. Ordonez, S. Dhar, Siming Li, Y. Choi, A. C.
Berg, and Tamara L. Berg, "BabyTalk: Understanding and Generating Simple Image
Descriptions," IEEE Trans. on Pattern Analysis and Machine Intelligence, vol.
35, no. 12, Dec 2013.
[2] N. Krishnamoorthy, G. Malkarnenkar, R. Mooney, K. Saenko, and S.
Guadarrama, "Generating Natural-Language Video Descriptions Using Text-Mined
Knowledge," 2013.
[3] X. Sun, M. Chen, and A. Hauptmann, "Action Recognition via Local
Descriptors and Holistic Features," in IEEE Conference on Computer Vision and
Pattern Recognition, Miami, FL, USA, 2009, pp. 58-65.
[4] A.-P. Ta, C. Wolf, G. Lavoue, A. Baskurt, and J. M. Jolion, "Pairwise
Features for Human Action Recognition," in International Conference on Pattern
Recognition, Istanbul, Turkey, 2010, pp. 3224-3227.
[5] Ayushi Gahlot, J. Purvi Agarwal, and Akshya Agarwal, "Skeleton based Human
Action Recognition using Kinect," Recent Trends in Future Prospective in
Engineering & Management Technology, 2016.
[6] Chang, C., and Lin, "LIBSVM: a library for support vector machines," ACM
Transactions on Intelligent Systems and Technology (TIST) 2(??):27, 2011.
[7] De Marneffe, M., MacCartney, B., and Manning, "Generating typed dependency
parses from phrase structure parses," In Proceedings of the International
Conference on Language Resources and Evaluation (LREC), volume 6, 449-454,
2006.
[8] Ding, D., Metze, F., Rawat, S., Schulam, P., Burger, S., Younessian, E.,
Bao, L., Christel, M., and Hauptmann, "Beyond audio and video retrieval:
towards multimedia summarization," In Proceedings of the 2nd ACM International
Conference on Multimedia Retrieval, 2012.
[9] Farhadi, A., Hejrati, M., Sadeghi, M., Young, P., Rashtchian, C.,
Hockenmaier, J., and Forsyth, D., "Every picture tells a story: Generating
sentences from images," Computer Vision - European Conference on Computer
Vision (ECCV), 15-29, 2010.
[10] Felzenszwalb, P., McAllester, D., and Ramanan, D., "A discriminatively
trained, multiscale, deformable part model," In IEEE Conference on Computer
Vision and Pattern Recognition (CVPR), 1-8, 2008.
[11] Naidoo, S., Omlin, C.W., and Glaser, M. (2002). "Vision-Based Static Hand
Gesture Recognition Using Support Vector Machines." Proceedings of Southern
Africa Telecommunication Networks and Applications Conference.
[12] Fudickar, S., and Nurzynska, K. (2007). "A User-Friendly Sign Language
Chat." In: Proceedings of the Conference ICL2007. Villach, Austria. 26-28
September 2007.
[13] Laptev, I., and Perez, P., "Retrieving actions in movies," In Proceedings
of the 11th IEEE International Conference on Computer Vision (ICCV), 1-8,
2007.
[14] Laptev, I., Marszalek, M., Schmid, C., and Rozenfeld, B., "Learning
realistic human actions from movies," In IEEE Conference on Computer Vision
and Pattern Recognition (CVPR), 1-8, 2008.
[15] Lee, M., Hakeem, A., Haering, N., and Zhu, S., "Save: A framework for
semantic annotation of visual events," In IEEE Computer Vision and Pattern
Recognition Workshops (CVPR-W), 1-8, 2008.
[16] Li, S., Kulkarni, G., Berg, T., Berg, A., and Choi, Y., "Composing simple
image descriptions using web-scale n-grams," In Proceedings of the Fifteenth
Conference on Computational Natural Language Learning (CoNLL), 220-228,
Association for Computational Linguistics (ACL), 2011.

Name of the student with sign Name of the guide with sign
Chapter 2

TECHNICAL KEYWORDS

Image Processing
To recognize sign language gestures from a video stream, powerful image
processing techniques are used to segment shapes in our videos, such as
frame-differencing-based tracking, edge detection, wavelet transform and image
fusion.
Machine Learning
Machine learning is the subfield of computer science that gives computers
the ability to learn without being explicitly programmed. Machine learning is
closely related to (and often overlaps with) computational statistics, which also
focuses on prediction-making through the use of computers. It has strong ties
to mathematical optimization, which delivers methods, theory and application
domains to the field. Machine learning is sometimes conflated with data min-
ing, where the latter subfield focuses more on exploratory data analysis and is
known as unsupervised learning. Machine learning can also be unsupervised
and be used to learn and establish baseline behavioral profiles for various en-
tities.

Gesture Recognition
Gesture recognition is the most widely used modality among the communication
modalities in human-computer interaction. It is a topic in computer science
and language technology that aims to interpret human gestures via mathematical
algorithms. Gestures can originate from any bodily motion or state but
commonly originate from the face or hand.

Video Processing
Video processing is a particular case of signal processing, which frequently
employs video filters and where the input and output are video files or video
streams. It includes operations such as scaling the video to reduce the amount
of data and so save processing time, and segmenting content from a video
sequence.

Chapter 3
INTRODUCTION

3.1 Dissertation Idea


Activity recognition aims to recognize the actions and goals of one or more
agents from a series of observations on the agents' actions and the
environmental conditions. Since the 1980s, this research field has captured
the attention
of several computer science communities due to its strength in providing per-
sonalized support for many different applications and its connection to many
different fields of study such as medicine, human-computer interaction, or so-
ciology.
In image processing, we take an image as input and process it according to the
requirement. The input can take several forms, such as a video, a single
image, or frames collected from a video, and the output is either an image or
a set of parameters related to the image. The image must first be converted to
digital form, after which it can be processed as needed. We use image
processing to improve image quality and to gather useful information from the
image; the latter process is called feature extraction.
This thesis addresses the problem of action recognition, i.e., how to
determine the type of action that is happening in a video. We consider the
problem of video representation: how to encode videos in a robust way, such
that the representation is suitable for a wide variety of action classes,
tasks and video types.

3.2 Literature Survey


The various approaches to recognizing human activities are classified into
single-layered approaches and hierarchical approaches [4]. Single-layered
approaches are further classified into two types depending on how they model
human activities: space-time approaches and sequential approaches. Space-time
approaches view the input as a 3-D (XYT) volume, while sequential approaches
view and interpret the input video as a sequence of observations. Space-time
approaches are further divided into three categories based on the features
they use from the 3-D space-time volumes: space-time volumes, trajectories and
local interest point detectors. Sequential approaches are classified depending
on whether they use exemplar-based or model-based recognition methodologies.
In single-layered approaches, each activity corresponds to a cluster
containing image sequences for that activity. These clusters are categorized
into classes, each having some definite property, so when an input is given to
the system, algorithms and methodologies such as neighbor-based matching [5],
template matching [6] and statistical modelling [7] are applied to categorize
the input activity into its appropriate class.
Hierarchical approaches work on the concept of divide and conquer, in which a
complex problem is solved by dividing it into several sub-problems. The
sub-activities are used to identify the main complex activity. These
approaches are classified on the basis of the recognition methodologies they
use: statistical, syntactic and description-based. Statistical approaches
construct statistical state-based models such as the layered Hidden Markov
Model (HMM) to represent and recognize high-level human activities [8].
Similarly, syntactic approaches use a grammar syntax such as a stochastic
context-free grammar (SCFG) to model sequential activities [9].
Description-based approaches represent human activities by describing
sub-events of the activities and their temporal, spatial and logical
structures [10]. Fig. 1.1 summarizes the hierarchical approach-based taxonomy
of the approaches used in human activity recognition.
Ke et al. [11] used segmented spatio-temporal volumes to model human
activities. Their system applied hierarchical mean-shift to cluster similarly
colored voxels and obtain several segmented volumes. The motivation is to find
the actor volume segments automatically and to measure their similarity to the
action model. Recognition is done by searching for a subset of over-segmented
spatio-temporal volumes that best matches the space of the action model. Their
system recognized simple actions such as hand waving and boxing from the KTH
action database.
Laptev [12] recognized human actions by extracting sparse spatio temporal
interest points from videos. They extended the local feature detectors (Har-
ris) commonly used for object recognition, in order to detect interest points
in space-time volume. Motion patterns such as change in direction of object;
splitting and merging of an image structure; and collision/bouncing of object
are detected as a result. In their work, these features were used to distinguish
a walking person from complex backgrounds.
Bobick and Davis [13] constructed a real-time action recognition system using
template matching. Instead of maintaining the 3-dimensional space-time volume
of each action, they represented each action with a template composed of two
2-dimensional images: a binary Motion Energy Image (MEI) and a scalar Motion
History Image (MHI). The two images are constructed from a sequence of
foreground images and are essentially weighted 2-D projections of the original
3-D (XYT) volume. By applying a traditional template matching technique to the
(MEI, MHI) pair, their system was able to recognize simple actions like
sitting, arm waving and crouching. However, the MHI method suffers from a
serious drawback: self-occluding or overwriting actions lead to severe
recognition failure [14]. The failure occurs because, when a repetitive action
is performed, the same pixel location is accessed multiple times, so the
information previously stored in the pixel is overwritten or deleted by the
current action. To address this issue we implemented a novel technique for
creating motion history images that are capable of representing self-occluding
and overwriting actions. Our methodology overcomes the limitation in
representing repetitive activities and thus improves on the conventional MHI
method.
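To make the overwriting problem concrete, the sketch below implements the conventional MHI update only, not the modified technique mentioned above, whose details are not given in this section. It assumes the usual rule: a pixel with motion is set to the duration tau, and every other pixel decays by one per frame down to zero. The function name and the list-of-rows image layout are illustrative choices.

```python
def update_mhi(mhi, motion_mask, tau):
    """One conventional MHI step.

    mhi:         2-D list holding the current motion history values.
    motion_mask: 2-D list of 0/1 flags, 1 where motion was detected.
    tau:         value assigned to pixels that moved in this frame.
    Pixels without motion decay by 1, never going below 0.
    """
    return [[tau if moving else max(0, h - 1)
             for h, moving in zip(h_row, m_row)]
            for h_row, m_row in zip(mhi, motion_mask)]
```

If motion sweeps left to right over three pixels and then returns to the first one, the first pixel is reset to tau and its earlier timestamp is lost. That loss is the overwriting failure described above for repetitive actions.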

3.3 Motivation
Design objectives and goals are:

1. Get better results from the provided test video.
2. Improve the recognition rate even when environmental conditions vary.
3. Reduce execution time.

Chapter 4

PROBLEM DEFINITION AND SCOPE

4.1 Problem Definition


Many researchers have contributed innovative algorithms and approaches in the
area of human action recognition and have conducted experiments on individual
data sets with regard to accuracy and computation. In spite of their efforts,
the field still requires high accuracy with low computational complexity. The
existing techniques are inadequate in accuracy due to assumptions regarding
clothing style, view angle and environment. Hence, the main objective of this
thesis is to develop an efficient multi-view human action recognition system
using HOG features.

4.1.1 Statement of Scope


In this project, the area of human action recognition is closely related to
other lines of research that analyze human motion from images and video.

4.1.2 Software Context


The system context is implicit rather than explicitly defined as part of
project initiation or requirements gathering. In this section we present the
elements used for software development. The action recognition system takes
web camera video as input and interacts with the database server through the
user.

4.1.3 Major Constraints


This application is developed in Java, so it may require more time to process
a video. Another major constraint is that testing gives more accurate results
only if similar videos are stored in the database.

4.1.4 Approaches for Solving the Problem and Efficiency Issues
A captured image contains noise, which causes the accuracy to vary. We use a
median filter to remove noise from the captured image. Because the system
works in real time, we use fast techniques to reduce computation time.
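A median filter replaces each pixel with the median of its neighbourhood, which removes isolated noisy pixels while keeping edges fairly sharp. The sketch below is a minimal pure-Python version for illustration; the window size of 3 and the choice to clip the window at the image border are assumptions, and library implementations may treat borders differently.

```python
def median_filter(gray, k=3):
    """Apply a k x k median filter to a 2-D list of intensities.

    Assumption: at the borders the window is clipped to the image, so it
    contains fewer than k * k pixels there.
    """
    height, width = len(gray), len(gray[0])
    r = k // 2
    out = [row[:] for row in gray]
    for y in range(height):
        for x in range(width):
            window = [gray[yy][xx]
                      for yy in range(max(0, y - r), min(height, y + r + 1))
                      for xx in range(max(0, x - r), min(width, x + r + 1))]
            window.sort()
            out[y][x] = window[len(window) // 2]  # middle element of the window
    return out
```

A single bright outlier in an otherwise uniform patch is removed completely, which is why the filter suits the impulse-like noise seen in webcam frames.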

4.1.5 Outcome
The outcome of this project is an application that deaf people can use to
communicate.

Hardware Resources Required

Table 4.1: Hardware Resources Required

Processor    Core i3
RAM          8 GB
Hard Disk    1 TB
Keyboard     Standard 102 keys
Mouse        3 buttons

Software Resources Required

Table 4.2: Software Resources Required

Front End           Java (Swing), JDK 8
Back End            WampServer
Tools Used          Eclipse
Operating System    Windows 7

Area of Dissertation

The area of the dissertation is image processing and video processing. The
video is split into frames using video processing, and algorithms are applied
to each image using image processing.

Chapter 5

DISSERTATION PLAN

Dissertation planning is a particular kind of academic task. It is a short
write-up showing what was done to analyze the existing and proposed systems,
the literature study, and the planning for design and implementation,
including the study of the project flow. The plan is divided into sections,
namely the implementation plan, the effort estimation plan and the phase
description plan, as shown in Table 5.1, Table 5.2 and Table 5.3 respectively.

5.1 Implementation Plan

The implementation plan explains the overall planning of working days.

Table 5.1: Implementation plan
Task                                              Effort (weeks)
Analysis of existing systems and proposed system  3 weeks
Literature survey                                 3 weeks
Designing and planning                            4 weeks
System flow                                       4 weeks

5.3 Phase Description

Table 5.3: Phase Description
Phase    Task               Description
Phase 1  Analysis           Analyze the data related to the research issue statement.
Phase 2  Literature survey  Collect unprocessed information and elaborate on literature reviews.
Phase 3  Design             Allocate the module as well as design the process flow control.
Phase 4  Simulation         Prepare simulation studies based on proposed aims and objectives.
Chapter 6

SOFTWARE REQUIREMENT AND SPECIFICATION

6.1 Project Scope
The scope of the project is to provide a platform on which speech-impaired
individuals can share their views with everyone. The project analyzes human
motion from images and video, with the aim of developing a Facebook-like
application through which speech-impaired people can communicate with each
other.

6.2 Architectural Design

Architectural design focuses on the components or elements of a structure or
system and unifies them into a coherent and functional whole, according to a
particular approach to achieving the objective(s) under the given constraints
or limitations. A block diagram is a specialized, high-level flowchart used in
engineering. It is used to design new systems or to describe and improve
existing ones. Its structure provides a high-level overview of major system
components, key process participants, and important working relationships.
Fig. 6.1 shows the functions used in the implementation: video processing,
image processing, feature extraction and action detection using an SVM
classifier.

Figure 6.1: Block Diagram
A Methodology for Sign Language Video Translation into Textual Version in
Hindi
6.3 Package Diagram

A package diagram is a UML structure diagram that shows packages and the
dependencies between them. Model diagrams make it possible to show different
views of a system, for example as a multi-layered (aka multi-tiered)
application. The package diagram in Fig. 6.2 gives an overall representation
of the layers used in the implementation. The user communicates with the
system through the provided GUI and presentation logic. The business layer
holds the business logic and business entities, such as action identification.
The data access layer is then used for storing the feature information
collected in the business layer.

Figure 6.2: Package Diagram


6.4 Deployment Diagram
A deployment diagram is a structure diagram that shows the architecture of the
system as the deployment (distribution) of software artifacts to deployment
targets. Artifacts represent concrete elements in the physical world that are
the result of a development process. As shown in Fig. 6.3, the User, Video
Recognition System and local server nodes are configured for this application.
Each node represents the operations performed at that node.

Figure 6.3: Deployment Diagram

6.5 Data Design

A data object is a part of the repository whose content can be addressed and
interpreted by the program. All data objects must be declared in the program
and are not persistent, meaning that they exist only while the program is
being executed.

6.5.1 Internal Software Data Structure


For the project implementation we collect videos in which actions are
performed by a user. Internally, each video is processed and converted into
frames, i.e., a set of images. These images are then used to detect the
action.

6.5.2 Global Data Structure


For the implementation we use the Eclipse IDE as the software tool for code
development. We do not use any global data structures.

6.5.3 Temporary Data Structure


We use HttpSession in Java for temporary data such as saved images. The
session is tracked through a cookie, and no other temporary store is used.

6.5.4 Database Description


We use a MySQL database for storing information. First, we create a table for
features and store all features collected from the videos for training and
testing purposes. We use SQL to store data in MySQL and retrieve it.
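The feature table described above could look like the following sketch. It is illustrative only: SQLite (through Python's standard `sqlite3` module) stands in for MySQL so that the example is self-contained, and the table name, column names and JSON encoding of the feature vector are hypothetical, since the source does not give the schema.

```python
import json
import sqlite3


def open_store(path=":memory:"):
    """Open the database and create the (hypothetical) feature table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS feature ("
        "id INTEGER PRIMARY KEY, label TEXT NOT NULL, vector TEXT NOT NULL)"
    )
    return conn


def save_feature(conn, label, vector):
    """Store one feature vector, serialized as JSON, with its class label."""
    conn.execute("INSERT INTO feature (label, vector) VALUES (?, ?)",
                 (label, json.dumps(vector)))
    conn.commit()


def load_features(conn):
    """Return all stored (label, vector) pairs in insertion order."""
    rows = conn.execute("SELECT label, vector FROM feature ORDER BY id").fetchall()
    return [(label, json.loads(vector)) for label, vector in rows]
```

Parameterized queries (the `?` placeholders) keep labels and vectors out of the SQL text itself, which is good practice whichever database engine is used.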

6.5.5 Component Design

The objective of this design is to transform the design model into functional
software. To achieve this objective, the component-level design represents the
internal data structures and processing details of all the software components
(defined during architectural design) at an abstraction level closer to the
actual code. In addition, it specifies an interface that may be used to access
the functionality of all the software components.

Interaction Diagram
A collaboration diagram, also called a communication diagram or interaction
diagram, is an illustration of the relationships and interactions among
software objects in the Unified Modeling Language (UML). In this application,
User and System are the two objects; the user interacts with the system by
giving an action video as input, on which operations such as frame generation
are performed, as illustrated in Fig. 6.4.

Figure 6.4: Interaction Diagram



Software Interface Description

A software interface is designed to standardize and simplify the use of
computer programs, for example by using a mouse to manipulate text and images
on a display screen featuring icons, windows, and menus. This is also called a
GUI. We use the built-in Java toolkit for the GUI.

External Machine Interfaces

No external machine interface is used in our project. Only if videos are
captured with a webcam does the webcam need to be configured with the PC.

External System Interfaces

No external software is required for the implementation, as we use the Eclipse
toolkit, which contains most of the built-in plugins that may be needed.

Human Interface

The application can be used by any user; through it, hearing people can
communicate with deaf people and vice versa.

Restrictions, Limitations, and Constraints

The limitations are as follows:

1. The system uses WampServer to work with the database.
2. Testing gives better results only if the same or a similar video has
already been used for training and stored in the database.
3. The information store is a MySQL database.

Chapter 7

DETAILED DESIGN DOCUMENT


(HIGH LEVEL DESIGN)

7.1 Introduction
The design phase plays a vital role in the development of any system. The
system design gives the central theme of the system that is going to be
developed. System design and documentation are therefore very important for
the developer before starting the work.
7.2 Overview of System
A video is a collection of frames. We collect frames from the video for later
processing, analyze each frame, and identify the action based on it. During
preprocessing we remove noise from the image to improve the result. Features
are extracted with the HOG algorithm, and the support vector machine is
trained on them. In real-time operation, an image is captured from the web
camera, its HOG features are extracted, and those features are passed to the
trained support vector machine, which predicts the class label as output.
Figure 7.1: System Architecture

Fig. 7.1: Overview of system
The overall system performs preprocessing, feature extraction, classification, hand gesture detection, and finally text description generation. The proposed system consists of two major phases: training and testing.
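The two phases can be illustrated with a very small linear SVM. The sketch below is not the classifier used in the project; it assumes two classes labelled +1 and -1, toy two-dimensional feature vectors in place of HOG features, and hypothetical function names (`train_linear_svm`, `predict`). Training minimizes the regularized hinge loss by subgradient descent, and testing evaluates the sign of the learned decision function. More than two action classes would need an extension such as one-vs-rest.

```python
def train_linear_svm(features, labels, epochs=200, lr=0.01, reg=0.01):
    """Training phase: fit w, b by subgradient descent on the hinge loss.

    features: list of equal-length feature vectors (toy stand-ins for HOG).
    labels:   list of +1 / -1 class labels (two-class assumption).
    """
    dim = len(features[0])
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * score < 1:
                # Margin violated: step along the hinge-loss subgradient.
                w = [wi + lr * (y * xi - reg * wi) for wi, xi in zip(w, x)]
                b += lr * y
            else:
                # Margin satisfied: only the regularizer shrinks w.
                w = [wi - lr * reg * wi for wi in w]
    return w, b


def predict(w, b, x):
    """Testing phase: return the class label for one feature vector."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else -1


# Toy, linearly separable training set (illustrative values only).
train_x = [[2.0, 2.0], [3.0, 3.0], [2.5, 3.5],
           [-2.0, -2.0], [-3.0, -3.0], [-2.5, -3.5]]
train_y = [1, 1, 1, -1, -1, -1]

w, b = train_linear_svm(train_x, train_y)
predictions = [predict(w, b, x) for x in train_x]
```

In the full system the training vectors would be HOG descriptors of video frames, and the predicted label would be mapped back to an action name.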

1. Training Phase:

In the training module, images are extracted from the captured video, used to train the SVM, and then stored in the database with an assigned class label. Figure 7.1 shows the overview of the system in the training section. Features are extracted from all the training images and are later used for testing. First, frames are captured from live video, since a video is nothing but a sequence of images. Training is then performed on the captured frames. Every image is processed by a filtering technique (noise removal, edge detection, or shape detection), and the Histogram of Oriented Gradients (HOG) algorithm is applied for feature extraction. HOG describes the objects (hands) and motion shapes in the images through the distribution of intensity gradients and edge directions. A grayscale image is then generated and used as input. The output is a list of points on the image, each associated with a vector of low-level descriptors. These points are called key points, and their descriptors are invariant to rescaling, in-plane rotation, added noise, and in some cases changes of illuminant. The gestures captured from the images are used to generate the exact meaning and are ranked in the English language. Thus, in the training section, the hand gestures and hand motions are used to derive the exact meaning, and the meaning of each gesture is inserted into the database.
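The core of HOG, counting gradient orientations inside a region, can be sketched for a single cell as follows. This is a simplified illustration under stated assumptions: the cell is a small grayscale image given as a list of rows, gradients use central differences, orientations are unsigned (0 to 180 degrees) and fall into 9 bins, and each pixel votes with its gradient magnitude. Bin interpolation and block normalization, which full HOG implementations use, are left out, and the function name `cell_histogram` is hypothetical.

```python
import math


def cell_histogram(cell, bins=9):
    """Unsigned gradient-orientation histogram for one grayscale cell."""
    height, width = len(cell), len(cell[0])
    hist = [0.0] * bins
    bin_width = 180.0 / bins
    # Skip the outer border so that central differences stay inside the cell.
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = cell[y][x + 1] - cell[y][x - 1]
            gy = cell[y + 1][x] - cell[y - 1][x]
            magnitude = math.hypot(gx, gy)
            if magnitude == 0:
                continue
            # Fold the direction into [0, 180) so opposite gradients agree.
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            hist[int(angle // bin_width) % bins] += magnitude
    return hist


# An 8x8 cell with a vertical edge: dark left half, bright right half.
vertical_edge = [[0, 0, 0, 0, 255, 255, 255, 255] for _ in range(8)]
hist_vertical = cell_histogram(vertical_edge)

# The same edge rotated by 90 degrees (dark top half, bright bottom half).
horizontal_edge = [[0] * 8 for _ in range(4)] + [[255] * 8 for _ in range(4)]
hist_horizontal = cell_histogram(horizontal_edge)
```

For the vertical edge all votes land in the first bin (gradient along the x axis), while the horizontal edge votes into the bin that contains 90 degrees. Concatenating such histograms over all cells of a frame gives the feature vector passed to the SVM.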

1. Testing Phase:

This module tests live video and produces results in terms of frame segmentation. In this phase, a video is processed and divided into frames, and these frames are further processed by applying a filtering algorithm to remove noise from the images. The median blur technique is used to filter each image. The lower part of Figure 7.1 shows the testing phase. After noise elimination, the features of the images are extracted and matched against the features of the training videos to recognize the text. The proposed system goes through the following steps to yield the desired result.
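The median blur used for noise removal can be sketched in a few lines. The version below is a minimal illustration, not the project's implementation: it assumes a grayscale frame stored as a list of rows, a square window of odd size, and replicated border pixels. The function name `median_blur` is hypothetical.

```python
from statistics import median


def median_blur(image, ksize=3):
    """Replace each pixel by the median of its ksize x ksize neighbourhood."""
    if ksize < 1 or ksize % 2 == 0:
        raise ValueError("ksize must be a positive odd integer")
    height, width = len(image), len(image[0])
    radius = ksize // 2
    result = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            window = []
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    # Clamp coordinates so border pixels are replicated.
                    yy = min(max(y + dy, 0), height - 1)
                    xx = min(max(x + dx, 0), width - 1)
                    window.append(image[yy][xx])
            result[y][x] = median(window)
    return result


# A flat grey frame with two impulse ("salt and pepper") noise pixels.
noisy = [
    [10, 10, 10, 10, 10],
    [10, 10, 255, 10, 10],
    [10, 10, 10, 10, 10],
    [10, 0, 10, 10, 10],
    [10, 10, 10, 10, 10],
]
clean = median_blur(noisy)
```

Because each 3 x 3 window contains at most two outliers, the median restores the flat background, which is why this filter suits isolated noisy pixels.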

1. (a) i. Usage Scenario

A scenario is a tool used during requirements analysis to describe a specific use of a proposed system. Scenarios capture the system as viewed from the outside, e.g., by a user.

1. (a) i. User Profiles

A user profile is a visual display of personal data associated with a specific user, or a customized desktop environment. A profile therefore refers to the explicit digital representation of a person's identity. A user profile can also be considered the computer representation of a user model.

1. (a) i. Use Cases

In software and systems engineering, a use case is a list of actions or event steps, typically defining the interactions between a role (known in the Unified Modeling Language as an actor) and a system, to achieve a goal. The actor can be a human or another external system.

1. (a) i. Use Case View

Fig. 7.2 shows the use cases of the user. In our project, the user provides input for training or testing. The user then performs video frame extraction, image processing, and feature extraction and, for testing, feature matching on the test video.

Figure 7.2: Use Case

7.3 ER Diagram
ER modeling is a data modeling technique used in software engineering to produce a conceptual data model of an information system. Diagrams created using this technique are called Entity-Relationship diagrams, ER diagrams, or ERDs. The ER diagram in Fig. 7.3 shows that the user is an entity who handles the application and performs the operations shown in the diagram.

Figure 7.3: ER Diagram

1. (a) i. DFD Level (0 / 1 / 2) Diagram

A data flow diagram (DFD) is a graphical representation of the flow of data through an information system, modelling its process aspects. A DFD is often used as a preliminary step to create an overview of the system, which can later be elaborated.

1. (a) i. DFD Level 0 Diagram

DFD 0 is the initial system representation, in which the input to the system and the process applied to that input are defined. In our application, the input is a sign video, and this video is processed to detect text, as illustrated in Fig. 7.4.

Figure 7.4: DFD 0 Diagram

7.3.3 DFD Level 1 Diagram

As shown in Fig. 7.5, DFD 1 is the next level of the system flow, in which the video is processed and divided into frames (images), and the images are then processed to extract features.

Figure 7.5: DFD 1 Diagram
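The division of a video into frames can be illustrated by computing which frame indices to keep. The sketch assumes that one frame is kept per second, the interval stated in the conclusion of this report, and that the frame rate and total frame count of the video are known. The function name `frame_indices` and its parameters are hypothetical, and reading the actual frames from a file or webcam is left to the video library in use.

```python
def frame_indices(total_frames, fps, interval_s=1.0):
    """Return the indices of the frames sampled every interval_s seconds."""
    if total_frames < 0:
        raise ValueError("total_frames must not be negative")
    step = fps * interval_s
    if step < 1:
        raise ValueError("sampling interval is shorter than one frame")
    indices = []
    k = 0
    while True:
        index = int(round(k * step))
        if index >= total_frames:
            break
        indices.append(index)
        k += 1
    return indices


# A 4-second clip recorded at 25 frames per second (illustrative numbers).
selected = frame_indices(total_frames=100, fps=25)
```

Here four frames are selected, one at the start of each second. Each selected frame would then go through noise removal and HOG feature extraction.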


7.3.4 DFD Level 2 Diagram

After feature extraction, these features are matched against the trained video features from the database to detect the action. Fig. 7.6 shows the DFD 2 diagram for this application.

Figure 7.6: DFD 2 Diagram
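As a rough illustration of feature matching, the sketch below compares a test feature vector with stored training vectors and returns the label of the closest one. This nearest-neighbour rule is a simplification chosen for brevity; the project itself classifies with an SVM. The stored vectors, the action labels, and the function name `match_action` are illustrative assumptions.

```python
import math


def match_action(query, database):
    """Return (label, distance) of the stored feature closest to query.

    database: list of (label, feature_vector) pairs, standing in for the
    trained features kept in the project's database.
    """
    if not database:
        raise ValueError("database is empty")
    best_label, best_distance = None, math.inf
    for label, feature in database:
        distance = math.dist(query, feature)  # Euclidean distance
        if distance < best_distance:
            best_label, best_distance = label, distance
    return best_label, best_distance


# Toy three-dimensional features (real HOG vectors are much longer).
trained = [
    ("walk", [1.0, 0.0, 0.0]),
    ("bend", [0.0, 1.0, 0.0]),
    ("wave", [0.0, 0.0, 1.0]),
]
label, distance = match_action([0.9, 0.1, 0.0], trained)
```

The returned distance can also serve as a rejection threshold: a query that is far from every stored feature probably shows an action that was never trained.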


7.4 Activity Diagram
Activity diagrams, which are related to program flow plans (flowcharts), are used to illustrate activities. In the external view, we use activity diagrams to describe the business processes that make up the functionality of the business system. In contrast to use case diagrams, activity diagrams make it obvious whether actors can perform business use cases together or independently of one another. Figure 7.7 illustrates the overall flow of the application, which has two phases: training and testing.

Figure 7.7: Activity Diagram

7.5 Class Diagram


In software engineering, a class diagram in the Unified Modeling Language (UML) is a type of static structure diagram that describes the structure of a system by showing the system's classes, their attributes, operations (or methods), and the relationships among objects. The class diagram in Fig. 7.8 represents the classes and their functions. This project has three main classes and a database, which is used to store the user's private data.

Figure 7.8: Class Diagram

7.6 Non-Functional Requirements

In systems engineering and requirements engineering, a non-functional requirement is a requirement that specifies criteria that can be used to judge the operation of a system, rather than specific behaviors. Non-functional requirements are contrasted with functional requirements, which define specific behavior or functions.

7.6.1 Performance Requirements


The performance of the system is judged by the quality of its action classification. The recognition rate is defined as the ratio of the number of correctly classified actions to the total number of actions.
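The recognition rate defined above can be computed directly from the predicted and the true labels. The sketch below is a minimal illustration; the function name `recognition_rate` and the action labels are assumptions chosen for the example, not results from the project.

```python
def recognition_rate(predicted, actual):
    """Fraction of actions whose predicted label equals the true label."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("at least one action is required")
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return correct / len(actual)


# Four test clips, one of which is misclassified (illustrative labels).
actual = ["walk", "run", "bend", "jump"]
predicted = ["walk", "run", "bend", "run"]
rate = recognition_rate(predicted, actual)
```

With three of four actions classified correctly the rate is 0.75. The error rate is simply one minus this value.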
7.6.2 Safety Requirements
Video must be accurately classified. The results should be genuine, with low false positive and false negative rates.

7.6.3 Security Requirements


The firewall must allow incremental data to be received over the network if required.
7.6.4 Software Quality Attributes

1. Availability

The system must be available for action recognition at all times.

1. Modifiability

The software is designed in an extensible way, so that any functionality can be updated by adding new procedures and modules. We aim for loose coupling of components.

1. Performance

The performance of this system depends on the nature and size of the input data. Performance is measured as accuracy. Error can also be used as a performance measure; it tells us how many instances are misclassified.

1. Security

Security of the system is an important issue. We use login details to authenticate users. This system itself is related to errors in the software. Security is achieved by applying several constraints.

1. Testability

The action database can be extended with new actions. The system can then be tested with the actions present in the database using the SVM classifier.

1. Usability

The role of action recognition systems in society is to ensure that deaf people have equal opportunity and full participation in society.

7.7 Design Constraints


Design constraints are the

1. Software Interface Description

Data mining systems such as this product interact via stream data. The major components of the interface are the Java virtual machine and the integrated development environment (IDE). The system design takes into consideration how the system will look to the end user.

1. External Machine Interfaces

The term external designates another system (software or hardware) used in software development. There is no external machine interface in our project.

1. External System Interfaces

No external software is required for implementation, since we use the Eclipse IDE, which contains most of the built-in plugins we may need.

1. Human Interface

The human-machine interface, or user interface, is the part of the machine that handles the interaction between human and machine. Any user can use this application as an external entity by providing a dataset as input.

1. Validity Criteria

In product development and process optimization, a requirement is a singular documented physical and functional need that a particular design, product, or process must be able to perform. The term is most commonly used in a formal sense in systems engineering, software engineering, or enterprise engineering. It is a statement that identifies a necessary attribute, capability, characteristic, or quality of a system for it to have value and utility to a customer, organization, internal user, or other stakeholder. A requirement specification (often imprecisely referred to as the spec, because there are different sorts of specifications) refers to an explicit set of requirements to be satisfied by a material, design, product, or service.

After the software is integrated, several high-priority test cases are executed to validate it, and the software must be verified thoroughly. Validation testing is the necessary phase for meeting the designated behavioral, functional, and performance criteria; it demonstrates the functional and performance characteristics of the finished software.

Chapter 8

CONCLUSION AND FUTURE ENHANCEMENT

This paper has introduced English text generation from sign language. The process uses gesture recognition, image preprocessing, and feature extraction. Each video is split into frames at one-second intervals, and filtering and shape detection techniques are applied to every frame. The HOG algorithm is used to extract the features, and these features are used to compare the test video with the training videos.
REFERENCES
[1] M. S. Ryoo and J. K. Aggarwal, "Spatio-Temporal Relationship Match: Video Structure Comparison for Recognition of Complex Human Activities," in International Conference on Computer Vision, Kyoto, Japan, 2009, pp. 1593-1600.
[2] X. Sun, M. Chen, and A. Hauptmann, "Action Recognition via Local Descriptors and Holistic Features," in IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 2009, pp. 58-65.
[3] Ayushi Gahlot, J. Purvi Agarwal, and Akshya Agarwal, "Skeleton based Human Action Recognition using Kinect," Recent Trends in Future Prospective in Engineering & Management Technology, 2016.
[4] Di Wu and Ling Shao, "Silhouette Analysis-Based Action Recognition Via Exploiting Human Poses," IEEE Transactions on Circuits and Systems for Video Technology, vol. 23, no. 2, Feb. 2013.
[5] Ting Liu, Mojtaba Seyedhosseini, and Tolga Tasdizen, "Image Segmentation Using Hierarchical Merge Tree," IEEE Transactions on Image Processing, vol. 25, no. 10, Oct. 2016.
[6] Chengcheng Jia and Yun Fu, "Low-Rank Tensor Subspace Learning for RGB-D Action Recognition," IEEE Transactions on Image Processing.
[7] Ayushi Gahlot, J. Purvi Agarwal, and Akshya Agarwal, "Skeleton based Human Action Recognition using Kinect," Recent Trends in Future Prospective in Engineering & Management Technology, 2016.
[8] M. S. Ryoo and J. K. Aggarwal, "Spatio-Temporal Relationship Match: Video Structure Comparison for Recognition of Complex Human Activities," in International Conference on Computer Vision, Kyoto, Japan, 2009, pp. 1593-1600.
[9] S. Fudickar and K. Nurzynska, "A User-Friendly Sign Language Chat," in Proceedings of the Conference ICL2007, Villach, Austria, 26-28 September 2007.
[10] Amit Kumar and Ramesh Kagalkar, "Advanced Marathi Sign Language Recognition using Computer Vision," International Journal of Computer Applications (0975-8887), vol. 118, no. 13, May 2015.
[11] Ramesh M. Kagalkar and Nagaraja H. N., "New Methodology for Translation of Static Sign Symbol to Words in Kannada Language," International Journal of Computer Applications (0975-8887), vol. 121, no. 20, July 2015.
[12] Ramesh M. Kagalkar, Dr. Nagaraj H. N., and Dr. S. V. Gumaste, "A Novel Technical Approach for Implementing Static Hand Gesture Recognition," International Journal of Advanced Research in Computer and Communication Engineering (ISSN (Online) 2278-1021, ISSN (Print) 2319-5940), vol. 4, no. 7, July 2015.
