
2016 7th International Conference on Sciences of Electronics, Technologies of Information and Telecommunications (SETIT)

A Comprehensive Leap Motion Database for Hand Gesture Recognition

Safa AMEUR
Sciences of Electronic, Technologies of Information and Telecommunication, SETIT, University of Sfax;
ENISo, University of Sousse, Sousse, Tunisia
Safa.Ameur@setit.rnu.tn

Anouar Ben Khalifa
Research Unit of Advanced Systems in Electrical Engineering, ENISo, University of Sousse, Sousse, Tunisia
anouar.benkhalifa@eniso.rnu.tn

Mohamed Salim BOUHLEL
Sciences of Electronic, Technologies of Information and Telecommunication, SETIT, University of Sfax, Sfax, Tunisia
medsalim.bouhlel@enis.rnu.tn

Abstract—Touchless interaction has received considerable attention in recent years, with the benefit of removing the burden of physical contact. The recent introduction of novel acquisition devices, like the Leap Motion controller, makes it possible to obtain a very informative description of hand pose and motion that can be exploited for accurate gesture recognition. In this work, we present an interactive application with gestural hand control using the Leap Motion for medical visualization, focusing on user satisfaction as an important component in the composition of a new specific database. We propose a 3D dynamic gesture recognition approach explicitly targeted to Leap Motion data. Spatial feature descriptors based on the positions of the fingertips and the palm center are extracted and fed into a support vector machine classifier in order to recognize the performed gestures. The experimental results show the effectiveness of the suggested approach, which recognizes the modeled gestures with an accuracy of about 81%.

Keywords—Hand gesture recognition; Touchless interaction; Leap motion; Support vector machine

I. INTRODUCTION

Users have long wished to interact with machines in a more natural and intuitive way than conventional means allow. Gesture is among the richest means of communication that can be used for human-computer interaction [2].

In the general context of Human-Machine Interaction, new interfaces have emerged that provide a more natural interaction and make the computer a more transparent tool. One example is gestural interfaces [12], which aim to improve Human-Machine Interaction (HMI) by recognizing user gestures as commands. In this context, the gestural controller "Leap Motion" (LM) has attracted considerable attention. It offers a virtual interface that removes any physical contact between the human and the machine, specifically the computer.

This technology can prove interesting for applications that require interacting with a computer in particular environments, such as an operating room, where sterility is a major issue. Our system is designed to allow touchless HMI, so that surgeons can control medical images during surgery.

This paper presents a study of hand gesture recognition through the extraction, processing and interpretation of data acquired by the LM. This leads us to design recognition and classification approaches and to develop a gesture library suited to the required system control.

We introduce several novel contributions. We collected a new dataset formed by specific dynamic gestures related to the recommended commands, and we created our own data format. We propose a three-dimensional structure for the combination of spatial features, namely the arithmetic mean, the standard deviation, the root mean square and the covariance, in order to effectively classify dynamic gestures.

The paper is organized as follows. Section II describes the LM device. Section III reviews existing databases and related work. Section IV describes our database. Section V introduces the general pipeline of the proposed approach: the first subsection presents the main feature descriptors extracted from the dataset, and the second describes the classification algorithm. The experimental results are presented and discussed in Section VI. The last section draws the conclusions.

II. LEAP MOTION DEVICE

The LM is a compact sensor released in July 2013 by the Leap Motion company [1]. The device has small dimensions of 80 x 30 x 12.7 mm. It has a brushed aluminum body with black glass on its top surface, which hides three infrared LEDs, used for scene illumination, and two CMOS cameras


spaced 4 centimeters apart, which capture images at a frame rate of 50 up to 200 fps, depending on whether USB 2.0 or 3.0 is used. The LMC provides information about objects located in the device's field of view, a volume shaped like an inverted pyramid that extends from 25 mm to 600 mm above the device, with a 150° field of view.

The LMC Software Development Kit (SDK), available for C++, Java, Objective-C, C#, Python, JavaScript and other languages, can be used to develop applications that exploit the capabilities of this device; it is compatible with the Windows and OS X operating systems. It is designed to provide real-time tracking of hands and fingers in three-dimensional space with 0.01-millimeter accuracy, according to the manufacturer. The positions of the hands and fingertips are reported in coordinates relative to the center of the controller, using a right-handed coordinate system with millimeters as the unit. Several gestures can be natively identified by the LM, such as the swipe, circle, key tap and screen tap gestures.

Another positive feature is its affordability: it costs approximately $80 (~160 TND), which contributes to its popularity. The LM thus combines good performance, high precision and a minimal footprint, making it an adequate solution to opt for.

Fig. 1. LMC with micro-USB plug
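For illustration only (this sketch is ours, not the paper's code; it assumes the V2 SDK's Python bindings, exposed as the Leap module, are installed), the following fragment polls one frame and reads the palm and fingertip positions used throughout this work:

```python
import time

import Leap  # Leap Motion V2 SDK Python bindings (assumed installed)

def dump_frame(controller):
    """Print the palm and fingertip positions (in mm) of the first tracked hand."""
    frame = controller.frame()
    if frame.hands.is_empty:
        return
    hand = frame.hands[0]
    p = hand.palm_position  # Leap.Vector in the right-handed, mm-based frame
    print("palm: (%.1f, %.1f, %.1f)" % (p.x, p.y, p.z))
    for finger in hand.fingers:
        t = finger.tip_position
        print("finger id %d tip: (%.1f, %.1f, %.1f)" % (finger.id, t.x, t.y, t.z))

if __name__ == "__main__":
    controller = Leap.Controller()
    time.sleep(1.0)  # give the Leap service a moment to connect
    dump_frame(controller)
```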
III. EXISTING DATABASES

Researchers started to analyze the performance of the LM Controller (LMC) after its first release in 2013. A first study of the accuracy and robustness of the LMC was presented in [5]. An industrial robot with a reference pen, providing suitable positional accuracy, was used for the experiment. The results showed a deviation between the desired 3D position and the average measured position below 0.2 mm for static setups and of 1.2 mm for dynamic ones.

To improve human-computer interaction in different fields through the LMC, we opt for recognizing user gestures to enter commands.

A study of TV control is described in [9], in which 18 participants contributed free-hand gesture commands for 21 television control tasks. The authors released their dataset online; it consists of 378 LM gestures described by fingertip position, direction and velocity coordinates. In [6], a database of 12 gestures was collected from over 100 participants and then used to train a 3D recognition model based on convolutional neural networks operating on 2D projections of the 3D space.

Sign language is another field of study. An investigation of the LMC in [4] showed its potential recognition accuracy for gestures and handwriting on the fly; the acquired input data were treated as a time series of 3D positions and processed with the dynamic time warping algorithm. In [7], an Indian sign language recognition system involving both hands was developed with the LM sensor. The positional information of the five fingertips along with the palm center of both hands was used to recognize the sign posture based on the Euclidean distance and the cosine similarity. Likewise, in [8] the authors put forward a novel pattern recognition method to recognize symbols of the Arabic sign language. Their scheme extracts meaningful characteristics from the data, such as the angles between fingers, and feeds them to a classifier that decides which gesture is being performed, achieving high accuracy. In addition, a study presented in [10] described a database containing the three-dimensional motion trajectories of the numbers and the alphabet (36 gestures in total), captured by an LM for rapid recognition of dynamic hand gestures; SVM and hidden Markov model algorithms were used for classification. Moreover, Marin et al. [3] utilized the LMC and Kinect devices to recognize the American manual alphabet. A static gesture database based on fingertip positions and orientations is available online at http://lttm.dei.unipd.it/downloads/gesture. These features were fed into a multi-class support vector machine (SVM) classifier to recognize the gestures, and depth features from the Kinect were combined with the LMC features to improve recognition performance. However, they focused only on static gestures rather than dynamic ones.

The authors of [11] built two dynamic hand gesture datasets with frames acquired with an LMC: the LeapMotion-Gesture3D dataset and the Handicraft-Gesture dataset. A feature vector with depth information is computed and fed into a Hidden Conditional Neural Field classifier to recognize dynamic hand gestures.

In the medical field, the LMC offers a touchless interaction system that allows the medical staff to interact directly with devices that display digital images in a sterile environment. In this context, a usability evaluation of a natural interaction system in the operating room, conducted on the RISO system (an acronym that stands for "Image Recognition in Operating Room"), was described in [10]. The interaction with the medical images occurs through the LM by means of the nine gestures forming their database.

We propose a more developed hand gesture recognition system for the medical field, which provides not only static-gesture recognition but also dynamic-gesture recognition. Only the LMC is required to collect a new database formed by specific hand gestures, most of which are useful for a sterile control interface, and which we present in the next section.

IV. DESCRIPTION OF OUR DATASET

During technical visits to hospitals, we discussed with surgeons the usability of touchless interaction systems in the sterile field, in order to finally set the indispensable commands for handling medical images. These commands were identified mainly on the basis of intuitiveness and


memorability. Each gesture is named after the action that it enables: "click" (for selection), "rotate" (clockwise or counterclockwise), "alter contrast" (increase or decrease), "zooming" (toggle between magnified and normal view), "browse" (move the image to the left or the right), and "next and previous" (browse between images in a sequence).

Since there are no publicly released datasets for 3D dynamic gesture recognition, we have collected a gesture dataset using the LM sensor to evaluate the performance of the proposed approach. Our database contains 11 actions performed by 10 different subjects (3 men and 7 women); only one participant is left-handed. All subjects performed five repetitions of each motion, giving about 550 separate samples. In addition, we recorded all repetitions in the same room while keeping the lighting conditions similar to those of a surgical operating room.
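For illustration only (the constant names are ours, not a released format), the collection protocol can be summarized as:

```python
# Hypothetical constants summarizing the collection protocol described above.
GESTURES = [
    "click", "left_rotation", "right_rotation",
    "increase_contrast", "decrease_contrast",
    "zoom_in", "zoom_out", "move_left", "move_right",
    "previous", "next",
]                      # the 11 commands of Table I
NUM_SUBJECTS = 10      # 3 men, 7 women; one participant is left-handed
NUM_REPETITIONS = 5    # per subject and gesture

assert len(GESTURES) * NUM_SUBJECTS * NUM_REPETITIONS == 550  # ~550 samples
```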
The following table shows all the gestures chosen for the control of medical images, with a detailed description of each.

TABLE I. LIST OF DIFFERENT GESTURES USED DURING EXPERIMENTS

1. Click: select a particular region or option with a simple click on the desired item; show two fingers (index and thumb) and hold back the other fingers.
2. Left rotation: fold the fingers down, show only the index finger and draw a circle in the counterclockwise direction to rotate the image to the left.
3. Right rotation: fold the fingers down, show only the index finger and draw a circle in the clockwise direction to rotate the image to the right.
4. Increase contrast: with an open hand, palm facing down, move the hand up 90 degrees to increase the contrast of the image's color.
5. Decrease contrast: move a raised, initially open hand with the palm facing down downward (-90 degrees) to decrease the contrast of the image's color.
6. Zoom in: to zoom in on a section of the DICOM image, start the gesture with an open hand to choose the area to magnify, then close the hand into a pinch, ending with a closed palm.
7. Zoom out: to zoom out of a section of the DICOM image, start the gesture with a closed hand, then open the hand, ending with an open palm.
8. Move left: from a closed hand, show only the index finger and drag the selected zone to the left side to move the image to the left.
9. Move right: from a closed hand, show only the index finger and drag the selected zone to the right side to move the image to the right.
10. Previous: go to the previous image by moving an open hand from left to right.
11. Next: go to the next image by moving an open hand from right to left.

V. OVERVIEW OF THE PROPOSED APPROACH

The overall workflow of our 3D dynamic gesture recognition system based on the LMC is illustrated in Fig. 2. First, we set up the programming environment to create a library of gestures based on the gathered hand attributes. Then a set of relevant features is extracted. Finally, a trained SVM classifier is applied to the extracted features in order to recognize each performed gesture.
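The three stages can be summarized by the following skeleton (a sketch of the workflow only; the helper names are placeholders, not the authors' code):

```python
def recognize(recording, svm_model):
    """Sketch of the pipeline in Fig. 2: acquire -> extract features -> classify."""
    frames = load_positions(recording)       # palm + 5 fingertip positions per frame
    features = extract_features(frames)      # windowed statistics (Section V-A)
    return svm_model.predict([features])[0]  # trained SVM classifier (Section V-B)
```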


Fig. 2. Flowchart of the suggested approach

Once the technological solutions to be adopted have been defined, as described in Section II, the LMC is placed on a flat surface under the area where the hand movements are performed. The user connects the LMC to a computer by USB so that it can detect hand movements above it.

The tracking data, which contain the palm and finger positions, directions and velocities, can be accessed through the SDK. The Leap Motion company has kept updating the SDK since its first release. In this study, we use the V2 desktop version to build the application with standard tracking. We are currently using the latest SDK, V2.3.1, available online at [1]; since it does not provide recognition of custom gestures such as ours, we implement the gesture recognition ourselves.

A. Feature Extraction

The hand attributes are gathered directly from the LMC, which provides very valuable data for gesture recognition and avoids the complex computations that would be required to extract them from the depth and color data captured by depth sensors such as the Kinect.

Accordingly, we have created our own data format. It contains only the necessary positional information and allows us to easily save the captured frames into files and read them back later for processing and testing purposes.
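The paper does not specify the file layout, so as an assumed illustration, one row per frame with the palm followed by the five fingertip coordinates could be logged as follows:

```python
import csv

HEADER = ["palm_x", "palm_y", "palm_z"] + [
    "f%d_%s" % (i, a) for i in range(5) for a in ("x", "y", "z")
]

def append_frame(writer, frame):
    """Write one CSV row per frame: palm position, then the five fingertips."""
    if frame.hands.is_empty:
        return
    hand = frame.hands[0]
    row = [hand.palm_position.x, hand.palm_position.y, hand.palm_position.z]
    # Sort by finger type (0=thumb ... 4=pinky in the V2 bindings) for stable columns.
    for finger in sorted(hand.fingers, key=lambda f: f.type):
        tip = finger.tip_position
        row += [tip.x, tip.y, tip.z]
    writer.writerow(row)

# Usage sketch: writer = csv.writer(open("rep.csv", "w")); writer.writerow(HEADER);
# then call append_frame(writer, controller.frame()) on each new frame.
```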
Fig. 3. Tracking data of palm and fingertips

Fig. 3 highlights the data acquired by the Leap device in our experiments. We choose to extract the three coordinates (X, Y, Z) of the six most dominant points of the hand: the palm center and the five fingertips.

• Palm center: P = [(Xp1, Yp1, Zp1), ..., (Xpn, Ypn, Zpn)] is a vector that contains the positions of the palm in 3D space, measured from the origin of the LM coordinate system. Note that n is the number of frames in each repetition.

• Fingertip positions: Fi, i = 1, ..., 5, where Fi = [(Xi1, Yi1, Zi1), ..., (Xin, Yin, Zin)] is a vector containing the 3D positions of one detected finger. Note that the device is able to assign each 3D position to a particular finger by detecting the finger type; thus, the vectors F1, F2, F3, F4 and F5 contain respectively the 3D positions of the thumb, the index, the middle, the ring and the pinky finger.

These data are then grouped into matrices of six rows and n columns:

    A_{i,j,k} = [ P ; F_1 ; F_2 ; F_3 ; F_4 ; F_5 ]        (1)

with i = 1, ..., 10 (subject index), j = 1, ..., 11 (gesture index) and k = 1, ..., 5 (repetition index).
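A minimal NumPy sketch of this grouping (the array layout is our choice; it assumes the 18-value rows saved earlier):

```python
import numpy as np

def repetition_matrix(rows):
    """Group one repetition into the six-trajectory structure of Eq. (1).

    rows: (n_frames, 18) array, palm xyz followed by the five fingertip xyz.
    Returns a (6, n_frames, 3) array: row 0 is P, rows 1..5 are F1..F5.
    """
    rows = np.asarray(rows, dtype=float)
    return rows.reshape(len(rows), 6, 3).transpose(1, 0, 2)
```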


For gesture recognition, each repetition A_{i,j,k} is partitioned into Wt = 20 temporal windows, and the arithmetic mean, the standard deviation, the covariance and the root mean square of the coordinates in each window are then calculated. For this purpose, we introduce the following features:

• Arithmetic mean: for a random variable vector A made up of n scalar observations, the mean is defined as

    µ = (1/n) Σ_{i=1}^{n} A_i                                      (2)

• Standard deviation: for a random variable vector A made up of n scalar observations, the standard deviation is defined as

    S = sqrt( (1/(n-1)) Σ_{i=1}^{n} (A_i - µ)² )                   (3)

• Covariance: for two random variable vectors A and B, the covariance is defined as

    C = cov(A, B) = (1/(n-1)) Σ_{i=1}^{n} (A_i - µ_A)(B_i - µ_B)   (4)

• RMS: the root-mean-square level of a vector A is

    RMS = sqrt( (1/n) Σ_{i=1}^{n} A_i² )                           (5)

This set of characteristics is calculated from the spatial attributes (3D positions) and normalized to the range [-1, 1]. The previous steps produce six vectors whose size varies according to the temporal window Wt (Fig. 4). The complete feature set is obtained by concatenating the six vectors [Palm, Thumb, Index, Middle, Ring, Pinky].

Fig. 4. The six descriptor vectors
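A sketch of this windowed feature extraction in NumPy (our illustration; the authors' exact windowing and feature ordering may differ, and each window is assumed to hold at least two frames):

```python
import numpy as np

def windowed_features(track, wt=20):
    """Compute Eqs. (2)-(5) over wt temporal windows of one trajectory.

    track: (n_frames, 3) array of X, Y, Z positions (palm or one fingertip).
    """
    feats = []
    for window in np.array_split(track, wt):
        mu = window.mean(axis=0)                        # Eq. (2)
        s = window.std(axis=0, ddof=1)                  # Eq. (3), n-1 denominator
        c = np.cov(window.T)[np.triu_indices(3, k=1)]   # Eq. (4), pairwise X/Y/Z
        rms = np.sqrt((window ** 2).mean(axis=0))       # Eq. (5)
        feats.append(np.concatenate([mu, s, c, rms]))
    return np.concatenate(feats)

def gesture_features(rep, wt=20):
    """Concatenate the six descriptor vectors [Palm, Thumb, ..., Pinky]."""
    return np.concatenate([windowed_features(t, wt) for t in rep])
```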

B. SVM Classifier

A typical issue with many machine-learning techniques is that a large dataset is required to properly train the classifier, and acquiring the training data is a critical task that can require a huge amount of manual work. To compute the results, we split the dataset into a training set and a test set: in all the experiments, we used the first six subjects for learning and the last four for testing.

For classification, we test the SVM, one of the most common machine learning classifiers in use today. It was derived from statistical learning theory and has been widely used in detection and recognition.
In order to recognize the performed gestures, the constructed feature vectors and their concatenation must be classified into G classes corresponding to the various gestures of the considered database. A multiclass SVM classifier based on the one-against-one approach is used: a set of G(G-1)/2 binary SVM classifiers tests each class against each other, and each output counts as a vote for a certain gesture. For each sample in the test set, the gesture with the maximum number of votes is the result of the recognition process. In particular, we opt for the SVM implementation in the LIBSVM package (Chang and Lin, 2011). We set a non-linear Gaussian radial basis function as the kernel, and the classifier parameters are selected by means of a grid search and cross-validation on the training set [3].
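As an illustrative sketch of this setup (using scikit-learn, whose SVC wraps LIBSVM and is one-against-one by default, rather than the authors' exact LIBSVM scripts; the grid ranges are our assumptions):

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

def train_gesture_svm(X_train, y_train):
    """Grid-search an RBF-kernel SVM, cross-validating on the training set only."""
    pipe = make_pipeline(
        MinMaxScaler(feature_range=(-1, 1)),  # normalize features to [-1, 1]
        SVC(kernel="rbf"),                    # one-against-one multiclass by default
    )
    grid = {"svc__C": 2.0 ** np.arange(-3, 10),
            "svc__gamma": 2.0 ** np.arange(-10, 3)}
    return GridSearchCV(pipe, grid, cv=5).fit(X_train, y_train)
```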
VI. EXPERIMENTAL RESULTS AND DISCUSSION

The evaluation was performed on the new dataset. An interesting observation from Table II is that the four descriptors capture different properties of the performed gesture. In general, the features extracted from one descriptor compensate for the drawbacks of the features extracted from the others. Hence, by combining them, it is possible to improve the recognition accuracy: we reach about 81%, which demonstrates the good performance of the proposed machine learning strategy, able to obtain good results even when combining descriptors of different strengths, without being penalized by the weaker features.

Much of the effectiveness of this basic solution relies on how precisely the temporal window Wt is chosen. Fig. 5 shows how the accuracy rate varies with the temporal window; at Wt = 20, the rate of 81% is the best accuracy that can be extracted from the LM data with the suggested approach.

Fig. 5. Accuracy rate based on variation of the temporal window

TABLE II. PERFORMANCE OF LM FEATURES

Features                   Accuracy (Wt = 20)
Mean (µ)                   40%
Standard deviation (S)     53.1818%
Covariance (C)             40.9091%
Root mean square (RMS)     30.9091%
µ + S + C + RMS            80.9091%
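A sweep like the one behind Fig. 5 could be reproduced along these lines (a sketch reusing the earlier snippets; train_reps, test_reps and the label arrays are assumed to come from the subject split described above):

```python
# Hypothetical sweep over the number of temporal windows Wt (cf. Fig. 5).
for wt in (5, 10, 15, 20, 25, 30):
    X_train = [gesture_features(rep, wt) for rep in train_reps]
    X_test = [gesture_features(rep, wt) for rep in test_reps]
    model = train_gesture_svm(X_train, y_train)
    print("Wt=%2d  accuracy=%.4f" % (wt, model.score(X_test, y_test)))
```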


Finally, Table III provides the confusion matrix for the SVM when all the features are combined. The diagonal of the matrix shows the correctly classified examples; the dark gray cells represent the most correctly classified examples for each class, while the light gray cells indicate the false positives with a failure rate greater than 10%.

Looking in more detail at the results in Table III, it can be noticed that the accuracy is very close to or above 90% for most gestures. G4, G5, G6, G7, G8 and G9 are recognized by the device with very high accuracy, whereas gestures G1, G2 and G3 frequently fail recognition. G2 and G3, two reciprocal gestures, both use a single raised finger (the index) and are sometimes confused with each other; moreover, our approach uses no spatio-temporal characteristics to differentiate between them. G10 is sometimes confused with G4, since both are performed with an open hand; this is due to the limited accuracy of the hand direction estimation in the LM software. It can also be noticed that G1 is another challenging gesture, owing to its touching fingers.

TABLE III. CONFUSION MATRIX FOR PERFORMANCE EVALUATION (%)

       G1   G2   G3   G4   G5   G6   G7   G8   G9  G10  G11
G1     60    0   15    0    0   20    0    5    0    0    0
G2      5   50   25    5    0    0    0    0   15    0    0
G3      0   20   60   10    0    0    0    0    5    5    0
G4      0    0    0  100    0    0    0    0    0    0    0
G5      0    0    0    0   90    5    0    0    0    0    5
G6      0    0    0    0    0  100    0    0    0    0    0
G7      0    0    0    0    0    0  100    0    0    0    0
G8      0    0    0    0    0    0    0  100    0    0    0
G9      0    0    5    0    0    0    0    0   95    0    0
G10     0    0    0   20    0    0    0    0    5   75    0
G11     0    0    0    5    0    5    0    5    0    0   85
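A row-normalized confusion matrix like Table III can be derived from the predictions as follows (sketch, with scikit-learn assumed as before):

```python
import numpy as np
from sklearn.metrics import confusion_matrix

def confusion_percent(model, X_test, y_test, labels):
    """Confusion matrix in percent per true class, as presented in Table III."""
    cm = confusion_matrix(y_test, model.predict(X_test), labels=labels)
    return 100.0 * cm / cm.sum(axis=1, keepdims=True)
```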
VII. CONCLUSION

In this paper, we have studied the influence of different parameters on the overall recognition accuracy of a gesture recognition system for the visualization and manipulation of medical images during surgical procedures. To evaluate the performance of our technique, we collected a small but challenging dataset of 11 dynamic gestures with the LM sensor, and then extracted the feature vectors of all gestures. In addition, the experimental database has been extended with a view to early recognition. Subsequently, we used the training set to build the SVM model, while the test set was used to measure the performance. The experimental results demonstrate the effectiveness of the proposed method.

In this work, we utilized features based only on positional information. Further research will be devoted to the introduction of novel feature descriptors and to extending the suggested approach to dynamic gesture recognition that also exploits temporal information. In addition, we will look to incorporate an alternative model, such as the hidden Markov model, as a segmentation method to determine probable start and stop points for each gesture, and then feed the identified frames of data into a convolutional neural network model for gesture classification.

ACKNOWLEDGMENT

We would like to thank all the people who contributed to the collection and post-processing of the database.

REFERENCES

[1] Leap Motion Controller. Available online at: https://www.leapmotion.com (accessed on 10 November 2014).
[2] M. Ben Abdallah, M. Kallel and M. S. Bouhlel, "An overview of gesture recognition," 6th International Conference on Sciences of Electronics, Technologies of Information and Telecommunications (SETIT), IEEE, pp. 20-24, 2012.
[3] G. Marin, F. Dominio and P. Zanuttigh, "Hand gesture recognition with Leap Motion and Kinect devices," IEEE International Conference on Image Processing (ICIP), 2014.
[4] S. Vikram, L. Li and S. Russell, "Handwriting and gestures in the air, recognizing on the fly," CHI 2013 Extended Abstracts, Paris, France, 2013.
[5] F. Weichert, D. Bachmann, B. Rudak and D. Fisseler, "Analysis of the accuracy and robustness of the leap motion controller," Sensors, vol. 13, pp. 6380-6393, 2013.
[6] R. McCartney, J. Yuan and H.-P. Bischof, "Gesture recognition with the Leap Motion Controller," Int'l Conf. on Image Processing, Computer Vision, and Pattern Recognition (IPCV'15), 2015.
[7] R. B. Mapari and G. Kharat, "Real time human pose recognition using leap motion sensor," ICRCICN, IEEE, 2015.
[8] B. Khelil and H. Amiri, "Hand gesture recognition using leap motion controller for recognition of Arabic sign language," 3rd International Conference ACECS'16, 2016.
[9] I.-A. Zaiti, S.-G. Pentiuc and R.-D. Vatavu, "On free-hand TV control: experimental results on user-elicited gestures with Leap Motion," Personal and Ubiquitous Computing, vol. 19, pp. 821-838, Springer, 2015.
[10] A. Opromolla, V. Volpi, A. Ingrosso, S. Fabri, C. Rapuano, D. Passalacqua and C. M. Medaglia, "A usability study of a gesture recognition system applied during the surgical procedures," DUXU 2015, Part III, LNCS 9188, Springer, pp. 682-692, 2015.
[11] W. Lu, Z. Tong and J. Chu, "Dynamic hand gesture recognition with leap motion controller," IEEE Signal Processing Letters, vol. 23, pp. 1188-1192, 2016.
[12] N. Triki, M. Kallel and M. S. Bouhlel, "Imaging and HMI: foundations and complementarities," 6th International Conference on Sciences of Electronics, Technologies of Information and Telecommunications (SETIT), IEEE, pp. 25-29, 2012.
