Abstract
This paper proposes new features that can be extracted
from the Leap Motion and Microsoft Kinect for
recognizing hand gestures. The Leap Motion
(https://www.leapmotion.com) provides 3D hand
information and the Microsoft Kinect
(https://developer.microsoft.com/en-
us/windows/kinect) provides depth information, and the
combination of both allows us to extract a large
amount of diverse information about the hand’s shape,
enabling more accurate gesture recognition than is
possible with either camera individually. Using a
database of 10 distinct American Sign Language
gestures, provided by Marin et al. [1,2], our new
features allow us to achieve a high recognition
accuracy.
1. Introduction
Interest in hand gesture recognition has continued to
grow in recent years as the demand for its utility
increases in areas such as video game development and
sign language translation for human computer
interaction (HCI). Advancements in 3D imaging and
Time-of-Flight cameras as well as the availability of
new technology such as the Leap Motion and the
Microsoft Kinect have gone a long way toward making
hand gesture recognition practical for the consumer
market. The Microsoft Kinect provides both an RGB
and depth image from which hand information can be
extracted, while the Leap Motion, created specifically
for hand gesture recognition, returns coordinate
information about the hand. Many different approaches
have been taken to distinguish gestures, such as using
topological features of holes in the image [3] and rapid
recognition that identifies gestures before they are
completed [4]. Superpixel Earth
Mover’s Distance, which is widely used in image
retrieval and pattern recognition, is another approach
that has been applied to gesture recognition [5,6]. In this
paper, we utilize data from both a Microsoft Kinect and
a Leap Motion to obtain a more complete image of the
hand, making it possible to correctly identify the hand
gesture as shown in Figure 1.
The Leap Motion has improved significantly over the
last few years, and the Leap now provides more
information than was previously available. In our
approach, we made use of these features to improve the
accuracy of our recognition. One feature now available
from the Leap is extended fingers: the Leap is able to
detect which fingers are extended with very high
accuracy. This allows us to distinguish between gestures
such as the ones seen in Figure 1. The database we
selected was created using an older version of the Leap
Motion SDK, so the extended finger feature was not
available in it. To see how this feature improves the
accuracy of recognition, we manually entered the
extended fingers into the data. Section 2.1 will discuss
our other new features for the Leap: maximum X and Y
value, average area, and X-Y ratio.
The Microsoft Kinect uses a Time-of-Flight camera
to create a grayscale depth image which makes the hand
easy to distinguish from the background. To find the
hand in the image, we assume the hand is always the
closest object to the Kinect and remove anything behind
the closest object. Other approaches use blob detection
to locate the hand with good results [7]. After finding
the hand, we extract features from the image in order to
gather data about the hand’s shape. The features we
extract are Silhouette, Convex Hull, Cell Occupancy
Average Depth, Cell Occupancy Non-Zero, Distance
Contour, Fingertip Distance and Fingertip Angles. We
then run the data through the Random Forest classifier,
which interprets the data and attempts to correctly
identify the gesture. There has been much work already
done using the Microsoft Kinect for hand gesture
recognition, such as by researchers at the universities of
Padova and Moscow [1,8]. We have used features
inspired by these as well as innovative new ones to
maximize our recognition rate. The dataset of gestures
that we have used include 10 distinct hand gestures that
can be seen in Figure 2.
Leap Motion and Microsoft Kinect Hand Gesture Recognition
Josiah Bailey, Julie Kunnumpurath, Ryan Malon, Joe Nicosia, James Zwilling
State University of New York at Binghamton
Fig. 1 Kinect and Leap setup
2. Hand Gesture Features
2.1 Leap Features
The Leap Motion provides us with information about the
hand. Using the Leap SDK, we are able to extract the
fingertip positions, palm center, extended fingers and
various other points of interest on the hand. From this
information, we calculate our features.
2.1.1 Scale Factor and Number of Fingers
In order to account for different sized hands, we use a
scale factor. We average the x, y and z values of all
extended fingertips to create a new point, then find the
distance between that averaged point and the center of
the palm. This distance is the scale factor. The number
of fingers is provided by the Leap and is the total
number of extended fingers detected by the Leap.
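The scale-factor computation can be sketched as follows; the fingertip coordinates and palm center below are made-up values standing in for what the Leap SDK would return:

```python
import numpy as np

def scale_factor(extended_tips, palm_center):
    """Distance from the palm center to the centroid of the extended
    fingertips; used to normalize the other Leap features."""
    centroid = np.mean(extended_tips, axis=0)        # average x, y, z
    return float(np.linalg.norm(centroid - palm_center))

# Hypothetical fingertip positions (mm) for three extended fingers
tips = np.array([[-20.0, 180.0, 10.0],
                 [  0.0, 190.0,  5.0],
                 [ 20.0, 185.0,  8.0]])
palm = np.array([0.0, 100.0, 0.0])
scale = scale_factor(tips, palm)
num_fingers = len(tips)          # the "number of fingers" feature
```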
2.1.2 Extended Finger Logic Vector
We create a 5-bit vector where the most significant bit
(MSB) represents the thumb and the least significant bit
(LSB) represents the pinky; the remaining fingers follow
this pattern. The Leap tells us which fingers are extended and
those corresponding bits are set to 1. This feature
provides a simple method to differentiate between two
gestures that both have the same number of fingers
extended.
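A minimal sketch of this encoding; the finger names and set-based interface are our own illustration, not the Leap SDK's:

```python
# Bit order for the 5-bit vector: thumb (MSB) ... pinky (LSB)
FINGER_ORDER = ["thumb", "index", "middle", "ring", "pinky"]

def extended_finger_vector(extended):
    """Pack the set of extended fingers into a 5-bit value:
    thumb -> most significant bit, pinky -> least significant."""
    bits = 0
    for i, name in enumerate(FINGER_ORDER):
        if name in extended:
            bits |= 1 << (len(FINGER_ORDER) - 1 - i)
    return bits

# Index and middle extended (a "V" gesture) -> binary 01100 = 12
v = extended_finger_vector({"index", "middle"})
```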
2.1.3 Extended Finger Area
We calculate the area of the triangle formed by the two
fingertips farthest apart and the center of the palm, as
shown in Figure 3. This area is then divided by the
number of extended fingers. This division helps in
differentiating between gestures 7 and 8, as well as other
cases where the area alone would be the same for two
distinct gestures. The area is finally divided by our
scale factor.
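The area feature can be sketched as below, using the fact that the area of a 3D triangle is half the magnitude of a cross product; the fingertip coordinates, palm center, and unit scale factor are hypothetical values:

```python
import numpy as np
from itertools import combinations

def extended_finger_area(tips, palm, scale):
    """Area of the triangle spanned by the two fingertips farthest
    apart and the palm center, divided by the number of extended
    fingers and then by the scale factor."""
    a, b = max(combinations(tips, 2),
               key=lambda p: np.linalg.norm(np.subtract(p[0], p[1])))
    a, b, c = np.asarray(a, float), np.asarray(b, float), np.asarray(palm, float)
    area = 0.5 * np.linalg.norm(np.cross(b - a, c - a))  # 3D triangle area
    return area / len(tips) / scale

# Hypothetical values: three fingertips, palm center, unit scale factor
feat = extended_finger_area([(0, 0, 0), (4, 0, 0), (2, 1, 0)], (2, 3, 0), 1.0)
```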
2.1.4 Max X and Y Value
We use the maximum x and y values with respect to the
palm center. These values are divided by the scale
factor.
2.1.5 Length Width Ratio
Based on the work of Ding et al. [9], we calculate the
width as the distance between the two fingertips farthest
apart. The length is the greatest distance between a
fingertip and the center of the palm, as seen in Figure 3.
The length is then divided by the width to obtain the
ratio, which is then divided by our scale factor.
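A sketch of the ratio computation, again with hypothetical coordinates and a unit scale factor:

```python
import numpy as np
from itertools import combinations

def length_width_ratio(tips, palm, scale):
    """Width: distance between the two fingertips farthest apart.
    Length: greatest fingertip-to-palm distance. The length/width
    ratio is then divided by the scale factor."""
    tips = [np.asarray(t, float) for t in tips]
    palm = np.asarray(palm, float)
    width = max(np.linalg.norm(a - b) for a, b in combinations(tips, 2))
    length = max(np.linalg.norm(t - palm) for t in tips)
    return (length / width) / scale

# Hypothetical values: two fingertips, palm center, unit scale factor
ratio = length_width_ratio([(0, 0, 0), (3, 0, 0)], (0, 4, 0), 1.0)
```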
2.1.6 Fingertip Distances
Using the fingertip distances the Leap provides, we
divide each by our scale factor to obtain distances
normalized by the size of the hand.
2.1.7 Fingertip Directions
This is another feature that the Leap Motion calculates
directly: for each fingertip, a vector of floats denoting
the direction it is pointing. This feature is provided by
the new Leap SDK, so we only use it for the dataset that
we created.
2.2 Kinect Features
Starting with the depth image returned by the Microsoft
Kinect, we then use the assumption that the hand will be
the closest object to the camera. By using a depth
threshold, we are able to cut out most of the image’s
background. The wrist is then found by determining the
place where the width of the hand image decreases at the
highest rate. After everything below the wrist is removed,
the image is scaled so that the hand image is a uniform
size. Once an image of just the hand is obtained, we are
able to calculate some information that will be used in
several of our features. Among those are the contours of
the hand image and the palm center, which we find by
performing a Gaussian blur on the image and then
taking the brightest point [10].
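The closest-object segmentation and palm-center steps can be sketched on a synthetic depth image; the wrist detection and rescaling steps are omitted, the depth band is an assumed parameter, and a box blur stands in for the Gaussian blur:

```python
import numpy as np

def segment_hand(depth, band=80):
    """Keep only pixels within `band` depth units of the nearest
    reading, assuming the hand is the closest object to the camera."""
    valid = depth > 0                      # 0 = no depth reading
    nearest = depth[valid].min()
    return valid & (depth <= nearest + band)

def palm_center(mask, radius=2):
    """Stand-in for the Gaussian-blur step: a box blur of the hand
    mask is brightest where the hand is thickest."""
    m = mask.astype(float)
    blurred = np.zeros_like(m)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            blurred += np.roll(np.roll(m, dy, axis=0), dx, axis=1)
    return np.unravel_index(np.argmax(blurred), blurred.shape)

# Synthetic 20x20 depth image: background at 500, hand patch at 200
depth = np.full((20, 20), 500, dtype=int)
depth[5:16, 5:16] = 200
mask = segment_hand(depth)
row, col = palm_center(mask)
```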
Fig. 2 Dataset Gestures
Fig. 3 a) Extended Finger Area and b) Length Width Ratio
2.2.1 Silhouette
Based on the work of Kurakin et al. [8], this feature
involves dividing the image into 32 equal radial sections,
as seen in Figure 4. We calculate the distance from the
center of the image to the contour of the hand in each
radial section. The total distance in each section is then
averaged by the number of contour points in that
section. This helps to distinguish between
fingers that are together and fingers that are separated.
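The radial-section averaging can be sketched as below; the contour here is a synthetic circle rather than a real hand contour:

```python
import numpy as np

def silhouette_feature(contour, center, sections=32):
    """Average center-to-contour distance in each of 32 equal radial
    sections around the image center (empty sections stay 0)."""
    pts = np.asarray(contour, dtype=float) - np.asarray(center, dtype=float)
    dist = np.hypot(pts[:, 0], pts[:, 1])
    ang = np.arctan2(pts[:, 1], pts[:, 0])                      # [-pi, pi]
    bins = ((ang + np.pi) / (2 * np.pi) * sections).astype(int) % sections
    feat = np.zeros(sections)
    for s in range(sections):
        hit = bins == s
        if hit.any():
            feat[s] = dist[hit].mean()   # average distance per section
    return feat

# A circular contour of radius 5 should give ~5 in every section
t = np.linspace(0, 2 * np.pi, 128, endpoint=False)
feat = silhouette_feature(np.column_stack([5 * np.cos(t), 5 * np.sin(t)]),
                          (0.0, 0.0))
```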
2.2.2 Convex Hull
This feature calculates the area of the black space around
the hand. By finding the space in between the fingers, as
shown in Figure 4, it can show how many fingers are up
and distinguish between fingers being together or apart.
2.2.3 Cell Occupancy Average Depth
Based on the work of Kurakin et al. [8], this feature
involves splitting the image into a predetermined
number of squares of equal size; we used 64 squares in
our research. We then find the average depth in each cell
using the grayscale value of each pixel in the hand.
2.2.4 Cell Occupancy Non-Zero
Based on the work of Kurakin et al. [8] and using the
same grid as the feature above, the number of occupied
pixels in each cell is saved as a separate feature.
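Both cell-occupancy features can be computed from the same 8x8 grid; a minimal sketch, assuming a square grayscale hand image whose side is divisible by the grid size:

```python
import numpy as np

def cell_occupancy(hand, grid=8):
    """Split a square hand image into grid*grid equal cells (64 in the
    paper) and return the average depth (Sec. 2.2.3) and the count of
    occupied, non-zero pixels (Sec. 2.2.4) per cell."""
    h, w = hand.shape
    cells = hand.reshape(grid, h // grid, grid, w // grid).swapaxes(1, 2)
    cells = cells.reshape(grid * grid, -1).astype(float)
    return cells.mean(axis=1), (cells > 0).sum(axis=1)

# 16x16 toy image: only the top-left 2x2 cell contains depth values
img = np.zeros((16, 16))
img[:2, :2] = 100
avg_depth, non_zero = cell_occupancy(img)
```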
2.2.5 Distance Contour
The distance from the palm center to each point on the
contour of the hand can help to find local maxima and
minima, and distinguish between gestures with different
numbers of raised fingers.
2.2.6 Fingertip Distance
After using changes in the distance from the palm center
to points along the contour to find the fingertips, the
distance from the palm center to each
fingertip is obtained. This feature can show some
differences between gestures such as 6 and 10, where
the difference is the finger being raised.
2.2.7 Fingertip Angles
This feature calculates the angles between the palm
center and the fingertips found by the Fingertip Distance
feature.
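The distance-contour, fingertip-distance and fingertip-angle features can be sketched together; the prominence test against the mean distance is a simplifying assumption of ours, not the paper's exact peak-detection rule, and the contour is synthetic:

```python
import numpy as np

def fingertips(contour, palm, prominence=10.0):
    """Fingertips as local maxima of the palm-to-contour distance
    profile, plus the angle of each fingertip about the palm center."""
    pts = np.asarray(contour, dtype=float)
    palm = np.asarray(palm, dtype=float)
    d = np.linalg.norm(pts - palm, axis=1)     # distance contour
    n = len(d)
    idx = [i for i in range(n)
           if d[i] > d[i - 1] and d[i] >= d[(i + 1) % n]
           and d[i] > d.mean() + prominence]   # assumed prominence rule
    angles = np.arctan2(pts[idx, 1] - palm[1], pts[idx, 0] - palm[0])
    return d[idx], angles

# Toy contour: radius 20 everywhere except one "finger" at radius 60
t = np.linspace(0, 2 * np.pi, 10, endpoint=False)
r = np.full(10, 20.0)
r[5] = 60.0
contour = np.column_stack([r * np.cos(t), r * np.sin(t)])
dists, angles = fingertips(contour, (0.0, 0.0))
```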
2.3 Feature Vector
The feature extraction provides us with eight feature
vectors: one from the Leap and seven from the Kinect.
Only the fingertip distances from the Leap needed to be
stored in a vector, while the other Leap features were
single values. Each of the Kinect features is stored in a
separate feature vector, as shown in Figure 5. All the Leap
features and Kinect features are then grouped into two
separate sets. These feature vectors can be tested alone
or combined with different features when using the
Random Forest test.
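The grouping into testable subsets amounts to concatenating named feature vectors; all names, lengths and values below are illustrative placeholders, not the paper's exact dimensions:

```python
import numpy as np

# Hypothetical stand-ins for the extracted features
leap = {
    "num_fingers": [3], "extended_bits": [12], "area": [0.8],
    "max_xy": [0.9, 1.1], "ratio": [1.4],
    "tip_distances": [0.9, 1.0, 0.95, 0.0, 0.0],  # only vector-valued Leap feature
}
kinect = {
    "silhouette": [0.0] * 32, "convex_hull": [0.25],
    "cell_avg_depth": [0.0] * 64,
}

def flatten(groups, names=None):
    """Concatenate selected named feature vectors into one flat array,
    so feature subsets can be tested alone or in combination."""
    names = names or list(groups)
    return np.concatenate([np.asarray(groups[n], float) for n in names])

leap_vec = flatten(leap)
combined = np.concatenate([flatten(leap), flatten(kinect)])
```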
Fig. 5 Feature Vectors
3. Results
Testing of our features was done on the dataset of Marin
et al. [1,2], which contains ten gestures from fourteen
individuals, with data samples for each gesture from
each person. We
used the Random Forest classifier to measure the
accuracy of our gesture recognition. Table 1 shows our
results compared to the state of the art and Table 2
shows the results of the Random Forest test.
SVM            Leap      Kinect     Combined
Marin et al.   81.5%     96.35%     96.5%
Our Results    68.00%    95.36%     95.50%
Table 1. SVM results
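The classification step can be reproduced with any off-the-shelf Random Forest; the sketch below uses scikit-learn's `RandomForestClassifier` on synthetic, well-separated feature vectors (the paper does not state which implementation was used, and the class count, sample count, and feature dimension here are illustrative):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_gestures, per_gesture, n_feat = 10, 30, 40   # sizes are illustrative

# Well-separated synthetic clusters standing in for real feature vectors
X = np.vstack([rng.normal(3.0 * g, 0.5, size=(per_gesture, n_feat))
               for g in range(n_gestures)])
y = np.repeat(np.arange(n_gestures), per_gesture)

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0)
acc = clf.fit(X_tr, y_tr).score(X_te, y_te)   # per-gesture accuracy
```

On real data the Leap and Kinect feature vectors from Section 2.3 would replace the synthetic `X`.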
Fig. 4 a) Silhouette and b) Convex Hull

The highest accuracy we got with the SVM test with
just the Leap features was 68%. Number of fingers, max
X, and area yielded this result. While running these tests
we were unable to use the extended fingers feature as
well as the fingertip directions, because the database
was made using an older version of the Leap Motion
SDK. The newer Leap Motion SDK provides that data,
and when included, this data is very useful for gesture
recognition. For the same reason, fingertip distances
were not included in this calculation either. When the
extended fingers were entered manually, fingertip
distances alone achieved 90.36% accuracy and extended
fingers alone achieved 100%. The highest accuracy we got
with the Kinect was 95.36% on the SVM test. The
features that yielded this were convex hull, cell
occupancy average depth, and fingertip angles.
The highest recognition rate using a combination of
both Kinect and Leap features was 95.50% on the SVM.
The combination of number of fingers, ratio, max X and
Y, convex hull, cell occupancy average depth, and
fingertip angles produced that result. This set once again
excludes the Leap fingertip distance, extended fingers,
and fingertip directions.
Random Forest   Leap      Kinect     Combined
Our Results     81.71%    95.21%     96.07%
Table 2. Random Forest results
Using the Random Forest test, we got significantly
improved results. Similar to the SVM tests, the same
Leap features as previously mentioned were excluded
from the calculations. The highest results using only
Leap or Kinect features gave us 81.71% and 95.21%,
respectively. Combining ratio, max X, max Y, and area
created the best combination of Leap features.
Combining convex hull, cell occupancy average depth,
and fingertip distances created the best performing
Kinect feature set. The highest combined accuracy of
96.07% was obtained with multiple feature sets. One
combination included ratio, max X and Y, area, convex
hull, cell occupancy average depth, and fingertip angles.
The other combination was similar but contained
number of fingers and fingertip distance instead of Max
Y and fingertip angles.
In addition to getting a higher recognition rate with
the Random Forest test, this test has a lower
computation time. When running all the Leap and
Kinect features, the SVM took 36 seconds to run. The
Random Forest test only took 8 seconds. The significant
decrease in computation time is important for real-time
gesture recognition.
Figure 6. ASL Dataset
We also created our own dataset to test the recognition
rate of our features. This dataset contains the entire
American Sign Language alphabet as shown in Figure 6,
excluding the letters J and Z. The dataset has seven
individuals performing each gesture ten times. It was
made using the Leap Motion Desktop V2, which
provides more information, including extended fingers
and fingertip directions. This means that the Leap
extended fingers and fingertip distances can be used in
our accuracy calculations. Utilizing all three of these, in
addition to the rest of our features, increased our
recognition rate as well, as seen in Table 3.
ASL Dataset Results   Leap      Kinect     Combined
Random Forest         98.60%    87.74%     98.75%
SVM                   80.83%    84.23%     91.73%
Table 3. Recognition rate with ASL dataset
Using the SVM, the combined results had a
recognition rate of 91.73% with a feature set consisting
of extended fingers, ratio, max X and Y, area, Leap
fingertip distances, and cell occupancy average depth.
For the Leap by itself, the recognition was 80.83% with
max X and Y along with fingertip direction. The Kinect
was 84.23% with just cell occupancy average depth. The
Random Forest results were even better. The combined
rate was 98.75% with Leap fingertip distances, fingertip
direction, convex hull, fingertip angles, and Kinect
fingertip distances. The Leap alone achieved a
recognition rate of 98.6% with all the Leap features
excluding max Y. The Kinect had a recognition rate of
87.74% with convex hull, cell occupancy average depth,
fingertip angles, and fingertip distances.
4. Discussions
The popularity of virtual reality continues to grow, and
the addition of hand gesture recognition would only
further this already burgeoning field. Advances in technology
have allowed for highly accurate collections of data
from everyday human interactions. Computers can now
process motion as input, instead of static symbols.
Implementing hand gesture recognition can help
broaden fields such as virtual reality and interactive
simulations, as well as help the hearing impaired.
The results we calculated with the new dataset
demonstrate how improvements in the Leap Motion
technology available for gesture recognition improve
accuracy, as the device can now produce more precise
information. Data such as
extended fingers and finger direction helped distinguish
between similar gestures that previously would have
been undifferentiated using older technology. As
interest in this field continues to increase, the
capabilities of technologies such as the Leap will
increase.
One reason that the Kinect features did not perform as
well on the ASL dataset is the similar shapes of many of
the gestures; as seen in Figure 6, many of the gestures
are very similar. With such similar gestures, it becomes
more difficult to distinguish between pairs like M and N,
as seen in Figure 7.
Figure 7. ASL Random Forest confusion matrix
By using Random Forest instead of an SVM, we were
also able to get a higher recognition rate. Random Forest
handles a large set of features much better than an SVM:
because the training model deals with well over 300
features (some of our features are vectors with multiple
values), the SVM requires much more time to train. In
addition to the higher accuracy provided by Random
Forest, it was also generally much more time efficient
than the SVM.
5. Conclusions
In this paper, we propose new features that can be
extracted from the Leap Motion and Microsoft Kinect
for use in gesture recognition. Using the dataset of
Marin et al. [1,2], we produced promising results in
gesture recognition, especially from the Kinect. When
creating a more complex dataset consisting of the ASL
alphabet, we had access to the additional data the Leap
Motion provides and observed a significant increase in
accuracy. The combination of our Kinect and Leap
features, while not perfect, correctly recognizes gestures
with high accuracy.
Gestures like J and Z were not possible to test with
static images, because in ASL these gestures require
motion. Future work needs to be done with real-time
hand tracking to correctly identify gestures in motion.
References
[1] Giulio Marin, Fabio Dominio, and Pietro Zanuttigh,
“Hand Gesture Recognition With Jointly Calibrated Leap
Motion and Depth Sensor,” in Multimedia Tools and
Applications, 2014.
[2] G. Marin et al. “Hand Gesture Recognition with Jointly
Calibrated Leap Motion and Depth Sensor,” Multimedia
Tools and Applications, pp. 1-25, 2014.
[3] Kaoning Hu and Lijun Yin, “Multi-Scale Topological
Features for Hand Posture Representation and Analysis,”
in International Conference on Computer Vision, 2013.
[4] Yanmei Chen, Zeyu Ding, Yen-Lun Chen, and Xinyu
Wu, “Rapid Recognition of Dynamic Hand Gestures
using Leap Motion,” in IEEE International Conference
on Information and Automation, 2015.
[5] Chong Wang, Zhong Liu, and Shing-Chow Chan,
“Superpixel-Based Hand Gesture Recognition With
Kinect Depth Camera,” in IEEE Transactions on
Multimedia, Vol. 17, No. 1, 2015.
[6] Z. Ren, J. Yuan, J. Meng, and Z. Zhang, “Robust part-
based hand gesture recognition using Kinect sensor,”
IEEE Trans. Multimedia, vol.15, no. 5, 2013.
[7] Xia Liu and Kikuo Fujimura, “Hand Gesture Recognition
using Depth Data,” in 6th IEEE International Conference
on Automatic Face and Gesture Recognition, 2004.
[8] Alexey Kurakin, Zhengyou Zhang, and Zicheng Liu, “A
Real-Time System for Dynamic Hand Gesture
Recognition with a Depth Sensor,” in Proc. of EUSIPCO,
2012.
[9] Zeyu Ding, Zexiong Zhang, Yanmei Chen, Yen-Lun
Chen, and Xinyu Wu, “A Real-time Dynamic Gesture
Recognition Based on 3D Trajectories in Distinguishing
Similar Gestures,” in IEEE International Conference on
Information and Automation, 2015.
[10] Fabio Dominio, Mauro Donadeo, and Pietro Zanuttigh,
“Combining Multiple Depth-based Descriptors for Hand
Gesture Recognition,” Pattern Recognition Letters, 2013.