Using Neural Networks and 3D Sensor Data to Model
LIBRAS Gesture Recognition
Gabriel S. P. Moreira - Gustavo R. Matuck - Osamu Saotome - Adilson M. da Cunha
ITA – Brazilian Aeronautics Institute of Technology
Introduction
 1,799,885 Brazilian citizens reported great difficulty hearing (2010 Census)
 Brazilian Sign Language (LIBRAS) is the sign language used by the majority of deaf people in Brazil
 Recognized as a second official language of Brazil by Law 10.436 (2002) and Decree 5.626 (2005)
 This research is an initial investigation of LIBRAS recognition using 3D sensors and neural networks
Related Research
 Main approaches of hand-tracking systems:
 Data-glove-based - electromechanical gloves with sensors
 Vision-based - continuous image frames from regular RGB or 3D sensors
 Main machine learning techniques:
 Artificial Neural Networks (ANN)
 Hidden Markov Models (HMM)
 HMMs used to recognize 47 LIBRAS gesture types, captured with regular RGB cameras [Souza et al. 2007]
 A 3D hand and gesture recognition system using the Microsoft Kinect sensor for dynamic gestures and an HD color sensor for static gesture recognition [Caputo et al. 2012]
 A real-time system to recognize a subset of the LIBRAS alphabet (static gestures only), using MS Kinect and neural networks [Anjo et al. 2012]
LIBRAS Alphabet
 All alphabet signs are made with just one hand.
 20 (twenty) static gestures and 6 (six) dynamic gestures (H, J, K, X, Y, Z), executed with movement.
Case Study
 Six (6) people volunteered to record the signs of the complete LIBRAS alphabet, divided into three groups: deaf people (2), LIBRAS teachers (2), and students (2)
 LIBRAS recognition process: data acquisition, pre-processing, model training, and results analysis
Case Study – Data Acquisition
 Gestures recorded using:
 COTS 3D sensor Creative Senz3D GestureCam™ [Creative 2013]
 Intel® Skeletal Hand Tracking Library (Experimental Release) [Intel 2013]
 For each frame, the absolute 3D coordinates (X, Y, and Z) of the 5 fingertips and the center of the hand palm were recorded, based on the sensor distance. This resulted in 18 attributes describing the continuous position of the hand and fingers.
[Images: Creative Senz3D GestureCam; hand tracking]
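As an illustrative sketch only (the authors' recording code is not published), each frame can be flattened into the 18-attribute vector described above; the names frame_features, fingertips, and palm_center are hypothetical:

    import numpy as np

    def frame_features(fingertips, palm_center):
        """Flatten one frame into the 18 recorded attributes:
        (X, Y, Z) for the 5 fingertips plus the palm center."""
        points = np.vstack([fingertips, palm_center])  # shape (6, 3)
        return points.ravel()                          # 6 points x 3 coords = 18 attributes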
Case Study – Pre-Processing (1/2)
 Normalization to Hand-Relative Coordinates – absolute fingertip coordinates converted to coordinates relative to the hand's center;
 Training Sample Transformations – there were only 11 samples for each alphabet letter, so a strategy was created and implemented to generate new samples from the training samples through rotation and scaling geometric transformations (see the sketch below).
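A minimal sketch of both pre-processing steps, assuming each frame is an array of shape (6, 3) with the palm center as the last point (the paper does not publish its code, so these names and parameter values are hypothetical):

    import numpy as np

    def to_hand_relative(points):
        """Convert absolute (X, Y, Z) coordinates to coordinates
        relative to the hand's center (assumed to be the last point)."""
        return points - points[-1]

    def augment(points, angle_deg=5.0, scale=1.1):
        """Create a new training sample by rotating around the Z axis
        and scaling, mirroring the paper's rotation/scaling strategy."""
        a = np.radians(angle_deg)
        rot_z = np.array([[np.cos(a), -np.sin(a), 0.0],
                          [np.sin(a),  np.cos(a), 0.0],
                          [0.0,        0.0,       1.0]])
        return scale * (points @ rot_z.T)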
Case Study – Pre-Processing (2/2)
 Frame Sampling – because the number of frames varied with gesture duration and camera FPS rate, this process selected 18 equidistant frames per gesture, each frame containing the 3D coordinates of all fingertips at one point in time (see the sketch below).
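A possible implementation of the equidistant sampling (a sketch; the exact selection/interpolation rule is not specified in the paper):

    import numpy as np

    def sample_frames(gesture, n_frames=18):
        """Pick n_frames equidistant frames from a recording of
        arbitrary length; gesture has shape (T, 18), T >= n_frames."""
        idx = np.linspace(0, len(gesture) - 1, n_frames).round().astype(int)
        return gesture[idx]  # shape (18, 18), later flattened to 324 network inputs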
Case Study – Neural Network Model
 Multi-Layer Perceptron (MLP) with back-propagation learning, sigmoidal activation functions, and the Sum of Squared Errors (SSE) to measure learning performance.
 MLP architecture (see the sketch after this list):
 Input layer: 18 relative fingertip and hand 3D coordinates × 18 sampled frames per gesture = 324 neurons
 Hidden layer: tested from 50 to 400 neurons, in steps of 50
 Output layer: 26 neurons (the alphabet letters)
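For illustration, a comparable architecture can be sketched with scikit-learn's MLPClassifier. This is an assumption, not the authors' implementation: they used their own back-propagation code with SSE, whereas scikit-learn optimizes log-loss; the "logistic" activation matches the sigmoidal units.

    from sklearn.neural_network import MLPClassifier

    # 324 inputs (18 attributes x 18 frames), one hidden layer
    # (best configuration found: 200 neurons), 26 output classes (A-Z).
    clf = MLPClassifier(hidden_layer_sizes=(200,),
                        activation="logistic",       # sigmoidal units
                        solver="sgd",
                        learning_rate_init=0.2,      # initial rate, per the notes
                        learning_rate="invscaling",  # rate decays over epochs
                        momentum=0.9,                # momentum used; exact value unpublished
                        max_iter=1000)               # ~1,000 training epochs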
Case Study – Results Analysis
 Dataset split: training (64%), validation (18%), and test (18%) sets (see the split sketch after this slide)
 The best MLP configuration correctly classified 61.54% (32/52) of the test set (data unseen during training); its architecture had 200 neurons in the hidden layer.
 [Table: alphabet test set classification results]
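The 64/18/18 split can be reproduced with two chained splits; X and y below are placeholders standing in for the 324-feature gesture matrix and letter labels:

    import numpy as np
    from sklearn.model_selection import train_test_split

    X = np.random.rand(286, 324)       # placeholder: 286 gestures x 324 features
    y = np.repeat(np.arange(26), 11)   # placeholder: 11 samples per letter

    # Carve off 18% for the test set, then 0.18/0.82 of the remainder
    # for validation, yielding a 64/18/18 split overall.
    X_tmp, X_test, y_tmp, y_test = train_test_split(
        X, y, test_size=0.18, stratify=y, random_state=0)
    X_train, X_val, y_train, y_val = train_test_split(
        X_tmp, y_tmp, test_size=0.18 / 0.82, stratify=y_tmp, random_state=0)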
Limitations
 It was difficult to find people capable of accurately recording LIBRAS gestures, especially deaf people and LIBRAS teachers
 The chosen 3D sensor presented the following limitations:
 Instability of finger tracking in the experimental library (Intel® Skeletal Hand Tracking Library)
 The variable frames-per-second rate may have introduced a bias when comparing samples, because of the temporal nature of gestures
Conclusion
 Initial investigation of LIBRAS recognition using 3D sensor data and a Multi-Layer Perceptron.
 Six volunteers (LIBRAS teachers, students, and deaf people) recorded the alphabet gestures.
 Pre-processing strategies were developed to deal with the temporality of gestures (frame sampling), normalize coordinates, and augment training data with geometric transformations (rotation and scaling).
 The best model correctly classified 61.54% of the test-set patterns.
 The MLP network was not as effective as expected when trained with few samples and operating on noisy gesture data.
Recommendations and Suggestions
 Improvements in the classification models can be evaluated using more representative samples of gesture data.
 As a natural continuation of this research, still in its early phase, the authors intend to explore other 3D sensors, pre-processing approaches, and learning models, such as Support Vector Machines and Hidden Markov Models.
Thank you! 
Gabriel S. P. Moreira - Gustavo R. Matuck - Osamu Saotome - Adilson M. da Cunha
ITA – Brazilian Aeronautics Institute of Technology 
gspmoreira@gmail.com

Using Neural Networks and 3D Sensor Data to Model LIBRAS Gesture Recognition - KDMile 2014


Editor's Notes

  • #4 The Brazilian 2010 Census counted 1,799,885 citizens in the country with great difficulty hearing. Sign languages are the natural languages of deaf communities. Brazilian Sign Language (Língua Brasileira de Sinais - LIBRAS) is the sign language used by the majority of deaf people in Brazilian urban regions, and it was standardized and recognized as a second official language by Law 10.436 (2002) and Decree 5.626 (2005) [Decree 2005]. For many deaf people in Brazil, LIBRAS is the only way they can communicate, but unfortunately very few hearing people are knowledgeable and skilled in LIBRAS. This makes it difficult for deaf people to be assisted in hospitals, schools and universities, and government services. This paper describes an investigation into the recognition of LIBRAS alphabet sign gestures, which would allow the spelling of Portuguese words. For the recognition, a 3D sensor was used to capture hand and finger coordinates, together with pre-processing strategies and neural networks to classify the gestures. Section 2 presents a brief literature review. Section 3 describes the conducted investigation, including data collection, pre-processing, learning model training, and results analysis, followed by limitations and conclusions in Sections 4 and 5.
  • #5 Many attempts have been made to recognize sign language gestures, generally represented by hand postures and movements, and to translate them into spoken-language letters, words, and expressions. The two major classes of hand-tracking systems are: (1) data-glove-based, which requires electromechanical gloves with sensors; and (2) vision-based, which works with continuous image frames [Pistori and Neto 2004], usually from regular RGB cameras or 3D sensors (like Microsoft Kinect [Kulshreshth et al. 2013]). For recognition models, studies have mostly used machine-learning techniques, chiefly Neural Networks and Hidden Markov Models (HMM). [Caputo et al. 2012] presents a 3D hand and gesture recognition system using the Microsoft Kinect sensor for dynamic gestures and an HD color sensor for static gesture recognition. In [Kulshreshth et al. 2013], a real-time finger-tracking technique was developed using Kinect sensor signals, based on feature vectors computed from Fourier descriptors of equidistant points. Research addressing Brazilian Sign Language has been conducted in recent years. [Anjo et al. 2012] presents a real-time system to recognize a subset of the LIBRAS alphabet (static gestures only), using MS Kinect and neural networks. In [Souza et al. 2007], the authors use HMMs to recognize 47 LIBRAS gesture types, captured with regular RGB cameras. In this article, we report the use of vision-based finger tracking with the Creative Senz3D GestureCam™ [Creative 2013], a set of pre-processing approaches, and neural network models to recognize all letters of the Brazilian Sign Language (LIBRAS) alphabet, represented by static and dynamic gestures.
  • #6 The LIBRAS alphabet letters are represented by 20 (twenty) static gestures and 6 (six) dynamic gestures (H, J, K, X, Y, Z) executed with movement. All alphabet signs are made with just one hand. This investigation considers only one hand's data for recording and recognition.
  • #7 Six (6) people volunteered to record LIBRAS alphabet signs, divided into three groups: deaf people (2), LIBRAS teachers (2), and students (2). Two samples of the full alphabet were recorded for each volunteer, except for one person, who had only one recorded sample. This resulted in 11 complete alphabet samples (286 letter gestures). The stages of the LIBRAS sign recognition process are Data Acquisition, Pre-processing, Model Training, and Results Analysis, as shown in Figure 1 and described in the following sections.
  • #8 In this stage, the gestures were recorded using the COTS 3D sensor Creative Senz3D GestureCam™ [Creative 2013] and the Intel® Skeletal Hand Tracking Library (Experimental Release) [Intel 2013] for finger tracking, based on the tracking approaches presented in [Melax et al. 2013]. In each frame, the library provided the full 6-Degrees-Of-Freedom (6-DOF) position and orientation of 17 bones of a hand. The frames-per-second (fps) rate varied from 20 to 30 during recording. In this investigation, for each frame, the absolute 3D coordinates (X, Y, and Z) of the 5 fingertips and the center of the hand palm were recorded, based on the sensor distance. This resulted in 18 attributes describing the continuous position of the hand and fingers.
  • #9 Normalization to Hand-Relative Coordinates – absolute fingertip coordinates converted to coordinates relative to the hand's center; Training Sample Transformations – there were only 11 samples for each alphabet letter, so a strategy was created and implemented to generate new samples from the training samples through rotation and scaling geometric transformations; and Frame Sampling – a different number of frames was recorded for each gesture, because gesture duration varies with the alphabet letter and the person's style. The sensor's frames-per-second (fps) rate also varied, ranging from 20 to 30 fps. Since the neural network input size was fixed, it was necessary to normalize the input by sampling a fixed number of frames per gesture. This process selected 18 equidistant frames for each gesture, each frame containing the 3D coordinates of all fingertips at one point in time.
  • #11 The neural network used in this investigation was the traditional Multi-Layer Perceptron (MLP) with the back-propagation learning algorithm [Montini 2013]. All neurons in the hidden and output layers used sigmoidal activation functions. The Sum of Squared Errors (SSE) was used to measure learning performance. The MLP network was composed of the input layer (where the processed sensor data were applied), one hidden layer (tested with 50 to 400 neurons, in steps of 50), and the output layer (26 neurons, the alphabet letters). Mathematically, a single hidden layer of neurons is enough to perform nonlinear mapping. To improve training, a cross-validation process and a momentum parameter were applied. All data obtained from the gesture signals were divided into training (64%), validation (18%), and test (18%) sets, and all data fed to the MLP network were scaled to the range [0, 1]. Several simulations were performed with different MLP network configurations. The stimulus between the neurons and the input data is measured by equation (a). The neuron output signal (Y), shown in equation (b), applies the sigmoidal function to the hidden and output neurons. The network error and the weight adaptation are given by equations (c) and (d), respectively.
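    Equations (a)–(d) did not survive the slide export; the following is a standard reconstruction, assuming sigmoidal units, SSE, and a momentum term as described above (the notation is ours, not necessarily the authors'):

    \begin{align}
    \text{(a)}\quad & net_j = \sum_i w_{ji}\, x_i + b_j && \text{weighted input stimulus of neuron } j \\
    \text{(b)}\quad & y_j = \frac{1}{1 + e^{-net_j}} && \text{sigmoidal output} \\
    \text{(c)}\quad & E = \frac{1}{2} \sum_k \left( d_k - y_k \right)^2 && \text{sum of squared errors (SSE)} \\
    \text{(d)}\quad & \Delta w_{ji}(t) = -\eta\, \frac{\partial E}{\partial w_{ji}} + \alpha\, \Delta w_{ji}(t-1) && \text{weight update with momentum}
    \end{align}

    where $x_i$ are the inputs, $w_{ji}$ the weights, $b_j$ the biases, $d_k$ the desired output, $\eta$ the learning rate, and $\alpha$ the momentum factor.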
  • #12 To improve training, a cross-validation process and a momentum parameter were applied. All data obtained from the gesture signals were divided into training (64%), validation (18%), and test (18%) sets, and all data fed to the MLP network were scaled to the range [0, 1]. Several simulations were performed with different MLP network configurations. After many simulations, the best MLP configuration correctly classified 61.54% (32/52) of the data unseen during training (the test set). Its architecture had 200 neurons in the hidden layer. This MLP was trained for 1,000 epochs, after which the learning model started overfitting the sample data. The learning rate chosen by the authors started at 0.2 and decreased over the epochs. The momentum parameter was used to improve learning performance, helping to avoid local minima in the error surface. Table 1 shows the classification results for the test set (two samples of each letter), revealing the MLP network's difficulty in recognizing some letters, such as "f", "k", "t", "x", and "z". On the other hand, the neural model correctly classified 10 letters with 100% success ("a", "c", "d", "j", "l", "m", "o", "u", "v", and "w").
  • #14 This investigation described initial research into recognizing the Brazilian Sign Language (LIBRAS, Língua Brasileira de Sinais) alphabet using neural networks over visual 3D sensor data. Six LIBRAS teachers, students, and deaf people volunteered to record the alphabet gestures. Strategies were developed for pre-processing the gesture data: dealing with the temporality of gestures (frame sampling), normalizing coordinates, and increasing the training samples through geometric transformations (rotation and scaling). Neural networks with different settings, training epochs, learning rates, and momentum factors were assessed for gesture recognition. The best model correctly classified 61.54% of the test-set patterns. Analyzing the results, it can be observed that the MLP network was not as effective as expected when operating on noisy gesture data. Improvements in the recognition models can be evaluated using more representative samples of gesture data. As a natural continuation of this research, still in its early phase, the authors intend to explore other 3D sensors, pre-processing approaches, and learning models, such as Support Vector Machines and Hidden Markov Models.
  • #15 Improvements in the recognition models can be evaluated using more representative samples of gesture data. As a natural continuation of this research, still in its early phase, the authors intend to explore other 3D sensors, pre-processing approaches, and learning models, such as Support Vector Machines and Hidden Markov Models.