E-Learning Application Using Gesture Recognition

Amrit Murali
SRM University, Chennai
amritmurali@gmail.com

Niteesh AC
SRM University, Chennai
niteesh.ac@gmail.com

Mr. R. Jebakumar
SRM University, Chennai
jebakumar.r@ktr.srmuniv.ac.in
ABSTRACT
Recognizing gestures is a complex task involving many aspects such as motion modeling, motion analysis, pattern recognition and machine learning. Gesture recognition pertains to recognizing meaningful expressions of motion by a human, involving the hands, arms, face, head, and/or body, and it is of utmost importance in designing an intelligent and efficient human-computer interface. Students need to possess practical ability, decision-making ability and mastery of computer technology. E-Learning, a kind of teaching reform that adapts to the needs of society, is starting to be used widely in schools and colleges. E-learning technology based on the development of multimedia and network technology can provide support for experimental teaching. Keeping all the essential factors in mind, an E-Learning application has been created that recognizes the movement of fingers and the various patterns they form. An E-Learning Application using Gesture Recognition (ELGR) is proposed in this paper. In ELGR, we present the overriding of the mouse pointer and perform various mouse operations such as left click, right click, double click and drag using a gesture recognition technique.
1. INTRODUCTION
Gesture recognition is the mathematical interpretation of a
human motion by a computing device. Gesture recognition, along with facial recognition, voice recognition, eye tracking and lip movement recognition, is a component of what developers refer to as a perceptual user interface (PUI). The goal of PUI is to enhance the efficiency and ease of use of the underlying logical design of a stored program, a design discipline known as usability.[7] In personal computing, gestures are most often used for input commands.
Recognizing gestures as input allows computers to be more
accessible for the physically-impaired and makes
interaction more natural in a gaming or 3-D virtual world
environment. Hand and body gestures can be amplified by a controller that contains accelerometers and gyroscopes to sense tilting, rotation and acceleration of movement, or the computing device can be outfitted with a camera so that software in the device can recognize and interpret specific gestures.[7] A wave of the hand, for instance, might terminate the program.
We have used an appearance-based model for gesture recognition. These models do not use a spatial representation of the body; instead they derive their parameters directly from images or videos using a template database. Some are based on deformable 2D templates of the parts of the human body. In this project we have used the tracking of particular colors. This second approach to gesture detection using appearance-based models uses image sequences as gesture templates. Parameters for this method are either the images themselves or certain features derived from them. Most of the time, only one (monoscopic) or two (stereoscopic) views are used.[5]
Since one of the major inputs to this project is a video/image feed, a vital part of this project is image processing. Image processing refers to the task of receiving an image as input, performing some operation on it, and producing a set of properties or a different resultant image as output. The process by which a computer attempts to decipher information from an image is known as Computer Vision.
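As a minimal illustration in MATLAB (the environment used in this project, see [2]), the following sketch takes an image as input, applies one operation, and produces a resultant image as output; the sample file 'peppers.png' ships with MATLAB:

% Input image -> operation -> resultant image.
img = imread('peppers.png');     % input: an RGB image
gray = rgb2gray(img);            % operation: convert to grayscale
edges = edge(gray, 'canny');     % output: a binary edge map
imshow(edges);                   % display the resultant image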
Computer vision is a field that includes methods for
acquiring, processing, analyzing, and understanding images
and, in general, high-dimensional data from the real world
in order to produce numerical or symbolic information, e.g., in the form of decisions. A theme in the development of
this field has been to duplicate the abilities of human vision
by electronically perceiving and understanding an image.
This image understanding can be seen as the disentangling
of symbolic information from image data using models
constructed with the aid of geometry, physics, statistics,
and learning theory. Computer vision has also been
described as the enterprise of automating and integrating a
wide range of processes and representations for vision
perception.
As a scientific discipline, computer vision is concerned
with the theory behind artificial systems that extract
information from images. The image data can take many
forms, such as video sequences, views from multiple
cameras, or multi-dimensional data from a medical scanner.
As a technological discipline, computer vision seeks to
apply its theories and models to the construction of
computer vision systems.[8]
E-learning (or eLearning) is the use of electronic media, educational technology, and information and communication technologies (ICT) in education. Computer-based training (CBT) refers to self-paced learning activities delivered on a computer or handheld device such as a tablet or smartphone. It is a flexible, self-paced method of education.[6]
E-Learning is the future of the education industry as it
provides resources at any student’s fingertips. Students are
no longer dependent on physical attendance at lecture halls
or classrooms. It makes it easier for students to access
information on the go, at home or any other place.
The origin of the term e-learning is contested, with the "e-" not necessarily meaning electronic as in e-mail or e-commerce. Coined between 1997 and 1999, the term was first attached either to a distance-learning service or to its first use at the CBT Systems seminar. Since then it has been used extensively to describe online, personalised, interactive or virtual education.
Bernard Luskin, an educational technology pioneer,
advocated that the "e" of e-learning should be interpreted to
mean "exciting, energetic, enthusiastic, emotional,
extended, excellent, and educational" in addition to
"electronic." Eric Parks suggested that the "e" should refer
to "everything, everyone, engaging, easy". These broad
interpretations focus on new applications and developments,
as well as learning theory and media psychology.
Moore et al. found "significant variation in the understanding and usage of terms used in this field" and pointed to "implications for the referencing, sharing and collaboration of results." In usage, e-learning is an extremely significant (but incomplete) subset of educational technology.[6]
2. LITERATURE SURVEY
In the modern day, gesture recognition has become an important field. Many gesture recognition techniques have been developed for tracking and recognizing various hand gestures.
The first is wired technology, in which users must physically connect themselves to the computer system by wire. With wired technology, users cannot move freely around the room, as they are tethered to the computer system and limited by the length of the wire.
In “Smart particle filtering for 3d hand tracking”[9], structured light was used to acquire 3D depth data. Skin color was also used for segmenting the hand, which requires tracking interest points on the surface of the hand using a camera. Motion information obtained from the 3D trajectories of the points was used to augment the range data.[1]
More advanced techniques have since been introduced, such as image-based techniques that require processing of image features such as texture. If we work with these features for hand gesture recognition, the results may vary, since skin tone and texture differ greatly from person to person and from one continent to another. To overcome these challenges and support real-time applications, a gesture recognition technique based on color detection and the relative positions of the colors has been implemented. The colors themselves can be varied, obviating the need for any particular color. Mouse movement and mouse events are very smooth, and the user is able to select small menu buttons and icons without any difficulty.[1]
In ”A Simple Shape-Based Approach to Hand Gesture Recognition”[10], the algorithm is based on the calculation of three combined features of hand shape: compactness, area and radial distance. If the compactness of two hand shapes is equal, they are classified as the same gesture; in this way the approach limits the number of gesture patterns that can be classified using these three shape-based descriptors.[1]
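As a brief illustration, the three descriptors can be computed in MATLAB from a binary hand silhouette. This is our own sketch, not the implementation of [10]; in particular, the definition of compactness used here (perimeter squared divided by area) is one common convention that may differ from the original paper:

% handMask is assumed to be a binary (logical) hand silhouette.
stats = regionprops(handMask, 'Area', 'Perimeter', 'Centroid');
compactness = stats(1).Perimeter^2 / stats(1).Area;  % shape compactness
area = stats(1).Area;                                % region area
% Radial distance: distance from the centroid to each boundary point.
b = bwboundaries(handMask);
boundary = b{1};                                     % [row col] boundary points
radial = sqrt(sum((boundary - stats(1).Centroid([2 1])).^2, 2));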
The algorithm implemented in “Gesture Recognition Based Mouse Events”[1] is divided into seven main steps. The first is selection of an RGB value. The second step is conversion of the RGB value to YCbCr. The remaining steps are region of interest, scale conversion, mirror value, and finally the mouse event. This algorithm focuses on reducing redundancies in RGB data and hence uses a complex algorithm that converts RGB data into YCbCr. In ELGR we attempt to omit the complex RGB-to-YCbCr conversion and its corresponding computations to see whether this improves performance.
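The difference between the two pipelines can be seen in the following MATLAB sketch, where frame stands for one captured RGB frame; the rgb2ycbcr path represents the approach of [1], while the imsubtract path is the one ELGR uses:

% Approach of [1]: convert the frame to YCbCr before segmenting.
ycbcr = rgb2ycbcr(frame);
cr = ycbcr(:, :, 3);                                % chrominance-red plane
% ELGR: isolate the red component directly from the RGB data.
red = imsubtract(frame(:, :, 1), rgb2gray(frame));  % red minus luminance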
As computers become more common in society, facilitating
human–computer interaction (HCI) will have a positive
impact on their use. Gestures have long been considered an interaction technique that can potentially deliver more natural, creative and intuitive methods for communicating with our computers.
The work done by Siddharth S. Rautaray and Anupam Agrawal[12] provides an analysis of comparative surveys done in this area. It focuses on the three main phases of hand gesture recognition: detection, tracking and recognition. Different applications that employ hand gestures for efficient interaction are discussed under core and advanced application domains. The paper also analyzes the existing literature on gesture recognition systems for human-computer interaction by categorizing it under different key parameters, and it discusses the advances needed to improve present hand gesture recognition systems so that they can be widely used for efficient human-computer interaction.[12]
Garrison and Borgia[11] focus on the development of an Internet-based distance learning model for teaching the introductory finance course in the Finance Department at Florida Gulf Coast University (FGCU). They develop a separate Internet-based course as an alternative to the traditional in-class introductory finance course. In this Internet-based course, students are required to participate in a “boot camp” for the first few weeks, which covers only the most complex aspects of the course. After this initial period, the course is completely Web-based in design.
3. SYSTEM ARCHITECTURE
3.1 Video Capture:
In this module, we create an video input object that reads
the human hand gesture through a camera. The camera feed
is then sent to the Gesture Detector. The input is received
from the video camera. The feed is in the form of a video of
resolution 640x480. The windows camera is used to
capture the video.
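A minimal MATLAB sketch of this module follows. The adaptor name 'winvideo', device ID 1 and the format string are assumptions that depend on the camera installed; imaqhwinfo can be used to list the values available on a given machine:

% Create the video input object for a 640x480 camera feed.
vid = videoinput('winvideo', 1, 'RGB24_640x480');  % adaptor/format assumed
set(vid, 'ReturnedColorSpace', 'rgb');             % frames returned as RGB
preview(vid);                                      % optional live preview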
3.2 Gesture Detector:
The Gesture Detector module will check to see what kind
of gesture is being made by the user. This is done by taking
an image snapshot every 0.05 seconds. The input to the
gesture detector is the video output of the video capture
unit. The gesture detector unit then creates a snapshot of
the image so that it can be sent to the Mouse Event
Detector module for processing.
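A sketch of this sampling loop is shown below; vid is the video input object from Section 3.1, and detectMouseEvent is a hypothetical name standing in for the processing described in Section 3.3:

quit = false;
while ~quit
    img = getsnapshot(vid);  % grab the current frame as an image snapshot
    detectMouseEvent(img);   % hand the snapshot to the Mouse Event Detector
    pause(0.05);             % wait 0.05 s before the next snapshot
end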
3.3 Mouse Event Detector:
This module receives the image snapshot that is the output
of the Gesture Detector unit and then performs various
image processing techniques that are described in further
detail below.
a) Image Snapshot: This unit takes a screenshot of the
video feed to convert it into an image feed for
efficient processing.
b) Image Processing: This unit extracts the blue and red components from the image feed (a code sketch follows this list). This is done so that the system can recognize the presence of the two colors.
c) Gesture Classifier: This unit classifies the image as
belonging to one of three categories, left-click, right-
click or cursor move.
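The sketch below outlines units (b) and (c) in MATLAB, with img being the snapshot from Section 3.2. A binarization step (im2bw with an assumed threshold of 0.18) is added between filtering and classification, since the bwboundaries and regionprops calls in Section 5 operate on binary images; the paper leaves this step implicit:

red = imsubtract(img(:, :, 1), rgb2gray(img));   % red component
blue = imsubtract(img(:, :, 3), rgb2gray(img));  % blue component
red = medfilt2(red, [3 3]);                      % remove salt-and-pepper noise
blue = medfilt2(blue, [3 3]);
redMask = im2bw(red, 0.18);                      % threshold value is an assumption
blueMask = im2bw(blue, 0.18);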
3.4 Mouse Event Generator:
This module receives the gesture from the Mouse Event
Detector unit. It then performs one of the three defined
mouse event actions. The three defined events are Mouse
Move, Left Click and Right click. The functions performed
by this module are:
a) Mouse Listener: This unit takes the output of the mouse event detector and performs mouse press and mouse release functions (see the sketch below).
b) Event Handler: This unit ensures that the required mouse events take place on the GUI.
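These functions can be realized with the java.awt.Robot class [4], which MATLAB can call directly; a left click, for instance, is a button-1 press followed by a release:

robot = java.awt.Robot;                                      % OS-level input injector
robot.mousePress(java.awt.event.InputEvent.BUTTON1_MASK);    % press the left button
robot.mouseRelease(java.awt.event.InputEvent.BUTTON1_MASK);  % release it: one left click
% For a right click, use InputEvent.BUTTON3_MASK instead.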
Figure 1. ELGR System Framework
4. SYSTEM MODEL
The System Model is described in the figure below. The
System Model describes the logical orientation of the
program. It describes how control flows through the
various models described in the System Architecture.
The System Model describes the flow of the application
from a user’s point of view. We first start with a window
asking if the user would like to start the gesture recognition
activity. If the user clicks on yes, the command to start
capturing the video feed and take a snapshot is started.
Once this is completed, the red and blue components of the
image are extracted. Median filters are then used to remove unnecessary noise from the image. Once this is complete, the image is classified as belonging to a particular class of gestures. Depending on the class of gesture, the corresponding mouse function is performed, which enables the user to access the content of the application.
Figure 2. Flow of data through the gesture recognition
module
5. ALGORITHM
ELGR is implemented with the help of two key algorithms.
Each algorithm corresponds to on key aspect of gesture
recognition. The first algorithm describes the image
processing module. It describes how the video stream s
captured and then made ready such that color identification
becomes easy. The input is a video stream and the output is
two processed images.
A. Algorithm Im_Proc(Video v)
1. Create a video input object:
   v = videoinput(<adaptorname>, <deviceid>, <format>)
   This object captures the video and stores it. It also decides what resolution the video format is to be in.
2. Ensure that the video colour format is RGB, so that no other unnecessary color computations are needed:
   set(v, 'ReturnedColorSpace', 'rgb');
3. Do
   i.   v1 = getsnapshot(v)
   ii.  Extract the red component from the image:
        im2 = imsubtract(v1(:, :, 1), rgb2gray(v1));
   iii. Similarly extract the blue component:
        in2 = imsubtract(v1(:, :, 3), rgb2gray(v1));
   iv.  Filter out noise:
        im2 = medfilt2(im2, [3 3]);
        in2 = medfilt2(in2, [3 3]);
4. While (!quit)
End Algorithm Im_Proc
The second algorithm describes the mouse and event listener and handler modules. It lists the steps that must be completed to successfully override the mouse pointer. This algorithm accepts a processed image as input and performs the corresponding mouse operations as output.
B. Algorithm Mouse_Event(Image I)
1. Do
   i.   Trace the region of interest by executing
        bwboundaries(I)
   ii.  Calculate the centroids of the blue and red components in the image using
        regionprops(I, 'centroid')
   iii. If (blue is to the left of red) then perform a left click
   iv.  Else if (blue is to the right of red) then perform a right click
   v.   Else if (only red) perform a mouse-move operation using the following formula:
        mouse.mouseMove(1600 - (I.Centroid(1,1)*(5/2)), (I.Centroid(1,2)*(5/2) - 180));
2. While (!quit)
End Algorithm Mouse_Event
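Step 1.v can be read as a coordinate mapping. The constants appear to assume a 640x480 camera frame driving a 1600x1200 screen: the factor 5/2 scales the frame to the screen, subtracting the x term from 1600 mirrors the motion horizontally, and the -180 term looks like a vertical calibration offset. A MATLAB sketch, reusing redMask from Section 3.3:

stats = regionprops(redMask, 'Centroid');  % centroid of the red marker
c = stats(1).Centroid;                     % [x y] in image coordinates
robot = java.awt.Robot;
robot.mouseMove(round(1600 - c(1)*(5/2)), round(c(2)*(5/2) - 180));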
6. RESULTS
Figure 3. Percentage of successful mouse operations (success rate, 0-100%) for Right_Click, Left_Click and Mouse_Move
In the above figure, we can see the percentage of successful mouse operations. The Mouse_Move operation is the most reliable of the three functions: it works 90% of the time and fails on only 10% of attempts. The Left_Click operation works only 70% of the time. The Right_Click operation has the lowest success percentage, working only 60% of the time.
By omitting the RGB-to-YCbCr conversion we observe a marked improvement in the performance of the system. The YCbCr conversion does not add anything unique to the application and can therefore be left out without losing any functionality.
7. CONCLUSION
We have implemented a more efficient form of the hand gesture recognition algorithm. This improved performance comes at the cost of reduced accuracy in recognizing gestures. However, since there is no commercial E-learning application that uses gesture recognition, there is a possibility for this application to be widely adopted in schools and colleges across the world. The advantages of ELGR are that it creates an interest in computers for children through an innovative interface, and that it provides a unique learning experience for students that is both educational and fun. It helps with the acquisition of technological skills through practice with tools and computers. In the modern world it is not too difficult to extend this project to work with mobile devices, as the dependence on hardware is minimized.
8. REFERENCES
1. Rachit Puri, “Gesture Recognition Based Mouse Events”, IMWeb, Multimedia & Services, Web Solutions, Web Engine, Samsung Research India-Bangalore, Bangalore 560037, India.
2. http://in.mathworks.com/index.html?s_tid=gn_logo
3. https://www.youtube.com/watch?v=LiFpSs2cQ00
4. http://docs.oracle.com/javase/1.5.0/docs/api/java/awt/Robot.html
5. http://en.wikipedia.org/wiki/Gesture_recognition
6. http://en.wikipedia.org/wiki/E-learning
7. http://whatis.techtarget.com/definition/gesture-recognition
8. http://en.wikipedia.org/wiki/Computer_vision
9. M. Bray, E. Koller-Meier, and L. V. Gool, “Smart particle filtering for 3d hand tracking”, in Proc. of the Sixth IEEE International Conf. on Face and Gesture Recognition, 2004.
10. Amornched Jinda-apiraksa, Warong Pongstiensak, and Toshiaki Kondo, “A Simple Shape-Based Approach to Hand Gesture Recognition”, in Proceedings of the IEEE International Conference on Electrical Engineering/Electronics, Computer, Telecommunications and Information Technology (ECTI-CON), Pathumthani, Thailand, pages 851-855, May 2010.
11. Sharon H. Garrison and Daniel J. Borgia, “Using an Internet-based distance learning model to teach introductory finance”, Campus-Wide Information Systems, Vol. 16, Iss. 4, pp. 136-139, 1999.
12. Siddharth S. Rautaray and Anupam Agrawal, “Vision based hand gesture recognition for human computer interaction: a survey”.