Hand Gesture Classification Using
Electromyography Signals
Miquel Junyent-Barbany, Maher Nadar, Carlos Rodoreda
Abstract—In this paper, a hand gesture recognition scheme
based on Electromyography signals is discussed. Single-channel
electrodes placed on the Pronator Quadratus muscle of the
forearm extract an amplified voltage signal directly from the
surface of the tissue and transmit the data to a MATLAB
program.
The data is first pre-processed through filtering and demeaning,
and then converted to its Fourier Transform so as to make it
time-origin invariant. A combination of features related to
length, variance, magnitude, auto-regressive model coefficients
and others was extracted, and different classification algorithms
were applied.
The paper concludes by comparing the overall performance of
the different classifiers; the five-class categorisation achieves
accuracies of up to 95% using the Naive Bayes algorithm.
I. INTRODUCTION AND MOTIVATION
Surface Electromyography is a very convenient non-invasive
approach to measuring the behaviour of the different muscles in
one's body [1]. Until recent decades, its use has been merely
to examine the performance or condition of the muscle in
question. With the rise of the field of artificial intelligence,
however, the obtained readings can now be used to predict the
intended movement.
With this at hand, a revolution arises in our ability to interact
with the environment and remotely control the world around
us. Chief applications include remotely controlling the TV,
computer, or any device set up to recognise our gestures
[2], [3]. For instance, forming a fist could mute a video, an
open palm could pause it, and similar applications go as far
as human creativity allows. Another vital drive for this pattern
recognition approach is the potential to improve the control of
prosthetic limbs. In fact, although the original human limb may
be amputated, the signals in the nerves most probably still
exist.
This paper begins with a quick review of the state of the
art. Our approach to retrieving, processing and classifying
Electromyography signals is then presented in the Methodology
section. Aspects such as signal filtering and other pre-processing
methods are presented after describing the selected muscle and
sensor. The different extracted features are compared and their
abilities to enhance the classification of five hand movements
are discussed. Lastly, the accuracies of different classification
methods are presented in the Results section.
II. RELATED WORK
Hand gesture recognition has been rigorously approached
within the recent decades. In its earliest stages, methodolo-
gies incorporating computer vision were obtaining respectable
results, but were affected by background texture, color and
lighting. Other attempts were using specialised movement-
based sensing gloves that would render the Human-Computer
interface inconvenient [3].
In a similar vein, several studies have been conducted
based, like our work, on EMG signal processing.
The first ever study on EMG signal classification was con-
ducted by Graupe and Cline in 1975, where 85% accuracy
was obtained using auto-regressive coefficients as classifying
features [2]. In 1999, Engelhart et al. reached a 93% accuracy
for a four class categorization using time-frequency wavelet
representations [4]. In the same year, Nishikawa et al. managed
to classify ten different hand gestures by means of an online
learning scheme with an average accuracy of 91% [3].
Other studies, directed at sign language recognition via hand
gestures (which, as one can imagine, involves many more
classes than the other studies), suggest that a combination of
EMG and accelerometer signals can perform better classification
than EMG signals alone [1]. Classifiers used in such studies
include linear Bayesian models and hierarchical decision trees
[5].
Moreover, some papers introduced the idea that it is far
better to work in the signal's frequency domain rather than the
time domain, especially if one intends to classify the gestures
in real time in a sliding-window-based approach [6]. Indeed,
Kim, Mastnik and André obtained 94% real-time classification
accuracy by fusing K-NN and Bayes decision levels [7].
III. METHODOLOGY
In this section, the different steps for our approach are
described, which are outlined in Figure 1.
Fig. 1: General outline of the process
A. Signal acquisition and pre-processing
The sensor of choice was the SHIELD-EKG-EMG bio-
feedback shield [8], designed by Olimex, which was connected
to an Arduino Uno. The shield pre-amplifies the voltage
detected from the muscles to a range of 0 to 5 volts so as to
make the readings distinguishable. Through research and trial
and error, it has been found that the best electrode positioning
(with the least noise generation) for single-channel operation
is to mount two electrodes consecutively on the 'Pronator
Quadratus' muscle of the left forearm, which performs the
different gestures, and one on the other forearm, which acts as
a ground for the measurement [7], [6].
Fig. 2: Sensor positioning and chosen gestures: a) Ground elec-
trode is wrapped around right forearm. b) Anti-clockwise roll
turn. c) Flexed fingers. d) Clockwise roll turn. e) Contracted
fingers. f) Rest.
Figure 2 shows the different gestures including the position
of the sensors: Gesture 1 is a roll turning of the wrist in the
anti-clockwise sense, Gesture 2 is the flexing of the fingers
(hand is fully open), Gesture 3 is the roll turning of the wrist in
the clockwise sense, Gesture 4 is the contraction of the fingers
(hand is forming a fist), and Gesture 5 is the rest position
(minimal muscle activity).
One of the most challenging tasks was to generate a ground
truth, or to pair a signal with its class for later classification.
The main problem was to determine the beginning and the
end of a gesture when recording the signal. This has been
solved by selecting appropriate time-origin invariant features,
which is later described. The classes were given by pressing a
keyboard key while performing the gesture, which generated
a target signal. Figure 3 shows a piece of the raw signal for
Gesture 1 together with the target signal generated by pressing
a key.
After obtaining the raw data, the next step is to pre-process
it to facilitate its further manipulation. We begin by demeaning
the readings; this is particularly useful since repeating the exact
same muscle command is practically impossible. Then, a
Hamming filter with a window of size 10 acts as a lowpass
filter that admits only the first quarter of the frequency
bandwidth [1]. Finally, the whole signal is truncated into chunks
that correspond to the different gestures in order to apply
supervised learning.
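The demeaning and smoothing steps can be sketched as follows. The paper's pipeline runs in MATLAB; this Python version, with a synthetic test signal and the function name `preprocess` chosen for illustration, is only a sketch of the idea, not the original implementation.

```python
import numpy as np

def preprocess(raw, window=10):
    """Demean the raw EMG reading, then smooth it with a normalized
    Hamming window acting as a lowpass FIR filter (illustrative sketch)."""
    demeaned = raw - np.mean(raw)
    h = np.hamming(window)
    h /= h.sum()                          # unit DC gain
    return np.convolve(demeaned, h, mode="same")

# Synthetic example: a noisy 2 Hz tone sampled at 256 Hz
fs = 256
t = np.arange(fs) / fs
rng = np.random.default_rng(0)
raw = np.sin(2 * np.pi * 2 * t) + 0.3 * rng.standard_normal(fs)
smooth = preprocess(raw)
```

As noted below, a symmetric window of size 10 introduces a delay of roughly half the window length, which does not affect the classification.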
Figure 4 shows a sample of the different signals obtained for
every gesture. In blue, raw data is presented which incorporates
noise throughout the whole signal. After applying the filter,
Fig. 3: Example of raw signal and target signal for Gesture 1.
the high frequencies are removed and the resulting signal is
smoother, although a delay can also be noticed. This noise
reduction can be better appreciated in Figure 7. By experimen-
tation, the delay has been identified to be half the filtering
window, which does not pose an issue for classifying the signal.
Fig. 4: Raw and filtered signals of the five different gestures.
Some references mention the use of the absolute enveloping
curve (obtained via the Hilbert transform), as the signal is
analog [9], [10]. Nevertheless, in our application this step
seemed to hinder the classification, with accuracy diminishing
significantly, and thus we chose to omit it. We attribute this to
our extensive use of frequency-domain features, which are
elaborated in the next paragraphs.
B. Feature extraction
In order to use the classifier in real time in a window-based
approach, the features need to be, as mentioned before, time-
origin invariant (i.e. the features should represent the chunk of
signal corresponding to a gesture independently of time)
[11]. This led us to work with frequency-domain features
rather than time-domain ones [7], [12]. By visual inspection, the
Fourier Transform and the Power Spectrum proved to be good
for distinguishing between the classes [13], [14].
To reduce the number of features, a histogram is used: the
area along each band of frequencies forms the feature vector
for both graphs. Figure 5 shows the Fourier Transform of a
sample signal of Gesture 2, together with its 5-bin histogram
from 0 to 10 Hz. It should be noted here that the number of
bins in the representing histogram should be relatively low.
Indeed, if the number of bins were high, the representation
would be too specific to each signal, and hence two signals of
the same gesture would not be recognised as belonging to the
same class.
In Figure 6, the Power Spectral Density, i.e. the squared
magnitude of the Fourier Transform, of an example of Gesture 5
(not moving) is given. As can be observed, there is a remarkable
difference between the shapes and magnitudes of the two
representations.
Fig. 5: Fourier Transform together with its histogram
Fig. 6: Power Spectrum together with its histogram
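The binned-spectrum feature can be sketched as below. This is an illustrative Python version (the paper works in MATLAB); the function name `fft_bin_features` and the demo signal are our own choices, and the bin edges are taken as equal-width bands between 0 and 10 Hz as described above.

```python
import numpy as np

def fft_bin_features(signal, fs, n_bins=5, f_max=10.0):
    """Area of the FFT magnitude spectrum over n_bins equal-width
    frequency bands in [0, f_max) Hz -- a coarse, low-dimensional
    summary of the spectrum (illustrative sketch)."""
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    edges = np.linspace(0.0, f_max, n_bins + 1)
    feats = np.empty(n_bins)
    for i in range(n_bins):
        mask = (freqs >= edges[i]) & (freqs < edges[i + 1])
        feats[i] = spectrum[mask].sum()   # area under this band
    return feats

# Demo: a pure 3 Hz tone should concentrate in the 2-4 Hz bin
fs = 100
t = np.arange(200) / fs
demo = fft_bin_features(np.sin(2 * np.pi * 3 * t), fs)
```

Keeping `n_bins` low is what makes two recordings of the same gesture land on similar feature vectors despite small spectral differences.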
Auto-regressive model coefficients proved to be good
features, with favourable results in some reference work [5].
Several AR models of different orders were tried to see how
well they could represent the signal. The percentage of fit,
together with the Akaike Information Criterion (AIC) for a
one-step predictor, was used to determine the proper order.
Regressors of order 1 or 2 showed poor results and were hence
disregarded. Order 3 was enough to reach around 85% fit.
Orders 4 and 5 increased the fit to 90% and 95% respectively,
while also reducing the AIC index. For higher orders (6 and
beyond), although the AIC index kept decreasing, the fit
enhancement became almost negligible, and these orders were
therefore discounted.
Figure 7 shows the response of the 4th-order AR model
together with the raw and filtered signals, corresponding to
Gesture 4. The fit of this one-step predictor is 94.85% and the
AIC obtained is -7.1138. It should be noted here that the use
of the filter was a crucial step, for the fit could not get higher
than 50% with the raw signal alone.
Fig. 7: One-step AR model response. Top: raw, bottom: filtered
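A minimal version of the AR fitting step is sketched below, assuming a plain least-squares fit and MATLAB's normalized-error fit percentage (as reported by `compare`); the paper's actual order selection also uses the AIC, which is omitted here. Function names are our own.

```python
import numpy as np

def ar_fit(x, order=4):
    """Least-squares fit of an AR(p) one-step predictor
    x[t] ~ a1*x[t-1] + ... + ap*x[t-p] (illustrative sketch)."""
    N = len(x)
    X = np.column_stack([x[order - j : N - j] for j in range(1, order + 1)])
    coef, *_ = np.linalg.lstsq(X, x[order:], rcond=None)
    return coef

def fit_percent(x, coef):
    """Normalized one-step-prediction fit:
    100 * (1 - |y - yhat| / |y - mean(y)|)."""
    p = len(coef)
    X = np.column_stack([x[p - j : len(x) - j] for j in range(1, p + 1)])
    y = x[p:]
    return 100.0 * (1.0 - np.linalg.norm(y - X @ coef)
                    / np.linalg.norm(y - y.mean()))

# Demo: recover the coefficients of a synthetic AR(2) process
rng = np.random.default_rng(0)
x = np.zeros(3000)
e = 0.1 * rng.standard_normal(3000)
for t in range(2, 3000):
    x[t] = 1.5 * x[t - 1] - 0.7 * x[t - 2] + e[t]
coef = ar_fit(x, order=2)
fit = fit_percent(x, coef)
```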
The integral of the signal, the integral of its derivative
(referred to as the length of the signal) and the integral of
the square of the signal are calculated in order to provide
information on the overall shape. Additionally, two values,
the variance and the so-called Modified Mean Frequency
(MMNF) [6] (the sum of the products of the spectral amplitude
and frequency, divided by the total sum of the amplitude
spectrum), have been of important use. Taking M as the index
of the maximum frequency of the spectrum, the formula for
the MMNF is:

MMNF = \frac{\sum_{i=1}^{M} f_i A_i}{\sum_{i=1}^{M} A_i}    (1)
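The MMNF definition above, together with the shape descriptors, can be sketched as follows. This is an illustrative Python version (the paper uses MATLAB), and the function names are our own; the "length" is computed here as the integral of the absolute derivative.

```python
import numpy as np

def mmnf(x, fs):
    """Modified Mean Frequency: amplitude-weighted average frequency,
    sum_i(f_i * A_i) / sum_i(A_i), over the magnitude spectrum."""
    A = np.abs(np.fft.rfft(x))
    f = np.fft.rfftfreq(len(x), d=1.0 / fs)
    return float(np.sum(f * A) / np.sum(A))

def shape_features(x, dt=1.0):
    """Shape descriptors: integral of the signal, of its square
    (energy), of the absolute derivative (the 'length'), and the
    variance (illustrative sketch)."""
    d = np.diff(x) / dt
    return {"integral": float(np.sum(x) * dt),
            "energy":   float(np.sum(x ** 2) * dt),
            "length":   float(np.sum(np.abs(d)) * dt),
            "variance": float(np.var(x))}

# Demo: a pure 5 Hz sine has MMNF = 5 Hz and variance 1/2
fs = 100
t = np.arange(200) / fs
tone = np.sin(2 * np.pi * 5 * t)
m = mmnf(tone, fs)
feats = shape_features(tone, dt=1.0 / fs)
```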
C. Classification
There are many classifiers that can be applied to approach
our problem [11]. A list of the classifiers used with a small
description is presented below:
• Linear Discriminant Analysis (LDA): LDA constructs
a decision boundary by minimizing the intra-class and
maximizing the inter-class variability.
• Naive Bayes (NB): According to the Bayes rule, patterns
are assigned to the class with the highest posterior
probability.
• Nearest Neighbor (NN): Patterns are assigned to the
class of the nearest (Euclidean distance) training pattern.
• Support Vector Machine (SVM): SVM constructs a
decision boundary with a maximal margin to separate
different classes. A radial basis function is used as the
kernel.
• Tree: A tree structure which maximizes the separation
of the data with feature values is constructed. Branches
are separation rules and leaves are classes.
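The Naive Bayes rule above can be made concrete with a minimal Gaussian variant: each feature is modelled as an independent Gaussian per class, and a pattern goes to the class with the highest posterior. This Python sketch is for illustration only; it is not the paper's MATLAB implementation, and the class name is our own.

```python
import numpy as np

class NaiveBayes:
    """Minimal Gaussian Naive Bayes (illustrative sketch)."""
    def fit(self, X, y):
        self.classes = np.unique(y)
        self.mu = np.array([X[y == c].mean(axis=0) for c in self.classes])
        self.var = np.array([X[y == c].var(axis=0) + 1e-9
                             for c in self.classes])
        self.logprior = np.log([np.mean(y == c) for c in self.classes])
        return self

    def predict(self, X):
        # log p(x|c) + log p(c), assuming feature independence
        ll = -0.5 * np.sum(np.log(2 * np.pi * self.var[:, None, :])
                           + (X[None, :, :] - self.mu[:, None, :]) ** 2
                           / self.var[:, None, :], axis=2)
        return self.classes[np.argmax(ll + self.logprior[:, None], axis=0)]

# Demo on two well-separated synthetic clusters
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.5, (30, 3)),
               rng.normal(4.0, 0.5, (30, 3))])
y = np.array([0] * 30 + [1] * 30)
pred = NaiveBayes().fit(X, y).predict(X)
```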
Several two-class classifiers of the types listed above have
been tried with two techniques which are known to give the
best results:
• One vs all (OVA): One two-class classifier per class is
generated, with the observations of that class as positive
samples and all other samples as negatives.
• One vs one (OVO): In this case, K(K−1)/2 classifiers are
trained, each separating one pair of classes. When predict-
ing, each classifier votes for an option and the class with
the greatest number of votes becomes the prediction.
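The OVO voting scheme can be sketched as below. The nearest-centroid learner here is just a stand-in for the binary classifiers listed above (any object with `fit`/`predict` works), and all names are our own; this is an illustrative Python sketch, not the paper's implementation.

```python
import numpy as np
from itertools import combinations

class NearestCentroid:
    """Tiny stand-in two-class learner."""
    def fit(self, X, y):
        self.classes = np.unique(y)
        self.mu = np.array([X[y == c].mean(axis=0) for c in self.classes])
        return self
    def predict(self, X):
        d = np.linalg.norm(X[:, None, :] - self.mu[None, :, :], axis=2)
        return self.classes[np.argmin(d, axis=1)]

def ovo_predict(X_train, y_train, X_test, make_clf=NearestCentroid):
    """One-vs-one: train K(K-1)/2 binary classifiers, one per class
    pair, and predict by majority vote (illustrative sketch)."""
    classes = np.unique(y_train)
    idx = {c: i for i, c in enumerate(classes)}
    votes = np.zeros((len(X_test), len(classes)), dtype=int)
    for a, b in combinations(classes, 2):
        m = (y_train == a) | (y_train == b)
        preds = make_clf().fit(X_train[m], y_train[m]).predict(X_test)
        for n, p in enumerate(preds):
            votes[n, idx[p]] += 1        # this pair's classifier votes
    return classes[np.argmax(votes, axis=1)]

# Demo with three separated clusters (K = 3, so 3 pairwise classifiers)
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(c, 0.4, (25, 2))
               for c in ((0, 0), (5, 0), (0, 5))])
y = np.repeat([0, 1, 2], 25)
pred = ovo_predict(X, y, X)
```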
IV. RESULTS
Several experiments have been performed in order to deter-
mine which feature vectors show better results. A vector of 19
features has been chosen:
• MMNF
• Variance of the signal.
• Integral of the signal.
• Integral of the signal squared.
• Length of the signal.
• 10-bin histogram of the magnitude of the Fourier
transform, in decibels.
• Coefficients of a 4th-order AR model.
The feature-vector data-set is shuffled and divided into two
parts: training (75%) and validation (25%). This is repeated one hundred
times for each classifier, in order to train and validate it with
different data. The mean and the maximum accuracy values are
taken in order to compare between classifiers. The proportion
of feature vectors available for each gesture is as follows:
• Gesture 1: 48 observations.
• Gesture 2: 60 observations.
• Gesture 3: 41 observations.
• Gesture 4: 44 observations.
• Gesture 5: 60 observations.
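The repeated shuffle-and-split evaluation described above can be sketched as follows. The nearest-centroid stand-in classifier and all names are our own illustrative choices; the paper's experiments run in MATLAB with the classifiers listed earlier.

```python
import numpy as np

def repeated_holdout(X, y, make_clf, n_repeats=100, train_frac=0.75, seed=0):
    """Shuffle, split 75/25, train, score on the held-out 25%, and
    repeat; returns mean and max validation accuracy (sketch)."""
    rng = np.random.default_rng(seed)
    cut = int(train_frac * len(y))
    accs = []
    for _ in range(n_repeats):
        p = rng.permutation(len(y))
        tr, va = p[:cut], p[cut:]
        clf = make_clf().fit(X[tr], y[tr])
        accs.append(float(np.mean(clf.predict(X[va]) == y[va])))
    return float(np.mean(accs)), float(np.max(accs))

class _Centroid:
    """Tiny stand-in classifier for the demo."""
    def fit(self, X, y):
        self.c = np.unique(y)
        self.m = np.array([X[y == k].mean(axis=0) for k in self.c])
        return self
    def predict(self, X):
        d = np.linalg.norm(X[:, None, :] - self.m[None, :, :], axis=2)
        return self.c[np.argmin(d, axis=1)]

# Demo on two synthetic clusters
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.3, (40, 2)), rng.normal(3, 0.3, (40, 2))])
y = np.array([0] * 40 + [1] * 40)
mean_acc, max_acc = repeated_holdout(X, y, _Centroid, n_repeats=20)
```

Reporting both the mean and the maximum over the repeats separates a classifier's typical behaviour from its best-case split.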
Figure 8 shows the average and maximum accuracies for each
classifier, using one vs all and one vs one. It can be observed
that Naive Bayes accurately distinguishes between the different
classes, showing an average accuracy of 80% and a peak of
95%. SVM and Linear Discriminant Analysis perform well
too, the latter especially so when using one vs one. KNN
performed remarkably well in some reference works; however,
it shows the worst results in our case.
The confusion matrices for Naive Bayes and SVM using
OVA are shown in Figure 9; they correspond to the peak
accuracy of each: 95.31% and 92.19% respectively. It can be
seen that Gestures 3 and 4 are confused in both cases, while
Gesture 5 is never confused. This seems reasonable since the
Fig. 8: Max and mean classification accuracies of the different
classifiers.
last gesture is not actually a movement. The experiment was
repeated by taking the last gesture out of the validation set.
Results are similar for all classifiers, with accuracies on
average 6% lower. The maximum peak, for instance, is 89.06%
with the Naive Bayes classifier using OVA.
Fig. 9: Confusion matrices using OVA. Left: Naive Bayes;
Right: SVM.
In order to see how distinguishable two classes are, an
experiment was done by training and validating with pairs
of gestures. Results are given in Figure 10. It can be seen
that Gesture 1 is very separable from Gestures 3 and 4, while
Gestures 3 and 4 cannot be distinguished that well. By looking
at the shape of the signals in Figure 4 it can be observed that
Gesture 1 is comparable to Gesture 2 but not to Gestures 3
and 4, which are similar.
V. CONCLUSION
In this work, we presented a way to classify five different
hand gestures with respectable accuracy. The signal extraction
was done with an EMG-shield interconnected with an Arduino
Uno. The interface between the sensors and the computer was
made using USB cable, and the data would be directly read
and analysed in MATLAB. After filtering the raw signal with
a Hamming filter of window size 10, the Fourier Transform
of the signal was extracted, so as to deal with time-origin
invariant data. Following that, several features were obtained,
including but not limited to the AR model coefficients, mag-
nitude of the signal, its variance and some relevant integrals.
With the features in hand, different classification algorithms
Fig. 10: Results for gesture pairwise NB-classification.
have been used. While the KNN and hierarchical decision
tree performed relatively poorly (maximum accuracies of 83%
and 84% respectively), the best results were obtained with the
Support Vector Machine and Naive Bayes algorithms, where
accuracy reached 92% and 95% respectively.
VI. FUTURE WORK
All applications need the system to work in real time, which
could be implemented with relative ease using a window-based
approach, since the features used are time-origin invariant
[15]. The main inconvenience is that the sensor readings are
difficult to reproduce once the sensor locations are slightly
changed. One way to improve this would be to add conductive
gel between the sensor and the skin, or to apply a high-pass
filter to remove the noise of the voltage amplifier [1].
Furthermore, other time-origin invariant features could be
used alongside the ones presented to enhance classification
results. Some references make use of the Wavelet Transform,
which has shown remarkable results in other (bio)signal
classification tasks [16]. More classifiers could also be tried:
valuable results have been obtained using an Artificial Neural
Network [17], and another approach would be the use of
clustering methods [18]. Kim, Mastnik and André obtain
excellent results by merging different classifiers [7].
Plenty of room remains for future work and enhancement
of the presented study. In fact, all the measurements were
extracted from a single subject, which makes the algorithm
in question accurate for that particular subject alone. Given
that muscles behave differently from one individual to another
(some people may have more synapses, more muscle mass,
etc.), each subject would need their own training stage. An
interesting future task would be to find a classification scheme
that is subject-independent.
Last but not least, accuracy is a well-known measure of qual-
ity. Nevertheless, other measures such as sensitivity, specificity,
precision or the F1 score give more information for selecting
the best classifier [19]. Standard validation techniques such as
k-fold cross-validation should also be applied. In order to
improve the accuracy of the resulting classifier models, it
would definitely be useful to increase the amount of input
signals, thus intensifying the training phase. Moreover, the
choice of the gestures to be classified may also affect the
classification process. With further investigation of the signals
obtained from diverse hand gestures, those that appear to have
higher separability could be chosen for classification.
REFERENCES
[1] N. Haroon and A. N. Malik, “Multiple hand gesture recognition using
surface emg signals,” Journal of Biomedical Engineering and Medical
Imaging, vol. 3, 2016.
[2] B. Crawford, K. Miller, P. Shenoy, and R. Rao, “Real-time classification
of electromyographic signals for robotic control,” Proceedings of AAAI,
2005.
[3] Z. Xu, C. Xiang, W. Wen-hui, Y. Ji-hai, V. Lantz, and W. Kong-
qiao, “Hand gesture recognition and virtual game control based on
3d accelerometer and emg sensors,” Proceedings of the international
Conference on Intelligent User Interfaces, 2009.
[4] K. Englehart, B. Hudgins, M. Stevenson, and P. Parker, “A dynamic
feedforward neural network for subset classification of myoelectric
signal patterns,” Engineering in Medicine and Biology Society, IEEE
17th Annual Conference, vol. 1, 1995.
[5] Z. Xu, C. Xiang, L. Yun, V. Lantz, W. Kongqiao, and Y. Jihai,
“A framework for hand gesture recognition based on accelerometer
and emg sensors,” IEEE Transactions on Systems, Man, and
Cybernetics, 2011.
[6] A. Phinyomark, C. Limsakul, and P. Phukpattaranont, “A novel feature
extraction for robust emg pattern recognition,” Journal of Computing,
vol. 1, 2009.
[7] J. Kim, S. Mastnik, and E. André, “Emg-based hand gesture recognition
for realtime biosignal interfacing,” Proceedings of the 13th international
conference on Intelligent user interfaces, 2008.
[8] SHIELD-EKG-EMG bio-feedback shield USERS MANUAL, Olimex
LTD, 2014.
[9] J. Singh, K. Kalkal, K. Kapila, and M. A. Alam, “Emg-based hand
gesture recognition for realtime biosignal interfacing,” Indian Institute
of Technology, Kanpur, Tech. Rep., 2015.
[10] V. B. Chavan and N. N. Mhala, “Development of hand gesture
recognition framework using surface emg and accelerometer sensor for
mobile devices,” International Research Journal of Engineering and
Technology, vol. 2, 2015.
[11] U. Jensen, M. Ring, and B. Eskofier, “Generic features for biosignal
classification,” Proceedings Sportinformatik, 2012.
[12] L. D. Valencia, C. D. Acosta-Medina, and G. Castellanos-Domínguez,
“Time-frequency based feature extraction for non-stationary signal,” in
Applied Biomedical Engineering. InTech, 2011.
[13] G. Heinzel, A. Rüdiger, and R. Schilling, “Spectrum and spectral
density estimation by the DFT, including a comprehensive list of window
functions and some new flat-top windows,” Max-Planck-Institut für
Gravitationsphysik, Teilinstitut Hannover, Tech. Rep., 2002.
[14] B. D. Storey, “Computing fourier series and power spectrum with
matlab.”
[15] A. Amalaraj, “Real-time hand gesture recognition using sEMG and
accelerometer for gesture to speech conversion,” Master’s thesis, San
Francisco State University, 2015.
[16] M. Ermes, “Methods for the classification of biosignals applied to
the detection of epileptiform waveforms and to the recognition of
physical activity,” Ph.D. dissertation, Tampere University of Technology
(Finland), 2009.
[17] E. H. Shroffe and P. Manimegalai, “Hand gesture recognition based on
emg signals using ann,” International Journal of Computer Application,
vol. 2, 2013.
[18] R. T. M. de Abreu, “Algorithms for information extraction and signal
annotation on long-term biosignals using clustering techniques,” Mas-
ter’s thesis, Universidade Nova de Lisboa, 2012.
[19] V. Labatut and H. Cherifi, “Accuracy measures for the comparison of
classifiers,” The 5th International Conference on Information Technol-
ogy, Jordanie, 2011.