This document proposes a new method for hand detection and wrist localization to achieve automatic recognition of Arabic sign language gestures without constraints on clothing or background. The method involves:
1) Using marker-controlled watershed segmentation to localize the hand region.
2) Rotating the hand region into the vertical direction, dividing it into sections, and detecting the wrist position as the first line with the minimum number of white pixels in the hand region and the maximum number of black pixels in the background region, restricting the search to the lower sections to avoid detecting fingers.
3) Extracting shape-based features, such as geometric moments and Zernike moments, from the localized hand region to recognize Arabic digit sign gestures for sign language interaction.
The watershed transform partitions an image into disjoint regions [17] and separates the objects it contains. To solve the over-segmentation problem of the traditional watershed, an improved algorithm named marker-controlled watershed [18] [19] was proposed. Its goal is to detect the homogeneous regions in the image through a set of morphological operations. It is presented in Algorithm 1.
Algorithm 1 Hand localization: Marker-controlled watershed algorithm used in works [18] and [19]
1. Convert the color image into a grayscale one.
2. Compute the gradient magnitude (used as the segmentation function).
3. Mark the foreground objects.
4. Compute the background markers.
5. Compute the watershed transform of the segmentation function.
6. Isolate the region of interest from the segmented image and visualize the result.
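The steps of Algorithm 1 can be sketched with scikit-image as follows. The marker thresholds below are illustrative assumptions, not parameters reported in the paper.

```python
import numpy as np
from skimage.color import rgb2gray
from skimage.filters import sobel
from skimage.segmentation import watershed

def localize_hand(rgb):
    """Marker-controlled watershed sketch following Algorithm 1."""
    gray = rgb2gray(rgb)                    # step 1: grayscale conversion
    gradient = sobel(gray)                  # step 2: gradient magnitude
    markers = np.zeros_like(gray, dtype=np.int32)
    markers[gray > 0.7] = 1                 # step 3: foreground marker (bright hand pixels, assumed threshold)
    markers[gray < 0.2] = 2                 # step 4: background marker (dark pixels, assumed threshold)
    labels = watershed(gradient, markers)   # step 5: watershed transform of the segmentation function
    return labels == 1                      # step 6: isolate the region of interest
```

Flooding from explicit foreground/background markers, instead of from every regional minimum, is what avoids the over-segmentation of the plain watershed.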
Figure 1 illustrates the results of the proposed segmentation process.
Fig. 1: Marker-controlled watershed segmentation:
(a) Original image, (b) Grayscale image, (c) Gradient image,
(d) Final image.
III. WRIST LINE LOCALIZATION
Wrist line extraction is very important to facilitate hand feature extraction: only the hand region of an image captured by a simple camera should be retained. In the gesture recognition process, the presence of even a small forearm region can introduce several errors, since it makes only a small difference in the hand area information while distorting the extracted features.
Some works [20] [21] proposed to apply a skin mask containing both hand and forearm information, and analyzed the forearm width with respect to the mask orientation. These methods are sensitive to gesture variation and work only when the hand region is present.
Other works [22] [23] proposed a wrist localization method without any clothing constraints, by finding the local minimum of the contour of the skin mask containing the hand region. This method suffers from many detection errors, such as finding a finger region instead of the wrist position, as illustrated in Figure 2 (the erroneous regions are circled in red).
Fig. 2: Erroneous wrist detection results: (a) Rotated image,
(b) Obtained result compared with the ground-truth location [23].
To overcome these limitations, we propose a new method
whose steps are presented in Algorithm 2 and whose details
are described in the next subsections.
Algorithm 2 Wrist detection
1. Rotate the hand region into the vertical direction.
2. Compute the bounding box of the hand region.
3. Divide the hand box into 4 equal regions.
4. Find the wrist position only in the 3 lower parts, by detecting the first line characterized by the minimum number of white pixels (in the hand region) and the maximum number of black pixels (in the background region).
5. Remove all pixels below the detected wrist line.
6. Rotate the new hand region back to its original direction.
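Steps 2-5 of Algorithm 2 can be sketched on a binary mask that is already rotated vertically (step 1), assuming the fingers point upward; the variable names are ours, not the paper's.

```python
import numpy as np

def detect_wrist(mask):
    """Locate the wrist line in a vertically oriented binary hand mask
    and remove everything below it (Algorithm 2, steps 2-5)."""
    rows = np.where(mask.any(axis=1))[0]
    top, bottom = rows[0], rows[-1]               # step 2: vertical extent of the bounding box
    box = mask[top:bottom + 1]
    start = box.shape[0] // 4                     # steps 3-4: skip the top quarter (fingers/palm)
    widths = box[start:].sum(axis=1)              # white pixels per line = hand width
    wrist = top + start + int(np.argmin(widths))  # first line of minimum width (max background)
    out = mask.copy()
    out[wrist + 1:] = 0                           # step 5: drop all pixels below the wrist line
    return out, wrist
```

`np.argmin` returns the first index of the minimum, which matches the "first line" criterion when scanning from top to bottom.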
The hand must be presented in the vertical position. Indeed, the main idea of the wrist detection process is to present the hand vertically and to eliminate the finger region in the first step. Second, we search, from top to bottom, for the first minimum width, which corresponds to the wrist line position. This proposition reduces the wrist line search field and eliminates the possibility of detecting the minimum width in the finger region (see Figure 3).
Fig. 3: Wrist detection process.
A. Hand adjustment into the vertical direction
The main goal of this step is to put the hand into the vertical direction. In this context, we opted to extract the straight line corresponding to the maximum number of aligned points. It can be related to an elongated finger (see Figure 4(a)) or to the forearm present in the hand object (see Figure 4(b)).
licensed use limited to: MINISTERE DE L'ENSEIGNEMENT SUPERIEUR ET DE LA RECHERCHE SCIENTIFIQUE. Downloaded on December 27,2021 at 15:04:43 UTC from IEEE Xplore. Restrict
The Hough line transform is a standard method used for line detection in image processing [24]. It can easily extract the line corresponding to the maximum number of aligned points [25].
Each line can be represented by two parameters τ and θ, where τ is the perpendicular distance from the origin to the line and θ is the angle between this perpendicular and the X coordinate axis [26], as shown in Figure 5. We use θ to define the new orientation of the hand in our adjustment process.
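This adjustment can be sketched with scikit-image's Hough transform; the sign of the rotation angle depends on the image coordinate convention and may need adjusting in practice.

```python
import numpy as np
from skimage.transform import hough_line, hough_line_peaks, rotate

def rotate_to_vertical(mask):
    """Rotate a binary mask so that its strongest Hough line becomes vertical."""
    hspace, thetas, dists = hough_line(mask)
    _, angles, _ = hough_line_peaks(hspace, thetas, dists, num_peaks=1)
    theta_deg = np.rad2deg(angles[0])   # angle of the line's normal (the paper's theta)
    # A vertical line has a horizontal normal (theta = 0), so undo theta.
    return rotate(mask.astype(float), -theta_deg, resize=True, order=0)
```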
Fig. 4: Line detection and image rotation in the vertical direction:
(a) Example of line detected in the finger region,
(b) Example of line detected in the forearm region.
Fig. 5: Polar line representation.
B. Proposed wrist line localization process
To keep only the hand, we first bound all the detected pixels in the rectangular box having the smallest perimeter, and then we estimate the position of the wrist, which constitutes the extremity of the hand and connects it to the forearm.
The wrist lies in the lower part of the image and is characterized by the minimum width of the forearm. It therefore corresponds to the lowest horizontal line of the detected object characterized by the minimum number of white pixels (in the hand region) and the maximum number of black pixels (in the background region).
To ensure a good wrist line detection in the wrist region, and not in the finger or forearm region, we propose to find the wrist after dividing the bounding region into 4 equal parts and to search for it only in the lower three parts, marked with red circles in Figure 6. Indeed, the first part contains the finger and palm information.
In the next step, we remove from the image all the detected pixels below the wrist, in order to keep only the hand, as shown in Figure 7.
Fig. 6: Division process: (a) Original image,
(b) Rotated image with the fixed wrist search blocks marked with red circles.
Fig. 7: Wrist localization results: (a) Original image,
(b) Final hand detection after the wrist localization process.
IV. FEATURE EXTRACTION FOR HAND DESCRIPTION
Extracting good features is crucial to gesture recognition. The features of an image describe its content, such as color, texture and shape. In our context, shape is the most important feature, since the color and texture of the hand remain unchanged across gestures. As a result, shape has become one of the most promising descriptors, and several approaches based on it have been suggested. These descriptors can be classified into two categories: contour-based shape descriptors and region-based shape descriptors.
Contour-based shape descriptors include many transformations such as the Fourier Transform (FT) [27], the Wavelet Transform (WT) [28], Curvature Scale Space (CSS) [29], etc. These descriptors use only boundary information and ignore the interior of the shape. Moreover, they cannot be used with disjoint shapes, where the boundary information is unclear.
With region-based approaches, shape descriptors use all the
pixel information within a region. These descriptors include
Geometric Moments (GM) [30], Angular Radial Transform
(ART) [31], Zernike Moments (ZM) [32], Generic Fourier
Descriptors (GFD) [33], etc. Although these descriptors are
sensitive to noise and shape variations [33] [32], they provide
satisfactory results.
Consequently, in this paper we have chosen to use region-based descriptors. This choice is based on their invariance to geometric transformations and their ability to characterize the hand shape. Also, to highlight the superiority of region-based shape descriptors, we compare them with a common contour-based shape descriptor, the Fourier Descriptor (FD).
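As an illustration of the region-based family, translation- and scale-normalized geometric moments can be computed directly from the binary hand mask. This is a generic sketch of the technique, not the paper's exact feature set.

```python
import numpy as np

def central_moment(mask, p, q):
    """mu_pq: geometric moment centered on the region's centroid (translation-invariant)."""
    ys, xs = np.nonzero(mask)
    xc, yc = xs.mean(), ys.mean()
    return float((((xs - xc) ** p) * ((ys - yc) ** q)).sum())

def normalized_moment(mask, p, q):
    """eta_pq: scale-invariant normalized central moment (for p + q >= 2)."""
    mu00 = central_moment(mask, 0, 0)   # mu00 is the region area
    return central_moment(mask, p, q) / mu00 ** (1 + (p + q) / 2.0)
```

Because every pixel of the region contributes, such moments stay usable even when the boundary is noisy or disjoint, which is exactly the advantage claimed for region-based descriptors above.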
V. EXPERIMENTAL RESULTS
In this section, we present the test protocol and the experimental results of our proposed method. First, we evaluate our proposed wrist detection algorithm on a public dataset. Second, we apply our hand detection and wrist localization process to recognize Arabic digit sign language.
A. Evaluation on a public dataset
1) Database:
Our proposed wrist detection method is evaluated on a public hand gesture recognition database (HGR). It includes Polish Sign Language and American Sign Language gestures, as well as some special signs. The database was proposed to evaluate hand detection and estimation systems, and was supported by the Polish Ministry of Science and Higher Education under research grant no. IP2011 023071 from the Science Budget 2012-2013.¹
The HGR database is composed of three series: HGR1, HGR2A and HGR2B. Each series includes three kinds of data: original RGB images (jpg files), ground-truth binary skin presence masks (bmp files) and hand feature point locations (xml files). In our evaluation process, we use the HGR1 database proposed in [23], to allow a faithful comparison. It contains 899 images of 25 gestures presented by 12 individuals, with uncontrolled background and lighting conditions. Figure 8 shows an example from the HGR1 database.
2) Test protocol:
The performances are evaluated under the same conditions as in [23]. We detect the reference points U', V' and W' for each image and compare them with the ground-truth points U, V and W given in the xml file.
To verify the performance of our wrist detection process, we calculate the detection error defined in [23] as:

e = |WW'| / |UV| (1)

The wrist is considered detected if e ≤ E, where E = 1.0 is the maximal detection error. An example is illustrated in Figure 9.
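The error measure of Eq. (1) can be computed directly from the point coordinates. The point names follow [23]; the functions themselves are our sketch.

```python
import math

def detection_error(U, V, W, W_det):
    """e = |WW'| / |UV| from Eq. (1); each point is an (x, y) tuple.
    W is the ground-truth wrist point, W_det the detected one."""
    dist = lambda a, b: math.hypot(a[0] - b[0], a[1] - b[1])
    return dist(W, W_det) / dist(U, V)

def is_detected(e, E=1.0):
    """Detection succeeds when the error does not exceed the threshold E."""
    return e <= E
```

Normalizing by |UV| makes the error independent of the hand's size in the image.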
3) Wrist detection process evaluation: comparison with existing work:
All results are presented in Table I.
We can conclude that our proposed wrist detection process gives better results, in terms of detection error, than the approach of [23]. This reduction of the error can be explained by the addition of the hand orientation and division steps, which make it possible to overcome the different error
¹The data set is available at http://sun.aei.polsl.pl/~mkawulok/gestures
Fig. 8: Example from HGR1 Database.
Fig. 9: Example illustrated in [23] from a silhouette with the
ground-truth (U, V, W) and detected (U’, V’, W’) points and
possible wrist point areas. The detection errors are
(a) e = 0.13 and (b) e = 0.66.
conditions related to the finger or forearm information found in the work of [23].
Figure 10 illustrates the wrist location obtained by our approach.
Fig. 10: Obtained wrist detection results compared with
ground-truth data: work [23] is presented with a red circle and our
approach is presented with a green circle
TABLE I: Error rate results: [23]'s approach vs. our approach.
Number of errors e > E | Approach [23] | Our approach
E = 1.0 | 131.7 (14.7%) | 125 (13.9%)
E = 0.5 | 323.3 (36.0%) | 175 (19.46%)
The obtained results highlight the performance of our approach, particularly the importance of the vertical hand direction and of the elimination of the part containing the finger information.
B. Application of the proposed approach to Arabic digit sign language recognition
Arabic Sign Language is the principal means of communication among deaf and hearing-impaired people. It is not universal, and it is very complex, with special rules and grammar expressed through different signs. It can be divided into static signs (digits, alphabet) and dynamic signs (isolated and continuous words). In this work, we are interested only in the static signs.
1) Proposed database and test protocol:
Until now, Arabic sign language has received little attention due to its complexity [9]. The main problem is the absence of a standard database [34]. So, to evaluate the performance of our approach, we acquired a new database (our database) that contains 216 hand images captured under different orientations and different lighting conditions. The database covers all ten Arabic sign language digits. Figure 11 illustrates some examples from this database.
Fig. 11: Examples from our database.
This database has been split randomly into two subsets: training and test. Experiments have been performed on three random combinations. The recognition phase was executed by a k-nearest neighbor classifier.
The performances were evaluated in terms of the recognition rate, recall rate and precision rate defined in the following equations.
Recognition rate = (Total number of gestures correctly identified) / (Total number of gestures) (2)

Precision = (Count of retrieved images relevant to the query image) / (Total count of images retrieved) (3)

Recall = (Count of retrieved images relevant to the query image) / (Total count of relevant images in the database) (4)
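Equations (2)-(4) amount to plain counting over the predicted and ground-truth labels, as sketched below; the function names are ours.

```python
def recognition_rate(y_true, y_pred):
    """Eq. (2): fraction of gestures correctly identified."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def precision_recall(y_true, y_pred, cls):
    """Eqs. (3) and (4) for one gesture class `cls`."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
    retrieved = sum(1 for p in y_pred if p == cls)  # images retrieved for the class
    relevant = sum(1 for t in y_true if t == cls)   # relevant images in the database
    precision = tp / retrieved if retrieved else 0.0
    recall = tp / relevant if relevant else 0.0
    return precision, recall
```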
2) Results:
The results of all the descriptors presented in Section IV are given in Table II, Figure 12 and Figure 13.
According to these results, we can draw the following conclusions:
• According to Table II, region-based approaches such as ZM and GFD are more efficient than contour-based approaches (FT), because they use all the pixel information within the hand region.
Fig. 12: Recall rate for all descriptors.
Fig. 13: Precision rate for all descriptors.
• The ZM descriptors are the most suitable ones in this domain. They achieved very satisfactory results for the complex digits (7, 8, 9). When we examine the seventh, eighth and ninth digits (see Figure 14), we can notice their similarity, due to the absence of a specific finger for each digit (the index finger for the ninth digit, the ring finger for the eighth and the middle finger for the seventh). This gap has been well captured by the Zernike Moments descriptor, which indicates that the relevant information (the position of the hidden finger) has been encoded precisely. The performance of ZMs is essentially due to the fact that their basis functions are orthogonal. Consequently, ZMs can characterize an image with no redundancy or overlap of information between the moments. Hence they take into account all the inner details of the shape, which makes it possible to represent more information over the unit circle. The Zernike Moments descriptor therefore detects slight variations in complex shapes very well.
Fig. 14: Similarity between seven, eight and nine digits.
• The GFD descriptors achieved very satisfactory performances for the digits (1, 2, 3 and 4). When we examine the characteristics of these digits (Figure 15), we can see that they are represented by successive fingers. Each finger is elongated, rounded and convex outward. These specifications have been well detected by the polar representation. For any two points inside such a convex shape, the segment connecting them also lies inside the shape, and in polar coordinates the same point can be represented with another angle θ (θ = θ + 2kπ). The ZM descriptors are ambiguous in this situation, because they can only describe shape features in the circular direction, whereas GFD handles it easily by also representing the shape precisely in the radial direction. The Generic Fourier Descriptor therefore has a strong ability to detect shapes in general.
Fig. 15: Examples of the presentation of the one, two, three and four digits.
TABLE II: Recognition rate for all descriptors.
Descriptors | k = 1 | k = 3 | k = 5
Zernike Moment | 86.77% | 92.06% | 96.44%
Generic Fourier Transform | 77.24% | 85.18% | 87.29%
Fourier Transform | 69.83% | 86.87% | 86.27%
In addition, the results presented in Table III prove the importance of the wrist localization stage for a faithful hand feature extraction. Table III shows the decrease in the recognition rate when ZMD (85.7%) and GFD (69.73%) are used without the wrist detection step. These results also confirm the importance of the wrist localization step in the hand recognition system.
TABLE III: ZMD and GFD recognition rates without and with the wrist detection step (wds).
Descriptors | Without wds | With wds
Zernike Moment | 85.7% | 96.44%
Generic Fourier Transform | 69.73% | 87.29%
VI. CONCLUSION AND FUTURE WORKS
This paper proposes a new hand detection and wrist localization process for Arabic digit sign language recognition. The experimental results demonstrate the advantage of our proposed wrist detection method over existing works. A comparative study between different shape descriptors, in terms of gesture recognition rate and precision/recall rates, is presented. The ZM descriptors are the most suitable ones, and they achieved very satisfactory results.
As perspectives to this work, we plan to address the wrist localization step in scenes where the hand is not the only region present. We also intend to test other classifiers to improve the performance of our proposed hand and wrist detection process, to investigate new hand features, and to move toward a real-time Arabic sign language application.
REFERENCES
[1] M. G. Cho, “A new gesture recognition algorithm and segmentation
method of korean scripts for gesture allowed ink editor,” Information
Sciences, vol. 176(9), pp. 1290 – 1303, 2006.
[2] M. Maraqa and R. Abu-Zaiter, “Recognition of arabic sign language (arsl) using recurrent neural networks,” Proceedings of the 1st International Conference on the Applications of Digital Information and Web Technologies, ICADIWT, IEEE Xplore Press, Ostrava, pp. 478–481, Aug 4-6 2008.
[3] S. Bilal, R. Akmeliawati, M. El Salami, and A. Shafie, “Vision-based
hand posture detection and recognition for sign language- a study,” Inter-
national Conference On Mechatronics, ICOM, Kuala Lumpur, Malaysia,
pp. 1–6, May 7-19 2011.
[4] M. Mohandes, J. Liu, and M. Deriche, “A survey of image-based arabic
sign language recognition,” International Multi-Conference on Systems,
Signals Devices, SSD, Barcelona, Spain, pp. 1–4, Feb 11 - 14 2014.
[5] A. Jean, “The phonetics of fingerspelling,” Language and Speech,
vol. 36, pp. 471–475, 1993.
[6] M. Jebali, P. Dalle, and M. Jemni, “Hmm-based method to overcome
spatiotemporal sign language recognition issues,” International Confer-
ence on Electrical Engineering and Software Applications, Hammamet,
Tunisia, pp. 1–6, 2013.
[7] S. Liwicki and M. Everingham, “Automatic recognition of fingerspelled
words in british sign language,” IEEE Computer Society Conference
on Computer Vision and Pattern Recognition Workshops, Fontainebleau
Miami Beach, pp. 50–57, 2009.
[8] J. Zhang, W. Zhou, C. Xie, J. Pu, and H. Li, “Chinese sign language
recognition with adaptive hmm,” 2016 IEEE International Conference
on Multimedia and Expo, ICME, Seattle, USA, pp. 1–6, Jul 11-15 2016.
[9] A. A. A. Youssif, A. E. Aboutabl, and H. H. Ali, “Arabic Sign Language (ArSL) Recognition System Using HMM,” International Journal of Advanced Computer Science and Applications, IJACSA, vol. 2(11), 2011.
[10] S. Satoh, Y. Nakamura, and T. Kanade, “Name-it: Naming and detecting
faces in news videos,” IEEE MultiMedia Conference, vol. 6, pp. 22–35,
Jan-Mar(Spring) 1999.
[11] A. Senior, R. Hsu, M. A. Mottaleb, and A. Jain, “Face detection in color
images,” IEEE Trans. Pattern Anal. Mach. Intell, vol. 24, pp. 696–706,
May 2002.
[12] Y. B. Jemaa and S. Khanfir, “Automatic local gabor features extraction
for face recognition,” CoRR, vol. abs/0907.4984, 2009.
[13] M. J. Jones and J. M. Rehg, “Statistical color models with application
to skin detection,” International Journal of Computer Vision, vol. 46,
pp. 81–96, 2002.
[14] M. Kawulok, J. Kawulok, J. Nalepa, and B. Smolka, “Self-adaptive
algorithm for segmenting skin regions,” EURASIP Journal on Advances
in Signal Processing, vol. 2014, p. 170, 2014.
[15] P. Yogarajah, J. Condell, K. Curran, A. Cheddad, and P. McKevitt, “A
dynamic threshold approach for skin segmentation in color images,”
2010 IEEE International Conference on Image Processing, Hong Kong,
pp. 2225–2228, 2010.
[16] M. Kawulok, J. Kawulok, and J. Nalepa, “Spatial-based skin detection
using discriminative skin-presence features,” Pattern Recognition Let-
ters, vol. 41, pp. 3 – 13, 2014.
[17] B. Manisha, R. K. Krishna, and V. Vivek, “Simplified watershed transformation,” International Journal of Computer Science and Communication, vol. 1, pp. 175–177, Jan-Jun 2010.
[18] M. C. Christ and R. M. Parvathi, “Segmentation of medical image
using clustering and watershed algorithms,” American Journal of Applied
Sciences, vol. 8, pp. 1349–1352, 2011.
[19] X. Yang, H. Li, and X. Zhou, “Nuclei segmentation using marker-
controlled watershed, tracking using mean-shift, and kalman filter in
time-lapse microscopy,” IEEE Transactions on Circuits and Systems,
vol. 53, pp. 2405–2414, Dec 2006.
[20] B. Paulson, D. Cummings, and T. Hammond, “Object interaction detec-
tion using hand posture cues in an office setting,” International Journal
of Human-Computer Studies, vol. 69, pp. 19 – 29, 2011.
[21] A. Licsár and T. Szirányi, “Hand gesture recognition in camera-projector system,” Computer Vision in Human-Computer Interaction, ECCV 2004 Workshop on HCI, Prague, Czech Republic, Proceedings, pp. 83–93, May 16 2004.
[22] J. Nalepa, T. Grzejszczak, and M. Kawulok, “Wrist localization in
color images for hand gesture recognition,” Man-Machine Interactions
3, Springer International Publishing, Cham, pp. 79–86, 2014.
[23] T. Grzejszczak, J. Nalepa, and M. Kawulok, “Real-time wrist localization in hand silhouettes,” Proceedings of the 8th International Conference on Computer Recognition Systems, CORES, Milkow, pp. 439–449, May 27-29 2013.
7. [24] H. Ye, G. Shang, L. Wang, and M. Zheng, “A new method based
on hough transform for quick line and circle detection,” 2015 8th
International Conference on Biomedical Engineering and Informatics,
BMEI, Shenyang, China, pp. 52–56, Oct 14-16 2015.
[25] D. Duan, M. Xie, Q. Mo, Z. Han, and Y. Wan, “An improved hough
transform for line detection,” International Conference on Computer
Application and System Modeling, ICCASM, Taiyuan, China, vol. 2,
pp. 354–357, 2010.
[26] D. Shi, L. Zheng, and J. Liu, “Advanced hough transform using a
multilayer fractional fourier method,” IEEE Transactions on Image
Processing, vol. 19(6).
[27] M. Bober, “Mpeg-7 visual shape descriptors,” IEEE Transactions on
Circuits and Systems for Video Technology, vol. 11, pp. 716–719, 2001.
[28] N. Nacereddine, S. Tabbone, D. Ziou, and L. Hamami, “Shape-based
image retrieval using a new descriptor based on the radon and wavelet
transforms,” 2010 20th International Conference on Pattern Recognition,
Istanbul, Turkey, Aug 23- 26 2010.
[29] F. Berrada, D. Aboutajdine, S. E. Ouatik, and A. Lachkar, “Review
of 2d shape descriptors based on the curvature scale space approach,”
2011 International Conference on Multimedia Computing and Systems,
Ouarzazate, Morocco, Apr 07- 09 2011.
[30] V. Darlagiannis, K. Moustakas, and D. Tzovaras, “On geometric and
soft shape content-based search,” 2010 IEEE International Conference
on Image Processing, Hong Kong, Sep 26- 29 2010.
[31] J. Fang and G. Qiu, “Human face detection using angular radial
transform and support vector machines,” International Conference on
Image Processing, ICIP, Barcelona, Spain, vol. 1, pp. I–669–72, 2003.
[32] J. Davis and M. Shah, “Recognizing hand gestures,” European Conference on Computer Vision, ECCV, Stockholm, Sweden, pp. 331–340, May 2-6 1994.
[33] D. Zhang and G. Lu, “Enhanced generic fourier descriptors for object-
based image retrieval,” IEEE International Conference on Acoustics,
Speech, and Signal Processing (ICASSP), Orlando, Florida, USA, vol. 4,
pp. 3668–3671, May 13-17 2002.
[34] M. Alfonse, A. Ali, A. S. Elons, N. L. Badr, and M. Aboul-Ela, “Arabic
sign language benchmark database for different heterogeneous sensors,”
2015 5th International Conference on Information Communication Tech-
nology and Accessibility,ICTA, Marrakech, Dec 21- 23 2015.