Project Mentor:
     Mr. Aleem Khalid Alvi             [aleem_alvi@yahoo.com]   [akalvi@ssuet.edu.pk]
             Assistant Professor


Team Members:
     Mr. Ali Muzzaffar               [ali_muzzafar@yahoo.com]   [alim@ssuet.edu.pk]
     Mr. Mehmood Usman             [apnamehmood@yahoo.co.uk]    [mgazdhar@ssuet.edu.pk]
     Mr. Suleman Mumtaz                [smkhowaja@yahoo.com]    [smumtaz@ssuet.edu.pk]
     Mr. Yousuf Bin Azhar                   [musuf@yahoo.com]   [muhybina@ssuet.edu.pk]




                            http://www.boltayhaath.cjb.net

                        Sir Syed University of Engineering & Technology

1. ABSTRACT

Humans come to know one another by conveying their ideas, thoughts, and experiences to the
people around them. There are numerous ways to achieve this, and the best among them is the
gift of “Speech”. Through speech everyone can convincingly convey their thoughts and
understand each other. It would be an injustice if we ignored those who are deprived of this
invaluable gift.

The only means of communication available to the vocally disabled is the use of “Sign
Language”. Sign language, however, limits them to their own world; this limitation prevents
them from interacting with the outside world to share their feelings, creative ideas and
potential. Another problem is that very few hearing people ever learn to sign, which further
increases the isolation of the deaf community.

Technology is one way to remove this hindrance and benefit these people, and the project
Boltay Haath is one such attempt to solve this problem by computerized recognition of sign
language. Boltay Haath is an Urdu phrase which means ‘Talking Hands’. The basic concept
involves the use of special data gloves connected to a computer while a vocally disabled
person (who is wearing the gloves) makes the signs. The computer analyzes these gestures
and synthesizes the sound for the corresponding word or letter for ordinary people to
understand.

Several researchers have explored these possibilities and have successfully achieved finger-
spelling recognition with high levels of accuracy, but progress in the recognition of sign
language as a whole has been limited.

This project is an attempt to recognize Pakistan Sign Language (PSL), which has not been
done in any other system. Furthermore, the Boltay Haath project aims to produce sound
matching the accent and pronunciation of the people of the region in which PSL is used.
Since only single-handed gestures are considered in this project, it was necessary to select a
subset of PSL for the implementation of Boltay Haath, as sampling most or all of the 4000
signs in PSL would take a vast amount of time.





2. SYSTEM OVERVIEW
The objective was to develop a computerized Pakistan Sign Language (PSL) recognition
system as an application of Human Computer Interface (HCI). The system considers only
single-handed gestures; therefore a subset of PSL has been selected for the implementation of
Boltay Haath. The basic concept involves the use of computer-interfaced data gloves worn by
a disabled person who makes the signs. The computer analyzes these gestures, minimizes
variations between signers, and synthesizes the sound for the corresponding word or letter for
hearing people to understand. The basic working of the project is depicted in the following
figure.




                              Figure 2.1 - System Diagram

The above diagram clearly explains the scope and use of the Boltay Haath system. The
system aims at bridging communication gaps between the deaf community and other people.
When fully operational, the system will help minimize communication gaps, ease
collaboration, and enable the sharing of ideas and experiences.

2.1 PERFORMANCE MEASURES
The following performance parameters were kept in mind during the design of the project:
    • Recognition time: A gesture should take approximately 0.25 to 0.5 second in the
       recognition process in order to respond in real time.
    • Synchronized speech synthesis: The speech output corresponding to a gesture should
       not lag behind the gesture output by more than 0.25 seconds.
    • Continuous and automatic recognition: To be more natural the system must be
       capable of recognizing the gestures continuously without any manual indication or
       help for demarcating the consecutive gestures.
    • Recognition accuracy: The system must recognize gestures with an accuracy of
       80 to 90 percent.





2.2 DESIGN METHODOLOGY
A waterfall-plus-iterative model was followed for the development of Boltay Haath. This
model was selected because a thorough design of the system was needed before development
began: all the specifications had to be outlined in detail and all issues worked out so that the
project could be carried out within its time and cost constraints. In other words,
architecture-first development was attempted. After this stage the team had developed a broad
understanding of the system, and trouble spots in the design could easily be sensed, so the
natural next step was to repeat the critical stages of the process to iron out any problems
along the way and to evaluate design alternatives and tradeoffs. An object-oriented approach,
being the most practical way of developing this kind of project, was the obvious choice.
Test plans were also designed to test the system systematically; the sub systems were tested
separately as well as in combination.

                                                     Five Improvements for the Waterfall
                                                     Model to Work:
                                                     - Complete program design before
                                                       analysis and coding begins.
                                                     - Maintain current and complete
                                                       documentation.
                                                     - Do the job twice, if possible.
                                                     - Plan, control, and monitor testing.
                                                     - Involve the user.

                        Figure 2.2 - Waterfall plus Iterative Model

2.3 UNIQUE AND INNOVATIVE IDEAS
Researchers in different regions of the world have contributed towards the recognition of
their regional sign languages, but so far no work has been done on the recognition of the sign
language of our region (PSL). Boltay Haath is therefore the first system to contribute to this
noble cause. Furthermore, the system aims to produce sound matching the accent and
pronunciation of the people of the region in which PSL is used.

The recognition systems developed to date usually solve the problem of gesture demarcation
through various manual techniques and operations. To make the system more natural and
interactive, Boltay Haath performs real-time continuous recognition of gestures, so no
manual indication or signal is needed.

Although the primary objective of Boltay Haath is to recognize Pakistan Sign Language, the
system is capable of recognizing any other sign language of the world by learning its
respective gestures.

The Boltay Haath system can be modified for use on handheld devices, making the system
more portable and easier to use in daily life. For this purpose the Microsoft .NET Compact
Framework is the best candidate, since the system is already being developed using current
.NET technologies.


3. IMPLEMENTATION AND ENGINEERING CONSIDERATIONS

3.1 PSL SIGNS USED IN BOLTAY HAATH
The sign language has been divided into two sub-domains, English and Urdu, because of the
similarity of some gestures. Both English and Urdu contain gestures for words as well as
letters, and gestures have been categorized as dynamic or static. Urdu has 38 letters, of which
a few are dynamic; Urdu words are of both types, one-handed and two-handed. English has
26 letters, of which two are dynamic; English words are likewise both one-handed and
two-handed. PSL also contains domain-specific signs, for example computer terms,
environmental terms and traffic terms.




                  Figure 3.1 - English and Urdu Alphabet Signs in PSL

3.2 SYSTEM ARCHITECTURE
The Boltay Haath system is divided into the following sub systems:

   •    Gesture Database – Contains all the persistent data related to the system.
   •    Gesture Acquisition – Gets the state of the hand (position of fingers, roll and pitch)
        from the glove and conveys it to the main software.
   •    Training – Uses the collected data to train the system.
   •    Gesture Recognition Engine – Analyzes the input to recognize the gesture made by
        the user. Two different techniques have been implemented for this purpose, namely
        Artificial Neural Network (ANN) and Statistical Template Matching (STM).
   •    Gesture Output – Produces gesture and textual output, converting the words/letters
        obtained after gesture recognition into the corresponding sound.



   •    Accelerometer – Detects motion of the hand, in order to demarcate the start and end
        of gestures for continuous gesture recognition.

The following figure illustrates the architecture of the Boltay Haath system:




                                   Figure 3.2 - System Architecture [1]

A detailed description of the architecture, the implementation techniques and the algorithms
of the system is given below.

3.3 GESTURE DATABASE
A particular input sample in this system is defined by the combination of five sensors for
fingers and one tilt sensor for roll and pitch which is stored in the Gesture Database during
the data acquisition phase. The gestures in the database are organized with respect to there
area of use i.e. its domain. For example, the alphabet domains contain the Urdu and English
alphabet gestures. Word domains may contain the list of emergency gestures, daily routine
gestures and other special gestures. The database also stores relevant data like gesture’s
phoneme†, training results of STM and ANN and Registered Users‡ information.

3.4 DATA ACQUISITION
This sub system captures the state of the hand (flexure of fingers, roll and pitch) from the
glove and stores it in the Gesture Database for further processing. It handles all the data
coming to and from the Data Glove. The driver software provided by the vendor had to be
adapted for use in the .NET managed environment, and hence a wrapper class for the Glove
Driver was written in C# for use in the system. During acquisition the training data are
identified by the Gesture ID of their corresponding signs and stored in a table for later use by
the training algorithms.

† Phoneme is the smallest phonetic unit in a language that is capable of conveying a distinction in meaning.
‡ Registered Users are those users who participated in the training of the system.
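To make the idea of the wrapper concrete, the following is a minimal sketch of how an
unmanaged driver can be exposed to managed C# code through P/Invoke. The DLL name and
entry points below are hypothetical placeholders, not the actual 5DT SDK signatures:

```csharp
using System.Runtime.InteropServices;

// Hedged sketch of a managed wrapper over an unmanaged glove driver.
// "glovedriver.dll", OpenGlove and ReadSensor are hypothetical names.
public class GloveWrapper
{
    [DllImport("glovedriver.dll")]
    private static extern int OpenGlove(string port);

    [DllImport("glovedriver.dll")]
    private static extern int ReadSensor(int handle, int sensorIndex);

    private readonly int handle;

    public GloveWrapper(string port)
    {
        handle = OpenGlove(port);   // e.g. "COM1" for the RS-232 interface
    }

    // Returns the raw flexure value (0..255) for one finger sensor.
    public int GetFinger(int index)
    {
        return ReadSensor(handle, index);
    }
}
```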

An input sample consists of five values ranging from 0 to 255, each representing the state of
the sensor on one of the five fingers of the glove. The sensors for roll and pitch have been
ignored in the case of non-moving gestures, since their values do not uniquely identify an
alphabet sign [2].
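A minimal sketch of how one such input sample might be represented in code, under the
assumptions above (the type and field names are illustrative, not the project's actual classes):

```csharp
// One glove sample: five finger flexure values in 0..255 plus the
// tilt sensor's roll and pitch. Roll and pitch are ignored for
// non-moving gestures, as noted above.
public struct GloveSample
{
    public byte[] Fingers;   // five flexure values, 0..255
    public byte Roll;
    public byte Pitch;

    public GloveSample(byte[] fingers, byte roll, byte pitch)
    {
        Fingers = fingers;
        Roll = roll;
        Pitch = pitch;
    }
}
```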
The sequence diagram showing the use of the data acquisition interface in gesture recognition
is shown below.




                         Figure 3.3 – Data Acquisition Sequence Diagram


3.5 DATA GLOVE
The input device used in Boltay Haath is the
5DT Data Glove 5. It is equipped with sensors
that sense the movements of the hand and
interfaces those movements with a computer.
The 5DT Data Glove 5 measures finger flexure
and the orientation (pitch and roll) of the user’s
hand. It offers 8-bit flexure resolution, a
platform-independent serial port interface
(RS-232), and a built-in 2-axis tilt sensor.


                                                       Figure 3.4 - Components of 5DT Data Glove 5

3.6 TRAINING
The Training sub system trains the system so that it can perform gesture recognition
afterwards. The training process is different for the two different modes of operation of
Gesture Recognition Engine (GRE) i.e., Statistical Template Matching (STM) and Artificial
Neural Network (ANN). In both cases training is a batch process.

For the generalized recognition of gestures it was necessary to collect data from different
users; the system was trained using data obtained from six different signers [3]. Initially,
training data was collected for the non-moving gestures, as in [4], of English as well as Urdu,
since PSL contains both types of signs [5]. This was done because of the limitations of the
input device: the Data Glove 5 does not provide abduction status, and there is no input of any
kind about the location of the glove in space.

The separate training processes for STM and ANN are discussed below.

3.6.1 STATISTICAL TEMPLATE SPECIFICATION (STS)
The idea is to demarcate different gestures by calculating the mean (µ) and standard
deviation (σ) of all the sensors for each gesture in the training set. The resultant (µ, σ) pairs,
called templates, are stored in the gesture database for later use in gesture recognition;
hence the process is named “Template Specification”. The mean and standard deviation are
calculated for each sensor of each gesture as follows:

    \mu_{(l,m)} = \frac{1}{n} \sum_{i=1}^{n} x_i                                        (3.1)

    \sigma_{(l,m)} = \sqrt{ \frac{1}{n} \sum_{i=1}^{n} \left( x_i - \mu_{(l,m)} \right)^2 }    (3.2)

Here, x_i is the ith sensor value, n is the number of samples, µ(l,m) is the mean of the lth
sensor of the mth gesture, and σ(l,m) is the standard deviation of the lth sensor of the mth
gesture.
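As an illustration, a straightforward implementation of equations (3.1) and (3.2) for one
sensor of one gesture might look as follows (method and parameter names are illustrative,
not the project's actual code):

```csharp
using System;

public static class TemplateSpecification
{
    // Computes the (mu, sigma) template for one sensor of one gesture
    // from its n training samples, per equations (3.1) and (3.2).
    public static void ComputeTemplate(double[] samples,
                                       out double mu, out double sigma)
    {
        int n = samples.Length;
        double sum = 0.0;
        for (int i = 0; i < n; i++)
            sum += samples[i];
        mu = sum / n;                           // equation (3.1)

        double sumSq = 0.0;
        for (int i = 0; i < n; i++)
            sumSq += (samples[i] - mu) * (samples[i] - mu);
        sigma = Math.Sqrt(sumSq / n);           // equation (3.2)
    }
}
```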




                             Figure 3.5 – STS Sequence Diagram



3.6.2 ARTIFICIAL NEURAL NETWORK TRAINING (COMMITTEE SYSTEM)
The Artificial Neural Network Training (ANN) sub system allows the training of the various
neural networks in the system. It collects data from the Gesture Database, applies a
supervised learning algorithm (Backpropagation [6]) to train the neural networks, and
finally saves the networks in the database.

A single network could not converge on the available data and did not perform well, so it
was decided to tackle the problem with a divide-and-conquer approach. This technique,
labelled a committee system [7], combines the outputs of multiple networks, called experts,
to produce a classification rather than using only a single network. The rationale is the
realization that there is not one global minimum into which every net trains, but many
minima in which adequate classification of the training examples can be obtained. By
combining the output of several networks it may therefore be possible to gain better
generalization than that of any single network.

Each small network for a particular gesture is called an ‘expert’. The training data for each
expert contains an equal number of samples of the two classes that it classifies. For example,
the training set for the expert for ‘A’ contains half samples of ‘A’, while the remaining half
comprises the rest of the signs. Backpropagation was used to train the experts, with a
learning rate of 0.1 and a training set of more than 2000 samples for each expert. The input
was scaled further, from the range 0 to 255 down to –1.28 to 1.27; this further scaling
reduced the error and provided better results after training. This is sometimes called
pre-processing.
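The scaling step can be expressed as a simple linear mapping; the sketch below reproduces
the stated ranges (0 maps to -1.28 and 255 maps to 1.27), though the exact formula used in
the project is not spelled out in the text:

```csharp
public static class PreProcessing
{
    // Scales a raw sensor value from 0..255 into -1.28..1.27
    // before it is fed to the expert networks.
    public static double ScaleInput(byte raw)
    {
        return raw / 100.0 - 1.28;   // 0 -> -1.28, 255 -> 1.27
    }
}
```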




                          Figure 3.6 – ANN Training Sequence Diagram



3.7 GESTURE RECOGNITION ENGINE
The Gesture Recognition Engine is the core of the system. It performs gesture recognition
using the two techniques (STM and ANN). It interacts with most of the subsystems. It takes
gesture input from the Gesture Input subsystem, identifies it, and gives output through the
Output subsystem in text and speech. The separate recognition processes for STM and ANN
are discussed below.

3.7.1 STATISTICAL TEMPLATE MATCHING
The statistical model used in Boltay Haath is the simplest approach to recognize postures
(static gestures) [8], [9]. The model used is known as “Template Matching” or “Prototype
Matching” [10]. The idea is to demarcate different gestures by calculating the mean (µ) and
standard deviations (σ) of all the sensors for a gesture and then those input samples that are
within limits bounded by an integral multiple of standard deviation are recognized to be
correct. Gesture boundary [11] for each sensor is defined as,


                                                                                          (3.3)
Here, µ is the mean and σ is the standard deviation of that sensor whose gesture boundary is
to be defined. Similarly gesture boundaries for each sensor of all the gestures are defined and
used in Pattern Matching.

3.7.1.1 ALGORITHMIC MODEL

        a) PATTERN RECOGNITION
        After Statistical Template Specification (STS), test samples are provided to the pattern
        recognition module, which analyzes them using the statistical model [12]. The upper
        and lower limits for the value of a sensor for a particular gesture are defined using the
        previously calculated standard deviation for that sensor. To enhance the accuracy of
        gesture recognition, various integral multiples of σ are used, denoted by k in (3.3). The
        limits for any given gesture are defined as:
            \text{upper limit} = \mu + k\sigma                                        (3.4)

            \text{lower limit} = \mu - k\sigma                                        (3.5)
        Given the above-mentioned criteria, any given input can be classified as a particular
        gesture if all the sensor values of the test sample lie within these limits (i.e. the gesture
        boundary). These values are retrieved from the gesture database. The values of k used
        for gesture recognition in Boltay Haath range from 1 to 3, providing tolerances ranging
        from 2σ to 6σ. The performance achieved by varying the values of k is discussed later
        in Testing and Verification.
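As an illustration, the boundary test of (3.4) and (3.5) can be sketched as follows: a sample
is classified as a particular gesture only if every sensor value lies within µ ± kσ for that
sensor (all names are illustrative, not the project's actual code):

```csharp
public static class TemplateMatching
{
    // Returns true if every sensor value of the test sample lies
    // within the gesture boundary mu +/- k*sigma, per (3.4)/(3.5).
    public static bool MatchesGesture(double[] sample,
                                      double[] mu, double[] sigma, int k)
    {
        for (int i = 0; i < sample.Length; i++)
        {
            double upper = mu[i] + k * sigma[i];   // equation (3.4)
            double lower = mu[i] - k * sigma[i];   // equation (3.5)
            if (sample[i] < lower || sample[i] > upper)
                return false;   // outside this gesture's boundary
        }
        return true;
    }
}
```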

     b) AMBIGUITY AMONG OUTPUTS
     Sometimes, due to ambiguity between two or more gestures, STM may produce
     multiple outputs. The ambiguity is created by the overlapping of different gesture
     boundaries, and the overlap increases as the value of k is increased from 1 to 3. To
     cater to this problem the method of Least Mean Squares (LMS) is used. Figure 3.7
     shows two ambiguous signs, ‘R’ and ‘H’.

                                              Figure 3.7 - Ambiguous Signs

     c) LMS FOR REMOVING AMBIGUITY
     There are cases where more than one gesture is a candidate for output. To overcome
     this type of situation the system calculates the Least Mean Squares (LMS) [13] value
     of each candidate gesture and then selects the one with the minimum LMS value. It is
     calculated as,
         \text{LMS} = \frac{1}{n} \sum_{i=1}^{n} \left( x_i - \mu_i \right)^2            (3.6)

     Here, x_i denotes the value of the ith sensor from the test sample, µ_i denotes the
     template mean for the ith sensor, and n is the number of sensors.

     The LMS value for each candidate gesture is calculated, and the gesture with the least
     LMS value is selected as the final output. The use of LMS is justified by the results:
     analysis of the system’s performance shows that LMS provides accurate results 60%
     of the time.
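A sketch of this disambiguation step, assuming equation (3.6) as reconstructed above (names
and array layout are illustrative):

```csharp
public static class LmsDisambiguation
{
    // Among the candidate gestures, returns the index of the one
    // whose template means are closest to the test sample in the
    // least-mean-squares sense, per equation (3.6).
    public static int SelectByLms(double[] sample, double[][] candidateMeans)
    {
        int best = -1;
        double bestLms = double.MaxValue;
        for (int c = 0; c < candidateMeans.Length; c++)
        {
            double sum = 0.0;
            for (int i = 0; i < sample.Length; i++)
            {
                double d = sample[i] - candidateMeans[c][i];
                sum += d * d;
            }
            double lms = sum / sample.Length;   // equation (3.6)
            if (lms < bestLms)
            {
                bestLms = lms;
                best = c;
            }
        }
        return best;
    }
}
```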




              Figure 3.8 – STM Gesture Recognition Sequence Diagram


3.7.2 ARTIFICIAL NEURAL NETWORK (ANN)
In this mode, the GRE takes input data and feeds it to multiple artificial neural networks in
parallel. The approach taken is to initially process the input data so as to produce a
description in terms of the various features (handshape, orientation and motion) of a sign. The
sign can then be classified on the basis of the feature vector thus produced. This mode uses
our Artificial Neural Network Library (ANNLIB) to run Multi Layer Perceptrons (MLPs) for
recognition.

3.7.2.1 COMMITTEE SYSTEM FOR RECOGNITION
        The various experts (neural networks for each gesture) were trained using a “one
        against all” technique in which each network is trained for a particular sign to give a
        positive response for that sign and a negative one for all the others. So in the final
        system all the experts have the same architecture and are given the same input.
        Figure 3.9 shows the committee system used.




                 (Input → expert networks in parallel → Voting Mechanism → Output)

                                    Figure 3.9 - Committee System




         a) ARCHITECTURE OF EXPERTS
         The architecture of the experts used in the committee system is 5:8:1, i.e., 5 inputs, 8
         hidden nodes and 1 output node. The activation function was sigmoid logistic for
         nodes in the hidden layer and hyperbolic for output nodes.

         b) VOTING MECHANISM
         The voting mechanism takes the outputs of all the experts as its input. It identifies the
         resultant gesture by examining the outputs of all the experts and selecting the one
         with a positive result.




        c) FINAL CLASSIFICATION
        Since the experts could not be optimally trained, multiple experts can give a positive
        result. To solve this problem, the voting mechanism multiplies the result of each
        expert by that expert’s accuracy, giving more weight to the results of more accurate
        experts over less accurate ones. Finally, the output with the largest positive value is
        selected as the recognized gesture.
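A minimal sketch of this accuracy-weighted vote (the array layout and names are
assumptions, not the project's actual code):

```csharp
public static class VotingMechanism
{
    // Each expert's output is multiplied by that expert's measured
    // accuracy; the gesture whose expert has the largest positive
    // weighted value is selected. Returns -1 if no expert is positive.
    public static int Vote(double[] expertOutputs, double[] expertAccuracy)
    {
        int winner = -1;
        double best = 0.0;   // only positive results count
        for (int e = 0; e < expertOutputs.Length; e++)
        {
            double weighted = expertOutputs[e] * expertAccuracy[e];
            if (weighted > best)
            {
                best = weighted;
                winner = e;
            }
        }
        return winner;
    }
}
```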




                    Figure 3.10– ANN Gesture Recognition Sequence Diagram


3.8 OUTPUT
The output of the system has two forms: formatted text and synthesized speech. The more
important is the speech output, as it accomplishes the objective of the system.



3.8.1 TEXT
This subsystem renders the recognized sign as formatted text; Urdu signs are output in
Roman Urdu†. This text output is then used by the text-to-speech module for its processing.

3.8.2 SPEECH
The text-to-speech subsystem converts text into synthesized Urdu/English speech using the
Microsoft Speech SDK version 5.1 [14]. The lexicon and phoneme sets have been modified
so that words are pronounced correctly in the local accent.
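For illustration, producing speech through the SAPI 5.1 COM automation interface
(Interop.SpeechLib) from C# might look like the sketch below; the exact project setup and
voice configuration may differ:

```csharp
using SpeechLib;   // COM interop assembly generated from SAPI 5.1

public class Speaker
{
    private readonly SpVoice voice = new SpVoice();

    // Speaks the recognized word or letter asynchronously so that
    // gesture recognition is not blocked while audio plays.
    public void Say(string text)
    {
        voice.Speak(text, SpeechVoiceSpeakFlags.SVSFlagsAsync);
    }
}
```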




                           Figure 3.11– Speech Output Sequence Diagram

3.9 ACCELEROMETER
The accelerometer component is used to detect signs continuously, without any manual aid
indicating the start or end of a gesture. It automatically identifies the end of one gesture and
the start of the next; in this way gestures are recognized in a continuous fashion.

3.9.1 ALGORITHMIC MODEL
Acceleration is calculated for each sensor by averaging the differences between the last n
inputs (in a sliding window fashion). When the acceleration of all the sensors is below a
certain threshold value, the system identifies the state of hand as stationary and sends the
sensor values for recognition to the engine. As soon as the acceleration exceeds the threshold
value the system marks the hand as in motion and stops recognition. The sliding window size
and the threshold value are adjusted so that the user need not make a deliberate effort to stop
for some time in order to get the sign recognized.

†
    Roman Urdu - Urdu written with the use of English alphabets.

The user can set the threshold value and sliding window size through an accelerometer
interface, according to his or her needs.
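A sketch of the per-sensor logic described above, assuming a simple queue-based sliding
window (the class shape and defaults are illustrative, not the project's actual code):

```csharp
using System;
using System.Collections.Generic;

public class SensorAccelerometer
{
    private readonly Queue<int> window = new Queue<int>();
    private readonly int windowSize;     // e.g. 8, as in Figure 3.12
    private readonly double threshold;   // e.g. 1.8

    public SensorAccelerometer(int windowSize, double threshold)
    {
        this.windowSize = windowSize;
        this.threshold = threshold;
    }

    // Feeds one new sensor reading; returns true if the sensor is
    // currently judged to be in motion (average difference between
    // consecutive readings in the window exceeds the threshold).
    public bool Update(int sensorValue)
    {
        window.Enqueue(sensorValue);
        if (window.Count > windowSize)
            window.Dequeue();
        if (window.Count < 2)
            return false;   // not enough data yet; treat as stationary

        int prev = int.MinValue, sum = 0, count = 0;
        foreach (int v in window)
        {
            if (prev != int.MinValue)
            {
                sum += Math.Abs(v - prev);
                count++;
            }
            prev = v;
        }
        double a = (double)sum / count;
        return a > threshold;
    }
}
```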


          Window size = 8
          Sensor value stream: 150 151 152 153 154 156 159 161 165 168 172 178 182 185 186 190
          Successive differences within the current window: 2, 3, 2, 4, 3, 4, 6
          Threshold = 1.8
          A = (2+3+2+4+3+4+6)/7 = 24/7 = 3.43
          A > threshold, therefore the hand is in motion

                            Figure 3.12(a) - Motion Detection (Hand in motion)

The above figure shows how the accelerometer determines that the hand is in motion. It
shows the state of a sliding window for a single sensor; the average of the successive
differences within the window is above the threshold value, so the system identifies the
sensor as being in motion.

          Window size = 8
          Sensor value stream: 150 151 152 153 154 155 155 156 157 158 158 159 160 161 161 161
          Successive differences within the current window: 1, 0, 1, 1, 1, 0, 1
          Threshold = 1.8
          A = (1+0+1+1+1+0+1)/7 = 5/7 = 0.714
          A < threshold, therefore the hand is stationary

                           Figure 3.12(b) - Motion Detection (Hand is stationary)

The above figure shows how the accelerometer determines that the hand is stationary. It
shows the state of the sliding window for a single sensor; the average of the successive
differences within the window is below the threshold value, so the system identifies the
sensor as stationary.

The accelerometer is used on a per-sensor basis, so for five sensors, five accelerometer
objects are used, each continuously provided with its corresponding sensor value. The
accelerometer design used so far is limited to either static gestures or dynamic gestures.




3.10 DEVELOPED TOOLS
In the course of developing the system, various tools and libraries were created. These
include a C# wrapper class for the 5DT Data Glove 5 driver (originally written in unmanaged
VC++ and converted to managed C# code), an artificial neural network library, ANNLib,
written in C#, and a performance evaluation tool for measuring the efficiency of the STM
and ANN recognition systems.

3.10.1 PERFORMANCE EVALUATION TOOL FOR GRE (PETool)
The PETool evaluates the results obtained by applying test data to the STM and ANN
gesture recognition engines, and generates reports and graphical views of the data for
performance evaluation purposes. The simulation data is used to evaluate whether the current
configurations of STM and ANN provide acceptable results.

3.11 TRADEOFFS
Many tradeoffs regarding accuracy and efficiency were made during the design and
implementation of the system. A major issue was the training of the neural networks: the
amount of training data, the optimal architecture of the networks and the classification
mechanism were a few of the considerations. Quick training called for a small training data
set, while greater accuracy called for more data and hence more training time.

The quality of training data was of major concern for STM as well as ANN. The greater the
number of registered users, the better the generalization. But more data does not come
without its share of bad samples.

In STM recognition, gesture boundaries of sensors are defined as µ ± kσ; the system uses
k = 3 after trying the values 1, 2 and 3. The model µ ± 3σ covers a large variation in the data
(up to 6σ) but at the same time increases the overlap between different gestures. This overlap
creates ambiguity among outputs, which has to be removed with the use of LMS.

A similar case can be made for speech output. Text-to-speech provides an efficient way of
producing speech output, but the quality of the sound produced is not on par with
pre-recorded human voice. However, recorded voices incur a heavy processing cost on the
system when it comes to real-time recognition.

The accelerometer is used to filter the data stream coming from the Data Glove in a thread.
The performance of the thread obviously degrades if a decision-making block is executed on
every cycle, and the same is the case with the accelerometer component [15].

3.12 IMPLEMENTATION TOOLS
The Boltay Haath system has been developed in C# using Visual Studio .NET 2002. The
gesture database is maintained in an MS Access database file. Windows being the platform
for the project, all the user interfaces and input components are standard Windows objects.
The Microsoft Speech SDK 5.1 was used for speech output.






3.13 COST
The cost of the components used in this project is given below.

                                  Item                     Cost
                                  5DT Data Glove 5         $300

3.14 TESTING AND VERIFICATION
The sub systems were tested separately to check their performance in various scenarios.
Because Boltay Haath has a highly modular design, top-down and bottom-up integration
occurred simultaneously. However, the system was integrated incrementally in order to
control the number of bugs that needed to be fixed at any given time. Tests were conducted
in a black-box fashion.

Finally, the software was tested against the performance criteria set during system design
specification. These tests were performed with the signers, since it was deemed necessary to
determine whether the available hardware meets the performance criteria.


3.14.1 TEST FOR RECOGNITION ACCURACY
The results were obtained using the PETool, which was specially developed for measuring
the performance of the system.


a) STATISTICAL TEMPLATE MATCHING TEST RESULTS
                  Domain                             Accuracy (%)
                                         k=1            k=2                k=3
            English Alphabets             24             73                 80
            Urdu Alphabets                24             84                 88

                Table 3.1 – Performance Result (Statistical Template Matching)




                   Figure 3.13(a) – Alphabet wise recognition accuracy for STM - English







                   Figure 3.13 (b) – Alphabet wise recognition accuracy for STM - Urdu


b) ANN COMMITTEE SYSTEM TEST RESULTS

                        Domain                            Accuracy (%)
                     24 Handshapes                             84

            Table 3.2 – Performance Result (ANN classification with committee system)




                Figure 3.14 – Alphabet wise accuracy for ANN Committee System - English



3.14.2 TEST FOR RECOGNITION TIME
Multiple gestures were provided to the system in sequence and the average time was
calculated using the system clock. Under normal conditions the average recognition time was
0.4 seconds.


3.14.3 TEST FOR SYNCHRONIZED SPEECH SYNTHESIS
This performance parameter was measured using an external timing device and was found to
be within the prescribed limits.

3.14.4 TEST FOR CONTINUOUS RECOGNITION
The system is able to distinguish between consecutive gestures using the accelerometer
component.

4. SUMMARY
Deaf and dumb people rely on sign language interpreters for communication. However, they
cannot depend on interpreters in everyday life, mainly due to the high costs involved and the
difficulty of finding qualified interpreters. This system will help disabled persons improve
their quality of life significantly.

The automatic recognition of sign language is an attractive prospect; the technology exists to
make it possible, while the potential applications are exciting and worthwhile. To date the
research emphasis has been on the capture and classification of the gestures of sign language.
This project will be a valuable addition to the ongoing research in the field of Human
Computer Interface (HCI).

The Boltay Haath system has been shown to work for Pakistan Sign Language (PSL)
without invoking complex hand models. The results obtained indicate that the system is able
to recognize signs efficiently with a good rate of success.

Future research regarding Boltay Haath will address more complex gestures, such as those
involving two hands. Other ways of modelling the gesture dynamics, such as HMMs that
achieve minimal classification error, will also be investigated. Dynamic gestures and online
training are the two most attractive features left for the future.

Several new directions have been identified through which this work could be expanded in
the near future. The techniques developed are not specific to PSL, and so the system could
easily be adapted to other sign languages or for other gesture recognition systems (for
example, as part of a VR interface, telemetry or robotic control). It can be considered a step
towards applications that provide a user interface based on hand gestures.

One aspect of communication which could not be handled in Boltay Haath is two-way
communication. Currently Boltay Haath can convey words from the signer to the listener,
but not the other way around. A future enhancement would be to enable two-way
communication.

The Boltay Haath system is now almost complete, though many enhancements and
optimizations can still be made. In all, 83 gestures have been recognized, and this number
can be increased as and when required by the user.





5. REFERENCES
[1] R.S. Pressman, Software Engineering: A Practitioner’s Approach, Fourth Edition, McGraw-Hill
International, 1997.

[2] Vesa-Matti Mantyla, Jani Mantyjarvi, Tapio Seppanen and Esa Tuulari, “Hand Gesture Recognition of a
Mobile Device User”, IEEE, 2000, pp. 281-284.

[3] Waleed Kadous, “GRASP: Recognition of Australian Sign Language Using Instrumented Gloves”,
Australia, October 1995, pp. 1-2, 4-8.

[4] Murakami and Taguchi, “Gesture Recognition Using Recurrent Neural Networks”, CHI '91 Conference
Proceedings, pp. 237-242, Human Interface Laboratory, Fujitsu Laboratories, ACM, 1991.

[5] Sulman Nasir and Sadaf Zuberi, “Pakistan Sign Language – A Synopsis”, Pakistan, June 2000.

[6] Simon Haykin, Neural Networks: A Comprehensive Foundation, Second Edition, McMaster University,
pp. 142.

[7] Peter W. Vamplew, Recognition of Sign Language Using Neural Networks, University of Tasmania, May
1996, pp. 98.

[8] Andrea Corradini and Horst-Michael Gross, “A Hybrid Stochastic-Connectionist Architecture for
Gesture Recognition”, IEEE, 2000, pp. 336-341.

[9] K.S. Fu, Syntactic Pattern Recognition, Prentice-Hall, 1981, pp. 75-80.

[10] Andrea Corradini and Horst-Michael Gross, “Camera-based Gesture Recognition for Robot Control”,
IEEE, 2000, pp. 133-138.

[11] I. Sommerville, Software Engineering (6th Ed.), Addison Wesley, chap. 1, pp. 8.

[12] I. Wachsmuth and T. Sowa (Eds.), “Towards an Automatic Sign Language Recognition System Using
Subunits”, London, April 2001, pp. 1-2.

[13] Barbara Liskov, Program Development in Java, chap. 11, pp. 356.

[14] The Microsoft Speech Website, www.microsoft.com/speech

[15] Richard Watson, “A Survey of Gesture Recognition Techniques”, Technical Report, Trinity College
Dublin, July 1993, pp. 6.




Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
 
Elevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object CalisthenicsElevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object Calisthenics
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
 

Boltay Haath

  • 1. Project Mentor: Mr. Aleem Khalid Alvi [aleem_alvi@yahoo.com] [akalvi@ssuet.edu.pk] Assistant Professor Team Members: Mr. Ali Muzzaffar [ali_muzzafar@yahoo.com] [alim@ssuet.edu.pk] Mr. Mehmood Usman [apnamehmood@yahoo.co.uk] [mgazdhar@ssuet.edu.pk] Mr. Suleman Mumtaz [smkhowaja@yahoo.com] [smumtaz@ssuet.edu.pk] Mr. Yousuf Bin Azhar [musuf@yahoo.com] [muhybina@ssuet.edu.pk] http://www.boltayhaath.cjb.net ! " !
  • 2. #" $%%& Sir Syed University of Engineering & Technology 1. ABSTRACT Humans know each other by conveying their ideas, thoughts, and experiences to the people around them. There are numerous ways to achieve this and the best one among the rest is the gift of “Speech”. Through speech everyone can very convincingly transfer their thoughts and understand each other. It will be injustice if we ignore those who are deprived of this invaluable gift. The only means of communication available to the vocally disabled is the use of “Sign Language”. Using sign language they are limited to their own world. This limitation prevents them from interacting with the outer world to share their feelings, creative ideas and potentials. Another problem is that very few people who are not themselves deaf ever learn to sign. This therefore increases the isolation of deaf and dumb people. Technology is one way to remove this hindrance and benefit these people, and the project Boltay Haath is one such attempt to solve this problem by computerized recognition of sign language. Boltay Haath is an Urdu phrase which means ‘Talking Hands’. The basic concept involves the use of special data gloves connected to a computer while a vocally disabled person (who is wearing the gloves) makes the signs. The computer analyzes these gestures and synthesizes the sound for the corresponding word or letter for ordinary people to understand. Several researchers have explored these possibilities and have successfully achieved finger- spelling recognition with high levels of accuracy, but progress in the recognition of sign language, as a whole has been limited. This project is an attempt to recognize Pakistan Sign Language (PSL), which has not been done in any other system. Furthermore, the Boltay Haath project aims to produce sound matching the accent and pronunciation of the people of the region in which PSL is used. Since only single-handed gestures have been considered in this project it is obviously necessary to select a subset of PSL to be considered for implementation of Boltay Haath as it would take vast amounts of time to sample most or all of the 4000 signs in PSL. $
letter for ordinary people to understand. The basic working of the project is depicted in the following figure.

Figure 2.1 - System Diagram

The diagram shows the scope and use of the Boltay Haath system: it aims at bridging the communication gap between the deaf community and other people. When fully operational, the system will minimize communication gaps, ease collaboration and enable the sharing of ideas and experiences.

2.1 PERFORMANCE MEASURES

The following performance parameters were kept in mind during the design of the project:

• Recognition time: A gesture should take approximately 0.25 to 0.5 seconds to recognize in order to respond in real time.
• Synchronized speech synthesis: The speech output corresponding to a gesture should not lag behind the gesture output by more than 0.25 seconds.
• Continuous and automatic recognition: To be natural, the system must be capable of recognizing gestures continuously, without any manual indication demarcating consecutive gestures.
• Recognition accuracy: The system must recognize gestures with an accuracy of 80 to 90 percent.
2.2 DESIGN METHODOLOGY

A Waterfall-plus-Iterative model was followed for the development of Boltay Haath. This model was selected because a thorough design of the system was needed before development began: all the specifications had to be outlined in detail and all issues worked out so that the project could be carried out within its time and cost constraints. In other words, architecture-first development was attempted. After this stage the team had developed a broad understanding of the system, and trouble spots could easily be sensed in the design. The next logical step was therefore to repeat the critical stages of the process to iron out any remaining problems and to evaluate design alternatives and tradeoffs.

An object-oriented approach, being the most practical way of developing this kind of project, was the obvious choice. Test plans were also designed to test the system systematically; the subsystems were tested separately as well as together.

Five improvements for the Waterfall model to work:
- Complete program design before analysis and coding begin.
- Maintain current and complete documentation.
- Do the job twice, if possible.
- Plan, control, and monitor testing.
- Involve the user.

Figure 2.2 - Waterfall plus Iterative Model

2.3 UNIQUE AND INNOVATIVE IDEAS

People in different regions of the world have contributed towards the recognition of the sign languages of their regions, but so far no work has been done on the recognition of the sign language of our region (PSL). Boltay Haath is thus the first system to contribute to this cause. Furthermore, the system aims to produce sound matching the accent and pronunciation of the people of the region in which PSL is used.

The recognition systems developed to date usually solve the problem of gesture demarcation through various manual techniques and operations. To make the system more natural and interactive, Boltay Haath recognizes gestures continuously in real time, so no manual indication or signal is needed.

Although the primary objective of Boltay Haath is to recognize Pakistan Sign Language, the system is capable of recognizing any other sign language of the world by learning its gestures. The Boltay Haath system can also be modified for use on handheld devices, making it more portable and easier to use in daily life. For this purpose the Microsoft .Net Compact Framework is the best candidate, since the system is being developed using current .Net technologies.
3. IMPLEMENTATION AND ENGINEERING CONSIDERATIONS

3.1 PSL SIGNS USED IN BOLTAY HAATH

The sign language has been divided into two sub-domains, English and Urdu, because of the similarity of some gestures. Both English and Urdu contain gestures for words as well as letters, and gestures have been categorized into dynamic and static. Urdu has 38 letters, of which a few are dynamic; words come in both one-handed and two-handed forms. English has 26 letters, of which two are dynamic; words are likewise one-handed or two-handed. PSL also contains domain-specific signs, for example computer terms, environmental terms and traffic terms.

Figure 3.1 - English and Urdu Alphabet Signs in PSL
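To make this categorization concrete, the sketch below shows one possible C# data model for a PSL sign. It is illustrative only; the type and member names are assumptions, not the project's actual code.

    enum SignDomain { UrduAlphabet, EnglishAlphabet, ComputerTerms, EnvironmentalTerms, TrafficTerms }
    enum SignMotion { Static, Dynamic }       // Boltay Haath implements static signs only
    enum Handedness { OneHanded, TwoHanded }  // only one-handed signs are considered

    // Hypothetical record for one PSL sign in the gesture database.
    class PslSign
    {
        public int GestureId;       // key used to label training samples
        public string Name;         // e.g. "A", or an Urdu letter in Roman Urdu
        public SignDomain Domain;
        public SignMotion Motion;
        public Handedness Hands;
    }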
3.2 SYSTEM ARCHITECTURE

The Boltay Haath system is divided into the following subsystems:

• Gesture Database - contains all the persistent data related to the system.
• Gesture Acquisition - gets the state of the hand (position of fingers, roll and pitch) from the glove and conveys it to the main software.
• Training - uses the collected data to train the system.
• Gesture Recognition Engine - analyzes the input to recognize the gesture made by the user. Two different techniques have been implemented for this purpose, namely Artificial Neural Networks (ANN) and Statistical Template Matching (STM).
• Gesture Output - converts the words/letters obtained after gesture recognition into corresponding sound and textual output.
• Accelerometer - detects motion of the hand in order to demarcate the start and end of gestures for continuous gesture recognition.

The following figure illustrates the architecture of the Boltay Haath system:

Figure 3.2 - System Architecture [1]

A detailed description of the architecture, the implementation techniques and the algorithms of the system is given below.

3.3 GESTURE DATABASE

A particular input sample in this system is defined by the combination of five finger sensors and one tilt sensor for roll and pitch, and is stored in the Gesture Database during the data acquisition phase. The gestures in the database are organized with respect to their area of use, i.e. their domain. For example, the alphabet domains contain the Urdu and English alphabet gestures, while word domains may contain lists of emergency gestures, daily routine gestures and other special gestures. The database also stores related data such as each gesture's phoneme†, the training results of STM and ANN, and Registered User‡ information.

† A phoneme is the smallest phonetic unit in a language that is capable of conveying a distinction in meaning.
‡ Registered Users are those users who participated in the training of the system.
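A minimal sketch of the input sample described above, assuming one record per glove reading; the field names are hypothetical:

    // One training sample: five finger sensors plus the tilt sensor,
    // tagged with the Gesture ID of the sign being recorded.
    class GestureSample
    {
        public int GestureId;                 // sign this sample belongs to
        public byte[] Fingers = new byte[5];  // finger flexure, 0..255 each
        public byte Roll, Pitch;              // tilt sensor; ignored for static alphabet signs [2]
    }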
3.4 DATA ACQUISITION

This subsystem captures the state of the hand (flexure of the fingers, roll and pitch) from the glove and stores it in the Gesture Database for further processing. It handles all the data coming to and from the Data Glove. The driver software provided by the vendor had to be adapted for use in the .Net managed environment, so a wrapper class for the glove driver was written in C#. During acquisition the training data are identified by the Gesture ID of their corresponding signs and stored in a table for later use by the training algorithms. An input sample consists of five values ranging from 0 to 255, each representing the state of the sensor on one of the five fingers of the glove. The sensors for roll and pitch have been ignored in the case of non-moving gestures, since their values do not uniquely identify an alphabet sign [2]. The sequence diagram showing the use of the data acquisition interface in gesture recognition is shown below.

Figure 3.3 - Data Acquisition Sequence Diagram

3.5 DATA GLOVE

The input device used in Boltay Haath is the 5DT Data Glove 5. It is equipped with sensors that sense the movements of the hand and interface those movements with a computer. The 5DT Data Glove 5 measures finger flexure and the orientation (pitch and roll) of the user's hand. It offers 8-bit flexure resolution, a platform-independent serial interface (RS-232) and a built-in 2-axis tilt sensor.

Figure 3.4 - Components of 5DT Data Glove 5
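The wrapper class mentioned above is not reproduced in this report. A minimal sketch of how such a managed wrapper could call into an unmanaged vendor driver is shown below; the DLL name and entry points are assumptions modelled loosely on the 5DT SDK, not verified signatures.

    using System;
    using System.Runtime.InteropServices;

    // Hypothetical managed wrapper over the unmanaged glove driver.
    class GloveDriver
    {
        [DllImport("fglove.dll")] static extern IntPtr fdOpen(string port);
        [DllImport("fglove.dll")] static extern int fdClose(IntPtr glove);
        [DllImport("fglove.dll")] static extern void fdGetSensorRawAll(IntPtr glove, ushort[] data);

        private IntPtr handle;

        public GloveDriver(string port)     // e.g. "COM1" for the RS-232 interface
        {
            handle = fdOpen(port);
        }

        public ushort[] ReadFingers()
        {
            ushort[] raw = new ushort[5];   // one value per finger sensor
            fdGetSensorRawAll(handle, raw);
            return raw;
        }

        public void Close() { fdClose(handle); }
    }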
3.6 TRAINING

The Training subsystem trains the system so that it can perform gesture recognition afterwards. The training process is different for the two modes of operation of the Gesture Recognition Engine (GRE), i.e. Statistical Template Matching (STM) and Artificial Neural Networks (ANN). In both cases training is a batch process. For generalized recognition of gestures it was necessary to collect data from different users, so the system was trained using data obtained from six different signers [3]. Initially, training data was collected for the non-moving gestures, as in [4], of English as well as Urdu, since PSL contains both types of signs [5]. This was done because of the limitations of the input device: the Data Glove 5 does not provide abduction status, and there is no input about the location of the glove in space. The separate training processes for STM and ANN are discussed below.

3.6.1 STATISTICAL TEMPLATE SPECIFICATION (STS)

The idea is to demarcate different gestures by calculating the mean (µ) and standard deviation (σ) of each sensor for each gesture in the training set. The resultant (µ, σ) pairs, called templates, are stored in the gesture database for later use in gesture recognition; hence the name "Template Specification". The mean and standard deviation are calculated for each sensor of each gesture as follows:

µ(l,m) = (1/n) Σ xi                             (3.1)

σ(l,m) = √[ (1/n) Σ (xi − µ(l,m))² ]            (3.2)

Here, xi is the ith sensor value, n is the number of samples, µ(l,m) is the mean of the lth sensor of the mth gesture and σ(l,m) is the standard deviation of the lth sensor of the mth gesture.

Figure 3.5 - STS Sequence Diagram
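As a sketch of how equations (3.1) and (3.2) translate into code (the types here are illustrative; the actual STS implementation is not shown in this report):

    using System;
    using System.Collections;

    class TemplateSpecification
    {
        // Computes the (mu, sigma) template for one gesture from its
        // training samples, one pair per finger sensor (equations 3.1, 3.2).
        // samples holds byte[5] arrays of raw sensor values for this gesture.
        public static void Build(ArrayList samples, double[] mu, double[] sigma)
        {
            int n = samples.Count;
            for (int l = 0; l < 5; l++)
            {
                double sum = 0.0;
                foreach (byte[] s in samples) sum += s[l];
                mu[l] = sum / n;                                  // (3.1)

                double sq = 0.0;
                foreach (byte[] s in samples) sq += (s[l] - mu[l]) * (s[l] - mu[l]);
                sigma[l] = Math.Sqrt(sq / n);                     // (3.2)
            }
        }
    }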
3.6.2 ARTIFICIAL NEURAL NETWORK TRAINING (COMMITTEE SYSTEM)

The Artificial Neural Network (ANN) Training subsystem allows the various neural networks in the system to be trained. It collects data from the Gesture Database, applies a supervised learning algorithm (backpropagation [6]) to train the neural networks, and finally saves the networks in the database.

A single network could not converge on the available data and did not perform well, so it was decided to tackle the problem with a divide-and-conquer approach. This technique, called a committee system [7], combines the outputs of multiple networks, called experts, to produce a classification, rather than using only a single network. The rationale is the realization that there is not one global minimum into which every net trains, but many minima in which adequate classification of the training examples can be obtained; by combining the outputs of several networks it may be possible to gain better generalization than from any single network.

Each small network for a particular gesture is called an 'expert'. The training data for each expert contains an equal number of samples of the two classes it separates. For example, the training set for the expert for 'A' contains half samples of 'A', while the remaining half comprises the rest of the signs. Backpropagation was used to train the experts, with a learning rate of 0.1 and a training set of more than 2000 samples per expert. The input was scaled from the range 0 to 255 down to the range -1.28 to 1.27. This scaling, sometimes called pre-processing, reduced the error and provided better results after training.

Figure 3.6 - ANN Training Sequence Diagram
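The stated endpoints (0 maps to -1.28 and 255 to 1.27) are reproduced by the linear mapping x/100 - 1.28. A minimal sketch of this pre-processing step:

    // Scales a raw glove sample from 0..255 into -1.28..1.27 before it is
    // fed to the experts, as described above.
    class PreProcessing
    {
        public static double[] Scale(byte[] raw)
        {
            double[] x = new double[raw.Length];
            for (int i = 0; i < raw.Length; i++)
                x[i] = raw[i] / 100.0 - 1.28;    // 0 -> -1.28, 255 -> 1.27
            return x;
        }
    }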
3.7 GESTURE RECOGNITION ENGINE

The Gesture Recognition Engine is the core of the system. It performs gesture recognition using the two techniques (STM and ANN) and interacts with most of the subsystems: it takes gesture input from the Gesture Input subsystem, identifies the gestures, and gives output through the Output subsystem as text and speech. The separate recognition processes for STM and ANN are discussed below.

3.7.1 STATISTICAL TEMPLATE MATCHING

The statistical model used in Boltay Haath is the simplest approach to recognizing postures (static gestures) [8], [9]. The model used is known as "Template Matching" or "Prototype Matching" [10]. The idea is to demarcate different gestures by calculating the mean (µ) and standard deviation (σ) of each sensor for a gesture; input samples that fall within limits bounded by an integral multiple of the standard deviation are then recognized as correct. The gesture boundary [11] for each sensor is defined as

µ ± kσ                                          (3.3)

Here, µ is the mean and σ is the standard deviation of the sensor whose gesture boundary is to be defined. Gesture boundaries for each sensor of all the gestures are defined in the same way and used in pattern matching.

3.7.1.1 ALGORITHMIC MODEL

a) PATTERN RECOGNITION

After Statistical Template Specification (STS), test samples are provided to the pattern recognition module, which analyzes them using the statistical model [12]. The upper and lower limits for the value of a sensor for a particular gesture are defined using the standard deviation previously calculated for that sensor. To tune the accuracy of gesture recognition, various integral multiples of σ are used, denoted by k in (3.3). The limits for any given gesture are defined as:

Upper limit = µ + kσ                            (3.4)
Lower limit = µ − kσ                            (3.5)

Given these criteria, an input is classified as a particular gesture if all the sensor values of the test sample lie within these limits (i.e. within the gesture boundary). The limit values are retrieved from the gesture database. The values of k used for gesture recognition in Boltay Haath range from 1 to 3, providing tolerances ranging from 2σ to 6σ. The performance achieved by varying the value of k is discussed later in Testing and Verification.

b) AMBIGUITY AMONG OUTPUTS

Sometimes, due to ambiguity between two or more gestures, STM may produce multiple outputs. The ambiguity is created by the overlapping of different gesture boundaries, and the overlapping increases as the value of k is increased from 1 to 3. To cater to this problem, the method of Least Mean Squares (LMS) is used. Figure 3.7 shows two ambiguous signs, 'R' and 'H'.

Figure 3.7 - Ambiguous Signs
c) LMS FOR REMOVING AMBIGUITY

There are cases where more than one gesture is a candidate for output. To resolve this situation the system calculates the Least Mean Square (LMS) [13] value of every candidate gesture and selects the one with the minimum value. It is calculated as

LMS = Σ (xi − µi)²                              (3.6)

Here, xi denotes the value of the ith sensor in the test sample and µi denotes the mean value for the ith sensor. The LMS value for each candidate gesture is calculated, and the gesture with the least value is selected as the final output. The use of LMS is justified by the results: analysis of the system's performance shows that LMS resolves the ambiguity correctly 60% of the time.

Figure 3.8 - STM Gesture Recognition Sequence Diagram
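Putting the pieces together, the sketch below shows one way the STM classifier could combine the boundary test of (3.4)-(3.5) with LMS disambiguation (3.6). The data layout is illustrative: mu[m][l] and sigma[m][l] hold the template of sensor l for gesture m.

    using System;

    class StmRecognizer
    {
        // Returns the index of the recognized gesture, or -1 if no template
        // matches. Among all candidates whose boundary test passes, the one
        // with the smallest sum of squared deviations (LMS) wins.
        public static int Recognize(byte[] input, double[][] mu, double[][] sigma, double k)
        {
            int best = -1;
            double bestLms = double.MaxValue;
            for (int m = 0; m < mu.Length; m++)
            {
                bool inside = true;
                double lms = 0.0;
                for (int l = 0; l < 5; l++)
                {
                    double d = input[l] - mu[m][l];
                    if (Math.Abs(d) > k * sigma[m][l]) { inside = false; break; }  // (3.4)-(3.5)
                    lms += d * d;                                                  // (3.6)
                }
                if (inside && lms < bestLms) { bestLms = lms; best = m; }
            }
            return best;
        }
    }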
3.7.2 ARTIFICIAL NEURAL NETWORK (ANN)

In this mode, the GRE takes input data and feeds it to multiple artificial neural networks in parallel. The approach is to first process the input data so as to produce a description in terms of the various features (handshape, orientation and motion) of a sign; the sign can then be classified on the basis of the feature vector thus produced. This mode uses our Artificial Neural Network Library (ANNLIB) to run Multi-Layer Perceptrons (MLPs) for recognition.

3.7.2.1 COMMITTEE SYSTEM FOR RECOGNITION

The various experts (one neural network per gesture) were trained using a "one against all" technique, in which each network is trained to give a positive response for its particular sign and a negative one for all the others. In the final system all the experts therefore have the same architecture and are given the same input. Figure 3.9 shows the committee system: the input is fed to all the experts, whose outputs feed a voting mechanism that produces the final output.

Figure 3.9 - Committee System

a) ARCHITECTURE OF EXPERTS

The architecture of the experts used in the committee system is 5:8:1, i.e. 5 inputs, 8 hidden nodes and 1 output node. The activation function is sigmoid logistic for the nodes in the hidden layer and hyperbolic for the output node.

b) VOTING MECHANISM

The voting mechanism takes the outputs of all the experts as its input. It identifies the resultant gesture by examining the outputs of all the experts and selecting the one with a positive result.
c) FINAL CLASSIFICATION

Since the experts could not be optimally trained, multiple experts can give a positive result. To solve this problem, the voting mechanism multiplies the result of each expert by that expert's accuracy, giving more weight to the results of more accurate experts. The output with the largest positive value is then selected as the recognized gesture.

Figure 3.10 - ANN Gesture Recognition Sequence Diagram
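A sketch of this weighted voting step follows. The Expert type is a stand-in for a trained ANNLIB network; its interface here is an assumption, not the actual ANNLIB API.

    // Stand-in for a trained 5:8:1 expert network from ANNLIB.
    class Expert
    {
        public double Accuracy;                 // measured on validation data
        public double Output(double[] x)
        {
            // Placeholder for the MLP forward pass; the real network returns
            // a positive value when it recognizes its own sign.
            return 0.0;
        }
    }

    class VotingMechanism
    {
        // Weighs each expert's vote by its accuracy and picks the largest
        // positive result; returns -1 when no expert fires.
        public static int Classify(double[] scaledInput, Expert[] experts)
        {
            int winner = -1;
            double best = 0.0;
            for (int i = 0; i < experts.Length; i++)
            {
                double vote = experts[i].Output(scaledInput) * experts[i].Accuracy;
                if (vote > best) { best = vote; winner = i; }
            }
            return winner;
        }
    }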
3.8 OUTPUT

The output of the system has two forms: formatted text and speech. The more important of the two is the speech output, as it accomplishes the objective of the system.

3.8.1 TEXT

This subsystem outputs the recognized sign as formatted text. Urdu signs are output in Roman Urdu†. The text output is then used by the text-to-speech module for its processing.

† Roman Urdu - Urdu written using the English alphabet.

3.8.2 SPEECH

The text-to-speech subsystem converts text into synthesized Urdu/English speech using Microsoft Speech SDK version 5.1 [14]. The lexicon and phoneme sets have been modified so that words are pronounced correctly in the local accent.

Figure 3.11 - Speech Output Sequence Diagram
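A minimal sketch of the speech step, assuming the standard SpeechLib COM interop generated from SAPI 5.1; the asynchronous flag keeps speaking from blocking recognition, which helps meet the 0.25-second lag budget from section 2.1.

    using SpeechLib;   // COM interop assembly for Microsoft Speech SDK 5.1

    class SpeechOutput
    {
        private SpVoice voice = new SpVoice();

        // Speaks the recognized text (Roman Urdu or English) without blocking.
        public void Say(string recognizedText)
        {
            voice.Speak(recognizedText, SpeechVoiceSpeakFlags.SVSFlagsAsync);
        }
    }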
3.9 ACCELEROMETER

The accelerometer is used to detect signs continuously without any manual aid indicating the start or end of a gesture. It automatically identifies the end of one gesture and the start of the next, so that gestures are recognized in a continuous fashion.

3.9.1 ALGORITHMIC MODEL

Acceleration is calculated for each sensor by averaging the differences between the last n inputs (in a sliding-window fashion). When the acceleration of all the sensors is below a certain threshold value, the system identifies the state of the hand as stationary and sends the sensor values to the recognition engine. As soon as the acceleration exceeds the threshold value, the system marks the hand as in motion and stops recognition. The sliding window size and the threshold value are adjusted so that the user need not make a deliberate effort to stop for some time in order to get a sign recognized. The accelerometer is accessed by the user through an accelerometer interface, and the user can set the threshold value and sliding window size according to his or her needs.

Figure 3.12(a) - Motion Detection (hand in motion)
Window size = 8, threshold = 1.8. Sensor stream: 150 151 152 153 154 156 159 161 165 168 172 178 182 185 186 190. The differences within the current window are 2, 3, 2, 4, 3, 4, 6, so A = (2+3+2+4+3+4+6)/7 = 24/7 = 3.43. Since A > 1.8, the hand is in motion.

The figure above shows how the accelerometer determines that the hand is in motion: the average of the differences within the sliding window for a single sensor is above the threshold value, so the system identifies the sensor to be in motion.

Figure 3.12(b) - Motion Detection (hand stationary)
Window size = 8, threshold = 1.8. Sensor stream: 150 151 152 153 154 155 155 156 157 158 158 159 160 161 161 161. The differences within the current window are 1, 0, 1, 1, 1, 0, 1, so A = (1+0+1+1+1+0+1)/7 = 5/7 = 0.714. Since A < 1.8, the hand is stationary.

The figure above shows how the accelerometer determines that the hand is stationary: the average of the differences within the sliding window is below the threshold value, so the system identifies the sensor to be stationary.

The accelerometer is used on a per-sensor basis, so for five sensors five accelerometer objects are used, each continuously fed its corresponding sensor value. The accelerometer design used so far is limited to either static gestures or dynamic gestures.
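A sketch of the per-sensor motion detector described above; the window of raw values and the average of absolute successive differences match the worked examples in Figure 3.12, but the class itself is illustrative.

    using System;
    using System.Collections.Generic;

    class SensorAccelerometer
    {
        private readonly Queue<int> window = new Queue<int>();
        private readonly int size;
        private readonly double threshold;

        public SensorAccelerometer(int size, double threshold)   // e.g. (8, 1.8)
        {
            this.size = size;
            this.threshold = threshold;
        }

        // Feed one raw sensor reading; returns true while the sensor is in motion.
        public bool Update(byte value)
        {
            window.Enqueue(value);
            if (window.Count > size) window.Dequeue();

            double sum = 0.0; int n = 0; int prev = 0; bool first = true;
            foreach (int v in window)
            {
                if (!first) { sum += Math.Abs(v - prev); n++; }
                prev = v; first = false;
            }
            // Fig 3.12(a): 24/7 = 3.43 > 1.8 -> motion; Fig 3.12(b): 5/7 = 0.714 -> stationary
            return n > 0 && sum / n > threshold;
        }
    }

Five such objects run in parallel, one per finger sensor; the hand is treated as stationary, and a sample is sent to the recognition engine, only when all five report no motion.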
3.10 DEVELOPED TOOLS

Various tools and libraries were developed in the course of building the system. These include a C# wrapper class for the glove driver (the 5DT Data Glove 5 driver was originally written in unmanaged VC++ and was wrapped for use from managed code), an Artificial Neural Network library (ANNLIB) written in C#, and a performance evaluation tool for measuring the recognition accuracy and efficiency of the STM and ANN engines.

3.10.1 PERFORMANCE EVALUATION TOOL FOR GRE (PETool)

PETool evaluates the results obtained by applying the test data to the STM and ANN gesture recognition engines and generates reports and graphical views of the data for performance evaluation purposes. The simulation data is used to evaluate whether the current configurations of STM and ANN provide acceptable results.
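The essence of what PETool measures can be sketched as follows (a hypothetical harness reusing the StmRecognizer sketch above; the accuracy figures it yields are the kind reported later in Tables 3.1 and 3.2):

    class PETool
    {
        // Runs a labeled test set through the recognizer and reports the
        // percentage of correctly classified samples.
        public static double AccuracyPercent(byte[][] samples, int[] expected,
                                             double[][] mu, double[][] sigma, double k)
        {
            int correct = 0;
            for (int i = 0; i < samples.Length; i++)
                if (StmRecognizer.Recognize(samples[i], mu, sigma, k) == expected[i])
                    correct++;
            return 100.0 * correct / samples.Length;
        }
    }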
3.11 TRADEOFFS

Many tradeoffs regarding accuracy and efficiency were made during the design and implementation of the system. A major issue was the training of the neural networks: the amount of training data, the optimal architecture of the networks and the classification mechanism were all considerations. For quick training the training data set needed to be small; for greater accuracy more data was needed, but more time was then required for training. The quality of the training data was a major concern for STM as well as ANN. The greater the number of registered users, the better the generalization, but more data does not come without its share of bad samples.

In STM recognition, the gesture boundary of a sensor is defined as µ ± kσ; the system uses k = 3 after trying the values 1, 2 and 3. The µ ± 3σ model covers a large variation of the data (up to 6-sigma) but at the same time increases the overlapping of different gestures. This overlapping creates ambiguity among outputs, which has to be removed with LMS.

A similar case can be made for speech output. Text-to-speech provides an efficient way of producing speech output, but the quality of the sound produced is not on a par with pre-recorded human voice; recorded voices, however, incur a heavy processing cost when it comes to real-time recognition.

The accelerometer is used to filter the data stream coming from the Data Glove in a thread, so the performance of the thread will degrade if a decision-making block is executed at each cycle. The same applies to the accelerometer component [15].

3.12 IMPLEMENTATION TOOLS

The Boltay Haath system has been developed in C# using Visual Studio .Net 2002. The gesture database was maintained in an MS Access database file. Windows being the platform for the project, all the user interfaces and input components are standard Windows objects. Microsoft Speech SDK 5.1 was used for speech output.

3.13 COST

The cost of the components used in this project is given below.

Item              Cost
5DT Data Glove 5  $300

3.14 TESTING AND VERIFICATION

The subsystems were tested separately to check their performance in various scenarios. Because Boltay Haath has a highly modular design, top-down and bottom-up integration occurred simultaneously; however, the system was integrated incrementally, to control the number of bugs that needed to be fixed at any given time. Tests were conducted in a black-box fashion. Finally, it was verified that the software meets the performance criteria set during the design of the system specification. These tests were performed with the signers, since it was deemed necessary to know whether the available hardware meets the performance criteria.

3.14.1 TEST FOR RECOGNITION ACCURACY

The results were obtained using PETool, which was developed specifically for measuring the performance of the system.

a) STATISTICAL TEMPLATE MATCHING TEST RESULTS

                    Accuracy (%)
Domain              k=1   k=2   k=3
English Alphabets    24    73    80
Urdu Alphabets       24    84    88

Table 3.1 - Performance Result (Statistical Template Matching)

Figure 3.13(a) - Alphabet-wise recognition accuracy for STM - English
Figure 3.13(b) - Alphabet-wise recognition accuracy for STM - Urdu

b) ANN COMMITTEE SYSTEM TEST RESULTS

Domain          Accuracy (%)
24 Handshapes   84

Table 3.2 - Performance Result (ANN classification with committee system)

Figure 3.14 - Alphabet-wise accuracy for ANN Committee System - English

3.14.2 TEST FOR RECOGNITION TIME

Multiple gestures were provided to the system in sequence and the average time was calculated using the system clock. Under normal conditions the average recognition time was 0.4 seconds.
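A sketch of how such a timing measurement could be taken with the system clock (Environment.TickCount used here as an illustration; the actual test harness is not shown in this report):

    using System;

    class TimingTest
    {
        // Average per-gesture recognition time in milliseconds over a batch.
        public static double AverageRecognitionMs(byte[][] samples,
                                                  double[][] mu, double[][] sigma, double k)
        {
            int start = Environment.TickCount;
            for (int i = 0; i < samples.Length; i++)
                StmRecognizer.Recognize(samples[i], mu, sigma, k);
            return (Environment.TickCount - start) / (double)samples.Length;
        }
    }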
3.14.3 TEST FOR SYNCHRONIZED SPEECH SYNTHESIS

This performance parameter was measured using an external timing device and was found to be within the prescribed limits.

3.14.4 TEST FOR CONTINUOUS RECOGNITION

The system is able to distinguish between consecutive gestures using the accelerometer component.

4. SUMMARY

Deaf and dumb people rely on sign language interpreters for communication. However, they cannot depend on interpreters in everyday life, mainly because of the high costs involved and the difficulty of finding qualified interpreters. This system will help disabled persons improve their quality of life significantly.

The automatic recognition of sign language is an attractive prospect: the technology exists to make it possible, while the potential applications are exciting and worthwhile. To date the research emphasis has been on the capture and classification of the gestures of sign language, and this project will be a valuable addition to the ongoing research in the field of Human Computer Interface (HCI). The Boltay Haath system has been shown to work for Pakistan Sign Language (PSL) without invoking complex hand models, and the results obtained indicate that the system is able to recognize signs efficiently with a good rate of success.

Future research on Boltay Haath will address more complex gestures, such as those involving two hands. Other ways of modelling gesture dynamics, such as HMMs that achieve minimal classification error, will also be investigated. Dynamic gestures and online training are the two most attractive features left for future work.

Several new directions have been identified through which this work could be expanded in the near future. The techniques developed are not specific to PSL, so the system could easily be adapted to other sign languages or to other gesture recognition systems (for example, as part of a VR interface, telemetry or robotic control). It can be considered a step towards applications that provide user interfaces based on hand gestures.

One aspect of communication that could not be handled in Boltay Haath is two-way communication: currently Boltay Haath can convey words from the signer to the listener but not the other way around. One future enhancement would be to enable two-way communication.

The Boltay Haath system is now almost complete, though many enhancements and optimizations can still be made. In all, 83 gestures have been recognized, and this number can be increased as and when required by the user.
5. REFERENCES

[1] R. S. Pressman, Software Engineering: A Practitioner's Approach, Fourth Edition, McGraw-Hill International, 1997.
[2] Vesa-Matti Mantyla, Jani Mantyjarvi, Tapio Seppanen, Esa Tuulari, "Hand Gesture Recognition of a Mobile Device User", IEEE, 2000, pp. 281-284.
[3] Waleed Kadous, "GRASP: Recognition of Australian Sign Language Using Instrumented Gloves", Australia, October 1995, pp. 1-2, 4-8.
[4] Murakami and Taguchi, "Gesture Recognition Using Recurrent Neural Networks", CHI '91 Conference Proceedings, pp. 237-242, Human Interface Laboratory, Fujitsu Laboratories, ACM, 1991.
[5] Sulman Nasir, Sadaf Zuberi, "Pakistan Sign Language - A Synopsis", Pakistan, June 2000.
[6] Simon Haykin, Neural Networks: A Comprehensive Foundation, Second Edition, McMaster University, pp. 142.
[7] Peter W. Vamplew, Recognition of Sign Language Using Neural Networks, University of Tasmania, May 1996, pp. 98.
[8] Andrea Corradini, Horst-Michael Gross, "A Hybrid Stochastic-Connectionist Architecture for Gesture Recognition", IEEE, 2000, pp. 336-341.
[9] K. S. Fu, Syntactic Pattern Recognition, Prentice-Hall, 1981, pp. 75-80.
[10] Andrea Corradini, Horst-Michael Gross, "Camera-based Gesture Recognition for Robot Control", IEEE, 2000, pp. 133-138.
[11] I. Sommerville, Software Engineering (6th Ed.), Addison Wesley, chap. 1, pp. 8.
[12] I. Wachsmuth, T. Sowa (Eds.), "Towards an Automatic Sign Language Recognition System Using Subunits", London, April 2001, pp. 1-2.
[13] Barbara Liskov, Program Development in Java, chap. 11, pp. 356.
[14] The Microsoft Speech Website, www.microsoft.com/speech
[15] Richard Watson, "A Survey of Gesture Recognition Techniques, Technical Report", Trinity College, Dublin, July 1993, pp. 6.