Boltay Haath
Pakistani Sign Language – Boltay Haath Engineering Project, Sir Syed University of Engineering & Technology

Project Mentor:
Mr. Aleem Khalid Alvi [aleem_alvi@yahoo.com] [akalvi@ssuet.edu.pk], Assistant Professor

Team Members:
Mr. Ali Muzzaffar [ali_muzzafar@yahoo.com] [alim@ssuet.edu.pk]
Mr. Mehmood Usman [apnamehmood@yahoo.co.uk] [mgazdhar@ssuet.edu.pk]
Mr. Suleman Mumtaz [smkhowaja@yahoo.com] [smumtaz@ssuet.edu.pk]
Mr. Yousuf Bin Azhar [musuf@yahoo.com] [muhybina@ssuet.edu.pk]

http://www.boltayhaath.cjb.net
    • #" $%%& Sir Syed University of Engineering & Technology 1. ABSTRACT Humans know each other by conveying their ideas, thoughts, and experiences to the people around them. There are numerous ways to achieve this and the best one among the rest is the gift of “Speech”. Through speech everyone can very convincingly transfer their thoughts and understand each other. It will be injustice if we ignore those who are deprived of this invaluable gift. The only means of communication available to the vocally disabled is the use of “Sign Language”. Using sign language they are limited to their own world. This limitation prevents them from interacting with the outer world to share their feelings, creative ideas and potentials. Another problem is that very few people who are not themselves deaf ever learn to sign. This therefore increases the isolation of deaf and dumb people. Technology is one way to remove this hindrance and benefit these people, and the project Boltay Haath is one such attempt to solve this problem by computerized recognition of sign language. Boltay Haath is an Urdu phrase which means ‘Talking Hands’. The basic concept involves the use of special data gloves connected to a computer while a vocally disabled person (who is wearing the gloves) makes the signs. The computer analyzes these gestures and synthesizes the sound for the corresponding word or letter for ordinary people to understand. Several researchers have explored these possibilities and have successfully achieved finger- spelling recognition with high levels of accuracy, but progress in the recognition of sign language, as a whole has been limited. This project is an attempt to recognize Pakistan Sign Language (PSL), which has not been done in any other system. Furthermore, the Boltay Haath project aims to produce sound matching the accent and pronunciation of the people of the region in which PSL is used. Since only single-handed gestures have been considered in this project it is obviously necessary to select a subset of PSL to be considered for implementation of Boltay Haath as it would take vast amounts of time to sample most or all of the 4000 signs in PSL. $
    • #" $%%& Sir Syed University of Engineering & Technology 2. SYSTEM OVERVIEW The system objective was to develop a computerized Pakistan Sign Language (PSL) recognition system which is an application of Human Computer Interface (HCI). The system considers only single handed gestures; therefore a subset of PSL has been selected for the implementation of Boltay Haath. The basic concept involves the use of computer interfaced data gloves worn by a disabled person who makes the signs. The computer analyzes these gestures, minimizes the variations and synthesizes the sound for the corresponding word or letter for normal people to understand. The basic working of the project is depicted in the following figure. Figure 2.1 - System Diagram The above diagram clearly explains the scope and use of the Boltay Haath system. The system aims at bridging communication gaps between the deaf community and other people. When fully operational the system will help in minimizing communication gaps, easier collaboration and will also enable sharing of ideas and experiences. 2.1 PERFORMANCE MEASURES The following performance parameters were kept in mind during the design of the project: • Recognition time: A gesture should take approximately 0.25 to 0.5 second in the recognition process in order to respond in real time. • Synchronized speech synthesis. The speech output corresponding to a gesture should not lag behind the gesture output by more than 0.25 seconds. • Continuous and automatic recognition: To be more natural the system must be capable of recognizing the gestures continuously without any manual indication or help for demarcating the consecutive gestures. • Recognition Accuracy: The system must recognize the gestures accurately between 80 to 90 percent. '
    • #" $%%& Sir Syed University of Engineering & Technology 2.2 DESIGN METHODOLOGY Waterfall plus Iterative model for the development of Boltay Haath has been followed. This model was selected because a thorough design of the system was needed before initiating. All the specifications had to be outlined in detail and all issues worked out so that the development of this project could carry out within the time and cost constraints. In other words architecture-first development has been attempted. After this stage a broad understanding was developed by the team and trouble spots could easily be sensed in the design. So naturally the next logical step was to repeat the critical stages of the process to iron out any problems in the way as well as evaluate design alternatives and tradeoffs. Object oriented approach being the most practical way of developing such kind of projects was obviously the best choice for the project. Test plans have also been designed to test the system systematically. The sub systems were tested separately as well as in cohesion. Five Improvements for the Waterfall Model to Work - Complete program design before analysis and coding begins - Maintain current and complete documentation. - Do the job twice, if possible. - Plan, control, and monitor testing. - Involve the user. Figure 2.2 - Waterfall plus Iterative Model 2.3 UNIQUE AND INNOVATIVE IDEAS Different people in different regions of the world have contributed towards the recognition of sign language of their regions but so far no work has ever been done regarding the recognition of sign language (PSL) of our region. So, Boltay Haath is the first system which contributes in achieving this noble cause. Furthermore, the system aims to produce sound matching the accent and pronunciation of the people of the region in which PSL is used. The recognition systems developed to date usually solves the problem of gesture demarcation through the use of various manual techniques and operations. To make the system more natural and interactive, Boltay Haath uses the technique for the real time continuous recognition of gestures, hence no need for any manual indication or signal. Although the primary objective of Boltay Haath is to recognize Pakistan Sign Language but the system is capable of recognizing any other sign language of the world by learning their respective gestures. The Boltay Haath system can be modified for use on hand held devices thus making the system more portable and easier to use in daily life. For this purpose the Microsoft compact framework for .Net is the best candidate since the system is being developed using current .Net technologies. (
    • #" $%%& Sir Syed University of Engineering & Technology 3. IMPLEMENTATION AND ENGINEERING CONSIDERATIONS 3.1 PSL SIGNS USED IN BOLTAY HAATH The sign language into Sub-domains that is English and Urdu. This is because of the similarity of some gestures. Moreover English and Urdu both contain gestures of words and letters. Gestures have been categorized into Dynamic and Static. In Urdu there are 38 letters. In which few are dynamic and words are of both types one-handed and two-handed. In English there are 26 letters. In which two are dynamic and words are of both types one- handed and two-handed. PSL also contains domain specific signs for example computer terms, Environmental terms and Traffic terms Figure 3.1 - English and Urdu Alphabet Signs in PSL 3.2 SYSTEM ARCHITECTURE The Boltay Haath system is divided in to the following sub systems: • Gesture Database – Contains all the persistent data related to the system. • Gesture Acquisition – Gets state of hand (position of fingers, roll and pitch) from glove and convey to the main software. • Training- It uses the collected data to train the system. • Gesture Recognition Engine – Analyzes the input to recognize the gesture made by the user. Two different techniques have been implemented for this purpose namely Artificial Neural Network (ANN) and Statistical Template Matching (STM). • Gesture Output- for gesture and textual data. Converts word/ letters obtained after gesture Recognition into corresponding sound. &
    • #" $%%& Sir Syed University of Engineering & Technology • Accelerometer- Accelerometer detects motion of the hand, in order to demarcate start and end of gestures for continuous gesture recognition. The following figure illustrates the architecture of the Boltay Haath system: Figure 3.2 - System Architecture [1] A detailed description of the architecture, the implementation techniques and the algorithms of the system are given below: 3.3 GESTURE DATABASE A particular input sample in this system is defined by the combination of five sensors for fingers and one tilt sensor for roll and pitch which is stored in the Gesture Database during the data acquisition phase. The gestures in the database are organized with respect to there area of use i.e. its domain. For example, the alphabet domains contain the Urdu and English alphabet gestures. Word domains may contain the list of emergency gestures, daily routine gestures and other special gestures. The database also stores relevant data like gesture’s phoneme†, training results of STM and ANN and Registered Users‡ information. 3.4 DATA ACQUISITION This sub system captures the state of the hand (flexure of fingers, roll and pitch) from the glove and stores it in the Gesture Database for further processing. It handles all the data coming to and from the Data Glove. The driver software provided by the vendor had to be adapted for use in the .Net managed environment and hence a wrapper class for the Glove † Phoneme is the smallest phonetic unit in a language that is capable of conveying a distinction in meaning. ‡ Registered Users are those users who participated in the training of the system. )
    • #" $%%& Sir Syed University of Engineering & Technology Driver was written in C# for use in the system. During acquisition the training data are identified by the Gesture ID of their corresponding signs and stored in a table for later use by the training algorithms. An input sample consists of five values ranging from 0 to 255 each representing the state of the sensor on all five fingers of the glove. The sensors for roll and pitch have been ignored in case of non-moving gestures since their values do not uniquely identify an alphabet sign [2]. The sequence diagram showing the use of data acquisition interface in gesture recognition is shown below. Figure 3.3 – Data Acquisition Sequence Diagram 3.5 DATA GLOVE The input device used in Boltay Haath is the 5DT Data Glove 5. It is equipped with sensors that sense the movements of the hand and interface those movements with a computer. The 5DT Data Glove 5 measures finger flexure and the orientation (pitch and roll) of user’s hand. It consists of 8-bit flexure resolution, Platform independent - serial port, interface (RS 232), built in 2-axis tilt sensor. Figure 3.4 - Components of 5DT Data Glove 5 *
    • #" $%%& Sir Syed University of Engineering & Technology 3.6 TRAINING The Training sub system trains the system so that it can perform gesture recognition afterwards. The training process is different for the two different modes of operation of Gesture Recognition Engine (GRE) i.e., Statistical Template Matching (STM) and Artificial Neural Network (ANN). In both cases training is a batch process. For the generalized recognition of gestures it was necessary to collect the data from different users. The system was trained by using data obtained from six different signers [3]. Initially, training data was collected for the non-moving gestures as in [4] of English as well as Urdu, since PSL contains both types of signs [5]. This was done due to the limitations of the input device i.e., the Data Glove 5 does not provide the abduction status and the absence of any kind of input about the location of the glove in space. The separate training processes for STM and ANN are disused below. 3.6.1 STATISTICAL TEMPLATE SPECIFICATION (STS) The idea is to demarcate different gestures by calculating the mean (µ) and standard deviations (σ) of all the sensors for each gesture in the training set. The resultant (µ, ) pairs are stored in the gesture database for later use in gesture recognition and are called templates. Thus the process is named “Template Specification”. The mean and standard deviation is calculated for each sensor of each gesture as follows: (3.1) (3.2) Here, xi is the ith sensor value, n is the number of samples, µ(l,m) is the mean of the lth sensor of the mth gesture and σ(l,m) is the standard deviation of the lth sensor of the mth gesture. Figure 3.5 – STS Sequence Diagram +
    • #" $%%& Sir Syed University of Engineering & Technology 3.6.2 ARTIFICIAL NEURAL NETWORK TRAINING (COMMITTEE SYSTEM) The Artificial Neural Networks Training (ANN) sub system allows the training of the various neural networks in the system. It collects data from the Gesture Database and applies supervised learning algorithm (Backpropagation [6]) for training the neural networks and finally saves the networks in the database. Since a single network could not converge on the available data it did not perform well. So it was decided to tackle the problem with a divide and conquer approach. This technique, labeled a committee system [7], combines the outputs of multiple networks called experts to produce a classification rather than using only a single network. The rationale is the realization that there is not one global minimum into which every net trains but that there are many minima where adequate classification on the training examples can be obtained. So by combining the output of several networks it may be possible to gain superior generalization than that of any single network. Each small network for a particular gesture is called an ‘expert’. The training data for each of the experts contains equal number of samples of both classes that it classified. For example, the training set for the expert for ‘A’ contains half samples of ‘A’ and the remaining half would comprise of the rest of the signs. Backpropagation was used to train the experts. A learning rate of 0.1 was used and the training set comprised of more than 2000 samples for each expert. The input was scaled further from the range of 0 to 255 to –1.28 to 1.27. This further scaling reduced the error and provided better results after training. This is some times called pre-processing Figure 3.6 – ANN Training Sequence Diagram ,
    • #" $%%& Sir Syed University of Engineering & Technology 3.7 GESTURE RECOGNITION ENGINE The Gesture Recognition Engine is the core of the system. It performs gesture recognition using the two techniques (STM and ANN). It interacts with most of the subsystems. It takes gesture input from the Gesture Input subsystem, identifies them and gives output through the Output subsystem in text and speech. The separate recognition processes for STM and ANN are disused below. 3.7.1 STATISTICAL TEMPLATE MATCHING The statistical model used in Boltay Haath is the simplest approach to recognize postures (static gestures) [8], [9]. The model used is known as “Template Matching” or “Prototype Matching” [10]. The idea is to demarcate different gestures by calculating the mean (µ) and standard deviations (σ) of all the sensors for a gesture and then those input samples that are within limits bounded by an integral multiple of standard deviation are recognized to be correct. Gesture boundary [11] for each sensor is defined as, (3.3) Here, µ is the mean and σ is the standard deviation of that sensor whose gesture boundary is to be defined. Similarly gesture boundaries for each sensor of all the gestures are defined and used in Pattern Matching. 3.7.1.1 ALGORITHMIC MODEL a) PATTERN RECOGNITION After Statistical Template Specification (STS), test samples are provided to the pattern recognition module, which analyzes them using the statistical model [12]. The upper and lower limits for the value of a sensor for a particular gesture are defined using the standard deviation for that sensor previously calculated. For enhancing the accuracy of gesture recognition, various integral multiples of σ are used, denoted by k in (3.3). The limits for any given gesture are defined as: (3.4) - (3.5) Given the above-mentioned criteria, any given input can be classified as a particular gesture if all the sensor values of the test sample lie within these limits (i.e. the gesture boundary). These values are retrieved from the gesture database. The values of k used for gesture recognition in Boltay Haath range from 1 to 3, providing tolerances ranging from 2σ to 6σ. The performance achieved by varying the values of k is discussed later in Testing and Verification. b) AMBIGUITY AMONG OUTPUTS Sometimes due to ambiguity among two or more gestures STM may produce multiple outputs. The ambiguity is created due to the overlapping of different gesture boundaries. The overlapping increases as the value of k is increased from 1 to 3.To cater to this Fig 3.7 Ambiguous Signs -%
    • #" $%%& Sir Syed University of Engineering & Technology problem the method of Least Mean Squares (LMS) is used. Figure 3.7 shows two ambiguous signs ‘R’ and ‘H’. c) LMS FOR REMOVING AMBIGUITY There are cases where more than one gestures are candidates for output. To overcome this type of situation the system calculates Least Mean Squares (LMS) [13] of all the candidate gestures and then selects the one with minimum LMS value. It is calculated as, (3.6) Here, xi denotes the sensor value of the ith sensor from test sample; µi denote mean value for the ith sensor. LMS for each candidate gesture is calculated and the gesture with least LMS value is selected as the final output. The use of LMS is justified by the results. Analyzing the performance of the system it has been observed that the use of LMS provides accurate results 60 % of the time. Figure 3.8 – STM Gesture Recognition Sequence Diagram --
    • #" $%%& Sir Syed University of Engineering & Technology 3.7.2 ARTIFICIAL NEURAL NETWORK (ANN) In this mode, the GRE takes input data and feeds it to multiple artificial neural networks in parallel. The approach taken is to initially process the input data so as to produce a description in terms of the various features (handshape, orientation and motion) of a sign. The sign can then be classified on the basis of the feature vector thus produced. This mode uses our Artificial Neural Network Library (ANNLIB) to run Multi Layer Perceptrons (MLPs) for recognition. 3.7.2.1 COMMITTEE SYSTEM FOR RECOGNITION The various experts (neural networks for each gesture) were trained using a “one against all” technique in which each network is trained for a particular sign to give a positive response for that sign and a negative one for all the others. So in the final system all the experts have the same architecture and are given the same input. Fig 3.9 shows the committee system used in the system. Output Voting Mechanism Input Figure 3.9 - Committee System a) ARCHITECTURE OF EXPERTS The architecture of the experts used in the committee system is 5:8:1 i.e., 5 inputs, 8 hidden nodes and 1 output node. The activation function for nodes in the hidden layer was Sigmoid Logistic and Hyperbolic for output nodes. b) VOTING MECHANISM The voting mechanism takes the output of all the experts as its input. It identifies the resultant gesture by examining the outputs of all the experts and selecting the one with a positive result. -$
    • #" $%%& Sir Syed University of Engineering & Technology c) FINAL CLASSIFICATION Since the experts could not be optimally trained, multiple experts can give a positive result. To solve this problem the results of each expert can be multiplied with its accuracy by the voting mechanism to give more weightage to the result of more accurate experts over less accurate ones. And finally the output with the largest positive value is selected as recognized gesture. Figure 3.10– ANN Gesture Recognition Sequence Diagram 3.8 OUTPUT The output of the system has two forms, one is the formatted text and other is in the form of speech. Obviously the important one is the speech output as it accomplishes the objective of the system. -'
    • #" $%%& Sir Syed University of Engineering & Technology 3.8.1 TEXT This subsystem outputs recognized sign into formatted text. The Urdu sign is output into Roman Urdu† language. This text output is then used by Text-to-speech module for its processing. 3.8.2 SPEECH The text-to-speech subsystem converts text into synthesized Urdu / English speech using Microsoft Speech SDK version 5.1[14]. The lexicon and phoneme sets have been modified so that it can pronounce words correctly in local accent. Figure 3.11– Speech Output Sequence Diagram 3.9 ACCELEROMETER Accelerometer is used for detecting a sign continuously without any need of manual aid for indicating the start or end of a gesture. It automatically identifies the ending of one gesture and start of the other. In this way gestures are recognized in continuous fashion. 3.9.1 ALGORITHMIC MODEL Acceleration is calculated for each sensor by averaging the differences between the last n inputs (in a sliding window fashion). When the acceleration of all the sensors is below a certain threshold value, the system identifies the state of hand as stationary and sends the sensor values for recognition to the engine. As soon as the acceleration exceeds the threshold value the system marks the hand as in motion and stops recognition. The sliding window size and the threshold values are adjusted so that the user need not make a deliberate effort to stop † Roman Urdu - Urdu written with the use of English alphabets. -(
    • #" $%%& Sir Syed University of Engineering & Technology for sometime in order to get the sign recognized. It is accessed by user through an accelerometer interface. The user can set the threshold value and sliding window size according to his/her needs. Window size = 8 150 151 152 153 154 156 159 161 165 168 172 178 182 185 186 190 2 3 2 4 3 4 6 Threshold = 1.8 A = 2+3+2+4+3+4+6 = 24 = 3.43 7 7 A> Threshold therefore hand is in motion Figure 3.12(a) - Motion Detection (Hand in motion) The above figure shows how the accelerometer determines that the hand is in motion. The sliding window shows the state of a sliding window for a single sensor. The differences are shown in triangles. The average of these differences is above the threshold value. Hence the system identifies the sensor to be in motion. Window size = 8 150 151 152 153 154 155 155 156 157 158 158 159 160 161 161 161 1 0 1 1 1 0 1 Threshold = 1.8 A = 1+0+1+1+1+0+1 = 5 = 0.714 7 7 A< Threshold therefore hand is stationary Figure 3.12(b) - Motion Detection (Hand is stationary) The above figure shows how the accelerometer determines that the hand is stationary. The sliding window shows the state of a sliding window for a single sensor. The average of these differences is below the threshold value. Hence the system identifies the sensor to be stationary. The accelerometer is used on a per sensor basis. So for five sensors, five accelerometer objects are used and each is continuously provided with its corresponding sensor value. The accelerometer design used till now is limited to either static gestures or dynamic gestures. -&
    • #" $%%& Sir Syed University of Engineering & Technology 3.10 DEVELOPED TOOLS In the course of developing the system various tools and libraries were developed. This includes wrapper class for glove driver (5DT Data Glove5 driver) in C#, its driver was written originally in VC++ (unmanaged) and was converted in to C# (managed code). An Artificial Neural Networks library ANNLib was written in C#. Also, a performance evaluation tool for evaluating performance of STM and ANN recognition systems efficiency was developed. 3.10.1 PERFORMANCE EVALUATION TOOL FOR GRE (PETool) The PETool evaluates the results obtained by applying the test data to gesture recognition engines of STM and ANN and generate reports and graphical view of data for performance evaluation purpose. The simulation data is used to evaluate whether the current configurations of STM and ANN are providing acceptable results or not. 3.11 TRADEOFFS Many tradeoffs regarding accuracy and efficiency were made during the design and implementation of the system. A major issue was the training of neural networks. The amount of training data, the optimal architecture of the neural networks and the classification mechanism were a few considerations. For quick training the training data set needed to be small and for greater accuracy more data was needed but more time was required for training. The quality of training data was of major concern for STM as well as ANN. The greater the number of registered users, the better the generalization. But more data does not come without its share of bad samples. In STM recognition, gesture boundaries of sensors are defined as µ ± kσ , the system uses k=3 after trying all values of 1, 2 and 3. This model µ ± 3σ covers large variation of data (up to 6-sigma) but at the same time increases the overlapping of different gestures. This overlapping of gestures creates ambiguity among outputs that has to be removed with the use of LMS. A similar case can be made for speech output. Text to speech output provides an efficient way of producing speech output but the quality of sound produced is not at par with pre- recorded human voice. However, recorded voices incur a heavy processing cost on the system when it comes to real-time recognition. The accelerometer is used to filter the data stream coming form the Data Glove in a thread. So obviously the performance of the thread will degrade if a decision making block is executed at each cycle. Same is the case with the accelerometer component [15]. 3.12 IMPLEMENTATION TOOLS Boltay Haath system has been developed in C# using Visual Studio .Net 2002. The gesture database was maintained on a MS Access database file. Windows being the platform for the project, all the user interfaces and input components are standard Windows objects. Microsoft Speech SDK 5.1 was used for speech output. -)
    • #" $%%& Sir Syed University of Engineering & Technology 3.13 COST Cost of components used in this project is given below. Item Cost 5DT Data Glove 5 $ 300 3.14 TESTING AND VERIFICATION The sub systems were tested separately to check their performance in various scenarios. Because Boltay Haath has a highly modular design, top-down and bottom-up integration occurred simultaneously. However, the system was integrated incrementally, to control the amount of bugs that need to be fixed at any given time. Tests conducted in a black box fashion. Finally it was tested that software meets the performance criteria set during design system specification. These tests were performed signers facility since it was deemed to know if the hardware available meets the performance criteria. 3.14.1 TEST FOR RECOGNITION ACCURACY The results were obtained using the PETool that was specially developed for measuring performance of the system. a) STATISTICAL TEMPLATE MATCHING TEST RESULTS Domain Accuracy (%) k=1 k=2 k=3 English Alphabets 24 73 80 Urdu Alphabets 24 84 88 Table 3.1 – Performance Result (Statistical Template Matching) Figure 3.13(a) – Alphabet wise recognition accuracy for STM - English -*
    • #" $%%& Sir Syed University of Engineering & Technology Figure 3.13 (b) – Alphabet wise recognition accuracy for STM - Urdu b) ANN COMMITTEE SYSTEM TEST RESULTS Domain Accuracy (%) 24 Handshapes 84 Table 3.2 – Performance Result (ANN classification with committee system) Figure 3.14 – Alphabet wise accuracy for ANN Committee System - English 3.14.2 TEST FOR RECOGNITION TIME Multiple gestures were provided to the system in sequence and the average time was calculated using the system clock. Under normal conditions the average recognition time was 0.4 seconds. -+
    • #" $%%& Sir Syed University of Engineering & Technology 3.14.3 TEST FOR SYNCHRONIZED SPEECH SYNTHESIS This performance parameter was measured using an external timing device and was found to be within the prescribed limits. 3.14.4 TEST FOR CONTINUSOUS RECOGNITION The system is able to distinguish between consecutive gestures using the accelerometer component. 4. SUMMARY Deaf and dumb people rely on sign language interpreters for communication. However, they cannot depend on interpreters in every day life mainly due to the high costs involved and the difficulty in finding qualified interpreters. This system will help disabled persons in improving their quality of life significantly. The automatic recognition of sign language is an attractive prospect; the technology exists to make it possible, while the potential applications are exciting and worthwhile. To date the research emphasis has been on the capture and classification of the gestures of sign language. This project will be a valuable addition to the ongoing research in the field of Human Computer Interface (HCI). The Boltay Haath system has been shown to work for Pakistan Signing Language (PSL) without invoking complex hand models. The results obtained indicate that the system is able to recognize signs efficiently with a good percentage of success. Future research regarding Boltay Haath will address more complex gestures, such as those gestures involving two hands. System will be investigated by other ways to model the gesture dynamics, such as HMMs that achieve minimal classification errors. Dynamic gestures and online training are the two most attractive features left for future. Several new directions have been identified through which this work could be expanded in the near future. The techniques developed are not specific to PSL, and so the system could easily be adapted to other sign languages or for other gesture recognition systems (for example, as part of a VR interface, telemetry or robotic control). It can be considered as a step towards applications which provide user interface based on hand gestures. One aspect of communication which could not be handled in Boltay Haath is two way communication. Currently Boltay Haath can convey words from the signer to the listener and not the other way around. One future enhancement would be to enable two way communication. The Boltay Haath system is now almost complete. Though many enhancements and optimizations can be made to make it better. On the whole 83 gestures have been recognized. This number can be increased as and when required by user. -,
    • #" $%%& Sir Syed University of Engineering & Technology 5. REFERENCES [1] R.S Pressman, Software Engineering: A Practitioner’s Approach, Fourth Edition, McGraw-HILL International, 1997 [2] Vesa-Matti, Mantyla, Jani Mantyjarvi, Tapio Seppanen, Esa tuulari. 2000, “Hand Gesture Recognition of a mobile device user”, 2000 IEEE pp.281-284. [3] Kadous, Waleed “GRASP: Recognition of Australian sign language using Instrumented gloves”, Australia, October 1995,pp. 1-2,4-8. [4] Murakami and Taguchi, “Gesture Recognition using Recurrent Neural Networks”. CHI ' Conference 91 Proceedings, pp.237--242. Human Interface Laboratory, Fujitsu Laboratories, ACM, 1991. [5] Sulman Nasir, Sadaf Zuberi, “Pakistan Sign Language – A Synopsis“, Pakistan, June 2000. [6] Simon Haykin, Neural networks: A Comprehensive Foundation, Second Edition, McMaster University, pp. 142. [7] Peter W. Vamplew, Recognition of Sign Language Using Neural Networks, University of Tasmania, May 1996, Pp. 98 [8] Corradini, Andrea, Horst-Michael Gross. 2000, “A Hybrid Stochastic-Connectionist Architecture for Gesture Reognition”, 2000 IEEE, 336-341. [9] K.S. Fu, Syntactic Pattern Recognition, Prentice-Hall 1981, pp. 75-80 [10] Corradini, Andrea Horst-Michael Gross. 2000, “Camera-based Gesture Recognition for Robot Control”, 2000 IEEE, pp.133-138. [11] Sommerville, I, Software Engineering (6th Ed.), published by: Addison Wesley, chap. 1, pp. 8. [12] I., Wachsmuth, T. Sowa (Eds.), “Towards an Automatic Sign Language Recognition System using Subunits”, London, April 2001, pp. 1-2 [13] Liskov, Barbara, Program Development in Java, chap 11, pp. 356. [14] The Microsoft Speech Website, www.microsoft.com/speech [15] Richard, Watson, “A survey of Gesture Recognition Techniques Technical Report”, Trinity College, Dublin, July 1993, pp. 6 $%