In recent decades, speech-interactive systems have gained increasing importance. The performance of an ASR system depends mainly on the availability of a large speech corpus. The conventional method of building a large-vocabulary speech recognizer for any language uses a top-down approach: it requires a large speech corpus with sentence- or phoneme-level transcription of the utterances, and the transcriptions must cover a wide variety of speech so that the recognizer can build models for all the sounds present. For Telugu, however, because of the complex nature of the language, a large, well-annotated speech database is very difficult to build. It is very difficult, if not impossible, to cover all the words of any Indian language, where each word may have thousands or even millions of word forms. A significant part of the grammar that is handled by syntax in English (and similar languages) is handled within morphology in Telugu: phrases consisting of several words (that is, tokens) in English map onto a single word in Telugu. Telugu is also phonetic in nature in addition to being morphologically rich, which is why speech technology developed for English cannot be applied to Telugu directly. This paper highlights work carried out in an attempt to build a voice-enabled text editor with automatic term suggestion. The main contribution is a recognition enhancement process developed for highly inflecting, morphologically rich languages. The method increases speech recognition accuracy while greatly reducing the corpus size, and it adapts Telugu words into the database dynamically, so the corpus grows with use.
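The two adaptive behaviours the abstract describes, suggesting terms as the user dictates and growing the word list dynamically, can be sketched as follows. This is a minimal illustration under assumed behaviour, not the authors' system; the romanized word forms and the `AdaptiveLexicon` class are hypothetical.

```python
class AdaptiveLexicon:
    """Toy word list that suggests terms by prefix and grows with dictation."""

    def __init__(self, seed_words):
        self.words = set(seed_words)

    def suggest(self, prefix, limit=3):
        """Return up to `limit` known words sharing the given prefix."""
        return sorted(w for w in self.words if w.startswith(prefix))[:limit]

    def learn(self, word):
        """Dynamically add a newly dictated word, so the corpus expands."""
        self.words.add(word)


lex = AdaptiveLexicon(["ramudu", "ravi", "rani"])   # illustrative seed forms
print(lex.suggest("ra"))    # suggestions from the current lexicon
lex.learn("raju")           # a new dictated word enlarges the lexicon
print(lex.suggest("raj"))
```

The point of the sketch is the direction of data flow: recognition output feeds back into the lexicon, instead of the lexicon being fixed by an up-front annotated corpus.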
An Improved Approach for Word Ambiguity Removal (Waqas Tariq)
Word ambiguity removal is the task of removing ambiguity from a word, i.e. identifying the correct sense of a word in an ambiguous sentence. This paper describes a model that uses a part-of-speech tagger and three categories for word sense disambiguation (WSD). Human-computer interaction benefits greatly from such disambiguation; to this end, supervised and unsupervised methods are combined. The WSD algorithm finds an efficient and accurate sense of a word based on domain information, and the accuracy of the approach is evaluated with the aim of finding the best-suited domain for a word. Keywords: Human Computer Interaction, Supervised Training, Unsupervised Learning, Word Ambiguity, Word sense disambiguation
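The core idea of choosing a sense from domain information can be sketched very simply: score each candidate domain by its keyword overlap with the sentence context. The domains and keyword sets below are made-up illustrations, not the paper's actual categories.

```python
# Hypothetical domain keyword sets for the ambiguous word "bank".
DOMAIN_SENSES = {
    "finance": {"money", "loan", "deposit", "account"},
    "geography": {"river", "water", "shore", "fishing"},
}

def disambiguate(context_words):
    """Return the domain whose keyword set overlaps the context the most."""
    scores = {d: len(kw & set(context_words)) for d, kw in DOMAIN_SENSES.items()}
    return max(scores, key=scores.get)

print(disambiguate(["he", "sat", "on", "the", "river", "bank", "fishing"]))
```

A real system would combine this with POS-tagger output and learned sense statistics, as the abstract describes, but the overlap score is the intuition behind domain-driven WSD.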
Parameters Optimization for Improving ASR Performance in Adverse Real World N... (Waqas Tariq)
Existing research shows that many techniques and methodologies are available for every step of an Automatic Speech Recognition (ASR) system, but performance (minimizing the Word Error Rate, WER, and maximizing the Word Accuracy Rate, WAR) does not depend only on the technique applied in a given method. It also depends on the category of noise, the noise level, and the variable sizes of the window, frame, frame overlap, etc. considered in existing methods. The main aim of the work presented in this paper is to vary parameters such as window size, frame size, and frame-overlap percentage, observe the performance of the algorithms for various noise categories at different levels, and train the system across all parameter sizes and categories of real-world noisy environments to improve recognition performance. The paper presents Signal-to-Noise Ratio (SNR) and accuracy test results for the varied parameter sizes. It is observed that evaluating the test results and deciding on a parameter size for ASR performance improvement is genuinely hard, so the study further suggests feasible, optimum parameter sizes using a Fuzzy Inference System (FIS) for enhancing accuracy in adverse real-world noisy conditions. This work is helpful for discriminative training of ubiquitous ASR systems for better Human-Computer Interaction (HCI). Keywords: ASR Performance, ASR Parameters Optimization, Multi-Environmental Training, Fuzzy Inference System for ASR, ubiquitous ASR system, Human Computer Interaction (HCI)
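To make the window/frame/overlap parameters concrete: the frame length and overlap percentage jointly determine the hop size and hence how many analysis frames a signal yields, which is what a parameter sweep like the one described varies. The helper below is an illustrative sketch of that arithmetic, not code from the paper.

```python
def frame_count(n_samples, frame_len, overlap_pct):
    """Number of analysis frames for a signal, given frame length (samples)
    and the percentage of overlap between consecutive frames."""
    hop = max(1, int(frame_len * (1 - overlap_pct / 100)))  # samples advanced per frame
    if n_samples < frame_len:
        return 0
    return 1 + (n_samples - frame_len) // hop

# 1 s of audio at 16 kHz, 25 ms frames (400 samples), 50% overlap (hop = 200)
print(frame_count(16000, 400, 50))
```

Sweeping `frame_len` and `overlap_pct` over a grid, recognizing with each setting, and scoring WER per noise category is the experimental shape the abstract outlines; the FIS then picks a workable operating point from those results.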
A survey of named entity recognition in assamese and other indian languages (ijnlc)
Named Entity Recognition is important in major Natural Language Processing tasks such as information extraction, question answering, machine translation, and document summarization, so in this paper we put forward a survey of named entity recognition in Indian languages with particular reference to Assamese. Various rule-based and machine learning approaches are available for Named Entity Recognition. We first give an overview of the available approaches and then discuss related research in this field. Assamese, like other Indian languages, is agglutinative and suffers from a lack of appropriate resources, as Named Entity Recognition requires large data sets, gazetteer lists, dictionaries, etc., and useful features such as capitalization, found in English, are absent in Assamese. We also describe some of the issues faced when doing Named Entity Recognition in Assamese.
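A gazetteer list, one of the resources the survey notes is scarce for Assamese, drives the simplest rule-based NER: look each token up in entity word lists. The entries below are illustrative placeholders, not from any real Assamese gazetteer.

```python
# Hypothetical gazetteer; real systems use large curated lists per entity type.
GAZETTEER = {
    "guwahati": "LOCATION",
    "assam": "LOCATION",
    "bhupen": "PERSON",
}

def tag_entities(tokens):
    """Tag each token via gazetteer lookup; 'O' marks a non-entity token."""
    return [(t, GAZETTEER.get(t.lower(), "O")) for t in tokens]

print(tag_entities(["Bhupen", "sang", "in", "Guwahati"]))
```

Note the `.lower()` call: because Assamese script has no capitalization, a system cannot use case as an entity cue the way English NER does, which is exactly the missing feature the survey highlights.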
Survey on Indian CLIR and MT systems in Marathi Language (Editor IJCATR)
Cross Language Information Retrieval (CLIR) deals with retrieving relevant information stored in a language different from the language of the user's query, letting users express their information need in their native language. The machine-translation-based (MT-based) approach to CLIR uses existing machine translation techniques to translate queries automatically. This paper covers the research work done on CLIR and MT systems for the Marathi language in India.
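The MT-based approach in its most basic form translates each query term through a bilingual dictionary before retrieval. The sketch below illustrates that step only; the romanized Marathi entries are placeholders, not drawn from the surveyed systems.

```python
# Hypothetical Marathi-to-English dictionary entries (romanized placeholders).
BILINGUAL_DICT = {"pustak": "book", "itihas": "history"}

def translate_query(query):
    """Translate each query term via the dictionary; keep unknown terms as-is."""
    return [BILINGUAL_DICT.get(term, term) for term in query.split()]

print(translate_query("pustak itihas"))
```

Keeping unknown terms untranslated is a common fallback in dictionary-based CLIR, since proper nouns often match across languages after transliteration.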
Artificially Generated of Concatenative Syllable based Text to Speech Synthesi... (iosrjce)
Duration for Classification and Regression Tree for Marathi Text-to-Speech Syn... (IJERA Editor)
This research paper reports preliminary results of data-driven modeling of segmental phoneme duration for Marathi. Classification and Regression Tree (CART) based data-driven duration modeling for segmental duration prediction is presented. A number of features are considered, and their usefulness and relative contribution to segmental duration prediction are assessed. The duration model is evaluated objectively by root mean squared prediction error and by the correlation between actual and predicted durations.
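A CART duration model repeatedly splits the feature space to minimize squared prediction error. The single-split "stump" below shows that core step in the spirit of CART; the position/duration data are invented for illustration and are not the paper's Marathi measurements.

```python
def fit_stump(xs, ys):
    """Find the split on x that minimizes total squared error of the two
    leaf means; return (threshold, left_mean, right_mean)."""
    best = None
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        if not left or not right:
            continue
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or err < best[0]:
            best = (err, t, lm, rm)
    return best[1:]

# feature: syllable position in the word; target: phoneme duration in ms
positions = [1, 1, 2, 2, 3, 3]
durations = [80.0, 82.0, 95.0, 97.0, 120.0, 122.0]
print(fit_stump(positions, durations))
```

A full CART recurses this split on each leaf and handles several features at once (position, phone identity, stress, and so on), choosing the best feature/threshold pair at every node.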
Named Entity Recognition using Hidden Markov Model (HMM) (kevig)
Named Entity Recognition (NER) is a subtask of Natural Language Processing (NLP), a branch of artificial intelligence. It has many applications, mainly in machine translation, text-to-speech synthesis, natural language understanding, information extraction, information retrieval, question answering, etc. The aim of NER is to classify words into predefined categories such as location name, person name, organization name, date, and time. In this paper we describe in detail a Hidden Markov Model (HMM) based machine learning approach to identifying named entities. The main idea behind using an HMM to build the NER system is that it is language independent, so the system can be applied to any language domain. In our NER system the states are not fixed but dynamic, so they can be chosen according to the task at hand, and the corpus used by the system is not domain specific.
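Decoding in an HMM-based tagger is usually done with the Viterbi algorithm: the states are entity tags, the observations are words, and the best tag path maximizes the product of start, transition, and emission probabilities. The sketch below uses toy hand-set probabilities (not trained values) and only two states, to keep the mechanics visible.

```python
# Toy HMM: states are NER tags, observations are lowercase tokens.
states = ["PER", "O"]
start_p = {"PER": 0.3, "O": 0.7}
trans_p = {"PER": {"PER": 0.4, "O": 0.6}, "O": {"PER": 0.2, "O": 0.8}}
emit_p = {"PER": {"ram": 0.8, "went": 0.01, "home": 0.01},
          "O":   {"ram": 0.05, "went": 0.5, "home": 0.45}}

def viterbi(obs):
    """Return the most likely tag sequence for the observed tokens."""
    # Each cell holds (probability of best path ending here, that path).
    V = [{s: (start_p[s] * emit_p[s].get(obs[0], 1e-6), [s]) for s in states}]
    for word in obs[1:]:
        V.append({})
        for s in states:
            prob, path = max(
                (V[-2][ps][0] * trans_p[ps][s] * emit_p[s].get(word, 1e-6),
                 V[-2][ps][1] + [s])
                for ps in states)
            V[-1][s] = (prob, path)
    return max(V[-1].values())[1]

print(viterbi(["ram", "went", "home"]))
```

In a trained system, `trans_p` and `emit_p` come from tag-bigram and word/tag counts over an annotated corpus, and the state set is whatever tag inventory the task defines, which is the "dynamic states" point the abstract makes.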
A NOVEL APPROACH FOR NAMED ENTITY RECOGNITION ON HINDI LANGUAGE USING RESIDUA... (kevig)
Many Natural Language Processing (NLP) applications involve Named Entity Recognition (NER) as an important task, and improvements in NER raise the overall performance of those applications. In this paper, deep learning techniques are used to perform NER on Hindi text, since Hindi NER has not been explored as thoroughly as English NER; this is a barrier for resource-scarce languages, where many resources are not readily available. Researchers have used various techniques, including rule-based, machine-learning-based, and hybrid approaches, to address this problem, and deep learning algorithms are now being developed at scale for advanced NER models. In this paper we devise a novel architecture based on a residual network over a Bidirectional Long Short-Term Memory (BiLSTM) network with fastText word embedding layers. For this purpose we use pre-trained word embeddings to represent the words in the corpus, with the NER tags of the words defined by the annotated corpora used. Development of an NER system for Indian languages is a comparatively difficult task. We have run experiments comparing NER results with normal embedding and fastText embedding layers, and analyzed the performance of the word embeddings with different batch sizes while training the deep learning models. We present state-of-the-art F1 scores with the proposed approach.
INTEGRATION OF PHONOTACTIC FEATURES FOR LANGUAGE IDENTIFICATION ON CODE-SWITC... (kevig)
In this paper, phoneme sequences are used as language information to perform code-switched language identification (LID). With a one-pass recognition system, the spoken sounds are converted into phonetically arranged sequences of sounds. The acoustic models, built from multiple hidden Markov models (HMMs), are robust enough to handle multiple languages. To determine the phoneme similarity among our target languages, we report two methods of phoneme mapping. Statistical phoneme-based bigram language models (LMs) are integrated into speech decoding to eliminate possible phone mismatches. A supervised support vector machine (SVM) learns to recognize the phonetic information of mixed-language speech from the recognized phone sequences. Since the back-end decision is taken by the SVM, the likelihood scores of segments with monolingual phone occurrences are used to classify language identity. The speech corpus was tested on Sepedi and English, languages that are often mixed. Our system is evaluated by measuring the ASR performance and the LID performance separately. The systems obtain promising ASR accuracy with a data-driven phone-merging approach modelled using 16 Gaussian mixtures per state, and achieve acceptable ASR and LID accuracy on both code-switched and monolingual speech segments.
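The phone-bigram scoring that underlies phonotactic LID can be sketched as follows: train an add-one-smoothed bigram model per language over phone sequences, then score a recognized sequence under each model. The training data here are tiny made-up phone strings, and the final argmax stands in for the SVM back-end the paper actually uses.

```python
import math

def train_bigram(sequences):
    """Estimate add-one-smoothed phone-bigram probabilities from sequences."""
    counts, unigrams, vocab = {}, {}, set()
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            counts[(a, b)] = counts.get((a, b), 0) + 1
            unigrams[a] = unigrams.get(a, 0) + 1
            vocab.update((a, b))
    V = len(vocab)
    return lambda a, b: (counts.get((a, b), 0) + 1) / (unigrams.get(a, 0) + V)

def score(model, seq):
    """Log-likelihood of a phone sequence under a bigram model."""
    return sum(math.log(model(a, b)) for a, b in zip(seq, seq[1:]))

# Toy phone inventories standing in for recognizer output.
english = train_bigram([["th", "e", "k", "ae", "t"]])
sepedi = train_bigram([["d", "u", "m", "e", "l", "a"]])

test_seq = ["th", "e", "d", "o", "g"]
lang = "English" if score(english, test_seq) > score(sepedi, test_seq) else "Sepedi"
print(lang)
```

In the paper's pipeline these per-segment likelihood scores become features for the SVM, which makes the final monolingual-vs-code-switched language decision.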
A Context-based Numeral Reading Technique for Text to Speech Systems (IJECEIAES)
This paper presents a novel technique for context-based numeral reading in Indian-language text-to-speech systems. The model uses a set of rules to determine the context of a numeral's pronunciation and is integrated with the waveform concatenation technique to produce speech from input text in Indian languages. Three Indian languages, Odia, Hindi, and Bengali, are considered. To analyze the performance of the proposed technique, a set of experiments is performed covering different contexts of numeral pronunciation, and the results are compared with an existing syllable-based technique. The results of these experiments show the effectiveness of the proposed technique in producing intelligible speech from the entered text, with far less storage and execution time than the existing technique.
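Context rules for numerals decide, for example, whether "1998" is read as a year or as a cardinal. The classifier below is a tiny illustrative rule set of the kind such a TTS front end applies; the specific rules and categories are assumptions, not the paper's actual rule base.

```python
import re

def numeral_context(token, next_word=""):
    """Classify a numeral token as a year, currency amount, or plain cardinal."""
    # Four-digit numbers in a plausible year range, optionally followed by an
    # era marker, are read digit-pair style ("nineteen ninety eight").
    if re.fullmatch(r"(19|20)\d\d", token) and next_word in {"AD", "BC", ""}:
        return "year"
    # A currency prefix switches to amount reading ("five hundred rupees").
    if token.startswith(("Rs.", "$")):
        return "currency"
    return "cardinal"            # default: "five hundred"

print(numeral_context("1998"))
print(numeral_context("Rs.500"))
print(numeral_context("500"))
```

Once the context class is known, the system expands the numeral into the matching word sequence in the target language and hands those words to the waveform concatenation stage.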
Quality estimation of machine translation outputs through stemming (ijcsa)
Machine translation is a challenging problem for Indian languages. New machine translators are being developed every day, but high-quality automatic translation is still a very distant goal, and correctly translated Hindi sentences are rare. In this paper we focus on the English-Hindi language pair; to identify the best MT output we present a ranking system that employs machine learning techniques and morphological features. The ranking requires no human intervention. We have also validated our results by comparing them with human rankings.
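One simple morphological signal for ranking MT outputs is stemmed-token overlap with a reference: stemming forgives inflectional differences that exact matching would punish. The sketch below uses crude suffix stripping as a stand-in for a real Hindi stemmer, and the romanized examples are invented; the paper's actual features and learned ranker are richer than this.

```python
def stem(word, suffixes=("on", "en", "e", "i", "a")):
    """Strip one common suffix (illustrative only, not a real Hindi stemmer)."""
    for s in suffixes:
        if word.endswith(s) and len(word) > len(s) + 2:
            return word[: -len(s)]
    return word

def rank_outputs(reference, candidates):
    """Order candidate translations by stemmed-token overlap with the reference."""
    ref = {stem(w) for w in reference.split()}
    return sorted(candidates,
                  key=lambda c: len(ref & {stem(w) for w in c.split()}),
                  reverse=True)

# "ladkon khelte" shares both stems with the reference "ladke khelte".
print(rank_outputs("ladke khelte", ["kitab padho", "ladkon khelte"]))
```

A learned quality-estimation model would use such overlap counts as one feature among several morphological and fluency features rather than ranking on it alone.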
Part-of-speech tagging in Indian languages is still an open problem, and we still lack a clear approach to implementing a POS tagger for them. In this paper we describe our efforts to build a Hidden Markov Model based part-of-speech tagger. We used the IL POS tag set for the development of this tagger and achieved an accuracy of 92%.
ON THE UTILITY OF A SYLLABLE-LIKE SEGMENTATION FOR LEARNING A TRANSLITERATION... (cscpconf)
Source and target word segmentation and alignment is a primary step in the statistical learning of a transliteration. Here, we analyze the benefit of a syllable-like segmentation approach for learning a transliteration from English to an Indic language, which aligns the training-set word pairs in terms of sub-syllable-like units instead of individual character units. While this has been found useful for handling out-of-vocabulary words in English-Chinese transliteration in the presence of multiple target dialects, we asked whether the same would hold for Indic languages, which are simpler in their phonetic representation and pronunciation. We expected the syllable-like method to perform marginally better, but we found instead that even though our proposed approach improved Top-1 accuracy, the individual-character-unit alignment model somewhat outperformed our approach when the Top-10 results of the system were re-ranked using language-modeling approaches. Our experiments were conducted for English-to-Telugu transliteration (the method should apply equally well to most written Indic languages). Training consisted of a syllable-like segmentation and alignment of a large training set, on which we built a statistical model by modifying a previous character-level maximum-entropy-based transliteration learning system due to Kumaran and Kellner; testing consisted of applying the same segmentation to a test English word, applying the model, and re-ranking the resulting top 10 Telugu words. We also report the dataset creation and selection, since standard datasets are not available.
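The difference between character-unit and sub-syllable-unit alignment is easiest to see on a word pair. The sketch below cuts words into units ending at each vowel run and pairs them up when the unit counts agree; it is a rough illustration of the segmentation idea, not the paper's actual segmenter, and the romanized example pair is hypothetical.

```python
VOWELS = set("aeiou")

def subsyllables(word):
    """Split a romanized word into units that each end at a vowel run."""
    out, cur = [], ""
    for i, ch in enumerate(word):
        cur += ch
        nxt = word[i + 1] if i + 1 < len(word) else ""
        if ch in VOWELS and nxt not in VOWELS:   # vowel run ends here
            out.append(cur)
            cur = ""
    if cur:                                      # trailing consonants
        out.append(cur)
    return out

def align(src, tgt):
    """Pair up source/target units when the segment counts agree."""
    s, t = subsyllables(src), subsyllables(tgt)
    return list(zip(s, t)) if len(s) == len(t) else None

print(align("rama", "raama"))
```

Aligned unit pairs like `("ra", "raa")` become the training events for the statistical transliteration model, in place of single-character pairs.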
A Proposed Web Accessibility Framework for the Arab Disabled (Waqas Tariq)
The Web is providing unprecedented access to information and interaction for people with disabilities. This paper presents a Web accessibility framework that eases Web access for disabled Arab users and facilitates their lifelong learning. The proposed framework provides disabled Arab users with easy access in their mother language, so they do not have to overcome the barrier of learning the target spoken language. The framework is based on analyzing a web page's meta-language, extracting its content, and reformulating it in a format suitable for disabled users. Its basic objective is to support the equal right of disabled Arab people to access education and training alongside non-disabled people. Keywords: Arabic Moon code, Arabic Sign Language, Deaf, Deaf-blind, E-learning Interactivity, Moon code, Web accessibility, Web framework, Web System, WWW.
Human Arm Inverse Kinematic Solution Based Geometric Relations and Optimizati... (Waqas Tariq)
Kinematics for robotic systems with many degrees of freedom (DOF) and high redundancy is still an open issue: computation time in robotic applications is often too high to reach a good solution, and for parts of the kinematic chain the inverse kinematics problem is not linear, since rotations are involved. This means that analytical solutions are available only in limited situations; in all other cases, alternative methods must be employed, the most common being numerical optimization. This paper presents a strategy that combines analytical solutions with a nonlinear optimization algorithm to solve the inverse kinematics problem (IKP). An analytical solution is used to reduce the problem from seven joint-angle variables to a single variable, and the nonlinear optimization algorithm then finds an approximate solution, making the computation time very small.
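The role an analytic step plays is easiest to see in the classic two-link planar case, where the elbow angle follows in closed form from the law of cosines and the shoulder angle from the target direction. This is a simplified stand-in for the paper's seven-DOF human-arm reduction, not its actual derivation; link lengths of 1.0 are assumed.

```python
import math

def two_link_ik(x, y, l1=1.0, l2=1.0):
    """Analytic IK for a planar two-link arm: return (shoulder, elbow) angles
    in radians that place the end effector at (x, y)."""
    c2 = (x * x + y * y - l1 * l1 - l2 * l2) / (2 * l1 * l2)  # law of cosines
    theta2 = math.acos(max(-1.0, min(1.0, c2)))               # elbow angle
    k1 = l1 + l2 * math.cos(theta2)
    k2 = l2 * math.sin(theta2)
    theta1 = math.atan2(y, x) - math.atan2(k2, k1)            # shoulder angle
    return theta1, theta2

t1, t2 = two_link_ik(1.0, 1.0)
# forward-kinematics check: the computed angles should reach the target
fx = math.cos(t1) + math.cos(t1 + t2)
fy = math.sin(t1) + math.sin(t1 + t2)
print(round(fx, 6), round(fy, 6))
```

In the redundant seven-DOF case, such closed-form relations fix most joints as functions of one free parameter, and a numerical optimizer searches over that single remaining variable, which is what keeps the computation time small.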
USEFul: A Framework to Mainstream Web Site Usability through Automated Evalua... (Waqas Tariq)
A paradox has been observed whereby web site usability is proven to be an essential element of a web site, yet there exists an abundance of web pages with poor usability. This discrepancy results from limitations that currently prevent web developers in the commercial sector from producing usable web sites. In this paper we propose a framework whose objective is to alleviate this problem by automating certain aspects of the usability evaluation process. Mainstreaming comes as a result of automation, enabling a non-expert in the field of usability to conduct the evaluation and thereby reducing the costs associated with it. Additionally, the framework allows guidelines to be added, modified, or deleted without altering the code that references them, since the guidelines and the code are two separate components. A comparison of evaluation results obtained using the framework against published evaluations carried out by web site usability professionals reveals that the framework automatically identifies the majority of usability violations. Due to the consistency with which it evaluates, it also identified guideline-related violations that the human evaluators missed.
Development of Sign Signal Translation System Based on Altera’s FPGA DE2 Board (Waqas Tariq)
The main aim of this paper is to build a system capable of detecting and recognizing hand gestures in an image captured by a camera. The system is built on Altera’s FPGA DE2 board, which contains a Nios II soft-core processor. Image processing techniques and a simple but effective algorithm are implemented for this purpose: the image is smoothed to ease the subsequent steps of translating the hand sign, the algorithm translates numerical hand signs, and the result is displayed on the seven-segment display. Altera’s Quartus II, SOPC Builder, and Nios II EDS software are used to construct the system. With SOPC Builder, the relevant components on the DE2 board can be interconnected easily and in an orderly way, compared to the traditional method, which requires lengthy source code and is time-consuming. Quartus II is used to compile and download the design to the DE2 board, and under Nios II EDS the hand-sign translation algorithm is coded in C. Recognizing hand signs from images can help humans control robots and other applications that require only a simple set of instructions, provided a CMOS sensor is included in the system.
Our research aims to propose a global approach for the specification, design, and verification of context-aware Human-Computer Interfaces (HCI). It is a Model-Based Design (MBD) approach. The methodology describes the ubiquitous environment with ontologies, using the OWL standard for this purpose. The specification and modeling of the human-computer interaction are based on Petri nets (PN), which raises the question of representing Petri nets in XML; we use the PNML modeling standard for this. In this paper, we propose an extension of this standard for the specification, generation, and verification of HCI. The extension is a methodological approach to constructing PNML from Petri nets, whose design principle uses the composition of elementary Petri-net structures as modular PNML. The objective is to obtain a valid interface through verification of the properties of the elementary Petri nets represented in PNML.
The Use of Java Swing’s Components to Develop a Widget (Waqas Tariq)
A widget is a kind of application that provides a single service, such as a map, news feed, simple clock, or battery-life indicator. This kind of interactive software object has been developed to facilitate user interface (UI) design, and a given UI function may be implemented using different widgets with the same function. In this article, we present the widget as a platform used in various applications, such as the desktop, the web browser, and the mobile phone. We also describe a visual menu of Java Swing’s components that will be used to build a widget, assuming the reader has successfully compiled and run a program that uses Swing components.
Artificially Generatedof Concatenative Syllable based Text to Speech Synthesi...iosrjce
IOSR journal of VLSI and Signal Processing (IOSRJVSP) is a double blind peer reviewed International Journal that publishes articles which contribute new results in all areas of VLSI Design & Signal Processing. The goal of this journal is to bring together researchers and practitioners from academia and industry to focus on advanced VLSI Design & Signal Processing concepts and establishing new collaborations in these areas.Design and realization of microelectronic systems using VLSI/ULSI technologies require close collaboration among scientists and engineers in the fields of systems architecture, logic and circuit design, chips and wafer fabrication, packaging, testing and systems applications. Generation of specifications, design and verification must be performed at all abstraction levels, including the system, register-transfer, logic, circuit, transistor and process levels.
Duration for Classification and Regression Treefor Marathi Textto- Speech Syn...IJERA Editor
This research paper reports preliminary results of data-driven modeling of segmentalphoneme duration for
Marathi. Classification and Regression Tree based data driven duration modeling for segmental duration
prediction is presented. A number of features are considered and their usefulness and relative contribution for
segmental duration prediction is assessed. Objective evaluation of the duration model, by root mean squared
prediction error and correlation between actual and predicted durations, is performed.
Named Entity Recognition using Hidden Markov Model (HMM)kevig
Named Entity Recognition (NER) is the subtask of Natural Language Processing (NLP) which is the branch of artificial intelligence. It has many applications mainly in machine translation, text to speech synthesis, natural language understanding, Information Extraction, Information retrieval, question answering etc. The aim of NER is to classify words into some predefined categories like location name, person name, organization name, date, time etc. In this paper we describe the Hidden Markov Model (HMM) based approach of machine learning in detail to identify the named entities. The main idea behind the use of HMM model for building NER system is that it is language independent and we can apply this system for any language domain. In our NER system the states are not fixed means it is of dynamic in nature one can use it according to their interest. The corpus used by our NER system is also not domain specific
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
A NOVEL APPROACH FOR NAMED ENTITY RECOGNITION ON HINDI LANGUAGE USING RESIDUA...kevig
Many Natural Language Processing (NLP) applications involve Named Entity Recognition (NER) as an important task, where it leads to improve the overall performance of NLP applications. In this paper the Deep learning techniques are used to perform NER task on Hindi text data as it found that as compared to English NER, Hindi language NER is not sufficiently done. This is a barrier for resource-scarce languages as many resources are not readily available. Many researchers use various techniques such as rule based, machine learning based and hybrid approaches to solve this problem. Deep learning based algorithms are being developed in large scale as an innovative approach now a days for the advanced NER models which will give the best results out of it. In this paper we devise a Novel architecture based on residual network architecture for preferably Bidirectional Long Short Term Memory (BiLSTM) with fasttext word embedding layers. For this purpose we use pre-trained word embedding to represent the words in the corpus where the NER tags of the words are defined as the used annotated corpora. BiLSTM Development of an NER system for Indian languages is a comparatively difficult task. In this paper, we have done the various experiments to compare the results of NER with normal embedding and fasttext embedding layers to analyse the performance of word embedding with different batch sizes to train the deep learning models. Here we present a state-of-the-art results with said approach F1 Score measures.
INTEGRATION OF PHONOTACTIC FEATURES FOR LANGUAGE IDENTIFICATION ON CODE-SWITC...kevig
In this paper, phoneme sequences are used as language information to perform code-switched language identification (LID). With a one-pass recognition system, the spoken sounds are converted into phonetically arranged sequences of sounds. The acoustic models are robust enough to handle multiple languages when emulating multiple hidden Markov models (HMMs). To determine the phoneme similarity among our target languages, we report two methods of phoneme mapping. Statistical phoneme-based bigram language models (LMs) are integrated into speech decoding to eliminate possible phone mismatches. A supervised support vector machine (SVM) is trained to recognize the phonetic information of mixed-language speech based on recognized phone sequences. As the back-end decision is taken by an SVM, the likelihood scores of segments with monolingual phone occurrence are used to classify language identity. The speech corpus was tested on Sepedi and English, languages that are often mixed. Our system is evaluated by measuring the ASR performance and the LID performance separately. The systems obtained a promising ASR accuracy with the data-driven phone merging approach modelled using 16 Gaussian mixtures per state. The proposed systems achieved an acceptable ASR and LID accuracy on code-switched speech and monolingual speech segments respectively.
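The phoneme-bigram language models described above can be illustrated with a small sketch: per-language add-one-smoothed bigram models over recognized phone sequences, with the language chosen by maximum likelihood. (The paper's actual system additionally uses an SVM back-end; that part is omitted here.)

```python
import math
from collections import defaultdict

class PhoneBigramLM:
    """Add-one-smoothed bigram model over phone symbols for one language."""
    def __init__(self, sequences):
        self.bi = defaultdict(int)
        self.uni = defaultdict(int)
        self.vocab = set()
        for seq in sequences:
            phones = ["<s>"] + seq + ["</s>"]
            self.vocab.update(phones)
            for a, b in zip(phones, phones[1:]):
                self.bi[(a, b)] += 1
                self.uni[a] += 1

    def logprob(self, seq):
        phones = ["<s>"] + seq + ["</s>"]
        V = len(self.vocab) + 1          # +1 for unseen phones
        return sum(math.log((self.bi[(a, b)] + 1) / (self.uni[a] + V))
                   for a, b in zip(phones, phones[1:]))

def identify(segment, lms):
    """Return the language whose phone LM scores the segment highest."""
    return max(lms, key=lambda lang: lms[lang].logprob(segment))
```

A segment of recognized phones is then labeled with whichever language model assigns it the highest likelihood.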
A Context-based Numeral Reading Technique for Text to Speech Systems
This paper presents a novel technique for context-based numeral reading in Indian-language text to speech systems. The model uses a set of rules to determine the context of the numeral pronunciation and is integrated with the waveform concatenation technique to produce speech from the input text in Indian languages. For this purpose, three Indian languages are considered: Odia, Hindi and Bengali. To analyze the performance of the proposed technique, a set of experiments is performed considering different contexts of numeral pronunciation, and the results are compared with the existing syllable-based technique. The results obtained from the different experiments show the effectiveness of the proposed technique in producing intelligible speech from the entered text compared to the existing technique, even with much lower storage and execution time.
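A rule set for determining numeral context can be sketched as follows; the categories and regular expressions here are illustrative placeholders, not the paper's actual rules for Odia, Hindi and Bengali:

```python
import re

def numeral_context(token):
    """Tiny rule set deciding how a numeral token should be pronounced.
    Categories and patterns are illustrative, not the paper's actual rules."""
    if re.fullmatch(r"(19|20)\d\d", token):
        return "year"            # e.g. 1998: read in a year form
    if re.fullmatch(r"\d+\.\d+", token):
        return "decimal"         # e.g. 3.14: integer part, point, then digits
    if re.fullmatch(r"\d{7,}", token):
        return "digit-string"    # long numbers (phone numbers etc.): digit by digit
    if re.fullmatch(r"\d{1,3}(,\d{3})+|\d+", token):
        return "cardinal"        # read as a full number word
    return "unknown"
```

The detected category would then select which pronunciation (and hence which waveform units) the concatenation stage uses.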
Quality estimation of machine translation outputs through stemming
Machine translation is a challenging problem for Indian languages. Every day we see new machine translators being developed, but obtaining a high-quality automatic translation is still a very distant dream. Correctly translated sentences for Hindi are rarely found. In this paper, we focus on the English-Hindi language pair; in order to select the correct MT output we present a ranking system, which employs machine learning techniques and morphological features. The ranking requires no human intervention. We have also validated our results by comparing them with human rankings.
Part of Speech tagging for Indian languages is still an open problem, and we still lack a clear approach to implementing a POS tagger for them. In this paper we describe our efforts to build a Hidden Markov Model based Part of Speech tagger. We have used the IL POS tag set for the development of this tagger and achieved an accuracy of 92%.
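The core of such an HMM tagger is Viterbi decoding over tag-transition and word-emission probabilities. A minimal sketch (with toy log-probability tables, not the IL tag set) is:

```python
import math

def viterbi(words, tags, trans, emit, start):
    """Most likely tag sequence under an HMM.

    trans[(t1, t2)], emit[(t, w)] and start[t] are log probabilities;
    missing entries are treated as a large negative floor."""
    FLOOR = -1e9
    V = [{t: start.get(t, FLOOR) + emit.get((t, words[0]), FLOOR) for t in tags}]
    back = []
    for w in words[1:]:
        col, ptr = {}, {}
        for t in tags:
            best_prev = max(tags, key=lambda p: V[-1][p] + trans.get((p, t), FLOOR))
            ptr[t] = best_prev
            col[t] = (V[-1][best_prev] + trans.get((best_prev, t), FLOOR)
                      + emit.get((t, w), FLOOR))
        V.append(col)
        back.append(ptr)
    # backtrace from the best final tag
    last = max(tags, key=lambda t: V[-1][t])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return list(reversed(path))
```

In a real tagger, the tables are estimated from an annotated corpus (with smoothing for unseen words).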
ON THE UTILITY OF A SYLLABLE-LIKE SEGMENTATION FOR LEARNING A TRANSLITERATION...
Source and target word segmentation and alignment is a primary step in the statistical learning of transliteration. Here, we analyze the benefit of a syllable-like segmentation approach for learning transliteration from English to an Indic language, which aligns the training-set word pairs in terms of sub-syllable-like units instead of individual character units. While this has been found useful for dealing with out-of-vocabulary words in English-Chinese transliteration in the presence of multiple target dialects, we asked whether this would hold for Indic languages, which are simpler in their phonetic representation and pronunciation. We expected the syllable-like method to perform marginally better, but we found instead that even though our proposed approach improved the Top-1 accuracy, the individual-character-unit alignment model somewhat outperformed our approach when the Top-10 results of the system were re-ranked using language modeling approaches. Our experiments were conducted for English-to-Telugu transliteration (the method applies equally well to most written Indic languages). For training, we applied syllable-like segmentation and alignment to a large training set, on which we built a statistical model by modifying a previous character-level maximum-entropy-based transliteration learning system due to Kumaran and Kellner; for testing, we applied the same segmentation to a test English word, applied the model, and re-ranked the resulting top 10 Telugu words. We also report the dataset creation and selection, since standard datasets are not available.
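A syllable-like segmentation of the kind described can be approximated by greedily grouping an onset of consonants with the following vowel run. The splitter below is a deliberately simplistic C*V+ heuristic, not the paper's segmenter:

```python
VOWELS = set("aeiou")

def syllable_units(word):
    """Greedy split into onset+vowel-run units.

    Trailing consonants attach to the last unit. This is a toy C*V+
    splitter, not a linguistically informed syllabifier."""
    units, cur, seen_vowel = [], "", False
    for ch in word.lower():
        if ch in VOWELS:
            cur += ch
            seen_vowel = True
        elif seen_vowel:
            # a consonant after a vowel run starts a new unit
            units.append(cur)
            cur, seen_vowel = ch, False
        else:
            cur += ch
    if cur:
        if seen_vowel or not units:
            units.append(cur)
        else:
            units[-1] += cur         # trailing consonants join the last unit
    return units
```

Word pairs would then be aligned unit-by-unit (e.g. `te-lu-gu` against the corresponding Telugu aksharas) instead of character-by-character.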
A Proposed Web Accessibility Framework for the Arab Disabled
The Web is providing unprecedented access to information and interaction for people with disabilities. This paper presents a Web accessibility framework which eases Web access for disabled Arab users and facilitates their lifelong learning as well. The proposed framework provides disabled Arab users with an easy means of access in their mother language, so they do not have to overcome the barrier of learning the target spoken language. The framework is based on analyzing the web page's meta-language, extracting its content and reformulating it in a format suitable for disabled users. The basic objective of this framework is to support the equal right of disabled Arab people to access education and training alongside non-disabled people. Key Words: Arabic Moon code, Arabic Sign Language, Deaf, Deaf-blind, E-learning Interactivity, Moon code, Web accessibility, Web framework, Web System, WWW.
Human Arm Inverse Kinematic Solution Based Geometric Relations and Optimizati...
Kinematics for robotic systems with many degrees of freedom (DOF) and high redundancy is still an open issue: computation time in robotic applications is often too high to reach a good solution. For parts of the kinematic chain, the inverse kinematics problem is not linear, as rotations are involved, which means that analytical solutions are only available in limited situations. In all other cases, alternative methods have to be employed; the most-used alternative is numerical optimization. This paper presents a strategy that combines an analytical solution with a nonlinear optimization algorithm to solve the inverse kinematics problem (IKP). The analytical solution is used to reduce the problem from seven joint-angle variables to a single variable, and the nonlinear optimization algorithm is then used to find an approximate solution, which keeps the computation time very small.
USEFul: A Framework to Mainstream Web Site Usability through Automated Evalua...
A paradox has been observed whereby web site usability is proven to be an essential element of a web site, yet at the same time there exists an abundance of web pages with poor usability. This discrepancy is the result of limitations that currently prevent web developers in the commercial sector from producing usable web sites. In this paper we propose a framework whose objective is to alleviate this problem by automating certain aspects of the usability evaluation process. Mainstreaming comes as a result of automation, enabling a non-expert in the field of usability to conduct the evaluation, which reduces the costs associated with such evaluation. Additionally, the framework allows the flexibility of adding, modifying or deleting guidelines without altering the code that references them, since the guidelines and the code are two separate components. A comparison of evaluation results obtained using the framework against published evaluations of web sites carried out by web site usability professionals reveals that the framework is able to automatically identify the majority of usability violations. Due to the consistency with which it evaluates, it identified additional guideline-related violations that were not identified by the human evaluators.
Development of Sign Signal Translation System Based on Altera’s FPGA DE2 Board
The main aim of this paper is to build a system capable of detecting and recognizing hand gestures in an image captured by a camera. The system is built on Altera’s FPGA DE2 board, which contains a Nios II soft-core processor. Image processing techniques and a simple but effective algorithm are implemented to achieve this purpose: the image processing techniques are used to smooth the image in order to ease the subsequent steps of translating the hand sign signal, and the algorithm translates the numerical hand sign signal, with the results displayed on the seven-segment display. Altera’s Quartus II, SOPC Builder and Nios II EDS software are used to construct the system. Using SOPC Builder, the related components on the DE2 board can be interconnected easily and in an orderly fashion, compared to the traditional method, which requires lengthy source code and is time consuming. Quartus II is used to compile and download the design to the DE2 board. Then, under Nios II EDS, the hand sign translation algorithm is coded in the C programming language. Being able to recognize hand sign signals from images can help humans control a robot and other applications which require only a simple set of instructions, provided a CMOS sensor is included in the system.
Our research aims to propose a global approach for the specification, design and verification of context-aware Human-Computer Interfaces (HCI). It is a Model Based Design (MBD) approach. The methodology describes the ubiquitous environment using ontologies, with OWL as the standard used for this purpose. The specification and modeling of human-computer interaction are based on Petri nets (PN). This raises the question of representing Petri nets in XML; for this purpose, we use the PNML modeling standard. In this paper, we propose an extension of this standard for the specification, generation and verification of HCIs. This extension is a methodological approach for constructing PNML from Petri nets. The design principle uses the composition of elementary Petri net structures as modular PNML. The objective is to obtain a valid interface through verification of the properties of the elementary Petri nets represented in PNML.
The Use of Java Swing’s Components to Develop a Widget
A widget is a kind of application that provides a single service such as a map, news feed, simple clock, battery-life indicator, etc. This kind of interactive software object has been developed to facilitate user interface (UI) design. A user interface function may be implemented using different widgets with the same function. In this article, we present the widget as a platform that is generally used in various applications, such as the desktop, web browsers, and mobile phones. We also describe a visual menu of Java Swing’s components that will be used to build a widget. We assume the reader has successfully compiled and run a program that uses Swing components.
Toward a More Robust Usability concept with Perceived Enjoyment in the contex...
Mobile multimedia services are relatively new but have quickly come to dominate people's lives, especially among young people. To explain this popularity, this study applies and modifies the Technology Acceptance Model (TAM) to propose a research model and conduct an empirical study. The goal of the study is to examine the role of Perceived Enjoyment (PE) and what determinants contribute to PE in the context of using mobile multimedia services. The results indicate that PE influences Perceived Usefulness (PU) and Perceived Ease of Use (PEOU), and directly influences Behavior Intention (BI). Aesthetics and flow are key determinants explaining Perceived Enjoyment in mobile multimedia usage.
Exploring the Relationship Between Mobile Phone and Senior Citizens: A Malays...
There is a growing ageing phenomenon with the rise of the ageing population throughout the world. According to the World Health Organization (2002), a growth of 694 million, or 223%, is expected for people aged 60 and over between 1970 and 2025. The growth is especially significant in some advanced countries and regions such as North America, Japan, Italy, Germany, the United Kingdom and so forth. This growing older adult population has significantly impacted the social culture, lifestyle, healthcare system, economy, infrastructure and government policy of a nation. However, there is limited research on the perception and usage of mobile phones and their services among senior citizens in a developing nation like Malaysia. This paper explores the relationship between mobile phones and senior citizens in Malaysia from the perspective of a developing country. We conducted an exploratory study using contextual interviews with 5 senior citizens on how they perceive their mobile phones. This paper reveals 4 interesting themes from this preliminary study, in addition to findings on the desirable mobile requirements of local senior citizens with respect to health, safety and communication purposes. The findings of this study bring interesting insights to local telecommunication industries as a whole, and will also serve as groundwork for a more in-depth study in the future.
Virtual teams are used more and more by companies and other organizations to gain benefits. They are a great way to enable teamwork in situations where people are not in the same physical place at the same time. As companies seek to increase the use of virtual teams, a need exists to explore the context of these teams, the virtuality of a team, and software that may help these teams work virtually. Virtual teams follow the same basic principles as traditional teams, with one big difference: the way the team members communicate. Instead of relying on the dynamics of in-office, face-to-face exchange, they rely on special communication channels enabled by modern technologies, such as e-mail, fax, phone calls, teleconferences, virtual meetings, etc. This is why this paper focuses on the issues surrounding virtual teams, and how these teams are created and are progressing in Albania.
3D Human Hand Posture Reconstruction Using a Single 2D Image
Passive sensing of the 3D geometric posture of the human hand has been studied extensively over the past decade. However, these research efforts have been hampered by the computational complexity of inverse kinematics and 3D reconstruction. In this paper, we focus on 3D hand posture estimation from a single 2D image, with robotic applications in mind. We introduce a human hand model with 27 degrees of freedom (DOFs) and analyze some of its constraints to reduce the DOFs without any significant degradation of performance. A novel algorithm to estimate the 3D hand posture from eight 2D projected feature points is proposed. Experimental results using real images confirm that our algorithm gives good estimates of the 3D hand pose. Keywords: 3D hand posture estimation; model-based approach; gesture recognition; human-computer interface; machine vision.
An overview on Advanced Research Works on Brain-Computer Interface
A brain–computer interface (BCI) is a prominent outcome of research in human-computer synergy, in which direct communication between the brain and an external device augments, assists or repairs human cognitive functions. Advanced work, such as developing brain-computer interface switch technologies for intermittent (or asynchronous) control in natural environments, building brain-computer interfaces with fuzzy logic systems, or applying wavelet theory to improve their efficacy, is still ongoing, and some useful results have already been found. The need for such brain-machine interfaces is also growing day by day, e.g. for neuropsychological rehabilitation, emotion control, etc. This paper gives an overview of the control theory and some advanced work in the field of brain-machine interfaces.
Camera as Mouse and Keyboard for Handicap Person with Troubleshooting Ability...
Camera mice have been widely used by handicapped people to interact with computers. Most importantly, a camera mouse must be able to replace all roles of a typical mouse and keyboard: it must provide all mouse click events and keyboard functions (including all shortcut keys), allow users to troubleshoot by themselves, and eliminate neck fatigue when used over long periods. In this paper, we propose a camera mouse system with a timer as the left-click event and blinking as the right-click event. We also modify the original on-screen keyboard layout by adding two buttons (a “drag/drop” button for drag-and-drop mouse events, and a button to call the task manager for troubleshooting) and by changing the behavior of the CTRL, ALT, SHIFT and CAPS LOCK keys in order to provide keyboard shortcuts. We further develop a recovery method which allows users to move away from the camera and come back again, eliminating neck fatigue. Experiments involving several users were conducted in our laboratory. The results show that our camera mouse allows users to type, perform left- and right-click events, drag and drop, and troubleshoot without using their hands. With this system, handicapped people can use a computer more comfortably and reduce eye dryness.
Principles of Good Screen Design in Websites
Visual techniques for the proper arrangement of elements on the user screen have helped designers make the screen look good and attractive. Several visual techniques emphasize the arrangement and ordering of the screen elements based on particular criteria for the best appearance of the screen. This paper investigates a few significant visual techniques in various web user interfaces and showcases the results for a better understanding of their presence.
Robot Arm Utilized Having Meal Support System Based on Computer Input by Huma...
A robot arm based having-meal support system controlled by computer input using human eyes only is proposed. The proposed system is developed for handicapped and disabled persons as well as elderly persons, and was tested with able-bodied persons with several shapes and sizes of eyes under a variety of illumination conditions. The test results show that the proposed system works well for selecting the desired foods and for retrieving the foods according to users' requirements. The proposed system is found to be 21% faster than the manually controlled robot.
Collaborative Learning of Organisational Knowledge
This paper presents recent research into methods used in Australian Indigenous Knowledge sharing and looks at how these can support the creation of suitable collaborative environments for timely organisational learning. The protocols and practices as used today and in the past by Indigenous communities are presented and discussed in relation to their relevance to a personalised system of knowledge sharing in modern organisational cultures. This research focuses on user models, knowledge acquisition and integration of data for constructivist learning in a networked repository of organisational knowledge. The data collected in the repository is searched to provide collections of up-to-date and relevant material for training in a work environment. The aim is to improve knowledge collection and sharing in a team environment. This knowledge can then be collated into a story or workflow that represents the present knowledge in the organisation.
Cognitive Approach Towards the Maintenance of Web-Sites Through Quality Evalu...
It is a well established fact that web applications require frequent maintenance because of cutting-edge business competition. The authors have worked on quality evaluation of web sites in the Indian e-commerce domain, and as a result of that work have produced a quality-wise ranking of these sites. According to their work, and also surveys done by various other groups, the Futurebazaar web site is considered one of the best Indian e-shopping sites. In this paper the authors assess the maintenance of that same site, incorporating the problems incurred during the evaluation. This exercise presents a real-world web site maintainability problem, and gives a clear picture of all the quality metrics which are directly or indirectly related to the maintainability of the web site.
Computer Input with Human Eyes-Only Using Two Purkinje Images Which Works in ...
A method for computer input with human eyes only, using two Purkinje images, which works in real time without calibration, is proposed. Experimental results show that cornea curvature can be estimated using Purkinje images derived from two light sources, so that no calibration is needed to reduce person-to-person differences in cornea curvature. It is found that the proposed system allows users' movements of 30 degrees in the roll direction and 15 degrees in the pitch direction, utilizing the detected face attitude derived from the face plane consisting of three feature points on the face: the two eyes and the nose or mouth. It is also found that the proposed system works in real time.
Inverse Kinematics Analysis for Manipulator Robot with Wrist Offset Based On ...
This paper presents an algorithm to solve the inverse kinematics for a six degree of freedom (6 DOF) manipulator robot with wrist offset. This type of robot has complex inverse kinematics, which needs a long time to calculate. The proposed algorithm starts by finding the wrist point through vector computation, then computes the first three joint angles, and finally computes the wrist angles analytically. The algorithm is tested on the TQ MA2000 manipulator robot as a case study. The obtained results were compared with those of the rotational vector algorithm: both algorithms have the same accuracy, but the proposed algorithm saves about 99.6% of the computation time required by the rotational vector algorithm, which makes it suitable for real-time robot control.
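The first step, locating the wrist point by vector computation, amounts to subtracting the tool offset along the end-effector's approach axis; the base joint angle then follows from the wrist point's projection onto the x-y plane. A sketch of this generic 6-DOF geometry (not the TQ MA2000-specific derivation) is:

```python
import math

def wrist_center(p_ee, z_axis, d6):
    """Wrist point: end-effector position minus the tool offset d6
    along the (unit) approach axis of the end-effector frame."""
    return [p - d6 * z for p, z in zip(p_ee, z_axis)]

def base_joint_angle(p_wrist):
    """First joint angle from the wrist point's projection onto the x-y plane."""
    return math.atan2(p_wrist[1], p_wrist[0])
```

Once the wrist center is known, the remaining position joints reduce to planar geometry, and the wrist orientation joints follow analytically, as the abstract describes.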
Real Time Blinking Detection Based on Gabor Filter
A new method of blinking detection is proposed. Most importantly, a blinking detection method must be robust against different users, noise, and changes in eye shape. In this paper, we propose a blinking detection method that measures the distance between the two arcs of the eye (the upper and lower parts). We detect the eye arcs by applying a Gabor filter to the eye image; the Gabor filter is advantageous in image processing since it extracts spatially localized spectral features, so lines, arcs, and other shapes are more easily detected. After the two eye arcs are detected, we measure the distance between them using a connected-component labeling method. The eye is marked as open when the distance between the two arcs is greater than a threshold, and as closed otherwise. The experimental results show that our proposed method is robust against different users, noise, and eye shape changes, with perfect accuracy.
This paper presents an unsupervised approach to the development of a stemmer, for the case of the Urdu and Marathi languages. Especially during the last few years, a wide range of information in Indian regional languages has been made available on the web in the form of e-data. But access to these data repositories is very low, because efficient search engines/retrieval systems supporting these languages are very limited. Hence automatic information processing and retrieval has become an urgent requirement. To train the system, training datasets taken from CRULP [22] and a Marathi corpus [23] are used. For generating suffix rules, two different approaches, namely frequency-based stripping and length-based stripping, have been proposed. The evaluation has been made on 1200 words extracted from the Emille corpus. The experimental results show that for Urdu the frequency-based suffix generation approach gives a maximum accuracy of 85.36%, whereas the length-based suffix stripping algorithm gives a maximum accuracy of 79.76%. For Marathi, the system gives 63.5% accuracy with frequency-based stripping and achieves a maximum accuracy of 82.5% with the length-based suffix stripping algorithm.
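The frequency-based suffix generation idea can be sketched as follows: count the endings of all vocabulary words, keep frequent endings as candidate stripping rules, and stem by removing the longest matching rule. This is a simplified illustration of the approach, not the paper's exact algorithm:

```python
from collections import Counter

def candidate_suffixes(words, max_len=4, min_freq=2):
    """Count word endings; endings frequent across the vocabulary
    become candidate suffix-stripping rules."""
    counts = Counter()
    for w in words:
        # consider endings of length 1..max_len, leaving at least 1 char of stem
        for k in range(1, min(max_len, len(w) - 1) + 1):
            counts[w[-k:]] += 1
    return {s for s, c in counts.items() if c >= min_freq}

def stem(word, suffixes):
    """Strip the longest matching suffix, keeping a stem of >= 2 characters."""
    for s in sorted(suffixes, key=len, reverse=True):
        if word.endswith(s) and len(word) - len(s) >= 2:
            return word[:-len(s)]
    return word
```

Length-based stripping differs mainly in how the candidate list is ranked; the stripping step itself is the same.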
A New Approach: Automatically Identify Naming Word from Bengali Sentence for ...
Hundreds of millions of people of almost all levels of education and attitudes from different countries communicate with each other for different purposes using various languages. Machine translation is in high demand due to the increasing use of web-based communication. One of the major problems of Bengali translation is identifying a naming word in a sentence; this is relatively simple in English, because such entities start with a capital letter. In Bangla there is no concept of small or capital letters, and there is a huge number of different naming entities. Thus it is difficult to determine whether a word is a naming word or not. Here we introduce a new approach to identify naming words in a Bengali sentence for a machine translation system without storing a huge number of naming entities in the word dictionary. The goal is to make Bangla sentence conversion possible with minimal words stored in the dictionary.
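The dictionary-lookup idea can be sketched as follows: a token is a candidate naming word when neither it nor a de-inflected form of it appears in the base word dictionary. The suffix list below is an illustrative placeholder, not real Bengali morphology, and the lexicon here uses romanized toy words:

```python
def naming_candidates(tokens, lexicon, suffixes=("ra", "ke", "r")):
    """Flag tokens as candidate naming words when neither the token nor the
    token with a common inflectional suffix stripped is in the base lexicon.
    The suffix list is a hypothetical placeholder for illustration only."""
    def in_lexicon(tok):
        if tok in lexicon:
            return True
        # try stripping a known inflectional suffix before giving up
        return any(tok.endswith(s) and tok[:-len(s)] in lexicon for s in suffixes)
    return [t for t in tokens if not in_lexicon(t)]
```

The attraction of this scheme is exactly what the abstract claims: only common words need to be stored, and names fall out as the complement.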
Hidden Markov Model Based Part of Speech Tagger for Sinhala Language
In this paper we present fundamental lexical semantics of the Sinhala language and a Hidden Markov Model (HMM) based Part of Speech (POS) tagger for Sinhala. In any natural language processing task, part of speech is a vital topic, involving analysis of the construction, behaviour and dynamics of the language, knowledge that can be utilized in computational linguistics analysis and automation applications. Since Sinhala is a morphologically rich and agglutinative language, in which words are inflected with various grammatical features, tagging is essential for further analysis of the language. Our research takes a statistical approach, in which tagging is done by computing the tag-sequence probability and the word-likelihood probability from the given corpus, with the linguistic knowledge automatically extracted from the annotated corpus. The current tagger reaches more than 90% accuracy for known words.
Speech is an important mode of communication and a current research topic, with attention mostly focused on synthesis and analysis. As part of this, a text to speech system is developed. Speech synthesis is the artificial production of human speech; a text to speech (TTS) system converts an arbitrary text into speech. In India many languages are spoken, each being the mother tongue of tens of millions of people. In this paper, a text to speech system is developed primarily for Telugu, a Dravidian language predominantly spoken in the Indian state of Andhra Pradesh. The important qualities expected from this system are naturalness and intelligibility. A Telugu TTS can be developed using synthesis methods such as articulatory synthesis, formant synthesis and concatenative synthesis. This paper describes the development of a Telugu text to speech system using the concatenative synthesis method on the mobile platform OMAP 3530 (ARM Cortex A-8 core) under Linux.
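At its core, concatenative synthesis joins pre-recorded unit waveforms in text order. A toy sketch with an in-memory unit database (real systems store recorded audio samples per syllable-like unit and smooth the joins):

```python
def concatenate_units(text_units, unit_db):
    """Look up each syllable-like unit in a recorded-unit database and
    concatenate the sample arrays; unseen units fall back to silence."""
    SILENCE = [0] * 4                  # placeholder gap for missing units
    samples = []
    for u in text_units:
        samples.extend(unit_db.get(u, SILENCE))
    return samples
```

For a phonetic language like Telugu, the unit inventory maps closely onto the script's aksharas, which is one reason concatenative synthesis suits it well.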
CONSTRUCTION OF ENGLISH-BODO PARALLEL TEXT CORPUS FOR STATISTICAL MACHINE TRA...
A corpus is a large collection of homogeneous and authentic written texts (or speech) of a particular natural language which exists in machine-readable form. The scope of corpora is endless in Computational Linguistics and Natural Language Processing (NLP). A parallel corpus is a very useful resource for most NLP applications, especially Statistical Machine Translation (SMT). SMT is the most popular approach to Machine Translation (MT) nowadays, and it can produce high-quality translation results given huge amounts of aligned parallel text corpora in both the source and target languages. Although Bodo is a recognized natural language of India and a co-official language of Assam, machine-readable information for Bodo is still very scarce. Therefore, to expand the computerized information of the language, an English to Bodo SMT system has been developed. This paper mainly focuses on building English-Bodo parallel text corpora to implement the English to Bodo SMT system using a phrase-based SMT approach. We have designed an E-BPTC (English-Bodo Parallel Text Corpus) creator tool and have constructed English-Bodo parallel text corpora for the General and Newspaper domains. Finally, the quality of the constructed parallel text corpora has been tested using two evaluation techniques in the SMT system.
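A common cleanup step when constructing such parallel corpora is filtering out sentence pairs with implausible length ratios. The sketch below illustrates that step; it is a generic corpus-cleaning heuristic, not necessarily part of the E-BPTC tool:

```python
def filter_pairs(pairs, low=0.5, high=2.0):
    """Keep only (source, target) sentence pairs whose token-length ratio
    is plausible; wildly mismatched lengths usually signal misalignment."""
    kept = []
    for src, tgt in pairs:
        ls, lt = len(src.split()), len(tgt.split())
        if ls and lt and low <= ls / lt <= high:
            kept.append((src, tgt))
    return kept
```

The ratio bounds are language-pair dependent and would need tuning for English-Bodo.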
CONSTRUCTION OF ENGLISH-BODO PARALLEL TEXT CORPUS FOR STATISTICAL MACHINE TRA...ijnlc
A corpus is a large collection of homogeneous and authentic written texts (or speech) of a particular natural language that exists in machine-readable form. The scope of corpora is vast in Computational Linguistics and Natural Language Processing (NLP). A parallel corpus is a very useful resource for most NLP applications, especially for Statistical Machine Translation (SMT). SMT is currently the most popular approach to Machine Translation (MT) and can produce high-quality translations given a large amount of aligned parallel text in both the source and target languages. Although Bodo is a recognized natural language of India and a co-official language of Assam, very little machine-readable Bodo material is available. Therefore, to expand the computerized resources of the language, an English-to-Bodo SMT system has been developed. This paper mainly focuses on building English-Bodo parallel text corpora to implement the English-to-Bodo SMT system using the phrase-based SMT approach. We have designed an E-BPTC (English-Bodo Parallel Text Corpus) creator tool and have constructed English-Bodo parallel text corpora for the General and Newspaper domains. Finally, the quality of the constructed parallel text corpora has been tested in the SMT system using two evaluation techniques.
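As an illustration of the kind of resource such a system is trained on, the following sketch shows one common way to store a sentence-aligned parallel corpus: two line-aligned plain-text files, the convention used by phrase-based SMT toolkits such as Moses. The file names and placeholder sentences are hypothetical, not taken from the E-BPTC tool.

```python
# Minimal sketch: a sentence-aligned parallel corpus stored as two
# line-aligned files, where line i of each file is a translation pair.
pairs = [
    ("The weather is pleasant today.", "<Bodo translation 1>"),
    ("Where is the market?", "<Bodo translation 2>"),
]

with open("corpus.en", "w", encoding="utf-8") as f_en, \
     open("corpus.bd", "w", encoding="utf-8") as f_bd:
    for en, bd in pairs:
        f_en.write(en + "\n")
        f_bd.write(bd + "\n")

# Basic sanity check: both sides must have the same number of lines,
# otherwise SMT training would mis-pair sentences.
with open("corpus.en", encoding="utf-8") as f_en, \
     open("corpus.bd", encoding="utf-8") as f_bd:
    en_lines = f_en.read().splitlines()
    bd_lines = f_bd.read().splitlines()
assert len(en_lines) == len(bd_lines)
```

The line-alignment check is the minimum quality gate before word alignment and phrase extraction can run over the corpus.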
A SURVEY OF GRAMMAR CHECKERS FOR NATURAL LANGUAGEScsandit
ABSTRACT
Natural Language Processing is an interdisciplinary branch of linguistics and computer science, studied under Artificial Intelligence (AI), that gave birth to an allied area called ‘Computational Linguistics’, which focuses on the processing of natural languages on computational devices. A natural language consists of a large number of sentences, which are linguistic units of one or more words linked together in accordance with a set of predefined rules called a grammar. Grammar checking is the task of validating sentences syntactically and is a prominent tool within language engineering. Our review draws on the recent development of various grammar checkers to look at the past, present and future in a new light. It covers grammar checkers for many languages with the aim of examining their approaches and methodologies for developing new tools and systems. The survey concludes with a discussion of the various features included in existing grammar checkers for foreign languages as well as a few Indian languages.
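To make the idea of rule-based grammar checking concrete, here is a minimal sketch of a checker driven by a list of pattern rules. The single article-agreement rule is a toy example of my own, not taken from any of the surveyed systems.

```python
import re

# Each rule is a (pattern, message) pair; a real checker would have
# hundreds of such rules, or a parser-based error model.
rules = [
    (re.compile(r"\ba ([aeiouAEIOU]\w*)"), 'use "an" before a vowel: a {0}'),
]

def check(sentence):
    """Return a list of diagnostic messages for the sentence."""
    issues = []
    for pattern, msg in rules:
        for m in pattern.finditer(sentence):
            issues.append(msg.format(m.group(1)))
    return issues

print(check("She ate a apple."))  # flags the article error
print(check("She ate an apple."))  # no issues
```

Pattern-rule checkers are easy to extend but brittle; the surveyed systems typically combine such rules with part-of-speech tagging or full parsing.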
International Journal on Natural Language Computing (IJNLC) Vol. 4, No.2,Apri...ijnlc
Building dialogue systems for human-computer interaction has recently gained considerable attention, but most of the resources and systems built so far are tailored to English and other Indo-European languages. The need to design systems for other languages, such as Arabic, is increasing. For this reason there is growing interest in the Arabic dialogue act classification task, because it is a key component of Arabic language understanding and hence of building such systems. This paper surveys different techniques for dialogue act classification in Arabic. We describe the main existing techniques for utterance segmentation and classification, the annotation schemas, and the test corpora for Arabic dialogue understanding that have been introduced in the literature.
A COMPREHENSIVE ANALYSIS OF STEMMERS AVAILABLE FOR INDIC LANGUAGES ijnlc
Stemming is the process of term conflation: it conflates all the variants of a word to a common form called a stem. It plays a significant role in numerous Natural Language Processing (NLP) applications such as morphological analysis, parsing, document summarization, text classification, part-of-speech tagging, question-answering systems, machine translation, word sense disambiguation and information retrieval (IR). Each of these tasks requires some pre-processing, and stemming is one of the important building blocks for all of them. This paper presents an overview of various stemming techniques, evaluation criteria for stemmers, and the existing stemmers for Indic languages.
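As a concrete illustration of the simplest stemmer family covered by such surveys, the sketch below implements longest-match suffix stripping. The suffix list is a toy English one chosen purely for illustration; Indic-language stemmers use language-specific suffix tables or statistical suffix discovery.

```python
# Toy longest-match suffix-stripping stemmer. A minimum stem length
# guards against over-stripping short words.
SUFFIXES = ["ational", "ation", "ing", "ed", "es", "s"]

def stem(word):
    # Try longer suffixes first so "ation" wins over "s".
    for suf in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(suf) and len(word) - len(suf) >= 3:
            return word[: -len(suf)]
    return word

print(stem("playing"))  # → play
print(stem("cats"))     # → cat
print(stem("sky"))      # → sky (no suffix matches)
```

Suffix strippers conflate variants cheaply but can under- or over-stem, which is exactly the trade-off the evaluation criteria for stemmers measure.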
Implementation of Marathi Language Speech Databases for Large Dictionaryiosrjce
IOSR journal of VLSI and Signal Processing (IOSRJVSP) is a double blind peer reviewed International Journal that publishes articles which contribute new results in all areas of VLSI Design & Signal Processing. The goal of this journal is to bring together researchers and practitioners from academia and industry to focus on advanced VLSI Design & Signal Processing concepts and establishing new collaborations in these areas.
Design and realization of microelectronic systems using VLSI/ULSI technologies require close collaboration among scientists and engineers in the fields of systems architecture, logic and circuit design, chips and wafer fabrication, packaging, testing and systems applications. Generation of specifications, design and verification must be performed at all abstraction levels, including the system, register-transfer, logic, circuit, transistor and process levels
Investigations of the Distributions of Phonemic Durations in Hindi and Dogrikevig
Speech generation is one of the most important areas of research in speech signal processing and is now gaining serious attention. Speech is a natural form of communication in living beings. Computers with the ability to understand speech and speak with a human-like voice are expected to contribute to the development of more natural man-machine interfaces. However, in order to provide functions even closer to those of human beings, we must learn more about the mechanisms by which speech is produced and perceived, and develop speech information processing technologies that can generate more natural-sounding systems. This field of study, also called speech synthesis and more widely known as text-to-speech synthesis, originated in the mid-eighties with the emergence of DSP and the rapid advancement of VLSI techniques. To understand this field, it is necessary to understand the basic theory of speech production. Every language has a different phonetic alphabet and a different set of possible phonemes and their combinations. For the analysis of the speech signal, we recorded five speakers of Dogri (3 male and 5 female) and eight speakers of Hindi (4 male and 4 female). For estimating the durational distributions, the mean of the means of ten instances of vowels of each speaker in both languages was calculated. The investigations show that the two durational distributions differ significantly with respect to mean and standard deviation, and that phoneme duration is speaker dependent. The investigation can be concluded with the result that almost all Dogri phonemes have shorter durations than their Hindi counterparts: the duration in milliseconds of the same phonemes was longer when uttered in Hindi than when spoken by a person with Dogri as his mother tongue.
There are many applications directly or indirectly related to this research. For instance, the main application may be transforming Dogri speech into Hindi and vice versa; building on this, a speech aid could be developed to teach Dogri to children. The results may also be useful for synthesizing the phonemes of Dogri using the parameters of the phonemes of Hindi, and for building large-vocabulary speech recognition systems.
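The durational comparison described above can be sketched as follows. The millisecond values are invented for illustration; only the procedure (per-language mean and standard deviation of vowel durations) mirrors the paper's analysis.

```python
import statistics

# Hypothetical durations (ms) of one vowel across ten instances per language.
hindi_ms = [92, 95, 90, 97, 94, 91, 96, 93, 95, 92]
dogri_ms = [78, 80, 76, 82, 79, 77, 81, 78, 80, 79]

# Summarize each distribution by its mean and standard deviation.
for name, data in [("Hindi", hindi_ms), ("Dogri", dogri_ms)]:
    print(name, round(statistics.mean(data), 1), round(statistics.stdev(data), 1))

# The paper's qualitative finding: Dogri durations are shorter on average.
assert statistics.mean(dogri_ms) < statistics.mean(hindi_ms)
```

A full study would repeat this per phoneme and per speaker and then test the significance of the difference between the two distributions.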
ADVANCEMENTS ON NLP APPLICATIONS FOR MANIPURI LANGUAGEijnlc
Manipuri is both a minority and a morphologically rich language with genetic features similar to the Tibeto-Burman languages. It has Subject-Object-Verb (SOV) order, agglutinative verb morphology, and is monosyllabic; morphology and syntax are not clearly distinguished in this language. Natural Language Processing (NLP) is a useful research field of computer science that deals with the processing of large amounts of natural language corpora. NLP applications encompass e-dictionaries, morphological analyzers, Reduplicated Multi-Word Expression (RMWE) identification, Named Entity Recognition (NER), Part-of-Speech (POS) tagging, Machine Translation (MT), WordNet, Word Sense Disambiguation (WSD), etc. In this paper, we present a study of the advancements in NLP applications for the Manipuri language, along with a comparison table of the approaches and techniques adopted and the results obtained for each application, followed by a detailed discussion of each work.
Interface on Usability Testing Indonesia Official Tourism WebsiteWaqas Tariq
The Ministry of Tourism and Creative Economy of the Republic of Indonesia must meet the varied needs of a wide audience and should reach people from all levels of society around the world to provide Indonesian tourism and travel information. This article gives details of the evolution of one important component of the Indonesia Official Tourism Website as it has grown in functionality and usefulness over several years of use by a live, unrestricted community. We chose this website to examine its interface design and usability and to popularize Indonesian tourism and travel highlights. The analysis was done by looking at the criteria specified for usability testing. The usability testing measures are ease of use (effectiveness, efficiency, consistency and interface design), ease of learning, errors and syntax, which relate to human-computer interaction. The purpose of this article is to test the usability level of the website, analyze its interface design, and provide suggestions for improvements to the Indonesia Official Tourism Website based on the analysis we have done.
Monitoring and Visualisation Approach for Collaboration Production Line Envir...Waqas Tariq
In this paper, a tool, called SPMonitor, to monitor and visualize of run-time execution productive processes is proposed. SPMonitor enables dynamically visualizing and monitoring workflows running in a system. It displays versatile information about currently executed workflows providing better understanding about processes and the general functionality of the domain. Moreover, SPMonitor enhances cooperation between different stakeholders by offering extensive communication and problem solving features that allow actors concerned to react more efficiently to different anomalies that may occur during a workflow execution. The ideas discussed are validated through the study of real case related to airbus assembly lines.
Hand Segmentation Techniques to Hand Gesture Recognition for Natural Human Co...Waqas Tariq
This work is part of a vision-based hand gesture recognition system for a natural human-computer interface. Hand tracking and segmentation are the primary steps for any hand gesture recognition system. The aim of this paper is to develop a robust and efficient hand segmentation algorithm; three algorithms for hand segmentation using different color spaces, with the required morphological processing, were utilized. The hand tracking and segmentation algorithm (HTS) was found to be the most efficient at handling the challenges of a vision-based system, such as skin color detection, complex background removal and variable lighting conditions. Noise may sometimes remain in the segmented image due to a dynamic background, so an edge traversal algorithm was developed and applied to the segmented hand contour to remove unwanted background noise.
Vision Based Gesture Recognition Using Neural Networks Approaches: A ReviewWaqas Tariq
The aim of gesture recognition research is to create systems that can easily identify gestures and use them for device control or to convey information. In this paper we discuss research done in the area of hand gesture recognition based on Artificial Neural Network approaches. Several hand gesture recognition studies that use neural networks are discussed, comparisons between these methods are presented, the advantages and drawbacks of the discussed methods are included, and the implementation tools for each method are presented as well.
Evaluation of Web Search Engines Based on Ranking of Results and FeaturesWaqas Tariq
Search engines help the user to surf the web. Due to the vast number of web pages, it is practically impossible for users to retrieve the appropriate web pages on their own. Thus, web search ranking algorithms play an important role in ranking web pages so that the user can retrieve the page most relevant to the query. This paper presents a study of the applicability of two user-effort-sensitive evaluation measures on five web search engines (Google, Ask, Yahoo, AOL and Bing). Twenty queries were collected from the lists of most-hit queries of the last year from various search engines, and the search engines were evaluated on that basis.
Generating a Domain Specific Inspection Evaluation Method through an Adaptive...Waqas Tariq
The growth of the Internet and related technologies has enabled the development of a new breed of dynamic websites and applications that are growing rapidly in use and have had a great impact on many businesses. These websites need to be continuously evaluated and monitored to measure their efficiency and effectiveness, to assess user satisfaction, and ultimately to improve their quality. Nearly all studies have used the Heuristic Evaluation (HE) and User Testing (UT) methodologies, which have become the accepted methods for the usability evaluation of User Interface Design (UID); however, the former is general and unlikely to encompass all usability attributes for all website domains, while the latter is expensive, time-consuming and misses consistency problems. To address this need, a new evaluation method is developed that uses the traditional evaluations (HE and UT) in novel ways.
The lack of a methodological framework that can be used to generate a domain-specific evaluation method, which can then be used to improve the usability assessment process for a product in any chosen domain, represents a missing area in usability testing. This paper proposes an adaptive framework and evaluates it by generating an evaluation method for assessing and improving the usability of a product, called Domain Specific Inspection (DSI), and then analysing it empirically by applying it in the educational domain. Our experiments show that the adaptive framework is able to build a formative and summative evaluation method that provides optimal results with regard to the identification of comprehensive usability problem areas and relevant usability evaluation method (UEM) metrics, with minimum input in terms of the cost and time usually spent on employing UEMs.
A Case Study on Non-Visual Pen-Based Interaction with the Numerical DataWaqas Tariq
A widespread tabular format still poses great problems for screen readers because of diversity and complexity of the cells’ content. How to access the numerical data presented in tabular form in a quick and intuitive way in the absence of visual feedback? We have implemented and assessed the algorithm supporting an exploration of the tabular data in the absence of visual feedback. This algorithm helps to solve the most commonly encountered problems: retrieving the position of extreme values and the target value that can also be linked to the specific content of the virtual table. The performance of 11 blindfolded subjects was evaluated when they used the StickGrip kinesthetic display and when they relied on the Wacom pen and auditory signals. The results of the comparative study are reported.
AudiNect: An Aid for the Autonomous Navigation of Visually Impaired People, B...Waqas Tariq
In this paper, the realization of a new kind of autonomous navigation aid is presented. The prototype, called AudiNect, is mainly developed as an aid for visually impaired people, though a larger range of applications is also possible. The AudiNect prototype is based on the Kinect device for Xbox 360. On the basis of the Kinect output data, proper acoustic feedback is generated, so that useful depth information from 3D frontal scene can be easily developed and acquired. To this purpose, a number of basic problems have been analyzed, in relation to visually impaired people orientation and movement, through both actual experimentations and a careful literature research in the field. Quite satisfactory results have been reached and discussed, on the basis of proper tests on blindfolded sighted individuals.
User Centered Design Patterns and Related Issues – A ReviewWaqas Tariq
A design pattern describes possible good solutions to common problems within certain context. This is done by describing the invariant qualities of all those solutions where good patterns improve with time and widespread use. In this research paper some existing user centered design patterns and their issues are discussed. We have studied many user centered design patterns; however most of them do not provide diagrammatic solutions which can be implementable. It is observed that there is a need of a design pattern which can address issues specifically related to Open Source Software (OSS) users.
Identifying the Factors Affecting Users’ Adoption of Social NetworkingWaqas Tariq
Through the rapid expansion of information and communication technologies, social networking sites have received much attention in the scope of internet communication. The success of a social networking site primarily depends on users' satisfaction. In this context, this study aims to identify the factors that influence users' satisfaction with social networking site use. A multidimensional model has been proposed, based on Information Quality, System Quality, Environmental and Affective dimensions, to assess the effects of key variables (Semantic Intention, Usability, Web-Page Aesthetics, Subjective Norm and Trust) on users' satisfaction. Facebook was chosen as the focal social networking site because of its popularity. A comprehensive survey instrument was administered to 203 Facebook users, and Structural Equation Modeling, in particular Partial Least Squares, was conducted to analyze the proposed research model. As a result, the proposed multidimensional research model predicts the factors influencing users' satisfaction with social networking site use and the relationships among these factors. The findings will be valuable to the literature as they analyze influencing factors that have not previously been researched in the context of social networking satisfaction.
Usability of User Interface Styles for Learning Graphical Software ApplicationsWaqas Tariq
This paper examines the usability of different user interface styles for learning graphical software applications, namely Adobe Flash CS4 and Microsoft Expression Blend 4. An empirical study was performed to investigate the usability attributes of effectiveness, efficiency and satisfaction scores for learning the graphical software applications. Thirty-two participants were recruited, consisting of interface designers and software developers. A set of 7 tasks was designed to compare the different effects of user interface styles, including the graphical user interface (GUI) and the command line interface (CLI). User performance variables (effectiveness, efficiency, duration, number of errors and number of helps) were measured for the tasks performed by all participants in the test. The satisfaction score was measured using the QUIS (Questionnaire for User Interface Satisfaction) tool. The results revealed that the average effectiveness scores were higher than 75% for both software applications. Although Adobe Flash CS4 scored slightly higher on effectiveness, Microsoft Expression Blend 4 obtained better results in terms of efficiency, duration, errors and helps. The user satisfaction ratings also showed Microsoft Expression Blend 4 gaining higher satisfaction compared with Adobe Flash CS4. Generally, both software applications scored above average (>3.5) for the majority of the user interface satisfaction attributes, regardless of the users' background.
Dynamic Construction of Telugu Speech Corpus for Voice Enabled Text Editor
Dr. K. V. N. Sunitha & A. Sharada
International Journal of Human Computer Interaction (IJHCI), Volume (3) : Issue (4): 2012 83
Dr. K. V. N. Sunitha k.v.n.sunitha@gmail.com
Principal,
BVRIT Hyderabad College of Engg. for women,
Bachupally, Hyderabad, A.P., India
A. Sharada sharada.nirmal@gmail.com
Assoc.Prof, CSE Dept
G.Narayanamma Inst.Of Technology & Science
Shaikpet, Hyderabad,A.P., India
Abstract
In recent decades, speech-interactive systems have gained increasing importance. The performance of an ASR system depends mainly on the availability of a large speech corpus. The conventional method of building a large-vocabulary speech recognizer for any language uses a top-down approach to speech. This approach requires a large speech corpus with sentence- or phoneme-level transcription of the speech utterances. The transcriptions must also cover the different sound sequences so that the recognizer can build models for all the sounds present. But for Telugu, because of its complex nature, a very large, well-annotated speech database is very difficult to build. It is very difficult, if not impossible, to cover all the words of any Indian language, where each word may have thousands, even millions, of word forms. A significant part of the grammar that is handled by syntax in English (and other similar languages) is handled within morphology in Telugu; phrases comprising several words (that is, tokens) in English map onto a single word in Telugu. Telugu is phonetic in nature in addition to being morphologically rich. That is why speech technology developed for English cannot be applied directly to Telugu. This paper highlights the work carried out in an attempt to build a voice-enabled text editor with automatic term suggestion. The main claim of the paper is the recognition enhancement process we developed for highly inflecting, morphologically rich languages. This method increases speech recognition accuracy with a greatly reduced corpus size. It also adds new Telugu words to the database dynamically, resulting in incremental growth of the corpus.
Keywords: Speech Corpus, Suggestion List, Text Editor, Signal Comparator, Incremental
Growth.
1. INTRODUCTION
A corpus is a large and representative collection of language material stored in a computer-processable form. Corpora provide realistic, interesting and insightful examples of language use for theory building and for verifying hypotheses. Insights obtained from the analysis of corpora have led to a fresh and better understanding of how language actually works. A lot of research has been carried out throughout India over the past decade, and most documents are being prepared in local languages through software available for the purpose. Though many software packages and keyboards are available to produce such documents, the accuracy is not always acceptable and the chance of errors is high, particularly for Indian languages, most of which are highly inflecting. Developing a robust voice-enabled editor to transcribe a continuous speech signal into a sequence of words is a difficult task, as continuous speech does not have natural pauses between words. It is also difficult to make the system robust to speaker variability and environmental conditions.
The conventional method of building a large-vocabulary speech recognizer for any language uses a top-down approach to speech: the system first hypothesizes the sentence, then the words that make up the sentence, and ultimately the sub-word units that make up the words. This approach requires a large speech corpus with sentence- or phoneme-level transcription of the speech utterances. The transcriptions must also cover the different sound sequences so that the recognizer can build models for all the sounds present. It further requires maintaining a dictionary with the phoneme or sub-word-unit transcription of the words, and language models to perform large-vocabulary continuous speech recognition. The recognizer outputs words that exist in the dictionary, so if the system is to be developed for a new language, a dictionary and extensive language models must be built for that language. In a country like India, which has 22 official languages and a number of unofficial ones, building huge text and speech databases is a difficult task.
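To make the language-model component of such a top-down recognizer concrete, here is a minimal sketch of a bigram model with add-one smoothing, trained on a toy three-sentence corpus of my own and used to rank hypothesized word sequences. Real systems train on millions of sentences and combine the score with acoustic evidence.

```python
import math
from collections import Counter

# Toy training text for the language model.
corpus = [["open", "the", "file"], ["save", "the", "file"], ["open", "the", "document"]]

bigrams = Counter()
unigrams = Counter()
for sent in corpus:
    toks = ["<s>"] + sent          # sentence-start marker
    unigrams.update(toks)
    bigrams.update(zip(toks, toks[1:]))
vocab_size = len(unigrams)

def log_prob(sentence):
    """Add-one smoothed bigram log-probability of a word sequence."""
    toks = ["<s>"] + sentence
    return sum(
        math.log((bigrams[(a, b)] + 1) / (unigrams[a] + vocab_size))
        for a, b in zip(toks, toks[1:])
    )

# The LM prefers word orders it has seen during training.
assert log_prob(["open", "the", "file"]) > log_prob(["file", "the", "open"])
```

In decoding, this score is what lets the recognizer choose among acoustically similar word hypotheses, which is why extensive language models are needed for each new language.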
The paper is organized as follows: Section 2 explains the state of the art, Section 3 details the
characteristics of Telugu Language, Section 4 describes the proposed model in which ASR is the
major contribution, Section 5 elaborates on Architecture of ASR that is designed to suit Indian
languages and enhances speech recognition accuracy, Section 6 explains Speech correction
procedure that enhances speech recognition accuracy, Section 7 gives Results and Conclusion.
2. LITERATURE SURVEY
There has been significant progress in advancing Automatic Speech Recognition (ASR) technologies over the past several decades. However, ASR systems are still not widely adopted today. Among all the issues that prevent users from using ASR systems, recognition accuracy is still the most important factor [6]. It is well known that an ASR system adapted to a specific user performs better, but acoustic model (AM) adaptation on its own is not sufficient, because there are more elements to adapt than the AM alone. For example, the default dictionary does not include some of the words frequently used by a specific user; examples of these Out-Of-Vocabulary (OOV) words are project names, acronyms and foreign names. Adapting the lexicon would recover on average 1.2 times as many errors as OOV words removed [7].
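A minimal sketch of the lexicon adaptation idea: user-specific OOV words are added to the pronunciation dictionary with a naive letter-to-sound guess. The grapheme-to-phoneme table below is hypothetical and far cruder than a real G2P model.

```python
# Toy grapheme-to-phoneme map (ARPAbet-style symbols, illustrative only).
G2P = {"a": "AA", "b": "B", "c": "K", "e": "EH", "m": "M", "r": "R", "s": "S", "u": "UW"}

lexicon = {"cab": ["K", "AA", "B"]}  # seed pronunciation dictionary

def add_oov(word):
    """Add an OOV word to the lexicon with a letter-by-letter pronunciation guess."""
    if word not in lexicon:
        lexicon[word] = [G2P[ch] for ch in word if ch in G2P]

add_oov("abacus")  # e.g. a project name unknown to the default dictionary
print(lexicon["abacus"])  # → ['AA', 'B', 'AA', 'K', 'UW', 'S']
```

Once the word has an entry, the recognizer can hypothesize it instead of substituting an acoustically similar in-vocabulary word, which is the error-recovery effect reported in [7].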
The importance of access to a large amount of correctly annotated speech data is not only widely recognized among people working in the field of speech recognition, but has become generally recognized in the whole speech research community. For this reason, an increasing number of professionals engaged in speech research are now involved in projects that aim at the creation of large-scale speech databases [31]. While corpus-based and statistical approaches to speech have been well established elsewhere in the world, India still lags far behind. A more recent development was a Hindi recognition system from HP Labs India, which involved Hindi speech corpus collection and subsequent system building [23]. For Telugu, however, the work has started only recently, and a large, well-annotated speech database is still largely unavailable, the reasons being the complexity of the Telugu language and the difficulties in developing such a database. In India, CDAC, LDC-IL (CIIL), IISc, IBM, TIFR, IIT-Mumbai and a few other prestigious organizations are working on speech technologies and speech corpus creation for Indian languages.
The approach proposed here differs from the research work mentioned in [17], [19], [21]: those works operate at the signal level, whereas our approach exploits the characteristics of the language and works at the language-model level.
3. CHARACTERISTICS OF TELUGU LANGUAGE
Telugu is a Dravidian language spoken in the southern Indian states and is the official language of Andhra Pradesh. It is considered the second most widely spoken language in India after Hindi, with 75 million first-language and 5 million second-language speakers [6]. The most important feature of Telugu is its phonetic nature: in contrast to English, there is almost a one-to-one correspondence between what is written and what is spoken.
3. Dr. K. V. N. Sunitha & A. Sharada
International Journal of Human Computer Interaction (IJHCI), Volume (3) : Issue (4): 2012 85
For example, in English we write l-a-u-g-h but pronounce it l-a-f; we write a 'g' sound but say an 'f' sound. This situation does not occur in Telugu: what we write is what we speak. The rules required to map letters to sounds in Telugu are almost straightforward [8].
A significant part of grammar that is handled by syntax in English (and other similar languages)
is handled within morphology in Telugu (and other Dravidian languages). Phrases including
several words (that is, tokens) in English would be mapped on to a single word in Telugu. Verbs
may include aspectual auxiliaries apart from tense and agreement. There are several types
of non-finite forms too. A single verbal root can lead to formation of a few hundred thousand
word forms. Nouns are also inflected for number and case. Derivation being very productive,
even more forms become possible when we consider full word forms. These words are made
up of several morphemes conjoined through complex morpho-phonemic processes.
In an inflectional language, every word consists of one or more morphemes into which it can be segmented; consider, for instance, the morpheme segmentations of the following Telugu words: “Ame (she), Ame+yokka (of her), Ame+tO (with her), AmE+nA (is it she)”. In highly-inflecting
and compounding languages the number of possible word forms is very high. This poses special
challenges to NLP systems dealing with these languages. For example, in automatic speech
recognition it is customary to use pre-made lists of attested word forms as a “normative”
vocabulary.
The incoming acoustic signal is matched against the list, and only words contained in the corpus
can be recognized. Such a word list can be created by collecting word forms from large text
corpora or existing lexicons, and the aim is to obtain as much coverage as possible of the words
of the language. When processing languages with extremely rich word formation like Telugu, the resulting word lists are typically very large, which is demanding from a computational point of view. A more serious problem is that many perfectly valid word forms are likely to be missing from the list anyway, since they may never have occurred in the corpus used as a source. For out-of-dictionary words, recognition accuracy is quite low, and the input gets matched to the nearest possible word in the dictionary.
4. PROPOSED MODEL
One dimension of variation in speech recognition tasks is the vocabulary size. Speech recognition
is easier if the number of distinct words we need to recognize is smaller. So tasks with a two word
vocabulary, like yes versus no detection, or an eleven word vocabulary, like recognizing
sequences of digits, in what is called the digits task, are relatively easy. At the other end, tasks with large vocabularies, like transcribing human-human telephone conversations or broadcast news, with vocabularies of 64,000 words or more, are much harder.
We propose a method in which an efficient ASR technique is used to recognize the word. The given word is compared with the words in the database; if an exact match is found, it is displayed. Otherwise, the speech correction procedure we designed is applied and a list of suggested words is given. If the user accepts any of the suggested words, it is displayed in the editor; otherwise the word is added to the database by taking three samples from the user.
FIGURE 1: Proposed Model
Steps for Corpus Construction
The proposed approach is a three-step process:
recognition of the uttered word
generating a list of probable words when the word does not match any existing word in the database
updating the database with newly added words. The system checks whether the error is likely caused by OOV words; if so, the new words are added to the lexicon.
4.1 Training
In this module, the system takes three speech samples of a word from the user and is trained so that it recognizes the word when it is uttered the next time. This phase involves extracting MFCC features from the sample speech and building an average model over the samples. The system is thus trained for each new word to be added.
Steps:
Collecting speech samples
Feature Extraction
Finding templates of each sample
Finding templates across all the samples
The training phase is implemented by using two algorithms. They are as follows:
Segmental K-Means algorithm
Baum-Welch Algorithm
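As a rough illustration of the averaging step (not of Segmental K-Means or Baum-Welch themselves, which re-estimate HMM parameters), the sketch below frame-averages toy feature matrices from three samples of one word; real training would first time-align the samples, e.g. via DTW or HMM state alignment:

```python
# Simplified sketch of building an "average model" from three training
# samples of one word. The 2-coefficient "MFCC" vectors below are toy
# data, not real features; in the actual system, Segmental K-Means and
# Baum-Welch train an HMM over full MFCC sequences.

def average_template(samples):
    """Frame-wise average of feature matrices (lists of frames,
    each frame a list of coefficients), truncated to the shortest."""
    n_frames = min(len(s) for s in samples)
    template = []
    for t in range(n_frames):
        frames = [s[t] for s in samples]
        template.append([sum(c) / len(frames) for c in zip(*frames)])
    return template

# Three toy feature sequences for one uttered word (2 coefficients/frame).
s1 = [[1, 2], [3, 4]]
s2 = [[2, 2], [3, 5]]
s3 = [[3, 2], [3, 3]]

print(average_template([s1, s2, s3]))  # [[2.0, 2.0], [3.0, 4.0]]
```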
4.2 Automatic Speech Recognition
This module is the major claim of our paper. We designed an efficient ASR that suits highly inflecting languages. The architecture of our ASR is explained in detail in Section 5.
4.3 Suggestion List
If the user's spoken word is not recognized, the speech correction procedure developed by us is applied. Based on the probabilities, a suggested word list is generated, from which the user can select a word. This can be done in two ways. The first approach is distance
computation between the signals of corpus words and the user input word. In the second approach, word co-occurrence probabilities are computed and stored beforehand. The latter approach requires deep insight into the language under consideration, whereas the first approach is language-independent.
4.4 Adaptation
If the user input does not match any of the words present in the database, a list of possible words that closely match the input is suggested. If the user does not accept any word from the suggested list, the uttered word is added to the database: the user is asked to repeat the same word three times, features are extracted, a model is built for that word, and it is added to the database.
4.5 Database
As the proposed system is aimed at dynamic recognition and insertion of speech input, we use the trie data structure for efficient and faster searching. A trie (short for reTrieval) is a multiway tree used for fast search in large texts; the information about a stored string is encoded in the path from the root to a node rather than in the node itself. Each trie node is an array of pointers, one for each character in the alphabet, and each leaf node is the terminus of a chain of trie nodes representing a string. For string management, tries are fast with reasonable worst-case performance. The main objective of this work is to reduce the search space, minimize the database, and thereby improve performance using a trie-based structure.
Our database structure is explained in detail in section 5.1.
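A minimal sketch of the trie lookup described above, assuming stems are stored in Roman transliteration; the record fields mirror Table 1, but the class and method names are ours, not the paper's implementation:

```python
# Minimal trie sketch for the stem dictionary. The payload at the end
# of each stored stem holds the signal path and inflection indices, as
# in Table 1; unmatched words return None, triggering correction or
# adaptation.

class TrieNode:
    def __init__(self):
        self.children = {}   # next character -> TrieNode
        self.payload = None  # set only at the end of a stored stem

class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word, payload):
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.payload = payload

    def lookup(self, word):
        node = self.root
        for ch in word:
            if ch not in node.children:
                return None
            node = node.children[ch]
        return node.payload

stems = Trie()
stems.insert("amma", {"signal": "D:works1.wav", "inflections": [1, 2, 4, 6, 7, 8]})
stems.insert("anubhUti", {"signal": "D:works2.wav", "inflections": [1, 7]})

print(stems.lookup("amma"))       # the stored record
print(stems.lookup("pustakaM"))   # None -> candidate for adaptation
```

Lookup cost is proportional to the length of the query word, not to the number of stored stems, which is why the trie keeps search fast as the corpus grows.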
5. ARCHITECTURE OF ASR IN THE PROPOSED MODEL
Factored language models have recently been proposed for incorporating morphological knowledge in the modeling of inflecting languages. As suffixes and compound words cause the growth of the vocabulary in many languages, a logical idea is to split words into shorter units. The approach used here aims at reducing the above-mentioned problem of needing a very huge corpus for good recognition accuracy. It exploits the characteristic of Telugu that every word consists of one or several morphemes into which the word can be segmented. The base ASR identifies the nearest root word, over which the speech correction procedure is applied. The proposed model consists of the stem recognizer, segmenter, database, and signal comparator modules.
FIGURE 2: ASR Architecture
5.1 Database
The database contains stems and inflections separately. It does not store inflected words, as it is very difficult, if not impossible, to cover all inflected words of the language. The database consists of two dictionaries:
a) Stem Dictionary
b) Inflection Dictionary
The stem dictionary contains the stem words of the language, signal information for each stem (the duration and location of its utterance), and a list of indices into the inflection dictionary identifying the inflections possible with that stem.
The inflection dictionary contains the inflections of the language and signal information for each inflection (the duration and location of its utterance).
Both the dictionaries are implemented using trie structure in order to reduce the search space.
Details of this implementation can be seen in [3].
Word | Stem Signal Information | Inflection Indices
amma (అమ్మ) | D:works1.wav | 1, 2, 4, 6, 7, 8
anubhUti (అనుభూతి) | D:works2.wav | 1, 7
ataDu (అతడు) | …. | ….

TABLE 1: Stem Dictionary
Sl. No | Inflection | Inflection Signal Information
1 | ki (కి) | D:workinf1.wav
2 | tO (తో) | D:workinf2.wav
3 | guriMci (గురించి) |
4 | …. | ....

TABLE 2: Inflection Dictionary
As the vocabulary of Telugu is effectively unbounded, the system aims at collecting the most frequently used words and storing them in the database in the form of audio files.
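The linkage between the two dictionaries can be pictured as indexed records: each stem stores only the indices of the inflections grammatical with it, while the inflection strings and signals live once in the inflection dictionary. A toy sketch, with entries trimmed to the inflections Table 2 actually lists and a helper name of our own:

```python
# Toy rendering of the Table 1 / Table 2 linkage. Entries are trimmed
# to the values visible in the tables; the helper function name is
# ours, not the paper's.

inflection_dict = {
    1: {"inflection": "ki", "signal": "D:workinf1.wav"},
    2: {"inflection": "tO", "signal": "D:workinf2.wav"},
    3: {"inflection": "guriMci", "signal": None},  # signal path not listed
}

stem_dict = {
    "amma":     {"signal": "D:works1.wav", "inflections": [1, 2]},
    "anubhUti": {"signal": "D:works2.wav", "inflections": [1]},
}

def inflections_for(stem):
    """Return the inflection strings licensed by a given stem."""
    return [inflection_dict[i]["inflection"]
            for i in stem_dict[stem]["inflections"]]

print(inflections_for("amma"))  # ['ki', 'tO']
```

Because an inflection is stored once and shared by index, adding a new stem costs one record, not one record per inflected form.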
5.2 Stem Recognizer
This module identifies the nearest stem of the given speech input. This is a basic ASR system
whose accuracy may not be very good. Subsequent modules will help enhance this accuracy.
5.3 Segmenter
The segmenter takes the identified stem word as input, accesses the database for signal information, and segments the given signal into stem and inflection. If the utterance is a bare stem word, the second part will be empty.
5.4 Signal Comparator
This module compares the inflection part of the signal with the list of possible inflections from the database and returns the correct inflection. This is passed to the morph analyzer, which applies the morpho-syntactic rules of the language and produces the correct inflected word.
6. SPEECH CORRECTION PROCEDURE
The speech correction procedure is illustrated in Figure 3 below.
FIGURE 3: Speech Correction Procedure
The steps of the procedure are listed as follows:
Step 1: Capture the utterance
Step 2: Get the nearest stem word
Step 3: Segment the signal into stem and inflection
Step 4: Get its syllabified form
Step 5: Get the inflection information
Step 6: Compare the inflection signals possible with that stem one by one and store the signal
differences into an array
Step 7: Apply morpho syntactic rules of the language to combine stem and inflection
Step 8: Display the inflected word
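The steps above can be sketched as follows, with string edit distance over transliterations standing in for the signal comparison of Step 6, and plain concatenation standing in for the morpho-syntactic combination of Step 7; the inflection inventory is a toy subset:

```python
# Textual stand-in for the correction procedure: the paper compares the
# inflection part of the *signal* against stored inflection signals; we
# approximate that with string edit distance, and simplify sandhi to
# concatenation.

def edit_distance(a, b):
    """Classic Levenshtein distance via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def correct(utterance, stem, allowed_inflections):
    """Steps 3-8: split off the stem, score each licensed inflection,
    and return the corrected word(s) tied for the best score."""
    tail = utterance[len(stem):]               # Step 3: stem vs. inflection
    scores = {inf: edit_distance(tail, inf)    # Step 6: per-inflection score
              for inf in allowed_inflections}
    best = min(scores.values())
    # Steps 7-8: combine the stem with every inflection tied for best
    return [stem + inf for inf, d in sorted(scores.items()) if d == best]

# 'pustakaMdO' is one edit away from both 'pustakaMlO' and 'pustakaMtO'.
print(correct("pustakaMdO", "pustakaM", ["lO", "tO", "ki", "guriMci"]))
# ['pustakaMlO', 'pustakaMtO']
```

When several inflections tie, all are returned, matching the paper's behavior of listing the options for the user to choose from.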
Using these rules, the possible set of root words is combined with the possible set of inflections, the obtained results are compared with the given user input, and the nearest possible root word and inflection are displayed if the given input is correct.
If the given input is not correct, the inflection part of the input word is compared with the inflections of that particular root word, the nearest possible inflection is identified, the root word is combined with the identified inflection, sandhi rules are applied, and the output is displayed.
When more than one root word or more than one inflection has the minimum edit distance, the model displays all the possible options and the user can choose the correct one. For example, when the given (mis-spelled) word is pustakaMdO (పుస్తకందో), both the inflection tO, giving pustakaMtO (పుస్తకంతో) meaning ‘with the book’, and lO, giving pustakaMlO (పుస్తకంలో) meaning ‘in the book’, are possible. The present work lists both words and gives the user the option; we are working on improving this by selecting the appropriate word based on the context.
The recognized stem is syllabified, and this is the input to the analyzer module. By matching the syllabified input from the beginning against the root words stored in the dictionary module, a possible set of root words is obtained.
For example, for the user input nAnnagAriki:
nAn-na (identified stem after syllabification)
Once the possible root words are identified, the inflection part is compared in the reverse direction for a match in the inflection dictionary. Only the inflections listed against the possible root words are considered, thus reducing the search space and making the algorithm faster.
gA-ri-ki (matched from the reverse)
After obtaining the possible set of root words and the possible set of inflections, they are combined with the help of saMdhi formation rules.
In this example, gA-ri-ki is compared with the inflections of the root word nAnna. After comparison, gAriki is identified as the nearest possible inflection; the root word is combined with the inflection, and the output “nAnnagAriki” is displayed.
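This reverse-direction comparison can be sketched under the same string-based stand-in (real matching is on signals); the candidate list mimics inflections listed against the stem nAnna, and the function names are ours:

```python
# Reverse-direction matching sketch for the nAnnagAriki example: the
# inflection part is compared from the end of the word against only
# those inflections licensed by the candidate stem. The inventory is
# a toy subset.

def longest_suffix_match(word, inflection):
    """Number of trailing characters word and inflection share."""
    n = 0
    while (n < min(len(word), len(inflection))
           and word[-1 - n] == inflection[-1 - n]):
        n += 1
    return n

def nearest_inflection(word, stem_inflections):
    """Pick the licensed inflection with the longest shared suffix."""
    return max(stem_inflections, key=lambda inf: longest_suffix_match(word, inf))

# Inflections listed against the stem 'nAnna' (toy subset).
candidates = ["gAriki", "ki", "tO", "lO"]
best = nearest_inflection("nAnnagAriki", candidates)
print("nAnna" + best)  # nAnnagAriki
```

Restricting candidates to the inflections licensed by the stem is what keeps this comparison cheap: the loop runs over a handful of inflections rather than the whole inflection dictionary.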
7. SAMPLE RESULTS & CONCLUSION
The approach proposed here results in increased speech recognition accuracy with a substantial reduction in corpus size. This approach differs from the research work mentioned in [17], [19], [21]: those works operate at the signal level, whereas our approach exploits the characteristics of the language and works at the language-model level. It also adds Telugu words to the database dynamically, resulting in growth of the corpus.
It eliminates the need for a huge corpus, the main hurdle in ASR technology development for Indian languages. This method recognizes all inflected word forms using only k + j entries in the database, where k is the number of stem words and j is the number of inflections possible in the language; storing full word forms would otherwise require k * j entries. The proposed model is very useful for enhancing speech recognition accuracy.
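A back-of-the-envelope illustration of the k + j versus k * j saving; the counts below are hypothetical, chosen only to show the scale, not figures from the paper:

```python
# Hypothetical counts: k stems, each combinable with j inflections.
k = 1_000   # assumed number of stem words in the database
j = 150     # assumed number of inflections in the language

full_forms = k * j   # storing every inflected word form
proposed   = k + j   # storing stems and inflections separately

print(full_forms, proposed)    # 150000 1150
print(full_forms // proposed)  # 130  (roughly 130x fewer entries)
```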
When more than one root word or more than one inflection has the minimum edit distance, the model displays all the possible options and the user can choose the correct one; we are working on improving this by selecting the appropriate word based on the context. This approach not only identifies all inflected word forms with a small database but also corrects a wrongly uttered or mis-recognized word to its correct form. A sample result can be seen in Table 3.
This work can contribute to the future deployment of speech technologies in a variety of applications, such as keyword-based search over a given acoustic signal (at present done by giving text as input) and generating documents through speech rather than through the keyboard. We hope this work helps accelerate research in Telugu speech technology by eliminating the laborious task of constructing a huge speech corpus.
Input word | Nearest Root Word | Nearest Inflection | Is Correct | Suggestion List | Select/Add
pustakAlu | pustakaM | lu | yes | --- | ---
pustakAdu | pustakaM | du | no | pustakAlu | Select
pustakaMdO | pustakaM | lO, tO | no | pustakaMlO, pustakaMtO | Select
putakaM | patakaM (పతకం), pustakaM | --- | no | patakaM, pustakaM | Select
puste | pustakaM | --- | no | pustakaM | Add ‘puste’ to database
pusteki | puste | ki | yes | --- | ---
pusta | puste | --- | no | puste | Select

TABLE 3: Sample Results
8. REFERENCES
1. “LANGUAGE IN INDIA Strength for Today and Bright Hope for Tomorrow”, Vol 6, August
2006.
2. “Dravidian Phonological Systems”, University of Washington Press, 1975.
3. Oppenheim, A. V., & Schafer, R.W. (1975). “Digital signal processing”, Englewood Cliffs:
Prentice-Hall.
4. “Conversational Telugu”, N.D.K.Institute of Languages, 1992.
5. ”Issues in Indian languages computing in particular reference to search and retrieval in
Telugu Language”, Emerald Group Publishing Ltd.
6. Jinxi Xu and W. Bruce Croft. “Corpus based stemming using co-occurrence of word variants”. ACM Trans. Inf. Syst., 16(1):61-81, 1998.
7. J. L. Dawson. “Suffix removal for word conflation”. In Bulletin of the Association for Literary and Linguistic Computing, volume 2(3), pp. 33-46, Michaelmas, 1974.
8. K. V. N. Sunitha, A. Sharada, “Telugu Text Corpora Analysis for Creating Speech Database”, IJEIT, ISSN 0975-5292, Dec 2009, Volume 1, No. 2.
9. L. Deng and X. Huang, “Challenges in Adopting Speech Recognition”, Communications of the ACM, vol. 47, no. 1, pp. 69-75, Jan 2004.
10. Campbell W N, Isard S D, “Segment durations in a syllable frame”, J. Phonetics: Special issue on speech synthesis, 1991, 19: 37–47.
11. Young, S., and G. Bloothooft, eds. 1997. Corpus-Based Methods in Language and Speech Processing. Vol. II. Dordrecht: Kluwer Academic Publishers.
12. D. J. Ravi and Sudarshan Patilkulkarni, “A Novel Approach to Develop Speech Database for
Kannada Text-to-Speech System”, Int. J. on Recent Trends in Engineering & Technology,
2011, Vol. 05, No. 01, in ACEEE.
13. Atkins, S., J. Clear and N. Ostler. 1992. "Corpus Design Criteria." Literary and
Linguistic Computing . 7(1): 1-16.
14. Barlow, M. 1996. "Corpora for Theory and Practice." International Journal of Corpus Linguistics, 1(1): 1-38.
15. M. A. Anusuya, S. K. Katti, “Front end analysis of speech recognition: a review”, Int J Speech Technol (2011) 14: 99–145.
16. K. Subramanyam, D. Arun Kumar, “Static Dictionary for Pronunciation Modeling”, IJRET, Oct 2012, Volume: 1, Issue: 2, pp. 185–189.
17. N. Usha Rani and P.N. Girija, “Analyzing and Correction of Errors to Improve the Speech
Recognition Accuracy for Telugu Language”, CiiT International Journal of Artificial Intelligent
Systems and Machine Learning , Issue : June 2011.
18. Pukhraj P Shrishrimal, Ratnadeep R Deshmukh and Vishal B Waghmare. “Article: Indian
Language Speech Database: A Review”, International Journal of Computer
Applications 47(5):17-21, June 2012. Published by Foundation of Computer Science, New
York, USA.
19. S. Lokesh, G. Balakrishnan, “Speech Enhancement using Mel-LPC Cepstrum and Vector
Quantization for ASR”, European Journal of Scientific Research ISSN 1450-216X Vol.73
No.2 (2012), pp. 202-209.
20. Prateek Srivastava, Reena Panda & Sankarsan Rauta, “A Novel, Robust, Hierarchical, Text-Independent Speaker Recognition Technique”, Signal Processing: An International Journal (SPIJ), Volume (6) : Issue (4) : 2012.
21. Urmila Shrawankar & Vilas Thakare, “Parameters Optimization for Improving ASR
Performance in Adverse Real World Noisy Environmental Conditions”, International Journal
of Human Computer Interaction (IJHCI), Volume (3) : Issue (3) : 2012 58.
22. M.S. Salam, Dzulkifli Mohamad and S.H. Salleh, “Improved Statistical Speech Segmentation
Using Connectionist Approach”, Journal of Computer Science 5 (4): 275-282, 2009 ISSN
1549-3636
23. Gopalakrishna Anumanchipalli et al., “Development of Indian language speech databases for large vocabulary speech recognition systems”, Proceedings of International Conference on Speech and Computer (SPECOM), Patras, Greece, Oct 2005.
24. R. Krovetz. “Viewing morphology as an inference process”. In Proceedings of the Sixteenth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 191-203, 1993.
25. Paice C and Husk G. “Another Stemmer”. In ACM SIGIR Forum 24(3):566,1990.
26. L. Lamel and G. Adda, "On Designing Pronunciation Lexicons for Large Vocabulary, Continuous Speech Recognition", Proc. International Conference on Spoken Language Processing (ICSLP'96), pp. 6-9, 1996.
27. Khan A N, Gangashetty S V, Yegnanarayana B “Syllabic properties of three Indian
languages: Implications for speech recognition and language identification”, In Int. Conf.
Natural Language Processing, Mysore, India, pp. 125–134, 2003.
28. Agrawal S. S., “Recent Developments in Speech Corpora in Indian Languages: Country
Report of India”, O-COCOSDA 2010, Kathmandu, Nepal.
29. Chalapathy Neti, Nitendra Rajput, Ashish Verma, “A Large Vocabulary Continuous Speech
Recognition system for Hindi”, In Proceedings of the National conference on
Communications, 2002 , Mumbai, pp. 366-370.
30. M. F. Porter. “An algorithm for suffix stripping”. In Readings in Information Retrieval, pages 313-316, San Francisco, CA, USA, 1997. Morgan Kaufmann Publishers Inc.
31. Akshar Bharathi, Prakash Rao K, Rajeev Sangal, and S. M. Bendre, “Basic statistical analysis of corpus and cross comparison among corpora”, Technical Report 4, IIIT, Hyderabad, www.iiit.net/techreports/2002.4.pdf, 2002.
32. vishwabharat@tdil, MIT Govt. of India Magazine
33. Size of Speech Corpora (as on Dec 2011), Available at: http://www.ldcil.org/resourcesSpeechCorp.aspx.
34. K. Nagamma Reddy, “Phonetic, phonological, morpho-syntactic and semantic functions of segmental duration in spoken Telugu: acoustic evidence”.
35. Bansal, R.K. 1969.The intelligibility of Indian English. Monograph No. 4, CIEFL, Hyderabad.
36. B.Ramakrishna Reddy, “Localist studies in Telugu Syntax”, Ph.D Thesis, University of
Edinburgh, 1976.