IJRET : International Journal of Research in Engineering and Technology is an international peer reviewed, online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together Scientists, Academician, Field Engineers, Scholars and Students of related fields of Engineering and Technology.
Parameters Optimization for Improving ASR Performance in Adverse Real World N... - Waqas Tariq
Existing research shows that many techniques and methodologies are available for every step of an Automatic Speech Recognition (ASR) system, but the performance of a method (minimizing Word Error Rate, WER, and maximizing Word Accuracy Rate, WAR) does not depend only on the technique applied. Performance depends mainly on the category of noise, the noise level, and the sizes of the window, frame and frame overlap considered in existing methods. The main aim of this work is to vary parameters such as window size, frame size and frame overlap percentage, observe the performance of algorithms across noise categories and levels, and train the system on all parameter sizes and categories of real-world noisy environments to improve speech recognition performance. The paper presents Signal-to-Noise Ratio (SNR) and accuracy test results obtained with variable parameter sizes. Because it is hard to evaluate the test results and choose parameter sizes that optimize ASR performance, the study further suggests feasible and optimum parameter sizes using a Fuzzy Inference System (FIS) to enhance accuracy in adverse real-world noisy conditions. This work will be helpful for discriminative training of ubiquitous ASR systems for better Human Computer Interaction (HCI). Keywords: ASR Performance, ASR Parameters Optimization, Multi-Environmental Training, Fuzzy Inference System for ASR, ubiquitous ASR system, Human Computer Interaction (HCI)
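The parameters named in this abstract (frame size, frame overlap percentage, window, SNR) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `frame_signal`, `hamming_window` and `snr_db` are hypothetical, the Hamming window is an assumed choice, and the SNR is computed in the usual way as the ratio of signal power to noise power in decibels.

```python
import math

def frame_signal(samples, frame_size, overlap_pct):
    """Split samples into frames of frame_size with overlap_pct percent overlap."""
    if frame_size <= 0 or not 0 <= overlap_pct < 100:
        raise ValueError("frame_size must be > 0 and overlap_pct in [0, 100)")
    # The hop is the number of new samples introduced by each frame.
    hop = max(1, int(round(frame_size * (1 - overlap_pct / 100))))
    return [samples[start:start + frame_size]
            for start in range(0, len(samples) - frame_size + 1, hop)]

def hamming_window(n):
    """Hamming window coefficients (an assumed window type)."""
    if n == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]

def snr_db(clean, noisy):
    """SNR in dB, treating (noisy - clean) as the noise component."""
    signal_power = sum(c * c for c in clean)
    noise_power = sum((n - c) ** 2 for c, n in zip(clean, noisy))
    if noise_power == 0:
        return math.inf
    return 10 * math.log10(signal_power / noise_power)

# Example: 10 samples, frames of 4 samples, 50% overlap.
frames = frame_signal(list(range(10)), frame_size=4, overlap_pct=50)
windowed = [[s * w for s, w in zip(f, hamming_window(len(f)))] for f in frames]
```

Sweeping `frame_size` and `overlap_pct` over a grid and recording accuracy for each noise category is one way such a parameter study could be organised.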
An Improved Approach for Word Ambiguity Removal - Waqas Tariq
Word ambiguity removal is the task of identifying the correct sense of a word in an ambiguous sentence. This paper describes a model that uses a part-of-speech tagger and three categories for word sense disambiguation (WSD). Human Computer Interaction is needed to improve interactions between users and computers. For this, supervised and unsupervised methods are combined. The WSD algorithm finds the sense of a word efficiently and accurately based on domain information. The accuracy of this work is evaluated with the aim of finding the most suitable domain for a word. Keywords: Human Computer Interaction, Supervised Training, Unsupervised Learning, Word Ambiguity, Word sense disambiguation
Dynamic Construction of Telugu Speech Corpus for Voice Enabled Text Editor - Waqas Tariq
In recent decades speech interactive systems have gained increasing importance. The performance of an ASR system depends mainly on the availability of a large speech corpus. The conventional method of building a large-vocabulary speech recognizer for any language uses a top-down approach, which requires a large speech corpus with sentence- or phoneme-level transcription of the utterances. The transcriptions must also include different speech orders so that the recognizer can build models for all the sounds present. For Telugu, however, the complex nature of the language makes a very large, well-annotated speech database very difficult to build. It is very difficult, if not impossible, to cover all the words of any Indian language, where each word may have thousands or even millions of word forms. A significant part of the grammar that is handled by syntax in English (and similar languages) is handled within morphology in Telugu, so a phrase of several words (that is, tokens) in English may map onto a single word in Telugu. Telugu is also phonetic in nature in addition to being rich in morphology. That is why speech technology developed for English cannot be applied to Telugu. This paper describes work carried out to build a voice-enabled text editor capable of automatic term suggestion. The main claim of the paper is a recognition enhancement process, developed by us, that suits highly inflecting, morphologically rich languages. The method increases speech recognition accuracy with a much smaller corpus. It also adds Telugu words to the database dynamically, so the corpus grows over time.
A template based algorithm for automatic summarization and dialogue managemen... - eSAT Journals
Abstract: This paper describes an automated approach for extracting significant and useful events from unstructured text. The goal of the research is to arrive at a methodology that helps extract important events such as dates, places, and subjects of interest. It would also be convenient if the methodology presented users with a shorter version of the text that contains all non-trivial information. We also discuss the implementation of algorithms, developed by us, that perform exactly this task. Key Words: Cosine Similarity, Information, Natural Language, Summarization, Text Mining
An efficient approach to query reformulation in web search - eSAT Journals
Abstract: A wide range of problems in natural language processing, data mining, bioinformatics and information retrieval can be cast as string transformation, the task addressed here: given an input string, the system generates the top k output strings most likely to correspond to it. In this paper we propose a novel probabilistic method for string transformation that is both accurate and efficient. The approach comprises a log-linear model, a method for training the model, and an algorithm that generates the top k outputs. The log-linear model is defined as a conditional probability distribution of an output string and a set of transformation rules, conditioned on the input string. The string generation algorithm, which is based on pruning, is guaranteed to produce the top k list. The proposed technique is applied to spelling error correction in queries and to query reformulation in web search. Previous work did not consider spelling error correction together with query reformulation for related queries, did not treat efficiency as an important issue, and did not focus on improving both accuracy and efficiency in string transformation. Experimental results on large-scale data show that the proposed method is highly accurate and efficient. Keywords: Log linear method, Query reformulation, Spelling Error correction.
Performance analysis on secured data method in natural language steganography - journalBEEI
The rapid growth in information exchange brought about by the expansion of the internet during the last decade has motivated research in this field. Recently, steganography approaches have received unexpected attention. The aim of this paper is to review different performance metrics, covering decoding, decrypting and extracting. Data decoding interprets the received hidden message into a code word. Data encryption is the best way to provide secure communication; decrypting takes an encrypted text and converts it back into the original text. Data extracting is the reverse of the data embedding process. The effectiveness of an evaluation is determined mainly by the performance metrics used, and researchers aim to improve these metric characteristics. The objective of this paper is to present a review of studies of steganography in natural language based on performance analysis criteria. The findings of the review will clarify which performance metrics are preferred. It is hoped that this review will help future research in evaluating the performance analysis of natural language in general and of the proposed secured data method in natural language steganography in particular.
An optimized approach to voice translation on mobile phones - eSAT Journals
Abstract: Current voice translation tools and services use natural language understanding and natural language processing to convert words. However, these parsing methods concentrate on capturing keywords and translating them, neglecting the considerable processing time involved. In this paper, we suggest techniques that can reduce the processing time and thereby increase the throughput of voice translation services. Techniques such as template matching, indexing frequently used words using probability search, and a session-based cache can considerably improve processing times. These factors become all the more important when real-time translation must be achieved on mobile phones. Keywords: Optimized voice translation, mobile client, speech recognition, language interpretation, template matching, probability search, session-based cache.
Survey on Indian CLIR and MT systems in Marathi Language - Editor IJCATR
Cross Language Information Retrieval (CLIR) deals with retrieving relevant information stored in a language different from the language of the user’s query. This helps users to express their information need in their native languages. The machine translation based (MT-based) approach to CLIR uses existing machine translation techniques to provide automatic translation of queries. This paper covers the research work done on CLIR and MT systems for the Marathi language in India.
Significance of Speech Intelligibility Assessors in Medium Classroom Using An... - TELKOMNIKA JOURNAL
When there are constraints on the resources (equipment, manpower and time) for conducting speech intelligibility tests, the most reliable or significant speech intelligibility (SI) assessor for many different types of rooms is always sought. The purpose of this study was to determine the most significant speech intelligibility assessor in four medium classrooms. The assessors tested were RT60, C50, D50, and STIPA. The data were acquired with a sound recorder that recorded six Malay words spoken by a trained male speaker in four medium classrooms. The recorded speech signals were analyzed with DIRAC software. The data for the four assessors had to be normalized before they could be analyzed by AHP. In conclusion, C50 showed the most consistent prediction of speech intelligibility in all sampled classrooms. On the other hand, as the room gets larger, RT60 becomes significant for determining speech intelligibility in these sampled classrooms.
Speech to text conversion for visually impaired person using µ law companding - iosrjce
The paper presents the overall design and implementation of a DSP-based speech recognition and text conversion system. Speech is usually the preferred mode of operation for human beings, and this paper presents voice commands that are converted into text. We intended to carry out the entire speech processing in real time, which involves simultaneously accepting input from the user and using software filters to analyse the data. The comparison is then established using correlation and µ-law companding techniques. Voice recognition is carried out in MATLAB. The voice command is speaker independent. Voice commands are stored in the database with the help of the function keys. The real-time input speech is then processed in the speech recognition system, where the required features of the spoken words are extracted, filtered and matched against the existing samples stored in the database. The required MATLAB processing then converts the received data into text form.
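For readers unfamiliar with µ-law companding, the standard continuous compression formula is F(x) = sgn(x) · ln(1 + µ|x|) / ln(1 + µ) for x in [-1, 1]. The sketch below shows it in Python rather than the MATLAB used in the paper; the function names are hypothetical and µ = 255 is an assumed value (the one commonly used in telephony), not one stated in the abstract.

```python
import math

MU = 255  # assumed value; the abstract does not state which mu was used

def mu_law_compress(x, mu=MU):
    """Compress a sample x in [-1, 1] with the continuous mu-law curve."""
    if not -1.0 <= x <= 1.0:
        raise ValueError("sample must lie in [-1, 1]")
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)

def mu_law_expand(y, mu=MU):
    """Invert mu_law_compress for a companded value y in [-1, 1]."""
    if not -1.0 <= y <= 1.0:
        raise ValueError("companded value must lie in [-1, 1]")
    return math.copysign(((1 + mu) ** abs(y) - 1) / mu, y)

# Quiet samples are boosted relative to loud ones before quantization.
quiet = mu_law_compress(0.01)
loud = mu_law_compress(0.5)
```

The curve gives small amplitudes a larger share of the output range, which is why companding is typically applied before quantizing or comparing speech samples.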
An automatic text summarization using lexical cohesion and correlation of sen... - eSAT Publishing House
EXTRACTIVE SUMMARIZATION WITH VERY DEEP PRETRAINED LANGUAGE MODEL - ijaia
Recent generative pretrained language models have proven very successful on a wide range of NLP tasks, such as text classification, question answering and textual entailment. In this work, we present a two-phase encoder-decoder architecture based on Bidirectional Encoder Representations from Transformers (BERT) for the extractive summarization task. We evaluated our model with both automatic metrics and human annotators, and demonstrated that the architecture achieves results comparable to the state of the art on the large-scale CNN/Daily Mail corpus. To the best of our knowledge, this is the first work that applies a BERT-based architecture to a text summarization task and achieves results comparable to the state of the art.
A survey on sentence fusion techniques of abstractive text summarization - IJERA Editor
Sentence fusion is one of the tasks of automatic text summarization and has widespread applications in computer science. It is currently used in automatic text summarization and question answering systems, and in natural language generation systems under the name of sentence aggregation. The task poses various challenges, such as deciding whether two sentences can be combined, which part of a sentence should be selected for combination, how these parts can be combined, and how to fit the combinations into a grammatically correct sentence structure. In this paper we discuss the research done on sentence fusion and the challenges that are yet to be met.
This paper presents DrawPlus, an automated system based on natural language processing that generates UML diagrams, user scenarios and test cases from a business requirement specification written in natural language. DrawPlus analyzes the natural language and extracts the relevant and required information from the business requirement specification supplied by the user. The user writes the requirement specification in simple English, and the system analyzes it using core natural language processing techniques together with our own well-defined algorithms. After analysis and extraction of the associated information, DrawPlus draws the use case diagram, user scenarios and system-level, high-level test case descriptions. DrawPlus provides a more convenient and reliable way of generating use cases, user scenarios and test cases, reducing the time and cost of the software development process while accelerating the 70 of works in the software design and testing phase. Janani Tharmaseelan "Cohesive Software Design" Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-3 | Issue-3, April 2019, URL: https://www.ijtsrd.com/papers/ijtsrd22900.pdf
Paper URL: https://www.ijtsrd.com/computer-science/other/22900/cohesive-software-design/janani-tharmaseelan
Evaluation of Hidden Markov Model based Marathi Text-To-Speech Synthesis System - IJERA Editor
The objective of this paper is to evaluate the quality of an HMM-based Marathi TTS system. The main advantage of the HMM technique is that it allows the voice to be varied easily. The speech output produced by this method has a greater impact on emotion, style and intonation. Naturalness and intelligibility are the two important parameters that decide the quality of synthetic speech. Depending on these parameters, the synthetic speech results are grouped into four categories: natural speech, high quality synthetic speech, moderate quality synthetic speech and low quality synthetic speech. The results are obtained using the CT, DRT and MOS tests.
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
Abstract: Language identification plays an important role in natural language processing applications as one of the pre-processing steps. Various mechanisms are in use today that achieve this task with high recognition rates. Recent years have seen rapid growth in international communication, which has led to the need for systems capable of correctly identifying the languages of documents. Possible applications of language identification include information retrieval, web crawlers, text mining and email filtering. The paper uses a process called G-LDA [1], which takes concepts from Latent Dirichlet Allocation (LDA) and genetic evolution techniques. This involves framing a set of words that have a high frequency of occurrence in any given document. The method was tested on the Leipzig Corpora. The phrases evolved through the generations showed significant improvement. Keywords: Language Identification, Latent Dirichlet Allocation, Gibbs Sampling, Genetic Algorithm, Topic Modeling, Breeding, Fitness, Roulette Wheel.
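The keywords mention roulette wheel selection, a common way of choosing parents for breeding in a genetic algorithm: each candidate is picked with probability proportional to its fitness. The following Python sketch shows the generic technique only. It is not taken from the G-LDA paper, and the function name, the candidate phrases and the fitness values are all hypothetical.

```python
import random

def roulette_wheel_select(population, fitness, rng=random):
    """Pick one individual with probability proportional to its fitness."""
    if len(population) != len(fitness) or not population:
        raise ValueError("population and fitness must be non-empty and equal length")
    if any(f < 0 for f in fitness):
        raise ValueError("fitness values must be non-negative")
    total = sum(fitness)
    if total <= 0:
        raise ValueError("at least one fitness value must be positive")
    pick = rng.uniform(0, total)  # where the wheel stops
    running = 0.0
    for individual, score in zip(population, fitness):
        running += score
        if pick < running:
            return individual
    return population[-1]  # guards against floating-point edge cases

# Hypothetical candidate phrases and fitness scores, for illustration only.
phrases = ["the of and", "der die und", "le la les"]
scores = [0.2, 0.5, 0.3]
parent = roulette_wheel_select(phrases, scores, rng=random.Random(42))
```

Passing a seeded `random.Random` instance keeps runs reproducible, which helps when comparing generations.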
IJRET : International Journal of Research in Engineering and Technology is an international peer reviewed, online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together Scientists, Academician, Field Engineers, Scholars and Students of related fields of Engineering and Technology
IJRET : International Journal of Research in Engineering and Technology is an international peer reviewed, online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together Scientists, Academician, Field Engineers, Scholars and Students of related fields of Engineering and Technology.
Performance analysis on secured data method in natural language steganographyjournalBEEI
The rapid amount of exchange information that causes the expansion of the internet during the last decade has motivated that a research in this field. Recently, steganography approaches have received an unexpected attention. Hence, the aim of this paper is to review different performance metric; covering the decoding, decrypting and extracting performance metric. The process of data decoding interprets the received hidden message into a code word. As such, data encryption is the best way to provide a secure communication. Decrypting take an encrypted text and converting it back into an original text. Data extracting is a process which is the reverse of the data embedding process. The effectiveness evaluation is mainly determined by the performance metric aspect. The intention of researchers is to improve performance metric characteristics. The evaluation success is mainly determined by the performance analysis aspect. The objective of this paper is to present a review on the study of steganography in natural language based on the criteria of the performance analysis. The findings review will clarify the preferred performance metric aspects used. This review is hoped to help future research in evaluating the performance analysis of natural language in general and the proposed secured data revealed on natural language steganography in specific.
An optimized approach to voice translation on mobile phoneseSAT Journals
Abstract Current voice translation tools and services use natural language understanding and natural language processing to convert words. However, these parsing methods concentrate more on capturing keywords and translating them, completely neglecting the considerable amount of processing time involved. In this paper, we are suggesting techniques that can optimize the processing time thereby increasing the throughput of voice translation services. Techniques like template matching, indexing frequently used words using probability search and session-based cache can considerably enhance processing times. More so, these factors become all the more important when we need to achieve real-time translation on mobile phones. Keywords:-Optimized voice translation, mobile client, speech recognition, language interpretation, template matching, probability search, session- based cache.
Survey on Indian CLIR and MT systems in Marathi Language (Editor IJCATR)
Cross Language Information Retrieval (CLIR) deals with retrieving relevant information stored in a language different from
the language of the user’s query. This helps users to express their information need in their native languages. The machine translation based (MT-based)
approach to CLIR uses existing machine translation techniques to provide automatic translation of queries. This paper covers the
research work done on CLIR and MT systems for the Marathi language in India.
Significance of Speech Intelligibility Assessors in Medium Classroom Using An... (TELKOMNIKA JOURNAL)
When there are constraints on the resources (equipment, manpower and time) available to conduct speech
intelligibility tests, the most reliable or significant SI assessor for many different types of rooms is always
sought. The purpose of this study was to determine the most significant speech intelligibility assessor in
four medium classrooms. The speech intelligibility assessors tested were RT60, C50, D50, and STIPA.
The data were acquired by means of a sound recorder that recorded six Malay words spoken by a trained
male speaker in four medium classrooms. The recorded speech signals were analyzed by DIRAC
software. The data of the four speech intelligibility assessors had to be normalized before they could be analyzed
by AHP. In conclusion, C50 showed the most consistent prediction of speech intelligibility in all sampled
classrooms. On the other hand, as the room gets larger, RT60 becomes significant for determining
speech intelligibility in these sampled classrooms.
Speech to text conversion for visually impaired person using µ law companding (iosrjce)
This paper presents the overall design and implementation of a DSP based speech recognition and
text conversion system. Speech is usually taken as a preferred mode of operation for human beings. This paper
presents voice-oriented commands for conversion into text. We intended to compute the entire speech processing
in real time. This involves simultaneously accepting the input from the user and using software filters to analyse
the data. The comparison was then established by using correlation and µ law companding techniques. In
this paper, voice recognition is carried out using MATLAB. The voice command is speaker independent. The
voice command is stored in the database with the help of the function keys. The real-time input speech received
is then processed in the speech recognition system, where the required features of the spoken words are extracted,
filtered and matched with the existing samples stored in the database. Then the required MATLAB processes
are carried out to convert the received data into text form.
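The µ law companding step mentioned above can be illustrated with the standard compression curve F(x) = sgn(x) ln(1 + µ|x|) / ln(1 + µ). The following is only a minimal sketch: it is written in Python rather than the MATLAB used in the paper, and the value µ = 255 and the normalization of samples to [-1, 1] are assumptions, since the abstract does not state them. The function names are hypothetical.

```python
import math

MU = 255.0  # assumed value; the abstract does not say which mu is used


def mu_law_compress(x, mu=MU):
    """Compress one sample in [-1, 1] with the mu-law curve."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)


def mu_law_expand(y, mu=MU):
    """Invert mu_law_compress, recovering the original sample."""
    return math.copysign(((1.0 + mu) ** abs(y) - 1.0) / mu, y)


if __name__ == "__main__":
    for sample in (-0.5, 0.01, 0.3, 1.0):
        compressed = mu_law_compress(sample)
        print(sample, compressed, mu_law_expand(compressed))
```

Quiet samples are boosted relative to loud ones before comparison, which is the usual reason for applying companding to speech.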
An automatic text summarization using lexical cohesion and correlation of sen... (eSAT Publishing House)
EXTRACTIVE SUMMARIZATION WITH VERY DEEP PRETRAINED LANGUAGE MODEL (ijaia)
Recent generative pretrained language models have proven very successful on a wide
range of NLP tasks, such as text classification, question answering, textual entailment and so on. In this
work, we present a two-phase encoder decoder architecture based on Bidirectional Encoder
Representations from Transformers (BERT) for the extractive summarization task. We evaluated our model with
both automatic metrics and human annotators, and demonstrated that the architecture achieves results comparable to the state of the art on a large-scale corpus, CNN/Daily Mail.
To the best of our knowledge, this
is the first work that applies a BERT-based architecture to a text summarization task and achieves results comparable to the state of the art.
A survey on sentence fusion techniques of abstractive text summarization (IJERA Editor)
Sentence fusion is one of the tasks of automatic text summarization and has widespread applications in the field of computer science. It is currently used in automatic text summarization and question answering systems. It is also used in natural language generation systems under the name of sentence aggregation. The task of sentence fusion poses various challenges, such as deciding whether two sentences can be combined, which part of a sentence should be selected for combination, how these parts can be combined, and how to fit the combinations into a grammatically correct sentence structure. In this paper we discuss the research done on sentence fusion and the various challenges that are yet to be met.
This paper presents a natural language processing based automated system called DrawPlus for generating UML diagrams, user scenarios and test cases from a business requirement specification written in natural language. DrawPlus analyzes the natural language and extracts the relevant and required information from the business requirement specification given by the user. The user writes the requirements specification in simple English, and the system analyzes it using core natural language processing techniques together with the authors' own algorithms. After compound analysis and extraction of the associated information, DrawPlus draws the use case diagram, user scenarios and system-level, high-level test case descriptions. DrawPlus provides a more convenient and reliable way of generating use cases, user scenarios and test cases, reducing the time and cost of the software development process while accelerating 70% of the work in the software design and testing phases. Janani Tharmaseelan "Cohesive Software Design" Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-3 | Issue-3, April 2019, URL: https://www.ijtsrd.com/papers/ijtsrd22900.pdf
Paper URL: https://www.ijtsrd.com/computer-science/other/22900/cohesive-software-design/janani-tharmaseelan
Evaluation of Hidden Markov Model based Marathi Text-To-Speech Synthesis System (IJERA Editor)
The objective of this paper is to evaluate the quality of an HMM-based Marathi TTS system. The main advantage of the HMM technique is that it easily allows variation in voice. The output speech produced by this method has a greater impact on emotion, style and intonation. Naturalness and intelligibility are the two important parameters that decide the quality of synthetic speech. Depending on the parameters specified, the synthetic speech results are grouped into 4 categories: natural speech, high quality synthetic speech, low quality synthetic speech and moderate quality synthetic speech. The results are obtained using the CT, DRT and MOS tests.
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
Abstract: Language identification has an important role in natural language processing applications as one of the pre-processing steps. Various mechanisms are in use today to achieve this task with very high recognition rates. Recent years have seen rapid growth in international communication, which has led to the requirement for systems capable of correctly identifying the languages of documents. Possible applications of language identification include information retrieval, web crawlers, text mining and email filtering. The paper uses a process called G-LDA [1], which takes concepts from Latent Dirichlet Allocation (LDA) and genetic evolution techniques. This involves framing a set of words having a high frequency of occurrence in any given document. The method was tested on the Leipzig Corpora. The phrases that were evolved through the generations reflected significant improvement. Keywords: Language Identification, Latent Dirichlet Allocation, Gibbs Sampling, Genetic Algorithm, Topic Modeling, Breeding, Fitness, Roulette Wheel.
Stress analysis of stick reinforced granite periwinkle concrete slab under un... (eSAT Publishing House)
Temperature analysis of LNA with improved linearity for RF receiver (eSAT Publishing House)
A lexicon based algorithm for noisy text normalization as pre processing for ... (eSAT Publishing House)
Isolated word recognition using LPC & vector quantization (eSAT Journals)
Abstract: Speech recognition has always been regarded as a fascinating field in human computer interaction. It is one of the fundamental steps towards understanding human recognition and behavior. This paper explains the theory and implementation of speech recognition through a speaker-dependent, real-time isolated word recognizer. The main logic is to first obtain the feature vectors using LPC, followed by vector quantization. The quantized vectors are then recognized by measuring the minimum average distortion. All speech recognition systems contain two main phases, namely a training phase and a testing phase. In the training phase, the features of the words are extracted, and during the recognition phase feature matching takes place. The feature or template thus extracted is stored in the database; during the recognition phase the extracted features are compared with the templates in the database. The features of the words are extracted using LPC analysis. Vector quantization is used for generating the code books. Finally, the recognition decision is made based on the matching score. MATLAB will be used to implement this concept to achieve further understanding. Index Terms: Speech Recognition, LPC, Vector Quantization, and Code Book.
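The minimum-average-distortion decision described above can be sketched as follows. This is an illustration only, not the authors' MATLAB implementation: the feature vectors stand in for LPC coefficients, the squared Euclidean distance is an assumed distortion measure, and the names `avg_distortion`, `recognize` and the toy code books are hypothetical.

```python
def avg_distortion(features, codebook):
    """Mean distance from each feature vector to its nearest code vector."""
    total = 0.0
    for f in features:
        # Squared Euclidean distance to the closest entry of the code book.
        total += min(sum((a - b) ** 2 for a, b in zip(f, c)) for c in codebook)
    return total / len(features)


def recognize(features, codebooks):
    """Return the word whose code book gives the minimum average distortion."""
    return min(codebooks, key=lambda word: avg_distortion(features, codebooks[word]))


if __name__ == "__main__":
    # Toy code books (training phase output), one per vocabulary word.
    codebooks = {
        "yes": [(0.0, 0.0), (1.0, 1.0)],
        "no": [(5.0, 5.0), (6.0, 6.0)],
    }
    # Toy feature vectors extracted from an unknown utterance (testing phase).
    utterance = [(0.1, 0.0), (0.9, 1.1)]
    print(recognize(utterance, codebooks))
```

Each vocabulary word keeps its own code book, so adding a word only requires training one more code book rather than retraining the whole system.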
A novel approach for text extraction using effective pattern matching technique (eSAT Journals)
Abstract
Many data mining techniques have been proposed for mining useful patterns from documents. Still, how to effectively use and update discovered patterns is open for future research, especially in the field of text mining. As most existing text mining methods adopt term-based approaches, they all suffer from the problems of polysemy (words have multiple meanings) and synonymy (multiple words have the same meaning). People have held the hypothesis that pattern-based approaches should perform better than term-based ones, but many experiments do not support this hypothesis. This paper presents an innovative and effective pattern discovery technique, which includes the processes of pattern deploying and pattern matching, to improve the effective use of discovered patterns.
Keywords: Pattern Mining, Pattern Taxonomy Model, Inner Pattern Evolving, TF-IDF, NLP etc.
Dynamic thresholding on speech segmentation (eSAT Journals)
Abstract: The word is the preferred and natural unit of speech, because word units have a well-defined acoustic representation. This paper presents several dynamic thresholding approaches for segmenting continuous Bangla speech sentences into words/sub-words. We propose three efficient methods for speech segmentation: two of them are usually used in pattern classification (k-means and FCM clustering) and one is used in image segmentation (Otsu’s thresholding method). We also use two new approaches, blocking black area and boundary detection, to properly detect word boundaries in continuous speech and label the entire speech sentence as a sequence of words/sub-words. The k-means and FCM clustering methods produce better segmentation results than Otsu’s method. All the algorithms and methods used in this research are implemented in MATLAB, and the proposed system achieved an average segmentation accuracy of approximately 94%. Keywords: Blocking Black Area, Clustering, Dynamic Thresholding, Fuzzy Logic and Speech Segmentation.
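As an illustration of the dynamic thresholding idea, Otsu's method picks the threshold that maximizes the between-class variance of a histogram. The sketch below is in Python rather than the MATLAB used in the paper, and it assumes that the quantity being thresholded is a per-frame energy value, which the abstract does not specify; the function name, bin count and sample values are hypothetical.

```python
def otsu_threshold(values, bins=64):
    """Return the threshold that maximizes between-class variance (Otsu)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return lo  # all values identical: no meaningful split
    width = (hi - lo) / bins
    hist = [0] * bins
    for v in values:
        hist[min(int((v - lo) / width), bins - 1)] += 1

    total = len(values)
    sum_all = sum(i * h for i, h in enumerate(hist))
    w0, sum0 = 0, 0.0
    best_var, best_bin = -1.0, 0
    for i in range(bins):
        w0 += hist[i]          # number of values in the lower class
        sum0 += i * hist[i]
        w1 = total - w0        # number of values in the upper class
        if w0 == 0:
            continue
        if w1 == 0:
            break
        m0 = sum0 / w0
        m1 = (sum_all - sum0) / w1
        var_between = w0 * w1 * (m0 - m1) ** 2
        if var_between > best_var:
            best_var, best_bin = var_between, i
    # Use the upper edge of the best bin as the threshold.
    return lo + (best_bin + 1) * width


if __name__ == "__main__":
    # Assumed per-frame energies: four silent frames, then four speech frames.
    energies = [0.1, 0.2, 0.15, 0.12, 5.0, 5.2, 4.9, 5.1]
    t = otsu_threshold(energies)
    print([e > t for e in energies])
```

Because the threshold is recomputed from each utterance's own histogram, it adapts to the recording level instead of relying on a fixed cut-off.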
Using data mining methods knowledge discovery for text mining (eSAT Journals)
Abstract: Many data mining techniques have been proposed for mining useful patterns in text documents. However, how to effectively use and update discovered patterns is still an open research issue, especially in the domain of text mining. Since most existing text mining methods adopt term-based approaches, they all suffer from the problems of polysemy and synonymy. Over the years, people have often held the hypothesis that pattern (or phrase)-based approaches should perform better than term-based ones, but many experiments do not support this hypothesis. The proposed work presents an innovative and effective pattern discovery technique, which includes the processes of pattern deploying and pattern evolving, to improve the effectiveness of using and updating discovered patterns for finding relevant and interesting information. Keywords: Text mining, text classification, pattern mining, pattern evolving, information filtering.
Water scarcity is the lack of fresh water resources to meet the standard water demand. There are two types of water scarcity: physical and economic.
Event Management System Vb Net Project Report.pdf (Kamal Acharya)
In the present era, the scope of information technology is growing very fast, and hardly any area is untouched by this industry. The scope of information technology has widened to include business and industry, household business, communication, education, entertainment, science, medicine, engineering, distance learning, weather forecasting, career searching and so on.
My project, named “Event Management System”, is software that stores and maintains all events coordinated in the college. It also helps to print related reports. The project will help to record the events coordinated by faculty members, with their name, event subject, date and details, in an efficient and effective way.
The system lets a user record all events coordinated by a particular faculty member. The proposed system adds features, such as security, that distinguish it from the existing system.
Vaccine management system project report documentation..pdf (Kamal Acharya)
The Division of Vaccine and Immunization is facing increasing difficulty in monitoring the distribution of vaccines and other commodities once they have left the national stores. With the introduction of new vaccines, more challenges are anticipated, with these additions posing a serious threat to the already overstrained vaccine supply chain system in Kenya.
Saudi Arabia stands as a titan in the global energy landscape, renowned for its abundant oil and gas resources. It's the largest exporter of petroleum and holds some of the world's most significant reserves. Let's delve into the top 10 oil and gas projects shaping Saudi Arabia's energy future in 2024.
Automobile Management System Project Report.pdf (Kamal Acharya)
The proposed project is developed to manage automobiles in an automobile dealer company. The main modules in this project are login, automobile management, customer management, sales, complaints and reports. The first module is the login. The automobile showroom owner must log in to use the project. The username and password are verified and, if they are correct, the next form opens. If the username and password are not correct, an error message is shown.
When a customer searches for an automobile and it is available, they are taken to a page that shows the details of the automobile, including automobile name, automobile ID, quantity, price etc. “Automobile Management System” is useful for maintaining automobiles and customers effectively, and hence helps to establish a good relationship between the customer and the automobile organization. It contains various customized modules for maintaining automobile and stock information accurately and safely.
When an automobile is sold to a customer, the stock is reduced automatically. When a new purchase is made, the stock is increased automatically. While automobiles are being selected for sale, the proposed software automatically checks the total available stock of that particular item; if the total stock of the item is less than 5, the software notifies the user to purchase it.
Also, when the user tries to sell items that are not in stock, the system prompts the user that the stock is not enough. Customers of this system can search for an automobile and purchase it easily and quickly. On the other hand, the stock of automobiles can be maintained accurately by the automobile shop manager, overcoming the drawbacks of the existing system.
Immunizing Image Classifiers Against Localized Adversary Attacks (gerogepatton)
This paper addresses the vulnerability of deep learning models, particularly convolutional neural networks
(CNNs), to adversarial attacks and presents a proactive training technique designed to counter them. We
introduce a novel volumization algorithm, which transforms 2D images into 3D volumetric representations.
When combined with 3D convolution and deep curriculum learning optimization (CLO), it significantly improves
the immunity of models against localized universal attacks by up to 40%. We evaluate our proposed approach
using contemporary CNN architectures and the modified Canadian Institute for Advanced Research (CIFAR-10
and CIFAR-100) and ImageNet Large Scale Visual Recognition Challenge (ILSVRC12) datasets, showcasing
accuracy improvements over previous techniques. The results indicate that the combination of the volumetric
input and curriculum learning holds significant promise for mitigating adversarial attacks without necessitating
adversary training.
Quality defects in TMT Bars, Possible causes and Potential Solutions. (PrashantGoswami42)
Maintaining high-quality standards in the production of TMT bars is crucial for ensuring structural integrity in construction. Addressing common defects through careful monitoring, standardized processes, and advanced technology can significantly improve the quality of TMT bars. Continuous training and adherence to quality control measures will also play a pivotal role in minimizing these defects.
Forklift Classes Overview by Intella PartsIntella Parts
Discover the different forklift classes and their specific applications. Learn how to choose the right forklift for your needs to ensure safety, efficiency, and compliance in your operations.
For more technical information, visit our website https://intellaparts.com
AKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdf
Static dictionary for pronunciation modeling
1. IJRET: International Journal of Research in Engineering and Technology ISSN: 2319-1163
__________________________________________________________________________________________
Volume: 01 Issue: 02 | Oct-2012, Available @ http://www.ijret.org 185
STATIC DICTIONARY FOR PRONUNCIATION MODELING
K. Subramanyam¹, D. Arun Kumar²
¹Assistant Professor, IT Department, R.V.R&J.C College of Engineering, Guntur, A.P, India
²Assistant Professor, ECE Department, R.V.R&J.C College of Engineering, Guntur, A.P, India
subramanyamkunisetti@gmail.com, arundhupam@gmail.com
Abstract
Speech recognition is the process by which a computer recognizes spoken words: the user talks to the computer and the computer
correctly identifies what was said. This work aims to improve speech recognition accuracy for the Telugu language using a
pronunciation model. Speech recognizers based on Hidden Markov Models have achieved a high level of performance in controlled
environments. Many techniques have been proposed for robust speech recognition to handle the mismatch between training and testing
environments; some methods work on the acoustic model and some on the language model. The pronunciation dictionary is the most
important component of the speech recognition system. New pronunciations for the words are added to the dictionary to improve
speech recognition accuracy.
----------------------------------------------------------------------***------------------------------------------------------------------------
STATEMENT OF THE PROBLEM
Most current leading-edge speech recognition systems are
based on an approach called Hidden Markov Modeling
(HMM). Accuracy can be improved by adding new
pronunciations to the static dictionary. There are two methods
for adding pronunciations to the static dictionary. The first
calculates the Levenshtein distance between the strings in
each confusion pair. The second adds pronunciations
according to their frequency of occurrence.
Automatic Speech Recognition (ASR), or speech-to-text
conversion, is a sequential pattern recognition problem. An
ASR system comprises three major components (acoustic
models, a language model and the pronunciation dictionary)
and aims to correctly hypothesize a spoken utterance as a
string of words. During training, the system is provided with
speech data, the corresponding transcription and a
pronunciation dictionary. At decoding time, the acoustic
models and language models trained on the task are used
along with a standard dictionary (CMUdict) as the lexicon.
After decoding is completed, the confusion pairs are passed as
arguments to the Levenshtein distance algorithm, which gives
the minimum number of operations required to convert one
string into another. The pair with the minimum distance is
considered for addition to the static dictionary. The decoding
process is then repeated to check for improved recognition
accuracy. In the other method, the word that occurs most
often is considered for a new entry in the static dictionary.
Automatic base form learning:
The simple method of learning pronunciation variants is to
learn each word's various pronunciations on a word-by-word
basis. Typically a phone recognizer is used to determine
possible alternatives for each word by finding a best-fit
alignment between the phone recognition string and canonical
pronunciations provided by a Static Dictionary.
DICTIONARY REFINEMENT:
Dictionary pruning is sometimes used to improve speech
recognition accuracy. Pruning is based on the speech training
database, which leads to two types of problems:
1. Words that do not occur in the data provide no
information on which pruning can be based, and
2. Some words tend to keep pronunciations that were rarely
observed.
To solve the unobserved-words problem, central or summary
pronunciations can be used in the pruned dictionary [3]. The
aim of this sort of pronunciation is to capture the phonetic
content of the set of pronunciation variants of each word and
to consolidate it in a reduced pronunciation set.
Enhanced Tree Clustering:
In contrast to the traditional decision-tree-based approach,
this approach allows parameter sharing across phonemes [4].
A single decision tree is grown for all sub-states. The
clustering procedure starts with all polyphones at the root.
Questions are asked regarding the identity of the center phone
and its neighboring phones. At any node, the question that
yields the highest information gain is chosen and the tree is
split. This process is repeated until either the tree reaches a
certain size or a minimum count threshold is crossed.
Compared to the traditional multiple-tree approach, a single
tree allows more flexible sharing of parameters: any node can
potentially be shared by multiple phones.
Present work
We tried various approaches to improve the speech
recognition accuracy for Telugu sentences. The approaches
are:
1. Calculating distances with the Levenshtein distance
algorithm and adding the minimum-distance variants to the
static dictionary.
2. Addition of the frequently occurring errors.
3. Addition of variants in the language model.
4. Changing the probability.
5. Transcription modification.
Levenshtein Distance Algorithm:
The Levenshtein distance algorithm is used to calculate the
distance between the variants. The variants with the minimum
distance are added to the static dictionary, after which
improved accuracy is observed.
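The selection step described here can be sketched as follows. This is a minimal illustration, not the authors' tool chain: the function names `edit_distance` and `rank_confusion_pairs` are hypothetical, the distance is computed over characters of the romanized words, and the input pairs are the error examples listed in this paper.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance using two rolling rows instead of a full matrix."""
    prev = list(range(len(b) + 1))          # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]                          # deleting the first i characters of a
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def rank_confusion_pairs(pairs):
    """Return (distance, first, second) tuples, smallest distance first."""
    return sorted((edit_distance(x, y), x, y) for x, y in pairs)


# Confusion pairs quoted in this paper (used here only as sample input).
pairs = [
    ("MEEREKKADA", "MEE"),
    ("EKKADIKI", "EKKADA"),
    ("HYDERAABAD", "MEERAEMI"),
    ("BHAARATHADESAM", "BHAARATHEEYULANDARU"),
]
ranked = rank_confusion_pairs(pairs)
best = ranked[0]   # the minimum-distance pair is the candidate for the dictionary
```

Ranking all pairs, rather than returning only the minimum, makes it easy to experiment with adding the top few candidates.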
Addition of the Frequently Occurring Errors:
The result file lists the confusion pairs together with the
number of times each error was repeated. I took the frequently
occurring errors and added them to the static dictionary,
which improved the accuracy.
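A frequency-based selection of this kind might look like the sketch below. The function name `frequent_confusions`, the `min_count` threshold and the counts in the sample data are all assumptions made for illustration; the paper does not state which threshold was used.

```python
from collections import Counter


def frequent_confusions(observed_pairs, min_count=2):
    """Keep confusion pairs seen at least min_count times, most frequent first."""
    counts = Counter(observed_pairs)
    return [(pair, n) for pair, n in counts.most_common() if n >= min_count]


# Hypothetical decoder output: one entry per occurrence of a confusion.
observed = (
    [("EKKADIKI", "EKKADA")] * 3
    + [("MEEREKKADA", "MEE")]
)
candidates = frequent_confusions(observed, min_count=2)
# candidates now holds only the pair that was confused three times
```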
Addition of Variant in Language Model:
I also tried including the variant in the language model, but
this reduced the accuracy, so I did not pursue this procedure
further.
Changing the Probability:
I tried to change the probabilities of the states to find out
whether the accuracy would increase or decrease, but I was
unable to open the following files in the folders under
model_parameters:
(i) means
(ii) mixture_weights
(iii) transition_matrices
(iv) variances
Transcription Modification:
When I got certain types of errors, I modified the
transcription for those words and observed improved accuracy.
The following are examples of the errors:
MEEREKKADA - MEE
EKKADIKI - EKKADA
HYDERAABAD - MEERAEMI
BHAARATHADESAM - BHAARATHEEYULANDARU
The modified transcription words are:
MEERREKKADA
EKKADAKKI
HHYDERAABAD
BHAARATTHADESAM
In all the approaches in which I succeeded, the main
observation is that adding a new variant to the dictionary
reduces the error rate, which contradicts other papers [3-5].
For all approaches I first used Praat to eliminate the noise
present in the wave files. When the wave files contained noise
I got insertion errors, and these errors increased the error
rate. After removing the noise using Praat, the error rate
decreased.
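The paper does not define its "word accuracy" and "errors" figures. The sketch below assumes a common convention, in which the accuracy figure counts correctly recognized reference words and the error figure also counts insertions; under that assumption, insertions caused by noise raise the error rate without lowering the accuracy figure, and the two values can sum to more than 100. The function name `word_scores` and the counts are hypothetical.

```python
def word_scores(n_ref, substitutions, deletions, insertions):
    """Return (percent_correct, error_rate) for n_ref reference words.

    Assumed definitions, not taken from the paper:
      percent_correct = (N - S - D) / N * 100
      error_rate      = (S + D + I) / N * 100
    """
    correct = n_ref - substitutions - deletions
    percent_correct = 100.0 * correct / n_ref
    error_rate = 100.0 * (substitutions + deletions + insertions) / n_ref
    return percent_correct, error_rate


# Illustrative counts only: the same utterances without and with 20 insertions.
clean = word_scores(100, substitutions=10, deletions=5, insertions=0)
noisy = word_scores(100, substitutions=10, deletions=5, insertions=20)
```

With these made-up counts the percent-correct value is identical in both cases, while the error rate rises by the number of inserted words.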
Database
The speech database consists of the voices of 24 speakers,
each of whom spoke 40 sentences.
The system was evaluated several times with increasing
amounts of training data, and we observed improved accuracy.
TRAINING      TESTING                                   AFTER DICTIONARY ADDITION
(number of    (number of    ACCURACY     ERRORS        ACCURACY     ERRORS
speakers)     speakers)     (%)          (%)           (%)          (%)
12            12            51.159       85.797        69.420       88.478
16            8             58.370       71.196        76.413       65.000
20            4             59.826       73.696        79.130       65.870
The following are the results when the wave files are noisy:
EXPERIMENT NAME   WORD ACCURACY (%)   ERRORS (%)
50 61.364 59.091
51 88.696 21.739
52 84.348 31.304
53 80.870 65.217
54 93.913 16.522
55 87.826 24.348
56 73.913 68.696
57 96.522 16.522
58 89.565 18.261
59 77.391 61.739
60 88.696 38.261
61 62.609 95.652
62 90.435 21.739
63 93.043 16.522
64 93.043 11.304
65 92.174 26.087
66 81.739 20.870
67 91.304 20.000
68 93.913 12.174
69 66.957 72.714
70 83.478 24.348
71 83.478 36.522
72 76.522 31.304
73 55.000 50.833
After eliminating the noise using Praat, the results are as
follows:
EXPERIMENT NAME   WORD ACCURACY (%)   ERRORS (%)
50 80.870 40.000
51 88.696 19.130
52 86.957 21.739
53 81.739 45.217
54 91.304 18.261
55 92.174 14.783
56 87.826 21.739
57 94.783 12.174
58 93.913 10.435
59 84.348 37.391
60 96.522 10.435
61 73.913 53.043
62 92.174 13.043
63 93.043 12.174
64 92.174 13.043
65 94.696 22.957
66 81.739 20.870
67 93.043 14.783
68 93.913 12.174
69 78.261 51.304
70 80.870 29.565
71 85.217 21.739
72 86.957 19.130
73 93.913 9.565
After addition to the dictionary, the results are as follows:
EXPERIMENT NAME   WORD ACCURACY (%)   ERRORS (%)
50 99.130 22.609
51 95.652 12.174
52 98.261 10.435
53 96.522 28.696
54 99.130 10.435
55 97.391 8.696
56 97.391 16.522
57 98.261 8.696
58 99.130 5.217
59 96.522 23.478
60 99.130 7.826
61 94.696 30.739
62 98.261 6.957
63 98.261 6.957
64 96.522 8.696
65 100.000 15.652
66 94.783 7.826
67 99.130 6.957
68 97.391 6.957
69 93.913 36.522
70 93.913 16.522
71 93.913 13.913
72 96.522 9.565
73 99.130 4.348
CONCLUSIONS
The approaches discussed in this dissertation work well.
Initially the wave files were noisy. I removed the noise with
Sound Recorder, but Sound Recorder can remove only the noise
at the two ends of a file, not in the middle, so I used Praat
to remove the noise in the middle part of the wave files. I
gave these wave files to the Sphinx tool and obtained the word
accuracy, errors and confusion pairs. These confusion pairs
were added to the dictionary using the approaches discussed
in this thesis. Finally, I observed improved accuracy and
decreased errors.
FUTURE WORK
I worked with a database of 24 speakers with 40 sentences
each. It would be better to work with a larger database. We
can try to achieve 100% accuracy for individual speakers and
to improve accuracy on a large database. The work can also be
extended to a dynamic dictionary, and the wave files should be
recorded in a noise-free environment.
Levenshtein Distance algorithm
Step  Description
1     Set n to be the length of s.
      Set m to be the length of t.
      If n = 0, return m and exit.
      If m = 0, return n and exit.
      Construct a matrix containing 0..m rows and 0..n columns.
2     Initialize the first row to 0..n.
      Initialize the first column to 0..m.
3     Examine each character of s (i from 1 to n).
4     Examine each character of t (j from 1 to m).
5     If s[i] equals t[j], the cost is 0.
      If s[i] doesn't equal t[j], the cost is 1.
6     Set cell d[i,j] of the matrix equal to the minimum of:
      a. The cell immediately above plus 1: d[i-1,j] + 1.
      b. The cell immediately to the left plus 1: d[i,j-1] + 1.
      c. The cell diagonally above and to the left plus the cost:
         d[i-1,j-1] + cost.
7     After the iteration steps (3, 4, 5, 6) are complete, the
      distance is found in cell d[n,m].
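The steps above can be written out as a short program. This is a sketch under one stated assumption about layout: the matrix is built with one row per character of s and one column per character of t, so that d[i][j] and d[n][m] index it directly. Python strings are 0-based, so s[i] in the step table corresponds to s[i - 1] in the code.

```python
def levenshtein(s: str, t: str) -> int:
    """Full-matrix Levenshtein distance following steps 1 to 7."""
    # Step 1: lengths, trivial cases, and the (n+1) x (m+1) matrix.
    n, m = len(s), len(t)
    if n == 0:
        return m
    if m == 0:
        return n
    d = [[0] * (m + 1) for _ in range(n + 1)]

    # Step 2: first column holds 0..n, first row holds 0..m.
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j

    # Steps 3 and 4: visit every character of s and of t.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Step 5: substitution cost.
            cost = 0 if s[i - 1] == t[j - 1] else 1
            # Step 6: cheapest of deletion, insertion, substitution.
            d[i][j] = min(d[i - 1][j] + 1,
                          d[i][j - 1] + 1,
                          d[i - 1][j - 1] + cost)

    # Step 7: the distance is in the bottom-right cell.
    return d[n][m]


distance = levenshtein("EKKADIKI", "EKKADA")
```

The full matrix costs O(n*m) memory, which is negligible for word-length strings and keeps the code aligned with the step table.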
REFERENCES:
[1] Gopala krishna Anumanchipalli, “Modeling Pronunciation
Variation for Speech Recognition”, M.S Thesis , IIIT
Hyderabad, February 2008.
[2] Eric John Fosler-Lussier, “Dynamic Pronunciation Models
for Automatic Speech Recognition”, Ph.D. Thesis, University
of California, Berkeley, Berkeley, CA, 1999.
[3] Gustavo Hernandez-Abrego, Lex Olorenshaw, “Dictionary
Refinement based on Phonetic Consensus and Non-uniform
Pronunciation Reduction”, INTERSPEECH 2004 -ICSLP 8th
International Conference on Spoken Language Processing,
October 4-8, 2004.
[4] Hua Yu and Tanja Schultz, “Enhanced Tree Clustering with
Single Pronunciation Dictionary for Conversational Speech
Recognition”, in Proceedings of Eurospeech, pp. 1869-1872,
2003.
[5] Dan Jurafsky, Wayne Ward, Zhang Jianping, Keith Herold,
Yu Xiuyang, and Zhang Sen, “What Kind of Pronunciation
Variation is Hard for Triphones to Model”, in Proceedings of
ICASSP, pp. 577-580, 2001.
[6] Adda-Decker M. and Lamel L, “Pronunciation Variants
Across System Configuration”, Speech Communication,
1999.
[7] J. M. Kessens, M. Wester, and H. Strik , “Improving the
Performance of a Dutch csr by Modeling within-word and
cross-word Pronunciation Variation”, Speech Commun., vol.
29, no. 2-4, pp. 193–207, 1999.
[8] R. Rosenfeld, “Optimizing Lexical and N-gram Coverage
via Judicious use of Linguistic data”, in Proc.
Eurospeech, (Madrid), 1995.
[9] Sk . Akbar, “ Comparing Speech Recognition Accuracy
using HMM and Multi-layer Perceptrons ”, M.Tech
dissertation, June 2008.
[10] Kenneth A. Kozar, “Representing Systems with Data Flow
Diagrams”, Spring, 1997.
http://spot.colorado.edu/~kozar/DFD.html
[11] Finke Michael and Waibel Alex, “Speaking Mode
Dependent Pronunciation Modeling in Large Vocabulary
Conversational Speech Recognition”, in Proceedings of
Eurospeech, 1995.
[12] M. Ravishankar and M. Eskenazi, “Automatic Generation
of Context-Dependent Pronunciations”, in Proc. Eurospeech
’97, (Rhodes, Greece), pp. 2467–2470, 1997.
[13] T. Sloboda and A. Waibel, “Dictionary Learning for
Spontaneous Speech Recognition”, in Proc. ICSLP ’96,
(Philadelphia), pp. 2328–2331, 1996.
[14] A. Xavier and D. Christian, “Improved Acoustic-Phonetic
Modeling in Philip's Dictation System by Handling Liaisons
and Multiple Pronunciations” , in Proc. Eurospeech ’95,
(Madrid), pp. 767– 770, 1995.
[15] P. S. Cohen and R. L. Mercer, “The Phonological
Component of an Automatic Speech Recognition System ”,
Reddy , D.R. (Ed) Speech Recognition. Invited Papers
Presented at the 1974 IEEE Symposium., pp. 275–319, 1975.
[16] N. Cremelie and J.-P. Martens, “In Search of Better
Pronunciation Models for Speech Recognition”, Speech
Communication, vol. 29, no. 2-4, pp. 115–136, 1999.
[17] B. C. S. and Y. S. J., “Pseudo-Articulatory Speech
Synthesis for Recognition using Automatic Feature Extraction
from X-ray Data”, in Proc. ICSLP ’96, (Philadelphia), pp. 969–
972, 1996.
[18] I.Amdal, F. Korkmazskiy, and A. C. Surendran, “Joint
Pronunciation Modeling of Non-native Speakers using Data-
Driven Methods”, in Proc. ICSLP ’00, (Beijing, China), pp.
622–625, 2000.
[19] M. Bacchiani and M. Ostendorf, “Joint Lexicon, Acoustic
Unit Inventory and Model Design”, Speech Commun., vol. 29,
no. 2-4, pp. 99–114, 1999.
[20] T. Fukada, T. Yoshimura, and Y. Sagisaka, “Automatic
Generation of Multiple Pronunciations based on Neural
Networks”, Speech Commun., vol. 27, no. 1, pp. 63–73, 1999.
[21] S. Greenberg, “Speaking in Shorthand – A Syllable-
centric Perspective for Understanding Pronunciation
Variation”, in Proc. of the ESCA Workshop on Modeling
Pronunciation Variation for Automatic Speech Recognition,
(Kerkrade, Netherlands), ESCA, May 1998.
[22] T. Holter and T. Svendsen, “Maximum Likelihood
Modeling of Pronunciation Variation”, Speech Commun., vol.
29, no. 2-4, pp. 177–191, 1999.
[23] H. Strik and C. Cucchiarini, “Modeling Pronunciation
Variation for ASR: a survey of the literature”, Speech
Commun., vol. 29, no. 2-4, pp. 225–246, 1999.
[24] M. Riley, W. Byrne, M. Finke, S. Khudanpur, A. Ljolje, J.
McDonough, H. Nock, M. Saraclar, C. Wooters, and G.
Zavaliagkos, “Stochastic Pronunciation Modeling from Hand-
Labeled Phonetic Corpora”, Speech Commun., vol. 29, no. 2-4,
pp. 209–224, 1999.