Speech generation is one of the most important areas of research in speech signal processing and is now receiving serious attention. Speech is the natural form of communication among human beings. Computers that can understand speech and speak with a human-like voice are expected to contribute to a more natural man-machine interface. However, to give machines capabilities even closer to those of humans, we must learn more about the mechanisms by which speech is produced and perceived, and develop speech information processing technologies that generate more natural-sounding output. This field of study, also called speech synthesis and more widely known as text-to-speech synthesis, originated in the mid-eighties with the emergence of digital signal processing (DSP) and the rapid advancement of VLSI techniques. To work in this field, it is necessary to understand the basic theory of speech production. Every language has a different phonetic alphabet and a different set of possible phonemes and phoneme combinations. For the analysis of the speech signal, we recorded eight speakers of Dogri (3 male and 5 female) and eight speakers of Hindi (4 male and 4 female). For estimating the durational distributions, the mean of the means of ten instances of each vowel was calculated for every speaker in both languages. The investigation shows that the two durational distributions differ significantly in mean and standard deviation, and that phoneme duration is speaker dependent. The whole investigation can be concluded with the result that almost all Dogri phonemes are shorter than their Hindi counterparts: the duration in milliseconds of the same phoneme was longer when uttered in Hindi than when spoken by a person with Dogri as his or her mother tongue.
There are many applications directly or indirectly related to this research. For instance, the main application may be transforming Dogri speech into Hindi and vice versa; building on this, we can develop a speech aid to teach Dogri to children. The results may also be useful for synthesizing Dogri phonemes using the parameters of Hindi phonemes and for building large-vocabulary speech recognition systems.
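The "mean of means" comparison described in the abstract can be sketched in a few lines. All millisecond values and speaker labels below are invented for illustration, not measurements from the study:

```python
from statistics import mean, stdev

# Ten hypothetical instances (ms) of the vowel /a/ per speaker; values invented.
dogri_speakers = {
    "d1": [72, 70, 75, 71, 69, 74, 73, 70, 72, 71],
    "d2": [68, 66, 70, 69, 67, 71, 70, 68, 69, 67],
}
hindi_speakers = {
    "h1": [95, 92, 98, 94, 96, 93, 97, 95, 94, 96],
    "h2": [90, 88, 93, 91, 89, 92, 90, 91, 88, 92],
}

def mean_of_means(speakers):
    """Mean of each speaker's mean duration (the 'mean of means' in the text)."""
    return mean(mean(instances) for instances in speakers.values())

# report mean and standard deviation of the per-speaker means per language
for name, speakers in (("Dogri", dogri_speakers), ("Hindi", hindi_speakers)):
    per_speaker = [mean(inst) for inst in speakers.values()]
    print(f"{name} /a/: mean {mean(per_speaker):.1f} ms, sd {stdev(per_speaker):.1f} ms")

# the study's conclusion: Dogri vowels come out shorter than Hindi ones
print("Dogri shorter:", mean_of_means(dogri_speakers) < mean_of_means(hindi_speakers))
```

With real data, the same loop would run once per vowel and per phoneme class rather than for a single invented vowel.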
This document discusses improving the word accuracy of an automatic speech recognition (ASR) system for the Telugu language. It analyzes the substitution errors in the system using two different lexical models: one based on stress-timed English phonemes (the CMU lexicon) and a handcrafted lexicon for syllable-timed Telugu (the UOH lexicon). The UOH lexicon improves word accuracy by 20-30% over the CMU lexicon by better modeling the phonetic characteristics of Telugu. The paper also examines the effect of gender, accents, and non-native speakers on substitution errors; the resulting confusion matrices provide insight into the most commonly substituted phonemes.
Substitution Error Analysis for Improving the Word Accuracy in Telugu Langua...IOSR Journals
This document discusses substitution error analysis to improve word accuracy in an automatic speech recognition system for the Telugu language. It analyzes the performance of an ASR system using two different lexical models - one based on a stress-timed language (CMU) and the other a handcrafted lexicon for syllable-timed Telugu. The effect of gender, accents, and pronunciation variants on substitution errors is studied. Confusion matrices of vowels and consonants show the most common phoneme substitutions for each case. The Telugu-based lexicon improves word accuracy by 20-30% over the CMU-based system.
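The confusion-matrix analysis mentioned in these summaries can be illustrated with a minimal sketch. The phoneme pairs below are hypothetical, not drawn from the Telugu system:

```python
from collections import Counter

# Aligned (reference, recognized) phoneme pairs from a hypothetical ASR run;
# the labels are illustrative placeholders.
aligned_pairs = [
    ("a", "a"), ("a", "aa"), ("i", "i"), ("i", "e"),
    ("t", "t"), ("t", "d"), ("a", "aa"), ("k", "k"),
]

# keep only the mismatches: these are the substitution errors
confusions = Counter(pair for pair in aligned_pairs if pair[0] != pair[1])

# most frequent substitutions first, as read off a confusion matrix
for (ref, hyp), count in confusions.most_common():
    print(f"{ref} -> {hyp}: {count}")
```

A full confusion matrix is this same count table arranged with reference phonemes as rows and recognized phonemes as columns.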
Dynamic Construction of Telugu Speech Corpus for Voice Enabled Text EditorWaqas Tariq
In recent decades speech interactive systems have gained increasing importance. Performance of an ASR system mainly depends on the availability of a large corpus of speech. The conventional method of building a large-vocabulary speech recognizer for any language uses a top-down approach to speech. This approach requires a large speech corpus with sentence- or phoneme-level transcription of the speech utterances. The transcriptions must also cover a variety of speech so that the recognizer can build models for all the sounds present. But for the Telugu language, because of its complex nature, a very large, well-annotated speech database is very difficult to build. It is very difficult, if not impossible, to cover all the words of any Indian language, where each word may have thousands, even millions, of word forms. A significant part of grammar that is handled by syntax in English (and other similar languages) is handled within morphology in Telugu. Phrases comprising several words (that is, tokens) in English would be mapped onto a single word in Telugu. Telugu is phonetic in nature in addition to being rich in morphology. That is why the speech technology developed for English cannot be applied directly to Telugu. This paper highlights the work carried out in an attempt to build a voice-enabled text editor with automatic term suggestion. The main claim of the paper is the recognition enhancement process we developed for highly inflecting, morphologically rich languages. This method increases speech recognition accuracy with a greatly reduced corpus size. It also adds Telugu words to the database dynamically, so the corpus grows over time.
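The combination of automatic term suggestion and dynamic vocabulary growth described above can be sketched as follows. The class name and the romanized Telugu-like words are placeholders, not the authors' implementation:

```python
# Hedged sketch: suggest known words by prefix, and grow the vocabulary
# dynamically as new words are recognized (the paper's "growth of the corpus").
class TermSuggester:
    def __init__(self, words):
        self.words = set(words)

    def suggest(self, prefix):
        """Return known words starting with the given prefix, sorted."""
        return sorted(w for w in self.words if w.startswith(prefix))

    def add(self, word):
        """Dynamically add a newly recognized word to the vocabulary."""
        self.words.add(word)

# romanized placeholder words, not taken from the paper
editor = TermSuggester(["amma", "ammaku", "abbayi"])
print(editor.suggest("amm"))   # -> ['amma', 'ammaku']
editor.add("ammamma")
print(editor.suggest("amm"))   # -> ['amma', 'ammaku', 'ammamma']
```

A production editor would back this with a trie or an n-gram model, but the adapt-as-you-go behavior is the same.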
Research proposal (The Influence of Indonesian Language toward English Writing)Ria Dwi Pratiwi
This document summarizes a research paper that examines the influence of the Indonesian language on English writing. It begins by providing background on the importance of language and definitions of Indonesian and English. It then discusses writing in both languages and compares their characteristics. The research aims to understand how the Indonesian language influences English writing for students in Indonesia. Literature reviewed includes definitions of language, Indonesian, English and how they are written. The methodology describes conducting the study on 5th semester English students in Pontianak, Indonesia.
CONSTRUCTION OF ENGLISH-BODO PARALLEL TEXT CORPUS FOR STATISTICAL MACHINE TRA...kevig
A corpus is a large collection of homogeneous and authentic written texts (or speech) of a particular natural language which exists in machine-readable form. The scope of the corpus is endless in Computational Linguistics and Natural Language Processing (NLP). A parallel corpus is a very useful resource for most applications of NLP, especially for Statistical Machine Translation (SMT). SMT is the most popular approach to Machine Translation (MT) nowadays and can produce high-quality translation results based on large amounts of aligned parallel text in both the source and target languages. Although Bodo is a recognized natural language of India and a co-official language of Assam, machine-readable resources for Bodo are still very scarce. Therefore, to expand the computerized information of the language, an English-to-Bodo SMT system has been developed. This paper mainly focuses on building English-Bodo parallel text corpora to implement the English-to-Bodo SMT system using the phrase-based SMT approach. We have designed an E-BPTC (English-Bodo Parallel Text Corpus) creator tool and constructed English-Bodo parallel text corpora for the General and Newspaper domains. Finally, the quality of the constructed parallel text corpora has been tested using two evaluation techniques in the SMT system.
CONSTRUCTION OF ENGLISH-BODO PARALLEL TEXT CORPUS FOR STATISTICAL MACHINE TRA...ijnlc
A corpus is a large collection of homogeneous and authentic written texts (or speech) of a particular natural language which exists in machine-readable form. The scope of the corpus is endless in Computational Linguistics and Natural Language Processing (NLP). A parallel corpus is a very useful resource for most applications of NLP, especially for Statistical Machine Translation (SMT). SMT is the most popular approach to Machine Translation (MT) nowadays and can produce high-quality translation results based on large amounts of aligned parallel text in both the source and target languages. Although Bodo is a recognized natural language of India and a co-official language of Assam, machine-readable resources for Bodo are still very scarce. Therefore, to expand the computerized information of the language, an English-to-Bodo SMT system has been developed. This paper mainly focuses on building English-Bodo parallel text corpora to implement the English-to-Bodo SMT system using the phrase-based SMT approach. We have designed an E-BPTC (English-Bodo Parallel Text Corpus) creator tool and constructed English-Bodo parallel text corpora for the General and Newspaper domains. Finally, the quality of the constructed parallel text corpora has been tested using two evaluation techniques in the SMT system.
The document discusses definitions of language from various sources. It provides definitions that view language as a systemic set of arbitrary symbols used for human communication within a community or culture. The definitions also note that language symbols can be vocal or visual. The document then discusses key aspects of language, including that it is a system that allows for combining smaller units into larger ones for communication purposes. It also notes language has multiple functions like social interaction and transmitting information. The document explores theories around the origin of language and physiological adaptations that enabled language development. It discusses language as having rules and patterns at various linguistic levels. It also addresses concepts like language universals, the innateness of language, and its creative aspects. In the end, it frames
Role of language engineering to preserve endangered languagesDr. Amit Kumar Jha
Role of Language Engineering to Preserve Endangered Languages discusses how language engineering can help preserve endangered languages through documentation and digitization. Language engineering is the application of computer science to develop language-related software and hardware. It involves techniques like speech and text processing to develop systems that can understand, interpret, and generate human language. Documenting endangered languages through recording speech samples and collecting texts is important for preservation. Language engineering makes this documentation process easier through tools like speech-to-text, text-to-speech, and transcription tools. It also allows for digital storage of language data, which helps preserve languages for longer as digital data is more durable than other forms of storage. Developing applications that use endangered languages, like translation systems,
Marathi Text-To-Speech Synthesis using Natural Language Processingiosrjce
Role of Language Engineering to Preserve Endangered Language Dr. Amit Kumar Jha
Language engineering can help preserve endangered languages by developing tools like language translators, speech generation systems, and language teaching systems. If these tools are applied to endangered languages, they may help prevent the languages from going extinct by increasing the number of people who understand and use them. Key applications of language engineering that could support endangered languages include speech synthesis, machine translation, speech recognition, text-to-speech conversion, and language teaching systems. Developing digital language documentation and creating transcription tools can also help endangered languages survive by making more materials available to study and learn them.
A Review on Gesture Recognition Techniques for Human-Robot CollaborationIRJET Journal
This document provides a review of gesture recognition techniques for human-robot collaboration. It discusses how sign language uses gestures as a form of communication for deaf individuals. There are four key components of gesture recognition systems for human-robot interaction: sensor technologies to detect gestures, gesture detection, gesture tracking, and gesture classification. The document analyzes approaches based on these components and provides statistical analysis. It also discusses challenges in sign language recognition and the importance of technologies that enable communication for deaf individuals.
The document presents an unsupervised approach for developing stemmers for the Urdu and Marathi languages. It uses frequency-based and length-based suffix-stripping approaches to automatically learn and generate suffix rules from word corpora without any linguistic knowledge. The frequency-based approach achieved 85.36% accuracy for Urdu and 82.5% for Marathi, outperforming the length-based approach. Such a stemmer could improve information retrieval for these languages by reducing morphological variants to word stems.
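The frequency-based suffix-learning idea can be sketched as follows. The romanized word list and the frequency threshold are invented for illustration and do not reproduce the paper's learned rules:

```python
from collections import Counter

# Tiny invented word list (romanized); a real run would use a large corpus.
words = ["kitab", "kitaben", "kitabon", "ladka", "ladkon", "raat", "raaton"]

# count every word-final substring of length 1..3 as a candidate suffix
suffix_counts = Counter(w[-k:] for w in words for k in range(1, 4) if len(w) > k)

# keep suffixes seen at least twice (illustrative frequency threshold)
learned = {s for s, c in suffix_counts.items() if c >= 2}

def stem(word):
    """Strip the longest learned suffix, if any."""
    for k in (3, 2, 1):
        if len(word) > k and word[-k:] in learned:
            return word[:-k]
    return word

print(stem("kitabon"), stem("raaton"), stem("kitab"))  # -> kitab raat kitab
```

The length-based variant the paper compares against would rank candidate suffixes by length rather than by corpus frequency.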
This document provides an overview of language and its components. It defines language as a system of human communication using structured sounds or their written representations to form codes that label concepts. A language has components like phonology, morphology and syntax. It also discusses the International Phonetic Alphabet used to represent speech sounds. Key points made are that language is first spoken and later written, and that phonetics studies sound production while phonology studies sound patterning to form words. Semantics refers to word meanings and pragmatics to how language is used to communicate. Vowels are sounds made without oral impediment while consonants are made with impediments.
Hidden markov model based part of speech tagger for sinhala languageijnlc
In this paper we present the fundamental lexical semantics of the Sinhala language and a Hidden Markov Model (HMM) based Part of Speech (POS) tagger for Sinhala. In any natural language processing task, part of speech is a vital topic, involving analysis of the construction, behaviour and dynamics of the language; this knowledge can be utilized in computational linguistics analysis and automation applications. Though Sinhala is a morphologically rich and agglutinative language, in which words are inflected with various grammatical features, tagging is essential for further analysis of the language. Our research is based on a statistical approach, in which the tagging process is done by computing the tag-sequence probability and the word-likelihood probability from the given corpus, with the linguistic knowledge automatically extracted from the annotated corpus. The current tagger reaches more than 90% accuracy for known words.
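Maximizing the tag-sequence probability times the word-likelihood probability, as described above, is the classic Viterbi decoding of an HMM. A toy sketch follows; the tags, the romanized placeholder words, and all probabilities are invented, not the paper's trained model:

```python
# Toy HMM: transition = tag-sequence probability, emission = word likelihood.
tags = ["NOUN", "VERB"]
start = {"NOUN": 0.6, "VERB": 0.4}
trans = {("NOUN", "VERB"): 0.7, ("NOUN", "NOUN"): 0.3,
         ("VERB", "NOUN"): 0.6, ("VERB", "VERB"): 0.4}
emit = {("NOUN", "mama"): 0.5, ("VERB", "mama"): 0.1,
        ("NOUN", "yayi"): 0.1, ("VERB", "yayi"): 0.5}

def viterbi(words):
    # V maps tag -> (probability of best path ending in tag, that path)
    V = {t: (start[t] * emit.get((t, words[0]), 1e-6), [t]) for t in tags}
    for w in words[1:]:
        # extend each best path by one word, keeping the best predecessor per tag
        V = {t: max(
                ((p * trans[(prev, t)] * emit.get((t, w), 1e-6), path + [t])
                 for prev, (p, path) in V.items()),
                key=lambda x: x[0])
             for t in tags}
    return max(V.values(), key=lambda x: x[0])[1]

print(viterbi(["mama", "yayi"]))  # -> ['NOUN', 'VERB']
```

A real tagger estimates `start`, `trans` and `emit` by counting over the annotated corpus, exactly the automatic extraction of linguistic knowledge the abstract describes.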
Summer Research Project (Anusaaraka) ReportAnwar Jameel
This document discusses Anusaaraka, a machine translation tool being developed to translate between English and Hindi. It uses principles from Panini's grammar to map word groups and constructions between the languages. Where differences exist, extra notation is added to preserve source language information. The output is presented in layers to show the translation process. It aims to bridge the language barrier by allowing users to access text in their preferred Indian language.
Key features include faithfully representing the source text, reversibility of the translation process through layered output, and transparency by allowing users to trace the translation steps. It was developed by combining traditional Indian linguistic principles with modern technologies.
TRANSLITERATION BY ORTHOGRAPHY OR PHONOLOGY FOR HINDI AND MARATHI TO ENGLISH:...kevig
e-Governance and Web based online commercial multilingual applications has given utmost importance to
the task of translation and transliteration. The Named Entities and Technical Terms occur in the source
language of translation are called out of vocabulary words as they are not available in the multilingual
corpus or dictionary used to support translation process. These Named Entities and Technical Terms need
to be transliterated from source language to target language without losing their phonetic properties. The
fundamental problem in India is that there is no set of rules available to write the spellings in English for
Indian languages according to the linguistics. People are writing different spellings for the same name at
different places. This fact certainly affects the Top-1 accuracy of the transliteration and in turn the
translation process. Major issue noticed by us is the transliteration of named entities consisting three
syllables or three phonetic units in Hindi and Marathi languages where people use mixed approach to
write the spelling either by orthographical approach or by phonological approach. In this paper authors
have provided their opinion through experimentation about appropriateness of either approach.
A Descriptive Study of Standard Dialect and Western Dialect of Odia Language ...ijtsrd
Language is a unique blessing to human beings. Human beings have been bestowed with the faculty of language from a very primitive age. Language makes human beings social, and in a society human beings communicate with the help of language. Odia is one among the constitutionally approved languages of India. Odisha is situated in the eastern part of India; presently, this state has thirty districts. Odisha is bounded to the north by the state of Jharkhand, to the northeast by West Bengal, to the east by the Bay of Bengal, to the south by Andhra Pradesh, and to the west by Chhattisgarh. The languages used by the neighboring states have a lot of influence on the Odia language. In the present study a modest attempt has been made to highlight the differences between the Standard Odia and Western Odia dialects. Various linguistic items used by Western Odia dialect users show marked differences compared to Standard Odia. The study delves into the phonological, morphological, semantic and syntactic features of both Standard Odia and Western Odia. Secondly, for ease of understanding, some discussion has been made of the existing literature on language and its varieties in general. As the spoken form is the primary form of any language, data have been collected from informants' conversation for analysis. Debiprasad Pany, "A Descriptive Study of Standard Dialect and Western Dialect of Odia Language in Terms of Linguistic Items", published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-4, Issue-1, December 2019, URL: https://www.ijtsrd.com/papers/ijtsrd29632.pdf Paper URL: https://www.ijtsrd.com/humanities-and-the-arts/odia/29632/a-descriptive-study-of-standard-dialect-and-western-dialect-of-odia-language-in-terms-of-linguistic-items/debiprasad-pany
Language has several key characteristics and definitions according to experts. Aristotle defined language as representing the mind's experiences, while Chomsky saw it as an innate human capacity to form grammatical sentences. Language is a means of communication, uses symbols, and has systematic rules or patterns. It is uniquely human and relies on vocal symbols produced using articulators in the vocal tract. The symbols of a language, like words, are arbitrarily assigned and different languages use different symbols for the same concepts.
Dictionary Entries for Bangla Consonant Ended Roots in Universal Networking L...Waqas Tariq
The Universal Networking Language (UNL) deals with communication across nations of different languages and involves many related disciplines such as linguistics, epistemology and computer science. It helps to overcome the language barrier among people of different nations, to solve problems emerging from current globalization trends and geopolitical interdependence. We are working to include the Bangla language in the UNL system so that Bangla can be converted to UNL expressions. As part of this process we are currently working on Bangla consonant-ended verb roots and developing lexical (dictionary) entries for them. In this paper, we present our work by describing the Bangla verb, verb roots and verbal inflections, and then show the dictionary entries for the consonant-ended roots.
The document summarizes language issues in higher education in India. It discusses the sociolinguistic situation in India, characterized by multilingualism and the use of different languages for different social roles. English plays an important functional role, being used for acquiring knowledge, as a link language, and increasingly as an alternate language. The Constitution designates Hindi as the official language but allows the continued use of English. It also directs states to provide primary education in a child's mother tongue and promote the spread of Hindi while enriching it with other languages.
Sign language uses visual gestures rather than sounds to communicate. While different from spoken languages, sign languages have a similar underlying linguistic structure using words, grammar and sentences. Neurologically, both sign and spoken languages activate Broca's area for language production, though sign language relies more on visual processing. Studies also show semantic processing occurs in distinct brain regions for comprehending meaning regardless of input or output modality.
The document discusses the concept of language and its functions. It defines language and examines its key characteristics such as being social, symbolic, systematic, vocal, conventional, productive, and a means of communication. The document also explores several important functions of language, including communication, transmission of culture, thought, diffusion of knowledge, political cohesion, cultural identity, and facilitating human cooperation and society.
Characteristics and features of Language Junaid Amjed
Language is a uniquely human system of communication that has enabled the development of human civilization. It has several key characteristics: it is arbitrary, systematic, productive, creative, social, and conventional. Language exists through social conventions and allows for human interaction, cooperation, and the development of culture. It consists of symbols organized into complex, rule-based systems to convey meaning. The ability to use language sets humans apart from other animals.
This document provides an introduction, body, and conclusion to a case study on multilingualism in contemporary India conducted by Pushpi Bagchi. The introduction discusses the debate around loss of linguistic diversity in India, which has over 20 official languages. The body explores themes of English as the unofficial national language, language as a cultural asset, advocating for regional languages, and multilingualism in India. Interviews with Indian students in Edinburgh are presented. The conclusion suggests that while English is prominent, regional languages are still important to cultural identity and pride, and linguistic diversity in India will continue to evolve.
Language enables communication through a system of symbols with arbitrary meanings that are understood by a given culture or community. It is a generative system with rules that govern how symbols can be combined to form an infinite number of messages. A language consists of various building blocks including phonemes, morphemes, and syntax. It uses various modes of communication such as verbal, gestural, and spatial systems.
Uniqueness of Human Language (Sheikh Talha)
This document discusses the uniqueness of human language. It notes that human language has both verbal and non-verbal components. It is also complex and compositional, with meanings of sentences composed from the meanings of individual words. Additionally, human language has the property of recursion and allows for open-ended, productive communication through a finite set of elements. Human language is qualitatively different from animal communication in its use of grammar, semantics, and ability to refer to abstract concepts.
Kannada Phonemes to Speech Dictionary: Statistical Approach (IJERA Editor)
The input or output of a natural language processing system can be either written text or speech. To process written text, we need lexical, syntactic, and semantic knowledge of the language, discourse information, and real-world knowledge; to process spoken language, we need everything required for written text, along with the challenges of speech recognition and speech synthesis. This paper describes how the articulatory phonetics of Kannada is used to generate a phoneme-to-speech dictionary for Kannada; a statistical computational approach is used to map the elements taken from an input query or documents. Articulatory phonetics concerns the place of articulation of a consonant: the point of contact where an obstruction occurs in the vocal tract between an active articulator, typically some part of the tongue, and a passive location, typically some part of the roof of the mouth. Along with the manner of articulation and the phonation, this gives the consonant its distinctive sound. Results are presented for the same.
This document discusses several topics in sociolinguistics, including language contact and variation, nativization of English in India, bilingualism and multilingualism, code-switching and code-mixing, pidgins and creoles, dialects, and register and style. It explains that language contact occurs when speakers of different languages interact, leading to transfer of features between languages. In India, the prolonged contact between English and Indian languages has resulted in nativization, where English has taken on features of Indian languages. It also discusses how multilingualism has increased due to globalization, and defines concepts like pidgins, creoles, dialects, and linguistic style.
Identification and Classification of Named Entities in Indian Languages (kevig)
The process of identifying Named Entities (NEs) in a given document and then classifying them into different categories is referred to as Named Entity Recognition (NER). Considerable effort is needed to perform NER in Indian languages and achieve the same or higher accuracy than that obtained for English and the European languages. In this paper, we present the results we have achieved by performing NER in Hindi, Bengali and Telugu using a Hidden Markov Model (HMM) and performance metrics.
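The HMM decoding step behind such a tagger can be sketched as a minimal Viterbi decoder. The two-tag model, the probability values, and the example words below are purely illustrative assumptions, not taken from the paper.

```python
def viterbi(words, tags, start_p, trans_p, emit_p):
    """Return the most probable tag sequence for `words` under the HMM."""
    # Each cell holds (probability of best path ending in this tag, that path).
    col = {t: (start_p[t] * emit_p[t].get(words[0], 1e-6), [t]) for t in tags}
    for word in words[1:]:
        new_col = {}
        for t in tags:
            prob, path = max(
                (col[pt][0] * trans_p[pt][t] * emit_p[t].get(word, 1e-6),
                 col[pt][1])
                for pt in tags)
            new_col[t] = (prob, path + [t])
        col = new_col
    return max(col.values())[1]

tags = ["NE", "O"]                       # named entity vs. other
start_p = {"NE": 0.3, "O": 0.7}
trans_p = {"NE": {"NE": 0.4, "O": 0.6},
           "O":  {"NE": 0.2, "O": 0.8}}
emit_p = {"NE": {"ganga": 0.4, "dilli": 0.5},
          "O":  {"mein": 0.4, "hai": 0.4}}

print(viterbi(["ganga", "mein", "dilli"], tags, start_p, trans_p, emit_p))
# → ['NE', 'O', 'NE']
```

A real system would train these probabilities from an annotated corpus and use many more tags; the smoothing constant 1e-6 stands in for a proper unknown-word model.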
Effect of Query Formation on Web Search Engine Results (kevig)
A query in a search engine is generally based on natural language. A query can be expressed in more than one way without changing its meaning, depending on the searcher's thinking at a particular moment. The aim of the searcher is to get the most relevant results irrespective of how the query has been expressed. In the present paper, we examine search engine results for changes in coverage and in the similarity of the first few results when a query is entered in two semantically equivalent but differently worded formats. Searches were made through the Google search engine. Fifteen pairs of queries were chosen for the study. The t-test was used for the purpose, and the results were checked on the basis of the total documents found and the similarity of the first five and first ten documents returned for a query entered in the two formats. It was found that the total coverage is the same but that the first few results are significantly different.
Similar to Investigations of the Distributions of Phonemic Durations in Hindi and Dogri
Marathi Text-To-Speech Synthesis using Natural Language Processing (iosrjce)
IOSR Journal of VLSI and Signal Processing (IOSRJVSP) is a double-blind peer-reviewed international journal that publishes articles contributing new results in all areas of VLSI design and signal processing. The goal of this journal is to bring together researchers and practitioners from academia and industry to focus on advanced VLSI design and signal processing concepts and to establish new collaborations in these areas.
Design and realization of microelectronic systems using VLSI/ULSI technologies require close collaboration among scientists and engineers in the fields of systems architecture, logic and circuit design, chip and wafer fabrication, packaging, testing and systems applications. Generation of specifications, design and verification must be performed at all abstraction levels, including the system, register-transfer, logic, circuit, transistor and process levels.
Role of Language Engineering to Preserve Endangered Language (Dr. Amit Kumar Jha)
Language engineering can help preserve endangered languages by developing tools like language translators, speech generation systems, and language teaching systems. If these tools are applied to endangered languages, they may help prevent the languages from going extinct by increasing the number of people who understand and use them. Key applications of language engineering that could support endangered languages include speech synthesis, machine translation, speech recognition, text-to-speech conversion, and language teaching systems. Developing digital language documentation and creating transcription tools can also help endangered languages survive by making more materials available to study and learn them.
A Review on Gesture Recognition Techniques for Human-Robot Collaboration (IRJET Journal)
This document provides a review of gesture recognition techniques for human-robot collaboration. It discusses how sign language uses gestures as a form of communication for deaf individuals. There are four key components of gesture recognition systems for human-robot interaction: sensor technologies to detect gestures, gesture detection, gesture tracking, and gesture classification. The document analyzes approaches based on these components and provides statistical analysis. It also discusses challenges in sign language recognition and the importance of technologies that enable communication for deaf individuals.
The document presents an unsupervised approach for developing stemmers for Urdu and Marathi languages. It uses frequency-based and length-based stripping approaches to automatically learn and generate suffix rules from word corpora without any linguistic knowledge. The frequency-based approach achieved 85.36% accuracy for Urdu and 82.5% for Marathi, outperforming the length-based approach. The stemmer could improve information retrieval for languages by reducing morphological variants to word stems.
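The frequency-based suffix learning described above can be sketched as follows. The toy English word list and the frequency threshold are illustrative assumptions; the paper itself learns suffix rules from Urdu and Marathi corpora.

```python
from collections import Counter

# Toy English word list; the paper learns from Urdu and Marathi corpora.
words = ["walking", "walked", "walker", "talking", "talked",
         "jumping", "jumped", "painter", "painting"]

# Count every word ending of up to 4 characters that leaves a stem of >= 3.
suffix_counts = Counter()
for w in words:
    for i in range(1, 5):
        if len(w) > i + 2:
            suffix_counts[w[-i:]] += 1

# Keep endings frequent enough to look like real suffixes, longest first.
threshold = 3
suffixes = sorted((s for s, c in suffix_counts.items() if c >= threshold),
                  key=len, reverse=True)

def stem(word):
    """Strip the longest learned suffix that leaves a 3+ character stem."""
    for s in suffixes:
        if word.endswith(s) and len(word) - len(s) >= 3:
            return word[:-len(s)]
    return word

print(stem("walking"), stem("jumped"))
# → walk jump
```

No linguistic knowledge enters this process: the suffix inventory emerges purely from ending frequencies in the word list, which is what makes the approach language-independent.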
This document provides an overview of language and its components. It defines language as a system of human communication using structured sounds or their written representations to form codes that label concepts. A language has components like phonology, morphology and syntax. It also discusses the International Phonetic Alphabet used to represent speech sounds. Key points made are that language is first spoken and later written, and that phonetics studies sound production while phonology studies sound patterning to form words. Semantics refers to word meanings and pragmatics to how language is used to communicate. Vowels are sounds made without oral impediment while consonants are made with impediments.
Hidden Markov Model Based Part of Speech Tagger for Sinhala Language (ijnlc)
In this paper we present fundamental lexical semantics of the Sinhala language and a Hidden Markov Model (HMM) based Part of Speech (POS) tagger for Sinhala. In any natural language processing task, part of speech is a vital topic, involving analysis of the construction, behaviour and dynamics of the language, knowledge that can be utilized in computational linguistics analysis and automation applications. Since Sinhala is a morphologically rich and agglutinative language, in which words are inflected with various grammatical features, tagging is essential for further analysis of the language. Our research is based on a statistical approach, in which the tagging process is done by computing the tag-sequence probability and the word-likelihood probability from the given corpus, with the linguistic knowledge automatically extracted from the annotated corpus. The current tagger reaches more than 90% accuracy for known words.
Summer Research Project (Anusaaraka) Report (Anwar Jameel)
This document discusses Anusaaraka, a machine translation tool being developed to translate between English and Hindi. It uses principles from Panini's grammar to map word groups and constructions between the languages. Where differences exist, extra notation is added to preserve source language information. The output is presented in layers to show the translation process. It aims to bridge the language barrier by allowing users to access text in their preferred Indian language.
Key features include faithfully representing the source text, reversibility of the translation process through layered output, and transparency by allowing users to trace the translation steps. It was developed by combining traditional Indian linguistic principles with modern technologies.
TRANSLITERATION BY ORTHOGRAPHY OR PHONOLOGY FOR HINDI AND MARATHI TO ENGLISH:... (kevig)
e-Governance and Web-based online commercial multilingual applications have given utmost importance to the tasks of translation and transliteration. The named entities and technical terms that occur in the source language of translation are called out-of-vocabulary words, as they are not available in the multilingual corpus or dictionary used to support the translation process. These named entities and technical terms need to be transliterated from the source language to the target language without losing their phonetic properties. The fundamental problem in India is that there is no set of rules for writing the English spellings of Indian-language words according to the linguistics. People write different spellings for the same name at different places. This fact certainly affects the Top-1 accuracy of the transliteration and, in turn, of the translation process. The major issue we noticed is the transliteration of named entities consisting of three syllables or three phonetic units in Hindi and Marathi, where people use a mixed approach to writing the spelling, either orthographical or phonological. In this paper the authors provide their opinion, through experimentation, about the appropriateness of either approach.
A Descriptive Study of Standard Dialect and Western Dialect of Odia Language ... (ijtsrd)
Language is a unique blessing to human beings, who have been endowed with the faculty of language from a very primitive age. Language makes human beings social, and in a society human beings communicate with the help of language. Odia is one of the constitutionally approved languages of India. Odisha is situated in the eastern part of India and presently has thirty districts. It is bounded to the north by the state of Jharkhand, to the northeast by West Bengal, to the east by the Bay of Bengal, to the south by Andhra Pradesh, and to the west by Chhattisgarh. The languages used in the neighboring states have considerable influence on the Odia language. In the present study a modest attempt has been made to highlight the differences between the Standard Odia and Western Odia dialects. Various linguistic items used by Western Odia dialect speakers differ markedly from Standard Odia. The study delves into the phonological, morphological, semantic and syntactic features of both Standard Odia and Western Odia. Secondly, for ease of understanding, some discussion has been made of the existing literature on language and its varieties in general. As the spoken form is the primary form of any language, data have been collected from the informants' conversation for analysis. Debiprasad Pany, "A Descriptive Study of Standard Dialect and Western Dialect of Odia Language in Terms of Linguistic Items", published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-4, Issue-1, December 2019, URL: https://www.ijtsrd.com/papers/ijtsrd29632.pdf Paper URL: https://www.ijtsrd.com/humanities-and-the-arts/odia/29632/a-descriptive-study-of-standard-dialect-and-western-dialect-of-odia-language-in-terms-of-linguistic-items/debiprasad-pany
Language has several key characteristics and definitions according to experts. Aristotle defined language as representing the mind's experiences, while Chomsky saw it as an innate human capacity to form grammatical sentences. Language is a means of communication, uses symbols, and has systematic rules or patterns. It is uniquely human and relies on vocal symbols produced using articulators in the vocal tract. The symbols of a language, like words, are arbitrarily assigned and different languages use different symbols for the same concepts.
Dictionary Entries for Bangla Consonant Ended Roots in Universal Networking L... (Waqas Tariq)
The Universal Networking Language (UNL) deals with communication across nations of different languages and involves many related disciplines such as linguistics, epistemology and computer science. It helps overcome the language barrier among people of different nations, addressing problems emerging from current globalization trends and geopolitical interdependence. We are working to include the Bangla language in the UNL system so that Bangla can be converted to UNL expressions. As part of this process we are currently working on Bangla consonant-ended verb roots and developing lexical (dictionary) entries for them. In this paper, we describe the Bangla verb, verb roots and verbal inflections, and then show the dictionary entries for the consonant-ended roots.
Investigations of the Distributions of Phonemic Durations in Hindi and Dogri (kevig)
Speech generation is one of the most important areas of research in speech signal processing and is now gaining serious attention. Speech is a natural form of human communication. Computers with the ability to understand speech and to speak with a human-like voice are expected to contribute to the development of more natural man-machine interfaces. However, in order to provide functions even closer to those of human beings, we must learn more about the mechanisms by which speech is produced and perceived, and develop speech information processing technologies that can generate more natural-sounding systems. This field of study, also called speech synthesis and more prominently known as text-to-speech synthesis, originated in the mid-1980s with the emergence of digital signal processing (DSP) and the rapid advancement of VLSI techniques. To understand this field, it is necessary to understand the basic theory of speech production. Every language has a different phonetic alphabet and a different set of possible phonemes and their combinations.
For the analysis of the speech signal, we recorded five speakers of Dogri (3 male and 5 female) and eight speakers of Hindi (4 male and 4 female). For estimating the durational distributions, the mean of the means of ten instances of the vowels of each speaker in both languages was calculated. Investigations have shown that the two durational distributions differ significantly with respect to mean and standard deviation. The duration of a phoneme is speaker-dependent. The investigation leads to the conclusion that almost all Dogri phonemes have shorter durations than the corresponding Hindi phonemes: the durations in milliseconds of the same phonemes were found to be longer when uttered in Hindi than when spoken by a person with Dogri as his or her mother tongue. There are many applications directly or indirectly related to this research. For instance, the main application may be transforming Dogri speech into Hindi and vice versa; building on this, we can develop a speech aid to teach Dogri to children. The results may also be useful for synthesizing the phonemes of Dogri using the parameters of the phonemes of Hindi and for building large-vocabulary speech recognition systems.
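The "mean of means" estimate described above can be sketched as follows; the duration values are invented for illustration and do not come from the recordings.

```python
import statistics

# Hypothetical vowel durations (ms): ten instances per speaker.
dogri_speakers = [
    [110, 112, 108, 111, 109, 113, 107, 110, 112, 109],
    [105, 104, 106, 103, 107, 105, 104, 106, 105, 103],
]
hindi_speakers = [
    [130, 132, 128, 131, 129, 133, 127, 130, 132, 129],
    [125, 124, 126, 123, 127, 125, 124, 126, 125, 123],
]

def mean_of_means(speakers):
    """Average the per-speaker means of the ten instances."""
    return statistics.mean(statistics.mean(s) for s in speakers)

def pooled_std(speakers):
    """Standard deviation over all instances, pooled across speakers."""
    return statistics.stdev([d for s in speakers for d in s])

dogri_mean = mean_of_means(dogri_speakers)
hindi_mean = mean_of_means(hindi_speakers)
print(f"Dogri: mean={dogri_mean:.1f} ms, sd={pooled_std(dogri_speakers):.1f}")
print(f"Hindi: mean={hindi_mean:.1f} ms, sd={pooled_std(hindi_speakers):.1f}")
print("Dogri shorter than Hindi:", dogri_mean < hindi_mean)
```

Averaging per speaker first, then across speakers, keeps any single speaker's habitual tempo from dominating the language-level estimate, which matters because phoneme duration is speaker-dependent.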
May 2024 - Top 10 Cited Articles in Natural Language Computing (kevig)
Natural language processing is a programmed approach to analyzing text based on both a set of theories and a set of technologies. This forum aims to bring together researchers who have designed and built software that can analyze, understand, and generate the languages humans use naturally to address computers.
Effect of Singular Value Decomposition Based Processing on Speech Perception (kevig)
Speech is an important biological signal, the primary mode of communication among human beings and the most natural and efficient form of exchanging information. Speech processing is an important aspect of signal processing. In this paper the linear-algebra technique of singular value decomposition (SVD) is applied to the speech signal. SVD is a technique for deriving important parameters of a signal. The parameters derived using SVD may further be reduced by perceptual evaluation of the synthesized speech using only perceptually important parameters, so that the speech signal can be compressed and the information transformed into compressed form without losing its quality. This technique finds wide application in speech compression, speech recognition, and speech synthesis. The objective of this paper is to investigate the effect of SVD-based feature selection on the perception of the processed speech signal. The vowels /a/, /e/, and /u/ were recorded from each of six speakers (3 males and 3 females). The vowels were analyzed using SVD-based processing, and the effect of reducing the number of singular values on the perception of the vowels resynthesized from the reduced set was investigated. Investigations have shown that the number of singular values can be drastically reduced without significantly affecting the perception of the vowels.
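The SVD-based reduction described above can be sketched on a synthetic signal. The two-harmonic "vowel", the 40x50 framing, and k=4 are illustrative assumptions; the paper works on recorded vowels and judges the result perceptually rather than by numeric error.

```python
import numpy as np

# A short synthetic "vowel": two harmonics, as a stand-in for a recording.
n = np.arange(2000)
signal = (np.sin(2 * np.pi * 120 * n / 8000)
          + 0.5 * np.sin(2 * np.pi * 240 * n / 8000))

# Arrange samples into a 40x50 matrix so the SVD can expose its structure.
X = signal.reshape(40, 50)
U, s, Vt = np.linalg.svd(X, full_matrices=False)

# Keep only the k largest singular values and resynthesize.
k = 4
X_k = (U[:, :k] * s[:k]) @ Vt[:k, :]

error = np.linalg.norm(X - X_k) / np.linalg.norm(X)
print(f"kept {k}/{len(s)} singular values, relative error = {error:.2e}")
```

Because each sinusoid contributes a rank-2 structure to the framed matrix, four singular values already reconstruct this toy signal almost perfectly, which mirrors the paper's finding that many singular values can be discarded without audible damage.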
Identifying Key Terms in Prompts for Relevance Evaluation with GPT Models (kevig)
Relevance evaluation of a query and a passage is essential in Information Retrieval (IR). Recently, numerous studies have been conducted on tasks related to relevance judgment using Large Language Models (LLMs) such as GPT-4, demonstrating significant improvements. However, the efficacy of LLMs is considerably influenced by the design of the prompt. The purpose of this paper is to identify which specific terms in prompts positively or negatively impact relevance evaluation with LLMs. We employed two types of prompts: those used in previous research and those generated automatically by LLMs. By comparing the performance of these prompts in both few-shot and zero-shot settings, we analyze the influence of specific terms in the prompts. We observed two main findings. First, prompts using the term ‘answer’ lead to more effective relevance evaluations than those using ‘relevant’, indicating that a more direct approach, focusing on answering the query, tends to enhance performance. Second, we noted the importance of appropriately balancing the scope of ‘relevance’: while the term ‘relevant’ can extend the scope too broadly, resulting in less precise evaluations, an optimal balance in defining relevance is crucial for accurate assessments. The inclusion of few-shot examples helps define this balance more precisely; by providing clearer contexts for the term ‘relevance’, few-shot examples contribute to refining the relevance criteria. In conclusion, our study highlights the significance of carefully selecting terms in prompts for relevance evaluation with LLMs.
In recent years, great advances have been made in the speed, accuracy, and coverage of automatic word sense disambiguation systems that, given a word appearing in a certain context, can identify the sense of that word. In this paper we consider the problem of deciding whether the same words contained in different documents are related to the same meaning or are homonyms. Our goal is to improve the estimate of the similarity of documents in which some words may be used with different meanings. We present three new strategies for solving this problem, which are used to filter out homonyms from the similarity computation. Two of them are intrinsically non-semantic, whereas the third has a semantic flavor and can also be applied to word sense disambiguation. The three strategies have been embedded in an article recommendation system that one of the most important Italian ad-serving companies offers to its customers.
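The idea of filtering homonyms out of a similarity computation can be sketched as follows. The hard-coded homonym list and the two documents are invented for illustration; the paper's three strategies decide which words to filter automatically.

```python
import math
from collections import Counter

# Hypothetical homonym list; the paper's strategies build this automatically.
AMBIGUOUS = {"bank", "bat"}

def vector(text):
    """Bag-of-words vector with ambiguous words filtered out."""
    return Counter(w for w in text.lower().split() if w not in AMBIGUOUS)

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

d1 = "the bank approved the loan"
d2 = "we sat on the river bank"
sim = cosine(vector(d1), vector(d2))
print(f"similarity without homonyms: {sim:.3f}")
```

Without the filter, the shared surface word "bank" would inflate the similarity of these two documents even though it carries different meanings in each; dropping it leaves only genuinely shared vocabulary in the score.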
Genetic Approach for Arabic Part of Speech Tagging (kevig)
With the growing number of textual resources available, the ability to understand them becomes critical. An essential first step in understanding these sources is the ability to identify the parts of speech in each sentence. Arabic is a morphologically rich language, which presents a challenge for part-of-speech tagging. In this paper, our goal is to propose, improve, and implement a part-of-speech tagger based on a genetic algorithm. The accuracy obtained with this method is comparable to that of other probabilistic approaches.
Rule Based Transliteration Scheme for English to Punjabikevig
Machine Transliteration has come out to be an emerging and a very important research area in the field of
machine translation. Transliteration basically aims to preserve the phonological structure of words. Proper
transliteration of name entities plays a very significant role in improving the quality of machine translation.
In this paper we are doing machine transliteration for English-Punjabi language pair using rule based
approach. We have constructed some rules for syllabification. Syllabification is the process to extract or
separate the syllable from the words. In this we are calculating the probabilities for name entities (Proper
names and location). For those words which do not come under the category of name entities, separate
probabilities are being calculated by using relative frequency through a statistical machine translation
toolkit known as MOSES. Using these probabilities we are transliterating our input text from English to
Punjabi.
Improving Dialogue Management Through Data Optimizationkevig
In task-oriented dialogue systems, the ability for users to effortlessly communicate with machines and computers through natural language stands as a critical advancement. Central to these systems is the dialogue manager, a pivotal component tasked with navigating the conversation to effectively meet user goals by selecting the most appropriate response. Traditionally, the development of sophisticated dialogue management has embraced a variety of methodologies, including rule-based systems, reinforcement learning, and supervised learning, all aimed at optimizing response selection in light of user inputs. This research casts a spotlight on the pivotal role of data quality in enhancing the performance of dialogue managers. Through a detailed examination of prevalent errors within acclaimed datasets, such as Multiwoz 2.1 and SGD, we introduce an innovative synthetic dialogue generator designed to control the introduction of errors precisely. Our comprehensive analysis underscores the critical impact of dataset imperfections, especially mislabeling, on the challenges inherent in refining dialogue management processes.
Document Author Classification using Parsed Language Structurekevig
Over the years there has been ongoing interest in detecting authorship of a text based on statistical properties of the text, such as by using occurrence rates of noncontextual words. In previous work, these techniques have been used, for example, to determine authorship of all of The Federalist Papers. Such methods may be useful in more modern times to detect fake or AI authorship. Progress in statistical natural language parsers introduces the possibility of using grammatical structure to detect authorship. In this paper we explore a new possibility for detecting authorship using grammatical structural information extracted using a statistical natural language parser. This paper provides a proof of concept, testing author classification based on grammatical structure on a set of “proof texts,” The Federalist Papers and Sanditon which have been as test cases in previous authorship detection studies. Several features extracted from the statisticalnaturallanguage parserwere explored: all subtrees of some depth from any level; rooted subtrees of some depth, part of speech, and part of speech by level in the parse tree. It was found to be helpful to project the features into a lower dimensional space. Statistical experiments on these documents demonstrate that information from a statistical parser can, in fact, assist in distinguishing authors.
Rag-Fusion: A New Take on Retrieval Augmented Generationkevig
Infineon has identified a need for engineers, account managers, and customers to rapidly obtain product information. This problem is traditionally addressed with retrieval-augmented generation (RAG) chatbots, but in this study, I evaluated the use of the newly popularized RAG-Fusion method. RAG-Fusion combines RAG and reciprocal rank fusion (RRF) by generating multiple queries, reranking them with reciprocal scores and fusing the documents and scores. Through manually evaluating answers on accuracy, relevance, and comprehensiveness, I found that RAG-Fusion was able to provide accurate and comprehensive answers due to the generated queries contextualizing the original query from various perspectives. However, some answers strayed off topic when the generated queries' relevance to the original query is insufficient. This research marks significant progress in artificial intelligence (AI) and natural language processing (NLP) applications and demonstrates transformations in a global and multi-industry context.
Performance, Energy Consumption and Costs: A Comparative Analysis of Automati...kevig
The common practice in Machine Learning research is to evaluate the top-performing models based on their performance. However, this often leads to overlooking other crucial aspects that should be given careful consideration. In some cases, the performance differences between various approaches may be insignificant, whereas factors like production costs, energy consumption, and carbon footprint should be taken into account. Large Language Models (LLMs) are widely used in academia and industry to address NLP problems. In this study, we present a comprehensive quantitative comparison between traditional approaches (SVM-based) and more recent approaches such as LLM (BERT family models) and generative models (GPT2 and LLAMA2), using the LexGLUE benchmark. Our evaluation takes into account not only performance parameters (standard indices), but also alternative measures such as timing, energy consumption and costs, which collectively contribute to the carbon footprint. To ensure a complete analysis, we separately considered the prototyping phase (which involves model selection through training-validation-test iterations) and the in-production phases. These phases follow distinct implementation procedures and require different resources. The results indicate that simpler algorithms often achieve performance levels similar to those of complex models (LLM and generative models), consuming much less energy and requiring fewer resources. These findings suggest that companies should consider additional considerations when choosing machine learning (ML) solutions. The analysis also demonstrates that it is increasingly necessary for the scientific world to also begin to consider aspects of energy consumption in model evaluations, in order to be able to give real meaning to the results obtained using standard metrics (Precision, Recall, F1 and so on).
Evaluation of Medium-Sized Language Models in German and English Languagekevig
Large language models (LLMs) have garnered significant attention, but the definition of “large” lacks clarity. This paper focuses on medium-sized language models (MLMs), defined as having at least six billion parameters but less than 100 billion. The study evaluates MLMs regarding zero-shot generative question answering, which requires models to provide elaborate answers without external document retrieval. The paper introduces an own test dataset and presents results from human evaluation. Results show that combining the best answers from different MLMs yielded an overall correct answer rate of 82.7% which is better than the 60.9% of ChatGPT. The best MLM achieved 71.8% and has 33B parameters, which highlights the importance of using appropriate training data for fine-tuning rather than solely relying on the number of parameters. More fine-grained feedback should be used to further improve the quality of answers. The open source community is quickly closing the gap to the best commercial models.
IMPROVING DIALOGUE MANAGEMENT THROUGH DATA OPTIMIZATIONkevig
In task-oriented dialogue systems, the ability for users to effortlessly communicate with machines and
computers through natural language stands as a critical advancement. Central to these systems is the
dialogue manager, a pivotal component tasked with navigating the conversation to effectively meet user
goals by selecting the most appropriate response. Traditionally, the development of sophisticated dialogue
management has embraced a variety of methodologies, including rule-based systems, reinforcement
learning, and supervised learning, all aimed at optimizing response selection in light of user inputs. This
research casts a spotlight on the pivotal role of data quality in enhancing the performance of dialogue
managers. Through a detailed examination of prevalent errors within acclaimed datasets, such as
Multiwoz 2.1 and SGD, we introduce an innovative synthetic dialogue generator designed to control the
introduction of errors precisely. Our comprehensive analysis underscores the critical impact of dataset
imperfections, especially mislabeling, on the challenges inherent in refining dialogue management
processes.
Document Author Classification Using Parsed Language Structurekevig
Over the years there has been ongoing interest in detecting authorship of a text based on statistical properties of the
text, such as by using occurrence rates of noncontextual words. In previous work, these techniques have been used,
for example, to determine authorship of all of The Federalist Papers. Such methods may be useful in more modern
times to detect fake or AI authorship. Progress in statistical natural language parsers introduces the possibility of
using grammatical structure to detect authorship. In this paper we explore a new possibility for detecting authorship
using grammatical structural information extracted using a statistical natural language parser. This paper provides a
proof of concept, testing author classification based on grammatical structure on a set of “proof texts,” The Federalist
Papers and Sanditon which have been as test cases in previous authorship detection studies. Several features extracted
of some depth, part of speech, and part of speech by level in the parse tree. It was found to be helpful to project the
features into a lower dimensional space. Statistical experiments on these documents demonstrate that information
from a statistical parser can, in fact, assist in distinguishing authors.
RAG-FUSION: A NEW TAKE ON RETRIEVALAUGMENTED GENERATIONkevig
Infineon has identified a need for engineers, account managers, and customers to rapidly obtain product
information. This problem is traditionally addressed with retrieval-augmented generation (RAG) chatbots,
but in this study, I evaluated the use of the newly popularized RAG-Fusion method. RAG-Fusion combines
RAG and reciprocal rank fusion (RRF) by generating multiple queries, reranking them with reciprocal
scores and fusing the documents and scores. Through manually evaluating answers on accuracy,
relevance, and comprehensiveness, I found that RAG-Fusion was able to provide accurate and
comprehensive answers due to the generated queries contextualizing the original query from various
perspectives. However, some answers strayed off topic when the generated queries' relevance to the
original query is insufficient. This research marks significant progress in artificial intelligence (AI) and
natural language processing (NLP) applications and demonstrates transformations in a global and multiindustry context
Performance, energy consumption and costs: a comparative analysis of automati...kevig
The common practice in Machine Learning research is to evaluate the top-performing models based on their
performance. However, this often leads to overlooking other crucial aspects that should be given careful
consideration. In some cases, the performance differences between various approaches may be insignificant, whereas factors like production costs, energy consumption, and carbon footprint should be taken into
account. Large Language Models (LLMs) are widely used in academia and industry to address NLP problems. In this study, we present a comprehensive quantitative comparison between traditional approaches
(SVM-based) and more recent approaches such as LLM (BERT family models) and generative models (GPT2 and LLAMA2), using the LexGLUE benchmark. Our evaluation takes into account not only performance
parameters (standard indices), but also alternative measures such as timing, energy consumption and costs,
which collectively contribute to the carbon footprint. To ensure a complete analysis, we separately considered the prototyping phase (which involves model selection through training-validation-test iterations) and
the in-production phases. These phases follow distinct implementation procedures and require different resources. The results indicate that simpler algorithms often achieve performance levels similar to those of
complex models (LLM and generative models), consuming much less energy and requiring fewer resources.
These findings suggest that companies should consider additional considerations when choosing machine
learning (ML) solutions. The analysis also demonstrates that it is increasingly necessary for the scientific
world to also begin to consider aspects of energy consumption in model evaluations, in order to be able to
give real meaning to the results obtained using standard metrics (Precision, Recall, F1 and so on).
EVALUATION OF MEDIUM-SIZED LANGUAGE MODELS IN GERMAN AND ENGLISH LANGUAGEkevig
Large language models (LLMs) have garnered significant attention, but the definition of “large” lacks
clarity. This paper focuses on medium-sized language models (MLMs), defined as having at least six
billion parameters but less than 100 billion. The study evaluates MLMs regarding zero-shot generative
question answering, which requires models to provide elaborate answers without external document
retrieval. The paper introduces an own test dataset and presents results from human evaluation. Results
show that combining the best answers from different MLMs yielded an overall correct answer rate of
82.7% which is better than the 60.9% of ChatGPT. The best MLM achieved 71.8% and has 33B
parameters, which highlights the importance of using appropriate training data for fine-tuning rather than
solely relying on the number of parameters. More fine-grained feedback should be used to further improve
the quality of answers. The open source community is quickly closing the gap to the best commercial
models.
Discover the latest insights on Data Driven Maintenance with our comprehensive webinar presentation. Learn about traditional maintenance challenges, the right approach to utilizing data, and the benefits of adopting a Data Driven Maintenance strategy. Explore real-world examples, industry best practices, and innovative solutions like FMECA and the D3M model. This presentation, led by expert Jules Oudmans, is essential for asset owners looking to optimize their maintenance processes and leverage digital technologies for improved efficiency and performance. Download now to stay ahead in the evolving maintenance landscape.
Advanced control scheme of doubly fed induction generator for wind turbine us...IJECEIAES
This paper describes a speed control device for generating electrical energy on an electricity network based on the doubly fed induction generator (DFIG) used for wind power conversion systems. At first, a double-fed induction generator model was constructed. A control law is formulated to govern the flow of energy between the stator of a DFIG and the energy network using three types of controllers: proportional integral (PI), sliding mode controller (SMC) and second order sliding mode controller (SOSMC). Their different results in terms of power reference tracking, reaction to unexpected speed fluctuations, sensitivity to perturbations, and resilience against machine parameter alterations are compared. MATLAB/Simulink was used to conduct the simulations for the preceding study. Multiple simulations have shown very satisfying results, and the investigations demonstrate the efficacy and power-enhancing capabilities of the suggested control system.
Rainfall intensity duration frequency curve statistical analysis and modeling...bijceesjournal
Using data from 41 years in Patna’ India’ the study’s goal is to analyze the trends of how often it rains on a weekly, seasonal, and annual basis (1981−2020). First, utilizing the intensity-duration-frequency (IDF) curve and the relationship by statistically analyzing rainfall’ the historical rainfall data set for Patna’ India’ during a 41 year period (1981−2020), was evaluated for its quality. Changes in the hydrologic cycle as a result of increased greenhouse gas emissions are expected to induce variations in the intensity, length, and frequency of precipitation events. One strategy to lessen vulnerability is to quantify probable changes and adapt to them. Techniques such as log-normal, normal, and Gumbel are used (EV-I). Distributions were created with durations of 1, 2, 3, 6, and 24 h and return times of 2, 5, 10, 25, and 100 years. There were also mathematical correlations discovered between rainfall and recurrence interval.
Findings: Based on findings, the Gumbel approach produced the highest intensity values, whereas the other approaches produced values that were close to each other. The data indicates that 461.9 mm of rain fell during the monsoon season’s 301st week. However, it was found that the 29th week had the greatest average rainfall, 92.6 mm. With 952.6 mm on average, the monsoon season saw the highest rainfall. Calculations revealed that the yearly rainfall averaged 1171.1 mm. Using Weibull’s method, the study was subsequently expanded to examine rainfall distribution at different recurrence intervals of 2, 5, 10, and 25 years. Rainfall and recurrence interval mathematical correlations were also developed. Further regression analysis revealed that short wave irrigation, wind direction, wind speed, pressure, relative humidity, and temperature all had a substantial influence on rainfall.
Originality and value: The results of the rainfall IDF curves can provide useful information to policymakers in making appropriate decisions in managing and minimizing floods in the study area.
Digital Twins Computer Networking Paper Presentation.pptxaryanpankaj78
A Digital Twin in computer networking is a virtual representation of a physical network, used to simulate, analyze, and optimize network performance and reliability. It leverages real-time data to enhance network management, predict issues, and improve decision-making processes.
LLM Fine Tuning with QLoRA Cassandra Lunch 4, presented by Anant
Investigations of the Distributions of Phonemic Durations in Hindi and Dogri
International Journal on Natural Language Computing (IJNLC) Vol. 2, No.1, February 2013
DOI : 10.5121/ijnlc.2013.2103
INVESTIGATIONS OF THE DISTRIBUTIONS OF
PHONEMIC DURATIONS IN HINDI AND DOGRI
Padmini Rajput and Parveen Lehana*
Dept. of Physics and Electronics, University of Jammu, Jammu, India
*pklehanajournals@gmail.com
ABSTRACT
Speech generation is one of the most important areas of research in speech signal processing and is now gaining serious attention. Speech is a natural form of communication among living beings. Computers with the ability to understand speech and to speak with a human-like voice are expected to contribute to the development of more natural man-machine interfaces. However, in order to endow machines with functions even closer to those of human beings, we must learn more about the mechanisms by which speech is produced and perceived, and develop speech information processing technologies that can generate more natural-sounding systems. This field of study, also called speech synthesis and more prominently known as text-to-speech synthesis, originated in the mid-eighties with the emergence of digital signal processing (DSP) and the rapid advancement of VLSI techniques. To understand this field, it is necessary to understand the basic theory of speech production. Every language has a different phonetic alphabet and a different set of possible phonemes and phoneme combinations.
For the analysis of the speech signal, we recorded eight speakers of Dogri (3 male and 5 female) and eight speakers of Hindi (4 male and 4 female). For estimating the durational distributions, the mean of the means of ten instances of each vowel was calculated for every speaker in both languages. The investigations have shown that the two durational distributions differ significantly with respect to mean and standard deviation. The duration of a phoneme is speaker dependent. The investigation leads to the conclusion that almost all Dogri phonemes have shorter durations than their Hindi counterparts: the durations in milliseconds of the same phonemes were found to be longer when uttered in Hindi than when they were spoken by a person with Dogri as his mother tongue. Many applications are directly or indirectly related to this research. For instance, the main application may be transforming Dogri speech into Hindi and vice versa; building on this, a speech aid could be developed to teach Dogri to children. The results may also be useful for synthesizing the phonemes of Dogri using the parameters of the phonemes of Hindi and for building large-vocabulary speech recognition systems.
KEYWORDS
Natural Speech Production, Text to Speech System, Indian Languages, Speech Processing.
1. INTRODUCTION
Natural speech signal generation is one of the most amazing physical processes. Various articulatory movements, initiated by the brain's motor signals, regulate a dynamic vocal tract system with time-varying excitations; these movements, in conjunction with the pulmonic
egressive emission of air from the lungs, constitute this physical mechanism. The manner of excitation and the shape of the vocal tract may be speaker and language dependent. The human acoustic apparatus is thus a complicated sensory organization that generates a sequence of non-stationary sound waves; interest in its composition and effectiveness stems mainly from a general interest in the field of speech synthesis [1]. The speech signal is the most inherent kind of communication and transmits a wide range of information. Among this information, the content of the message being uttered is of primary importance; nevertheless, secondary information such as speaker individuality also plays a vital part in the oral exchange of communication [2].
Articulation is the result of the brain's activity of arranging thoughts into a sequence of words. The distinguishable units that add up to make a sequence of words, and hence the variety of languages (according to the manner and context of utterances), are termed phonemes. The pronunciation of phonemes depends upon contextual effects, speaker characteristics, and emotions [3]. Human speech is dynamic rather than static, since the articulators keep moving during articulation; this fact leads to the assumption that we begin to articulate the next segment before completing the previous one, that is, events are set in motion before they occur [4]. A human can replicate the sound of an utterance by merely visualizing the mouth movements, even without knowing that particular language. This exceptional attribute of speech has compelled researchers to regard speech as the fastest and most efficient method of interaction between humans.
Speech signal processing has many efficient and intelligent applications, such as speech recognition, speaker transformation, and text-to-speech (TTS) systems, which have been gaining serious attention for many years; however, building them is not a trouble-free task and requires that the machine have adequate intelligence to recognize human voices [5]. TTS systems convert arbitrary text into a spoken waveform; they generally employ processing of the text followed by speech generation. The main reason behind the improvement in text-to-speech synthesizers is the requirement of natural-sounding machines [6][7]. Research on Indian languages has yielded text-to-speech synthesis systems for only a few of them, such as Hindi, Tamil, Kannada, Marathi, and Bangla.
The objective of this research is to investigate the distribution of phonemic durations in Hindi and Dogri and, subsequently, to examine the pitch, amplitude, spectrograms, formants, and bandwidths of the speech waveforms. Section 2 throws light on the Indian language scripts. Section 3 describes the complete mechanism of natural speech production and the types of speech signals; knowledge of the basic characteristics and production mechanism of the speech signal is employed in the development of machines producing the most natural-sounding voices. Section 4 describes and compares various models that have been developed for speech production. Section 5 illustrates in detail the methodology employed for the investigations. The last two sections, Section 6 and Section 7, present the results and conclusions of the analysis.
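The "mean of means" procedure used for the durational estimates can be made concrete with a short sketch. All numbers below are invented for illustration only; they are not the paper's measurements:

```python
from statistics import mean, pstdev

# Hypothetical vowel durations in milliseconds (illustrative values,
# NOT the paper's data). Each row is one speaker; each row holds that
# speaker's ten instances of the same vowel.
hindi = [
    [180, 175, 190, 185, 178, 182, 188, 176, 184, 181],
    [170, 172, 168, 175, 171, 169, 174, 173, 170, 172],
]
dogri = [
    [150, 148, 155, 152, 149, 151, 153, 147, 150, 152],
    [140, 142, 138, 145, 141, 139, 144, 143, 140, 142],
]

def mean_of_means(speakers):
    """Average each speaker's ten instances, then report the mean and
    standard deviation of those per-speaker means."""
    per_speaker = [mean(row) for row in speakers]
    return mean(per_speaker), pstdev(per_speaker)

hindi_mu, hindi_sd = mean_of_means(hindi)
dogri_mu, dogri_sd = mean_of_means(dogri)
print(f"Hindi: mean {hindi_mu:.2f} ms, sd {hindi_sd:.2f} ms")
print(f"Dogri: mean {dogri_mu:.2f} ms, sd {dogri_sd:.2f} ms")
```

Averaging within each speaker first, before pooling across speakers, prevents speakers with more variable productions from dominating the cross-language comparison.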
2. INDIAN LANGUAGE SCRIPTS
There is a wide-ranging linguistic homogeneity among Indian languages at the micro-level; however, regarded as one entity, India is a linguistically diverse country with 22 official languages [8]. Language technologies play a vital role in a multilingual society like India, which has about 1652 dialects/native tongues. The languages of India belong to several racial groups, namely Negroids, Austrics, Mongoloids, Caucasoid Dravidians, and Caucasoid Aryans, but it is the last
two families that preside over the country. According to the 2001 census, Dravidian languages are spoken by 24% of Indians and Indo-Aryan languages by 74%. The Dravidian languages are the languages of south India, whereas the Sanskrit-based Aryan languages are from the north; yet both have acquired their scripts from a common foundation. Marathi, Tamil, and Bengali are extensively spread in the outer areas, whereas Assamese and Kashmiri have flourished in linguistic seclusion. Punjabi and Sindhi were dwarfed because of historic-geographical factors; Sindhi does not have any core region in India. The other languages of India belong to the Austro-Asiatic and Tibeto-Burman families [9]. Hindi and Urdu are extended nationwide; still, their use in the southern plain domain is insignificant. Each language has some peculiar characteristics. Indian languages have a sophisticated notion of a character unit, or akshara, that forms the fundamental linguistic unit. There are 10-12 major scripts in India, derived from the prehistoric Brahmi script. Extensive use of reduplication is a particular characteristic of Indian languages. Hindi and Dogri are two of the 22 official languages of India. Hindi is an Indo-European language of the Indo-Aryan subfamily and is one of the most widely spoken languages after English and Mandarin, spoken in major regions like Himachal Pradesh, Uttar Pradesh, Rajasthan, Bihar, Haryana, and Chattisgarh [8]. The distinctive characters in Indian language scripts are close to syllables and can characteristically be of the form C, V, CV, VCV, CVC, or CCV, where C stands for a consonant and V for a vowel [10]. Typical Hindi, as casually spoken by people, is called Manak Hindi, High Hindi, Nagari Hindi, or Literary Hindi; it is derived from the Khariboli dialect of Delhi and uses the Devnagri script, while Dogri has its own script, namely Doger. Hindi and Dogri are closely related languages having their roots in Sanskrit and belonging to the same subgroup of the Indo-European family. The Indian census (2001) states that about 258 million people in India are Hindi speakers, while the number of Dogri speakers is far smaller, counting about 5 million. Mostly people of north India are familiar with Dogri; these regions include Jammu, parts of Kashmir, Himachal, and northern Punjab. The language is essentially written in the Takri script, strongly linked to the Sharada script from which the Kashmiri script and the Gurmukhi script for Punjabi also derive. Dogri speakers are often called Dogras. Dogri was given the honor of a national language on 22nd December, 2003. Dogri and Hindi are rather alike, with diverse phonologies and modes of pronunciation [8]. Text-to-speech synthesis systems have also been developed for a few Indian languages like Hindi, Tamil, Kannada, Marathi, and Bangla [11].
3. NATURAL SPEECH PRODUCTION MECHANISM
Speech generation is a most intrinsic phenomenon which starts with the creation of a message in the brain and ends with the production of an utterance from the oral cavity. The production of speech sounds requires the integration of various information sources in order to generate the complex patterns of muscle activation required for fluency. The natural phenomenon of speech has always been an object of both general curiosity and scientific inquisition. We use speech more or less unconsciously, yet hardly anyone has much idea about this innate blessing. A study pertaining to human evolution revealed that the human skull base, which was formerly located in the upper position, gradually inclined simultaneously with the descent of the larynx. This research led to the conclusion that the time of speech acquisition and the origin of speech can be estimated from the inclination angle of the base of the skull. The theory is supported by the unique case of the development of human children, in whom the increase in the inclination of the skull base is closely connected with the maturation of the speech organs [12]. The mechanism of speech in human beings has developed over many years, yielding a vocal structure that is proficient in the spoken exchange of communication [13].
The mechanism of natural speech encompasses four processes: language processing, in which the content of the utterance is converted into phonemic symbols in the brain's language centre; generation of motor commands in the brain's motor centre to the vocal organs; initiation of articulatory movements for the production of speech by the vocal organs based on these motor commands; and the emission of air sent from the lungs, resulting in the speech signal that we hear [14]. This whole phenomenon can be visualized as a chain mechanism passing through various levels: the linguistic level, the physiological level, the acoustic level, and finally the linguistic level again. The vocal folds change the signal originating from any source, or from the vocal cords themselves, into intelligible speech [15] [16].
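The view implied here, a glottal source shaped by the vocal tract, is commonly formalized as the source-filter model and can be illustrated with a minimal simulation. The F0, formant frequency, and bandwidth below are generic textbook-style assumptions, not values from this paper:

```python
import math

fs = 8000                 # sampling rate (Hz)
f0 = 120                  # assumed source fundamental frequency (Hz)
formant, bw = 700, 100    # one illustrative formant frequency / bandwidth (Hz)

# Source: an impulse train standing in for the glottal pulses.
n_samples = fs // 10      # 100 ms of signal
period = fs // f0
source = [1.0 if n % period == 0 else 0.0 for n in range(n_samples)]

# Filter: a single two-pole resonator modelling one vocal-tract formant:
#   y[n] = x[n] + a1*y[n-1] + a2*y[n-2]
r = math.exp(-math.pi * bw / fs)                  # pole radius from bandwidth
a1 = 2 * r * math.cos(2 * math.pi * formant / fs)
a2 = -r * r
y = [0.0, 0.0]
for x in source:
    y.append(x + a1 * y[-1] + a2 * y[-2])
speech = y[2:]            # crude one-formant, vowel-like waveform
```

A real vocal tract contributes several formants, so a practical synthesizer cascades several such resonators, but the single-resonator case already shows how the periodic source and the tract filter combine.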
Research has shown that the vocal folds in males are usually longer than those in females,
giving a lower pitch and a deeper voice. The male vocal folds are between 17.5 mm and 25 mm
(approx. 0.75" to 1.0") in length, while the female vocal folds are between 12.5 mm and 17.5 mm
(approx. 0.5" to 0.75"); this difference in size causes the difference in vocal pitch. The gap
between the vocal folds is called the glottis, and the excitation produced there is often called
the glottal source. Air passing through the vocal folds, when they form a narrow opening, makes
them vibrate, producing a periodic sound. The rate of vibration of the vocal folds is called the
fundamental frequency F0; from the point of view of F0, the larynx is the most important vocal
organ. The fundamental frequency lies between roughly 80 and 250 Hz for male speakers, since a
male can vibrate his vocal folds between 80 and 250 times per second, whereas a female's F0 lies
between roughly 120 and 400 Hz [17]. The term pitch refers to the rate of vibration as perceived
by the listener. Figure 1 depicts the various articulators employed in natural speech production.
Above the larynx is the pharynx, situated behind the mouth, which divides into two passages, one
entering the oral cavity and the other the nasal cavity [18].
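The autocorrelation view of F0 described above can be illustrated with a short routine. This is a hedged sketch; the function name, frame length, and synthetic test tone are our own and are not part of any system discussed here:

```python
import numpy as np

def estimate_f0(frame, fs, fmin=80.0, fmax=400.0):
    """Estimate fundamental frequency from one voiced frame by
    locating the autocorrelation peak in the plausible pitch range."""
    frame = frame - np.mean(frame)
    corr = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lag_min = int(fs / fmax)   # shortest plausible pitch period, in samples
    lag_max = int(fs / fmin)   # longest plausible pitch period, in samples
    lag = lag_min + np.argmax(corr[lag_min:lag_max])
    return fs / lag

# A synthetic 150 Hz tone stands in for a voiced frame.
fs = 16000
t = np.arange(0, 0.1, 1 / fs)
frame = np.sin(2 * np.pi * 150 * t)
print(estimate_f0(frame, fs))
```

Real speech frames would first be windowed and checked for voicing; for a clean periodic signal like this one the estimate lands within a few hertz of the true 150 Hz.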
A speech sound produced when the middle part of the tongue, called the dorsum, touches the soft
palate is called a velar consonant. While speaking, the velum is raised so as to block the air
from flowing towards the nasal region, while letting it flow out of the mouth. Next to the soft
palate lies the hard palate above the tongue, often termed the roof of the mouth.
Figure 1. Natural speech production process [18].
Between the teeth and the hard palate is the alveolar ridge; a sound made with the tongue
touching this region is called alveolar. The motion of the vocal folds decides whether the sound
produced will be voiced or unvoiced [15].
Figure 2. Shapes of the mouth during the articulation of vowels [15].
Voiced sounds are the result of vibrations of the vocal folds (depending upon their mass and
tension), caused by the rapid opening and closing of the glottis. All vowels and some of the
consonants are voiced sounds. There is also a category called unvoiced sounds, in which the
resultant speech has no periodic component because the sound is produced without vibration of
the vocal folds. Different shapes made by the lips produce different sounds. In addition to these
articulators, three more elements must be taken into consideration: the larynx, the lower jaw,
and the nasal cavity. These cannot strictly be termed articulators, since they do not make
contact with other articulators, but they nevertheless play a very important role in natural
speech production. The various anatomical articulators work in a synchronized manner,
collectively shaping the sound; even a slight variation in the movement of the tongue can
directly change the sound produced [19].
4. MODELS FOR SPEECH SYNTHESIS
The brain processes underlying the formation of spoken language involve auditory encoding,
prosodic analysis, and linguistic evaluation [20]. Artificial speech can be produced in two ways:
model-based methods and waveform-based methods [21]. Waveform-based methods select a set of
pre-recorded utterances from a database to build the desired speech. Model-based methods comprise
the source-filter model (synthesis by rule), the Directions Into Velocities of Articulators
(DIVA) model, and Levelt's model; they make use of the natural model of speech generation in
human beings. Unit selection, concatenative, linear predictive coding (LPC), formant synthesis,
and harmonic plus noise model (HNM) techniques are used in waveform-based methods [16] [22] [23].
Source-filter synthesis, also called formant synthesis, makes use of spectral shaping of a
driving excitation. This model describes the speech signal as the response of the vocal tract
filter to a periodic source. Fant's and Ungeheuer's theories make two assumptions about the
configuration and processes involved in speech production. First, both theories ignore the fact
that the sub-glottal cavity is acoustically coupled with the supra-glottal cavities during voiced
phonation, with a degree of coupling that varies during each cycle of vocal fold vibration,
ranging from complete de-coupling during glottal closure to a gradual rise and decline while the
glottis is open. Second, the theories equally assume that the contributions of the vocal source
and the vocal tract filter can be separated from each other, with the filter having no
back-coupling effect on the source. Source-filter speech synthesizers employing LPC also adopt
these assumptions [18].
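A minimal sketch of this source-filter idea: a periodic impulse train (the glottal source) passed through cascaded two-pole resonators (the vocal tract filter). The formant frequencies and bandwidths below are rough textbook values for an /a/-like vowel, assumed here purely for illustration:

```python
import numpy as np

def resonator(x, fs, freq, bandwidth):
    """Two-pole filter modelling one formant resonance of the vocal tract."""
    r = np.exp(-np.pi * bandwidth / fs)   # pole radius from the bandwidth
    theta = 2 * np.pi * freq / fs         # pole angle from the formant frequency
    a1, a2 = 2 * r * np.cos(theta), -r * r
    y1 = y2 = 0.0
    out = np.empty_like(x)
    for n, xn in enumerate(x):
        yn = xn + a1 * y1 + a2 * y2       # y[n] = x[n] + a1*y[n-1] + a2*y[n-2]
        out[n] = yn
        y1, y2 = yn, y1
    return out

fs, f0 = 16000, 120                        # assumed sampling rate and pitch
n = np.arange(int(0.1 * fs))
source = (n % (fs // f0) == 0).astype(float)   # periodic glottal impulse train
speech = source
for freq, bw in [(730, 60), (1090, 90), (2440, 120)]:  # rough /a/ formants
    speech = resonator(speech, fs, freq, bw)
```

Each resonator realizes one formant as a complex-conjugate pole pair; any back-coupling from filter to source is ignored, which is exactly the second assumption described in the text.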
The DIVA model uses the sensory and motor activities of the brain and works on the hypothesis
that what is perceived drives what is produced, in order to generate high-quality acoustic
speech. The technique draws on sensory as well as motor activity because, during speech
production, a large portion of the cerebral cortex comes into action [16] [21]. Of all the models
created for synthetic speech, the most extensively used model for L2 speech production is
Levelt's model, originally developed for monolingual communication and based on encoding thought
into words.
The most current model of speech production is Levelt's model. Levelt viewed speech generation as
a modular process, illustrating it as an organization of self-governing subsystems. He posited
two principal mechanisms: the rhetorical/semantic/syntactic system and the phonological/phonetic
system [20].
Under concatenative speech synthesis we have unit selection and diphone synthesis. Unit selection
synthesis simply cuts out stored speech and rearranges it; in order to synthesize speech with
more varied voice characteristics it makes use of a large database of pre-recorded speech, which
increases cost and complexity. Diphone synthesis exploits the notion that a small portion of the
acoustic signal varies only to a minor degree, and is less affected by the phonetic context than
others; its sound quality lies somewhere between that obtained from unit selection and formant
synthesis, but it suffers from glitches. Concatenative approaches are used extensively by many
systems, in which stored speech unit waveforms are blended together to generate new speech; this
is the simplest technique providing high quality and naturalness [16] [19].
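The blending of stored unit waveforms can be sketched as a simple linear crossfade at the joint. Production systems add pitch marking and spectral smoothing; the two sine "units" here are synthetic stand-ins for database segments:

```python
import numpy as np

def concatenate_units(unit_a, unit_b, fs, overlap_ms=10):
    """Join two stored waveform units with a linear crossfade
    over a short overlap region around the concatenation point."""
    n = int(fs * overlap_ms / 1000)
    fade = np.linspace(0.0, 1.0, n)
    middle = unit_a[-n:] * (1.0 - fade) + unit_b[:n] * fade
    return np.concatenate([unit_a[:-n], middle, unit_b[n:]])

fs = 16000
unit_a = np.sin(2 * np.pi * 200 * np.arange(800) / fs)  # stand-in for unit 1
unit_b = np.sin(2 * np.pi * 300 * np.arange(800) / fs)  # stand-in for unit 2
joined = concatenate_units(unit_a, unit_b, fs)
```

The crossfade hides the amplitude discontinuity at the boundary; it does not, by itself, fix pitch or spectral mismatches between units, which is why unit selection systems also score join costs.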
Linear predictive coding (LPC) is among the most powerful synthesis techniques in signal
processing; it expresses the spectral envelope of speech in compact form using the information
required by the linear predictive model. It is an important technique for the accurate,
economical measurement of speech parameters such as pitch, formant spectra, and vocal tract area
functions, and for the representation of speech for low-rate transmission or storage [15].
Formant synthesis, also called rule-based synthesis, is based on the source-filter model, so it
has a source for the speech signal and a filter representing the resonances of the vocal tract;
a two-pole resonator models each formant frequency and bandwidth. The first three formants are
required to produce a reasonably good-sounding voice, while computation of up to the first five
formants is necessary for high-quality, intelligible sound [24]. The harmonic plus noise model
(HNM) has
a reduced database requirement and provides a direct technique for smoothing discontinuities of
acoustic units around concatenation points, which makes the method quite efficient. HNM is a
pitch-synchronous system [21], unlike TD-PSOLA and other concatenative approaches, hence
eliminating the problem of synchronizing speech frames, and it has shown the capability of
providing high-quality prosodic modifications without the buzziness of other methods. The HNM
framework is also used in low bit-rate speech coders to increase naturalness. Research has shown
that all vowels and syllables can be produced with better quality by the implementation of HNM.
Results obtained from many speech signals, including both male and female voices, are quite
satisfactory with respect to background noise and inaccuracies in the pitch.
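The LPC analysis mentioned above fits an all-pole model to a speech frame. A compact illustrative implementation of the Levinson-Durbin recursion (our own sketch, not code from any cited system) is:

```python
import numpy as np

def lpc(frame, order):
    """Linear prediction coefficients [1, a1, ..., ap] via the
    Levinson-Durbin recursion on the frame autocorrelation."""
    n = len(frame)
    r = np.correlate(frame, frame, mode="full")[n - 1:n + order]
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for this model order.
        k = -(r[i] + np.dot(a[1:i], r[i - 1:0:-1])) / err
        a_prev = a.copy()
        for j in range(1, i + 1):
            a[j] = a_prev[j] + k * a_prev[i - j]
        err *= 1.0 - k * k
    return a

# Impulse response of a known all-pole filter: the recursion
# should recover its coefficients (1, -0.9, 0.5) almost exactly.
h = np.zeros(200)
h[0], h[1] = 1.0, 0.9
for m in range(2, 200):
    h[m] = 0.9 * h[m - 1] - 0.5 * h[m - 2]
coeffs = lpc(h, 2)
print(coeffs)
```

In a synthesizer, the resulting predictor defines the vocal tract filter, and the residual error signal plays the role of the source excitation.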
The advancement of new techniques and the acceptance of improved models for speech generation
result from the need for more human-like utterances from a machine. Earlier models produced
mechanical-sounding voices that are irritating to the listener; progress in models and systems
has improved the quality of synthetic speech considerably. The quality of the generated speech
depends on the type of model employed in the synthesizer, and although there is still no perfect
model that can generate a voice exactly like that of a human, the models discussed above have all
contributed to the generation of more natural-sounding voices.
5. METHODOLOGY
The complete procedure for the analysis of the speech signal has been divided into two parts. For
the analysis, recordings of 13 speakers were taken: five in Dogri (3 male and 2 female) and eight
in Hindi (4 male and 4 female). Speakers of different age groups, from different regions of
Jammu, were selected. The recorded data comprises 110 TIFR (Tata Institute of Fundamental
Research) sentences in Hindi, and a sequence of VCV (vowel consonant vowel) words containing all
35 consonants in medial position. The same data was used for recording in Dogri. After the
recordings in both languages, the speech was segmented manually into vowels. For estimating the
means and standard deviations of each vowel, ten instances were selected from similar contexts.
The variation in duration of each vowel was calculated as the percentage obtained by dividing the
standard deviation by the mean of the ten instances.
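The statistics described above reduce to a few lines of computation. The ten duration values here are hypothetical stand-ins, not measurements from the recordings:

```python
import numpy as np

# Hypothetical durations (ms) of ten instances of one vowel, one speaker.
durations = np.array([42.1, 44.7, 40.9, 43.5, 45.2,
                      41.8, 44.0, 42.6, 43.9, 42.3])

mean = durations.mean()
sd = durations.std(ddof=1)        # sample standard deviation (assumed form)
variation = sd / mean * 100       # variation as a percentage of the mean
print(f"mean={mean:.2f} ms, sd={sd:.2f} ms, variation={variation:.2f}%")
```

Whether the published tables use the sample or population standard deviation is an assumption here; the Table 1 Dogri values are consistent with the sample (n - 1) form.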
6. RESULTS AND CONCLUSIONS
The averaged means, standard deviations, and duration variation of all ten vowels (v1: /ah/, v2:
/aa/, v3: /ih/, v4: /iy/, v5: /ey/, v6: /ae/, v7: /uh/, v8: /uw/, v9: /uh/ and v10: /ao/) for
each speaker in both languages are shown in Tables 1 to 3 and plotted in Fig. 3 to Fig. 5 as
histograms for ease of visual interpretation. For the investigation of the vowels, the mean of
ten instances was calculated for each speaker, and this resulting value was plotted for every
vowel. The histogram for the first speaker (sp1) shows that v3 and v7 have the minimum duration
of about 40 ms, while v6 has the maximum duration of about 120 ms; v1 and v9, and v2 and v10,
have almost the same length. The height of each bar indicates the average length of the
utterances of a vowel. Similar conclusions can be drawn from the other histograms, and the whole
investigation leads to the end result that most Dogri vowels have a shorter duration than the
corresponding Hindi vowels. The duration in milliseconds of the same vowel, when spoken in Hindi,
was found to be longer than when it was spoken by a person with Dogri as his mother tongue. The
language we speak depends upon the region we belong to. Dogri is spoken in most parts of Jammu,
but tone, pronunciation, and the utterance of phonemes change from region to region, which may
cause different durations for the same phonemes; hence the results may differ for other regions
of Jammu. There are many applications directly or indirectly related to the research carried out
in this work. For instance, the main application may be transforming Dogri speech into Hindi and
vice versa; building on this, a speech aid could be developed to teach Dogri to children.
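As an illustration of how the two durational distributions can be compared, the v2 columns of Table 1 can be fed to Welch's t statistic. This sketch (computed by hand rather than with a statistics package) is our own and is not necessarily the test used in the original analysis:

```python
import math

dogri_v2 = [83.04, 17.24, 83.12, 55.58, 74.47]                       # Table 1(a)
hindi_v2 = [85.27, 77.64, 91.62, 87.79, 95.31, 70.76, 60.22, 77.66]  # Table 1(b)

def mean_var(xs):
    """Sample mean and sample variance (n - 1 denominator)."""
    m = sum(xs) / len(xs)
    v = sum((x - m) ** 2 for x in xs) / (len(xs) - 1)
    return m, v

m_d, var_d = mean_var(dogri_v2)
m_h, var_h = mean_var(hindi_v2)
# Welch's t statistic: tolerates unequal variances and sample sizes.
t = (m_d - m_h) / math.sqrt(var_d / len(dogri_v2) + var_h / len(hindi_v2))
print(f"Dogri mean={m_d:.2f} ms, Hindi mean={m_h:.2f} ms, t={t:.2f}")
```

The sign of t is negative here because the Dogri mean is smaller, in line with the conclusion above; judging statistical significance would additionally require the Welch-Satterthwaite degrees of freedom.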
Table 1. Mean of means of the duration of vowels v1–v4 spoken by all the speakers of
Dogri and Hindi language
S No. v1 v2 v3 v4
sp1 42.8 83.04 44.88 64.83
sp2 44.81 17.24 54.59 83.06
sp3 45.32 83.12 49.81 86.88
sp4 42.54 55.58 45.49 76.73
sp5 40.07 74.47 51.58 81.51
Mean(ms) 43.11 62.69 49.27 78.60
Sd(ms) 2.09 27.78 4.11 8.52
Variation(%) 4.84 44.31 8.34 10.83
a) Mean of means of the duration of vowels v1–v4 for the speakers of
Dogri language
S No. v1 v2 v3 v4
sp1 67.62 85.27 41.31 88.85
sp2 43.53 77.64 48.59 97.65
sp3 45.78 91.62 62.65 86.64
sp4 48.79 87.79 56.82 89.16
sp5 52.93 95.31 44.95 74.5
sp6 47.77 70.76 39.68 56.24
sp7 34.09 60.22 45.61 61.3
sp8 40.88 77.66 46.72 96.8
Mean(ms) 47.67 80.78 48.29 81.39
Sd(ms) 9.84 11.59 7.77 15.72
Variation(%) 20.65 14.35 16.09 19.31
b) Mean of means of the duration of vowels v1–v4 for the speakers of
Hindi language.
Table 2. Mean of means of the duration of vowels v5–v8 spoken by all the speakers of
Dogri and Hindi language
S No. v5 v6 v7 v8
sp1 54.53 19.91 65.77 90.43
sp2 74.59 111.06 82.47 84.75
sp3 73.98 99.91 68.07 93.68
sp4 65.32 71.57 59.69 80.84
sp5 57.13 86.46 61.74 86.58
Mean(ms) 65.11 77.78 67.55 87.26
Sd(ms) 9.27 35.56 8.96 4.98
Variation(%) 14.24 45.72 13.27 5.70
a) Mean of means of the duration of vowels v5–v8 for the speakers of
Dogri language.
S No. v5 v6 v7 v8
sp1 111.32 126.8 41.85 116.15
sp2 79.62 109.63 62.58 93.72
sp3 96.37 123.08 57.79 110.82
sp4 101.47 108.54 48.9 72.18
sp5 74.91 97.13 60.77 106.7
sp6 81.91 82.6 47.07 93.96
sp7 64.71 73.83 47.65 98.35
sp8 81.43 101.49 56.76 83.74
Mean(ms) 86.47 102.89 52.92 96.95
Sd(ms) 15.31 18.31 7.50 14.47
Variation(%) 17.71 17.80 14.18 14.92
b) Mean of means of the duration of vowels v5–v8 for the speakers of
Hindi language.
Table 3. Mean of means of the duration of vowels v9–v10 spoken by all the speakers of
Dogri and Hindi language.
S No. v9 v10
sp1 69.12 83.75
sp2 80.19 92.59
sp3 72.12 98.07
sp4 68.37 87.33
sp5 64.35 78.27
Mean(ms) 70.83 88.00
Sd(ms) 5.92 7.67
Variation(%) 8.36 8.72
a) Mean of means of the duration of vowels v9–v10 for the speakers of
Dogri language.
S No. v9 v10
sp1 67.62 103.45
sp2 93.74 93.45
sp3 88.65 96.06
sp4 96.44 96.45
sp5 85.5 97.49
sp6 65.56 78.01
sp7 67.59 81.18
sp8 86.41 98.94
Mean(ms) 81.44 93.13
Sd(ms) 12.56 8.87
Variation(%) 15.42 9.52
b) Mean of means of the duration of vowels v9–v10 for the speakers of
Hindi language.
Figure 3. Comparisons of mean and standard deviation of the duration of vowels v1–v4 of Hindi
and Dogri language articulated by the respective speakers. (Histogram panels for v1–v4 omitted;
vertical axis 0–400, horizontal axis speakers sp1–sp8 for Hindi and sp1–sp5 for Dogri.)
Figure 4. Comparisons of mean and standard deviation of the duration of vowels v5–v8 of Hindi
and Dogri language articulated by the respective speakers. (Histogram panels for v5–v8 omitted;
vertical axis 0–400, horizontal axis speakers sp1–sp8 for Hindi and sp1–sp5 for Dogri.)
REFERENCES
[1] Yin Hui, Hohmann Volker, & Nadeu Climent (2011) “Acoustic feature for speech recognition based
on Gammatone filterbank and instantaneous frequency”, ELSEVIER Speech Communication 53, pp
707-715.
[2] Stylianou Yannis, Cappé Olivier, & Moulines Eric (1998) “Continuous probabilistic transform for
voice conversion”, IEEE Trans. Speech and Audio Processing, Vol. 6, No. 2, pp 131-142.
[3] Chaudhari U V, Navaratil J & Maes S H (2003) “Multigrained modeling with pattern specific
maximum likelihood transformations for text-independent speaker recognition”, IEEE Trans. on
Speech and Audio Processing, Vol. 11, No. 1, pp 61-69.
[4] Alani Ahmed, & Deriche Mohamed (1999) “A novel approach to speech segmentation using the
wavelet transform”, Fifth International Symposium on Signal Processing and its Applications
(ISSPA), pp 127-130.
[5] Moataz El Ayadi, Mohamed S. Kamel, & Fakhri Karray (2011) “Survey on speech emotion
recognition: Features, classification schemes, and databases”, ELSEVIER Pattern
Recognition, pp 572-587.
[6] Park J, Delhi F, Gales M J F, Tomalin M & Woodland P C (2011) “The efficient incorporation of MLP
features into automatic speech recognition system”, ELSEVIER, Computer Speech and Languages,
pp 519-534.
[7] Lu X, Unoki M & Nakamura S (2011) “Sub-band temporal modulation envelopes and their
normalization for automatic speech recognition in reverberant environments”, ELSEVIER Computer
Speech and Language, pp 571-584.
Figure 5. Comparisons of mean and standard deviation of the duration of vowels v9–v10 of Hindi
and Dogri language articulated by the respective speakers. (Histogram panels for v9–v10 omitted;
vertical axis 0–400, horizontal axis speakers sp1–sp8 for Hindi and sp1–sp5 for Dogri.)
[8] Dubey Preeti, Pathania Shashi & Devanand (2011) “Comparative study of Hindi and Dogri
languages with regard to machine translation”, Language in India, Vol. 11, pp 298-309.
[9] Dutt Ashok K, Khan Chandrakanta C, & Sangwan Chanralekha (1985) “Spatial pattern of languages
in India: A culture-historical analysis”, GeoJournal, Vol. 10, pp 51-74.
[10] Kishore P.S, Kumar Rohit, & Sangal Rajeev (2002) “A data-driven synthesis approach for Indian
languages using syllable as basic unit”, in Conference of Natural Language Processing.
[11] Rao Sreenivasa K. (2011) “Application of prosody models for developing speech systems in
Indian languages”, International Journal of Speech Technology, Vol. 14, No. 1, pp 19-33.
[12] Flanagan J. L (1972) “Speech Analysis, Synthesis and Perception”, Springer, New York.
[13] Dudley Homer (1940) “The carrier nature of speech”, The Bell Syst. Tech. Journal, Vol. 19,
No. 4, pp 495-515.
[14] Dudley Homer, Tarnoczy T.H. (1950) “The speaking machine of Wolfgang von Kempelen”, The
Journal of the Acoustical Society of America, Vol. 22, No. 2, pp 151-166.
[15] Al-Akaidi Marwan, (2004) “Fractal Speech Processing”, The Press Syndicate of University of
Cambridge, pp 3-4.
[16] Denes P. & Pinson E. (1963) “The Speech Chain”, Bell Telephone labs, Murray Hill, New Jersey.
[17] Rabiner L R & Schafer R W (1978) “ Digital processing of speech signals,” Prentice-Hall Inc.,
Englewood Cliffs. New Jersey
[18] Furui S. & Sondhi M.M. (1992) “Advances in Speech Signal Processing”, Marcel Dekker, New York.
[19] Gold Ben & Morgan N. (2000) “Speech and Audio Signal Processing”, Wiley, New York.
[20] Herrmann C S, Friederici A D, Oertel U, Maess B, Hahne A & Alter K (2003) “ The brain generates
its own sentence melody: A Gestalt phenomenon in speech perception,” ELSEVIER Brain and
Language Vol. 85, pp 396–401.
[21] Honda Masaaki (2003) “Human speech production mechanisms”, NTT Technical Review, Vol. 1,
No. 2, pp 24-26.
[22] Taylor Paul (2009) “Text to Speech Synthesis”, Cambridge University Press, pp 147-149.
[23] Desai S, Yegnanarayana B & K Prahallad (2003) “A framework for cross-lingual voice conversion
using artificial neural networks," in Proc.of International Conference on Natural Language Processing
(ICON).
[24] Herrmann C S, Friederici A D, Oertel U, Maess B, Hahne A & Alter K (2003) “ The brain generates
its own sentence melody: A Gestalt phenomenon in speech perception,” ELSEVIER Brain and
Language Vol. 85, pp 396–401.
[25] Alam Firoj, Nath Promila Kanti, & Khan Mumit (2007) “Text To Speech for Bangla Language using
Festival”, BRAC University Institutional Repository, pp 128-133.