The document describes the implementation of a natural-sounding speech synthesizer for the Marathi language using English text input. It discusses concatenative speech synthesis using a unit selection approach. More than 28,580 syllables, words and sentences recorded from a female speaker were used to create an inventory of speech units. In testing, the synthesizer was able to generate natural-sounding output waveforms. Formant frequencies were analyzed with the MATLAB and PRAAT tools to evaluate the quality of the synthesized speech.
Approach To Build A Marathi Text-To-Speech System Using Concatenative Synthes... (IJERA Editor)
Marathi is one of the oldest languages in India. This research paper describes the development of a Marathi Text-to-Speech (TTS) system. In Marathi TTS the input is Marathi text in Unicode, and the voices are sampled from real recorded speech. The objective of a text-to-speech system is to convert an arbitrary text into its corresponding spoken waveform. Speech synthesis is the process of building machinery that can generate human-like speech from any text input to imitate human speakers. Text processing and speech generation are the two main components of a text-to-speech system. To build a natural-sounding speech synthesis system, it is essential that the text processing component produce an appropriate sequence of phonemic units. Generation of the sequence of phonetic units for a given standard word is governed by letter-to-phoneme (or text-to-phoneme) rules. The complexity of these rules and their derivation depends upon the nature of the language. The quality of a speech synthesizer is judged by its closeness to the natural human voice and by its understandability. In this research paper we describe an approach to build a Marathi TTS system using the concatenative synthesis method with the syllable as the basic unit of concatenation.
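The syllable-as-unit idea described above can be sketched in a few lines: segment a word into known syllables, then join the stored waveform of each unit. The greedy syllabifier, the toy inventory, and the pretend waveforms below are illustrative assumptions, not the paper's actual Marathi unit inventory.

```python
def syllabify(word, inventory):
    """Greedy longest-match segmentation of a word into known syllable units."""
    units, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):  # try the longest candidate first
            if word[i:j] in inventory:
                units.append(word[i:j])
                i = j
                break
        else:
            raise ValueError(f"no unit covers {word[i:]!r}")
    return units

def concatenate(units, inventory):
    """Join the stored waveforms (lists of samples) of each unit in order."""
    samples = []
    for u in units:
        samples.extend(inventory[u])
    return samples

# Toy inventory: each syllable maps to a pretend recorded waveform.
inventory = {"na": [0.1, 0.2], "ma": [0.3], "ste": [0.4, 0.5]}
units = syllabify("namaste", inventory)
wave = concatenate(units, inventory)
```

A real system would additionally smooth the joins between units; this sketch only shows the selection-and-concatenation skeleton.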
ADVANCEMENTS ON NLP APPLICATIONS FOR MANIPURI LANGUAGE (ijnlc)
Manipuri is both a minority and a morphologically rich language with genetic features similar to the Tibeto-Burman languages. It has Subject-Object-Verb (SOV) order, agglutinative verb morphology, and is monosyllabic. Morphology and syntax are not clearly distinguished in this language. Natural Language Processing (NLP) is a research field of computer science that deals with processing large amounts of natural language corpora. NLP applications encompass E-Dictionary, Morphological Analyzer, Reduplicated Multi-Word Expression (RMWE), Named Entity Recognition (NER), Part of Speech (POS) Tagging, Machine Translation (MT), WordNet, Word Sense Disambiguation (WSD), etc. In this paper, we present a study of the advancements in NLP applications for the Manipuri language, together with a comparison table of the approaches and techniques adopted and the results obtained for each application, followed by a detailed discussion of each work.
Speech is an important mode of communication and a current research topic, with attention focused mostly on synthesis and analysis. As part of the synthesis work, a text-to-speech system has been developed. Speech synthesis is the artificial production of human speech; a text-to-speech (TTS) system converts an arbitrary text into speech. In India, many different languages are spoken, each being the mother tongue of tens of millions of people. In this paper, the text-to-speech system is developed primarily for Telugu, a Dravidian language predominantly spoken in the Indian state of Andhra Pradesh. The important qualities expected from this system are naturalness and intelligibility. A Telugu TTS can be developed using synthesis methods such as articulatory synthesis, formant synthesis and concatenative synthesis. This paper describes the development of a Telugu text-to-speech system using the concatenative synthesis method on a mobile-based system, the OMAP 3530 (ARM Cortex-A8 core), under Linux.
A Corpus-Based Concatenative Speech Synthesis System for Marathi (iosrjce)
IOSR journal of VLSI and Signal Processing (IOSRJVSP) is a double blind peer reviewed International Journal that publishes articles which contribute new results in all areas of VLSI Design & Signal Processing. The goal of this journal is to bring together researchers and practitioners from academia and industry to focus on advanced VLSI Design & Signal Processing concepts and establishing new collaborations in these areas.
Design and realization of microelectronic systems using VLSI/ULSI technologies require close collaboration among scientists and engineers in the fields of systems architecture, logic and circuit design, chips and wafer fabrication, packaging, testing and systems applications. Generation of specifications, design and verification must be performed at all abstraction levels, including the system, register-transfer, logic, circuit, transistor and process levels
Artificially Generated Concatenative Syllable based Text to Speech Synthesi... (iosrjce)
Marathi Text-To-Speech Synthesis using Natural Language Processing (iosrjce)
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
Dynamic Construction of Telugu Speech Corpus for Voice Enabled Text Editor (Waqas Tariq)
In recent decades speech-interactive systems have gained increasing importance. The performance of an ASR system depends mainly on the availability of a large speech corpus. The conventional method of building a large-vocabulary speech recognizer for any language uses a top-down approach, which requires a large speech corpus with sentence- or phoneme-level transcription of the speech utterances. The transcriptions must also cover the different sound orders so that the recognizer can build models for all the sounds present. For Telugu, however, because of its complex nature, a very large, well-annotated speech database is very difficult to build. It is very difficult, if not impossible, to cover all the words of any Indian language, where each word may have thousands or millions of word forms. A significant part of the grammar that is handled by syntax in English (and similar languages) is handled within morphology in Telugu; phrases comprising several words (that is, tokens) in English map onto a single word in Telugu. The Telugu language is phonetic in nature as well as rich in morphology, which is why speech technology developed for English cannot be applied directly to it. This paper highlights work carried out to build a voice-enabled text editor with automatic term suggestion. The main claim of the paper is the recognition enhancement process we developed for highly inflecting, morphologically rich languages. This method increases speech recognition accuracy with a large reduction in corpus size, and it adapts Telugu words into the database dynamically, resulting in growth of the corpus.
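The "automatic term suggestion with dynamic corpus growth" idea in the abstract can be illustrated with a minimal sketch; the class name, the prefix-matching strategy, and the toy word list are all invented for illustration, not the paper's actual design.

```python
class TermSuggester:
    """Suggest corpus words by prefix; new words grow the corpus dynamically."""

    def __init__(self, words):
        self.corpus = set(words)

    def suggest(self, prefix, limit=3):
        # Return up to `limit` corpus words starting with the typed prefix.
        return sorted(w for w in self.corpus if w.startswith(prefix))[:limit]

    def add(self, word):
        # Dynamically adapt a newly seen word into the corpus.
        self.corpus.add(word)

ts = TermSuggester(["amma", "ammamma", "abbayi"])  # toy romanized Telugu words
first = ts.suggest("am")
ts.add("ammayi")
second = ts.suggest("amma")
```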
EXTRACTING LINGUISTIC SPEECH PATTERNS OF JAPANESE FICTIONAL CHARACTERS USING ... (kevig)
This study extracted and analyzed the linguistic speech patterns that characterize Japanese anime and game characters. Conventional morphological analyzers, such as MeCab, segment words with high accuracy, but they are unable to segment broken expressions or utterance endings that are not listed in the dictionary, which often appear in the lines of anime and game characters. To overcome this challenge, we propose segmenting the lines of Japanese anime and game characters into subword units, which were proposed mainly for deep learning, and extracting frequently occurring strings to obtain expressions that characterize their utterances. We analyzed the subword units, weighted by TF-IDF, according to gender, age, and each anime character, and show that they constitute linguistic speech patterns specific to each feature. Additionally, a classification experiment shows that the model with subword units outperformed the one using the conventional method.
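The weighting step can be approximated with a small sketch: treat character bigrams as stand-in subword units and weight them by TF-IDF per character. This is a loose illustration under invented data; the study's actual subword segmentation and corpus are not reproduced here.

```python
import math

def subwords(line, n=2):
    """Character n-grams as a crude stand-in for learned subword units."""
    return [line[i:i + n] for i in range(len(line) - n + 1)]

def tfidf(docs):
    """docs: {character_name: utterance text} -> per-character unit weights."""
    grams = {name: subwords(text) for name, text in docs.items()}
    df = {}                                  # document frequency per unit
    for units in grams.values():
        for u in set(units):
            df[u] = df.get(u, 0) + 1
    n_docs = len(docs)
    return {
        name: {
            u: units.count(u) / len(units) * math.log(n_docs / df[u])
            for u in set(units)
        }
        for name, units in grams.items()
    }

# Toy "utterance ending" strings for two invented characters.
w = tfidf({"hero": "nanoda", "rival": "dazeda"})
```

Units shared by every character get weight zero (the IDF term vanishes), so only character-specific patterns survive, which is the effect the study relies on.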
MORPHOLOGICAL ANALYZER USING THE BILSTM MODEL ONLY FOR JAPANESE HIRAGANA SENT... (kevig)
This study proposes a method for developing neural morphological-analyzer models for Japanese Hiragana sentences using the Bi-LSTM CRF model. Morphological analysis is a technique that divides text data into words and assigns information such as parts of speech. In Japanese natural language processing systems, this technique plays an essential role in downstream applications because the Japanese language does not have delimiters between words. Hiragana is a type of Japanese phonogramic character used in texts for children or for people who cannot read Chinese characters. Morphological analysis of Hiragana sentences is more difficult than that of ordinary Japanese sentences because there is less information available for word division. For morphological analysis of Hiragana sentences, we demonstrate the effectiveness of fine-tuning a model based on ordinary Japanese text and examine the influence of training data drawn from texts of various genres.
A ROBUST THREE-STAGE HYBRID FRAMEWORK FOR ENGLISH TO BANGLA TRANSLITERATION (kevig)
Phonetic typing using the English alphabet has become widely popular nowadays in social media and chat services. As a result, texts containing a mix of English and Bangla words and phrases have become increasingly common. Existing transliteration tools perform poorly on such texts. This paper proposes a robust Three-stage Hybrid Transliteration (THT) framework that can transliterate both English words and phonetically typed Bangla words satisfactorily. This is achieved by adopting a hybrid approach combining dictionary-based and rule-based techniques. Experimental results confirm the superiority of THT, as it significantly outperforms the benchmark transliteration tool.
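The dictionary-then-rules hybrid can be sketched minimally: look a token up in a transliteration dictionary first, and fall back to ordered rewrite rules. The dictionary entry and rule table below are invented toy examples, not the paper's actual resources or its full three-stage pipeline.

```python
# Stage 1 resource: known phonetically typed words (toy example).
DICTIONARY = {"dhonnobad": "ধন্যবাদ"}

# Stage 2 resource: ordered rewrite rules, longest patterns first (toy example).
RULES = [("kh", "খ"), ("aa", "া"), ("k", "ক"), ("m", "ম"), ("a", "")]

def transliterate(token):
    if token in DICTIONARY:          # stage 1: dictionary hit wins outright
        return DICTIONARY[token]
    out = token                      # stage 2: apply rewrite rules in order
    for src, dst in RULES:
        out = out.replace(src, dst)
    return out
```

Ordering the rules so that multi-letter patterns like "aa" fire before "a" is what keeps the fallback from mangling long vowels.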
INTEGRATION OF PHONOTACTIC FEATURES FOR LANGUAGE IDENTIFICATION ON CODE-SWITC... (kevig)
In this paper, phoneme sequences are used as language information to perform code-switched language identification (LID). With a one-pass recognition system, the spoken sounds are converted into phonetically arranged sequences. The acoustic models are robust enough to handle multiple languages when emulating multiple hidden Markov models (HMMs). To determine the phoneme similarity among our target languages, we report two methods of phoneme mapping. Statistical phoneme-based bigram language models (LMs) are integrated into speech decoding to eliminate possible phone mismatches. A supervised support vector machine (SVM) learns to recognize the phonetic information of mixed-language speech from the recognized phone sequences. As the back-end decision is taken by the SVM, the likelihood scores of segments with monolingual phone occurrences are used to classify the language identity. The speech corpus was tested on Sepedi and English, languages that are often mixed. Our system is evaluated by measuring the ASR performance and the LID performance separately. The systems obtained promising ASR accuracy with a data-driven phone-merging approach modelled using 16 Gaussian mixtures per state, and achieved acceptable ASR and LID accuracy on both code-switched and monolingual speech segments.
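A toy version of the phone-bigram scoring ingredient: train per-language bigram counts over phone sequences and score a recognized sequence under each language. The phone inventories below are invented; the real system feeds such likelihood scores to an SVM back-end rather than thresholding them directly.

```python
import math

def train_bigrams(sequences):
    """Count phone bigrams over a list of phone sequences."""
    counts = {}
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            counts[(a, b)] = counts.get((a, b), 0) + 1
    return counts

def score(seq, counts):
    """Add-one-smoothed log-likelihood of a phone sequence under the counts."""
    total = sum(counts.values()) + 1
    return sum(
        math.log((counts.get((a, b), 0) + 1) / total)
        for a, b in zip(seq, seq[1:])
    )

# Toy training data: recognized phone sequences per language (invented).
english = train_bigrams([["th", "e"], ["th", "i", "s"]])
sepedi = train_bigrams([["d", "u", "m"], ["r", "e", "a"]])

utterance = ["th", "e"]
lang = "en" if score(utterance, english) > score(utterance, sepedi) else "st"
```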
Quality estimation of machine translation outputs through stemming (ijcsa)
Machine translation is a challenging problem for Indian languages. New machine translators are being developed every day, but high-quality automatic translation is still a very distant dream, and a correctly translated Hindi sentence is rarely obtained. In this paper we focus on the English-Hindi language pair: to identify the best MT output we present a ranking system that employs machine learning techniques and morphological features. The ranking requires no human intervention. We have also validated our results by comparing them with human rankings.
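One morphology-aware ingredient such a ranker could use is overlap between stemmed candidate and reference tokens; the suffix-stripping toy stemmer below is an invented illustration, not the paper's actual feature set or stemming rules.

```python
# Toy suffix list for a crude English stemmer (illustrative only).
SUFFIXES = ["ing", "ed", "s"]

def stem(word):
    """Strip one known suffix if the remaining stem is long enough."""
    for suf in SUFFIXES:
        if word.endswith(suf) and len(word) > len(suf) + 2:
            return word[: -len(suf)]
    return word

def stemmed_overlap(candidate, reference):
    """Fraction of reference stems also present in the candidate."""
    c = {stem(w) for w in candidate.split()}
    r = {stem(w) for w in reference.split()}
    return len(c & r) / len(r)

score = stemmed_overlap("the boy played games", "the boys play a game")
```

Stemming lets "played"/"play" and "games"/"game" count as matches, which matters for morphologically rich target languages like Hindi.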
Transliteration by orthography or phonology for hindi and marathi to english ... (ijnlc)
e-Governance and Web-based online commercial multilingual applications have given utmost importance to the tasks of translation and transliteration. The named entities and technical terms that occur in the source language of translation are called out-of-vocabulary words, as they are not available in the multilingual corpus or dictionary used to support the translation process. These named entities and technical terms need to be transliterated from the source language to the target language without losing their phonetic properties. The fundamental problem in India is that there is no set of rules for writing the English spellings of Indian-language words according to the linguistics; people write different spellings for the same name in different places. This fact certainly affects the Top-1 accuracy of transliteration, and in turn the translation process. The major issue we noticed is the transliteration of named entities consisting of three syllables or three phonetic units in Hindi and Marathi, where people use a mixed approach to spelling, writing either orthographically or phonologically. In this paper the authors provide, through experimentation, their opinion on the appropriateness of either approach.
An Improved Approach for Word Ambiguity Removal (Waqas Tariq)
Word ambiguity removal is the task of removing ambiguity from a word, i.e. identifying the correct sense of a word in an ambiguous sentence. This paper describes a model that uses a Part of Speech tagger and three categories for word sense disambiguation (WSD). Human-computer interaction depends on improving the interaction between users and computers; to this end, supervised and unsupervised methods are combined. The WSD algorithm is used to find the efficient and accurate sense of a word based on domain information. The accuracy of this work is evaluated with the aim of finding the best-suited domain of a word.
Keywords: Human Computer Interaction, Supervised Training, Unsupervised Learning, Word Ambiguity, Word sense disambiguation
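Domain-driven sense selection can be sketched as a simple overlap test: pick the sense whose domain keywords share the most words with the context. The sense inventory and glosses below are made-up examples in the spirit of the paper's approach, not its actual resources.

```python
# Toy sense inventory: word -> {sense/domain: keyword set} (invented).
SENSES = {
    "bank": {
        "finance": {"money", "deposit", "loan", "account"},
        "river": {"water", "shore", "flow", "fish"},
    }
}

def disambiguate(word, context_words):
    """Pick the sense whose domain keywords overlap the context the most."""
    overlaps = {
        sense: len(keywords & set(context_words))
        for sense, keywords in SENSES[word].items()
    }
    return max(overlaps, key=overlaps.get)

sense = disambiguate(
    "bank", ["he", "opened", "an", "account", "to", "deposit", "money"]
)
```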
Segmentation Words for Speech Synthesis in Persian Language Based On Silence (paperpublications3)
Abstract: In speech synthesis for text-to-speech systems, words are usually broken into parts, and the recorded sound of each part is used to play the word. This paper uses the silences in a word's pronunciation to obtain better speech quality. Most algorithms divide words into syllables, and some divide words into phonemes, but this paper exploits the silences in the intonation: it divides words at silent regions and then assigns the equivalent sound of each part, so that joining the parts is reliable and the speech quality is smoother. The paper concerns the Persian language but the method is extendable to other languages. It has been tested with a MOS test, and its intelligibility, naturalness and fluidity are better.
Keywords: TTS, SBS, Syllable, Diphone.
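The core step, splitting a waveform at silent regions, can be sketched with a simple energy threshold; the threshold, the minimum silence length, and the toy signal below are illustrative assumptions, not values from the authors' system.

```python
def split_at_silence(samples, threshold=0.05, min_run=2):
    """Return non-silent segments, splitting wherever |sample| stays below
    `threshold` for at least `min_run` consecutive samples."""
    segments, current, quiet = [], [], 0
    for s in samples:
        if abs(s) < threshold:
            quiet += 1
            current.append(s)
        else:
            # A long-enough quiet run ends the previous segment here.
            if quiet >= min_run and len(current) > quiet:
                segments.append(current[:-quiet])
                current = []
            quiet = 0
            current.append(s)
    trimmed = current[:-quiet] if quiet else current  # drop trailing silence
    if trimmed:
        segments.append(trimmed)
    return segments

# Toy waveform: two loud bursts separated by two near-zero samples.
parts = split_at_silence([0.4, 0.5, 0.0, 0.01, 0.6, 0.3])
```

Each returned segment would then be matched against the recorded part inventory before concatenation.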
Implementation of Text To Speech for Marathi Language Using Transcriptions Co... (IJERA Editor)
This research paper presents an approach to converting text to speech using a new methodology. The text-to-speech conversion system enables the user to enter text in Marathi and produces sound as output. The paper presents the steps followed in converting Marathi text to speech and the algorithm used. The focus of the paper is the tokenisation process and the orthographic representation of the text, which shows the mapping of letters to sounds using a description of the language's phonetics. The main focus here is the text-to-IPA transcription concept: the system translates text to IPA transcription, which is the primary stage of text-to-speech conversion. The whole procedure for converting text to speech takes a great deal of time, as it is not an easy task and requires effort.
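The letter-to-sound mapping stage this abstract centres on can be sketched as a table-driven rewrite, preferring multi-letter graphemes over single letters. The mapping fragment below is a hypothetical romanized example, not the authors' full Marathi phonetic description.

```python
# Toy grapheme-to-IPA table (illustrative fragment only).
IPA_MAP = {
    "aa": "ɑː", "sh": "ʃ",
    "a": "ə", "n": "n", "m": "m", "r": "r", "t": "t̪", "i": "i",
}

def to_ipa(word):
    out, i = "", 0
    while i < len(word):
        # Prefer two-letter graphemes ("aa", "sh") over single letters.
        if word[i:i + 2] in IPA_MAP:
            out += IPA_MAP[word[i:i + 2]]
            i += 2
        else:
            out += IPA_MAP.get(word[i], word[i])
            i += 1
    return out

ipa = to_ipa("naam")
```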
Implementation of Marathi Language Speech Databases for Large Dictionary (iosrjce)
Phonetic Recognition In Words For Persian Text To Speech Systems (paperpublications3)
Abstract: Interest in text-to-speech synthesis has increased worldwide. Text-to-speech systems have been developed for many popular languages such as English, Spanish and French, and much research and development has been applied to those languages. Persian, on the other hand, has received little attention compared to other languages of similar importance, and research on Persian is still in its infancy. The Persian language possesses many difficulties and exceptions that increase the complexity of text-to-speech systems: for example, short vowels are absent from written text, and homograph words exist. In this paper we propose a new method for Persian text-to-phonetic conversion based on pronunciation by analogy over words, semantic relations, and grammatical rules for finding the proper phonetics.
Keywords: PbA, text to speech, Persian language, Phonetic recognition.
Title: Phonetic Recognition In Words For Persian Text To Speech Systems
Author: Ahmad Musavi Nasab, Ali Joharpour
International Journal of Recent Research in Mathematics Computer Science and Information Technology (IJRRMCSIT)
Paper Publications
HANDLING CHALLENGES IN RULE BASED MACHINE TRANSLATION FROM MARATHI TO ENGLISH (ijnlc)
Machine translation has been pursued by researchers for quite a long time. However, a flawless machine translator remains a dream to materialize, and only a small number of researchers have focused on translating Marathi text to English. Perfect machine translation systems have not yet been built, owing to the fact that languages differ syntactically as well as morphologically. The majority of researchers have opted for statistical machine translation, whereas in this paper we address the challenges of rule-based machine translation. The paper describes the major divergences observed between Marathi and English, the many challenges encountered while attempting to build a machine translation system from Marathi to English using a rule-based approach, and the rules devised to handle these challenges. As there are exceptions to the rules and limits to the feasibility of maintaining the knowledge base, practical machine translation from Marathi to English is a complex task.
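One of the structural divergences in this setting, Marathi's SOV order versus English's SVO order, lends itself to a simple reordering rule; the tag set and the pre-translated, POS-tagged input below are invented for illustration, not rules from the paper.

```python
def sov_to_svo(tagged):
    """tagged: list of (word, tag) pairs in S-O-V order -> S-V-O word order."""
    subject = [w for w, t in tagged if t == "SUBJ"]
    obj = [w for w, t in tagged if t == "OBJ"]
    verb = [w for w, t in tagged if t == "VERB"]
    return subject + verb + obj

# Marathi-style order after lexical substitution: "Ram mango eats"
# (cf. "Ram aamba khato"); the rule moves the verb before the object.
reordered = sov_to_svo([("Ram", "SUBJ"), ("mango", "OBJ"), ("eats", "VERB")])
```

Real rule-based systems need many such rules plus exception handling, which is exactly the maintenance burden the abstract points to.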
Approach of Syllable Based Unit Selection Text-To-Speech Synthesis System fo... (iosrjce)
Dynamic Construction of Telugu Speech Corpus for Voice Enabled Text EditorWaqas Tariq
In recent decades speech-interactive systems have gained increasing importance. The performance of an ASR system depends mainly on the availability of a large speech corpus. The conventional method of building a large-vocabulary speech recognizer for any language uses a top-down approach to speech. This approach requires a large speech corpus with sentence- or phoneme-level transcription of the speech utterances. The transcriptions must also cover diverse sound orderings so that the recognizer can build models for all the sounds present. But for Telugu, because of its complex nature, a very large, well-annotated speech database is very difficult to build. It is very difficult, if not impossible, to cover all the words of any Indian language, where each word may have thousands of word forms. A significant part of the grammar that is handled by syntax in English (and other similar languages) is handled within morphology in Telugu: phrases consisting of several words (that is, tokens) in English map onto a single word in Telugu. Telugu is phonetic in nature as well as rich in morphology, which is why speech technology developed for English cannot be applied to Telugu directly. This paper highlights the work carried out in building a voice-enabled text editor with automatic term suggestion. The main claim of the paper is the recognition-enhancement process we developed for highly inflecting, morphologically rich languages. The method increases speech recognition accuracy with a substantial reduction in corpus size, and it adds Telugu words to the database dynamically, so the corpus grows over time.
EXTRACTING LINGUISTIC SPEECH PATTERNS OF JAPANESE FICTIONAL CHARACTERS USING ...kevig
This study extracted and analyzed the linguistic speech patterns that characterize Japanese anime and game characters. Conventional morphological analyzers, such as MeCab, segment words with high performance, but they are unable to segment the broken expressions or utterance endings not listed in the dictionary, which often appear in the lines of anime and game characters. To overcome this challenge, we propose segmenting the lines of Japanese anime and game characters into subword units, which were proposed mainly for deep learning, and extracting frequently occurring strings to obtain expressions that characterize their utterances. We analyzed the subword units weighted by TF/IDF according to gender, age, and each anime character, and show that they are linguistic speech patterns specific to each feature. Additionally, a classification experiment shows that the model with subword units outperformed that with the conventional method.
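The TF/IDF weighting of subword units described above can be sketched in miniature. Here character bigrams stand in for the learned subword units (the study used trained subword segmentation; that substitution, and the toy data, are assumptions for brevity):

```python
import math
from collections import Counter

def char_ngrams(text, n=2):
    """Character n-grams as a stand-in for learned subword units."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]

def tfidf(docs, n=2):
    """docs: {character_name: concatenated_lines} -> per-character TF/IDF weights."""
    grams = {name: Counter(char_ngrams(t, n)) for name, t in docs.items()}
    # Document frequency: in how many characters' lines does each unit occur?
    df = Counter(g for c in grams.values() for g in set(c))
    total_docs = len(docs)
    return {name: {g: (cnt / sum(c.values())) * math.log(total_docs / df[g])
                   for g, cnt in c.items()}
            for name, c in grams.items()}
```

Units that score high for one character but are rare across the others surface as that character's distinctive speech patterns.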
MORPHOLOGICAL ANALYZER USING THE BILSTM MODEL ONLY FOR JAPANESE HIRAGANA SENT...kevig
This study proposes a method to develop neural models of the morphological analyzer for Japanese Hiragana sentences using the Bi-LSTM CRF model. Morphological analysis is a technique that divides text data into words and assigns information such as parts of speech. In Japanese natural language processing systems, this technique plays an essential role in downstream applications because the Japanese language does not have word delimiters between words. Hiragana is a type of Japanese phonogramic characters, which is used for texts for children or people who cannot read Chinese characters. Morphological analysis of Hiragana sentences is more difficult than that of ordinary Japanese sentences because there is less information for dividing. For morphological analysis of Hiragana sentences, we demonstrated the effectiveness of fine-tuning using a model based on ordinary Japanese text and examined the influence of training data on texts of various genres.
A ROBUST THREE-STAGE HYBRID FRAMEWORK FOR ENGLISH TO BANGLA TRANSLITERATIONkevig
Phonetic typing using the English alphabet has become widely popular for social media and chat services. As a result, texts containing a mix of English and Bangla words and phrases have become increasingly common, and existing transliteration tools perform poorly on them. This paper proposes a robust Three-stage Hybrid Transliteration (THT) framework that can satisfactorily transliterate both English words and phonetically typed Bangla words. This is achieved by adopting a hybrid approach of dictionary-based and rule-based techniques. Experimental results confirm the superiority of THT, as it significantly outperforms the benchmark transliteration tool.
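The dictionary-then-rules idea behind such a hybrid framework can be sketched as follows. The lookup table and character rules below are toy assumptions, and only two of THT's three stages are represented:

```python
# Hypothetical tables for illustration only.
DICTIONARY = {"hello": "হ্যালো"}                  # known English words
ROMAN_TO_BANGLA = {"k": "ক", "a": "া", "m": "ম"}  # phonetic-typing rules

def transliterate(word):
    # Stage 1: an exact dictionary hit for an English word wins outright.
    if word in DICTIONARY:
        return DICTIONARY[word]
    # Stage 2: rule-based fallback for phonetically typed Bangla;
    # unknown characters pass through unchanged.
    return "".join(ROMAN_TO_BANGLA.get(ch, ch) for ch in word)
```

The design point is the ordering: the high-precision dictionary stage handles known words, and the rules give coverage for everything else.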
INTEGRATION OF PHONOTACTIC FEATURES FOR LANGUAGE IDENTIFICATION ON CODE-SWITC...kevig
In this paper, phoneme sequences are used as language information to perform code-switched language
identification (LID). With a one-pass recognition system, the spoken sounds are converted into
phonetically arranged sound sequences. The acoustic models are robust enough to handle multiple
languages while emulating multiple hidden Markov models (HMMs). To determine phoneme similarity
among our target languages, we report two methods of phoneme mapping. Statistical phoneme-based
bigram language models (LMs) are integrated into speech decoding to eliminate possible phone
mismatches. A supervised support vector machine (SVM) learns to recognize the phonetic
information of mixed-language speech from the recognized phone sequences. As the back-end decision is
taken by the SVM, the likelihood scores of segments with monolingual phone occurrences are used to
classify language identity. The speech corpus was tested on Sepedi and English, languages that are often
mixed. Our system is evaluated by measuring the ASR performance and the LID performance
separately. The systems obtained promising ASR accuracy with a data-driven phone-merging
approach modelled using 16 Gaussian mixtures per state, and achieved acceptable ASR and LID
accuracy on both code-switched and monolingual speech segments.
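The phoneme-based bigram language model the abstract integrates into decoding can be sketched with add-one smoothing. The phone strings below are placeholders, not the actual Sepedi or English inventories:

```python
import math
from collections import Counter

def train_bigram_lm(phone_seqs):
    """Return a scorer giving the add-one-smoothed log probability of a phone sequence."""
    bigrams, unigrams, inventory = Counter(), Counter(), set()
    for seq in phone_seqs:
        inventory.update(seq)
        for p1, p2 in zip(seq, seq[1:]):
            bigrams[(p1, p2)] += 1
            unigrams[p1] += 1
    vocab = len(inventory)

    def logprob(seq):
        # Add-one smoothing keeps unseen phone pairs from scoring -inf.
        return sum(math.log((bigrams[(p1, p2)] + 1) / (unigrams[p1] + vocab))
                   for p1, p2 in zip(seq, seq[1:]))

    return logprob
```

Training one such LM per language and comparing the scores of a recognized phone sequence is one simple back-end, alongside the SVM decision the paper actually uses.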
Quality estimation of machine translation outputs through stemmingijcsa
Machine translation is a challenging problem for Indian languages. New machine
translators are being developed every day, but high-quality automatic translation is still a very distant goal,
and a correctly translated Hindi sentence is rarely produced. In this paper we focus on
the English-Hindi language pair: to identify the best MT output we present a ranking system
that employs machine learning techniques and morphological features. The ranking requires no human
intervention. We validate our results by comparing them with human rankings.
Transliteration by orthography or phonology for hindi and marathi to english ...ijnlc
e-Governance and web-based online commercial multilingual applications have given great importance to
the tasks of translation and transliteration. Named entities and technical terms that occur in the source
language are called out-of-vocabulary words, as they are not available in the multilingual
corpus or dictionary used to support the translation process. These named entities and technical terms need
to be transliterated from the source language to the target language without losing their phonetic properties. The
fundamental problem in India is that there is no set of rules for writing English spellings of
Indian-language words according to their linguistics: people write different spellings for the same name in
different places. This fact affects the Top-1 accuracy of transliteration and, in turn, of the
translation process. The major issue we noticed is the transliteration of named entities consisting of three
syllables or three phonetic units in Hindi and Marathi, where people use a mixed approach to
spelling, either orthographical or phonological. In this paper the authors
give their opinion, supported by experiments, on the appropriateness of either approach.
An Improved Approach for Word Ambiguity RemovalWaqas Tariq
Word ambiguity removal is the task of removing ambiguity from a word, i.e. identifying the correct sense of a word in an ambiguous sentence. This paper describes a model that uses a part-of-speech tagger and three categories for word sense disambiguation (WSD). Improving interactions between users and computers requires such disambiguation, so supervised and unsupervised methods are combined. The WSD algorithm finds an efficient and accurate sense of a word based on domain information, and the accuracy of this work is evaluated with the aim of finding the best-suited domain of a word. Keywords: Human Computer Interaction, Supervised Training, Unsupervised Learning, Word Ambiguity, Word Sense Disambiguation
Segmentation Words for Speech Synthesis in Persian Language Based On Silencepaperpublications3
Abstract: In text-to-speech synthesis, words are usually broken into parts and the recorded sound of each part is used to play the word. This paper uses the silences in a word's pronunciation to obtain better speech quality. Most algorithms divide words into syllables, and some divide words into phonemes; this paper instead exploits the silent regions in the intonation, dividing words at silences and then assigning an equivalent sound to each part, so that joining the parts is reliable and the speech is smoother. The work concerns Persian but is extendable to other languages. The method has been evaluated with a MOS test, and intelligibility, naturalness and fluidity all improved.
Keywords: TTS, SBS, Syllable, Diphone.
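The silence-driven word splitting the abstract describes can be approximated with a frame-energy threshold. The frame length and threshold below are arbitrary illustration values, not the paper's settings:

```python
def split_on_silence(samples, frame_len=4, threshold=0.1):
    """Split a waveform into parts wherever mean frame energy drops below threshold."""
    segments, current = [], []
    for i in range(0, len(samples), frame_len):
        frame = samples[i:i + frame_len]
        energy = sum(s * s for s in frame) / len(frame)
        if energy < threshold:
            if current:                 # a silence closes the running segment
                segments.append(current)
                current = []
        else:
            current.extend(frame)
    if current:
        segments.append(current)
    return segments

# Loud, silent, loud -> two segments.
wave = [0.5] * 4 + [0.0] * 4 + [0.5] * 4
print(len(split_on_silence(wave)))  # 2
```

A production system would use overlapping windows and a noise-adaptive threshold rather than a fixed one.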
Implementation of Text To Speech for Marathi Language Using Transcriptions Co...IJERA Editor
This research paper presents an approach to converting text to speech using a new methodology. The
text-to-speech conversion system lets the user enter text in Marathi and produces sound as output. The paper presents
the steps followed for converting Marathi text to speech and the algorithm used. The focus of
this paper is the tokenisation process and the orthographic representation of the text, which shows the
mapping of letters to sounds using a description of the language's phonetics. The main focus here is the
text-to-IPA transcription concept: the system translates text into an IPA transcription, which is the primary
stage of text-to-speech conversion. The whole procedure for converting text to speech is time-consuming,
as it is not an easy task and requires considerable effort.
Implementation of Marathi Language Speech Databases for Large Dictionaryiosrjce
Phonetic Recognition In Words For Persian Text To Speech Systemspaperpublications3
Abstract: Interest in text-to-speech synthesis has increased worldwide. Text-to-speech systems have been developed for many popular languages such as English, Spanish and French, and much research and development has been applied to those languages. Persian, on the other hand, has received little attention compared to other languages of similar importance, and research on Persian is still in its infancy. Persian poses many difficulties and exceptions that increase the complexity of text-to-speech systems: for example, short vowels are absent from written text, and homograph words exist. In this paper we propose a new method for Persian text-to-phonetic conversion based on pronunciation by analogy over words, semantic relations and grammatical rules for finding the proper phonetic form. Keywords: PbA, text to speech, Persian language, Phonetic recognition.
Title: Phonetic Recognition In Words For Persian Text To Speech Systems
Author: Ahmad Musavi Nasab, Ali Joharpour
International Journal of Recent Research in Mathematics Computer Science and Information Technology (IJRRMCSIT)
Paper Publications
A Context-based Numeral Reading Technique for Text to Speech Systems IJECEIAES
This paper presents a novel technique for context-based numeral reading in Indian-language text-to-speech systems. The model uses a set of rules to determine the context of a numeral's pronunciation and is integrated with the waveform-concatenation technique to produce speech from input text in Indian languages. Three Indian languages are considered: Odia, Hindi and Bengali. To analyze the performance of the proposed technique, a set of experiments is performed over different contexts of numeral pronunciation, and the results are compared with an existing syllable-based technique. The results obtained from the different experiments show the effectiveness of the proposed technique in producing intelligible speech from the entered text utterances compared to the existing technique, while requiring far less storage and execution time.
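The kind of context rule such a model applies can be sketched as below. The cue words and categories are illustrative assumptions, not the paper's actual rule set:

```python
import re

def read_numeral(token, prev_word=""):
    """Pick a pronunciation context for a numeral from simple textual cues."""
    # A four-digit 19xx/20xx figure after a year cue reads as a year:
    # "in 1947" -> nineteen forty-seven, not one thousand nine hundred...
    if re.fullmatch(r"(19|20)\d\d", token) and prev_word.lower() in {"year", "in"}:
        return "year"
    if ":" in token:
        return "time"      # e.g. "10:30" -> ten thirty
    if "." in token:
        return "decimal"   # e.g. "3.14" -> digit-by-digit after the point
    return "cardinal"      # default: plain number reading
```

The TTS front end would then hand the chosen category, together with the digits, to the waveform-concatenation stage.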
An expert system for automatic reading of a text written in standard arabicijnlc
In this work we present our expert system for automatic reading, or speech synthesis, of text
written in Standard Arabic. Our work is carried out in two major stages: the creation of the sound
database, and the transformation of the written text into speech (Text To Speech, TTS). This transformation is
done first by a Phonetic Orthographical Transcription (POT) of any written Standard Arabic text, with
the aim of transforming it into its corresponding phonetic sequence, and second by the generation of
the voice signal corresponding to the transcribed chain. We lay out the design of
the system, as well as the results obtained, compared to other works that realize TTS for
Standard Arabic.
A Marathi Hidden-Markov Model Based Speech Synthesis Systemiosrjce
Grapheme-To-Phoneme Tools for the Marathi Speech SynthesisIJERA Editor
We describe in detail a Grapheme-to-Phoneme (G2P) converter required for the development of a good-quality
Marathi Text-to-Speech (TTS) system. The Festival and Festvox framework was chosen for developing the
Marathi TTS system. Since Festival does not provide complete language-processing support specific to every
language, it needs to be augmented to facilitate the development of TTS systems for certain new languages.
For this reason, a generic G2P converter has been developed. In the customized Marathi G2P converter we
have handled schwa deletion and compound-word extraction. In experiments carried out to test the Marathi
G2P on a text segment of 2485 words, 91.47% word-phonetisation accuracy was obtained. This Marathi G2P has
been used for phonetising large text corpora, which in turn are used to design an inventory of phonetically rich
sentences. The sentences ensured good coverage of the phonetically valid diphones using only 1.3% of the
complete text corpora.
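The word-final schwa-deletion step such a converter handles can be sketched on romanized input. The letter-to-phone map is a toy assumption, not the Festival/Festvox lexicon the authors used:

```python
# Hypothetical letter-to-phone map for romanized input.
LETTER_TO_PHONE = {"r": "r", "a": "a", "m": "m", "k": "k"}

def g2p(word):
    phones = [LETTER_TO_PHONE.get(ch, ch) for ch in word]
    # Marathi (like Hindi) drops the inherent schwa at the end of a word:
    # orthographic "rama" is pronounced /ram/.
    if len(phones) > 1 and phones[-1] == "a":
        phones.pop()
    return phones

print(g2p("rama"))  # ['r', 'a', 'm']
```

Real schwa deletion is context-sensitive (it also applies word-medially under syllabification constraints), which is where most of the remaining phonetisation errors come from.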
Kannada Phonemes to Speech Dictionary: Statistical ApproachIJERA Editor
The input or output of a natural language processing system can be either written text or speech. To process written text we need lexical, syntactic and semantic knowledge of the language, discourse information, and real-world knowledge; to process spoken language we need everything required to process written text, along with the challenges of speech recognition and speech synthesis. This paper describes how the articulatory phonetics of Kannada is used to generate a phoneme-to-speech dictionary for Kannada; a statistical computational approach is used to map the elements taken from an input query or document. Articulatory phonetics concerns the place of articulation of a consonant: the point of contact where an obstruction occurs in the vocal tract between an active articulator, typically some part of the tongue, and a passive location, typically some part of the roof of the mouth. Along with the manner of articulation and the phonation, this gives the consonant its distinctive sound. Results are presented for the approach.
Title: Segmentation Words for Speech Synthesis in Persian Language Based On Silence
Author: Sohrab Hojjatkhah, Ali Jowharpour
International Journal of Recent Research in Mathematics Computer Science and Information Technology (IJRRMCSIT)
Paper Publications
Emotional telugu speech signals classification based on k nn classifiereSAT Journals
Abstract: Speech processing is the study of speech signals and the methods used to process them. It is employed in applications such as speech coding, speech synthesis, speech recognition and speaker recognition. In speech classification, computing prosody effects from speech signals plays a major role, and in emotional speech signals pitch and frequency are the most important parameters. Normally the pitch of sad and happy speech differs greatly, and the frequency of happy speech is higher than that of sad speech; but in some cases the frequency of happy speech is nearly the same as that of sad speech, or vice versa, making it difficult to recognize the emotion. To reduce this problem, we propose a Telugu speech-emotion classification system with three features, Energy Entropy, Short Time Energy and Zero Crossing Rate, and a K-NN classifier. Features are extracted from the speech signals and given to the K-NN. Implementation results show the effectiveness of the proposed system in classifying Telugu speech signals based on their prosody effects. The performance of the proposed system is evaluated by cross-validation on the Telugu speech database. Keywords: Emotion Classification, K-NN classifier, Energy Entropy, Short Time Energy, Zero Crossing Rate.
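Two of the three features and the K-NN back-end can be sketched from first principles. The toy signals and k=1 are illustration assumptions; the paper's frame handling and Energy Entropy feature are not reproduced:

```python
def zero_crossing_rate(signal):
    """Fraction of adjacent sample pairs whose signs differ."""
    return sum(1 for a, b in zip(signal, signal[1:]) if a * b < 0) / (len(signal) - 1)

def short_time_energy(signal):
    """Mean squared amplitude of the signal."""
    return sum(s * s for s in signal) / len(signal)

def knn_classify(sample, labelled, k=1):
    """Majority vote among the k nearest (feature_vector, label) neighbours."""
    def dist(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5
    feats = (zero_crossing_rate(sample), short_time_energy(sample))
    neighbours = sorted(labelled, key=lambda fv: dist(fv[0], feats))[:k]
    labels = [lab for _, lab in neighbours]
    return max(set(labels), key=labels.count)
```

High-frequency, high-energy signals land near "happy" training points; flat, low-energy ones near "sad", which is exactly the prosody separation the abstract relies on.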
Phrase identification is one of the most critical and widely studied tasks in natural language processing (NLP). Identifying verb phrases within a sentence is useful for a variety of NLP applications, and one of the core enabling technologies required is morphological analysis. This paper presents a Myanmar verb-phrase identification and translation algorithm and develops a Markov model with morphological analysis. The system is based on a rule-based maximum-matching approach. In machine translation, a large amount of information is needed to guide the translation process. Myanmar is an inflected language, and very few lexicons have been created or researched for it compared to languages such as English, French and Czech. Therefore, this paper proposes a Myanmar verb-phrase identification and translation model based on the syntactic structure and morphology of Myanmar, using a Myanmar-English bilingual lexicon. A Markov model is also used to reformulate the translation probability of phrase pairs. Experimental results show that the proposed system improves translation quality by applying morphological analysis to Myanmar.
A COMPREHENSIVE ANALYSIS OF STEMMERS AVAILABLE FOR INDIC LANGUAGES ijnlc
Stemming is the process of term conflation: it conflates all the variants of a word to a common form called the stem. It plays a significant role in numerous natural language processing (NLP) applications such as morphological analysis, parsing, document summarization, text classification, part-of-speech tagging, question-answering systems, machine translation, word sense disambiguation and information retrieval (IR). Each of these tasks requires some pre-processing, and stemming is one of the important building blocks for all of them. This paper presents an overview of various stemming techniques, evaluation criteria for stemmers, and existing stemmers for Indic languages.
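The simplest family of stemmers such surveys cover, longest-match suffix stripping, can be sketched as follows. The suffix list is a toy illustration for Hindi-style endings, not any published Indic stemmer:

```python
# Hypothetical suffix table, ordered longest-first so the longest match wins.
SUFFIXES = ["iyon", "iyan", "on", "en", "e", "i"]

def stem(word, min_stem=2):
    """Strip the first (longest) matching suffix, keeping at least min_stem chars."""
    for suf in SUFFIXES:
        if word.endswith(suf) and len(word) - len(suf) >= min_stem:
            return word[: -len(suf)]
    return word

print(stem("ladkiyon"))  # ladk
```

The `min_stem` guard is the standard safeguard against over-stemming short words, one of the evaluation criteria such comparisons measure.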
The primary goal of this paper is to provide an overview of existing Text-To-Speech (TTS) techniques, highlighting their usage and advantages. First-generation techniques include formant synthesis and articulatory synthesis. Formant synthesis works with individually controllable formant filters, which can be set to produce accurate approximations of the vocal-tract transfer function; articulatory synthesis produces speech by directly modelling human articulator behaviour. Second-generation techniques comprise concatenative synthesis and sinusoidal synthesis. Concatenative synthesis generates speech output by concatenating segments of recorded speech, and generally produces natural-sounding synthesized speech. Sinusoidal synthesis uses a harmonic model and decomposes each frame into a set of harmonics of an estimated fundamental frequency; the model parameters are the amplitudes and periods of the harmonics, with which the value of the fundamental can be changed while keeping the same basic spectral shape. In addition, third-generation techniques include Hidden Markov Model (HMM) synthesis and unit-selection synthesis. HMM synthesis trains a parametric model and produces high-quality speech, while unit selection operates by selecting the best sequence of units from a large speech database that matches the specification.
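At its core, the concatenative approach described above reduces to unit lookup and joining. The inventory below is a placeholder; a real system stores diphone or syllable waveforms and smooths the joins:

```python
# Hypothetical unit inventory: unit name -> recorded samples.
UNIT_DB = {"he": [0.1, 0.2], "lo": [0.3, 0.4]}

def synthesize(units):
    """Concatenate the recorded samples of each requested unit, in order."""
    out = []
    for u in units:
        if u not in UNIT_DB:
            raise KeyError(f"no recording for unit {u!r}")
        out.extend(UNIT_DB[u])
    return out
```

Unit selection generalizes this: when several recordings exist per unit, the best-matching sequence is chosen by minimizing target and join costs rather than taking the single stored copy.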
High Quality Arabic Concatenative Speech Synthesissipij
This paper describes the use of TD-PSOLA tools to improve the quality of an Arabic Text-to-Speech (TTS) system based on diphone concatenation with a TD-PSOLA modifier synthesizer. It presents techniques for improving the precision of prosodic modifications in Arabic speech synthesis using the TD-PSOLA (Time Domain Pitch Synchronous Overlap-Add) method, an approach based on decomposing the signal into overlapping frames synchronized with the pitch period. The main objective is to preserve the consistency and accuracy of the pitch marks after prosodic modification of the speech signal, together with adjustment and optimisation of a diphone database with integrated vowels.
On Developing an Automatic Speech Recognition System for Commonly used Englis...rahulmonikasharma
Speech is one of the easiest and fastest ways to communicate, and recognition of speech by computer across languages is a challenging task. The accuracy of automatic speech recognition (ASR) remains one of the key challenges even after years of research; it varies with speaker and language variability, vocabulary size and noise, as well as with design issues such as the speech database, feature-extraction techniques and performance evaluation. This paper describes the development of a speaker-independent, isolated-word automatic speech recognition system for Indian English. The acoustic model is built using Carnegie Mellon University (CMU) Sphinx tools. The corpus is based on the English words most commonly used in everyday life, and the speech database includes recordings of 76 Punjabi speakers (north-west Indian English accent). After testing, the system obtained an accuracy of 85.20% when trained using 128 GMMs (Gaussian Mixture Models).
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
UiPath Test Automation using UiPath Test Suite series, part 4DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 4. In this session, we will cover Test Manager overview along with SAP heatmap.
The UiPath Test Manager overview with SAP heatmap webinar offers a concise yet comprehensive exploration of the role of a Test Manager within SAP environments, coupled with the utilization of heatmaps for effective testing strategies.
Participants will gain insights into the responsibilities, challenges, and best practices associated with test management in SAP projects. Additionally, the webinar delves into the significance of heatmaps as a visual aid for identifying testing priorities, areas of risk, and resource allocation within SAP landscapes. Through this session, attendees can expect to enhance their understanding of test management principles while learning practical approaches to optimize testing processes in SAP environments using heatmap visualization techniques
What will you get from this session?
1. Insights into SAP testing best practices
2. Heatmap utilization for testing
3. Optimization of testing processes
4. Demo
Topics covered:
Execution from the test manager
Orchestrator execution result
Defect reporting
SAP heatmap example with demo
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Generating a custom Ruby SDK for your web service or Rails API using Smithyg2nightmarescribd
Have you ever wanted a Ruby client API to communicate with your web service? Smithy is a protocol-agnostic language for defining services and SDKs. Smithy Ruby is an implementation of Smithy that generates a Ruby SDK using a Smithy model. In this talk, we will explore Smithy and Smithy Ruby to learn how to generate custom feature-rich SDKs that can communicate with any web service, such as a Rails JSON API.
GraphRAG is All You need? LLM & Knowledge GraphGuy Korland
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Tobias Schneck
As AI technology is pushing into IT I was wondering myself, as an “infrastructure container kubernetes guy”, how get this fancy AI technology get managed from an infrastructure operational view? Is it possible to apply our lovely cloud native principals as well? What benefit’s both technologies could bring to each other?
Let me take this questions and provide you a short journey through existing deployment models and use cases for AI software. On practical examples, we discuss what cloud/on-premise strategy we may need for applying it to our own infrastructure to get it to work from an enterprise perspective. I want to give an overview about infrastructure requirements and technologies, what could be beneficial or limiting your AI use cases in an enterprise environment. An interactive Demo will give you some insides, what approaches I got already working for real.
Epistemic Interaction - tuning interfaces to provide information for AI supportAlan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
Accelerate your Kubernetes clusters with Varnish CachingThijs Feryn
A presentation about the usage and availability of Varnish on Kubernetes. This talk explores the capabilities of Varnish caching and shows how to use the Varnish Helm chart to deploy it to Kubernetes.
This presentation was delivered at K8SUG Singapore. See https://feryn.eu/presentations/accelerate-your-kubernetes-clusters-with-varnish-caching-k8sug-singapore-28-2024 for more details.
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...UiPathCommunity
💥 Speed, accuracy, and scaling – discover the superpowers of GenAI in action with UiPath Document Understanding and Communications Mining™:
See how to accelerate model training and optimize model performance with active learning
Learn about the latest enhancements to out-of-the-box document processing – with little to no training required
Get an exclusive demo of the new family of UiPath LLMs – GenAI models specialized for processing different types of documents and messages
This is a hands-on session specifically designed for automation developers and AI enthusiasts seeking to enhance their knowledge in leveraging the latest intelligent document processing capabilities offered by UiPath.
Speakers:
👨🏫 Andras Palfi, Senior Product Manager, UiPath
👩🏫 Lenka Dulovicova, Product Program Manager, UiPath
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
F017163443
IOSR Journal of Computer Engineering (IOSR-JCE)
e-ISSN: 2278-0661, p-ISSN: 2278-8727, Volume 17, Issue 1, Ver. VI (Jan.–Feb. 2015), PP 34-43
www.iosrjournals.org
DOI: 10.9790/0661-17163443
Implementation of English-Text to Marathi-Speech (ETMS) Synthesizer

Sunil S. Nimbhore (1), Ghanshyam D. Ramteke (2), Rakesh J. Ramteke (3)
(1) Assistant Professor, Department of Computer Science & IT, Dr. B. A. M. University, Aurangabad, India
(2) Research Scholar, School of Computer Sciences, North Maharashtra University, Jalgaon, India
(3) Professor, School of Computer Sciences, North Maharashtra University, Jalgaon, India
Abstract: This paper describes the implementation of a natural-sounding speech synthesizer for the Marathi language using English script as input. The synthesizer is developed using unit selection on the basis of the concatenative synthesis approach; its purpose is to produce natural Marathi speech by computer. The natural Marathi words and sentences were acquired from 'Marathi Wordnet', a resource widely referred to by Marathi linguists. In this system, around 28,580 syllables, natural words and sentences were used. These units were spoken by one female speaker, and the voice signals were recorded through a standard Sennheiser HD449 wired headphone using the PRAAT tool at a sampling frequency of 22 kHz. The ETMS system was tested and generated natural-sounding output as well as waveforms. The formant frequencies (F1, F2 and F3) were also determined with the MATLAB and PRAAT tools, and the results were found to be satisfactory.
Keywords: Formant Frequency, Natural Synthesizer, Concatenative, LPC, Speech Corpus.
I. Introduction
Nowadays, the Human-Computer Interface (HCI) is familiar and is used to increase efficiency in various fields. Speech is the most effective means of communication for human beings, and speech-based interfaces between humans and computers play an important role in day-to-day life. New technologies have recently been adopted for effective and user-friendly communication in digital systems; however, an English-Text to Marathi-Speech (ETMS) system is not yet commercially available. A synthesizer can incorporate a model of the vocal tract and other human voice characteristics to create a completely "synthetic" voice output. The quality of a speech synthesizer is judged by its similarity to the human voice and by its understandability.
Formants are physically defined as poles in a system function expressing the characteristics of the vocal tract, so their existence can be demonstrated clearly, and various forms of the formant-tracking spectrum can be analyzed and synthesized [1]. Formant frequencies play a vital role in the classification of speech signals, and formant-based techniques are therefore reliable to compute and suitable for speech synthesis. The Marathi numeral, vowel and word speech signals are stored in a speech corpus, and the first three formant frequencies (F1, F2 and F3) are extracted from these signals [2]. The two most popular techniques for formant frequency estimation are [A] LPC analysis and [B] Cepstral analysis. The experimental work here is done by LPC analysis using the MATLAB and PRAAT tools.
The main objective of the present paper is to report the design and development of a system whose input is English text and whose output is the corresponding spoken text in the Marathi language. The synthesized spoken words are also analyzed, and their formant frequencies are determined with tools available in MATLAB and PRAAT; these formant frequencies indicate the quality of the synthesized spoken words.
The paper is organized as follows. Section I introduces the natural-sounding speech synthesizer and formant frequencies. Section II describes Marathi digits, vowels, consonants and words written in Devnagari script. Section III describes the concatenative synthesis approach based on unit selection. Section IV describes the methods for determining formant frequencies using software tools available in MATLAB and PRAAT. Section V describes the acquisition of the text and speech corpus for the Marathi language. Section VI deals with the experimental work and discussion of results. Finally, the paper is concluded in Section VII.
II. Description of Devnagari Script Language
Devnagari script is used worldwide by millions of people; the Marathi, Hindi and Sanskrit languages are written using it. The structure and grammar of the Marathi language are similar to those of Hindi. Marathi is primarily spoken in the state of Maharashtra (India), where it is the official language, and all students in Maharashtra study Devnagari-script Marathi as the first language at school level.
Marathi is spoken throughout Maharashtra state, which covers a vast geographical area consisting of 36 districts with differing dialects. The major dialects of Marathi are Standard Marathi and Varhadi; other sub-dialects include Ahirani, Dangi, Vadvali, Samavedi, Khandeshi and Malwani. However, Standard Marathi is the official language of Maharashtra, so it is essential to do research in this domain.

The written digits (0-9), 12 vowels, 34 consonants and some words, in English and Devnagari scripts together, are shown in Table 1 [3]. Text written in English script can be rendered in Marathi as illustrated in Table 1; for example, the word 'bharat' in English script is rendered as 'भारत' in Marathi.
Table 1. Digits, Vowels, Consonants and Words Written in Marathi and English Script

DIGITS (अंक / ANK)
  English script: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9
  Devnagari script: ०, १, २, ३, ४, ५, ६, ७, ८, ९
VOWELS (स्वर / SWARAS)
  English script: a, aa, i, ee, u, oo, ae, aae, o, ou, am, aha
  Devnagari script: अ, आ, इ, ई, उ, ऊ, ए, ऐ, ओ, औ, अं, अः
CONSONANTS (व्यंजन / VANJNAS)
  English script: ka, kha, ga, gha, nga, cha, chcha, ja, jha, ta, tha, da, dda, nha, tta, ththa, da, dha, na, pa, pha, ba, bha, ma, ya, ra, la, va, sha, sshha, sa, ha, lla, ksha, gnya
  Devnagari script: क, ख, ग, घ, ङ, च, छ, ज, झ, ट, ठ, ड, ढ, ण, त, थ, द, ध, न, ऩ, प, फ, ब, भ, म, य, र, ल, ळ, ऴ, व, श, ऱ, ष, स
WORDS (शब्द / SHABDAS)
  English script: bharat, aajoba, chandichya, zadala, antaralat
  Devnagari script: भारत, आजोबा, चाांदीच्या, डाळऩेच, झाडाऱा, अांतरालात, etc.
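The English-script to Devnagari mapping of Table 1 amounts to a lookup table. The following is a minimal illustrative sketch, not the paper's implementation: the dictionary holds only a few sample entries, whereas the actual system draws on the full Marathi Wordnet corpus.

```python
# Illustrative lookup of romanized Marathi input against a Devnagari
# word table, as in Table 1. The entries here are only a small sample.
WORD_TABLE = {
    "bharat": "भारत",
    "aajoba": "आजोबा",
}

def to_devanagari(english_word: str) -> str:
    """Return the Devnagari form of a romanized word, if known;
    otherwise return the input unchanged."""
    return WORD_TABLE.get(english_word.lower(), english_word)

print(to_devanagari("Bharat"))
```

A full system would, of course, fall back to syllable-level rules for out-of-vocabulary words rather than returning the input unchanged.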
III. Concatenative Synthesis Approach
The concatenative speech synthesis approach plays an important role in implementing natural synthesized speech for the Marathi language. Concatenative synthesis generates natural speech by concatenating pre-recorded segments, and it produces intelligible and natural synthetic speech, usually close to the real voice of a person [4, 5]. However, concatenative synthesizers are limited to one speaker and one voice. Differences between the natural variation in speech signals and the nature of the automated techniques used to segment the waveforms can degrade the audible output. Unit selection synthesis, a sub-type of the concatenative synthesis approach, is described in detail in the next section [6, 7].
Unit Selection Synthesis
Unit selection is the natural extension of second-generation concatenative systems. Unit selection synthesis requires a large corpus of recorded speech. During corpus creation, each recorded utterance is segmented into digits, vowels, consonants, words and sentences. The segmentation is done using visual representations such as the speech waveform, pitch track and spectrogram.

An index of the units in the speech database is then created based on the segmentation and on acoustic parameters such as the fundamental frequency (F0, pitch), duration, position in the syllable, and neighboring phones [8]. At runtime, the desired target utterance is produced by finding the best chain of units from the database (unit selection).
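The "best chain of units" search described above can be sketched as a dynamic program over per-unit target costs and pairwise join costs. This is a generic illustration, not the paper's actual cost functions; the unit names and cost values below are invented for the example.

```python
def best_chain(candidates, target_cost, join_cost):
    """Pick the cheapest unit sequence: one unit per position, scored by
    per-unit target cost plus the join cost between adjacent units."""
    # best[u] = (total cost, path) for chains ending in unit u
    best = {u: (target_cost(0, u), [u]) for u in candidates[0]}
    for pos in range(1, len(candidates)):
        new = {}
        for u in candidates[pos]:
            # cheapest predecessor for unit u
            prev = min(best, key=lambda p: best[p][0] + join_cost(p, u))
            c, path = best[prev]
            new[u] = (c + join_cost(prev, u) + target_cost(pos, u), path + [u])
        best = new
    return min(best.values(), key=lambda v: v[0])[1]

# toy example: two positions, two candidate units each (invented costs)
tcost = {"pa1": 0.0, "pa2": 0.5, "ch1": 0.2, "ch2": 0.1}
jcost = {("pa1", "ch1"): 0.5, ("pa1", "ch2"): 0.1,
         ("pa2", "ch1"): 0.0, ("pa2", "ch2"): 0.9}
chain = best_chain([["pa1", "pa2"], ["ch1", "ch2"]],
                   lambda pos, u: tcost[u],
                   lambda p, u: jcost[(p, u)])
print(chain)  # ['pa1', 'ch2']
```

In a real unit-selection system, the target cost would compare pitch, duration and phonetic context against the specification, and the join cost would measure the spectral mismatch at each concatenation point.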
Unit selection provides extreme naturalness because only a small amount of digital signal processing (DSP) is applied to the recorded speech. DSP often makes recorded speech sound less natural, although some systems use a small amount of signal processing at the points of concatenation to smooth the waveform. Unit-selection systems produce the best output from a real human voice; however, they require a large speech database [9, 10]. Fig. 1 represents the process flow of the natural-sounding speech synthesizer for the Marathi language and is described below.
Fig.1. Process Flow of Natural Sounding Speech Synthesizer for Marathi language
First, the input text is taken in the form of digits, vowels, consonants, words or sentences. The corresponding units are then fetched and matched from the speech database. The speech signals corresponding to the text were normalized, and noise was eliminated, using Audacity, PRAAT and a voice activity detection (VAD) algorithm. The units of the speech signal were then concatenated using the concatenative synthesis approach.

The system was thus able to produce the sound corresponding to the input text and to generate the synthesized speech waveform. The process flow diagram is implemented with the MATLAB tool, and the experimental work was carried out with a GUI-based tool designed for analysis and for producing the natural synthetic speech.
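The concatenation step itself can be sketched as follows: each matched unit waveform is joined to the previous one with a short crossfade to smooth the join. This is a minimal illustration, assuming NumPy arrays as the waveform representation; the unit names and the sine-tone "database" are placeholders, not the paper's corpus.

```python
import numpy as np

SR = 22050  # 22 kHz sampling rate, as used for the paper's corpus

def crossfade_concat(units, fade=220):
    """Concatenate unit waveforms with a linear crossfade of `fade` samples
    (about 10 ms at 22 kHz) to smooth each concatenation point."""
    out = units[0].astype(float)
    ramp = np.linspace(0.0, 1.0, fade)
    for u in units[1:]:
        u = u.astype(float)
        # fade the tail of the running output into the head of the next unit
        out[-fade:] = out[-fade:] * (1 - ramp) + u[:fade] * ramp
        out = np.concatenate([out, u[fade:]])
    return out

# toy "database": two 0.1 s sine-tone units standing in for syllables
t = np.arange(int(0.1 * SR)) / SR
db = {"aa": np.sin(2 * np.pi * 220 * t), "cha": np.sin(2 * np.pi * 330 * t)}
speech = crossfade_concat([db["aa"], db["cha"]])
print(len(speech))  # 4190 samples: two 2205-sample units overlapped by 220
```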
IV. Formant Frequency Detection Technique
Formants are defined as the spectral peaks of the sound spectrum of a person's voice. In speech and phonetics, the acoustic resonances of the human vocal tract are referred to as formant frequencies [11]. The vocal tract emphasizes certain frequencies, which appear clearly in the spectrum of the acoustic signal; these frequencies constitute the formants of the vocal signal. After calculating the smoothed spectrum, the amplitudes corresponding to the vocal-tract resonances can be extracted: the peaks of the smoothed spectrum correspond to the resonances, i.e., the formants, and can easily be obtained by localizing the spectral maxima within frequency bands [12]. The two popular methods, Cepstral analysis and LPC analysis, have been used here to determine the formant frequencies of speech signals.
[A] Cepstral Analysis
In this section a model for formant estimation based on cepstral analysis is presented. The spectral envelope is represented by computing the power spectrum from the Fourier transform, executing an inverse Fourier transform of the logarithm of that power spectrum, and retaining just the low-order coefficients of this inverse; the overall result is called the cepstrum of the signal. The pitch period is estimated, and smoothed log magnitudes are obtained from the cepstrum. The formants are then estimated from the smoothed spectral envelope using constraints on formant frequency ranges and on the relative levels of spectral peaks at the formant frequencies. The vocal signal results from the convolution of the source with the contribution of the vocal tract, and this technique is designed to separate those signal components [13].
Fig.2. Formant Frequency Estimation Using Cepstral Analysis
The speech signal is represented as

s(n) = g(t) ⊗ h(t)    (1)

where ⊗ denotes convolution, s(n) is the value of the speech signal at the nth point, and g(t) and h(t) are the contributions of the excitation and the vocal tract respectively. The cepstrum method represents the spectral envelope by taking the inverse Fourier transform of the logarithm of the power spectrum, as shown in equation (2):

ç(n) = FFT^-1(log(FFT(s(n))))    (2)

where ç(n) is the cepstrum coefficient of the speech signal at the nth point. At this stage the excitation g(t) and the vocal-tract shape h(t) are superimposed; they can be separated by conventional signal processing such as temporal filtering. In fact, the low-order terms of the cepstrum contain the information relative to the vocal tract, so the two components can be separated, and the envelope peaks picked, by simple temporal windowing of the cepstrum.
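Equation (2) and the low-order windowing (liftering) step can be sketched directly with FFTs. This is a minimal illustration under stated assumptions: the frame is a synthetic two-tone signal rather than recorded speech, and the lifter length of 30 coefficients is a typical choice, not a value taken from the paper.

```python
import numpy as np

def cepstral_envelope(frame, n_keep=30):
    """Smoothed log-spectral envelope via the real cepstrum, as in eq. (2):
    c(n) = IFFT(log|FFT(s(n))|), then keep only low-order terms."""
    log_mag = np.log(np.abs(np.fft.fft(frame)) + 1e-10)
    cep = np.fft.ifft(log_mag).real            # real cepstrum c(n)
    lifter = np.zeros_like(cep)
    lifter[:n_keep] = 1.0                      # keep low-order terms
    lifter[-(n_keep - 1):] = 1.0               # symmetric counterpart
    return np.fft.fft(cep * lifter).real       # smoothed log spectrum

sr, n = 22050, 1024
t = np.arange(n) / sr
# toy frame: strong 500 Hz component plus a weaker 1500 Hz one
frame = np.sin(2 * np.pi * 500 * t) + 0.5 * np.sin(2 * np.pi * 1500 * t)
env = cepstral_envelope(frame * np.hamming(n))
peak_hz = np.argmax(env[:n // 2]) * sr / n
print(round(peak_hz))  # envelope peak, near the strongest resonance
```

Formant candidates are then the local maxima of `env`, filtered by the frequency-range constraints mentioned above.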
[B] LPC Analysis
Linear prediction is a good tool for the analysis of speech signals. The linear prediction model treats the human vocal tract as an infinite impulse response (IIR) system which produces the speech signal. In speech coding, the success of LPC is explained by the fact that an all-pole model is a reasonable approximation of the transfer function of the vocal tract. All-pole models are also suitable in terms of human hearing, because the ear is more sensitive to spectral peaks than to spectral valleys. Hence, an all-pole model is useful not only as a physical model of the signal, but also as a perceptually expressive parametric representation of a speech signal [14].

The speech signals are analyzed and the formant frequencies estimated on the basis of the linear predictive coding (LPC) technique; Fig. 3 shows the LPC-based processing step by step. All cepstral and linear prediction coefficients (12 coefficients) have been computed from the pre-emphasized speech signal using 512-point Hamming-windowed speech frames [15].
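The LPC formant path of Fig. 3 can be sketched as: pre-emphasize, window with a 512-point Hamming window, fit a 12th-order all-pole model, and read formant candidates off the angles of the poles. This is an illustrative NumPy sketch, not the paper's MATLAB code; the ridge term, the 0.9 pole-radius threshold and the damped-sine test frame are assumptions added for numerical robustness and demonstration.

```python
import numpy as np

def lpc_coeffs(frame, order=12):
    """Autocorrelation-method LPC: solve the normal equations R a = r."""
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    R += np.eye(order) * 1e-6 * r[0]           # small ridge for stability
    a = np.linalg.solve(R, r[1:order + 1])
    return np.concatenate([[1.0], -a])         # A(z) = 1 - sum_k a_k z^-k

def formants(frame, sr, order=12):
    frame = np.append(frame[0], frame[1:] - 0.97 * frame[:-1])  # pre-emphasis
    frame = frame * np.hamming(len(frame))     # 512-point Hamming window
    roots = np.roots(lpc_coeffs(frame, order))
    # one pole per conjugate pair, close to the unit circle (narrow bandwidth)
    roots = roots[(np.imag(roots) > 0) & (np.abs(roots) > 0.9)]
    freqs = np.angle(roots) * sr / (2 * np.pi)
    return sorted(f for f in freqs if f > 90)  # discard near-DC poles

# toy frame: a damped 700 Hz resonance, 512 samples at 22.05 kHz
sr, n = 22050, 512
t = np.arange(n) / sr
frame = np.sin(2 * np.pi * 700 * t) * np.exp(-60 * t)
print(formants(frame, sr))  # candidate formant frequencies in Hz
```

For real speech, the three lowest surviving candidates are taken as F1, F2 and F3, optionally checked against typical formant-frequency ranges.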
Fig.3. Formant Frequency Estimation Using LPC Analysis
V. Database Acquisition
[A] Text Corpus
The main objective of the text corpus design in this study is to construct a minimal but sufficient speech corpus for the concatenative synthesizer to reproduce the speaker's natural voice. It is important that the text corpus contains the words and sentences that frequently appear in speech, so that speech can be synthesized naturally and comprehensively.

The text corpus for this research work was created from 'Marathi Wordnet', a dictionary approved by Marathi linguists. Words and sentences were selected so as to avoid repetition of common words, which decreases the size of the database while enhancing the overall quality of the speech synthesizer. Some natural words with sentences are shown in Table 2. All Marathi words and sentences used in this research work are taken from the Marathi Wordnet dictionary, which is considered a standard for the Marathi language. For the present research study, around 28,580 natural syllables and phonemes were selected in the testing phase.
Table 2. Marathi Words and Sentences Written in English and Marathi Script

Word (English script) | Word (Devnagari) | Sentence (English script) | Sentence (Devnagari)
aachari | आचारी | Aachari plyane varan halvat hota | आचारी ऩळ्याने ळरण हऱळत होता.
aajoba | आजोबा | Aajobansathi nave dhotrache pan aanale | आजोबाांसाठी नळे धोतराचे ऩान आणऱे.
zadala | झाडाऱा | Zadala hirvya rangachi pane yetat | झाडाऱा हहरव्या रांगाची ऩाने येतात.
antaralat | अांतरालात | Antaralat khup sury ahet | अांतरालात खूऩ सूयय आहेत.
bharat | भारत | Bharat v Pakistan yanchyatil criketcha samana changlach rangto | भारत ळ ऩाकिस्तान याांच्यातीऱ कििे टचा सामना चाांगऱाच रांगतो.
chandichya | चाांदीच्या | Mithai chandichya varkhat gundalleli hoti | ममठाई चाांदीच्या ळखायत गांडालऱेऱी होती.
[B] Speech Corpus
The phonetically rich natural Marathi words and sentences were taken from the text corpus. They were spoken by a single female speaker, 22 years of age; concatenative speech synthesis is restricted to one speaker and one voice for producing natural and intelligible speech signals. The natural words and sentences were acquired through a standard Sennheiser HD-449 wired headphone. The data acquisition was done in a 12' x 10' x 12' lab at normal room temperature, through the PRAAT tool with a sampling frequency of 22 kHz. The natural syllables were spoken in a continuous rhythm with a small gap between successive words. The speech corpus comprised Marathi numbers (319), vowels (12), consonants (34), words (5,058) and sentences (1,855), stored in .wav file format. The size of the speech corpus was 1.2 GB.
VI. Experimental Work, Results and Discussion
The formant frequencies have been determined by [A] Cepstral and [B] LPC analysis; the experiment was done with the standard PRAAT and MATLAB tools. The LPC technique was used to estimate the formant frequencies, denoted F1, F2 and F3, and various undefined signals were ignored. To obtain the formants, the experiment was carried out with different values of the prediction order and varying degrees of pre-emphasis. When windowing speech, boundary effects need to be taken into account in order to avoid large prediction errors at the edges of the analysis region.

The synthesized speech signals were processed with cepstral and LPC coefficients, and the peaks of the smoothed spectrum are denoted as formants F1, F2 and F3. For this experimental work a number of samples, comprising Marathi numbers, vowels and words, were taken for analysis, and the formant frequencies of the synthesized speech signals were extracted on the basis of LPC analysis using the PRAAT and MATLAB tools.
The input string and speech units are matched against the speech and text corpora; each speech unit is selected from the speech corpus and then concatenated to produce the natural synthetic speech. Fig. 4 and Fig. 5 show the formant track and waveform of the Marathi spoken numbers 'ऩाच' and 'नऊ हजार आठऴे सदसष्ट' respectively. Similarly, the waveform and formant track of the Marathi spoken vowel 'ओ' and the Marathi word 'आचारी' are shown in Fig. 6 and Fig. 7 respectively. Table 3 gives the first three formant frequencies of the synthesized speech for Marathi numbers, together with their durations.
The PRAAT tool was used to analyze the synthesized speech signals of numbers, vowels and words; the estimated formant frequencies (F1, F2 and F3) vary from 540 to 2992 Hz, with standard deviations from 45 to 616 Hz. With the LPC-based MATLAB tool, the estimated formant frequencies (F1, F2 and F3) vary from 145 to 831 Hz, with standard deviations from 43 to 138 Hz. The formant frequencies determined by the PRAAT and MATLAB tools are denoted PT and MT respectively. These results, shown in Tables 3-8, were found to be satisfactory: the implemented system provides good accuracy and produces high-quality synthesized speech.
Fig.4. Waveform and Formant Track of Synthesized Speech for Marathi Number '५'
Fig.5. Waveform and Formant Track of Synthesized Speech for Marathi Number '९८६७'
Fig.6. Waveform and Formant Track of Synthesized Speech for Marathi Vowel 'ओ'
Fig.7. Waveform and Formant Track of Synthesized Speech for Marathi Word 'आचारी'
Table 3. Estimated Formant Frequencies of Marathi Numerals by LPC using PRAAT (PT) Tool.

English Number   Marathi Number   Phonetic Form   Duration (s)   F1 (Hz)   F2 (Hz)   F3 (Hz)
8 ८ आठ 0.30 596.67 914.37 1773.43
39 ३९ एिोणचालीस 0.69 435.55 974.56 2718.47
44 ४४ चौरेचालीस 0.73 438.74 714.38 2665.99
135 १३५ एिऴे ऩस्तीस 1.30 707.40 1526.45 3237.21
475 ४७५ चारऴे ऩांचाहत्तर 1.49 632.48 1200.04 2518.39
838 ८३८ आठऴे अडोतीस 1.09 431.92 946.94 2722.69
1785 १७८५ एि हजार सातऴे ऩांच्याऐऴी 2.08 520.93 1012.98 2668.22
2699 २६९९ दोन हजार सहाऴे नव्याण्णळ 2.22 459.41 912.24 2640.74
4555 ४५५५ चार हजार ऩाचऴे ऩांचाळन्न 2.27 623.78 1034.33 2260.65
5000 ५००० ऩाच हजार 0.93 607.38 934.06 1996.00
6689 ६६८९ सहा हजार सहाऴे एिोणनव्ळद 2.20 502.87 894.80 2434.45
9867 ९८६७ नऊ हजार आठऴे सदसष्ट 2.39 523.82 1193.40 2720.16
Mean 540.08 1021.55 2529.70
Standard Deviation(SD) 87.89 197.03 364.94
Table 4. Estimated Formant Frequencies of Marathi Vowels by LPC using PRAAT (PT) Tool.

English Script   Phonetic Form   Duration (s)   F1 (Hz)   F2 (Hz)   F3 (Hz)
a अ 0.54 570.26 750.73 1098.7
aa आ 0.71 812.73 995.19 1466.48
i इ 0.73 288.75 668.44 3108.02
ee ई 0.74 305.88 648.03 3056.11
u उ 0.58 335.91 594.73 2717.26
oo ऊ 0.59 334.59 657.51 2305.26
ae ए 0.66 445.38 657.55 2231.12
aae ऐ 0.66 444.89 661.57 2254.37
o ओ 0.75 494.88 805.54 1532.29
ou औ 0.68 432.32 664.74 2423.56
am अां 0.54 496.10 842.48 1608.33
aha अ् 0.50 727.88 947.20 1775.34
Mean 474.13 741.14 2131.40
Standard Deviation(SD) 156.49 123.51 616.49
Table 5. Estimated Formant Frequencies of Marathi Words by LPC using PRAAT (PT) Tool.

English Script   Phonetic Form   Duration (s)   F1 (Hz)   F2 (Hz)   F3 (Hz)
Aachari आचारी 0.55 468.37 947.29 2760.29
Aajoba आजोबा 0.48 397.92 700.61 2769.87
Zadala झाडाऱा 0.63 566.78 1135.21 3291.71
Vastu ळस्त 0.47 421.96 1148.27 3559.05
Antaralat अांतरालात 0.65 472.41 745.98 2308.18
Aadhar आधार 0.36 474.64 888.78 3254.87
Balachi बालाची 0.75 503.02 1119.99 2821.13
Banachi बाणाची 0.67 520.60 1351.08 2998.12
Bharat भारत 0.39 441.13 1156.05 3125.76
Chandichya चाांदीच्या 0.60 465.74 921.34 2809.42
Criketacha कििे टचा 0.62 410.10 1136.36 3144.37
Davpech डाळऩेच 0.47 464.96 999.27 3065.15
Mean 467.30 1020.85 2992.33
Standard Deviation(SD) 45.58 180.54 310.93
Table 6. Estimated Formant Frequencies of Marathi Numerals by LPC using MATLAB (MT) Tool.

English Number   Marathi Number   Phonetic Form   Duration (s)   F1 (Hz)   F2 (Hz)   F3 (Hz)
8 ८ आठ 0.30 213.49 289.41 644.67
39 ३९ एिोणचालीस 0.69 102.96 228.77 812.80
44 ४४ चौरेचालीस 0.73 159.28 221.64 683.77
135 १३५ एिऴे ऩस्तीस 1.30 162.59 248.51 875.83
475 ४७५ चारऴे ऩांचाहत्तर 1.49 131.37 237.46 768.37
838 ८३८ आठऴे अडोतीस 1.09 123.55 226.59 854.25
1785 १७८५ एि हजार सातऴे ऩांच्याऐऴी 2.08 150.12 282.87 632.12
2699 २६९९ दोन हजार सहाऴे नव्याण्णळ 2.22 167.47 306.35 875.34
4555 ४५५५ चार हजार ऩाचऴे ऩांचाळन्न 2.27 176.92 286.99 577.39
5000 ५००० ऩाच हजार 0.93 217.38 288.63 570.08
6689 ६६८९ सहा हजार सहाऴे एिोणनव्ळद 2.20 264.70 271.09 872.86
9867 ९८६७ नऊ हजार आठऴे सदसष्ट 2.39 141.30 273.20 626.15
Mean 167.59 263.46 732.80
Standard Deviation(SD) 45.35 29.30 122.52
Table 7. Estimated Formant Frequencies of Marathi Vowels by LPC using MATLAB (MT) Tool.

English Script   Phonetic Form   Duration (s)   F1 (Hz)   F2 (Hz)   F3 (Hz)
a अ 0.54 209.59 263.51 564.94
aa आ 0.71 311.10 413.73 671.39
i इ 0.73 100.88 240.20 831.95
ee ई 0.74 104.90 116.14 847.33
u उ 0.58 149.31 220.40 831.22
oo ऊ 0.59 110.16 221.08 851.47
ae ए 0.66 158.93 249.86 545.71
aae ऐ 0.66 158.86 250.64 578.24
o ओ 0.75 166.87 273.64 847.62
ou औ 0.68 165.83 285.89 554.35
am अां 0.54 219.66 327.77 559.48
aha अ् 0.50 273.08 366.25 561.33
Mean 177.43 269.09 687.09
Standard Deviation(SD) 65.51 76.13 140.44
Table 8. Estimated Formant Frequencies of Marathi Words by LPC using MATLAB (MT) Tool.

English Script   Phonetic Form   Duration (s)   F1 (Hz)   F2 (Hz)   F3 (Hz)
Aachari आचारी 0.55 93.62 295.22 862.11
Aajoba आजोबा 0.48 93.52 233.07 851.31
Zadala झाडाऱा 0.63 103.01 275.13 843.23
Vastu ळस्त 0.47 118.92 272.58 753.80
Antaralat अांतरालात 0.65 158.28 249.57 830.39
Aadhar आधार 0.36 254.93 269.90 773.24
Balachi बालाची 0.75 107.61 275.94 858.01
Banachi बाणाची 0.67 266.22 267.52 857.79
Bharat भारत 0.39 102.01 269.17 854.11
Chandichya चाांदीच्या 0.60 91.99 315.96 781.73
Criketacha कििे टचा 0.62 199.95 259.52 856.32
Davpech डाळऩेच 0.47 150.7 165.64 852.93
Mean 145.06 262.44 831.25
Standard Deviation(SD) 63.10 36.83 38.56
VII. Conclusion
This research work has reported the implementation of a natural-sounding speech synthesizer for Marathi. A characteristic of the concatenative approach is that the output is restricted to the single speaker and single voice from which the unit inventory was recorded, which helps it produce natural and intelligible speech signals. The experiments were carried out with the DSP tools available in MATLAB. Formant frequencies were determined by formant-detection techniques based on LPC and cepstral analysis using the MATLAB and PRAAT tools: the spectral peaks extracted from the synthesized speech signals are denoted F1, F2 and F3, and the results are reported in Tables 3-8. These results indicate good, high-quality performance; the system produces natural synthetic speech and generates the waveform corresponding to the input text.
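The cepstral analysis route mentioned in the conclusion can also be sketched briefly: low-quefrency liftering of the real cepstrum yields a smoothed spectral envelope whose peaks mark the formants. The NumPy sketch below is illustrative only, using a synthetic two-formant signal and invented parameters rather than the authors' actual MATLAB/PRAAT procedure:

```python
import numpy as np

fs = 8000
rng = np.random.default_rng(1)

# Synthetic vowel-like signal: white noise through an all-pole filter with
# resonances at 500 Hz and 1500 Hz (invented, as in a two-formant vowel).
poles = [0.97 * np.exp(2j * np.pi * f / fs) for f in (500.0, 1500.0)]
a = np.real(np.poly(poles + [p.conjugate() for p in poles]))
e = rng.standard_normal(8192)
x = np.zeros_like(e)
for n in range(len(e)):
    x[n] = e[n] - sum(a[k] * x[n - k] for k in range(1, len(a)) if n >= k)

# Average the log-magnitude spectra of a few frames, then take the real cepstrum.
N = 2048
frames = [x[i * N:(i + 1) * N] * np.hamming(N) for i in range(4)]
log_mag = np.mean([np.log(np.abs(np.fft.rfft(f)) + 1e-12) for f in frames], axis=0)
cep = np.fft.irfft(log_mag)

# Low-quefrency liftering keeps only the slowly varying spectral envelope.
L = 30
lifter = np.zeros_like(cep)
lifter[:L] = 1.0
lifter[-(L - 1):] = 1.0                    # mirror half of the symmetric cepstrum
envelope = np.fft.rfft(cep * lifter).real

# Formants = the two strongest interior peaks of the smoothed envelope.
freqs = np.fft.rfftfreq(N, d=1.0 / fs)
peaks = [i for i in range(1, len(envelope) - 1)
         if envelope[i - 1] < envelope[i] > envelope[i + 1]]
top2 = sorted(peaks, key=lambda i: envelope[i], reverse=True)[:2]
f1, f2 = sorted(freqs[i] for i in top2)    # near 500 Hz and 1500 Hz
```

Compared with the LPC root method, the cepstral envelope makes no all-pole assumption, at the cost of frequency resolution set by the lifter cutoff.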
Acknowledgements
The authors are indebted and thankful to Dr. S. C. Mehrotra, Professor (Ramanujan Geospatial Chair), Department of Computer Science & IT, Dr. Babasaheb Ambedkar Marathwada University, Aurangabad (MS), for valuable suggestions and discussions. Author G. D. Ramteke would like to thank North Maharashtra University, Jalgaon, for financial support through the Raisoni Doctoral Fellowship.