IOSR Journal of VLSI and Signal Processing (IOSRJVSP) is a double-blind peer-reviewed international journal that publishes articles contributing new results in all areas of VLSI design and signal processing. The goal of this journal is to bring together researchers and practitioners from academia and industry to focus on advanced VLSI design and signal processing concepts and to establish new collaborations in these areas. The design and realization of microelectronic systems using VLSI/ULSI technologies require close collaboration among scientists and engineers in the fields of systems architecture, logic and circuit design, chip and wafer fabrication, packaging, testing, and systems applications. Generation of specifications, design, and verification must be performed at all abstraction levels, including the system, register-transfer, logic, circuit, transistor, and process levels.
Approach To Build A Marathi Text-To-Speech System Using Concatenative Synthes... (IJERA Editor)
Marathi is one of the oldest languages in India. This research paper describes the development of a Marathi Text-to-Speech (TTS) system. The input to the Marathi TTS system is Marathi text in Unicode, and the voices are sampled from real recorded speech. The objective of a text-to-speech system is to convert arbitrary text into its corresponding spoken waveform. Speech synthesis is the process of building machinery that can generate human-like speech from any text input, imitating a human speaker. Text processing and speech generation are the two main components of a text-to-speech system. To build a natural-sounding speech synthesis system, it is essential that the text-processing component produce an appropriate sequence of phonemic units. Generating the sequence of phonetic units for a given standard word is governed by letter-to-phoneme (or text-to-phoneme) rules, whose complexity and derivation depend on the nature of the language. The quality of a speech synthesizer is judged by its closeness to the natural human voice and by its intelligibility. This research paper describes an approach to building a Marathi TTS system using the concatenative synthesis method with the syllable as the basic unit of concatenation.
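The syllable-as-unit idea above can be sketched in a few lines: split the input word into known syllables, look up each syllable's recorded waveform, and join the samples. The inventory, syllable names, and waveforms below are hypothetical placeholders, not the paper's corpus.

```python
# Minimal sketch of syllable-based concatenative synthesis (hypothetical
# inventory; a real Marathi TTS would use recorded Unicode-syllable wave
# files and linguistic syllabification rules).

SYLLABLE_WAVES = {
    "na": [0.1, 0.2, 0.3],   # placeholder waveforms; real units are
    "ma": [0.4, 0.5],        # recorded speech samples
    "ste": [0.6, 0.7, 0.8],
}

def syllabify(word, inventory):
    """Greedy longest-match split of a word into known syllables."""
    units, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):
            if word[i:j] in inventory:
                units.append(word[i:j])
                i = j
                break
        else:
            raise ValueError(f"no syllable covers {word[i:]!r}")
    return units

def synthesize(word):
    """Concatenate the stored waveforms of the word's syllables."""
    samples = []
    for unit in syllabify(word, SYLLABLE_WAVES):
        samples.extend(SYLLABLE_WAVES[unit])
    return samples

print(syllabify("namaste", SYLLABLE_WAVES))  # ['na', 'ma', 'ste']
```

A full system would additionally smooth the unit boundaries (e.g. by cross-fading) to minimize audible discontinuities.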
A Corpus-Based Concatenative Speech Synthesis System for Marathi (iosrjce)
A Marathi Hidden-Markov Model Based Speech Synthesis System (iosrjce)
Speech to text conversion for visually impaired person using µ-law companding (iosrjce)
This paper presents the overall design and implementation of a DSP-based speech recognition and text conversion system. Speech is usually the preferred mode of operation for human beings, and this paper presents voice-oriented commands converted into text. We intended to perform the entire speech processing in real time, which involves simultaneously accepting input from the user and using software filters to analyse the data. The comparison is then established using correlation and µ-law companding techniques. Voice recognition is carried out using MATLAB, and the voice commands are person-independent. The voice commands are stored in the database with the help of the function keys. The real-time input speech is then processed in the speech recognition system, where the required features of the spoken words are extracted, filtered, and matched with the existing samples stored in the database. The required MATLAB processing then converts the received data into text form.
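The µ-law companding step mentioned above is a fixed, invertible amplitude transform that boosts low-level samples before comparison; a minimal sketch (standard µ = 255, samples normalized to [-1, 1]) is:

```python
import math

MU = 255  # standard value for 8-bit telephony mu-law

def mu_compress(x, mu=MU):
    """mu-law compand a sample in [-1, 1]: boosts low amplitudes."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)

def mu_expand(y, mu=MU):
    """Inverse transform: recover the original sample."""
    return math.copysign(((1 + mu) ** abs(y) - 1) / mu, y)

x = 0.01
y = mu_compress(x)          # small amplitudes map to much larger codes
print(y > 0.2, abs(mu_expand(y) - x) < 1e-12)
```

Companding before correlation makes quiet parts of the utterance contribute more evenly to the match score.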
Text independent speaker identification system using average pitch and forman... (ijitjournal)
The aim of this paper is to design a closed-set, text-independent speaker identification system using average pitch and speech features from formant analysis. The speech features represented by the speech signal are characterized by formant analysis (power spectral density). We designed two methods: one for average pitch estimation based on autocorrelation, and the other for formant analysis. The average pitches of the speech signals are calculated and combined with the formant analysis. A performance comparison of the proposed method with some existing methods shows that the speaker identification system built with the proposed method is superior to the others.
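Autocorrelation-based average pitch estimation, as in the first method above, can be sketched as follows: pick the lag within a plausible pitch range that maximizes the signal's autocorrelation, then convert that lag back to a frequency. The sampling rate and test tone below are illustrative.

```python
import math

def average_pitch(signal, fs, fmin=50.0, fmax=500.0):
    """Estimate pitch (Hz) as the autocorrelation peak within the
    plausible lag range [fs/fmax, fs/fmin]."""
    lo, hi = int(fs / fmax), int(fs / fmin)
    best_lag, best_r = lo, float("-inf")
    for lag in range(lo, min(hi, len(signal) - 1) + 1):
        r = sum(signal[n] * signal[n - lag] for n in range(lag, len(signal)))
        if r > best_r:
            best_lag, best_r = lag, r
    return fs / best_lag

fs = 8000
tone = [math.sin(2 * math.pi * 200 * n / fs) for n in range(800)]
print(round(average_pitch(tone, fs)))  # 200
```

On real speech this is applied frame by frame to voiced segments, and the per-frame estimates are averaged.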
IRJET- Designing and Creating Punjabi Speech Synthesis System using Hidden Ma... (IRJET Journal)
This document describes the design of a Punjabi speech synthesis system using Hidden Markov Models. It discusses collecting Punjabi text from various domains to build a speech corpus. Features are extracted from the text and stored in a database. The system has offline and online phases, where the database is created offline and text-to-speech conversion occurs online. Hidden Markov Models are used for statistical parametric speech synthesis, modeling acoustic features like fundamental frequency, duration, and spectrum. The system breaks text into phonetic units like phonemes and diphones to generate waveforms for natural-sounding synthesized speech.
This document summarizes a research paper on developing a syllable-based speech recognition system for the Myanmar language. The proposed system has three main components: feature extraction, phone recognition, and decoding. Feature extraction transforms speech into acoustic feature vectors. Phone recognition computes likelihoods of acoustic observations given linguistic units like phones. Decoding uses acoustic and language models to find the most likely sequence of words. The paper discusses building acoustic and language models for Myanmar. The acoustic model is trained using Hidden Markov Models and Gaussian mixture models. The language model is an n-gram model built using syllable segmentation of text. Developing the first speech recognition system for Myanmar poses technical challenges due to its tonal syllabic structure.
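The syllable-segmented n-gram language model described above can be illustrated with a toy bigram model; the syllable sequences below are made-up stand-ins, not actual Myanmar data.

```python
from collections import Counter

# Toy syllable-level bigram language model (illustrative syllable
# sequences; a real Myanmar LM is trained on syllable-segmented text).
corpus = [
    ["min", "ga", "la", "ba"],
    ["min", "ga", "la", "ba"],
    ["ga", "la", "min"],
]

bigrams = Counter()
unigrams = Counter()
for sent in corpus:
    syls = ["<s>"] + sent          # sentence-start marker
    unigrams.update(syls)
    bigrams.update(zip(syls, syls[1:]))

def prob(next_syl, prev_syl):
    """Maximum-likelihood bigram probability P(next | prev)."""
    return bigrams[(prev_syl, next_syl)] / unigrams[prev_syl]

print(prob("ga", "min"))  # 2 of the 3 'min' tokens are followed by 'ga'
```

During decoding these probabilities are combined with the acoustic model's likelihoods to rank candidate syllable sequences; a deployed model would also need smoothing for unseen bigrams.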
Implementation of English-Text to Marathi-Speech (ETMS) Synthesizer (IOSR Journals)
This document summarizes an implementation of an English-text to Marathi-speech synthesizer. The synthesizer uses a unit selection approach based on concatenative synthesis to produce natural sounding Marathi speech from English text input. Over 28,000 Marathi syllables, words and sentences were recorded from a female speaker and used to create the speech corpus. Formant frequencies (F1, F2, F3) were analyzed from the synthesized speech using MATLAB and PRAAT tools to evaluate the quality and naturalness of the output.
We propose a model for carrying out deep-learning-based multimodal sentiment analysis. The MOUD dataset is used for experimentation. We developed two parallel text-based and audio-based models and then fused these heterogeneous feature maps, taken from intermediate layers, to complete the architecture. The performance measures (accuracy, precision, recall and F1-score) are observed to outperform the existing models.
Development of Quranic Reciter Identification System using MFCC and GMM Clas... (IJECEIAES)
The document presents a study on developing a Quranic reciter identification system using Mel-frequency Cepstral Coefficients (MFCC) and Gaussian Mixture Model (GMM) classifier. Researchers collected 15 Quranic audio samples from 5 reciters and used 10 samples for training GMM models and 5 samples for testing. The system achieved 100% recognition rate for identifying the 5 reciters, and was also able to reject unknown samples.
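GMM-based scoring of the kind described can be sketched with hand-set one-dimensional mixtures standing in for EM-trained models over MFCC vectors: each candidate reciter's model scores the test features, and the highest total log-likelihood wins.

```python
import math

# Sketch of GMM scoring for reciter identification. The parameters
# below are hand-set toy values over 1-d features; real models are
# EM-trained diagonal-covariance GMMs over MFCC vectors.
MODELS = {
    "reciter_A": [(0.5, 0.0, 1.0), (0.5, 2.0, 1.0)],  # (weight, mean, var)
    "reciter_B": [(1.0, 6.0, 1.0)],
}

def log_likelihood(xs, mixture):
    """Sum of per-frame log-likelihoods under a Gaussian mixture."""
    total = 0.0
    for x in xs:
        p = sum(w * math.exp(-(x - m) ** 2 / (2 * v)) /
                math.sqrt(2 * math.pi * v)
                for w, m, v in mixture)
        total += math.log(p)
    return total

def identify(xs):
    """Pick the model that explains the observed features best."""
    return max(MODELS, key=lambda r: log_likelihood(xs, MODELS[r]))

print(identify([0.2, 1.9, 0.7]))  # reciter_A
```

Rejection of unknown reciters, as in the paper, can be added by thresholding the winning log-likelihood.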
Artificially Generated of Concatenative Syllable based Text to Speech Synthesi... (iosrjce)
This document describes a Marathi text-to-speech (TTS) synthesis system based on a concatenative approach using syllables as the basic speech units. The system analyzes input text, performs syllabification based on linguistic rules, retrieves corresponding speech files from a corpus, concatenates the files while minimizing discontinuities at boundaries, and outputs synthesized speech. A Marathi speech corpus was created containing over 1000 sentences from various domains. Subjective quality tests found the synthesized speech to have naturalness and intelligibility comparable to natural speech. The system demonstrates an effective approach for Marathi TTS using a syllable-based concatenative method.
Myanmar news summarization using different word representations (IJECEIAES)
There is an enormous amount of information available from different sources and genres. To extract useful information from such massive data, an automatic mechanism is required. Text summarization systems assist with content reduction by keeping the important information and filtering out the non-important parts of the text. Good document representation is very important in text summarization for retrieving relevant information. Bag-of-words cannot capture word similarity in syntactic and semantic relationships, whereas word embeddings give a good document representation that captures and encodes the semantic relations between words. Therefore, a centroid method based on word-embedding representations is employed in this paper, and Myanmar news summarization based on different word embeddings is proposed. Myanmar local and international news are summarized using a centroid-based word-embedding summarizer that exploits the effectiveness of the word-embedding representation. Experiments were carried out on a Myanmar local and international news dataset using different word-embedding models, and the results are compared with the performance of bag-of-words summarization. Centroid summarization using word embeddings performs comprehensively better than centroid summarization using bag-of-words.
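The centroid idea can be illustrated with toy two-dimensional word vectors (a real system would use trained embeddings over Myanmar text): average all word vectors to get the document centroid, then rank sentences by cosine similarity to it.

```python
import math

# Hypothetical 2-d embeddings standing in for trained word vectors.
EMB = {
    "storm": [0.9, 0.1], "damage": [0.8, 0.2], "rain": [0.85, 0.15],
    "team": [0.1, 0.9], "match": [0.2, 0.8],
}

def sent_vec(words):
    """Average the embeddings of known words in a sentence."""
    vecs = [EMB[w] for w in words if w in EMB]
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(2)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

doc = [["storm", "damage"], ["team", "match"], ["rain", "storm"]]
centroid = sent_vec([w for s in doc for w in s])
# Rank sentences by similarity to the document centroid.
ranked = sorted(doc, key=lambda s: cosine(sent_vec(s), centroid), reverse=True)
print(ranked[0])
```

The summary is then the top-ranked sentences up to a length budget; the bag-of-words baseline replaces `EMB` lookups with sparse count vectors.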
An Improved Approach for Word Ambiguity Removal (Waqas Tariq)
Word ambiguity removal is the task of removing ambiguity from a word, i.e. identifying the correct sense of a word in ambiguous sentences. This paper describes a model that uses a part-of-speech tagger and three categories for word sense disambiguation (WSD). Improving the interaction between users and computers is essential for human-computer interaction; for this, supervised and unsupervised methods are combined. The WSD algorithm is used to find the efficient and accurate sense of a word based on domain information. The accuracy of this work is evaluated with the aim of finding the best-suited domain for a word.
Keywords: Human Computer Interaction, Supervised Training, Unsupervised Learning, Word Ambiguity, Word Sense Disambiguation
This document summarizes a semi-supervised clustering approach for classifying P300 signals for brain-computer interface (BCI) speller systems. It involves using k-means clustering on wavelet features extracted from EEG data, with some data points labeled to initialize the clusters. An ensemble of support vector machines is then trained on the clustered data points to classify new unlabeled P300 signals. The document outlines the P300 speller paradigm used to collect the EEG data, pre-processing steps like filtering and wavelet transformation, the seeded k-means semi-supervised clustering method, and using an ensemble SVM classifier trained on the clustered data for classification.
IRJET- Audio Data Summarization System using Natural Language Processing (IRJET Journal)
This document summarizes a research paper on an audio data summarization system using natural language processing. The system first converts an input audio file to text using speech recognition techniques. It then summarizes the text using natural language processing methods like tokenization, frequency analysis of words, sentence scoring, and selection of important sentences. The summarized text is output along with an optional audio summary generated using text-to-speech synthesis. The system provides a way to automatically summarize long-form audio content like podcasts or speeches into condensed textual or audio summaries.
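The tokenization, word-frequency, and sentence-scoring steps above can be sketched as a classic extractive summarizer; the stop-word list and example sentences are illustrative.

```python
from collections import Counter

STOP = {"the", "a", "of", "is", "and", "to"}  # toy stop-word list

def summarize(sentences, k=1):
    """Score sentences by summed content-word frequency and keep the
    top k, preserving original order (the classic frequency heuristic)."""
    words = [w for s in sentences for w in s.lower().split()
             if w not in STOP]
    freq = Counter(words)
    scored = sorted(sentences,
                    key=lambda s: sum(freq[w] for w in s.lower().split()
                                      if w not in STOP),
                    reverse=True)
    keep = set(scored[:k])
    return [s for s in sentences if s in keep]

doc = ["speech is converted to text",
       "the text is scored by word frequency",
       "important sentences form the summary"]
print(summarize(doc, k=1))
```

In the full pipeline this runs on the speech-recognition transcript, and the selected sentences can optionally be fed back through TTS for an audio summary.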
Implementation of Marathi Language Speech Databases for Large Dictionary (iosrjce)
Parameters Optimization for Improving ASR Performance in Adverse Real World N... (Waqas Tariq)
From the existing research it has been observed that many techniques and methodologies are available for performing every step of an Automatic Speech Recognition (ASR) system, but the performance (minimization of the Word Error Rate, WER, and maximization of the Word Accuracy Rate, WAR) does not depend only on the technique applied in a given method. The research indicates that performance depends mainly on the category of the noise, the level of the noise, and the sizes of the window, frame, frame overlap, etc. considered in the existing methods. The main aim of the work presented in this paper is to vary parameters such as window size, frame size and frame-overlap percentage in order to observe the performance of the algorithms for various categories and levels of noise, and to train the system across all parameter sizes and categories of real-world noisy environments so as to improve the performance of the speech recognition system. This paper presents the results of signal-to-noise ratio (SNR) and accuracy tests with varying parameter sizes. It is observed that it is very hard to evaluate the test results and decide on a parameter size for ASR performance improvement. Hence, this study further suggests feasible and optimal parameter sizes using a Fuzzy Inference System (FIS) for enhancing the resulting accuracy in adverse real-world noisy environmental conditions. This work will help provide discriminative training of a ubiquitous ASR system for better human-computer interaction (HCI).
Keywords: ASR Performance, ASR Parameters Optimization, Multi-Environmental Training, Fuzzy Inference System for ASR, Ubiquitous ASR System, Human Computer Interaction (HCI)
Arabic text categorization algorithm using vector evaluation method (ijcsit)
Text categorization is the process of grouping documents into categories based on their contents. This process makes information retrieval easier, and it has become more important due to the huge amount of textual information available online. The main problem in text categorization is how to improve classification accuracy. Although Arabic text categorization is a new and promising field, there has been little research in it. This paper proposes a new method for Arabic text categorization using vector evaluation. The proposed method uses a corpus of categorized Arabic documents; the weights of the tested document's words are then calculated to determine the document's keywords, which are compared with the keywords of the corpus categories to determine the tested document's best category.
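The keyword-matching step can be illustrated with a toy version: take the tested document's top-weighted words and choose the category whose keyword set overlaps them most. English stand-in words replace the Arabic corpus here, and raw counts stand in for the paper's vector-evaluation weights.

```python
from collections import Counter

# Hypothetical category keyword sets (a real system derives these
# from the categorized training corpus).
CATEGORY_KEYWORDS = {
    "sports": {"team", "match", "goal", "player"},
    "economy": {"market", "bank", "trade", "price"},
}

def best_category(doc_words, k=3):
    """Extract the k most frequent words as keywords, then pick the
    category with the largest keyword overlap."""
    counts = Counter(doc_words)
    keywords = {w for w, _ in counts.most_common(k)}
    scores = {cat: len(keywords & kws)
              for cat, kws in CATEGORY_KEYWORDS.items()}
    return max(scores, key=scores.get)

doc = "the team scored a goal and the team won the match".split()
print(best_category(doc))  # sports
```

A production classifier would first remove stop words and use proper term weights (e.g. TF-IDF) rather than raw frequencies.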
Evaluation of Hidden Markov Model based Marathi Text-To-Speech Synthesis System (IJERA Editor)
The objective of this paper is to evaluate the quality of an HMM-based Marathi TTS system. The main advantage of the HMM technique is its ability to easily allow variation in the voice; the output speech produced by this method has greater control over emotion, style and intonation. Naturalness and intelligibility are the two important parameters for deciding the quality of synthetic speech. Depending on these parameters, the synthetic speech results are placed into four categories: natural speech, high-quality synthetic speech, moderate-quality synthetic speech and low-quality synthetic speech. The results are obtained using CT, DRT and MOS tests.
Dynamic Construction of Telugu Speech Corpus for Voice Enabled Text Editor (Waqas Tariq)
In recent decades speech-interactive systems have gained increasing importance. The performance of an ASR system mainly depends on the availability of a large corpus of speech. The conventional method of building a large-vocabulary speech recognizer for any language uses a top-down approach to speech. This approach requires a large speech corpus with sentence- or phoneme-level transcription of the speech utterances. The transcriptions must also cover different speech orders so that the recognizer can build models for all the sounds present. But for Telugu, because of its complex nature, a very large, well-annotated speech database is very difficult to build. It is very difficult, if not impossible, to cover all the words of any Indian language, where each word may have thousands or millions of word forms. A significant part of the grammar that is handled by syntax in English (and other similar languages) is handled within morphology in Telugu: phrases comprising several words (tokens) in English map onto a single word in Telugu. Telugu is phonetic in nature as well as morphologically rich, which is why speech technology developed for English cannot be applied directly to Telugu. This paper highlights the work carried out in an attempt to build a voice-enabled text editor with automatic term suggestion. The main claim of the paper is the recognition-enhancement process we developed for highly inflecting, morphologically rich languages. This method increases speech recognition accuracy with a greatly reduced corpus size. It also adds Telugu words to the database dynamically, resulting in growth of the corpus.
Comparative Study of Different Techniques in Speaker Recognition: Review (IJAEMSJORNAL)
Speech is the most basic and essential method of communication used by people. A speaker is recognized on the basis of the individual information included in the speech signal. Speaker recognition (SR) is useful for identifying the person who is speaking, and in recent years it has been used in security systems. In this paper we discuss feature extraction techniques such as Mel-frequency cepstral coefficients (MFCC), linear predictive coding (LPC) and dynamic time warping (DTW), and, for classification, Gaussian mixture models (GMM), artificial neural networks (ANN) and support vector machines (SVM).
CURVELET BASED SPEECH RECOGNITION SYSTEM IN NOISY ENVIRONMENT: A STATISTICAL ... (ijcsit)
Speech processing is a crucial and intensive field of research in the development of robust and efficient speech recognition systems, but recognition accuracy still suffers from variation in context, speaker variability and environmental conditions. In this paper, we present a curvelet-based feature extraction (CFE) method for speech recognition in noisy environments. The input speech signal is decomposed into different frequency channels using the characteristics of the curvelet transform, which reduces the computational complexity and the feature vector size while retaining good accuracy; the varying window size makes curvelets suitable for non-stationary signals. For better word classification and recognition, a discrete hidden Markov model can be used, as it takes the time distribution of speech signals into account. The HMM classification method attained maximum identification rates of 80.1% for informal phrases, 86% for scientific phrases, and 63.8% for control phrases. The objective of this study is to characterize the feature extraction methods and the classification phase in a speech recognition system. The various approaches available for developing speech recognition systems are compared along with their merits and demerits. The statistical results show that recognition accuracy is increased by using discrete curvelet transforms over conventional methods.
EFFECT OF MFCC BASED FEATURES FOR SPEECH SIGNAL ALIGNMENTS (ijnlc)
The fundamental techniques used for man-machine communication include speech synthesis, speech recognition, and speech transformation. Feature extraction techniques provide a compressed representation of the speech signals. HNM analysis and synthesis provides high-quality speech with a small number of parameters. Dynamic time warping is a well-known technique for aligning two given multidimensional sequences: it locates an optimal match between them, and the improvement in the alignment is estimated from the corresponding distances. The objective of this research is to investigate the effect of dynamic time warping on phrase-, word-, and phoneme-based alignments. Speech signals in the form of twenty-five phrases were recorded. The recorded material was segmented manually and aligned at the sentence, word, and phoneme level. The Mahalanobis distance (MD) was computed between the aligned frames. The investigation has shown better alignment in the HNM parametric domain, and it has been seen that effective speech alignment can be carried out even at the phrase level.
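A minimal dynamic time warping recurrence over two one-dimensional sequences looks like this (the paper aligns multidimensional HNM parameter frames and scores them with the Mahalanobis distance rather than the absolute difference used here):

```python
# Minimal DTW cost between two 1-d sequences via the standard
# dynamic-programming recurrence.
def dtw(a, b):
    INF = float("inf")
    n, m = len(a), len(b)
    d = [[INF] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])   # local frame distance
            d[i][j] = cost + min(d[i - 1][j],      # insertion
                                 d[i][j - 1],      # deletion
                                 d[i - 1][j - 1])  # match
    return d[n][m]

print(dtw([1, 2, 3], [1, 2, 2, 3]))  # 0.0: the repeated 2 aligns for free
```

Swapping the local cost for a Mahalanobis distance between feature vectors gives the variant evaluated in the paper.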
Performance Calculation of Speech Synthesis Methods for Hindi language (iosrjce)
The document compares the performance of two speech synthesis methods - unit selection and hidden Markov model (HMM) - for Hindi language. It finds that unit selection results in higher quality synthesized speech than HMM based on both subjective and objective quality measurements. Subjective measurements using mean opinion scores show unit selection receives higher average ratings. Objective measurements of mean square error and peak signal-to-noise ratio also indicate unit selection introduces less distortion compared to the original speech samples.
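The objective measures named above are straightforward to compute; a sketch over toy sample sequences (peak amplitude assumed to be 1.0):

```python
import math

def mse(ref, test):
    """Mean squared error between equal-length sample sequences."""
    return sum((r - t) ** 2 for r, t in zip(ref, test)) / len(ref)

def psnr(ref, test, peak=1.0):
    """Peak signal-to-noise ratio in dB; higher means less distortion."""
    e = mse(ref, test)
    return float("inf") if e == 0 else 10 * math.log10(peak ** 2 / e)

ref = [0.0, 0.5, 1.0, 0.5]   # original speech samples (toy values)
syn = [0.0, 0.45, 0.95, 0.5]  # synthesized samples
print(round(psnr(ref, syn), 1))  # 29.0
```

In the comparison, the method whose synthesized samples yield lower MSE and higher PSNR against the originals (here, unit selection) is judged less distorted.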
Modeling of Speech Synthesis of Standard Arabic Using an Expert System (csandit)
This document describes an expert system for speech synthesis of Standard Arabic text. It involves two main stages: 1) creation of a sound database and 2) text-to-speech transformation. The transformation process involves phonetic orthographic transcription of the text and then generating voice signals corresponding to the transcribed phonetic sequence. The expert system uses a knowledge base containing sound data and rewriting rules. It transcribes text using graphemes as basic units and then concatenates sound units from the database to synthesize speech. Tests achieved a 96% success rate in pronouncing sentences correctly. Future work aims to improve prosody and develop fully automatic signal segmentation.
The primary goal of this paper is to provide an overview of existing Text-To-Speech (TTS) techniques by highlighting their usage and advantages. First-generation techniques include formant synthesis and articulatory synthesis. Formant synthesis works by using individually controllable formant filters, which can be set to produce accurate estimations of the vocal-tract transfer function. Articulatory synthesis produces speech by directly modeling human articulator behavior. Second-generation techniques incorporate concatenative synthesis and sinusoidal synthesis. Concatenative synthesis generates speech output by concatenating segments of recorded speech, and generally produces natural-sounding synthesized speech. Sinusoidal synthesis uses a harmonic model and decomposes each frame into a set of harmonics of an estimated fundamental frequency; the model parameters are the amplitudes and periods of the harmonics, with which the value of the fundamental can be changed while keeping the same basic spectral shape. In addition, the third generation includes Hidden Markov Model (HMM) synthesis and unit selection synthesis. HMM synthesis trains the parameter module and produces high-quality speech. Finally, unit selection operates by selecting, from a large speech database, the best sequence of units matching the specification.
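The sinusoidal/harmonic decomposition described above can be sketched as resynthesis of one frame from a fundamental frequency and harmonic amplitudes; the amplitudes here are toy values, not estimates from analyzed speech.

```python
import math

def harmonic_frame(f0, amplitudes, fs=16000, n_samples=160):
    """Synthesize one frame as a sum of harmonics of f0, as in the
    sinusoidal/harmonic model. Harmonic k has frequency (k+1)*f0."""
    return [sum(a * math.sin(2 * math.pi * f0 * (k + 1) * n / fs)
                for k, a in enumerate(amplitudes))
            for n in range(n_samples)]

# Changing f0 shifts the pitch while the amplitude pattern (the
# spectral shape sampled at the harmonics) stays the same.
low = harmonic_frame(100, [1.0, 0.5, 0.25])
high = harmonic_frame(150, [1.0, 0.5, 0.25])
print(len(low))  # 160
```

This is exactly the property the abstract highlights: the fundamental can be altered for prosody while the stored amplitudes preserve the basic spectral shape.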
The document describes the implementation of a natural sounding speech synthesizer for the Marathi language using English text input. It discusses concatenative speech synthesis using a unit selection approach. Over 28,580 syllables, words and sentences recorded from a female speaker were used to create an inventory of speech units. The synthesizer was tested and able to generate natural sounding output and waveforms. Formant frequencies were analyzed using MATLAB and PRAAT tools to evaluate the quality of the synthesized speech.
This document summarizes a research paper on developing a speech-to-text conversion system for visually impaired people using μ-law companding. The system uses MATLAB to analyze input speech signals, extract features, filter noise, and match signals to samples stored in a database to convert speech to text. A graphical user interface was created to input speech and display recognition results. The system achieved real-time speech recognition and conversion to text with high accuracy using μ-law companding techniques for signal processing and correlation comparisons to the stored samples.
SMATalk: Standard Malay Text to Speech Talk SystemCSCJournals
This document summarizes a research paper on SMaTTS, a rule-based text-to-speech synthesis system for Standard Malay. The system uses a sinusoidal synthesis method and some pre-recorded wave files to generate speech. It has two main phases: natural language processing (NLP) and digital signal processing (DSP). The NLP phase analyzes text and converts it to phonetic representations with prosody information. The DSP phase generates the speech waveform. The system was evaluated using diagnostic rhyme tests and mean opinion scores, and areas for improvement are discussed.
Hindi digits recognition system on speech data collected in different natural...csandit
This paper presents a baseline digits speech recognizer for Hindi language. The recording environment is different for all speakers, since the data is collected in their respective homes. The different environment refers to vehicle horn noises in some road facing rooms, internal background noises in some rooms like opening doors, silence in some rooms etc. All these recordings are used for training acoustic model. The Acoustic Model is trained on 8 speakers’ audio data. The vocabulary size of the recognizer is 10 words. HTK toolkit is used for building
acoustic model and evaluating the recognition rate of the recognizer. The efficiency of the recognizer developed on recorded data, is shown at the end of the paper and possible directions for future research work are suggested.
We propose a model for carrying out deep learning based multimodal sentiment analysis. The MOUD dataset is taken for experimentation purposes. We developed two parallel text-based and audio-based models and then fused these heterogeneous feature maps, taken from intermediate layers, to complete the architecture. The observed performance measures (accuracy, precision, recall and F1-score) outperform those of the existing models.
Development of Quranic Reciter Identification System using MFCC and GMM Clas... (IJECEIAES)
The document presents a study on developing a Quranic reciter identification system using Mel-frequency Cepstral Coefficients (MFCC) and Gaussian Mixture Model (GMM) classifier. Researchers collected 15 Quranic audio samples from 5 reciters and used 10 samples for training GMM models and 5 samples for testing. The system achieved 100% recognition rate for identifying the 5 reciters, and was also able to reject unknown samples.
Artificially Generated of Concatenative Syllable based Text to Speech Synthesi... (iosrjce)
This document describes a Marathi text-to-speech (TTS) synthesis system based on a concatenative approach using syllables as the basic speech units. The system analyzes input text, performs syllabification based on linguistic rules, retrieves corresponding speech files from a corpus, concatenates the files while minimizing discontinuities at boundaries, and outputs synthesized speech. A Marathi speech corpus was created containing over 1000 sentences from various domains. Subjective quality tests found the synthesized speech to have naturalness and intelligibility comparable to natural speech. The system demonstrates an effective approach for Marathi TTS using a syllable-based concatenative method.
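The boundary-smoothing step described above (concatenating speech files while minimizing discontinuities) can be illustrated with a linear cross-fade at the join; this is a minimal sketch, not the paper's actual signal processing, and `crossfade_concat` is an illustrative name:

```python
def crossfade_concat(a, b, overlap):
    """Concatenate two speech-unit waveforms (lists of samples),
    linearly cross-fading `overlap` samples at the boundary to
    reduce the audible discontinuity at the seam."""
    assert overlap <= len(a) and overlap <= len(b)
    head = a[:len(a) - overlap]
    tail = b[overlap:]
    mixed = []
    for i in range(overlap):
        w = (i + 1) / (overlap + 1)  # fade-in weight for unit b
        mixed.append((1.0 - w) * a[len(a) - overlap + i] + w * b[i])
    return head + mixed + tail

# Joining two constant-amplitude "units" yields a smooth ramp at the seam.
out = crossfade_concat([1.0] * 8, [0.0] * 8, 4)
```

A real unit-selection system would also match pitch and energy at the boundary; the cross-fade only removes the waveform discontinuity.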
Myanmar news summarization using different word representations (IJECEIAES)
There is an enormous amount of information available in different forms of sources and genres, and extracting useful information from such massive data requires an automatic mechanism. Text summarization systems assist with content reduction by keeping the important information and filtering out the non-important parts of the text. Good document representation is important in text summarization for retrieving relevant information. Bag-of-words cannot capture word similarity in syntactic and semantic relationships, whereas word embeddings give a good document representation that captures and encodes the semantic relations between words. Therefore, a centroid method based on word-embedding representation is employed in this paper: Myanmar local and international news are summarized using a centroid-based word-embedding summarizer. Experiments were done on a Myanmar local and international news dataset using different word-embedding models, and the results are compared with the performance of bag-of-words summarization. Centroid summarization using word embeddings performs comprehensively better than centroid summarization using bag-of-words.
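As an illustration of the centroid idea (not the paper's Myanmar-specific pipeline), a toy centroid summarizer over hand-made 2-d word embeddings might look like:

```python
import math

def cosine(u, v):
    num = sum(x * y for x, y in zip(u, v))
    den = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return num / den if den else 0.0

def centroid_summarize(sentences, embed, k=1):
    """Score each sentence by cosine similarity between its mean word
    vector and the document centroid (mean of all word vectors), then
    keep the top-k sentences in original order."""
    def mean_vec(words):
        vecs = [embed[w] for w in words if w in embed]
        dim = len(next(iter(embed.values())))
        if not vecs:
            return [0.0] * dim
        return [sum(col) / len(vecs) for col in zip(*vecs)]

    centroid = mean_vec([w for s in sentences for w in s.split()])
    ranked = sorted(range(len(sentences)),
                    key=lambda i: cosine(mean_vec(sentences[i].split()), centroid),
                    reverse=True)
    return [sentences[i] for i in sorted(ranked[:k])]

# Toy 2-d embeddings: "news"-like words cluster together, "cat" is off-topic.
emb = {"news": [1.0, 0.1], "summary": [0.9, 0.2], "report": [1.0, 0.0],
      "cat": [0.0, 1.0], "sleeps": [0.1, 0.9]}
top = centroid_summarize(["news report summary", "cat sleeps"], emb, k=1)
```

In the paper's setting the embeddings would come from trained word-embedding models rather than a hand-made table.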
An Improved Approach for Word Ambiguity Removal (Waqas Tariq)
Word ambiguity removal is the task of removing ambiguity from a word, i.e. identifying the correct sense of a word in ambiguous sentences. This paper describes a model that uses a Part-of-Speech tagger and three categories for word sense disambiguation (WSD), combining supervised and unsupervised methods; such disambiguation is essential for improving interactions between users and computers. The WSD algorithm finds an efficient and accurate sense of a word based on domain information, and the accuracy of this work is evaluated with the aim of finding the best-suited domain of a word. Keywords: Human Computer Interaction, Supervised Training, Unsupervised Learning, Word Ambiguity, Word Sense Disambiguation
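A minimal illustration of picking a word sense from sentence context, assuming a hypothetical two-sense inventory (this is a simplified Lesk-style overlap, not the paper's combined supervised/unsupervised method):

```python
def pick_sense(word, context, sense_inventory):
    """Pick the sense whose gloss shares the most words with the
    context sentence (simplified Lesk; ties go to the first sense)."""
    ctx = set(context.lower().split())
    best, best_overlap = None, -1
    for sense, gloss in sense_inventory[word].items():
        overlap = len(ctx & set(gloss.lower().split()))
        if overlap > best_overlap:
            best, best_overlap = sense, overlap
    return best

# Hypothetical sense inventory for the ambiguous word "bank".
senses = {"bank": {
    "finance": "institution that accepts money deposits and gives loans",
    "river": "sloping land beside a river or stream",
}}
sense = pick_sense("bank", "she deposited money at the bank", senses)
```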
This document summarizes a semi-supervised clustering approach for classifying P300 signals for brain-computer interface (BCI) speller systems. It involves using k-means clustering on wavelet features extracted from EEG data, with some data points labeled to initialize the clusters. An ensemble of support vector machines is then trained on the clustered data points to classify new unlabeled P300 signals. The document outlines the P300 speller paradigm used to collect the EEG data, pre-processing steps like filtering and wavelet transformation, the seeded k-means semi-supervised clustering method, and using an ensemble SVM classifier trained on the clustered data for classification.
IRJET - Audio Data Summarization System using Natural Language Processing (IRJET Journal)
This document summarizes a research paper on an audio data summarization system using natural language processing. The system first converts an input audio file to text using speech recognition techniques. It then summarizes the text using natural language processing methods like tokenization, frequency analysis of words, sentence scoring, and selection of important sentences. The summarized text is output along with an optional audio summary generated using text-to-speech synthesis. The system provides a way to automatically summarize long-form audio content like podcasts or speeches into condensed textual or audio summaries.
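The tokenization, word-frequency and sentence-scoring steps can be sketched as a minimal frequency-based extractive summarizer (illustrative only; the described system also handles audio input and output):

```python
from collections import Counter

def summarize(text, k=1):
    """Frequency-based extractive summary: score each sentence by the
    mean frequency of its words over the document, then return the k
    highest-scoring sentences in their original order."""
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    freq = Counter(w.lower() for s in sentences for w in s.split())
    scores = [sum(freq[w.lower()] for w in s.split()) / len(s.split())
              for s in sentences]
    keep = sorted(sorted(range(len(sentences)), key=lambda i: -scores[i])[:k])
    return ". ".join(sentences[i] for i in keep) + "."

text = ("speech recognition converts speech to text. "
        "the weather was pleasant. "
        "text summarization shortens the recognized text.")
summary = summarize(text, k=1)
```

Real systems add stop-word removal and better sentence splitting; the scoring idea is the same.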
Implementation of Marathi Language Speech Databases for Large Dictionary (iosrjce)
Parameters Optimization for Improving ASR Performance in Adverse Real World N... (Waqas Tariq)
Existing research shows that many techniques and methodologies are available for performing every step of an Automatic Speech Recognition (ASR) system, but the performance of a methodology (minimization of Word Error Rate, WER, and maximization of Word Accuracy Rate, WAR) does not depend only on the technique applied. The research indicates that performance mainly depends on the category of the noise, the level of the noise, and the sizes of the window, frame, frame overlap, etc. considered in the existing methods. The main aim of the work presented in this paper is to use variable parameter sizes, such as window size, frame size and frame-overlap percentage, to observe the performance of algorithms for various categories of noise at different levels, and also to train the system on all parameter sizes and categories of real-world noisy environments to improve the performance of the speech recognition system. This paper presents the results of Signal-to-Noise Ratio (SNR) and accuracy tests obtained by applying variable parameter sizes. It is observed that it is very hard to evaluate the test results and decide on a parameter size that optimizes ASR performance. Hence, this study further suggests a feasible and optimum parameter size using a Fuzzy Inference System (FIS) for enhancing accuracy in adverse real-world noisy environmental conditions. This work will be helpful for discriminative training of ubiquitous ASR systems for better Human Computer Interaction (HCI). Keywords: ASR Performance, ASR Parameters Optimization, Multi-Environmental Training, Fuzzy Inference System for ASR, Ubiquitous ASR System, Human Computer Interaction (HCI)
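The SNR measurement referred to above has a standard definition; a minimal sketch, assuming the noise is obtained as the sample-wise difference between the noisy and clean signals:

```python
import math

def snr_db(signal, noisy):
    """Signal-to-noise ratio in dB: 10*log10(P_signal / P_noise),
    with the noise taken as the sample-wise difference noisy - signal."""
    p_sig = sum(s * s for s in signal) / len(signal)
    noise = [n - s for s, n in zip(signal, noisy)]
    p_noise = sum(e * e for e in noise) / len(noise)
    return 10.0 * math.log10(p_sig / p_noise)

clean = [1.0, -1.0] * 100          # unit-power reference signal
noisy = [s + 0.1 for s in clean]   # constant offset acts as noise
```

With unit signal power and noise power 0.01, the ratio is 100, i.e. 20 dB.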
Arabic text categorization algorithm using vector evaluation method (ijcsit)
Text categorization is the process of grouping documents into categories based on their contents. This process makes information retrieval easier, and it has become more important due to the huge amount of textual information available online. The main problem in text categorization is how to improve classification accuracy. Although Arabic text categorization is a new and promising field, there has been little research in it. This paper proposes a new method for Arabic text categorization using vector evaluation. The proposed method uses a corpus of categorized Arabic documents; the weights of the tested document's words are then calculated to determine the document keywords, which are compared with the keywords of the corpus categories to determine the tested document's best category.
Evaluation of Hidden Markov Model based Marathi Text-To-Speech Synthesis System (IJERA Editor)
The objective of this paper is to evaluate the quality of an HMM-based Marathi TTS system. The main advantage of the HMM technique is its ability to vary the voice easily; the speech produced by this method conveys emotion, style and intonation more effectively. Naturalness and intelligibility are the two important parameters for judging the quality of synthetic speech. Depending on the parameters specified, the synthetic speech results are categorized into four classes: natural speech, high-quality synthetic speech, low-quality synthetic speech and moderate-quality synthetic speech. The results are obtained using CT, DRT and MOS tests.
Dynamic Construction of Telugu Speech Corpus for Voice Enabled Text Editor (Waqas Tariq)
In recent decades speech-interactive systems have gained increasing importance. The performance of an ASR system mainly depends on the availability of a large corpus of speech. The conventional method of building a large-vocabulary speech recognizer for any language uses a top-down approach, which requires a large speech corpus with sentence- or phoneme-level transcription of the speech utterances. The transcriptions must also cover the different sound sequences so that the recognizer can build models for all the sounds present. But for Telugu, because of its complex nature, a very large, well-annotated speech database is very difficult to build. It is very difficult, if not impossible, to cover all the words of any Indian language, where each word may have thousands or millions of word forms. A significant part of the grammar that is handled by syntax in English (and other similar languages) is handled within morphology in Telugu: phrases comprising several words (tokens) in English map onto a single word in Telugu. Telugu is phonetic in nature in addition to being rich in morphology, which is why speech technology developed for English cannot be applied directly to Telugu. This paper highlights the work carried out in an attempt to build a voice-enabled text editor with automatic term suggestion. The main claim of the paper is the recognition-enhancement process we developed for highly inflecting, morphologically rich languages. This method increases speech recognition accuracy with a large reduction in corpus size. It also adds Telugu words to the database dynamically, resulting in growth of the corpus.
Comparative Study of Different Techniques in Speaker Recognition: Review (IJAEMSJORNAL)
Speech is the most basic and essential method of communication used by people, and a speaker can be recognized on the basis of the individual information contained in speech signals. Speaker recognition (SR) identifies the person who is speaking, and in recent years it has been used in security systems. In this paper we discuss feature extraction techniques such as Mel-frequency cepstral coefficients (MFCC), Linear predictive coding (LPC) and Dynamic time warping (DTW), and, for classification, Gaussian Mixture Models (GMM), Artificial neural networks (ANN) and Support vector machines (SVM).
CURVELET BASED SPEECH RECOGNITION SYSTEM IN NOISY ENVIRONMENT: A STATISTICAL ... (ijcsit)
Speech processing is a crucial and intensive field of research in the development of robust and efficient speech recognition systems, but recognition accuracy still suffers from variation of context, speaker variability and environmental conditions. In this paper we present a curvelet-based Feature Extraction (CFE) method for speech recognition in noisy environments. The input speech signal is decomposed into different frequency channels using the characteristics of the curvelet transform, which reduces the computational complexity and the feature-vector size; the varying window size gives better accuracy and makes the method suitable for non-stationary signals. For word classification and recognition, a discrete hidden Markov model is used, as it accounts for the time distribution of speech signals. The HMM classification method attained maximum accuracy in terms of identification rate, with detection rates of 80.1% for informal phrases, 86% for scientific phrases, and 63.8% for control phrases. The objective of this study is to characterize the feature extraction methods and the classification phase in a speech recognition system. The various approaches available for developing speech recognition systems are compared along with their merits and demerits. The statistical results show that recognition accuracy is increased by using discrete curvelet transforms over conventional methods.
EFFECT OF MFCC BASED FEATURES FOR SPEECH SIGNAL ALIGNMENTS (ijnlc)
The fundamental techniques used for man-machine communication include speech synthesis, speech recognition and speech transformation. Feature extraction techniques provide a compressed representation of the speech signals, and HNM analysis and synthesis provide high-quality speech with a small number of parameters. Dynamic time warping is a well-known technique for aligning two given multidimensional sequences: it locates an optimal match between them, and the improvement in the alignment is estimated from the corresponding distances. The objective of this research is to investigate the effect of dynamic time warping on phrase-, word- and phoneme-based alignments. Speech signals in the form of twenty-five phrases were recorded; the recorded material was segmented manually and aligned at sentence, word and phoneme level, and the Mahalanobis distance (MD) was computed between the aligned frames. The investigation has shown better alignment in the HNM parametric domain, and it has been seen that effective speech alignment can be carried out even at phrase level.
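Dynamic time warping itself can be sketched in a few lines; this toy version uses absolute difference on 1-d sequences rather than the Mahalanobis distance on multidimensional parameter frames used in the study:

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-d sequences,
    using absolute difference as the local cost."""
    INF = float("inf")
    n, m = len(a), len(b)
    d = [[INF] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # extend the cheapest of insertion, deletion, or match
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]

# A sequence aligned against a time-stretched copy of itself costs 0.
base = [0.0, 1.0, 2.0, 1.0, 0.0]
stretched = [0.0, 0.0, 1.0, 1.0, 2.0, 1.0, 0.0]
```

The warping path absorbs the duplicated samples at zero cost, which is exactly why DTW suits sequences spoken at different speeds.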
Performance Calculation of Speech Synthesis Methods for Hindi language (iosrjce)
The document compares the performance of two speech synthesis methods - unit selection and hidden Markov model (HMM) - for Hindi language. It finds that unit selection results in higher quality synthesized speech than HMM based on both subjective and objective quality measurements. Subjective measurements using mean opinion scores show unit selection receives higher average ratings. Objective measurements of mean square error and peak signal-to-noise ratio also indicate unit selection introduces less distortion compared to the original speech samples.
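The objective measures mentioned (mean square error and peak signal-to-noise ratio) have standard formulas; a minimal sketch over waveform sample lists, assuming samples normalized so the peak value is 1.0:

```python
import math

def mse(ref, test):
    """Mean square error between a reference and a test waveform."""
    return sum((r - t) ** 2 for r, t in zip(ref, test)) / len(ref)

def psnr_db(ref, test, peak=1.0):
    """Peak signal-to-noise ratio in dB; higher PSNR (and lower MSE)
    means less distortion relative to the original samples."""
    e = mse(ref, test)
    return float("inf") if e == 0 else 10.0 * math.log10(peak ** 2 / e)

ref = [0.0, 0.5, 1.0, 0.5]
near = [0.0, 0.5, 0.9, 0.5]   # small distortion
far = [0.5, 0.0, 0.5, 1.0]    # large distortion
```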
Modeling of Speech Synthesis of Standard Arabic Using an Expert System (csandit)
This document describes an expert system for speech synthesis of Standard Arabic text. It involves two main stages: 1) creation of a sound database and 2) text-to-speech transformation. The transformation process involves phonetic orthographic transcription of the text and then generating voice signals corresponding to the transcribed phonetic sequence. The expert system uses a knowledge base containing sound data and rewriting rules. It transcribes text using graphemes as basic units and then concatenates sound units from the database to synthesize speech. Tests achieved a 96% success rate in pronouncing sentences correctly. Future work aims to improve prosody and develop fully automatic signal segmentation.
The primary goal of this paper is to provide an overview of existing Text-To-Speech (TTS) techniques by highlighting their usage and advantages. First-generation techniques include formant synthesis and articulatory synthesis. Formant synthesis works by using individually controllable formant filters, which can be set to produce accurate estimations of the vocal-tract transfer function. Articulatory synthesis produces speech by directly modeling the behavior of the human articulators. Second-generation techniques comprise concatenative synthesis and sinusoidal synthesis. Concatenative synthesis generates speech output by concatenating segments of recorded speech, and generally produces natural-sounding synthesized speech. Sinusoidal synthesis uses a harmonic model and decomposes each frame into a set of harmonics of an estimated fundamental frequency; the model parameters are the amplitudes and periods of the harmonics, so the value of the fundamental can be changed while keeping the same basic spectral envelope. Third-generation techniques include Hidden Markov Model (HMM) synthesis and unit selection synthesis. HMM synthesis trains a parametric model and produces high-quality speech. Finally, unit selection operates by selecting, from a large speech database, the best sequence of units matching the target specification.
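The harmonic model behind sinusoidal synthesis can be sketched as a sum of sinusoids at integer multiples of the fundamental; `harmonic_frame` is an illustrative helper, not a production synthesizer:

```python
import math

def harmonic_frame(f0, amps, sr=16000, n=160):
    """Synthesize one frame as a sum of harmonics of f0; amps[k] is
    the amplitude of harmonic k+1. Changing f0 while keeping amps
    fixed shifts the pitch but preserves the rough spectral shape."""
    return [sum(a * math.sin(2 * math.pi * f0 * (k + 1) * t / sr)
                for k, a in enumerate(amps))
            for t in range(n)]

# Same amplitude pattern, one octave apart: pitch changes, envelope stays.
frame_low = harmonic_frame(100.0, [1.0, 0.5, 0.25])
frame_high = harmonic_frame(200.0, [1.0, 0.5, 0.25])
```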
The document describes the implementation of a natural sounding speech synthesizer for the Marathi language using English text input. It discusses concatenative speech synthesis using a unit selection approach. Over 28,580 syllables, words and sentences recorded from a female speaker were used to create an inventory of speech units. The synthesizer was tested and able to generate natural sounding output and waveforms. Formant frequencies were analyzed using MATLAB and PRAAT tools to evaluate the quality of the synthesized speech.
This document summarizes a research paper on developing a speech-to-text conversion system for visually impaired people using μ-law companding. The system uses MATLAB to analyze input speech signals, extract features, filter noise, and match signals to samples stored in a database to convert speech to text. A graphical user interface was created to input speech and display recognition results. The system achieved real-time speech recognition and conversion to text with high accuracy using μ-law companding techniques for signal processing and correlation comparisons to the stored samples.
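The μ-law companding used by the system follows the standard compression and expansion formulas; a minimal sketch for samples normalized to [-1, 1]:

```python
import math

MU = 255.0  # standard mu-law parameter for 8-bit telephony

def mu_compress(x):
    """Mu-law compress a sample in [-1, 1]: boosts small amplitudes."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

def mu_expand(y):
    """Inverse of mu_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)

roundtrip = mu_expand(mu_compress(0.25))
```

Compression expands the resolution of quiet samples before quantization, which is why it helps speech-signal processing pipelines.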
SMATalk: Standard Malay Text to Speech Talk SystemCSCJournals
This document summarizes a research paper on SMaTTS, a rule-based text-to-speech synthesis system for Standard Malay. The system uses a sinusoidal synthesis method and some pre-recorded wave files to generate speech. It has two main phases: natural language processing (NLP) and digital signal processing (DSP). The NLP phase analyzes text and converts it to phonetic representations with prosody information. The DSP phase generates the speech waveform. The system was evaluated using diagnostic rhyme tests and mean opinion scores, and areas for improvement are discussed.
Hindi digits recognition system on speech data collected in different natural... (csandit)
This paper presents a baseline digits speech recognizer for the Hindi language. The recording environment differs across speakers, since the data is collected in their respective homes: vehicle horn noise in some road-facing rooms, internal background noise such as opening doors in other rooms, and silence in the rest. All these recordings are used for training the acoustic model, which is trained on 8 speakers' audio data. The vocabulary size of the recognizer is 10 words. The HTK toolkit is used for building the acoustic model and evaluating the recognition rate of the recognizer. The efficiency of the recognizer developed on the recorded data is shown at the end of the paper, and possible directions for future research are suggested.
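Recognition rate is typically evaluated via word error rate; a minimal sketch of the underlying word-level Levenshtein distance (illustrative, not part of the HTK tooling the paper uses):

```python
def word_errors(ref, hyp):
    """Levenshtein distance between reference and hypothesis word
    lists: the minimum number of substitutions, insertions and
    deletions, which is the numerator of WER = errors / len(ref)."""
    n, m = len(ref), len(hyp)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1,
                          d[i - 1][j - 1] + sub)
    return d[n][m]

ref = "one two three four".split()
hyp = "one too three".split()   # one substitution, one deletion
wer = word_errors(ref, hyp) / len(ref)
```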
SYLLABLE-BASED SPEECH RECOGNITION SYSTEM FOR MYANMAR (ijcseit)
The proposed system is a syllable-based Myanmar speech recognition system with three stages: feature extraction, phone recognition and decoding. In feature extraction, the system transforms the input speech waveform into a sequence of acoustic feature vectors, each vector representing the information in a small time window of the signal. The likelihood of the observed feature vectors given linguistic units (words, phones, subparts of phones) is then computed in the phone recognition stage. Finally, the decoding stage takes the acoustic model (AM), which consists of this sequence of acoustic likelihoods, plus a phonetic dictionary of word pronunciations, combined with the language model (LM), and produces the most likely sequence of words as output. The system creates the language model for Myanmar using syllable segmentation and a syllable-based n-gram method.
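The syllable-based n-gram idea can be sketched as a bigram model over syllable-segmented utterances; the corpus below uses placeholder syllables, not real Myanmar text:

```python
from collections import Counter

def train_bigram(corpus):
    """Count syllable bigrams over a corpus of syllable-segmented
    utterances (with sentence-boundary markers) and return a
    conditional-probability lookup P(cur | prev)."""
    uni, bi = Counter(), Counter()
    for utt in corpus:
        syls = ["<s>"] + utt + ["</s>"]
        uni.update(syls[:-1])
        bi.update(zip(syls[:-1], syls[1:]))
    return lambda prev, cur: bi[(prev, cur)] / uni[prev] if uni[prev] else 0.0

# Toy syllable-segmented corpus (placeholder syllables).
corpus = [["ka", "la"], ["ka", "la"], ["ka", "ma"]]
p = train_bigram(corpus)
```

A deployed LM would add smoothing for unseen bigrams; this sketch only shows the counting step.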
This document describes the development of a text-to-speech synthesizer for the Pali language. It discusses previous work on speech synthesis systems for Indian languages. It then outlines the methodology used, including developing a phone set and speech database for Pali, and using a unit selection approach for speech synthesis. The system was evaluated based on the naturalness of the synthesized speech output. Results showed smooth spectral changes at concatenation points and uniform spectral changes across syllable boundaries, indicating the system produces intelligible synthetic Pali speech.
This document reviews techniques used in spoken-word recognition systems. It discusses popular feature extraction techniques like MFCC, LPC, DWT, WPD that are used to represent speech signals in a compact form before classification. Classification techniques discussed are ANN, HMM, DTW, and VQ. The document provides a brief overview of each technique and their advantages. It also presents the generalized workflow of a spoken-word recognition system including stages of speech acquisition, pre-emphasis, feature extraction, modeling, classification, and output of recognized text.
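The pre-emphasis stage of the workflow above is a one-line high-pass filter, y[n] = x[n] - alpha * x[n-1]; a minimal sketch:

```python
def pre_emphasis(x, alpha=0.97):
    """Pre-emphasis filter y[n] = x[n] - alpha * x[n-1]; boosts the
    high frequencies of a speech signal before feature extraction
    (typical alpha is around 0.95-0.97)."""
    return [x[0]] + [x[n] - alpha * x[n - 1] for n in range(1, len(x))]

# A constant (DC) signal is almost entirely suppressed after the first sample.
y = pre_emphasis([1.0, 1.0, 1.0, 1.0])
```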
AMHARIC TEXT TO SPEECH SYNTHESIS FOR SYSTEM DEVELOPMENT (Nathan Mathis)
This document discusses the development of an Amharic text-to-speech synthesis system. It provides background on text-to-speech synthesis and challenges in developing systems for under-resourced languages like Amharic. The document outlines objectives to build an Amharic TTS prototype using deep learning methods like Tacotron 2 and WaveNet. It reviews related work on Amharic speech synthesis and discusses Amharic phonology. The proposed system aims to overcome limitations of previous rule-based and concatenative Amharic TTS systems by leveraging recent advances in neural network-based speech synthesis techniques.
An expert system for automatic reading of a text written in Standard Arabic (ijnlc)
In this work we present our expert system for automatic reading, or speech synthesis, of text written in Standard Arabic. The work is carried out in two main stages: the creation of the sound database, and the transformation of the written text into speech (Text-To-Speech, TTS). This transformation is done firstly by a Phonetic Orthographical Transcription (POT) of any written Standard Arabic text, with the aim of transforming it into its corresponding phonetic sequence, and secondly by the generation of the voice signal corresponding to the transcribed sequence. We lay out the different design stages of the system, as well as the results obtained in comparison with other works studied, to realize TTS for Standard Arabic.
IJRET: International Journal of Research in Engineering and Technology is an international peer reviewed, online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together Scientists, Academicians, Field Engineers, Scholars and Students of related fields of Engineering and Technology.
An optimized approach to voice translation on mobile phones (eSAT Journals)
Abstract: Current voice translation tools and services use natural language understanding and natural language processing to convert words. However, these parsing methods concentrate on capturing keywords and translating them, largely neglecting the considerable processing time involved. In this paper we suggest techniques that can optimize the processing time and thereby increase the throughput of voice translation services. Techniques like template matching, indexing frequently used words using probability search, and a session-based cache can considerably improve processing times. These factors become all the more important when we need to achieve real-time translation on mobile phones. Keywords: optimized voice translation, mobile client, speech recognition, language interpretation, template matching, probability search, session-based cache.
A Review On Speech Feature Techniques And Classification Techniques (Nicole Heredia)
This document discusses speech feature extraction and classification techniques for speech recognition systems. It provides an overview of common feature extraction methods like MFCC and LPC, and classification algorithms like ANN and SVM. MFCC mimics human auditory perception but provides weak power spectrum, while LPC is easy to calculate but does not capture information at a linear scale. ANN can learn from data but is complex for large datasets, while SVM is accurate and suitable for pattern recognition but requires fixed-length coefficients. The document evaluates these techniques and concludes that MFCC performance is more efficient than LPC for feature extraction in speech recognition.
Speech segmentation based on four articles in one (Abebe Tora)
This review compares four articles on speech segmentation in terms of their methods, findings, limitations, evaluation metrics, strengths and contributions to the area. To obtain the articles, contact abebetora79@gmail.com.
A Novel Approach for Rule Based Translation of English to Marathiaciijournal
This paper presents a design for rule-based machine translation system for English to Marathi language pair. The machine translation system will take input script as English sentence and parse with the help of Stanford parser. The Stanford parser will be used for main purposes on the source side processing, in the machine translation system. English to Marathi Bilingual dictionary is going to be created. The system will take the parsed output and separate the source text word by word and searches for their corresponding target words in the bilingual dictionary. The hand coded rules are written for Marathi inflections and also reordering rules are there. After applying the reordering rules, English sentence will be syntactically reordered to suit Marathi language
Similar to Approach of Syllable Based Unit Selection Text- To-Speech Synthesis System for Marathi Using Three Level Fall Back Technique (20)
An Examination of Effectuation Dimension as Financing Practice of Small and M...iosrjce
IOSR Journal of Business and Management (IOSR-JBM) is a double blind peer reviewed International Journal that provides rapid publication (within a month) of articles in all areas of business and managemant and its applications. The journal welcomes publications of high quality papers on theoretical developments and practical applications inbusiness and management. Original research papers, state-of-the-art reviews, and high quality technical notes are invited for publications.
Does Goods and Services Tax (GST) Leads to Indian Economic Development?iosrjce
IOSR Journal of Business and Management (IOSR-JBM) is a double blind peer reviewed International Journal that provides rapid publication (within a month) of articles in all areas of business and managemant and its applications. The journal welcomes publications of high quality papers on theoretical developments and practical applications inbusiness and management. Original research papers, state-of-the-art reviews, and high quality technical notes are invited for publications.
Childhood Factors that influence success in later lifeiosrjce
IOSR Journal of Business and Management (IOSR-JBM) is a double blind peer reviewed International Journal that provides rapid publication (within a month) of articles in all areas of business and managemant and its applications. The journal welcomes publications of high quality papers on theoretical developments and practical applications inbusiness and management. Original research papers, state-of-the-art reviews, and high quality technical notes are invited for publications.
Emotional Intelligence and Work Performance Relationship: A Study on Sales Pe...iosrjce
IOSR Journal of Business and Management (IOSR-JBM) is a double blind peer reviewed International Journal that provides rapid publication (within a month) of articles in all areas of business and managemant and its applications. The journal welcomes publications of high quality papers on theoretical developments and practical applications inbusiness and management. Original research papers, state-of-the-art reviews, and high quality technical notes are invited for publications.
Customer’s Acceptance of Internet Banking in Dubaiiosrjce
IOSR Journal of Business and Management (IOSR-JBM) is a double blind peer reviewed International Journal that provides rapid publication (within a month) of articles in all areas of business and managemant and its applications. The journal welcomes publications of high quality papers on theoretical developments and practical applications inbusiness and management. Original research papers, state-of-the-art reviews, and high quality technical notes are invited for publications.
A Study of Employee Satisfaction relating to Job Security & Working Hours amo...iosrjce
IOSR Journal of Business and Management (IOSR-JBM) is a double blind peer reviewed International Journal that provides rapid publication (within a month) of articles in all areas of business and managemant and its applications. The journal welcomes publications of high quality papers on theoretical developments and practical applications inbusiness and management. Original research papers, state-of-the-art reviews, and high quality technical notes are invited for publications.
Consumer Perspectives on Brand Preference: A Choice Based Model Approachiosrjce
IOSR Journal of Business and Management (IOSR-JBM) is a double blind peer reviewed International Journal that provides rapid publication (within a month) of articles in all areas of business and managemant and its applications. The journal welcomes publications of high quality papers on theoretical developments and practical applications inbusiness and management. Original research papers, state-of-the-art reviews, and high quality technical notes are invited for publications.
Student`S Approach towards Social Network Sitesiosrjce
IOSR Journal of Business and Management (IOSR-JBM) is a double blind peer reviewed International Journal that provides rapid publication (within a month) of articles in all areas of business and managemant and its applications. The journal welcomes publications of high quality papers on theoretical developments and practical applications inbusiness and management. Original research papers, state-of-the-art reviews, and high quality technical notes are invited for publications.
Broadcast Management in Nigeria: The systems approach as an imperativeiosrjce
IOSR Journal of Business and Management (IOSR-JBM) is a double blind peer reviewed International Journal that provides rapid publication (within a month) of articles in all areas of business and managemant and its applications. The journal welcomes publications of high quality papers on theoretical developments and practical applications inbusiness and management. Original research papers, state-of-the-art reviews, and high quality technical notes are invited for publications.
A Study on Retailer’s Perception on Soya Products with Special Reference to T...iosrjce
IOSR Journal of Business and Management (IOSR-JBM) is a double blind peer reviewed International Journal that provides rapid publication (within a month) of articles in all areas of business and managemant and its applications. The journal welcomes publications of high quality papers on theoretical developments and practical applications inbusiness and management. Original research papers, state-of-the-art reviews, and high quality technical notes are invited for publications.
A Study Factors Influence on Organisation Citizenship Behaviour in Corporate ...iosrjce
IOSR Journal of Business and Management (IOSR-JBM) is a double blind peer reviewed International Journal that provides rapid publication (within a month) of articles in all areas of business and managemant and its applications. The journal welcomes publications of high quality papers on theoretical developments and practical applications inbusiness and management. Original research papers, state-of-the-art reviews, and high quality technical notes are invited for publications.
Consumers’ Behaviour on Sony Xperia: A Case Study on Bangladeshiosrjce
IOSR Journal of Business and Management (IOSR-JBM) is a double blind peer reviewed International Journal that provides rapid publication (within a month) of articles in all areas of business and managemant and its applications. The journal welcomes publications of high quality papers on theoretical developments and practical applications inbusiness and management. Original research papers, state-of-the-art reviews, and high quality technical notes are invited for publications.
Design of a Balanced Scorecard on Nonprofit Organizations (Study on Yayasan P...iosrjce
1. The document describes a study that designed a balanced scorecard for a nonprofit organization called Yayasan Pembinaan dan Kesembuhan Batin (YPKB) in Malang, Indonesia.
2. The balanced scorecard translated YPKB's vision and mission into strategic objectives across four perspectives: financial, customer, internal processes, and learning and growth.
3. Key strategic objectives included donation growth, budget effectiveness, customer satisfaction, reputation, service quality, innovation, and employee development. Customers perspective had the highest weighting, suggesting a focus on public service over financial growth.
Public Sector Reforms and Outsourcing Services in Nigeria: An Empirical Evalu...iosrjce
IOSR Journal of Business and Management (IOSR-JBM) is a double blind peer reviewed International Journal that provides rapid publication (within a month) of articles in all areas of business and managemant and its applications. The journal welcomes publications of high quality papers on theoretical developments and practical applications inbusiness and management. Original research papers, state-of-the-art reviews, and high quality technical notes are invited for publications.
Media Innovations and its Impact on Brand awareness & Considerationiosrjce
IOSR Journal of Business and Management (IOSR-JBM) is a double blind peer reviewed International Journal that provides rapid publication (within a month) of articles in all areas of business and managemant and its applications. The journal welcomes publications of high quality papers on theoretical developments and practical applications inbusiness and management. Original research papers, state-of-the-art reviews, and high quality technical notes are invited for publications.
Customer experience in supermarkets and hypermarkets – A comparative studyiosrjce
- The document examines customer experience in supermarkets and hypermarkets in India through a survey of 418 customers.
- It finds that in supermarkets, previous experience, atmosphere, price, social environment and experience in other channels most influence customer experience, while in hypermarkets, previous experience, product assortment, social environment and experience in other channels are most influential.
- The study provides insights for retailers on key determinants of customer experience in each format to help them improve strategies and competitive positioning.
Social Media and Small Businesses: A Combinational Strategic Approach under t...iosrjce
IOSR Journal of Business and Management (IOSR-JBM) is a double blind peer reviewed International Journal that provides rapid publication (within a month) of articles in all areas of business and managemant and its applications. The journal welcomes publications of high quality papers on theoretical developments and practical applications inbusiness and management. Original research papers, state-of-the-art reviews, and high quality technical notes are invited for publications.
Secretarial Performance and the Gender Question (A Study of Selected Tertiary...iosrjce
IOSR Journal of Business and Management (IOSR-JBM) is a double blind peer reviewed International Journal that provides rapid publication (within a month) of articles in all areas of business and managemant and its applications. The journal welcomes publications of high quality papers on theoretical developments and practical applications inbusiness and management. Original research papers, state-of-the-art reviews, and high quality technical notes are invited for publications.
Implementation of Quality Management principles at Zimbabwe Open University (...iosrjce
This document discusses the implementation of quality management principles at Zimbabwe Open University's Matabeleland North Regional Centre. It begins with background information on ZOU and the importance of quality management in open and distance learning institutions. The study aimed to determine if quality management and its principles were being implemented at the regional centre. Key findings included that the centre prioritized customer focus and staff involvement. Decisions were made based on data analysis. The regional centre implemented a quality system informed by its policy documents. The document recommends ensuring staffing levels match needs and providing sufficient resources to the regional centre.
Organizational Conflicts Management In Selected Organizaions In Lagos State, ...iosrjce
IOSR Journal of Business and Management (IOSR-JBM) is a double blind peer reviewed International Journal that provides rapid publication (within a month) of articles in all areas of business and managemant and its applications. The journal welcomes publications of high quality papers on theoretical developments and practical applications inbusiness and management. Original research papers, state-of-the-art reviews, and high quality technical notes are invited for publications.
Explore the essential graphic design tools and software that can elevate your creative projects. Discover industry favorites and innovative solutions for stunning design results.
International Upcycling Research Network advisory board meeting 4Kyungeun Sung
Slides used for the International Upcycling Research Network advisory board 4 (last one). The project is based at De Montfort University in Leicester, UK, and funded by the Arts and Humanities Research Council.
Approach of Syllable Based Unit Selection Text- To-Speech Synthesis System for Marathi Using Three Level Fall Back Technique
IOSR Journal of VLSI and Signal Processing (IOSR-JVSP)
Volume 5, Issue 6, Ver. II (Nov -Dec. 2015), PP 31-35
e-ISSN: 2319 – 4200, p-ISSN No. : 2319 – 4197
www.iosrjournals.org
DOI: 10.9790/4200-05623135
Approach of Syllable Based Unit Selection Text-To-Speech Synthesis System for Marathi Using Three Level Fall Back Technique
Sangramsing N. Kayte1, Monica Mundada1, Dr. Charansing N. Kayte2, Dr. Bharti Gawali*
1,*Department of Computer Science and Information Technology, Dr. Babasaheb Ambedkar Marathwada University, Aurangabad
2Department of Digital and Cyber Forensic, Aurangabad, Maharashtra
Abstract: A text-to-speech synthesis system is one that is capable of producing intelligible and natural speech
corresponding to any given text. A popular approach to speech synthesis is unit selection synthesis (USS). The
current work focuses on developing a USS system for Marathi. Literature suggests that syllable is a suitable unit
for Indian languages. Creating a database that covers all the syllables of Marathi is tedious and expensive, and
the footprint size of the system would be in the order of GBs. Therefore, to reach a compromise between the
quality and the footprint size, the current work proposes to use a database containing all the phonemes,
consonant-vowel (CV) units, and the most frequently occurring syllables of Marathi. This way, given a text, it is
first decomposed into syllables. If a particular syllable is not available in the database, it is broken down to CV
units and phonemes. The appropriate speech units are then chosen from the database and concatenated to
produce a speech utterance. The performance of the system will be evaluated subjectively by the mean opinion
score.
Keywords: Speech Synthesis, Unit Selection, Text-to-Speech, Mean Opinion Score
I. Introduction
Speech synthesis is the artificial generation of a speech signal from text. A speech synthesis system has two main parts: the first converts the text into a linguistic specification (grapheme-to-phoneme conversion), and the second generates the speech waveform. An unrestricted text-to-speech system is expected to produce a speech signal corresponding to any given text [1]. Popular approaches to speech synthesis are unit selection synthesis (USS) and hidden Markov model-based speech synthesis [2][3]. Unit selection is a successful concatenative speech synthesis technique and is used here to synthesize speech for the given input text; the Festival framework is required for building the voice with this approach. It involves the concatenation of appropriate pre-recorded speech units for the given text, based on the target and concatenation costs. The target cost identifies the units in the database that best match the required specification, and the concatenation cost identifies the units that join smoothly. The speech units can be words or sub-word units such as phonemes, diphones, syllables, etc. The quality of the synthesized speech varies with the size of the unit: with longer units, naturalness is better preserved and there are fewer concatenation points, but the amount of data required to train the synthesis system increases with the unit size, thereby increasing the footprint size of the system. Phoneme-based and CV-based systems were developed earlier; among the smaller units the phoneme is selected as the best unit, but the phoneme-based system exhibits more sonic glitches. Building a system with the syllable as the unit requires a larger amount of training speech data; creating such a huge database is tedious and time consuming, and more memory space is required. If less data are used, some syllables [2][4] may not be present in the database, so there are issues in maintaining the quality and naturalness of the synthesized speech. In the current work, unit selection speech synthesis systems for Marathi are developed with a small amount of speech data, with the syllable as the major unit. The given text is first decomposed into syllables. If a particular syllable of the input is not present in the database, it is broken down into CV units and phonemes; if a particular CV unit is not present either, the phoneme units are picked from the database. Finally, the appropriate speech units are chosen from the database and concatenated to produce a speech utterance: the given text is synthesized by combining pre-recorded units from the database. The performance of the synthesized voice is analyzed by obtaining the mean opinion score. The creation of the database is a tedious process, so in this
approach, with a small amount of training data, the system generates speech which is highly intelligible and natural [5], and quality is also maintained. The paper is organized as follows: Section II describes the speech corpus for building the synthesis system; Section III describes the unit selection systems developed for Marathi in detail; Section IV presents the performance analysis of the developed systems; Section V concludes the work [12-20].
II. Speech Corpus
The speech corpus consists of one hour of recorded Marathi speech data for training the system. The data were recorded from a native Marathi speaker at a sampling rate of 16 kHz (16000 Hz). Recording was done in a noise-free laboratory environment using a unidirectional carbon microphone; an M-Audio mixer, set to mono mode, was used to suppress noise and capture only the speaker's voice. The Audacity software was used for recording, and precautions were taken to maintain constant energy in the recorded speech. Festival supports different audio file formats such as ulaw, snd, aiff (audio interchange file format) and riff (resource interchange file format) [6].
Segmentation
Speech segmentation is the process of identifying the boundaries between words, syllables, or phonemes in spoken natural language. The lowest level of speech segmentation is the breakup and classification of the sound signal into a string of phones. Label (lab) files are required for the corresponding wave files; they are generated by segmenting the data with a forced-Viterbi algorithm. Initially, five minutes of data, consisting of 30 sentences, are manually segmented. The manual segmentation is done at the phoneme level using representations such as waveforms and spectrograms, HTK transcriptions, TIMIT transcriptions, etc. Models are generated for all the phonemes and the lab files are generated; forced-Viterbi alignment is then performed to segment the rest of the data. The following steps are used to segment the data:
1) Using the 5 minutes of data and the corresponding time-aligned phonetic transcriptions, context-independent phoneme models are trained.
2) Using these models and the phonetic transcriptions, the speech data are segmented using the forced-Viterbi alignment procedure.
3) Using the obtained phonetic transcriptions (phone-level label files), new context-independent phoneme models are trained.
4) Steps 2 and 3 are repeated for several iterations.
5) After these iterations, the resultant HMMs are used to segment the entire speech data again, and these boundaries are considered final. The lab files are thus obtained by iteratively performing forced-Viterbi alignment on the speech data.
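The iterative procedure above rests on forced-Viterbi alignment. As an illustrative sketch (not the HTK-based implementation used in the paper), the core dynamic program can be written as follows: given per-frame acoustic scores for the known phone sequence of an utterance, it finds the best monotonic assignment of frames to phones.

```python
# Minimal forced-Viterbi alignment sketch (illustrative only; the paper
# uses HTK-trained HMMs). Given a known phone sequence and per-frame
# log-likelihood scores for each phone in that sequence, find the
# monotonic frame-to-phone segmentation with the highest total score.

def force_align(scores):
    """scores[t][p] = log-likelihood of frame t under phone p.
    Returns, for each frame, the index of the phone it is assigned to;
    assignments are monotonically non-decreasing over time."""
    T, P = len(scores), len(scores[0])
    NEG = float("-inf")
    # dp[t][p]: best cumulative score ending at frame t inside phone p
    dp = [[NEG] * P for _ in range(T)]
    back = [[0] * P for _ in range(T)]
    dp[0][0] = scores[0][0]          # must start in the first phone
    for t in range(1, T):
        for p in range(P):
            stay = dp[t - 1][p]                       # remain in phone p
            advance = dp[t - 1][p - 1] if p > 0 else NEG  # enter phone p
            if stay >= advance:
                dp[t][p], back[t][p] = stay + scores[t][p], p
            else:
                dp[t][p], back[t][p] = advance + scores[t][p], p - 1
    # Trace back from the final phone at the last frame.
    path, p = [], P - 1
    for t in range(T - 1, -1, -1):
        path.append(p)
        p = back[t][p]
    return path[::-1]
```

Phone boundaries are then read off wherever the assigned phone index changes.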
Figure 1: Segmented wave file
From the phoneme-level lab files, the CV and syllable lab files are obtained using a script that groups the phonemes into CV units and syllable units. Finally, the phoneme, CV and syllable lab files are created and the segmentation is checked manually.
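The conversion from phoneme-level to CV-level labels described above is a simple merge pass: a consonant immediately followed by a vowel becomes one CV unit. A sketch follows; the vowel set here is illustrative, not the full Marathi inventory.

```python
# Sketch of deriving CV-level labels from phoneme-level labels: a
# consonant immediately followed by a vowel is merged into one CV unit.
# The vowel set is illustrative, not the complete Marathi inventory.

VOWELS = {"a", "aa", "i", "ii", "u", "uu", "e", "o"}

def phones_to_cv(labels):
    """labels: list of (start, end, phone) tuples in time order.
    Returns CV-level labels with merged (start, end, C+V) spans."""
    out, i = [], 0
    while i < len(labels):
        s, e, ph = labels[i]
        nxt = labels[i + 1] if i + 1 < len(labels) else None
        if ph not in VOWELS and nxt and nxt[2] in VOWELS:
            out.append((s, nxt[1], ph + nxt[2]))   # merge C + V
            i += 2
        else:
            out.append((s, e, ph))                 # keep V or lone C
            i += 1
    return out
```

For the paper's example word "sangraM", the phones /s/ /a/ /n/ /g/ /r/ /a/ /M/ merge into /sa/ /n/ /g/ /ra/ /M/.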
III. Unit Selection Speech Synthesis
In the current work, the phoneme based system, CV based system, syllable based system and syllable
with fall back systems were developed using Unit Selection Speech Synthesis to compare the performance of all
the systems. Concatenative synthesis generates speech by connecting natural, pre-recorded speech units. These units can be words, syllables, half-syllables, phonemes, diphones or triphones [7]. The unit length affects the quality of the synthesized speech. With longer units, naturalness increases and fewer concatenation points are needed, but more memory is required and the number of units stored in the database becomes very large. With shorter units, less memory is needed, but the sample collecting and labeling techniques become more complex.
The unit selection is based on two cost functions [12-20]:
– Target cost, Ct(ui, ti): an estimate of the difference between a database unit ui and the target ti it is supposed to represent.
– Concatenation cost, Cc(ui-1, ui): an estimate of the quality of a join between consecutive units ui-1 and ui.
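To make the two costs concrete, here is a minimal sketch of cost-based unit selection: one candidate list per target position, and dynamic programming picks the unit sequence minimizing the summed target and concatenation costs. The cost functions are caller-supplied placeholders, not Festival's actual acoustic features.

```python
# Unit-selection sketch: choose one candidate unit per target position
# minimizing the sum of target costs Ct(u_i, t_i) plus concatenation
# costs Cc(u_{i-1}, u_i). Cost functions are toy stand-ins.

def select_units(candidates, target_cost, concat_cost):
    """candidates: list (one entry per target position) of lists of units.
    target_cost(i, u) and concat_cost(u_prev, u) return numeric costs.
    Returns the minimum-total-cost sequence of units."""
    # best: list of (cost, path) pairs, one per candidate at position 0
    best = [(target_cost(0, u), [u]) for u in candidates[0]]
    for i in range(1, len(candidates)):
        new_best = []
        for u in candidates[i]:
            # cheapest predecessor path, including the join cost to u
            cost, path = min((c + concat_cost(p[-1], u), p) for c, p in best)
            new_best.append((cost + target_cost(i, u), path + [u]))
        best = new_best
    return min(best)[1]
```

A smooth but slightly mismatched unit can thus beat a well-matched unit with a bad join, which is exactly the trade-off the two costs encode.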
The architecture of the unit selection system is shown in the figure below:
The following systems are developed using unit selection speech synthesis:
Phoneme based System: The phoneme based system is developed using the phoneme-level lab files. For example, the word "sangraM" is transcribed as /s/ /a/ /n/ /g/ /r/ /a/ /M/. The utterances are generated for each of the labels and the corresponding text. The phoneme is the smallest unit in the speech corpus, hence the concatenation points are most numerous in the phoneme based system. The system is built with one hour of speech data, and phone-set features have been defined for all the phonemes of the Marathi language. Using the Festival framework, the system is developed and its performance analyzed.
CV-Based System: The CV-based system was developed using CV-level lab files. The label files for the CV-based system are obtained from the phoneme-level label files by combining two successive phonemes wherever a vowel follows a consonant. The CV-based system contains vowel (V), consonant (C), and consonant-vowel (CV) units. For example, the word "sangraM" is split as /sa/ /n/ /g/ /ra/ /M/.
Syllable based system:
The syllable based unit selection speech synthesis systems [8] are developed. A large amount of training data is required to develop a synthesis system with the syllable as the unit; in this system there are fewer examples, so some of the syllable units cannot be synthesized. The syllable-level lab files are generated from the CV lab files, which in turn are obtained from the phoneme lab files by concatenating each consonant with a following vowel. These lab files, together with the corresponding text and wave files, are required for building the synthesis system.
Syllable with three level fall back:
The syllable based system with fall back is similar to the syllable based system, but in addition the CV and phoneme lab files are also considered. The system is built using all the sub-word units: phonemes, consonant-vowel (CV) units, and syllables. The syllables that occur frequently and have more examples are sorted out, and for these syllables the syllable units are picked directly. If a particular syllable is not present in the database, it is decomposed into CV units. If a particular CV unit is not present in
the database, the system finally falls back to phoneme units, and the corresponding word is synthesized. Thus, even when syllable units are not present in the database, the text is still synthesized by falling back to CV units and phonemes. Hence the syllable system with fall back requires only a small amount of data but can synthesize all the syllable units and generate the corresponding speech. We therefore introduce a system which contains all the units and with which any text can be synthesized, and we infer that this system outperforms the phoneme, CV and syllable based systems [9]. The steps for building a voice in the Festival framework are as follows:
– The features for each of the units (vowel, consonant, nasal, fricative, etc.) have to be specified; they govern the pronunciation of the particular unit.
– The utterances are created for all the sentences in the entire database.
– Letter-to-sound rules are used to break a sentence into the required sub-word units and to generate the initial utterances.
– The pitch marks are extracted and the LPC coefficients are built.
– Post-processing is done to tune the pitch marks. Pitch marking plays a vital role in the extraction of Mel cepstral coefficients, because pitch-synchronous framing is used in Festival [9][10].
– The units in the database are clustered using Classification and Regression Trees (CART) [6]. A number of questions are defined to classify the units into clusters; the more questions there are, the fewer units remain in each cluster and the deeper the tree becomes.
– The voice is tested on out-of-domain sentences.
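The CART clustering step can be illustrated with a toy splitter in which yes/no questions on unit features recursively partition the unit pool; with more questions the tree grows deeper and the clusters smaller, as noted above. Real CART chooses each question by an impurity criterion, whereas this sketch simply applies an ordered question list, and the features are invented for illustration.

```python
# Toy sketch of CART-style unit clustering: yes/no questions on unit
# features recursively split the unit pool. Unlike real CART, questions
# are applied in a fixed order rather than chosen by impurity; the
# feature names are illustrative, not Festival's phone-set features.

def cluster(units, questions, min_size=2):
    """units: list of feature dicts; questions: list of (name, fn) with
    fn(unit) -> bool. Returns a nested (question, yes_branch, no_branch)
    tree whose leaves are plain lists of units (the clusters)."""
    if len(units) <= min_size or not questions:
        return units
    (name, fn), rest = questions[0], questions[1:]
    yes = [u for u in units if fn(u)]
    no = [u for u in units if not fn(u)]
    if not yes or not no:                    # question does not split: skip it
        return cluster(units, rest, min_size)
    return (name, cluster(yes, rest, min_size), cluster(no, rest, min_size))
```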
To synthesize a new sentence or paragraph, the text is given as input; the system splits it into the required sub-word units and identifies the most suitable units to concatenate based on the target cost and the concatenation cost. The utterance is created for the corresponding text with the selected units, and the final speech is synthesized from the utterance.
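The three-level fall back described above (syllable, then CV, then phoneme) amounts to a lookup cascade over the database. The following sketch uses illustrative database contents; as in the paper, all phonemes are assumed to be present in the database, so the phoneme level never fails.

```python
# Sketch of the three-level fall back: try the whole syllable in the
# database; if absent, decompose into CV units; any CV unit that is
# also absent falls back to individual phonemes. As in the paper, all
# phonemes are assumed present. Database contents are illustrative.

def decompose(syllable, db_syllables, db_cv):
    """syllable: list of phone strings, e.g. ["s", "a", "n"].
    Returns the list of database unit names used to render it."""
    name = "".join(syllable)
    if name in db_syllables:
        return [name]                      # level 1: whole syllable
    units, i = [], 0
    while i < len(syllable):
        if i + 1 < len(syllable) and syllable[i] + syllable[i + 1] in db_cv:
            units.append(syllable[i] + syllable[i + 1])   # level 2: CV
            i += 2
        else:
            units.append(syllable[i])      # level 3: bare phoneme
            i += 1
    return units
```

Each returned unit is then looked up and concatenated by the cost-based selection described earlier.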
IV. Performance Analysis
MOS is calculated as a subjective quality measurement for the speech synthesized using unit selection synthesis. Listeners were instructed to score each sample between 01 and 05 for understandability (Excellent – 05, Very good – 04, Good – 03, Satisfactory – 02, Not understandable – 01). The MOS scores were collected from 10 native listeners; 30 wave files were synthesized and the Mean Opinion Score was obtained for each wave file [6][11].
Table 1: MOS Obtained for all the Systems

| Syllable with Fall back System | Phoneme based System | CV based System | Syllable based System |
| 4 | 4.5 | 4.7 | 3.5 |
The testing is done by synthesizing out-of-domain sentences, which are not present in the training data; the corresponding wave files are synthesized from the Festival system. According to the MOS scores obtained, the syllable with fall back system has the highest score and intelligibility.
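The MOS computation itself is a simple average of listener ratings per file and then across files; a minimal sketch (the ratings below are illustrative, not the paper's data):

```python
# Mean Opinion Score as used above: each listener rates each synthesized
# file on a 1-5 scale; a file's MOS is the mean of its ratings, and the
# system score averages over files. Ratings here are illustrative.

def mos(ratings):
    """ratings: dict mapping file name -> list of listener scores (1-5).
    Returns (per-file MOS dict, overall system MOS)."""
    per_file = {f: sum(s) / len(s) for f, s in ratings.items()}
    overall = sum(per_file.values()) / len(per_file)
    return per_file, overall
```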
V. Conclusion
From the systems developed and the observations made, we conclude that the syllable with fall back system outperforms the other systems: the synthesized speech is highly intelligible and the quality is improved to a great extent. Although the syllable based system is intelligible, a large amount of training data is required for it to synthesize all out-of-domain sentences. Analysis of the tested voices shows that many syllables are missing in the syllable based system due to the small amount of training data in the corpus; increasing the data would remove this problem but would require high memory and more time for creating a large speech database. We therefore conclude that the system developed using fall back can be considered the best synthesis technique for building speech voices.
References
[1]. Sangramsing Kayte, Dr. Bharti Gawali, "Marathi Speech Synthesis: A Review", International Journal on Recent and Innovation Trends in Computing and Communication, ISSN: 2321-8169, Volume 3, Issue 6, pp. 3708-3711.
[2]. Sangramsing Kayte, Monica Mundada, Dr. Charansing Kayte, "A Review of Unit Selection Speech Synthesis", International Journal of Advanced Research in Computer Science and Software Engineering, Volume 5, Issue 10, October 2015.
[3]. Sangramsing Kayte, Monica Mundada, Dr. Charansing Kayte, "A Marathi Hidden-Markov Model Based Speech Synthesis System", IOSR Journal of VLSI and Signal Processing (IOSR-JVSP), Volume 5, Issue 6, Ver. II (Nov-Dec 2015), pp. 34-39, e-ISSN: 2319-4200, p-ISSN: 2319-4197.
[4]. Sangramsing Kayte, Monica Mundada, Dr. Charansing Kayte, "Di-phone-Based Concatenative Speech Synthesis Systems for Marathi Language", IOSR Journal of VLSI and Signal Processing (IOSR-JVSP), Volume 5, Issue 5, Ver. I (Sep-Oct 2015), pp. 76-81, e-ISSN: 2319-4200, p-ISSN: 2319-4197.
[5]. S. Young, G. Evermann, T. Hain, D. Kershaw, G. Moore, J. Odell, D. Ollason, D. Povey, V. Valtchev and P. Woodland, "The HTK Book", ver. 3.2.1, Cambridge University Engineering Department, 2002.
[6]. Sangramsing Kayte, Monica Mundada, Santosh Gaikwad, Bharti Gawali, "Performance Evaluation of Speech Synthesis Techniques for English Language", International Congress on Information and Communication Technology, 9-10 October 2015.
[7]. Sangramsing Kayte, Monica Mundada, Dr. Charansing Kayte, "Implementation of Marathi Language Speech Databases for Large Dictionary", IOSR Journal of VLSI and Signal Processing (IOSR-JVSP), Volume 5, Issue 6, Ver. I (Nov-Dec 2015), pp. 40-45, e-ISSN: 2319-4200, p-ISSN: 2319-4197.
[8]. Sangramsing Kayte, Monica Mundada, Dr. Charansing Kayte, "A Corpus-Based Concatenative Speech Synthesis System for Marathi", IOSR Journal of VLSI and Signal Processing (IOSR-JVSP), Volume 5, Issue 6, Ver. I (Nov-Dec 2015), pp. 20-26, e-ISSN: 2319-4200, p-ISSN: 2319-4197.
[9]. Sangramsing Kayte, Monica Mundada, "Study of Marathi Phones for Synthesis of Marathi Speech from Text", International Journal of Emerging Research in Management & Technology, ISSN: 2278-9359, Volume 4, Issue 10, October 2015.
[10]. Sangramsing Kayte, Monica Mundada, Dr. Charansing Kayte, "Di-phone-Based Concatenative Speech Synthesis System for Hindi", International Journal of Advanced Research in Computer Science and Software Engineering, Volume 5, Issue 10, October 2015.
[11]. Sangramsing Kayte, Monica Mundada, Dr. Charansing Kayte, "Performance Calculation of Speech Synthesis Methods for Hindi Language", IOSR Journal of VLSI and Signal Processing (IOSR-JVSP), Volume 5, Issue 6, Ver. I (Nov-Dec 2015), pp. 13-19, e-ISSN: 2319-4200, p-ISSN: 2319-4197.
[12]. Sangramsing Kayte, Monica Mundada, Dr. Charansing Kayte, "Speech Synthesis System for Marathi Accent using FESTVOX", International Journal of Computer Applications (0975-8887), Volume 130, No. 6, November 2015.
[13]. Sangramsing Kayte, Monica Mundada, Dr. Charansing Kayte, "Screen Readers for Linux and Windows – Concatenation Methods and Unit Selection based Marathi Text to Speech System", International Journal of Computer Applications (0975-8887), Volume 130, No. 14, November 2015.
[14]. Sangramsing Kayte, Monica Mundada, Dr. Charansing Kayte, "Performance Evaluation of Speech Synthesis Techniques for Marathi Language", International Journal of Computer Applications (0975-8887), Volume 130, No. 3, November 2015.
[15]. Sangramsing Kayte, Monica Mundada, Jayesh Gujrathi, "Hidden Markov Model based Speech Synthesis: A Review", International Journal of Computer Applications (0975-8887), Volume 130, No. 3, November 2015.
[16]. Sangramsing N. Kayte, Monica Mundada, Dr. Charansing N. Kayte, Dr. Bharti Gawali, "Approach To Build A Marathi Text-To-Speech System Using Concatenative Synthesis Method With The Syllable", International Journal of Engineering Research and Applications, ISSN: 2248-9622, Vol. 5, Issue 11, Part 4, November 2015, pp. 93-97.
[17]. Sangramsing N. Kayte, Dr. Charansing N. Kayte, Dr. Bharti Gawali, "Grapheme-To-Phoneme Tools for the Marathi Speech Synthesis", International Journal of Engineering Research and Applications, ISSN: 2248-9622, Vol. 5, Issue 11, Part 4, November 2015, pp. 86-92.
[18]. Sangramsing Kayte, "Duration for Classification and Regression Tree for Marathi Text-to-Speech Synthesis System", International Journal of Engineering Research and Applications, ISSN: 2248-9622, Vol. 5, Issue 11, Part 4, November 2015.
[19]. Sangramsing Kayte, "Transformation of Feelings using Pitch Parameter for Marathi Speech", International Journal of Engineering Research and Applications, ISSN: 2248-9622, Vol. 5, Issue 11, Part 4, November 2015, pp. 120-124.
[20]. Sangramsing N. Kayte, Monica Mundada, Dr. Charansing N. Kayte, Dr.BhartiGawali "Automatic Generation of Compound Word
Lexicon for Marathi Speech Synthesis" IOSR Journal of VLSI and Signal Processing (IOSR-JVSP)Volume 5, Issue 6, Ver. II (Nov
-Dec. 2015), PP 25-30e-ISSN: 2319 – 4200, p-ISSN No. : 2319 – 4197