International Journal of Information Sciences and Techniques (IJIST) Vol.2, No.4, July 2012
DOI: 10.5121/ijist.2012.2406
PUNJABI SPEECH SYNTHESIS SYSTEM USING HTK
Divya Bansal1, Ankita Goel2, Khushneet Jindal3
School of Mathematics and Computer Applications,
Thapar University, Patiala (Punjab) – India
1 divyabansal150@yahoo.com
2 goel.ankitathapar@gmail.com
3 khushneet.jindal@thapar.edu
ABSTRACT
This paper describes a Hidden Markov Model-based Punjabi text-to-speech synthesis system (HTS), in
which the speech waveform is generated from the Hidden Markov Models themselves, and applies it to
Punjabi speech synthesis using the general speech synthesis architecture of HTK (HMM Tool Kit). This
Hidden Markov Model based TTS can be used in mobile phones to read out the stored phone directory or
messages. Text messages and the caller’s identity in English are mapped to tokens in Punjabi, which are
then concatenated to form speech according to certain rules and procedures.
To build the synthesizer, we recorded a speech database and phonetically segmented it, first
extracting context-independent monophones and then context-dependent triphones. For example, for the
word bharat the monophones are a, bh, t, etc., and the triphones include bh-a+r. These speech utterances
and their phone-level transcriptions (monophones and triphones) are the inputs to the speech synthesis
system. The system outputs the sequence of phonemes after resolving ambiguities in phoneme selection
using word network files; e.g. for the word Tapas the output phoneme sequence is ਤ,ਪ,ਸ instead of
ਟ,ਪ,ਸ.
KEYWORDS
Hidden Markov models, Context-dependent acoustic modeling, Punjabi speech corpora.
1. INTRODUCTION
Speech is the most important form of communication in everyday life. However, the dependence
of human-computer interaction on written text and images makes the use of computers impossible
for visually and physically impaired and illiterate people [1]. Text-to-speech synthesis (TTS)
helps speech processing researchers address this problem by synthesizing speech (in local
languages such as Tamil, Hindi and Punjabi) from written text, for example in browsers and mobile
phones. Speech can be synthesized by three main methods: articulatory synthesis, concatenative
synthesis and formant synthesis.
Articulatory synthesis tries to model the human speech production system (especially the vocal tract
and articulators such as the lips, tongue and jaw) and the articulatory processes directly.
However, it is also the most difficult method to implement because of limited knowledge of the
complex human articulation organs.
Concatenative speech synthesis systems can synthesize high-quality, more natural-sounding
speech. However, synthesizing speech with various voice characteristics such as speaker
individualities, speaking styles and emotions requires a large speech corpus and a large amount of
memory, because stored basic speech units (such as syllables and diphones) are concatenated to form
word sequences using a pronunciation dictionary.
Formant synthesis is based on rules that describe the resonant frequencies of the vocal
tract. The formant method uses the source-filter model of speech production, in which speech is
modeled by the parameters of the filter model [2]. Rule-based formant synthesis can produce speech
of reasonable quality that nevertheless sounds unnatural, since it is difficult to estimate the vocal
tract model and source parameters [3].
A further approach to speech synthesis is Hidden Markov Model based synthesis, i.e. HTS. It
was initially implemented for Japanese but can now be implemented for various
languages such as Hindi, English and Tamil. It makes it easy to model prosody and various
voice characteristics on the basis of probabilities without requiring large databases. In this approach,
speech utterances are used to extract spectral parameters (mel-cepstral coefficients) and excitation
parameters and to train context-dependent phone models, which are in turn concatenated and used to
synthesize the speech waveform corresponding to the text input.
This paper is organized as follows. Section 2 presents Hidden Markov Model based speech
synthesis. Section 3 describes the overall implementation of the Hidden Markov Model based
text-to-speech system on the Hidden Markov Model Toolkit architecture, from feature extraction to
training of the system. Section 4 contains the results of speech synthesis, and Section 5 concludes
the paper with the discussion and conclusion.
Figure 1. HMM Based Speech Synthesis System
2. HIDDEN MARKOV MODEL BASED SPEECH SYNTHESIS
In speech synthesis, the Viterbi algorithm is used to find the most probable path through the Hidden
Markov Models, which generates speech signal feature vectors such as MFCCs (mel-frequency cepstral
coefficients); these are in turn used to generate the speech signal.
2.1 Training Part
In the training part, the spectral parameters (mel-cepstral coefficients) and the excitation parameters
(fundamental frequency F0) are extracted from the speech database and combined for training the
Hidden Markov Model acoustic models. Training phone Hidden Markov Models on pitch and mel-cepstrum
simultaneously is made possible in a unified framework by using multi-space probability distribution
Hidden Markov Models and multi-dimensional Gaussian distributions [4]. The simultaneous modeling of
pitch and spectrum results in a set of context-dependent Hidden Markov Models [2].
2.2 Synthesis Part
In the synthesis part, speech parameters such as mel-frequency cepstral coefficients are generated
according to the input text from the context-dependent Hidden Markov Model phone models
(e.g. for the word tapas, one phone model is t-a+p) that are obtained as output from the
training part. These generated speech parameters are in turn used to synthesize the speech signal as
the final output.
This approach is very flexible because it is driven by the acoustic features of the phone models
obtained from the speech corpora. Thus the characteristics of the synthesized speech can easily be
modified by altering the Hidden Markov Model parameters and acoustic features.
3. HTS IMPLEMENTATION ON HTK ARCHITECTURE
3.1 Signal Features Generation
An HMM (Hidden Markov Model) has a finite number of states, say N, and three model parameters
(A, B, π). At each time t, a new state is entered according to the transition probability distribution
A, which depends on the previous state. After each transition, an output observation symbol is
produced according to the output probability distribution B, which depends on the current state.
π is the initial state probability distribution.
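The roles of A, B and π can be illustrated with a minimal sketch in Python. The three-state left-to-right model, its probabilities and the symbols "x" and "y" below are hypothetical values chosen for illustration only; they are not taken from the trained Punjabi models.

```python
import random

# Hypothetical 3-state left-to-right HMM over two output symbols.
PI = [1.0, 0.0, 0.0]                 # initial state probabilities (pi)
A = [[0.6, 0.4, 0.0],                # transition probabilities (A)
     [0.0, 0.7, 0.3],
     [0.0, 0.0, 1.0]]
B = [{"x": 0.9, "y": 0.1},           # output probabilities (B), one entry per state
     {"x": 0.2, "y": 0.8},
     {"x": 0.5, "y": 0.5}]


def draw(weights, rng):
    """Draw an index with probability proportional to the given weights."""
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]


def sample(length, seed=0):
    """Generate a state sequence and an observation sequence of the given length."""
    rng = random.Random(seed)
    states, symbols = [], []
    state = draw(PI, rng)                      # first state comes from pi
    for _ in range(length):
        states.append(state)
        names = list(B[state])
        symbols.append(names[draw([B[state][s] for s in names], rng)])  # emit from B
        state = draw(A[state], rng)            # next state comes from row A[state]
    return states, symbols
```

Calling `sample(6)` returns two lists of length six. Because the transition matrix has no backward transitions, the state indices never decrease along the sequence.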
To synthesize speech, the most probable sequence of state feature vectors Â must be found from
the Hidden Markov Model λ, which consists of concatenated context-dependent triphones such as
t-o+n or context-independent monophones such as t, o, n (phone transcriptions) corresponding to the
symbols of a word w, such as Tony, in the text to be synthesized. These acoustic phone models are
obtained in the training phase by modeling Hidden Markov Models on the feature parameters extracted
from the stored speech corpora.
Thus we need to generate a feature vector sequence Â = Aq1, Aq2, Aq3, …, AqL of length L by
maximizing the likelihood P(A | λ) of the Hidden Markov Model:
$$\hat{A} = \arg\max_{A} P(A \mid \lambda) = \arg\max_{A} \sum_{q \in Q} P(A \mid q, \lambda)\, P(q \mid \lambda) \tag{1}$$
In this equation, P(A | λ) is calculated by summing the product of the joint output probability
P(A | q, λ) and the state sequence probability P(q | λ) over all possible paths Q [4], where
q = q1, q2, …, qL is a path through the states of the model λ and qi is the state at time ti, as in
Fig. 2.
Figure 2. (a) Concatenated HMM chain; (b) HMM chain for the word bharat
Because searching all possible paths through the model is time consuming and complex, we use the
Viterbi approximation: only the most probable state sequence is used to generate the feature vector
sequence Â.
The state sequence q̂ of the model λ can be maximized independently of Â:
$$\hat{q} = \arg\max_{q} P(q \mid \lambda, L) \tag{2}$$
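The difference between summing over all paths and keeping only the single best path can be seen on a toy model. The sketch below is illustrative only: the two-state model and its probabilities are hypothetical, the paths are enumerated by brute force rather than by dynamic programming, and the best path is scored by the joint probability of path and observations for a fixed observation sequence (the usual Viterbi criterion), whereas Eq. (2) maximizes over the state sequence alone.

```python
from itertools import product

# Hypothetical 2-state left-to-right HMM with discrete outputs.
PI = [1.0, 0.0]
A = [[0.7, 0.3],
     [0.0, 1.0]]
B = [{"x": 0.9, "y": 0.1},
     {"x": 0.2, "y": 0.8}]


def path_probability(path, observations):
    """Return P(A | q, lambda) * P(q | lambda) for one state path q."""
    prob = PI[path[0]] * B[path[0]][observations[0]]
    for prev, cur, obs in zip(path, path[1:], observations[1:]):
        prob *= A[prev][cur] * B[cur][obs]
    return prob


def total_and_best(observations):
    """Compare the sum over all paths with the single most probable path."""
    paths = product(range(len(PI)), repeat=len(observations))
    scored = [(path_probability(p, observations), p) for p in paths]
    total = sum(prob for prob, _ in scored)    # sum over all paths, as in Eq. (1)
    best_prob, best_path = max(scored)         # best single path (Viterbi approximation)
    return total, best_prob, best_path
```

The number of paths grows as N^L, which is why the exhaustive sum is only feasible for toy cases like this one and why the single best path is used in practice.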
The Hidden Markov Model Toolkit represents output distributions by Gaussian mixture densities.
Here the output probability distribution of each state qi is represented by a single Gaussian density
function with mean vector µi and covariance matrix Σi. The Hidden Markov Model λ is the set of
all means and covariance matrices for all N states:
$$\lambda = (\mu_1, \Sigma_1, \mu_2, \Sigma_2, \mu_3, \Sigma_3, \ldots, \mu_N, \Sigma_N) \tag{3}$$
During Hidden Markov Model training of the acoustic models, the mean vector µi and covariance
matrix Σi are first calculated from features extracted from the speech corpora and then re-estimated
for each state of all phone models.
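One simple way to obtain the initial values is the flat start described for HCompV in Section 3.3, where every state receives the same global mean and variance. The sketch below illustrates that idea in plain Python under two assumptions that are ours: covariances are diagonal (one variance per feature dimension), and the variance is normalized by the number of frames. It is not the HTK implementation.

```python
def global_mean_and_variance(frames):
    """Return the per-dimension mean and variance over all feature frames."""
    count = len(frames)
    dims = len(frames[0])
    mean = [sum(f[d] for f in frames) / count for d in range(dims)]
    var = [sum((f[d] - mean[d]) ** 2 for f in frames) / count for d in range(dims)]
    return mean, var


def flat_start(frames, n_states):
    """Give every emitting state the same global mean and diagonal variance."""
    mean, var = global_mean_and_variance(frames)
    # Each state gets its own copy so later re-estimation can update states separately.
    return [{"mean": list(mean), "var": list(var)} for _ in range(n_states)]
```

For example, `flat_start([[1.0, 2.0], [3.0, 6.0]], 3)` yields three states, each with mean `[2.0, 4.0]` and variance `[1.0, 4.0]`.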
3.2 Data Preparation and Feature Extraction
Training the Hidden Markov Models and testing the speech synthesis system require
speech utterances and their phone-level transcriptions. The Punjabi speech corpus used for training
the system contains speech utterances of one female speaker. In phase I, the recorded data
consist of 61 words (i.e. words starting with the letters ਤ and ਟ) arranged
in 17 samples; in phase II, the training data consist of 81 words (words containing ਅ and ਆ) arranged
in 23 samples. The data were recorded using microphones in a room environment. The distance
between the speaker’s mouth and the microphone was approximately 5-7 cm [8]. Samples were recorded
at a sampling rate of 8000 Hz, with 16-bit depth and a mono channel, using Power Sound Editor.
The recorded speech files are stored in .wav format.
For training the Hidden Markov Models, each recorded sample needs a corresponding phone-level
transcription. This is produced using the Hidden Markov Model Toolkit label editor HLEd, which
generates a phone-level MLF (Master Label File) using the mkphones.led edit script. For example, the
phone transcription generated for the sample word “Tony” is shown in Figure 3.
Figure 3. Phone Transcription File (Phones0.mlf)
Further, the recorded speech samples are parameterized into sequences of excitation and spectral
features. For this, Mel Frequency Cepstral Coefficients (MFCCs), which are derived from FFT
(Fast Fourier Transform) based log spectra, are used. All input .wav files are converted to Mel
Frequency Cepstral Coefficient vectors using the HCopy tool of the Hidden Markov Model Toolkit.
The speech signals were windowed using a 25 ms Blackman window with a 10 ms frame period.
The spectral feature vector consisted of 39 components: 13 mel-cepstral coefficients including the
zeroth coefficient, 13 delta coefficients and 13 acceleration coefficients.
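The framing arithmetic and the 13 + 13 + 13 layout can be checked with a short sketch. The window and shift follow the values stated above (25 ms and 10 ms at 8000 Hz, i.e. 200 and 80 samples). The delta computation uses the common regression formula with a window of two frames on each side and replicated edge frames; that window width and the edge handling are assumptions made here, since the paper does not state them. The cepstral analysis itself is not reproduced.

```python
import math

SAMPLE_RATE = 8000            # Hz, from the recording setup
WINDOW_MS, SHIFT_MS = 25, 10  # analysis window and frame period


def frame_count(n_samples):
    """Number of full analysis frames in a signal of n_samples samples."""
    window = SAMPLE_RATE * WINDOW_MS // 1000   # 200 samples
    shift = SAMPLE_RATE * SHIFT_MS // 1000     # 80 samples
    if n_samples < window:
        return 0
    return 1 + (n_samples - window) // shift


def blackman(n):
    """Blackman window of n points (classic 0.42 / 0.5 / 0.08 coefficients)."""
    return [0.42 - 0.5 * math.cos(2 * math.pi * i / (n - 1))
            + 0.08 * math.cos(4 * math.pi * i / (n - 1)) for i in range(n)]


def deltas(frames, width=2):
    """Regression-based delta coefficients; edge frames are replicated."""
    denom = 2 * sum(k * k for k in range(1, width + 1))
    last = len(frames) - 1
    out = []
    for t in range(len(frames)):
        row = []
        for d in range(len(frames[0])):
            acc = sum(k * (frames[min(t + k, last)][d] - frames[max(t - k, 0)][d])
                      for k in range(1, width + 1))
            row.append(acc / denom)
        out.append(row)
    return out


def add_dynamic_features(static):
    """Stack static, delta and acceleration coefficients for every frame."""
    d = deltas(static)
    a = deltas(d)             # acceleration = delta of the delta coefficients
    return [s + dd + aa for s, dd, aa in zip(static, d, a)]
```

With 13 static coefficients per frame, `add_dynamic_features` returns 39 values per frame, matching the feature vector size used in this work.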
3.3 Training of Hidden Markov Model
Initially, a prototype model, proto, is defined for training the Hidden Markov Models. It is
initialized using the HMM Toolkit tool HCompV, which computes the global mean and variance and sets
all of the Gaussians in a given Hidden Markov Model to have the same mean and variance. 5-state
left-to-right Hidden Markov Models with no skips are used, in which the first and last states are
non-emitting. The system is trained for 27 monophone models. These flat-start monophones, stored in
various hidden Markov model directories, are re-estimated using the embedded re-estimation tool
HERest, which is based on Baum-Welch re-estimation. For each state of every monophone, the
mean and variance vectors are estimated. After that, the triphone models are built from the
monophone models and trained using HERest. The triphones are created from the monophones by the
HLEd tool, following the l-p+r structure (where p is the phoneme and l and r are its left and right
contexts) for each phoneme ‘p’ in the monophone model, making it context dependent; e.g. in the word
Bharat, the triphone generated for the phoneme ‘a’ is bh-a+r.
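The l-p+r naming can be sketched in a few lines of Python. This is only an illustration of the label structure, not of HLEd itself; the segmentation of bharat into bh, a, r, a, t and the treatment of the first and last phones as biphones (with one-sided context) are assumptions made for the example.

```python
def to_triphones(phones):
    """Expand a phone sequence into l-p+r context-dependent labels."""
    labels = []
    for i, phone in enumerate(phones):
        label = phone
        if i > 0:
            label = phones[i - 1] + "-" + label      # prepend left context l-
        if i < len(phones) - 1:
            label = label + "+" + phones[i + 1]      # append right context +r
        labels.append(label)
    return labels
```

For the assumed phone sequence `["bh", "a", "r", "a", "t"]`, the function returns `["bh+a", "bh-a+r", "a-r+a", "r-a+t", "a-t"]`, which contains the triphone bh-a+r mentioned above.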
The re-estimated monophone model obtained using the HMM Toolkit is shown in Figure 4.
Figure 4. Monophone hmmdefs File
After the triphones are made, decision tree state tying is performed by running the HHEd tool of the
Hidden Markov Model Toolkit. HHEd is used to cluster the states, and then each cluster is tied. The
decision trees are built by asking questions about the left and right contexts of each triphone to find
those contexts which make the largest difference to the acoustics and which should therefore
distinguish clusters [6].
We then used the edit script tree.hed, which contains the instructions regarding which contexts to
examine for possible clustering and the questions (QS) defined by the user according to the language.
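How a single context question partitions a set of triphones can be sketched as follows. The patterns follow the l-p+r naming, so a right-context question for r is written `*+r` and a left-context question for bh is written `bh-*`. The triphone names in the usage note are hypothetical, and the sketch only performs the yes/no split; it does not compute the likelihood gain that decides which question is chosen at each node of the tree.

```python
from fnmatch import fnmatchcase


def split_by_question(triphones, patterns):
    """Partition triphone names by whether any context pattern matches."""
    yes = [t for t in triphones if any(fnmatchcase(t, p) for p in patterns)]
    no = [t for t in triphones if t not in yes]
    return yes, no
```

For instance, `split_by_question(["bh-a+r", "t-a+p", "r-a+t", "k-a+r"], ["*+r"])` places bh-a+r and k-a+r in the "yes" group and the other two names in the "no" group.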
Figure 5. Tree.hed File
Decision tree clustering of the states is performed by the TB commands.
The re-estimated triphone model obtained using the Hidden Markov Model Toolkit is shown in Figure 6.
Figure 6. Triphone hmmdefs File
After decision tree clustering, the following file is obtained:
Figure 7. Clustered hmmdefs file
Figure 8. Decision Tree Clustering
3.4 Test Data Decoding
In this part, the test data, i.e. the text to be converted to speech, is given as input along
with the re-estimated context-dependent Hidden Markov Models obtained from the training phase.
According to the phoneme sequence in the text labels, the context-dependent Hidden Markov Models
are concatenated with the help of the HVite tool and the word network file wdnet. According to the
obtained state sequence, the sequence of mel-cepstral coefficients and log F0 values, including
voiced/unvoiced decisions, is determined by maximizing the output probability of the Hidden Markov
Model [3]. The Hidden Markov Model Toolkit is used to find the appropriate pronunciation of a word
from several alternative pronunciations, both for words containing ਤ and ਟ (i.e. whether Tony
corresponds to ਤੋਨਿ or ਟੋਨਿ) and for words containing ਅ or ਆ, and the corresponding phonemes are
generated.
4. SPEECH SYNTHESIS RESULTS
The conflicting words are dealt with in a phased manner. To begin with, in the
testing phase, the text-to-speech test data include:
Test-I: 28 Punjabi words with ਤ or ਟ and 45 Punjabi words with ਅ or ਆ
Test-II: 33 Punjabi words with ਤ or ਟ and 36 Punjabi words with ਅ or ਆ
The text labels are transformed into triphone format with the help of the HMM Toolkit. For
each word we recorded a wav file, e.g. bharat.wav. The appropriate pronunciation is
selected from the network of alternative pronunciations.
An MLF (Master Label File), phonemes.mlf, was generated by the HLEd tool. It contains the
correct phoneme sequence, chosen from the various alternatives, of each test word contained
in the recout.mlf file, which was initially produced by the HVite tool of the HMM Toolkit using
the dictionary dict and the word network file wdnet. The phonemes of different test words
were arranged in different .lab files according to the input test samples.
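Splitting an MLF into per-word label sequences can be sketched as below. The sketch assumes the usual MLF layout: a `#!MLF!#` header, a quoted label file name opening each entry, one label per line (optionally preceded by start and end times), and a line containing a single period closing the entry. The sample content used in the usage note is hypothetical and is not the phonemes.mlf produced in this work.

```python
def parse_mlf(text):
    """Parse Master Label File text into {label file name: [labels]}."""
    entries = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line == "#!MLF!#":
            continue
        if line.startswith('"') and line.endswith('"'):
            current = line.strip('"').split("/")[-1]   # "*/tony.lab" -> "tony.lab"
            entries[current] = []
        elif line == ".":
            current = None                             # end of the current entry
        elif current is not None:
            tokens = line.split()
            # Lines may be "label" or "start end label [score]".
            timed = len(tokens) >= 3 and tokens[0].isdigit() and tokens[1].isdigit()
            entries[current].append(tokens[2] if timed else tokens[0])
    return entries
```

Given a hypothetical file with the entries `"*/tony.lab"` (labels t, o, n, i) and `"*/bharat.lab"` (timed labels bh, a), the function returns a dictionary with one list of phone labels per .lab name.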
With the HMM Toolkit, correct phoneme sequences are generated to a large extent
and satisfactory results are obtained. In addition, a rule-based approach is used to formulate
rules for generating the phoneme sequences of words whose sequences are not correctly
produced by the HMM Toolkit.
Figure 9. (phonemes.mlf) Phoneme File Generated
After testing HTK on test samples with words containing ਤ or ਟ and ਅ or ਆ,
for which the system was trained in the training phase, the following results were obtained:
• For the words containing ਅ or ਆ (Test-I with 45 test words and Test-II with 36 test words),
the numbers of words obtained after testing with correct and incorrect phoneme sequences
are shown in the following bar graph.
Figure 10. Test Samples for words containing ਅ or ਆ
• For the test samples containing ਤ or ਟ (28 and 33 words in total), the numbers of
correct and incorrect phoneme sequences obtained are shown in the
following bar graph.
Figure 11. Test Samples for words containing ਤ or ਟ
• A comparison of the overall accuracies for both sets of tests, i.e. the one containing words
with ਤ or ਟ and the one containing words with ਅ or ਆ, is depicted in the following cylindrical bar graph.
Figure 12. Comparison of Overall Accuracies
Figure 13 presents the result of generated speech for the sentences using the Matlab code
mfcc2spectrum [10].
Figure 13. Spectrum representation from Mel Cepstral Coeff. data of word TONY
5. DISCUSSION AND CONCLUSION
An HMM-based Punjabi speech synthesis system is presented in this paper. In phase I, the developed
text-to-speech system was trained on 17 samples with a total of 61 words, all starting with the letters
ਤ and ਟ, and tested for selection of the appropriate phoneme sequence on 30 Punjabi words in test 1.
It was then trained on 23 samples containing 81 words with ਅ and ਆ and tested on 45 selected words
in the corresponding test 1. The Hidden Markov Model text-to-speech approach is very effective
for developing text-to-speech systems for various languages and can easily implement changes
in the voice characteristics of synthesized speech with the help of the speaker adaptation techniques
developed for speech recognition [7]. To improve efficiency, the context-dependent phone
models used for synthesis need to be improved by recording and annotating more Punjabi speech
data and by applying filters using custom rules and procedures.
REFERENCES
[1] S.D.Shirbahadurkar and D.S.Bormane, (2009) “Marathi Language Speech Synthesizer Using
Concatenative Synthesis Strategy (Spoken in Maharashtra, India)”, Second International Conference
on Machine Vision, pp. 181-185.
[2] S. Martincic-Ipsic and I. Ipsic, (2006) “Croatian HMM-based speech synthesis,” Journal of
Computing and Information Technology, Vol. 14, no. 4, pp. 307–313.
[3] D. H. Klatt, (1987) “Review of Text to Speech Conversion for English”, Journal of the Acoustical
Society of America, Vol. 82, pp. 737–793.
[4] K. Tokuda, et al. (2003), “Multi-Space Probability Distribution HMM”, IEICE Trans. Inf. & System,
Vol. E85-D, No.3, pp. 455-464.
[5] L. R. Rabiner, (1989), “A tutorial on hidden Markov models and selected applications in speech
recognition”, Proc. IEEE, Vol. 77, No. 2, pp. 257–286.
[6] S. Young, et al. (2002), “The HTK Book (for HTK Version 3.2)”, Cambridge University Engineering
Department, Cambridge, Great Britain.
[7] K. Tokuda, H. Zen, A.W. Black, (2002), “An HMM-based speech synthesis system applied to
English”, Proc. of 2002 IEEE Workshop in Speech Synthesis.
[8] K. Kumar and R. K. Aggarwal, (2011), “Hindi Speech Recognition System Using HTK”,
International Journal of Computing and Business Research, Vol. 2, no. 2, pp. 2229-6166.
[9] T. Yoshimura, K. Tokuda, T. Masuko, T. Kobayashi, and T. Kitamura, (1999), “Simultaneous
modeling of spectrum, pitch and duration in HMM-based speech synthesis,” in Proc. Eurospeech, pp.
2347–2350.
[10] N. Meseguer, “Speech Analysis for Automatic Speech Recognition”, Master’s thesis, Norwegian
University of Science and Technology, Norway.
[11] S. King, (2011), “An introduction to statistical parametric speech synthesis”, Sadhana - Engineering
Science, Vol. 36 no. 5, pp. 837-852.
[12] P. Singh, (2005), Development of A Punjabi Text-To-Speech Synthesis System, M.Tech Thesis,
Punjabi University.
[13] P. Singh and G. S. Lehal, (2006), Text-to-Speech Synthesis system for Punjabi language,
Proceedings of International Conference on Multidisciplinary Information Sciences and Technologies,
pp. 388-391.
[14] P. Gera, (2006), Text-To-Speech Synthesis for Punjabi Language, M.Tech Thesis, Thapar University.
Authors
Khushneet Jindal received his Post Graduation degree in M.Tech.(Information
Technology) from KoSU, Master of Computer Applications from Punjabi University,
Patiala, Punjab, India and Graduation in BSc.(Computer Applications) from Khalsa
College Patiala, Punjab, India. His research interests are in the area of Speech analysis
and processing, Character Recognition & TTS.
Divya Bansal received her graduation degree in Computer Science and Engineering
from Punjab Technical University and is currently doing her post graduation in
Computer Science and Application at Thapar College of Engineering and Technology,
Patiala, Punjab, India. Her research interests are in the area of Speech analysis and
processing.

Ankita Goel received her graduation degree in Information Technology from
Indraprastha University and is currently doing her post graduation in Computer Science
and Application at Thapar College of Engineering and Technology, Patiala, Punjab,
India.

More Related Content

What's hot

Implementation of Text To Speech for Marathi Language Using Transcriptions Co...
Implementation of Text To Speech for Marathi Language Using Transcriptions Co...Implementation of Text To Speech for Marathi Language Using Transcriptions Co...
Implementation of Text To Speech for Marathi Language Using Transcriptions Co...IJERA Editor
 
Extractive Summarization with Very Deep Pretrained Language Model
Extractive Summarization with Very Deep Pretrained Language ModelExtractive Summarization with Very Deep Pretrained Language Model
Extractive Summarization with Very Deep Pretrained Language Modelgerogepatton
 
EXTRACTIVE SUMMARIZATION WITH VERY DEEP PRETRAINED LANGUAGE MODEL
EXTRACTIVE SUMMARIZATION WITH VERY DEEP PRETRAINED LANGUAGE MODELEXTRACTIVE SUMMARIZATION WITH VERY DEEP PRETRAINED LANGUAGE MODEL
EXTRACTIVE SUMMARIZATION WITH VERY DEEP PRETRAINED LANGUAGE MODELijaia
 
CURVELET BASED SPEECH RECOGNITION SYSTEM IN NOISY ENVIRONMENT: A STATISTICAL ...
CURVELET BASED SPEECH RECOGNITION SYSTEM IN NOISY ENVIRONMENT: A STATISTICAL ...CURVELET BASED SPEECH RECOGNITION SYSTEM IN NOISY ENVIRONMENT: A STATISTICAL ...
CURVELET BASED SPEECH RECOGNITION SYSTEM IN NOISY ENVIRONMENT: A STATISTICAL ...ijcsit
 
Isolated word recognition using lpc & vector quantization
Isolated word recognition using lpc & vector quantizationIsolated word recognition using lpc & vector quantization
Isolated word recognition using lpc & vector quantizationeSAT Publishing House
 
Hindi digits recognition system on speech data collected in different natural...
Hindi digits recognition system on speech data collected in different natural...Hindi digits recognition system on speech data collected in different natural...
Hindi digits recognition system on speech data collected in different natural...csandit
 
SEARCH TIME REDUCTION USING HIDDEN MARKOV MODELS FOR ISOLATED DIGIT RECOGNITION
SEARCH TIME REDUCTION USING HIDDEN MARKOV MODELS FOR ISOLATED DIGIT RECOGNITIONSEARCH TIME REDUCTION USING HIDDEN MARKOV MODELS FOR ISOLATED DIGIT RECOGNITION
SEARCH TIME REDUCTION USING HIDDEN MARKOV MODELS FOR ISOLATED DIGIT RECOGNITIONcscpconf
 
FPGA-based implementation of speech recognition for robocar control using MFCC
FPGA-based implementation of speech recognition for robocar control using MFCCFPGA-based implementation of speech recognition for robocar control using MFCC
FPGA-based implementation of speech recognition for robocar control using MFCCTELKOMNIKA JOURNAL
 
Turkish language modeling using BERT
Turkish language modeling using BERTTurkish language modeling using BERT
Turkish language modeling using BERTAbdurrahimDerric
 
Tamil-English Document Translation Using Statistical Machine Translation Appr...
Tamil-English Document Translation Using Statistical Machine Translation Appr...Tamil-English Document Translation Using Statistical Machine Translation Appr...
Tamil-English Document Translation Using Statistical Machine Translation Appr...baskaran_md
 
DETERMINING CUSTOMER SATISFACTION IN-ECOMMERCE
DETERMINING CUSTOMER SATISFACTION IN-ECOMMERCEDETERMINING CUSTOMER SATISFACTION IN-ECOMMERCE
DETERMINING CUSTOMER SATISFACTION IN-ECOMMERCEAbdurrahimDerric
 

What's hot (15)

Implementation of Text To Speech for Marathi Language Using Transcriptions Co...
Implementation of Text To Speech for Marathi Language Using Transcriptions Co...Implementation of Text To Speech for Marathi Language Using Transcriptions Co...
Implementation of Text To Speech for Marathi Language Using Transcriptions Co...
 
Extractive Summarization with Very Deep Pretrained Language Model
Extractive Summarization with Very Deep Pretrained Language ModelExtractive Summarization with Very Deep Pretrained Language Model
Extractive Summarization with Very Deep Pretrained Language Model
 
EXTRACTIVE SUMMARIZATION WITH VERY DEEP PRETRAINED LANGUAGE MODEL
EXTRACTIVE SUMMARIZATION WITH VERY DEEP PRETRAINED LANGUAGE MODELEXTRACTIVE SUMMARIZATION WITH VERY DEEP PRETRAINED LANGUAGE MODEL
EXTRACTIVE SUMMARIZATION WITH VERY DEEP PRETRAINED LANGUAGE MODEL
 
Mjfg now
Mjfg nowMjfg now
Mjfg now
 
CURVELET BASED SPEECH RECOGNITION SYSTEM IN NOISY ENVIRONMENT: A STATISTICAL ...
CURVELET BASED SPEECH RECOGNITION SYSTEM IN NOISY ENVIRONMENT: A STATISTICAL ...CURVELET BASED SPEECH RECOGNITION SYSTEM IN NOISY ENVIRONMENT: A STATISTICAL ...
CURVELET BASED SPEECH RECOGNITION SYSTEM IN NOISY ENVIRONMENT: A STATISTICAL ...
 
D111823
D111823D111823
D111823
 
D3 dhanalakshmi
D3 dhanalakshmiD3 dhanalakshmi
D3 dhanalakshmi
 
Isolated word recognition using lpc & vector quantization
Isolated word recognition using lpc & vector quantizationIsolated word recognition using lpc & vector quantization
Isolated word recognition using lpc & vector quantization
 
Hindi digits recognition system on speech data collected in different natural...
Hindi digits recognition system on speech data collected in different natural...Hindi digits recognition system on speech data collected in different natural...
Hindi digits recognition system on speech data collected in different natural...
 
SEARCH TIME REDUCTION USING HIDDEN MARKOV MODELS FOR ISOLATED DIGIT RECOGNITION
SEARCH TIME REDUCTION USING HIDDEN MARKOV MODELS FOR ISOLATED DIGIT RECOGNITIONSEARCH TIME REDUCTION USING HIDDEN MARKOV MODELS FOR ISOLATED DIGIT RECOGNITION
SEARCH TIME REDUCTION USING HIDDEN MARKOV MODELS FOR ISOLATED DIGIT RECOGNITION
 
Equirs: Explicitly Query Understanding Information Retrieval System Based on Hmm
Equirs: Explicitly Query Understanding Information Retrieval System Based on HmmEquirs: Explicitly Query Understanding Information Retrieval System Based on Hmm
Equirs: Explicitly Query Understanding Information Retrieval System Based on Hmm
 
FPGA-based implementation of speech recognition for robocar control using MFCC
FPGA-based implementation of speech recognition for robocar control using MFCCFPGA-based implementation of speech recognition for robocar control using MFCC
FPGA-based implementation of speech recognition for robocar control using MFCC
 
Turkish language modeling using BERT
Turkish language modeling using BERTTurkish language modeling using BERT
Turkish language modeling using BERT
 
Tamil-English Document Translation Using Statistical Machine Translation Appr...
Tamil-English Document Translation Using Statistical Machine Translation Appr...Tamil-English Document Translation Using Statistical Machine Translation Appr...
Tamil-English Document Translation Using Statistical Machine Translation Appr...
 
DETERMINING CUSTOMER SATISFACTION IN-ECOMMERCE
DETERMINING CUSTOMER SATISFACTION IN-ECOMMERCEDETERMINING CUSTOMER SATISFACTION IN-ECOMMERCE
DETERMINING CUSTOMER SATISFACTION IN-ECOMMERCE
 

Similar to PUNJABI SPEECH SYNTHESIS SYSTEM USING HTK

SYLLABLE-BASED SPEECH RECOGNITION SYSTEM FOR MYANMAR
SYLLABLE-BASED SPEECH RECOGNITION SYSTEM FOR MYANMARSYLLABLE-BASED SPEECH RECOGNITION SYSTEM FOR MYANMAR
SYLLABLE-BASED SPEECH RECOGNITION SYSTEM FOR MYANMARijcseit
 
SYLLABLE-BASED SPEECH RECOGNITION SYSTEM FOR MYANMAR
SYLLABLE-BASED SPEECH RECOGNITION SYSTEM FOR MYANMARSYLLABLE-BASED SPEECH RECOGNITION SYSTEM FOR MYANMAR
SYLLABLE-BASED SPEECH RECOGNITION SYSTEM FOR MYANMARijcseit
 
SYLLABLE-BASED SPEECH RECOGNITION SYSTEM FOR MYANMAR
SYLLABLE-BASED SPEECH RECOGNITION SYSTEM FOR MYANMARSYLLABLE-BASED SPEECH RECOGNITION SYSTEM FOR MYANMAR
SYLLABLE-BASED SPEECH RECOGNITION SYSTEM FOR MYANMARijcseit
 
SYLLABLE-BASED SPEECH RECOGNITION SYSTEM FOR MYANMAR
SYLLABLE-BASED SPEECH RECOGNITION SYSTEM FOR MYANMARSYLLABLE-BASED SPEECH RECOGNITION SYSTEM FOR MYANMAR
SYLLABLE-BASED SPEECH RECOGNITION SYSTEM FOR MYANMARijcseit
 
Speech to text conversion for visually impaired person using µ law companding
Speech to text conversion for visually impaired person using µ law compandingSpeech to text conversion for visually impaired person using µ law companding
Speech to text conversion for visually impaired person using µ law compandingiosrjce
 
Survey On Speech Synthesis
Survey On Speech SynthesisSurvey On Speech Synthesis
Survey On Speech SynthesisCSCJournals
 
EFFECT OF MFCC BASED FEATURES FOR SPEECH SIGNAL ALIGNMENTS
EFFECT OF MFCC BASED FEATURES FOR SPEECH SIGNAL ALIGNMENTSEFFECT OF MFCC BASED FEATURES FOR SPEECH SIGNAL ALIGNMENTS
EFFECT OF MFCC BASED FEATURES FOR SPEECH SIGNAL ALIGNMENTSijnlc
 
Effect of MFCC Based Features for Speech Signal Alignments
Effect of MFCC Based Features for Speech Signal AlignmentsEffect of MFCC Based Features for Speech Signal Alignments
Effect of MFCC Based Features for Speech Signal Alignmentskevig
 
Effect of Time Derivatives of MFCC Features on HMM Based Speech Recognition S...
Effect of Time Derivatives of MFCC Features on HMM Based Speech Recognition S...Effect of Time Derivatives of MFCC Features on HMM Based Speech Recognition S...





PUNJABI SPEECH SYNTHESIS SYSTEM USING HTK

International Journal of Information Sciences and Techniques (IJIST) Vol.2, No.4, July 2012
DOI : 10.5121/ijist.2012.2406

Divya Bansal1, Ankita Goel2, Khushneet Jindal3
School of Mathematics and Computer Applications, Thapar University, Patiala (Punjab) – India
1 divyabansal150@yahoo.com
2 goel.ankitathapar@gmail.com
3 khushneet.jindal@thapar.edu

ABSTRACT
This paper describes a Hidden Markov Model-based Punjabi text-to-speech synthesis system (HTS), in which the speech waveform is generated from the Hidden Markov Models themselves, and applies it to Punjabi speech synthesis using the general speech synthesis architecture of HTK (HMM Tool Kit). This Hidden Markov Model based TTS can be used in mobile phones for a stored phone directory or messages. Text messages and the caller's identity in English are mapped to tokens in Punjabi, which are then concatenated to form speech according to certain rules and procedures.
To build the synthesizer, we recorded the speech database and phonetically segmented it, first extracting context-independent monophones and then context-dependent triphones. For example, for the word bharat the monophones are a, bh, t etc. and the triphones include bh-a+r. These speech utterances and their phone-level transcriptions (monophones and triphones) are the inputs to the speech synthesis system. The system outputs the sequence of phonemes after resolving ambiguities in phoneme selection using word network files; e.g. for the word Tapas the output phoneme sequence is ਤ,ਪ,ਸ instead of ਟ,ਪ,ਸ.

KEYWORDS
Hidden Markov models, Context-dependent acoustic modeling, Punjabi speech corpora.

1. INTRODUCTION
Speech is the most important form of communication in everyday life. However, the dependence of human-computer interaction on written text and images makes computers impossible to use for the visually and physically impaired and for the illiterate masses [1].
Text-to-speech synthesis (TTS) helps speech processing researchers address this problem by synthesizing speech (in local languages, e.g. Tamil, Hindi, Punjabi etc.) from written text, for example in browsers and mobile phones. Speech can be synthesized by three main methods: articulatory synthesis, concatenative synthesis and formant synthesis.
Articulatory synthesis tries to model the human speech production system (especially the vocal tract and the articulators, viz. lips, tongue, jaw etc.) and the articulatory processes directly. However, it is also the most difficult method to implement, because knowledge of the complex human articulation organs is incomplete. Concatenative speech synthesis systems can synthesize high-quality, natural-sounding speech, but synthesizing speech with various voice characteristics such as speaker individualities, speaking styles, emotions, etc. requires a large speech corpus and a large amount of memory, because stored basic speech units (syllables, diphones etc.) are concatenated to form the word sequence using a pronunciation dictionary. Formant synthesis is based on rules that describe the resonant frequencies of the vocal tract. The formant method uses the source-filter model of speech production, where speech is modeled by the parameters of the filter model [2]. Rule-based formant synthesis can produce quality speech, but it sounds unnatural, since it is difficult to estimate the vocal tract model and source parameters [3].
One more approach to speech synthesis is Hidden Markov Model based synthesis, i.e. HTS. It was initially implemented for Japanese but can now be implemented for various languages, viz. Hindi, English, Tamil etc. It makes it easy to implement prosody and various voice characteristics on the basis of probabilities, without large databases. In this approach, speech utterances are used to extract spectral parameters (Mel-cepstral coefficients) and excitation parameters and to train context-dependent phone models, which are in turn concatenated and used to synthesize the speech waveform corresponding to the text input.
This paper is organized as follows: Section 2 presents Hidden Markov Model based speech synthesis; Section 3 describes the overall implementation of the Hidden Markov Model based Text-to-Speech system on the Hidden Markov Model Toolkit architecture, from feature extraction to training of the system; Section 4 contains the results of speech synthesis; and Section 5 concludes the paper with the Discussion and Conclusion.
Figure 1. HMM Based Speech Synthesis System
2. HIDDEN MARKOV MODEL BASED SPEECH SYNTHESIS
In speech synthesis, the Viterbi algorithm is used to find the most probable path through the Hidden Markov Models. This path generates the speech feature vectors, such as MFCCs (Mel Frequency Cepstral Coefficients), which are in turn used to generate the speech signal.
2.1 Training Part
In this part the spectral parameters (mel-cepstral coefficients) and the excitation parameters (fundamental frequency F0) are extracted from the speech database and concatenated for use in training the HMM acoustic models. Training phone HMMs on pitch and mel-cepstrum simultaneously is enabled in a unified framework by using multi-space probability distribution HMMs and multi-dimensional Gaussian distributions [4]. The simultaneous modeling of pitch and spectrum results in a set of context-dependent HMMs [2].
2.2 Synthesis Part
In this part the speech parameters, such as Mel Frequency Cepstral Coefficients, are generated according to the input text from the context-dependent HMM phone models obtained as output from the training part; e.g. for the word tapas one such phone model is t-a+p. The generated speech parameters are in turn used to synthesize the speech signal as the final output. This approach is very flexible because it is driven by the acoustic features of the phone models obtained from the speech corpora. The characteristics of the synthesized speech can therefore be modified easily by altering the HMM parameters and acoustic features.
3. HTS IMPLEMENTATION ON HTK ARCHITECTURE
3.1 Signal Features Generation
An HMM (Hidden Markov Model) with a finite number, say N, of states has three model parameters (A, B, π). At each time t a new state is entered according to the transition probability distribution A, which depends on the previous state. After each transition an observation symbol is emitted according to the output probability distribution B of the current state, and π is the initial state probability distribution.
In order to synthesize speech, the most probable sequence of state feature vectors Â must be found from the HMM λ. This model consists of the concatenated context-dependent triphones such as t-o+n, or context-independent monophones such as t, o, n (phone transcriptions), that correspond to the symbols in a word w, such as Tony, present in the text to be synthesized. These acoustic phone models are obtained in the training phase by fitting the HMMs to feature parameters extracted from the stored speech corpora.
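To make the roles of A, B and π concrete, the following minimal sketch draws a state sequence and an observation sequence from a small discrete HMM. It is only an illustration: the function name `sample_hmm`, the 3-state left-to-right topology and every probability value are made-up assumptions, not parameters of the system described in this paper, which uses continuous Gaussian output distributions.

```python
import random


def sample_hmm(pi, A, B, length, seed=0):
    """Draw (states, observations) of the given length from a discrete HMM.

    pi: initial state probabilities, A: transition matrix, B: emission matrix.
    """
    rng = random.Random(seed)
    n_states = len(pi)
    n_symbols = len(B[0])
    # The first state is drawn from the initial distribution pi.
    state = rng.choices(range(n_states), weights=pi)[0]
    states, observations = [], []
    for _ in range(length):
        states.append(state)
        # The emitted symbol depends only on the current state (row of B).
        observations.append(rng.choices(range(n_symbols), weights=B[state])[0])
        # The next state depends only on the current state (row of A).
        state = rng.choices(range(n_states), weights=A[state])[0]
    return states, observations


# Toy 3-state left-to-right model; all numbers are hypothetical.
PI = [1.0, 0.0, 0.0]
A = [[0.6, 0.4, 0.0],
     [0.0, 0.7, 0.3],
     [0.0, 0.0, 1.0]]
B = [[0.8, 0.1, 0.1],
     [0.1, 0.8, 0.1],
     [0.1, 0.1, 0.8]]

states, observations = sample_hmm(PI, A, B, length=8)
print(states, observations)
```

Because the transition matrix has no backward transitions, the sampled state indices never decrease, which mirrors the left-to-right phone models used later in training.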
Thus we need to generate the feature vector sequence $\hat{A} = A_{q_1}, A_{q_2}, A_{q_3}, \ldots, A_{q_L}$ of length $L$ by maximizing the likelihood $P(A \mid \lambda)$ of the Hidden Markov Model:
$$\hat{A} = \arg\max_{A} P(A \mid \lambda) = \arg\max_{A} \sum_{Q} P(A \mid q, \lambda)\, P(q \mid \lambda) \qquad (1)$$
In this equation, $P(A \mid \lambda)$ is calculated by summing the product of the joint output probability $P(A \mid q, \lambda)$ and the state sequence probability $P(q \mid \lambda)$ over all possible paths $Q$ [4], where $Q = q_1, q_2, \ldots, q_L$ is a path through the states of the model $\lambda$ and $q_i$ is the state at time $t_i$, as in Fig. 2.
Figure 2. (a) Concatenated HMM chain; (b) HMM chain for the word bharat
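The difference between summing over all paths, as in equation (1), and keeping only the single most probable path can be illustrated on a toy model. The sketch below is a simplification made for illustration: it uses a discrete-output HMM with made-up probabilities and a fixed observation sequence, whereas the system described here works with continuous feature vectors. The helper names `path_prob`, `total_and_best` and `viterbi` are hypothetical.

```python
from itertools import product


def path_prob(path, obs, pi, A, B):
    """P(obs | path, model) * P(path | model) for one state path."""
    p = pi[path[0]] * B[path[0]][obs[0]]
    for t in range(1, len(obs)):
        p *= A[path[t - 1]][path[t]] * B[path[t]][obs[t]]
    return p


def total_and_best(obs, pi, A, B):
    """Brute force: enumerate every path, sum all terms and track the largest."""
    total, best_p, best_path = 0.0, -1.0, None
    for path in product(range(len(pi)), repeat=len(obs)):
        p = path_prob(path, obs, pi, A, B)
        total += p
        if p > best_p:
            best_p, best_path = p, path
    return total, best_p, best_path


def viterbi(obs, pi, A, B):
    """Dynamic programming search for the single most probable path."""
    n = len(pi)
    delta = [pi[i] * B[i][obs[0]] for i in range(n)]
    back = []
    for o in obs[1:]:
        prev = delta
        # Best predecessor of each state j at this time step.
        ptr = [max(range(n), key=lambda i: prev[i] * A[i][j]) for j in range(n)]
        delta = [prev[ptr[j]] * A[ptr[j]][j] * B[j][o] for j in range(n)]
        back.append(ptr)
    last = max(range(n), key=lambda i: delta[i])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return delta[last], tuple(reversed(path))


# Toy 3-state left-to-right model; all numbers are hypothetical.
PI = [1.0, 0.0, 0.0]
A = [[0.6, 0.4, 0.0], [0.0, 0.7, 0.3], [0.0, 0.0, 1.0]]
B = [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]]
OBS = [0, 0, 1, 2]

total, best_p, best_path = total_and_best(OBS, PI, A, B)
v_p, v_path = viterbi(OBS, PI, A, B)
print(total, best_p, v_p, v_path)
```

The brute-force enumeration grows exponentially with the sequence length, while the dynamic programming search grows only linearly, which is the practical motivation for working with a single best path.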
We therefore use the Viterbi approximation: since searching all possible paths through the model is time consuming and complex, we look only for the most probable state sequence when generating the feature vector sequence $\hat{A}$. The state sequence $\hat{q}$ of the model $\lambda$ can be maximized independently of $\hat{A}$:
$$\hat{q} = \arg\max_{q} P(q \mid \lambda, L) \qquad (2)$$
The Hidden Markov Model Toolkit represents output distributions by Gaussian mixture densities. Here the output probability distribution of each state $q_i$ is represented by a single Gaussian density function with mean vector $\mu_i$ and covariance matrix $\Sigma_i$. The Hidden Markov Model $\lambda$ is then the set of all means and covariance matrices for all $N$ states:
$$\lambda = (\mu_1, \Sigma_1, \mu_2, \Sigma_2, \mu_3, \Sigma_3, \ldots, \mu_N, \Sigma_N) \qquad (3)$$
During HMM training of the acoustic models, the mean vector $\mu_i$ and covariance matrix $\Sigma_i$ are first computed from the features extracted from the speech corpora and then re-estimated for each state of all phone models.
3.2 Data Preparation and Feature Extraction
Training the HMMs and testing the speech synthesis system require speech utterances and their phone level transcriptions. The Punjabi speech corpus used for training the system contains utterances of one female speaker. In phase-I the recorded data consist of 61 words (i.e. words starting with the letters ਤ and ਟ) arranged in 17 samples, and in phase-II the training data consist of 81 words (words containing ਅ and ਆ) arranged in 23 samples. The data were recorded with a microphone in a room environment, with a distance of approximately 5-7 cm between the speaker's mouth and the microphone [8]. Samples were recorded at a sampling rate of 8000 Hz, 16-bit depth and mono channel using Power Sound Editor, and the recorded speech files are stored in .wav format. For training the HMMs, each recorded sample needs a corresponding phone level transcription.
This is done using the HTK label editor HLEd, which generates a phone level MLF (Master Label File) using the mkphones.led edit script. For example, the phone transcription generated for the sample word “Tony” is shown in Figure 3.
Figure 3. Phone Transcription File (Phones0.mlf)
Further, the recorded speech samples are parameterized into a sequence of excitation and spectral features. For this, Mel Frequency Cepstral Coefficients (MFCCs), which are derived from FFT (Fast Fourier Transform) based log spectra, are used. All input .wav files are converted to MFCC vectors using the HCopy tool of HTK. The speech signals were windowed using a 25 ms Blackman window with a 10 ms frame period. The spectral feature vector has 39 components: 13 mel-cepstral coefficients including the zeroth coefficient, their 13 delta coefficients and their 13 acceleration coefficients.
3.3 Training of Hidden Markov Model
To begin training, a prototype model, proto, is defined. It is initialized using the HTK tool HCompV, which computes the global mean and variance and sets all of the Gaussians in a given HMM to have the same mean and variance. 5-state left-to-right HMMs with no skips are used, in which the first and last states are non-emitting. The system is trained for 27 monophone models. These flat-start monophones, stored in successive hmm directories, are re-estimated using the embedded re-estimation tool HERest, which implements Baum-Welch re-estimation. Mean and variance vectors are estimated for each state of all the monophones.
After that, triphone models are built from the monophone models and trained using HERest. The triphones are created from the monophones by the HLEd tool, following the structure l-p+r (where p is the phoneme and l and r are its left and right contexts) for each phoneme p in the monophone model, which makes the model context dependent. For example, in the word Bharat the triphone generated for the phoneme ‘a’ is bh-a+r. The re-estimated monophone model obtained using HTK is shown in Figure 4.
Figure 4. Monophone hmmdefs File
After the triphones are made, decision tree state tying is performed by running the HHEd tool of HTK. HHEd is used to cluster the states, and then each cluster is tied.
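The l-p+r expansion described above can be sketched in a few lines. This is an illustrative stand-in rather than the HLEd implementation: the function name `to_triphones` is hypothetical, the phone sequence bh, a, r, a, t assumed for the word bharat is our guess at its transcription, and the use of biphones at the word edges (where one context is missing) is an assumption about boundary handling.

```python
def to_triphones(phones):
    """Expand a monophone sequence into l-p+r context-dependent labels.

    The first and last phones have only one neighbour, so they become
    biphones (p+r and l-p) in this sketch.
    """
    labels = []
    for i, phone in enumerate(phones):
        left = phones[i - 1] + "-" if i > 0 else ""
        right = "+" + phones[i + 1] if i < len(phones) - 1 else ""
        labels.append(left + phone + right)
    return labels


# Assumed monophone transcription of the word "bharat".
bharat = to_triphones(["bh", "a", "r", "a", "t"])
print(bharat)
```

The second label of the result is bh-a+r, the triphone quoted in the text for the first ‘a’ of Bharat.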
Decision trees are built by asking questions about the left and right contexts of each triphone and finding those contexts which make the largest difference to the acoustics and which should therefore distinguish clusters [6]. We then used the edit script tree.hed, which contains the instructions specifying which contexts to examine for possible clustering, together with the questions (QS) defined by the user according to the language.
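One way to quantify which context question "makes the largest difference to the acoustics" is the gain in log-likelihood obtained when a pool of states is split by that question and each half is modeled by its own single diagonal Gaussian. The sketch below implements that criterion on made-up statistics. It is an assumption that this matches the exact computation HHEd performs; the triphone names, occupancies, means and variances are all hypothetical, as are the helper names.

```python
import math


def pooled_loglik(states):
    """Approximate log-likelihood of a cluster modeled by one diagonal Gaussian.

    states: list of (occupancy, mean_vector, variance_vector) per tied state.
    """
    occ = sum(o for o, _, _ in states)
    dim = len(states[0][1])
    log_det = 0.0
    for d in range(dim):
        mean = sum(o * m[d] for o, m, _ in states) / occ
        second = sum(o * (v[d] + m[d] ** 2) for o, m, v in states) / occ
        log_det += math.log(second - mean ** 2)  # pooled variance in dimension d
    return -0.5 * occ * (dim * (1.0 + math.log(2.0 * math.pi)) + log_det)


def left_context(name):
    return name.split("-")[0]


def right_context(name):
    return name.split("+")[1]


def split_gain(stats, context_of, phone_set):
    """Log-likelihood gain of splitting the pool by 'is the context in phone_set?'."""
    yes = [s for name, s in stats.items() if context_of(name) in phone_set]
    no = [s for name, s in stats.items() if context_of(name) not in phone_set]
    if not yes or not no:
        return 0.0
    return pooled_loglik(yes) + pooled_loglik(no) - pooled_loglik(list(stats.values()))


# Hypothetical statistics for the centre state of four triphones of 'a'.
STATS = {
    "bh-a+r": (10.0, [1.0], [0.5]),
    "bh-a+t": (10.0, [1.1], [0.5]),
    "t-a+p": (10.0, [5.0], [0.5]),
    "t-a+r": (10.0, [5.2], [0.5]),
}

gain_left = split_gain(STATS, left_context, {"bh"})
gain_right = split_gain(STATS, right_context, {"r"})
print(gain_left, gain_right)
```

In this toy data the means depend on the left context only, so the left-context question yields a much larger gain than the right-context question and would be chosen first.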
Figure 5. Tree.hed File
Decision tree clustering of states is performed by the TB commands. The re-estimated triphone model obtained using HTK is shown in Figure 6.
Figure 6. Triphone hmmdefs File
After decision tree clustering, the file shown in Figure 7 is obtained.
Figure 7. Clustered hmmdefs file
Figure 8. Decision Tree Clustering
3.4 Test Data Decoding
In this part the test data, i.e. the text which is to be converted to speech, is given as input along with the re-estimated context-dependent HMMs obtained from the training phase. The context-dependent HMMs are concatenated according to the phoneme sequence in the text labels with the help of the HVite tool and the word network file wdnet. From the resulting state sequence, the sequence of mel-cepstral coefficients and log F0 values, including voiced/unvoiced decisions, is determined by maximizing the output probability of the HMM [3]. HTK is used to find the appropriate pronunciation of a word from several alternate pronunciations, both for words containing ਤ or ਟ (e.g. whether Tony corresponds to ਤੋਨਿ or ਟੋਨਿ) and for words containing ਅ or ਆ, and the corresponding phonemes are generated.
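The idea of choosing between alternate pronunciations by acoustic likelihood can be illustrated with a toy scorer. The sketch below is not HVite: it uses one state per phone, one-dimensional Gaussian outputs and no transition probabilities, and the phone labels (`t` for the dental and `tt` for the retroflex stop), the model values and the frames are all made up for illustration.

```python
import math


def log_gauss(x, mean, var):
    """Log density of a one-dimensional Gaussian."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def align_score(frames, phones, models):
    """Best log-likelihood of the frames under a left-to-right phone chain.

    Each phone is a single state; at every frame the path either stays in
    the current phone or advances to the next one (no skips).
    """
    neg_inf = float("-inf")
    prev = [neg_inf] * len(phones)
    prev[0] = log_gauss(frames[0], *models[phones[0]])
    for x in frames[1:]:
        cur = [neg_inf] * len(phones)
        for j, phone in enumerate(phones):
            best = prev[j] if j == 0 else max(prev[j], prev[j - 1])
            if best > neg_inf:
                cur[j] = best + log_gauss(x, *models[phone])
        prev = cur
    return prev[-1]  # the path must end in the last phone


# Hypothetical (mean, variance) per phone and two candidate pronunciations.
MODELS = {"t": (1.0, 0.2), "tt": (3.0, 0.2), "o": (5.0, 0.2),
          "n": (7.0, 0.2), "i": (9.0, 0.2)}
CANDIDATES = {"dental": ["t", "o", "n", "i"],
              "retroflex": ["tt", "o", "n", "i"]}
FRAMES = [1.1, 0.9, 5.0, 5.1, 7.0, 9.0, 9.1]

scores = {name: align_score(FRAMES, phones, MODELS)
          for name, phones in CANDIDATES.items()}
best = max(scores, key=scores.get)
print(best, scores)
```

With these made-up frames, whose first two values lie close to the mean assumed for `t`, the dental candidate scores higher and is selected.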
4. SPEECH SYNTHESIS RESULTS
The conflicting words are handled in phases. In the testing phase, the text-to-speech data comprise:
Test-I: 28 Punjabi words with ਤ or ਟ and 45 Punjabi words with ਅ or ਆ
Test-II: 33 Punjabi words with ਤ or ਟ and 36 Punjabi words with ਅ or ਆ
The text labels are transformed into triphone format with the help of HTK. For each word we recorded a wav file, e.g. bharat.wav. The appropriate pronunciation is selected from the network of alternate pronunciations. The HVite tool of HTK first produced the file recout.mlf for the test words, making use of the dictionary dict and the word network file wdnet. From it, the HLEd tool generated an MLF (Master Label File), phonemes.mlf, containing the phoneme sequence selected for each test word from among the various alternatives. The phonemes of different test words were arranged in different .lab files according to the input test samples. HTK generates the correct phoneme sequence for most words, and the results are satisfactory. A rule-based approach is then used to formulate rules for generating the phoneme sequences of words that HTK does not transcribe correctly.
Figure 9. (phonemes.mlf) Phoneme File Generated
After testing HTK on the test samples with words containing ਤ or ਟ and ਅ or ਆ, for which the system was trained in the training phase, the following results were obtained:
• For the words containing ਅ or ਆ (45 test words in Test-I and 36 in Test-II), the numbers of words with correct and incorrect phoneme sequences after testing are shown in the bar graph of Figure 10.
Figure 10. Test Samples for words containing ਅ or ਆ
• For the words containing ਤ or ਟ (28 test words in Test-I and 33 in Test-II), the numbers of correct and incorrect phoneme sequences obtained are shown in the bar graph of Figure 11.
Figure 11. Test Samples for words containing ਤ or ਟ
• A comparison of the overall accuracies for both sets of tests, i.e. the one containing words with ਤ or ਟ and the one with ਅ or ਆ, is depicted in the cylindrical bar graph of Figure 12.
Figure 12. Comparison of Overall Accuracies
Figure 13 presents the result of the generated speech, obtained using the Matlab code mfcc2spectrum [10].
Figure 13. Spectrum representation from Mel Cepstral Coeff. data of word TONY
5. DISCUSSION AND CONCLUSION
An HMM-based Punjabi speech synthesis system is presented in this paper. In phase-I the developed text-to-speech system was trained on 17 samples with a total of 61 words, all starting with the letters ਤ and ਟ, and tested for selection of the appropriate phoneme sequence on 30 Punjabi words in test 1. It was then trained on 23 samples containing 81 words with ਅ and ਆ and tested on 45 selected words in the corresponding test-1. The HMM-based approach is very effective for developing text-to-speech systems for various languages, and changes in the voice characteristics of the synthesized speech can be implemented easily with the help of the speaker adaptation techniques developed for speech recognition [7]. In order to improve efficiency, the context-dependent phone models used for synthesis need to be improved by recording and annotating more Punjabi speech data and by applying filters using custom rules and procedures.
REFERENCES
[1] S. D. Shirbahadurkar and D. S. Bormane, (2009), “Marathi Language Speech Synthesizer Using Concatenative Synthesis Strategy (Spoken in Maharashtra, India)”, Second International Conference on Machine Vision, pp. 181-185.
[2] S. Martincic-Ipsic and I. Ipsic, (2006), “Croatian HMM-based speech synthesis”, Journal of Computing and Information Technology, Vol. 14, no. 4, pp. 307–313.
[3] D. H. Klatt, (1987), “Review of Text to Speech Conversion for English”, Journal of the Acoustical Society of America, Vol. 82, pp. 737–793.
[4] K. Tokuda, et al. (2003), “Multi-Space Probability Distribution HMM”, IEICE Trans. Inf. & System, Vol. E85-D, No. 3, pp. 455-464.
[5] L. R. Rabiner, (1989), “A tutorial on hidden Markov models and selected applications in speech recognition”, Proc. IEEE, Vol. 77, No. 2, pp. 257–286.
[6] S. Young, et al. (2002), “The HTK Book (for HTK Version 3.2)”, Cambridge University Engineering Department, Cambridge, Great Britain.
[7] K. Tokuda, H. Zen, A. W. Black, (2002), “An HMM-based speech synthesis system applied to English”, Proc. of 2002 IEEE Workshop in Speech Synthesis.
[8] K. Kumar and R. K. Aggarwal, (2011), “Hindi Speech Recognition System Using HTK”, International Journal of Computing and Business Research, Vol. 2, no. 2, pp. 2229-6166.
[9] T. Yoshimura, K. Tokuda, T. Masuko, T. Kobayashi, and T. Kitamura, (1999), “Simultaneous modeling of spectrum, pitch and duration in HMM-based speech synthesis”, in Proc. Eurospeech, pp. 2347–2350.
[10] N. Meseguer, “Speech Analysis for Automatic Speech Recognition”, Master’s thesis, Norwegian University of Science and Technology, Norway.
[11] S. King, (2011), “An introduction to statistical parametric speech synthesis”, Sadhana - Engineering Science, Vol. 36, no. 5, pp. 837-852.
[12] P. Singh, (2005), Development of A Punjabi Text-To-Speech Synthesis System, M.Tech Thesis, Punjabi University.
[13] P. Singh and G. S. Lehal, (2006), Text-to-Speech Synthesis system for Punjabi language, Proceedings of International Conference on Multidisciplinary Information Sciences and Technologies, pp. 388-391.
[14] P. Gera, (2006), Text-To-Speech Synthesis for Punjabi Language, M.Tech Thesis, Thapar University.
Authors
Khushneet Jindal received his Post Graduation degree in M.Tech. (Information Technology) from KoSU, Master of Computer Applications from Punjabi University, Patiala, Punjab, India and Graduation in BSc. (Computer Applications) from Khalsa College Patiala, Punjab, India. His research interests are in the area of Speech analysis and processing, Character Recognition & TTS.
Divya Bansal received her graduation degree in Computer Science and Engineering from Punjab Technical University and is currently doing her post graduation in Computer Science and Application at Thapar College of Engineering and Technology, Patiala, Punjab, India. Her research interests are in the area of Speech analysis and processing.
Ankita Goel received her graduation degree in Information Technology from Indraprastha University and is currently doing her post graduation in Computer Science and Application at Thapar College of Engineering and Technology, Patiala, Punjab, India. Her .