INTERNATIONAL JOURNAL FOR TRENDS IN ENGINEERING & TECHNOLOGY
VOLUME 3 ISSUE 3 – MARCH 2015 – ISSN: 2349 – 9303
IJTET©2015 20
Speech Analysis and synthesis using Vocoder
Kala A¹, ME ECE, Department of ECE, SNS College of Technology, Coimbatore-641035, kalaalwar@gmail.com
Vanitha S², Assistant Professor, Department of ECE, SNS College of Technology, Coimbatore-641035, vanitharajanneel@gmail.com
Abstract— This paper presents speech analysis and synthesis using a vocoder. Voice conversion systems do not create new speech signals; they transform existing ones. The proposed speech vocoding differs from speech coding: the analysis stage represents the speech signal with as few bits as possible, so that bandwidth efficiency is increased, and the synthesis stage reconstructs the speech signal from the received bits of information. Three aspects of analysis are discussed in this paper: pitch refinement, spectral envelope estimation and maximum voiced frequency estimation. A quasi-harmonic analysis model is used to implement a pitch refinement algorithm which improves the accuracy of the spectral estimation, and a harmonic plus noise model is used to reconstruct the speech signal from the parameters. The overall goal is to achieve the highest possible resynthesis quality using the lowest possible number of bits to transmit the speech signal. Future work aims at incorporating phase information into the analysis and modeling process and at synthesizing these three aspects over different pitch periods.
Index Terms— Mel Frequency Cepstral Coefficients, Pitch Detection, Spectral Envelope Estimation, Maximum Voiced Frequency, Harmonic plus Noise Model.
————————————————————
1 INTRODUCTION
Today, synthesis quality and recognition rates are high enough that commercial applications follow from them. For speech synthesis one can cite telecommunications, multimedia and automotive applications. In telecommunications: the vocalization of SMS, the reading of mails, phone access to fax and e-mail, the consulting of databases, and automatic answering machines (e.g., Chronopost). In multimedia: the speech interface between man and machine, educational tools and software for teaching reading or new languages, reading aids for blind people, and office tools. Finally, in the automotive field: alert and video surveillance systems and Internet access, among others for mail reading. Companies such as Nuance, ScanSoft and Acapela Group are present on this market.
This work develops an implementation of a speech processing model called the "Harmonic plus Noise Model" (HNM). It is in fact a hybrid model, since it decomposes each speech frame into a harmonic part and a noise part, and it is expected to produce high-quality artificial speech.
Voice conversion systems do not create new speech signals; they transform existing ones. This is the reason why this paper focuses on synthesis. Understood in this context, speech vocoding is different from speech coding. The main goal of speech coding is to achieve the highest possible resynthesis quality using the lowest possible number of bits to transmit the speech signal; real-time performance during analysis and reconstruction is also one of its typical requirements. In the statistical parametric frameworks mentioned above, vocoders must not only have these high resynthesis capabilities, but also provide parameters that are adequate for statistically modeling the underlying structure of speech, while information compression is not a priority.
Vocoders are a class of speech coding systems that analyze the voice signal at the transmitter, transmit parameters derived from the analysis, and then synthesize the voice at the receiver using those parameters. All vocoders attempt to model the speech generation process as a dynamic system and try to quantify certain physical constraints of the system. These physical constraints are used to provide a parsimonious description of the speech signal [4]. Vocoders are, in general, much more complex than waveform coders and achieve very high economy in transmission bit rate. However, they are less robust, and their performance tends to be talker dependent. The most popular of the vocoding systems is the linear predictive coder (LPC). Other vocoding schemes include the channel vocoder, the formant vocoder, the cepstrum vocoder and the voice-excited vocoder.
Fig. 1 Speech Generation Model
Fig. 1 shows the traditional speech generation model that is the basis of all vocoding systems. The sound-generating mechanism forms the source and is linearly separated from the intelligence-modulating vocal tract filter, which forms the system. The speech signal is assumed to be of two types: voiced sounds ("m", "n", "v" pronunciations) are a result of quasiperiodic vibrations of the vocal cords, while unvoiced sounds ("f", "s", "sh" pronunciations) are fricatives produced by turbulent air flow through a constriction. The parameters associated with this model are the voice pitch, the pole frequencies of the modulating filter, and the corresponding amplitude parameters. The pitch frequency of most speakers is below 300 Hz, and extracting this information from the signal is very difficult. The pole frequencies correspond to the resonant frequencies of the vocal tract and are often called the formants of the speech signal. For adult speakers, the formants are centered around 500 Hz, 1500 Hz, 2500 Hz and 3500 Hz. By meticulously adjusting the parameters of the speech generation model, good-quality speech can be synthesized [2].
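The source-filter model described above can be sketched as a pulse-train excitation passed through a cascade of second-order resonators placed at the formant frequencies. This is only an illustrative sketch: the resonator bandwidth and the direct-form recursion are assumptions, not details from the paper.

```python
import numpy as np

def synthesize_vowel(f0=100.0, formants=(500.0, 1500.0, 2500.0, 3500.0),
                     fs=16000, dur=0.05, bw=100.0):
    """Toy source-filter synthesis: a quasiperiodic pulse-train source shaped
    by resonators at the formant frequencies (bandwidth bw is an assumption)."""
    n = int(dur * fs)
    y = np.zeros(n)
    y[::int(fs / f0)] = 1.0                    # voiced (pulse-train) excitation
    for fc in formants:                        # cascade of 2nd-order resonators
        r = np.exp(-np.pi * bw / fs)           # pole radius from bandwidth
        a1 = -2.0 * r * np.cos(2.0 * np.pi * fc / fs)
        a2 = r * r
        out = np.zeros(n)
        for i in range(n):
            out[i] = y[i]
            if i >= 1:
                out[i] -= a1 * out[i - 1]
            if i >= 2:
                out[i] -= a2 * out[i - 2]
        y = out
    return y

speech = synthesize_vowel()
```

Swapping the pulse train for white noise would model the unvoiced case, where the filter response coincides with the signal spectrum up to a scale factor.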
MFCC
Warping of the signal in the frequency domain is done using a bank of 24 filters. This filter bank is designed according to the behavior of human auditory perception: each tone of a voice signal with an actual frequency f, measured in Hz, can also be mapped to a subjective pitch on the Mel frequency scale [10]. The Mel frequency scale has a linear frequency spacing below 1000 Hz and a logarithmic spacing above 1000 Hz. The Mel frequency corresponding to a frequency f in Hz is

mel(f) = 2595 log10(1 + f/700)
Fig. 2 MFCC Processor
In the final step, the log Mel spectrum is converted back to the time domain. The result is called the Mel frequency cepstral coefficients (MFCC).
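The MFCC pipeline described above (mel filter bank, logarithm, then a transform back to the cepstral domain) can be sketched as follows. The frame length, FFT size and number of kept coefficients are illustrative assumptions; the mel formula is the standard one quoted above.

```python
import numpy as np

def hz_to_mel(f):
    # Mel scale: linear below ~1 kHz, logarithmic above
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc(frame, fs=16000, n_filters=24, n_ceps=12, nfft=512):
    """MFCCs for one speech frame (illustrative sketch)."""
    spec = np.abs(np.fft.rfft(frame * np.hamming(len(frame)), nfft))
    # triangular mel filter bank between 0 Hz and fs/2
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(fs / 2.0), n_filters + 2)
    bins = np.floor((nfft + 1) * mel_to_hz(mel_pts) / fs).astype(int)
    fbank = np.zeros((n_filters, nfft // 2 + 1))
    for i in range(n_filters):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        fbank[i, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fbank[i, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    log_energy = np.log(fbank @ spec + 1e-10)
    # DCT-II converts the log mel spectrum back to the cepstral ("time") domain
    n = np.arange(n_filters)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), 2 * n + 1) / (2 * n_filters))
    return dct @ log_energy

coeffs = mfcc(np.sin(2.0 * np.pi * 200.0 * np.arange(400) / 16000.0))
```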
2 HNM ANALYSES
The analysis consists of estimating the harmonic and noise parameters. Thanks to the decomposition into two independent parts, these can be estimated separately. First, the voiced frames are separated from the unvoiced frames, and then the parameters used for the synthesis are computed. The proposed system is shown in Fig. 2.1.
Fig. 2.1 Block diagram of Harmonic plus Noise Model System
The steps of the analysis process are the following:
- First estimation of the fundamental frequency
- Voiced/unvoiced decision for each frame
- Spectral envelope estimation
- Estimation of the maximum voiced frequency FM
- Refinement of the fundamental frequency
2.1 FIRST ESTIMATION OF THE FUNDAMENTAL
FREQUENCY (PITCH):
The first step consists of the estimation of the pitch f0. This parameter is estimated every 10 ms, and the length of the analysis window depends on this local pitch. One method, co-developed with the HNM algorithm, is explained here. It is based on an autocorrelation approach and fits the original signal with a purely harmonic signal defined by a candidate pitch (a sum of harmonics) in the frequency domain, minimizing the error

E(F0) = Σf |S(f) − Ŝ(f; F0)|²

where S(f) is the short-term Fourier transform of the speech segment s(t) (a Blackman-weighted segment whose length is equal to 3 times the maximum fundamental period T0max) and Ŝ(f; F0) is the short-term Fourier transform of a purely harmonic signal obtained from a fundamental frequency F0.
To avoid pitch errors, a "peak tracking" method is needed. It looks two frames forward and backward from the current one; the minimum-error path is found, and the pitch is assigned accordingly.
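A minimal sketch of the autocorrelation side of this first pitch estimate (the 60-400 Hz search range and the frame length are assumptions, not values specified by the paper):

```python
import numpy as np

def estimate_pitch(frame, fs=16000, f0_min=60.0, f0_max=400.0):
    """First pitch estimate from the normalized autocorrelation (sketch)."""
    frame = frame - np.mean(frame)
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    ac = ac / ac[0]                       # zero-lag coefficient normalized to 1
    lo, hi = int(fs / f0_max), int(fs / f0_min)
    lag = lo + int(np.argmax(ac[lo:hi]))  # strongest peak in the plausible lag range
    return fs / lag

fs = 16000
t = np.arange(int(0.04 * fs)) / fs
# periodic test frame: 200 Hz fundamental plus one harmonic
frame = np.sin(2 * np.pi * 200 * t) + 0.5 * np.sin(2 * np.pi * 400 * t)
f0 = estimate_pitch(frame, fs)
```

The peak-tracking refinement over neighboring frames would then pick, among such per-frame candidates, the path of minimum cumulative error.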
2.2 VOICED/UNVOICED DECISION
The frames extracted every 10 ms (whose length is always 3 times T0max) must then be classified as "voiced" or "unvoiced". The short-term Fourier transform (STFT) is applied to the current frame with NFFT = 4096 points (with zero padding), giving S(f). From this first STFT, the amplitudes of the first four harmonics (the first of which is the fundamental) are evaluated; Ŝ(f) denotes the corresponding set of harmonic amplitudes of f0. The following criterion is applied:

E = 10 log10( Σ |S(f) − Ŝ(f)|² / Σ |S(f)|² )

where the sums run over the band covered by these harmonics. The frame is declared "voiced" if E is less than the threshold of −15 dB, and "unvoiced" otherwise.
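The voiced/unvoiced criterion can be sketched as follows, under the assumption that the error is the normalized energy of the difference between the spectrum and its harmonic reconstruction over the band of the first four harmonics; the mask width around each harmonic is an assumption.

```python
import numpy as np

def voiced_decision(frame, f0, fs=16000, nfft=4096, threshold_db=-15.0):
    """Voiced/unvoiced decision from the normalized harmonic-fit error
    (sketch; the +/-0.2*f0 mask width and 4.5*f0 band are assumptions)."""
    spec = np.abs(np.fft.rfft(frame * np.hanning(len(frame)), nfft))
    freqs = np.fft.rfftfreq(nfft, 1.0 / fs)
    model = np.zeros_like(spec)
    for k in range(1, 5):                         # first four harmonics
        mask = np.abs(freqs - k * f0) < 0.2 * f0
        model[mask] = spec[mask]                  # keep spectrum near harmonics
    band = freqs < 4.5 * f0
    err = np.sum((spec[band] - model[band]) ** 2) / np.sum(spec[band] ** 2)
    e_db = 10.0 * np.log10(err + 1e-12)
    return e_db < threshold_db                    # voiced if error below -15 dB

t = np.arange(1024) / 16000.0
is_voiced = voiced_decision(np.sin(2 * np.pi * 200 * t), 200.0)
is_noise_voiced = voiced_decision(
    np.random.default_rng(0).standard_normal(1024), 200.0)
```

A pure tone fits its harmonic reconstruction almost perfectly (E well below −15 dB), whereas white noise leaves most of its energy outside the harmonic masks.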
2.3 SPECTRAL ENVELOPE ESTIMATION:
Assuming a simplified speech production model in which a pulse-or-noise excitation passes through a shaping filter, the term spectral envelope denotes the amplitude response of this filter across frequency. Such an envelope contains not only the contribution of the vocal tract, but also the contribution of the glottal source. In unvoiced frames, the spectrum of the noise-like excitation is flat, which means that the response of the filter coincides with the spectrum of the signal itself (except for a scaling factor). In voiced frames, the spectrum of the pulse-like excitation has the form of an impulse train with constant amplitude and linear-in-frequency phase placed at multiples of f0. Therefore, the spectrum of the signal shows a series of peaks that result from multiplying the impulses of the excitation by uniformly spaced spectral samples of the filter response. Assuming local stationarity, full-band harmonic analysis returns these discrete samples of the spectral envelope. A continuous envelope can then be estimated via interpolation.
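The procedure above can be sketched as sampling the envelope at the harmonic peaks and interpolating between them. The peak-search width and the use of linear interpolation are assumptions; the paper only specifies interpolating the discrete envelope samples.

```python
import numpy as np

def spectral_envelope(frame, f0, fs=16000, nfft=4096):
    """Envelope = harmonic amplitude samples + interpolation (sketch)."""
    spec = np.abs(np.fft.rfft(frame * np.hanning(len(frame)), nfft))
    freqs = np.fft.rfftfreq(nfft, 1.0 / fs)
    harm_f, harm_a = [], []
    k = 1
    while k * f0 < fs / 2:
        idx = int(round(k * f0 * nfft / fs))       # bin nearest the k-th harmonic
        lo, hi = max(idx - 5, 0), min(idx + 6, len(spec))
        peak = lo + int(np.argmax(spec[lo:hi]))    # snap to the local peak
        harm_f.append(freqs[peak])
        harm_a.append(spec[peak])
        k += 1
    return np.interp(freqs, harm_f, harm_a)        # continuous envelope

t = np.arange(1024) / 16000.0
frame = np.sin(2 * np.pi * 200 * t) + 0.5 * np.sin(2 * np.pi * 400 * t)
env = spectral_envelope(frame, 200.0)
```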
2.4 ESTIMATION OF THE MAXIMUM VOICED FREQUENCY
This parameter is also estimated every 10 ms. Work begins in the first interval of the absolute spectrum. The greatest amplitude and the corresponding frequency in this interval are found; they are denoted Am and fc, respectively. The sum of the amplitudes located between the two minima surrounding this greatest peak, called the cumulative amplitude Amc, is also computed. The other peaks in the band (occurring at frequencies denoted fi) are considered in the same interval, with the two types of amplitudes Am(fi) and Amc(fi), and the mean of these cumulative amplitudes, denoted Amc_mean, is computed.
The following test is then applied to the candidate frequency fc: if Amc(fc) is sufficiently large compared with Amc_mean, and Am(fc) − max{Am(fi)} > 13 dB, and fc is close to its nearest harmonic L·f0 (L being the index of the nearest harmonic of fc), then the frequency fc is declared "voiced". The next interval is then considered and the same criterion is applied. The highest voiced frequency found corresponds to the maximum voiced frequency FM. To avoid isolated mistakes, a 3-point median smoothing filter is applied.
Furthermore, FM can vary greatly from one frame to the next. To reduce abrupt jumps, another median filter can be applied to this time-varying frequency; five points are generally used here.
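The 3- and 5-point median smoothing used on the FM trajectory can be sketched as a running median (edge padding is an assumption):

```python
import numpy as np

def median_smooth(values, width=5):
    """Running median used to remove abrupt jumps in the frame-by-frame
    maximum voiced frequency (illustrative sketch)."""
    values = np.asarray(values, dtype=float)
    half = width // 2
    padded = np.pad(values, half, mode="edge")
    return np.array([np.median(padded[i:i + width]) for i in range(len(values))])

# FM track with one spurious jump; the 5-point median removes the outlier
fm = [4000, 4100, 8000, 4050, 3950, 4000, 4020]
smoothed = median_smooth(fm, 5)
```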
2.5 RE-ESTIMATION OF THE FUNDAMENTAL FREQUENCY
Using the frequencies fi declared voiced in the previous step, the following function is minimized:

E(f0) = Σi ( fi − L(i)·f0 )²

where L(i) is the harmonic number associated with the voiced frequency fi and f0 the initial estimation of the pitch. The minimum is reached for the new estimation of the pitch.
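Minimizing a sum of squared deviations of the voiced peak frequencies from their harmonic positions has a closed-form solution; the sketch below assumes L(i) is the harmonic number nearest fi / f0_initial.

```python
import numpy as np

def refine_pitch(voiced_freqs, f0_initial):
    """Closed-form minimizer of sum_i (f_i - L_i * f0)^2, where L_i is the
    harmonic number nearest f_i / f0_initial (illustrative sketch)."""
    f = np.asarray(voiced_freqs, dtype=float)
    L = np.maximum(np.round(f / f0_initial), 1.0)
    # d/df0 of sum (f_i - L_i f0)^2 = 0  =>  f0 = sum(L_i f_i) / sum(L_i^2)
    return float(np.sum(L * f) / np.sum(L * L))

# voiced peaks slightly sharp of the harmonics of the initial 170 Hz guess
f0_refined = refine_pitch([173.0, 347.0, 520.0, 695.0], 170.0)
```

The refined estimate moves toward the value best explaining all voiced peaks jointly, rather than relying on the first harmonic alone.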
3 RESULTS AND DISCUSSION
For enrollment, a data record for each user was maintained in the database with different text information. This database contains 53 different voices (25 female, 28 male). The voices were taken from different English speech synthesis and recognition databases. The specific utterances representing each voice were chosen randomly among candidates of suitable duration (around 5 s). Although the recording conditions were database dependent, in all cases the sampling frequency was 16 kHz and the signal-to-noise ratio was checked to be high enough for analysis-synthesis purposes.
Table 1 Information about the Speakers
Number of male speakers: 25
Number of female speakers: 28
Average age: 24
Language: English
Information about the speakers used in the training phase is summarized in Table 1.
The first analysis step to be performed is pitch detection. Pitch detection algorithms (PDAs) exhibit very good performance when applied to clean signals (and the signals involved in speech synthesis usually show a high signal-to-noise ratio). The vocoder presented in this paper includes an implementation of the autocorrelation-based algorithm.
Fig 3.1 Autocorrelation of a Signal
Fig 3.2 Pitch Estimation of a Signal
Fig 3.3 Pitch Tracking of a Signal
Fig 3.4 Mel scale Filter Bank
Fig. 3.1 shows the autocorrelation output of a speech signal. The correlation sequence of a signal has length 2*maxlag+1, and the correlation coefficient takes its maximum value of 1 at zero lag.
After autocorrelation, the pitch is estimated for each frame from the correlation coefficients, as shown in Fig. 3.2.
Fig. 3.3 shows the pitch tracking result; the computation time should be minimized to reduce delay. The method determines the voiced segments of the voice and is able to remove silence from the sound. It provides good resolution while avoiding gross errors.
When a spoken syllable is sampled, many samples are obtained, and features are then extracted from these sampled values. Cepstral coefficient calculation is one such method. First, the short-term Fourier transform of the sampled values is computed; the absolute values are taken (the transform values can be complex) and their logarithm is calculated. These log values are then converted back to the time domain using the Discrete Cosine Transform (DCT). This was done for five users, and the first ten DCT coefficients are the cepstral coefficients.
The MFCC takes into account the physiological behavior of human auditory perception, which follows a linear scale up to 1000 Hz and a log scale beyond. Hence the frequency axis is converted to the Mel domain using a number of filters; the absolute value is taken, the log function is applied, and the result is converted back to the time domain using the DCT. For each user, feature vectors with 20 MFCC coefficients each were obtained. For visualization purposes, only a few feature vectors and their MFCCs are shown; each column refers to a feature vector, and the elements of each column are the corresponding MFCCs. As the first 24 DCT coefficients were chosen, each column has 24 elements.
In this paper, the envelope of the signal is estimated using spectral envelope estimation. Fig. 3.5 shows the spectral envelope estimation of the speech signal.
Fig 3.5 Spectral Envelope Estimation
Fig 3.6 3-period Hanning Window
The maximum voiced frequency (MVF) is used in various speech models as the spectral boundary separating the periodic and aperiodic components during the production of voiced sounds. Windowing is essential, as it determines the harmonicity properties of the resulting spectra. In all cases, the window length should be proportional to the pitch period. In this work, a 3-period-long Hanning window was used, as it was found to be well suited for making the amplitude spectra exhibit a good peak-to-valley structure.
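A pitch-synchronous 3-period Hanning window as described above can be sketched as:

```python
import numpy as np

def pitch_synchronous_window(f0, fs=16000, periods=3):
    """Hanning window whose length is a fixed number of pitch periods
    (illustrative sketch; 3 periods as used in this work)."""
    length = int(round(periods * fs / f0))
    return np.hanning(length)

w = pitch_synchronous_window(200.0, 16000)   # 3 periods of a 200 Hz pitch
```

Because the window length tracks the local pitch, harmonic peaks in the resulting amplitude spectrum keep a consistent peak-to-valley structure across frames.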
Fig 3.7 HM vs QHM
The figure shows that the quasi-harmonic model provides a smoother signal than the harmonic model.
Fig 3.8 Resynthesized Speech Signal
Periodic signals can be approximated by a sum of sinusoids whose frequencies are integer multiples of the fundamental frequency and whose magnitudes and phases can be uniquely determined to match the signal (Fourier analysis). One manifestation of this is the spectrogram, which shows the short-time Fourier transform magnitude as a function of time. A narrowband spectrogram (i.e., one produced with a short-time window longer than the fundamental period of the sound) reveals a series of nearly horizontal, uniformly spaced energy ridges, corresponding to the sinusoidal Fourier components, or harmonics, that are an equivalent representation of the sound waveform. In such a spectrogram, for example of a brief clarinet melody, the harmonics are clearly defined.
The key idea behind maximum voiced frequency analysis is to represent each of those ridges explicitly and separately as a set of frequency and magnitude values. The resulting analysis can then be resynthesized using a low bit rate of information.
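Resynthesis from the per-ridge frequency and magnitude parameters amounts to summing harmonics; a minimal sketch follows (the amplitude and phase values are illustrative, not measured from any recording):

```python
import numpy as np

def resynthesize(f0, amps, phases, fs=16000, duration=0.02):
    """Sum-of-harmonics resynthesis from per-harmonic amplitude and phase
    parameters (illustrative sketch)."""
    t = np.arange(int(duration * fs)) / fs
    x = np.zeros_like(t)
    for k, (a, p) in enumerate(zip(amps, phases), start=1):
        x += a * np.cos(2.0 * np.pi * k * f0 * t + p)
    return x

# three harmonics of a 200 Hz fundamental with decaying amplitudes
x = resynthesize(200.0, [1.0, 0.5, 0.25], [0.0, 0.0, 0.0])
```

Only one frequency, one amplitude and one phase per harmonic need be transmitted per frame, which is what makes the low bit rate possible.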
4 CONCLUSION
In this paper, three aspects of analysis have been discussed: pitch refinement, spectral envelope estimation and maximum voiced frequency estimation. The proposed vocoder system was analyzed under different windowing schemes and with frames of varying length and overlap. From the analysis of the three techniques, the fundamental frequency (173.611 Hz) and the formant frequencies (5 different values) were obtained. The quasi-harmonic analysis model is used to implement a pitch refinement algorithm which improves the accuracy of the subsequent spectral envelope estimation, and the harmonic plus noise model is used to reconstruct the speech signal from the parameters. The harmonic model therefore yields more accurate spectral envelopes than sinusoidal analysis. Future work aims at incorporating phase information into the analysis and modeling process and at synthesizing these three aspects over different pitch periods.
REFERENCES
[1] H. Zen, K. Tokuda, and A. W. Black, "Statistical parametric speech synthesis," Speech Commun., vol. 51, no. 11, pp. 1039–1064, 2009.
[2] Y. Stylianou, "Harmonic plus noise models for speech, combined with statistical methods, for speech and speaker modification," Ph.D. dissertation, École Nationale Supérieure des Télécommunications, Paris, France, 1996.
[3] A. Kain, "High resolution voice transformation," Ph.D. dissertation, OGI School of Science and Engineering at OHSU, Portland, OR, 2001.
[4] T. Toda, A. W. Black, and K. Tokuda, "Voice conversion based on maximum-likelihood estimation of spectral parameter trajectory," IEEE Trans. Audio, Speech, Lang. Process., vol. 15, no. 8, pp. 2222–2235, Nov. 2007.
[5] J. L. Flanagan, "Parametric representation of speech signals," IEEE Signal Process. Mag., vol. 27, no. 3, pp. 141–145, 2010.
[6] HMM-Based Speech Synthesis System (HTS) [Online]. Available: http://hts.sp.nitech.ac.jp/
[7] T. Yoshimura, K. Tokuda, T. Masuko, T. Kobayashi, and T. Kitamura, "Simultaneous modeling of spectrum, pitch and duration in HMM-based speech synthesis," in Proc. Eurospeech, 1999, pp. 2347–2350.
[8] K. Tokuda, T. Kobayashi, T. Masuko, and S. Imai, "Mel-generalized cepstral analysis—A unified approach to speech spectral estimation," in Proc. Int. Conf. Spoken Lang. Process., vol. 3, pp. 1043–1046, 1994.
[9] G. Senthil Raja and S. Dandapat, "Performance of Selective Speech Features for Speaker Identification," Journal of the Institution of Engineers (India), vol. 89, May 2008.
[10] Md. Rashidul Hasan, Mustafa Jamil, Md. Golam Rabbani, and Md. Saifur Rahman, "Speaker Identification using Mel Frequency Cepstral Coefficients," in Proc. 3rd Int. Conf. Electrical and Computer Engineering (ICECE 2004), Dec. 2004.
[11] Sandipan Chakroborty and Goutam Saha, "Improved Text-Independent Speaker Identification using Fused MFCC & IMFCC Feature Sets based on Gaussian Filter," International Journal of Signal Processing, vol. 5, no. 1, 2009.
[12] I. Saratxaga, I. Hernaez, M. Pucher, E. Navas, and I. Sainz, "Perceptual importance of the phase related information in speech," in Proc. Interspeech, 2012.
[13] I. Sainz, D. Erro, E. Navas, I. Hernaez, J. Sanchez, I. Saratxaga, I. Odriozola, and I. Luengo, "Aholab Speech Synthesizers for Albayzin 2010," in Proc. FALA, 2010, pp. 343–347.
[14] H. Zen, K. Tokuda, T. Masuko, T. Kobayashi, and T. Kitamura, "Hidden semi-Markov model based speech synthesis," in Proc. ICSLP, 2004, vol. II, pp. 1397–1400.
Speech Enhancement Based on Spectral Subtraction Involving Magnitude and Phas...Speech Enhancement Based on Spectral Subtraction Involving Magnitude and Phas...
Speech Enhancement Based on Spectral Subtraction Involving Magnitude and Phas...
 
F010334548
F010334548F010334548
F010334548
 
Effect of Time Derivatives of MFCC Features on HMM Based Speech Recognition S...
Effect of Time Derivatives of MFCC Features on HMM Based Speech Recognition S...Effect of Time Derivatives of MFCC Features on HMM Based Speech Recognition S...
Effect of Time Derivatives of MFCC Features on HMM Based Speech Recognition S...
 
Feature Extraction Analysis for Hidden Markov Models in Sundanese Speech Reco...
Feature Extraction Analysis for Hidden Markov Models in Sundanese Speech Reco...Feature Extraction Analysis for Hidden Markov Models in Sundanese Speech Reco...
Feature Extraction Analysis for Hidden Markov Models in Sundanese Speech Reco...
 
METHOD FOR REDUCING OF NOISE BY IMPROVING SIGNAL-TO-NOISE-RATIO IN WIRELESS LAN
METHOD FOR REDUCING OF NOISE BY IMPROVING SIGNAL-TO-NOISE-RATIO IN WIRELESS LANMETHOD FOR REDUCING OF NOISE BY IMPROVING SIGNAL-TO-NOISE-RATIO IN WIRELESS LAN
METHOD FOR REDUCING OF NOISE BY IMPROVING SIGNAL-TO-NOISE-RATIO IN WIRELESS LAN
 
Speech Compression using LPC
Speech Compression using LPCSpeech Compression using LPC
Speech Compression using LPC
 
Broad phoneme classification using signal based features
Broad phoneme classification using signal based featuresBroad phoneme classification using signal based features
Broad phoneme classification using signal based features
 
Course report-islam-taharimul (1)
Course report-islam-taharimul (1)Course report-islam-taharimul (1)
Course report-islam-taharimul (1)
 
ANALYSIS OF SPEECH UNDER STRESS USING LINEAR TECHNIQUES AND NON-LINEAR TECHNI...
ANALYSIS OF SPEECH UNDER STRESS USING LINEAR TECHNIQUES AND NON-LINEAR TECHNI...ANALYSIS OF SPEECH UNDER STRESS USING LINEAR TECHNIQUES AND NON-LINEAR TECHNI...
ANALYSIS OF SPEECH UNDER STRESS USING LINEAR TECHNIQUES AND NON-LINEAR TECHNI...
 

More from IJTET Journal

Beaglebone Black Webcam Server For Security
Beaglebone Black Webcam Server For SecurityBeaglebone Black Webcam Server For Security
Beaglebone Black Webcam Server For Security
IJTET Journal
 
Biometrics Authentication Using Raspberry Pi
Biometrics Authentication Using Raspberry PiBiometrics Authentication Using Raspberry Pi
Biometrics Authentication Using Raspberry Pi
IJTET Journal
 
Conceal Traffic Pattern Discovery from Revealing Form of Ad Hoc Networks
Conceal Traffic Pattern Discovery from Revealing Form of Ad Hoc NetworksConceal Traffic Pattern Discovery from Revealing Form of Ad Hoc Networks
Conceal Traffic Pattern Discovery from Revealing Form of Ad Hoc Networks
IJTET Journal
 
Node Failure Prevention by Using Energy Efficient Routing In Wireless Sensor ...
Node Failure Prevention by Using Energy Efficient Routing In Wireless Sensor ...Node Failure Prevention by Using Energy Efficient Routing In Wireless Sensor ...
Node Failure Prevention by Using Energy Efficient Routing In Wireless Sensor ...
IJTET Journal
 
Prevention of Malicious Nodes and Attacks in Manets Using Trust worthy Method
Prevention of Malicious Nodes and Attacks in Manets Using Trust worthy MethodPrevention of Malicious Nodes and Attacks in Manets Using Trust worthy Method
Prevention of Malicious Nodes and Attacks in Manets Using Trust worthy Method
IJTET Journal
 
Effective Pipeline Monitoring Technology in Wireless Sensor Networks
Effective Pipeline Monitoring Technology in Wireless Sensor NetworksEffective Pipeline Monitoring Technology in Wireless Sensor Networks
Effective Pipeline Monitoring Technology in Wireless Sensor Networks
IJTET Journal
 
Raspberry Pi Based Client-Server Synchronization Using GPRS
Raspberry Pi Based Client-Server Synchronization Using GPRSRaspberry Pi Based Client-Server Synchronization Using GPRS
Raspberry Pi Based Client-Server Synchronization Using GPRS
IJTET Journal
 
ECG Steganography and Hash Function Based Privacy Protection of Patients Medi...
ECG Steganography and Hash Function Based Privacy Protection of Patients Medi...ECG Steganography and Hash Function Based Privacy Protection of Patients Medi...
ECG Steganography and Hash Function Based Privacy Protection of Patients Medi...
IJTET Journal
 
An Efficient Decoding Algorithm for Concatenated Turbo-Crc Codes
An Efficient Decoding Algorithm for Concatenated Turbo-Crc CodesAn Efficient Decoding Algorithm for Concatenated Turbo-Crc Codes
An Efficient Decoding Algorithm for Concatenated Turbo-Crc Codes
IJTET Journal
 
Improved Trans-Z-source Inverter for Automobile Application
Improved Trans-Z-source Inverter for Automobile ApplicationImproved Trans-Z-source Inverter for Automobile Application
Improved Trans-Z-source Inverter for Automobile Application
IJTET Journal
 
Wind Energy Conversion System Using PMSG with T-Source Three Phase Matrix Con...
Wind Energy Conversion System Using PMSG with T-Source Three Phase Matrix Con...Wind Energy Conversion System Using PMSG with T-Source Three Phase Matrix Con...
Wind Energy Conversion System Using PMSG with T-Source Three Phase Matrix Con...
IJTET Journal
 
Comprehensive Path Quality Measurement in Wireless Sensor Networks
Comprehensive Path Quality Measurement in Wireless Sensor NetworksComprehensive Path Quality Measurement in Wireless Sensor Networks
Comprehensive Path Quality Measurement in Wireless Sensor Networks
IJTET Journal
 
Optimizing Data Confidentiality using Integrated Multi Query Services
Optimizing Data Confidentiality using Integrated Multi Query ServicesOptimizing Data Confidentiality using Integrated Multi Query Services
Optimizing Data Confidentiality using Integrated Multi Query Services
IJTET Journal
 
Foliage Measurement Using Image Processing Techniques
Foliage Measurement Using Image Processing TechniquesFoliage Measurement Using Image Processing Techniques
Foliage Measurement Using Image Processing Techniques
IJTET Journal
 
Harmonic Mitigation Method for the DC-AC Converter in a Single Phase System
Harmonic Mitigation Method for the DC-AC Converter in a Single Phase SystemHarmonic Mitigation Method for the DC-AC Converter in a Single Phase System
Harmonic Mitigation Method for the DC-AC Converter in a Single Phase System
IJTET Journal
 
Comparative Study on NDCT with Different Shell Supporting Structures
Comparative Study on NDCT with Different Shell Supporting StructuresComparative Study on NDCT with Different Shell Supporting Structures
Comparative Study on NDCT with Different Shell Supporting Structures
IJTET Journal
 
Experimental Investigation of Lateral Pressure on Vertical Formwork Systems u...
Experimental Investigation of Lateral Pressure on Vertical Formwork Systems u...Experimental Investigation of Lateral Pressure on Vertical Formwork Systems u...
Experimental Investigation of Lateral Pressure on Vertical Formwork Systems u...
IJTET Journal
 
A Five – Level Integrated AC – DC Converter
A Five – Level Integrated AC – DC ConverterA Five – Level Integrated AC – DC Converter
A Five – Level Integrated AC – DC Converter
IJTET Journal
 
A Comprehensive Approach for Multi Biometric Recognition Using Sclera Vein an...
A Comprehensive Approach for Multi Biometric Recognition Using Sclera Vein an...A Comprehensive Approach for Multi Biometric Recognition Using Sclera Vein an...
A Comprehensive Approach for Multi Biometric Recognition Using Sclera Vein an...
IJTET Journal
 
Study of Eccentrically Braced Outrigger Frame under Seismic Exitation
Study of Eccentrically Braced Outrigger Frame under Seismic ExitationStudy of Eccentrically Braced Outrigger Frame under Seismic Exitation
Study of Eccentrically Braced Outrigger Frame under Seismic Exitation
IJTET Journal
 

More from IJTET Journal (20)

Beaglebone Black Webcam Server For Security
Beaglebone Black Webcam Server For SecurityBeaglebone Black Webcam Server For Security
Beaglebone Black Webcam Server For Security
 
Biometrics Authentication Using Raspberry Pi
Biometrics Authentication Using Raspberry PiBiometrics Authentication Using Raspberry Pi
Biometrics Authentication Using Raspberry Pi
 
Conceal Traffic Pattern Discovery from Revealing Form of Ad Hoc Networks
Conceal Traffic Pattern Discovery from Revealing Form of Ad Hoc NetworksConceal Traffic Pattern Discovery from Revealing Form of Ad Hoc Networks
Conceal Traffic Pattern Discovery from Revealing Form of Ad Hoc Networks
 
Node Failure Prevention by Using Energy Efficient Routing In Wireless Sensor ...
Node Failure Prevention by Using Energy Efficient Routing In Wireless Sensor ...Node Failure Prevention by Using Energy Efficient Routing In Wireless Sensor ...
Node Failure Prevention by Using Energy Efficient Routing In Wireless Sensor ...
 
Prevention of Malicious Nodes and Attacks in Manets Using Trust worthy Method
Prevention of Malicious Nodes and Attacks in Manets Using Trust worthy MethodPrevention of Malicious Nodes and Attacks in Manets Using Trust worthy Method
Prevention of Malicious Nodes and Attacks in Manets Using Trust worthy Method
 
Effective Pipeline Monitoring Technology in Wireless Sensor Networks
Effective Pipeline Monitoring Technology in Wireless Sensor NetworksEffective Pipeline Monitoring Technology in Wireless Sensor Networks
Effective Pipeline Monitoring Technology in Wireless Sensor Networks
 
Raspberry Pi Based Client-Server Synchronization Using GPRS
Raspberry Pi Based Client-Server Synchronization Using GPRSRaspberry Pi Based Client-Server Synchronization Using GPRS
Raspberry Pi Based Client-Server Synchronization Using GPRS
 
ECG Steganography and Hash Function Based Privacy Protection of Patients Medi...
ECG Steganography and Hash Function Based Privacy Protection of Patients Medi...ECG Steganography and Hash Function Based Privacy Protection of Patients Medi...
ECG Steganography and Hash Function Based Privacy Protection of Patients Medi...
 
An Efficient Decoding Algorithm for Concatenated Turbo-Crc Codes
An Efficient Decoding Algorithm for Concatenated Turbo-Crc CodesAn Efficient Decoding Algorithm for Concatenated Turbo-Crc Codes
An Efficient Decoding Algorithm for Concatenated Turbo-Crc Codes
 
Improved Trans-Z-source Inverter for Automobile Application
Improved Trans-Z-source Inverter for Automobile ApplicationImproved Trans-Z-source Inverter for Automobile Application
Improved Trans-Z-source Inverter for Automobile Application
 
Wind Energy Conversion System Using PMSG with T-Source Three Phase Matrix Con...
Wind Energy Conversion System Using PMSG with T-Source Three Phase Matrix Con...Wind Energy Conversion System Using PMSG with T-Source Three Phase Matrix Con...
Wind Energy Conversion System Using PMSG with T-Source Three Phase Matrix Con...
 
Comprehensive Path Quality Measurement in Wireless Sensor Networks
Comprehensive Path Quality Measurement in Wireless Sensor NetworksComprehensive Path Quality Measurement in Wireless Sensor Networks
Comprehensive Path Quality Measurement in Wireless Sensor Networks
 
Optimizing Data Confidentiality using Integrated Multi Query Services
Optimizing Data Confidentiality using Integrated Multi Query ServicesOptimizing Data Confidentiality using Integrated Multi Query Services
Optimizing Data Confidentiality using Integrated Multi Query Services
 
Foliage Measurement Using Image Processing Techniques
Foliage Measurement Using Image Processing TechniquesFoliage Measurement Using Image Processing Techniques
Foliage Measurement Using Image Processing Techniques
 
Harmonic Mitigation Method for the DC-AC Converter in a Single Phase System
Harmonic Mitigation Method for the DC-AC Converter in a Single Phase SystemHarmonic Mitigation Method for the DC-AC Converter in a Single Phase System
Harmonic Mitigation Method for the DC-AC Converter in a Single Phase System
 
Comparative Study on NDCT with Different Shell Supporting Structures
Comparative Study on NDCT with Different Shell Supporting StructuresComparative Study on NDCT with Different Shell Supporting Structures
Comparative Study on NDCT with Different Shell Supporting Structures
 
Experimental Investigation of Lateral Pressure on Vertical Formwork Systems u...
Experimental Investigation of Lateral Pressure on Vertical Formwork Systems u...Experimental Investigation of Lateral Pressure on Vertical Formwork Systems u...
Experimental Investigation of Lateral Pressure on Vertical Formwork Systems u...
 
A Five – Level Integrated AC – DC Converter
A Five – Level Integrated AC – DC ConverterA Five – Level Integrated AC – DC Converter
A Five – Level Integrated AC – DC Converter
 
A Comprehensive Approach for Multi Biometric Recognition Using Sclera Vein an...
A Comprehensive Approach for Multi Biometric Recognition Using Sclera Vein an...A Comprehensive Approach for Multi Biometric Recognition Using Sclera Vein an...
A Comprehensive Approach for Multi Biometric Recognition Using Sclera Vein an...
 
Study of Eccentrically Braced Outrigger Frame under Seismic Exitation
Study of Eccentrically Braced Outrigger Frame under Seismic ExitationStudy of Eccentrically Braced Outrigger Frame under Seismic Exitation
Study of Eccentrically Braced Outrigger Frame under Seismic Exitation
 

Recently uploaded

Additional Benefits for Employee Website.pdf
Additional Benefits for Employee Website.pdfAdditional Benefits for Employee Website.pdf
Additional Benefits for Employee Website.pdf
joachimlavalley1
 
Home assignment II on Spectroscopy 2024 Answers.pdf
Home assignment II on Spectroscopy 2024 Answers.pdfHome assignment II on Spectroscopy 2024 Answers.pdf
Home assignment II on Spectroscopy 2024 Answers.pdf
Tamralipta Mahavidyalaya
 
Fish and Chips - have they had their chips
Fish and Chips - have they had their chipsFish and Chips - have they had their chips
Fish and Chips - have they had their chips
GeoBlogs
 
The Art Pastor's Guide to Sabbath | Steve Thomason
The Art Pastor's Guide to Sabbath | Steve ThomasonThe Art Pastor's Guide to Sabbath | Steve Thomason
The Art Pastor's Guide to Sabbath | Steve Thomason
Steve Thomason
 
Digital Tools and AI for Teaching Learning and Research
Digital Tools and AI for Teaching Learning and ResearchDigital Tools and AI for Teaching Learning and Research
Digital Tools and AI for Teaching Learning and Research
Vikramjit Singh
 
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
siemaillard
 
Operation Blue Star - Saka Neela Tara
Operation Blue Star   -  Saka Neela TaraOperation Blue Star   -  Saka Neela Tara
Operation Blue Star - Saka Neela Tara
Balvir Singh
 
Palestine last event orientationfvgnh .pptx
Palestine last event orientationfvgnh .pptxPalestine last event orientationfvgnh .pptx
Palestine last event orientationfvgnh .pptx
RaedMohamed3
 
How to Make a Field invisible in Odoo 17
How to Make a Field invisible in Odoo 17How to Make a Field invisible in Odoo 17
How to Make a Field invisible in Odoo 17
Celine George
 
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
siemaillard
 
Phrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXX
Phrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXXPhrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXX
Phrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXX
MIRIAMSALINAS13
 
Unit 2- Research Aptitude (UGC NET Paper I).pdf
Unit 2- Research Aptitude (UGC NET Paper I).pdfUnit 2- Research Aptitude (UGC NET Paper I).pdf
Unit 2- Research Aptitude (UGC NET Paper I).pdf
Thiyagu K
 
Template Jadual Bertugas Kelas (Boleh Edit)
Template Jadual Bertugas Kelas (Boleh Edit)Template Jadual Bertugas Kelas (Boleh Edit)
Template Jadual Bertugas Kelas (Boleh Edit)
rosedainty
 
The Challenger.pdf DNHS Official Publication
The Challenger.pdf DNHS Official PublicationThe Challenger.pdf DNHS Official Publication
The Challenger.pdf DNHS Official Publication
Delapenabediema
 
Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46
Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46
Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46
MysoreMuleSoftMeetup
 
Model Attribute Check Company Auto Property
Model Attribute  Check Company Auto PropertyModel Attribute  Check Company Auto Property
Model Attribute Check Company Auto Property
Celine George
 
TESDA TM1 REVIEWER FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
TESDA TM1 REVIEWER  FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...TESDA TM1 REVIEWER  FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
TESDA TM1 REVIEWER FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
EugeneSaldivar
 
How to Split Bills in the Odoo 17 POS Module
How to Split Bills in the Odoo 17 POS ModuleHow to Split Bills in the Odoo 17 POS Module
How to Split Bills in the Odoo 17 POS Module
Celine George
 
How to Create Map Views in the Odoo 17 ERP
How to Create Map Views in the Odoo 17 ERPHow to Create Map Views in the Odoo 17 ERP
How to Create Map Views in the Odoo 17 ERP
Celine George
 
Synthetic Fiber Construction in lab .pptx
Synthetic Fiber Construction in lab .pptxSynthetic Fiber Construction in lab .pptx
Synthetic Fiber Construction in lab .pptx
Pavel ( NSTU)
 

Recently uploaded (20)

Additional Benefits for Employee Website.pdf
Additional Benefits for Employee Website.pdfAdditional Benefits for Employee Website.pdf
Additional Benefits for Employee Website.pdf
 
Home assignment II on Spectroscopy 2024 Answers.pdf
Home assignment II on Spectroscopy 2024 Answers.pdfHome assignment II on Spectroscopy 2024 Answers.pdf
Home assignment II on Spectroscopy 2024 Answers.pdf
 
Fish and Chips - have they had their chips
Fish and Chips - have they had their chipsFish and Chips - have they had their chips
Fish and Chips - have they had their chips
 
The Art Pastor's Guide to Sabbath | Steve Thomason
The Art Pastor's Guide to Sabbath | Steve ThomasonThe Art Pastor's Guide to Sabbath | Steve Thomason
The Art Pastor's Guide to Sabbath | Steve Thomason
 
Digital Tools and AI for Teaching Learning and Research
Digital Tools and AI for Teaching Learning and ResearchDigital Tools and AI for Teaching Learning and Research
Digital Tools and AI for Teaching Learning and Research
 
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
 
Operation Blue Star - Saka Neela Tara
Operation Blue Star   -  Saka Neela TaraOperation Blue Star   -  Saka Neela Tara
Operation Blue Star - Saka Neela Tara
 
Palestine last event orientationfvgnh .pptx
Palestine last event orientationfvgnh .pptxPalestine last event orientationfvgnh .pptx
Palestine last event orientationfvgnh .pptx
 
How to Make a Field invisible in Odoo 17
How to Make a Field invisible in Odoo 17How to Make a Field invisible in Odoo 17
How to Make a Field invisible in Odoo 17
 
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
 
Phrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXX
Phrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXXPhrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXX
Phrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXX
 
Unit 2- Research Aptitude (UGC NET Paper I).pdf
Unit 2- Research Aptitude (UGC NET Paper I).pdfUnit 2- Research Aptitude (UGC NET Paper I).pdf
Unit 2- Research Aptitude (UGC NET Paper I).pdf
 
Template Jadual Bertugas Kelas (Boleh Edit)
Template Jadual Bertugas Kelas (Boleh Edit)Template Jadual Bertugas Kelas (Boleh Edit)
Template Jadual Bertugas Kelas (Boleh Edit)
 
The Challenger.pdf DNHS Official Publication
The Challenger.pdf DNHS Official PublicationThe Challenger.pdf DNHS Official Publication
The Challenger.pdf DNHS Official Publication
 
Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46
Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46
Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46
 
Model Attribute Check Company Auto Property
Model Attribute  Check Company Auto PropertyModel Attribute  Check Company Auto Property
Model Attribute Check Company Auto Property
 
TESDA TM1 REVIEWER FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
TESDA TM1 REVIEWER  FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...TESDA TM1 REVIEWER  FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
TESDA TM1 REVIEWER FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
 
How to Split Bills in the Odoo 17 POS Module
How to Split Bills in the Odoo 17 POS ModuleHow to Split Bills in the Odoo 17 POS Module
How to Split Bills in the Odoo 17 POS Module
 
How to Create Map Views in the Odoo 17 ERP
How to Create Map Views in the Odoo 17 ERPHow to Create Map Views in the Odoo 17 ERP
How to Create Map Views in the Odoo 17 ERP
 
Synthetic Fiber Construction in lab .pptx
Synthetic Fiber Construction in lab .pptxSynthetic Fiber Construction in lab .pptx
Synthetic Fiber Construction in lab .pptx
 

Speech Analysis and synthesis using Vocoder

and e-mails, the consulting of databases, and automatic answering services (e.g., Chrono Post); for Multimedia, the speech interface between man and machine, help for teaching reading or new languages (educational tools and software), help for teaching reading to blind people, and office tools; and finally, for the Automobile, alert and video-surveillance systems and Internet access, among other things for mail reading. Companies such as Nuance, ScanSoft and Acapela Group are present on this market.

This work aims at implementing a speech processing model called the Harmonic plus Noise Model (HNM). It is in fact a hybrid model, since it decomposes the speech frames into a harmonic part and a noise part, and it is expected to produce high-quality artificial speech. Voice conversion systems do not create new speech signals, but only transform existing ones, which is why this paper focuses on synthesis. Understood in this context, speech vocoding is different from speech coding. The main goal of speech coding is to achieve the highest possible resynthesis quality using the lowest possible number of bits to transmit the speech signal; real-time performance during analysis and reconstruction is also one of its typical requirements. In statistical parametric synthesis frameworks, vocoders must not only have these high resynthesis capabilities, but also provide parameters that are adequate for statistically modeling the underlying structure of speech, while information compression is not a priority.

Vocoders are a class of speech coding systems that analyze the voice signal at the transmitter, transmit parameters derived from the analysis, and then synthesize the voice at the receiver using those parameters. Every vocoder attempts to model the speech generation process as a dynamic system and tries to quantify certain physical constraints of the system; these constraints are used to provide a parsimonious description of the speech signal [4]. Vocoders are, in general, much more complex than waveform coders and achieve very high economy in transmission bit rate. However, they are less robust, and their performance tends to be talker dependent. The most popular vocoding system is the linear predictive coder (LPC); other vocoding schemes include the channel vocoder, formant vocoder, cepstrum vocoder and voice-excited vocoder.

Fig. 1 Speech Generation Model

Fig. 1 shows the traditional speech generation model that is the basis of all vocoding systems. The sound-generating mechanism forms the source and is linearly separated from the intelligence-modulating vocal tract filter, which forms the system. The speech signal is assumed to consist of two types of sounds: voiced sounds ("m", "n", "v" pronunciations) are the result of quasi-periodic vibrations of the
vocal chords, and unvoiced sounds ("f", "s", "sh" pronunciations) are fricatives produced by turbulent air flow through a constriction. The parameters associated with this model are the voice pitch, the pole frequencies of the modulating filter, and the corresponding amplitude parameters. The pitch frequency for most speakers is below 300 Hz, and extracting this information from the signal is very difficult. The pole frequencies correspond to the resonant frequencies of the vocal tract and are often called the formants of the speech signal. For adult speakers, the formants are centered around 500 Hz, 1500 Hz, 2500 Hz and 3500 Hz. By meticulously adjusting the parameters of the speech generation model, good-quality speech can be synthesized [2].

For the MFCC features, the signal is warped in the frequency domain using 24 filter banks. This filter bank is designed according to the behavior of human auditory perception: each tone of a voice signal with an actual frequency f, measured in Hz, can also be described by a subjective pitch on the Mel frequency scale [10]. The Mel scale has a linear frequency relationship below 1000 Hz and a logarithmic relationship above 1000 Hz; a common approximation is mel(f) = 2595 log10(1 + f/700).

Fig. 2 MFCC Processor

In the final step, the log Mel spectrum is converted back to the time domain. The result is called the Mel Frequency Cepstral Coefficients (MFCC).

2 HNM ANALYSIS

The analysis consists of estimating the harmonic and noise parameters. Because the model decomposes the signal into two independent parts, these are estimated separately. First, the voiced frames must be separated from the unvoiced frames, and then the parameters used for the synthesis are computed. The proposed system is shown in Fig. 2.1.
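Before turning to the HNM analysis, the MFCC computation described above can be sketched in Python. The 2595·log10(1 + f/700) mapping and the DCT-II used for the final step are standard choices for this computation, shown here as an illustration rather than the paper's exact implementation:

```python
import numpy as np

def hz_to_mel(f):
    # Standard mel mapping: approximately linear below 1 kHz,
    # logarithmic above, as used to place the 24 filter banks.
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mfcc_from_filterbank(log_energies, n_ceps=13):
    # Final MFCC step: a DCT-II takes the log mel-filterbank
    # energies back from the spectral domain, yielding the
    # Mel Frequency Cepstral Coefficients.
    m = len(log_energies)
    k = np.arange(m)
    return np.array([np.sum(log_energies * np.cos(np.pi * q * (k + 0.5) / m))
                     for q in range(n_ceps)])

mels = hz_to_mel(np.array([500.0, 1000.0, 4000.0]))
ceps = mfcc_from_filterbank(np.log(np.ones(24) * 2.0))
```

Note that the mel mapping gives mel(1000 Hz) ≈ 1000, which is exactly the pivot between the linear and logarithmic regions described in the text.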
Fig. 2.1 Block diagram of the Harmonic plus Noise Model system

The steps of the analysis process are the following:
- First estimation of the fundamental frequency
- Voiced/unvoiced decision for each frame
- Spectral envelope estimation
- Estimation of the maximum voiced frequency FM
- Refinement of the fundamental frequency

2.1 FIRST ESTIMATION OF THE FUNDAMENTAL FREQUENCY (PITCH)

The first step is the estimation of the pitch f0. This parameter is estimated every 10 ms, and the length of the analysis window depends on this local pitch. One method co-developed with the HNM algorithm is explained here. It is based on an autocorrelation approach and works by fitting, in the frequency domain, the short-term Fourier transform of the speech segment s(t) (a Blackman-weighted segment whose length equals 3 times the maximum fundamental period) with the short-term Fourier transform of a purely harmonic signal (a sum of harmonics) obtained from a candidate fundamental frequency F0. To avoid pitch errors, a "peak tracking" method is needed: it looks two frames forward and backward from the current one, finds the minimum-error path, and assigns the pitch accordingly.
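A minimal first estimate of f0 in the spirit of the autocorrelation approach above can be sketched as follows; the frequency-domain fitting and the peak-tracking stage are omitted, and the pitch search range is an illustrative assumption:

```python
import numpy as np

def estimate_f0(frame, fs, f0_min=60.0, f0_max=400.0):
    # First pitch estimate via the autocorrelation method: pick
    # the lag, within the admissible pitch range, that maximizes
    # the normalized autocorrelation of the frame.
    frame = frame - frame.mean()
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    ac = ac / ac[0]                      # normalize by frame energy
    lo = int(fs / f0_max)                # shortest admissible period
    hi = min(int(fs / f0_min), len(ac) - 1)
    lag = lo + np.argmax(ac[lo:hi + 1])  # best candidate period
    return fs / lag

fs = 16000
t = np.arange(int(0.03 * fs)) / fs       # one 30 ms analysis frame
frame = np.sin(2 * np.pi * 200.0 * t)    # synthetic 200 Hz voiced frame
f0 = estimate_f0(frame, fs)
```

On this synthetic voiced frame the estimate recovers the 200 Hz pitch; real speech additionally requires the error-path tracking described above to suppress octave errors.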
2.2 VOICED/UNVOICED DECISION

The frames extracted every 10 ms (whose length is always 3 times T0max) are then classified as "voiced" or "unvoiced". The short-term Fourier transform (STFT) with NFFT = 4096 points (with zero padding) is applied to the current frame, giving S(f). From this first STFT, the amplitudes of the first four harmonics (the first of which is the fundamental) are evaluated; Ŝ(f) denotes the resulting set of harmonic amplitudes (of f0). An error criterion E between S(f) and Ŝ(f) is then applied: the frame is declared "voiced" if E is less than a threshold of -15 dB, and "unvoiced" otherwise.

2.3 SPECTRAL ENVELOPE ESTIMATION

Assuming a simplified speech production model in which a pulse-or-noise excitation passes through a shaping filter, the term spectral envelope denotes the amplitude response of this filter in frequency. Such an envelope contains not only the contribution of the vocal tract, but also that of the glottal source. In unvoiced frames, the spectrum of the noise-like excitation is flat, which means that the response of the filter coincides with the spectrum of the signal itself (except for a scaling factor). In voiced frames, the spectrum of the pulse-like excitation has the form of an impulse train with constant amplitude and linear-in-frequency phase, placed at multiples of f0. Therefore, the spectrum of the signal shows a series of peaks that result from multiplying the impulses of the excitation by uniformly spaced spectral samples of the filter response. Assuming local stationarity, full-band harmonic analysis returns these discrete samples of the spectral envelope; a continuous envelope can then be estimated via interpolation.

2.4 ESTIMATION OF THE MAXIMUM VOICED FREQUENCY

This parameter is also estimated every 10 ms.
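The interpolation step of Section 2.3 can be sketched as follows; interpolating the harmonic amplitudes linearly in the log domain is an assumed (though common) choice, not mandated by the paper:

```python
import numpy as np

def spectral_envelope(harmonic_amps, f0, freqs):
    # Continuous spectral envelope for a voiced frame: the
    # amplitudes measured at the harmonics k*f0 are discrete
    # samples of the envelope; interpolate their log-amplitudes
    # to obtain values at arbitrary frequencies.
    harm_freqs = f0 * np.arange(1, len(harmonic_amps) + 1)
    log_amps = np.log(np.asarray(harmonic_amps, dtype=float))
    return np.exp(np.interp(freqs, harm_freqs, log_amps))

amps = [1.0, 0.5, 0.25, 0.125]           # toy harmonic amplitudes
env = spectral_envelope(amps, 100.0, np.array([100.0, 150.0, 200.0]))
```

At the harmonics themselves the envelope reproduces the measured amplitudes exactly; between them it follows a log-linear path, e.g. √0.5 midway between amplitudes 1.0 and 0.5.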
In the beginning, I work in an interval of the absolute spectrum. I look for the greatest amplitude in this interval and the corresponding frequency, denoted Am and fc respectively. I also compute the sum of the amplitudes located between the two minima surrounding this peak, called the cumulative amplitude Amc. The other peaks in the same interval, occurring at frequencies fi, are treated in the same way, yielding the two types of amplitudes Am(fi) and Amc(fi). I compute the mean of these cumulative amplitudes, denoted Āmc, and then apply the following test to the frequency fc:

if Amc(fc) > Āmc and Am(fc) − max{Am(fi)} > 13 dB

(L being the number of the nearest harmonic of fc), then the frequency fc is declared "voiced"; the next candidate is considered and the same criterion is applied. The highest voiced frequency found corresponds to the maximum voiced frequency FM. To avoid mistakes, a 3-point median smoothing filter is applied. Since FM can also vary greatly from one frame to the next, another median filter, of 5 points in general, is applied to this time-varying frequency in order to reduce abrupt jumps.

2.5 RE-ESTIMATION OF THE FUNDAMENTAL FREQUENCY

Using the frequencies fi declared voiced in the previous step, I try to minimize the following function:

E(f̂0) = Σi (fi − L(i)·f̂0)²

with L(i) the harmonic number associated with each voiced frequency fi and f0 the initial estimation of the pitch. The minimum is reached at the new, refined estimate of the pitch.

3 RESULTS AND DISCUSSION

For the enrollment of each user, a data record was maintained in the database with different text information. This database contains 53 different voices (25 female, 28 male), taken from different speech synthesis and recognition databases in English. The specific utterances representing each voice were chosen randomly among candidates with a suitable duration (around 5 s).
Although the recording conditions were database dependent, in all cases the sampling frequency was 16 kHz and the signal-to-noise ratio was checked to be high enough for analysis-synthesis purposes.

Table 1: Information about the speakers
  Number of male speakers:   25
  Number of female speakers: 28
  Average age:               24
  Language:                  English

Information about the speakers used in the training phase is summarized in Table 1. The first analysis step to be performed is pitch detection. Pitch detection algorithms (PDAs) exhibit very good performance when applied to clean signals, and the signals involved in speech synthesis usually show a high signal-to-noise ratio. The vocoder presented in this paper includes an implementation of an autocorrelation-based algorithm.
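A minimal autocorrelation-based pitch detector in the spirit described above might look like the sketch below; the lag range and function name are illustrative assumptions, not the paper's code.

```python
import numpy as np

def autocorr_pitch(frame, fs, f0_min=60.0, f0_max=400.0):
    """Estimate pitch as fs / lag of the strongest autocorrelation
    peak inside the plausible pitch-period range."""
    frame = frame - frame.mean()                       # remove DC offset
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lag_lo = int(fs / f0_max)                          # shortest period
    lag_hi = int(fs / f0_min)                          # longest period
    best_lag = lag_lo + int(np.argmax(r[lag_lo:lag_hi]))
    return fs / best_lag
```

For clean, high-SNR signals such as those used here, even this bare-bones version locks onto the pitch period; practical PDAs add windowing, thresholding, and the forward-backward peak tracking described in Section 2.1.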
Fig 3.1 Autocorrelation of a Signal
Fig 3.2 Pitch Estimation of a Signal
Fig 3.3 Pitch Tracking of a Signal
Fig 3.4 Mel Scale Filter Bank

Fig 3.1 shows the autocorrelation output of a speech signal. The correlation sequence has length 2·maxlag + 1, and the correlation coefficient reaches its maximum value of 1 at zero lag. After autocorrelation, the pitch is estimated for each frame from the correlation coefficients, as shown in Fig 3.2. Fig 3.3 shows the pitch tracking result; the computation time should be minimized to reduce delay. The algorithm determines the voiced segments of the voice and is able to remove silence from the sound; it provides good resolution while avoiding gross errors.

When a spoken syllable is sampled, many samples are obtained, and features are then extracted from these sampled values. Cepstral coefficient calculation is one such method: I first compute the Short-Term Fourier Transform of the sampled values, take the absolute value (the STFT values can be complex), and calculate the logarithm of these absolute values. Thereafter, the result is converted back to the time domain using the Discrete Cosine Transform (DCT). This was done for five users, and the first ten DCT coefficients are the cepstral coefficients.

The Mel-frequency variant takes into account the physiological behavior of human auditory perception, which follows a linear scale up to 1000 Hz and a logarithmic scale above. Hence the frequency axis is converted to the Mel domain using a bank of filters (Fig 3.4); I then take the absolute value, apply the log function, and convert back to the time domain using the DCT. For each user, feature vectors with 20 MFCC coefficients each were obtained. For visualization purposes, only a few feature vectors and their MFCCs are shown; each column refers to one feature vector, and the elements of each column are the corresponding MFCCs.
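The MFCC pipeline just described (STFT magnitude → mel filter bank → log → DCT) can be sketched as below; the filter count, FFT size, and function names are illustrative assumptions rather than the exact configuration used in the paper.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)  # linear below ~1 kHz, log above

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc(frame, fs, n_filters=26, n_coeffs=20, nfft=512):
    """MFCCs of one frame: |STFT| -> triangular mel filters -> log -> DCT."""
    spectrum = np.abs(np.fft.rfft(frame * np.hamming(len(frame)), nfft))
    # Filter edges spaced uniformly on the mel scale, mapped to FFT bins
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(fs / 2.0), n_filters + 2)
    bins = np.floor((nfft + 1) * mel_to_hz(mel_pts) / fs).astype(int)
    fbank = np.zeros((n_filters, nfft // 2 + 1))
    for i in range(n_filters):
        lo, center, hi = bins[i], bins[i + 1], bins[i + 2]
        for k in range(lo, center):
            fbank[i, k] = (k - lo) / max(center - lo, 1)   # rising slope
        for k in range(center, hi):
            fbank[i, k] = (hi - k) / max(hi - center, 1)   # falling slope
    log_energies = np.log(fbank @ spectrum + 1e-10)
    # DCT-II of the log filter-bank energies; keep the first n_coeffs
    n = np.arange(n_filters)
    return np.array([np.sum(log_energies * np.cos(np.pi * k * (n + 0.5) / n_filters))
                     for k in range(n_coeffs)])
```

The log step makes the representation robust to overall gain, and the DCT decorrelates the filter-bank energies so that a handful of coefficients summarizes the envelope shape.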
As the first 24 DCT coefficients were chosen, each column has 24 elements. In this paper, the envelope of the signal is estimated using spectral envelope estimation; Fig 3.5 shows the spectral envelope estimation of a speech signal.

Fig 3.5 Spectral Envelope Estimation
Fig 3.6 3-period Hanning Window

The Maximum Voiced Frequency (MVF) is used in various speech models as the spectral boundary separating the periodic and aperiodic components during the production of voiced sounds. Windowing is essential, as it determines the harmonicity properties of
the resulting spectra. In all cases, the window length should be proportional to the pitch period. In this work, I have used a 3-period-long Hanning window, as I found it well suited for making the amplitude spectra exhibit a good peak-to-valley structure.

Fig 3.7 Harmonic Model vs Quasi-Harmonic Model

As Fig 3.7 shows, the quasi-harmonic model provides a smoother signal than the harmonic model.

Fig 3.8 Resynthesized Speech Signal

Periodic signals can be approximated by a sum of sinusoids whose frequencies are integer multiples of the fundamental frequency and whose magnitudes and phases can be uniquely determined to match the signal, the so-called Fourier analysis. One manifestation of this is the spectrogram, which shows the short-time Fourier transform magnitude as a function of time. A narrowband spectrogram (i.e., one produced with a short-time window longer than the fundamental period of the sound) reveals a series of nearly horizontal, uniformly spaced energy ridges, corresponding to the sinusoidal Fourier components, or harmonics, that are an equivalent representation of the sound waveform. In a spectrogram of a brief clarinet melody, for example, the harmonics are clearly defined. The key idea behind the maximum voiced frequency is to represent each one of those ridges explicitly and separately as a set of frequency and magnitude values. The resulting analysis can then be resynthesized using a low bit rate of information.

4 CONCLUSION

In this paper, three aspects of analysis have been discussed: pitch refinement, spectral envelope estimation, and maximum voiced frequency estimation. The proposed vocoder system was analyzed under different windowing schemes and by varying the length of the frames with different overlaps. From the analysis of the three techniques, the fundamental frequency (173.611 Hz) and the formant frequencies (5 different values) were obtained.
The quasi-harmonic analysis model is used to implement a pitch refinement algorithm, which improves the accuracy of the subsequent spectral envelope estimation. The harmonic plus noise model is also used to reconstruct the speech signals from the parameters; the harmonic model therefore yields more accurate spectral envelopes than sinusoidal analysis. Future work aims at incorporating phase information into the analysis and modeling process, and at synthesizing these three aspects over different pitch periods.