This Tutorial is Downloaded from: https://sites.google.com/site/enggprojectece 
An Introduction to Various Features of Speech Signal 
Compiled by: Sivaranjan Goswami, Pursuing M. Tech. (2013-15 batch) 
Dept. of ECE, Gauhati University, Guwahati, India 
Contact: sivgos@gmail.com 
Speech is the most fundamental mode of communication among human beings, as well as among many 
other creatures. In our day-to-day life we communicate mainly through speech. In the last few decades 
a great deal of research has been carried out on using speech to control various electronic 
systems. Speech has a number of advantages over manual control through panels and switches: 
speech can be easily transmitted over a telephone channel, and hence remote control of devices 
becomes easier using speech. 
The audible range of the human ear is 20 Hz to 20 kHz. However, the frequency content of human 
speech that matters most for intelligibility lies roughly between 300 Hz and 3400 Hz, the band used in 
telephony. Thus, according to the Nyquist theorem, the sampling rate should be at least twice the 
highest frequency, that is, greater than or equal to 6800 Hz. In telecommunication, the sampling rate is 
taken to be 8 kHz. Therefore, the analog-to-digital converters of mobile phones sample the speech 
signal at 8 kHz. For multimedia applications, however, the sampling rate is usually much higher. In the 
MP3 songs that we download, the sampling rate is usually 44100 Hz. This is one reason why the quality 
of sound recorded using a mobile phone is poor compared to that of downloaded MP3 songs. 
A. What is Speech Signal 
Speech or any sound is basically an acoustic signal that travels through air or any other material 
through expansion and compression of the particles. It is hence a pressure wave. A microphone is a 
transducer that converts this pressure wave into a voltage signal. 
A detailed description of the human speech production system is beyond the scope of this discussion. 
However, a brief description is needed in the context of feature extraction. The human speech 
production system is a complex mechanical system. The air exhaled by the lungs is modulated by 
various hard and soft tissues, first by the vocal folds (glottis) and then by the articulators of the 
vocal tract such as the tongue, lips, jaw, and velum. In digital speech processing, this process is 
represented by a discrete-time model as shown in figure 1. The lungs and the vocal folds are 
represented by the block Excitation Generator. The vocal tract is modeled as a linear system, 
which in linear predictive analysis is usually an all-pole (IIR) digital filter. The vocal tract 
parameters are the parameters of this digital filter. 
Fig. 1. Block diagram of speech generation model
Based on the type of the excitation signal, a speech signal can be classified into two major types: 
1. Voiced Speech: Voiced sounds are produced by forcing air through the glottis, the opening 
between the vocal folds, so that the folds vibrate. The excitation is a quasi-periodic impulse 
train, that is, a signal whose frequency remains constant only for a short time, sometimes 
referred to as the stationarity period. Examples of voiced speech are the vowel sounds in cat, 
hear, too etc. 
2. Unvoiced Speech: Unvoiced sounds are generated by forming a constriction at some point 
along the vocal tract and forcing air through the constriction to produce turbulence. The 
excitation is a random signal, which can be modeled as white Gaussian noise. Examples are 
the consonant sounds at the start of ship, key etc. 
Loosely speaking, the voiced component of a word determines its tone and the overall shape of its 
waveform, whereas the unvoiced sections help to distinguish one word from another. The waveforms 
of the words CUP and DUCK illustrate this: their voiced parts are similar, so the waveforms are also 
similar. 
As shown in figure 1 the speech production mechanism is modeled as a cascade combination of an 
excitation generator and a digital filter. The excitation of the filter determines the type of speech and 
the digital filter simulates the effect of various organs or tissues of the vocal tract on the excitation. 
The parameters of the filter are known as vocal tract parameters. The excitation is either an impulse 
train or a random noise based on whether the speech is voiced or unvoiced respectively. Thus figure 1 
can be drawn as figure 2. 
Figure 2: Block diagram of speech generation model for Linear Predictive Analysis 
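The cascade of figure 2 can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `make_excitation` and `all_pole_filter`, the pitch period of 80 samples, and the coefficients `[1.3, -0.8]` are assumptions chosen for the example (they give a stable filter) and are not measured vocal tract parameters. The filter is assumed to be all-pole, as is usual in linear predictive analysis.

```python
import random


def make_excitation(n_samples, voiced, pitch_period=80, seed=0):
    """Return an impulse train (voiced) or white Gaussian noise (unvoiced)."""
    if voiced:
        # One unit impulse every `pitch_period` samples (100 Hz at 8 kHz).
        return [1.0 if n % pitch_period == 0 else 0.0 for n in range(n_samples)]
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    return [rng.gauss(0.0, 1.0) for _ in range(n_samples)]


def all_pole_filter(x, coeffs, gain=1.0):
    """Apply y[n] = gain * x[n] + sum over k of coeffs[k-1] * y[n-k]."""
    y = []
    for n, x_n in enumerate(x):
        acc = gain * x_n
        for k, a_k in enumerate(coeffs, start=1):
            if n - k >= 0:
                acc += a_k * y[n - k]
        y.append(acc)
    return y


# Hypothetical "vocal tract parameters": poles at radius sqrt(0.8) < 1, so stable.
vocal_tract = [1.3, -0.8]

voiced_speech = all_pole_filter(make_excitation(160, voiced=True), vocal_tract)
unvoiced_speech = all_pole_filter(make_excitation(160, voiced=False), vocal_tract)
print(voiced_speech[:3])
```

The same filter is used for both outputs; only the excitation is switched, which is exactly the role of the voiced/unvoiced switch in the block diagram.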
B. Short-Time Analysis of Speech Signal 
From the above discussion we have seen that the properties of a speech signal remain the same only 
for a short duration of time. Therefore, any kind of speech processing first requires segmentation of 
the speech signal into frames of short duration. The duration for which the properties of a speech 
signal remain stationary varies from speaker to speaker. It usually ranges from 15 to 25 milliseconds. 
It is common practice to take the frame duration as 20 milliseconds. If the speech signal is sampled at 
a rate of 8 kHz, there are 8000 × 0.020 = 160 samples per frame. 
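The segmentation into frames can be sketched as follows. The function name `split_into_frames` and its parameters are hypothetical; the sketch assumes non-overlapping frames and simply drops any trailing samples that do not fill a whole frame.

```python
def split_into_frames(samples, sample_rate_hz=8000, frame_ms=20):
    """Split a sequence of samples into non-overlapping frames.

    Trailing samples that do not fill a whole frame are dropped.
    """
    frame_len = sample_rate_hz * frame_ms // 1000  # 8000 Hz * 20 ms = 160 samples
    n_frames = len(samples) // frame_len
    return [samples[k * frame_len:(k + 1) * frame_len] for k in range(n_frames)]


# One second of dummy samples at 8 kHz gives 50 frames of 160 samples each.
frames = split_into_frames([0.0] * 8000)
print(len(frames), len(frames[0]))  # 50 160
```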
Sometimes, to overcome certain difficulties particular to some problem, the speech frames are 
overlapped or multiplied by a window function. Such cases are not covered in this tutorial; they are 
discussed in the tutorial on “Short Term Spectral and Cepstral Analysis of Speech Signal”. 
C. Features of Speech Signal 
So far we have had a brief introduction to the generation and types of speech signal. Now we come 
to feature extraction. 
1. Zero-Crossing Rate: 
Zero-crossing rate is a measure of the frequency of the signal over a short period. The frequency 
can be estimated by counting the number of times the sign of the signal changes per unit time and 
dividing it by two.
Figure 3: Zero crossing 
It can be seen that during one period, the signal crosses zero twice. Thus for any frame, the zero-crossing 
rate (ZCR) is given by: 
\[
\mathrm{ZCR} = \frac{\text{No. of sign changes in the frame}}{\text{Frame duration}} \quad (\mathrm{sec}^{-1})
\]
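A minimal sketch of this computation is given below. The function name `zero_crossing_rate` is hypothetical, and a sample that is exactly zero is treated as positive, which is an assumption of this sketch. The test signal is a 500 Hz sine with a small phase offset, sampled at 8 kHz over one 20 ms frame.

```python
import math


def zero_crossing_rate(frame, sample_rate_hz=8000):
    """Return the number of sign changes per second for one frame."""
    sign_changes = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    frame_duration_s = len(frame) / sample_rate_hz
    return sign_changes / frame_duration_s


# 500 Hz sine, 160 samples at 8 kHz (a 20 ms frame).
frame = [math.sin(2 * math.pi * 500 * n / 8000 + 0.1) for n in range(160)]
zcr = zero_crossing_rate(frame)
print(zcr, zcr / 2)  # sign changes per second, and the frequency estimate in Hz
```

Because a frame can only contain a whole number of sign changes, the frequency estimate `zcr / 2` comes out close to, but not exactly, 500 Hz.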
2. Mean Square or Mean Magnitude value: 
This is a mean value of the signal for a particular frame ignoring the sign. The mean square value of 
the k-th frame is given by: 
\[
P_{avg}(k) = \frac{1}{N} \sum_{n=Nk}^{Nk+N-1} x^{2}(n); \quad k = 0, 1, 2, \ldots, \frac{L}{N} - 1
\]
Similarly, the mean magnitude is given by: 
\[
A_{avg}(k) = \frac{1}{N} \sum_{n=Nk}^{Nk+N-1} |x(n)|; \quad k = 0, 1, 2, \ldots, \frac{L}{N} - 1
\]
where L is the total number of samples in the audio clip and N is the number of samples per frame. 
Both the mean square and the mean magnitude carry information about the short-time energy of the 
signal. If the signal is normalized to the range [-1, 1], then the mean square value and the mean 
magnitude value both lie in the range [0, 1]. Usually one of these two parameters is selected and 
compared with a suitable threshold value. With the mean square value it is sometimes easier to 
select a threshold, because squaring widens the gap between strong and weak frames. 
In textbooks, these equations are usually written using a sliding window. In this introductory 
tutorial I avoid that notation, as the present notation is easier to implement in a computer program. 
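As an illustration of how directly the frame-based notation maps to a program, here is a minimal sketch. The function names `mean_square` and `mean_magnitude` are hypothetical; `x`, `k` and `N` have the same meaning as in the equations, and the tiny signal with N = 4 is made up for the example.

```python
def mean_square(x, k, N):
    """Mean square value of the k-th frame: (1/N) * sum of x(n)^2."""
    frame = x[N * k:N * k + N]
    return sum(v * v for v in frame) / N


def mean_magnitude(x, k, N):
    """Mean magnitude of the k-th frame: (1/N) * sum of |x(n)|."""
    frame = x[N * k:N * k + N]
    return sum(abs(v) for v in frame) / N


# Toy signal with L = 8 samples and N = 4 samples per frame, so k = 0, 1.
x = [0.5, -0.5, 0.5, -0.5, 1.0, -1.0, 1.0, -1.0]
N = 4
num_frames = len(x) // N
print([mean_square(x, k, N) for k in range(num_frames)])     # [0.25, 1.0]
print([mean_magnitude(x, k, N) for k in range(num_frames)])  # [0.5, 1.0]
```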
3. Voice Activity Detection
We do not speak continuously. During speech there are many pauses and breaks. To perform any 
speech processing, it is necessary to distinguish between the presence and absence of speech in an 
audio clip. The presence or absence of speech in a short-duration frame can be easily determined if 
there is no background noise. If speech is not present, the mean magnitude or mean square value is 
very small. On the other hand, a high value of mean magnitude or mean square indicates the presence 
of speech. If there is background noise, determining voice activity is a challenging task. Many 
papers have been published on the detection of voice activity in the presence of background noise. 
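For the noise-free case, the idea can be sketched as a simple threshold on the mean square value of each frame. The function name `detect_voice_activity` and the threshold of 0.01 are assumptions made for this example; in practice the threshold has to be chosen for the recording conditions, and this sketch is not expected to work in background noise.

```python
import math


def detect_voice_activity(samples, frame_len=160, threshold=0.01):
    """Return one boolean per frame: True when the mean square exceeds the threshold."""
    flags = []
    for k in range(len(samples) // frame_len):
        frame = samples[k * frame_len:(k + 1) * frame_len]
        p_avg = sum(v * v for v in frame) / frame_len
        flags.append(p_avg > threshold)
    return flags


# Synthetic clip: silence, then a 200 Hz tone standing in for speech, then silence.
silence = [0.0] * 160
tone = [0.5 * math.sin(2 * math.pi * 200 * n / 8000) for n in range(160)]
print(detect_voice_activity(silence + tone + silence))  # [False, True, False]
```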
4. Detection of Voiced and Unvoiced Speech 
Voiced and unvoiced speech have already been introduced. It is relevant to mention here that most of 
the features of a speech signal are extracted from voiced speech. Hence the identification of voiced 
and unvoiced speech is another important task after voice activity detection. 
It is to be noted that, for voiced speech, the mean square value is large, whereas the zero-crossing rate 
is small. On the other hand, for unvoiced speech the zero-crossing rate is large and the mean 
magnitude is very small. 
Detection of voiced and unvoiced speech is also a challenging task in the presence of background 
noise. Usually voiced speech is fairly easy to distinguish if the background noise is stationary; 
unvoiced speech, however, is difficult. A number of papers have been published in this field as well. 
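The rule stated above (large mean square and small zero-crossing rate for voiced speech) can be sketched as follows for the noise-free case. The function name `classify_frame` and both thresholds are assumptions chosen for this example, and the frame is assumed to have already passed voice activity detection.

```python
import math
import random


def classify_frame(frame, sample_rate_hz=8000,
                   energy_threshold=0.01, zcr_threshold=3000.0):
    """Label a speech frame as 'voiced' or 'unvoiced' (thresholds are illustrative)."""
    n = len(frame)
    p_avg = sum(v * v for v in frame) / n
    sign_changes = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    zcr = sign_changes * sample_rate_hz / n  # sign changes per second
    if p_avg > energy_threshold and zcr < zcr_threshold:
        return "voiced"
    return "unvoiced"


# A strong low-frequency tone stands in for a voiced frame,
# and weak white Gaussian noise stands in for an unvoiced frame.
voiced_like = [0.5 * math.sin(2 * math.pi * 150 * n / 8000) for n in range(160)]
rng = random.Random(0)
unvoiced_like = [rng.gauss(0.0, 0.05) for _ in range(160)]
print(classify_frame(voiced_like), classify_frame(unvoiced_like))
```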
5. Pitch and Pitch Period Estimation 
Pitch is the perceived fundamental frequency of a musical note or of voiced speech. It may not be the 
same as the actual fundamental frequency of the speech signal. However, in much of the literature, the 
terms pitch and fundamental frequency are used interchangeably. Pitch period is the fundamental 
period of voiced speech. Pitch estimation is a challenging problem. 
Pitch is one of the most important parameters that are required for high level speech processing like 
speech recognition, speaker recognition etc. Everyone has a pitch range to which he or she is 
constrained by simple physics of his or her larynx. For men, the possible pitch range is usually found 
somewhere between the two bounds 50-250 Hz, while for women the range usually falls somewhere 
in the interval 120-500 Hz. Everyone has a "habitual pitch level," which is a sort of "preferred" pitch 
that will be used naturally on the average. Pitch is shifted up and down in speaking in response to 
factors relating to stress, intonation, and emotion. Stress* refers to a change in fundamental frequency 
and loudness to signify a change in emphasis of a syllable, word, or phrase. Intonation* is associated 
with the pitch contour over time and performs several functions in a language, the most important 
being to signal grammatical structure. The markings of sentence, clause, and other boundaries are 
accomplished through intonation patterns. 
Many papers on pitch estimation techniques have been published in various journals and 
conferences worldwide. A classical approach using cepstral analysis is discussed in 
the tutorial on “Short Term Spectral and Cepstral Analysis of Speech Signal”. 
*The terms stress and intonation are explained below under feature no. 7.
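To give a feel for what a pitch estimator does, here is a minimal sketch based on the autocorrelation of a voiced frame. Note that this is a different technique from the cepstral approach mentioned above; it is used here only because it is short. The function name `estimate_pitch_autocorr` is hypothetical, and the default search range of 50 to 500 Hz is taken from the pitch ranges quoted above.

```python
import math


def estimate_pitch_autocorr(frame, sample_rate_hz=8000, f_min=50.0, f_max=500.0):
    """Estimate the fundamental frequency (Hz) from the autocorrelation peak."""
    lag_min = int(sample_rate_hz / f_max)                        # shortest pitch period
    lag_max = min(int(sample_rate_hz / f_min), len(frame) - 1)   # longest pitch period
    best_lag, best_value = lag_min, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        # Autocorrelation of the frame with itself shifted by `lag` samples.
        r = sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))
        if r > best_value:
            best_lag, best_value = lag, r
    return sample_rate_hz / best_lag


# A 200 Hz tone stands in for a voiced frame (pitch period of 40 samples at 8 kHz).
frame = [math.sin(2 * math.pi * 200 * n / 8000) for n in range(160)]
print(estimate_pitch_autocorr(frame))  # expected to be close to 200 Hz
```

Real voiced speech is far less regular than this test tone, so practical estimators add steps such as pre-filtering and smoothing of the pitch track over successive frames.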
6. Phonemics and Phonetics of Speech Signal 
Phonemes are the basic theoretical units of speech. Each phoneme can be considered to be a code that 
consists of a unique set of articulatory gestures. In English there are about 42 phonemes. Due to many 
different factors including, for example, accents, gender, and, most importantly, coarticulatory effects, 
a given "phoneme" will have a variety of acoustic manifestations in the course of flowing speech. 
Therefore, any acoustic utterance that is clearly "supposed to be" that ideal phoneme, would be 
labeled as that phoneme. The phonemes of a language, therefore, comprise a minimal theoretical set 
of units that are sufficient to convey all meaning in the language. 
One common approach to speech recognition is to segment the speech and identify the phonemes from 
the phones (the sounds actually produced in speaking). 
The study of the abstract units (phonemes) and their relationships in a language is called phonemics, 
while the study of the actual sounds of the language is called phonetics. More specifically, there are 
three branches of phonetics each of which approaches the subject somewhat differently: 
(a) Articulatory phonetics is concerned with the manner in which speech sounds are produced by 
the articulators of the vocal system. 
(b) Acoustic phonetics studies the sounds of speech through analysis of the acoustic waveform. 
(c) Auditory phonetics studies the perceptual response to speech sounds as reflected in listener 
trials. 
In speech recognition or speech-to-text conversion systems, a corpus has to be prepared that 
contains the whole set of phonemes of a particular language and the corresponding letters or 
meanings. In languages like English, the same phoneme may correspond to a number of letters, as 
there is no one-to-one correspondence between sounds and letters. In languages like Hindi or 
Assamese, it is somewhat simpler. But all languages have their own challenges. 
When a test utterance is input to the system, the system divides it into segments and identifies the 
corresponding phoneme in the corpus using a suitable algorithm such as Dynamic Time Warping 
(DTW). The preceding and succeeding phonemes help to resolve ambiguity if a phoneme 
corresponds to more than one letter. These systems are really complex and are beyond the scope of 
this basic tutorial. 
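Although a full recognizer is beyond the scope here, the core of Dynamic Time Warping is short enough to sketch. The version below compares two one-dimensional sequences using the absolute difference as the local cost. The function name `dtw_distance` and the toy sequences are assumptions made for this example; a real system would compare sequences of feature vectors rather than single numbers.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # a advances, b repeats
                                 cost[i][j - 1],      # b advances, a repeats
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]


template = [0, 1, 2, 3, 2, 1, 0]
slow = [0, 0, 1, 1, 2, 2, 3, 3, 2, 2, 1, 1, 0, 0]  # same shape, spoken twice as slowly
other = [3, 2, 1, 0, 1, 2, 3]                      # a different shape
print(dtw_distance(template, slow))   # 0.0
print(dtw_distance(template, other))  # larger than zero
```

The slowed-down copy matches the template with zero cost because DTW is allowed to stretch the time axis, which is why it suits speech spoken at different rates.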
7. Prosodic Features: Stress and Intonation of Speech 
The tonal and rhythmic aspects of speech are generally called prosodic features. These features have 
significant contributions to the formal linguistic structure of a language. These features extend over 
more than one phoneme; therefore such features are also known as suprasegmental. 
Prosodic features are created by certain special manipulations of the speech production system during 
the normal sequence of phoneme production. These manipulations are categorized as either source 
factors or vocal-tract shaping factors. The source factors are based on subtle changes in the speech 
breathing muscles and vocal folds, while the vocal-tract shaping factors operate via movements of the 
upper articulators. The acoustic patterns of prosodic features are heard in systematic changes in 
duration, intensity, fundamental frequency, and spectral patterns of the individual phonemes. 
Stress and intonation are the most important prosodic features of a speech signal. Stress refers to a change 
in fundamental frequency and loudness to signify a change in emphasis of a syllable, word, or phrase. 
Intonation is associated with the pitch contour over time and performs several functions in a language, 
the most important being to signal grammatical structure. The marking of sentence, clause, and other 
boundaries is accomplished through intonation patterns.
Stress is used to distinguish similar phonetic sequences or to highlight a syllable or word against a 
background of unstressed syllables. For example, consider the two phrases "That is insight" and "That 
is in sight." In the first phrase there is stress on "in" but "sight" is unstressed, while the converse is 
true in the second phrase. 
Features like stress or intonation can be extracted using approaches based on pattern 
recognition. 
*Prosodic feature extraction and speech recognition need a very in-depth study of the subject. 
Here I have given only a hint of such features. 
Suggested book: “Discrete-Time Processing of Speech Signals” by John R. Deller, John H. L. 
Hansen and John G. Proakis. 
References: 
[1] Lawrence R. Rabiner and Ronald W. Schafer, “Introduction to Digital Speech Processing”, now 
Publishers Inc. 
[2] John R. Deller, John H. L. Hansen and John G. Proakis, “Discrete-Time Processing of Speech 
Signals”, The Institute of Electrical and Electronics Engineers (IEEE), Inc., New York. 
*Download links for both of these books are available at the website. 
Basic Electronics for diploma students as per technical education Kerala Syll...
 
Online food ordering system project report.pdf
Online food ordering system project report.pdfOnline food ordering system project report.pdf
Online food ordering system project report.pdf
 
457503602-5-Gas-Well-Testing-and-Analysis-pptx.pptx
457503602-5-Gas-Well-Testing-and-Analysis-pptx.pptx457503602-5-Gas-Well-Testing-and-Analysis-pptx.pptx
457503602-5-Gas-Well-Testing-and-Analysis-pptx.pptx
 
Online electricity billing project report..pdf
Online electricity billing project report..pdfOnline electricity billing project report..pdf
Online electricity billing project report..pdf
 
Introduction to Serverless with AWS Lambda
Introduction to Serverless with AWS LambdaIntroduction to Serverless with AWS Lambda
Introduction to Serverless with AWS Lambda
 
A Study of Urban Area Plan for Pabna Municipality
A Study of Urban Area Plan for Pabna MunicipalityA Study of Urban Area Plan for Pabna Municipality
A Study of Urban Area Plan for Pabna Municipality
 
FEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced Loads
FEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced LoadsFEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced Loads
FEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced Loads
 
💚Trustworthy Call Girls Pune Call Girls Service Just Call 🍑👄6378878445 🍑👄 Top...
💚Trustworthy Call Girls Pune Call Girls Service Just Call 🍑👄6378878445 🍑👄 Top...💚Trustworthy Call Girls Pune Call Girls Service Just Call 🍑👄6378878445 🍑👄 Top...
💚Trustworthy Call Girls Pune Call Girls Service Just Call 🍑👄6378878445 🍑👄 Top...
 
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
 
UNIT 4 PTRP final Convergence in probability.pptx
UNIT 4 PTRP final Convergence in probability.pptxUNIT 4 PTRP final Convergence in probability.pptx
UNIT 4 PTRP final Convergence in probability.pptx
 
Computer Graphics Introduction To Curves
Computer Graphics Introduction To CurvesComputer Graphics Introduction To Curves
Computer Graphics Introduction To Curves
 
Hospital management system project report.pdf
Hospital management system project report.pdfHospital management system project report.pdf
Hospital management system project report.pdf
 
Standard vs Custom Battery Packs - Decoding the Power Play
Standard vs Custom Battery Packs - Decoding the Power PlayStandard vs Custom Battery Packs - Decoding the Power Play
Standard vs Custom Battery Packs - Decoding the Power Play
 
PE 459 LECTURE 2- natural gas basic concepts and properties
PE 459 LECTURE 2- natural gas basic concepts and propertiesPE 459 LECTURE 2- natural gas basic concepts and properties
PE 459 LECTURE 2- natural gas basic concepts and properties
 
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
 
AIRCANVAS[1].pdf mini project for btech students
AIRCANVAS[1].pdf mini project for btech studentsAIRCANVAS[1].pdf mini project for btech students
AIRCANVAS[1].pdf mini project for btech students
 
Double Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torqueDouble Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torque
 

An Introduction to Various Features of Speech Signal

This Tutorial is Downloaded from: https://sites.google.com/site/enggprojectece

Compiled by: Sivaranjan Goswami, Pursuing M. Tech. (2013-15 batch), Dept. of ECE, Gauhati University, Guwahati, India
Contact: sivgos@gmail.com

Speech is the most fundamental mode of communication among human beings as well as many other creatures, and we use it constantly in day-to-day life. Over the last few decades, a large body of research has explored the use of speech to control electronic systems. Speech has several advantages over manual control through panels and switches: because it can easily be transmitted over a telephone channel, remote control of devices becomes easier.

The audible range of the human ear is roughly 20 Hz to 20 kHz. The band of speech transmitted in telephony, however, extends only from about 300 Hz to 3400 Hz. According to the Nyquist theorem, the sampling rate must be at least twice the highest frequency, i.e. at least 6800 Hz. In telecommunication, a sampling rate of 8 kHz is used, so the analog-to-digital converters of mobile phones sample the voice signal at 8 kHz. For multimedia applications the sampling rate is usually much higher; in the MP3 songs that we download it is usually 44100 Hz. This is one reason why sound recorded at the telephone sampling rate is of much poorer quality than a downloaded MP3 song.

A. What is Speech Signal

Speech, like any sound, is an acoustic signal that travels through air or another medium by compression and expansion of its particles. It is therefore a pressure wave. A microphone is a transducer that converts this pressure wave into a voltage signal.

A detailed description of the human speech generation system is beyond the scope of this discussion. However, a brief description, which is unavoidable in the context of feature extraction, is presented here.
The human speech production system is a complex mechanical system. The air exhaled by the lungs is modulated by various hard and soft tissues, first by the glottal folds and then by the articulators of the vocal tract such as the tongue, lips, jaw, and velum. In digital speech processing, this process is represented by a discrete-time model, as shown in Figure 1. The lungs and the glottal folds together form the block "Excitation Generator". The vocal tract is modeled as a linear system, usually a digital filter (an all-pole filter in linear predictive analysis). The vocal tract parameters are the parameters of this digital filter.

Fig. 1. Block diagram of speech generation model
Based on the type of the excitation signal, a speech signal can be classified into two major types:

1. Voiced Speech: Voiced sounds are produced by forcing air through the glottis, the opening between the vocal folds. The excitation is a quasi-stationary impulse train, that is, a signal whose frequency remains constant for a short time, sometimes referred to as the stationarity period. Examples of voiced speech are the vowel sounds in "cat", "hear", "too", etc.

2. Unvoiced Speech: Unvoiced sounds are generated by forming a constriction at some point along the vocal tract and forcing air through the constriction to produce turbulence. The excitation is a random signal that can be modeled as white Gaussian noise. Examples are consonant sounds as in "ship", "key", etc.

Loosely speaking, the voiced component of a word is responsible for its tone and for the overall shape of its waveform, whereas the unvoiced section carries much of the actual meaning. The waveforms of the words "CUP" and "DUCK" illustrate this: their voiced parts are similar, so the waveforms are also similar.

As shown in Figure 1, the speech production mechanism is modeled as a cascade of an excitation generator and a digital filter. The excitation determines the type of speech, and the digital filter simulates the effect of the organs and tissues of the vocal tract on the excitation. The parameters of the filter are known as the vocal tract parameters. The excitation is an impulse train for voiced speech and random noise for unvoiced speech. Thus Figure 1 can be redrawn as Figure 2.
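The excitation-plus-filter model described above can be sketched in a few lines of Python with NumPy. This is only an illustration under stated assumptions: the function names, the 125 Hz pitch, and the single resonance near 700 Hz used as "vocal tract parameters" are hypothetical choices made for the example, and the filter is written as a simple two-pole recursion rather than taken from the tutorial.

```python
import numpy as np

FS = 8000  # sampling rate in Hz, as used for telephone speech

def impulse_train(n_samples, f0, fs=FS):
    """Voiced excitation: unit impulses spaced one pitch period apart."""
    period = int(round(fs / f0))
    e = np.zeros(n_samples)
    e[::period] = 1.0
    return e

def white_noise(n_samples, seed=0):
    """Unvoiced excitation: zero-mean white Gaussian noise."""
    rng = np.random.default_rng(seed)
    return rng.standard_normal(n_samples)

def all_pole_filter(excitation, a, gain=1.0):
    """Recursive filter: y[n] = gain * e[n] + sum_k a[k] * y[n-1-k]."""
    y = np.zeros(len(excitation))
    for n in range(len(excitation)):
        acc = gain * excitation[n]
        for k, ak in enumerate(a):
            if n - 1 - k >= 0:
                acc += ak * y[n - 1 - k]
        y[n] = acc
    return y

# Hypothetical vocal tract parameters: one resonance near 700 Hz.
# Pole radius r < 1 keeps the recursion stable.
r, fc = 0.95, 700.0
a = [2 * r * np.cos(2 * np.pi * fc / FS), -r * r]

# One 20 ms frame (160 samples) of each type of speech.
voiced = all_pole_filter(impulse_train(160, f0=125.0), a)
unvoiced = all_pole_filter(white_noise(160), a, gain=0.1)
```

The same filter is driven by two different excitations, which is the point of the model: only the excitation generator changes between voiced and unvoiced speech.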
Figure 2: Block diagram of speech generation model for Linear Predictive Analysis

B. Short-Time Analysis of Speech Signal

From the above discussion, we have seen that the properties of a speech signal remain the same only for a short duration. Therefore, any kind of speech processing first requires segmentation of the speech signal into frames of short duration. The duration for which the properties of a speech signal remain stationary varies from speaker to speaker, usually from 15 to 25 milliseconds. It is common practice to use a frame duration of 20 milliseconds. If the speech signal is sampled at 8 kHz, each frame then contains 8000 × 0.020 = 160 samples.

Sometimes, to overcome difficulties particular to a given problem, the speech frames are overlapped or multiplied by a window function. Such cases are not covered in this tutorial; they are discussed in the tutorial "Short Term Spectral and Cepstral Analysis of Speech Signal".

C. Features of Speech Signal

So far we have had a brief introduction to the generation and types of speech signal. We now turn to feature extraction.

1. Zero-Crossing Rate: The zero-crossing rate is a measure of the frequency of the signal over a short interval. Counting the number of times the sign of the signal changes in that interval and dividing it by two gives the approximate number of cycles it contains.
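The 20 ms framing of Section B and the sign-change count just described can be sketched as follows. This is a minimal illustration, not the author's implementation: the helper names are hypothetical, the frames are non-overlapping and unwindowed as in this tutorial, and a 400 Hz test tone stands in for real speech.

```python
import numpy as np

def split_into_frames(x, fs=8000, frame_ms=20):
    """Split x into non-overlapping frames.

    Trailing samples that do not fill a whole frame are dropped.
    """
    n = int(fs * frame_ms / 1000)          # 160 samples at 8 kHz and 20 ms
    n_frames = len(x) // n
    return np.reshape(x[:n_frames * n], (n_frames, n))

def count_sign_changes(frame):
    """Count how often consecutive samples differ in sign (zero counts as positive)."""
    positive = frame >= 0
    return int(np.count_nonzero(positive[1:] != positive[:-1]))

fs = 8000
t = np.arange(fs) / fs                     # one second of samples
# 400 Hz test tone; the phase offset keeps samples away from exact zeros.
x = np.sin(2 * np.pi * 400 * t + 0.3)

frames = split_into_frames(x, fs)
changes = count_sign_changes(frames[0])
cycles = changes / 2                       # approximate number of cycles in the frame
```

A 400 Hz tone completes 8 cycles in 20 ms, so the count for one frame should be close to 16 sign changes; it can fall one short when a crossing lies on the frame boundary.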
Figure 3: Zero crossing

It can be seen that during one period, the signal crosses zero twice. Thus for any frame, the zero-crossing rate (ZCR) is given by:

$$\mathrm{ZCR} = \frac{\text{No. of sign changes in the frame}}{\text{Frame duration}} \quad (\mathrm{sec}^{-1})$$

2. Mean Square or Mean Magnitude value: This is a mean value of the signal for a particular frame, ignoring the sign. The mean square value of the k-th frame is given by:

$$P_{avg}(k) = \frac{1}{N} \sum_{n=Nk}^{Nk+N-1} x^{2}(n); \quad k = 0, 1, 2, \ldots, \frac{L}{N} - 1$$

Similarly, the mean magnitude is given by:

$$A_{avg}(k) = \frac{1}{N} \sum_{n=Nk}^{Nk+N-1} |x(n)|; \quad k = 0, 1, 2, \ldots, \frac{L}{N} - 1$$

where L is the total number of samples in a given audio clip and N is the number of samples per frame.

Both the mean square and the mean magnitude carry information about the short-time energy of the signal. If the signal is normalized to the range [-1, 1], both values lie in the same range [0, 1]. Usually one of the two parameters is chosen according to which makes it easier to determine a suitable threshold value; for some operations a threshold is easier to select on the mean square value.

In textbooks, these equations are usually written using a sliding window. In this introductory tutorial I avoid that notation, as the present notation is easier to implement in a computer program.

3. Voice Activity Detection
We do not speak continuously; during speech there are many pauses and breaks. To perform any speech processing, it is necessary to distinguish between the presence and absence of speech in an audio clip. If there is no background noise, the presence or absence of speech in a short-duration frame is easy to determine: if speech is not present, the mean magnitude or mean square value is very small, whereas a high value of either indicates the presence of speech. If there is background noise, determining voice activity is a challenging task, and an extensive literature has been published on voice activity detection in the presence of background noise.

4. Detection of Voiced and Unvoiced Speech

Voiced and unvoiced speech have already been introduced. It is relevant to mention here that most features of a speech signal are extracted from voiced speech. Hence, identification of voiced and unvoiced speech is another important task after voice activity detection. For voiced speech, the mean square value is large, whereas the zero-crossing rate is small. For unvoiced speech, on the other hand, the zero-crossing rate is large and the mean magnitude is very small.

Detection of voiced and unvoiced speech is also a challenging task in the presence of background noise. Voiced speech is usually somewhat easy to distinguish if the background noise is stationary; unvoiced speech, however, is difficult. A considerable literature has been published in this field as well.

5. Pitch and Pitch Period Estimation

Pitch is the perceived fundamental frequency of a musical note or of voiced speech. It may not be the same as the actual fundamental frequency of the speech signal. However, in much of the literature, the terms pitch and fundamental frequency are used interchangeably. The pitch period is the fundamental period of voiced speech.
Pitch estimation is a great challenge. Pitch is one of the most important parameters required for high-level speech processing such as speech recognition and speaker recognition. Everyone has a pitch range to which he or she is constrained by the simple physics of the larynx. For men, the possible pitch range usually lies within 50-250 Hz, while for women it usually lies within 120-500 Hz. Everyone has a "habitual pitch level", which is a sort of "preferred" pitch that is used naturally on average. Pitch is shifted up and down in speaking in response to factors relating to stress, intonation, and emotion. Stress* refers to a change in fundamental frequency and loudness to signify a change in emphasis of a syllable, word, or phrase. Intonation* is associated with the pitch contour over time and performs several functions in a language, the most important being to signal grammatical structure. The marking of sentence, clause, and other boundaries is accomplished through intonation patterns.

Many pitch estimation techniques have been published in journals and conferences worldwide. A classical approach using cepstral analysis is discussed in the tutorial "Short Term Spectral and Cepstral Analysis of Speech Signal".

*The terms stress and intonation are explained below under feature no. 7.
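Features 2 to 5 can be tied together in a small sketch for the noise-free case: the mean square value separates speech from silence, the zero-crossing rate separates voiced from unvoiced frames, and the pitch of a voiced frame is then estimated. Several things here are assumptions rather than part of the tutorial: the two threshold values are purely illustrative, the pitch estimator uses the simple autocorrelation method instead of the cepstral method the tutorial refers to, and a 40 ms frame is used so that lags down to 50 Hz fit inside the frame. The test signals are synthetic.

```python
import numpy as np

FS = 8000  # sampling rate in Hz

def mean_square(frame):
    """Mean square value of one frame (short-time energy measure)."""
    return float(np.mean(frame ** 2))

def zero_crossing_rate(frame, fs=FS):
    """Sign changes per second within the frame (zero counts as positive)."""
    positive = frame >= 0
    changes = np.count_nonzero(positive[1:] != positive[:-1])
    return changes / (len(frame) / fs)

def classify_frame(frame, fs=FS, energy_thresh=1e-4, zcr_thresh=2000.0):
    """Label a frame; both thresholds are illustrative assumptions."""
    if mean_square(frame) < energy_thresh:
        return "silence"
    if zero_crossing_rate(frame, fs) > zcr_thresh:
        return "unvoiced"
    return "voiced"

def pitch_autocorr(frame, fs=FS, fmin=50.0, fmax=500.0):
    """Estimate pitch as fs divided by the lag of the largest autocorrelation value.

    The search is restricted to lags corresponding to fmin..fmax, the overall
    range of human pitch quoted in the text.
    """
    n = len(frame)
    r = np.correlate(frame, frame, mode="full")[n - 1:]  # r[k] for k = 0..n-1
    lag_min = int(fs / fmax)
    lag_max = min(int(fs / fmin), n - 1)
    lag = lag_min + int(np.argmax(r[lag_min:lag_max + 1]))
    return fs / lag

# Synthetic 40 ms frames (320 samples) standing in for real speech.
n = np.arange(320)
voiced = np.sin(2 * np.pi * 125 * n / FS) + 0.5 * np.sin(2 * np.pi * 250 * n / FS)
noise = 0.05 * np.random.default_rng(0).standard_normal(320)
silence = np.zeros(320)

labels = [classify_frame(f) for f in (silence, noise, voiced)]
f0 = pitch_autocorr(voiced)  # the synthetic voiced frame has a 125 Hz fundamental
```

With real recordings the thresholds would have to be tuned, and background noise makes both decisions much harder, as noted above.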
6. Phonemics and Phonetics of Speech Signal

Phonemes are the basic theoretical units of speech. Each phoneme can be considered a code that consists of a unique set of articulatory gestures. In English there are about 42 phonemes. Due to many different factors including, for example, accent, gender, and, most importantly, coarticulatory effects, a given "phoneme" has a variety of acoustic manifestations in the course of flowing speech. Therefore, any acoustic utterance that is clearly "supposed to be" that ideal phoneme is labeled as that phoneme. The phonemes of a language thus comprise a minimal theoretical set of units that are sufficient to convey all meaning in the language. One common approach to speech recognition is to segment and distinguish the phonemes from phones (the sounds produced in speaking).

The study of the abstract units (phonemes) and their relationships in a language is called phonemics, while the study of the actual sounds of the language is called phonetics. More specifically, there are three branches of phonetics, each of which approaches the subject somewhat differently:

(a) Articulatory phonetics is concerned with the manner in which speech sounds are produced by the articulators of the vocal system.
(b) Acoustic phonetics studies the sounds of speech through analysis of the acoustic waveform.
(c) Auditory phonetics studies the perceptual response to speech sounds as reflected in listener trials.

In speech recognition or speech-to-text conversion systems, a corpus is built that contains the whole set of phonemes of a particular language and the corresponding letters or meanings. In languages like English, the same phoneme may correspond to a number of letters, as there is no one-to-one correspondence between sounds and letters. In languages like Hindi or Assamese, it is somewhat simpler.
But all languages have their own challenges. When a test utterance is input to the system, the system divides the speech into segments and identifies the corresponding phoneme in the corpus using a suitable algorithm such as Dynamic Time Warping (DTW). The preceding and succeeding phonemes help to resolve ambiguity if a phoneme corresponds to more than one letter. These systems are complex and beyond the scope of this basic tutorial.

7. Prosodic Features: Stress and Intonation of Speech

The tonal and rhythmic aspects of speech are generally called prosodic features. These features contribute significantly to the formal linguistic structure of a language. They extend over more than one phoneme and are therefore also known as suprasegmental features. Prosodic features are created by certain special manipulations of the speech production system during the normal sequence of phoneme production. These manipulations are categorized as either source factors or vocal-tract shaping factors. The source factors are based on subtle changes in the speech breathing muscles and vocal folds, while the vocal-tract shaping factors operate via movements of the upper articulators. The acoustic patterns of prosodic features are heard in systematic changes in duration, intensity, fundamental frequency, and spectral patterns of the individual phonemes.

Stress and intonation are the most important prosodic features of a speech signal. Stress refers to a change in fundamental frequency and loudness to signify a change in emphasis of a syllable, word, or phrase. Intonation is associated with the pitch contour over time and performs several functions in a language, the most important being to signal grammatical structure. The marking of sentence, clause, and other boundaries is accomplished through intonation patterns.
Stress is used to distinguish similar phonetic sequences or to highlight a syllable or word against a background of unstressed syllables. For example, consider the two phrases "That is insight" and "That is in sight." In the first phrase there is stress on "in" but "sight" is unstressed, while the converse is true in the second phrase.

Features like stress and intonation can be extracted using pattern-recognition-based approaches.

*Prosodic feature extraction and speech recognition need very in-depth study of the subject. Here I have given only a hint of such features. Suggested book: "Discrete-Time Processing of Speech Signals" by John R. Deller, John H. L. Hansen and John G. Proakis.

References:
[1] Lawrence R. Rabiner and Ronald W. Schafer, "Introduction to Digital Speech Processing", now Publishers Inc.
[2] John R. Deller, John H. L. Hansen and John G. Proakis, "Discrete-Time Processing of Speech Signals", The Institute of Electrical and Electronics Engineers (IEEE), Inc., New York.

*Download links for both of these books are available at the website.