EC6007 SPEECH PROCESSING
A. Azhagujaisudhan, RIT ECE
Course Objectives:
1. To enable the students to learn the fundamentals and
classification of speech sounds.
2. To make the students analyze and compare different
speech parameters using various methods.
3. To equip the students with various speech modelling
techniques.
4. To enable the students to acquire knowledge on various
speech recognition systems.
5. To gain knowledge about the various methods used for the
process of speech synthesis.
Course Outcomes:
After completion of the course, it is expected that:
The students will be able to
1. Explain the fundamentals and classification of speech sounds.
2. Analyse, extract and compare the various speech parameters.
3. Apply an appropriate speech model for a given application.
4. Explain the various speech recognition systems.
5. Apply different speech synthesis techniques depending upon
the classification of speech parameters.
UNIT I BASIC CONCEPTS
Speech Fundamentals: Articulatory Phonetics – Production and Classification of Speech
Sounds; Acoustic Phonetics – Acoustics of speech production; Review of Digital Signal
Processing concepts; Short-Time Fourier Transform, Filter-Bank and LPC Methods.
UNIT II SPEECH ANALYSIS
Features, Feature Extraction and Pattern Comparison Techniques: Speech distortion
measures– mathematical and perceptual – Log–Spectral Distance, Cepstral Distances,
Weighted Cepstral Distances and Filtering, Likelihood Distortions, Spectral Distortion using
a Warped Frequency Scale, LPC, PLP and MFCC Coefficients, Time Alignment and
Normalization – Dynamic Time Warping, Multiple Time – Alignment Paths.
UNIT III SPEECH MODELING
Hidden Markov Models: Markov Processes, HMMs – Evaluation, Optimal State Sequence –
Viterbi Search, Baum-Welch Parameter Re-estimation, Implementation issues.
UNIT IV SPEECH RECOGNITION
Large Vocabulary Continuous Speech Recognition: Architecture of a large vocabulary
continuous speech recognition system – acoustics and language models – n-grams, context
dependent sub-word units; Applications and present status.
UNIT V SPEECH SYNTHESIS
Text-to-Speech Synthesis: Concatenative and waveform synthesis methods, sub-word units
for TTS, intelligibility and naturalness – role of prosody, Applications and present status.
INTRODUCTION
 Speech processing is the study of speech signals and the
processing methods of these signals.
 Speech Processing is the application of DSP techniques to the
processing and/or analysis of speech signals.
 Speech is the most natural form of human-to-human
communication.
 Speech is related to language; linguistics is a
branch of social science.
 Speech is related to human physiological
capability; physiology is a branch of medical
science.
 Speech is also related to sound and acoustics, a
branch of physical science.
 Therefore, speech is one of the most intriguing
signals that humans work with every day.
Speech Processing
(Figure: speech processing draws on several disciplines.)
 Signal processing: Fourier transforms, discrete-time filters, AR(MA) models, statistical signal processing, stochastic models
 Information theory: entropy, communication theory, rate-distortion theory
 Phonetics: speech production
 Acoustics: psychoacoustics, room acoustics
 Algorithms (programming)
 Analysis of speech signals:
 Fourier analysis; spectrogram
 Autocorrelation; pitch estimation
 Linear prediction; compression, recognition
 Cepstral analysis; pitch estimation, enhancement
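One of these analyses can be sketched in a few lines of Python. This is an illustrative autocorrelation pitch estimator on a synthetic harmonic signal, not a production method; the sampling rate, search range, and test signal are assumptions made for the example:

```python
import math

def autocorr(x, lag):
    # Autocorrelation of x at one lag
    return sum(x[n] * x[n - lag] for n in range(lag, len(x)))

def estimate_pitch(x, fs, fmin=50.0, fmax=500.0):
    # Find the lag (within a plausible pitch range) that maximizes
    # the autocorrelation; the pitch estimate is fs / best_lag.
    lo, hi = int(fs / fmax), int(fs / fmin)
    best_lag = max(range(lo, hi + 1), key=lambda l: autocorr(x, l))
    return fs / best_lag

fs = 8000
f0 = 100.0  # true fundamental frequency of the test signal
# A crude "voiced" signal: fundamental plus two harmonics
x = [math.sin(2 * math.pi * f0 * n / fs)
     + 0.5 * math.sin(2 * math.pi * 2 * f0 * n / fs)
     + 0.25 * math.sin(2 * math.pi * 3 * f0 * n / fs)
     for n in range(800)]

print(estimate_pitch(x, fs))  # close to 100 Hz
```

Real voiced speech needs windowing and peak-picking safeguards (e.g., against octave errors), which are omitted here.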
 Speech coding: Compression of speech signals for
telecommunication
 Speech recognition: Extracting the linguistic content of
the speech signal
 Speaker recognition: Recognizing the identity of
speakers by their voice
 Speech synthesis: Computer generated speech
(e.g., from text)
 Speech enhancement: Improving intelligibility or
perceptual quality of speech signal
APPLICATIONS
 Translation of spoken language into text by
computers
 Voice user interfaces such as voice dialing
(Call home)
 Speech to text processing (Word processors or
emails)
 Recognizing the speaker
APPLICATIONS OF SPEECH
PROCESSING
 Human-computer interfaces (e.g., speech I/O, affective)
 Telecommunication (e.g., speech enhancement, translation)
 Assistive technologies (e.g., blindness/deafness, language learning)
 Audio mining (e.g., diarization, tagging)
 Security (e.g., biometrics, forensics)
SPEECH PRODUCTION
(Figure: the human speech production apparatus, from the lungs through the vocal tract.)
Speech Generation
•The production process (generation) begins when the talker
formulates a message in his mind which he wants to transmit to
the listener via speech.
•In the case of a machine:
•First step: message formation in terms of printed text.
•Next step: conversion of the message into a language code.
•After the language code is chosen, the talker must execute a
series of neuromuscular commands to cause the vocal cords to
vibrate such that the proper sequence of speech sounds is created.
•The neuromuscular commands must simultaneously control the
movement of lips, jaw, tongue, and velum.
SPEECH PERCEPTION
 Once the speech signal is generated and propagated to the
listener, the speech perception (recognition) process
begins.
 First, the listener processes the acoustic signal along
the basilar membrane in the inner ear, which
provides a running spectral analysis of the incoming
signal.
 A neural transduction process converts the spectral
signal into activity signals on the auditory nerve.
 Finally the message comprehension (understanding of
meaning) is achieved.
SOUND PERCEPTION
 The audible frequency range for humans is
approximately 20 Hz to 20 kHz
 The three distinct parts of the ear are the outer ear,
middle ear and inner ear.
 Outer ear:
 The perceived sound is sensitive to the pinna's shape
 Changing the pinna's shape alters the perceived sound
quality as well as the background noise
 After passing through the ear canal, the sound wave strikes
the eardrum, which is part of the middle ear.
MIDDLE EAR
EAR DRUM
 The eardrum oscillates at the same frequency as the
incoming sound wave
 Movements of this membrane are then transmitted
through a system of small bones called the ossicular
system
 From the ossicular system the vibrations pass to the cochlea.
 Inner ear
 It consists of two membranes, Reissner's membrane and the
basilar membrane
 When vibrations enter the cochlea they stimulate 20,000 to
30,000 stiff hairs on the basilar membrane
 These hairs in turn vibrate and generate electrical signals
that travel to the brain and are perceived as sound
PHONEME HIERARCHY
Speech sounds divide into vowels, diphthongs and consonants.
The inventory is language dependent; about 50 in English.
 Vowels: iy, ih, ae, aa, ah, ao, ax, eh, er, ow, uh, uw
 Diphthongs: ay, ey, oy, aw
 Consonants:
 Plosives: p, b, t, d, k, g
 Nasals: m, n, ng
 Fricatives: f, v, th, dh, s, z, sh, zh, h
 Retroflex liquid: r
 Lateral liquid: l
 Glides: w, y
SPEECH WAVEFORM
CHARACTERISTICS
 Loudness
 Voiced/Unvoiced.
 Pitch.
 Fundamental frequency.
 Spectral envelope.
 Formants.
THE SPEECH STACK
SPEECH CLASSIFICATION
VOWELS
 Vowels are produced by exciting an essentially fixed
vocal tract shape with quasi periodic pulses of air caused
by the vibration of the vocal cords.
 A speech sound produced by humans when the breath
flows out through the mouth without being blocked by
the teeth, tongue, or lips
A short vowel is a short sound as in the word "cup"
A long vowel is a long sound as in the word "shoe"
WHY ARE VOWELS EASILY
DECODABLE?
 Vowels are generally long in duration
compared to consonants.
 They are spectrally well defined
 Vowels are easily and reliably recognized by both
humans and machines.
 Vowels can be subdivided into three sub-groups
based on the tongue hump being along the front,
central or back part of the palate.
VOWELS
 For the vowel /i/ - eve, beat- the vocal tract is
open at the back, the tongue is raised at the front
and there is a high degree of constriction of the
tongue against the palate
 For the vowel /a/ - father, bob - the vocal tract is
open at the front, the tongue is raised at the back
and there is a low degree of constriction by the
tongue against the palate
 i – IY – beat, eve
 I – IH – bit
 e – EH – bet, hate
 a – AA – Bob
 ə – AH – but
 u – UW – boot
 U – UH – book
 o – OW – boat
DIPHTHONGS
 A diphthong is a gliding monosyllabic speech sound
that starts at or near the articulatory position for
one vowel and moves to or toward the position for
another.
 By this definition there are six diphthongs in
American English.
 Examples: buy, boy, down, bait
DIPHTHONGS
 A vowel sound in which the tongue changes position to
produce the sound of two vowels
 A sound formed by the combination of two vowels in a
single syllable
SEMIVOWELS
 The group of sounds consisting of /w/ - W - wit,
/l/ - L - let, and /r/ - R - rent is quite difficult to
characterize.
 These sounds are called semivowels because of
their vowel-like nature.
 They are characterized by a gliding transition in the vocal
tract area function between adjacent phonemes.
LIQUIDS
 A liquid is a consonant produced when the tongue approaches a
point of articulation within the mouth but does not come close
enough to obstruct or constrict the flow of air enough to create
turbulence (as with fricatives).
 The primary difference between liquids and glides is that with a
liquid the tip of the tongue is raised, whereas with glides the body of
the tongue, rather than the tip, is raised.
 /w/ - W - wit
/l/ - L - let
GLIDES
 To glide is to move easily without stopping and without effort or noise
 A glide, like a liquid, is a consonant produced when the tongue
approaches a point of articulation within the mouth but does not
come close enough to obstruct or constrict the flow of air enough to
create turbulence.
 Unlike nasals, the flow of air is not redirected into the nose. Instead,
as with liquids, the air is still allowed to escape via the mouth.
 /r/ - R - rent
CONSONANTS
 One of the speech sounds or letters of the alphabet that is not a
vowel
 Consonants are pronounced by stopping the air from flowing
easily through the mouth, especially by closing the lips or
touching the teeth with the tongue
 A nasal consonant is one in which air escapes only through the
nose
In English, "m" and "n" are nasal consonants
In "hat", H and T are consonants.
/m/ (me), /n/ (no), /ŋ/ (sing)
NASAL CONSONANTS
 A nasal is a consonant produced by redirecting air out
through the nose instead of allowing it to escape out of the mouth.
 Nasal consonants such as /m/ - EM - bottom and /n/ - EN - button are
produced with glottal excitation and the vocal tract totally constricted at
some point along the oral passageway.
 The velum is lowered so that air flows through the nasal tract, with
sound being radiated through the nostrils
 /m/ - constriction at the lips
 /n/ - constriction just behind the teeth
UNVOICED FRICATIVES
 Produced by exciting the vocal tract with a steady air
flow
 The flow becomes turbulent in the region of a constriction in
the vocal tract
 The location of the constriction determines which fricative
sound is produced:
 /f/ - constriction near the lips
 /θ/ - constriction near the teeth
 /s/ - constriction near the middle of the oral tract
 /sh/ - constriction near the back of the oral tract
 The vocal tract is separated into two cavities by the
source of noise at the constriction
VOICED FRICATIVES
 /v/, /z/ and /zh/ are some examples of voiced
fricatives
 The place of constriction for each corresponding
phoneme is essentially identical to that of its unvoiced counterpart
 The vocal cords vibrate
 There is only one excitation source, i.e., the glottis
 e.g.: vat, azure
STOPS/PLOSIVES
 Produced by completely stopping the air flow
 Airstream cannot escape through the mouth
VOICED STOPS
 These are transient, non continuant sounds
produced by building up pressure behind a total
constriction somewhere in the oral tract and then
suddenly releasing the pressure
 /b/- Constriction is at the lips
 /d/- Constriction is at the back of the teeth
 /g/- Constriction is near the velum
 During the pressure build-up, no sound is radiated from the lips
 The vocal cords vibrate
 Their properties are highly influenced by the
vowel that follows the stop consonant.
UNVOICED STOPS
 /p/,/t/ and /k/ are some examples
 The vocal cords do not vibrate
WHISPERS
 The vocal cords do not vibrate
 Air passes between the arytenoid cartilages to create
audible turbulence during speech
 Whispering is used to convey secret information without being
overheard, or to avoid disturbing others in a quiet place such as
a library or place of worship
APPROACHES TO AUTOMATIC
SPEECH RECOGNITION BY
MACHINE
There are three approaches:
The acoustic-phonetic approach
The pattern recognition approach
The artificial intelligence approach
ACOUSTIC-PHONETIC APPROACH
 First step: Segmentation and Labelling
 Second step: To determine a valid word
SEGMENTATION AND LABELLING
 Segmenting the speech signal into discrete regions
depending on the acoustic properties of the signal
 Attaching one or more phonetic labels to each
segmented region
 Second step attempts to determine a valid word from
the sequence of phonetic labels.
 The problem is to decode the phoneme lattice into a
word string such that every instant of time is
included in one of the phonemes in the lattice.
 One phoneme can be pronounced in different ways;
a phone group containing similar variants
of a single phoneme is called an allophone
The symbol SIL denotes silence
SIL – AO – L – AX – B – AW – T : "all about"
(For the lattice structure, refer to page 38.)
L, AX and B correspond to the second and
third choices in the lattice.
PROBLEMS IN ACOUSTIC PHONETIC
APPROACH
 The method requires extensive knowledge of the
acoustic properties of phonetic units
 For most systems the choice of features is based on
intuition and is not optimal in a well defined and
meaningful sense
 The design of sound classifiers is also not optimal
 No well-defined, automatic procedure exists for tuning
the method on real, labeled speech.
PATTERN RECOGNITION APPROACH
 Speech patterns are used directly without explicit
feature determination and segmentation.
 Step one: training of speech patterns
 Step two: recognition of pattern via pattern
comparison
PATTERN RECOGNITION APPROACH
 Speech knowledge is brought into the system via
the training procedure.
 Enough versions of a pattern to be recognized are
included in a training set provided to the algorithm.
 The machine learns which acoustic properties of the
speech class are reliable and repeatable across all
training tokens of the pattern.
ADVANTAGES
 Simplicity of use – Mathematical representation is
easy
 Robustness and invariance to different speech
vocabularies, users, feature sets, pattern comparison
algorithms and decision rules.
 Proven high performance
ARTIFICIAL INTELLIGENCE
APPROACH
 It is a hybrid of the acoustic-phonetic and pattern
recognition approaches
 This approach models the recognition procedure on the
way a person applies intelligence in visualizing,
analyzing and finally making a decision on the
measured acoustic features
 Neural networks are used for learning the relationship
between phonetic events and all known inputs, as well
as for discrimination between similar sound classes.
ACOUSTIC-PHONETIC APPROACH
SPEECH ANALYSIS SYSTEM
 It provides an appropriate spectral representation
of the time-varying speech signal
 A commonly used technique is the linear predictive coding (LPC)
method
FEATURE DETECTION STAGE
 Convert the spectral measurements to a set of features
that describe the acoustic properties of the different
phonetic units.
 Features
 Nasality: presence or absence of nasal resonance
 Friction: presence or absence of random excitation in the
speech
 Formant locations: frequencies of the first three resonances
 Voiced and unvoiced classification: periodic and aperiodic
excitation
SEGMENTATION AND LABELLING
PHASE
 The system tries to find stable regions
 To label the segmented region according to how well the
features within that region match those of individual
phonetic units
 This stage is the heart of the acoustic-phonetics
recognizer and is the most difficult one to carry out
reliably
 Various control strategies are used to limit the range of
segmentation points and label possibilities
 The final output of the recognizer is the word or word
sequence, in some well-defined sense.
VOWEL CLASSIFIER
 Formant - bands of frequency that determine the
phonetic quality of a vowel.
 Compact sounds have a concentration of energy in the
middle of the frequency range of the spectrum. An
example is the vowel ɑ which has a relatively high
first formant which is close to the frequency of the
second formant. 
 The opposite of compact is diffuse. A diffuse vowel,
such as i, has no centrally located concentration of
energy – the first and second formants are widely
separated.
COMPACT AND DIFFUSE VOWEL
ACUTE AND GRAVE
 "acute" typically refers to front vowels
 Grave typically refers to back vowels
 Three features are measured over the segment:
the first formant F1, the second formant F2, and the duration of
the segment, D.
 The first test separates vowels with low F1 from
vowels with high F1.
 Each of these subsets can be split further on the basis
of the F2 measurement, into high F2 and low F2.
 The third test is based on segment duration, which
separates tense vowels (large value of D) from lax
vowels (small value of D).
 Finally, a finer test on formant values separates the
remaining unresolved vowels into flat vowels and plain vowels.
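The sequential tests above can be sketched as a small decision tree in Python. The thresholds and the formant/duration values for /iy/ below are illustrative assumptions, not values taken from these slides:

```python
# Hypothetical thresholds, for illustration only; a real classifier
# would be tuned on measured formant data.
def classify_vowel(f1, f2, d):
    """Sequential decision tree on F1 (Hz), F2 (Hz), duration d (ms)."""
    height = "low_F1" if f1 < 500 else "high_F1"   # test 1: F1
    front = "high_F2" if f2 > 1500 else "low_F2"   # test 2: F2
    tense = "tense" if d > 100 else "lax"          # test 3: duration
    return (height, front, tense)

# A typical /iy/ (beat): low F1, high F2, long duration
print(classify_vowel(270, 2290, 160))
```

A final, finer test on formant values (flat vs. plain) would hang off the leaves of this tree.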
SPEECH SOUND CLASSIFIER
STATISTICAL PATTERN-
RECOGNITION APPROACH TO
SPEECH RECOGNITION
 Feature measurement: a sequence of measurements is made
on the input signal to define the "test pattern".
 The feature measurements are usually the output of a spectral analysis
technique, such as a filter-bank analyzer, LPC analysis, or DFT analysis
 Pattern training: creates a reference pattern (for each sound
class), called a template
 Pattern classification: the unknown test pattern is compared with each
(sound) class reference pattern, and a measure of distance between
the test pattern and each reference pattern is computed.
 Decision logic: the reference-pattern similarity (distance) scores are
used to decide which reference pattern best matches the unknown
test pattern.
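A minimal sketch of the pattern classification and decision-logic steps in Python, using a Euclidean distance on fixed-length feature vectors. The templates and feature values are hypothetical; real systems compare frame sequences, e.g., with dynamic time warping:

```python
import math

def distance(a, b):
    # Euclidean distance between two feature vectors
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def classify(test, templates):
    # Decision logic: pick the reference pattern with the smallest distance
    return min(templates, key=lambda name: distance(test, templates[name]))

# Hypothetical 3-dimensional feature templates for two sound classes
templates = {"class_A": [1.0, 0.0, 2.0], "class_B": [4.0, 3.0, 1.0]}
print(classify([1.1, 0.2, 1.8], templates))  # "class_A"
```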
STRENGTHS AND WEAKNESS OF
THE PATTERN-RECOGNITION
MODEL
 The performance of the system is sensitive to the
amount of training data available for creating sound-class
reference patterns (more training, higher
performance).
 The reference patterns are sensitive to the speaking
environment and to the transmission characteristics of the
medium used to create the speech (because speech
spectral characteristics are affected by transmission
and background noise).
 The method is relatively insensitive to syntax and
semantics.
 The system is insensitive to the sound class, so
the techniques can be applied to a wide range of
speech sounds (including phrases).
AI APPROACHES TO SPEECH
RECOGNITION
 The basic idea of AI is to compile and incorporate the
knowledge from variety of knowledge sources to solve
the problem.
 Acoustic Knowledge: Knowledge related to sound or
sense of hearing
 Lexical Knowledge: Knowledge of the words of the
language. (decomposing words into sounds)
 Syntactic Knowledge: Knowledge of syntax (rules)
 Semantic Knowledge: Knowledge of the meaning of the
language.
 Pragmatic Knowledge: (sense derived from meaning)
inference ability necessary in resolving ambiguity of
meaning based on ways in which words are generally
used.
SEVERAL WAYS TO INTEGRATE
KNOWLEDGE SOURCES WITHIN A
SPEECH RECOGNIZER
 1. Bottom-Up
 2. Top-Down
 3. Black Board
“BOTTOM-UP” APPROACH:
 The lowest level processes (feature detection,
phonetic decoding) precede higher level processes
(lexical coding) in a sequential manner.
“TOP-DOWN” APPROACH:
 Here the language model generates word hypotheses that
are matched against the speech signal, and syntactically
and semantically meaningful sentences are built up on
the basis of word-match scores.
Signal Processing And Analysis
Methods For Speech
Recognition
Introduction
• Spectral analysis is the process of
representing the speech signal in terms of different
parameters for further processing,
e.g., short-term energy, zero-crossing
rates, level-crossing rates, and so on
• Methods for spectral analysis are
therefore considered the core of the
signal-processing front end in a speech
recognition system
Spectral Analysis models
• Pattern recognition model
• Acoustic phonetic model
Spectral Analysis Model
Parameter measurement is common to both models
Pattern recognition Model
• The three basic steps in pattern recognition
model are
– 1. parameter measurement
– 2. pattern comparison
– 3. decision making
1. Parameter measurement
• The goal is to represent the relevant acoustic events in
the speech signal in terms of a compact, efficient
set of speech parameters
• The choice of which parameters to use is
dictated by other considerations, e.g.:
– computational efficiency
– type of implementation
– available memory
• The way in which the representation is computed
is based on signal-processing considerations
Acoustic phonetic Model
Spectral Analysis
• Two methods:
– The Filter Bank spectrum
– The Linear Predictive coding (LPC)
The Filter Bank spectrum
(Figure: a digital input signal passes through a bank of bandpass filters to give a spectral representation.)
The bandpass filters' coverage spans the frequency range of interest in the signal
1.The Bank of Filters Front end
Processor
• One of the most common approaches
for processing the speech signal is the
bank-of-filters model
• This method takes a speech signal as
input and passes it through a set of
filters in order to obtain the spectral
representation of each frequency band
of interest.
• e.g.:
• 100–3000 Hz for a telephone-quality
signal
• 100–8000 Hz for a broadband signal
• The individual filters generally do
overlap in frequency
• The output of the ith bandpass filter is
X_n(e^{jω_i}), where ω_i = 2π f_i / F_s is the normalized frequency
• Each bandpass filter processes the
speech signal independently to produce
the spectral representation X_n(e^{jω_i})
The sampled speech signal, s(n), is passed
through a bank of Q bandpass filters,
giving the signals

s_i(n) = s(n) ∗ h_i(n) = Σ_{m=0}^{M_i − 1} h_i(m) s(n − m),   1 ≤ i ≤ Q
The bank-of-filters approach obtains the
energy value of the speech signal
via the following steps:
• Signal enhancement and noise
elimination: to make the speech signal
more evident to the bank of filters.
• Set of bandpass filters: separate the
signal into frequency bands (uniform or
nonuniform filters).
• Nonlinearity: the filtered signal in
every band is passed through a nonlinear
function (for example, a full-wave or
half-wave rectifier) to shift the
bandpass spectrum to the low-frequency band.
• Lowpass filter: eliminates
the high-frequency components generated by the
nonlinear function.
• Sampling-rate reduction and amplitude
compression: the resulting signals are
represented more economically
by resampling at a reduced rate and
compressing the signal's dynamic range.
The role of the final lowpass filter is to eliminate the undesired spectral peaks
Assume that the output of the ith
bandpass filter is a pure
sinusoid at frequency ω_i:

s_i(n) = α_i sin(ω_i n)

If a full-wave rectifier is used as the nonlinearity:

f(s_i(n)) = s_i(n),   for s_i(n) ≥ 0
f(s_i(n)) = −s_i(n),  for s_i(n) < 0

The nonlinearity output is

v_i(n) = f(s_i(n)) = s_i(n) · w(n)

where w(n) = +1 if s_i(n) ≥ 0, and w(n) = −1 if s_i(n) < 0.
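A quick Python check that full-wave rectification is identical to multiplying s_i(n) by a switching signal w(n) that is +1 when s_i(n) ≥ 0 and −1 otherwise (the amplitude and frequency are arbitrary illustrative values):

```python
import math

def w(v):
    # Switching signal: +1 when the sample is nonnegative, -1 otherwise
    return 1 if v >= 0 else -1

alpha, omega = 0.8, 2 * math.pi * 0.03
s = [alpha * math.sin(omega * n) for n in range(200)]

# Full-wave rectifier computed two ways: |s(n)| versus s(n) * w(n)
v1 = [abs(v) for v in s]
v2 = [v * w(v) for v in s]
print(max(abs(a - b) for a, b in zip(v1, v2)))  # 0.0
```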
Types of Filter Bank Used For
Speech Recognition
• Uniform filter bank
• Nonuniform filter bank
uniform filter bank
• The most common filter bank is the uniform filter
bank
• The center frequency, f_i, of the ith bandpass filter is
defined as

f_i = (F_s / N) · i,   1 ≤ i ≤ Q

where F_s is the sampling rate of the speech signal,
N is the number of uniformly spaced filters required to span the
frequency range of the speech, and
Q is the number of filters used in the bank of filters
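A quick numerical check of the center-frequency formula in Python (the parameter values are illustrative):

```python
def uniform_centers(fs, n_uniform, q):
    # f_i = (Fs / N) * i, for 1 <= i <= Q
    return [fs / n_uniform * i for i in range(1, q + 1)]

# e.g., Fs = 10 kHz, N = 20 uniformly spaced filters, Q = 10 used
centers = uniform_centers(10000, 20, 10)
print(centers)  # [500.0, 1000.0, ..., 5000.0]
```

With Q = N/2 the bank covers frequencies up to F_s/2, the highest frequency representable at the sampling rate.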
• The actual number of filters used in the
filter bank satisfies Q ≤ N/2
• b_i is the bandwidth of the ith filter
• There should not be any frequency overlap
between adjacent filter channels
If b_i < F_s/N, then certain portions
of the speech spectrum would be
missing from the analysis and the
resulting speech spectrum would not be
considered very meaningful
nonuniform filter bank
• An alternative to the uniform filter bank is the
nonuniform filter bank
• The criterion is to space the filters
uniformly along a logarithmic
frequency scale
• For a set of Q bandpass filters with
center frequencies f_i and bandwidths
b_i, 1 ≤ i ≤ Q, we set
b_1 = C
b_i = α · b_{i−1},   2 ≤ i ≤ Q
f_i = f_1 + Σ_{j=1}^{i−1} b_j + (b_i − b_1)/2

where C and f_1 are the arbitrary bandwidth and center
frequency of the first filter, and α is the logarithmic growth
factor
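These recursions can be checked numerically in Python; with α = 2 the design gives octave-spaced bands. The values of C, f_1 and Q below are illustrative:

```python
def log_filter_bank(c, alpha, f1, q):
    # b_1 = C; b_i = alpha * b_{i-1}
    b = [c * alpha ** i for i in range(q)]
    # f_i = f_1 + sum of earlier bandwidths + (b_i - b_1) / 2
    f = [f1 + sum(b[:i]) + (b[i] - b[0]) / 2 for i in range(q)]
    return b, f

# Octave-band design: alpha = 2, first filter 100 Hz wide centered at 150 Hz
bands, centers = log_filter_bank(100.0, 2.0, 150.0, 4)
print(bands)    # [100.0, 200.0, 400.0, 800.0]
print(centers)  # [150.0, 300.0, 600.0, 1200.0]
```

Note that each center frequency doubles, i.e., the filters are uniformly spaced on a logarithmic frequency scale.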
Implementations of Filter
Banks
• Depending on the design method,
the filter bank can be implemented in
various ways
• Design methods for digital filters fall
into two classes:
– Infinite impulse response (IIR)
(recursive filters)
– Finite impulse response (FIR)
(nonrecursive filters)
The FIR filter: (finite impulse response)
or nonrecursive filter
• The present output depends on the
present input sample and previous input
samples
• The impulse response is restricted to a
finite number of samples
• Advantages:
– Stable; noise is less severe
– Excellent design methods are available for
various kinds of FIR filters
– Phase response is linear
• Disadvantages:
– Costly to implement
– Memory requirement and execution time
are high
– Require powerful computational facilities
The IIR filter: (infinite impulse
response) or recursive filter
• The present output sample depends
on the present input, past input samples,
and past output samples
• The impulse response extends over an
infinite duration
• Advantages:
– Simple to design
– Efficient
• Disadvantages:
– Phase response is nonlinear
– More sensitive to noise
– Can be unstable
FIR Filters

x_i(n) = s(n) ∗ h_i(n) = Σ_{m=0}^{L−1} h_i(m) s(n − m),   for i = 1, 2, …, Q

where h_i(n), 0 ≤ n ≤ L − 1, is the L-sample impulse response of the ith channel,
x_i(n) is the output of the ith channel, and
s(n) is the input signal
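A direct-form implementation of this convolution in Python, using a 4-point moving average as a trivial impulse response (the signals are illustrative):

```python
def fir_filter(s, h):
    # x(n) = sum_{m=0}^{L-1} h(m) * s(n - m): direct-form convolution
    L = len(h)
    return [sum(h[m] * s[n - m] for m in range(L) if 0 <= n - m < len(s))
            for n in range(len(s))]

# 4-point moving average as a trivial FIR impulse response
h = [0.25, 0.25, 0.25, 0.25]
s = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
x = fir_filter(s, h)
print(x)  # ramps up, then settles to 1.0 once the filter memory is full
```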
• A less expensive implementation can be
derived by representing each bandpass
filter by a fixed lowpass window w(n)
modulated by a complex exponential:

h_i(n) = w(n) e^{jω_i n}

x_i(n) = Σ_m s(m) h_i(n − m)
       = Σ_m s(m) w(n − m) e^{jω_i (n − m)}
       = e^{jω_i n} Σ_m s(m) w(n − m) e^{−jω_i m}
       = e^{jω_i n} S_n(e^{jω_i})

where S_n(e^{jω_i}) is the Fourier transform of s(m) w(n − m) at ω = ω_i = 2π f_i / F_s
Frequency Domain Interpretation
For Short Term Fourier
Transform

S_n(e^{jω_i}) = Σ_m s(m) w(n − m) e^{−jω_i m}     (A)

At n = n_0:

S_{n_0}(e^{jω_i}) = FT[s(m) w(n_0 − m)] |_{ω = ω_i}

where FT[·] denotes the Fourier transform.
S_{n_0}(e^{jω_i}) is the conventional Fourier transform
of the windowed signal, s(m) w(n_0 − m), evaluated
at the frequency ω = ω_i
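The definition can be evaluated directly in Python. The rectangular window, tone frequency, and analysis instant below are illustrative assumptions; the point is that S_n(e^{jω_i}) responds strongly only when ω_i matches a component of s(m):

```python
import cmath
import math

def stft_at(s, window, n, omega):
    # S_n(e^{j omega}) = sum_m s(m) * w(n - m) * e^{-j omega m}
    return sum(s_m * window[n - m] * cmath.exp(-1j * omega * m)
               for m, s_m in enumerate(s) if 0 <= n - m < len(window))

fs = 8000
omega_i = 2 * math.pi * 1000 / fs                      # analyze at 1000 Hz
s = [math.sin(2 * math.pi * 1000 * m / fs) for m in range(512)]
w = [1.0 / 64] * 64                                    # rectangular window, L = 64

at_tone = abs(stft_at(s, w, 256, omega_i))             # on the tone
off_tone = abs(stft_at(s, w, 256, 2 * math.pi * 2500 / fs))  # away from it
print(at_tone > 10 * off_tone)  # strong response only near the tone
```

In practice a tapered window (e.g., Hamming) replaces the rectangular one to control spectral leakage.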
(Figure: shows which parts of s(m) are used in the computation of
the short-time Fourier transform.)
• Since w(m) is an FIR filter of length L,
from the definition of S_n(e^{jω_i}) we
can state that:
– If L is large relative to the signal
periodicity, then S_n(e^{jω_i}) gives good
frequency resolution
– If L is small relative to the signal
periodicity, then S_n(e^{jω_i}) gives poor
frequency resolution
(Figure: an L = 500-point Hamming
window applied to a
section of voiced speech.
The periodicity of the signal
is seen in the windowed time
waveform as well as in the
short-time spectrum, in which
the fundamental frequency
and its harmonics show up as
narrow peaks at equally
spaced frequencies.)
(Figure: for short windows, the time
sequence s(m)w(n − m) doesn't
show the signal periodicity, nor
does the signal spectrum.
It shows the broad spectral
envelope very well.)
(Figure: for unvoiced speech, the short-time spectrum
shows an irregular series of local
peaks and valleys due to the
random nature of the unvoiced
speech.)
(Figure: using the shorter window
smooths out the random
fluctuations in the short-time
spectral magnitude and
shows the broad spectral
envelope very well.)
Linear Filtering Interpretation of
the short-time Fourier
Transform
• From equation (A), the short-time Fourier transform
can be written as the convolution

S_n(e^{jω_i}) = [s(n) e^{−jω_i n}] ∗ w(n)

i.e., S_n(e^{jω_i}) is a convolution of the lowpass
window, w(n), with the speech
signal, s(n), modulated to the center
frequency ω_i
Summary of considerations for
speech recognition filter banks
1st: Type of digital filter used — IIR
(recursive) or FIR (nonrecursive)
• IIR — Advantage: simple to implement and
efficient.
Disadvantage: phase response is nonlinear
• FIR — Advantage: phase response is linear
Disadvantage: expensive to implement
2nd: The number of filters to be used in the
filter bank
1. For uniform filter banks the number of
filters, Q, cannot be too small, or else the
ability of the filter bank to resolve the
speech spectrum is greatly impaired. Values
of Q less than 8 are generally avoided.
2. The value of Q cannot be too large, because
the filter bandwidths would eventually be
too narrow for some talkers (e.g., high-pitched
females), i.e., no prominent harmonic would
fall within some bands. (In practical systems
Q ≤ 32.)
Summary of considerations for
speech recognition filter banks
In order to reduce overall computation, many practical systems have used nonuniformly spaced filter banks.
Summary of considerations for
speech recognition filter banks
3rd. The choice of nonlinearity and LPF used at the output of each channel.
• Nonlinearity: full-wave or half-wave rectifier.
• LPF: varies from a simple integrator to a good-quality IIR lowpass filter.
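Combining the three design choices, a single channel of a bank-of-filters front end can be sketched as bandpass FIR → full-wave rectifier → lowpass smoother. All design values below (tap count, band edges, moving-average length, test tones) are illustrative assumptions, not a design from the slides:

```python
import numpy as np

fs = 8000
n = np.arange(4000)
s = np.sin(2*np.pi*500*n/fs)              # test tone inside the channel's band

def fir_bandpass(numtaps, f_lo, f_hi, fs):
    """Linear-phase windowed-sinc FIR bandpass (difference of two lowpass sincs)."""
    m = np.arange(numtaps) - (numtaps - 1) / 2
    lp = lambda fc: 2 * fc / fs * np.sinc(2 * fc / fs * m)
    return (lp(f_hi) - lp(f_lo)) * np.hamming(numtaps)

h = fir_bandpass(101, 300, 700, fs)       # channel centred near 500 Hz (assumed)
band = np.convolve(s, h, mode='same')     # bandpass filtering
rect = np.abs(band)                       # full-wave rectifier nonlinearity
# simple integrator as the LPF: moving average over ~20 ms
lpf = np.convolve(rect, np.ones(160)/160, mode='same')
energy = lpf[2000]                        # smoothed channel output mid-signal
```

An in-band tone yields a large smoothed channel output; a tone outside the 300–700 Hz band is attenuated to nearly zero by the same channel.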
LINEAR PREDICTIVE CODING MODEL
FOR SPEECH RECOGNITION
 LPC provides a good model of the speech signal.
 Voiced regions – good approximation.
 Unvoiced regions – less effective than for voiced regions.
 LPC is an analytically tractable model: the method is mathematically precise and is simple and straightforward to implement in either software or hardware.
 The computation required for LPC processing is considerably less than that required for an all-digital implementation of the bank-of-filters model.
 The LPC model works well in recognition applications.
3.3.1 The LPC Model

The speech sample s(n) is approximated as a linear combination of the past p samples:

s(n) ≈ a_1 s(n−1) + a_2 s(n−2) + … + a_p s(n−p)

Convert this to an equality by including an excitation term:

s(n) = Σ_{i=1}^{p} a_i s(n−i) + G u(n)

In the z-domain: S(z) = Σ_{i=1}^{p} a_i z^{−i} S(z) + G U(z), giving the transfer function

H(z) = S(z) / (G U(z)) = 1 / (1 − Σ_{i=1}^{p} a_i z^{−i}) = 1 / A(z)
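The all-pole difference equation can be exercised with toy values. The coefficients, gain, and impulse excitation below are assumed for illustration (a stable second-order resonance, loosely mimicking one vocal-tract formant):

```python
import numpy as np

# assumed example values: 2nd-order all-pole model (p = 2)
a = np.array([1.3, -0.9])       # predictor coefficients a_1, a_2 (poles at |z| ~ 0.95)
G = 1.0                          # gain
u = np.zeros(100)
u[0] = 1.0                       # impulse excitation (one "pitch pulse")

# s(n) = sum_i a_i s(n-i) + G u(n)
s = np.zeros(100)
for n in range(100):
    for i, ai in enumerate(a, start=1):
        if n - i >= 0:
            s[n] += ai * s[n - i]
    s[n] += G * u[n]
# s is the impulse response of H(z) = G / (1 - 1.3 z^-1 + 0.9 z^-2):
# a damped oscillation, the hallmark of a vocal-tract resonance
```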
3.3.2 LPC Analysis Equations

The speech model: s(n) = Σ_{k=1}^{p} a_k s(n−k) + G u(n)

The linear predictor with coefficients a_k:
s̃(n) = Σ_{k=1}^{p} a_k s(n−k)

The prediction error:
e(n) = s(n) − s̃(n) = s(n) − Σ_{k=1}^{p} a_k s(n−k)

Error transfer function:
A(z) = E(z)/S(z) = 1 − Σ_{k=1}^{p} a_k z^{−k}
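If a signal is generated by the all-pole model with known coefficients, filtering it with A(z) recovers the excitation exactly, since e(n) = G u(n). A minimal check; the coefficients and white-noise excitation below are assumed illustrative values:

```python
import numpy as np

rng = np.random.default_rng(1)
a = np.array([0.75, -0.5])             # "true" predictor coefficients (assumed, stable)
u = rng.standard_normal(500)           # white excitation, G = 1

# synthesize: s(n) = sum_k a_k s(n-k) + u(n)
s = np.zeros(500)
for n in range(500):
    s[n] = sum(a[k] * s[n-k-1] for k in range(2) if n - k - 1 >= 0) + u[n]

# inverse filter with A(z): e(n) = s(n) - sum_k a_k s(n-k)
e = s.copy()
for k in range(2):
    e[k+1:] -= a[k] * s[:-(k+1)]
# e recovers the excitation u
```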
3.3 LINEAR PREDICTIVE CODING MODEL FOR SPEECH RECOGNITION

[Block diagram: the excitation u(n), multiplied by the gain G, drives the all-pole filter 1/A(z) to produce the speech signal s(n).]
Linear Prediction Model:
 Using LP analysis:
[Block diagram of the LP synthesis model: a voiced/unvoiced (V/U) switch selects between a discrete-time impulse generator, driven at the pitch period (voiced), and a white-noise generator (unvoiced); the selected excitation is scaled by a gain estimate and applied to a time-varying digital filter whose coefficients are the vocal-tract parameters, producing the speech signal s(n).]
 The basic problem of linear prediction analysis is to determine the set of predictor coefficients a_k directly from the speech signal.
 Because the spectral characteristics of speech vary over time, the predictor coefficients at a given time n must be estimated from a short segment of the speech signal.
 Short-time spectral analysis is therefore performed on successive frames of speech, with frame spacing on the order of 10 ms.
3.3.2 LPC Analysis Equations

Define the speech and error segments at analysis time n:

s_n(m) = s(n + m)
e_n(m) = e(n + m)

We seek to minimize the mean-squared error signal:

E_n = Σ_m e_n²(m) = Σ_m [ s_n(m) − Σ_{k=1}^{p} a_k s_n(m−k) ]²
Setting ∂E_n/∂a_k = 0 for k = 1, 2, …, p yields:

Σ_m s_n(m−i) s_n(m) = Σ_{k=1}^{p} â_k Σ_m s_n(m−i) s_n(m−k)    (*)

Terms of short-term covariance:

φ_n(i, k) = Σ_m s_n(m−i) s_n(m−k)

With this notation, we can write (*) as:

φ_n(i, 0) = Σ_{k=1}^{p} â_k φ_n(i, k),  i = 1, 2, …, p

A set of p equations in p unknowns.
3.3.2 LPC Analysis Equations

The minimum mean-squared error can be expressed as:

E_n = Σ_m s_n²(m) − Σ_{k=1}^{p} â_k Σ_m s_n(m) s_n(m−k)
    = φ_n(0, 0) − Σ_{k=1}^{p} â_k φ_n(0, k)
3.3.3 The Autocorrelation Method

Window the segment:

s_n(m) = s(n + m) w(m),  0 ≤ m ≤ N−1
       = 0,  otherwise

where w(m) is a window that is zero outside 0 ≤ m ≤ N−1.

The mean-squared error is:

E_n = Σ_{m=0}^{N+p−1} e_n²(m)

And:

φ_n(i, k) = Σ_{m=0}^{N+p−1} s_n(m−i) s_n(m−k),  1 ≤ i ≤ p, 0 ≤ k ≤ p
          = Σ_{m=0}^{N−1−(i−k)} s_n(m) s_n(m + i − k),  1 ≤ i ≤ p, 0 ≤ k ≤ p
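The reduction of the covariance terms to an autocorrelation can be confirmed numerically. The windowed frame below is random data standing in for a speech segment; the frame length and predictor order are assumed values:

```python
import numpy as np

rng = np.random.default_rng(4)
N, p = 64, 4
frame = rng.standard_normal(N) * np.hamming(N)  # s_n(m) = s(n+m) w(m), zero elsewhere

# embed in a zero-padded buffer so indices m-i that fall outside 0..N-1 are zero
buf = np.concatenate([np.zeros(p), frame, np.zeros(p)])

def phi(i, k):
    """phi_n(i,k) = sum_{m=0}^{N+p-1} s_n(m-i) s_n(m-k)"""
    m = np.arange(N + p)
    return np.dot(buf[p + m - i], buf[p + m - k])

def r(k):
    """r_n(k) = sum_{m=0}^{N-1-k} s_n(m) s_n(m+k)"""
    k = abs(k)
    return np.dot(frame[:N - k], frame[k:])

# because s_n(m) is zero outside the window, phi depends only on i - k:
for i in range(1, p + 1):
    for k in range(0, p + 1):
        assert np.isclose(phi(i, k), r(i - k))
```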
3.3.3 The Autocorrelation Method

φ_n(i, k) = Σ_{m=0}^{N−1−(i−k)} s_n(m) s_n(m + i − k),  1 ≤ i ≤ p, 0 ≤ k ≤ p

Since φ_n(i, k) is only a function of i − k, the covariance function reduces to the simple autocorrelation function:

φ_n(i, k) = r_n(i − k)
3.3.3 The Autocorrelation Method

Since the autocorrelation function is symmetric, i.e. r_n(−k) = r_n(k):

Σ_{k=1}^{p} â_k r_n(|i − k|) = r_n(i),  1 ≤ i ≤ p

and can be expressed in matrix form as:

⎡ r_n(0)    r_n(1)    r_n(2)    …  r_n(p−1) ⎤ ⎡ â_1 ⎤   ⎡ r_n(1) ⎤
⎢ r_n(1)    r_n(0)    r_n(1)    …  r_n(p−2) ⎥ ⎢ â_2 ⎥   ⎢ r_n(2) ⎥
⎢ r_n(2)    r_n(1)    r_n(0)    …  r_n(p−3) ⎥ ⎢ â_3 ⎥ = ⎢ r_n(3) ⎥
⎢   ⋮         ⋮         ⋮            ⋮      ⎥ ⎢  ⋮  ⎥   ⎢   ⋮    ⎥
⎣ r_n(p−1)  r_n(p−2)  r_n(p−3)  …  r_n(0)   ⎦ ⎣ â_p ⎦   ⎣ r_n(p) ⎦

The p × p matrix of autocorrelation values is a Toeplitz matrix (symmetric, with identical entries along each diagonal), so the system can be solved efficiently by the Levinson–Durbin recursion.
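Because the matrix is Toeplitz, the normal equations can be solved in O(p²) operations with the Levinson–Durbin recursion. A sketch of the recursion, validated on a synthetic AR(2) signal whose coefficients are assumed example values:

```python
import numpy as np

def levinson_durbin(r, p):
    """Solve sum_k a_k r(|i-k|) = r(i), i = 1..p, exploiting Toeplitz structure."""
    a = np.zeros(p)
    E = r[0]                                         # prediction error energy
    for i in range(1, p + 1):
        # reflection coefficient k_i
        k = (r[i] - np.dot(a[:i-1], r[i-1:0:-1])) / E
        a_prev = a[:i-1].copy()
        a[i-1] = k
        a[:i-1] = a_prev - k * a_prev[::-1]          # update lower-order coefficients
        E *= (1.0 - k * k)
    return a, E

# synthetic AR(2) signal with known (assumed) coefficients
rng = np.random.default_rng(0)
a_true = np.array([1.3, -0.9])
N = 50000
u = rng.standard_normal(N)
s = np.zeros(N)
for n in range(N):
    s[n] = u[n]
    if n >= 1:
        s[n] += a_true[0] * s[n-1]
    if n >= 2:
        s[n] += a_true[1] * s[n-2]

# sample autocorrelation r(0), r(1), r(2), then solve for the predictor
r = np.array([np.dot(s[:N-k], s[k:]) / N for k in range(3)])
a_hat, E = levinson_durbin(r, 2)       # estimates close to a_true
```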
3.3.4 The Covariance Method

Change the interval over which the error is computed to 0 ≤ m ≤ N−1 and use the unweighted speech directly:

E_n = Σ_{m=0}^{N−1} e_n²(m)

with φ_n(i, k) defined as:

φ_n(i, k) = Σ_{m=0}^{N−1} s_n(m−i) s_n(m−k),  1 ≤ i ≤ p, 0 ≤ k ≤ p

or, by a change of variables:

φ_n(i, k) = Σ_{m=−i}^{N−i−1} s_n(m) s_n(m + i − k),  1 ≤ i ≤ p, 0 ≤ k ≤ p
3.3.4 The Covariance Method

In matrix form:

⎡ φ_n(1,1)  φ_n(1,2)  φ_n(1,3)  …  φ_n(1,p) ⎤ ⎡ â_1 ⎤   ⎡ φ_n(1,0) ⎤
⎢ φ_n(2,1)  φ_n(2,2)  φ_n(2,3)  …  φ_n(2,p) ⎥ ⎢ â_2 ⎥   ⎢ φ_n(2,0) ⎥
⎢ φ_n(3,1)  φ_n(3,2)  φ_n(3,3)  …  φ_n(3,p) ⎥ ⎢ â_3 ⎥ = ⎢ φ_n(3,0) ⎥
⎢    ⋮         ⋮         ⋮            ⋮     ⎥ ⎢  ⋮  ⎥   ⎢    ⋮     ⎥
⎣ φ_n(p,1)  φ_n(p,2)  φ_n(p,3)  …  φ_n(p,p) ⎦ ⎣ â_p ⎦   ⎣ φ_n(p,0) ⎦
The resulting covariance matrix is symmetric, but not Toeplitz,
and can be solved efficiently by a set of techniques called
Cholesky decomposition
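A sketch of the covariance-method solve via Cholesky factorization. The frame here is random data standing in for speech samples, and p, N, and the buffer offset are assumed example values:

```python
import numpy as np

rng = np.random.default_rng(0)
s = rng.standard_normal(300)     # random frame standing in for speech samples
p, N = 4, 240                    # predictor order and analysis interval (assumed)

def phi(i, k):
    """phi_n(i,k) = sum_{m=0}^{N-1} s_n(m-i) s_n(m-k); the interval is offset by p
    samples into the buffer so indices m-i reaching before it are still real data."""
    m = np.arange(N)
    return np.dot(s[p + m - i], s[p + m - k])

Phi = np.array([[phi(i, k) for k in range(1, p + 1)] for i in range(1, p + 1)])
psi = np.array([phi(i, 0) for i in range(1, p + 1)])

# Phi is symmetric positive definite but not Toeplitz: factor Phi = L L^T,
# then solve by forward and back substitution
L = np.linalg.cholesky(Phi)
y = np.linalg.solve(L, psi)      # forward substitution: L y = psi
a = np.linalg.solve(L.T, y)      # back substitution: L^T a = y
```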
3.3.6 Examples of LPC Analysis
REFERENCES

TEXTBOOKS:
1. Lawrence Rabiner and Biing-Hwang Juang, “Fundamentals of Speech Recognition”, Pearson Education, 2003.
2. Daniel Jurafsky and James H Martin, “Speech and Language Processing – An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition”, Pearson Education, 2002.
3. Frederick Jelinek, “Statistical Methods of Speech Recognition”, MIT Press, 1997.

REFERENCES:
1. Steven W. Smith, “The Scientist and Engineer's Guide to Digital Signal Processing”, California Technical Publishing, 1997.
2. Thomas F Quatieri, “Discrete-Time Speech Signal Processing – Principles and Practice”, Pearson Education, 2004.
3. Claudio Becchetti and Lucio Prina Ricotti, “Speech Recognition”, John Wiley and Sons, 1999.
4. Ben Gold and Nelson Morgan, “Speech and Audio Signal Processing, Processing and Perception of Speech and Music”, Wiley-India Edition, 2006.

More Related Content

What's hot

What's hot (20)

Speech Recognition Technology
Speech Recognition TechnologySpeech Recognition Technology
Speech Recognition Technology
 
Automatic speech recognition
Automatic speech recognitionAutomatic speech recognition
Automatic speech recognition
 
Automatic Speech Recognition
Automatic Speech RecognitionAutomatic Speech Recognition
Automatic Speech Recognition
 
Automatic speech recognition
Automatic speech recognitionAutomatic speech recognition
Automatic speech recognition
 
Speech recognition
Speech recognitionSpeech recognition
Speech recognition
 
Speech recognition final presentation
Speech recognition final presentationSpeech recognition final presentation
Speech recognition final presentation
 
Speech signal processing lizy
Speech signal processing lizySpeech signal processing lizy
Speech signal processing lizy
 
Linear Predictive Coding
Linear Predictive CodingLinear Predictive Coding
Linear Predictive Coding
 
Speech recognition An overview
Speech recognition An overviewSpeech recognition An overview
Speech recognition An overview
 
Final ppt
Final pptFinal ppt
Final ppt
 
speech enhancement
speech enhancementspeech enhancement
speech enhancement
 
Linear Predictive Coding
Linear Predictive CodingLinear Predictive Coding
Linear Predictive Coding
 
Speech Recognition System By Matlab
Speech Recognition System By MatlabSpeech Recognition System By Matlab
Speech Recognition System By Matlab
 
An Introduction To Speech Recognition
An Introduction To Speech RecognitionAn Introduction To Speech Recognition
An Introduction To Speech Recognition
 
SPEECH CODING
SPEECH CODINGSPEECH CODING
SPEECH CODING
 
Digital modeling of speech signal
Digital modeling of speech signalDigital modeling of speech signal
Digital modeling of speech signal
 
Biomedical signal processing
Biomedical signal processingBiomedical signal processing
Biomedical signal processing
 
Speaker Recognition
Speaker RecognitionSpeaker Recognition
Speaker Recognition
 
Speech Synthesis.pptx
Speech Synthesis.pptxSpeech Synthesis.pptx
Speech Synthesis.pptx
 
A Survey on Speaker Recognition System
A Survey on Speaker Recognition SystemA Survey on Speaker Recognition System
A Survey on Speaker Recognition System
 

Similar to Unit 1 speech processing

SPEECH PERCEPTION MASLP
SPEECH PERCEPTION MASLPSPEECH PERCEPTION MASLP
SPEECH PERCEPTION MASLPHimaniBansal15
 
26 Speech Lecture.ppt
26 Speech Lecture.ppt26 Speech Lecture.ppt
26 Speech Lecture.pptssuser99ca78
 
Presentation 2 phonetic in prosthodontic
Presentation 2 phonetic in prosthodonticPresentation 2 phonetic in prosthodontic
Presentation 2 phonetic in prosthodonticPratik Hodar
 
Phonetics/ orthodontic straight wire technique
Phonetics/ orthodontic straight wire techniquePhonetics/ orthodontic straight wire technique
Phonetics/ orthodontic straight wire techniqueIndian dental academy
 
An Introduction To Speech Sciences (Acoustic Analysis Of Speech)
An Introduction To Speech Sciences (Acoustic Analysis Of Speech)An Introduction To Speech Sciences (Acoustic Analysis Of Speech)
An Introduction To Speech Sciences (Acoustic Analysis Of Speech)Jeff Nelson
 
Neurogenic communication disorders
Neurogenic communication disordersNeurogenic communication disorders
Neurogenic communication disordersAswathi P
 
Talking at the Speed of Neurons
Talking at the Speed of NeuronsTalking at the Speed of Neurons
Talking at the Speed of Neuronsmbadalam
 
Phonetics/cosmetic dentistry courses
Phonetics/cosmetic dentistry coursesPhonetics/cosmetic dentistry courses
Phonetics/cosmetic dentistry coursesIndian dental academy
 
Language and Human's Brain
Language and Human's  Brain Language and Human's  Brain
Language and Human's Brain Irma Fitriani
 
PHYSIOLOGY OF SPEECH & ARTICULATION
PHYSIOLOGY OF SPEECH & ARTICULATION PHYSIOLOGY OF SPEECH & ARTICULATION
PHYSIOLOGY OF SPEECH & ARTICULATION Dr. Aniket Shilwant
 
Speech consideration in complete dentures
Speech consideration in complete denturesSpeech consideration in complete dentures
Speech consideration in complete denturespadmini rani
 
Theories of speech perception.pptx
Theories of speech perception.pptxTheories of speech perception.pptx
Theories of speech perception.pptxsherin444916
 
Neurobiology of everyday life_Nicolle
Neurobiology of everyday life_NicolleNeurobiology of everyday life_Nicolle
Neurobiology of everyday life_NicollejohannaNicolle
 
SPEECH PERCEPTION THEORIES MASLP
SPEECH PERCEPTION THEORIES MASLPSPEECH PERCEPTION THEORIES MASLP
SPEECH PERCEPTION THEORIES MASLPHimaniBansal15
 
speech production in psycholinguistics
speech production in psycholinguistics speech production in psycholinguistics
speech production in psycholinguistics Aseel K. Mahmood
 
The human mind at work
The human mind at workThe human mind at work
The human mind at workFaith Clavaton
 
APHASIA AND DYSARTHRIA last.pptx
APHASIA AND DYSARTHRIA last.pptxAPHASIA AND DYSARTHRIA last.pptx
APHASIA AND DYSARTHRIA last.pptxZelekewoldeyohannes
 

Similar to Unit 1 speech processing (20)

SPEECH PERCEPTION MASLP
SPEECH PERCEPTION MASLPSPEECH PERCEPTION MASLP
SPEECH PERCEPTION MASLP
 
26 Speech Lecture.ppt
26 Speech Lecture.ppt26 Speech Lecture.ppt
26 Speech Lecture.ppt
 
Presentation 2 phonetic in prosthodontic
Presentation 2 phonetic in prosthodonticPresentation 2 phonetic in prosthodontic
Presentation 2 phonetic in prosthodontic
 
Phonetics/ orthodontic straight wire technique
Phonetics/ orthodontic straight wire techniquePhonetics/ orthodontic straight wire technique
Phonetics/ orthodontic straight wire technique
 
An Introduction To Speech Sciences (Acoustic Analysis Of Speech)
An Introduction To Speech Sciences (Acoustic Analysis Of Speech)An Introduction To Speech Sciences (Acoustic Analysis Of Speech)
An Introduction To Speech Sciences (Acoustic Analysis Of Speech)
 
Neurogenic communication disorders
Neurogenic communication disordersNeurogenic communication disorders
Neurogenic communication disorders
 
Talking at the Speed of Neurons
Talking at the Speed of NeuronsTalking at the Speed of Neurons
Talking at the Speed of Neurons
 
Phonetics/cosmetic dentistry courses
Phonetics/cosmetic dentistry coursesPhonetics/cosmetic dentistry courses
Phonetics/cosmetic dentistry courses
 
Language and Human's Brain
Language and Human's  Brain Language and Human's  Brain
Language and Human's Brain
 
Kalpana phonetics
Kalpana phoneticsKalpana phonetics
Kalpana phonetics
 
PHYSIOLOGY OF SPEECH & ARTICULATION
PHYSIOLOGY OF SPEECH & ARTICULATION PHYSIOLOGY OF SPEECH & ARTICULATION
PHYSIOLOGY OF SPEECH & ARTICULATION
 
Speech consideration in complete dentures
Speech consideration in complete denturesSpeech consideration in complete dentures
Speech consideration in complete dentures
 
Theories of speech perception.pptx
Theories of speech perception.pptxTheories of speech perception.pptx
Theories of speech perception.pptx
 
The Phases of Speech
The Phases of SpeechThe Phases of Speech
The Phases of Speech
 
Neurobiology of everyday life_Nicolle
Neurobiology of everyday life_NicolleNeurobiology of everyday life_Nicolle
Neurobiology of everyday life_Nicolle
 
SPEECH PERCEPTION THEORIES MASLP
SPEECH PERCEPTION THEORIES MASLPSPEECH PERCEPTION THEORIES MASLP
SPEECH PERCEPTION THEORIES MASLP
 
speech production in psycholinguistics
speech production in psycholinguistics speech production in psycholinguistics
speech production in psycholinguistics
 
The human mind at work
The human mind at workThe human mind at work
The human mind at work
 
the sounds of language
the sounds of languagethe sounds of language
the sounds of language
 
APHASIA AND DYSARTHRIA last.pptx
APHASIA AND DYSARTHRIA last.pptxAPHASIA AND DYSARTHRIA last.pptx
APHASIA AND DYSARTHRIA last.pptx
 

Recently uploaded

Correctly Loading Incremental Data at Scale
Correctly Loading Incremental Data at ScaleCorrectly Loading Incremental Data at Scale
Correctly Loading Incremental Data at ScaleAlluxio, Inc.
 
IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024Mark Billinghurst
 
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...srsj9000
 
Churning of Butter, Factors affecting .
Churning of Butter, Factors affecting  .Churning of Butter, Factors affecting  .
Churning of Butter, Factors affecting .Satyam Kumar
 
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdfCCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdfAsst.prof M.Gokilavani
 
Call Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile serviceCall Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile servicerehmti665
 
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)Dr SOUNDIRARAJ N
 
TechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catchers
TechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor CatchersTechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catchers
TechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catcherssdickerson1
 
Electronically Controlled suspensions system .pdf
Electronically Controlled suspensions system .pdfElectronically Controlled suspensions system .pdf
Electronically Controlled suspensions system .pdfme23b1001
 
Oxy acetylene welding presentation note.
Oxy acetylene welding presentation note.Oxy acetylene welding presentation note.
Oxy acetylene welding presentation note.eptoze12
 
complete construction, environmental and economics information of biomass com...
complete construction, environmental and economics information of biomass com...complete construction, environmental and economics information of biomass com...
complete construction, environmental and economics information of biomass com...asadnawaz62
 
An introduction to Semiconductor and its types.pptx
An introduction to Semiconductor and its types.pptxAn introduction to Semiconductor and its types.pptx
An introduction to Semiconductor and its types.pptxPurva Nikam
 
Risk Assessment For Installation of Drainage Pipes.pdf
Risk Assessment For Installation of Drainage Pipes.pdfRisk Assessment For Installation of Drainage Pipes.pdf
Risk Assessment For Installation of Drainage Pipes.pdfROCENODodongVILLACER
 
An experimental study in using natural admixture as an alternative for chemic...
An experimental study in using natural admixture as an alternative for chemic...An experimental study in using natural admixture as an alternative for chemic...
An experimental study in using natural admixture as an alternative for chemic...Chandu841456
 
INFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETE
INFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETEINFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETE
INFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETEroselinkalist12
 
Introduction to Machine Learning Unit-3 for II MECH
Introduction to Machine Learning Unit-3 for II MECHIntroduction to Machine Learning Unit-3 for II MECH
Introduction to Machine Learning Unit-3 for II MECHC Sai Kiran
 
Work Experience-Dalton Park.pptxfvvvvvvv
Work Experience-Dalton Park.pptxfvvvvvvvWork Experience-Dalton Park.pptxfvvvvvvv
Work Experience-Dalton Park.pptxfvvvvvvvLewisJB
 
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionSachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionDr.Costas Sachpazis
 

Recently uploaded (20)

Correctly Loading Incremental Data at Scale
Correctly Loading Incremental Data at ScaleCorrectly Loading Incremental Data at Scale
Correctly Loading Incremental Data at Scale
 
IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024
 
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
 
Churning of Butter, Factors affecting .
Churning of Butter, Factors affecting  .Churning of Butter, Factors affecting  .
Churning of Butter, Factors affecting .
 
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdfCCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
 
Call Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile serviceCall Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile service
 
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
 
TechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catchers
TechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor CatchersTechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catchers
TechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catchers
 
Electronically Controlled suspensions system .pdf
Electronically Controlled suspensions system .pdfElectronically Controlled suspensions system .pdf
Electronically Controlled suspensions system .pdf
 
Oxy acetylene welding presentation note.
Oxy acetylene welding presentation note.Oxy acetylene welding presentation note.
Oxy acetylene welding presentation note.
 
complete construction, environmental and economics information of biomass com...
complete construction, environmental and economics information of biomass com...complete construction, environmental and economics information of biomass com...
complete construction, environmental and economics information of biomass com...
 
An introduction to Semiconductor and its types.pptx
An introduction to Semiconductor and its types.pptxAn introduction to Semiconductor and its types.pptx
An introduction to Semiconductor and its types.pptx
 
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCRCall Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
 
Risk Assessment For Installation of Drainage Pipes.pdf
Risk Assessment For Installation of Drainage Pipes.pdfRisk Assessment For Installation of Drainage Pipes.pdf
Risk Assessment For Installation of Drainage Pipes.pdf
 
An experimental study in using natural admixture as an alternative for chemic...
An experimental study in using natural admixture as an alternative for chemic...An experimental study in using natural admixture as an alternative for chemic...
An experimental study in using natural admixture as an alternative for chemic...
 
INFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETE
INFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETEINFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETE
INFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETE
 
young call girls in Green Park🔝 9953056974 🔝 escort Service
young call girls in Green Park🔝 9953056974 🔝 escort Serviceyoung call girls in Green Park🔝 9953056974 🔝 escort Service
young call girls in Green Park🔝 9953056974 🔝 escort Service
 
Introduction to Machine Learning Unit-3 for II MECH
Introduction to Machine Learning Unit-3 for II MECHIntroduction to Machine Learning Unit-3 for II MECH
Introduction to Machine Learning Unit-3 for II MECH
 
Work Experience-Dalton Park.pptxfvvvvvvv
Work Experience-Dalton Park.pptxfvvvvvvvWork Experience-Dalton Park.pptxfvvvvvvv
Work Experience-Dalton Park.pptxfvvvvvvv
 
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionSachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
 

Unit 1 speech processing

  • 2. Course Objectives: 1. To enable the students to learn the fundamentals and classification of speech sounds. 2. To make the students to analyze and compare different speech parameters using various methods. 3. To equip the students with various speech modelling techniques. 4. To enable the students to acquire knowledge on various speech recognition systems. 5. To gain knowledge about the various methods used for the process of speech synthesis. 2 AAZHAGUJAISUDHANRITECE
  • 3. Course Outcomes: After completion of the course, it is expected that: The students will be able to 1. Explain the fundamentals and classification of speech sounds. 2. Analyse, Extract and compare the various speech parameters. 3. Apply an appropriate speech model for a given application. 4. Explain the various speech recognition systems. 5. Apply different speech synthesis techniques depending upon the classification of speech parameters. 3 AAZHAGUJAISUDHANRITECE
  • 4. UNIT I BASIC CONCEPTS Speech Fundamentals: Articulatory Phonetics – Production and Classification of Speech Sounds; Acoustic Phonetics – Acoustics of speech production; Review of Digital Signal Processing concepts; Short-Time Fourier Transform, Filter-Bank and LPC Methods. UNIT II SPEECH ANALYSIS Features, Feature Extraction and Pattern Comparison Techniques: Speech distortion measures– mathematical and perceptual – Log–Spectral Distance, Cepstral Distances, Weighted Cepstral Distances and Filtering, Likelihood Distortions, Spectral Distortion using a Warped Frequency Scale, LPC, PLP and MFCC Coefficients, Time Alignment and Normalization – Dynamic Time Warping, Multiple Time – Alignment Paths. UNIT III SPEECH MODELING Hidden Markov Models: Markov Processes, HMMs – Evaluation, Optimal State Sequence – Viterbi Search, Baum-Welch Parameter Re-estimation, Implementation issues. UNIT IV SPEECH RECOGNITION Large Vocabulary Continuous Speech Recognition: Architecture of a large vocabulary continuous speech recognition system – acoustics and language models – n-grams, context dependent sub-word units; Applications and present status. UNIT V SPEECH SYNTHESIS Text-to-Speech Synthesis: Concatenative and waveform synthesis methods, sub-word units for TTS, intelligibility and naturalness – role of prosody, Applications and present status. 4 AAZHAGUJAISUDHANRITECE
  • 5. INTRODUCTION  Speech processing is the study of speech signals and the processing methods of these signals.  Speech Processing is the application of DSP techniques to the processing and/or analysis of speech signals. 5 AAZHAGUJAISUDHANRITECE
  • 6.  Speech is the most natural form of human- human communications.  Speech is related to language; linguistics is a branch of social science.  Speech is related to human physiological capability; physiology is a branch of medical science.  Speech is also related to sound and acoustics, a branch of physical science.  Therefore, speech is one of the most intriguing signals that humans work with every day. 6 AAZHAGUJAISUDHANRITECE
  • 7. Speech Processing Signal Processing Information Theory Phonetics Acoustics Algorithms (Programming) Fourier transforms Discrete time filters AR(MA) models Entropy Communication theory Rate-distortion theory Statistical SP Stochastic models Psychoacoustics Room acoustics Speech production 7 AAZHAGUJAISUDHANRITECE
  • 8.  Analysis of speech signals:  Fourier analysis; spectrogram  Autocorrelation; pitch estimation  Linear prediction; compression, recognition  Cepstral analysis; pitch estimation, enhancement 8 AAZHAGUJAISUDHANRITECE
  • 9.  Speech coding: Compression of speech signals for telecommunication  Speech recognition: Extracting the linguistic content of the speech signal  Speaker recognition: Recognizing the identity of speakers by their voice  Speech synthesis: Computer generated speech (e.g., from text)  Speech enhancement: Improving intelligibility or perceptual quality of speech signal 9 AAZHAGUJAISUDHANRITECE
  • 10. APPLICATIONS  Translation of spoken language into text by computers  Voice user interfaces such as voice dialing (Call home)  Speech to text processing (Word processors or emails)  Recognizing the speaker 10 AAZHAGUJAISUDHANRITECE
  • 11. APPLICATIONS OF SPEECH PROCESSING  –Human computer interfaces(e.g., speechI/O, affective)  –Telecommunication(e.g., speech enhancement, translation)  –Assistive technologies(e.g., blindness/deafness, language learning)  –Audio mining(e.g., diarization, tagging)  –Security (e.g., biometrics, forensics) 11 AAZHAGUJAISUDHANRITECE
  • 14. Speech Generation •The production process (generation) begins when the talker formulates a message in his mind which he wants to transmit to the listener via speech. •In case of machine •First step: message formation in terms of printed text. •Next step: conversion of the message into a language code. •After the language code is chosen the talker must execute a series of neuromuscular commands to cause the vocal cord to vibrate such that the proper sequence of speech sounds is created. •The neuromuscular commands must simultaneously control the movement of lips, jaw, tongue, and velum. 14 AAZHAGUJAISUDHANRITECE
  • 15. SPEECH PERCEPTION  The speech signal is generated and propagated to the listener, the speech perception (recognition) process begins.  First the listener processes the acoustic signal along the basilar membrane in the inner ear, which provides running spectral analysis of the incoming signal.  A neural transduction process converts the spectral signal into activity signals on the auditory nerve.  Finally the message comprehension (understanding of meaning) is achieved. 15 AAZHAGUJAISUDHANRITECE
  • 16. SOUND PERCEPTION  The audible frequency range for human is approximately 20Hz to 20KHz  The three distinct parts of the ear are outer ear, middle ear and inner ear.  Outer ear:  The perceived sound is sensitive to the pinna’s shape  By changing the pinnas shape the sound quality alters as well as background noise  After passing through ear cannal sound wave strikes the eardrum which is part of middle ear. 16 AAZHAGUJAISUDHANRITECE
  • 19. MIDDLE EAR EAR DRUM  This oscillates with the frequency as that of the sound wave  Movements of this membrane are then transmitted through the system of small bones called as ossicular system  From ossicular system to cochlea.  Inner ear  It consist of two membranes Reissner’s membrane and basilar membrane  When vibrations enter cochlea they stimulate 20,000 to 30,000 stiff hairs on the basilar membrane  These hair in turn vibrate and generate electrical signal that travel to the brain and become sound 19 AAZHAGUJAISUDHANRITECE
  • 20. PHONEME HIERARCHY Speech sounds Vowels ConsonantsDiphtongs Plosive Nasal Fricative Retroflex liquid Lateral liquid Glide iy, ih, ae, aa, ah, ao,ax, eh, er, ow, uh, uw ay, ey, oy, aw w, y p, b, t, d, k, g m, n, ng f, v, th, dh, s, z, sh, zh, h r l Language dependent. About 50 in English. 20 AAZHAGUJAISUDHANRITECE
  • 21. SPEECH WAVEFORM CHARACTERISTICS  Loudness  Voiced/Unvoiced.  Pitch.  Fundamental frequency.  Spectral envelope.  Formants. 21 AAZHAGUJAISUDHANRITECE
  • 24. VOWELS  Vowels are produced by exciting an essentially fixed vocal tract shape with quasi periodic pulses of air caused by the vibration of the vocal cords.  A speech sound produced by humans when the breath flows out through the mouth without being blocked by the teeth, tongue, or lips A short vowel is a short sound as in the word "cup" A long vowel is a long sound as in the word "shoe" 24 AAZHAGUJAISUDHANRITECE
  • 27. WHY VOWELS ARE EASILY DECODABLE?  Vowels are generally long in duration as compared to consonants.  Spectrally well defined  Vowels are easily and reliably recognized by both human and machine.  Vowels can be subdivided into three sub groups based on tongue hump being along the front, central and back part of the palate. 27 AAZHAGUJAISUDHANRITECE
  • 28. VOWELS  For the vowel /i/ - eve, beat- the vocal tract is open at the back, the tongue is raised at the front and there is a high degree of constriction of the tongue against the palate  For the vowel /a/ - father, bob - the vocal tract is open at the front, the tongue is raised at the back and there is a low degree of constriction by the tongue against the palate 28 AAZHAGUJAISUDHANRITECE
  • 29.  i – IY - beat, eve  I – IH – bit  e – EH – bet, hate 29 AAZHAGUJAISUDHANRITECE
  • 30.  a – AA – Bob  ə – AH – but
  • 31.  u – UW - boot  U – UH –book  O – OW -boat 31 AAZHAGUJAISUDHANRITECE
  • 33. DIPHTHONGS  A diphthong is a gliding monosyllabic speech sound that starts at or near the articulatory position for one vowel and moves to or toward the position for another.  By this definition there are six diphthongs in American English.  Examples: buy, boy, down, bait.
  • 34. DIPHTHONGS  A vowel sound in which the tongue changes position to produce the sound of two vowels.  A sound formed by the combination of two vowels in a single syllable.
  • 35. SEMIVOWELS  The group of sounds consisting of /w/ (W, wit), /l/ (L, let) and /r/ (R, rent) is quite difficult to characterize.  These sounds are called semivowels because of their vowel-like nature.  They are characterized by a gliding transition in the vocal tract area function between adjacent phonemes.
  • 36. LIQUIDS  A liquid is a consonant produced when the tongue approaches a point of articulation within the mouth but does not come close enough to obstruct or constrict the air flow enough to create turbulence (as fricatives do).  The primary difference between liquids and glides is that with a liquid the tip of the tongue is raised, whereas with a glide the body of the tongue is raised rather than the tip.  /w/ – W – wit, /l/ – L – let.
  • 37. GLIDES  To glide is to move easily without stopping and without effort or noise.  A glide, like a liquid, is a consonant produced when the tongue approaches a point of articulation within the mouth but does not come close enough to obstruct or constrict the air flow enough to create turbulence.  Unlike nasals, the flow of air is not redirected into the nose; as with liquids, the air still escapes via the mouth.  /r/ – R – rent.
  • 38. CONSONANTS  A consonant is one of the speech sounds or letters of the alphabet that is not a vowel.  Consonants are pronounced by stopping the air from flowing easily through the mouth, especially by closing the lips or touching the teeth with the tongue.  A nasal consonant is one in which air escapes only through the nose; in English, "m" and "n" are nasal consonants.  In "hat", H and T are consonants.  Examples: m (me), n (no), ng (sing).
  • 39. NASAL CONSONANTS  A nasal is a consonant produced by redirecting air out through the nose instead of allowing it to escape through the mouth.  The nasal consonants /m/ (EM, bottom) and /n/ (EN, button) are produced with glottal excitation and the vocal tract totally constricted at some point along the oral passageway.  The velum is lowered so that air flows through the nasal tract, with sound radiated through the nostrils.  /m/ – constriction at the lips.  /n/ – constriction just behind the teeth.
  • 42. UNVOICED FRICATIVES  Produced by exciting the vocal tract by a steady air flow  Becomes turbulent in the region of a constriction in the vocal tract  Location of the constriction determines the fricative sound  /f/-Constriction is near the lips  /θ/- Constriction is near the teeth  /s/-Constriction is near the middle of the oral tract  /sh/-Constriction is near the back of the oral tract  Vocal tract is separated into two cavities by the source of noise at the constriction 42 AAZHAGUJAISUDHANRITECE
  • 43. VOICED FRICATIVES  /v/, /z/ and /zh/ are examples of voiced fricatives.  The place of constriction for each corresponding phoneme is essentially identical to that of its unvoiced counterpart.  The vocal cords vibrate, so two excitation sources are involved: the glottis and the turbulence at the constriction.  E.g.: vat, azure.
  • 44. STOPS/PLOSIVES  Produced by completely stopping the air flow  Airstream cannot escape through the mouth 44 AAZHAGUJAISUDHANRITECE
  • 46. VOICED STOPS  These are transient, non continuant sounds produced by building up pressure behind a total constriction somewhere in the oral tract and then suddenly releasing the pressure  /b/- Constriction is at the lips  /d/- Constriction is at the back of the teeth  /g/- Constriction is near the velum  No sound is radiated from the lips  Vocal cords will vibrate  Their properties are highly influenced by the vowel that follows the stop consonant. 46 AAZHAGUJAISUDHANRITECE
  • 47. UNVOICED STOPS  /p/,/t/ and /k/ are some examples  The vocal cords do not vibrate 47 AAZHAGUJAISUDHANRITECE
  • 49. WHISPERS  The vocal cords do not vibrate.  Air passes between the arytenoid cartilages to create audible turbulence during speech.  Whispering is used to convey information without being overheard, or to avoid disturbing others in a quiet place such as a library or place of worship.
  • 51. APPROACHES TO AUTOMATIC SPEECH RECOGNITION BY MACHINE  There are three approaches:  The acoustic-phonetic approach  The pattern recognition approach  The artificial intelligence approach
  • 52. ACOUSTIC-PHONETIC APPROACH  First step: Segmentation and Labelling  Second step: To determine a valid word 52 AAZHAGUJAISUDHANRITECE
  • 53. SEGMENTATION AND LABELLING  Segmenting the speech signal into discrete regions depending on the acoustic properties of the signal.  Attaching one or more phonetic labels to each segmented region.  The second step attempts to determine a valid word from the sequence of phonetic labels.  The problem is to decode the phoneme lattice into a word string such that every instant of time is included in one of the phonemes in the lattice.
  • 54.  One phoneme can be pronounced in different ways; a phone group containing similar variants of a single phoneme is called an allophone.  The symbol SIL denotes silence.  SIL – AO – L – AX – B – AW – T: "all about" (for the lattice structure, refer to page 38 of the textbook).  L, AX and B correspond to the second and third choices in the lattice.
  • 55. PROBLEMS IN ACOUSTIC PHONETIC APPROACH  The method requires extensive knowledge of the acoustic properties of phonetic units  For most systems the choice of features is based on intuition and is not optimal in a well defined and meaningful sense  The design of sound classifiers is also not optimal  No well-defined, automatic procedure exists for tuning the method on real, labeled speech. 55 AAZHAGUJAISUDHANRITECE
  • 56. PATTERN RECOGNITION APPROACH  Speech patterns are used directly, without explicit feature determination and segmentation.  Step one: training of speech patterns.  Step two: recognition of patterns via pattern comparison.
  • 57. PATTERN RECOGNITION APPROACH  Speech knowledge is brought into the system via the training procedure.  Enough versions of a pattern to be recognized are included in a training set provided to the algorithm.  The machine learns which acoustic properties of the speech class are reliable and repeatable across all training tokens of the pattern.
  • 58. ADVANTAGES  Simplicity of use – the mathematical representation is easy.  Robustness and invariance to different speech vocabularies, users, feature sets, pattern comparison algorithms and decision rules.  Proven high performance.
  • 59. ARTIFICIAL INTELLIGENCE APPROACH  It is a hybrid of the acoustic-phonetic and pattern recognition approaches.  This approach models the recognition procedure on the way a person applies intelligence in visualizing, analyzing, and finally making a decision based on measured acoustic features.  Neural networks are used for learning the relationship between phonetic events and all known inputs, as well as for discriminating between similar sound classes.
  • 61. SPEECH ANALYSIS SYSTEM  It provides an appropriate spectral representation of the time-varying speech signal.  The technique used is the linear predictive coding (LPC) method.
  • 62. FEATURE DETECTION STAGE  Convert the spectral measurements to a set of features that describe the acoustic properties of the different phonetic units.  Features  Nasality: presence or absence of nasal resonance  Friction: presence or absence of random excitation in the speech  Formant locations: frequencies of the first three resonances  Voiced and unvoiced classification: periodic and aperiodic excitation 62 AAZHAGUJAISUDHANRITECE
  • 63. SEGMENTATION AND LABELLING PHASE  The system tries to find stable regions and to label each segmented region according to how well the features within that region match those of individual phonetic units.  This stage is the heart of the acoustic-phonetic recognizer and is the most difficult one to carry out reliably.  Various control strategies are used to limit the range of segmentation points and label possibilities.  The final output of the recognizer is the word or word sequence that best matches, in some well-defined sense.
  • 64. VOWEL CLASSIFIER  Formants are bands of frequency that determine the phonetic quality of a vowel.  Compact sounds have a concentration of energy in the middle of the frequency range of the spectrum; an example is the vowel ɑ, which has a relatively high first formant close in frequency to the second formant.  The opposite of compact is diffuse.  A diffuse vowel, such as i, has no centrally located concentration of energy – the first and second formants are widely separated.
  • 65. COMPACT AND DIFFUSE VOWEL 65 AAZHAGUJAISUDHANRITECE
  • 66. ACUTE AND GRAVE  "Acute" typically refers to front vowels.  "Grave" typically refers to back vowels.
  • 67.  Three features have been detected over the segment, first formant, F1, second formant, F2, and duration of the segment, D.  The first test separates vowels with low F1 from vowels with high F1.  Each of these subsets can be split further on the basis of F2 measurement with high F2 and low F2.  The third test is based on segment duration, which separates tense vowels (large value of D) from lax vowels (small values of D).  Finally, a finer test on formant values separates the remaining unresolved vowels, resolving the vowels into flat vowels and plain vowels. 67 AAZHAGUJAISUDHANRITECE
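The three-level test described above can be sketched as a small decision function. The split thresholds below are hypothetical placeholders (the slide gives no numeric values); a real classifier would estimate them from training data:

```python
def vowel_branch(f1, f2, d, f1_split=500.0, f2_split=1500.0, d_split=0.12):
    """Walk the three-level vowel decision tree from the slide.

    f1, f2 in Hz, segment duration d in seconds.  The split values are
    illustrative assumptions, not values taken from the text.
    """
    branch = []
    branch.append("low F1" if f1 < f1_split else "high F1")   # first test
    branch.append("high F2" if f2 > f2_split else "low F2")   # second test
    branch.append("tense" if d > d_split else "lax")          # duration test
    return branch
```

A front, high, long vowel such as /iy/ (low F1, high F2, long duration) would follow the "low F1" / "high F2" / "tense" branch; the final flat-versus-plain split would need a finer formant test, as the slide notes.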
  • 69. STATISTICAL PATTERN- RECOGNITION APPROACH TO SPEECH RECOGNITION 69 AAZHAGUJAISUDHANRITECE
  • 70.  Feature measurement: a sequence of measurements is made on the input signal to define the "test pattern".  The feature measurements are usually the output of a spectral analysis technique, such as a filter-bank analyzer, LPC analysis, or DFT analysis.  Pattern training: creates a reference pattern for each sound class, called a template.  Pattern classification: the unknown test pattern is compared with each class reference pattern, and a measure of distance between the test pattern and each reference pattern is computed.  Decision logic: the reference pattern similarity (distance) scores are used to decide which reference pattern best matches the unknown test pattern.
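As a minimal sketch of the classification and decision-logic stages, a test pattern can be compared against stored templates with a simple distance measure. The template values and the use of plain Euclidean distance are illustrative assumptions (later units introduce proper distortion measures such as cepstral distances):

```python
import numpy as np

# One reference template per sound class, as produced by pattern training.
# The feature values below are made up for illustration.
templates = {
    "class_A": np.array([1.0, 0.2, 0.5]),
    "class_B": np.array([0.1, 0.9, 0.4]),
}

def nearest_class(test_pattern):
    """Pattern classification + decision logic: score the test pattern
    against every reference pattern and pick the smallest distance."""
    distances = {name: float(np.linalg.norm(test_pattern - ref))
                 for name, ref in templates.items()}
    return min(distances, key=distances.get), distances
```

Running `nearest_class` on a vector close to the first template returns `"class_A"` together with the full distance-score table that the decision logic inspects.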
  • 71. STRENGTHS AND WEAKNESSES OF THE PATTERN-RECOGNITION MODEL  The performance of the system is sensitive to the amount of training data available for creating sound-class reference patterns (more training, higher performance).  The reference patterns are sensitive to the speaking environment and to the transmission characteristics of the medium used to create the speech, because speech spectral characteristics are affected by transmission and background noise.
  • 72.  The method is relatively insensitive to syntax and semantics.  The system is insensitive to the sound class, so the techniques can be applied to a wide range of speech sounds (phrases).
  • 73. AI APPROACHES TO SPEECH RECOGNITION  The basic idea of AI is to compile and incorporate the knowledge from variety of knowledge sources to solve the problem.  Acoustic Knowledge: Knowledge related to sound or sense of hearing  Lexical Knowledge: Knowledge of the words of the language. (decomposing words into sounds)  Syntactic Knowledge: Knowledge of syntax (rules)  Semantic Knowledge: Knowledge of the meaning of the language.  Pragmatic Knowledge: (sense derived from meaning) inference ability necessary in resolving ambiguity of meaning based on ways in which words are generally used. 73 AAZHAGUJAISUDHANRITECE
  • 74. SEVERAL WAYS TO INTEGRATE KNOWLEDGE SOURCES WITHIN A SPEECH RECOGNIZER  1. Bottom-Up  2. Top-Down  3. Black Board 74 AAZHAGUJAISUDHANRITECE
  • 75. BOTTOM-UP APPROACH  The lowest-level processes (feature detection, phonetic decoding) precede higher-level processes (lexical decoding) in a sequential manner.
  • 76. TOP-DOWN APPROACH  The language model generates word hypotheses that are matched against the speech signal, and syntactically and semantically meaningful sentences are built up on the basis of word match scores.
  • 77. 77 Signal Processing And Analysis Methods For Speech Recognition A AZHAGUJAISUDHAN RIT ECE
  • 78. Introduction • Spectral analysis is the process of representing the speech signal in terms of different parameters for further processing, e.g. short-term energy, zero-crossing rates, level-crossing rates and so on • Methods for spectral analysis are therefore considered the core of the signal-processing front end in a speech recognition system
  • 79. 80 Spectral Analysis models • Pattern recognition model • Acoustic phonetic model A AZHAGUJAISUDHAN RIT ECE
  • 80. 81 Spectral Analysis Model Parameter measurement is common in both the systems A AZHAGUJAISUDHAN RIT ECE
  • 81. 82 Pattern recognition Model • The three basic steps in pattern recognition model are – 1. parameter measurement – 2. pattern comparison – 3. decision making A AZHAGUJAISUDHAN RIT ECE
  • 82. 1. Parameter measurement • Represents the relevant acoustic events in the speech signal in terms of a compact, efficient set of speech parameters • The choice of which parameters to use is dictated by other considerations, e.g. – computational efficiency – type of implementation – available memory • The way in which the representation is computed is based on signal-processing considerations
  • 83. 84 Acoustic phonetic Model A AZHAGUJAISUDHAN RIT ECE
  • 84. 85 Spectral Analysis • Two methods: – The Filter Bank spectrum – The Linear Predictive coding (LPC) A AZHAGUJAISUDHAN RIT ECE
  • 85. The Filter Bank spectrum  Digital input → bank of bandpass filters → spectral representation  The bandpass filters' coverage spans the frequency range of interest in the signal
  • 86. 87 1.The Bank of Filters Front end Processor • One of the most common approaches for processing the speech signal is the bank-of-filters model • This method takes a speech signal as input and passes it through a set of filters in order to obtain the spectral representation of each frequency band of interest. A AZHAGUJAISUDHAN RIT ECE
  • 87. • E.g. • 100–3000 Hz for a telephone-quality signal • 100–8000 Hz for a broadband signal • The individual filters generally do overlap in frequency • The output of the ith bandpass filter is X_n(e^{jω_i}), where ω_i is the normalized frequency of the filter's center frequency
  • 88. 89 • Each bandpass filter processes the speech signal independently to produce the spectral representation Xn A AZHAGUJAISUDHAN RIT ECE
  • 89. 90 The Bank of Filters Front end Processor A AZHAGUJAISUDHAN RIT ECE
  • 90. The Bank of Filters Front end Processor  The sampled speech signal, s(n), is passed through a bank of Q bandpass filters, giving the signals

s_i(n) = s(n) * h_i(n) = Σ_{m=0}^{M_i−1} h_i(m) s(n−m),  1 ≤ i ≤ Q
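The convolution sum above can be sketched directly in Python. The choice of a Hamming-windowed cosine as each bandpass impulse response is an assumption for illustration (the slide does not fix a particular filter design), but it matches the windowed-modulation idea used later in the FIR-filter slides:

```python
import numpy as np

def filter_bank(s, fs, centers, taps=101):
    """s_i(n) = sum_m h_i(m) s(n-m) for each channel i.

    Each h_i is a Hamming-windowed cosine at the channel's center
    frequency (an illustrative design, not one fixed by the text).
    """
    n = np.arange(taps) - taps // 2
    w = np.hamming(taps)
    outputs = []
    for fc in centers:
        h = w * np.cos(2 * np.pi * fc / fs * n)      # h_i(m)
        outputs.append(np.convolve(s, h, mode="same"))
    return outputs

# A 1 kHz test tone should emerge mainly from the 1 kHz channel.
fs = 8000
t = np.arange(fs) / fs
s = np.sin(2 * np.pi * 1000 * t)
bands = filter_bank(s, fs, centers=[500, 1000, 2000])
energies = [float(np.sum(b ** 2)) for b in bands]
```

The channel energies act as the crude spectral representation: the channel whose passband contains the tone dominates.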
  • 91. 92 The Bank of Filters Front end Processor The bank-of-filters approach obtains the energy value of the speech signal considering the following steps: • Signal enhancement and noise elimination.- To make the speech signal more evident to the bank of filters. • Set of bandpass filters.- Separate the signal in frequency bands. (uniform/non uniform filters ) A AZHAGUJAISUDHAN RIT ECE
  • 92. 93 • Nonlinearity.- The filtered signal at every band is passed through a non linear function (for example a wave rectifier full wave or half wave) for shifting the bandpass spectrum to the low-frequency band. A AZHAGUJAISUDHAN RIT ECE
  • 93. 94 The Bank of Filters Front end Processor • Low pass filter.- This filter eliminates the high-frequency generated by the non linear function. • Sampling rate reduction and amplitude compression.- The resulting signals are now represented in a more economic way by re-sampling with a reduced rate and compressing the signal dynamic range. The role of the final lowpass filter is to eliminate the undesired spectral peaks A AZHAGUJAISUDHAN RIT ECE
  • 94. The Bank of Filters Front end Processor  Assume that the output of the ith bandpass filter is a pure sinusoid at frequency ω_i:

s_i(n) = α_i sin(ω_i n)

If a full-wave rectifier is used as the nonlinearity:

f(s_i(n)) = s_i(n) for s_i(n) ≥ 0,  f(s_i(n)) = −s_i(n) for s_i(n) < 0

The nonlinearity output is v_i(n) = f(s_i(n)) = s_i(n)·w(n), where w(n) = +1 if s_i(n) ≥ 0 and w(n) = −1 if s_i(n) < 0
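A quick numerical check of the rectifier's effect, assuming a unit-amplitude sinusoid (α_i = 1): full-wave rectification produces a nonzero mean of 2α_i/π, i.e. the energy shifted toward DC that the subsequent lowpass filter extracts, whereas the sinusoid itself averages to zero over whole periods:

```python
import numpy as np

n = np.arange(10000)
omega = 2 * np.pi / 100          # 100 samples per period
s_i = np.sin(omega * n)          # s_i(n) = sin(w_i n), alpha_i = 1
v_i = np.abs(s_i)                # full-wave rectifier output
dc = float(np.mean(v_i))         # close to 2/pi, the DC component
```

This is the low-frequency term the slide refers to when it says the nonlinearity "shifts the bandpass spectrum to the low-frequency band".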
  • 95. 97 Types of Filter Bank Used For Speech Recognition • uniform filter bank • Non uniform filter bank A AZHAGUJAISUDHAN RIT ECE
  • 96. Uniform filter bank  The most common filter bank is the uniform filter bank  The center frequency, f_i, of the ith bandpass filter is defined as

f_i = (Fs/N)·i,  1 ≤ i ≤ Q

where Fs is the sampling rate of the speech signal, N is the number of uniformly spaced filters required to span the frequency range of the speech, and Q is the number of filters used in the bank
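A minimal sketch of the spacing rule; the values of Fs, N and Q below are examples chosen for illustration (none are fixed by the text):

```python
# Uniform filter-bank center frequencies f_i = i * Fs / N, 1 <= i <= Q.
Fs = 10000            # sampling rate in Hz (example value)
N = 20                # uniformly spaced filters needed to span the range
Q = 10                # filters actually used; note Q <= N/2

centers = [i * Fs / N for i in range(1, Q + 1)]   # 500, 1000, ..., 5000 Hz
```

With these example values the channel spacing is Fs/N = 500 Hz and the last center frequency lands at Fs/2, the top of the usable band.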
  • 97. Uniform filter bank  The actual number of filters used in the filter bank satisfies Q ≤ N/2  b_i is the bandwidth of the ith filter  There should not be any frequency overlap between adjacent filter channels
  • 98. Uniform filter bank  If b_i < Fs/N, then certain portions of the speech spectrum would be missing from the analysis and the resulting speech spectrum would not be considered very meaningful
  • 99. Nonuniform filter bank  An alternative to the uniform filter bank is the nonuniform filter bank  The criterion is to space the filters uniformly along a logarithmic frequency scale  For a set of Q bandpass filters with center frequencies f_i and bandwidths b_i, 1 ≤ i ≤ Q, we set the bandwidths to grow logarithmically, b_{i+1} = α·b_i with α > 1, with adjacent passbands abutting: f_{i+1} = f_i + (b_i + b_{i+1})/2
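Under the assumption of logarithmic bandwidth growth with abutting passbands, the bank can be generated iteratively. The first bandwidth C, first center f1, and α = 2 (octave spacing) below are example values, not parameters fixed by the text:

```python
def log_filter_bank(Q, C=100.0, f1=150.0, alpha=2.0):
    """Nonuniform (log-spaced) bank: each bandwidth grows by a factor
    alpha, and each center sits half a bandwidth past its neighbour's
    upper band edge.  All default values are illustrative."""
    centers, bandwidths = [f1], [C]
    for _ in range(Q - 1):
        b_next = alpha * bandwidths[-1]
        centers.append(centers[-1] + (bandwidths[-1] + b_next) / 2.0)
        bandwidths.append(b_next)
    return centers, bandwidths
```

With α = 2 each successive channel spans an octave, which is why such banks need far fewer channels than a uniform bank covering the same range.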
  • 101. 103 Implementations of Filter Banks • Depending on the method of designing the filter bank can be implemented in various ways. • Design methods for digital filters fall into two classes: – Infinite impulse response (IIR) (recursive filters) – Finite impulse response A AZHAGUJAISUDHAN RIT ECE
  • 102. The FIR filter (finite impulse response), or nonrecursive filter • The present output depends on the present input sample and previous input samples • The impulse response is restricted to a finite number of samples
  • 103. • Advantages: – Stable; less severely affected by noise – Excellent design methods are available for various kinds of FIR filters – Phase response is linear • Disadvantages: – Costly to implement – Memory requirement and execution time are high – Requires powerful computational facilities
  • 104. The IIR filter (infinite impulse response), or recursive filter • The present output sample depends on the present input, past input samples and past output samples • The impulse response extends over an infinite duration
  • 105. • Advantages: – Simple to design – Efficient • Disadvantages: – Phase response is nonlinear – More strongly affected by noise – Can be unstable
  • 107. FIR Filters  A less expensive implementation can be derived by representing each bandpass filter as a fixed lowpass window w(n) modulated by a complex exponential:

h_i(n) = w(n) e^{jω_i n}

x_i(n) = Σ_m s(m) h_i(n−m) = Σ_m s(m) w(n−m) e^{jω_i(n−m)}
       = e^{jω_i n} Σ_m s(m) w(n−m) e^{−jω_i m}
       = e^{jω_i n} S_n(e^{jω_i})

where S_n(e^{jω_i}) is the Fourier transform of the windowed signal evaluated at ω_i = 2πf_i/Fs
  • 110. Frequency Domain Interpretation For Short Term Fourier Transform

S_n(e^{jω_i}) = Σ_m s(m) w(n−m) e^{−jω_i m}

At n = n0:

S_{n0}(e^{jω_i}) = FT[s(m) w(n0−m)] |_{ω=ω_i}

where FT[.] denotes the Fourier transform.  S_{n0}(e^{jω_i}) is the conventional Fourier transform of the windowed signal, s(m)w(n0−m), evaluated at the frequency ω = ω_i
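The short-time Fourier transform can be evaluated directly at a single analysis time n and frequency ω by windowing the L samples ending at n. The Hamming window and L = 256 are illustrative choices, not values from the slides:

```python
import numpy as np

def stft_at(s, n, omega, L=256):
    """S_n(e^{jw}) = sum_m s(m) w(n-m) e^{-jwm} at one (n, w) point.

    Uses a Hamming analysis window; assumes n >= L-1 so the window
    support m = n-L+1 .. n lies inside the signal.
    """
    m = np.arange(n - L + 1, n + 1)      # support of w(n-m)
    win = np.hamming(L)[::-1]            # w(n-m) as m increases
    return np.sum(s[m] * win * np.exp(-1j * omega * m))

# A pure tone concentrates its short-time spectrum near its frequency:
omega0 = 0.3
sig = np.cos(omega0 * np.arange(4096))
peak = abs(stft_at(sig, 1000, omega0))   # evaluated on the tone frequency
off = abs(stft_at(sig, 1000, 1.5))       # evaluated far from it
```

Sweeping `omega` over a grid of frequencies at fixed n yields one column of a spectrogram; the resolution trade-off in L is exactly the long-versus-short window behaviour discussed on the following slides.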
  • 111. Frequency Domain Interpretation For Short Term Fourier Transform  The figure shows which parts of s(m) are used in the computation of the short-time Fourier transform
  • 112. Frequency Domain Interpretation For Short Term Fourier Transform • Since w(m) is an FIR filter of length L, from the definition of S_n(e^{jω_i}) we can state that – If L is large relative to the signal periodicity, then S_n(e^{jω_i}) gives good frequency resolution – If L is small relative to the signal periodicity, then S_n(e^{jω_i}) gives poor frequency resolution
  • 113. Frequency Domain Interpretation For Short Term Fourier Transform  An L = 500 point Hamming window is applied to a section of voiced speech.  The periodicity of the signal is seen in the windowed time waveform as well as in the short-time spectrum, in which the fundamental frequency and its harmonics show up as narrow peaks at equally spaced frequencies.
  • 114. Frequency Domain Interpretation For Short Term Fourier Transform  For short windows, the time sequence s(m)w(n−m) does not show the signal periodicity, nor does the signal spectrum; however, the spectrum shows the broad spectral envelope very well.
  • 115. 117 Frequency Domain Interpretation For Short Term Fourier Transform Shows irregular series of local peaks and valleys due to the random nature of the unvoiced speech A AZHAGUJAISUDHAN RIT ECE
  • 116. 118 Frequency Domain Interpretation For Short Term Fourier Transform Using the shorter window smoothes out the random fluctuations in the short time spectral magnitude and shows the broad spectral envelope very well A AZHAGUJAISUDHAN RIT ECE
  • 117. Linear Filtering Interpretation of the short-time Fourier Transform • In the linear filtering interpretation,

S_n(e^{jω_i}) = [s(n) e^{−jω_i n}] * w(n)

i.e. S_n(e^{jω_i}) is a convolution of the lowpass window, w(n), with the speech signal, s(n), modulated down from the center frequency ω_i
  • 120. 122 Summary of considerations for speech recognition filter banks 1st . Type of digital filter used (IIR (recursive) or FIR (nonrecursive)) • IIR: Advantage: simple to implement and efficient. Disadvantage: phase response is nonlinear • FIR: Advantage: phase response is linear Disadvantage: expensive in implementation A AZHAGUJAISUDHAN RIT ECE
  • 121. Summary of considerations for speech recognition filter banks  2nd: The number of filters to be used in the filter bank.  1. For uniform filter banks the number of filters, Q, cannot be too small, or else the ability of the filter bank to resolve the speech spectrum is greatly degraded; values of Q less than 8 are generally avoided.  2. The value of Q cannot be too large either, because the filter bandwidths would eventually be too narrow for some talkers (e.g. high-pitched females), i.e. no prominent harmonic would fall within some bands.  (In practical systems Q ≤ 32.)
  • 122. 124 Summary of considerations for speech recognition filter banks In order to reduce overall computation, many practical systems have used nonuniform spaced filter banks A AZHAGUJAISUDHAN RIT ECE
  • 123. 125 Summary of considerations for speech recognition filter banks 3rd . The choice of nonlinearity and LPF used at the output of each channel • Nonlinearity: Full wave or Half wave rectifier • LPF: varies from simple integrator to a good quality IIR lowpass filter. A AZHAGUJAISUDHAN RIT ECE
  • 126. LINEAR PREDICTIVE CODING MODEL FOR SPEECH RECOGNITION  LPC provides a good model of the speech signal.  Voiced regions – good approximation.  Unvoiced regions – less effective than for voiced regions.  LPC is an analytically tractable model; the method is mathematically precise and is simple and straightforward to implement in either software or hardware.  The computation required in LPC processing is considerably less than that required for an all-digital implementation of the bank-of-filters model.  The LPC model works well in recognition applications.
  • 127. 3.3.1 The LPC Model

s(n) ≈ a_1 s(n−1) + a_2 s(n−2) + ... + a_p s(n−p)

Convert this to an equality by including an excitation term:

s(n) = Σ_{i=1}^{p} a_i s(n−i) + G u(n)

S(z) = Σ_{i=1}^{p} a_i z^{−i} S(z) + G U(z)

H(z) = S(z) / (G U(z)) = 1 / (1 − Σ_{i=1}^{p} a_i z^{−i}) = 1 / A(z)
  • 128. 3.3.2 LPC Analysis Equations

s(n) = Σ_{k=1}^{p} a_k s(n−k) + G u(n)

The predictor:  s̃(n) = Σ_{k=1}^{p} a_k s(n−k)

The prediction error:  e(n) = s(n) − s̃(n) = s(n) − Σ_{k=1}^{p} a_k s(n−k)

Error transfer function:  E(z)/S(z) = A(z) = 1 − Σ_{k=1}^{p} a_k z^{−k}
  • 129. 3.3 LINEAR PREDICTIVE CODING MODEL FOR SPEECH RECOGNITION  Block diagram: u(n) → ×G → 1/A(z) → s(n)
  • 130. Linear Prediction Model  Using LP analysis: a DT impulse generator (voiced, driven by a pitch estimate) or a white-noise generator (unvoiced) is selected by a voiced/unvoiced switch, scaled by a gain estimate G, and drives a time-varying digital filter whose parameters model the vocal tract, producing the speech signal s(n)
  • 131.  The basic problem of linear prediction analysis is to determine the set of predictor coefficients a_k.  Spectral characteristics of speech vary over time, so the predictor coefficients at a given time n must be estimated from a short segment of the speech signal.  Short-time spectral analysis is performed on successive frames of speech, with frame spacing on the order of 10 ms.
  • 132. 3.3.2 LPC Analysis Equations  Define the short-time segments s_n(m) = s(n+m) and e_n(m) = e(n+m).  We seek to minimize the mean squared error signal:

E_n = Σ_m e_n²(m) = Σ_m [ s_n(m) − Σ_{k=1}^{p} a_k s_n(m−k) ]²
  • 133.  Setting ∂E_n/∂a_k = 0 for k = 1, 2, ..., p gives

Σ_m s_n(m−i) s_n(m) = Σ_{k=1}^{p} â_k Σ_m s_n(m−i) s_n(m−k)   (*)

In terms of the short-term covariance φ_n(i,k) = Σ_m s_n(m−i) s_n(m−k), we can write (*) as:

Σ_{k=1}^{p} â_k φ_n(i,k) = φ_n(i,0),  i = 1, 2, ..., p

A set of p equations in p unknowns
  • 134. 3.3.2 LPC Analysis Equations  The minimum mean-squared error can be expressed as:

E_n = Σ_m s_n²(m) − Σ_{k=1}^{p} â_k Σ_m s_n(m) s_n(m−k) = φ_n(0,0) − Σ_{k=1}^{p} â_k φ_n(0,k)
  • 135. 3.3.3 The Autocorrelation Method

s_n(m) = s(m+n) w(m) for 0 ≤ m ≤ N−1, and s_n(m) = 0 otherwise

where w(m) is a window that is zero outside 0 ≤ m ≤ N−1.  The mean squared error is:

E_n = Σ_{m=0}^{N+p−1} e_n²(m)

and:

φ_n(i,k) = Σ_{m=0}^{N+p−1} s_n(m−i) s_n(m−k),  1 ≤ i ≤ p, 0 ≤ k ≤ p
         = Σ_{m=0}^{N−1−(i−k)} s_n(m) s_n(m+i−k),  1 ≤ i ≤ p, 0 ≤ k ≤ p
  • 136. 3.3.3 The Autocorrelation Method  Since φ_n(i,k) is only a function of i−k, the covariance function reduces to the simple autocorrelation function:

φ_n(i,k) = r_n(i−k) = Σ_{m=0}^{N−1−(i−k)} s_n(m) s_n(m+i−k)
  • 137. 3.3.3 The Autocorrelation Method  Since the autocorrelation function is symmetric, i.e. r_n(−k) = r_n(k):

Σ_{k=1}^{p} â_k r_n(|i−k|) = r_n(i),  1 ≤ i ≤ p

and this can be expressed in matrix form as:

| r_n(0)    r_n(1)    r_n(2)   ...  r_n(p−1) | | â_1 |   | r_n(1) |
| r_n(1)    r_n(0)    r_n(1)   ...  r_n(p−2) | | â_2 |   | r_n(2) |
| r_n(2)    r_n(1)    r_n(0)   ...  r_n(p−3) | | â_3 | = | r_n(3) |
|   ...       ...       ...    ...    ...    | | ... |   |  ...   |
| r_n(p−1)  r_n(p−2)  r_n(p−3) ...  r_n(0)   | | â_p |   | r_n(p) |
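This Toeplitz system is conventionally solved with the Levinson-Durbin recursion, an O(p²) method that exploits the Toeplitz structure. The slides state only the system of equations, so the solver below is the standard textbook method rather than code from the deck:

```python
import numpy as np

def levinson_durbin(r, p):
    """Solve sum_{k=1..p} a_k r(|i-k|) = r(i) for the predictor
    coefficients, given autocorrelation values r[0..p].

    Returns (a[1..p], final prediction-error energy E_p).
    """
    a = np.zeros(p + 1)          # a[1..p] hold the predictor coefficients
    err = r[0]                   # order-0 prediction error energy
    for i in range(1, p + 1):
        # reflection coefficient for order i
        k = (r[i] - np.dot(a[1:i], r[1:i][::-1])) / err
        a_prev = a.copy()
        a[i] = k
        for j in range(1, i):    # update lower-order coefficients
            a[j] = a_prev[j] - k * a_prev[i - j]
        err *= (1.0 - k * k)
    return a[1:], err
```

As a sanity check, for p = 2 the recursion must agree with a direct solve of the 2×2 Toeplitz system; with r = [1.0, 0.5, 0.25] the exact solution is â = (0.5, 0).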
  • 138–140. 3.3.3 The Autocorrelation Method (figure slides)
  • 141. 3.3.4 The Covariance Method  Change the interval over which the error is computed to 0 ≤ m ≤ N−1 and use the unweighted speech directly:

E_n = Σ_{m=0}^{N−1} e_n²(m)

with φ_n(i,k) defined as:

φ_n(i,k) = Σ_{m=0}^{N−1} s_n(m−i) s_n(m−k),  1 ≤ i ≤ p, 0 ≤ k ≤ p

or, by a change of variables,

φ_n(i,k) = Σ_{m=−i}^{N−i−1} s_n(m) s_n(m+i−k),  1 ≤ i ≤ p, 0 ≤ k ≤ p
  • 142. 3.3.4 The Covariance Method

| φ_n(1,1)  φ_n(1,2)  φ_n(1,3)  ...  φ_n(1,p) | | â_1 |   | φ_n(1,0) |
| φ_n(2,1)  φ_n(2,2)  φ_n(2,3)  ...  φ_n(2,p) | | â_2 |   | φ_n(2,0) |
| φ_n(3,1)  φ_n(3,2)  φ_n(3,3)  ...  φ_n(3,p) | | â_3 | = | φ_n(3,0) |
|   ...       ...       ...     ...    ...    | | ... |   |   ...    |
| φ_n(p,1)  φ_n(p,2)  φ_n(p,3)  ...  φ_n(p,p) | | â_p |   | φ_n(p,0) |

The resulting covariance matrix is symmetric, but not Toeplitz, and can be solved efficiently by a set of techniques called Cholesky decomposition
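As a sketch of the Cholesky solve mentioned above: factor the symmetric positive-definite matrix Φ = L·Lᵀ, then forward- and back-substitute. The Φ and ψ values below are small made-up numbers for illustration, not covariances from real speech:

```python
import numpy as np

# Covariance-method normal equations Phi @ a = psi (illustrative values).
Phi = np.array([[4.0, 1.0, 0.5],
                [1.0, 3.0, 0.8],
                [0.5, 0.8, 2.0]])
psi = np.array([1.0, 0.5, 0.25])

L = np.linalg.cholesky(Phi)     # lower-triangular factor: Phi = L @ L.T
y = np.linalg.solve(L, psi)     # forward substitution:  L y = psi
a = np.linalg.solve(L.T, y)     # back substitution:     L.T a = y
```

The two triangular solves cost O(p²) each once the O(p³) factorization is done, which is why Cholesky decomposition is the standard route for the non-Toeplitz covariance system.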
  • 143. 3.3.6 Examples of LPC Analysis (figure slide)
  • 144. REFERENCES  TEXTBOOKS:  1. Lawrence Rabiner and Biing-Hwang Juang, "Fundamentals of Speech Recognition", Pearson Education, 2003.  2. Daniel Jurafsky and James H. Martin, "Speech and Language Processing – An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition", Pearson Education, 2002.  3. Frederick Jelinek, "Statistical Methods of Speech Recognition", MIT Press, 1997.  REFERENCES:  1. Steven W. Smith, "The Scientist and Engineer's Guide to Digital Signal Processing", California Technical Publishing, 1997.  2. Thomas F. Quatieri, "Discrete-Time Speech Signal Processing – Principles and Practice", Pearson Education, 2004.  3. Claudio Becchetti and Lucio Prina Ricotti, "Speech Recognition", John Wiley and Sons, 1999.  4. Ben Gold and Nelson Morgan, "Speech and Audio Signal Processing: Processing and Perception of Speech and Music", Wiley-India Edition, 2006.

Editor's Notes

  1. On board: Presentation of source-filter model.
  2. Here the bandwidth of the all filters is not same. It keeps on increasing logarithmically. For uniform filters the bandwidth that each individual filters spans is the same. So the name uniform.
  3. Recursion: an expression such that each term is generated by repeating a particular mathematical operation.
  4. Time-frequency analysis plays a central role in signal analysis. It has long been recognized that a global Fourier transform of a long time signal is of little practical value for analyzing the frequency spectrum of a signal. Transient signals, which evolve in time in an unpredictable way (like a speech signal or an EEG signal), necessitate a notion of frequency analysis that is local in time. In many applications such as speech processing, we are interested in the frequency content of a signal locally in time; that is, the signal parameters (frequency content etc.) evolve over time. Such signals are called non-stationary. For a non-stationary signal, x(t), the standard Fourier transform is not useful for analyzing the signal: information which is localized in time, such as spikes and high-frequency bursts, cannot be easily detected from it. Time-localization can be achieved by first windowing the signal so as to cut out only a well-localized slice of x(t) and then taking its Fourier transform. This gives rise to the Short-Time Fourier Transform (STFT), or Windowed Fourier Transform. The magnitude of the STFT is called the spectrogram. By restricting to a discrete range of frequencies and times we can obtain an orthogonal basis of functions.