VOICE RECOGNITION SYSTEM
Shailendra Singh Tiwari
Computer Science Engineering
SGSITS ENGINEERING COLLEGE, INDORE (M.P.)
singhtiwari.shailendra@gmail.com
AB-24108
R.N.: O801CS123D13

Abstract

Automatic speech recognition (ASR) has made great strides with the development of digital signal processing hardware and software. But despite all these advances, machines cannot match the performance of their human counterparts in terms of accuracy and speed, especially in the case of speaker-independent speech recognition. So today a significant portion of speech recognition research is focused on the speaker-independent speech recognition problem. The reasons are its wide range of applications and the limitations of the available speech recognition techniques. In this report we briefly discuss the signal modeling approach to speech recognition. This is followed by an overview of the basic operations involved in signal modeling. Further, commonly used temporal and spectral analysis techniques for feature extraction are discussed in detail.

Introduction

A speech recognition system performs two fundamental operations: signal modeling and pattern matching. Signal modeling is the process of converting the speech signal into a set of parameters. Pattern matching is the task of finding the parameter set in memory that most closely matches the parameter set obtained from the input speech signal.

Signal Modeling

Signal modeling has three objectives. The first is to obtain perceptually meaningful parameters, i.e. parameters analogous to those used by the human auditory system. The second is to obtain invariant parameters, i.e. parameters that are robust to variations in channel, speaker and transducer. The third is to obtain parameters that capture spectral dynamics, i.e. changes of the spectrum with time. Signal modeling involves four basic operations: spectral shaping, feature extraction, parametric transformation and statistical modeling.
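The pattern-matching step described in the Introduction can be made concrete with a small sketch. The paper does not say which matching method it has in mind. The version below assumes a template-based recognizer that compares sequences of feature vectors using dynamic time warping (DTW) with a Euclidean local distance. This is one common choice for isolated-word recognition, not necessarily the author's. All function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Local distance between two parameter (feature) vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(test: List[Vector], ref: List[Vector]) -> float:
    """Dynamic time warping cost between two sequences of feature vectors.

    The warping lets two utterances differ in length and speaking rate.
    """
    n, m = len(test), len(ref)
    inf = float("inf")
    # cost[i][j] = best accumulated cost of aligning test[:i] with ref[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = euclidean(test[i - 1], ref[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # repeat a reference frame
                                 cost[i][j - 1],      # repeat a test frame
                                 cost[i - 1][j - 1])  # advance both sequences
    return cost[n][m]


def recognize(test: List[Vector], templates: Dict[str, List[Vector]]) -> str:
    """Return the label of the stored template closest to the test utterance."""
    return min(templates, key=lambda label: dtw_distance(test, templates[label]))
```

For example, with `templates = {"yes": [...], "no": [...]}`, the call `recognize(test, templates)` returns the label whose stored parameter sequence warps onto the input at the lowest cost. Time warping is used because two utterances of the same word rarely have the same number of frames. A plain frame-by-frame distance would therefore misalign them.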
Spectral shaping is the process of converting the speech signal from a sound pressure wave to a digital signal and emphasizing the important frequency components in the signal.

Feature extraction is the process of obtaining different features, such as power, pitch and vocal tract configuration, from the speech signal. Parametric transformation is the process of converting these features into signal parameters through differentiation and concatenation. Statistical modeling involves the conversion of parameters into signal observation vectors.

Feature Extraction

In speaker-independent speech recognition, a premium is placed on extracting features that are relatively invariant to changes in the speaker. Feature extraction therefore involves analysis of the speech signal. Broadly, feature extraction techniques are classified as temporal analysis and spectral analysis techniques. In temporal analysis, the speech waveform itself is used for analysis. In spectral analysis, a spectral representation of the speech signal is used for analysis.

Conclusions

The basic operations in a speech recognition system have been discussed briefly. Different temporal and spectral analysis techniques for feature extraction have been studied in detail, and the following conclusions are drawn.

Temporal analysis techniques involve less computation and are easy to implement. But they are limited to determining simple speech parameters like power, energy and periodicity of speech. For finding vocal tract parameters we require spectral analysis techniques. A critical band filter bank decomposes the speech signal into a discrete set of spectral samples. These samples contain information similar to that presented to the higher levels of processing in the auditory system. Cepstral analysis separates the speech signal into a component representing the excitation source and a component representing the vocal tract impulse response. It therefore provides information about both pitch and vocal tract configuration, but it is computationally more intensive.

Mel cepstral analysis retains the decorrelating property of cepstral analysis and also incorporates some aspects of audition. LPC analysis provides a compact representation of the vocal tract configuration with relatively simple computation compared to cepstral analysis. To minimize analysis complexity, it assumes an all-pole model for the speech production system. But speech has zeros due to nasals. In these cases the results are not as good as for vowels, but they are still reasonably acceptable if the order of the model is sufficiently high.
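The temporal and spectral techniques compared in these conclusions can be illustrated with a minimal pure-Python sketch. The sketch also covers the spectral shaping and parametric transformation steps defined earlier. The specific choices are assumptions rather than the author's method:

- a pre-emphasis coefficient of 0.95;
- Hamming-windowed frames;
- short-time energy as the temporal feature;
- LPC by the autocorrelation method with the Levinson-Durbin recursion as the spectral feature;
- simple first differences as the "differentiation and concatenation" step.

All function names are hypothetical.

```python
import math
from typing import List, Tuple


def pre_emphasis(x: List[float], alpha: float = 0.95) -> List[float]:
    """Spectral shaping: first-order high-pass filter y[n] = x[n] - alpha * x[n-1]."""
    return [x[0]] + [x[n] - alpha * x[n - 1] for n in range(1, len(x))]


def hamming_frames(x: List[float], size: int, hop: int) -> List[List[float]]:
    """Cut the signal into overlapping frames of `size` samples (size > 1) and apply a Hamming window."""
    w = [0.54 - 0.46 * math.cos(2 * math.pi * k / (size - 1)) for k in range(size)]
    return [[x[s + k] * w[k] for k in range(size)]
            for s in range(0, len(x) - size + 1, hop)]


def short_time_energy(frame: List[float]) -> float:
    """Temporal analysis: energy of one frame, computed directly from the waveform."""
    return sum(v * v for v in frame)


def autocorrelation(frame: List[float], max_lag: int) -> List[float]:
    """r[k] = sum_n frame[n] * frame[n + k] for k = 0..max_lag."""
    n = len(frame)
    return [sum(frame[i] * frame[i + k] for i in range(n - k)) for k in range(max_lag + 1)]


def lpc(frame: List[float], order: int) -> Tuple[List[float], float]:
    """Spectral analysis: all-pole (LPC) model via the Levinson-Durbin recursion.

    Returns predictor coefficients a[1..order], with x[n] ~ sum_k a[k] * x[n - k],
    and the final prediction-error energy.
    """
    r = autocorrelation(frame, order)
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:  # silent or perfectly predicted frame: stop early
            break
        # Reflection coefficient for stage i
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a[1:], err


def with_deltas(features: List[List[float]]) -> List[List[float]]:
    """Parametric transformation: concatenate each vector with its first difference."""
    out = []
    for t, f in enumerate(features):
        prev = features[t - 1] if t > 0 else f  # first frame gets a zero delta
        out.append(f + [c - p for c, p in zip(f, prev)])
    return out
```

A typical flow would run in five steps:

1. Pre-emphasize the signal.
2. Frame it, for instance with `hamming_frames(x, 400, 160)`. At an assumed 16 kHz sampling rate, this gives 25 ms frames with a 10 ms hop.
3. Compute `short_time_energy(frame)` for each frame.
4. Compute `lpc(frame, 12)` for each frame.
5. Pass the per-frame vectors through `with_deltas`.

Raising the LPC order lets the all-pole model approximate spectral zeros, such as those of nasals, more closely. This is the trade-off described in the conclusions above.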
Acknowledgement

I wish to express my sincere gratitude to Prof. Puja Gupta for her constant guidance throughout the course of the computer workshop and for many useful discussions, which enabled me to understand the subtleties of the subject properly.