Speech Recognition (Dr. M. Sabarimalai Manikandan)

CN 711 Speech Recognition

Course Instructor: Dr. M. Sabarimalai Manikandan
E-mail: msm.sabari@gmail.com
Course Objectives:
This course provides an introduction to the field of digital speech processing and its applications.
It offers a practical and theoretical understanding of how human speech can be processed by computers.
It covers speech analysis and synthesis, speech features, speech and speaker recognition, and applications.
The course includes practical work in which students will build a working text-to-speech system in their
native language, speech recognition systems, their own synthetic voice, and a complete telephone spoken
dialog system.

Course Topics

A. Review of Some Basic DSP Concepts

B. Introduction to Speech Signals
• Speech production mechanism
• Types of Sounds, Vowels and consonants
• Loudness, Sound Pressure
• Nature of speech signal, models of speech production
• Silence, Voiced and Unvoiced Speech
• Naturalness and Intelligibility
• Speech data acquisition system
• Why speech processing
• Speech perception model

C. Speech Analysis and Synthesis
• Short-time Fourier Analysis, Spectrogram
• Autocorrelation and cross-correlation
• Human speech production model
• Temporal and spectral characteristics
• Linear prediction (LP) filter theory
• All-pole Filter, Inverse Filtering
• Formants and Pitch Determination
• LP Residuals and Hilbert Transform
• Vocal tract length normalization

D. Speech Features for Recognition
• Temporal and Short-Time Fourier Transform Features
• Teager Energy Based Features, Entropy
• Cepstral Coefficients
• Linear Prediction-based Cepstral coefficients (LPCC)
• Mel Frequency Cepstral Coefficients (MFCCs)
• AM-FM Features, Time-Frequency Analysis
• Wavelet Octave Coefficients of Residues (WOCR)
• Voice Activity Detection
• Silence, Voiced, and Unvoiced Speech Classification
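
Several of the topics listed above (temporal features, voice activity detection, and silence/voiced/unvoiced classification) build on simple frame-level measurements. The sketch below is illustrative only and is not taken from the course materials: it computes short-time energy and zero-crossing rate per frame using the Python standard library. The function names, the 8 kHz sampling rate, the 200-sample frame, the 80-sample hop, and the synthetic test signals are all assumptions chosen for the example.

```python
import math
import random

def frames(signal, frame_len, hop):
    """Yield successive frames; trailing samples that do not fill a frame are dropped."""
    for start in range(0, len(signal) - frame_len + 1, hop):
        yield signal[start:start + frame_len]

def short_time_energy(frame):
    """Sum of squared samples in one frame."""
    return sum(s * s for s in frame)

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ (zero counts as positive)."""
    crossings = sum((a >= 0) != (b >= 0) for a, b in zip(frame, frame[1:]))
    return crossings / (len(frame) - 1)

def frame_features(signal, frame_len=200, hop=80):
    """Return a list of (energy, zero-crossing rate) pairs, one per frame."""
    return [(short_time_energy(f), zero_crossing_rate(f))
            for f in frames(signal, frame_len, hop)]

def mean(values):
    values = list(values)
    return sum(values) / len(values)

# Synthetic stand-ins (assumptions): a 100 Hz tone for voiced speech,
# low-level Gaussian noise for unvoiced speech, and zeros for silence.
fs = 8000
tone = [math.sin(2 * math.pi * 100 * n / fs) for n in range(fs)]
rng = random.Random(0)
noise = [0.1 * rng.gauss(0, 1) for _ in range(fs)]
silence = [0.0] * fs

tone_feats = frame_features(tone)
noise_feats = frame_features(noise)
silence_feats = frame_features(silence)

print("tone    mean energy, ZCR:", mean(e for e, _ in tone_feats), mean(z for _, z in tone_feats))
print("noise   mean energy, ZCR:", mean(e for e, _ in noise_feats), mean(z for _, z in noise_feats))
print("silence mean energy, ZCR:", mean(e for e, _ in silence_feats), mean(z for _, z in silence_feats))
```

On these synthetic signals the tone shows high energy with a low zero-crossing rate, the noise shows low energy with a high zero-crossing rate, and silence shows zero for both. This is the usual intuition behind energy and zero-crossing thresholds; real speech generally needs thresholds tuned to the recording conditions.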

E. Speech Enhancement, Coding and Quality Assessment
• Acoustic echo cancellation
• Reverberant speech enhancement
• Removal of Different Types of noise and artifacts
• Speech Coding
• Subjective and Objective Metrics

F. Speaker Recognition
• Basic ASR System
• Closed-set and Open-set ASR System
• Speaker Identification and Verification
• Text-Independent and Text-Dependent Recognition
• Mean Normalization, Feature Smoothing
• Dynamic Time Warping (DTW), Vector Quantization
• Gaussian Mixture Models (GMMs) and Universal Background Model (UBM)
• Log-Likelihood Ratio (LLR)
• False Acceptance Probability, False Rejection Probability
• Detection Error Trade-off (DET) curve
• Equal Error Rate (EER)
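
The verification metrics listed above are related: sweeping a decision threshold over the scores trades false acceptances against false rejections, and the EER is the operating point where the two rates are equal. The sketch below is a minimal illustration under stated assumptions, not the course's method. The function names are hypothetical, the scores are made-up values standing in for log-likelihood ratios, and the EER is approximated at the candidate threshold where the two rates are closest.

```python
def far_frr(genuine, impostor, threshold):
    """Accept a trial when score >= threshold.

    FAR: fraction of impostor scores accepted.
    FRR: fraction of genuine scores rejected.
    """
    far = sum(s >= threshold for s in impostor) / len(impostor)
    frr = sum(s < threshold for s in genuine) / len(genuine)
    return far, frr

def equal_error_rate(genuine, impostor):
    """Return (eer, threshold) at the observed score where |FAR - FRR| is smallest."""
    best = None
    for th in sorted(set(genuine) | set(impostor)):
        far, frr = far_frr(genuine, impostor, th)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, th)
    return best[1], best[2]

# Made-up scores for illustration only (assumption): higher means "more likely the claimed speaker".
genuine = [2.1, 1.7, 0.9, 2.5, 0.2, 1.3, 1.9, 0.1]
impostor = [-1.2, 0.4, -0.3, 1.0, -2.0, -0.8, 0.3, -0.5]

eer, threshold = equal_error_rate(genuine, impostor)
print("EER:", eer, "at threshold:", threshold)
```

Plotting the (FAR, FRR) pairs from the same sweep on normal-deviate axes gives a DET curve. With few trials the two rates rarely meet exactly, which is why the sketch averages them at the closest point.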
G. Speech Recognition
• Signal Processing, Template matching
• Phoneme Recognition
• HMMs, Acoustic Modeling, Language Modeling
• Continuous and Emotional Speech Recognition
• Performance Evaluation

H. Speech Preprocessing Applications
• Voice Conversion, Text-to-Speech Synthesis
• Spoken Dialogue System
• Interactive Voice Response (IVR) System
• Identify Your ID

Textbooks and Materials
[1]. L. Tan, Digital Signal Processing: Fundamentals and Applications, Elsevier, 2008.
[2]. N. S. Jayant and P. Noll, Digital Coding of Waveforms: Principles and Applications to Speech and Video, Prentice Hall, 1984. ISBN 0132119137.
[3]. L. R. Rabiner and B. Juang, Fundamentals of Speech Recognition, Prentice Hall, 1993. ISBN 0130151572.
[4]. L. R. Rabiner and R. W. Schafer, Digital Processing of Speech Signals, Prentice Hall, 1978.
[5]. J. L. Flanagan, Speech Analysis, Synthesis and Perception, 2nd edition, Springer-Verlag, 1972.
[6]. F. Jelinek, Statistical Methods for Speech Recognition, MIT Press, 1997.
[7]. D. Jurafsky and J. H. Martin, Speech and Language Processing: An Introduction to NLP, CL, and Speech Recognition, Prentice Hall, 2000.
[8]. T. F. Quatieri, Discrete-Time Speech Signal Processing: Principles and Practice, Prentice Hall, 2001.
[9]. J. R. Deller, J. H. L. Hansen, and J. G. Proakis, Discrete-Time Processing of Speech Signals, 2nd edition, IEEE Press, 2000.
[10]. T. W. Parsons, Voice and Speech Processing, McGraw-Hill, 1987.
[11]. X. Huang, A. Acero, H. Hon, and R. Reddy, Spoken Language Processing: A Guide to Theory, Algorithm and System Development, Prentice Hall, 2001.
[12]. Instructor's Notes

Programming Languages: MATLAB and Java Media Framework

Important Standard Journals in the Field of Audio and Speech Processing
• IEEE Transactions on Audio, Speech and Language Processing
• IEEE Transactions on Signal Processing
• IEEE Signal Processing Magazine
• IEEE Transactions on Information Forensics and Security
• ACM Transactions on Speech and Language Processing
• IEEE Multimedia
• Speech Communication (by Elsevier)
• IEEE Signal Processing Letters
• Signal Processing (by Elsevier)
• Digital Signal Processing (by Elsevier)
• International Journal of Speech Technology (by Springer)
• Signal, Image and Video Processing (by Springer)
• Computer Speech and Language
• EURASIP Journal on Audio, Speech, and Music Processing
• Journal of the Acoustical Society of America (JASA)
• Audio Engineering Society

Important Conferences in the Field of Audio and Speech Processing
• IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP)
• Eurospeech
• Int. Conf. on Spoken Language Processing (ICSLP)
• Acoustical Society of America
