Speech Recognition (Dr. M. Sabarimalai Manikandan)
1. CN 711 Speech Recognition
Course Instructor: Dr. M. Sabarimalai Manikandan
E-mail: msm.sabari@gmail.com
Course Objectives:
This course provides an introduction to digital speech processing and its applications. It offers a practical and theoretical understanding of how human speech can be processed by computers, covering speech analysis and synthesis, speech features, speech and speaker recognition, and applications. The course includes practical work in which students build a working text-to-speech system in their native language, speech recognition systems, their own synthetic voice, and a complete telephone spoken dialog system.

CN 711: Speech Recognition Course Topics
A. Review some basic DSP concepts
B. Introduction to Speech Signals
• Speech production mechanism
• Types of Sounds, Vowels and consonants
• Loudness, Sound Pressure
• Nature of speech signal, models of speech production
• Silence, Voiced and Unvoiced Speech
• Naturalness and Intelligibility
• Speech data acquisition system
• Why speech processing
• Speech perception model
C. Speech Analysis and Synthesis
• Short-time Fourier Analysis, Spectrogram
• Autocorrelation and cross-correlation
• Human speech production model
• Temporal and spectral characteristics
• Linear prediction (LP) filter theory
• All-pole Filter, Inverse Filtering
• Formants and Pitch Determination
• LP Residuals and Hilbert Transform
• Vocal tract length normalization
D. Speech Features for Recognition
• Temporal and Short-Time Fourier Transform Features
• Teager Energy Based Features, Entropy
• Cepstral Coefficients
• Linear Prediction-based Cepstral coefficients (LPCC)
• Mel Frequency Cepstral Coefficients (MFCCs)
• AM-FM Features, Time-Frequency Analysis
• Wavelet Octave Coefficients of Residues (WOCR)
• Voice Activity Detection
• Silence, Voiced, and Unvoiced Speech Classification
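As a small illustration of the temporal features and the silence/voiced/unvoiced classification listed under topic D, the sketch below computes short-time energy and zero-crossing rate per frame. It is not taken from the course material: the function names, frame sizes, and thresholds are assumptions chosen only for this example, and a real classifier would need tuned thresholds and real speech data.

```python
import math

def frames(signal, frame_len, hop):
    """Split a sequence of samples into frames (a trailing partial frame is dropped)."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]

def short_time_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

def classify_frame(frame, energy_thresh=1e-3, zcr_thresh=0.25):
    """Crude three-way label; both thresholds are illustrative assumptions."""
    if short_time_energy(frame) < energy_thresh:
        return "silence"
    return "unvoiced" if zero_crossing_rate(frame) > zcr_thresh else "voiced"

# Synthetic demo: a 100 Hz tone sampled at 8 kHz stands in for voiced speech.
fs = 8000
tone = [0.5 * math.sin(2 * math.pi * 100 * n / fs) for n in range(800)]
silence = [0.0] * 800
labels = [classify_frame(f) for f in frames(silence + tone, frame_len=200, hop=200)]
```

Low energy marks silence, and among the remaining frames a high zero-crossing rate suggests noise-like (unvoiced) sound while a low rate suggests periodic (voiced) sound.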
E. Speech Enhancement, Coding and Quality Assessment
• Acoustic echo cancellation
• Reverberant speech enhancement
• Removal of Different Types of noise and artifacts
• Speech Coding
• Subjective and Objective Metrics
F. Speaker Recognition
• Basic ASR System
• Closed-set and Open-set ASR System
• Speaker Identification and Verification
• Text-Independent and Text-Dependent Recognition
• Mean Normalization, Feature Smoothing
• Dynamic Time Warping (DTW), Vector Quantization
• Gaussian Mixture Models (GMMs) and Universal Background Model (UBM)
• Log-Likelihood Ratio (LLR)
• False Acceptance Probability, False Rejection Probability
• Detection Error Trade-off (DET) curve
• Equal Error Rate (EER)
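To make the evaluation terms in topic F concrete, the following sketch estimates false acceptance and false rejection rates from verification scores and approximates the equal error rate by scanning candidate thresholds. It is a minimal illustration, not the course's method: the function names are hypothetical and the scores are invented for the demo (they could stand for log-likelihood ratios).

```python
def error_rates(genuine, impostor, threshold):
    """Return (FAR, FRR) when a trial is accepted if its score >= threshold."""
    far = sum(s >= threshold for s in impostor) / len(impostor)
    frr = sum(s < threshold for s in genuine) / len(genuine)
    return far, frr

def equal_error_rate(genuine, impostor):
    """Approximate the EER by trying every observed score as a threshold."""
    best = None
    for t in sorted(set(genuine) | set(impostor)):
        far, frr = error_rates(genuine, impostor, t)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            # Report the midpoint of FAR and FRR at the closest crossing.
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]

# Hypothetical verification scores, made up for this example.
genuine_scores = [2.1, 1.7, 0.9, 1.2, 0.3]
impostor_scores = [-1.5, -0.2, 0.5, -0.8, 0.1]
eer, threshold = equal_error_rate(genuine_scores, impostor_scores)
```

Sweeping the threshold over its full range and plotting FRR against FAR gives the points of a DET curve; the EER is the point on that curve where the two rates are equal.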
G. Speech Recognition
• Signal Processing, Template matching
• Phoneme-Recognition
• HMMs, Acoustic Modeling, Language Modeling
• Continuous and Emotional Speech Recognition
• Performance Evaluation
H. Speech Preprocessing Applications
• Voice Conversion, Text-to-Speech Synthesis
• Spoken Dialogue System
• Interactive Voice Response (IVR) System
• Identify Your ID
2. Textbooks and Materials
[1]. Li Tan, Digital Signal Processing: Fundamentals and Applications, Elsevier, 2008.
[2]. Jayant, N.S.; Noll, P. Digital coding of waveforms: principles and applications to speech and video. Englewood
Cliffs, NJ: Prentice Hall, 1984. ISBN 0132119137.
[3]. Rabiner, L.R.; Juang, B. Fundamentals of speech recognition. Englewood Cliffs: Prentice Hall, 1993. ISBN
0130151572.
[4]. L.R. Rabiner and R.W. Schafer, Digital Processing of Speech Signals, Prentice Hall, 1978.
[5]. J.L. Flanagan, Speech Analysis, Synthesis and Perception, 2nd edition, Springer-Verlag, 1972.
[6]. Jelinek. Statistical Methods for Speech Recognition. MIT Press, 1997.
[7]. Jurafsky & Martin. Speech and Language Processing: An Introduction to NLP, CL, and Speech Recognition,
Prentice Hall, 2000.
[8]. T.F. Quatieri, Discrete-Time Speech Signal Processing: Principles and Practice, Prentice-Hall, 2001.
[9]. J. R. Deller, J. H. L. Hansen, and J. G. Proakis, Discrete-Time Processing of Speech Signals, 2nd edition, IEEE
Press, 2000.
[10]. T. W. Parsons, Voice and Speech Processing, McGraw-Hill, 1987.
[11]. X. Huang, A. Acero, H. Hon, and R. Reddy, Spoken Language Processing: A Guide to Theory, Algorithm and
System Development, Prentice-Hall, 2001.
[12]. Instructor's Notes
Programming Languages: MATLAB and Java Media Framework
Important Standard Journals in the Field of Audio and Speech Processing
• IEEE Transactions on Audio, Speech and Language Processing
• IEEE Transactions on Signal Processing
• IEEE Signal Processing Magazine
• IEEE Transactions on Information Forensics and Security
• ACM Transactions on Speech and Language Processing
• IEEE Multimedia
• Speech Communication (by Elsevier)
• IEEE Signal Processing Letters
• Signal Processing (by Elsevier)
• Digital Signal Processing (by Elsevier)
• International Journal of Speech Technology (by Springer)
• Signal, Image and Video Processing (by Springer)
• Computer Speech and Language
• EURASIP Journal on Audio, Speech, and Music Processing
• Journal of the Acoustical Society of America (JASA)
• Audio Engineering Society

Important Conferences in the Field of Audio and Speech Processing
• IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP)
• Eurospeech
• Int. Conf. on Spoken Language Processing (ICSLP)
• Acoustical Society of America