CN 711 Speech Recognition

Course Instructor: Dr. M. Sabarimalai Manikandan
E-mail: msm.sabari@gmail.com

CN 711: Speech Recognition Course Topics

Course Objectives:
This course provides an introduction to the field of digital speech processing and its applications. Speech processing offers a practical and theoretical understanding of how human speech can be processed by computers. The course covers speech analysis and synthesis, speech features, speech and speaker recognition, and applications. It includes practical work in which students build a working text-to-speech system in their native language, speech recognition systems, their own synthetic voice, and a complete telephone spoken dialog system.

A.   Review of some basic DSP concepts

B.   Introduction to Speech Signals
•    Speech production mechanism
•    Types of sounds, vowels and consonants
•    Loudness, sound pressure
•    Nature of the speech signal, models of speech production
•    Silence, voiced and unvoiced speech
•    Naturalness and intelligibility
•    Speech data acquisition system
•    Why speech processing
•    Speech perception model
C.   Speech Analysis and Synthesis
•    Short-time Fourier Analysis, Spectrogram
•    Autocorrelation and cross-correlation
•    Human speech production model
•    Temporal and spectral characteristics
•    Linear prediction (LP) filter theory
•    All-pole Filter, Inverse Filtering
•    Formants and Pitch Determination
•    LP Residuals and Hilbert Transform
•    Vocal tract length normalization

D.   Speech Features for Recognition
•    Temporal and Short-Time Fourier Transform Features
•    Teager Energy Based Features, Entropy
•    Cepstral Coefficients
•    Linear Prediction-based Cepstral Coefficients (LPCC)
•    Mel Frequency Cepstral Coefficients (MFCCs)
•    AM-FM Features, Time-Frequency Analysis
•    Wavelet Octave Coefficients of Residues (WOCR)
•    Voice Activity Detection
•    Silence, Voiced, and Unvoiced Speech Classification
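As an illustration of the silence, voiced, and unvoiced classification topic listed under the speech features, the following is a minimal Java sketch based on two common temporal features: short-time energy and zero-crossing rate. It is not taken from the course material. The class name `ShortTimeFeatures`, the 30 ms frame length, and both thresholds are assumptions chosen only for the demonstration; a practical classifier would tune them on real recordings.

```java
/** Illustrative sketch: frame-level short-time energy and zero-crossing rate. */
final class ShortTimeFeatures {

    /** Mean squared amplitude of the frame x[start .. start+len). */
    static double energy(double[] x, int start, int len) {
        double sum = 0.0;
        for (int i = start; i < start + len; i++) {
            sum += x[i] * x[i];
        }
        return sum / len;
    }

    /** Fraction of adjacent sample pairs in the frame whose signs differ. */
    static double zeroCrossingRate(double[] x, int start, int len) {
        int crossings = 0;
        for (int i = start + 1; i < start + len; i++) {
            if ((x[i - 1] >= 0) != (x[i] >= 0)) {
                crossings++;
            }
        }
        return (double) crossings / (len - 1);
    }

    /** Toy three-way decision; energyThr and zcrThr are assumed, untuned thresholds. */
    static String classify(double[] x, int start, int len, double energyThr, double zcrThr) {
        if (energy(x, start, len) < energyThr) {
            return "silence";
        }
        return zeroCrossingRate(x, start, len) > zcrThr ? "unvoiced" : "voiced";
    }

    public static void main(String[] args) {
        int fs = 8000;   // assumed sampling rate in Hz
        int len = 240;   // 30 ms frame at 8 kHz
        double[] tone = new double[len];    // low-frequency tone standing in for voiced speech
        double[] silent = new double[len];  // all zeros
        for (int n = 0; n < len; n++) {
            tone[n] = 0.5 * Math.sin(2 * Math.PI * 100 * n / fs);
        }
        System.out.println(classify(tone, 0, len, 1e-4, 0.3));
        System.out.println(classify(silent, 0, len, 1e-4, 0.3));
    }
}
```

The sketch relies on the usual rule of thumb that voiced frames have high energy and few zero crossings, while unvoiced frames cross zero far more often. Real speech usually needs smoothing across frames and noise-adaptive thresholds.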

E.   Speech Enhancement, Coding and Quality Assessment
•    Acoustic echo cancellation
•    Reverberant speech enhancement
•    Removal of different types of noise and artifacts
•    Speech Coding
•    Subjective and Objective Metrics

F.   Speaker Recognition
•    Basic ASR System
•    Closed-set and Open-set ASR System
•    Speaker Identification and Verification
•    Text-Independent and Text-Dependent Recognition
•    Mean Normalization, Feature Smoothing
•    Dynamic Time Warping (DTW), Vector Quantization
•    Gaussian Mixture Models (GMMs) and Universal Background Model (UBM)
•    Log-Likelihood Ratio (LLR)
•    False Acceptance Probability, False Rejection Probability
•    Detection Error Trade-off (DET) curve
•    Equal Error Rate (EER)
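To make the Dynamic Time Warping (DTW) topic from the speaker recognition list concrete, here is a minimal Java sketch of the classic dynamic-programming recurrence. It is an illustration under stated assumptions, not the course's implementation. The class name `DtwSketch` is hypothetical, the sequences are one-dimensional, the local cost is the absolute difference, and the three standard step moves are allowed with no slope constraints. Real systems align multi-dimensional feature vectors such as MFCCs.

```java
import java.util.Arrays;

/** Illustrative sketch: DTW alignment cost between two 1-D sequences. */
final class DtwSketch {

    /**
     * Accumulated cost of the best warping path, using |a - b| as the local
     * distance and the moves (i-1, j), (i, j-1), and (i-1, j-1).
     */
    static double distance(double[] a, double[] b) {
        int n = a.length;
        int m = b.length;
        double[][] d = new double[n + 1][m + 1];
        for (double[] row : d) {
            Arrays.fill(row, Double.POSITIVE_INFINITY);
        }
        d[0][0] = 0.0; // empty prefixes align at zero cost
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                double cost = Math.abs(a[i - 1] - b[j - 1]);
                double best = Math.min(d[i - 1][j - 1], Math.min(d[i - 1][j], d[i][j - 1]));
                d[i][j] = cost + best;
            }
        }
        return d[n][m];
    }

    public static void main(String[] args) {
        double[] template = {1, 2, 3};
        double[] stretched = {1, 2, 2, 3}; // same shape, spoken "slower"
        double[] shifted = {2, 3, 4};
        System.out.println(distance(template, stretched));
        System.out.println(distance(template, shifted));
    }
}
```

The stretched sequence aligns with the template at zero cost because DTW can map one template sample to several input samples. This is why the method suits template matching of utterances spoken at different rates.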

G.   Speech Recognition
•    Signal Processing, Template Matching
•    Phoneme Recognition
•    HMMs, Acoustic Modeling, Language Modeling
•    Continuous and Emotional Speech Recognition
•    Performance Evaluation

H.   Speech Preprocessing Applications
•    Voice Conversion, Text-to-Speech Synthesis
•    Spoken Dialogue System
•    Interactive Voice Response (IVR) System
•    Identify Your ID

Textbooks and Materials
[1].    L. Tan, Digital Signal Processing: Fundamentals and Applications, Elsevier, 2008.
[2].    N. S. Jayant and P. Noll, Digital Coding of Waveforms: Principles and Applications to Speech and Video, Englewood Cliffs, NJ: Prentice Hall, 1984. ISBN 0132119137.
[3].    L. R. Rabiner and B. Juang, Fundamentals of Speech Recognition, Englewood Cliffs, NJ: Prentice Hall, 1993. ISBN 0130151572.
[4].    L. R. Rabiner and R. W. Schafer, Digital Processing of Speech Signals, Prentice Hall, 1978.
[5].    J. L. Flanagan, Speech Analysis, Synthesis and Perception, 2nd edition, Springer-Verlag, 1972.
[6].    F. Jelinek, Statistical Methods for Speech Recognition, MIT Press, 1997.
[7].    D. Jurafsky and J. H. Martin, Speech and Language Processing: An Introduction to NLP, CL, and Speech Recognition, Prentice Hall, 2000.
[8].    T. F. Quatieri, Discrete-Time Speech Signal Processing: Principles and Practice, Prentice Hall, 2001.
[9].    J. R. Deller, J. H. L. Hansen, and J. G. Proakis, Discrete-Time Processing of Speech Signals, 2nd edition, IEEE Press, 2000.
[10].   T. W. Parsons, Voice and Speech Processing, McGraw-Hill, 1987.
[11].   X. Huang, A. Acero, H. Hon, and R. Reddy, Spoken Language Processing: A Guide to Theory, Algorithm and System Development, Prentice Hall, 2001.
[12].   Instructor's Notes

Programming Languages: MATLAB and Java Media Framework


Important Standard Journals in the Field of Audio and Speech Processing
• IEEE Transactions on Audio, Speech and Language Processing
• IEEE Transactions on Signal Processing
• IEEE Signal Processing Magazine
• IEEE Transactions on Information Forensics and Security
• ACM Transactions on Speech and Language Processing
• IEEE Multimedia
• Speech Communication (by Elsevier)
• IEEE Signal Processing Letters
• Signal Processing (by Elsevier)
• Digital Signal Processing (by Elsevier)
• International Journal of Speech Technology (by Springer)
• Signal, Image and Video Processing (by Springer)
• Computer Speech and Language
• EURASIP Journal on Audio, Speech, and Music Processing
• Journal of the Acoustical Society of America (JASA)
• Audio Engineering Society

Important Conferences in the Field of Audio and Speech Processing
• IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP)
• Eurospeech
• Int. Conf. on Spoken Language Processing (ICSLP)
• Acoustical Society of America
