Research Issues in Speech Processing




                    Dr. M. Sabarimalai Manikandan
                        msm.sabari@gmail.com
Speech Production: the source-filter model
The speech signal conveys the information contained in the spoken word
         highly non-stationary signal
         short segments of speech (20 to 30 ms) can be treated as quasi-stationary
         most of the acoustical energy lies in the frequency range of 100-6000 Hz




        Vocal tract transfer function can be modeled by an all-pole filter
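
As an illustration of the all-pole model, the sketch below estimates the filter coefficients of a short frame with the autocorrelation method and the Levinson-Durbin recursion. This is one common way to fit such a model, not necessarily the one the slides have in mind; the function name `lpc` and the synthetic second-order test signal are assumptions made for the example.

```python
import numpy as np

def lpc(frame, order):
    """Fit an all-pole model H(z) = G / A(z) to one frame.

    Returns (a, err) where a[0] == 1 and
    A(z) = 1 + a[1] z^-1 + ... + a[order] z^-order,
    and err is the final prediction-error energy.
    """
    frame = np.asarray(frame, dtype=float)
    n = len(frame)
    # Autocorrelation for lags 0..order
    r = np.array([np.dot(frame[: n - k], frame[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for stage i
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err
        # Update the lower-order coefficients, then append the new one
        a_prev = a.copy()
        for j in range(1, i):
            a[j] = a_prev[j] + k * a_prev[i - j]
        a[i] = k
        err *= (1.0 - k * k)
    return a, err

# Synthetic check (hypothetical signal): white noise passed through
# a known all-pole filter with A(z) = 1 - 1.3 z^-1 + 0.6 z^-2.
rng = np.random.default_rng(0)
e = rng.standard_normal(20000)
x = np.zeros_like(e)
for t in range(len(e)):
    x[t] = e[t]
    if t >= 1:
        x[t] += 1.3 * x[t - 1]
    if t >= 2:
        x[t] -= 0.6 * x[t - 2]

a, err = lpc(x, order=2)
print(a)  # should be close to [1, -1.3, 0.6]
```

For real speech the frame would typically be windowed first and the order chosen from the sampling rate; both choices are left out of this sketch.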
Speech Processing Tasks


Speech recognition (recognizing lexical content)
Speech synthesis (text-to-speech)
Speaker recognition (recognizing who is speaking)
Speech understanding and vocal dialog
Speech coding (data rate reduction)
Speech enhancement (Noise reduction)
Speech transmission (noise-free communication)
Voice conversion
Speech Processing
Speech measurements
       Short-time energy (STE)
       Zero crossing rate (ZCR)
       Autocorrelation (AC)
       Pitch period or frequency
       Formants
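
A minimal sketch of three of the measurements listed above (short-time energy, zero crossing rate, and pitch from the autocorrelation peak). The helper names, the 30 ms frame with 10 ms hop, and the 60-400 Hz pitch search range are assumptions chosen for the example; a pure 200 Hz tone stands in for voiced speech.

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Split x into overlapping frames, one frame per row."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

def short_time_energy(frames):
    """Sum of squared samples in each frame."""
    return np.sum(frames ** 2, axis=1)

def zero_crossing_rate(frames):
    """Fraction of adjacent sample pairs whose signs differ."""
    signs = np.sign(frames)
    signs[signs == 0] = 1
    return np.mean(signs[:, 1:] != signs[:, :-1], axis=1)

def pitch_autocorr(frame, fs, fmin=60.0, fmax=400.0):
    """Pitch estimate from the largest autocorrelation peak in [fmin, fmax]."""
    frame = frame - np.mean(frame)
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lag_min = int(fs / fmax)
    lag_max = int(fs / fmin)
    lag = lag_min + np.argmax(ac[lag_min:lag_max + 1])
    return fs / lag

fs = 8000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 200 * t)            # 1 s test tone at 200 Hz
frames = frame_signal(x, frame_len=240, hop=80)

ste = short_time_energy(frames)
zcr = zero_crossing_rate(frames)
f0 = pitch_autocorr(frames[0], fs)
print(frames.shape, ste[0], zcr[0], f0)
```

On real speech the autocorrelation peak is only meaningful in voiced frames, so a voicing decision would normally precede the pitch estimate.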

Speech signal components
       Speech-Silence or Non-speech
       Voiced speech-Unvoiced speech
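
One simple way to separate these components is to threshold the frame energy and zero crossing rate: silence has low energy, unvoiced speech has high zero crossing rate, and voiced speech has high energy with low zero crossing rate. The sketch below assumes this rule; the function name and both threshold values are hypothetical and would need tuning on real recordings.

```python
import numpy as np

def classify_frame(frame, energy_thresh=1e-3, zcr_thresh=0.25):
    """Label one frame as 'silence', 'unvoiced' or 'voiced'.

    Thresholds are illustrative assumptions, not calibrated values.
    """
    frame = np.asarray(frame, dtype=float)
    energy = np.mean(frame ** 2)
    signs = np.where(frame >= 0, 1, -1)
    zcr = np.mean(signs[1:] != signs[:-1])
    if energy < energy_thresh:
        return "silence"
    return "unvoiced" if zcr > zcr_thresh else "voiced"

fs = 8000
n = np.arange(240)                                   # one 30 ms frame
rng = np.random.default_rng(1)

voiced_like = 0.5 * np.sin(2 * np.pi * 150 * n / fs)  # periodic, low ZCR
unvoiced_like = 0.1 * rng.standard_normal(240)        # noise-like, high ZCR
silence = np.zeros(240)

labels = [classify_frame(f) for f in (voiced_like, unvoiced_like, silence)]
print(labels)
```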
Speech Processing
Speech representations or models
       Temporal features
          •   Low energy rate
          •   Zero crossing rate (ZCR)
          •   4Hz modulation energy
          •   Pitch contour

       Spectral features
           •    Spectral Centroid (sharpness)
           •    Spectral Flux (rate of change)
           •    Spectral Roll-Off (spectral shape)
           •    Spectral Flatness (deviation of the spectral form)
       Linear Predictive Coefficients (LPC)
       Cepstral coefficients
       Mel Frequency Cepstral Coefficients (MFCC): human auditory system
       Harmonic features: sinusoidal harmonic modelling
       Perceptual features: model of the human hearing process
       First order derivative (DELTA)
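
The spectral features named above can be computed from the magnitude spectrum of a single frame. The sketch below uses common textbook definitions (magnitude-weighted centroid, 85% power roll-off, geometric-to-arithmetic mean flatness, Euclidean flux); the function names, the Hann window, and the 85% roll-off point are assumptions for the example.

```python
import numpy as np

def spectral_features(frame, fs, rolloff_pct=0.85):
    """Return centroid (Hz), roll-off (Hz), flatness and the magnitude spectrum."""
    frame = np.asarray(frame, dtype=float)
    mag = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / fs)
    power = mag ** 2
    eps = 1e-12  # avoids division by zero and log(0)

    centroid = np.sum(freqs * mag) / (np.sum(mag) + eps)

    # Frequency below which rolloff_pct of the total power lies
    cumulative = np.cumsum(power)
    idx = np.searchsorted(cumulative, rolloff_pct * cumulative[-1])
    rolloff = freqs[min(idx, len(freqs) - 1)]

    # Geometric mean over arithmetic mean: near 1 for noise, near 0 for tones
    flatness = np.exp(np.mean(np.log(power + eps))) / (np.mean(power) + eps)
    return {"centroid": centroid, "rolloff": rolloff,
            "flatness": flatness, "mag": mag}

def spectral_flux(mag_prev, mag_curr):
    """Euclidean distance between two successive magnitude spectra."""
    return float(np.sqrt(np.sum((mag_curr - mag_prev) ** 2)))

fs = 8000
n = np.arange(256)
rng = np.random.default_rng(2)
tone = spectral_features(np.sin(2 * np.pi * 1000 * n / fs), fs)
noise = spectral_features(rng.standard_normal(256), fs)

print(tone["centroid"], tone["rolloff"], tone["flatness"])
print(noise["flatness"], spectral_flux(tone["mag"], noise["mag"]))
```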
Elements of the speech signal
Phonemes: the smallest units of speech sounds
       Vowels and Consonants
       ~12 to 21 different vowel sounds used in the English language

       Consonants involve rapid and sometimes subtle changes in sound
              according to the manner of articulation:
                   •    plosive (p, b, t, etc.)
                   •    fricative (f, s, sh, etc.)
                   •    nasal (m, n, ng)
                   •    liquid (r, l) and
                   •    semivowel (w, y)

       Consonants are more independent of language than vowels are.

Syllable: one or more phonemes

Word: one or more syllables
Automatic Speech Recognition
There are two uses for speech recognition systems:

    Dictation: translation of the spoken word into written text
    Computer control: control of the computer and software
    applications by spoken commands

    Speaker dependent system: to operate for a single speaker
    Speaker independent system: to operate for any speaker
    of a particular type
    Speaker adaptive system: to adapt its operation to the
    characteristics of new speakers

    The size of vocabulary affects the complexity, processing
    requirements and the accuracy of the system
Speech Recognition: Applications

Automatic translation
Vehicle navigation systems
Human-computer interaction
Content-based spoken audio search
Home automation
Pronunciation evaluation
Robotics
Video games
Transcription of speech into mobile text messages
People with disabilities
Speech Recognition System

Sampling of speech

Acoustic signal processing:
   •     Linear Prediction Cepstral Coefficients (LPCC)
   •     Mel Frequency Cepstral Coefficients (MFCC)
   •     Perceptual Linear Prediction Cepstral Coefficients (PLPCC)

Recognition of phonemes, groups of phonemes and words:
   •    Dynamic Time Warping (DTW)
   •    Hidden Markov models (HMMs)
   •    Gaussian mixture models (GMMs)
   •    Neural Networks (NNs)
   •    Expert systems and combinations of techniques
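
Of the techniques listed above, dynamic time warping is compact enough to sketch directly: it aligns two feature sequences of different lengths and returns the accumulated distance along the best alignment path. The version below is a basic unconstrained DTW with Euclidean local cost; the function name and the toy one-dimensional sequences are assumptions for the example.

```python
import numpy as np

def dtw_distance(a, b):
    """Accumulated DTW distance between sequences a (T1 x D) and b (T2 x D)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if a.ndim == 1:
        a = a[:, None]
    if b.ndim == 1:
        b = b[:, None]
    n, m = len(a), len(b)
    # Local cost: Euclidean distance between every pair of frames
    cost = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Allowed moves: insertion, deletion, match
            acc[i, j] = cost[i - 1, j - 1] + min(
                acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1]
            )
    return float(acc[n, m])

template = [0, 1, 2, 3, 2, 1, 0]
slow = [0, 0, 1, 1, 2, 2, 3, 3, 2, 2, 1, 1, 0, 0]   # same shape, half speed
other = [3, 2, 1, 0, 1, 2, 3]                        # different shape

d_same = dtw_distance(template, slow)
d_diff = dtw_distance(template, other)
print(d_same, d_diff)
```

In a template-based recognizer the input would be compared against one stored template per word and the word with the smallest distance chosen.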
Automatic Speaker Recognition
Speaker recognition: the process of automatically recognizing who is
speaking by using the speaker-specific information included in speech
sounds

Speaker identity: physiological and behavioral characteristics of the speech
production model of an individual speaker
         the spectral envelope (vocal tract characteristics)
         the supra-segmental features (voice source characteristics) of
         speech

Applications:
    •    banking over a telephone network
    •    telephone shopping and database access services
    •    voice dialing and voice mail
    •     information and reservation services
    •    security control for confidential information
    •    forensics and surveillance applications
Speaker Recognition
Speaker identification: the process of determining which registered speaker
provides input speech sounds

[Figure: speaker identification block diagram]
Input speech -> Feature extraction -> Similarity against the reference template or model of each registered speaker (#1, #2, ..., #N) -> Maximum selection -> Identification result (speaker ID)
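
The maximum-selection step of speaker identification can be sketched as scoring the input features against every registered reference and returning the best match. Cosine similarity and the three made-up reference vectors below are assumptions for illustration; a real system would compare sequences of feature vectors against trained models.

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine of the angle between two feature vectors."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def identify_speaker(features, references):
    """Return (best speaker ID, all scores) by maximum similarity."""
    scores = {spk: cosine_similarity(features, ref)
              for spk, ref in references.items()}
    best = max(scores, key=scores.get)
    return best, scores

# Hypothetical reference templates, one per registered speaker
references = {
    "speaker_1": [1.0, 0.0, 0.2],
    "speaker_2": [0.0, 1.0, 0.1],
    "speaker_3": [0.5, 0.5, 1.0],
}
speaker_id, scores = identify_speaker([0.9, 0.1, 0.25], references)
print(speaker_id, scores)
```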
Speaker Recognition
Speaker verification: the process of accepting or rejecting the
identity claim of a speaker.
[Figure: speaker verification block diagram]
Input speech -> Feature extraction -> Similarity against the reference template or model of the claimed speaker (#M) -> Decision against a threshold -> Verification result (accept/reject)

         Open Set and Closed Set Recognition

         Text-dependent and Text-independent Recognition
                 •   Vector quantization
                 •   Gaussian mixture models (GMM)
                 •   Dynamic time warping (DTW)
                 •   Hidden Markov model (HMM)
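
As a sketch of how vector quantization can serve verification, the code below scores an utterance by its average distance to the nearest codeword of the claimed speaker's codebook and accepts the claim when that distortion is under a threshold. Codebook training (for example by k-means) is omitted; the codebook, feature values, and threshold are hypothetical.

```python
import numpy as np

def vq_distortion(features, codebook):
    """Mean distance from each feature vector to its nearest codeword."""
    features = np.asarray(features, dtype=float)
    codebook = np.asarray(codebook, dtype=float)
    d = np.linalg.norm(features[:, None, :] - codebook[None, :, :], axis=2)
    return float(np.mean(np.min(d, axis=1)))

def verify(features, codebook, threshold):
    """Accept the identity claim if the distortion is small enough."""
    return vq_distortion(features, codebook) <= threshold

# Hypothetical codebook of the claimed speaker (#M)
codebook_m = [[0.0, 0.0], [1.0, 1.0]]
genuine = [[0.05, -0.02], [0.98, 1.03], [0.01, 0.04]]
impostor = [[3.0, 3.0], [2.8, 3.1]]

threshold = 0.5  # illustrative value; in practice tuned on held-out data
accept_genuine = verify(genuine, codebook_m, threshold)
accept_impostor = verify(impostor, codebook_m, threshold)
print(accept_genuine, accept_impostor)
```

Raising the threshold reduces false rejections at the cost of more false acceptances, so the value is a design trade-off.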
Text-to-Speech (TTS) System
    Synthesis of Speech for effective human machine communications
                     reading email messages
                     call center help desks and customer care
                     announcement machines



[Figure: TTS system block diagram]
Raw or tagged text -> Text analysis -> Phonetic analysis -> Prosodic analysis -> Speech synthesis -> Synthetic speech
       Text analysis: document structure detection, text normalization, linguistic analysis
       Phonetic analysis: homograph disambiguation, grapheme-to-phoneme conversion
       Prosodic analysis: pitch, duration
       Speech synthesis: voice rendering

              Synthetic speech should be intelligible and natural
Speech Synthesis

Text-to-speech (TTS) synthesis systems
       Approach
       TTS system performance measure
          • Synthetic Speech Intelligibility
          • Synthetic speech naturalness

Speech Intelligibility Tests
      Segmental level analysis
          • the Rhyme Test
          • the Modified Rhyme Test
          • the Diagnostic Rhyme Test
      Supra-segmental analysis
          • the Harvard Psychoacoustic Sentences (HPS)
          • the Haskins syntactic sentences
Speech Coding (Compression)
Speech Coding for efficient transmission and storage of speech
           narrowband and broadband wired telephony
           cellular communications
           Voice over IP (VoIP) to utilize the Internet
           Telephone answering machines
           IVR systems
           Prerecorded messages
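
A classic example of speech coding for narrowband telephony is logarithmic companding. The sketch below applies the continuous mu-law formula with mu = 255 and 8-bit quantization; it illustrates the principle only and is not the segmented encoding used by the G.711 standard. The function names are assumptions.

```python
import numpy as np

MU = 255.0

def mu_law_encode(x, mu=MU):
    """Compress samples in [-1, 1] and quantize them to 8-bit codes."""
    x = np.clip(np.asarray(x, dtype=float), -1.0, 1.0)
    y = np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)
    return np.round((y + 1.0) / 2.0 * 255.0).astype(np.uint8)

def mu_law_decode(codes, mu=MU):
    """Expand 8-bit codes back to samples in [-1, 1]."""
    y = codes.astype(float) / 255.0 * 2.0 - 1.0
    return np.sign(y) * np.expm1(np.abs(y) * np.log1p(mu)) / mu

x = np.linspace(-1.0, 1.0, 1001)
codes = mu_law_encode(x)
x_hat = mu_law_decode(codes)
max_error = float(np.max(np.abs(x - x_hat)))

# At 8000 samples/s and 8 bits per sample the rate is 64 kbit/s
bit_rate = 8000 * 8
print(max_error, bit_rate)
```

The quantization error is smallest for low-amplitude samples, which is where most speech samples lie.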
Speech-Assisted Translation Corrector System

 Objective: Develop a speech-assisted translation corrector (SATC)
 system which provides a grammatically correct sentence for a
 translated sentence from the machine translation
[Figure: SATC system block diagram]
Input sentence (e.g. "He came here") -> Multilingual machine translation -> Translated sentence with grammatical errors -> Speech-assisted translation corrector system -> Grammatically correct sentence (text, speech, storage)
Translator: a speech signal is produced from the words in the translated sentence.




“An MT system is correct and complete if it can analyze all of the grammatical structures
encountered in the source language, and it can generate all of the grammatical structures
necessary in the target language translation.”
8/25/2011                                                                                    16
SATC System: Requirements and Challenging Tasks

   Creation of large-scale, rich multilingual speech databases is a crucial
 task for research and development in language and speech technology

            Indian languages
            speakers (10 Males and 10 Females)
            age groups ( <20, 15-40, >40)
            audio format: 16-bit stereo, and sampling rate of 44.1 kHz
            annotation and assessment of speech databases
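
To give a feel for the storage such a database needs, the sketch below works out the uncompressed PCM data rate for the stated audio format (16-bit, stereo, 44.1 kHz). The function name and the one-minute duration are assumptions for the example.

```python
def pcm_storage_bytes(duration_s, sample_rate_hz=44_100,
                      bits_per_sample=16, channels=2):
    """Uncompressed PCM size in bytes for a recording of duration_s seconds."""
    bits = duration_s * sample_rate_hz * bits_per_sample * channels
    return bits / 8

bit_rate = 44_100 * 16 * 2                   # 1,411,200 bit/s
one_minute_mb = pcm_storage_bytes(60) / 1e6  # 10.584 MB per minute
print(bit_rate, one_minute_mb)
```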


   Development of multilingual text to speech interface

   Development of spoken word matching module

   Development of speech signal processing (SSP) tools



Major Problems in Speech Processing
Acoustic variability: the same phonemes pronounced in
different contexts will have different acoustic realization
(coarticulation effect)

The signal is different when speech is uttered in various
environments:
       noise
       reverberation
       different types of microphones.
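
Robustness to environmental noise is often studied by mixing clean speech with noise at a controlled signal-to-noise ratio (SNR). The sketch below scales a noise signal so that the mixture has a requested SNR in dB; the function names are assumptions, and a sine wave stands in for clean speech.

```python
import numpy as np

def mix_at_snr(speech, noise, target_snr_db):
    """Return (noisy signal, noise gain) for the requested SNR in dB."""
    speech = np.asarray(speech, dtype=float)
    noise = np.asarray(noise, dtype=float)[: len(speech)]
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    # Gain g such that p_speech / (g^2 * p_noise) = 10^(SNR/10)
    gain = np.sqrt(p_speech / (p_noise * 10 ** (target_snr_db / 10)))
    return speech + gain * noise, gain

def measure_snr_db(signal, noise):
    """SNR in dB from separate signal and noise components."""
    return 10 * np.log10(np.mean(signal ** 2) / np.mean(noise ** 2))

fs = 8000
t = np.arange(fs) / fs
clean = 0.5 * np.sin(2 * np.pi * 200 * t)   # stand-in for clean speech
rng = np.random.default_rng(3)
noise = rng.standard_normal(fs)

noisy, gain = mix_at_snr(clean, noise, target_snr_db=10.0)
measured = measure_snr_db(clean, gain * noise)
print(measured)
```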

Speaking variability: when the same speaker speaks normally,
shouts, whispers, uses a creaky voice, or has a cold

Speaker variability: since different speakers have different
timbres and different speaking habits
Major Problems in Speech Processing
Linguistic variability: the same sentence can be pronounced
in many different ways, using many different words,
synonyms, and many different syntactic structures and
prosodic schemes

Phonetic variability: due to the different possible
pronunciations of the same words by speakers having
different regional accents

Lombard effect: noise modifies the utterance of the words (as
people tend to speak louder)
Major Problems in Speech Processing
Continuous speech:
   words are connected together (not separated by pauses or
   silences).

   It is difficult to find the start and end points of words

   The production of each phoneme is affected by the
   production of surrounding phonemes

   The start and end of words are affected by the preceding
   and following words

   The rate of speech also matters (fast speech tends to be harder to recognize)
Sha'Carri Richardson Presentation 202345
 
GIÁO ÁN DẠY THÊM (KẾ HOẠCH BÀI BUỔI 2) - TIẾNG ANH 8 GLOBAL SUCCESS (2 CỘT) N...
GIÁO ÁN DẠY THÊM (KẾ HOẠCH BÀI BUỔI 2) - TIẾNG ANH 8 GLOBAL SUCCESS (2 CỘT) N...GIÁO ÁN DẠY THÊM (KẾ HOẠCH BÀI BUỔI 2) - TIẾNG ANH 8 GLOBAL SUCCESS (2 CỘT) N...
GIÁO ÁN DẠY THÊM (KẾ HOẠCH BÀI BUỔI 2) - TIẾNG ANH 8 GLOBAL SUCCESS (2 CỘT) N...
 

Speech processing

  • 1. Research Issues in Speech Processing. Dr. M. Sabarimalai Manikandan, msm.sabari@gmail.com
  • 2. Speech Production: the source-filter model. The speech signal conveys the information contained in the spoken word. It is a highly non-stationary signal, so it is analyzed in short segments (20 to 30 ms) over which it can be treated as approximately stationary. Its acoustical energy lies mainly in the frequency range of 100-6000 Hz. The vocal tract transfer function can be modeled by an all-pole filter.
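As a minimal sketch of the all-pole model named on slide 2: the vocal tract can be written as H(z) = G / (1 + a_1 z^-1 + ... + a_p z^-p), which corresponds to the difference equation y[n] = G x[n] - (a_1 y[n-1] + ... + a_p y[n-p]). The function name `all_pole_filter`, the sign convention for the coefficients and the single-pole example are assumptions made for illustration; they do not come from the slides.

```python
def all_pole_filter(excitation, a, gain=1.0):
    """Filter a source signal through H(z) = gain / (1 + sum_k a[k-1] z^-k)."""
    y = []
    for n, x in enumerate(excitation):
        acc = gain * x
        # Subtract the weighted past outputs (the recursive, all-pole part).
        for k, ak in enumerate(a, start=1):
            if n - k >= 0:
                acc -= ak * y[n - k]
        y.append(acc)
    return y


# Hypothetical example: impulse response of a single-pole filter 1 / (1 - 0.5 z^-1).
impulse = [1.0, 0.0, 0.0, 0.0]
response = all_pole_filter(impulse, a=[-0.5])
```

In the source-filter view, `excitation` plays the role of the source (a pulse train for voiced sounds, noise for unvoiced sounds) and the coefficients `a` describe the vocal tract filter.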
  • 3. Speech Processing Tasks. Speech recognition (recognizing lexical content); speech synthesis (text-to-speech); speaker recognition (recognizing who is speaking); speech understanding and vocal dialog; speech coding (data rate reduction); speech enhancement (noise reduction); speech transmission (noise-free communication); voice conversion.
  • 4. Speech Processing. Speech measurements: short-time energy (STE), zero crossing rate (ZCR), autocorrelation (AC), pitch period or frequency, formants. Speech signal components: speech vs. silence or non-speech; voiced vs. unvoiced speech.
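The first two measurements on slide 4 can be sketched frame by frame. This is a minimal illustration in plain Python, not the author's implementation: the helper names (`split_frames`, `short_time_energy`, `zero_crossing_rate`), the frame length, the hop size and the toy signal are all assumptions. At an 8 kHz sampling rate, a 20 to 30 ms frame would be 160 to 240 samples.

```python
def split_frames(signal, frame_len, hop):
    """Split a sequence into frames of frame_len samples, dropping any incomplete tail."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]


def short_time_energy(frame):
    """Sum of squared samples in one frame."""
    return sum(s * s for s in frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


# Toy signal: a rapidly alternating segment followed by a quiet, flat segment.
signal = [1, -1, 1, -1, 0.1, 0.1, 0.1, 0.1]
frame_list = split_frames(signal, frame_len=4, hop=4)
energies = [short_time_energy(f) for f in frame_list]
zcrs = [zero_crossing_rate(f) for f in frame_list]
```

Voiced speech typically shows high energy with a low zero crossing rate, while unvoiced speech tends toward lower energy and a higher zero crossing rate, which is why the two measurements are often used together for the speech/silence and voiced/unvoiced decisions listed on the slide.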
  • 5. Speech Processing. Speech representations or models. Temporal features: low energy rate, zero crossing rate (ZCR), 4 Hz modulation energy, pitch contour. Spectral features: spectral centroid (sharpness), spectral flux (rate of change), spectral roll-off (spectral shape), spectral flatness (deviation of the spectral form). Other representations: linear predictive coefficients (LPC); cepstral coefficients; Mel frequency cepstral coefficients (MFCC), based on the human auditory system; harmonic features (sinusoidal harmonic modelling); perceptual features (model of the human hearing process); first-order derivative (DELTA).
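Two of the spectral features on slide 5 can be sketched as follows. The spectral centroid is taken here as the magnitude-weighted mean frequency, and the roll-off as the lowest frequency below which a given fraction of the total spectral magnitude lies. The function names, the naive DFT (used to keep the example free of dependencies), the 0.85 roll-off fraction and the test tone are assumptions for illustration only.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Naive DFT magnitudes for bins 0..N/2 (O(N^2), acceptable for a short demo)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def spectral_centroid(mags, sample_rate, frame_len):
    """Magnitude-weighted mean frequency in Hz."""
    total = sum(mags)
    if total == 0:
        return 0.0
    bin_hz = sample_rate / frame_len
    return sum(k * bin_hz * m for k, m in enumerate(mags)) / total


def spectral_rolloff(mags, sample_rate, frame_len, fraction=0.85):
    """Lowest bin frequency at which the cumulative magnitude reaches the given fraction."""
    target = fraction * sum(mags)
    running = 0.0
    for k, m in enumerate(mags):
        running += m
        if running >= target:
            return k * sample_rate / frame_len
    return (len(mags) - 1) * sample_rate / frame_len


# Hypothetical test tone: 1000 Hz sine sampled at 8000 Hz, exactly 4 periods in 32 samples.
sample_rate = 8000
frame_len = 32
tone = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(frame_len)]
mags = magnitude_spectrum(tone)
centroid = spectral_centroid(mags, sample_rate, frame_len)
rolloff = spectral_rolloff(mags, sample_rate, frame_len)
```

For a pure tone, both features sit at the tone frequency; for real speech frames a window function and an FFT routine would normally replace the naive DFT.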
  • 6. Elements of the speech signal. Phonemes: the smallest units of speech sounds, divided into vowels and consonants. About 12 to 21 different vowel sounds are used in the English language. Consonants involve rapid and sometimes subtle changes in sound and are grouped according to the manner of articulation: plosive (p, b, t, etc.), fricative (f, s, sh, etc.), nasal (m, n, ng), liquid (r, l) and semivowel (w, y). Consonants are more independent of language than vowels are. Syllable: one or more phonemes. Word: one or more syllables.
  • 7. Automatic Speech Recognition. There are two uses for speech recognition systems. Dictation: translation of the spoken word into written text. Computer control: control of the computer and software applications by speaking commands. Speaker-dependent system: operates for a single speaker. Speaker-independent system: operates for any speaker of a particular type. Speaker-adaptive system: adapts its operation to the characteristics of new speakers. The size of the vocabulary affects the complexity, processing requirements and accuracy of the system.
  • 8. Speech Recognition: Applications. Automatic translation; vehicle navigation systems; human-computer interaction; content-based spoken audio search; home automation; pronunciation evaluation; robotics; video games; transcription of speech into mobile text messages; support for people with disabilities.
  • 9. Speech Recognition System. Sampling of speech. Acoustic signal processing: Linear Prediction Cepstral Coefficients (LPCC), Mel Frequency Cepstral Coefficients (MFCC), Perceptual Linear Prediction Cepstral Coefficients (PLPCC). Recognition of phonemes, groups of phonemes and words: Dynamic Time Warping (DTW), hidden Markov models (HMMs), Gaussian mixture models (GMMs), neural networks (NNs), expert systems and combinations of techniques.
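Of the recognition techniques listed on slide 9, Dynamic Time Warping is the simplest to sketch: it aligns two feature sequences of different lengths and returns the cost of the best alignment. The version below is a textbook formulation, assuming scalar features and an absolute-difference local cost; a real recognizer would compare vectors such as MFCC frames, and the function name `dtw_distance` is hypothetical.

```python
def dtw_distance(a, b):
    """Cost of the best monotonic alignment between sequences a and b."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best cost of aligning the first i items of a with the first j items of b.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b repeats its current item
                cost[i][j - 1],      # b advances, a repeats its current item
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


# The same pattern spoken more slowly aligns at zero cost.
same_word = dtw_distance([1, 2, 3], [1, 2, 2, 3])
# A shifted pattern cannot be aligned for free.
other_word = dtw_distance([1, 2, 3], [2, 3, 4])
```

In a template-based recognizer, an unknown utterance would be compared against one stored template per word and the word with the smallest distance chosen.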
  • 10. Automatic Speaker Recognition. Speaker recognition: the process of automatically recognizing who is speaking by using the speaker-specific information included in speech sounds. Speaker identity: physiological and behavioral characteristics of the speech production model of an individual speaker, namely the spectral envelope (vocal tract characteristics) and the supra-segmental features (voice source characteristics) of speech. Applications: banking over a telephone network; telephone shopping and database access services; voice dialing and mail; information and reservation services; security control for confidential information; forensics and surveillance applications.
  • 11. Speaker Recognition. Speaker identification: the process of determining which registered speaker provides the input speech sounds. [Block diagram: input speech -> feature extraction -> similarity against the reference template or model of each registered speaker (#1, #2, ..., #N) -> maximum selection -> identification result (speaker ID)]
  • 12. Speaker Recognition. Speaker verification: the process of accepting or rejecting the identity claim of a speaker. [Block diagram: input speech -> feature extraction -> similarity against the reference template or model of the claimed speaker (#M) -> decision against a threshold -> verification result (accept/reject)] Recognition variants: open-set and closed-set recognition; text-dependent and text-independent recognition. Methods: vector quantization, Gaussian mixture models (GMM), dynamic time warping (DTW), hidden Markov model (HMM).
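The two decision rules on slides 11 and 12 differ only in the final step: identification selects the registered speaker with maximum similarity, while verification compares the similarity to the claimed speaker's model against a threshold. The sketch below assumes each speaker is represented by a single feature vector and uses cosine similarity as the similarity measure; the function names, the toy models and the threshold value are all hypothetical and stand in for the templates or statistical models a real system would use.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def identify(features, models):
    """Identification: return the registered speaker with maximum similarity."""
    return max(models, key=lambda name: cosine_similarity(features, models[name]))


def verify(features, claimed_model, threshold):
    """Verification: accept the identity claim if similarity reaches the threshold."""
    return cosine_similarity(features, claimed_model) >= threshold


# Hypothetical reference models for two registered speakers.
models = {"speaker_1": [1.0, 0.0], "speaker_2": [0.0, 1.0]}
features = [0.9, 0.1]

speaker_id = identify(features, models)
accepted = verify(features, models["speaker_1"], threshold=0.8)
rejected = verify(features, models["speaker_2"], threshold=0.8)
```

Closed-set identification, as sketched, always returns one of the registered speakers; an open-set system would add a threshold test like `verify` so that unknown speakers can be rejected.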
  • 13. Text-to-Speech (TTS) System. Synthesis of speech for effective human-machine communications: reading email messages; call center help desks and customer care; announcement machines. [Block diagram: raw or tagged text -> text analysis (document structure detection, text normalization, linguistic analysis) -> phonetic analysis (homograph disambiguation, grapheme-to-phoneme conversion) -> prosodic analysis (pitch, duration) -> speech synthesis (voice rendering) -> synthetic speech] Synthetic speech should be intelligible and natural.
  • 14. Speech Synthesis. Text-to-speech (TTS) synthesis systems: approach. TTS system performance measures: synthetic speech intelligibility and synthetic speech naturalness. Speech intelligibility tests. Segmental level analysis: the Rhyme Test, the Modified Rhyme Test, the Diagnostic Rhyme Test. Supra-segmental analysis: the Harvard Psychoacoustic Sentences (HPS), the Haskins syntactic sentences.
  • 15. Speech Coding (Compression). Speech coding for efficient transmission and storage of speech: narrowband and broadband wired telephony; cellular communications; Voice over IP (VoIP) to utilize the Internet; telephone answering machines; IVR systems; prerecorded messages.
  • 16. Speech-Assisted Translation Corrector System. Objective: develop a speech-assisted translation corrector (SATC) system which provides a grammatically correct sentence for a translated sentence produced by machine translation. [Block diagram: input sentence -> multilingual machine translation system -> translated sentence with grammatical errors -> speech-assisted translation corrector -> grammatically correct sentence; other diagram labels: text ("He came here"), speech storage, translator] A speech signal is produced from the words in the translated sentence. “A MT system is correct and complete if it can analyze all of the grammatical structures encountered in the source language, and it can generate all of the grammatical structures necessary in the target language translation.” (8/25/2011)
  • 17. SATC System: Requirements and Challenging Tasks. Creation of large-scale, rich multilingual speech databases is a crucial task for research and development in language and speech technology: Indian languages; speakers (10 males and 10 females); age groups (<20, 15-40, >40); audio format: 16-bit stereo at a sampling rate of 44.1 kHz; annotation and assessment of speech databases. Development of a multilingual text-to-speech interface. Development of a spoken word matching module. Development of speech signal processing (SSP) tools.
  • 18. Major Problems in Speech Processing. Acoustic variability: the same phonemes pronounced in different contexts have different acoustic realizations (coarticulation effect). The signal also differs when speech is uttered in different environments: noise, reverberation, different types of microphones. Speaking variability: the same speaker may speak normally, shout, whisper, use a creaky voice, or have a cold. Speaker variability: different speakers have different timbres and different speaking habits.
  • 19. Major Problems in Speech Processing. Linguistic variability: the same sentence can be pronounced in many different ways, using many different words, synonyms, and many different syntactic structures and prosodic schemes. Phonetic variability: the same words can be pronounced differently by speakers with different regional accents. Lombard effect: noise modifies the utterance of the words (as people tend to speak louder).
  • 20. Major Problems in Speech Processing. Continuous speech: words are connected together (not separated by pauses or silences). It is difficult to find the start and end points of words. The production of each phoneme is affected by the production of surrounding phonemes. The start and end of words are affected by the preceding and following words. Rate of speech: fast speech tends to be harder to recognize.