Speech Recognition Challenges




                    Presenter: Alexandru Chica
Contents

 Speech User Interface basic concepts

    •Speech recognition

    •Speech synthesis

 Speech Recognition Challenges

    •Accuracy

    •User responsiveness

    •Performance

    •Reliability

    •Fault tolerance
Speech User Interface basic concepts

 Speech Recognition

    •The translation of spoken text into written text

    Spoken input --(algorithm)--> "#'spit&S#" --> "speech"
                                  (phonetic representation of speech)

    Algorithms:
        •Statistical Processing
        •Hidden Markov Models
        •Dynamic Time Warping




  Types of speech recognition:
     •Command and control
     •Dictation
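
 Sketch (not from the original slides): a minimal Dynamic Time Warping distance over 1-D sequences. The function name, the absolute-difference cost and the toy inputs are assumptions for illustration; a real recognizer would compare multi-dimensional acoustic feature vectors.

```python
def dtw_distance(a, b):
    """Dynamic Time Warping distance between two 1-D numeric sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: stretch a, stretch b, or advance both
            cost[i][j] = local + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


# The same pattern spoken more slowly still aligns at zero cost
print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))
```

 The warping step is what lets a slow and a fast utterance of the same word match, which a plain sample-by-sample comparison would not.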
Speech User Interface basic concepts

 Speech Recognition Components

    •Audio input (front-end)
    •Grammars – contain commands that can be spoken by the user
    •Acoustic models – language-dependent, used to “define” the language features
    •Recognition algorithms (back-end)


    Front end: Audio input
        --(feature extraction)-->
    Back end: Acoustic models + Grammars + Recognition algorithms
        --> result
Speech User Interface basic concepts

 Speech Recognition APIs




    •Microsoft SAPI
    •IBM: Embedded ViaVoice
    •Nuance: VoCon
    •VoiceBox Speech Recognition
Speech User Interface basic concepts

 Speech Synthesis

    •The translation of written text into spoken text


     "speech" --(g2p: grapheme-to-phoneme)--> "#'spit&S#"
Speech User Interface basic concepts

 Speech Synthesis APIs




    •Microsoft SAPI
    •SoftVoice TTS
    •Apple PlainTalk
    •Nuance: Vocalizer
    •SVOX TTS
    •eSpeak
Speech User Interface basic concepts - Usage

 In car:
     •Control media player / radio stations

     •Control navigation

     •Control phone book and phone activities

     •Find POI locations (POI: point of interest)

     •E-mail/SMS reading

 On the web:
     •HTML 5 speech input

     •Google Search with voice input

     •Reading of web page content
Speech Recognition Challenges – Accuracy

 Audio Input

 Problem: Audio signal quality
 Impact: loss of recognition accuracy

 Solution 1: Echo cancellation

 Solution 2: Beamforming
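
 Sketch (not from the original slides): delay-and-sum beamforming, one common form of beamforming. It assumes the per-microphone delays toward the speaker are already known as whole samples; the function name and the toy signals are hypothetical.

```python
def delay_and_sum(channels, delays):
    """Align each microphone channel by its sample delay, then average.

    channels: list of equally sampled signals, one list of floats per microphone
    delays:   samples by which the speaker's signal lags on each microphone
    """
    # Only keep the span where every shifted channel still has data
    length = min(len(ch) - d for ch, d in zip(channels, delays))
    output = []
    for t in range(length):
        aligned = [ch[t + d] for ch, d in zip(channels, delays)]
        output.append(sum(aligned) / len(channels))
    return output


# The pulse reaches microphone 1 one sample later than microphone 0
mic0 = [0.0, 1.0, 0.0, 0.0]
mic1 = [0.0, 0.0, 1.0, 0.0]
print(delay_and_sum([mic0, mic1], delays=[0, 1]))
```

 Sound from the steered direction adds up coherently, while noise arriving from other directions is misaligned and averages down.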
Speech Recognition Challenges – Accuracy

 Audio Input

 Problem: Talk-over problem
 Impact: loss of recognition accuracy

 Solution: Barge-In

    (Diagram: timelines of the TTS prompt and the user's speech)
Speech Recognition Challenges – User responsiveness

 Speech Recognition

 Problem: the user starts speaking the command before the recognition resources are ready
 Solution: Delayed speech recognition




    Delayed Speech Recognition:
        Resource loading / Front-end processing  -->  Back-end processing
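
 Sketch (not from the original slides): one way to delay recognition is to buffer the captured audio frames until the resources are loaded, then replay them to the back end. The class and method names are hypothetical, and the sketch is single-threaded; a real system would need locking around the buffer.

```python
from collections import deque


class DelayedRecognizer:
    """Buffers audio frames until the back end is ready, then replays them."""

    def __init__(self, backend):
        self._backend = backend   # callable that consumes one audio frame
        self._ready = False
        self._pending = deque()   # frames captured before resources were loaded

    def on_audio_frame(self, frame):
        if self._ready:
            self._backend(frame)
        else:
            self._pending.append(frame)

    def on_resources_loaded(self):
        self._ready = True
        # Replay the buffered speech in its original order
        while self._pending:
            self._backend(self._pending.popleft())


processed = []
recognizer = DelayedRecognizer(processed.append)
recognizer.on_audio_frame("frame-1")   # user speaks early: buffered
recognizer.on_resources_loaded()       # buffered frame is replayed
recognizer.on_audio_frame("frame-2")   # now passed straight through
print(processed)
```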
Speech Recognition Challenges – User responsiveness

 Speech Recognition

 Problem: synchronization with multiple applications (media, phone, navigation)

 Solution: apply concurrent design patterns

     •Active Object


     •Monitor


     •Double-checked locking
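
 Sketch (not from the original slides): a minimal Active Object. Callers from different applications submit requests, and a single private worker thread executes them one at a time, so the recognizer state needs no further locking. The class name and the example requests are hypothetical.

```python
import queue
import threading
from concurrent.futures import Future


class ActiveObject:
    """Executes submitted calls sequentially on its own worker thread."""

    def __init__(self):
        self._requests = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def submit(self, fn, *args):
        """Queue a call and return a Future for its result."""
        future = Future()
        self._requests.put((future, fn, args))
        return future

    def _run(self):
        while True:
            item = self._requests.get()
            if item is None:          # stop marker
                break
            future, fn, args = item
            try:
                future.set_result(fn(*args))
            except Exception as exc:
                future.set_exception(exc)

    def stop(self):
        self._requests.put(None)
        self._worker.join()


speech = ActiveObject()
log = []
speech.submit(log.append, "media: play")
speech.submit(log.append, "phone: dial")
done = speech.submit(len, log)
print(done.result(timeout=5), log)
speech.stop()
```

 Requests are served in FIFO order, so callers never observe a half-updated recognizer.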
Speech Recognition Challenges – Performance

 Grammars

 Use cases:

 • Command & Control grammars
     • 200 – 500 commands

 •Navigation grammars
    • 100k+ static data

 •Music grammars
    • 10k+ dynamic data
Speech Recognition Challenges – Performance

 Grammars (1)

 Problem: Grammar size too big
 Impact:
    • increased loading times of files from disk to memory

 Solution: Grammar optimization
     •merging of similar command tokens
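
 Sketch (not from the original slides): one plausible reading of "merging similar command tokens" is to store commands in a prefix tree, so that shared leading tokens are kept only once. The helper names and the sample commands are assumptions for illustration.

```python
END = "<end>"  # marks that a complete command ends at this node


def build_prefix_tree(commands):
    """Merge commands that share leading tokens into one nested dict."""
    root = {}
    for command in commands:
        node = root
        for token in command.split():
            node = node.setdefault(token, {})
        node[END] = {}
    return root


def count_tokens(tree):
    """Number of token nodes stored in the tree (end markers excluded)."""
    return sum(1 + count_tokens(child)
               for token, child in tree.items() if token != END)


commands = ["play next song", "play next album", "play previous song"]
tree = build_prefix_tree(commands)
flat = sum(len(c.split()) for c in commands)
print(flat, "tokens flat,", count_tokens(tree), "tokens merged")
```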
Speech Recognition Challenges – Performance

 Grammars (2)

 •removal / replacement of recursion rules
Speech Recognition Challenges – Performance

 Grammars (3)

 Problem: Grammar token collisions
 Impact:
     • loss of recognition accuracy
 Solution:
     •replacement of collision prone tokens with synonyms
     •adding special pronunciation tokens to collision words


 Examples:

     sum – sun – sung

     bet – bed
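
 Sketch (not from the original slides): collision-prone tokens can be flagged by their pairwise edit distance. Spelling is used here only as a rough stand-in for phonetic transcriptions; the function names and the distance threshold are assumptions.

```python
from itertools import combinations


def edit_distance(a, b):
    """Levenshtein distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]


def collision_candidates(tokens, max_distance=1):
    """Pairs of tokens close enough to be confused with each other."""
    return [(a, b) for a, b in combinations(tokens, 2)
            if edit_distance(a, b) <= max_distance]


print(collision_candidates(["sum", "sun", "sung", "bet", "bed", "navigate"]))
```

 Pairs flagged this way are the candidates for a synonym replacement or a special pronunciation token.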
Speech Recognition Challenges – Performance

 Dynamic Grammars

 Problem: synchronization with USB devices, phones, navigation databases takes
 too much time

 Solution 1: implementation of a caching mechanism
Speech Recognition Challenges – Performance

 Caching flow:

     ID3 parser reads titles, artists, composers, genre, album, etc. from MP3 files
         e.g. Title: One, Artist: U2, Album: Achtung Baby, Genre: rock, ...
     --> phoneme cache --> transcriptions
     --> add to slot in the dynamic grammar:
             title <DYN_TITLE>
             artist <DYN_ARTIST>
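
 Sketch (not from the original slides): a phoneme cache can be as simple as a dictionary placed in front of the expensive grapheme-to-phoneme step. `PhonemeCache` and `fake_g2p` are hypothetical names, and `fake_g2p` only imitates a real transcription engine.

```python
class PhonemeCache:
    """Remembers transcriptions so each distinct string is converted once."""

    def __init__(self, g2p):
        self._g2p = g2p      # expensive grapheme-to-phoneme function
        self._cache = {}
        self.misses = 0      # how often the expensive step actually ran

    def transcribe(self, text):
        key = text.lower()
        if key not in self._cache:
            self.misses += 1
            self._cache[key] = self._g2p(key)
        return self._cache[key]


def fake_g2p(text):
    # Placeholder only: a real engine would return a phoneme string
    return "#" + text + "#"


cache = PhonemeCache(fake_g2p)
for artist in ["U2", "U2", "u2"]:   # the same artist tagged on several tracks
    cache.transcribe(artist)
print(cache.misses)
```

 On a re-sync of the same USB device or phone, most titles and artists hit the cache, so only new entries pay the transcription cost.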
Speech Recognition Challenges – Performance

 Dynamic Grammars

 Solution 2: split the processing in two, and dispatch part of the work to a different
 processor

     CPU1: ID3 parser reads titles, artists, composers, genre, album, etc. from MP3 files
         e.g. Title: One, Artist: U2, Album: Achtung Baby, Genre: rock, ...
     CPU1: preprocessing step
     CPU2: add to slot in the dynamic grammar:
             title <DYN_TITLE>
             artist <DYN_ARTIST>
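
 Sketch (not from the original slides): the two-stage split can be modeled as a producer/consumer pipeline. A Python thread stands in for the second processor here, which is an assumption of the sketch; all function names are hypothetical.

```python
import queue
import threading


def preprocess(tags):
    """Stage 1 (CPU1 in the slide): normalise the raw ID3 strings."""
    return {field: value.strip().lower() for field, value in tags.items()}


def fill_slots(handoff, grammar):
    """Stage 2 (CPU2 in the slide): add preprocessed entries to the slots."""
    while True:
        tags = handoff.get()
        if tags is None:   # end-of-input marker
            break
        grammar["<DYN_TITLE>"].append(tags["title"])
        grammar["<DYN_ARTIST>"].append(tags["artist"])


def build_dynamic_grammar(tracks):
    grammar = {"<DYN_TITLE>": [], "<DYN_ARTIST>": []}
    handoff = queue.Queue()
    worker = threading.Thread(target=fill_slots, args=(handoff, grammar))
    worker.start()
    for tags in tracks:
        handoff.put(preprocess(tags))   # stage 2 runs while stage 1 continues
    handoff.put(None)
    worker.join()
    return grammar


print(build_dynamic_grammar([{"title": " One ", "artist": "U2"}]))
```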
Speech Recognition Challenges – Reliability

 Reliability - the ability of the system to keep operating over time

 Problem: the system has to operate correctly over long periods of time

 Solution 1: automated tests

 Solution 2: drive tests
Speech Recognition Challenges – Fault tolerance

 Problem: Recovery from system failures must be possible

 Solution:

 • system is modeled in a modular manner, with components that
   communicate via internal car area network.

 • individual components can be restarted without affecting other system
   components
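
 Sketch (not from the original slides): a supervisor that restarts only the components that have failed. `Component` and `Supervisor` are hypothetical names; in a real car system the health check would go over the internal network rather than reading a flag.

```python
class Component:
    """A restartable unit such as the ASR or TTS service."""

    def __init__(self, name):
        self.name = name
        self.running = False
        self.starts = 0

    def start(self):
        self.running = True
        self.starts += 1

    def crash(self):
        self.running = False


class Supervisor:
    """Starts all components and restarts any that stopped running."""

    def __init__(self, components):
        self._components = list(components)
        for component in self._components:
            component.start()

    def check(self):
        restarted = []
        for component in self._components:
            if not component.running:
                component.start()   # the other components are left untouched
                restarted.append(component.name)
        return restarted


asr, tts = Component("asr"), Component("tts")
supervisor = Supervisor([asr, tts])
asr.crash()
print(supervisor.check(), tts.starts)
```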
Speech Recognition Challenges




                    TTS & ASR Demo
Speech Recognition Challenges




                       Questions ?
Speech Recognition Challenges




                        Thank You
