Optimizing speech recognizer rejection thresholds
Dan Burnett
Director of Speech Technologies, Voxeo
August 24, 2009
Why this talk?
• Sometimes we forget the basics, which are:
 • Recognizers are not perfect
 • They can be optimized in a
    straightforward manner
 • The simplest optimization is the rejection
    threshold
The Goal
• End user goal: optimal experience
• Our Goal: determine user experience for
  each possible rejection threshold, then
  choose optimum threshold
• Must compare true classification of an
  audio sample against the ASR engine’s
  classification
True classifications
• Assume human-level recognition
• App should still distinguish (i.e., possibly behave differently) among the following cases:

    Case                                                           Possible behavior
    No speech in audio sample (nospeech)                           Mention that you didn’t hear anything and ask for repeat
    Speech, but not intelligible (unintelligible)                  Ask for repeat
    Intelligible speech, but not in app grammar (out-of-grammar)   Encourage in-grammar speech
    Intelligible speech, and within app grammar (in-grammar)       Respond to what person said
ASR Engine Classifications

• Silence/nospeech (nospeech)
• Reject (rejected)
• Recognize (recognized)
Crossing these two . . .

    True \ ASR        nospeech                        rejected              recognized
    nospeech          Correct classification          Improperly rejected   Incorrect
    unintelligible    Improperly treated as silence   Correct behavior      Assume incorrect
    out-of-grammar    Improperly treated as silence   Correct behavior      Incorrect
    in-grammar        Improperly treated as silence   Improperly rejected   Either correct or incorrect
Crossing these two . . . (error labels)
• Misrecognitions: the incorrect results in the "recognized" column
• “Misrejections”: the "Improperly rejected" cells
• “Missilences”: the "Improperly treated as silence" cells
Three types of errors

• Missilences -- called silence, but wasn’t
• Misrejections -- rejected inappropriately
• Misrecognitions -- recognized
  inappropriately or incorrectly
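
A minimal sketch of how the crossing table and the three error types might be encoded, assuming the class names from the slides. The function name `classify_outcome` and the `result_matches` flag (whether the recognized result equals the transcription) are hypothetical names introduced for illustration. Following the table wording, a rejected nospeech sample is counted as a misrejection.

```python
TRUE_CLASSES = {"nospeech", "unintelligible", "out-of-grammar", "in-grammar"}
ASR_CLASSES = {"nospeech", "rejected", "recognized"}


def classify_outcome(true_class: str, asr_class: str, result_matches: bool = False) -> str:
    """Return 'correct', 'missilence', 'misrejection', or 'misrecognition'.

    result_matches only matters for in-grammar speech that the engine
    recognized: it says whether the engine's answer was the right one.
    """
    if true_class not in TRUE_CLASSES:
        raise ValueError(f"unknown true class: {true_class}")
    if asr_class not in ASR_CLASSES:
        raise ValueError(f"unknown ASR class: {asr_class}")

    if asr_class == "nospeech":
        # Called silence: only right when there really was no speech.
        return "correct" if true_class == "nospeech" else "missilence"

    if asr_class == "rejected":
        # Rejection is the desired behavior for unintelligible or
        # out-of-grammar speech, and improper otherwise.
        if true_class in ("unintelligible", "out-of-grammar"):
            return "correct"
        return "misrejection"

    # asr_class == "recognized": only a matching in-grammar result is right.
    if true_class == "in-grammar" and result_matches:
        return "correct"
    return "misrecognition"
```

The `result_matches` flag covers the "Either correct or incorrect" cell; for every other recognized case the table already says the outcome is an error.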
     So how do we evaluate these?
Evaluating errors

1. Evaluation data set
2. Try every rejection threshold value
3. Plot errors as function of threshold
4. Select optimal value for your app
1. Evaluation data set(s)
•   Data selection

    •   Must be representative (“every nth call”)

    •   Ideally at least 100 recordings per grammar path
        for good confidence in results

•   Transcription

    •   Goal is to compare against recognition results,
        so no punctuation, coughs, etc. needed in
        transcription itself (but good to have in separate
        comments)
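
A rough sketch of the two data-preparation ideas above: taking every nth call, and normalizing text so a transcription can be compared directly against a recognition result. The helper names `every_nth` and `normalize` are hypothetical, and the normalization rules (lowercase, drop punctuation except apostrophes, collapse whitespace) are an assumption, not something the slides prescribe.

```python
import re


def every_nth(calls, n):
    """Pick every nth call from a list, starting with the first one."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return calls[::n]


def normalize(text):
    """Lowercase, strip punctuation (keeping apostrophes), collapse spaces."""
    text = text.lower()
    # Replace anything that is not a word character, whitespace, or an
    # apostrophe with a space.
    text = re.sub(r"[^\w\s']", " ", text)
    return " ".join(text.split())
```

Applying the same `normalize` to both the transcription and the recognizer output keeps formatting differences from being counted as misrecognitions.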
2. Try every rejection threshold value
• Run recognizer in batch mode with
  rejection threshold of 0 (i.e., no rejection).
  Remember to collect confidence scores!
• Then, for each threshold from 0 to 100
 • Calculate number of misrecognitions,
    misrejections, and missilences
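
The sweep above can be sketched as follows, as an illustration rather than a definitive implementation. It assumes confidence scores on a 0 to 100 scale and that a result is rejected when its confidence is below the threshold (engines may define this differently). The `Sample` fields and function names are hypothetical, and the sample data is made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    true_class: str     # "nospeech", "unintelligible", "out-of-grammar", "in-grammar"
    heard_speech: bool  # False if the engine returned nospeech at threshold 0
    confidence: int     # 0..100, only meaningful when heard_speech is True
    matches: bool       # recognized result equals the transcription


def count_errors(samples, threshold):
    """Count the three error types if this rejection threshold were used."""
    counts = {"misrecognitions": 0, "misrejections": 0, "missilences": 0}
    for s in samples:
        if not s.heard_speech:
            # Engine called it silence regardless of threshold.
            if s.true_class != "nospeech":
                counts["missilences"] += 1
        elif s.confidence < threshold:
            # Would be rejected at this threshold.
            if s.true_class in ("nospeech", "in-grammar"):
                counts["misrejections"] += 1
        else:
            # Would be recognized at this threshold.
            if not (s.true_class == "in-grammar" and s.matches):
                counts["misrecognitions"] += 1
    return counts


def sweep(samples):
    """Error counts for every threshold from 0 to 100 inclusive."""
    return {t: count_errors(samples, t) for t in range(0, 101)}


# Made-up batch results collected at threshold 0 (illustrative only).
samples = [
    Sample("in-grammar", True, 80, True),
    Sample("in-grammar", True, 40, False),
    Sample("out-of-grammar", True, 30, False),
    Sample("in-grammar", False, 0, False),
    Sample("nospeech", False, 0, False),
]
curves = sweep(samples)
```

Because the batch run uses threshold 0, the recognizer only has to run once; every other threshold is simulated from the stored confidence scores.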
3. Plot errors
[Plot: misrecognitions, “misrejections”, and “missilences” versus rejection threshold (0 to 100), with the Equal Error Rate marked]
3. Plot errors
[Plot: sum of the errors versus rejection threshold (0 to 100), with the Minimum Total Error marked]
4. Select optimal value
•   Equal-error-rate: not necessarily the optimum
•   Minimum of the sum: good starting point, great for
    comparing across engines (on same data set only!!)
•   Optimal: depends on your app; some errors may
    be more critical than others
•   Question: if missilences are not affected by the
    threshold, why did I include them?
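
One way to sketch the selection step, assuming per-threshold error counts are already available. The function name `best_threshold`, the weighting scheme, and the numbers in `curves` are all hypothetical and for illustration only. With equal weights the function returns the minimum of the sum; app-specific weights let more critical errors count more.

```python
def best_threshold(curves, weights=None):
    """Return the threshold with the lowest (optionally weighted) total error.

    curves maps threshold -> {"misrecognitions": n, "misrejections": n,
    "missilences": n}. Ties go to the lowest threshold.
    """
    if weights is None:
        weights = {"misrecognitions": 1.0, "misrejections": 1.0, "missilences": 1.0}

    def cost(threshold):
        return sum(weights[kind] * curves[threshold][kind] for kind in weights)

    return min(sorted(curves), key=cost)


# Made-up error counts at a few thresholds (illustrative only).
curves = {
    0:   {"misrecognitions": 20, "misrejections": 0,  "missilences": 3},
    25:  {"misrecognitions": 12, "misrejections": 2,  "missilences": 3},
    50:  {"misrecognitions": 6,  "misrejections": 5,  "missilences": 3},
    75:  {"misrecognitions": 3,  "misrejections": 12, "missilences": 3},
    100: {"misrecognitions": 0,  "misrejections": 30, "missilences": 3},
}

minimum_of_sum = best_threshold(curves)
# Example of an app where a misrecognition is five times as costly.
cautious = best_threshold(
    curves,
    {"misrecognitions": 5.0, "misrejections": 1.0, "missilences": 1.0},
)
```

In this made-up data the unweighted minimum falls at 50, while weighting misrecognitions more heavily pushes the choice up to 75.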
Further optimizations

• Move OOG into IG category if semantically
  correct (“You bet” -> “yes”)
• Consider additional threshold for
  confirmation
• Optimize endpointer parameters (affects
  missilences and/or “too much speech”)
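
A small sketch of the first two ideas above, under stated assumptions: a second, higher threshold below which the app confirms instead of accepting outright, and a synonym map that turns a semantically correct out-of-grammar phrase into its in-grammar meaning. The threshold values 40 and 70, the names `decide` and `to_semantic`, and the `SYNONYMS` table are hypothetical.

```python
# Hypothetical synonym table: out-of-grammar phrase -> in-grammar meaning.
SYNONYMS = {"you bet": "yes"}


def to_semantic(transcript):
    """Map a transcription to its in-grammar meaning when a synonym exists."""
    key = transcript.strip().lower()
    return SYNONYMS.get(key, key)


def decide(confidence, reject_below=40, confirm_below=70):
    """Pick an action from a 0..100 confidence score using two thresholds."""
    if reject_below > confirm_below:
        raise ValueError("reject_below must not exceed confirm_below")
    if confidence < reject_below:
        return "reject"
    if confidence < confirm_below:
        return "confirm"   # e.g. "Did you say yes?"
    return "accept"
```

Both thresholds would be tuned on the same evaluation data rather than taken from the placeholder defaults shown here.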

SpeechTEK 2009: Optimizing Speech Recognizer Rejection Thresholds
