At SpeechTEK 2009 in New York on August 24, 2009, Dr. Daniel C. Burnett, Director of Speech Technologies at Voxeo, spoke on optimizing speech recognizer rejection thresholds. Abstract:
This session will explain ASR (automatic speech recognizer) confidence rejection thresholds: what they are, where they come from, and why they are critical to your ASR-enabled IVR. We describe the steps necessary to optimize this important threshold value throughout your application, covering transcription, the importance of grammar coverage, and terms such as the Equal Error Rate. This session is ideal for those ready to take their ASR-enabled IVR tuning to the next level.
2. Why this talk?
• Sometimes we forget the basics, which are:
  • Recognizers are not perfect
  • They can be optimized in a straightforward manner
  • The simplest optimization is the rejection threshold
3. The Goal
• End user goal: optimal experience
• Our goal: determine the user experience for each possible rejection threshold, then choose the optimum threshold
• Must compare the true classification of an audio sample against the ASR engine's classification
4. True classifications
• Assume human-level recognition
• App should still distinguish (i.e., possibly behave differently) among the following cases:

Case                                                           Possible behavior
No speech in audio sample (nospeech)                           Mention that you didn't hear anything and ask for a repeat
Speech, but not intelligible (unintelligible)                  Ask for a repeat
Intelligible speech, but not in app grammar (out-of-grammar)   Encourage in-grammar speech
Intelligible speech, and within app grammar (in-grammar)       Respond to what the person said
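A minimal sketch of this per-case branching, assuming a simple prompt-per-case turn; the prompt wording and function names are illustrative, not from the talk:

```python
# Illustrative sketch: branching one IVR turn on the four true
# classifications above. Prompt wording is an example, not from the talk.

PROMPTS = {
    "nospeech":       "Sorry, I didn't hear anything. Could you say that again?",
    "unintelligible": "Sorry, I didn't catch that. Could you say that again?",
    "out-of-grammar": "Please say one of the menu options, like 'sales' or 'support'.",
}

def respond(true_class: str, utterance: str = "") -> str:
    """Return the prompt to play for a classified audio sample."""
    # In-grammar speech gets an application response; the other three
    # cases get the reprompts from the table above.
    if true_class == "in-grammar":
        return f"Okay, {utterance}. One moment please."
    return PROMPTS[true_class]
```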
6. Crossing these two . . .

True \ ASR        nospeech                        rejected              recognized
nospeech          Correct classification         Improperly rejected   Incorrect
unintelligible    Improperly treated as silence   Correct behavior      Assume incorrect
out-of-grammar    Improperly treated as silence   Correct behavior      Incorrect
in-grammar        Improperly treated as silence   Improperly rejected   Either correct or incorrect

7. Crossing these two . . . Misrecognitions
(The same table, highlighting the error cells in the "recognized" column.)

8. Crossing these two . . . "Misrejections"
(The same table, highlighting the error cells in the "rejected" column.)

9. Crossing these two . . . "Missilences"
(The same table, highlighting the error cells in the "nospeech" column.)
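This matrix is mechanical enough to write down as code. A minimal sketch, with cell labels taken from the table above (the function name is mine):

```python
# Sketch of the matrix above as a lookup: given a sample's true
# classification and what the ASR did with it, name the outcome.

OUTCOMES = {
    ("nospeech",       "nospeech"):   "correct classification",
    ("nospeech",       "rejected"):   "misrejection",    # improperly rejected
    ("nospeech",       "recognized"): "misrecognition",  # incorrect
    ("unintelligible", "nospeech"):   "missilence",      # improperly treated as silence
    ("unintelligible", "rejected"):   "correct behavior",
    ("unintelligible", "recognized"): "misrecognition",  # assume incorrect
    ("out-of-grammar", "nospeech"):   "missilence",
    ("out-of-grammar", "rejected"):   "correct behavior",
    ("out-of-grammar", "recognized"): "misrecognition",
    ("in-grammar",     "nospeech"):   "missilence",
    ("in-grammar",     "rejected"):   "misrejection",
}

def classify_outcome(true_class, asr_result, hypothesis_matches=False):
    # In-grammar speech that was recognized is correct only when the
    # hypothesis matches the transcription ("either correct or incorrect").
    if (true_class, asr_result) == ("in-grammar", "recognized"):
        return "correct" if hypothesis_matches else "misrecognition"
    return OUTCOMES[(true_class, asr_result)]
```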
10–11. Three types of errors
• Missilences -- called silence, but wasn't
• Misrejections -- rejected inappropriately
• Misrecognitions -- recognized inappropriately or incorrectly
So how do we evaluate these?
12. Evaluating errors
1. Evaluation data set
2. Try every rejection threshold value
3. Plot errors as a function of threshold
4. Select the optimal value for your app
13. 1. Evaluation data set(s)
• Data selection
  • Must be representative ("every nth call")
  • Ideally at least 100 recordings per grammar path for good confidence in results
• Transcription
  • Goal is to compare against recognition results, so no punctuation, coughs, etc. needed in the transcription itself (but good to have in separate comments)
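Assigning the true classification to each transcribed sample can then be mechanical. A sketch, assuming conventions not from the talk: an empty transcription means no speech, an "[unintelligible]" marker flags untranscribable speech, and a hypothetical in_grammar() callable tests grammar coverage:

```python
# Sketch (conventions are assumptions, not from the talk): derive the true
# classification of a sample from its transcription and a grammar test.

def true_class(transcription, in_grammar):
    """in_grammar is a hypothetical callable testing grammar coverage."""
    text = transcription.strip()
    if not text:
        return "nospeech"            # transcriber heard nothing
    if text == "[unintelligible]":
        return "unintelligible"      # transcriber couldn't make it out
    return "in-grammar" if in_grammar(text) else "out-of-grammar"
```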
14. 2. Try every rejection threshold value
• Run the recognizer in batch mode with a rejection threshold of 0 (i.e., no rejection). Remember to collect confidence scores!
• Then, for each threshold from 0 to 100, calculate the number of misrecognitions, misrejections, and missilences
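A sketch of this sweep, assuming each sample from the threshold-0 batch run is a record carrying the ASR's result, its confidence score (0–100), the true classification, and whether the hypothesis matched the transcription; the field names are mine:

```python
# Sketch of step 2 (record fields are assumptions): a result whose
# confidence falls below the threshold counts as rejected.

def count_errors(records, threshold):
    misrecognitions = misrejections = missilences = 0
    for r in records:
        if r["asr_result"] == "nospeech":
            # Endpointer heard nothing; the rejection threshold never
            # applies, so missilences are flat across the sweep.
            if r["true_class"] != "nospeech":
                missilences += 1
        elif r["confidence"] < threshold:
            # Rejected at this threshold: an error for nospeech and
            # in-grammar samples, correct behavior otherwise.
            if r["true_class"] in ("nospeech", "in-grammar"):
                misrejections += 1
        else:
            # Accepted (recognized): correct only for in-grammar speech
            # whose hypothesis matches the transcription.
            if r["true_class"] != "in-grammar" or not r["hypothesis_matches"]:
                misrecognitions += 1
    return misrecognitions, misrejections, missilences

# Tiny made-up sample; real records come from the threshold-0 batch run.
records = [
    {"asr_result": "recognized", "confidence": 82,
     "true_class": "in-grammar", "hypothesis_matches": True},
    {"asr_result": "recognized", "confidence": 34,
     "true_class": "out-of-grammar", "hypothesis_matches": False},
    {"asr_result": "nospeech", "confidence": 0,
     "true_class": "unintelligible", "hypothesis_matches": False},
]
curves = {t: count_errors(records, t) for t in range(101)}
```

Plotting the three counts in curves against the threshold gives the error curves of step 3.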
18–21. 4. Select optimal value
• Equal-error-rate: not necessarily the optimum
• Minimum of the sum: a good starting point, great for comparing across engines (on the same data set only!)
• Optimal: depends on your app; some errors may be more critical than others
• Question: if missilences are not affected by the threshold, why did I include them?
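The first two selection rules fall straight out of the error curves of the previous sketch, where curves[t] = (misrecognitions, misrejections, missilences). A minimal sketch:

```python
# Sketch: two starting points for choosing the threshold from the error
# curves computed in the previous sketch.

def equal_error_rate_threshold(curves):
    """Threshold where misrecognitions and misrejections are closest."""
    return min(curves, key=lambda t: abs(curves[t][0] - curves[t][1]))

def min_sum_threshold(curves):
    """Threshold with the fewest total errors; comparable across engines
    only when computed on the same data set."""
    return min(curves, key=lambda t: sum(curves[t]))
```

An app-specific choice would weight each error type by its cost to callers before taking the minimum.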
22. Further optimizations
• Move OOG into the IG category if semantically correct ("You bet" -> "yes"); see the sketch after this list
• Consider an additional threshold for confirmation
• Optimize endpointer parameters (affects missilences and/or "too much speech")
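A sketch of the first item, folding semantically equivalent out-of-grammar answers into the in-grammar category before scoring; the synonym list is an example, not from the talk:

```python
# Illustrative sketch: normalize semantically correct OOG answers to their
# IG equivalents before classifying them (synonyms here are examples).

SEMANTIC_EQUIVALENTS = {"you bet": "yes", "yeah": "yes", "nope": "no"}

def normalize(transcription):
    text = transcription.strip().lower()
    return SEMANTIC_EQUIVALENTS.get(text, text)
```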