State of the Art/Best Practices in
Speech Technology
Dan Burnett, Director of Speech Technologies
Voxeo Summit 2010: Best Practices in Speech Technology
At the Voxeo Customer Summit 2010, Dan Burnett, Voxeo's Director of Speech Technologies, explained the state of the art with regard to speech technologies and what the best practices are in implementing speech-enabled applications and technology.

More information at:
http://www.voxeo.com/
http://www.voxeo.com/summit2010

Transcript of "Voxeo Summit 2010: Best Practices in Speech Technology"

  1. State of the Art/Best Practices in Speech Technology
     Dan Burnett, Director of Speech Technologies
  2. Why speech?
     Ma Ma
     Vok Say Oh
     © Voxeo Corporation
  3. Speech is the natural human interface
       15% of world population has a personal computer
       Greater than 60% of world population has a mobile phone
  4. What is communication?
     Your Customer
     You (Your speech-enabled IVR)
  5. Communication is natural?
     249694
  6. But for IVRs . . .
     Your Customer
     You (Your untuned speech-enabled IVR)
  7. So why do we tune?
       For better communication, which leads to
         •  More satisfied customers
         •  Shorter call durations
  8. What can we tune?
     Your untuned speech-enabled IVR
  9. What can we tune?
     Your untuned speech-enabled IVR
  10. What we say – prompts
        Goal: naturally reduce variability in caller's responses
        Because: predictability simplifies grammars and increases recognition accuracy
  11. Prompt tuning
        Vocabulary
          •  Use the words your customers use
          •  For sales, say “sales”; For billing, say “billing”; ...
          •  Are you calling to learn more about our products, to fix a problem with your bill, or …
        Keep in mind
          •  Speech allows your customer to describe things THEIR way rather than to use your internal company description
          •  Make it easier for them to do that!
  12. Prompt tuning
        Prompt specificity
          •  General: “What would you like?”
          •  More specific: “Which department would you like?”
          •  Precise: “Would you like A, B, C, or something else?”
        Keep in mind
          •  The caller will often use the exact words YOU use
  13. Ever heard this before?
        For Sales, press 1
        For Billing, press 2
        For option I can't remember, press 3
        For another option I can't remember, press 4
        For yet another option I can't remember, press 5
        For more of the same, press 6
        Blah blah, press 7
        For help with this menu, press 8
        To hear these options again, press 9
  14. Prompt tuning
        Prompt length
          •  Keep it short: less than a few sentences total, only one of which asks for input
          •  Or: provide pauses (at least one second long) for interruption
        Keep in mind
          •  Speech communication is only natural if it's not drawn out
          •  Primacy and recency effects
  15. What can we tune?
      Your untuned speech-enabled IVR
  16. What we listen for – grammars
        Goal: Cover everything they are likely to say, and nothing more
        Because: Accuracy in grammar coverage directly affects recognition accuracy
  17. Grammar tuning
        Cover everything they say
          •  Pre- and post- phrases such as please, I would like, and thank you
          •  Synonyms such as (for yes/no) yeah, sure, absolutely not
        Keep in mind
          •  Recognizers can only hear it if it's in the grammar
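
The coverage advice on slide 17 could be sketched roughly as follows. This is a minimal illustration in Python, not a real grammar format or any ASR engine's API: the phrase lists, the department names, and the helpers `build_phrases` and `interpret` are all assumptions made up for the example. It expands optional pre-phrases, synonyms, and post-phrases into the full set of utterances the grammar would accept, each mapped to one meaning.

```python
from itertools import product

# Hypothetical phrase lists (illustrative only, not from the talk).
PRE = ["", "i would like", "i need"]
POST = ["", "please", "thank you"]
CORE = {
    "sales": ["sales", "the sales department"],
    "billing": ["billing", "my bill"],
}

def build_phrases(pre, core, post):
    """Expand pre-phrase + synonym + post-phrase into utterance -> meaning."""
    phrases = {}
    for meaning, variants in core.items():
        for p, v, q in product(pre, variants, post):
            utterance = " ".join(part for part in (p, v, q) if part)
            phrases[utterance] = meaning
    return phrases

def interpret(utterance, phrases):
    """Return the meaning if the utterance is covered, else None."""
    return phrases.get(utterance.lower().strip())

phrases = build_phrases(PRE, CORE, POST)
print(interpret("I would like sales please", phrases))
print(interpret("tech support", phrases))  # not in the grammar, so not "heard"
```

Anything outside the expanded set returns `None`, which mirrors the point that a recognizer can only hear what is in the grammar.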
  18. Grammar tuning
        Include only what they say
          •  Write grammars that don't overgenerate
          •  If matching numbers/digits, only include valid strings if at all possible
        Keep in mind
          •  Every unnecessary grammar phrase is a potential misrecognition
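
To make the overgeneration point on slide 18 concrete, here is a small Python sketch under stated assumptions: the four-digit extension scenario and the `VALID_EXTENSIONS` set are hypothetical, and sets of strings stand in for a real grammar. It compares a grammar that accepts any four digits with one restricted to strings that are actually valid.

```python
from itertools import product

# Hypothetical: the only extensions that exist at this company.
VALID_EXTENSIONS = {"1001", "1002", "1417", "2200"}

def overgenerating_grammar(length=4):
    """Every digit string of the given length, valid or not."""
    return {"".join(digits) for digits in product("0123456789", repeat=length)}

def tight_grammar(valid):
    """Only the strings a caller could legitimately say."""
    return set(valid)

loose = overgenerating_grammar()
tight = tight_grammar(VALID_EXTENSIONS)

# Each extra phrase in the loose grammar is a possible misrecognition target.
print(len(loose), len(tight))  # prints "10000 4"
```

In this toy case the loose grammar carries 9,996 phrases no caller should ever need, each one a candidate for a wrong match.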
  19. What can we tune?
      Your untuned speech-enabled IVR
  20. How we listen – parameter optimization
        Goal: Optimize recognizer parameter settings
        Because: Better accuracy, of course!
  21. Parameter optimization – which parameters?
        Rejection threshold
        Endpointer settings (sensitivity)
        Large grammar parameters
  22. Rejection threshold – what is it?
      [Chart labels: False Rejections, Misrecognitions; x-axis: Rejection Threshold, 0 to 100]
  23. Rejection threshold – what is it?
      Cutoff value for the recognizer confidence below which the speaker's utterance will be rejected
      [Chart labels: False Rejections, Misrecognitions; x-axis: Rejection Threshold, 0 to 100]
  24. Rejection threshold – total error
      [Chart labels: False Rejections, Misrecognitions; x-axis: Rejection Threshold, 0 to 100]
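
The slides show these quantities only as curves. One plausible way to write them down, offered as an assumption rather than the speaker's own definition, is the following. Let each test utterance i have a recognizer confidence c_i between 0 and 100 and a flag y_i that is 1 when the recognition result was correct and 0 when it was wrong, and let t be the rejection threshold.

```latex
\begin{align*}
\mathrm{FR}(t) &= \bigl|\{\, i : y_i = 1,\ c_i < t \,\}\bigr|
  && \text{(correct results that get rejected)} \\
\mathrm{MR}(t) &= \bigl|\{\, i : y_i = 0,\ c_i \ge t \,\}\bigr|
  && \text{(wrong results that get accepted)} \\
E(t) &= \mathrm{FR}(t) + \mathrm{MR}(t)
  && \text{(total error)} \\
t^{\ast} &= \arg\min_{0 \le t \le 100} E(t)
  && \text{(optimal threshold)}
\end{align*}
```

Raising t can only increase FR(t) and can only decrease MR(t), which is why the total error typically has a minimum somewhere between the two extremes.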
  25. Rejection threshold – comparison
      [Chart labels: ASR Engine A, ASR Engine B; x-axis: Rejection Threshold, 0 to 100]
  26. Rejection threshold – comparison
      [Chart labels: ASR Engine A, ASR Engine B, Optimal thresholds; x-axis: Rejection Threshold, 0 to 100]
  27. Rejection threshold – another comparison
      [Chart labels: ASR Engine A, ASR Engine B, Optimal thresholds; x-axis: Rejection Threshold, 0 to 100]
  28. Parameter optimization
        Rejection threshold
          •  Generally largest impact on accuracy
          •  Optimum varies across recognition engines
          •  Optimum varies by set of active grammars
        Keep in mind
          •  Optimizing the rejection threshold is the SINGLE MOST IMPORTANT parameter tuning you can do
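
A threshold sweep of the kind slides 24 to 28 imply might look like the sketch below. It is a hedged illustration, not a tuning tool: the `(confidence, was_correct)` pairs for `ENGINE_A` and `ENGINE_B` are invented data, and the function names are hypothetical. Total error is taken here to be false rejections plus misrecognitions at a given threshold.

```python
def total_error(results, threshold):
    """Count false rejections plus misrecognitions at one threshold.

    results: list of (confidence 0-100, was_correct) pairs from a test set.
    An utterance is rejected when its confidence is below the threshold.
    """
    false_rejections = sum(
        1 for conf, correct in results if correct and conf < threshold
    )
    misrecognitions = sum(
        1 for conf, correct in results if not correct and conf >= threshold
    )
    return false_rejections + misrecognitions

def best_threshold(results, candidates=range(0, 101)):
    """Lowest candidate threshold that minimizes total error."""
    return min(candidates, key=lambda t: total_error(results, t))

# Invented test results for two engines with different confidence scales.
ENGINE_A = [(95, True), (90, True), (88, True), (72, True),
            (75, False), (70, False), (60, False)]
ENGINE_B = [(60, True), (55, True), (50, True), (38, True),
            (40, False), (30, False), (20, False)]

best_a = best_threshold(ENGINE_A)
best_b = best_threshold(ENGINE_B)
print(best_a, best_b)                  # prints "71 31"
print(total_error(ENGINE_B, best_a))   # prints "4": A's optimum rejects every correct B result
```

With this toy data each engine's optimum gives one error on its own results, while reusing the other engine's threshold gives three or four. The same sweep would have to be repeated for each set of active grammars.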
  29. Endpointer sensitivity
      Your Customer
      You (Your hard-of-hearing speech-enabled IVR)
  30. Parameter optimization
        Endpointer sensitivity
          •  Second-largest impact on accuracy
          •  Unnecessarily high and low sensitivity are both bad
          •  Optimum should be set once, checked annually
        Keep in mind
          •  If the recognizer can't hear you, it can't understand what you say
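
Why both extremes of endpointer sensitivity hurt can be shown with a toy energy-based endpointer. This is a simplified sketch under loud assumptions: real endpointers are engine-specific and more sophisticated, the frame energies are invented, and here "sensitivity" is modeled as a plain energy threshold (lower threshold means higher sensitivity). The function `find_speech` is hypothetical.

```python
def find_speech(frame_energies, energy_threshold, min_speech_frames=2):
    """Return (start, end) of the first run of frames at or above the
    threshold that lasts at least min_speech_frames, or None."""
    start = None
    for i, energy in enumerate(frame_energies):
        if energy >= energy_threshold:
            if start is None:
                start = i
        elif start is not None:
            if i - start >= min_speech_frames:
                return (start, i)
            start = None  # run too short, keep looking
    if start is not None and len(frame_energies) - start >= min_speech_frames:
        return (start, len(frame_energies))
    return None

# Invented energies: background noise at frames 1-2, speech at frames 4-8.
FRAMES = [2, 9, 8, 3, 12, 30, 45, 40, 14, 3, 2]

print(find_speech(FRAMES, 5))    # too sensitive: triggers on the noise
print(find_speech(FRAMES, 10))   # captures the whole utterance
print(find_speech(FRAMES, 35))   # not sensitive enough: clips start and end
print(find_speech(FRAMES, 50))   # "hard of hearing": hears nothing
```

Only the middle setting hands the recognizer the full utterance, which is the sense in which unnecessarily high and low sensitivity are both bad.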
  31. Parameter optimization
        Large grammar parameters
          •  Typically need to be adjusted if grammar has more than 5000 entries
          •  Typically consumes more memory and/or CPU
          •  Vary by ASR engine, so ask
        Keep in mind
          •  If your grammar has many options, your recognizer needs to “think” more than the default settings usually allow
  32. What can we tune?
      Your untuned speech-enabled IVR
  33. Summary – Keep in mind
        Speech allows your customer to describe things THEIR way rather than to use your internal company description – make it easy for them!
        The caller will often use the exact words YOU use
        Speech communication is only natural if it's not drawn out
        Recognizers can only hear it if it's in the grammar
        Every unnecessary grammar phrase is a potential misrecognition
        Optimizing the rejection threshold is the SINGLE MOST IMPORTANT parameter tuning you can do
        If the recognizer can't hear you, it can't understand what you say
        If your grammar has many options, your recognizer needs to “think” more than the default settings usually allow
  34. For help
  35. State of the Art/Best Practices in Speech Technology
      Dan Burnett, Director of Speech Technologies
      dburnett@voxeo.com