Creating a Voice User Interface with Speech Server 2007 Jason Townsend
Tulsa Techfest 2008 - Creating A Voice User Interface With Speech Server

This is the slide deck from my presentation at Tulsa Techfest 2008 on Microsoft Speech Server and Creating Successful Voice User Interfaces.

Published in: Technology, Business
  • Tulsa Techfest 2008 - Creating A Voice User Interface With Speech Server

    1. 1. Creating a Voice User Interface with Speech Server 2007 Jason Townsend
    2. 2. Jason Townsend <ul><li>President, Bartlesville .NET User Group </li></ul><ul><li>Sr. Analyst, ConocoPhillips </li></ul><ul><li>11+ Years Development Experience </li></ul><ul><li>Father of 4 wonderful children </li></ul><ul><li>Married to an amazing and forgiving wife! </li></ul><ul><li>Avid Sailor </li></ul>
    3. 4. Speech Server 2007 <ul><li>Speech Server is an IVR (interactive voice response) platform that allows you to develop telephony applications using standards such as Speech Application Language Tags (SALT) and VoiceXML. </li></ul><ul><li>New Features </li></ul><ul><ul><li>Native Voice Over IP (VoIP) </li></ul></ul><ul><ul><li>Voice Response Workflow </li></ul></ul><ul><ul><li>Conversational Grammar Builder </li></ul></ul>
    4. 5. Common Application Scenarios <ul><li>Customer Service </li></ul><ul><ul><li>Pay bills by phone (ex: ChoicePay) </li></ul></ul><ul><ul><li>Order products (ex: Tickets.com) </li></ul></ul><ul><ul><li>Customer Support (ex: Dell) </li></ul></ul><ul><ul><li>Banking (ex: Bank of America) </li></ul></ul><ul><li>Information Worker Markets </li></ul><ul><ul><li>Pipeline workers </li></ul></ul><ul><ul><li>Insurance Appraisers </li></ul></ul><ul><ul><li>Realtors </li></ul></ul><ul><ul><li>For workers that may not be in front of a desktop </li></ul></ul>
    5. 6. New Features <ul><li>Support for .NET 2.0 Framework </li></ul><ul><li>Support for VoiceXML </li></ul><ul><li>Voice Response Workflow Applications </li></ul><ul><ul><li>Based on Windows Workflow Foundation </li></ul></ul><ul><li>Native Support for VoIP </li></ul><ul><li>Integrated into Office Communications Server. </li></ul>
    6. 7. Speech Server Architecture
    7. 8. Speech Recognition Supported Languages <ul><li>English – Australia </li></ul><ul><li>English – United Kingdom </li></ul><ul><li>English – North America </li></ul><ul><li>German – Germany </li></ul><ul><li>Spanish – Americas </li></ul><ul><li>More to come… </li></ul>
    8. 9. VoiceXML <ul><li>W3C’s standard XML Format for specifying interactive voice dialogues between a human and a computer </li></ul><ul><li>Interpreted by a voice browser </li></ul>
    9. 10. SALT <ul><li>SALT Forum was founded on October 15, 2001 </li></ul><ul><ul><li>Microsoft </li></ul></ul><ul><ul><li>Cisco </li></ul></ul><ul><ul><li>Comverse </li></ul></ul><ul><ul><li>Intel </li></ul></ul><ul><ul><li>Philips </li></ul></ul><ul><ul><li>ScanSoft </li></ul></ul><ul><li>W3C work initiated in July 2002 </li></ul><ul><li>SALT Forum seems to have gone dead. The last press release was in 2003. </li></ul><ul><li>Main concept was multimodal applications </li></ul><ul><ul><li>Speechify the web, IVR, handhelds, etc… </li></ul></ul>
    10. 11. SALT Usage <ul><li>Microsoft Speech Server 2004 </li></ul><ul><ul><li>Only SALT </li></ul></ul><ul><li>Microsoft Speech Server 2007 </li></ul><ul><ul><li>SALT and VXML </li></ul></ul><ul><li>Plugin for Internet Explorer </li></ul>
    11. 12. Key Workflow Concepts <ul><li>Workflows are a set of activities </li></ul><ul><ul><li>The workflow itself is an Activity </li></ul></ul><ul><li>Activities are the building blocks of the application </li></ul><ul><ul><li>A single unit of Reuse </li></ul></ul><ul><ul><li>A single unit of Execution </li></ul></ul><ul><li>An Activity has associated properties, conditions, and events </li></ul><ul><li>Developers can build their own Custom Activity Libraries </li></ul><ul><ul><li>Imagine your own Telerik RAD Controls, Infragistics Controls, etc… Just for VUIs </li></ul></ul><ul><li>A Workflow runs within a Host Process </li></ul><ul><ul><li>WAS </li></ul></ul><ul><ul><li>IIS </li></ul></ul><ul><ul><li>.EXE </li></ul></ul><ul><ul><li>Windows Managed Services </li></ul></ul>
    12. 13. Dialogue Flow is a Workflow <ul><li>Speech Server only supports sequential workflow development </li></ul>
    13. 14. Speech Application Development <ul><li>Define the dialogue flow </li></ul><ul><ul><li>Statements, questions, answers, etc… </li></ul></ul><ul><ul><li>Other activities </li></ul></ul><ul><li>Specify possible answers (grammars) </li></ul><ul><li>Record questions (prompts) </li></ul><ul><li>Integrate into the back-end (Web Services) </li></ul><ul><li>Deploy, test, and tune application </li></ul>
    14. 15. Developing Your Prototype Managed Code Assembly
    15. 16. Tuning Applications <ul><li>Out of the box speech applications </li></ul><ul><ul><li>Are not robust to real world user input </li></ul></ul><ul><ul><li>Need real data to optimize </li></ul></ul><ul><li>Trial phases required for gathering data </li></ul><ul><ul><li>Wizard of Oz phase </li></ul></ul><ul><ul><li>Pilot phases </li></ul></ul><ul><li>Visual Studio Integrated Analytics and Tuning Studio tool can be used to analyze the data and find problems </li></ul>
    16. 17. Reporting in Speech Server <ul><li>Measuring application performance and server performance </li></ul><ul><ul><li>Call-Volume </li></ul></ul><ul><ul><li>Self Service completion rates </li></ul></ul><ul><li>Sharing reporting data throughout the business </li></ul><ul><ul><li>Speech Server can leverage the full SQL Server stack </li></ul></ul><ul><ul><ul><li>Reporting Services </li></ul></ul></ul><ul><ul><ul><li>Analysis Services </li></ul></ul></ul><ul><ul><ul><li>Integration Services </li></ul></ul></ul>
    17. 18. Data Management – Trace Logging <ul><li>Logs </li></ul><ul><ul><li>Call details </li></ul></ul><ul><ul><li>Application instrumentation </li></ul></ul><ul><ul><li>Audio and grammars </li></ul></ul><ul><ul><li>Server latencies </li></ul></ul><ul><ul><li>More… </li></ul></ul><ul><li>Saved in Speech Server Log files </li></ul><ul><li>Can import via Log import tool into your SQL Server Database/Farm </li></ul><ul><li>Analyze via Speech Server 2007 Analytics and Tuning Studio </li></ul><ul><li>Present reports via SQL Server Reporting Services </li></ul>
    18. 19. Logged Information - Prompt <ul><li>Prompt </li></ul><ul><ul><li>Content </li></ul></ul><ul><ul><li>Barge-in detection </li></ul></ul><ul><ul><li>Rate/Volume </li></ul></ul><ul><ul><li>Persona </li></ul></ul>
    19. 20. Logged Information - Response <ul><li>Input Mode </li></ul><ul><ul><li>Speech </li></ul></ul><ul><ul><li>DTMF </li></ul></ul><ul><li>Grammar </li></ul><ul><ul><li>Content (coverage) </li></ul></ul><ul><ul><li>Rule weights </li></ul></ul><ul><ul><li>Pronunciations </li></ul></ul><ul><li>Confirmation Threshold </li></ul><ul><li>SR configuration </li></ul><ul><ul><li>Speech Detection </li></ul></ul><ul><ul><li>Rejection Threshold </li></ul></ul><ul><ul><li>Silence Timeout </li></ul></ul><ul><ul><li>Endsilence </li></ul></ul><ul><ul><li>Decoder … </li></ul></ul><ul><ul><li>Acoustic Models … </li></ul></ul>
    20. 22. Voice User Interface (VUI) <ul><li>Allows for human interaction with computers through a voice/speech platform </li></ul><ul><li>VUI is the interface to any speech application </li></ul><ul><li>Drive to make them conversational </li></ul><ul><li>Instead of Browser Incompatibility you have dialect incompatibility. </li></ul><ul><li>Not all business processes are suited to VUIs. </li></ul><ul><ul><li>Some are too complex </li></ul></ul><ul><ul><li>Sometimes automation is impossible or impractical </li></ul></ul>
    21. 23. Grammars <ul><li>Best practice: constrain the grammar as much as possible. </li></ul><ul><li>Good prompt design guides the caller to use in-grammar responses. </li></ul><ul><li>Out-of-grammar (OOG) responses are handled with more explicit prompting to elicit in-grammar response. </li></ul>
    22. 24. VUI Design Best Practices <ul><li>Use DTMF for long numbers </li></ul><ul><li>Don’t use open ended prompts </li></ul><ul><li>Don’t repeat prompts </li></ul><ul><li>Focus on grammar accuracy </li></ul><ul><li>If natural dialogs fail, fall back to directed dialog </li></ul><ul><li>Always confirm what was recognized </li></ul><ul><li>Generate prompts based on recognition confidence scores. </li></ul><ul><li>Bail out if too many errors occur </li></ul><ul><li>Keep text-to-speech output to a minimum </li></ul><ul><li>Be aware of human memory </li></ul><ul><li>“Platinum Rule” </li></ul><ul><li>Let the Caller Drive </li></ul>
    23. 25. Use DTMF for Long Numbers <ul><li>Limit spoken digits to 4 or fewer </li></ul><ul><li>This rule is often broken for: </li></ul><ul><ul><li>Credit Card Numbers </li></ul></ul><ul><ul><li>Social Security Numbers </li></ul></ul><ul><ul><li>Bank Account Numbers </li></ul></ul><ul><ul><li>Telephone Numbers </li></ul></ul><ul><li>DON’T Break This Rule!!! </li></ul><ul><li>Remember customer privacy! </li></ul>
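The digit limit above can be expressed as a small routing rule. This is only a sketch in Python rather than Speech Server's own API; the function name `choose_input_mode` and the `sensitive` flag are hypothetical, and the limit of 4 is the one stated on the slide.

```python
MAX_SPOKEN_DIGITS = 4  # limit taken from the slide; tune it for your own callers


def choose_input_mode(expected_digits: int, sensitive: bool = False) -> str:
    """Return "dtmf" or "speech" for a numeric prompt.

    Long numbers, and numbers the caller may not want to say aloud
    (card or account numbers), are routed to the keypad.
    """
    if sensitive or expected_digits > MAX_SPOKEN_DIGITS:
        return "dtmf"
    return "speech"


if __name__ == "__main__":
    print(choose_input_mode(16, sensitive=True))  # credit card number
    print(choose_input_mode(4))                   # e.g. a short PIN-length code
```

Treating privacy as a separate flag keeps short but sensitive numbers on the keypad as well.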
    24. 26. Don’t Use Open Ended Prompts <ul><li>BAD: “Hello, thank you for calling Tulsa Techfest. May I help you?” </li></ul><ul><li>BETTER: “Hello, thank you for calling Tulsa Techfest, would you like to hear about today’s speakers?” </li></ul>
    25. 27. Don’t Repeat Prompts <ul><li>When a prompt is repeated, callers tend to repeat the same response that was not understood the first time </li></ul><ul><li>Provide Escalated Help </li></ul>
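One way to picture escalated help is a list of prompts that become more explicit after each failed attempt. The sketch below is Python, not Speech Server code; the prompt wording and the name `prompt_for_attempt` are made up for illustration.

```python
# Hypothetical prompts: each one is more explicit than the one before it.
ESCALATED_PROMPTS = [
    "Would you like to hear about today's speakers?",
    "Sorry, I didn't catch that. Please say yes or no.",
    "To hear about today's speakers, say yes or press 1. Otherwise, say no or press 2.",
]


def prompt_for_attempt(failed_attempts: int, prompts=ESCALATED_PROMPTS) -> str:
    """Pick a prompt based on how many times recognition has already failed.

    The last prompt is reused once the list is exhausted, so the caller
    never hears the original wording again after a failure.
    """
    index = min(max(failed_attempts, 0), len(prompts) - 1)
    return prompts[index]


if __name__ == "__main__":
    for attempt in range(4):
        print(attempt, prompt_for_attempt(attempt))
```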
    26. 28. Focus on Grammar Accuracy <ul><li>Spend time TUNING and REFINING your grammars </li></ul><ul><li>Accuracy is IMPERATIVE </li></ul><ul><li>To reduce recognition failures: </li></ul><ul><ul><li>Create prompts that make it clear what the user can and should say </li></ul></ul><ul><ul><li>Test grammars with many different utterances from several people </li></ul></ul><ul><ul><li>Record incoming calls once the system is in production and use this information to continually tune the grammars. </li></ul></ul><ul><li>Watch for dialects! </li></ul>
    27. 29. If Natural Dialogs Fail, Fall back to Directed Dialog <ul><li>Natural Dialogs are great, but they have a higher rate of failure. </li></ul><ul><li>Don’t want to frustrate the user </li></ul>
    28. 30. Always Confirm What Was Recognized <ul><li>Mismatches are common </li></ul><ul><ul><li>Austin/Boston </li></ul></ul><ul><ul><li>Sharp/Shark </li></ul></ul><ul><ul><li>Britney Spears/Kevin Federline </li></ul></ul><ul><li>Even for grammars with low ambiguity it’s important to confirm your recognition </li></ul><ul><li>Implicit confirmation </li></ul><ul><ul><li>OK Jason, are you coming to Techfest? </li></ul></ul><ul><li>QA Control makes it easy to provide confirmation </li></ul>
    29. 31. Generate Prompts Based on Recognition Confidence Scores <ul><li>Speech recognition errors are common </li></ul><ul><li>How to handle? </li></ul><ul><ul><li>Changing prompts </li></ul></ul><ul><ul><li>Falling back to directed dialogs </li></ul></ul><ul><ul><li>Transferring to operator </li></ul></ul><ul><li>Humans change their interaction based on perceived confidence, whether implicitly or explicitly </li></ul><ul><li>N-Best lists are of great value here </li></ul>
    30. 32. Confidence Scores & N-Best Lists <ul><li>The recognition engine returns a confidence score along with a result </li></ul><ul><li>The recognition engine can return several “guesses” of what it understood. </li></ul><ul><li>You tell the engine to return up to N guesses. </li></ul>
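A rough sketch of how a dialog might branch on the confidence score of the top N-best guess. It is written in Python for illustration and does not use Speech Server's recognition API; the `Hypothesis` type, the `next_action` function, and both thresholds (0.80 and 0.45) are assumptions that would need tuning against real call data.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Hypothesis:
    """One entry of an N-best list (hypothetical shape)."""
    text: str
    confidence: float  # assumed to be in the range 0.0 to 1.0


def next_action(n_best: List[Hypothesis],
                accept_at: float = 0.80,
                confirm_at: float = 0.45) -> Tuple[str, Optional[str]]:
    """Decide what the dialog does with a recognition result.

    High confidence: accept (with an implicit confirmation in the next prompt).
    Medium confidence: confirm explicitly ("Did you say Austin?").
    Low confidence or no result: re-prompt with more explicit wording.
    """
    if not n_best:
        return ("reprompt", None)
    best = max(n_best, key=lambda h: h.confidence)
    if best.confidence >= accept_at:
        return ("accept", best.text)
    if best.confidence >= confirm_at:
        return ("confirm", best.text)
    return ("reprompt", None)


if __name__ == "__main__":
    guesses = [Hypothesis("Boston", 0.52), Hypothesis("Austin", 0.61)]
    print(next_action(guesses))
```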
    31. 33. Skip Lists <ul><li>Skip List is a type of N-Best processing </li></ul><ul><li>Keep track of results that caller has confirmed ‘no’ to, and don’t ask again. </li></ul>
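The skip-list idea can be sketched in a few lines of Python. The N-best list is modeled here as plain `(text, confidence)` tuples and `next_candidate` is a hypothetical helper, not part of any Speech Server library.

```python
from typing import Iterable, List, Optional, Set, Tuple


def next_candidate(n_best: Iterable[Tuple[str, float]],
                   skip_list: Set[str]) -> Optional[str]:
    """Return the most confident guess the caller has not already rejected."""
    ranked: List[Tuple[str, float]] = sorted(
        n_best, key=lambda pair: pair[1], reverse=True)
    for text, _confidence in ranked:
        if text not in skip_list:
            return text
    return None  # every guess was rejected: re-prompt or fall back


if __name__ == "__main__":
    n_best = [("Austin", 0.61), ("Boston", 0.52)]
    rejected = set()

    first = next_candidate(n_best, rejected)   # offered first: "Did you say Austin?"
    rejected.add(first)                        # caller answers "no"
    print(next_candidate(n_best, rejected))    # next offer comes from the rest of the list
```

Keeping the rejected set for the whole question means a later recognition pass that returns "Austin" again will not be offered a second time.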
    32. 34. Bail Out If Too Many Errors <ul><li>Don’t make your customer become a “0” (zero) jammer </li></ul><ul><li>Transfer to a live person if they error out more than twice </li></ul><ul><li>Remember, some people have speech impediments or speech patterns that the recognizer scores with low confidence. </li></ul><ul><li>Find the threshold! (This takes testing) </li></ul>
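A minimal Python sketch of the bail-out rule, assuming a per-call counter. The class name `ErrorTracker` and the returned action strings are hypothetical; the default of two tolerated errors follows the slide and is meant to be tuned through testing.

```python
class ErrorTracker:
    """Counts recognition errors in one call and says when to give up."""

    def __init__(self, max_errors: int = 2):
        self.max_errors = max_errors  # errors tolerated before transferring
        self.errors = 0

    def record_error(self) -> str:
        """Register a no-match or no-input and return the next step."""
        self.errors += 1
        if self.errors > self.max_errors:
            return "transfer"  # hand the caller to a live person
        return "retry"

    def record_success(self) -> None:
        """Reset after a successful turn so old errors are not held against the caller."""
        self.errors = 0


if __name__ == "__main__":
    tracker = ErrorTracker()
    print([tracker.record_error() for _ in range(3)])
```

Whether errors should reset after each successful turn or accumulate across the call is a design choice this sketch does not settle; it resets.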
    33. 35. Keep TTS Output to a Minimum <ul><li>Does not sound professional </li></ul><ul><li>Hire a voice talent. The payoff will justify the upfront cost </li></ul><ul><li>Can use as a fall back for data or prompts that need to be dynamic </li></ul>
    34. 36. Be Aware of Human Memory <ul><li>Make lists short </li></ul><ul><li>No more than 5 items </li></ul><ul><li>Present large lists in chunks </li></ul><ul><li>Make the prompts short </li></ul>
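Presenting a large list in chunks amounts to simple slicing. The Python sketch below uses a hypothetical `chunk_items` helper and the five-item limit from the slide; the session names are invented sample data.

```python
from typing import List, Sequence


def chunk_items(items: Sequence[str], size: int = 5) -> List[List[str]]:
    """Split a long list into groups of at most `size` items for reading aloud."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [list(items[i:i + size]) for i in range(0, len(items), size)]


if __name__ == "__main__":
    sessions = [f"Session {n}" for n in range(1, 13)]  # made-up data
    for group in chunk_items(sessions):
        # After each group the dialog would ask something like
        # "Say the one you want, or say 'more' to hear the next group."
        print(", ".join(group))
```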
    35. 37. Platinum Rule <ul><li>Treat users as they want to be treated, not how you want to be treated </li></ul><ul><li>Step into their shoes </li></ul><ul><li>Use vocabulary they understand </li></ul>
    36. 38. Let The Caller Drive <ul><li>Provide instant gratification (lets the caller get in a zone, and they enjoy the experience due to small successes) </li></ul><ul><li>Only ask for what you need, not everything at once. </li></ul>
    37. 39. VUI Design is a Science <ul><li>Design before development </li></ul><ul><li>Wizard of Oz Testing </li></ul><ul><li>Find balance between business requirements and the caller experience </li></ul><ul><li>Run usability trials on test subjects to validate your design </li></ul><ul><li>Use a pilot to trial the application. If caller behavior is not as expected, make adjustments. </li></ul>
    38. 40. Demos
    39. 42. Additional Information <ul><li>http://www.microsoft.com/speech </li></ul><ul><li>http://www.microsoft.com/uc </li></ul><ul><li>http://www.gotspeech.net </li></ul><ul><li>http://www.nuance.com </li></ul><ul><li>https://www.intervoice.com/ </li></ul><ul><li>http://www.tellme.com/ </li></ul><ul><li>http://www.vuidesign.org/ </li></ul>
    40. 43. Further Resources <ul><li>My Blog </li></ul><ul><ul><li>http://www.okcodemonkey.com </li></ul></ul><ul><li>LinkedIn </li></ul><ul><ul><li>http://www.linkedin.com/in/okcodemonkey </li></ul></ul><ul><li>Bartlesville .NET User Group </li></ul><ul><ul><li>http://www.bdnug.com </li></ul></ul><ul><li>Twitter </li></ul><ul><ul><li>http://twitter.com/okcodemonkey </li></ul></ul><ul><li>Email </li></ul><ul><ul><li>[email_address] </li></ul></ul>
    41. 44. Key Terms
    42. 45. Voice Dialogue
    43. 47. Voice Browser <ul><li>“Web browser” that presents an IVR VUI to the user </li></ul><ul><li>Provides interface to the PSTN or a PBX </li></ul><ul><li>Works with Voice Dialogues (where web browsers work with HTML/XHTML) </li></ul><ul><li>Presents information aurally via: </li></ul><ul><ul><li>Text-To-Speech </li></ul></ul><ul><ul><li>Prerecorded prompts </li></ul></ul><ul><li>Obtains information through: </li></ul><ul><ul><li>Speech Recognition </li></ul></ul><ul><ul><li>DTMF detection </li></ul></ul>
    44. 48. Speech Recognition <ul><li>Converts spoken words to machine readable input </li></ul>
    45. 49. DTMF (Dual-tone Multi-Frequency) <ul><li>Used for telephone signaling over the line in the voice-frequency band to the call switching center. </li></ul><ul><li>Standardized by ITU-T Recommendation Q.23 </li></ul>
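"Dual-tone" means each key is signaled by the sum of two sine waves: one from a low-frequency group (the keypad row) and one from a high-frequency group (the keypad column). The Python sketch below uses the standard Q.23 frequency pairs; the function names, the 8000 Hz sample rate, and the 0.1 s duration are illustrative choices, not anything from the slides.

```python
import math
from typing import List, Tuple

LOW_HZ = (697, 770, 852, 941)       # one frequency per keypad row
HIGH_HZ = (1209, 1336, 1477, 1633)  # one frequency per keypad column
KEYPAD = ("123A", "456B", "789C", "*0#D")


def dtmf_frequencies(key: str) -> Tuple[int, int]:
    """Return the (low, high) frequency pair in Hz for a single keypad key."""
    if len(key) != 1:
        raise ValueError("key must be a single character")
    for row_index, row in enumerate(KEYPAD):
        col_index = row.find(key)
        if col_index != -1:
            return LOW_HZ[row_index], HIGH_HZ[col_index]
    raise ValueError(f"not a DTMF key: {key!r}")


def dtmf_samples(key: str, duration: float = 0.1,
                 sample_rate: int = 8000) -> List[float]:
    """Generate the tone for `key` as floats in the range -1.0 to 1.0."""
    low, high = dtmf_frequencies(key)
    count = round(duration * sample_rate)
    return [
        0.5 * (math.sin(2 * math.pi * low * n / sample_rate)
               + math.sin(2 * math.pi * high * n / sample_rate))
        for n in range(count)
    ]


if __name__ == "__main__":
    print(dtmf_frequencies("5"))
    print(len(dtmf_samples("5")))
```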
    46. 50. Text-To-Speech (Speech Synthesis) <ul><li>Artificial production of human speech </li></ul><ul><li>Computer used is called the speech synthesizer </li></ul><ul><li>Can be implemented in software or hardware </li></ul><ul><li>Converts normal language text into speech </li></ul>
    47. 51. PSTN (Public Switched Telephone Network) <ul><li>Network of the world’s public circuit switched telephone networks </li></ul><ul><li>Similar to the way the Internet is the network of the world’s public IP-based packet-switched networks. </li></ul><ul><li>Originally a network of fixed-line analog telephone systems </li></ul><ul><li>Now almost completely digital and includes mobile phones </li></ul><ul><li>Governed by technical standards created by the ITU-T, and uses E.163/E.164 addresses (telephone numbers) </li></ul>
    48. 52. ITU-T (International Telecommunication Union Standardization Sector) <ul><li>Coordinates standards for telecommunications on behalf of the International Telecommunication Union </li></ul><ul><li>Based in Geneva, Switzerland </li></ul><ul><li>Original work dates back to 1865, with the birth of the International Telegraph Union </li></ul><ul><li>Became a United Nations specialized agency in 1947 </li></ul>
    49. 53. ITU (International Telecommunication Union) <ul><li>Established to standardize and regulate international radio and telecommunications. </li></ul><ul><li>Founded as the International Telegraph Union on May 17, 1865 in Paris </li></ul><ul><li>Main tasks include standardization, allocation of the radio spectrum, and organizing interconnection agreements between countries </li></ul>
    50. 54. PBX (Private Branch Exchange) <ul><li>Is a telephone exchange that serves a particular business or office, as opposed to one that a common carrier or telephone company operates for many businesses </li></ul>