Voice User Interface for Mobile Applications

Android Speech Recognition and Text-To-Speech - How to voice-enable your mobile application
"What does a weasel look like?" We are taking a closer look at Android's Speech-To-Text (STT) and Text-To-Speech (TTS) capabilities - and will develop and deploy three small apps, each a little more capable, and finally walk through the steps of building a voice controlled assistant.

Android uses Google's cloud-based Speech-To-Text engine, but Text-To-Speech has been built into the platform since Android 1.6 (Donut), using SVOX Pico with six language packages (US and UK English, German, French, Italian and Spanish).

While speech recognition, interpretation, and text-to-speech synthesis are addressed by phone equipment and OS makers, the core problem of how to capture knowledge and make it accessible to smart software agents is ignored, and all services like Siri or Google Voice Actions remain closed, i.e. not easily extendable with 3rd-party information/knowledge.

Published in: Technology

  1. Building a Voice User Interface: Android Speech Recognition and Text-To-Speech. Tuesday, January 29, 13
  3. http://wolfpaulus.com
  4. Star Trek. © 2012-2013 Wolf Paulus - http://wolfpaulus.com
  6. Red Planet
  8. 2001 Space Odyssey
  11. If a computer could think, how could we tell?
  12. In 1950, Alan Turing suggested: “If the responses from the computer were indistinguishable from that of a human, the computer could be said to be thinking.”
  13. Loebner Prize: a Grand Prize of $100,000 and a solid 18-carat gold medal for the first computer whose responses are indistinguishable from a human's. Each year, a prize of $2000 and a bronze medal are awarded to the most human-like computer.
  15. How can we create a chat bot?
  16. Flow diagram (legend: access Web Service, perform on Device): Capture Speech Input → Convert Speech into Text → Create Text Response → Message or Command? Msg: Synthesize Voice (Message) → Speak Message. Cmd: Execute Command → Message or Action? Msg: Synthesize Voice (Message) → Speak Message. Action: Perform Action
  17. Echo Bot. Flow diagram (legend: access Web Service, perform on Device): Capture Speech Input → Convert Speech into Text → Synthesize Voice (Message) → Speak Message
  18. Capture Speech Input

          // Launch the platform speech recognizer; the result arrives in onActivityResult().
          void startVoiceRecognitionActivity() {
              Intent intent = new Intent( RecognizerIntent.ACTION_RECOGNIZE_SPEECH );
              intent.putExtra( RecognizerIntent.EXTRA_PROMPT, "Speak to Bot1" );
              intent.putExtra( RecognizerIntent.EXTRA_MAX_RESULTS, 1 );
              intent.putExtra( RecognizerIntent.EXTRA_CALLING_PACKAGE, getClass().getPackage().getName() );
              intent.putExtra( RecognizerIntent.EXTRA_LANGUAGE_MODEL,
                               RecognizerIntent.LANGUAGE_MODEL_FREE_FORM );
              startActivityForResult( intent, VOICE_RECOGNITION_REQUEST_CODE );
          }

  19. ... speech has been converted into text ...

          @Override
          protected void onActivityResult( int requestCode, int resultCode, Intent data ) {
              switch (requestCode) {
                  case VOICE_RECOGNITION_REQUEST_CODE:
                      if (resultCode == RESULT_OK) {
                          ArrayList<String> matches =
                              data.getStringArrayListExtra( RecognizerIntent.EXTRA_RESULTS );
                          say( matches.get(0) );  // echo the best match
                      } else {
                          mTV_STT.setText("");
                      }
                      break;
              }
          }
  20. Convert Speech into Text; Capture Speech Input
  21. Synthesize Voice (Message)

          private TextToSpeech mTts;
          ..
          mTts = new TextToSpeech( this, this );  // Context, TextToSpeech.OnInitListener
          ..
          // Implement TextToSpeech.OnInitListener
          @Override
          public void onInit( final int status ) {
              if ( status == TextToSpeech.SUCCESS && mTts != null ) {
                  startVoiceRecognitionActivity();
                  // Listen again as soon as the bot has finished speaking.
                  mTts.setOnUtteranceCompletedListener( new TextToSpeech.OnUtteranceCompletedListener() {
                      @Override
                      public void onUtteranceCompleted( final String s ) {
                          startVoiceRecognitionActivity();
                      }
                  });
              } else {
                  mTV_TTS.setText("Could not initialize TextToSpeech.");
              }
          }

  22. Speak Message

          private void say(final String s) {
              // An utterance id must be set for the completion listener to be called.
              final HashMap<String, String> map = new HashMap<String, String>(1);
              map.put(TextToSpeech.Engine.KEY_PARAM_UTTERANCE_ID, UTTERANCE_ID);
              mTts.speak(s, TextToSpeech.QUEUE_FLUSH, map);
          }
  23. Code and Demo. Flow diagram (legend: access Web Service, perform on Device): Capture Speech Input → Convert Speech into Text → Synthesize Voice (Message) → Speak Message
  24. Stock Quote Bot: “stock quote for ...”. Flow diagram (legend: access Web Service, perform on Device): Capture Speech Input → Convert Speech into Text → Execute Command → Synthesize Voice (Message) → Speak Message
  25. ... speech has been converted into text ...

          @Override
          protected void onActivityResult( int requestCode, int resultCode, Intent data ) {
              switch (requestCode) {
                  case VOICE_RECOGNITION_REQUEST_CODE:
                      if (resultCode == RESULT_OK) {
                          ArrayList<String> matches =
                              data.getStringArrayListExtra(RecognizerIntent.EXTRA_RESULTS);
                          say( matches.get(0) );
                      } else {
                          mTV_STT.setText("");
                      }
                      break;
              }
          }

  26. ... speech has been converted into text ...

          @Override
          protected void onActivityResult( int requestCode, int resultCode, Intent data ) {
              switch (requestCode) {
                  case VOICE_RECOGNITION_REQUEST_CODE:
                      if (resultCode == RESULT_OK) {
                          ArrayList<String> matches =
                              data.getStringArrayListExtra(RecognizerIntent.EXTRA_RESULTS);
                          final String s = matches.get(0);
                          final int k = s.indexOf( KEY_WORD );
                          if (0 <= k) {
                              // Everything after the keyword is treated as the ticker symbol.
                              final String ticker = s.substring( k + KEY_WORD.length() ).trim();
                              new YQuote(mHandler).execute( ticker );
                          } else {
                              say( s );
                          }
                      } else {
                          mTV_STT.setText("");
                      }
                      break;
              }
          }
  27. Code and Demo. Flow diagram (legend: access Web Service, perform on Device): Capture Speech Input → Convert Speech into Text → Execute Command → Synthesize Voice (Message) → Speak Message
  28. Create Text Response: What is cheese? What is chocolate? How old are you? Who is the President? Where is Atlantis? What’s up? Did you have dinner already?
  30. Artificial Intelligence Markup Language (AIML)

          <?xml version="1.0" encoding="ISO-8859-1"?>
          <aiml>
            <category>
              <pattern>WHAT IS AIML</pattern>
              <template>AIML is short for Artificial Intelligence Markup Language</template>
            </category>
          </aiml>

  31.
          <?xml version="1.0" encoding="ISO-8859-1"?>
          <aiml>
            <category>
              <pattern>TELL ME WHAT AIML IS</pattern>
              <template><srai>WHAT IS AIML</srai></template>
            </category>
          </aiml>

  32.
          <?xml version="1.0" encoding="ISO-8859-1"?>
          <aiml>
            <category>
              <pattern>WHAT IS AIML</pattern>
              <template>
                <random>
                  <li>First response</li>
                  <li>Second response</li>
                  <li>3rd response</li>
                </random>
              </template>
            </category>
          </aiml>

  33.
          <?xml version="1.0" encoding="ISO-8859-1"?>
          <aiml>
            <category>
              <pattern>TELL ME WHAT * IS</pattern>
              <template>I don't know what <star/> is.</template>
            </category>
          </aiml>
  34. AIML Spec.
        • http://www.alicebot.org/TR/2011/
      AIML Primer
        • http://www.alicebot.org/documentation/aiml-primer.html
  37. Program D v4.6, last updated: 14-Mar-2006. http://aitools.org/
  38. CharlieBot 4.1.8, last updated: 14-Dec-2002. http://sourceforge.net/projects/charliebot/ Forked from Program D v4.1.5; works on Mac OS X or any Java 1.3 or better VM.
  39. ChatterBean, last updated: 11-May-2006. http://www.geocities.ws/phelio/chatterbean/ ChatterBean is an AIML interpreter (also known as "Alicebot") written in pure Java. Fully AIML 1.0.1 compliant.
  40. AIML Sets
        • http://aitools.org/Free_AIML_sets
        • http://code.google.com/p/aiml-en-us-foundation-alice/
        • http://www.square-bear.co.uk/aiml/
  41. HTML: http://myAIMLServer:PORT/talk?botid=xyz..
      XML-RPC: http://myAIMLServer:PORT/talk-xml
      HTTP-POST: botid="xyz.." input="Hello" custid="d22.."
      RESPONSE:

          <result status="0" botid="xyz.." custid="d2228e2eee12d255">
            <input>Hello</input>
            <that>Hi there!</that>
          </result>
  42. Flow diagram (legend: access Web Service, perform on Device): Capture Speech Input → Convert Speech into Text → Create Text Response → Message or Command? Cmd: Execute Command → Msg. Msg: Synthesize Voice (Message) → Speak Message
  43. Speech has been converted into text ...

          @Override
          protected void onActivityResult(final int requestCode, final int resultCode, final Intent data) {
              switch (requestCode) {
                  case VOICE_RECOGNITION_REQUEST_CODE:
                      if (resultCode == RESULT_OK) {
                          final ArrayList<String> matches = data.getStringArrayListExtra( RecognizerIntent.EXTRA_RESULT  // (cut off on the slide)
                          final String s = matches.get(0);
                          // Send the recognized text to the AIML bot; the reply arrives via mHandler.
                          new AIML_RPC(mHandler).execute(s);
                          mTV_STT.setText(s);
                      } else {
                          mTV_STT.setText("");
                      }
                      break;
              }
          }

  44. Handler

          @Override
          public void handleMessage(final Message msg) {
              if (msg.getData() != null && !msg.getData().isEmpty()) {
                  String s = msg.getData().getString(Bot3.BUNDLE_KEY_NAME_FOR_MSG);
                  if (s != null && 0 < s.length()) {
                      int i = s.indexOf("CMD ");
                      if (0 <= i) {
                          // Reply carries a command: drop the "CMD " prefix and a trailing period.
                          s = s.substring(i + 4, s.endsWith(".") ? s.length() - 1 : s.length());
                          String cmd;
                          int k = s.indexOf(" ");
                          if (0 < k) {
                              cmd = s.substring(0, k);
                              s = s.substring(k + 1).replace(" ", "");
                              if (Bot3.KEY_WORD.equals(cmd)) {
                                  new YQuote(mHandler).execute(s);
                              }
                          }
                      } else {
                          Bot3.this.say(s);  // plain message: speak it
                      }
                  }
              }
          }
  45. Code and Demo. Flow diagram (legend: access Web Service, perform on Device): Capture Speech Input → Convert Speech into Text → Create Text Response → Message or Command? Cmd: Execute Command → Msg. Msg: Synthesize Voice (Message) → Speak Message
  46. Summary
  47. Flow diagram (legend: access Web Service, perform on Device): Capture Speech Input → Convert Speech into Text → Synthesize Voice (Message) → Speak Message
  48. Flow diagram (legend: access Web Service, perform on Device): Capture Speech Input → Convert Speech into Text → Execute Command → Synthesize Voice (Message) → Speak Message
  49. Flow diagram (legend: access Web Service, perform on Device): Capture Speech Input → Convert Speech into Text → Create Text Response (AIML Bot) → Message or Command? Cmd: Execute Command → Msg. Msg: Synthesize Voice (Message) → Speak Message
  50. Cora, your imaginary friend. Techcasita Productions
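
The keyword check on the Stock Quote Bot slides (24 and 26) can be tried outside of Android as a plain Java function. This is a minimal sketch, not the author's code: the class name is hypothetical, the value of `KEY_WORD` is assumed from the “stock quote for ...” phrase on slide 24, and the match is case-insensitive here whereas the slide uses a case-sensitive `indexOf`.

```java
import java.util.Optional;

final class StockQuoteCommand {
    // Assumption: the slides never show KEY_WORD's value; this phrase is taken from slide 24.
    static final String KEY_WORD = "stock quote for";

    /** Returns the trimmed text that follows KEY_WORD, or empty if the keyword is absent. */
    static Optional<String> extractTicker(String recognized) {
        final int n = KEY_WORD.length();
        for (int i = 0; i + n <= recognized.length(); i++) {
            // regionMatches(true, ...) compares without regard to case.
            if (recognized.regionMatches(true, i, KEY_WORD, 0, n)) {
                return Optional.of(recognized.substring(i + n).trim());
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(extractTicker("Stock quote for GOOG"));
        System.out.println(extractTicker("What does a weasel look like?"));
    }
}
```

In the app, a present value would be handed to the quote lookup (`YQuote` on the slide), and an empty one would fall through to `say(s)`.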
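
To make the AIML slides (30 to 33) more concrete, here is a rough sketch of how an interpreter might resolve a pattern, a `<srai>` redirect and a `*` wildcard. It is a toy under stated assumptions, not how Program D or ChatterBean work: the class `TinyAiml` is hypothetical, categories are tried in insertion order (real interpreters use a more elaborate matching order), `<random>` is not handled, and `<srai>` is only recognized when it is the whole template.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TinyAiml {
    private static final Pattern SRAI = Pattern.compile("\\s*<srai>(.*)</srai>\\s*");

    private static final class Category {
        final Pattern pattern;
        final String template;
        Category(Pattern pattern, String template) {
            this.pattern = pattern;
            this.template = template;
        }
    }

    private final List<Category> categories = new ArrayList<>();

    /** Registers a category; a '*' word in the pattern matches one or more input words. */
    TinyAiml add(String pattern, String template) {
        StringBuilder regex = new StringBuilder();
        for (String word : pattern.trim().split("\\s+")) {
            if (regex.length() > 0) regex.append(' ');
            regex.append(word.equals("*") ? "(.+)" : Pattern.quote(word));
        }
        categories.add(new Category(Pattern.compile(regex.toString()), template));
        return this;
    }

    /** Returns the reply of the first matching category, or "" if none matches. */
    String respond(String input) {
        return respond(normalize(input), 0);
    }

    private String respond(String normalized, int depth) {
        if (depth > 10) return "";  // guard against srai loops
        for (Category c : categories) {
            Matcher m = c.pattern.matcher(normalized);
            if (!m.matches()) continue;
            String star = m.groupCount() > 0 ? m.group(1) : "";
            String reply = c.template.replace("<star/>", star);
            Matcher srai = SRAI.matcher(reply);
            // A template that is a single <srai> is answered by matching its content again.
            return srai.matches() ? respond(normalize(srai.group(1)), depth + 1) : reply;
        }
        return "";
    }

    /** Upper-cases the input and replaces punctuation with single spaces. */
    static String normalize(String s) {
        return s.toUpperCase(Locale.ROOT).replaceAll("[^A-Z0-9 ]", " ").trim().replaceAll("\\s+", " ");
    }

    public static void main(String[] args) {
        TinyAiml bot = new TinyAiml()
            .add("WHAT IS AIML", "AIML is short for Artificial Intelligence Markup Language")
            .add("TELL ME WHAT AIML IS", "<srai>WHAT IS AIML</srai>")
            .add("TELL ME WHAT * IS", "I don't know what <star/> is.");
        System.out.println(bot.respond("Tell me what AIML is!"));
        System.out.println(bot.respond("Tell me what cheese is?"));
    }
}
```

With the three categories from the slides, "Tell me what AIML is!" is redirected to the WHAT IS AIML answer, while "Tell me what cheese is?" falls to the wildcard category with CHEESE as the star value.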
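
The XML reply shown on slide 41 can be read with the JDK's built-in DOM parser. The sketch below assumes the reply has exactly the shape on the slide (a `result` element with `status` and `custid` attributes and a `that` child holding the bot's answer); the class name `TalkXmlReply` is hypothetical, and the elided bot id is kept as a placeholder rather than guessed.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class TalkXmlReply {
    final int status;     // value of the status attribute
    final String custId;  // conversation id to send back with the next request
    final String that;    // the bot's answer

    private TalkXmlReply(int status, String custId, String that) {
        this.status = status;
        this.custId = custId;
        this.that = that;
    }

    /** Parses a reply shaped like the one on slide 41; throws IllegalArgumentException otherwise. */
    static TalkXmlReply parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // The reply comes from a remote server, so DOCTYPE declarations are refused.
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            Element result = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
            NodeList thats = result.getElementsByTagName("that");
            String that = thats.getLength() > 0 ? thats.item(0).getTextContent().trim() : "";
            return new TalkXmlReply(
                    Integer.parseInt(result.getAttribute("status")),
                    result.getAttribute("custid"),
                    that);
        } catch (Exception e) {
            throw new IllegalArgumentException("Not a parsable reply: " + xml, e);
        }
    }

    public static void main(String[] args) {
        TalkXmlReply reply = parse(
            "<result status=\"0\" botid=\"xyz..\" custid=\"d2228e2eee12d255\">"
            + "<input>Hello</input><that>Hi there!</that></result>");
        System.out.println(reply.that);
    }
}
```

Keeping `custId` around matters because the server uses it to tie successive requests to one conversation, as the HTTP-POST parameters on the slide suggest.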
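
The message-or-command decision in the Handler on slide 44 is easier to test when it is separated from the Android `Handler`. The following sketch mirrors the string handling on that slide; the class name `BotReply` and the command word `QUOTE` in the usage example are hypothetical (the slides do not show the value of `Bot3.KEY_WORD`). One deliberate difference: a command without an argument is returned with an empty argument here, whereas the slide's handler silently ignores it.

```java
final class BotReply {
    final String command;  // null when the reply is a plain message
    final String text;     // message to speak, or the command's argument

    private BotReply(String command, String text) {
        this.command = command;
        this.text = text;
    }

    boolean isCommand() {
        return command != null;
    }

    /** Splits a bot reply into either a plain message or a command plus argument. */
    static BotReply parse(String reply) {
        final int i = reply.indexOf("CMD ");
        if (i < 0) {
            return new BotReply(null, reply);
        }
        // Drop the "CMD " prefix and a trailing period, as on the slide.
        final String s = reply.substring(i + 4, reply.endsWith(".") ? reply.length() - 1 : reply.length());
        final int k = s.indexOf(' ');
        if (k <= 0) {
            return new BotReply(s, "");
        }
        // Recognized letters may arrive spelled out ("G O O G"), so spaces are removed.
        return new BotReply(s.substring(0, k), s.substring(k + 1).replace(" ", ""));
    }

    public static void main(String[] args) {
        BotReply cmd = parse("CMD QUOTE G O O G.");
        System.out.println(cmd.command + " -> " + cmd.text);
        BotReply msg = parse("Hi there!");
        System.out.println(msg.isCommand() + " -> " + msg.text);
    }
}
```

A caller would compare `command` against its known keywords and start the matching task, or pass `text` to `say()` when `isCommand()` is false.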
