Add more Speech API to your bot

  1. You should add more Speech API to your Bot @GosiaBorzecka
  2. About me
  3. Agenda
      • Create your bot... and add more intelligence!
      • With Speech API?
      • Azure Speech API (Cognitive Services)
      • Summary
  4. Your bot
  5. Your bot
      • Answering common questions
      • Making appointments in your system
      • Automating the helpdesk
      • Helping with the interview process
      • Finding events near you
      • Showing the latest updates about your company/organisation
      • Helping plan your day/meeting
      • Finding people who work on similar projects
  6. The basic (and most popular) bot: QnA
  7. Add some intelligence
  8. But which one? Microsoft (Cognitive Services), Google (Cloud), AWS, IBM Watson, Mozilla, Speechmatics, Neospeech, iSpeech, …
  9. But which one?
      Providers: Microsoft (Cognitive Services), Google (Cloud), AWS, IBM Watson
      Categories: Speech, Natural Language, Bot, Custom Model
  10. Intelligent API providers: Microsoft (Cognitive Services)
      Speech: Speech to Text, Text to Speech, Speech translation
      Natural Language: LUIS
      Bot: Bot Framework
      Custom Model: LUIS
  11. Intelligent API providers: Google (Cloud)
      Speech: Speech-to-Text, Text-to-Speech, Translation
      Natural Language: Google NLP
      Bot: Dialogflow
      Custom Model: AutoML
  12. Intelligent API providers: AWS
      Speech: Amazon Polly (Text-to-Speech), Amazon Transcribe (Speech-to-Text), Amazon Translate
      Natural Language: Amazon Comprehend
      Bot: Amazon Lex
      Custom Model: SageMaker
  13. Intelligent API providers: IBM Watson
      Speech: Speech to Text, Text to Speech, Language Translator
      Natural Language: Natural Language Understanding
      Bot: Watson Assistant
      Custom Model: Knowledge Studio
  14. Which one to choose?
  15. In this case…
  16. Bot Framework
  17. (no text)
  18. Bot Framework
  19. Bot Framework
  20. Bot Framework
  21. QnA Maker
  22. QnA Maker
  23. QnA Maker
  24. Call QnA Maker
  25. LUIS
  26. LUIS
  27. LUIS
  28. Natural Language Processing
      Intents: None, List, Help, Confirmation, Purchase, Weather
      Entities: Address, Age, Location, Category, Url, Time
  29. Call LUIS
  30. We have a bot with some intelligence… LET IT SPEAK!
  31. Cognitive Services: Speech
      • Speech-to-Text
      • Text-to-Speech
      • Speaker Recognition (Preview)
      • Speech translation
  32. Speech APIs (30 days)
  33. Pricing (Free)
      • Speech translation: 5 hours free per month
      • Speech-to-Text: 5 hours free per month
      • Speech-to-Text with Custom Speech Model: 5 hours free per month
      • Speech endpoint hosting: 1 model free per month
      • Text-to-Speech with Standard Voices: 5M characters free per month
      • Text-to-Speech with Custom Voice Font: 5M characters free per month
      • Text-to-Speech with Neural Voices: 0.5M characters free per month
  34. Pricing (Standard)
      • Speech translation: £1.87 per hour
      • Speech-to-Text: £0.746 per hour
      • Speech-to-Text with Custom Speech Model: £1.044 per hour
      • Speech endpoint hosting: £29.82 per model per month
      • Text-to-Speech with Standard Voices: £2.982 per 1M characters
      • Text-to-Speech with Custom Voice: £4.472 per 1M characters
      • Text-to-Speech with Neural Voices: £11.925 per 1M characters
      • Custom Voice Font hosting: £29.82 per model per month
  35. SDKs: C/C++, C#, Java, JavaScript/Node.js, Objective-C, Python
  36. Speech-to-Text
  37. Speech-to-Text
      • Transcribes continuous real-time speech into text
      • Can batch-transcribe speech from audio recordings
      • Supports intermediate results, end-of-speech detection, automatic text formatting, and profanity masking
      • Can call on Language Understanding (LUIS) to derive user intent from transcribed speech
  38. Speech-to-Text: Custom
  39. Text-to-Speech
  40. Text-to-Speech (TTS)
      • Synthesizes text into human-sounding speech
      • Returns the result as an audio file
  41. Neural Voices
      • GuyNeural (en-US, male)
      • JessaNeural (en-US, female)
      • XiaoxiaoNeural (zh-CN, female), Southeast Asia region only
  42. Standard voices (English UK)
      • Susan-Apollo (female)
      • HazelRUS (female)
      • George-Apollo (male)
      https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/language-support#standard-voices
  43. Standard voices, other: English (Ireland), English (Australia), English (Canada), English (India), English (US), Spanish (Spain), Spanish (Mexico), French (France), French (Canada), French (Switzerland)
  44. Text-to-Speech (SDK, C#)
  45. Text-to-Speech (SDK, C#)
  46. Speech translation
  47. Speech translation
      • End-to-end, real-time, multi-language translation
      • Can be used for: speech-to-speech, speech-to-text
      • Technology: SMT (Statistical Machine Translation), NMT (Neural Machine Translation)
  48. Neural machine translation https://www.microsoft.com/en-us/translator/business/machine-translation/
  49. How does speech translation work? https://www.microsoft.com/en-us/translator/business/machine-translation/
  50. Speaker Recognition (Preview)
  51. Speaker Recognition (Preview): Speaker Verification, Speaker Identification
  52. Speaker Recognition (Preview) SDK
      • Android (Java)
      • Windows (C#)
  53. Speaker Recognition (Preview): audio format requirements
      Container: WAV
      Encoding: PCM
      Rate: 16 kHz
      Sample format: 16 bit
      Channels: Mono
  54. Create Profile
  55. Supported locales: es-ES (Castilian Spanish), en-US (American English), fr-FR (Standard French), zh-CN (Mandarin Chinese)
  56. Let’s add Speech to the bot!
  57. Configure bot
  58. Bot configuration: default.htm
  59. Bot configuration: default.htm
  60. Bot configuration: EchoBot.cs
  61. Speech priming
      • Improves speech recognition accuracy for important words
      • Only for U.S. regional LUIS apps
      • Applies to: Cortana channels, Web channels
  62. (no text)
  63. Ok, ok... But why add Speech API?
  64. Which popular chatbots (with Speech API) do you know? Siri, Google Now, Alexa, Cortana, Facebook
  65. Use Speech API for: DIFFICULTY SEEING, DIFFICULTY TYPING, SAVING TIME ON TYPING, AUTOMATIC TRANSLATION, DETECTING EMOTIONS
  66. Who is using Speech API?
  67. This Photo by Unknown Author is licensed under CC BY-SA
  68. Not a bot, but...
  69. Are you going to use Speech API with your bot?
  70. Thank you! Questions?
      Dziękuję, 谢谢, Dankie, धन्यवाद, תודה, Mesi, Ευχαριστούμε, Je vous remercie, Kiitos, Salamat sa iyo, Vinaka vakalevu, Aitäh, Dank u, Tak, Děkuju, Hvala ti, 謝謝, Gràcies, 多謝, Благодаря, Hvala, ধন্যবাদ, شكرا, ačiū, paldies, 감사합니다, Asante, ありがとう, Grazie, Terima kasih, köszönöm, Þakka þér, ua tsaug, Спасибо, vă mulţumesc, Jamädi, Obrigado, متشکرم, takk, grazzi, terima kasih, Misaotra anao, Gracias, ďakujem, Hvala ti, Хвала ти, Faafetai, Tack, நன்றி, Mauruuru ia oe, ขอบคุณ, ధన్యవాదాలు, Fakamalo atu 'i ho'o, شکريا, Дякую, Teşekkür ederiz, Níib óolal, Diolch, Cảm ơn bạn
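
The Standard-tier prices on slide 34 can be turned into a rough monthly estimate. The following is a minimal sketch in Python, assuming usage is billed linearly at the listed rates (the slides do not describe rounding or billing increments, so a real invoice may differ). The function and dictionary names are illustrative only and are not part of any Azure SDK.

```python
from decimal import Decimal

# Standard-tier rates copied from slide 34 (GBP).
# Assumption: cost grows linearly with usage; real billing may round differently.
RATES_PER_HOUR = {
    "speech_translation": Decimal("1.87"),
    "speech_to_text": Decimal("0.746"),
    "speech_to_text_custom": Decimal("1.044"),
}
RATES_PER_MILLION_CHARS = {
    "tts_standard": Decimal("2.982"),
    "tts_custom": Decimal("4.472"),
    "tts_neural": Decimal("11.925"),
}
HOSTING_PER_MODEL_MONTH = Decimal("29.82")


def estimate_monthly_cost(hours=None, characters=None, hosted_models=0):
    """Estimate a monthly bill in GBP, rounded to pence.

    hours:         {service name: audio hours processed}
    characters:    {service name: characters synthesized}
    hosted_models: number of hosted custom models or voice fonts
    """
    total = Decimal("0")
    for service, amount in (hours or {}).items():
        total += RATES_PER_HOUR[service] * Decimal(str(amount))
    for service, amount in (characters or {}).items():
        total += RATES_PER_MILLION_CHARS[service] * Decimal(amount) / Decimal(1_000_000)
    total += HOSTING_PER_MODEL_MONTH * hosted_models
    return total.quantize(Decimal("0.01"))


# 100 hours of Speech-to-Text plus 2M characters of neural Text-to-Speech.
example = estimate_monthly_cost(
    hours={"speech_to_text": 100},
    characters={"tts_neural": 2_000_000},
)
print(example)  # 98.45
```

Using `Decimal` rather than floats keeps the three-decimal rates from the slide exact, so the only rounding happens once, at the end.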

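The audio format requirements on slide 53 (WAV container, PCM, 16 kHz, 16 bit, mono) can be checked locally before audio is sent to the service. The sketch below uses only Python's standard `wave` module; `check_wav` and `make_silent_wav` are hypothetical helper names introduced here, not part of the Speaker Recognition SDK, and the check covers only the header fields listed on the slide.

```python
import io
import wave

# Requirements listed on slide 53.
REQUIRED_RATE_HZ = 16000
REQUIRED_SAMPLE_WIDTH_BYTES = 2  # 16 bit
REQUIRED_CHANNELS = 1            # mono


def check_wav(source):
    """Return a list of problems; an empty list means the header matches.

    source is a file path or a binary file-like object.
    """
    try:
        with wave.open(source, "rb") as wav:
            channels = wav.getnchannels()
            width = wav.getsampwidth()
            rate = wav.getframerate()
    except (wave.Error, EOFError) as exc:
        # The wave module reads uncompressed PCM WAV and rejects most other data.
        return [f"not a readable PCM WAV file: {exc}"]

    problems = []
    if rate != REQUIRED_RATE_HZ:
        problems.append(f"sample rate is {rate} Hz, expected {REQUIRED_RATE_HZ} Hz")
    if width != REQUIRED_SAMPLE_WIDTH_BYTES:
        problems.append(f"sample width is {width * 8} bit, expected 16 bit")
    if channels != REQUIRED_CHANNELS:
        problems.append(f"{channels} channels found, expected mono")
    return problems


def make_silent_wav(rate, channels):
    """Build a short in-memory 16-bit WAV of silence for demonstration."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * (channels * (rate // 10)))
    buffer.seek(0)
    return buffer


print(check_wav(make_silent_wav(16000, 1)))   # matching file: no problems
print(check_wav(make_silent_wav(44100, 2)))   # wrong rate and channel count
```
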
Editor's Notes

  • Plan
    As with any type of software, having a thorough understanding of the goals, processes and user needs is important to the process of creating a successful bot. Before writing code, review the bot design guidelines for best practices and identify the needs for your bot. You can create a simple bot or include more sophisticated capabilities such as speech, natural language understanding, and question answering.
    Build
    Your bot is a web service that implements a conversational interface and communicates with the Bot Framework Service to send and receive messages and events. Bot Framework Service is one of the components of the Azure Bot Service. You can create bots in any number of environments and languages. You can start your bot development in the Azure portal, or use C# or JavaScript templates for local development.
    As part of the Azure Bot Service, we offer additional components you can use to extend your bot's functionality:
    Feature | Description | Link
    Add natural language processing | Enable your bot to understand natural language, understand spelling errors, use speech, and recognize the user's intent | How to use LUIS
    Answer questions | Add a knowledge base to answer questions users ask in a more natural, conversational way | How to use QnA Maker
    Manage multiple models | If using more than one model, such as for LUIS and QnA Maker, intelligently determine when to use which one during your bot's conversation | Dispatch tool
    Add cards and buttons | Enhance the user experience with media other than text, such as graphics, menus, and cards | How to add cards
     Note
    The table above is not a comprehensive list. Explore the articles on the left, starting with sending messages, for more bot functionality.
    Additionally, we provide command line tools to help you to create, manage, and test bot assets. These tools can manage a bot configuration file, configure LUIS apps, build a QnA knowledge base, mock a conversation, and more. You can find more details in the command line tools readme.
    You also have access to a variety of samples that showcase many of the capabilities available through the SDK. These are great for developers looking for a more feature rich starting point.
    Test
    Bots are complex apps, with a lot of different parts working together. Like any other complex app, this can lead to some interesting bugs or cause your bot to behave differently than expected. Before publishing, test your bot. We provide several ways to test bots before they are released for use:
    Test your bot locally with the emulator. The Bot Framework Emulator is a stand-alone app that provides not only a chat interface but also debugging and interrogation tools to help you understand how and why your bot does what it does. The emulator can be run locally alongside your in-development bot application.
    Test your bot on the web. Once configured through the Azure portal your bot can also be reached through a web chat interface. The web chat interface is a great way to grant access to your bot to testers and other people who do not have direct access to the bot's running code.
    Publish
    When you are ready for your bot to be available on the web, publish your bot to Azure or to your own web service or data center. Having an address on the public internet is the first step to your bot coming to life on your site, or inside chat channels.
    Connect
    Connect your bot to channels such as Facebook Messenger, Kik, Skype, Slack, Microsoft Teams, Telegram, text/SMS, Twilio, and Cortana. Bot Framework does most of the work necessary to send and receive messages from all of these different platforms: your bot application receives a unified, normalized stream of messages regardless of the number and type of channels it is connected to. For information on adding channels, see the channels topic.
    Evaluate
    Use the data collected in Azure portal to identify opportunities to improve the capabilities and performance of your bot. You can get service-level and instrumentation data like traffic, latency, and integrations. Analytics also provides conversation-level reporting on user, message, and channel data. For more information, see how to gather analytics.
  • The steps neural network translations go through are the following:
    Each word, or more specifically the 500-dimension vector representing it, goes through a first layer of “neurons” that will encode it in a 1000-dimension vector (b) representing the word within the context of the other words in the sentence.
    Once all words have been encoded one time into these 1000-dimension vectors, the process is repeated several times, each layer allowing better fine-tuning of this 1000-dimension representation of the word within the context of the full sentence (unlike SMT technology, which can only take a window of 3 to 5 words into consideration).
    The final output matrix is then used by the attention layer (i.e. a software algorithm) that will use both this final output matrix and the output of previously translated words to define which word, from the source sentence, should be translated next. It will also use these calculations to potentially drop unnecessary words in the target language.
    The decoder (translation) layer translates the selected word (or more specifically the 1000-dimension vector representing this word within the context of the full sentence) into its most appropriate target-language equivalent. The output of this last layer (c) is then fed back into the attention layer to calculate which word from the source sentence should be translated next.
  • Identification: determine who is speaking within a group of people
    Verification: verify and authenticate a user by their voice or speech
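
The attention step described in the neural translation notes above (choosing which source word to translate next from the encoder output and the decoder's previous output) can be illustrated with a toy example. This is a minimal sketch under stated assumptions: it uses plain dot-product scoring followed by a softmax, which the notes do not specify, and tiny hand-written 3-dimension vectors in place of the learned 1000-dimension ones. All names and values are hypothetical.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(encoded_words, decoder_state):
    """Score each encoded source word against the current decoder state.

    Assumption: dot-product scoring; real systems learn this function.
    """
    scores = [
        sum(e * d for e, d in zip(vector, decoder_state))
        for vector in encoded_words
    ]
    return softmax(scores)


def next_source_word(words, encoded_words, decoder_state):
    """Pick the source word with the highest attention weight."""
    weights = attention_weights(encoded_words, decoder_state)
    best = max(range(len(words)), key=weights.__getitem__)
    return words[best], weights


# Toy encoder output: one vector per source word (made-up values).
words = ["the", "cat", "sleeps"]
encoded = [[0.1, 0.0, 0.2], [0.9, 0.1, 0.0], [0.0, 0.8, 0.3]]

# Toy decoder state standing in for "the output of previously translated words".
state = [1.0, 0.0, 0.0]

word, weights = next_source_word(words, encoded, state)
print(word)  # cat
```

A production decoder typically blends all source vectors using the weights rather than picking a single word; the hard selection here only mirrors the wording of the notes.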
