Rasa Developer Summit - Josh Converse, Dynamic Offset - Three Part Harmony: How Rasa and Open Source Can Make Your Product Sing


It has been said of the Beatles that the whole (the band) is greater than the sum of the parts (the band members). The same can hold true for open source software. This talk explores combining disparate open source technologies, backed by a Rasa "brain", to yield amazing results, explored through building a phone-based voice receptionist.

The open source ecosystem represents a suite of great standalone technologies. Combining them in a product can yield even more amazing results.
Rasa provides the much-needed flexibility for your system to react and adapt to the real world.
Leveraging open source (Rasa included) allows you to spend more time on the most interesting parts of your product.

Josh Converse is the founder of Dynamic Offset, a boutique consulting firm specializing in mobile, web, and conversational experiences. Prior to consulting he held tech lead roles at both Google and Apple.

Published in: Technology

  1. 1. Three Part Harmony: How Rasa and Open Source Can Make Your Product Sing Josh Converse Founder, Dynamic Offset Rasa Developer Summit - 2019
  2. 2. Three Part Harmony Josh Converse • Dynamic Offset How Rasa and Open Source Can Make Your Product Sing
  3. 3. Demo. The goal was to provide a digital phone receptionist that could perform routine tasks on behalf of the business. It needed to: ● Have conversations just like a human would ● React to “curveballs” ● Take action on behalf of the user ● Act autonomously
  4. 4. At a conceptual level, what’s going on?
  5. 5. The telephony carrier receives a call from the regular phone network and starts a VoIP call to our system
  6. 6. The VoIP call is answered by our system, and the call audio is streamed to a speech-to-text system for transcription
  7. 7. The speech-to-text system converts the audio stream into text, forwarding that text to the agent for handling
  8. 8. The agent will interpret the text in the context of the overall conversation, ultimately taking action
  9. 9. The spoken text is classified into known, structured data called an intent
  10. 10. The classified intent is evaluated relative to the conversation as a whole (e.g. previous responses, conversational norms, etc.)
  11. 11. The agent decides how to react to the structured intent (e.g. spoken response, database access, etc.)
  12. 12. Having decided what to say and do, the agent provides a textual response
  13. 13. Text from the agent is synthesized into an audio stream using a text-to-speech system, and sent to the phone call
  14. 14. The system takes the audio of the agent’s response and feeds it into the ongoing VoIP call.
  15. 15. The VoIP Provider takes care of bridging the audio between the VoIP call and the regular phone network
  16. 16. More Detail!
  17. 17. Twilio receives a call from the regular phone network and starts a VoIP call to our Kamailio server
  18. 18. Kamailio routes the call to an Asterisk server which auto-answers the call. It taps into the incoming audio stream and sends it off for transcription
  19. 19. Google Cloud Speech-to-Text transcribes the audio and the results are sent to the Rasa agent
  20. 20. The Rasa agent will ultimately handle interpreting the text and taking action based on the current state of the conversation
  21. 21. First, Rasa NLU classifies the raw text into structured intents - e.g. inform_name, request_time_slot
  22. 22. Then, Rasa Core will evaluate that intent in the context of the entire conversation
  23. 23. Rasa Core will emit one or more actions that need to be performed in response to the conversation. In this example, it’s a query to MongoDB followed by a spoken response
  24. 24. The agent’s textual response is sent to Amazon Polly text-to-speech for synthesis.
  25. 25. Amazon Polly synthesizes the audio and forwards it to Asterisk to be played on the ongoing call
  26. 26. Asterisk injects the audio stream from Amazon Polly into the VoIP call
  27. 27. Twilio bridges the audio from Asterisk back to the regular phone network
  28. 28. This conversation between the customer and the agent continues in a loop for the duration of the call.
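The loop described in the slides above (transcribe, classify, decide, synthesize) can be sketched in a few lines of Python. This is a minimal illustration, not the speaker's implementation: `transcribe` and `synthesize` are hypothetical callables standing in for the speech-to-text and text-to-speech services, and the keyword matching and canned replies are toy stand-ins for Rasa NLU and Rasa Core. Only the intent names `inform_name` and `request_time_slot` come from the slides.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Conversation:
    """Running state for one phone call (hypothetical structure)."""
    intents: List[str] = field(default_factory=list)


def classify_intent(text: str) -> str:
    """Toy keyword matcher standing in for a trained NLU model."""
    lowered = text.lower()
    if "my name is" in lowered:
        return "inform_name"
    if "appointment" in lowered or "time" in lowered:
        return "request_time_slot"
    return "unknown"


def decide_reply(intent: str, conversation: Conversation) -> str:
    """Toy dialogue step: the reply depends on the intent and on history."""
    seen_name = "inform_name" in conversation.intents
    conversation.intents.append(intent)
    if intent == "inform_name":
        return "Thanks. How can I help?"
    if intent == "request_time_slot":
        if seen_name:
            return "Let me check the schedule."
        return "Sure. May I have your name first?"
    return "Sorry, could you rephrase that?"


def handle_turn(
    audio: bytes,
    conversation: Conversation,
    transcribe: Callable[[bytes], str],
    synthesize: Callable[[str], bytes],
) -> bytes:
    """One pass through the loop: audio in, audio out."""
    text = transcribe(audio)                    # speech-to-text
    intent = classify_intent(text)              # text -> structured intent
    reply = decide_reply(intent, conversation)  # intent + context -> response
    return synthesize(reply)                    # text-to-speech
```

Because the same request gets a different reply depending on what was said earlier in the call, even this toy version shows why the intent has to be evaluated against the whole conversation rather than in isolation.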
  29. 29. Even More Detail!
  30. 30. Golden Age Of Open Source Software
  31. 31. How can you run all this distributed software reliably?
  32. 32. Kubernetes “Kubernetes provides a container-centric management environment. It orchestrates computing, networking, and storage infrastructure on behalf of user workloads.”
  33. 33. X
  34. 34. Single-process systems can’t do the job and hand-run clusters can be painful.
  35. 35. Kubernetes Manages The Fleet So You Don’t Have To
  36. 36. Distributed Environments: Using Rasa Core As An Orchestrator
  37. 37. Rasa Core “Rather than a bunch of if/else statements, [your bot] uses a machine learning model trained on example conversations to decide what to do next.”
  38. 38. Rasa Core - Training Rasa Core training examples are a “historical record” of a past interaction, a blow-by-blow recounting of a known-good encounter. Three parts: ● Stimuli from the user (Responses, Button Clicks, etc.) ● Actions taken by the agent ● Context (Slots, History, etc.)
  39. 39. Rasa Core - Training With training, the agent learns which actions to take based on stimuli & context. When presented with something wholly unseen, the agent will “improvise” using the tools (actions) it has available.
  40. 40. Rasa Core - What are actions? Actions are the “abilities” available to your agent. ● You write these yourself ● You reference them in training data ● They can influence the state of the conversation. The agent may, based on its training, choose to run one or more actions in response to stimuli.
  41. 41. Rasa - Training Sample
      * request_menu{"restaurant": "foo"}
        - action.restaurant_search
        - slot{"found_restaurants": 2}
        - action.request_disambiguation
      * inform_location{"location": "blah"}
        - action.restaurant_search
        - slot{"restaurant_id": "12345"}
        - action.menu_lookup
        - slot{"menu_id": "98765"}
        - action.prompt_menu_send
      * affirm
        - action.send_menu_text
      So the agent sends out the menu.
      * 👩 Asked for menu of restaurant
        - 🤖 Search db for restaurants
        - (Found 2 restaurants)
        - 🤖 Ask user to choose
      * 👩 Responded with their location
        - 🤖 Search db for restaurants
        - (Found a restaurant)
        - 🤖 Look up their menu
        - (Menu lookup success)
        - 🤖 Ask if ok to send menu
      * 👩 Yes it’s ok
        - 🤖 Send menu (SMS/Email)
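As a rough illustration of how a story like the one on this slide ties stimuli to actions, the sketch below parses the sample and builds a lookup from each user intent to the actions that followed it. This is a memorizing toy, not Rasa Core's machine-learned policy, and `parse_story` and `next_actions` are hypothetical helper names. The intent, slot, and action names are copied from the slide, written with straight quotes.

```python
from typing import List, Tuple

STORY = """\
* request_menu{"restaurant": "foo"}
  - action.restaurant_search
  - slot{"found_restaurants": 2}
  - action.request_disambiguation
* inform_location{"location": "blah"}
  - action.restaurant_search
  - slot{"restaurant_id": "12345"}
  - action.menu_lookup
  - slot{"menu_id": "98765"}
  - action.prompt_menu_send
* affirm
  - action.send_menu_text
"""


def parse_story(text: str) -> List[Tuple[str, List[str]]]:
    """Split a story into (intent, [actions]) turns, skipping slot events."""
    turns: List[Tuple[str, List[str]]] = []
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("* "):
            # User stimulus: keep the intent name, drop the entity payload.
            turns.append((line[2:].split("{", 1)[0], []))
        elif line.startswith("- ") and turns:
            event = line[2:]
            if not event.startswith("slot{"):
                turns[-1][1].append(event)
    return turns


def next_actions(turns: List[Tuple[str, List[str]]], intent: str) -> List[str]:
    """Return the actions recorded after `intent`, or [] if it was never seen."""
    table = {name: actions for name, actions in turns}
    return table.get(intent, [])
```

A pure lookup like this returns nothing for an intent it has never seen, which is exactly the gap a trained model is meant to fill when it "improvises" with the actions it has available.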
  42. 42. Rasa = Flexibility With training, and an expressive vocabulary of actions, you can drive your whole system’s behavior without writing imperative code. Rasa can form the “brain” of the system, giving instructions (actions) that the other parts of the system carry out. This is the magic.
  43. 43. Rasa Core = System Flexibility
  44. 44. General: ● Building your own Duplex AI agent using Rasa and Twilio ● Twine on GitHub (coming soon) Kubernetes Resources: ● Kubernetes Tutorials ● What is Kubernetes? ● An old (but good) overview ● Google Kubernetes Engine
  45. 45. Appendix
  46. 46. Distributed Rasa Actions
  47. 47. Rasa + Distributed: Actions don’t have to reside on the same host as the Rasa agent. Kubernetes makes this easy to do.
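To make the idea of actions living on another host concrete, here is a minimal sketch using only the Python standard library. The JSON payload shape, the `ACTIONS` registry, and the `start_action_server` and `run_remote_action` helpers are all assumptions made for this illustration; they are not the Rasa action-server protocol. The action name and slot values are borrowed from the training sample.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import ProxyHandler, Request, build_opener


def menu_lookup(slots):
    """Toy action: pretend to look up the menu for a known restaurant."""
    return {"menu_id": "98765"} if slots.get("restaurant_id") == "12345" else {}


# Hypothetical registry mapping action names to callables.
ACTIONS = {"action.menu_lookup": menu_lookup}


class ActionHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length))
        action = ACTIONS.get(payload.get("action"))
        if action is None:
            self.send_error(404, "unknown action")
            return
        body = json.dumps({"slots": action(payload.get("slots", {}))}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def start_action_server(host="127.0.0.1", port=0):
    """Serve actions in a background thread; port 0 picks a free port."""
    server = HTTPServer((host, port), ActionHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def run_remote_action(url, action, slots):
    """Agent side: ask the remote host to run `action`, return new slot values."""
    data = json.dumps({"action": action, "slots": slots}).encode("utf-8")
    request = Request(url, data=data, headers={"Content-Type": "application/json"})
    opener = build_opener(ProxyHandler({}))  # bypass any configured proxies
    with opener.open(request, timeout=5) as response:
        return json.loads(response.read())["slots"]
```

In a cluster, the URL passed to `run_remote_action` would presumably point at whatever address the action service is exposed on rather than at 127.0.0.1; the agent-side code would not otherwise change.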
  48. 48. Attributions gpu by Phonlaphat Thongsriphong from the Noun Project