Thanks to the recently released v4 of the Bot Framework SDK, creating your first bot is a breeze; still, implementing a production-viable one is no easy task, since several aspects must be taken into account, such as user authentication, integration with existing apps, multi-language support, technical choices (e.g.: Azure Functions vs. ASP.NET Core MVC, Blob Storage vs. CosmosDB) and, last but not least, operational costs.
Moreover, you might want to reuse your bot's Azure-hosted, Cognitive Services-backed code to serve Amazon Alexa users, so as to avoid implementing (and evolving) it twice.
Eager to learn how to do that for real? Then don't miss this code-based talk.
ITCamp 2019 - Andrea Saltarello - Implementing bots and Alexa skills using Azure Cognitive Services.pdf
@ITCAMPRO #ITCAMP19 Community Conference for IT Professionals
Implementing bots and Alexa Skills using Azure Cognitive Services
Andrea Saltarello
CTO @ Managed Designs
Microsoft Regional Director – Microsoft MVP
https://twitter.com/andysal74
https://github.com/andysal
https://www.linkedin.com/in/andysal/
Me.About();
• CTO @ Managed Designs
• Microsoft MVP since 2003
• Microsoft Regional Director since 2015
• Author, along with Dino, of .NET: Architecting Applications for the Enterprise (Microsoft Press)
• Basically, a software architect eager to write code :)
The problem (from a CTO's perspective)
Sharing as much code as possible between an Alexa skill and an Azure Bot Service-backed bot.
Why should I do that? Well, when I speak to a person, they usually behave coherently regardless of the communication channel :)
N.B.: All the code we'll be looking at is available on GitHub under AGPL 3.0
Scenario
1. A user produces an utterance (e.g.: «Hi, how are you?»)
2. The utterance is sent via a communication channel to a cognitive service, which tries to guess the user's intent
3. The cognitive service's analysis result is sent to a program, which produces an appropriate response
• If needed, the program might save info about the conversation
4. The response is delivered to the user
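The four steps above can be pictured as a tiny pipeline. The following is a minimal, self-contained Python sketch, not the talk's code. `recognize_intent` is a hypothetical keyword matcher standing in for LUIS/Lex, `handle` plays the program, and a plain dict plays the conversation state store.

```python
from dataclasses import dataclass


@dataclass
class Recognition:
    """What the (stand-in) cognitive service returns: intent and confidence."""
    intent: str
    score: float


def recognize_intent(utterance: str) -> Recognition:
    # Step 2: hypothetical keyword matcher standing in for LUIS/Lex.
    text = utterance.lower()
    if "how are you" in text:
        return Recognition("Greeting", 0.9)
    if "bye" in text:
        return Recognition("Goodbye", 0.8)
    return Recognition("None", 0.0)


@dataclass
class ConversationState:
    # Optional info the program keeps about the conversation.
    turns: int = 0
    last_intent: str = ""


def handle(recognition: Recognition, state: ConversationState) -> str:
    # Step 3: low-confidence guesses are treated as "no intent".
    intent = recognition.intent if recognition.score >= 0.5 else "None"
    state.turns += 1
    state.last_intent = intent
    if intent == "Greeting":
        return "I'm fine, thanks!"
    if intent == "Goodbye":
        return f"Bye! We talked for {state.turns} turns."
    return "Sorry, I didn't get that."


def converse(utterance: str, conversation_id: str, store: dict) -> str:
    # Step 1: the utterance arrives from a channel; state is looked up per conversation.
    state = store.setdefault(conversation_id, ConversationState())
    # Step 4: the returned text is what the channel displays or speaks.
    return handle(recognize_intent(utterance), state)
```

Calling `converse` twice with the same conversation id shows how the stored state links the second utterance to the first.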
The stack
• Communication channel: Channel + Bot Connector (Azure), Device/Alexa App (Alexa)
• Cognitive service: LUIS (Azure), Lex (Alexa)
• Program: Bot App (Azure), AWS Lambda (Alexa)
• Conversation state: Blob/CosmosDB (Azure), DynamoDB (Alexa)
The Solution
Given that both Alexa and the Bot Framework require our custom logic to be reachable over HTTPS, we will:
1. Implement an Alexa-compatible HTTPS endpoint
2. Implement a Bot Framework HTTPS endpoint
3. Build both endpoints with ASP.NET Core, so as to share the code which produces the response
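The idea of sharing the response-producing code can be sketched as two thin adapters around one core function. The sketch is in Python rather than ASP.NET Core to stay self-contained, and all function names are hypothetical.

- The Alexa adapter reads the standard Alexa request JSON (`request.type`, `request.locale`, `request.intent.name`, `request.intent.slots`) and returns an `outputSpeech` response.
- The Bot Framework adapter assumes a LUIS v2-style prediction (`topScoringIntent`, `entities`) and replies with a minimal `message` activity.

HTTP plumbing and Alexa request signature validation are deliberately omitted.

```python
def produce_response(intent: str, slots: dict, locale: str) -> str:
    """Hypothetical shared core: the only place where the answer is decided."""
    # locale is unused here; it is where multi-language support would plug in.
    if intent == "GetWeather":
        city = slots.get("city", "your city")
        return f"It's sunny in {city}."
    return "Sorry, I can't help with that yet."


def alexa_endpoint(body: dict) -> dict:
    """Adapter for an (already JSON-decoded) Alexa skill request."""
    request = body["request"]
    if request["type"] == "IntentRequest":
        intent = request["intent"]
        # Alexa slots look like {"city": {"name": "city", "value": "Rome"}};
        # "value" is absent when the user did not fill the slot.
        slots = {name: s["value"] for name, s in intent.get("slots", {}).items() if "value" in s}
        text = produce_response(intent["name"], slots, request.get("locale", "en-US"))
    else:
        # LaunchRequest and other request types get a generic greeting here.
        text = "Welcome!"
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": text},
            "shouldEndSession": True,
        },
    }


def bot_endpoint(activity: dict, luis_result: dict) -> dict:
    """Adapter for a Bot Framework message activity plus a LUIS v2-style result."""
    intent = luis_result["topScoringIntent"]["intent"]
    slots = {e["type"]: e["entity"] for e in luis_result.get("entities", [])}
    text = produce_response(intent, slots, activity.get("locale", "en-US"))
    return {"type": "message", "text": text}
```

Only `produce_response` would change when the bot learns something new; the adapters just translate between each platform's JSON and the shared call.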
Implementing an Alexa skill
• Create a skill
• Choose an invocation name
• Define intents
• Define utterances (and slots)
• Provide the URL of the intent handler (by default, an AWS Lambda function)
– The Alexa.NET NuGet package might be of help
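The invocation name, intents, sample utterances and slots end up in the skill's interaction model JSON. The Python snippet below builds a minimal one following the Alexa Skills Kit `interactionModel.languageModel` layout. The following names are hypothetical:

- the invocation name
- the `GetWeather` intent
- the `city` slot
- the custom `CityName` slot type

The handler URL is not part of this document; it is configured separately as the skill's endpoint.

```python
import json


def build_interaction_model(invocation_name: str) -> dict:
    """Builds a minimal Alexa interaction model for a hypothetical weather skill."""
    return {
        "interactionModel": {
            "languageModel": {
                # What users say to open the skill.
                "invocationName": invocation_name,
                "intents": [
                    # Built-in intents a custom skill is expected to handle.
                    {"name": "AMAZON.CancelIntent", "samples": []},
                    {"name": "AMAZON.HelpIntent", "samples": []},
                    {"name": "AMAZON.StopIntent", "samples": []},
                    # Custom intent: no spaces in the name, slots referenced as {city}.
                    {
                        "name": "GetWeather",
                        "slots": [{"name": "city", "type": "CityName"}],
                        "samples": ["what is the weather in {city}", "weather in {city}"],
                    },
                ],
                # Custom slot type listing the accepted values.
                "types": [
                    {
                        "name": "CityName",
                        "values": [{"name": {"value": "rome"}}, {"name": {"value": "milan"}}],
                    }
                ],
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_interaction_model("weather buddy"), indent=2))
```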
Attention, please!
• Spaces are not allowed within intent names
• Utterances:
– Can't contain punctuation (e.g.: "?")
– Numbers have to be written in words (e.g.: 5235 -> five thousand two hundred and thirty five)
– Words in capital letters are spelled out (e.g.: EL -> /e/ /l/)
– Slots are «private» to a specific intent
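Rules like these are easy to break when sample utterances are maintained by hand, so a small pre-flight check may help. The sketch below is a hypothetical helper, not part of Amazon's tooling. It encodes only the rules listed above:

- no spaces in intent names
- no punctuation in utterances
- no digits in utterances
- a warning for all-caps words, which would be spelled out

As an assumption, apostrophes and the braces of `{slot}` references are allowed.

```python
import re
import string

# Punctuation treated as forbidden; apostrophes and slot braces are allowed (assumption).
FORBIDDEN = set(string.punctuation) - {"'", "{", "}"}


def check_intent_name(name: str) -> list[str]:
    return [f"intent name {name!r} contains spaces"] if " " in name else []


def check_utterance(utterance: str) -> list[str]:
    problems = []
    # Drop slot references such as {city} before checking the remaining text.
    text = re.sub(r"\{[A-Za-z_]+\}", "", utterance)
    bad = sorted({c for c in text if c in FORBIDDEN})
    if bad:
        problems.append(f"punctuation not allowed: {''.join(bad)}")
    if any(c.isdigit() for c in text):
        problems.append("numbers must be written in words")
    # Words in capital letters (e.g. EL) would be read letter by letter.
    caps = [w for w in text.split() if len(w) > 1 and w.isupper()]
    if caps:
        problems.append(f"will be spelled out: {', '.join(caps)}")
    return problems
```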
Implementing an Azure-hosted bot
• Create a Bot App
• Create a LUIS app
– Define intents
– Define utterances (and entities)
Creating a Bot App
It all begins with creating a bot app via the Azure dashboard, choosing:
• an App type
– Azure Function
– Web App
• a Pricing tier
– F0 (unlimited messages via standard channels, 10K messages/month via direct line, no SLA)
– S1 (unlimited messages via standard channels, €0.422 per 1,000 messages via direct line, 99.9% SLA)
Functions Bots
Functions Bots support:
• SDK v3 only, supported until 31/12/2019
• C#/.NET 4.6, JS/Node.js
• Basic, Echo and LUIS templates
• App Service plan and Consumption hosting plans
Web App Bots
Web App Bots can be developed with both SDK v3 and v4:
• v3 – Node.js, ASP.NET Web API 2
• v4 – Node.js, ASP.NET Core
Further templates are available here
LUIS pricing
• Free: 10,000 transactions per month (5 calls/sec)
• Standard:
– Text requests: €1.265 per 1,000 transactions (50 calls/sec)
– Speech requests: €4.639 per 1,000 transactions (50 calls/sec)
N.B.: 1 transaction == 500 characters
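To turn these LUIS figures into a monthly estimate, the sketch below applies the 500-characters-per-transaction rule together with the Standard text price of €1.265 per 1,000 transactions. Two points are assumptions:

- Every started block of 500 characters counts as a full transaction.
- The Free tier simply raises when its quota is exceeded.

The function names are hypothetical, and since prices change over time, treat the constants as parameters. `Decimal` avoids floating-point rounding on money.

```python
import math
from decimal import Decimal

CHARS_PER_TRANSACTION = 500             # per the slide: 1 transaction == 500 characters
FREE_MONTHLY_TRANSACTIONS = 10_000      # Free tier quota
STANDARD_TEXT_PRICE = Decimal("1.265")  # EUR per 1,000 text transactions


def transactions_for(utterance: str) -> int:
    # Assumption: each started 500-character block is billed, minimum one.
    return max(1, math.ceil(len(utterance) / CHARS_PER_TRANSACTION))


def monthly_text_cost(transactions: int, tier: str = "standard") -> Decimal:
    if tier == "free":
        if transactions > FREE_MONTHLY_TRANSACTIONS:
            # This sketch just raises; it does not model the service's behaviour.
            raise ValueError("Free tier quota exceeded")
        return Decimal("0")
    return STANDARD_TEXT_PRICE * transactions / 1000
```

For example, one million short utterances a month would cost `monthly_text_cost(1_000_000)`, i.e. 1,265 EUR on the Standard tier.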
Bot Framework: Channels vs direct line
As per the Bot Framework's jargon:
• Channels are the user agents supported out of the box
• Direct line is the API available for custom user agents
The web chat is an open source, web-based client for the Bot Framework V4 SDK which uses the direct line and:
• Can be embedded as is via an iframe element
• Supports customisations
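A custom user agent talks to the bot through the Direct Line REST API (version 3.0). The sketch below builds the two HTTP requests a minimal client needs, without sending them: one starts a conversation and one posts a message activity. It uses only Python's standard `urllib`.

The secret, conversation id and user id are placeholders. Token exchange, receiving the bot's replies (polling or WebSocket streaming) and error handling are left out.

```python
import json
import urllib.request

DIRECT_LINE = "https://directline.botframework.com/v3/directline"


def start_conversation_request(secret: str) -> urllib.request.Request:
    # POST /conversations opens a new conversation for this client.
    return urllib.request.Request(
        f"{DIRECT_LINE}/conversations",
        method="POST",
        headers={"Authorization": f"Bearer {secret}"},
    )


def send_message_request(secret: str, conversation_id: str, user_id: str,
                         text: str) -> urllib.request.Request:
    # POST /conversations/{id}/activities delivers a message activity to the bot.
    activity = {"type": "message", "from": {"id": user_id}, "text": text}
    return urllib.request.Request(
        f"{DIRECT_LINE}/conversations/{conversation_id}/activities",
        data=json.dumps(activity).encode("utf-8"),
        method="POST",
        headers={"Authorization": f"Bearer {secret}", "Content-Type": "application/json"},
    )

# With a real secret, urllib.request.urlopen(request) would actually send them.
```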
web chat <3 human voice
The web chat component:
1. Can record a spoken utterance
2. Converts the recording to text using either the browser or Speech Services
3. Sends the text to LUIS, which triggers our program
4. Pronounces the response using either the browser or Speech Services
Speech Services pricing:
• Free: 1 concurrent request, 5 hours/month speech-to-text (S2T), 5M characters/month text-to-speech (T2S)
• Standard: 20 concurrent requests, €0.844/hour S2T, €3.374 per 1M characters T2S
Channels
Out of the box support for: Cortana, Email, Facebook, GroupMe, Kik, Microsoft Teams, Skype, Skype for Business, Slack, Telegram, Twilio and Web chat
Facebook Messenger
The recipe:
1. Create a Facebook page
2. Create a Facebook app
3. Create the channel specifying: page id, app id and app secret
4. Configure the FB app, allowing API access in «Advanced options»
5. Configure the channel via the Azure Dashboard
Channels vs. authentication
TL;DR: authentication isn't provided out of the box
Channels are unaware of who the user is, so we have to map each request to an ASP.NET Core Identity profile:
• For an Alexa skill, via the account linking feature
• For a Bot App, by sending the channel an adaptive card which asks the user for credentials
From then on, the conversation is authenticated (until the web app shuts down)
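Once users have been identified, the bot has to remember which Identity profile each channel user maps to. The sketch below keeps that mapping in memory, which matches the caveat that authentication lasts only until the web app shuts down.

- For Alexa, account linking makes an access token available in the request JSON (`context.System.user.accessToken`).
- For the Bot Framework, the activity's `channelId` plus `from.id` identify the user.

`ChannelIdentityMap` and the token-to-profile lookup are hypothetical. A real implementation would validate the token against the identity provider.

```python
from typing import Callable, Optional


class ChannelIdentityMap:
    """In-memory map from (channel id, channel user id) to an identity profile id.

    Being in memory, the mapping is lost when the process (web app) restarts.
    """

    def __init__(self) -> None:
        self._profiles: dict[tuple[str, str], str] = {}

    def link(self, channel_id: str, user_id: str, profile_id: str) -> None:
        # Called once the user has signed in (e.g. via the credentials card).
        self._profiles[(channel_id, user_id)] = profile_id

    def lookup(self, channel_id: str, user_id: str) -> Optional[str]:
        return self._profiles.get((channel_id, user_id))


def alexa_profile(request_body: dict,
                  profile_from_token: Callable[[str], Optional[str]]) -> Optional[str]:
    # With account linking, Alexa includes the linked account's access token.
    token = request_body.get("context", {}).get("System", {}).get("user", {}).get("accessToken")
    return profile_from_token(token) if token else None


def bot_profile(activity: dict, identities: ChannelIdentityMap) -> Optional[str]:
    # Bot Framework activities carry the channel id and a channel-specific user id.
    return identities.lookup(activity["channelId"], activity["from"]["id"])
```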
Conversation State
By default, every utterance is independent of previous ones: to create a connection between them (== a conversation), our bot/skill has to persist some state.
Alexa and the Bot Framework support both in-memory, ephemeral conversations and durable ones:
• Alexa: DynamoDB
• Bot Framework: Azure Table Storage, CosmosDB
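Swapping ephemeral for durable state is easiest when the bot logic depends only on a small storage abstraction. The sketch below illustrates that with:

- a hypothetical `StateStore` protocol
- an in-memory implementation, whose state is lost on restart
- a JSON-file implementation standing in for DynamoDB, Table Storage or CosmosDB

This is a simplified stand-in, not the SDKs' own storage interfaces. The file store assumes conversation ids are safe file names.

```python
import json
import os
import tempfile
from typing import Protocol


class StateStore(Protocol):
    def read(self, key: str) -> dict: ...
    def write(self, key: str, state: dict) -> None: ...


class MemoryStore:
    """Ephemeral: state lives as long as the process."""

    def __init__(self) -> None:
        self._data: dict[str, dict] = {}

    def read(self, key: str) -> dict:
        return dict(self._data.get(key, {}))

    def write(self, key: str, state: dict) -> None:
        self._data[key] = dict(state)


class JsonFileStore:
    """Durable stand-in: one JSON file per conversation key."""

    def __init__(self, folder: str) -> None:
        self._folder = folder

    def _path(self, key: str) -> str:
        return os.path.join(self._folder, f"{key}.json")

    def read(self, key: str) -> dict:
        try:
            with open(self._path(key), encoding="utf-8") as f:
                return json.load(f)
        except FileNotFoundError:
            return {}

    def write(self, key: str, state: dict) -> None:
        with open(self._path(key), "w", encoding="utf-8") as f:
            json.dump(state, f)


def on_utterance(store: StateStore, conversation_id: str, intent: str) -> int:
    # Links this utterance to previous ones by persisting a turn counter.
    state = store.read(conversation_id)
    state["turns"] = state.get("turns", 0) + 1
    state["last_intent"] = intent
    store.write(conversation_id, state)
    return state["turns"]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        on_utterance(JsonFileStore(folder), "conv-1", "Greeting")
        # A fresh store instance (e.g. after a restart) still sees the first turn.
        print(on_utterance(JsonFileStore(folder), "conv-1", "Goodbye"))  # prints 2
```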
Multi language support
Given the rule of thumb:
1 LUIS/Lex app == 1 language
to support multiple languages we could either:
• Build as many bot apps/skills as there are languages
• Build a polyglot bot/skill which picks the right LUIS/Lex app at runtime
Ad hoc vs. Polyglot
Ad hoc Polyglot
Pros • Independent language support lifecycle • One app to deploy them all
• More natural behaviour
Cons • More deploys
• NNUI effect (Not-so-Natural User Interface :))
• Must wait for all languages to support every intent
• Costs
Implementing polyglotism, the Alexa way
• Step 1: understand the utterance
– Add a new interaction model for your language via the Alexa developer console
• Step 2: retrieve the locale
– Retrieve the locale provided by the request (e.g.: the skillRequest.Request.Locale property)
• Step 3: produce a response
– Use the locale to pick the strings from a resource file
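Steps 2 and 3 boil down to a lookup keyed by the request locale. In Alexa.NET the locale comes from `skillRequest.Request.Locale`; in the raw request JSON it is `request.locale`. The sketch below makes two assumptions:

- A hypothetical in-code dictionary replaces the resource files.
- Lookups fall back from the full locale (`it-IT`) to the language (`it`) and then to a default language, a common strategy.

```python
RESOURCES = {
    "en": {"welcome": "Welcome!", "weather": "It's sunny in {city}."},
    "it": {"welcome": "Benvenuto!", "weather": "C'è il sole a {city}."},
}
DEFAULT_LANGUAGE = "en"


def get_string(locale: str, key: str, **args: str) -> str:
    # Try the full locale (e.g. "it-IT"), then the language ("it"), then the default.
    language = locale.split("-")[0]
    for candidate in (locale, language, DEFAULT_LANGUAGE):
        strings = RESOURCES.get(candidate)
        if strings and key in strings:
            return strings[key].format(**args)
    raise KeyError(key)


def locale_of(alexa_request_body: dict) -> str:
    # Raw Alexa request JSON: the locale sits in request.locale (e.g. "it-IT").
    return alexa_request_body["request"].get("locale", "en-US")
```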
Implementing polyglotism, the Bot FX way
• Step 1: choose a language picking strategy
– Have a user setting stored somewhere (e.g.: identity profile, conversation state) stating the language to be used *or*
– Use a cognitive service (e.g.: Azure's Text Analytics API) to guess the language from an utterance
• Step 2: understand the utterance
– Send the utterance to the appropriate LUIS app
• Step 3: produce a response
– Use the locale to pick the strings from a resource file
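The Bot Framework steps can be sketched as follows:

1. Prefer a stored user setting; otherwise ask a language detector.
2. Route the utterance to the LUIS app for that language.
3. Fall back to a default language when no LUIS app exists for it.

All names here are hypothetical:

- `detect_language` stands in for a call to a service such as Text Analytics.
- `naive_detector` is only a toy for the example.
- The LUIS app ids are placeholders.

```python
from typing import Callable, Optional

# One LUIS app per language (hypothetical app ids).
LUIS_APPS = {"en": "luis-app-en", "it": "luis-app-it"}
FALLBACK_LANGUAGE = "en"


def pick_language(user_setting: Optional[str], utterance: str,
                  detect_language: Callable[[str], str]) -> str:
    # Step 1: a stored preference wins; otherwise guess from the utterance.
    language = user_setting or detect_language(utterance)
    return language if language in LUIS_APPS else FALLBACK_LANGUAGE


def route(user_setting: Optional[str], utterance: str,
          detect_language: Callable[[str], str]) -> tuple[str, str]:
    # Step 2: the language (for resource lookup) and the LUIS app to query.
    language = pick_language(user_setting, utterance, detect_language)
    return language, LUIS_APPS[language]


def naive_detector(text: str) -> str:
    # Toy stand-in for a language detection service.
    words = text.lower().split()
    return "it" if any(w in words for w in ("ciao", "come", "stai")) else "en"
```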
Billing/Pricing model
• Bot Framework:
– Web app: 0 - 0.422 EUR per 1000 messages (S1)
– Compute
• LUIS: 1.265 EUR per 1000 transactions (S1)
• [Speech to Text: 3.374 EUR per 1000 utterances]
• [Text to Speech: 3.374 EUR per 1000 messages]
• CosmosDB: 18 EUR/month (400 RU/s, 1 GB)
Each transaction:
• Voice: 0.8 cent (0.026 EUR including CosmosDB)
• Text: 0.13 cent (0.02 EUR including CosmosDB)
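The per-transaction figures can be reproduced from the list prices above. The sketch below rests on these assumptions:

- A voice exchange costs one LUIS call, one speech-to-text and one text-to-speech operation: 0.1265 + 0.3374 + 0.3374 ≈ 0.8 cent.
- The Bot Framework costs nothing (standard channels), and compute is ignored.
- CosmosDB's 18 EUR/month is spread over 1,000 transactions per month, the volume at which the "including CosmosDB" totals match the slide.

The constant names are illustrative.

```python
from decimal import Decimal

PER_1000 = Decimal(1000)
LUIS = Decimal("1.265") / PER_1000            # EUR per transaction (S1)
SPEECH_TO_TEXT = Decimal("3.374") / PER_1000  # EUR per utterance
TEXT_TO_SPEECH = Decimal("3.374") / PER_1000  # EUR per message
COSMOS_MONTHLY = Decimal("18")                # EUR per month (400 RU/s, 1 GB)


def cost_per_transaction(voice: bool, monthly_transactions: int = 0) -> Decimal:
    """EUR per exchange; pass monthly_transactions to amortize CosmosDB over them."""
    cost = LUIS
    if voice:
        cost += SPEECH_TO_TEXT + TEXT_TO_SPEECH
    if monthly_transactions:
        cost += COSMOS_MONTHLY / monthly_transactions
    return cost


if __name__ == "__main__":
    for voice in (True, False):
        label = "Voice" if voice else "Text"
        print(label, cost_per_transaction(voice), cost_per_transaction(voice, 1000))
```

The fixed CosmosDB fee dominates at low volumes; at higher monthly volumes the per-transaction cost tends towards the LUIS and Speech prices alone.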