4. Overview of how Alexa works
Request
Audio
Response
Your Service
Text to Speech
Machine Learning
Natural Language
Understanding
Speech Recognition
Cards
17. Amazon Polly—language portfolio
Americas:
• Brazilian Portuguese
• Canadian French
• English (US)
• Spanish (US)
APAC:
• Australian English
• Indian English
• Japanese
EMEA:
• Danish
• Dutch
• British English
• French
• German
• Icelandic
• Italian
• Norwegian
• Polish
• Portuguese
• Romanian
• Russian
• Spanish
• Swedish
• Turkish
• Welsh
• Welsh English
19. Tying it all together
Request
Audio
Response
Your Service
Text to Speech
Machine Learning
Natural Language
Understanding
Speech Recognition
Cards
20. Tying it all together
Your skill
Lambda Function
(request handler)
S3 Bucket (asset
storage)
DynamoDB Table
(state storage)
Amazon Polly
(text to speech)
Amazon SES Emails
(map, journal info)
21. Tying it all together
Marketing
S3 Bucket
(promo website)
Amazon SES Emails
(specials, updates)
Hi!
We’re Memo & Mark, Solutions Architects with the Alexa Skills Kit team.
What does that mean? Well, Alexa SAs are slightly different from AWS SAs, but we do many of the same things.
Memo – Events SA (based in the US), works with hobby / indie developers to help them understand how Alexa works, and to develop & publish a skill
Mark – SA (based in Germany), works with managed partners, mostly in Germany and across Europe, to help them build skills for Alexa.
Oh, we should probably also introduce our good friend, Alexa.
I’m going to assume most of you have met Alexa before. Alexa is a voice-based service that lives in the cloud. You can think of her as a kind of virtual assistant, helping you to accomplish common tasks & providing you with useful info. For example, “Alexa, turn the lights on”, or “Alexa, what’s the weather in Las Vegas tomorrow?”
If you haven’t had a chance to meet Alexa for yourself yet, I’d suggest checking out [INSERT SESSION] or dropping by one of our booths for a test drive & a chat with the Alexa team. Or, if you’d just prefer to get hands-on fast, head to our hackathon!
Alright, just so I know everyone’s on the same page… Let’s briefly run through how Alexa works.
Starting from the left, we have the user.
They have a device. In this case, the device is an Amazon Echo, but Amazon produces quite a few Alexa-enabled devices, and there are many third-party hardware manufacturers using the Alexa Voice Service to put Alexa into their watches, lights, and even cars.
The user says “Alexa, ask All Recipes for a chicken recipe”.
Upon hearing the wake word “Alexa”, the device wakes up & starts streaming the user’s request to the Alexa service in the cloud.
The Alexa service then uses Automatic Speech Recognition to parse the user’s request & determine what was actually said.
From there the request is passed through Natural Language Understanding & Machine Learning to figure out the intent behind the request, and which platform feature or skill to send it to. In this case, the skill would be All Recipes, the intent would be to find a recipe, and the value of the ingredient slot would be chicken.
The request is then sent to the skill, listed here as “your service”. The skill handles it, finds some recipes & returns them in a response, maybe asking the user which one they would like details on.
The request & response are sent as JSON blobs, and typically include text or audio, and a card or template.
Text & audio are then played back by the Alexa-enabled device. Cards are sent to the user’s Alexa app.
If the device has a screen, such as an Echo Show or Echo Spot, it will render the template or card on screen.
~~Pictures of some of our most engaging skills.~~
So, what are skills?
They’re essentially voice-based applications, which developers can build to teach Alexa new tricks.
Alexa has quite a lot of useful built-in functionality, such as playing songs from Amazon Music, Spotify & TuneIn, setting alarms, or telling me what’s on my calendar for today, but obviously, there are going to be a few things we’ve missed or we’re not experts in. For example, train times for your local city, playing adventure games, or tips for snowboarding.
These are all cases where a skill could help. For example, in Germany, the Deutsche Bahn skill could help you find train times, or the Runescape skill lets you play a nice adventure game. I don’t know of a snowboarding tips skill, but hey, if anyone wants to build one, go ahead!
Okay, so maybe you’re building a skill for the first time, or perhaps you’ve already built one, and now you’re asking: how do I get engagement? How do I make customers love my skill?
Well, that’s easy… make your skill interesting & delightful to use. Okay, maybe that’s a little easier said than done.
How about we share some tips & tricks on AWS services that you might be able to use to make the skill more delightful & interesting?
The great thing about AWS Services is that there’s one for just about everything! Want to spin up a database? There’s a service for that! Want to send push notifications? There’s a service for that! Want to emit data from IoT sensors, collect it, and then use Machine Learning to predict certain events or conditions? There are services for that.
In all seriousness though, there are quite a few AWS services now. If you haven’t been using AWS prior to developing for Alexa, it might be a bit confusing figuring out where to start. Let’s introduce you to 5 that could help make your skills a little more fun. You can always branch out & discover the rest of the services from there.
Alright, let’s start with Lambda for those of you that don’t know it.
On the left, we have an event source. This could be anything from a file being PUT to S3, to an update in DynamoDB, to a request being made to an API Gateway endpoint. Oh, and of course, a user’s request to Alexa.
This event source acts as a trigger for your Lambda function. Your function is basically just a block of code (right now AWS supports Node.js, Python, Java & C#) that’s booted & executed when the trigger fires.
That function can also interact with other things. In Alexa skills, it sends a response to the Alexa service, but it could also do things like write to a database or check how many EC2 instances you have running.
So, why use this rather than just booting a few EC2 instances & building a web service?
(You can also host a skill as a RESTful service that speaks JSON over HTTPS, which may be attractive if you have existing APIs. However, APIs are typically built for machine-to-machine interaction, not human-to-machine interaction, and so don’t account for spoken responses. Rather than changing those APIs, developers will often use Lambda to wrap the API & convert the machine output into something Alexa can speak back.)
Still, Lambda brings a few benefits…
First, you don’t need to build & maintain servers yourself. That means, rather than worry whether you’re running the latest Linux package, you can just get on with building skills.
Next, Lambda can run many copies of your function in parallel. No need to worry if your skill gets featured on TechCrunch, Slashdot, Reddit, etc.; Lambda will just spin up enough instances of your function to handle it.
Finally, you’re not paying for idle servers. This is useful for most skills. For example, a recipe skill is probably going to see large spikes in usage around meal times, with lower traffic at other times.
Some things I would also recommend looking into are Lambda’s versions & aliases, and environment variables. These are super useful for development & deployment!
As you might know, Lambda functions have an Amazon Resource Name or ARN.
You can point to a specific version or alias by appending “:x” on the end, where “x” is either the version number or alias name. If you don’t specify a version or alias, you just get the $LATEST version.
When I develop, I like to use this much like a git release workflow:
In my first version of a skill, I’ll work mostly on latest, then when happy, publish a new version.
I can then either use that version number, or perhaps better, use an alias like “release1” (allows me to correct any urgent issues and point the alias to a new version).
I then use the full ARN with the :release1 on the end in the Alexa Developer Portal. That way, when I submit & release the skill, it’s locked to that alias.
When I work on my second version of the skill, I create a new alias called “release2”.
I then put the skill into development mode, and update the ARN to point to the new release2 alias. I can then continue working on the skill without risking breaking functionality for users of the released version.
Environment Variables are basically key value pairs that are passed into your Lambda function’s environment.
They can easily be set or modified at any time, but apply to all versions of the Lambda function, so be careful with them. They can be really useful as feature flags though.
Let’s say you’re building a recipe skill, and during the American pumpkin season you want to bump all your pumpkin-spiced recipes up a little. You could add some fancy logic to your skill to check if it’s October, but the dates for seasons like that never map cleanly to a single month, and can shift a little from year to year. No worries: just add an environment variable such as IS_PUMPKIN_SEASON and set the value to true. Then in your Lambda function, you’d check for the presence of that environment variable, and if set to true, run the relevant code.
This is also super useful if you temporarily need to turn on more detailed logging to debug an issue. Just add an environment variable such as “DEBUG_MODE”. Then in your code, write some extra console.log statements and lock them behind if statements that check that DEBUG_MODE is present and set to true.
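As a sketch, flags like these can be read straight from process.env in Node.js. The flag names IS_PUMPKIN_SEASON and DEBUG_MODE come from the examples above; the recipe-ranking logic is a made-up illustration:

```javascript
// Sketch: reading feature flags from Lambda environment variables (Node.js).
// Lambda exposes environment variables via process.env, as plain strings.
function flagEnabled(name) {
  return process.env[name] === 'true';
}

function debugLog(message) {
  // Extra logging only fires when DEBUG_MODE is set to "true".
  if (flagEnabled('DEBUG_MODE')) {
    console.log('[debug]', message);
  }
}

function rankRecipes(recipes) {
  // Hypothetical example: bump pumpkin-spiced recipes to the front in season.
  if (!flagEnabled('IS_PUMPKIN_SEASON')) return recipes;
  return recipes.slice().sort((a, b) =>
    (b.includes('pumpkin') ? 1 : 0) - (a.includes('pumpkin') ? 1 : 0));
}
```

Because the flags are just environment variables, you can flip them in the Lambda console without redeploying any code.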
Another one, if you haven’t hit it already, is to adjust your execution timeout for Lambda. By default, it’s 3 seconds, while Alexa has a timeout of around 8 seconds. If you can execute everything in 3 seconds, great, but if Lambda kills your request before it returns, the user will get an error. You might want to try bumping that up to around 10 seconds just to give yourself enough time.
Okay, next service, S3.
S3 is Amazon’s Simple Storage Service.
If you haven’t tried it already, it’s super useful for serving assets to your skill.
That includes things such as audio files for playback via SSML or the AudioPlayer directive, video files for video apps, and images for both cards & templates.
That’s generally a much better idea than bundling them into your skill’s deployment package, as:
Smaller payloads = quicker deployment times
You can change the asset if you need to, without editing the Lambda function or re-certifying the skill. That’s useful if you’re doing something like a skill that sends you daily content (space photos, cat videos, etc.).
Okay, we’ll give you some tips for this one too…
Firstly, versioning. This one is maybe useful to explore, as it can help prevent ”oopsies” moments.
Let’s say you accidentally upload the wrong asset & overwrite something being used in production. Ideally, it shouldn’t happen, but we all know that sometimes human or code errors can lead to mistakes.
If you have versioning enabled, you’ve actually got 2 safety nets there:
1) Each version has a version ID. If you don’t specify it in the request url, you get the latest version. If you do, you can lock to a specific version.
2) You can also roll-back versions if you need to.
Next, logging: you can enable this to get more detailed logs about requests for your S3 objects.
To enable CORS, the server must set the Access-Control-Allow-Origin header in its responses. If you want to restrict the resources to just the Alexa app, allow just the origins http://ask-ifr-download.s3.amazonaws.com and https://ask-ifr-download.s3.amazonaws.com.
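For reference, a bucket CORS configuration along those lines might look like the following JSON (the shape accepted by `aws s3api put-bucket-cors`); this assumes GET-only asset serving restricted to the two Alexa origins above:

```json
{
  "CORSRules": [
    {
      "AllowedOrigins": [
        "http://ask-ifr-download.s3.amazonaws.com",
        "https://ask-ifr-download.s3.amazonaws.com"
      ],
      "AllowedMethods": ["GET"],
      "AllowedHeaders": ["*"],
      "MaxAgeSeconds": 3000
    }
  ]
}
```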
Finally, S3 can host static websites. That is, if your site just uses HTML, CSS, JavaScript and normal assets like images or videos, you can run that site directly from S3.
This can be useful if you want to build a little marketing ”hype site” for your skill, that you can then promote through all the various networks.
Alright, time for number 3! DynamoDB!
DynamoDB is Amazon’s managed NoSQL database service.
If you were spinning up your own DB servers on EC2, or hosting one of the many database engines via RDS, one of the first things you typically need to ask is what are the storage requirements? How much data do I expect in there initially? How much do I expect it to grow by, and at what rate? Etc.
DynamoDB is a little different – Instead of specifying the storage capacity, you just specify the throughput you need & it handles all the compute & storage to give you that throughput. Just keep adding data & it’ll figure it out.
So, how is it useful to an Alexa developer?
Well, mostly for storing user preferences & state.
Let’s say you’re building a recipes skill… Alexa’s interaction method is request-response. If the user makes a request, Alexa responds & optionally keeps the mic open for a response from the user, and so on.
The timeouts there are around 8 seconds – Alexa has ~8 seconds to respond to user, and similarly the user has ~8 seconds to respond to Alexa. The screen on an Echo Show disappears after about 30 seconds.
I’m going to make a wild guess here, but I would probably say the majority of users won’t be able to complete a step in 8 seconds & be ready for the next one. That means your skill is going to close.
If you don’t have any form of persistence, then the user needs to re-open the skill, find the recipe again, and then keep saying next step until they get to where they were. That’s a pretty sucky user experience.
Instead, you can write a record to DynamoDB to track the user’s progress.
E.g. use the userId Alexa passes in as the key, and store things like recipe name / number & step number
That way, when they re-open the skill, you can just check that table for the last step they got to & respond with something like “Welcome back. It looks like you were making <recipe name>. Would you like to continue, or search for something new?”. That’s a much better user experience & makes your skill much more delightful to use.
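A sketch of what that record could look like in Node.js. The table name "RecipeProgress" and the attribute names are made-up examples; in a real skill you’d hand these params to the DynamoDB DocumentClient:

```javascript
// Sketch: persisting a user's recipe progress keyed on the Alexa userId.
// Table and attribute names are hypothetical examples.
function buildProgressItem(userId, recipeName, stepNumber) {
  return {
    TableName: 'RecipeProgress',
    Item: {
      userId: userId,          // partition key; Alexa passes this with each request
      recipeName: recipeName,
      stepNumber: stepNumber,
      updatedAt: Date.now()
    }
  };
}

// In the skill's exit handler, something along the lines of:
//   docClient.put(buildProgressItem(event.session.user.userId,
//                                   'Roast Chicken', 4)).promise();
```

On the next launch, a matching get on the same userId tells you whether to offer “continue” or start fresh.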
You could also use DynamoDB to store preferences for a user. You could add an intent to handle ingredients that the user loves, or hates.
E.g. if the user says “Alexa, tell skill that I don’t like broccoli”, you could set that as a record in DynamoDB. If the user later asks for vegetable recipes, you could look up their preferences, discover they don’t like broccoli & omit all those recipes from your search results.
Of course, some of that can be done by using session attributes. That’s fine for “session” things, but session attributes only persist while the skill is open. As soon as the session ends, they’re gone.
It can also be tempting to shove absolutely everything into the session attributes block of the response. That’s usually not a good idea: it bloats every round trip, and anything that needs to outlive the session is lost anyway.
Okay, some more cool features that could be interesting to you…
DynamoDB allows you to set a TTL on records. That’s pretty cool if you want timed sessions.
Interaction with Alexa is typically done in a request-response manner.
Alexa’s timeout from both sides is around 8 seconds. If the user asks Alexa something, and the skill doesn’t respond in time, it will return an error to the user.
Likewise, if the skill responds with a question, and the user doesn’t respond in time, the session will timeout.
Let’s say you’re building a recipes skill that guides a user through a recipe step by step… Some of those steps, like pre-heating the oven or cooking something until it begins to brown, are going to take a little longer than 8 seconds.
Sadly, without significant improvements in the cooking preparation world, you’re probably going to need to save a user’s state if they exit the skill or the session times out.
That’s pretty easily done: just write the userId & the step they were on to DynamoDB on skill exit. Then when the user returns, check that value & resume from there.
That’s good if the user comes back a few minutes later for the next step but, if the user comes back a day or two later, they’re probably not still prepping the same recipe, so you’d just want to start from the top.
Luckily, DynamoDB supports setting a “Time To Live” or “TTL” for entries. Just set that, and the record will be deleted once the time expires. If there’s no record, the skill will just start with a search again.
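DynamoDB’s TTL feature expects an epoch timestamp in seconds in an attribute you nominate on the table. A small sketch (the attribute name "expiresAt" and the 48-hour window are illustrative choices):

```javascript
// Sketch: stamping a DynamoDB item with a TTL attribute so stale progress
// records expire on their own. DynamoDB TTL wants epoch time in *seconds*;
// "expiresAt" is whatever attribute name you configure on the table.
function withTtl(item, hoursToLive) {
  const expiresAt = Math.floor(Date.now() / 1000) + hoursToLive * 3600;
  return Object.assign({}, item, { expiresAt: expiresAt });
}

// e.g. withTtl({ userId: 'u1', recipeName: 'Roast Chicken', stepNumber: 4 }, 48)
```

If the user comes back inside the window, you resume; if the record has expired and is gone, the skill naturally starts from a fresh search.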
Additionally, I wanted to tell you about DynamoDB Streams, which capture a stream of the changes made to a DynamoDB table. The cool thing is that you can use them as a trigger for Lambda functions.
Let’s imagine your recipe skill evolves a bit and allows a user to set preferences. You might allow them to set things like that they eat meat, they don’t like broccoli, they’re gluten free, and they set a maximum calorie count for the day. Okay, that’s cool, they can go through & get all recipes specifically for their likes.
What happens if the user suddenly decides to become vegetarian? Well, you could try to handle this request in-line & iterate through all the settings in the table, changing them to match the new preference. The risk there, though, is that you only have 8 seconds to respond, or the user will get an error.
Instead, you could have the skill simply update the table to mark the user as vegetarian, and have a Lambda function that receives events from the stream. That function triggers on every update & can be used to reconcile the rest of the preferences.
Alright, we’ve covered a few services so far.
Many of them are everyday services for Alexa developers, but we might have raised some interesting topics. Does anyone have any thoughts or questions so far?
Okay, let’s take things a little further shall we.
Last year AWS introduced Amazon Polly. This is a pretty nice service for converting text to speech.
Polly’s also designed to be fast, so for most requests you can call it inline, but you can also pre-generate the files & store them in S3 if you want to.
Polly also provides 47 lifelike voices across 24 different languages, which you can see here.
That means that if you wanted to, you could build a skill that makes it easy for users to translate “Good morning” into “Guten Morgen”.
You could also use Polly as an inexpensive way to add characters to an interactive fiction skill you might be building.
Do you have a skill that needs to send a summary of something to a user?
Let’s say a user wants to save that amazing recipe for later, or needs a copy of their event itinerary… Well, you can push it to a card, which shows up in the user’s Alexa app, but you may also want to offer sending via email.
To do that, you’re going to need some way of sending emails.
Now, you’re welcome to try configuring Postfix or Sendmail to run inside a Lambda function, but it’s probably not going to be the easiest task.
Instead, I would strongly suggest looking at SES, the Simple Email Service.
It’s basically exactly what its name implies: a simple service for sending email. It’s an AWS application service, so there are APIs for it, and it’s part of the AWS CLI & SDKs.
It even has the ability to set up templates for HTML content with variables, which may make things a bit simpler there.
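As a sketch, the parameters for SES’s sendTemplatedEmail call could be built like this. The template name "RecipeSummary", its placeholders, and the sender address are made-up examples; the sender must be an SES-verified identity:

```javascript
// Sketch: building params for SES sendTemplatedEmail. The template
// "RecipeSummary" (with {{name}} and {{recipe}} placeholders) and the
// sender address are hypothetical.
function buildRecipeEmail(toAddress, userName, recipeName) {
  return {
    Source: 'recipes@example.com',            // must be SES-verified
    Destination: { ToAddresses: [toAddress] },
    Template: 'RecipeSummary',
    TemplateData: JSON.stringify({ name: userName, recipe: recipeName })
  };
}

// A skill backend would then call something like:
//   new AWS.SES().sendTemplatedEmail(buildRecipeEmail(...)).promise();
```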
Okay, so you’ve heard 2 SAs ramble on about their favourite AWS services to use with Alexa.
Now, how do we tie all this together….
Let’s revisit that interactive fiction adventure I mentioned before…
I’m going to assume you’ve done your homework & come up with a story narrative / script.
From there, you can figure out characters or narrators that need their own voice. That voice is then rendered using Polly, and stored in S3.
The backend of your skill is a Lambda function, with each request triggering a different intent handler in that skill.
If the intent handler needs to return a character or narrator’s voice, it fetches that from your S3 bucket, and returns it as part of the SSML response.
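Returning a pre-rendered clip in SSML boils down to wrapping an HTTPS URL in an audio tag. A tiny sketch (the bucket URL is a made-up example; Alexa’s audio tag requires HTTPS sources):

```javascript
// Sketch: wrapping a pre-rendered Polly clip (hosted in S3) in SSML,
// followed by regular Alexa speech. The URL here is a placeholder.
function ssmlWithClip(clipUrl, followUpText) {
  return '<speak><audio src="' + clipUrl + '"/> ' + followUpText + '</speak>';
}

// e.g. ssmlWithClip('https://my-bucket.s3.amazonaws.com/narrator.mp3',
//                   'What do you do next?')
```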
If the game is long, then you may want to give the user a way of saving, and/or auto-save their progress on skill exit. That, of course, is persisted to DynamoDB.
Additionally, you might have a journal intent in your skill, that allows a user to see the key points of their story / open quests, etc. You could push that to a card, but you may also want to send that to email. You could also use email as a way to notify users of new content within your skill.
Oh, and finally, once you have your killer skill built, you probably want to build some promo pages for it. Build a small info website, host it in S3, and then link to that on all the networks.
Alright, we’ve covered a few services so far.
Many of them are everyday services for Alexa developers, but we might have raised some interesting topics. Does anyone have any thoughts or questions so far?
IoT -- interact with an IoT device such as a Raspberry Pi, AWS IoT button, or even a webpage using the AWS JavaScript SDK.
SNS – send push notifications to your users’ mobile devices.
Rekognition – analyze images either uploaded by users, or captured from a source.
SQS – use a queue + worker nodes to set up a worker pool & get low-priority tasks out of critical paths.
API Gateway – use this to wrap your existing API & transform requests/responses to suit.
IAM -- use users & groups to grant your dev team permissions to interact with various AWS resources. Use roles to allow your service to assume permissions to something -- e.g. interact with DynamoDB. This is much safer than throwing keys around.
Want to really dive deep on those CloudWatch logs? Well, you can use CWL as an event source for Lambda. From there, you can push it to Kinesis Firehose, which will emit the data to S3. You can then use Athena to pull in the data from S3 as a searchable database. From there, connect QuickSight up to Athena, run all kinds of analysis on it, then share those dashboards with other people in your team.