Build Voice-Enabled Experiences with Alexa


  1. Build Voice-Enabled Experiences with Alexa @AlexaDevs
  2. Meet the Alexa Family
  3. Meet Alexa The cloud-based voice service that powers devices like Amazon Echo and Echo Dot
  4. alexa.design/video
  5. The Amazon Alexa Service Supported by two powerful frameworks that leverage public APIs Lives In the Cloud Automated Speech Recognition (ASR) Natural Language Understanding (NLU) Always Improving
  6. The Amazon Alexa Service Supported by two powerful frameworks that leverage open APIs Lives In the Cloud Automated Speech Recognition (ASR) Natural Language Understanding (NLU) Always Improving Alexa Skills Kit (ASK) Create Great Content ASK is how you connect to your consumer
  7. The Amazon Alexa Service Supported by two powerful frameworks that leverage open APIs Lives In the Cloud Automated Speech Recognition (ASR) Natural Language Understanding (NLU) Always Improving Alexa Skills Kit (ASK) Create Great Content ASK is how you connect to your consumer Alexa Voice Service (AVS) Unparalleled Distribution AVS allows your content to be everywhere
  8. Skills built using ASK Tools that make it fast & easy for you to build skills
  9. Alexa, ask Lyft for a Lyft Line to work
  10. Alexa, tell Starbucks to start my order
  11. Alexa has skills Amazon.com/skills
  12. Alexa Resources - cameras out! bit.ly/alexaquickstart github.com/alexa developer.amazon.com/ask aws.amazon.com
  13. Remember to Check-In • Ask the instructor for the link • You’ll get a confirmation email with details to earn free perks from Amazon
  14. I. Demo “Alexa, Open Space Facts”
  15. Alexa, open space facts Wake Word Starting Phrase Skill invocation Name
  16. II. Let’s Build Objective: Create a skill that delivers random facts or quotes
  17. What You Will Learn • Voice User Interface (VUI) Design • Intents & Utterances • one-shot vs multi-turn interactions • SSML/Speechcons • AWS Lambda • Skill Certification
  18. Two sides to an Alexa skill Alexa skills have two parts – a front-end and a back-end
  19. Creating an Alexa Skill Voice User Interface + Programming Logic
  20. Creating an Alexa Skill developer.amazon.com + aws.amazon.com
  21. Creating an Alexa Skill
  22. Creating an Alexa Skill developer.amazon.com
  23. Creating an Alexa Skill aws.amazon.com
  24. Creating an Alexa Skill + developer.amazon.com
  25. Alexa Skill Templates github.com/alexa
  26. Alexa Project Structure /SpeechAssets /IntentSchema.json /SampleUtterances.txt /src /index.js
  27. Fact Skill Template alexa.design/fact
  28. Open a New Browser Window with these three tabs: 1. developer.amazon.com/alexa 2. aws.amazon.com 3. github.com/alexa
  29. Echosim.io Let’s test our skill
  30. Alexa, open space facts open, begin, start, launch, ask, tell Wake Word Starting Phrase Skill invocation Name
  31. Alexa, ask space facts for trivia Wake Word Starting Phrase Skill invocation Name Utterance
  32. Alexa, ask space facts for trivia tell me something give me information a fact give me trivia Wake Word Starting Phrase Skill invocation Name Utterance
  33. III. How it works. Utterance to intents.
  34. Audio Cards Request Response
  35. Speech Recognition
  36. Automatic Speech Recognition fȯr tē tīmz
  37. Automatic Speech Recognition fȯr tē tīmz Forty Times? 40x
  38. Automatic Speech Recognition fȯr tē tīmz Forty Times? 40x For Tea Times?
  39. Automatic Speech Recognition fȯr tē tīmz Forty Times? For Tea Times? For Tee Times? 40x
  40. Automatic Speech Recognition fȯr tē tīmz Forty Times? For Tea Times? Four Tee Times? 40x
  41. NLU engine to the rescue Natural Language Understanding
  42. Sample Utterances In order to map user input to a behavior, we provide training data for each intent.
  43. Intent Schema (JSON) An array of intents. Each intent is a behavior for your skill.
  44. Inputs & Outputs User Audio in. Intents & Slots out.
  45. Wake word detection Signal processing Beam forming Request Response
  46. Audio Utterances JSON Intents Request Response
  47. Response Request Text to speech SSML, streaming audio JSON
  48. Intents & Utterances
  49. Intents are the Connection
  50. Intents are the Connection - JSON
  51. Intents are the Connection - Code
  52. Built-in Intents A library of intents for common actions. Amazon provides training data, but they can be augmented. AMAZON.CancelIntent AMAZON.HelpIntent AMAZON.StopIntent AMAZON.NextIntent AMAZON.NoIntent AMAZON.RepeatIntent AMAZON.StartOverIntent AMAZON.ShuffleOnIntent AMAZON.YesIntent REQUIRED FOR CERTIFICATION
  53. Communicating with the endpoint Your endpoint needs to receive and react to a JSON object
  54. The Endpoint Must be Internet-accessible Adhere to ASK service interface - JSON Web service or AWS Lambda Uses HTTP over SSL/TLS - port 443
  55. Communicating with the Endpoint Request body: • session: Information about the current conversation • request: Describes the user input
  56. Communicating with the Endpoint Response body: • outputSpeech: Alexa’s response • card: (optional) graphical response • reprompt: (optional) reminder • shouldEndSession: used to end or keep session open
  57. Types of requests The journey from user utterance to intents.
  58. Alexa, open space facts LaunchRequest
  59. Alexa, exit SessionEndedRequest
  60. IntentRequest : GetNewFactIntent Alexa, ask space facts for trivia
  61. Alexa SDK: emit, ask, tell
  62. Ask vs Tell Tell: Present data to user, ends conversation (session). Ask: Wait for user input, doesn’t end conversation (session).
  63. Emit – output speech/event Speech: Event: A way to route behavior in your code.
  64. Alexa Resources - cameras out! bit.ly/alexaquickstart github.com/alexa developer.amazon.com/ask aws.amazon.com
  65. Remember to Check-In • Ask the instructor for the link • You’ll get a confirmation email with details to earn free perks from Amazon

Editor's Notes

  • Hello, and welcome to the Alexa Workshop.
  • Alexa Family

    Echo
    The Echo is the first and best-known endpoint of the Alexa ecosystem.
    Amazon launched the Echo in 2014 to let customers engage with Alexa and control their home by voice. Echo is a hands-free speaker with far-field voice recognition, which means you can talk to it from across the room.
    Alexa and the Echo were built to make life easier and more enjoyable.
    Echo Dot
    Echo Dot is a hands-free, voice-controlled device that uses the same far-field voice recognition as Amazon Echo. Dot has a small built-in speaker; it can also connect to your speakers over Bluetooth or with the included audio cable.
    The Echo and the Echo Dot are what we call far-field Alexa devices. You interact with them in a completely hands-free way from anywhere in the room, even if that room is noisy.
    The difference between Echo and Echo Dot is simple: Echo has a powerful built-in speaker that provides room-filling sound.
    Echo Dot is smaller, contains a less powerful speaker, and works great when connected to another audio system. Both include the same seven-microphone array with advanced beam-forming and noise-cancelling technology and are otherwise functionally identical.
    Amazon Tap
    Alexa is also available on other Amazon devices, including Tap, our portable battery-powered speaker.
    Other
    Alexa is available on Amazon’s Fire Tablets, and Amazon shopping apps on mobile.
    Alexa is also available on Fire TV via the push-to-talk remote control that comes with it.
  • What is Alexa

    It is a cloud-based service that handles all the speech recognition, machine learning, and natural language understanding for all Alexa-enabled devices.
    Because it lives in the cloud, it is constantly improving and learning. The more you use it, the more it adapts to your speech patterns, vocabulary, and personal preferences.
    And because Alexa takes all her intelligence from the cloud, new updates and features are delivered automatically.
  • We’re now interacting with technology in the most natural way possible – by talking.

    http://alexa.design/video
  • There are so many possibilities when it comes to Alexa, and we are really excited about it. With Alexa, we are building a cloud-based voice service that’s free to all developers, companies, and hobbyists. Best of all, you don’t need a background in NLU or speech recognition to build great voice experiences for your customers.

    Alexa is supported by two sets of APIs and SDKs:

    Alexa Skills Kit (ASK) is an SDK that allows you to build custom skills that customers can enable on all Amazon Alexa products. Many of our customers who build their own smart home products with Alexa also create complementary skills that can be accessed in the Skills Storefront.
    Alexa Voice Service (AVS) is a set of APIs and developer tools that you can use to build Alexa into your product, whether you’re in the automotive, smart home, or home audio industry.

  • On one side we have ASK (Alexa Skills Kit), an API that allows you as a developer to add more capabilities to Alexa. When we released Alexa, she didn’t have the capability to order an Uber or a pizza from Domino’s. So we opened up the API so that these companies could build skills that create rich voice experiences for their customers. We now have over 12,000 published skills, and we expect to see a lot more in the future.

    All you Have to Do Is ASK (What is the Alexa Skills Kit?)
    ASK is our SDK or, in plain terms, our way of making voice experiences through Alexa possible.
    ASK gives you the ability to create new voice-driven capabilities for Alexa (also known as skills; think apps).
    You can connect existing services to Alexa in minutes with just a few lines of code.
    You can also build entirely new voice-powered experiences in a matter of hours, even if you know nothing about speech recognition or natural language processing.
  • On the other side is AVS (Alexa Voice Service), a set of APIs that allow you to integrate Alexa into your devices and apps. Think cars, microwaves, refrigerators, speakers, and the like. As long as your device has a microphone, a speaker, and an internet connection, you can integrate Alexa. In fact, we recently released a Raspberry Pi version of Alexa using the AVS APIs.


    AVS: Serving a Platform Agnostic Voice Experience
    It’s through the Alexa Voice Service that device makers and hardware manufacturers can incorporate an Alexa-driven voice experience into their devices.
    Any device that has a speaker, a microphone, and an Internet connection can integrate Alexa.
    Just imagine what that means. You can picture everything from a car to a microwave to a pen, and more, all enabled to deliver an experience by voice.

    Both ASK and AVS are completely free to use. Here’s a rule of thumb to understanding what feature set makes sense for your use case:

    You can add your product to Alexa through the Alexa Skills Kit (ASK)
    Or, you can add Alexa to your product with the Alexa Voice Service (AVS).
  • Let’s switch gears now and talk a bit about skills, which are really the capabilities that Alexa has.

    What is a Skill?
    Skills are how you, as a developer, make Alexa smarter. They give customers new experiences. They’re the voice-first apps for Alexa. When we launched Echo, Alexa could do the basics (weather, music, reading the news), but now you can request a Lyft, order from Domino’s, and more.
    There are two kinds of skills:
    built-in skills (like playing music, weather forecasts, and general knowledge questions) and
    custom skills that you as developers can build.
    Building skills using Alexa Skills Kit (ASK)
    The way you build skills is by using the Alexa Skills Kit.
    The Alexa Skills Kit is a collection of self-service APIs, tools, documentation and code samples that make it fast and easy for you to add skills to Alexa. Thousands of developers are building skills to expand Alexa’s capabilities.
    We launched the Alexa Skills Kit so anyone can develop Skills for Alexa, at no cost.
    Very similar to Apps on your phone, except that nothing gets installed on the device.
    What can you do with ASK?
    You can connect existing services to Alexa
    You can also build entirely new voice-powered experiences in a matter of hours, even if you know nothing about speech recognition or natural language processing.
  • When Alexa launched in the US, it had dozens of capabilities, or skills. It now has thousands.

    You can now say “Alexa, ask Lyft for a Lyft Line to work”

  • Or

    Alexa, ask Capital One what did I spend?
  • Or

    Alexa, tell Starbucks to start my order
  • The free Amazon Alexa App is a companion to your Alexa device for setup, remote control, and enhanced features. Alexa is always ready to play your favorite music, provide weather and news updates, answer questions, create lists, and much more.

    You can also visit amazon.com/skills to view the complete catalog of skills.
  • Let’s see the Fact Skill in action before we start building it. Talk to Alexa – quick demo of the fact skill.

    “Alexa, open space facts”

    You can also say –

    Alexa, tell me a fact, or
    Alexa, give me a space fact



  • Wake Word - A command that the user says to tell Alexa that they want to talk to her. Example: “Alexa, ask History Buff what happened on December seventh.” Here, “Alexa” is the wake word. Alexa users can select from a defined set of wake words.
    Starting Phrase – open, ask, begin, play, start, talk, tell etc.
    Invocation Name: A name that represents the custom skill the user wants to use. Users say a skill’s invocation name to begin an interaction with a particular custom skill.
    For example, if the invocation name is “Daily Horoscopes”, users can say: “Alexa, ask Daily Horoscopes for the horoscope for Gemini.”
    You must say the name of the skill as part of the user utterance. That’s the way Alexa can map it to the appropriate skill. It’s like launching a mobile app. You have to open the app to use the specific functionality.
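
To make the anatomy of a spoken request concrete, here is a minimal sketch that splits a phrase into wake word, starting phrase, invocation name, and utterance. It is for illustration only: the Alexa service does this work for you, and `splitInvocation` and `STARTING_PHRASES` are hypothetical names, not part of any Alexa API.

```javascript
// Hypothetical helper for illustration only. The real parsing happens inside
// the Alexa service, never in your skill code.
const STARTING_PHRASES = ['open', 'ask', 'begin', 'play', 'start', 'talk', 'tell', 'launch'];

function splitInvocation(spoken, invocationName) {
  // Normalize: lower-case, turn punctuation into spaces, split on whitespace.
  const words = spoken.toLowerCase().replace(/[^a-z0-9\s]/g, ' ').trim().split(/\s+/);
  const [wakeWord, startingPhrase, ...rest] = words;
  if (!STARTING_PHRASES.includes(startingPhrase)) {
    return null; // no recognized starting phrase
  }
  const remainder = rest.join(' ');
  const name = invocationName.toLowerCase();
  // The invocation name must come right after the starting phrase.
  if (remainder !== name && !remainder.startsWith(name + ' ')) {
    return null;
  }
  const utterance = remainder.slice(name.length).trim();
  return { wakeWord, startingPhrase, invocationName: name, utterance };
}

console.log(splitInvocation('Alexa, ask space facts for trivia', 'space facts'));
```

With “Alexa, open space facts” the utterance part comes back empty, which corresponds to simply launching the skill.
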


  • Much like web and mobile apps, there are two pieces to building an Alexa skill.
  • Alexa skills have two parts:

    Configuration data in Amazon Developer Portal (Frontend)
    Hosted Service responding to user requests (Backend)
  • Alexa skills have two parts:

    Configuration data in Amazon Developer Portal (Frontend): done at developer.amazon.com
    Hosted Service responding to user requests (Backend): we’ll be using AWS Lambda as our backend, so we’ll do this at aws.amazon.com
  • Work in progress: This slide will be tweaked.

    Create VUI Interaction Model. Front-end = Skill Info + Interaction Model
    Lambda – Your code or your hosted service backend
    Connect VUI to code
    Testing
    Customization – Make it your own
    Publish it

  • Create VUI Interaction Model (Front End)
    Skill Info + Interaction Model
    Create AWS Lambda Function: Your code or your hosted service backend
    Connect VUI to the Lambda Function
    Testing
    Customization: Make it your own
    Publish it

  • GitHub templates
  • A typical Alexa project on GitHub has the following structure:

    /SpeechAssets
    Provides the VUI or the Front End for the skill
    Meant to go inside your skill at developer.amazon.com
    /src
    Provides the code for the skill
    Meant to go into your Lambda Function at aws.amazon.com

  • About this Skill
    This sample covers the basics of skill building. It delivers random facts or quotes and serves as a very simple example.
    You can also customize your fact skill with your favorite topic. 
    Concepts you will learn with this skill
    Intents and Intent Schema
    Sample Utterance
    Generating a randomized response from Alexa
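
A randomized response can be sketched in a few lines. This is an assumption about how such a skill might pick its fact, not the template's actual code; `FACTS` and `getRandomFact` are illustrative names, and the facts are placeholder content you would replace with your own topic.

```javascript
// Placeholder facts; replace these with content for your own topic.
const FACTS = [
  'A year on Mercury is just 88 days long.',
  'The Sun contains more than 99 percent of the mass of the solar system.',
  'Light from the Sun takes about eight minutes to reach Earth.',
];

// Pick one entry at random. The random source is injectable so the
// function can be tested deterministically.
function getRandomFact(facts, random = Math.random) {
  const index = Math.floor(random() * facts.length);
  return facts[index];
}

console.log('Here is your fact: ' + getRandomFact(FACTS));
```
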
  • We’ll be switching between these as we build our skill

  • Visit echosim.io and log in with your Amazon account.
  • As a developer, you are never asked to work with audio or raw text coming from the user.
    Instead, you receive a JSON object generated by the Alexa Service. This is how it works.
  • This is a bird’s-eye view of a user interacting with a custom skill through an Echo.
    We will go into further detail later in this presentation, but it’s important to remember that Alexa and all skill code live in the cloud.
  • In order to understand what a user says, we first have to turn sounds into words.
    This process is called speech recognition.
  • In this example we have the phonetic spelling for three sounds.
    Let’s see what words these could form.
  • Forty times? Maybe the user wants to multiply something by forty
  • For tea times? Is the user searching for good times to have some tea?
  • For Tee Times?
    Does the user want to play golf?
  • Or four tee times? Does the user want to play a lot of golf?
  • Having a Natural Language Understanding Engine on top of speech recognition allows us to go from words to meanings.
    We can also train this engine using utterances and slots to map user input with high accuracy.
  • We train the NLU engine with sample utterances, each of which we associate with an intent.
  • Each intent defines a specific behavior your skill can take. Like buttons on a web page, intents take user input and execute some code based on it.
  • Let’s take a step-by-step look at how user input, in the form of spoken word (audio), is turned into a JSON object that our code can read and respond to.
  • - The first thing that has to happen is for the device to “wake up” when it hears its wake word.
    - Once the device is awake, it streams the audio to the Alexa Service hosted by AWS in the cloud.
    - Alexa devices like the Amazon Echo and the Echo Dot feature microphone arrays, which capture high-quality audio by using beam forming and cancelling background noise.
  • Once the audio reaches the Alexa Service, it is converted into a JSON object based on the meaning of the words the user spoke.
    This JSON object is easy to read from any programming language and contains enough information to allow us to respond to the user’s input.
  • Your code just has to return a properly formatted JSON object to the Alexa Service, and the service will take care of turning it into audio and routing it to the correct user and device.
    Your response can contain plain text, Speech Synthesis Markup Language (SSML), and references to audio files to be played.

  • Intents are the behaviors your skill can take.
    Sample Utterances are training data used to map user input to each behavior.
    The name of an intent is what connects everything together.

  • Here we can see the intent schema and sample utterances side by side.
    As you can see, the thing they have in common is the name of the intent.
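
Since the slide image is not reproduced here, the following sketch shows what the two files might contain and checks the one rule that ties them together: every sample utterance must start with an intent name that exists in the schema. The file contents are illustrative, based on the utterances shown earlier in the deck, and `findUnknownIntents` is a hypothetical helper.

```javascript
// Illustrative contents of IntentSchema.json.
const intentSchema = {
  intents: [
    { intent: 'GetNewFactIntent' },
    { intent: 'AMAZON.HelpIntent' },
    { intent: 'AMAZON.StopIntent' },
    { intent: 'AMAZON.CancelIntent' },
  ],
};

// Illustrative contents of SampleUtterances.txt:
// one "<IntentName> <utterance>" pair per line.
const sampleUtterances = [
  'GetNewFactIntent a fact',
  'GetNewFactIntent tell me something',
  'GetNewFactIntent give me information',
  'GetNewFactIntent give me trivia',
].join('\n');

// Return the intent names used in the utterances but missing from the schema.
function findUnknownIntents(schema, utterancesText) {
  const known = new Set(schema.intents.map((entry) => entry.intent));
  return utterancesText
    .split('\n')
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .map((line) => line.split(/\s+/)[0])
    .filter((name) => !known.has(name));
}

console.log(findUnknownIntents(intentSchema, sampleUtterances)); // []
```
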
  • Here is an example of a JSON object that would get sent to your code.
    Here we can see that there is an intent component with a name that exactly matches what we had in our intent schema.
  • Since we are using the alexa-sdk in our code, we define a handler for an event that matches the intent name.
    The intent name connects everything together: the intent schema, the training data, the JSON object, and the code.
  • Along with custom intents, Amazon provides a series of built-in intents you can leverage. You don’t need to supply training data for them, although you can augment what Amazon provides.
    The 3 highlighted intents are required for skill publishing.
  • We use the term endpoint to describe your code along with where it’s hosted.
  • You can leverage any programming language and hosting technology to build your endpoint.
    The only requirement is that you securely receive and send JSON in the correct format.
    The easiest way to host your endpoint is to use AWS Lambda.
  • This is an example of a JSON object generated by the Alexa Voice Service based on user input.

    The request body has two main components: session and request.
    The session object has information about the current conversation, including which user and skill made the request.
    The request object contains the payload, which is one of three types:
    LaunchRequest
    IntentRequest
    SessionEndedRequest
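
As a rough illustration of the request body described above, the sketch below shows an example object and a small function that reads the fields a skill typically cares about. All IDs and the timestamp are placeholders, and `summarizeRequest` is a hypothetical helper rather than part of any SDK.

```javascript
// Illustrative request body; the IDs and timestamp are placeholders.
const exampleRequest = {
  version: '1.0',
  session: {
    new: true,
    sessionId: 'SessionId.example',
    application: { applicationId: 'amzn1.ask.skill.example' },
    user: { userId: 'amzn1.ask.account.example' },
  },
  request: {
    type: 'IntentRequest',
    requestId: 'EdwRequestId.example',
    timestamp: '2017-01-01T00:00:00Z',
    intent: { name: 'GetNewFactIntent', slots: {} },
  },
};

// Pull out which skill and user made the request and what was asked.
function summarizeRequest(body) {
  const { session, request } = body;
  return {
    skillId: session.application.applicationId,
    userId: session.user.userId,
    isNewSession: session.new,
    type: request.type,
    // Only an IntentRequest carries an intent object.
    intentName: request.type === 'IntentRequest' ? request.intent.name : null,
  };
}

console.log(summarizeRequest(exampleRequest));
```
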
  • This is what a JSON object generated by your endpoint should look like.

    It can be broken down into the following components:
    outputSpeech: This is the message users will hear as interpreted by Alexa.
    card: Optional graphical component that will be rendered and stored in the Alexa mobile app and at alexa.amazon.com.
    reprompt: Optional message to remind a user we are waiting for input, if timeout is met.
    shouldEndSession: Indicates if service should wait for user input by keeping the session open or end it.
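
The four components above can be assembled by hand, as in this minimal sketch. `buildResponse` and its parameter names are hypothetical; the sketch assumes plain-text speech and a simple text card, and leaves out SSML and audio.

```javascript
// Hypothetical helper that assembles a response body from its four parts.
function buildResponse({ speech, cardTitle, cardText, repromptSpeech, endSession }) {
  const response = {
    outputSpeech: { type: 'PlainText', text: speech },
    shouldEndSession: endSession,
  };
  // The card is optional: include it only when both title and text are given.
  if (cardTitle && cardText) {
    response.card = { type: 'Simple', title: cardTitle, content: cardText };
  }
  // The reprompt is optional: it is spoken if the user stays silent.
  if (repromptSpeech) {
    response.reprompt = { outputSpeech: { type: 'PlainText', text: repromptSpeech } };
  }
  return { version: '1.0', response };
}

const reply = buildResponse({
  speech: 'A year on Mercury is just 88 days long.',
  cardTitle: 'Space Facts',
  cardText: 'A year on Mercury is just 88 days long.',
  endSession: true,
});
console.log(JSON.stringify(reply, null, 2));
```
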
  • There are three main types of requests. The Alexa Service generates the appropriate type based on the user’s input.
  • The sentence at the top is turned into a LaunchRequest. This is analogous to opening an app or website: we are just launching third-party functionality.
  • A SessionEndedRequest is sent so you can do any necessary cleanup and store data.
  • This example showcases how a single command from the user can wake up a device, launch a custom skill and trigger functionality within it.
    The JSON object for this command would have the type listed as an IntentRequest and the intent name for this example would be GetNewFactIntent
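
Without an SDK, an endpoint would typically branch on the request type itself. The sketch below shows that dispatch for the three types covered above; `routeRequest` and the labels it returns are hypothetical stand-ins for real handler calls.

```javascript
// Decide what to do based on the request type. The returned labels are
// placeholders for whatever your skill would actually run.
function routeRequest(request) {
  switch (request.type) {
    case 'LaunchRequest':
      return 'launch'; // "Alexa, open space facts"
    case 'IntentRequest':
      return `intent:${request.intent.name}`; // "Alexa, ask space facts for trivia"
    case 'SessionEndedRequest':
      return 'cleanup'; // "Alexa, exit"
    default:
      return 'unknown';
  }
}

console.log(routeRequest({ type: 'IntentRequest', intent: { name: 'GetNewFactIntent' } }));
```
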
  • The alexa-sdk gives us a series of tools that make working with JSON objects a lot easier. Although it is not required for developing skills, it makes a huge difference.
    It is packaged as a Node module distributed through npm.
  • The SDK works as an event emitter and provides easy ways for us to declare and attach handlers for events.
    It also allows us to quickly create responses by emitting an event that contains :tell or :ask, removing the need for us to craft JSON by hand.
  • Besides emitting an event that gets turned into a response, we can also use the emitter and handlers to control the code flow.
    We can emit any event and, as long as we have a handler for it, trigger any of our code’s functionality.
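
The emitter pattern can be imitated with Node's built-in EventEmitter. To be clear, this is a stand-in that mimics the idea, not the alexa-sdk itself: `createSkill` and the shape of `skill.response` are assumptions made for this sketch, and the real SDK builds the full response JSON for you.

```javascript
const { EventEmitter } = require('events');

// Stand-in for the SDK: a plain Node EventEmitter, NOT the alexa-sdk.
function createSkill(handlers) {
  const skill = new EventEmitter();
  skill.response = null;

  // ':tell' presents speech and ends the session.
  skill.on(':tell', (speech) => {
    skill.response = { speech, shouldEndSession: true };
  });
  // ':ask' presents speech, sets a reprompt, and keeps the session open.
  skill.on(':ask', (speech, reprompt) => {
    skill.response = { speech, reprompt, shouldEndSession: false };
  });
  // Attach the skill's own handlers; `this` inside them is the emitter.
  for (const [eventName, handler] of Object.entries(handlers)) {
    skill.on(eventName, handler.bind(skill));
  }
  return skill;
}

const skill = createSkill({
  LaunchRequest() {
    // Control the code flow by emitting another handler's event name.
    this.emit('GetNewFactIntent');
  },
  GetNewFactIntent() {
    this.emit(':tell', 'A year on Mercury is just 88 days long.');
  },
  'AMAZON.HelpIntent'() {
    this.emit(':ask', 'You can ask me for a space fact.', 'What can I help you with?');
  },
});

skill.emit('LaunchRequest');
console.log(skill.response);
```

Here the LaunchRequest handler produces no speech of its own; it routes to GetNewFactIntent, which in turn emits :tell. That is the sense in which events double as a way to route behavior in your code.
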
