(MBL308) Extending Alexa’s Built-in Skills. See How Capital One Did It

Alexa, the voice service that powers Echo, provides a set of built-in abilities, or skills, that enable customers to interact with devices in a more intuitive way using voice. In this session we’ll provide best practices on how to create a compelling voice experience leveraging Alexa’s built-in skills. Scott Totman, VP of Mobile and Innovation at Capital One, will describe what they learned building their first voice experience, including how they mapped utterances to intents and optimized it for spoken language understanding.

Published in: Technology

  1. 1. © 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Scott Totman - Capital One, VP of Mobile and Innovation Mike Hines – Amazon, Developer Evangelist October 2015 MBL308 How Capital One Developed a Skill for Alexa
  2. 2. Alexa Skills Kit Today’s agenda • About Alexa • Capital One skill demo • Building the Capital One skill • Alexa skill best practices
  3. 3. About Alexa
  4. 4. What is Alexa? Alexa is a cloud-based voice service that can answer questions, play music, read the news, and more. Echo is an always-on, always-connected, hands-free device that connects to Alexa.
  5. 5. Alexa architecture Amazon Alexa service • User audio is streamed to the service • Audio responses are rendered on device • GUI cards are rendered in the Amazon Alexa app
  6. 6. Alexa is always learning. Alexa gets smarter by learning new skills. Developers can create new skills for Alexa. Alexa is ALWAYS LEARNING
  7. 7. Creating your own ALEXA SKILLS Alexa skills have two parts: Configuration data in Amazon Developer Portal Hosted service responding to user requests
  8. 8. Alexa Skills Kit architecture Components: Amazon Alexa service, developer’s application service, Amazon’s Developer Portal • Application, intents, sample data, and developer service URL endpoint are configured through the portal • User audio is streamed to the service • User intents and arguments are sent to the developer service • Text response and/or GUI card data is returned • Audio responses are rendered on-device • GUI cards are rendered in the Amazon Alexa app
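The "text response and/or GUI card data" returned by the developer service can be sketched as a JSON body. This is a minimal sketch, not the presenters' code: `buildResponse` is a hypothetical helper, and the field names follow the ASK JSON interface as generally documented, so check them against the current reference before relying on them.

```javascript
// Hypothetical helper: builds the JSON body a skill service returns to Alexa.
// Field names are an assumption based on the ASK JSON interface.
function buildResponse(speechText, cardTitle, cardContent, sessionAttributes, shouldEndSession) {
  return {
    version: "1.0",
    sessionAttributes: sessionAttributes || {},
    response: {
      // Spoken on the device
      outputSpeech: { type: "PlainText", text: speechText },
      // Rendered as a card in the Amazon Alexa app
      card: { type: "Simple", title: cardTitle, content: cardContent },
      shouldEndSession: Boolean(shouldEndSession)
    }
  };
}

const reply = buildResponse(
  "Your credit card balance is forty two dollars.",
  "Account balance",
  "Credit card balance: $42.00",
  {},
  true
);
console.log(JSON.stringify(reply, null, 2));
```

The speech text and the card content are kept separate because one is heard on the device and the other is read in the companion app.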
  9. 9. Building an Alexa skill HOSTED SERVICE • You define interactions for your voice app through intent schemas • Each intent consists of two fields. The intent field gives the name of the intent. The slots field lists the slots associated with that intent. • Slots can also include types, such as LITERAL, NUMBER, DATE, etc.
  10. 10. Building an Alexa skill HOSTED SERVICE • The mappings between intents and the typical utterances that invoke those intents are provided in a tab-separated text document of sample utterances. • Each possible phrase is assigned to one of the defined intents. • GetHoroscope what is the horoscope for {pisces|Sign} • GetHoroscope what will the horoscope for {leo|Sign} be {next tuesday|Date}
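To make the relationship between the intent schema and the sample utterances concrete, here is a small sketch. The schema shape (an `intent` name plus a `slots` list with `name` and `type`) follows the description on the slides; the helpers `parseUtterance` and `isConsistent` are hypothetical and only illustrate how each sample line maps to one declared intent and its slots.

```javascript
// Intent schema in the shape described on the slides: an intent name plus its slots.
const intentSchema = {
  intents: [
    {
      intent: "GetHoroscope",
      slots: [
        { name: "Sign", type: "LITERAL" },
        { name: "Date", type: "DATE" }
      ]
    }
  ]
};

// Hypothetical helper: split one sample-utterance line into intent, phrase, and slots.
function parseUtterance(line) {
  const firstSpace = line.search(/\s/); // tab or space after the intent name
  const intent = line.slice(0, firstSpace);
  const phrase = line.slice(firstSpace).trim();
  const slots = [];
  const slotPattern = /\{([^|}]+)\|([^}]+)\}/g; // matches {example|SlotName}
  let match;
  while ((match = slotPattern.exec(phrase)) !== null) {
    slots.push({ example: match[1], name: match[2] });
  }
  return { intent, phrase, slots };
}

// Check that an utterance only uses an intent and slots the schema declares.
function isConsistent(schema, parsedLine) {
  const declared = schema.intents.find((i) => i.intent === parsedLine.intent);
  if (!declared) return false;
  const names = declared.slots.map((s) => s.name);
  return parsedLine.slots.every((s) => names.includes(s.name));
}

const sample = "GetHoroscope\twhat will the horoscope for {leo|Sign} be {next tuesday|Date}";
const parsed = parseUtterance(sample);
console.log(parsed.intent, parsed.slots, isConsistent(intentSchema, parsed));
```

A check like this can catch a misspelled slot name in the utterance file before the skill is configured in the portal.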
  11. 11. Capital One’s Journey
  12. 12. Capital One’s Alexa approach • June: A few developers buy Echos • July: Full-day tech offsite & side-of-desk project kickoff • August: Rapid prototyping and expanding Capital One skill • Goal: Pair Alexa with the Capital One app and allow users to get their credit card balance
  13. 13. Consumer insights: Design thinking/test + learn Customers like it! • Hands-free convenience is valuable • Interested in using Echo for informational purposes • Open to making payments/transactions But… • Concerns about local security • Users don’t want financial information captured by a third party (Amazon)
  14. 14. Prototyping: Prerequisites & new development • Leverage existing API model built for Android/iPhone apps • Piggy-back off “glance” services built for Apple Watch • Build new JS service as the ASK orchestrator* *Used Alexa app node library (Thanks Matt Kruse!)
  15. 15. Capital One skills focus Skill development segmented into three priority buckets • Read-only information: default accounts (credit card, bank, loans), account balances, bill due date, last payment, last transactions, interest rate • Transactional skills: pay bill(s), transfer $ • Experimenting: app usage patterns, OAuth, customer service/support, customer acquisition, Alexa adoption, Alexa evolution
  16. 16. Demo
  17. 17. Alexa challenges discovered during prototyping Numerical utterances, device latency, and security were our most significant challenges
  18. 18. Numerical utterances Challenge: • “Twenty-two” is hard to turn into 22 instead of 20 and 2 • “Three hundred and forty-four dollars” • Needed to call out words like ‘hundred’, ‘and’ Solution: • Programmatically create utterances (big list)! • Optional words • ASK support for CURRENCY data type
  19. 19. Sample numerical utterances…out of 712
      PayAmount {one|THOUSANDS} thousand {one|HUNDREDS} hundred {one|DOLLARS} dollars and {eighty eight|CENTS} cents
      PayAmount {one|THOUSANDS} thousand {one|HUNDREDS} hundred {one|DOLLARS} dollars and {ninety nine|CENTS} cents
      PayAmount {twenty-one|THOUSANDS} thousand {twenty-one|HUNDREDS} hundred {twenty-one|DOLLARS} dollars
      PayAmount {twenty-two|THOUSANDS} thousand {twenty-one|HUNDREDS} hundred {twenty-one|DOLLARS} dollars
      PayAmount {twenty-three|THOUSANDS} thousand {twenty-one|HUNDREDS} hundred {twenty-one|DOLLARS} dollars
      PayAmount {twenty-one|THOUSANDS} thousand {twenty-two|HUNDREDS} hundred {twenty-one|DOLLARS} dollars
      PayAmount {twenty-two|THOUSANDS} thousand {twenty-two|HUNDREDS} hundred {twenty-one|DOLLARS} dollars
      PayAmount {twenty-three|THOUSANDS} thousand {twenty-two|HUNDREDS} hundred {twenty-one|DOLLARS} dollars
      PayAmount {twenty-one|THOUSANDS} thousand {twenty-three|HUNDREDS} hundred {twenty-one|DOLLARS} dollars
      PayAmount {twenty-two|THOUSANDS} thousand {twenty-three|HUNDREDS} hundred {twenty-one|DOLLARS} dollars
      PayAmount {twenty-three|THOUSANDS} thousand {twenty-three|HUNDREDS} hundred {twenty-one|DOLLARS} dollars
      PayAmount {twenty two|DOLLARS} dollar {twenty two|CENTS} cents
      PayAmount {sixty-seven|THOUSANDS} thousand and {sixty-eight|DOLLARS} dollars
      PayAmount {fifty-seven|THOUSANDS} thousand {fifty-eight|DOLLARS} dollars and {fifty-eight|CENTS} cents
      PayAmount {twenty-one|THOUSANDS} thousand {twenty-two|HUNDREDS} hundred {twenty-two|DOLLARS} dollars
      PayAmount {one|THOUSANDS} thousand {one|HUNDREDS} hundred {one|DOLLARS} dollars and {sixty six|CENTS} cents
      PayAmount {eighty eight|DOLLARS} dollar {thirty three|CENTS} cents
      PayAmount {twenty-three|THOUSANDS} thousand {twenty-two|HUNDREDS} hundred {twenty-two|DOLLARS} dollars
      PayAmount {twenty-one|THOUSANDS} thousand {twenty-three|HUNDREDS} hundred {twenty-two|DOLLARS} dollars
      PayAmount {twenty-two|THOUSANDS} thousand {twenty-three|HUNDREDS} hundred {twenty-two|DOLLARS} dollars
      PayAmount {twenty-three|THOUSANDS} thousand {twenty-three|HUNDREDS} hundred {twenty-two|DOLLARS} dollars
      PayAmount {twenty-one|THOUSANDS} thousand {twenty-one|HUNDREDS} hundred {twenty-three|DOLLARS} dollars
      PayAmount {twenty-two|THOUSANDS} thousand {twenty-one|HUNDREDS} hundred {twenty-three|DOLLARS} dollars
      PayAmount {twenty-three|THOUSANDS} thousand {twenty-one|HUNDREDS} hundred {twenty-three|DOLLARS} dollars
      PayAmount {twenty-one|THOUSANDS} thousand {twenty-two|HUNDREDS} hundred {twenty-three|DOLLARS} dollars
      PayAmount {twenty-two|THOUSANDS} thousand {twenty-two|HUNDREDS} hundred {twenty-three|DOLLARS} dollars
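The "programmatically create utterances" solution can be sketched as a small generator. This is an illustration under stated assumptions, not Capital One's actual script: `generatePayAmountUtterances` and the short word list are hypothetical, and the real list of 712 utterances clearly covered more values and a CENTS slot.

```javascript
// Word forms for a few sample values; a real list would cover many more numbers.
const sampleWords = ["one", "twenty-one", "twenty-two", "twenty-three"];

// Hypothetical generator: emits PayAmount utterances in the slide's {example|SLOT} format.
// THOUSANDS and HUNDREDS are optional; DOLLARS is always present.
function generatePayAmountUtterances(words) {
  const lines = new Set();
  for (const t of [null, ...words]) {
    for (const h of [null, ...words]) {
      for (const d of words) {
        const parts = [];
        if (t) parts.push(`{${t}|THOUSANDS} thousand`);
        if (h) parts.push(`{${h}|HUNDREDS} hundred`);
        parts.push(`{${d}|DOLLARS} dollars`);
        lines.add(`PayAmount ${parts.join(" ")}`);
        // Also emit the variant with the optional word "and" before the dollars.
        if (parts.length > 1) {
          const withAnd = parts.slice(0, -1).concat(`and ${parts[parts.length - 1]}`);
          lines.add(`PayAmount ${withAnd.join(" ")}`);
        }
      }
    }
  }
  return [...lines];
}

const utterances = generatePayAmountUtterances(sampleWords);
console.log(utterances.length, utterances.slice(0, 3));
```

Generating the file this way keeps optional words such as "and" consistent across every combination, which is tedious to maintain by hand.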
  20. 20. Latency Challenge: • Coding visually is great for websites, not for voice • Pauses while the service looks up data are a much bigger deal for voice Solution: • Keep APIs fast • Leverage Alexa session data • Keep explanations terse…but not rude
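One way to read "leverage Alexa session data" is to keep values already fetched in the session so follow-up questions do not wait on the API again. The sketch below assumes a `session.attributes` object that is carried between turns; `getBalance` and the stand-in API are hypothetical.

```javascript
// Hypothetical: answer from session attributes when possible, otherwise call the API once.
async function getBalance(session, fetchBalance) {
  session.attributes = session.attributes || {};
  if (session.attributes.balance === undefined) {
    session.attributes.balance = await fetchBalance();
  }
  return session.attributes.balance;
}

async function demo() {
  // Stand-in for a slow account API; counts how often it is called.
  let apiCalls = 0;
  const slowAccountApi = async () => {
    apiCalls += 1;
    return 42.17;
  };

  const session = { attributes: {} };
  const first = await getBalance(session, slowAccountApi);  // hits the API
  const second = await getBalance(session, slowAccountApi); // served from the session
  return { first, second, apiCalls };
}

demo().then((result) => console.log(result));
```

Caching in the session trades freshness for speed, so it suits values that are unlikely to change within a single conversation.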
  21. 21. Security Challenge: • Account linking didn’t exist as an available solution • Figure out how to connect an Echo with a customer account • No guarantee of privacy on Echo end Solution: • Make vulnerabilities dependent on compromised account • Pairing code for secure account linking • 2nd factor authentication for moving money
  22. 22. Pairing process workflow 1. Open session 2. Device ID not recognized 3. Generate 6-digit PIN 4. Log in to C1 app – provide PIN
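The four pairing steps can be sketched as follows. This is a minimal sketch assuming in-memory storage and Node's built-in `crypto.randomInt`; the function names `openSession` and `confirmPin` are hypothetical, and a production version would also need PIN expiry and rate limiting.

```javascript
const crypto = require("crypto");

// Hypothetical in-memory stores; a real service would persist these and expire PINs.
const pendingPins = new Map();   // pin -> deviceId
const linkedDevices = new Map(); // deviceId -> customerId

// Steps 1-3: a session opens, the device ID is not recognized, so issue a 6-digit PIN.
function openSession(deviceId) {
  if (linkedDevices.has(deviceId)) {
    return { linked: true, customerId: linkedDevices.get(deviceId) };
  }
  // randomInt's upper bound is exclusive; pad so values like 42 become "000042".
  const pin = String(crypto.randomInt(0, 1000000)).padStart(6, "0");
  pendingPins.set(pin, deviceId);
  return { linked: false, pin };
}

// Step 4: the customer, already logged in to the app, provides the PIN.
function confirmPin(customerId, pin) {
  const deviceId = pendingPins.get(pin);
  if (!deviceId) return false;
  pendingPins.delete(pin); // a PIN can be used only once
  linkedDevices.set(deviceId, customerId);
  return true;
}

const firstVisit = openSession("echo-1");
console.log(firstVisit.linked, confirmPin("customer-9", firstVisit.pin), openSession("echo-1"));
```

Because the PIN is confirmed inside the authenticated app, linking a device depends on the customer account already being trusted, which matches the slide's point about making vulnerabilities dependent on a compromised account.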
  23. 23. Keeping things in context Challenge: • Context is hard with multiple accounts • Helping a user with tasks and cross-context: • Switching context • Keeping context • Recognizing context Solution: • Map user workflow • When in doubt, ask the user
  24. 24. Code sample: Context switching
      function getCreditCardAccount() {
        var currentAccount = hasContext()
        if( currentAccount && currentAccount.isCreditCard() ) {
          return currentAccount
        }
        var accounts = getCachedAccounts(req,res)
        if( accounts ) {
          var cached = accounts.filter(function(entry) {
            return entry.isCreditCard()
          })
          if( cached.length == 1 ) {
            return cached[0]
          }
        }
        return null
      }
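The sample returns null when it cannot pick a single credit card, which is where "when in doubt, ask the user" applies. The sketch below is an adaptation, not the presenters' code: the current context and cached accounts are passed in as arguments instead of being read through `hasContext()` and `getCachedAccounts(req,res)`, and `makeAccount` and `balancePrompt` are hypothetical.

```javascript
// Hypothetical account objects; isCreditCard() mirrors the method used on the slide.
function makeAccount(name, type) {
  return { name, type, isCreditCard: () => type === "creditCard" };
}

// Adapted from the slide: context and cached accounts are parameters here.
function getCreditCardAccount(currentAccount, accounts) {
  if (currentAccount && currentAccount.isCreditCard()) return currentAccount;
  if (accounts) {
    const cards = accounts.filter((entry) => entry.isCreditCard());
    if (cards.length === 1) return cards[0]; // only one candidate, so no need to ask
  }
  return null; // no context and zero or several cards: ambiguous
}

// When the account is ambiguous, ask the user instead of guessing.
function balancePrompt(currentAccount, accounts) {
  const account = getCreditCardAccount(currentAccount, accounts);
  if (account) return `Here is the balance for your ${account.name} card.`;
  return "Which credit card do you mean?";
}

const checking = makeAccount("Checking", "bank");
const travel = makeAccount("Travel", "creditCard");
const cashBack = makeAccount("Cash Back", "creditCard");

console.log(balancePrompt(null, [checking, travel]));
console.log(balancePrompt(null, [checking, travel, cashBack]));
console.log(balancePrompt(cashBack, [checking, travel, cashBack]));
```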
  25. 25. Capital One takeaways Wish list • Skill discoverability • Handle vocal interruptions better, with context • Notification indicator Works great • Straightforward • Majority of the effort is on customer experience, not implementation • ASK is evolving quickly + adding new capabilities
  26. 26. Best Practices
  27. 27. Making it sound easy A person can absorb and process a lot more written information than audio information. Instructions that make sense in an average web page dialog are probably going to sound intimidating in a spoken command. Follow these best practices for better results.
  28. 28. 1. Make it clear the user needs to respond Not so good Trivia challenge: Trivia Challenge. You can choose from the following categories: 80’s Pop Songs, Potent Potables, or European History.
  29. 29. 1. Make it clear the user needs to respond Better Trivia challenge: Trivia Challenge. Here are your categories: 80’s Pop Songs, Potent Potables, or European History. Which one do you want?
  30. 30. 1. Make it clear the user needs to respond Best practice If you expect the user to say something, make sure you end your prompt with a question.
  31. 31. 2. Don’t assume the user knows what to do Not so good Car Fu: Car Fu.
  32. 32. 2. Don’t assume the user knows what to do Better Car Fu: Car Fu. You can ask to get a ride or request a fare estimate. Which will it be? User: Get a ride. Car Fu: Sending your request. A mobile alert on your cell phone will let you know when your car arrives.
  33. 33. 2. Don’t assume the user knows what to do Best practice When launching a skill or finishing an interaction, always suggest what the user can do next.
  34. 34. 3. Present the options clearly Not so good Food Taxi: Would you like french fries or a salad? User: Yes
  35. 35. 3. Present the options clearly Better Food Taxi: Which side would you like: French fries or a salad? User: Salad.
  36. 36. 3. Present the options clearly Best practice Either/or questions must be stated explicitly, lest they be interpreted as yes/no questions.
  37. 37. 4. Keep it brief Not so good Astrology Daily: There are 12 Zodiac signs that I can give you a horoscope for. Please tell which one you’d like.
  38. 38. 4. Keep it brief Better Astrology Daily: Get the Horoscope for which sign?
  39. 39. 4. Keep it brief Best practice Use fewer words than you might on your website.
  40. 40. 5. Avoid verbose choices Not so good Dairy Shack: What flavor do you want? For chocolate, say Chocolate. For vanilla, say Vanilla. Or for strawberry, say Strawberry.
  41. 41. 5. Avoid verbose choices Better Dairy Shack: Which flavor would you like? You can say Chocolate, Vanilla, or Strawberry.
  42. 42. 5. Avoid verbose choices Best practice Do not present more than three choices and avoid repetitive wording.
  43. 43. 6. Avoid crowding options Not so good Score Keeper: Score Keeper. You can give a player points, add a new player, ask for the score, start a new game, clear all players, or stop if you’re done. Now, what would you like? User: What was that again?
  44. 44. 6. Avoid crowding options Better Score Keeper: Score Keeper. You can give a player points, ask for the score, or say Help. What would you like? User: Help. Score Keeper: Here are some things you can say: add John, give John 5 points, tell me the score, start a new game, or reset all players. You can also say stop if you’re done. So, how can I help?
  45. 45. 6. Avoid crowding options Best practice Present the 2-3 choices that users will pick 80% of the time and expose the rest through ‘Help’.
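Practices 1, 5, and 6 can be combined in a small prompt builder: name the skill, offer only the most-used choices, point to Help for the rest, and end with a question. This is a sketch; `buildMenuPrompt` and `joinWithOr` are hypothetical helpers, and the options are assumed to be sorted by how often users pick them.

```javascript
// Join choices the way they are spoken: "a", "a or b", "a, b, or c".
function joinWithOr(items) {
  if (items.length <= 1) return items[0] || "";
  if (items.length === 2) return `${items[0]} or ${items[1]}`;
  return `${items.slice(0, -1).join(", ")}, or ${items[items.length - 1]}`;
}

// Hypothetical helper: never speak more than three choices.
// With more than three options, offer the top two and expose the rest through Help.
function buildMenuPrompt(skillName, optionsByPopularity, question) {
  const choices = optionsByPopularity.length <= 3
    ? optionsByPopularity
    : optionsByPopularity.slice(0, 2).concat("say Help");
  // The prompt ends with the question so the user knows a reply is expected.
  return `${skillName}. You can ${joinWithOr(choices)}. ${question}`;
}

const menuPrompt = buildMenuPrompt(
  "Score Keeper",
  ["give a player points", "ask for the score", "add a new player", "start a new game", "clear all players"],
  "What would you like?"
);
console.log(menuPrompt);
```

With the Score Keeper options above, the result reproduces the "Better" prompt from the slides.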
  46. 46. 7. Get one piece of information at a time and use it Not so good Joke Bank: Would you like to hear a joke? User: Yes. Joke Bank: What’s black, white, and red all over? An embarrassed skunk. “One, Two, Five!” “Three, sir! Three!”
  47. 47. 7. Get one piece of information at a time and use it Better Joke Bank: What’s black, white, and red all over? An embarrassed skunk.
  48. 48. 7. Get one piece of information at a time and use it Best practice Make smart assumptions where possible. Avoid asking non-essential questions.
  49. 49. 8. Finally, make the user comfortable Best practice • Let users know they’re in the right place. • Present usable chunks of information, not overload. • Take care of technical and legal details when enabling the skill, not in the audio. • Don’t blame the user.
  50. 50. Best practices 1. Make it clear the user needs to respond 2. Don’t assume the user knows what to do 3. Present the options clearly 4. Keep it brief 5. Avoid verbose choices 6. Avoid crowding options 7. Get information and use it 8. Make users comfortable
  51. 51. @MikeFHines developer.amazon.com/blog Learn more: http://developer.amazon.com/ASK http://capitalone.com How did we do: Remember to complete the evaluation! Follow us:
  52. 52. Thank you! http://bit.ly/appstoregiveaway
