1. Building Apps for Alexa
Voice Design & Alexa Skill 101
Brighton Web Development Meetup | August 2019 | Alex Nicol @nicol_alexandre
2. Alex Nicol
Software Engineer / Conversational system specialist
• EDF Energy R&D UK Centre
• Bounce Technologies
@nicol_alexandre
webnicol.fr
GitHub: alexandrenicol
3. A bit of context
10. Having a conversation with an object: science fiction made us dream about it, but today the experience still isn't natural.
12. Paul Grice’s maxims of conversation - 1975
• Quality
• Quantity
• Relation
• Manner
13. Amazon’s Alexa Design Guide
• Be adaptable: Let users speak in their own words.
• Be personal: Individualise your entire interaction.
• Be available: Collapse your menus; make all options top-level.
• Be relatable: Talk with them, not at them.
Establish and maintain trust
14. Confirmation and understanding
• Explicit vs implicit vs hybrid confirmation
• Speech recognition
• User input
• Context / Multiturn conversation
15. Multimodality
• How do we solve problems around cognitive overload, confirmation, and speech recognition errors?
• Voice only VS Voice first
16. // Key findings:
I. Is it AI?
II. Text ≠ Voice
III. Platforms matter
IV. UX First
V. Discoverability
VI. Not everything can/should be bot-ed
17. // Interested in voice design?
I. Book: Conversational Design (Erika Hall, A Book Apart)
II. Book: Designing Voice User Interfaces: Principles of Conversational Experiences (Cathy Pearl, O'Reilly)
III. https://voiceprinciples.com/ (Ben Sauer)
19. What you need to start:
• An Amazon (developer) account
• That’s it!
• Recommended stack is AWS Lambda / Node.js
21. Vocabulary
• Lambda
• Skill
• Invocation name
• Intent
• Entity/Slot
• Utterances
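Most of this vocabulary shows up directly in a skill's interaction model. A minimal sketch of that JSON, assuming a hypothetical "brighton meetup" invocation name and a made-up NextEventIntent (AMAZON.Month is a real built-in slot type; the sample utterances are illustrative):

```json
{
  "interactionModel": {
    "languageModel": {
      "invocationName": "brighton meetup",
      "intents": [
        {
          "name": "NextEventIntent",
          "slots": [{ "name": "month", "type": "AMAZON.Month" }],
          "samples": [
            "when is the next meetup",
            "is there a meetup in {month}"
          ]
        }
      ]
    }
  }
}
```

In ASK terms: "samples" are the utterances, and each slot is an entity the user can fill inside an utterance.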
22. Some specific ASK features
• [Speech] SSML & Speechcons
• [Dialog] Slot Filling & Auto-delegation
• [Logic] Persistence & Context
• [Display] APL
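As a taste of the [Speech] features, a hedged SSML sketch: say-as, break, and emphasis are standard Alexa SSML tags, and speechcons are pre-recorded interjections rendered via interpret-as="interjection" (whether a given word like "boom" is available varies by locale):

```xml
<speak>
    Welcome back!
    <say-as interpret-as="interjection">boom</say-as>
    <break time="500ms"/>
    There is <emphasis level="strong">one</emphasis> meetup coming up this month.
</speak>
```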
23. Let's Build a BWDM skill!
• Build (the dialog model)
• Code (the logic)
• Test (the skill)
24. Process to create a “feature”
• Create the Intent
• Add some utterances/slots
• Code the intent handler (canHandle / handle)
• Register the intent handler
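The "code the intent handler" step above might look like this. A minimal sketch, assuming a hypothetical NextEventIntent; the handler shape (canHandle / handle) matches the ASK SDK for Node.js, but the handlerInput stand-in below is simplified so the snippet runs on its own:

```javascript
// A typical intent handler: an object with a canHandle test and a handle action.
const NextEventIntentHandler = {
  canHandle(handlerInput) {
    const { request } = handlerInput.requestEnvelope;
    return request.type === 'IntentRequest'
      && request.intent.name === 'NextEventIntent';
  },
  handle(handlerInput) {
    // Speech text is illustrative.
    return handlerInput.responseBuilder
      .speak('The next Brighton Web Development Meetup is on Tuesday.')
      .getResponse();
  },
};

// Minimal stand-in for the object the SDK would pass to the handler.
const handlerInput = {
  requestEnvelope: {
    request: { type: 'IntentRequest', intent: { name: 'NextEventIntent' } },
  },
  responseBuilder: {
    speak(text) { this.text = text; return this; },
    getResponse() { return { outputSpeech: { type: 'PlainText', text: this.text } }; },
  },
};

const response = NextEventIntentHandler.canHandle(handlerInput)
  ? NextEventIntentHandler.handle(handlerInput)
  : null;
```

With the real SDK you would then register the handler via `Alexa.SkillBuilders.custom().addRequestHandlers(...)`.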
25. Typical Intent Handler
canHandle is called on each registered handler, in order, until one returns true.
The selected handler's handle function is then triggered.
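That selection logic can be sketched in plain JavaScript, without the SDK (handler names and speech text are illustrative):

```javascript
// Try each handler's canHandle in order; run handle on the first match.
function dispatch(handlers, handlerInput) {
  for (const handler of handlers) {
    if (handler.canHandle(handlerInput)) {
      return handler.handle(handlerInput);
    }
  }
  throw new Error('No handler matched this request');
}

const HelloIntentHandler = {
  canHandle: (input) => input.requestEnvelope.request.type === 'IntentRequest'
    && input.requestEnvelope.request.intent.name === 'HelloIntent',
  handle: () => 'Hello Brighton!',
};

// A catch-all fallback; register it last so it only runs when nothing else matches.
const FallbackHandler = {
  canHandle: () => true,
  handle: () => "Sorry, I didn't get that.",
};

const helloRequest = {
  requestEnvelope: { request: { type: 'IntentRequest', intent: { name: 'HelloIntent' } } },
};
// dispatch([HelloIntentHandler, FallbackHandler], helloRequest) → 'Hello Brighton!'
```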
27. Context
• How do you know what the user is talking about when they just say something about "that"?
28. Context
• How do you know what the user is talking about when they just say something about "that"?
• Yes, context is important! Use persisted state in canHandle to register multiple handlers for the same intent, selected based on the context.
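A sketch of that idea: two handlers registered for the same intent, chosen by conversational context stored in the session attributes. AMAZON.YesIntent is a real built-in intent; the "context" attribute values and handler names are illustrative, and the request objects are simplified stand-ins:

```javascript
// Both handlers match AMAZON.YesIntent, but only in their own context.
const ConfirmBookingYesHandler = {
  canHandle(handlerInput) {
    const { request, session } = handlerInput.requestEnvelope;
    return request.intent.name === 'AMAZON.YesIntent'
      && session.attributes.context === 'confirm-booking';
  },
  handle() { return 'Your place is booked, see you there!'; },
};

const SubscribeYesHandler = {
  canHandle(handlerInput) {
    const { request, session } = handlerInput.requestEnvelope;
    return request.intent.name === 'AMAZON.YesIntent'
      && session.attributes.context === 'subscribe';
  },
  handle() { return "Done, you'll get a reminder before each meetup."; },
};

// The same "yes" lands on a different handler depending on context:
const yesDuringBooking = {
  requestEnvelope: {
    request: { type: 'IntentRequest', intent: { name: 'AMAZON.YesIntent' } },
    session: { attributes: { context: 'confirm-booking' } },
  },
};
```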
29. Build!
• Free Echo Dot
• Free monthly $100 AWS credits
I'm a software engineer specialising in conversational systems.
I have two jobs: since 2015 I have worked at the EDF Energy R&D UK Centre, in the digital innovation team. We do research and development on innovative technologies like blockchain, augmented and virtual reality, machine learning, and of course chatbots and voice.
And I have recently joined Bounce Technologies, a software development studio that aims to build technology people love and that has a positive impact on the world. Our core focus is helping early- to mid-stage companies bring their digital products to market, while also creating our own products.
23rd June 2015
In 2016, following our discovery of the Amazon Echo and Alexa, we decided to invest a bit of our time to create a prototype.
That prototype showed how an Alexa skill - a voice application for Alexa - could allow our customers to access their account information simply using their voice and their ears.
That prototype very quickly became a project of its own, supported by the customer business and by Amazon.
Our goal was no longer to create a prototype but to build the first EDF Energy voice application, and to be ready for the launch of the Amazon Echo in the UK, scheduled a few months later.
On the 28th of September 2016, Amazon released the Echo family in the UK, and EDF Energy was one of the 12 launch partners, alongside companies such as Uber, Sky Sports, and The Guardian.
It was a massive success for us. In only a few months, we managed to familiarise ourselves with this new technology, and to roll it out to our customers.
Following this success, we carried on exploring voice applications, but also other conversational systems like chatbots.
The EDF Energy skill we made in 2016 was good, but it wasn't perfect. We made mistakes because the technology was very young; because we knew very little about conversational principles, and even less about what happens when that conversation is between a system and a human. I'll come back to that later.
Today, the Echo family is a lot bigger, ranging from the more traditional smart speaker to devices with screens, or even Alexa-enabled security cameras…
And it's not just about Alexa. Although Amazon has made smart speakers mainstream, there are now voice assistants everywhere: on your smartphone, maybe on your telly, or even in your car. You might even have a voice-enabled microwave!
Anyway, the stats don't lie: there are more and more sales of voice-enabled devices, and more and more voice assistants in use.
Right, let’s now talk about Voice Design
One of the main advantages people mention about voice systems is how natural they can feel. But for most of us, it isn't natural yet. Even though we have dreamt about it for as long as science-fiction films have existed, talking to black boxes is still something very new, and it takes time to adapt to these machines. It's a new kind of conversation.
One of the difficult aspects of voice systems is, for most of them, the lack of a screen!
There's something called cognitive load, which is roughly the amount of effort your memory needs to retain some information. When using a voice-only device, this load can quickly become an overload.
Let's take an example. Imagine asking for the best restaurant in London, and Alexa replies to you with a list of restaurants rated 4 or 5, including their name, their address, the type of food they serve and an indicator of price.
I can guarantee you that after the fifth restaurant, you'll have forgotten everything about the third one on the list.
Usually we say that a single response should not contain more than five distinct bits of information.
And there's an even more radical approach, known as the "one-breath" rule: you should be able to say the response out loud in a single breath.
Working with a voice-only device means the information you give lives only in your users' memory. So the answer you provide should be as short and as relevant as possible, and it should also be simple for your users to get that information again in case they haven't retained it the first time.
If you haven't heard of these, Paul Grice, a British philosopher of language, formulated the maxims of conversation. And although he did not have automated systems in mind, these four maxims work very well when you need to design any kind of dialogue.
One of our research projects last year looked at how to solve problems around cognitive overload, confirmation, and speech recognition errors.
We've explored multiple solutions, some using an external device, and some using devices such as the Echo Show and the Alexa Presentation Language; I'll talk a bit more about that during the code part.
AI and bots are different things: chatbots are a communication channel, an interface, which can deliver a product powered by AI. That means you don't need to be a deep-learning expert to create chatbots. There are loads of existing tools that let you kickstart your project.
Text and voice, as I said, are different. They both have pros and cons, and the use cases you can apply these technologies to are also very different. Think about time, for example: voice use cases usually begin and end within a short period. Designing dialogues for these two technologies is different.
I've talked quite a lot about the platforms already, but keep this in mind: it's important that the platform matches the task and your users' expectations.
UX first. Always think about the customers first. They are the ones who will be interacting with your systems. You can spend months benchmarking natural language processing engines, but if your conversation and dialogue are not well designed, it won't matter which NLP engine is behind them: your customers won't even get far enough to see the difference.
Discoverability. Conversational experiences are very new for our customers, and it's quite unlikely that they will find your bot naturally, so make sure you use other channels to push your users towards this new experience. And once they have found your bot, make sure it is really clear what they can do with it; help them familiarise themselves with the bot, and don't leave them to explore by themselves.
Finally, I don't think everything can be transformed into a bot. Even though it might be technically possible, it doesn't mean we should do it this way. A conversational system may not be the best interface to your product. I've got a video to highlight this thought.
If you don't have an AWS account, Amazon creates a specific one just for your skill, but it's limited; so if you're comfortable with AWS, I'd recommend using your own account.
Amazon provides SDKs for Node.js, Java, and Python. But realistically, you can use whatever language you want, as long as you can read the request from Amazon and reply in the expected format. I've seen people use Ruby, for example, though I wouldn't recommend it. I'd recommend sticking with AWS Lambda and Node.js, Python, or Java; that's up to you.
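Whatever language you pick, "the expected format" just means returning JSON in the shape the Alexa service understands. A minimal sketch of a plain-text response in Node.js (cards, reprompts, and so on are optional extras; the helper name and speech text are illustrative):

```javascript
// Build the smallest valid Alexa response body: a version, some output
// speech, and whether the session should end after this turn.
function buildResponse(text, endSession = true) {
  return {
    version: '1.0',
    response: {
      outputSpeech: { type: 'PlainText', text },
      shouldEndSession: endSession,
    },
  };
}

// Keep the session open so the user can keep talking to the skill.
const reply = buildResponse('Welcome to the Brighton Web Development Meetup skill!', false);
```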